On Mon, Nov 07, 2016 at 11:09:10AM +0000, Brian Candler wrote:
> On 07/11/2016 10:42, Stefan Hajnoczi wrote:
> > Let's try to isolate the cause of this crash:
> >
> > Are you able to switch -netdev user to -netdev tap so we can rule out
> > the slirp user network stack as the source of memory corruption?
> Let me try to set that up. Using packer.io, I will have to start a VM by
> hand, and then use the 'null' builder to ssh to the existing VM (whereas
> normally packer fires up the qemu process by itself)
>
> > Alternatively could you re-run with virtio-blk instead of virtio-scsi to
> > see if that eliminates crashes?
> This is what I got after changing to virtio:
>
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/local/bin/qemu-system-x86_64 -netdev
> user,id=user.0,hostfwd=tcp::2521-:22'.
> Program terminated with signal SIGABRT, Aborted.
> #0 0x00007fa76d645428 in __GI_raise (sig=sig@entry=6) at
>     ../sysdeps/unix/sysv/linux/raise.c:54
> 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> [Current thread is 1 (Thread 0x7fa76f065a80 (LWP 18155))]
> (gdb) bt
> #0 0x00007fa76d645428 in __GI_raise (sig=sig@entry=6) at
>     ../sysdeps/unix/sysv/linux/raise.c:54
> #1 0x00007fa76d64702a in __GI_abort () at abort.c:89
> #2 0x00007fa76d63dbd7 in __assert_fail_base (fmt=<optimised out>,
>     assertion=assertion@entry=0x5629cea98cd5 "mr != NULL",
>     file=file@entry=0x5629cea7a884 "/home/nsrc/qemu-2.7.0/exec.c",
>     line=line@entry=2967,
>     function=function@entry=0x5629cea7af00 <__PRETTY_FUNCTION__.42881>
>     "address_space_unmap")
>     at assert.c:92
> #3 0x00007fa76d63dc82 in __GI___assert_fail (
>     assertion=assertion@entry=0x5629cea98cd5 "mr != NULL",
>     file=file@entry=0x5629cea7a884 "/home/nsrc/qemu-2.7.0/exec.c",
>     line=line@entry=2967,
>     function=function@entry=0x5629cea7af00 <__PRETTY_FUNCTION__.42881>
>     "address_space_unmap")
>     at assert.c:101
> #4 0x00005629ce6c0ffe in address_space_unmap (as=<optimised out>,
>     buffer=<optimised out>,
>     len=<optimised out>, is_write=1, access_len=4096) at
>     /home/nsrc/qemu-2.7.0/exec.c:2967
> #5 0x00005629ce743beb in virtqueue_unmap_sg
>     (elem=elem@entry=0x5629d29d5290, len=len@entry=61441,
>     vq=0x5629d13186b0) at /home/nsrc/qemu-2.7.0/hw/virtio/virtio.c:254
> #6 0x00005629ce744422 in virtqueue_fill (vq=vq@entry=0x5629d13186b0,
>     elem=elem@entry=0x5629d29d5290, len=61441, idx=idx@entry=0)
>     at /home/nsrc/qemu-2.7.0/hw/virtio/virtio.c:282
> #7 0x00005629ce7445db in virtqueue_push (vq=0x5629d13186b0,
>     elem=elem@entry=0x5629d29d5290,
>     len=<optimised out>) at /home/nsrc/qemu-2.7.0/hw/virtio/virtio.c:308
> #8 0x00005629ce71894d in virtio_blk_req_complete
>     (req=req@entry=0x5629d29d5290,
>     status=status@entry=0 '\000') at
>     /home/nsrc/qemu-2.7.0/hw/block/virtio-blk.c:58
> #9 0x00005629ce718b59 in virtio_blk_rw_complete (opaque=<optimised out>,
>     ret=0)
>     at /home/nsrc/qemu-2.7.0/hw/block/virtio-blk.c:121
> #10 0x00005629ce98025a in blk_aio_complete (acb=0x5629d298f370)
>     at /home/nsrc/qemu-2.7.0/block/block-backend.c:923
> #11 0x00005629ce9efaea in coroutine_trampoline (i0=<optimised out>,
>     i1=<optimised out>)
>     at /home/nsrc/qemu-2.7.0/util/coroutine-ucontext.c:78
> #12 0x00007fa76d65a5d0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #13 0x00007ffee3d75a20 in ?? ()
> #14 0x2d2d2d2d2d2d2d2d in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #15 0x00000000000000f0 in ?? ()
> #16 0x0000000000000000 in ?? ()
>
> Aside: I see "virtqueue_unmap_sg" in the backtrace. Is this correct even for
> a non-SCSI virtio?

Great, now we know virtio-scsi is not causing this crash.

virtqueue_unmap_sg() is used by all virtio devices. "sg" means
scatter-gather list. It's unmapping the buffers that the guest passed to
the host.

> The command line was something like this (captured by running packer another
> time, so the ports and filenames are not exactly the same)
>
> /usr/local/bin/qemu-system-x86_64 -m 4G -vnc [::]:59 -machine
> type=pc,accel=kvm -netdev user,id=user.0,hostfwd=tcp::2879-:22 -boot c -smp
> 8,sockets=1,cores=4,threads=2 -name vtp-nmm-201611071057.qcow2 -device
> virtio-net,netdev=user.0 -drive
> file=output-qemu-vtp-nmm/vtp-nmm-201611071057.qcow2,if=virtio,cache=writeback,discard=ignore,format=qcow2
>
> > The core dumps are likely to contain more clues. If you are comfortable
> > with gdb and debugging C code you could dump the memory surround where
> > the junk value (mr) was loaded from. Perhaps there is a hint about who
> > zeroed the memory. In the first core dump you could start with:
> >
> > (gdb) up 6 # go to the dma_blk_unmap() stack frame
> > (gdb) p *(DMAAIOCB *)0x560909ceca90
> > (gdb) p *((DMAAIOCB *)0x560909ceca90).sg
>
> (gdb) up 6
> #6 dma_blk_unmap (dbs=dbs@entry=0x560909ceca90) at
>     /home/nsrc/qemu-2.7.0/dma-helpers.c:102
> 102 dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
> (gdb) p *(DMAAIOCB *)0x560909ceca90
> $1 = {common = {aiocb_info = 0x560907c15690 <dma_aiocb_info>, bs = 0x0,
>     cb = 0x56090767e250 <scsi_dma_complete>, opaque = 0x560909c2b8e0,
>     refcnt = 1},
>   ctx = 0x5609087d82a0, acb = 0x0, sg = 0x560909af7430, offset = 4302675968,
>   dir = DMA_DIRECTION_FROM_DEVICE, sg_cur_index = 126, sg_cur_byte = 0,
>   iov = {
>     iov = 0x560909c6e960, niov = 126, nalloc = 126, size = 1048576},
>   bh = 0x0,
>   io_func = 0x56090767d110 <scsi_dma_readv>,
>   io_func_opaque = 0x560909c2b8e0}
> (gdb) p *((DMAAIOCB *)0x560909ceca90).sg
> $2 = {sg = 0x560909fab1e0, nsg = 126, nalloc = 143, size = 1048576,
>   dev = 0x5609087e5630,
>   as = 0x560907e20480 <address_space_memory>}
> (gdb)
>
> I'm comfortable with C, but don't really know what I'm looking for, nor what
> the data structures represent :-)
>
> (gdb) p dbs->iov.niov
> $3 = 126
> (gdb) p i
> $4 = 125
>
> ...so it appears it was in the last iteration of the loop.
>
> (gdb) print dbs->sg->as
> $5 = (AddressSpace *) 0x560907e20480 <address_space_memory>
> (gdb) print dbs->iov.iov[i].iov_base
> $6 = (void *) 0x7f354099e000
> (gdb) print dbs->iov.iov[i].iov_len
> $7 = 8192
> (gdb) print dbs->dir
> $8 = DMA_DIRECTION_FROM_DEVICE
>
> Unfortunately, much has been inlined:
>
> (gdb) frame 4
> #4 0x000056090749dffe in address_space_unmap (as=<optimised out>,
>     buffer=<optimised out>,
>     len=<optimised out>, is_write=1, access_len=8192) at
>     /home/nsrc/qemu-2.7.0/exec.c:2967
> 2967 assert(mr != NULL);
> (gdb) print mr
> $9 = (MemoryRegion *) 0x0
> (gdb) print buffer
> $10 = <optimised out>

buffer should be 0x7f354099e000. memory_region_from_host() returned
NULL because it was unable to find the MemoryRegion for this host
address.

Are you hotplugging any devices or adding/removing memory from the
guest?

Stefan
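
For readers following the "mr != NULL" assertion: the sketch below is a toy model of a host-address-to-region lookup, not QEMU's code. ToyRegion, toy_region_from_host() and both region layouts are made up for illustration; only the address 0x7f354099e000 is taken from the gdb session. It shows how a lookup can return nothing when no registered region covers the host pointer, which is the condition the assertion guards against.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToyRegion:
    """Hypothetical stand-in for a guest RAM region mapped at a host address."""
    name: str
    host_start: int
    size: int

    def contains(self, host_addr: int) -> bool:
        return self.host_start <= host_addr < self.host_start + self.size


def toy_region_from_host(regions: Sequence[ToyRegion],
                         host_addr: int) -> Optional[ToyRegion]:
    """Return the region covering host_addr, or None if no region does."""
    for region in regions:
        if region.contains(host_addr):
            return region
    return None


# Address from the gdb session (dbs->iov.iov[i].iov_base).
BUFFER = 0x7F354099E000

# Assumed layout A: a 16 MiB region that happens to cover the buffer.
covering = [ToyRegion("ram-a", 0x7F3540000000, 0x1000000)]

# Assumed layout B: a 1 GiB region that ends just below the buffer,
# so the lookup finds nothing and an assert on the result would fire.
not_covering = [ToyRegion("ram-b", 0x7F3500000000, 0x40000000)]

found = toy_region_from_host(covering, BUFFER)
missing = toy_region_from_host(not_covering, BUFFER)
```

In this model the set of regions is the only thing that decides whether the lookup succeeds, which is why a change to the guest's memory layout between map and unmap is worth ruling out.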