On Mon, Mar 07, 2016 at 08:00:49PM +0100, Christian Borntraeger wrote:
> On 03/07/2016 06:01 PM, Stefan Hajnoczi wrote:
> > On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
> >> Folks,
> >>
> >> I had a crash of a qemu guest in tracked_request_begin.
> >> The testcase was a guest with ramdisk/kernel that reboots in a
> >> loop. (about 10 times per second) with a single null-co disk
> >> attached. No idea how to reproduce this, seems to be a lucky hit.
> >>
> >> (gdb) bt
> >> #0 0x00000000101db5ba in tracked_request_begin
> >> (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190,
> >> offset=offset@entry=0, bytes=bytes@entry=4096,
> >> type=type@entry=BDRV_TRACKED_READ)
> >> at /home/cborntra/REPOS/qemu/block/io.c:390
> >> #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0,
> >> bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>,
> >> flags@entry=(unknown: 0))
> >> at /home/cborntra/REPOS/qemu/block/io.c:1001
> >> #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0),
> >> qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized
> >> out>, bs=<optimized out>)
> >> at /home/cborntra/REPOS/qemu/block/io.c:1024
> >> #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at
> >> /home/cborntra/REPOS/qemu/block/io.c:2173
> >> #4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>,
> >> i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
> >> #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
> >>
> >> looking at the code we are at
> >>
> >> QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
> >> which translates to
> >>
> >> if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
> >> (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
> >> (&bs->tracked_requests)->lh_first = (req);
> >> (req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
> >>
> >> gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile
> >> (gdb) print /x bs->tracked_requests
> >> $6 = {lh_first = 0x0}
> >>
> >> Now looking at the code I am asking myself if this can happen in parallel
> >> to another code that touches tracked_requests, because gcc seems to read
> >> &bs->tracked_requests)->lh_first twice (first to check the value, then
> >> to use it as pointer)
> >
> > tracked_requests is protected by AioContext. Perhaps something is doing
> > I/O without acquiring AioContext?
>
> Hmm, the guest was rebooting, which resets all devices. Maybe something
> in that code is still not right? I will have a look.

virtio_blk_reset() does acquire AioContext so at least that part should be
safe with running IOThreads.

Stefan
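
To make the locking requirement concrete, here is a minimal sketch (not QEMU's actual code). It mirrors the four steps of the expanded QLIST_INSERT_HEAD quoted above and holds a lock across them. The pthread mutex is only a stand-in for the AioContext lock, and the names Req, ReqList, req_list_init, req_list_insert_head and req_list_remove are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not QEMU's definitions. */
typedef struct Req Req;
struct Req {
    Req *le_next;    /* next element, or NULL at the tail */
    Req **le_prev;   /* address of the pointer that points at this element */
};

typedef struct {
    Req *lh_first;          /* list head, playing the role of bs->tracked_requests */
    pthread_mutex_t lock;   /* assumption: stand-in for the AioContext lock */
} ReqList;

static void req_list_init(ReqList *l)
{
    l->lh_first = NULL;
    pthread_mutex_init(&l->lock, NULL);
}

/* Same four steps as the expanded QLIST_INSERT_HEAD, done under the lock. */
static void req_list_insert_head(ReqList *l, Req *req)
{
    assert(req != NULL);
    pthread_mutex_lock(&l->lock);
    if ((req->le_next = l->lh_first) != NULL) {   /* first read of lh_first */
        l->lh_first->le_prev = &req->le_next;     /* second read of lh_first */
    }
    l->lh_first = req;
    req->le_prev = &l->lh_first;
    pthread_mutex_unlock(&l->lock);
}

/* Unlink req; the caller guarantees req is currently on list l. */
static void req_list_remove(ReqList *l, Req *req)
{
    assert(req != NULL);
    pthread_mutex_lock(&l->lock);
    if (req->le_next != NULL) {
        req->le_next->le_prev = req->le_prev;
    }
    *req->le_prev = req->le_next;
    pthread_mutex_unlock(&l->lock);
}
```

With the lock held across both reads of lh_first, a concurrent req_list_remove() cannot empty the list between the NULL check and the dereference. If one path skipped the lock, lh_first could be non-NULL at the check and NULL at the dereference, which would be consistent with the lh_first = 0x0 seen in the core file.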