On 03/09/2016 01:55 PM, Paolo Bonzini wrote:
>
> On 09/03/2016 13:21, Christian Borntraeger wrote:
>> I have some random crashes at startup
>>
>> Stack trace of thread 48326:
>> #0  0x000002aa2e0cce46 bdrv_co_do_rw (qemu-system-s390x)
>> #1  0x000002aa2e159e8e coroutine_trampoline (qemu-system-s390x)
>> #2  0x000003ffbc35150a __makecontext_ret (libc.so.6)
>>
>> that I was able to bisect.
>> commit 2906cddfecff21af20eedab43288b485a679f9ac does crash regularly,
>> 2906cddfecff21af20eedab43288b485a679f9ac^ does not.
>>
>> I will try to find somebody that looks into that - unless you have an idea.
>
> The only random idea is to move
>
> vblk->dataplane_started = true
>
> to the beginning of virtio_blk_data_plane_start rather than the end.
>
> Paolo

FWIW, it seems that this patch randomly triggers one of three failures:
this error, the "tracked_request_begin" one that I reported yesterday,
and/or some early read issues from the bootloader.

Using 2906cddfecff21af20eedab43288b485a679f9ac^ seems to work all the time.
Moving vblk->dataplane_started = true around also triggers all 3 types of
bugs, e.g.

Thread 1 (Thread 0x3ffaabff910 (LWP 32782)):
#0  0x0000000010329a70 in bdrv_co_do_rw (opaque=0x0) at /home/cborntra/REPOS/qemu/block/io.c:2170
#1  0x00000000103b2e7a in coroutine_trampoline (i0=1023, i1=-2147470992) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
#2  0x000003ffac85150a in __makecontext_ret () from /lib64/libc.so.6

(gdb) list
2165
2166    /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
2167    static void coroutine_fn bdrv_co_do_rw(void *opaque)
2168    {
2169        BlockAIOCBCoroutine *acb = opaque;
2170        BlockDriverState *bs = acb->common.bs;
2171
2172        if (!acb->is_write) {
2173            acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
2174                acb->req.nb_sectors, acb->req.qiov, acb->req.flags);

I will try to find somebody to work on this.