On 12/08/2015 01:00 PM, Cornelia Huck wrote: > On Tue, 8 Dec 2015 10:59:54 +0100 > Kevin Wolf <kw...@redhat.com> wrote: > >> Am 07.12.2015 um 17:42 hat Cornelia Huck geschrieben: >>> On Mon, 7 Dec 2015 11:02:51 +0100 >>> Cornelia Huck <cornelia.h...@de.ibm.com> wrote: >>> >>>> On Thu, 3 Dec 2015 13:00:00 +0800 >>>> Stefan Hajnoczi <stefa...@redhat.com> wrote: >>>> >>>>> From: Fam Zheng <f...@redhat.com> >>>>> >>>>> The assertion problem was noticed in 06c3916b35a, but it wasn't >>>>> completely fixed, because even though the req is not marked as >>>>> serialising, it still gets serialised by wait_serialising_requests >>>>> against other serialising requests, which could lead to the same >>>>> assertion failure. >>>>> >>>>> Fix it by even more explicitly skipping the serialising for this >>>>> specific case. >>>>> >>>>> Signed-off-by: Fam Zheng <f...@redhat.com> >>>>> Message-id: 1448962590-2842-2-git-send-email-f...@redhat.com >>>>> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> >>>>> --- >>>>> block/backup.c | 2 +- >>>>> block/io.c | 12 +++++++----- >>>>> include/block/block.h | 4 ++-- >>>>> trace-events | 2 +- >>>>> 4 files changed, 11 insertions(+), 9 deletions(-) >>>> >>>> This one causes segfaults for me: >>>> >>>> Program received signal SIGSEGV, Segmentation fault. >>>> bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071 >>>> 3071 if (!drv) { >>>> >>>> (gdb) bt >>>> #0 bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071 >> >> This looks like some kind of memory corruption that hit blk->bs. It's >> most definitely not a valid pointer anyway. >> >>>> #1 0x0000000080216974 in blk_is_inserted (blk=<optimized out>) >>>> at /data/git/yyy/qemu/block/block-backend.c:986 >>>> #2 0x00000000802169c6 in blk_is_available (blk=blk@entry=0x3ffb17e7960) >>>> at /data/git/yyy/qemu/block/block-backend.c:991 >>>> #3 0x0000000080216d12 in blk_check_byte_request >>>> (blk=blk@entry=0x3ffb17e7960, >>>> offset=offset@entry=4928966656, size=16384) >>>> at /data/git/yyy/qemu/block/block-backend.c:558 >>>> #4 0x0000000080216df2 in blk_check_request (blk=blk@entry=0x3ffb17e7960, >>>> sector_num=sector_num@entry=9626888, nb_sectors=nb_sectors@entry=32) >>>> at /data/git/yyy/qemu/block/block-backend.c:589 >>>> #5 0x0000000080217ee8 in blk_aio_readv (blk=0x3ffb17e7960, sector_num= >>>> 9626888, iov=0x8098c658, nb_sectors=<optimized out>, cb= >>>> 0x80081150 <virtio_blk_rw_complete>, opaque=0x80980620) >>>> at /data/git/yyy/qemu/block/block-backend.c:727 >>>> #6 0x000000008008186e in submit_requests (niov=<optimized out>, >>>> num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>, >>>> blk=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:366 >>>> #7 virtio_blk_submit_multireq (mrb=<optimized out>, blk=<optimized out>) >>>> at /data/git/yyy/qemu/hw/block/virtio-blk.c:444 >>>> #8 virtio_blk_submit_multireq (blk=0x3ffb17e7960, mrb=0x3ffffffeb58) >>>> at /data/git/yyy/qemu/hw/block/virtio-blk.c:389 >>>> #9 0x00000000800823ee in virtio_blk_handle_output (vdev=<optimized out>, >>>> vq=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:615 >>>> #10 0x00000000801e367e in aio_dispatch (ctx=0x80918520) >>>> at /data/git/yyy/qemu/aio-posix.c:326 >>>> #11 0x00000000801d28b0 in aio_ctx_dispatch (source=<optimized out>, >>>> callback=<optimized out>, user_data=<optimized out>) >>>> at /data/git/yyy/qemu/async.c:231 >>>> #12 0x000003fffd36a05a in g_main_context_dispatch () >>>> from /lib64/libglib-2.0.so.0 >>>> #13 0x00000000801e0ffa in glib_pollfds_poll () >>>> at /data/git/yyy/qemu/main-loop.c:211 >>>> #14 os_host_main_loop_wait (timeout=<optimized out>) >>>> at /data/git/yyy/qemu/main-loop.c:256 >>>> #15 main_loop_wait (nonblocking=<optimized out>) >>>> at /data/git/yyy/qemu/main-loop.c:504 >>>> #16 0x00000000800148a6 in main_loop () at /data/git/yyy/qemu/vl.c:1923 >>>> #17 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) >>>> at /data/git/yyy/qemu/vl.c:4684 >>>> >>>> Relevant part of command line: >>>> >>>> -drive >>>> file=/dev/sda,if=none,id=drive-virtio-disk0,format=raw,serial=ccwzfcp1,cache=none >>>> -device >>>> virtio-blk-ccw,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,scsi=off >>> >>> I played around a bit. The main part of this change seems to be calling >>> wait_serialising_requests() conditionally; reverting this makes the >>> guest boot again. >>> >>> I then tried to find out when wait_serialising_requests() was NOT >>> called and added fprintfs: well, it was _always_ called. I then added a >>> fprintf for flags at the beginning of the function: this produced a >>> segfault no matter whether wait_serialising_requests() was called >>> conditionally or unconditionally. Weird race? >>> >>> Anything further I can do? I guess this patch fixes a bug for someone, >>> but it means insta-death for my setup... >> >> If it happens immediately, perhaps running under valgrind is possible >> and could give some hints about what happened with blk->bs? > > Just a quick update: This triggers on a qemu built with a not-so-fresh > gcc 4.7.2 (and it seems to depend on compiler optimizations as well). > We can't trigger it with newer gccs. No hints from valgrind, sadly. > Investigation ongoing. > >
Some updates after looking at Cornelias system. It seem to be related to the F18 gcc that is still on that test system. Problem happens when hw/block/virtio-blk.c is compiled with -O2 and goes away with -O1 and -O0 (I trimmed that down to -fexpensive-optimizations) The system calls virtio_blk_submit_multireq 26 times and then it messes up the blk pointer: (gdb) display blk->name 1: blk->name = 0x80894300 "drive-virtio-disk0" (gdb) next 419 if (niov + req->qiov.niov > IOV_MAX) { 1: blk->name = 0x80894300 "drive-virtio-disk0" (gdb) 424 if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > max_xfer_len) { 1: blk->name = 0x80894300 "drive-virtio-disk0" (gdb) 419 if (niov + req->qiov.niov > IOV_MAX) { 1: blk->name = 0x80894300 "drive-virtio-disk0" (gdb) 429 if (sector_num + nb_sectors != req->sector_num) { 1: blk->name = 0x80894300 "drive-virtio-disk0" (gdb) 434 submit_requests(blk, mrb, start, num_reqs, niov); 1: blk->name = 0x80894300 "drive-virtio-disk0" (gdb) 413 for (i = 0; i < mrb->num_reqs; i++) { 1: blk->name = 0x8089ae40 "" uninlining submit_request makes the problem go away, so might be a gcc bug in Fedora18. I am now going to look at the disassembly to be sure. Christian