Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
On Thu, Oct 1, 2015 at 5:20 AM, Hannes Reinecke <h...@suse.de> wrote:
> On 10/01/2015 11:00 AM, Michael S. Tsirkin wrote:
>> On Thu, Oct 01, 2015 at 03:10:14AM +0200, Thomas D. wrote:
>>> Hi,
>>>
>>> I have a virtual machine which fails to boot linux-4.1.8 while
>>> mounting file systems:
>>>
>>>> * Mounting local filesystem ...
>>>> kernel BUG at drivers/block/virtio_blk.c:172!
>>>
>>> [full trace snipped; see the original report elsewhere in this thread]
>>
>> So this BUG_ON is from 1cf7e9c68fe84248174e998922b39e508375e7c1
>> ("virtio_blk: blk-mq support"):
>>
>>     BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
>>
>> On probe, we do:
>>
>>     /* We can handle whatever the host told us to handle. */
>>     blk_queue_max_segments(q, vblk->sg_elems-2);
>>
>> To debug this, maybe you can print out sg_elems at init time and when
>> this fails, to make sure some kind of memory corruption does not
>> change sg_elems after initialization?
>>
>> Jens, how may we get more segments than blk_queue_max_segments?
>> Is the driver expected to validate and drop such requests?
>>
> Whee! I'm not alone anymore!
>
> I have seen similar issues even on non-mq systems; occasionally
> I'm hitting this bug in drivers/scsi/scsi_lib.c:scsi_init_io():
>
>     count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
>     BUG_ON(count > sdb->table.nents);
>
> There are actually two problems here:
> The first is that blk_rq_map_sg() requires a table (ie the last
> argument), but has no indication of how large the table is.
> So one needs to check whether the returned number of mapped sg
> elements exceeds the number of elements in the table.
> If so, we already _have_ a memory overflow, and the only thing we can
> do is sit in a corner and cry.
> This really needs to be fixed up, eg by adding another argument for
> the table size.
>
> The other problem is that this _really_ shouldn't happen, and points
> to some issue with the block layer in general. Which I've been trying
> to find for several months now, to no avail :-(

This particular dm-crypt on virtio-blk issue is fixed with this commit:
http://git.kernel.org/linus/586b286b110e94eb31840ac5afc0c24e0881fe34

Linus pulled this into v4.3-rc3.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
On Thu, Oct 01, 2015 at 12:16:04PM +0200, Jens Axboe wrote:
> On 10/01/2015 11:00 AM, Michael S. Tsirkin wrote:
> > On Thu, Oct 01, 2015 at 03:10:14AM +0200, Thomas D. wrote:
> >> Hi,
> >>
> >> I have a virtual machine which fails to boot linux-4.1.8 while
> >> mounting file systems:
> >>
> >>> * Mounting local filesystem ...
> >>> kernel BUG at drivers/block/virtio_blk.c:172!
> >>
> >> [full trace snipped]
> >
> > So this BUG_ON is from 1cf7e9c68fe84248174e998922b39e508375e7c1
> > ("virtio_blk: blk-mq support"):
> >
> >     BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> >
> > On probe, we do:
> >
> >     /* We can handle whatever the host told us to handle. */
> >     blk_queue_max_segments(q, vblk->sg_elems-2);
> >
> > Jens, how may we get more segments than blk_queue_max_segments?
> > Is the driver expected to validate and drop such requests?
>
> The answer is that this should not happen. If the driver informs of a
> limit on the number of segments, that should never be exceeded. If it
> is, then it's a bug in either the SG mapping, or in the building of
> the request - either the request gets built too large for some reason,
> or the mapping doesn't always coalesce segments even though it should.
>
> The problem is that we get notified out-of-band, when we attempt to
> push the request to the driver. At this point, much of the context
> could be lost, like it is in your case.
>
> Looking at the specific virtio_blk case, it does seem that it is
> checking the segment count before mapping. Does the below fix the
> problem, or does the BUG_ON() still trigger?

Jens, I have no idea whether this is the right thing to do, so please
merge this patch directly if it makes sense.

>
> diff
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
On 10/01/2015 11:00 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 03:10:14AM +0200, Thomas D. wrote:
>> Hi,
>>
>> I have a virtual machine which fails to boot linux-4.1.8 while
>> mounting file systems:
>>
>>> * Mounting local filesystem ...
>>> kernel BUG at drivers/block/virtio_blk.c:172!
>>
>> [full trace snipped]
>
> So this BUG_ON is from 1cf7e9c68fe84248174e998922b39e508375e7c1
> ("virtio_blk: blk-mq support"):
>
>     BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
>
> On probe, we do:
>
>     /* We can handle whatever the host told us to handle. */
>     blk_queue_max_segments(q, vblk->sg_elems-2);
>
> To debug this, maybe you can print out sg_elems at init time and when
> this fails, to make sure some kind of memory corruption does not
> change sg_elems after initialization?
>
> Jens, how may we get more segments than blk_queue_max_segments?
> Is driver expected to validate and drop such requests?

The answer is that this should not happen. If the driver informs of a
limit on the number of segments, that should never be exceeded. If it
does, then it's a bug in either the SG mapping, or in the building of
the request - either the request gets built too large for some reason,
or the mapping doesn't always coalesce segments even though it should.

The problem is that we get notified out-of-band, when we attempt to
push the request to the driver. At this point, much of the context
could be lost, like it is in your case.

Looking at the specific virtio_blk case, it does seem that it is
checking the segment count before mapping. Does the below fix the
problem, or does the BUG_ON() still trigger?
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6ca35495a5be..1501701b0202 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -169,8 +169,6 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	int err;
 	bool notify = false;
 
-	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
-
 	vbr->req = req;
 	if (req->cmd_flags & REQ_FLUSH) {
 		vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_FLUSH);
@@ -203,6 +201,7 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	num = blk_rq_map_sg(hctx->queue, vbr->req, vbr->sg);
 	if (num) {
+		BUG_ON(num + 2 > vblk->sg_elems);
 		if (rq_data_dir(vbr->req) == WRITE)
 			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_OUT);
 		else

-- 
Jens Axboe
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
Hi,

Michael S. Tsirkin wrote:
> To debug this, maybe you can print out sg_elems at init time and when
> this fails, to make sure some kind of memory corruption does not
> change sg_elems after initialization?

If that was addressed to me, could you please tell me how to do that?

-Thomas
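[Editor's note] Not speaking for the developers, but the usual way to get the information Michael asks for is a pair of printk()s: one where sg_elems is established at probe time, and one next to the BUG_ON(). A rough, untested sketch against a 4.1-era drivers/block/virtio_blk.c (hunk positions and surrounding context approximate):

```diff
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, ...
+	if (req->nr_phys_segments + 2 > vblk->sg_elems)
+		pr_err("virtio_blk: nr_phys_segments=%u sg_elems=%u\n",
+		       req->nr_phys_segments, vblk->sg_elems);
 	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
@@ static int virtblk_probe(struct virtio_device *vdev) ...
 	/* We can handle whatever the host told us to handle. */
 	blk_queue_max_segments(q, vblk->sg_elems-2);
+	pr_info("virtio_blk: sg_elems at probe time = %u\n", vblk->sg_elems);
```

If the value printed at probe time differs from the one printed when the check fails, that would point at the memory-corruption scenario Michael mentions; if they match, the excess segments came from the block layer.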
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
Hi,

Mike Snitzer wrote:
> This particular dm-crypt on virtio-blk issue is fixed with this commit:
> http://git.kernel.org/linus/586b286b110e94eb31840ac5afc0c24e0881fe34
>
> Linus pulled this into v4.3-rc3.

I have this patch applied to linux-4.1.9. That could be the reason why
I don't see the issue on boot with linux-4.1.9.

So is the freeze I am experiencing with linux-4.1.9 a new issue
introduced in 4.1.9, or another symptom of the same regression? Any
idea?

-Thomas
kernel BUG at drivers/block/virtio_blk.c:172!
Hi,

I have a virtual machine which fails to boot linux-4.1.8 while mounting
file systems:

> * Mounting local filesystem ...
> ------------[ cut here ]------------
> kernel BUG at drivers/block/virtio_blk.c:172!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: pcspkr psmouse dm_log_userspace virtio_net e1000 fuse
> nfs lockd grace sunrpc fscache dm_snapshot dm_bufio dm_mirror
> dm_region_hash dm_log usbhid usb_storage sr_mod cdrom
> CPU: 7 PID: 2254 Comm: dmcrypt_write Not tainted 4.1.8-gentoo #1
> Hardware name: Red Hat KVM, BIOS seabios-1.7.5-8.el7 04/01/2014
> task: 88061fb7 ti: 88061ff3 task.ti: 88061ff3
> RIP: 0010:[] [] virtio_queue_rq+0x210/0x2b0
> RSP: 0018:88061ff33ba8 EFLAGS: 00010202
> RAX: 00b1 RBX: 88061fb2fc00 RCX: 88061ff33c30
> RDX: 0008 RSI: 88061ff33c50 RDI: 88061fb2fc00
> RBP: 88061ff33bf8 R08: 88061eef3540 R09: 88061ff33c30
> R10: R11: 00af R12:
> R13: 88061eef3540 R14: 88061eef3540 R15: 880622c7ca80
> FS: () GS:88063fdc() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 01ffe468 CR3: bb343000 CR4: 001406e0
> Stack:
>  880622d4c478 88061ff33bd8 88061fb2f
>  0001 88061fb2fc00 88061ff33c30 0
>  88061eef3540 88061ff33c98 b43eb
>
> Call Trace:
>  [] __blk_mq_run_hw_queue+0x1d0/0x370
>  [] blk_mq_run_hw_queue+0x95/0xb0
>  [] blk_mq_flush_plug_list+0x129/0x140
>  [] blk_finish_plug+0x18/0x50
>  [] dmcrypt_write+0x1da/0x1f0
>  [] ? wake_up_state+0x20/0x20
>  [] ? crypt_iv_lmk_dtr+0x60/0x60
>  [] kthread_create_on_node+0x180/0x180
>  [] ret_from_fork+0x42/0x70
>  [] ? kthread_create_on_node+0x180/0x180
> Code: 00 41 c7 85 78 01 00 00 08 00 00 00 49 c7 85 80 01 00 00 00 00 00
> 00 41 89 85 7c 01 00 00 e9 93 fe ff ff 66 0f 1f 44 00 00 <0f> 0b 66 0f
> 1f 44 00 00 49 8b 87 b0 00 00 00 41 83 e6 ef 4a 8b
> RIP [] virtio_queue_rq+0x210/0x2b0
> RSP:
> ---[ end trace 8078357c459d5fc0 ]---

System details:
===============

VM (KVM); I don't know any details about the host system.

The disk (vda) is partitioned into vda1 (ext4, /boot) and vda2
(LUKS-encrypted). On top of the LUKS volume I have LVM. On top of LVM
I have several partitions, all XFS (so rootfs is on XFS!).

Only about 1 in 20 reboots succeeds.

Today I tried to upgrade to linux-4.1.9. Mounting wasn't a problem
anymore, but after ~5 minutes the system always dies with

> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s!

(the CPU number varies)

dmesg:

> [0.00] Initializing cgroup subsys cpuset
> [0.00] Initializing cgroup subsys cpu
> [0.00] Initializing cgroup subsys cpuacct
> [0.00] Linux version 4.1.8-gentoo (root@sysresccd) (gcc version 4.9.3 (Gentoo 4.9.3 p1.2, pie-0.6.3) ) #1 SMP Thu Sep 24 03:44:37 CEST 2015
> [0.00] Command line: BOOT_IMAGE=/kernel-genkernel-x86_64-4.1.8-gentoo root=/dev/mapper/vps1storage-volRoot ro dolvm crypt_root=UUID=fdea18e4-ba0f-1234-1234-12345678 root=UUID=86dad6e6-e43e-1234-1234-987654321 rootfs=xfs scandelay=3 nomodeset
> [0.00] KERNEL supported cpus:
> [0.00]   Intel GenuineIntel
> [0.00] e820: BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
> [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
> [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
> [0.00] BIOS-e820: [mem 0x0010-0xbffdefff] usable
> [0.00] BIOS-e820: [mem 0xbffdf000-0xbfff] reserved
> [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
> [0.00] BIOS-e820: [mem 0xfffc-0x] reserved
> [0.00] BIOS-e820: [mem 0x0001-0x00063fff] usable
> [0.00] NX (Execute Disable) protection: active
> [0.00] SMBIOS 2.8 present.
> [0.00] DMI: Red Hat KVM, BIOS seabios-1.7.5-8.el7 04/01/2014
> [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> [0.00] e820: remove [mem 0x000a-0x000f] usable
> [0.00] e820: last_pfn = 0x64 max_arch_pfn = 0x4
> [0.00] MTRR default type: write-back
> [0.00] MTRR fixed ranges enabled:
> [0.00]   0-9 write-back
> [0.00]   A-B uncachable
> [0.00]   C-F write-protect
> [0.00] MTRR variable ranges enabled:
> [0.00]   0 base C000 mask 3FFFC000 uncachable
> [0.00
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
Hi,

Jens Axboe wrote:
> Looking at the specific virtio_blk case, it does seem that it is
> checking the segment count before mapping. Does the below fix the
> problem, or does the BUG_ON() still trigger?

Have I understood you correctly that I should test your patch against
linux-4.1.8?

PS: I upgraded several other systems (they are using dm with LUKS too,
but are all ext4, not XFS like this VM) to linux-4.1.9, and they all
froze while running (booting was never a problem, unlike with the
system using XFS). Some of them ran for hours until they froze, some
froze just seconds after startup finished. Because I never saw the
BUG_ON on any of these systems with linux-4.1.8, I am not sure whether
this is the same problem (and the XFS system from which I showed you
the trace seems to boot fine with linux-4.1.9 but now freezes, too).
Any idea how to debug the freeze?

PS2: On Google I found
http://oss.sgi.com/pipermail/xfs/2014-November/039105.html -- that's
why I am mentioning that I am using XFS (except for the file system,
the VM is identical to the other VMs, which didn't show any problems
until linux-4.1.9). I cannot find the patch from
http://oss.sgi.com/pipermail/xfs/2014-November/039149.html in
mainline. Is it worth checking in that direction too, or doesn't the
trace indicate any relationship?

Thanks!

-Thomas
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
Hi,

> seems like we have two problems:
>
> The first (original) problem seems to be already fixed by Mike's
> patch.
>
> I applied the patch against linux-4.1.8 and rebooted several times
> without a problem. But I'll keep testing, for sure.

I have now experienced one freeze with linux-4.1.8 plus Mike's patch
(same as in 4.1.9). That doesn't mean it is related to Mike's patch: I
see the same freezes, at a higher frequency, with vanilla 4.1.9, which
does not contain the patch.

-Thomas
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
Hi,

seems like we have two problems:

The first (original) problem seems to be already fixed by Mike's patch.
I applied the patch against linux-4.1.8 and rebooted several times
without a problem. But I'll keep testing, for sure.

The second problem is a new bug in linux-4.1.9: I now experience the

> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s!

freeze on all my systems (virtual machines *and* physical systems;
Gentoo boxes, Debian boxes, with and without dm (LUKS), with XFS or
ext4 -- it looks like a general problem). I will post it to LKML itself
(I don't have a trace of what's hanging).

Thanks for all your time and help!

-Thomas
req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
On Thu, Oct 01, 2015 at 03:10:14AM +0200, Thomas D. wrote:
> Hi,
>
> I have a virtual machine which fails to boot linux-4.1.8 while
> mounting file systems:
>
> > * Mounting local filesystem ...
> > kernel BUG at drivers/block/virtio_blk.c:172!
>
> [full trace and system details snipped; see the original report
> elsewhere in this thread]

So this BUG_ON is from 1cf7e9c68fe84248174e998922b39e508375e7c1:

    commit 1cf7e9c68fe84248174e998922b39e508375e7c1
    Author: Jens Axboe <ax...@kernel.dk>
    Date:   Fri Nov 1 10:52:52 2013 -0600

        virtio_blk: blk-mq support

    BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

On probe, we do:

    /* We can handle whatever the host told us to handle. */
    blk_queue_max_segments(q, vblk->sg_elems-2);

To debug this, maybe you can print out sg_elems at init time and when
this fails, to make sure some kind of memory corruption does not
change sg_elems after initialization?

Jens, how may we get more segments than blk_queue_max_segments?
Is the driver expected to validate and drop such requests?
Re: req->nr_phys_segments > queue_max_segments (was Re: kernel BUG at drivers/block/virtio_blk.c:172!)
On 10/01/2015 11:00 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 03:10:14AM +0200, Thomas D. wrote:
>> Hi,
>>
>> I have a virtual machine which fails to boot linux-4.1.8 while
>> mounting file systems:
>>
>>> * Mounting local filesystem ...
>>> ------------[ cut here ]------------
>>> kernel BUG at drivers/block/virtio_blk.c:172!
>>
>> [full trace snipped]
>
> So this BUG_ON is from 1cf7e9c68fe84248174e998922b39e508375e7c1
> ("virtio_blk: blk-mq support"):
>
>     BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
>
> On probe, we do:
>
>     /* We can handle whatever the host told us to handle. */
>     blk_queue_max_segments(q, vblk->sg_elems-2);
>
> To debug this, maybe you can print out sg_elems at init time and when
> this fails, to make sure some kind of memory corruption does not
> change sg_elems after initialization?
>
> Jens, how may we get more segments than blk_queue_max_segments?
> Is the driver expected to validate and drop such requests?
>
Whee! I'm not alone anymore!

I have seen similar issues even on non-mq systems; occasionally I'm
hitting this bug in drivers/scsi/scsi_lib.c:scsi_init_io():

    count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
    BUG_ON(count > sdb->table.nents);

There are actually two problems here:
The first is that blk_rq_map_sg() requires a table (ie the last
argument), but has no indication of how large the table is.
So one needs to check whether the returned number of mapped sg
elements exceeds the number of elements in the table.
If so, we already _have_ a memory overflow, and the only thing we can
do is sit in a corner and cry.
This really needs to be fixed up, eg by adding another argument for
the table size.

The other problem is that this _really_ shouldn't happen, and points
to some issue with the block layer in general. Which I've been trying
to find for several months now, to no avail :-(

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: kernel BUG at drivers/block/virtio_blk.c:172!
On 12.11.2014 11:18, Jens Axboe wrote:
> On 11/11/2014 09:42 AM, Ming Lei wrote:
>> The attached patch should fix the problem, and hope it is the last
>> one, :-)
>
> Dongsu and Jeff, any of you test this variant? I think this is the
> last one, at least I hope so as well...

Yes, I've just tested it again with Ming's patch. It passed a full
cycle of xfstests: no crash, no particular regression. The code in
blk_recount_segments() seems to make sense too.

Thanks! ;-)
Dongsu
Re: kernel BUG at drivers/block/virtio_blk.c:172!
On 11/11/2014 09:42 AM, Ming Lei wrote:
> The attached patch should fix the problem, and hope it is the last
> one, :-)

Dongsu and Jeff, any of you test this variant? I think this is the
last one, at least I hope so as well...

-- 
Jens Axboe
Re: kernel BUG at drivers/block/virtio_blk.c:172!
On Wed, 12 Nov 2014 11:18:32 -0700
Jens Axboe <ax...@kernel.dk> wrote:

> On 11/11/2014 09:42 AM, Ming Lei wrote:
>> The attached patch should fix the problem, and hope it is the last
>> one, :-)
>
> Dongsu and Jeff, any of you test this variant? I think this is the
> last one, at least I hope so as well...

Yes, thanks! That patch seems to fix the problem for me. You can add:

Tested-by: Jeff Layton <jlay...@poochiereds.net>
Re: kernel BUG at drivers/block/virtio_blk.c:172!
On Tue, Nov 11, 2014 at 11:42 PM, Dongsu Park dongsu.p...@profitbricks.com wrote: Hi Ming, On 11.11.2014 08:56, Ming Lei wrote: On Tue, Nov 11, 2014 at 7:31 AM, Jens Axboe ax...@kernel.dk wrote: Known, I'm afraid, Ming is looking into it. Actually I had also tried to reproduce this bug, without success. But today I happened to know how to trigger the bug, by coincidence, during testing other things. Try to run xfstests/generic/034. You'll see the crash immediately. Tested on a QEMU VM with kernel 3.18-rc4, virtio-blk, dm-flakey and xfs. There is one obvious bug which should have been fixed by below patch(0001-block-blk-merge-fix-blk_recount_segments.patch): http://marc.info/?l=linux-virtualizationm=141562191719405q=p3 This patch didn't bring anything to me, as Lukas said. And there might be another one, I appreciate someone can post log which is printed by patch(blk-seg.patch) in below link if the bug still can be triggered even with above fix: http://marc.info/?l=linux-virtualizationm=141473040618467q=p3 blk_recount_segments: 1-1-1 vcnt-128 segs-128 As long as I understood so far, the reason is that bi_phys_segments gets sometimes bigger than queue_max_sectors() after blk_recount_segments(). That happens no matter whether segments are recalculated or not. I know the problem now, since bi_vcnt can't be used for cloned bio, and the patch I sent last time is wrong too. I'm not completely sure about what to do, but how about the attached patch? It seems to work, according to several xfstests runs. Cheers, Dongsu From 1db98323931eb9ab430116c4d909d2c16e22 Mon Sep 17 00:00:00 2001 From: Dongsu Park dongsu.p...@profitbricks.com Date: Tue, 11 Nov 2014 13:10:59 +0100 Subject: [RFC PATCH] blk-merge: make bi_phys_segments consider also queue_max_segments() When recounting the number of physical segments, the number of max segments of request_queue must be also taken into account. Otherwise bio-bi_phys_segments could get bigger than queue_max_segments(). 
Then this results in virtio_queue_rq() seeing a req->nr_phys_segments that is greater than expected. Although the initial queue_max_segments was set to (vblk->sg_elems - 2), a request comes in with a larger value of nr_phys_segments, which triggers the BUG_ON() condition.

This commit should fix a kernel crash in virtio_blk, which occurs especially frequently when it runs with blk-mq, device mapper, and xfs. The simplest way to reproduce this bug is to run xfstests/generic/034. Note, test 034 requires dm-flakey to be turned on in the kernel config. See the kernel trace below:

[ cut here ]
kernel BUG at drivers/block/virtio_blk.c:172!
invalid opcode: [#1] SMP
CPU: 1 PID: 3343 Comm: mount Not tainted 3.18.0-rc4+ #55
RIP: 0010:[81561027] [81561027] virtio_queue_rq+0x277/0x280
Call Trace:
 [8142e908] __blk_mq_run_hw_queue+0x1a8/0x300
 [8142f00d] blk_mq_run_hw_queue+0x6d/0x90
 [8143003e] blk_sq_make_request+0x23e/0x360
 [81422e20] generic_make_request+0xc0/0x110
 [81422ed9] submit_bio+0x69/0x130
 [812f013d] _xfs_buf_ioapply+0x2bd/0x410
 [81315f38] ? xlog_bread_noalign+0xa8/0xe0
 [812f1bd1] xfs_buf_submit_wait+0x61/0x1d0
 [81315f38] xlog_bread_noalign+0xa8/0xe0
 [81316917] xlog_bread+0x27/0x60
 [8131ad11] xlog_find_verify_cycle+0xe1/0x190
 [8131b291] xlog_find_head+0x2d1/0x3c0
 [8131b3ad] xlog_find_tail+0x2d/0x3f0
 [8131b78e] xlog_recover+0x1e/0xf0
 [8130fbac] xfs_log_mount+0x24c/0x2c0
 [813075db] xfs_mountfs+0x44b/0x7a0
 [8130a98a] xfs_fs_fill_super+0x2ba/0x330
 [811cea64] mount_bdev+0x194/0x1d0
 [8130a6d0] ? xfs_parseargs+0xbe0/0xbe0
 [813089a5] xfs_fs_mount+0x15/0x20
 [811cf389] mount_fs+0x39/0x1b0
 [8117bf75] ? __alloc_percpu+0x15/0x20
 [811e9887] vfs_kern_mount+0x67/0x110
 [811ec584] do_mount+0x204/0xad0
 [811ed18b] SyS_mount+0x8b/0xe0
 [81788e12] system_call_fastpath+0x12/0x17
RIP [81561027] virtio_queue_rq+0x277/0x280
---[ end trace ae3ec6426f011b5d ]---

Signed-off-by: Dongsu Park dongsu.p...@profitbricks.com
Tested-by: Dongsu Park dongsu.p...@profitbricks.com
Cc: Ming Lei tom.leim...@gmail.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Jeff Layton jlay...@poochiereds.net
Cc: Dave Chinner da...@fromorbit.com
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Lukas Czerner lczer...@redhat.com
Cc: Christoph Hellwig h...@lst.de
Cc: virtualization@lists.linux-foundation.org
---
 block/blk-merge.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b3ac40a..d808601 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -103,13 +103,16 @@ void blk_recount_segments(struct request_queue *q, struct
Re: Re: kernel BUG at drivers/block/virtio_blk.c:172
On Fri, 31 Oct 2014, Ming Lei wrote:

Date: Fri, 31 Oct 2014 12:40:00 +0800
From: Ming Lei tom.leim...@gmail.com
To: Jens Axboe ax...@kernel.dk
Cc: Lukáš Czerner lczer...@redhat.com, Linux Virtualization virtualization@lists.linux-foundation.org, Christoph Hellwig h...@lst.de
Subject: Re: Re: kernel BUG at drivers/block/virtio_blk.c:172

On Thu, Oct 30, 2014 at 10:38 PM, Jens Axboe ax...@kernel.dk wrote:

> Forgot to CC you...
>
> -------- Forwarded Message --------
> Subject: Re: kernel BUG at drivers/block/virtio_blk.c:172
> Date: Thu, 30 Oct 2014 08:38:08 -0600
> From: Jens Axboe ax...@kernel.dk
> To: Lukáš Czerner lczer...@redhat.com, virtualization@lists.linux-foundation.org
> CC: h...@lst.de
>
> On 2014-10-30 08:33, Lukáš Czerner wrote:
>> Hi,
>>
>> I've just hit this BUG at drivers/block/virtio_blk.c when updated to the kernel from the top of the Linus git tree.
>>
>> commit a7ca10f263d7e673c74d8e0946d6b9993405cc9c
>>
>> This is my virtual machine running on RHEL7, qemu-kvm-1.5.3-60.el7.x86_64. The last upstream kernel (3.17.0-rc4) worked well. I'll try to bisect, but meanwhile this is a backtrace I got very early in the boot. The root fs on that guest is xfs and I am using a raw disk image and the virtio driver.
>>
>> Let me know if you need more information.
>
> Ming, looks like this still isn't really fixed. The above upstream commit has the latest fixup as well for the segments being wrong, so nothing else should be pending.

That looks weird, and I can't reproduce with mkfs.xfs + mount in my environment.

Lukáš, could you reproduce the issue with the attached debug patch and post the result? BTW, do you pass 'scsi=off' in the qemu command line for the virtio-blk device?

Hi,

so I encountered it again on 3.17.0-rc4. This output is from the run with your patch. I am using libvirt (virt-manager) to configure and run the virtual machine, but looking at the xml, I do not think it's passing 'scsi=off' at all. Btw, that xfs file system is a root file system.
[3.667553] blk_recount_segments: 1-0-1 vcnt-0 segs-128 [3.668692] blk_recount_segments: 1-0-1 vcnt-0 segs-128 [3.669897] blk_recount_segments: 1-0-1 vcnt-0 segs-128 [3.671083] blk_recount_segments: 1-0-1 vcnt-0 segs-128 [3.672476] [ cut here ] [3.673439] kernel BUG at drivers/block/virtio_blk.c:172! [3.673439] invalid opcode: [#1] SMP [3.673439] Modules linked in: xfs libcrc32c sd_mod ata_generic pata_acpi qxl drm_kms_helper ttm drm ata_piix virtio_net virtio_blk libata virtio_pci virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [3.673439] CPU: 1 PID: 596 Comm: mount Not tainted 3.18.0-rc4+ #10 [3.673439] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [3.673439] task: 880035419b20 ti: 88022e71c000 task.ti: 88022e71c000 [3.673439] RIP: 0010:[a005a737] [a005a737] virtio_queue_rq+0x277/0x280 [virtio_blk] [3.673439] RSP: 0018:88022e71f7d8 EFLAGS: 00010202 [3.673439] RAX: 0082 RBX: 88022e97 RCX: dead00200200 [3.673439] RDX: RSI: 88022e97 RDI: 88022f2c3c00 [3.673439] RBP: 88022e71f818 R08: 88022e97 R09: 88022e71f840 [3.673439] R10: 028d R11: 88022e71f4ee R12: 88022e71f840 [3.673439] R13: 88022f2c3c00 R14: 88022f722d80 R15: [3.673439] FS: 7fb1e8de6880() GS:88023722() knlGS: [3.673439] CS: 0010 DS: ES: CR0: 8005003b [3.673439] CR2: 7f173410e010 CR3: 00022e2e1000 CR4: 06e0 [3.673439] Stack: [3.673439] 88022e970170 0001 88022f2c3c00 [3.673439] 88022e71f840 88022e97 88022f2c3c08 [3.673439] 88022e71f888 812daa68 88022e8f 0003 [3.673439] Call Trace: [3.673439] [812daa68] __blk_mq_run_hw_queue+0x1c8/0x330 [3.673439] [812db1e0] blk_mq_run_hw_queue+0x70/0xa0 [3.673439] [812dbff5] blk_mq_insert_requests+0xc5/0x120 [3.673439] [812dcbbb] blk_mq_flush_plug_list+0x13b/0x160 [3.673439] [812d2391] blk_flush_plug_list+0xc1/0x220 [3.673439] [812d28a8] blk_finish_plug+0x18/0x50 [3.673439] [a01ce487] _xfs_buf_ioapply+0x327/0x430 [xfs] [3.673439] [8109ae20] ? wake_up_state+0x20/0x20 [3.673439] [a01d0424] ? 
xfs_bwrite+0x24/0x60 [xfs] [3.673439] [a01cffb1] xfs_buf_submit_wait+0x61/0x1d0 [xfs] [3.673439] [a01d0424] xfs_bwrite+0x24/0x60 [xfs] [3.673439] [a01f5dc7] xlog_bwrite+0x87/0x110 [xfs] [3.673439] [a01f6df3] xlog_write_log_records+0x1b3/0x250 [xfs] [3.673439] [a01f6f98] xlog_clear_stale_blocks+0x108/0x1b0 [xfs] [3.673439] [a01f66a3] ? xlog_bread+0x43
Re: Re: kernel BUG at drivers/block/virtio_blk.c:172
On Mon, Nov 10, 2014 at 7:59 PM, Lukáš Czerner lczer...@redhat.com wrote:

> Hi,
>
> so I encountered it again on 3.17.0-rc4. This output is from the run with your patch. I am using libvirt (virt-manager) to configure and run the virtual machine, but looking at the xml, I do not think it's passing 'scsi=off' at all. Btw, that xfs file system is a root file system.
>
> [3.667553] blk_recount_segments: 1-0-1 vcnt-0 segs-128
> [3.668692] blk_recount_segments: 1-0-1 vcnt-0 segs-128
> [3.669897] blk_recount_segments: 1-0-1 vcnt-0 segs-128
> [3.671083] blk_recount_segments: 1-0-1 vcnt-0 segs-128

Hmm, I should have used bi_phys_segments to decide if a merge is needed; the attached patch should fix the problem.

Thanks,
Ming Lei

From bcb4f411c23e114f7c77c0858d7f78f50fda68e9 Mon Sep 17 00:00:00 2001
From: Ming Lei tom.leim...@gmail.com
Date: Mon, 10 Nov 2014 20:10:24 +0800
Subject: [PATCH] block/blk-merge: fix blk_recount_segments

bio->bi_phys_segments should be used for deciding if a merge is needed, instead of bio->bi_vcnt, which isn't valid for a cloned bio.

Signed-off-by: Ming Lei tom.leim...@gmail.com
---
 block/blk-merge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b3ac40a..3387fd6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -99,7 +99,7 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
 	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
-	bool merge_not_need = bio->bi_vcnt < queue_max_segments(q);
+	bool merge_not_need = bio->bi_phys_segments < queue_max_segments(q);
 
 	if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) && merge_not_need)
--
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: Re: kernel BUG at drivers/block/virtio_blk.c:172
On Mon, 10 Nov 2014, Ming Lei wrote:

Date: Mon, 10 Nov 2014 20:18:32 +0800
From: Ming Lei tom.leim...@gmail.com
To: Lukáš Czerner lczer...@redhat.com
Cc: Jens Axboe ax...@kernel.dk, Linux Virtualization virtualization@lists.linux-foundation.org, Christoph Hellwig h...@lst.de
Subject: Re: Re: kernel BUG at drivers/block/virtio_blk.c:172

> On Mon, Nov 10, 2014 at 7:59 PM, Lukáš Czerner lczer...@redhat.com wrote:
>> Hi,
>>
>> so I encountered it again on 3.17.0-rc4. This output is from the run with your patch. I am using libvirt (virt-manager) to configure and run the virtual machine, but looking at the xml, I do not think it's passing 'scsi=off' at all. Btw, that xfs file system is a root file system.
>>
>> [3.667553] blk_recount_segments: 1-0-1 vcnt-0 segs-128
>> [3.668692] blk_recount_segments: 1-0-1 vcnt-0 segs-128
>> [3.669897] blk_recount_segments: 1-0-1 vcnt-0 segs-128
>> [3.671083] blk_recount_segments: 1-0-1 vcnt-0 segs-128
>
> Hmm, I should have used bi_phys_segments to decide if a merge is needed; the attached patch should fix the problem.
>
> Thanks,
> Ming Lei

Thanks for the patch, unfortunately it does not fix the issue for me. I am willing to try something else though :)

-Lukas
Re: Re: kernel BUG at drivers/block/virtio_blk.c:172
On Mon, Nov 10, 2014 at 9:05 PM, Lukáš Czerner lczer...@redhat.com wrote:

> On Mon, 10 Nov 2014, Ming Lei wrote:
>> Hmm, I should have used bi_phys_segments to decide if a merge is needed; the attached patch should fix the problem.
>
> Thanks for the patch, unfortunately it does not fix the issue for me. I am willing to try something else though :)

Care to post the log (like "blk_recount_segments: 1-0-1 vcnt-0 segs-128") after applying the patch in my last reply?

Thanks,
--
Ming Lei
kernel BUG at drivers/block/virtio_blk.c:172!
In the latest Fedora rawhide kernel in the repos, I'm seeing the following oops when mounting xfs. rc2-ish kernels seem to be fine: [ 64.669633] [ cut here ] [ 64.670008] kernel BUG at drivers/block/virtio_blk.c:172! [ 64.670008] invalid opcode: [#1] SMP [ 64.670008] Modules linked in: xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev snd_timer snd virtio_net virtio_balloon soundcore serio_raw parport_pc virtio_console pvpanic parport i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc qxl virtio_blk drm_kms_helper ttm drm ata_generic virtio_pci virtio_ring virtio pata_acpi [ 64.670008] CPU: 1 PID: 705 Comm: mount Not tainted 3.18.0-0.rc3.git2.1.fc22.x86_64 #1 [ 64.670008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 64.670008] task: 8800d94a4ec0 ti: 8800d9f38000 task.ti: 8800d9f38000 [ 64.670008] RIP: 0010:[a00287c0] [a00287c0] virtio_queue_rq+0x290/0x2a0 [virtio_blk] [ 64.670008] RSP: 0018:8800d9f3b778 EFLAGS: 00010202 [ 64.670008] RAX: 0082 RBX: 8800d8375700 RCX: dead00200200 [ 64.670008] RDX: 0001 RSI: 8800d8375700 RDI: 8800d82c4c00 [ 64.670008] RBP: 8800d9f3b7b8 R08: 8800d8375700 R09: 0001 [ 64.670008] R10: 0001 R11: 0004 R12: 8800d9f3b7e0 [ 64.670008] R13: 8800d82c4c00 R14: 880118629200 R15: [ 64.670008] FS: 7f5c64dfd840() GS:88011b00() knlGS: [ 64.670008] CS: 0010 DS: ES: CR0: 8005003b [ 64.670008] CR2: 7fffe6458fb8 CR3: d06d3000 CR4: 06e0 [ 64.670008] Stack: [ 64.670008] 8801 8800d8375870 0001 8800d82c4c00 [ 64.670008] 8800d9f3b7e0 8800d8375700 8800d82c4c48 [ 64.670008] 8800d9f3b828 813ec258 8800d82c8000 0001 [ 64.670008] Call Trace: [ 64.670008] [813ec258] __blk_mq_run_hw_queue+0x1c8/0x330 [ 64.670008] [813ecd80] blk_mq_run_hw_queue+0x70/0x90 [ 64.670008] [813ee0cd] blk_sq_make_request+0x24d/0x5c0 [ 64.670008] [813dec68] generic_make_request+0xf8/0x150 [ 64.670008] [813ded38] submit_bio+0x78/0x190 [ 64.670008] [a02fc27e] _xfs_buf_ioapply+0x2be/0x5f0 [xfs] [ 64.670008] 
[a0333628] ? xlog_bread_noalign+0xa8/0xe0 [xfs] [ 64.670008] [a02ffe21] xfs_buf_submit_wait+0x91/0x840 [xfs] [ 64.670008] [a0333628] xlog_bread_noalign+0xa8/0xe0 [xfs] [ 64.670008] [a0333ea7] xlog_bread+0x27/0x60 [xfs] [ 64.670008] [a03357f3] xlog_find_verify_cycle+0xf3/0x1b0 [xfs] [ 64.670008] [a0335de5] xlog_find_head+0x2f5/0x3e0 [xfs] [ 64.670008] [a0335f0c] xlog_find_tail+0x3c/0x410 [xfs] [ 64.670008] [a033b12d] xlog_recover+0x2d/0x120 [xfs] [ 64.670008] [a033cfdb] ? xfs_trans_ail_init+0xcb/0x100 [xfs] [ 64.670008] [a0329c3d] xfs_log_mount+0xdd/0x2c0 [xfs] [ 64.670008] [a031f744] xfs_mountfs+0x514/0x9c0 [xfs] [ 64.670008] [a0320c8d] ? xfs_mru_cache_create+0x18d/0x1f0 [xfs] [ 64.670008] [a0322ed0] xfs_fs_fill_super+0x330/0x3b0 [xfs] [ 64.670008] [8126d4ac] mount_bdev+0x1bc/0x1f0 [ 64.670008] [a0322ba0] ? xfs_parseargs+0xbe0/0xbe0 [xfs] [ 64.670008] [a0320fd5] xfs_fs_mount+0x15/0x20 [xfs] [ 64.670008] [8126de58] mount_fs+0x38/0x1c0 [ 64.670008] [81202c15] ? __alloc_percpu+0x15/0x20 [ 64.670008] [812908f8] vfs_kern_mount+0x68/0x160 [ 64.670008] [81293d6c] do_mount+0x22c/0xc20 [ 64.670008] [8120d92e] ? might_fault+0x5e/0xc0 [ 64.670008] [811fcf1b] ? memdup_user+0x4b/0x90 [ 64.670008] [81294a8e] SyS_mount+0x9e/0x100 [ 64.670008] [8185e169] system_call_fastpath+0x12/0x17 [ 64.670008] Code: 00 00 c7 86 78 01 00 00 02 00 00 00 48 c7 86 80 01 00 00 00 00 00 00 89 86 7c 01 00 00 e9 02 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 [ 64.670008] RIP [a00287c0] virtio_queue_rq+0x290/0x2a0 [virtio_blk] [ 64.670008] RSP 8800d9f3b778 [ 64.715347] ---[ end trace c0ff4a0f2fb21f7f ]--- It's reliably reproducible and I don't see this oops when I convert the same block device to ext4 and mount it. In this setup, the KVM guest has a virtio block device that has a LVM2 PV on it with an LV on it that contains the filesystem. Let me know if you need any other info to chase this down. Thanks! 
--
Jeff Layton jlay...@poochiereds.net
Re: kernel BUG at drivers/block/virtio_blk.c:172!
Jeff Layton jlay...@poochiereds.net writes:

> In the latest Fedora rawhide kernel in the repos, I'm seeing the following oops when mounting xfs. rc2-ish kernels seem to be fine:
>
> [ 64.669633] [ cut here ]
> [ 64.670008] kernel BUG at drivers/block/virtio_blk.c:172!

Hmm, that's:

	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

But during our probe routine we said:

	/* We can handle whatever the host told us to handle. */
	blk_queue_max_segments(q, vblk->sg_elems - 2);

Jens?

Thanks,
Rusty.

> [... rest of the oops snipped; it is quoted in full in the previous message ...]
>
> It's reliably reproducible and I don't see this oops when I convert the same block device to ext4 and mount it. In this setup, the KVM guest has a virtio block device that has a LVM2 PV
Re: kernel BUG at drivers/block/virtio_blk.c:172!
On 2014-11-10 02:59, Rusty Russell wrote:

> Jeff Layton jlay...@poochiereds.net writes:
>> In the latest Fedora rawhide kernel in the repos, I'm seeing the following oops when mounting xfs. rc2-ish kernels seem to be fine:
>>
>> [ 64.669633] [ cut here ]
>> [ 64.670008] kernel BUG at drivers/block/virtio_blk.c:172!
>
> Hmm, that's:
>
> 	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
>
> But during our probe routine we said:
>
> 	/* We can handle whatever the host told us to handle. */
> 	blk_queue_max_segments(q, vblk->sg_elems - 2);
>
> Jens?

Known, I'm afraid, Ming is looking into it.

--
Jens Axboe
Re: kernel BUG at drivers/block/virtio_blk.c:172!
On Tue, Nov 11, 2014 at 7:31 AM, Jens Axboe ax...@kernel.dk wrote:

> On 2014-11-10 02:59, Rusty Russell wrote:
>> Jeff Layton jlay...@poochiereds.net writes:
>>> In the latest Fedora rawhide kernel in the repos, I'm seeing the following oops when mounting xfs. rc2-ish kernels seem to be fine:
>>>
>>> [ 64.669633] [ cut here ]
>>> [ 64.670008] kernel BUG at drivers/block/virtio_blk.c:172!
>>
>> Hmm, that's:
>>
>> 	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
>>
>> But during our probe routine we said:
>>
>> 	/* We can handle whatever the host told us to handle. */
>> 	blk_queue_max_segments(q, vblk->sg_elems - 2);
>>
>> Jens?
>
> Known, I'm afraid, Ming is looking into it.

There is one obvious bug which should have been fixed by the patch below (0001-block-blk-merge-fix-blk_recount_segments.patch):

http://marc.info/?l=linux-virtualization&m=141562191719405&q=p3

And there might be another one; I'd appreciate it if someone could post the log printed by the patch (blk-seg.patch) in the link below, if the bug can still be triggered even with the above fix:

http://marc.info/?l=linux-virtualization&m=141473040618467&q=p3

Thanks,
--
Ming Lei
Re: kernel BUG at drivers/block/virtio_blk.c:172
On 2014-10-30 08:33, Lukáš Czerner wrote: Hi, I've just hit this BUG at drivers/block/virtio_blk.c when updated to the kernel from the top of the Linus git tree. commit a7ca10f263d7e673c74d8e0946d6b9993405cc9c This is my virtual machine running on RHEL7 guest qemu-kvm-1.5.3-60.el7.x86_64 The last upstream kernel (3.17.0-rc4) worked well. I'll try to bisect, but meanwhile this is a backtrace I got very early in the boot. The root fs on that guest is xfs and I am using raw disk image and virtio driver. Let me know if you need more information. Ming, looks like this still isn't really fixed. The above upstream commit has the latest fixup as well for the segments being wrong, so nothing else should be pending. Leaving trace below. [2.806242] [ cut here ] [2.807018] kernel BUG at drivers/block/virtio_blk.c:172! [2.807018] invalid opcode: [#1] SMP [2.807018] Modules linked in: xfs libcrc32c sd_mod qxl ata_generic pata_acpi drm_kms_helper ttm drm ata_piix virtio_net virtio_blk libata virtio_pci floppy virtio_ring i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod [2.807018] CPU: 2 PID: 580 Comm: mount Not tainted 3.18.0-rc2+ #4 [2.807018] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [2.807018] task: 880035e3b640 ti: 880034d24000 task.ti: 880034d24000 [2.807018] RIP: 0010:[a0034737] [a0034737] virtio_queue_rq+0x277/0x280 [virtio_blk] [2.807018] RSP: 0018:880034d277b8 EFLAGS: 00010202 [2.807018] RAX: 0082 RBX: 88022f2f RCX: dead00200200 [2.807018] RDX: 0001 RSI: 88022f2f RDI: 88022f2e3400 [2.807018] RBP: 880034d277f8 R08: 88022f2f R09: 880034d27820 [2.807018] R10: R11: 1000 R12: 880034d27820 [2.807018] R13: 88022f2e3400 R14: 880035dca480 R15: [2.807018] FS: 7f23686e0880() GS:88023724() knlGS: [2.807018] CS: 0010 DS: ES: CR0: 80050033 [2.807018] CR2: 7f8c71668000 CR3: 35d8e000 CR4: 06e0 [2.807018] Stack: [2.807018] 8801 88022f2f0170 88022f2e3400 [2.807018] 880034d27820 88022f2f 88022f2e3408 [2.807018] 880034d27868 812e0628 88022e8c 0003 [2.807018] Call Trace: 
[2.807018] [812e0628] __blk_mq_run_hw_queue+0x1c8/0x330 [2.807018] [812d233f] ? part_round_stats+0x4f/0x60 [2.807018] [812e0da0] blk_mq_run_hw_queue+0x70/0xa0 [2.807018] [812e1e68] blk_sq_make_request+0x258/0x380 [2.807018] [812d32e0] generic_make_request+0xe0/0x130 [2.807018] [812d33a8] submit_bio+0x78/0x160 [2.807018] [a01c3516] _xfs_buf_ioapply+0x2e6/0x430 [xfs] [2.807018] [a01eacd8] ? xlog_bread_noalign+0xa8/0xe0 [xfs] [2.807018] [a01c5081] xfs_buf_submit_wait+0x61/0x1d0 [xfs] [2.807018] [a01eacd8] xlog_bread_noalign+0xa8/0xe0 [xfs] [2.807018] [a01eb6e7] xlog_bread+0x27/0x60 [xfs] [2.807018] [a01efd01] xlog_find_verify_cycle+0xf1/0x1b0 [xfs] [2.807018] [a01f02d1] xlog_find_head+0x2f1/0x3e0 [xfs] [2.807018] [a01f03fc] xlog_find_tail+0x3c/0x410 [xfs] [2.807018] [a01f07fd] xlog_recover+0x2d/0x130 [xfs] [2.807018] [a01f199f] ? xfs_trans_ail_init+0xaf/0xe0 [xfs] [2.807018] [a01e444a] xfs_log_mount+0xea/0x2c0 [xfs] [2.807018] [a01dbbfc] xfs_mountfs+0x46c/0x7a0 [xfs] [2.807018] [a01df10a] xfs_fs_fill_super+0x2ba/0x330 [xfs] [2.807018] [811ee510] mount_bdev+0x1b0/0x1f0 [2.807018] [a01dee50] ? xfs_parseargs+0xbf0/0xbf0 [xfs] [2.807018] [a01dd085] xfs_fs_mount+0x15/0x20 [xfs] [2.807018] [811eee89] mount_fs+0x39/0x1b0 [2.807018] [81194505] ? __alloc_percpu+0x15/0x20 [2.807018] [8120ab2b] vfs_kern_mount+0x6b/0x110 [2.807018] [8120d91c] do_mount+0x22c/0xb30 [2.807018] [8118f2b6] ? 
memdup_user+0x46/0x80 [2.807018] [8120e562] SyS_mount+0xa2/0x110 [2.807018] [81671269] system_call_fastpath+0x12/0x17 [2.807018] Code: fe ff ff 90 0f b7 86 f4 00 00 00 c7 86 78 01 00 00 02 00 00 00 48 c7 86 80 01 00 00 00 00 00 00 89 86 7c 01 00 00 e9 12 fe ff ff 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 31 f6 b9 08 00 00 [2.807018] RIP [a0034737] virtio_queue_rq+0x277/0x280 [virtio_blk] [2.807018] RSP 880034d277b8 [2.887706] ---[ end trace e667b0f035973c7a ]--- [2.888781] Kernel panic - not syncing: Fatal exception [2.889765] Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) -- Jens Axboe
Re: Re: kernel BUG at drivers/block/virtio_blk.c:172
On Thu, Oct 30, 2014 at 10:38 PM, Jens Axboe ax...@kernel.dk wrote:

> Forgot to CC you...
>
> -------- Forwarded Message --------
> Subject: Re: kernel BUG at drivers/block/virtio_blk.c:172
> Date: Thu, 30 Oct 2014 08:38:08 -0600
> From: Jens Axboe ax...@kernel.dk
> To: Lukáš Czerner lczer...@redhat.com, virtualization@lists.linux-foundation.org
> CC: h...@lst.de
>
> On 2014-10-30 08:33, Lukáš Czerner wrote:
>> Hi,
>>
>> I've just hit this BUG at drivers/block/virtio_blk.c when updated to the kernel from the top of the Linus git tree.
>>
>> commit a7ca10f263d7e673c74d8e0946d6b9993405cc9c
>>
>> This is my virtual machine running on RHEL7, qemu-kvm-1.5.3-60.el7.x86_64. The last upstream kernel (3.17.0-rc4) worked well. I'll try to bisect, but meanwhile this is a backtrace I got very early in the boot. The root fs on that guest is xfs and I am using a raw disk image and the virtio driver.
>>
>> Let me know if you need more information.
>
> Ming, looks like this still isn't really fixed. The above upstream commit has the latest fixup as well for the segments being wrong, so nothing else should be pending.

That looks weird, and I can't reproduce with mkfs.xfs + mount in my environment.

Lukáš, could you reproduce the issue with the attached debug patch and post the result? BTW, do you pass 'scsi=off' in the qemu command line for the virtio-blk device?

Thanks,
--
Ming Lei

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b3ac40a..91f3275 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -113,6 +113,15 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)
 		bio->bi_next = nxt;
 	}
 
+	if (no_sg_merge && (bio->bi_phys_segments >= queue_max_segments(q)))
+		printk("%s: %d-%d-%d vcnt-%d segs-%d\n",
+		       __func__,
+		       no_sg_merge,
+		       !bio_flagged(bio, BIO_CLONED),
+		       merge_not_need,
+		       bio->bi_vcnt,
+		       bio->bi_phys_segments);
+
 	bio->bi_flags |= (1 << BIO_SEG_VALID);
 }
 EXPORT_SYMBOL(blk_recount_segments);