On Wed, 11 Jul 2018 13:45:09 -0400 Chris <coderight+l...@gmail.com> wrote:
> On Wed, Jul 11, 2018 at 12:43 PM, Greg Kurz <gr...@kaod.org> wrote:
> > I've been observing a similar delay on ppc64 with fedora28 guests:
> >
> > # dmesg | egrep 'scsi| sd '
> > [ 1.530946] scsi host0: Virtio SCSI HBA
> > [ 1.532452] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK
> > 2.5+ PQ: 0 ANSI: 5
> > [ 21.928378] sd 0:0:0:0: Power-on or device reset occurred
> > [ 21.930012] sd 0:0:0:0: Attached scsi generic sg0 type 0
> > [ 21.931554] sd 0:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9
> > GB/40.0 GiB)
> > [ 21.931929] sd 0:0:0:0: [sda] Write Protect is off
> > [ 21.933110] sd 0:0:0:0: [sda] Mode Sense: 63 00 00 08
> > [ 21.934084] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> > doesn't support DPO or FUA
> > [ 21.943566] sd 0:0:0:0: [sda] Attached SCSI disk
> >
> > Kernel version is 4.16.16-300.fc28.ppc64. And I cannot reproduce the
> > issue with other distros that have an older kernel, eg, ubuntu 18.04
> > with kernel 4.15.0-23-generic.
> >
> > My first guess is that it might be a kernel-side regression introduced
> > in 4.16... maybe bisect ?
>
> Interesting. I just tried kernel 4.17.5 from the mainline ppa on
> Ubuntu 18.04 and now there is a delay. It's only 7.5 seconds but still
> noticeable. There was previously no delay with the 4.15 kernel.
>
> Definitely seems like it could be something introduced in kernel 4.16.
>
> Chris
>

Bisect led me to this commit, merged in 4.16:

commit b5b6e8c8d3b4cbeb447a0f10c7d5de3caa573299
Author: Ming Lei <ming....@redhat.com>
Date:   Tue Mar 13 17:42:42 2018 +0800

    scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity

    Since commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all
    possible CPUs") it is possible to end up in a scenario where only
    offline CPUs are mapped to an interrupt vector.

    This is only an issue for the legacy I/O path since with blk-mq/scsi-mq
    an I/O can't be submitted to a hardware queue if the queue isn't
    mapped to an online CPU.

    Fix this issue by forcing virtio-scsi to use blk-mq.

Also, I realized I don't see the issue if I start QEMU with -smp 1.

I'll continue digging but any suggestion is welcome :)

Cheers,

--
Greg
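
PS: when comparing kernels during a bisect, it can help to measure the stall from the dmesg timestamps instead of reading it by eye. Below is a minimal Python sketch of that idea. It assumes each line starts with the usual "[ seconds.micros]" prefix; the names `largest_gap`, `TIMESTAMP` and `SAMPLE` are purely illustrative and not part of any tool mentioned in this thread. The sample lines are taken from the fedora28 output quoted above, with the wrapped line rejoined.

```python
import re

# Matches the leading "[ seconds.micros]" timestamp printed by dmesg
# and captures the timestamp and the rest of the message.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the biggest
    jump between consecutive timestamped lines, or None if there are fewer
    than two timestamped lines."""
    stamped = []
    for line in lines:
        m = TIMESTAMP.match(line.strip())
        if m:
            stamped.append((float(m.group(1)), m.group(2)))

    best = None
    for (t0, msg0), (t1, msg1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, msg0, msg1)
    return best


# Sample input: a few lines from the guest log quoted in this thread.
SAMPLE = """\
[ 1.530946] scsi host0: Virtio SCSI HBA
[ 1.532452] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 21.928378] sd 0:0:0:0: Power-on or device reset occurred
[ 21.930012] sd 0:0:0:0: Attached scsi generic sg0 type 0
""".splitlines()

if __name__ == "__main__":
    result = largest_gap(SAMPLE)
    if result is not None:
        gap, before, after = result
        print(f"{gap:.3f}s between {before!r} and {after!r}")
```

On the sample above this reports the roughly 20 second jump between the "Direct-Access" line and the "Power-on or device reset occurred" line. Piping the full `dmesg` output of each tested kernel through the same function would give one number per bisect step to compare.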