[PATCH V4 0/9] rework on the IRQ hardening of virtio
Hi All: This is a rework on the IRQ hardening for virtio which is done previously by the following commits are reverted: 9e35276a5344 ("virtio_pci: harden MSI-X interrupts") 080cd7c3ac87 ("virtio-pci: harden INTX interrupts") The reason is that it depends on the IRQF_NO_AUTOEN which may conflict with the assumption of the affinity managed IRQ that is used by some virtio drivers. And what's more, it is only done for virtio-pci but not other transports. In this rework, I try to implement a general virtio solution which borrows the idea of the INTX hardening by re-using per virtqueue boolean vq->broken and toggle it in virtio_device_ready() and virtio_reset_device(). Then we can simply reuse the existing checks in the vring_interrupt() and return early if the driver is not ready. Note that, I only did compile test on ccw and MMIO transport. Please review. Changes since v1: - Use transport specific irq synchronization method when possible - Drop the module parameter and enable the hardening unconditonally - Tweak the barrier/ordering facilities used in the code - Reanme irq_soft_enabled to driver_ready - Avoid unnecssary IRQ synchornization (e.g during boot) Changes since V2: - add ccw and MMIO support - rename synchronize_vqs() to synchronize_cbs() - switch to re-use vq->broken instead of introducing new device attributes for the future virtqueue reset support - remove unnecssary READ_ONCE()/WRITE_ONCE() - a new patch to remove device triggerable BUG_ON() - more tweaks on the comments Changes since V3: - Rename synchornize_vqs() to synchronize_cbs() - tweak the comment for synchronize_cbs() - switch to use a dedicated helper __virtio_unbreak_device() and document it should be only used for probing - switch to use rwlick to synchornize the non airq for ccw Jason Wang (8): virtio: use virtio_reset_device() when possible virtio: introduce config op to synchronize vring callbacks virtio-pci: implement synchronize_cbs() virtio-mmio: implement synchronize_cbs() virtio-ccw: implement synchronize_cbs() virtio: allow to unbreak virtqueue virtio: harden vring IRQ virtio: use WARN_ON() to warning illegal status value Stefano Garzarella (1): virtio: use virtio_device_ready() in virtio_device_restore() drivers/s390/virtio/virtio_ccw.c | 27 + drivers/virtio/virtio.c| 24 -- drivers/virtio/virtio_mmio.c | 9 +++ drivers/virtio/virtio_pci_legacy.c | 1 + drivers/virtio/virtio_pci_modern.c | 2 ++ drivers/virtio/virtio_ring.c | 32 +--- include/linux/virtio.h | 1 + include/linux/virtio_config.h | 39 +- 8 files changed, 123 insertions(+), 12 deletions(-) -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Sat, May 07 2022, Jason Wang wrote: > Hi All: > > This is a rework on the IRQ hardening for virtio which is done > previously by the following commits are reverted: > > 9e35276a5344 ("virtio_pci: harden MSI-X interrupts") > 080cd7c3ac87 ("virtio-pci: harden INTX interrupts") > > The reason is that it depends on the IRQF_NO_AUTOEN which may conflict > with the assumption of the affinity managed IRQ that is used by some > virtio drivers. And what's more, it is only done for virtio-pci but > not other transports. > > In this rework, I try to implement a general virtio solution which > borrows the idea of the INTX hardening by re-using per virtqueue > boolean vq->broken and toggle it in virtio_device_ready() and > virtio_reset_device(). Then we can simply reuse the existing checks in > the vring_interrupt() and return early if the driver is not ready. > > Note that, I only did compile test on ccw and MMIO transport. Lockdep is unhappy with the ccw parts: WARNING: inconsistent lock state 5.18.0-rc6+ #191 Not tainted inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage. kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes: 058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60 {IN-HARDIRQ-R} state was registered at: __lock_acquire+0x442/0xc20 lock_acquire.part.0+0xdc/0x228 lock_acquire+0xa6/0x1b0 _raw_read_lock_irqsave+0x72/0x100 virtio_ccw_int_handler+0x84/0x238 ccw_device_call_handler+0x72/0xd0 ccw_device_irq+0x7a/0x198 do_cio_interrupt+0x11c/0x1d0 __handle_irq_event_percpu+0xc2/0x318 handle_irq_event_percpu+0x26/0x68 handle_percpu_irq+0x64/0x88 generic_handle_irq+0x40/0x58 do_irq_async+0x56/0xb0 do_io_irq+0x82/0x160 io_int_handler+0xe6/0x120 rcu_read_lock_sched_held+0x3e/0xb0 lock_acquired+0x12e/0x208 new_inode+0x3e/0xd0 debugfs_get_inode+0x22/0x68 __debugfs_create_file+0x78/0x1c0 debugfs_create_file_unsafe+0x36/0x58 debugfs_create_u32+0x38/0x68 sched_init_debug+0xb0/0x1c0 do_one_initcall+0x108/0x280 do_initcalls+0x124/0x148 kernel_init_freeable+0x242/0x280 kernel_init+0x2e/0x158 __ret_from_fork+0x3c/0x50 ret_from_fork+0xa/0x40 irq event stamp: 539789 hardirqs last enabled at (539789): [<00d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88 hardirqs last disabled at (539788): [<00d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0 softirqs last enabled at (539568): [<00d9e0d4>] __do_softirq+0x434/0x588 softirqs last disabled at (539503): [<0018cd66>] __irq_exit_rcu+0x146/0x170 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(&vcdev->irq_lock); lock(&vcdev->irq_lock); *** DEADLOCK *** 2 locks held by kworker/u4:0/9: #0: 0288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658 #1: 0384bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658 stack backtrace: CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191 Hardware name: QEMU 8561 QEMU (KVM/Linux) Workqueue: events_unbound async_run_entry_fn Call Trace: [<00d8af22>] dump_stack_lvl+0x92/0xd0 [<002032ac>] mark_lock_irq+0x864/0x968 [<00203670>] mark_lock.part.0+0x2c0/0x790 [<00203cea>] mark_usage+0x10a/0x178 [<0020692a>] __lock_acquire+0x442/0xc20 [<00207cc4>] lock_acquire.part.0+0xdc/0x228 [<00207eb6>] lock_acquire+0xa6/0x1b0 [<00d9c774>] _raw_write_lock+0x54/0xa8 [<00d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60 [<008eec04>] register_virtio_device+0xdc/0x1b0 [<00d5aabe>] virtio_ccw_online+0x246/0x2e8 [<00c9fecc>] ccw_device_set_online+0x1c4/0x540 [<00d5a05e>] virtio_ccw_auto_online+0x26/0x50 [<001ba2b0>] async_run_entry_fn+0x40/0x108 [<001ab9b4>] process_one_work+0x2a4/0x658 [<001abdd0>] worker_thread+0x68/0x440 [<001b4668>] kthread+0x128/0x130 [<00102fac>] __ret_from_fork+0x3c/0x50 [<00d9d3aa>] ret_from_fork+0xa/0x40 INFO: lockdep is turned off. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Tue, May 10, 2022 at 5:29 PM Cornelia Huck wrote: > > On Sat, May 07 2022, Jason Wang wrote: > > > Hi All: > > > > This is a rework on the IRQ hardening for virtio which is done > > previously by the following commits are reverted: > > > > 9e35276a5344 ("virtio_pci: harden MSI-X interrupts") > > 080cd7c3ac87 ("virtio-pci: harden INTX interrupts") > > > > The reason is that it depends on the IRQF_NO_AUTOEN which may conflict > > with the assumption of the affinity managed IRQ that is used by some > > virtio drivers. And what's more, it is only done for virtio-pci but > > not other transports. > > > > In this rework, I try to implement a general virtio solution which > > borrows the idea of the INTX hardening by re-using per virtqueue > > boolean vq->broken and toggle it in virtio_device_ready() and > > virtio_reset_device(). Then we can simply reuse the existing checks in > > the vring_interrupt() and return early if the driver is not ready. > > > > Note that, I only did compile test on ccw and MMIO transport. > > Lockdep is unhappy with the ccw parts: > > > WARNING: inconsistent lock state > 5.18.0-rc6+ #191 Not tainted > > inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage. > kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes: > 058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: > virtio_ccw_synchronize_cbs+0x4e/0x60 > {IN-HARDIRQ-R} state was registered at: > __lock_acquire+0x442/0xc20 > lock_acquire.part.0+0xdc/0x228 > lock_acquire+0xa6/0x1b0 > _raw_read_lock_irqsave+0x72/0x100 > virtio_ccw_int_handler+0x84/0x238 > ccw_device_call_handler+0x72/0xd0 > ccw_device_irq+0x7a/0x198 > do_cio_interrupt+0x11c/0x1d0 > __handle_irq_event_percpu+0xc2/0x318 > handle_irq_event_percpu+0x26/0x68 > handle_percpu_irq+0x64/0x88 > generic_handle_irq+0x40/0x58 > do_irq_async+0x56/0xb0 > do_io_irq+0x82/0x160 > io_int_handler+0xe6/0x120 > rcu_read_lock_sched_held+0x3e/0xb0 > lock_acquired+0x12e/0x208 > new_inode+0x3e/0xd0 > debugfs_get_inode+0x22/0x68 > __debugfs_create_file+0x78/0x1c0 > debugfs_create_file_unsafe+0x36/0x58 > debugfs_create_u32+0x38/0x68 > sched_init_debug+0xb0/0x1c0 > do_one_initcall+0x108/0x280 > do_initcalls+0x124/0x148 > kernel_init_freeable+0x242/0x280 > kernel_init+0x2e/0x158 > __ret_from_fork+0x3c/0x50 > ret_from_fork+0xa/0x40 > irq event stamp: 539789 > hardirqs last enabled at (539789): [<00d9c632>] > _raw_spin_unlock_irqrestore+0x72/0x88 > hardirqs last disabled at (539788): [<00d9c2b6>] > _raw_spin_lock_irqsave+0x96/0xd0 > softirqs last enabled at (539568): [<00d9e0d4>] > __do_softirq+0x434/0x588 > softirqs last disabled at (539503): [<0018cd66>] > __irq_exit_rcu+0x146/0x170 > > other info that might help us debug this: > Possible unsafe locking scenario: > >CPU0 > > lock(&vcdev->irq_lock); > > lock(&vcdev->irq_lock); > > *** DEADLOCK *** It looks to me we need to use write_lock_irq()/write_unlock_irq() to do the synchronization. And we probably need to keep the read_lock_irqsave()/read_lock_irqrestore() logic since I can see the virtio_ccw_int_handler() to be called from process context (e.g from the io_subchannel_quiesce()). Thanks > > 2 locks held by kworker/u4:0/9: > #0: 0288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: > process_one_work+0x1ea/0x658 > #1: 0384bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: > process_one_work+0x1ea/0x658 > > stack backtrace: > CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191 > Hardware name: QEMU 8561 QEMU (KVM/Linux) > Workqueue: events_unbound async_run_entry_fn > Call Trace: > [<00d8af22>] dump_stack_lvl+0x92/0xd0 > [<002032ac>] mark_lock_irq+0x864/0x968 > [<00203670>] mark_lock.part.0+0x2c0/0x790 > [<00203cea>] mark_usage+0x10a/0x178 > [<0020692a>] __lock_acquire+0x442/0xc20 > [<00207cc4>] lock_acquire.part.0+0xdc/0x228 > [<00207eb6>] lock_acquire+0xa6/0x1b0 > [<00d9c774>] _raw_write_lock+0x54/0xa8 > [<00d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60 > [<008eec04>] register_virtio_device+0xdc/0x1b0 > [<00d5aabe>] virtio_ccw_online+0x246/0x2e8 > [<00c9fecc>] ccw_device_set_online+0x1c4/0x540 > [<00d5a05e>] virtio_ccw_auto_online+0x26/0x50 > [<001ba2b0>] async_run_entry_fn+0x40/0x108 > [<001ab9b4>] process_one_work+0x2a4/0x658 > [<001abdd0>] worker_thread+0x68/0x440 > [<001b4668>] kthread+0x128/0x130 > [<00102fac>] __ret_from_fork+0x3c/0x50 > [<00d9d3aa>] ret_from_fork+0xa/0x40 > INFO: lockdep is turned off. > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Wed, 11 May 2022 10:22:59 +0800 Jason Wang wrote: > >CPU0 > > > > lock(&vcdev->irq_lock); > > > > lock(&vcdev->irq_lock); > > > > *** DEADLOCK *** > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > do the synchronization. > > And we probably need to keep the > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > virtio_ccw_int_handler() to be called from process context (e.g from > the io_subchannel_quiesce()). > Sounds correct. Regards, Halil ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Wed, May 11, 2022 at 10:02 PM Halil Pasic wrote: > > On Wed, 11 May 2022 10:22:59 +0800 > Jason Wang wrote: > > > >CPU0 > > > > > > lock(&vcdev->irq_lock); > > > > > > lock(&vcdev->irq_lock); > > > > > > *** DEADLOCK *** > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > do the synchronization. > > > > And we probably need to keep the > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > virtio_ccw_int_handler() to be called from process context (e.g from > > the io_subchannel_quiesce()). > > > > Sounds correct. As Cornelia and Vineeth pointed out, all the paths the vring_interrupt is called with irq disabled. So I will use spin_lock()/spin_unlock() in the next version. Thanks > > Regards, > Halil > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Thu, 12 May 2022 11:31:08 +0800 Jason Wang wrote: > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > > do the synchronization. > > > > > > And we probably need to keep the > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > > virtio_ccw_int_handler() to be called from process context (e.g from > > > the io_subchannel_quiesce()). > > > > > > > Sounds correct. > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt > is called with irq disabled. > > So I will use spin_lock()/spin_unlock() in the next version. Can we do some sort of an assertion that if the kernel is built with the corresponding debug features will make sure this assumption holds (and warn if it does not)? That assertion would also document the fact. If an assertion is not possible, I think we should at least place a strategic comment that documents our assumption. Regards, Halil > > Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote: > On Thu, 12 May 2022 11:31:08 +0800 > Jason Wang wrote: > > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > > > do the synchronization. > > > > > > > > And we probably need to keep the > > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > > > virtio_ccw_int_handler() to be called from process context (e.g from > > > > the io_subchannel_quiesce()). > > > > > > > > > > Sounds correct. > > > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt > > is called with irq disabled. > > > > So I will use spin_lock()/spin_unlock() in the next version. > > Can we do some sort of an assertion that if the kernel is built with > the corresponding debug features will make sure this assumption holds > (and warn if it does not)? That assertion would also document the fact. Lockdep will do this automatically if you get it wrong, just like it did here. > If an assertion is not possible, I think we should at least place a > strategic comment that documents our assumption. That can't hurt. > Regards, > Halil > > > > > Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio
On Mon, May 16, 2022 at 10:25 PM Michael S. Tsirkin wrote: > > On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote: > > On Thu, 12 May 2022 11:31:08 +0800 > > Jason Wang wrote: > > > > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > > > > do the synchronization. > > > > > > > > > > And we probably need to keep the > > > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > > > > virtio_ccw_int_handler() to be called from process context (e.g from > > > > > the io_subchannel_quiesce()). > > > > > > > > > > > > > Sounds correct. > > > > > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt > > > is called with irq disabled. > > > > > > So I will use spin_lock()/spin_unlock() in the next version. > > > > Can we do some sort of an assertion that if the kernel is built with > > the corresponding debug features will make sure this assumption holds > > (and warn if it does not)? That assertion would also document the fact. > > Lockdep will do this automatically if you get it wrong, just like it > did here. > > > If an assertion is not possible, I think we should at least place a > > strategic comment that documents our assumption. > > That can't hurt. I will add some comments here. Thanks > > > Regards, > > Halil > > > > > > > > Thanks > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization