Hi Alex,

Following your suggestion, the final patch is as follows:

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1d9fb25..869dce5 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -352,11 +352,13 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	vdev->ctx[vector].producer.token = trigger;
 	vdev->ctx[vector].producer.irq = irq;
 	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
-	if (unlikely(ret))
+	if (unlikely(ret)) {
 		dev_info(&pdev->dev,
 		"irq bypass producer (token %p) registration fails: %d\n",
 		vdev->ctx[vector].producer.token, ret);
+		vdev->ctx[vector].producer.token = NULL;
+	}
 
 	vdev->ctx[vector].trigger = trigger;
 
 	return 0;
--

I applied this patch and, after several days of testing, everything
behaves normally and the BUG no longer occurs. The kernel now only
prints the registration-failure log message.

Thanks, Alex.

Alex Williamson <alex.william...@redhat.com> wrote on Sat, Oct 10, 2020 at 10:26 PM:
>
> On Sat, 10 Oct 2020 19:01:30 +0800
> gchen chen <gchen.guo...@gmail.com> wrote:
>
> > Alex Williamson <alex.william...@redhat.com> wrote on Sat, Oct 10, 2020 at 2:44 AM:
> > >
> > > On Fri, 9 Oct 2020 12:30:04 +0800
> > > gchen chen <gchen.guo...@gmail.com> wrote:
> > >
> > > > Alex Williamson <alex.william...@redhat.com> wrote on Wed, Sep 30, 2020 at 10:09 PM:
> > > > >
> > > > >
> > > > > Please version your postings so we know which one to consider as the
> > > > > current proposal.
> > > > >
> > > > > On Wed, 30 Sep 2020 20:54:39 +0800
> > > > > guomin_c...@sina.com wrote:
> > > > >
> > > > > > From: guomin chen <guomin_c...@sina.com>
> > > > > >
> > > > > > When the producer object registration fails,In the future, due to
> > > > > > incorrect matching when unregistering, list_del(&producer->node)
> > > > > > may still be called, then trigger a BUG:
> > > > > >
> > > > > > vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> > > > > > vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> > > > > > vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> > > > > > ...
> > > > > > list_del corruption, ffff8f7fb8ba0828->next is LIST_POISON1 (dead000000000100)
> > > > > > ------------[ cut here ]------------
> > > > > > kernel BUG at lib/list_debug.c:47!
> > > > > > invalid opcode: 0000 [#1] SMP NOPTI
> > > > > > CPU: 29 PID: 3914 Comm: qemu-kvm Kdump: loaded Tainted: G E -------- - -4.18.0-193.6.3.el8.x86_64 #1
> > > > > > Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-, BIOS -[IVE636Z-2.13]- 07/18/2019
> > > > > > RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
> > > > > > Code: ce ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 40 85 4d 88 e8 8c bc ce ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 d0 85 4d 88 e8 78 bc ce ff <0f> 0b 48 c7 c7 80 86 4d 88 e8 6a bc ce ff 0f 0b 48 89 f2 48 89 fe
> > > > > > RSP: 0018:ffffaa9d60197d20 EFLAGS: 00010246
> > > > > > RAX: 000000000000004e RBX: ffff8f7fb8ba0828 RCX: 0000000000000000
> > > > > > RDX: 0000000000000000 RSI: ffff8f7fbf4d6a08 RDI: ffff8f7fbf4d6a08
> > > > > > RBP: 0000000000000000 R08: 000000000000084b R09: 000000000000005d
> > > > > > R10: 0000000000000000 R11: ffffaa9d60197bd0 R12: ffff8f4fbe863000
> > > > > > R13: 00000000000000c2 R14: 0000000000000000 R15: 0000000000000000
> > > > > > FS: 00007f7cb97fa700(0000) GS:ffff8f7fbf4c0000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00007fcf31da4000 CR3: 0000005f6d404001 CR4: 00000000007626e0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > PKRU: 55555554
> > > > > > Call Trace:
> > > > > >  irq_bypass_unregister_producer+0x9b/0xf0 [irqbypass]
> > > > > >  vfio_msi_set_vector_signal+0x8c/0x290 [vfio_pci]
> > > > > >  ? load_fixmap_gdt+0x22/0x30
> > > > > >  vfio_msi_set_block+0x6e/0xd0 [vfio_pci]
> > > > > >  vfio_pci_ioctl+0x218/0xbe0 [vfio_pci]
> > > > > >  ? kvm_vcpu_ioctl+0xf2/0x5f0 [kvm]
> > > > > >  do_vfs_ioctl+0xa4/0x630
> > > > > >  ? syscall_trace_enter+0x1d3/0x2c0
> > > > > >  ksys_ioctl+0x60/0x90
> > > > > >  __x64_sys_ioctl+0x16/0x20
> > > > > >  do_syscall_64+0x5b/0x1a0
> > > > > >  entry_SYSCALL_64_after_hwframe+0x65/0xca
> > > > > >
> > > > > > Cc: Alex Williamson <alex.william...@redhat.com>
> > > > > > Cc: Cornelia Huck <coh...@redhat.com>
> > > > > > Cc: Jiang Yi <gian...@amazon.com>
> > > > > > Cc: Marc Zyngier <m...@kernel.org>
> > > > > > Cc: Peter Xu <pet...@redhat.com>
> > > > > > Cc: Eric Auger <eric.au...@redhat.com>
> > > > > > Cc: "Michael S. Tsirkin" <m...@redhat.com>
> > > > > > Cc: Jason Wang <jasow...@redhat.com>
> > > > > > Cc: k...@vger.kernel.org
> > > > > > Cc: linux-kernel@vger.kernel.org
> > > > > > Signed-off-by: guomin chen <guomin_c...@sina.com>
> > > > > > ---
> > > > > >  drivers/vfio/pci/vfio_pci_intrs.c | 13 +++++++++++--
> > > > > >  drivers/vhost/vdpa.c | 7 +++++++
> > > > > >  2 files changed, 18 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > > index 1d9fb25..c371943 100644
> > > > > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > > @@ -352,12 +352,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> > > > > >  	vdev->ctx[vector].producer.token = trigger;
> > > > > >  	vdev->ctx[vector].producer.irq = irq;
> > > > > >  	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> > > > > > -	if (unlikely(ret))
> > > > > > +	if (unlikely(ret)) {
> > > > > >  		dev_info(&pdev->dev,
> > > > > >  		"irq bypass producer (token %p) registration fails: %d\n",
> > > > > >  		vdev->ctx[vector].producer.token, ret);
> > > > > >
> > > > > > -	vdev->ctx[vector].trigger = trigger;
> > > > > > +		kfree(vdev->ctx[vector].name);
> > > > > > +		eventfd_ctx_put(trigger);
> > > > > > +
> > > > > > +		cmd = vfio_pci_memory_lock_and_enable(vdev);
> > > > > > +		free_irq(irq, trigger);
> > > > > > +		vfio_pci_memory_unlock_and_restore(vdev, cmd);
> > > > > > +
> > > > > > +		vdev->ctx[vector].trigger = NULL;
> > > > > > +	} else
> > > > > > +		vdev->ctx[vector].trigger = trigger;
> > > > > >
> > > > > >  	return 0;
> > > > > >  }
> > > > >
> > > > > Once again, the irq bypass registration cannot cause the vector setup
> > > > > to fail, either by returning an error code or failing to configure the
> > > > > vector while returning success.
> > > > > It's my assertion that we simply need
> > > > > to set the producer.token to NULL on failure such that unregistering
> > > > > the producer will not generate a match, as you've done below. The
> > > > > vector still works even if this registration fails.
> > > > >
> > > > Yes, the irq bypass registration cannot cause the vector setup to fail.
> > > > But if I simply set producer.token to NULL when fails, instead of
> > > > cleaning up vector, it will trigger the following BUG:
> > > >
> > > > vfio_ecap_init: 0000:db:00.0 hiding ecap 0x1e@0x310
> > > > vfio-pci 0000:db:00.0: irq bypass producer (token 000000004409229f) registration fails: -16
> > > > ------------[ cut here ]------------
> > > > kernel BUG at drivers/pci/msi.c:352!
> > > > invalid opcode: 0000 [#1] SMP NOPTI
> > > > CPU: 55 PID: 9389 Comm: qemu-kvm Kdump: loaded Tainted: G E --------- - - 4.18.0-193.irqb.r1.el8.x86_64 #1
> > > > Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-, BIOS -[IVE636Z-2.13]- 07/18/2019
> > > > RIP: 0010:free_msi_irqs+0x180/0x1b0
> > > > Code: 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 c5 fe ff ff 8b 7b 10 01 ef e8 d7 4a c9 ff 48 83 78 70 00 74 e3 <0f> 0b 49 8d b5 b0 00 00 00 e8 e2 e3 c9 ff e9 c7 fe ff ff 48 8b 7b
> > > > RSP: 0018:ffffaeca4f4bfcd8 EFLAGS: 00010286
> > > > RAX: ffff8bec77441600 RBX: ffff8bbcdb637e40 RCX: 0000000000000000
> > > > RDX: 0000000000000000 RSI: 00000000000001ab RDI: ffffffff8ea5b2a0
> > > > RBP: 0000000000000000 R08: ffff8bec7e746828 R09: ffff8bec7e7466a8
> > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bbcde921308
> > > > R13: ffff8bbcde921000 R14: 000000000000000b R15: 0000000000000021
> > > > FS: 00007fd18d7fa700(0000) GS:ffff8bec7f6c0000(0000) knlGS:0000000000000000
> > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 00007f83650024a0 CR3: 000000476e70c001 CR4: 00000000007626e0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > PKRU: 55555554
> > > > Call Trace:
> > > >  pci_disable_msix+0xf3/0x120
> > > >  pci_free_irq_vectors+0xe/0x20
> > > >  vfio_msi_disable+0x89/0xd0 [vfio_pci]
> > > >  vfio_pci_set_msi_trigger+0x229/0x2d0 [vfio_pci]
> > > >  vfio_pci_ioctl+0x24f/0xdb0 [vfio_pci]
> > > >  ? pollwake+0x74/0x90
> > > >  ? wake_up_q+0x70/0x70
> > > >  do_vfs_ioctl+0xa4/0x630
> > > >  ? __alloc_fd+0x33/0x140
> > > >  ? syscall_trace_enter+0x1d3/0x2c0
> > > >  ksys_ioctl+0x60/0x90
> > > >  __x64_sys_ioctl+0x16/0x20
> > > >  do_syscall_64+0x5b/0x1a0
> > > >  entry_SYSCALL_64_after_hwframe+0x65/0xca
> > >
> > > Please post the patch that triggers this, I'm not yet convinced we're
> > > speaking of the same solution. The user ioctl cannot fail due to the
> > > failure to setup a bypass accelerator, nor can the ioctl return success
> > > without configuring all of the user requested vectors, which is what I
> > > understand the v2 patch above to do. We simply want to configure the
> > > failed producer such that when we unregister it at user request, we
> > > avoid creating a bogus match. It's not apparent to me why doing that
> > > would cause any changes to the setup or teardown of the MSI vector in
> > > PCI code. Thanks,
> > >
> > > Alex
> > >
> > Hi Alex, as you said before, I only need to set the producer.token
> > to NULL on failure such that unregistering the producer will not
> > generate a match.
> >
> > So I wrote a patch (As you said patch v2), as follows:
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1d9fb25..1969cd0 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -352,12 +352,15 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  	vdev->ctx[vector].producer.token = trigger;
> >  	vdev->ctx[vector].producer.irq = irq;
> >  	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> > -	if (unlikely(ret))
> > +	if (unlikely(ret)) {
> >  		dev_info(&pdev->dev,
> >  		"irq bypass producer (token %p) registration fails: %d\n",
> >  		vdev->ctx[vector].producer.token, ret);
> >
> > -	vdev->ctx[vector].trigger = trigger;
> > +		eventfd_ctx_put(trigger);
> > +		vdev->ctx[vector].trigger = NULL;
> > +	} else
> > +		vdev->ctx[vector].trigger = trigger;
> >
> >  	return 0;
> >  }
>
> How does this remotely match "only need to set the producer.token to
> NULL on failure"? What I'm suggesting is:
>
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -352,10 +352,12 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	vdev->ctx[vector].producer.token = trigger;
>  	vdev->ctx[vector].producer.irq = irq;
>  	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> -	if (unlikely(ret))
> +	if (unlikely(ret)) {
>  		dev_info(&pdev->dev,
>  		"irq bypass producer (token %p) registration fails: %d\n",
>  		vdev->ctx[vector].producer.token, ret);
> +		vdev->ctx[vector].producer.token = NULL;
> +	}
>
>  	vdev->ctx[vector].trigger = trigger;
>
> This is exactly what you proposed for vhost/vdpa.c, so I don't see why
> you're playing with the trigger context, which will clearly cause
> problems.
> Thanks,
>
> Alex
>
> > --
> >
> > However, when I use this patch to testing, the following bugs are
> > triggered when vfio_msi_disable() called because the msi vector
> > is not cleaned up:
> >
> > vfio_ecap_init: 0000:db:00.0 hiding ecap 0x1e@0x310
> > vfio-pci 0000:db:00.0: irq bypass producer (token 000000004409229f) registration fails: -16
> > ------------[ cut here ]------------
> > kernel BUG at drivers/pci/msi.c:352!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > CPU: 55 PID: 9389 Comm: qemu-kvm Kdump: loaded Tainted: G E --------- - - 4.18.0-193.irqb.r1.el8.x86_64 #1
> > Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-, BIOS -[IVE636Z-2.13]- 07/18/2019
> > RIP: 0010:free_msi_irqs+0x180/0x1b0
> > Code: 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 c5 fe ff ff 8b 7b 10 01 ef e8 d7 4a c9 ff 48 83 78 70 00 74 e3 <0f> 0b 49 8d b5 b0 00 00 00 e8 e2 e3 c9 ff e9 c7 fe ff ff 48 8b 7b
> > RSP: 0018:ffffaeca4f4bfcd8 EFLAGS: 00010286
> > RAX: ffff8bec77441600 RBX: ffff8bbcdb637e40 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 00000000000001ab RDI: ffffffff8ea5b2a0
> > RBP: 0000000000000000 R08: ffff8bec7e746828 R09: ffff8bec7e7466a8
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bbcde921308
> > R13: ffff8bbcde921000 R14: 000000000000000b R15: 0000000000000021
> > FS: 00007fd18d7fa700(0000) GS:ffff8bec7f6c0000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f83650024a0 CR3: 000000476e70c001 CR4: 00000000007626e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> >  pci_disable_msix+0xf3/0x120
> >  pci_free_irq_vectors+0xe/0x20
> >  vfio_msi_disable+0x89/0xd0 [vfio_pci]
> >  vfio_pci_set_msi_trigger+0x229/0x2d0 [vfio_pci]
> >  vfio_pci_ioctl+0x24f/0xdb0 [vfio_pci]
> >  ? pollwake+0x74/0x90
> >  ? wake_up_q+0x70/0x70
> >  do_vfs_ioctl+0xa4/0x630
> >  ? __alloc_fd+0x33/0x140
> >  ? syscall_trace_enter+0x1d3/0x2c0
> >  ksys_ioctl+0x60/0x90
> >  __x64_sys_ioctl+0x16/0x20
> >  do_syscall_64+0x5b/0x1a0
> >  entry_SYSCALL_64_after_hwframe+0x65/0xca
> >
>
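
P.S. For anyone reading the thread later, here is a rough userspace
sketch of why clearing producer.token on a failed registration is
enough. It is only a simplified model under my own assumptions, not the
kernel's irqbypass code: struct model_producer, the fixed-size registry,
model_register(), model_unregister() and replay_failed_registration()
are all hypothetical names invented for illustration. The model assumes
that registration rejects a duplicate token with -16 (as in the log
above) and that unregistering matches purely on the token, so a
producer that failed to register but still carries a token can produce
a bogus match.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PRODUCERS 8

/* Hypothetical, simplified stand-in for a bypass producer. */
struct model_producer {
	void *token;	/* match key; NULL means "nothing to match" */
	bool linked;	/* true only while the producer is in the registry */
};

/* Hypothetical registry standing in for the kernel's producer list. */
static struct model_producer *registry[MAX_PRODUCERS];

/* Returns 0 on success, -16 (EBUSY) if the token is already registered. */
static int model_register(struct model_producer *p)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_PRODUCERS; i++) {
		if (registry[i] && registry[i]->token == p->token)
			return -16;
		if (!registry[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -12;	/* registry full (ENOMEM in this model) */

	registry[free_slot] = p;
	p->linked = true;
	return 0;
}

/*
 * Returns 0 when nothing had to be done or a linked producer was removed,
 * and -1 when the token matched although the caller's producer is not
 * linked. The -1 case models the bogus match that ends in list_del().
 */
static int model_unregister(struct model_producer *p)
{
	if (!p->token)
		return 0;	/* cleared token: no match is attempted */

	for (int i = 0; i < MAX_PRODUCERS; i++) {
		if (!registry[i] || registry[i]->token != p->token)
			continue;
		if (!p->linked)
			return -1;	/* bogus match on an unlinked producer */
		registry[i] = NULL;
		p->linked = false;
		return 0;
	}
	return 0;
}

/* Replays the failure path; clear_token selects the patched behaviour. */
static int replay_failed_registration(bool clear_token)
{
	static int eventfd_stub;	/* stands in for the eventfd trigger */
	struct model_producer first = { .token = &eventfd_stub };
	struct model_producer second = { .token = &eventfd_stub };
	int ret;

	for (int i = 0; i < MAX_PRODUCERS; i++)
		registry[i] = NULL;

	model_register(&first);
	ret = model_register(&second);	/* fails: token already present */
	if (ret && clear_token)
		second.token = NULL;	/* the one-line fix from the patch */

	ret = model_unregister(&second);
	model_unregister(&first);	/* leave no stale pointer behind */
	return ret;
}
```

In this model, replay_failed_registration(false) reports the bogus
match and replay_failed_registration(true) does not, while the trigger
itself is never touched in either case, which matches Alex's point that
the vector keeps working when only the bypass registration fails.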