Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On 23:18-20211210, Thomas Gleixner wrote: [...] > > It's also available from git: > > git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git > msi-v3-part-2 [...] > --- > drivers/dma/ti/k3-udma-private.c|6 > drivers/dma/ti/k3-udma.c| 14 - > drivers/irqchip/irq-ti-sci-inta.c |2 > drivers/soc/ti/k3-ringacc.c |6 > drivers/soc/ti/ti_sci_inta_msi.c| 22 -- > include/linux/soc/ti/ti_sci_inta_msi.h |1 Also while testing on TI K3 platforms, I noticed: msi_device_data_release/msi_device_destroy_sysfs in am64xx-evm / j7200 [1] https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33#file-am64xx-evm-txt-L1018 [2] https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33#file-j7200-evm-txt-L1076 Which is not present in vanilla v5.16-rc4 v5.16-rc4: https://gist.github.com/nmenon/1aee3f0a7da47d5e9dcb7336b32a70cb msi-v3-part-2: https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33 (.config https://gist.github.com/nmenon/ec6f95303828abf16a64022d8e3a269f) Vs: next-20211208: https://gist.github.com/nmenon/f5ca3558bd5c1fbe62dc5ceb420b536e -- Regards, Nishanth Menon Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Mon, Dec 13 2021 at 12:29, Nishanth Menon wrote: > On 23:18-20211210, Thomas Gleixner wrote: > Also while testing on TI K3 platforms, I noticed: > > msi_device_data_release/msi_device_destroy_sysfs in am64xx-evm / j7200 The warning complains about a device being released with MSI descriptors still attached to the device. This was added by: 5b012cede0f7 ("device: Add device::msi_data pointer and struct msi_device_data") That's not a regression caused by this commit. The warning is just exposing an already existing problem in the iwlwifi driver, which seems to do: probe() setup_pci_msi[x]_interrupts() start_drv() if (try_to_load_firmware() == FAIL) device_release_driver() ... msi_device_data_release() WARN() Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On 10:41-20211214, Thomas Gleixner wrote: > On Mon, Dec 13 2021 at 12:29, Nishanth Menon wrote: > > On 23:18-20211210, Thomas Gleixner wrote: > > Also while testing on TI K3 platforms, I noticed: > > > > msi_device_data_release/msi_device_destroy_sysfs in am64xx-evm / j7200 > > The warning complains about a device being released with MSI descriptors > still attached to the device. This was added by: > > 5b012cede0f7 ("device: Add device::msi_data pointer and struct > msi_device_data") > > That's not a regression caused by this commit. The warning is just > exposing an already existing problem in the iwlwifi driver, which seems > to do: > >probe() > setup_pci_msi[x]_interrupts() > start_drv() >if (try_to_load_firmware() == FAIL) > device_release_driver() > ... > msi_device_data_release() > WARN() > Agreed that the warning is fine, the null pointer exception that follows [1] [2] it however does'nt look right and it can be trivially fixed with the following fixup for ee90787487bc ("genirq/msi: Provide msi_device_populate/destroy_sysfs()") below, with that the log looks like [3] - the warn is good, the null pointer exception and resultant crash could be avoided (not saying this is the best solution): diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index ab5e83f41188..24edb870c66f 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -252,11 +252,14 @@ int msi_device_populate_sysfs(struct device *dev) */ void msi_device_destroy_sysfs(struct device *dev) { - const struct attribute_group **msi_irq_groups = dev->msi.data->attrs; + const struct attribute_group **msi_irq_groups; struct device_attribute *dev_attr; struct attribute **msi_attrs; int count = 0; + if (!dev->msi.data) + return; + msi_irq_groups = dev->msi.data->attrs; dev->msi.data->attrs = NULL; if (!msi_irq_groups) return; [1] https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33#file-am64xx-evm-txt-L1049 [2] https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33#file-j7200-evm-txt-L [3] https://gist.github.com/nmenon/575afe7d04463026a7e420a76c2c1c5b https://gist.github.com/nmenon/575afe7d04463026a7e420a76c2c1c5b#file-am64xx-evm-txt-L1018 https://gist.github.com/nmenon/575afe7d04463026a7e420a76c2c1c5b#file-j7200-evm-txt-L1053 -- Regards, Nishanth Menon Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Tue, Dec 14 2021 at 10:22, Nishanth Menon wrote: > On 10:41-20211214, Thomas Gleixner wrote: > Agreed that the warning is fine, the null pointer exception that follows > [1] [2] it however does'nt look right and it can be trivially fixed with the > following fixup for ee90787487bc ("genirq/msi: Provide > msi_device_populate/destroy_sysfs()") below, with that the log looks > like [3] - the warn is good, the null pointer exception and resultant > crash could be avoided (not saying this is the best solution): Aaargh. [ 13.478122] Call trace: [ 13.509042] msi_device_destroy_sysfs+0x18/0x88 [ 13.509058] msi_domain_free_irqs+0x34/0x58 [ 13.509064] pci_msi_teardown_msi_irqs+0x30/0x3c [ 13.509072] free_msi_irqs+0x78/0xd4 [ 13.509077] pci_disable_msix+0x138/0x164 [ 13.529930] pcim_release+0x70/0x238 [ 13.529942] devres_release_all+0x9c/0xfc [ 13.529951] device_release_driver_internal+0x1a0/0x244 [ 13.542725] device_release_driver+0x18/0x24 [ 13.542741] iwl_req_fw_callback+0x1a28/0x1ddc [iwlwifi] [ 13.552308] request_firmware_work_func+0x50/0x9c [ 13.552320] process_one_work+0x194/0x25c That's not a driver problem, that's an ordering issue vs. the devres muck. Let me go back to the drawing board. Sigh... Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Tue, Dec 14 2021 at 17:36, Thomas Gleixner wrote: > On Tue, Dec 14 2021 at 10:22, Nishanth Menon wrote: >> On 10:41-20211214, Thomas Gleixner wrote: > [ 13.478122] Call trace: > [ 13.509042] msi_device_destroy_sysfs+0x18/0x88 > [ 13.509058] msi_domain_free_irqs+0x34/0x58 > [ 13.509064] pci_msi_teardown_msi_irqs+0x30/0x3c > [ 13.509072] free_msi_irqs+0x78/0xd4 > [ 13.509077] pci_disable_msix+0x138/0x164 > [ 13.529930] pcim_release+0x70/0x238 > [ 13.529942] devres_release_all+0x9c/0xfc > [ 13.529951] device_release_driver_internal+0x1a0/0x244 > [ 13.542725] device_release_driver+0x18/0x24 > [ 13.542741] iwl_req_fw_callback+0x1a28/0x1ddc [iwlwifi] > [ 13.552308] request_firmware_work_func+0x50/0x9c > [ 13.552320] process_one_work+0x194/0x25c > > That's not a driver problem, that's an ordering issue vs. the devres > muck. Let me go back to the drawing board. Sigh... Which is pretty obvious why: pcim_enable_device() devres_alloc(pcim_release...); ... pci_irq_alloc() msi_setup_device_data() devres_alloc(msi_device_data_release, ...) and once the device is released: msi_device_data_release() ... pcim_release() pci_disable_msi[x]() Groan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Nishanth, On Tue, Dec 14 2021 at 18:03, Thomas Gleixner wrote: > msi_device_data_release() > ... > pcim_release() >pci_disable_msi[x]() > > Groan I think I managed to distangle this. Can you please give: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2 and/or the full pile: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-3 a test ride? Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On 21:15-20211214, Thomas Gleixner wrote: > Nishanth, > > On Tue, Dec 14 2021 at 18:03, Thomas Gleixner wrote: > > msi_device_data_release() > > ... > > pcim_release() > >pci_disable_msi[x]() > > > > Groan > > I think I managed to distangle this. Can you please give: > >git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2 Umm.. I am not entirely sure what is going on.. but all kinds of weird corruption seems to occur with msi-v4-part-2 that does'nt seem to be present in v5.16-rc5. (I use NFS since ethernet in K3 platforms use inta/intr and dma that is impacted by this series). I will try and rebase your patches on v5.16-rc4 to be sure as well and report back later today once i get some time. [1] https://gist.github.com/nmenon/a66e022926c4c15313c45d44313d860c msi-v4-part-2 [2] https://gist.github.com/nmenon/43085664d69ad846d596e76a06ed0656 v5.16-rc5 -- Regards, Nishanth Menon Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Nishanth, On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote: > On 21:15-20211214, Thomas Gleixner wrote: >> I think I managed to distangle this. Can you please give: >> >>git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2 > > > Umm.. I am not entirely sure what is going on.. but all kinds of weird > corruption seems to occur with msi-v4-part-2 that does'nt seem to be > present in v5.16-rc5. (I use NFS since ethernet in K3 platforms use > inta/intr and dma that is impacted by this series). > > I will try and rebase your patches on v5.16-rc4 to be sure as well and > report back later today once i get some time. > > [1] https://gist.github.com/nmenon/a66e022926c4c15313c45d44313d860c > msi-v4-part-2 > [2] https://gist.github.com/nmenon/43085664d69ad846d596e76a06ed0656 v5.16-rc5 thanks for trying. I'll have a look again with brain awake tomorrow morning. Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote: > On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote: > > thanks for trying. I'll have a look again with brain awake tomorrow > morning. Morning was busy with other things, but I found what my sleepy brain managed to do wrong yesterday evening. Let me reintegrate the pile and I'll send you an update. Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Wed, Dec 15 2021 at 17:18, Thomas Gleixner wrote: > On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote: >> On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote: >> >> thanks for trying. I'll have a look again with brain awake tomorrow >> morning. > > Morning was busy with other things, but I found what my sleepy brain > managed to do wrong yesterday evening. > > Let me reintegrate the pile and I'll send you an update. git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4.1-part-2 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4.2-part-3 That should cure the problem. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On 17:35-20211215, Thomas Gleixner wrote: >git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git > msi-v4.1-part-2 [...] > That should cure the problem. And it sure does. Thanks for looking closer and providing a fix. https://gist.github.com/nmenon/9862a1c31b17fd6dfe0a30c54d396187 (msi-v4.1-part-2) looks clean Also while I had detected pointer corruption in the previous v4 https://gist.github.com/nmenon/ce4d12f460db5cd511185c047d5d35d0 Running it again on v4.1 does indicate the fix is in place. https://gist.github.com/nmenon/3231fbb0faa1b9c827b40d1ae6160626 please feel free to add: Tested-by: Nishanth Menon -- Regards, Nishanth Menon Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Hi Thomas, On 17:35-20211215, Thomas Gleixner wrote: >git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git > msi-v4.2-part-3 As you helped offline, summarizing the details on part3 of the series: I was seeing failure[1] of NFS(DMA) on all TI K3 platforms: [1.013258] ti-bcdma 485c0100.dma-controller: Number of rings: 68 [1.019963] ti-bcdma 485c0100.dma-controller: Failed to allocate IRQs -28 [1.026938] ti-bcdma 485c0100.dma-controller: Failed to allocate MSI interrupts Rationale as you explained: " -28 is ENOSPC, which is returned when the interrupt allocation in the MSI domain fails. Fix below. " Which turned out to be the fixup[2] you suggested and I confirm that fixes the problem for me. With the fixup in place: Tested-by: Nishanth Menon for part 3 of the series as well. Thanks once again for your help. Hope we can roll in the fixes for part3. [1] https://gist.github.com/nmenon/5971ab27aa626c022e276cc946e4b6c3 [2] --- a/drivers/soc/ti/ti_sci_inta_msi.c +++ b/drivers/soc/ti/ti_sci_inta_msi.c @@ -68,6 +68,7 @@ static int ti_sci_inta_msi_alloc_descs(s int set, i, count = 0; memset(&msi_desc, 0, sizeof(msi_desc)); + msi_desc.nvec_used = 1; for (set = 0; set < res->sets; set++) { for (i = 0; i < res->desc[set].num; i++, count++) { -- Regards, Nishanth Menon Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
From: Thomas Gleixner Sent: Wednesday, December 15, 2021 8:36 AM > > On Wed, Dec 15 2021 at 17:18, Thomas Gleixner wrote: > > > On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote: > >> On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote: > >> > >> thanks for trying. I'll have a look again with brain awake tomorrow > >> morning. > > > > Morning was busy with other things, but I found what my sleepy brain > > managed to do wrong yesterday evening. > > > > Let me reintegrate the pile and I'll send you an update. > >git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git > msi-v4.1-part-2 >git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git > msi-v4.2-part-3 > > That should cure the problem. Tested the msi-v4.2-part-3 tag in two different Azure/Hyper-V VMs. One is a Generation 1 VM that has legacy PCI devices and one is a Generation 2 VM with no legacy PCI devices. Tested hot add and remove of Mellanox CX-3 and CX-4 SR-IOV NIC virtual functions that are directly mapped into the VM. Also tested local NVMe devices directly mapped into one of the VMs. No issues encountered. So for Azure/Hyper-V specifically, Tested-by: Michael Kelley ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Nishanth, On Wed, Dec 15 2021 at 19:45, Nishanth Menon wrote: > On 17:35-20211215, Thomas Gleixner wrote: > Thanks once again for your help. Hope we can roll in the fixes for > part3. Sure, it's only the one-liner for ti sci. Got it folded already. Thanks for your help and testing! tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu