Re: REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-11-12 Thread Thomas Gleixner
On Thu, Nov 12 2020 at 15:15, Thomas Gleixner wrote:
> On Thu, Nov 12 2020 at 08:55, Jason Gunthorpe wrote:
>> On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
>> They were unable to bisect further into the series because some of the
>> interior commits don't boot :(
>>
>> When we try to load the mlx5 driver on a bare metal VF it gets this:
>>
>> [Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
>> [Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
>> [Thu Oct 22 08:55:04 2020] mlx5_core :42:00.1 eth4: Link down
>> [Thu Oct 22 08:55:11 2020] mlx5_core :42:00.1 eth4: Link up
>> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
>> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0x301) recovered after timeout
>> [Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
>> [Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
>>
>> If you have any ideas, Ziyad and Itay can run any debugging you like.
>>
>> I suppose it is because this series is handing out compatibility
>> addr/data pairs while the IOMMU is set up to only accept remap ones
>> from SRIOV VFs?
>
> So the issue seems to be that the VF device has the default irq domain
> assigned and not the remapping domain. Let me stare into the code to see
> how these VF devices are set up and registered with the IOMMU/remap
> unit.

Found the reason. Will fix it after walking the dogs. Brain needs some
fresh air.

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-11-12 Thread Thomas Gleixner
Jason,

(trimmed CC list a bit)

On Thu, Nov 12 2020 at 08:55, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> They were unable to bisect further into the series because some of the
> interior commits don't boot :(
>
> When we try to load the mlx5 driver on a bare metal VF it gets this:
>
> [Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
> [Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
> [Thu Oct 22 08:55:04 2020] mlx5_core :42:00.1 eth4: Link down
> [Thu Oct 22 08:55:11 2020] mlx5_core :42:00.1 eth4: Link up
> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0x301) recovered after timeout
> [Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
> [Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
>
> If you have any ideas, Ziyad and Itay can run any debugging you like.
>
> I suppose it is because this series is handing out compatibility
> addr/data pairs while the IOMMU is set up to only accept remap ones
> from SRIOV VFs?

So the issue seems to be that the VF device has the default irq domain
assigned and not the remapping domain. Let me stare into the code to see
how these VF devices are set up and registered with the IOMMU/remap
unit.

Thanks,

tglx


REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-11-12 Thread Jason Gunthorpe
On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI
> (non-PCI based) and on top of that support for IMS (Interrupt Message
> Store) based devices in a halfway architecture-independent way.

Hi Thomas,

Our test team has been struggling with a regression on bare metal
SRIOV VFs since -rc1 that they were able to bisect to this series.

This commit tests good:

5712c3ed549e ("Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc")

This commit tests bad:

981aa1d366bf ("PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS")

They were unable to bisect further into the series because some of the
interior commits don't boot :(

When we try to load the mlx5 driver on a bare metal VF it gets this:

[Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
[Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
[Thu Oct 22 08:55:04 2020] mlx5_core :42:00.1 eth4: Link down
[Thu Oct 22 08:55:11 2020] mlx5_core :42:00.1 eth4: Link up
[Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
[Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0x301) recovered after timeout
[Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
[Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request

If you have any ideas, Ziyad and Itay can run any debugging you like.

I suppose it is because this series is handing out compatibility
addr/data pairs while the IOMMU is set up to only accept remap ones
from SRIOV VFs?

Thanks,
Jason