Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-04 Thread Jean-Philippe Brucker
Hi Shameer,

On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
> Hi Jean/zhangfei,
> Is it possible to have a branch with minimum required SVA/UACCE related
> patches that are already public and can be a "stable" candidate for future
> respin of Eric's series?
> Please share your thoughts.

By "stable", do you mean a fixed branch with the latest SVA/UACCE patches
based on mainline? The uacce-devel branches from
https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
(they track the latest sva/zip-devel branch of
https://jpbrucker.net/git/linux/ which is roughly based on mainline).

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-04 Thread Shameerali Kolothum Thodi
Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> Sent: 04 December 2020 09:54
> To: Shameerali Kolothum Thodi 
> Cc: Auger Eric ; wangxingang
> ; Xieyingtai ;
> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> vivek.gau...@arm.com; alex.william...@redhat.com;
> zhangfei@linaro.org; robin.mur...@arm.com;
> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
> ; qubingbing 
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Shameer,
> 
> On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
> > Hi Jean/zhangfei,
> > Is it possible to have a branch with minimum required SVA/UACCE related
> > patches that are already public and can be a "stable" candidate for future
> > respin of Eric's series?
> > Please share your thoughts.
> 
> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
> based on mainline? 

Yes. 

> The uacce-devel branches from
> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> (they track the latest sva/zip-devel branch
> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)

Thanks. 

Hi Eric,

Could you please take a look at the above branches and see whether it makes sense
to rebase on top of either of those?

From a vSVA point of view, it will be less of a rebase hassle if we can do that.

Thanks,
Shameer

> Thanks,
> Jean



Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-04 Thread Auger Eric
Hi Shameer, Jean-Philippe,

On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> Hi Jean,
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
>> Sent: 04 December 2020 09:54
>> To: Shameerali Kolothum Thodi 
>> Cc: Auger Eric ; wangxingang
>> ; Xieyingtai ;
>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> vivek.gau...@arm.com; alex.william...@redhat.com;
>> zhangfei@linaro.org; robin.mur...@arm.com;
>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
>> ; qubingbing 
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Shameer,
>>
>> On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
>>> Hi Jean/zhangfei,
>>> Is it possible to have a branch with minimum required SVA/UACCE related
>>> patches that are already public and can be a "stable" candidate for future
>>> respin of Eric's series?
>>> Please share your thoughts.
>>
>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>> based on mainline? 
> 
> Yes. 
> 
>> The uacce-devel branches from
>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>> (they track the latest sva/zip-devel branch
>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> 
> Thanks. 
> 
> Hi Eric,
> 
> Could you please take a look at the above branches and see whether it make
> sense to rebase on top of either of those?
> 
> From vSVA point of view, it will be less rebase hassle if we can do that.

Sure. I will rebase on top of this ;-)

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks,
>> Jean
> 



Re: AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]

2020-12-04 Thread Marc Smith
On Thu, Dec 3, 2020 at 1:18 AM Marc Smith  wrote:
>
> Hi,
>
> First, I must preface this email by apologizing in advance for asking
> about a distro kernel (RHEL in this case). I'm not really reporting this
> problem and requesting a fix here (I know this should be taken up with
> the vendor); rather, I'm hoping someone can give me a few hints/pointers
> on where to look next in debugging this issue.
>
> I'm using RHEL 7.8.2003 (CentOS) with a 3.10.0-1127.18.2.el7 kernel.
> The systems use a Supermicro H12SSW-NT board (AMD), and we have the
> IOMMU enabled along with SR-IOV. I have several virtual machines (QEMU
> KVM) that run on these servers, and I'm passing PCIe end-points into
> the VMs (in some cases the whole PCIe EP itself, and for some devices
> I use SR-IOV and pass in the VFs to the VMs). The VMs run Linux as
> their guest OS (a couple of different distros).
>
> While the servers (VMs) are idle, I don't experience any problems. But
> when I start doing a lot of I/O in the virtual machines (iSCSI across
> Ethernet interfaces, disk I/O via SAS HBAs that are passed into the
> VM, etc.) I notice the following after some time at the host layer
> ("hypervisor"):
> Nov 29 10:50:00 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
> device=42:00.0 domain=0x005e address=0xfffdf803 flags=0x0008]
> Nov 29 22:02:03 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
> device=c8:02.1 domain=0x005f address=0xfffdf806 flags=0x0008]
> Nov 30 02:13:54 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
> device=42:00.0 domain=0x005e address=0xfffdf802 flags=0x0008]
> Nov 30 02:28:44 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
> device=c8:02.0 domain=0x005e address=0xfffdf802 flags=0x0008]
> Nov 30 10:48:53 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
> device=01:00.0 domain=0x005e address=0xfffdf804 flags=0x0008]
> Dec  2 07:05:22 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
> device=c8:03.0 domain=0x005e address=0xfffdf801 flags=0x0008]
>
> These events happen to all PCIe devices that are passed into the VMs,
> although not all at once... as you can see from the timestamps above,
> they are not very frequent even under heavy load (in the log snippet
> above, the system was doing a big workload over several days). For the
> Ethernet devices that are passed into the VMs, I noticed that they
> experience transmit hangs / resets in the virtual machines, and when
> these occur, they correspond to a matching IO_PAGE_FAULT that belongs
> to that PCI device.
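One way to check the per-device correlation described above is to pull the fields out of each host-side IO_PAGE_FAULT line and group the faults by device. The following is a minimal C sketch, assuming each event sits on a single unwrapped log line in the format shown above; the struct and function names (`io_page_fault`, `parse_io_page_fault`, `requester_id`, `count_faults_for`) are hypothetical helpers, not part of the kernel or any distro tool.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one "AMD-Vi: Event logged [IO_PAGE_FAULT ...]" log line. */
struct io_page_fault {
	unsigned int bus, dev, fn;  /* faulting device, printed as bus:dev.fn */
	unsigned int domain;        /* IOMMU protection domain ID */
	unsigned long long address; /* faulting I/O virtual address */
	unsigned int flags;         /* fault flags reported by the IOMMU */
};

/*
 * Parse a single log line. Returns 1 on success, 0 if the line is not an
 * IO_PAGE_FAULT event or does not match the expected layout.
 */
static int parse_io_page_fault(const char *line, struct io_page_fault *f)
{
	const char *p = strstr(line, "IO_PAGE_FAULT");

	if (!p)
		return 0;
	return sscanf(p, "IO_PAGE_FAULT device=%x:%x.%x domain=0x%x address=0x%llx flags=0x%x",
		      &f->bus, &f->dev, &f->fn, &f->domain, &f->address,
		      &f->flags) == 6;
}

/* 16-bit PCI requester ID (bus << 8 | devfn), used as the grouping key. */
static unsigned int requester_id(const struct io_page_fault *f)
{
	return (f->bus << 8) | (f->dev << 3) | f->fn;
}

/* Count how many of the given lines are IO_PAGE_FAULTs for device rid. */
static int count_faults_for(const char *const *lines, int n, unsigned int rid)
{
	struct io_page_fault f;
	int count = 0;

	for (int i = 0; i < n; i++)
		if (parse_io_page_fault(lines[i], &f) && requester_id(&f) == rid)
			count++;
	return count;
}
```

Comparing per-device counts and timestamps from the host log against the guests' NETDEV WATCHDOG messages is then straightforward. Note that each passed-through VF appears under its own bus:dev.fn (for example c8:02.1), separate from its PF.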
>
> FWIW, those NIC hangs look like this (visible in the VM guest OS):
> [17879.279091] NETDEV WATCHDOG: s1p1 (bnxt_en): transmit queue 2 timed out
> [17879.279111] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:447
> dev_watchdog+0x121/0x17e
> ...
> [17879.279213] bnxt_en 0000:01:09.0 s1p1: TX timeout detected,
> starting reset task!
> [17883.075299] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
> [17883.075302] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 1
> failed. rc:fff0 err:0
> [17886.957100] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
> [17886.957103] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2
> failed. rc:fff0 err:0
> [17890.843023] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
> [17890.843025] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2
> failed. rc:fff0 err:0
>
> We see these NIC hangs in the VMs occur with both Broadcom and
> Mellanox Ethernet adapters that are passed into the VMs, so I don't
> think it's the NICs causing the IO_PAGE_FAULT events observed in the
> hypervisor. Plus, we see IO_PAGE_FAULTs for devices other than
> Ethernet adapters.
>
>
> I have several of these same servers (all using the same motherboard,
> processor, memory, BIOS, etc.) and they all experience this behavior
> with the IO_PAGE_FAULT events, so I don't believe it to be any one
> faulty server / component. My question, I guess, is where to dig/push
> next. Is this perhaps an issue with the BIOS/firmware on these
> motherboards? Something with the chipset (AMD IOMMU)? A colleague has
> suggested that even the AGESA may be related. Or should I be focusing
> on the Linux kernel and the AMD IOMMU driver (software)?
>
> I've been poking around other similar bug reports, and I see that the
> IO_PAGE_FAULT and the NIC reset / transmit hang appear to be related in
> other posts. This commit looked promising:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e50ce03976fbc8ae995a000c4b10c737467beaa
>
> But I see RH has already back-ported it into their
> 3.10.0-1127.18.2.el7 kernel source. I'm open to trying a newer vanilla
> Linux kernel (e.g., 5.4.x) but would prefer to resolve this in the
> RHEL kernel I'm using now. I'll take a look at this next, although due
> to the complex nature of this hypervisor/VM setup, it's a bit tedious
> to test.
>
>
> Kernel messages from boot (using the amd_iommu_dump=1 parameter):
> ...
> [0.214395] AMD-Vi: Usi

Re: [EXTERNAL] Re: Question regarding VIOT proposal

2020-12-04 Thread Jean-Philippe Brucker
Hi,

On Thu, Dec 03, 2020 at 04:01:27PM -0700, Al Stone wrote:
> On 03 Dec 2020 22:21, Yinghan Yang wrote:
> > Hi Jean,
> > 
> > I'm sorry for the delayed response. I think the new "PCI range node" 
> > description makes sense. Could you please make this change in the proposal?
> > 
> > Other than that, the proposal looks good to go.

Thanks for the feedback; I made the change.

> > 
> > Thanks,
> > Yinghan
> 
> Jean, were you going to update your existing doc first?  If you
> do that, then I can cut and paste the changes into the existing
> ASWG proposal.  Or do you need to send out an RFC to the mailing
> list first and finalize it there?

I updated the doc: https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
You can incorporate it into the ASWG proposal.
Changes since v8:
* One typo (s/programing/programming/)
* Modified the PCI Range node to include a segment range.
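To illustrate what a segment range adds, here is a minimal C sketch of a PCI Range node and of mapping a (segment, BDF) pair to a virtio-iommu endpoint ID. The field order and sizes, and the mapping endpoint_start + ((segment - segment_start) << 16) + (bdf - bdf_start), reflect our reading of the VIOT layout and are assumptions; viot-v9.pdf is the authoritative source. The names `viot_pci_range` and `viot_pci_endpoint_id` are hypothetical.

```c
#include <stdint.h>

/* Sketch of a VIOT PCI Range node (assumed layout; 24 bytes, fields
 * little-endian in the ACPI table). */
struct viot_pci_range {
	uint8_t  type;           /* node type (PCI range) */
	uint8_t  reserved0;
	uint16_t length;         /* node length in bytes */
	uint32_t endpoint_start; /* endpoint ID of the first device in the range */
	uint16_t segment_start;  /* first PCI segment covered */
	uint16_t segment_end;    /* last PCI segment covered */
	uint16_t bdf_start;      /* first bus/device/function in each segment */
	uint16_t bdf_end;        /* last bus/device/function in each segment */
	uint16_t output_node;    /* offset of the virtio-iommu node */
	uint8_t  reserved1[6];
};

/*
 * Return 1 and store the endpoint ID in *epid if (segment, bdf) falls inside
 * the range; return 0 otherwise. Each segment is assumed to occupy its own
 * 64K block of endpoint IDs.
 */
static int viot_pci_endpoint_id(const struct viot_pci_range *r,
				uint16_t segment, uint16_t bdf, uint32_t *epid)
{
	if (segment < r->segment_start || segment > r->segment_end ||
	    bdf < r->bdf_start || bdf > r->bdf_end)
		return 0;
	*epid = r->endpoint_start +
		((uint32_t)(segment - r->segment_start) << 16) +
		(uint32_t)(bdf - r->bdf_start);
	return 1;
}
```

With segment_start == segment_end, the mapping reduces to endpoint_start + (bdf - bdf_start) for that single segment.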

I also updated the Linux and QEMU implementations on branch
virtio-iommu/devel in https://jpbrucker.net/git/linux/ and
https://jpbrucker.net/git/qemu/

Thanks again for helping with this

Jean


Re: [EXTERNAL] Re: Question regarding VIOT proposal

2020-12-04 Thread Al Stone
On 04 Dec 2020 19:09, Jean-Philippe Brucker wrote:
> Hi,
> 
> On Thu, Dec 03, 2020 at 04:01:27PM -0700, Al Stone wrote:
> > On 03 Dec 2020 22:21, Yinghan Yang wrote:
> > > Hi Jean,
> > > 
> > > I'm sorry for the delayed response. I think the new "PCI range node" 
> > > description makes sense. Could you please make this change in the 
> > > proposal?
> > > 
> > > Other than that, the proposal looks good to go.
> 
> Thanks for the feedback, I made the change
> 
> > > 
> > > Thanks,
> > > Yinghan
> > 
> > Jean, were you going to update your existing doc first?  If you
> > do that, then I can cut and paste the changes into the existing
> > ASWG proposal.  Or do you need to send out an RFC to the mailing
> > list first and finalize it there?
> 
> I updated the doc: https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
> You can incorporate it into the ASWG proposal.
> Changes since v8:
> * One typo (s/programing/programming/)
> * Modified the PCI Range node to include a segment range.
> 
> I also updated the Linux and QEMU implementations on branch
> virtio-iommu/devel in https://jpbrucker.net/git/linux/ and
> https://jpbrucker.net/git/qemu/
> 
> Thanks again for helping with this
> 
> Jean

Perfect.  Thanks.  I'll update the ASWG info right away.

-- 
ciao,
al
---
Al Stone
Software Engineer
Red Hat, Inc.
a...@redhat.com
---



RE: [EXTERNAL] Re: Question regarding VIOT proposal

2020-12-04 Thread Yinghan Yang via iommu
Thank you, Jean.

Yinghan

-----Original Message-----
From: Jean-Philippe Brucker  
Sent: Friday, December 4, 2020 10:09 AM
To: Al Stone 
Cc: Yinghan Yang ; 
iommu@lists.linux-foundation.org; Alexander Grest 
; eric.au...@redhat.com; j...@8bytes.org; 
kevin.t...@intel.com; lorenzo.pieral...@arm.com; m...@redhat.com; Boeuf, 
Sebastien 
Subject: Re: [EXTERNAL] Re: Question regarding VIOT proposal

Hi,

On Thu, Dec 03, 2020 at 04:01:27PM -0700, Al Stone wrote:
> On 03 Dec 2020 22:21, Yinghan Yang wrote:
> > Hi Jean,
> > 
> > I'm sorry for the delayed response. I think the new "PCI range node" 
> > description makes sense. Could you please make this change in the proposal?
> > 
> > Other than that, the proposal looks good to go.

Thanks for the feedback, I made the change

> > 
> > Thanks,
> > Yinghan
> 
> Jean, were you going to update your existing doc first?  If you do 
> that, then I can cut and paste the changes into the existing ASWG 
> proposal.  Or do you need to send out an RFC to the mailing list first 
> and finalize it there?

I updated the doc: https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
You can incorporate it into the ASWG proposal.
Changes since v8:
* One typo (s/programing/programming/)
* Modified the PCI Range node to include a segment range.

I also updated the Linux and QEMU implementations on branch virtio-iommu/devel
in https://jpbrucker.net/git/linux/ and
https://jpbrucker.net/git/qemu/

Thanks again for helping with this

Jean