Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Simon Richter

Hi,

On 10/26/23 02:11, Lukas Wunner wrote:


This has recently been brought up internally at Intel and nobody could
understand why there's a whitelist in the first place.  A long-time PCI
architect told me that Intel silicon validation has been testing P2PDMA
at least since the Lindenhurst days, i.e. since 2005.


My PCIe test box generates URE completions in the root complex when I 
try to address iGPU BARs from an FPGA, and texture fetches from the iGPU 
that use BAR addresses on the FPGA do not get forwarded (so I venture 
those fail with a URE as well).


CPU: Intel(R) Core(TM) i3-3225 CPU @ 3.30GHz (fam: 06, model: 3a, stepping: 09)

pci 0000:00:00.0: [8086:0150] type 00 class 0x060000
pci 0000:00:01.0: [8086:0151] type 01 class 0x060400
pci 0000:00:02.0: [8086:0162] type 00 class 0x030000
pci 0000:00:14.0: [8086:1e31] type 00 class 0x0c0330
pci 0000:00:16.0: [8086:1e3a] type 00 class 0x078000
pci 0000:00:1a.0: [8086:1e2d] type 00 class 0x0c0320
pci 0000:00:1b.0: [8086:1e20] type 00 class 0x040300
pci 0000:00:1c.0: [8086:1e10] type 01 class 0x060400
pci 0000:00:1c.4: [8086:1e18] type 01 class 0x060400
pci 0000:00:1d.0: [8086:1e26] type 00 class 0x0c0320
pci 0000:00:1f.0: [8086:1e4a] type 00 class 0x060100
pci 0000:00:1f.2: [8086:1e00] type 00 class 0x01018f
pci 0000:00:1f.3: [8086:1e22] type 00 class 0x0c0500
pci 0000:00:1f.5: [8086:1e08] type 00 class 0x010185
pci 0000:01:00.0: [1172:1337] type 00 class 0xff0000
pci 0000:03:00.0: [10ec:8168] type 00 class 0x020000

So there is at least one configuration that doesn't work. :P

   Simon



Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Logan Gunthorpe



On 2023-10-25 11:35, Bjorn Helgaas wrote:
> On Wed, Oct 25, 2023 at 07:11:26PM +0200, Lukas Wunner wrote:
>> On Wed, Oct 25, 2023 at 10:30:07AM -0600, Logan Gunthorpe wrote:
>>> In addition to the above, P2PDMA transfers are only allowed by the
>>> kernel for traffic that flows through certain host bridges that are
>>> known to work. For AMD, all modern CPUs are on this list, but for Intel,
>>> the list is very patchy.
>>
>> This has recently been brought up internally at Intel and nobody could
>> understand why there's a whitelist in the first place.  A long-time PCI
>> architect told me that Intel silicon validation has been testing P2PDMA
>> at least since the Lindenhurst days, i.e. since 2005.
>>
>> What's the reason for the whitelist?  Was there Intel hardware which
>> didn't support it or turned out to be broken?
>>
>> I imagine (but am not certain) that the feature might only be enabled
>> for server SKUs, is that the reason?
> 
> No, the reason is that the PCIe spec doesn't require routing of
> peer-to-peer transactions between Root Ports:
> https://git.kernel.org/linus/0f97da831026
> 
> I think there was a little discussion about adding a firmware
> interface to advertise this capability, but I guess nobody cared
> enough to advance it.

Yes, I remember someone advancing that in the PCI spec, but I don't know
that it got anywhere.

I definitely remember also testing Intel hardware several years ago
where P2PDMA "worked" but the performance was so awful there was no point.

I vaguely remember this not working on non-server machines in the past
(circa 2015). That's why we had to buy a Xeon. Though this was a long
time ago and my memory is fuzzy.

I'd love it if someone from Intel could give us a reasonable check on
the CPU that guarantees P2PDMA will work for everything that passes the
check (like AMD has done). But in the absence of Intel telling us this,
we can't easily make these assumptions.

Logan



Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Deucher, Alexander

> -----Original Message-----
> From: Logan Gunthorpe 
> Sent: Wednesday, October 25, 2023 12:30 PM
> To: Uwe Kleine-König ; Bjorn Helgaas
> 
> Cc: Simon Richter ; 1015...@bugs.debian.org; linux-
> p...@vger.kernel.org; Deucher, Alexander ;
> Krzysztof Wilczyński ; Emanuele Rocca 
> Subject: Re: Enabling PCI_P2PDMA for distro kernels?
>
>
>
> On 2023-10-25 00:19, Uwe Kleine-König wrote:
> > Hello,
> >
> > in https://bugs.debian.org/1015871 the Debian kernel team got a
> > request to enable PCI_P2PDMA. Given the description of the feature and
> > also the "If unsure, say N." I wonder if you consider it safe to
> > enable this option.
>
> I don't know. Not being a security expert, I'd say the attack surface
> exposed is fairly minimal. Most of what goes on is internal to the
> kernel. So the main risk is the same rough risk that goes with enabling
> any feature: there may be bugs.
>
> My opinion is that 'No' is recommended because the feature is still very
> nascent and advanced. Right now it enables two user visible niche
> features: p2p transfers in nvme-target between an NVMe device and an
> RDMA NIC and transferring buffers between two NVMe devices through the
> CMB via O_DIRECT. Both uses require an NVMe device with CMB memory,
> which is rare.
>
> Anyone using this option to do GPU P2PDMA transfers is certainly using
> out-of-tree (and likely proprietary) modules, as the upstream kernel
> does not yet appear to support anything like that. Thus it's not clear
> how such code is using the P2PDMA subsystem or what implications there
> may be.
>

AMD GPUs can use P2PDMA for resource sharing between GPUs using upstream
kernels and Mesa, and also ROCm.  E.g., if you have multiple GPUs in a
system you can render on one and display on the other without an extra
trip through system memory.  This is common on laptops and desktops with
multiple GPUs.  Enabling P2PDMA provides a nice perf boost on these
systems due to reduced copies.  Or with ROCm, GPUs can directly access
local memory on other GPUs.  It's also possible between at least AMD
GPUs and some RDMA NICs.  There are also a lot of use cases for P2PDMA
between devices and NVMe devices, but due to differences in memory
sharing APIs there is no simple path to move forward here.  I think it's
something of a chicken-and-egg problem for wider adoption.


> It's not commonly the case that using these features increases
> throughput, as CMB memory is usually much slower than system memory.
> Its use makes more sense in smaller/cheaper boutique systems where the
> system memory or bus bandwidth to the CPU is limited, typically with a
> PCIe switch involved.
>
> In addition to the above, P2PDMA transfers are only allowed by the
> kernel for traffic that flows through certain host bridges that are
> known to work. For AMD, all modern CPUs are on this list, but for
> Intel, the list is very patchy. When using a PCIe switch (also
> uncommon) this restriction is not present, since the traffic can avoid
> the host bridge.

The older pre-Zen AMD CPUs support it too, but only for writes.

Alex

>
> Thus, my contention is anyone experimenting with this stuff ought to be
> capable of installing a custom kernel with the feature enabled.
>
> Logan


Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Bjorn Helgaas
On Wed, Oct 25, 2023 at 07:11:26PM +0200, Lukas Wunner wrote:
> On Wed, Oct 25, 2023 at 10:30:07AM -0600, Logan Gunthorpe wrote:
> > In addition to the above, P2PDMA transfers are only allowed by the
> > kernel for traffic that flows through certain host bridges that are
> > known to work. For AMD, all modern CPUs are on this list, but for Intel,
> > the list is very patchy.
> 
> This has recently been brought up internally at Intel and nobody could
> understand why there's a whitelist in the first place.  A long-time PCI
> architect told me that Intel silicon validation has been testing P2PDMA
> at least since the Lindenhurst days, i.e. since 2005.
> 
> What's the reason for the whitelist?  Was there Intel hardware which
> didn't support it or turned out to be broken?
> 
> I imagine (but am not certain) that the feature might only be enabled
> for server SKUs, is that the reason?

No, the reason is that the PCIe spec doesn't require routing of
peer-to-peer transactions between Root Ports:
https://git.kernel.org/linus/0f97da831026

I think there was a little discussion about adding a firmware
interface to advertise this capability, but I guess nobody cared
enough to advance it.

Bjorn



Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Lukas Wunner
On Wed, Oct 25, 2023 at 10:30:07AM -0600, Logan Gunthorpe wrote:
> In addition to the above, P2PDMA transfers are only allowed by the
> kernel for traffic that flows through certain host bridges that are
> known to work. For AMD, all modern CPUs are on this list, but for Intel,
> the list is very patchy.

This has recently been brought up internally at Intel and nobody could
understand why there's a whitelist in the first place.  A long-time PCI
architect told me that Intel silicon validation has been testing P2PDMA
at least since the Lindenhurst days, i.e. since 2005.

What's the reason for the whitelist?  Was there Intel hardware which
didn't support it or turned out to be broken?

I imagine (but am not certain) that the feature might only be enabled
for server SKUs, is that the reason?

Thanks,

Lukas



Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Logan Gunthorpe



On 2023-10-25 00:19, Uwe Kleine-König wrote:
> Hello,
> 
> in https://bugs.debian.org/1015871 the Debian kernel team got a request
> to enable PCI_P2PDMA. Given the description of the feature and also the
> "If unsure, say N." I wonder if you consider it safe to enable this
> option.

I don't know. Not being a security expert, I'd say the attack surface
exposed is fairly minimal. Most of what goes on is internal to the
kernel. So the main risk is the same rough risk that goes with enabling
any feature: there may be bugs.

My opinion is that 'No' is recommended because the feature is still very
nascent and advanced. Right now it enables two user visible niche
features: p2p transfers in nvme-target between an NVMe device and an
RDMA NIC and transferring buffers between two NVMe devices through the
CMB via O_DIRECT. Both uses require an NVMe device with CMB memory,
which is rare.

Anyone using this option to do GPU P2PDMA transfers is certainly using
out-of-tree (and likely proprietary) modules, as the upstream kernel
does not yet appear to support anything like that. Thus it's not clear
how such code is using the P2PDMA subsystem or what implications there
may be.

It's not commonly the case that using these features increases
throughput, as CMB memory is usually much slower than system memory.
Its use makes more sense in smaller/cheaper boutique systems where the
system memory or bus bandwidth to the CPU is limited, typically with a
PCIe switch involved.

In addition to the above, P2PDMA transfers are only allowed by the
kernel for traffic that flows through certain host bridges that are
known to work. For AMD, all modern CPUs are on this list, but for Intel,
the list is very patchy. When using a PCIe switch (also uncommon) this
restriction is not present, since the traffic can avoid the host bridge.

Thus, my contention is anyone experimenting with this stuff ought to be
capable of installing a custom kernel with the feature enabled.

Logan



Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-24 Thread Uwe Kleine-König
Hello,

in https://bugs.debian.org/1015871 the Debian kernel team got a request
to enable PCI_P2PDMA. Given the description of the feature, and also the
"If unsure, say N.", I wonder whether you consider it safe to enable
this option.

Assuming this option isn't completely free of security concerns, a
kernel option to explicitly enable it would be nice for a distro kernel.
That way the option could be enabled (but dormant, and so safe) and
users who want to benefit from it despite the concerns could still do
so.

Some side information:

 - According to Emanuele Rocca this option is enabled in Fedora Server
   38 and openSUSE Tumbleweed

 - I already asked in #linux-pci for feedback, Krzysztof Wilczyński
   recommended there to bring this topic forward via mail and pointed
   out a (paywalled) ACM paper about this topic
   (https://dl.acm.org/doi/10.1145/3409963.3410491).

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |

