On Mon, Jan 08, 2018 at 07:34:34PM +0100, Christoph Hellwig wrote:
> > > > And on that topic, does this scheme work with HFI?
> > >
> > > No, and I guess we need an opt-out. HFI generally seems to be
> > > extremely weird.
> >
> > This series needs some kind of fix so HFI, QIB, rxe, etc don't get
> > broken, and it shouldn't be 'fixed' at the RDMA level.
>
> I don't think rxe is a problem as it won't show up as a PCI device.

Right, today's restrictions save us..

> HFI and QIB do show as PCI devices, and could be used for P2P transfers
> from the PCI point of view. It's just that they have a layer of
> software indirection between their hardware and what is exposed at
> the RDMA layer.
>
> So I very much disagree about where to place that workaround - the
> RDMA code is exactly the right place.

But why? RDMA is using core code to do this. It uses dma_ops in struct
device and it uses normal dma_map SG. How is it RDMA's problem that
some PCI drivers provide strange DMA ops?

Admittedly they are RDMA drivers, but it is a core mechanism they
(ab)use these days..

> > It could, if we had a DMA op for p2p then the drivers that provide
> > their own ops can implement it appropriately or not at all.
> >
> > Eg the correct implementation for rxe to support p2p memory is
> > probably somewhat straightforward.
>
> But P2P is _not_ a factor of the dma_ops implementation at all,
> it is something that happens behind the dma_map implementation.

Only as long as the !ACS and switch limitations are present.

Those limitations are fine to get things started, but there is going
to be a push to improve the system to remove them.

> > Very long term the IOMMUs under the ops will need to care about this,
> > so the wrapper is not an optimal place to put it - but I wouldn't
> > object if it gets it out of RDMA :)
>
> Unless you have an IOMMU on your PCIe switch and not before/inside
> the root complex that is not correct.

I understand the proposed patches restrict things to require a switch
and do not transit the IOMMU.

But *very long term* P2P will need to work with paths that transit the
system IOMMU and root complex.

This already exists as out-of-tree functionality, deployed in
production for years and years, that does P2P through the root
complex with the IOMMU turned off.

Jason
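
To make the "DMA op for p2p" idea above a bit more concrete, here is a
rough userspace sketch, not kernel code. Every name in it (sketch_dev,
sketch_dma_ops, map_sg_p2p, sketch_map_sg) is made up for illustration
and does not exist in the tree. The assumption is that a driver with
its own ops either fills in a P2P hook or leaves it NULL, and that a
NULL hook is how it opts out:

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dev;

/* Hypothetical per-device ops table with an optional P2P hook. */
struct sketch_dma_ops {
	/* Ordinary mapping path; every device must provide it. */
	int (*map_sg)(struct sketch_dev *dev, int nents);
	/* Optional P2P mapping path; NULL means "this device opts out". */
	int (*map_sg_p2p)(struct sketch_dev *dev, int nents);
};

struct sketch_dev {
	const char *name;
	const struct sketch_dma_ops *ops;
};

/* Stand-in for a device that maps everything one-to-one. */
static int direct_map_sg(struct sketch_dev *dev, int nents)
{
	(void)dev;
	return nents;	/* pretend every entry was mapped */
}

static int direct_map_sg_p2p(struct sketch_dev *dev, int nents)
{
	(void)dev;
	return nents;	/* pretend P2P entries map just as easily */
}

/* Stand-in for a device with software indirection (think HFI/QIB style). */
static int indirect_map_sg(struct sketch_dev *dev, int nents)
{
	(void)dev;
	return nents;
}

static const struct sketch_dma_ops direct_ops = {
	.map_sg		= direct_map_sg,
	.map_sg_p2p	= direct_map_sg_p2p,
};

/* No map_sg_p2p here: this device cannot take P2P memory. */
static const struct sketch_dma_ops indirect_ops = {
	.map_sg		= indirect_map_sg,
	.map_sg_p2p	= NULL,
};

/*
 * Core-side wrapper. Returns the number of mapped entries, or -1 when
 * the caller asks for P2P and the device's ops do not implement it.
 */
int sketch_map_sg(struct sketch_dev *dev, int nents, bool is_p2p)
{
	assert(dev && dev->ops && dev->ops->map_sg);

	if (!is_p2p)
		return dev->ops->map_sg(dev, nents);
	if (!dev->ops->map_sg_p2p)
		return -1;	/* opt-out: refuse rather than map wrongly */
	return dev->ops->map_sg_p2p(dev, nents);
}
```

In this sketch a caller that passes is_p2p for a device using
indirect_ops gets -1 back and can fall back to ordinary memory. The
decision lives with whoever supplies the ops, not in the RDMA layer.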