RE: Enabling peer to peer device transactions for PCIe devices
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
> Sent: Friday, January 06, 2017 1:26 PM
> To: Jerome Glisse
> Cc: Sagalovitch, Serguei; Jerome Glisse; Deucher, Alexander;
> 'linux-ker...@vger.kernel.org'; 'linux-r...@vger.kernel.org';
> 'linux-nvd...@lists.01.org'; 'linux-me...@vger.kernel.org';
> 'dri-de...@lists.freedesktop.org'; 'linux-...@vger.kernel.org';
> Kuehling, Felix; Blinzer, Paul; Koenig, Christian; Suthikulpanit, Suravee;
> Sander, Ben; h...@infradead.org; Zhou, David(ChunMing); Yu, Qiang
> Subject: Re: Enabling peer to peer device transactions for PCIe devices
>
> On Fri, Jan 06, 2017 at 12:37:22PM -0500, Jerome Glisse wrote:
> > On Fri, Jan 06, 2017 at 11:56:30AM -0500, Serguei Sagalovitch wrote:
> > > On 2017-01-05 08:58 PM, Jerome Glisse wrote:
> > > > On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote:
> > > > > On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote:
> > > > > > > I still don't understand what you are driving at - you've said
> > > > > > > in both cases a user VMA exists.
> > > > > > In the former case no, there is no VMA directly, but if you want
> > > > > > one then a device can provide one. But such a VMA is useless as
> > > > > > CPU access is not expected.
> > > > > I disagree that it is useless; the VMA is going to be necessary to
> > > > > support upcoming things like CAPI, and you need it to support
> > > > > O_DIRECT from the filesystem, DPDK, etc. This is why I am opposed
> > > > > to any model that is not VMA based for setting up RDMA - that is
> > > > > short-sighted and does not seem to reflect where the industry is
> > > > > going.
> > > > >
> > > > > So focus on having a VMA backed by actual physical memory that
> > > > > covers your GPU objects, and ask how we wire up the '__user *' to
> > > > > the DMA API in the best way so the DMA API still has enough
> > > > > information to set up IOMMUs and whatnot.
> > > > I am talking about two different things: existing hardware and APIs
> > > > where you _do not_ have a vma and you do not need one. This is just
> > > > existing stuff.
> > > I do not understand why you assume that the existing API doesn't need
> > > one. I would say that a lot of __existing__ user-level APIs and their
> > > support in the kernel (especially outside of the graphics domain)
> > > assume that we have a vma and deal with __user * pointers.
>
> +1
>
> > Well, I am thinking of GPUDirect here. Some GPUDirect use cases do not
> > have a vma (struct vm_area_struct) associated with them; they directly
> > apply to GPU objects that aren't exposed to the CPU. Yes, some use cases
> > have a vma for shared buffers.
>
> Let's stop talking about GPUDirect. Today we can't even make a VMA
> pointing at a PCI BAR work properly in the kernel - let's start there,
> please. People can argue over other options once that is done.
>
> > For HMM the plan is to restrict to ODP and either replace ODP with HMM
> > or change ODP to not use get_user_pages_remote() but directly fetch
> > information from the CPU page table. Everything else stays as it is. I
> > posted a patchset to replace ODP with HMM in the past.
>
> Make a generic API for all of this and you'd have my vote..
>
> IMHO, you must support basic pinning semantics - that is necessary to
> support generic short-lived DMA (eg filesystem, etc). That hardware can
> clearly do that if it can support ODP.

We would definitely like to have support for hardware that can't handle
page faults gracefully.
Alex
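For context on "basic pinning semantics" versus ODP in the exchange above: a
conventional memory registration pins its pages for the lifetime of the MR,
while ODP is opted into per-registration with an access flag. A minimal
user-space sketch with libibverbs (illustrative only: error handling is
elided and the first device in the list is assumed to be usable):

#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
	struct ibv_device **devs = ibv_get_device_list(NULL); /* assume >= 1 device */
	struct ibv_context *ctx = ibv_open_device(devs[0]);
	struct ibv_pd *pd = ibv_alloc_pd(ctx);

	size_t len = 1 << 20;
	void *buf = malloc(len);

	/* Non-ODP MR: the kernel pins the pages for the lifetime of the MR,
	 * so the HCA never has to handle a page fault. This is the "basic
	 * pinning semantics" that short-lived DMA users (filesystems,
	 * O_DIRECT) rely on. */
	struct ibv_mr *pinned = ibv_reg_mr(pd, buf, len,
					   IBV_ACCESS_LOCAL_WRITE |
					   IBV_ACCESS_REMOTE_READ);

	/* ODP MR: pages are not pinned; the HCA faults them in on demand,
	 * tracking the CPU page table. Requires hardware/driver support. */
	struct ibv_mr *odp = ibv_reg_mr(pd, buf, len,
					IBV_ACCESS_LOCAL_WRITE |
					IBV_ACCESS_REMOTE_READ |
					IBV_ACCESS_ON_DEMAND);
	if (!odp)
		fprintf(stderr, "ODP not supported on this device\n");

	if (odp)
		ibv_dereg_mr(odp);
	ibv_dereg_mr(pinned);
	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	free(buf);
	return 0;
}

Jason's point is that hardware capable of ODP can trivially offer the pinned
mode as well, while the reverse is not true - which is why Alex asks for the
non-faulting case to remain supported.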
RE: Enabling peer to peer device transactions for PCIe devices
> -----Original Message-----
> From: Haggai Eran [mailto:hagg...@mellanox.com]
> Sent: Wednesday, November 30, 2016 5:46 AM
> To: Jason Gunthorpe
> Cc: linux-ker...@vger.kernel.org; linux-r...@vger.kernel.org;
> linux-nvd...@ml01.01.org; Koenig, Christian; Suthikulpanit, Suravee;
> Bridgman, John; Deucher, Alexander; linux-me...@vger.kernel.org;
> dan.j.willi...@intel.com; log...@deltatee.com;
> dri-de...@lists.freedesktop.org; Max Gurtovoy; linux-...@vger.kernel.org;
> Sagalovitch, Serguei; Blinzer, Paul; Kuehling, Felix; Sander, Ben
> Subject: Re: Enabling peer to peer device transactions for PCIe devices
>
> On 11/28/2016 9:02 PM, Jason Gunthorpe wrote:
> > On Mon, Nov 28, 2016 at 06:19:40PM +0000, Haggai Eran wrote:
> >>>> GPU memory. We create a non-ODP MR pointing to VRAM but rely on
> >>>> user-space and the GPU not to migrate it. If they do, the MR gets
> >>>> destroyed immediately.
> >>> That sounds horrible. How can that possibly work? What if the MR is
> >>> being used when the GPU decides to migrate?
> >> Naturally this doesn't support migration. The GPU is expected to pin
> >> these pages as long as the MR lives. The MR invalidation is done only
> >> as a last resort to keep system correctness.
> >
> > That just forces applications to handle horrible unexpected
> > failures. If this sort of thing is needed for correctness then OOM
> > kill the offending process, don't corrupt its operation.
> Yes, that sounds fine. Can we simply kill the process from the GPU driver?
> Or do we need to extend the OOM killer to manage GPU pages?

Christian sent out an RFC patch a while back that extended the OOM killer
to cover memory allocated for the GPU:
https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

Alex

> >
> >> I think it is similar to how non-ODP MRs rely on user-space today to
> >> keep them correct. If you do something like madvise(MADV_DONTNEED) on
> >> a non-ODP MR's pages, you can still get yourself into a data corruption
> >> situation (the HCA sees one page and the process sees another for the
> >> same virtual address). The pinning that we use only guarantees that
> >> the HCA's page won't be reused.
> >
> > That is not really data corruption - the data still goes where it was
> > originally destined. That is an application violating the
> > requirements of a MR.
> I guess it is a matter of terminology. If you compare it to the ODP case
> or the CPU case then you usually expect a single virtual address to map
> to a single physical page. Violating this causes some of your writes to
> be dropped, which is data corruption in my book, even if the application
> caused it.
>
> > An application cannot munmap/mremap a VMA
> > while a non-ODP MR points to it and then keep using the MR.
> Right. And it is perfectly fine to have some similar requirements from
> the application when doing peer to peer with a non-ODP MR.
>
> > That is totally different from a GPU driver wanting to mess with
> > translation to physical pages.
> >
> >>> From what I understand we are not really talking about kernel p2p;
> >>> everything proposed so far is being mediated by a userspace VMA, so
> >>> I'd focus on making that work.
> >
> >> Fair enough, although we will need both eventually, and I hope the
> >> infrastructure can be shared to some degree.
> >
> > What use case do you see for in kernel?
> Two cases I can think of are RDMA access to an NVMe device's controller
> memory buffer, and O_DIRECT operations that access GPU memory.
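To make the madvise(MADV_DONTNEED) hazard described above concrete, here is
a hypothetical user-space sequence (protection-domain setup elided; the
function name is made up for illustration):

#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <infiniband/verbs.h>

/* 'pd' comes from the usual ibv_open_device()/ibv_alloc_pd() setup. */
static void dontneed_hazard(struct ibv_pd *pd)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	memset(buf, 0xaa, len);

	/* Non-ODP MR: the page backing 'buf' is pinned and its physical
	 * address is programmed into the HCA. */
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);

	/* Drop the mapping. The pinned page is not freed (the MR still
	 * holds a reference), but the process's page table entry is
	 * cleared. */
	madvise(buf, len, MADV_DONTNEED);

	/* The next CPU touch faults in a brand-new zero page, while the
	 * HCA keeps DMAing to/from the old pinned page: one virtual
	 * address, two physical pages, and CPU and HCA writes diverge
	 * from here on. */
	memset(buf, 0xbb, len);

	ibv_dereg_mr(mr);
	munmap(buf, len);
}

Whether one calls this "data corruption" (Haggai) or "an application
violating the requirements of a MR" (Jason), the mechanics are the same.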
> Also, HMM's migration between two GPUs could use peer to peer in the
> kernel, although that is intended to be handled by the GPU driver, if I
> understand correctly.
>
> > Presumably in-kernel could use a vmap or something and the same basic
> > flow?
> I think we can achieve the kernel's needs with ZONE_DEVICE and DMA-API
> support for peer to peer. I'm not sure we need vmap. We need a way to
> have a scatterlist of MMIO pfns, and ZONE_DEVICE allows that.
>
> Haggai
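A rough kernel-side sketch of the "scatterlist of MMIO pfns" idea, assuming
the peer device's BAR has already been hotplugged as ZONE_DEVICE memory
(e.g. via devm_memremap_pages()) so that pfn_to_page() is valid for it; the
function and parameter names are made up for illustration and error
handling is elided:

#include <linux/mm.h>
#include <linux/errno.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

/* Build and map a scatterlist over ZONE_DEVICE pages covering a peer BAR.
 * 'base_pfn'/'npages' describe the region, 'dma_dev' is the device that
 * will perform the DMA, and 'sgl' must have room for 'npages' entries. */
static int map_peer_bar(struct device *dma_dev, unsigned long base_pfn,
			unsigned int npages, struct scatterlist *sgl)
{
	struct scatterlist *sg;
	unsigned int i;

	sg_init_table(sgl, npages);
	for_each_sg(sgl, sg, npages, i)
		sg_set_page(sg, pfn_to_page(base_pfn + i), PAGE_SIZE, 0);

	/* Whether dma_map_sg() can produce a usable bus address for MMIO
	 * pages depends on the architecture and IOMMU - closing exactly
	 * that gap in the DMA API is what this thread is about. */
	if (dma_map_sg(dma_dev, sgl, npages, DMA_BIDIRECTIONAL) == 0)
		return -EIO;

	return 0;
}

The point of ZONE_DEVICE here is precisely Haggai's: once BAR space has
struct pages, the kernel's existing scatterlist-based DMA paths can carry
it without needing a vmap.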