On Jun 5, 2013, at 6:39 AM, Haggai Eran <hagg...@mellanox.com> wrote:
> Perhaps I'm missing something, but I believe ODP deals with the first
> two problems in the list (slide 8), even if it doesn't solve them
> completely.

Unfortunately, it does not.  If we could register(0 ... 2^64) and never
have to worry about registered memory, that might be cool (depending on
how that actually works) -- more below.

See this blog post that describes the freed registered memory issue:

    http://blogs.cisco.com/performance/registered-memory-rma-rdma-and-mpi-implementations/

and consider the following valid user code:

    a = malloc(x);    // a gets (va=0x100, pa=0x12345) back from malloc
    MPI_Send(a, ...); // MPI registers 0x100 for len=x, and saves
                      // (0x100,x) in reg cache
    free(a);
    a = malloc(x);    // a gets (va=0x100, pa=0x98765) back from malloc
    MPI_Send(a, ...); // MPI sees a=0x100 and thinks that it is
                      // already registered
                      // ...kaboom

In short, MPI has to intercept free/sbrk/whatever so that it can update
its registration cache.

> In the future we want to implement an implicit memory region covering
> the entire process address space, thus eliminating the need for memory
> registration almost completely (you might still want memory
> registration, or memory windows, in order to control permissions of
> remote operations).

This would be great, as long as it's fast, transparent, and has no
subtle implementation effects (like causing additional RNR NAKs for
pages that are still in memory, which, according to your descriptions,
it sounds like it won't).

> We can also allow fork to work with our implementation. Copy-on-write
> will work with ODP regions by invalidating the HCA's page tables before
> modifying the pages to be read-only. A page fault from the HCA can then
> refill the pages, or even break COW in case of a write.

That would be cool, too.  fork() has been a continuing problem --
solving that problem would be wonderful.

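To make the stale-registration scenario from the malloc/free example
above concrete, here is a minimal toy model in C.  It is only a sketch
under stated assumptions: it does not call verbs or MPI, the "pa" field
merely stands in for whatever translation the HCA captured at
registration time, and every name in it (reg_entry, send_via_cache,
cache_invalidate, replay) is hypothetical rather than taken from any
real MPI implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy registration-cache entry.  All names here are hypothetical. */
typedef struct {
    uintptr_t va;     /* virtual address that was "registered" */
    size_t    len;    /* length of the registered range */
    uintptr_t pa;     /* translation captured at registration time */
    int       valid;  /* nonzero while the entry is usable */
} reg_entry;

#define CACHE_SLOTS 8
reg_entry reg_cache[CACHE_SLOTS];

/* Mark every slot as unused. */
void cache_clear(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        reg_cache[i].valid = 0;
}

/* Return the cached entry that fully covers [va, va+len), or NULL. */
reg_entry *cache_lookup(uintptr_t va, size_t len)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        reg_entry *e = &reg_cache[i];
        if (e->valid && e->va <= va && va + len <= e->va + e->len)
            return e;
    }
    return NULL;
}

/* Model of a send that goes through the cache.  current_pa is the
 * mapping the buffer really has right now.  The return value is the
 * translation the (pretend) HCA would end up using. */
uintptr_t send_via_cache(uintptr_t va, size_t len, uintptr_t current_pa)
{
    assert(len > 0);
    reg_entry *e = cache_lookup(va, len);
    if (e)
        return e->pa;  /* cache hit: no re-registration, maybe stale */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!reg_cache[i].valid) {
            reg_cache[i] = (reg_entry){ va, len, current_pa, 1 };
            break;
        }
    }
    return current_pa; /* freshly "registered" (cached if a slot was free) */
}

/* What a free()/sbrk() intercept, or a ummunotify-style notification,
 * would have to trigger: drop every entry overlapping the range. */
void cache_invalidate(uintptr_t va, size_t len)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        reg_entry *e = &reg_cache[i];
        if (e->valid && e->va < va + len && va < e->va + e->len)
            e->valid = 0;
    }
}

/* Replay the malloc / send / free / malloc / send sequence and return
 * the translation used by the second send. */
uintptr_t replay(int intercept_free)
{
    cache_clear();
    send_via_cache(0x100, 64, 0x12345);      /* first malloc + send */
    if (intercept_free)
        cache_invalidate(0x100, 64);         /* MPI notices free(a) */
    return send_via_cache(0x100, 64, 0x98765); /* second malloc + send */
}
```

In this model, replay(0) returns the old 0x12345 translation for the
second send (the "kaboom" case), while replay(1) returns 0x98765
because the entry was dropped when the free was observed.  The sketch
only illustrates why some notification on free/sbrk is needed as long
as registrations are per-range; it says nothing about how ODP itself
behaves.
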
If this ODP stuff becomes a new verb, it would be good:

- if these fork-fixing / register-infinite capabilities can be queried
  at run time (maybe on ibv_device_cap_flags?) so that ULPs can know to
  use this functionality
- if driver owners can get a heads up so that they can know to
  implement it

>> Why don't we have something like ummunotify yet?

> I think that the problem we are trying to solve is better handled inside
> the kernel. If you are going to change the HCA's memory mappings, you'd
> have to go through the kernel anyway.

If/when you allow registering all memory, then I think you're right --
the MPI-must-intercept-free/sbrk-whatever issue may go away (that's why
I started this thread asking about register(0 .. 2^64)).  But without
that, unless I'm missing something, I don't think it solves the
MPI-must-catch-free-sbrk-etc. issues...?  And therefore, having some
kind of ummunotify-like functionality as a verb would be a Very Good
Thing.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/