On Jun 5, 2013, at 6:39 AM, Haggai Eran <hagg...@mellanox.com> wrote:

> Perhaps I'm missing something, but I believe ODP deals with the first
> two problems in the list (slide 8), even if it doesn't solve them
> completely.

Unfortunately, it does not.  If we could register(0 ... 2^64) and never have to 
worry about registered memory, that might be cool (depending on how that 
actually works) -- more below.

See this blog post that describes the freed registered memory issue:

    
http://blogs.cisco.com/performance/registered-memory-rma-rdma-and-mpi-implementations/

and consider the following valid user code:

a = malloc(x);    // a gets (va=0x100, pa=0x12345) back from malloc
MPI_Send(a, ...); // MPI registers 0x100 for len=x, and saves (0x100,x) in reg 
cache
free(a);
a = malloc(x);    // a gets (va=0x100, pa=0x98765) back from malloc
MPI_Send(a, ...); // MPI sees a=0x100 and things that it is already registered
// ...kaboom

In short, MPI has to intercept free/sbrk/whatever so that it can update its 
registration cache.

> In the future we want to implement an implicit memory region covering
> the entire process address space, thus eliminating the need for memory
> registration almost completely (you might still want memory
> registration, or memory windows, in order to control permissions of
> remote operations).

This would be great, as long as it's fast, transparent, and has no subtle 
implementation effects (like causing additional RNR NAKs for pages that are 
still in memory, which, according to your descriptions, it sounds like it 
won't).

> We can also allow fork to work with our implementation. Copy-on-write
> will work with ODP regions by invalidating the HCA's page tables before
> modifying the pages to be read-only. A page fault from the HCA can then
> refill the pages, or even break COW in case of a write.

That would be cool, too.  fork() has been a continuing problem -- solving that 
problem would be wonderful.

If this ODP stuff becomes a new verb, it would be good:

- if these fork-fixing / register-infinite capabilities can be queried at run 
time (maybe on ibv_device_cap_flags?) so that ULPs can know to use this 
functionality
- if driver owners can get a heads up so that they can know to implement it

>> Why don't we have something like ummunotify yet?
> I think that the problem we are trying to solve is better handled inside
> the kernel. If you are going to change the HCA's memory mappings, you'd
> have to go through the kernel anyway.

If/when you allow registering all memory, then I think you're right -- the 
MPI-must-intercept-free/sbrk-whatever issue may go away (that's why I started 
this thread asking about register(0 .. 2^64)).  But without that, unless I'm 
missing something, I don't think it solves the MPI-must-catch-free-sbrk-etc. 
issues...?  And therefore, having some kind of ummunotify-like functionality as 
a verb would be a Very Good Thing.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to