On Fri, Jun 07, 2013 at 10:59:43PM +0000, Jeff Squyres (jsquyres) wrote:

> > I don't think this covers other memory regions, like those added via mmap, 
> > right?
>  
> We talked about this at the MPI Forum this week; it doesn't seem
> like ODP fixes any MPI problems.

ODP without a 'register all address space' semantic changes the nature
of the problem, but fixes only one part of it.

You still need to cache registrations, and all the tuning parameters
(how much do I cache, how long do I hold it for, etc.) still apply.
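To make those two knobs concrete, here is a minimal sketch of a cache trim pass. It assumes a hypothetical flat array of entries, a caller-supplied monotonic clock, and two hypothetical parameters, `max_bytes` ("how much do I cache") and `max_idle` ("how long do I hold it for"). None of these names come from any real library. The sketch only marks entries invalid; a real cache would deregister at that point.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical cache entry: one cached registration of [start, start+len). */
struct reg_entry {
    uintptr_t start;
    size_t len;
    unsigned refcnt;     /* in-flight users; never evict while > 0 */
    uint64_t last_used;  /* caller-supplied monotonic clock value */
    bool valid;
};

/* Hypothetical tuning knobs. */
struct cache_policy {
    size_t max_bytes;    /* how much do I cache */
    uint64_t max_idle;   /* how long do I hold an idle entry */
};

/* Drop idle entries older than max_idle, then evict least recently used
 * idle entries until the cached total fits in max_bytes.
 * Returns the number of bytes still cached. */
size_t cache_trim(struct reg_entry *e, size_t n,
                  const struct cache_policy *p, uint64_t now)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (!e[i].valid)
            continue;
        if (e[i].refcnt == 0 && now - e[i].last_used > p->max_idle)
            e[i].valid = false;          /* held too long: drop it */
        else
            total += e[i].len;
    }

    while (total > p->max_bytes) {
        size_t victim = n;               /* n means "none found" */
        for (size_t i = 0; i < n; i++) {
            if (e[i].valid && e[i].refcnt == 0 &&
                (victim == n || e[i].last_used < e[victim].last_used))
                victim = i;
        }
        if (victim == n)
            break;                       /* everything left is in use */
        e[victim].valid = false;
        total -= e[victim].len;
    }
    return total;
}
```

ODP does not remove the need for a pass like this; it only removes the invalidation-driven purges.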

What goes away (is fixed) is the need for intercepts, and the need to
purge address space from the cache because the backing registration
has become non-coherent/invalid. With ODP, registrations are always
coherent/valid.

This cache, and the associated optimization problem, can never go
away. With a 'register all of memory' semantic the cache can move into
the kernel, but the performance implications and overheads are still
present, just migrated.

> 2. MPI still has to intercept (at least) munmap().

Curious to know what for? 

If you want to prune registrations (i.e. to reduce memory footprint),
this can be done lazily at any time (e.g. in a background thread).
Read /proc/self/maps and purge all the registrations pointing to
unmapped memory, similar to garbage collection.

There is no harm in keeping a registration for a long period, except
for the memory footprint in the kernel.

> 3. Having mmap/malloc/etc. return "new" memory that may already be
> registered because of a prior memory registration and subsequent
> munmap/free/etc. is just plain weird.  Worse, if we re-register it,
> ref counts could go such that the actual registration will never
> actually expire until the process dies (which could lead to
> processes with abnormally large memory footprints, because they
> never actually let go of memory because it's still registered).

This is entirely up to the registration cache implementation to sort
out; there are lots of performance/memory trade-offs.

It is only weird when you think about it in terms of buffers. Memory
registration has to do with address space, not buffers.
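A minimal sketch of what "address space, not buffers" can mean for a cache. It assumes a hypothetical cache keyed purely by virtual address range. A lookup hits whenever the requested range lies inside a cached registration, regardless of which malloc()/mmap() call produced the current buffer. The user count tracks only in-flight work, so dropping to zero leaves the entry evictable under the cache's own policy instead of pinning it until process exit. Reusing a hit after munmap/remap is only safe because ODP keeps the registration coherent; without ODP this is exactly where intercepts are needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical cache entry keyed by virtual address range. */
struct as_entry {
    uintptr_t start, end;   /* registered range [start, end) */
    unsigned users;         /* in-flight operations using this registration */
    bool valid;
};

/* Find a registration covering [addr, addr+len) and take a user reference.
 * Returns NULL on a miss; the caller then registers and inserts a range. */
struct as_entry *cache_acquire(struct as_entry *e, size_t n,
                               uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].valid && e[i].start <= addr && addr + len <= e[i].end) {
            e[i].users++;
            return &e[i];
        }
    }
    return NULL;
}

/* Drop a user reference. Reaching zero does not deregister anything; it
 * only makes the entry eligible for eviction by the cache policy. */
void cache_release(struct as_entry *ent)
{
    if (ent->users > 0)
        ent->users--;
}
```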

> What MPI wants is:
> 
> 1. verbs for ummunotify-like functionality
> 2. non-blocking memory registration verbs; poll the cq to know when it has 
> completed

To me, ODP with an additional 'register all address space' semantic,
plus an asynchronous prefetch, does both of these for you.

1. ummunotify functionality and caching are now in the kernel, under
   ODP. RDMA access to an 'all of memory' registration always does the
   right thing.
2. An asynchronous prefetch (e.g. as a work request) triggers ODP and
   kernel actions to ready a subset of memory for RDMA, including
   all the work that memory registration does today (get_user_pages,
   COW break, etc.).
   
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html