On Jun 6, 2013, at 4:33 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I don't think this covers other memory regions, like those added via mmap, 
> right?


We talked about this at the MPI Forum this week; it doesn't seem like ODP fixes 
any MPI problems.

1. MPI still has to have a memory registration cache, because 
ibv_reg_mr(0...sbrk()) doesn't cover the stack or mmap'ed memory, etc.

2. MPI still has to intercept (at least) munmap().

3. Having mmap/malloc/etc. return "new" memory that is already registered -- 
because of a prior registration and a subsequent munmap/free/etc. -- is just 
plain weird.  Worse, if we re-register it, the reference counts can grow such 
that the underlying registration never expires until the process dies.  That 
could lead to processes with abnormally large memory footprints, because they 
never actually let go of memory that is still registered.

4. Even if MPI checks the value of sbrk() and re-registers (0...sbrk()) when 
sbrk() increases, this would seem to create a lot of work for the kernel -- 
which is both slow and synchronous.  Example:

a = malloc(5GB);
MPI_Send(a, 1, MPI_CHAR, ...); // MPI sends 1 byte

Then the MPI_Send of 1 byte will have to pay the cost of registering 5GB of new 
memory.

-----

Unless we've misunderstood this (and there's definitely a chance that we 
have!), it doesn't sound like ODP solves anything for MPI.  Especially since HPC 
applications almost never swap (in fact, swap is usually disabled in HPC 
environments).

What MPI wants is:

1. verbs for ummunotify-like functionality
2. non-blocking memory registration verbs; poll the CQ to know when the 
registration has completed

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
