On Jun 6, 2013, at 4:33 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> I don't think this covers other memory regions, like those added via
> mmap, right?

We talked about this at the MPI Forum this week; it doesn't seem like
ODP fixes any MPI problems.

1. MPI still has to have a memory registration cache, because
   ibv_reg_mr(0...sbrk()) doesn't cover the stack or mmap'ed memory,
   etc.

2. MPI still has to intercept (at least) munmap().

3. Having mmap/malloc/etc. return "new" memory that may already be
   registered because of a prior registration and subsequent
   munmap/free/etc. is just plain weird.  Worse, if we re-register it,
   the reference counts could climb such that the registration never
   actually expires until the process dies -- which could lead to
   processes with abnormally large memory footprints, because memory
   is never actually released while it is still registered.

4. Even if MPI checks the value of sbrk() and re-registers
   (0...sbrk()) whenever sbrk() increases, that would seem to create a
   lot of work for the kernel -- which is both slow and synchronous.
   Example:

     a = malloc(5GB);
     MPI_Send(a, 1, MPI_CHAR, ...);  /* MPI sends 1 byte */

   The MPI_Send of 1 byte then has to pay the cost of registering 5GB
   of new memory.

-----

Unless we understand this wrong (and there's definitely a chance that
we do!), it doesn't sound like ODP solves anything for MPI --
especially since HPC applications almost never swap (in fact, swap is
usually disabled in HPC environments).

What MPI wants is:

1. verbs for ummunotify-like functionality
2. non-blocking memory registration verbs; poll the CQ to know when
   the registration has completed

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/