Re: [ofa-general] New proposal for memory management

Jeff Squyres Thu, 30 Apr 2009 07:39:28 -0700

On Apr 29, 2009, at 4:45 PM, Barrett, Brian W wrote:

> If you think this sounds like a hassle, think about what it looks like from the point of view of the MPI implementer (or any other developer writing libraries which sit between user data and OFED, like GASNet).

If you don't care about what pain MPI implementors have to go through (and you probably don't ;-) ) -- consider that this is a major roadblock to most *anyone* who wants to write to user verbs.


<banging the same old drum>

I heard lots of variations of "Why isn't OFED more popular?" in Sonoma this year. This is at least one big reason why: no normal (non-superhuman) programmers can write verbs code (IMHO). MPIs *have* to support OpenFabrics -- HPC customers demand it. But non-HPC customers have a clear alternative: they'll just write sockets code. And the price/performance for using sockets over IB/iWARP may or may not be attractive depending on the customer's buying capacity. Hence -- they just buy gigE (10gigE, when the price drops low enough).

Doesn't OpenFabrics want to grow beyond MPI? Woody said that verbs is designed to support a billion different things -- outside of MPI and a few storage protocols (none of which are widely adopted), how much is OFED used?


</banging the same old drum>

> Jeff and I talked for a while today, and we're pretty sure that as long as the byte set by the kernel notifier is written before the pages are returned into the unallocated list, there isn't actually a race condition.

[snip]

> However, there's still then the problem with the notifier concept of how the kernel passes which pages were given back to the kernel. It has to pass a (potentially very large) amount of data back to the user, so the memory ownership issues with kernel/user space are interesting. It also has to somewhat atomically prepare the list and unset the notifier byte, which is also problematic.  But probably workable.
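
For readers who have not followed the thread, here is a rough sketch of how the notifier byte quoted above might look from the user-space side of a registration cache. It is only an illustration under stated assumptions: `notifier_byte`, `reg_entry`, `cache_insert`, `cache_flush`, and `cache_lookup` are hypothetical names, not part of verbs or any proposed kernel interface, and the sketch flushes the whole cache instead of fetching a list of freed pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* Hypothetical stand-in for the byte the kernel would set when pages are
 * given back. In the proposal it would live in memory shared with the
 * kernel; here it is an ordinary variable. */
static volatile uint8_t notifier_byte;

/* One cached registration covering [addr, addr + len). */
struct reg_entry {
    uintptr_t addr;
    size_t len;
    bool valid;
};

static struct reg_entry cache[CACHE_SLOTS];

/* Remember that a range has been registered. Returns false if full. */
static bool cache_insert(uintptr_t addr, size_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid) {
            cache[i] = (struct reg_entry){ addr, len, true };
            return true;
        }
    }
    return false;
}

/* Drop every cached registration. */
static void cache_flush(void)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        cache[i].valid = false;
}

/* Return true if [addr, addr + len) lies inside a cached registration. */
static bool cache_lookup(uintptr_t addr, size_t len)
{
    if (notifier_byte) {
        /* Simplification: clear the byte and distrust everything. Fetching
         * the freed-page list and clearing the byte atomically is the hard
         * part discussed above, and it is not modeled here. */
        notifier_byte = 0;
        cache_flush();
    }
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid || addr < cache[i].addr)
            continue;
        size_t offset = addr - cache[i].addr;
        if (offset <= cache[i].len && len <= cache[i].len - offset)
            return true;
    }
    return false;
}
```

A caller would consult `cache_lookup()` before every send and re-register on a miss. Even in this toy form, every lookup pays for the byte check, and a single set byte throws away the entire cache.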

I feel compelled to amend this: this notifier concept *may be workable*, but it's still quite complex for the reasons Brian cited. The goal here is to *reduce* complexity, especially for applications/ULPs using the verbs stack.

If we put the registration cache in the network stack, application/ULP complexity will be reduced significantly. My $0.02 is that using a notifier solution is still fairly complex and introduces a new set of problems.

FWIW: Putting the registration cache in the userspace verbs stack means that verbs will now have to do the horrid malloc/mmap/etc. intercept tricks that MPI implementations currently do. Take it from us -- this is not a business you want to be in. Such intercepts break tools like valgrind and other memory-checking debuggers. Even the best intercept hooks available today can still be subverted. Open MPI (and MX!) has to insert a pre-main hook to set up these intercepts, and then check later to ensure that no one else subverted our hooks. Yuck.


It's memory management.  And that belongs in the kernel.

--
Jeff Squyres
Cisco Systems
