I think this thread has gotten to the point where people are no
longer reading each post carefully and are therefore rehashing points
that have already been discussed. It has reached the end of its
usefulness.
It was suggested today that a teleconference to discuss these issues
might be much more useful (an hour-long teleconference can save a
week's worth of emails!). This will be a technical call to discuss
memory registration issues; it will not be an EWG call. I've set up a
WebEx call for next Monday at the "normal" time: noon US Eastern, 9am
US Pacific, 7pm Israel. The invite will be coming to the ewg and
general lists shortly.
*** PLEASE USE THE WEBEX URL TO JOIN THE TELECONFERENCE (vs. just
dialing in). When you log on, it will prompt you for a phone number
to call you back; yes, non-US phone numbers are supported.
I will make up a small number of slides that attempt to summarize all
the arguments (on both sides) so far. Hopefully, they can serve as a
starting point for discussion.
Thanks; see you next Monday.
On May 1, 2009, at 1:09 PM, Roland Dreier (rdreier) wrote:
> You mentioned that doing this stuff is a choice; the choice that
> MPIs/ULPs/applications therefore have is:
>
> - don't use registration caches/memory allocation hooking, and have
>   terrible performance
> - use registration caches/memory allocation hooking, and have good
>   performance
I think it's a bit of a stretch to suggest that all or even most
userspace RDMA applications have the same need for registration
caching as MPI. In fact, my feeling is that MPI is the exception: it
must deal with RDMA to arbitrary memory allocated by an application
outside of MPI's control. My most recent experience was with Cisco's
RAB library, and in that case we simply designed the library so that
all RDMA was done to memory allocated by the library -- so there was
no need for a registration cache, and in fact no need for
registration in any fast path. I suspect that the majority of code
written to use RDMA natively will be designed with similar properties.
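As a rough illustration of that design (a sketch only; the buf_pool
names below are made up for illustration, not RAB's actual API), the
library can allocate and register its memory once at initialization,
so no registration call ever appears in a fast path:

    #include <stdlib.h>
    #include <infiniband/verbs.h>

    /* All RDMA targets live in one region that the library allocates
     * and registers once up front, so the fast path never calls
     * ibv_reg_mr()/ibv_dereg_mr(). */
    struct buf_pool {
        void          *base;  /* library-owned allocation */
        size_t         size;
        struct ibv_mr *mr;    /* registration covering the whole pool */
    };

    static int buf_pool_init(struct buf_pool *p, struct ibv_pd *pd,
                             size_t size)
    {
        p->base = malloc(size);
        if (!p->base)
            return -1;
        p->size = size;
        p->mr = ibv_reg_mr(pd, p->base, size,
                           IBV_ACCESS_LOCAL_WRITE |
                           IBV_ACCESS_REMOTE_READ |
                           IBV_ACCESS_REMOTE_WRITE);
        if (!p->mr) {
            free(p->base);
            return -1;
        }
        return 0;
    }

Callers then borrow buffers from the pool (copying into them if
needed), trading an extra copy for never touching registration on the
critical path.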
So this proposal is very much an MPI-specific interface, which leads
to my next point. I have no doubt that the MPI community has a very
good idea of what memory registration interface would make MPI
implementations simpler and more robust. However, I don't think
there's quite as much expertise about the best way to implement such
an interface.
My initial reaction is that I don't want to extend the kernel ABI
with a set of new MPI-specific verbs if there's a way around it.
We've been told over and over that the registration cache is complex
and fragile code -- but moving complex and fragile code into the
kernel doesn't magically make it any simpler or more robust; it just
means that bugs now crash the whole system instead of just affecting
one process.
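For reference, the memory allocation hooking in question looks
roughly like this (a sketch using GNU ld's --wrap option, one of
several hooking schemes in use; cache_invalidate() is a hypothetical
stand-in for the MPI's cache bookkeeping):

    #include <stdlib.h>

    /* Hypothetical cache hook, stubbed so the sketch compiles: a real
     * implementation would drop any cached registrations that cover
     * the buffer being freed. */
    static void cache_invalidate(void *addr) { (void)addr; }

    /* Link with -Wl,--wrap=free so every call to free() lands here.
     * This catches free() but not, e.g., munmap() or an sbrk() shrink
     * -- exactly the kind of gap that makes these caches fragile. */
    void __real_free(void *ptr);

    void __wrap_free(void *ptr)
    {
        cache_invalidate(ptr);
        __real_free(ptr);
    }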
Now, of course MMU notifiers allow the kernel to know reliably when a
process's page tables change, which means that all the complicated
malloc hooking etc. is not needed. So that complexity is avoided in
the kernel. But suppose I give userspace the same MMU notifier
capability (e.g. I add a system call like "if any mappings in the
virtual address range X ... Y change, then write a 1 to virtual
address Z") -- then what do I gain from having the rest of the
registration caching in the kernel? (And avoiding the duplication of
caching code between multiple MPI implementations is not an answer --
it's quite feasible to put the caching code into libibverbs if that's
the best place for it.)
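To make that concrete, here is a rough sketch of how a userspace
cache could consume such a notification. mmu_notify_range() below
stands in for the hypothetical system call described above -- it is
invented for illustration and stubbed so the sketch compiles -- and
the single-entry cache is deliberately simplistic:

    #include <stddef.h>
    #include <stdint.h>
    #include <infiniband/verbs.h>

    /* Hypothetical syscall wrapper: ask the kernel to write 1 through
     * *flag if any mapping in [start, start + len) changes.  No such
     * interface exists today; stubbed here as a no-op. */
    static int mmu_notify_range(void *start, size_t len,
                                volatile uint32_t *flag)
    {
        (void)start; (void)len; (void)flag;
        return 0;
    }

    struct cache_entry {
        void              *addr;
        size_t             len;
        struct ibv_mr     *mr;
        volatile uint32_t  invalid;  /* kernel writes 1 on any remap */
    };

    /* Fast path: reuse the cached registration only if the kernel has
     * not flagged the range as changed since it was registered. */
    static struct ibv_mr *cached_reg_mr(struct cache_entry *e,
                                        struct ibv_pd *pd,
                                        void *addr, size_t len)
    {
        if (e->mr && !e->invalid &&
            (char *)addr >= (char *)e->addr &&
            (char *)addr + len <= (char *)e->addr + e->len)
            return e->mr;

        /* Miss or stale entry: re-register and re-arm the watch. */
        if (e->mr)
            ibv_dereg_mr(e->mr);
        e->addr    = addr;
        e->len     = len;
        e->invalid = 0;
        e->mr      = ibv_reg_mr(pd, addr, len, IBV_ACCESS_LOCAL_WRITE);
        if (e->mr)
            mmu_notify_range(addr, len, &e->invalid);
        return e->mr;
    }

Exactly this sort of code could live in libibverbs rather than the
kernel, which is the point of the question above.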
- R.
--
Jeff Squyres
Cisco Systems