> Roland and I chatted on the phone today; I think I now understand
> Roland's counter-proposal (I clearly didn't before).  Let me try to
> summarize:
>
> 1. Add a new verb for "set this userspace flag to 1 if mr X ever
>    becomes invalid"
> 2. Add a new verb for "no longer tell me if mr X ever becomes invalid"
>    (i.e., remove the effects of #1)
> 3. Add run-time query indicating whether #1 works
> 4. Add [optional] memory registration caching to libibverbs

Looking closer at how to actually implement this, I see that the MMU
notifiers (cf. <linux/mmu_notifier.h>) may be called with locks held,
so the kernel can't do a put_user() or the equivalent from the
notifier.  Therefore I think the interface we would expose to
userspace would be something more like mmap() on some special file to
get some kernel memory mapped into userspace, and then ioctl() to
register/unregister a "set this flag if address range X...Y is
affected."

To be honest I don't really love this idea -- the kernel still needs a
fairly complicated data structure to efficiently track the address
ranges being tracked, the size of the mmap() limits the number of
ranges being tracked based on a static limit set at initialization
time (or handling multiple maps gets still more complex), and there is
some careful thinking required to make sure there are no memory
ordering or cache aliasing issues.

So then I thought some about how to implement the full MR cache in the
kernel.  And that fairly quickly gets into some complex stuff as well
-- for example, since we can't take sleeping locks from MMU notifiers,
but we can't hold non-sleeping locks across MR register operations, we
need to drop our MR cache lock while registering things, which forces
us to deal with rolling back registrations if we miss the cache
initially but then find that another thread has already added a
registration to the cache while we were trying to register the same
memory.

Keeping the actual MR caching in userspace does seem to make things
simpler because the locking is much easier without having to worry
about sleeping vs. non-sleeping locks.  Also doing the cache in
userspace with my flag idea above has the nice property that the fast
path of hitting the cache on memory registration has no system call
and in fact testing the flag may even be a CPU cache hit if memory
registration is a hot enough path.
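
To make the flag idea a bit more concrete, here is a rough,
single-threaded sketch in C of what the userspace fast path might look
like.  Everything in it is hypothetical: the names (invalid_flags,
mr_cache_get, slow_register, simulate_invalidate), the fixed slot
count, and the plain array standing in for the mmap()ed kernel memory
are assumptions for illustration, not an existing libibverbs or kernel
interface.  It leaves out locking and the real register/unregister
calls entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static limit, playing the role of the mmap() size. */
#define MR_CACHE_SLOTS 64

/* ASSUMPTION: in the real scheme this array would be kernel memory
 * mmap()ed into the process, and the kernel side would set flag i to 1
 * when tracked range i is affected. Here it is an ordinary array so
 * the sketch compiles on its own. */
_Atomic uint32_t invalid_flags[MR_CACHE_SLOTS];

struct mr_entry {
    uintptr_t start;
    size_t    len;
    uint32_t  key;     /* stand-in for a registered MR handle */
    int       in_use;
};

struct mr_entry cache[MR_CACHE_SLOTS];
uint32_t next_key = 1;

/* Stand-in for the slow path. Real code would register the memory and
 * ask the kernel (e.g. via ioctl()) to track the range; this only
 * hands out a fresh fake key. */
uint32_t slow_register(uintptr_t start, size_t len)
{
    (void)start;
    (void)len;
    return next_key++;
}

/* Test helper that mimics what the kernel side would do: flag every
 * cached range overlapping [start, start + len). */
void simulate_invalidate(uintptr_t start, size_t len)
{
    for (int i = 0; i < MR_CACHE_SLOTS; i++) {
        struct mr_entry *e = &cache[i];
        if (e->in_use && start < e->start + e->len && e->start < start + len)
            atomic_store_explicit(&invalid_flags[i], 1, memory_order_release);
    }
}

/* Look up [start, start + len); a hit costs one flag load per scanned
 * slot and no system call. */
uint32_t mr_cache_get(uintptr_t start, size_t len)
{
    int free_slot = -1;

    assert(len > 0);
    for (int i = 0; i < MR_CACHE_SLOTS; i++) {
        struct mr_entry *e = &cache[i];

        if (!e->in_use) {
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        if (atomic_load_explicit(&invalid_flags[i], memory_order_acquire)) {
            /* Stale entry: drop it. Real code would also deregister the
             * MR and stop tracking the range before reusing the flag. */
            e->in_use = 0;
            atomic_store_explicit(&invalid_flags[i], 0, memory_order_relaxed);
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        if (start >= e->start && start + len <= e->start + e->len)
            return e->key;                      /* cache hit */
    }

    /* Miss: take the slow path and remember the result if there is room. */
    uint32_t key = slow_register(start, len);
    if (free_slot >= 0)
        cache[free_slot] = (struct mr_entry){ start, len, key, 1 };
    return key;
}
```

A caller would use mr_cache_get() in place of a direct registration
call; repeated lookups of the same buffer return the cached key until
the flag for that slot is set, at which point the next lookup falls
back to the slow path.  The linear scan is only there to keep the
sketch short.
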
Doing it in the kernel means even the best case has a system call --
which is very cheap with current CPUs but still a non-zero cost.

So I'm really not sure what the right way to go is yet.  Further
opinions would be helpful.

 - R.
