On Fri, Nov 17, 2006 at 02:16:07PM -0800, Matthew Dillon wrote: > Tokens have worked out very well for long code paths such as > traversing a mountlist in order to perform a filesystem sync. There > is virtually no knowledge pollution verses the mutex model where you > have to pass the held mutex down any code path that might potentially > block.
Good, nice to know that. I'll be thinking about this more. > That said, we are having even MORE success with cpu localization, > allowing us to code lockless algorithms. So much success, in fact, > that many of the things I originally envisioned as using tokens now > don't use any locks at all. It's good for a certain kinds of data, but simply replicating the data (assumption here) across CPUs isn't nearly as direct as a setting forward and backwards pointers in a double linked list and then do a write barrier. Linux has list data structures tightly integrated with RCU for use in parallelized hash tables. The data structure is common and operation is very quick as opposed to manually loading that stuff for each processor. RCU does it for you via the memory controller of your machine via read/write barriers (read is done using an RCU dereference wrapper which opens up a future discussion since it doesn't have to order itself just using a memory barrier). It's a superior algorithm for multiprocessor shared dictionaries without copying data all over the place. I don't see how replication can get you the same properties in a single data structure. > Also, standard lockmgr locks have been simplified considerably and > they also now leak into some of the areas where I originally envisioned > using tokens. > > The areas which still need solutions are primarily tree topology lookups > and updates. The VM page hash table, the buffer cache, and the > name cache. I am not at all convinced that RCU is the answer I > am looking for. What I do want to do is to find a general solution > that can be implemented by the red-black tree module itself rather > than by all the code that uses RB trees. Not sure what you mean by "tree topology" and if it's related to your clustering stuff or not. But please do read that paper on a scalable page cache I mail to you previously. It's got some interesting insights and a potential solution for your problems. The fundamental thinking for this kind of thing needs to be developed, but it shouldn't be a barrier to understanding these ideas. Linux RCUed a radix tree for some of these specific purposes BTW. I think you'll find that they have used RCU very successfully for the subsystems you're concerned about. Their dcache is RCUified. Whether they meet your full criteria of being a cluster OS or not I can't say since I don't fully understand your design. Read mostly places that are hammered heavily in Linux have completely move to RCU and not rwlocks. These are concrete examples of how to solve what you mentioned above and they have performed very well for that community. Please listen to me. :) BTW, the single sxlock in FreeBSD is useless since it's serialized against a single atomic operation. It's doesn't permit parallel reader in a true sense unlike another IBM algorithm. bill
