Toni, interesting reading, thanks. For what little it's worth, a few
responses from me...
> The key problem is to have enough understanding of how Judy is
> implemented in order to identify the exact areas that have contention
> under some number of writers/readers.
It's been some years now but going from memory...
First you must understand Judy is a by-expanse (digital) tree, not a
by-population (balanced) tree. The hallmark of the latter is continual
attempts to balance subtrees, and Judy avoids all of that nonsense.
Instead it uses methods that make it perfectly reasonable for one
subtree to have zero elements in it while another adjacent subtree has a
billion elements. The trick was to find ways to accomplish this without
ever paying much of a time or space penalty, such as by getting a lot
deeper (many cache misses) or having a lot of null pointers (wasted
memory). As we worked on the problem, we were continually amazed at the
nearly miraculous elegance (although it's not appaarent in the code) of
each new trick we found. When I studied a lot of academic papers and
books later, I saw that NO ONE seemed to even be working in the same
direction, due to limiting beliefs.
(One guy wrote a paper that touched on the same concepts, and I think he
was swallowed up by Google. :-)
With this in mind, you can see that if a reader and a writer are working
in different subtrees, there's no collision at all (well almost never).
Furthermore, the shape of the entire tree is extremely stable (except
when calling a *Free() function); single insert or delete operations
(almost always?) affect at most two nodes (a parent and child).
The problem is still how to know that threads A and B are going into
certain areas of the tree and will or won't collide, without paying a
lot of cost for locking at every subtree level. I seem to recall we
toyed with a way to lock just the one subtree being directly affected by
a write, against any read or write, once determining which subtree it
was. But how do you know a reader is looking at that leaf node RIGHT
NOW? Without having it "pre-register" (lock), etc.
I think we also considered ways to do insert/delete "atomically", that
is, get the new subtree ready and then substitute it into place by
revising a single word. However, how do you know when it's OK to free
the old memory that might still be used by another thread starved for
CPU cycles?
Since Judy is a digital tree, the keys/indexes inherently tell you
immediately where you are going in the tree -- although you don't know
until you look what's under that "address". Say I read (look up)
key/index 0x23124567 at the same time I'm writing (inserting)
0x47132315. You can very quickly determine that 0x23 and 0x47 are
unrelated paths -- unless the array is so small that the top-level node
is all there is ("oops"). Once you populate the array enough to fan out
the first digit (byte), you are in separate subtrees. So conceptually
some things are easier, but others are not.
> Example: if restricted to 1-writer and 1-reader only each on a
> different thread running simultaneously on multi-core architecture,
> what point(s) in a call to read an existing Judy structure would
> conflict with a simultaneous call to write the same structure?
> Specific lines of code here are needed, not just "this routine needs
> to have exclusive access".
Sorry I don't have time to go find that for you. But I can tell you
it's probably not that simple. Since we started on HPUX (no inline
functions) and only later ported to Linux, and in order to avoid huge
replicated sections of nearly identical code, I talked Doug into letting
me use a lot of macros, which ended up being somewhat complex, although
you can think of them as inline functions. This gave us efficiency
(avoiding true function calls, and even unrolling some loops) with
minimal code duplication, although quite often for debugging we had to
cc -E the source and look at what the compiler actually saw.
Anyway, this makes it hard to point at specific lines of code and say,
"here you might have contention."
I do seem to recall that the functions themselves, however, were a
reasonably elegant set depending on the type of node being handled.
However, they too are somewhat complex because we generated Judy1 and
JudyL code from common sources by another process. Again, to minimize
redundant (hard for a human to read) huge code blocks (at the cost of
dealing with layers of abstraction instead).
> As per the comments of the author cited in my previous posting to this
> forum, a basic hash table implementation is around 300 lines long
> whereas Judy is considerably more.
Shrug. Earlier versions of Judy were smaller, but it grew ~10X to get
the "last" ~10X of performance in Judy IV, the released version. The
overall object code library is still small enough to be negligible
(meaning, link it archived!) I'd say, based on our experience, that you
can't get utter simplicity AND performance at the same time, at least
not in this domain ("sparse arrays"). The Judy data structure and code
is not THAT complex -- once you really understand it. The size arises
mostly from the variety of structures (tricks) it uses to get optimal
performance in all possible cases, switching between various kinds of
nodes and leaves.
> Whereas the former is well understood and can (via Google and cut 'n'
> paste) be transformed into any number of lock free (even possibly wait
> free and even atomic instruction free) structures, the latter is far
> harder due to the code length and complexity.
Sorry. For "small, bounded" problems, don't use Judy if that matters to
you.
> A similar effort to what the authors undertook to understand and
> implement for a chip feature like cache line size would be required to
> decompose and reassembly Judy to take advantages of the new papers in
> this complex area.
Maybe not that bad. I'd have to know more first about what the new
papers are actually saying...
Alas, HP ceased paying me/us to work on this, and since then I've found
more interest in doing engineering that pays, than continuing to work on
libJudy as a labor of love. Oh well. Shame on me.
Thanks,
Alan Silverstein
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel