Hi Younès:

> I'm also interested in the new version core design and coding.

I will supply all the information you want.  However, at first let me
try to be brief and start with some background.

Since its first public release in year 2002, I have done perhaps ten
thousand of measurements on Judy's performance.  I almost always come to
the same conclusion:  To quote a Judy team member (Bob Gobeille) back in
2000 -- "It's all about cache-line fills stupid".  This is far more true
in modern processors (C2D) than it was in 2000.

To be overly simplistic about Judy:  It is a "Digital Tree" built with
256 node branches.  However, a 256 node branch, while being very fast,
is not memory efficient enough to be practical.  Therefore Judy has 3
forms of that object.  1) a 256 node Uncompressed branch, 2) a 2..7 node
Linear branch, 2) a variable node Bitmap branch.  Number 1 and 2 are
usually a one (1) cache-line fill object, while a Bitmap branch is
usually a 2 cache-line fill object.  The Judy designed allowed only one
(1) Bitmap branch in every path down the tree for that reason.  Since
this was always at the bottom (Uncompressed branches higher in the tree
are memory don't-cares), it meant that there is an extra cache-line fill
to traverse the tree.  After, many failed attempts to improve Judy
performance elsewhere(*), this turns out to be the most promising place
to improve Judy's performance.

I remember very clearly back in 2000 or 2001 calling a meeting of the
Judy team members to try to come up with a compressed branch structure
that would be a 1 cache-line fill object.  We failed and the Bitmap
branch was the outcome.  I did consider using uncompressed branches with
fewer nodes with larger expanses, but the algorithms to handle this were
just to complex to even imagine, much less write.

The new Judy does NOT have Bitmap branches and has a method of using
different width Uncompressed branches (with 4, 16, 64, 256 nodes).  It
also has a "graceful" way of growing in width and may be practical to
grow even beyond 256 (the coup de grâce would be 65536).  The primary
trick to get this manageable (code-able) was to allow this object to
point to only Leafs.  My breadboards of these objects have been very
promising and I am presently writing code for a complete version.
Better memory efficiency is a positive side effect.  I expect the new
Judy to be very memory efficient.  To simplify Judy, the "odd" length
byte internal indexes (3,5,6,7) were abandon.  It turned out that the
memory savings using these sizes is wasted in very short Leafs and the
search times were a noticeable problem with large Leafs.

When a Leaf is filled (Rev 1.0.4 Judy) to its max population, it was
turned into a branch, pointing to new leafs of smaller size, but with
the same population sum.  This process was called a "Cascade" or a
"Splay" of the leaf into many leafs of a narrower expanse per leaf.
This resulted in an abrupt "Insert" and caused many mallocs for the new
Leafs + the Branch.  It also caused an abrupt increase in memory used to
add one Index.  In the new Judy I say "graceful" because the common
scenario of when a Leaf reaches max population is to Splay into the 3
higher adjacent expanses + itself.  This results in a max of 4 malloc()
and 1 free().  When the adjacent expanse already has Leafs (from a
previous Splay), then a "Cascade" into the next bigger Branch occurs.
But, only 1 Malloc (for the new bigger branch) and 1 free (the old
smaller branch) occurs plus pointers from the old Branch are copied to
every 4th node in the new branch -- thus leaving a new set of "adjacent"
nodes available to grow (Splay) into.  I call every 4th leaf/expanse a
sentinel Leaf.  A sentinel Leaf can contain up 4 expanses of Indexes.
The net result is a Leaf, typically, has a 1 to 4 change in size,
instead of a theoretical 256 to 1.  I am very excited to test if this is
a realistic solution for a 65536 node branch.  If it is, then one less
cache-line fill be required for an access to a large array -- resulting
in performance perhaps twice as fast as the current Judy (Rev 1.0.4).
It will also be interesting to see how small a leaf can be before
diminishing returns are achieved.


Doug Baskins (Jan2008)


(*) At first, larger Leafs seem to be the only place where improvement
was possible (I wasted several years on that notion).  Experiments
showed that "Insert" times were dominated by the size of the leaf, not
the search times.  Also, good binary searches of Leafs had too many
cache-line fills.  A typical Leaf search had to be reduced to a single
cache-line fill.  Conclusion:  smaller Leafs were the way to improve
performance of both "Get" and "Insert" -- simply because of the number
of cache-line fills incurred.
 
Doug Baskins <[EMAIL PROTECTED]>



----- Original Message ----
From: yjudy <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [email protected]
Sent: Wednesday, January 16, 2008 6:52:13 PM
Subject: New Judy version


Hi guys!

I followed the discussion about the new Judy version & features.
And as you, I'm waiting for it ;-)

The only problem it to have these new killer features with the same 
(poor) interfaces
to access them.

I need to help on that. I'm also interested in the new version core 
design and coding.

cheers
Younès.




-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Reply via email to