Re: var len keys

Alan Silverstein Fri, 25 May 2007 12:00:48 -0700

John et al,

> ...well after many posts to Alan S., I finally found my problem ..
> RTFM.  JudyFirst is actually find key greater or equal, which requires
> an initial value ;(


We've been exchanging email offline.  I'm finally getting geared up to
trying being more helpful to John.  My own takeaway on the J1F() issue
is that the API, code, and docs are still good and correct, but as
always, it's possible to have a "UI disconnect" and an incorrect mental
model.  It happens to everyone, and it's hard to know how to prevent it.

> Why do we have the almost useless JudySL functions (no one uses null
> terminated string anymore)...

This is a highly subjective statement, and you know that, right?  :-)

There's a writeup somewhere about using JudyL as the base for long (but
fixed) or variable-length key to value mappers.  Here's a paraphrase of
it from memory...

You can think of JudyL as enabling technology for a ridiculously wide,
shallow tree where each node has a fanout of 2^32 or 2^64 (your native
word size).  An interesting problem is how to know the length of a
length-associated rather than length-terminated key/index WHILE keeping
them sorted in lexicographic order (if you care).  Judy being a
by-expanse (radix) tree, it naturally radix-sorts indexes.  (True
sort(1) capability, of course, is not just lexicographic, so Judy is not
an appropriate technology "just for sorting" in the general case.)

As discussed in the paper, C-style null-terminated strings are sorted,
and that's what JudySL provides.  Why have it?  It works for C programs,
it's handy, and it was a good demo too.

Now, Perl-type strings allow any char in the value, including a null
char, and thus are what I call length-associated.  The simple way to
store these kinds of variable-length keys is to start with a top-level
JudyL array by length.  Each value points to a subarray of array...  of
keys all of that same length.

The paper talks about ways of simultaneously lexico-sorting
length-associated variable-length keys, and knowing when to terminate by
length.  I never found a really "pretty" solution.  You just have to
have some kind of indication "out at the end" that you are terminating.

All of this grew out of me reading academic papers late in the Judy
project and realizing that, at least to academics, the long/variable
length key problem was the only one worth addressing.  I think they
glossed over practical reality, but what the heck...

One more note.  I always have to remind MYSELF that, yes, it's fine and
dandy and efficient to "waste a whole Judy array" when you need it, such
as for the initial sort-by-length.  That's the beauty of the Judy
library as an engine for solving higher-level problems.  It's efficient
in space and time for 0,1,2,... N (peta-level) indexes.

> ...and then JudyHS, which uses a length but can't iterate (in any
> order) over the available keys?

I know little about JudyHS, Doug added it after we LGPL'd the stuff.
Doug?

> I'd sure like JudySL to use a length instead of a null terminator, and
> JudyHS to be able to iterate over all the keys.

So, go write what you need.  :-)

The JudySL source code should be a fine example.  I do recommend you
find and read my short paper on variable length keys before going too
far.

Cheers,
Alan Silverstein

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Re: var len keys

Reply via email to