Re: var len keys

skaller Fri, 25 May 2007 19:35:38 -0700

On Fri, 2007-05-25 at 13:00 -0600, Alan Silverstein wrote:
> John et al,


> > Why do we have the almost useless JudySL functions (no one uses null
> > terminated string anymore)...
> 
> This is a highly subjective statement, and you know that, right?  :-)

Nope. Today we have Unicode, and people use binary data a lot.
Null terminated strings are *out* in quality code, in most
places. C++ doesn't use them, and most code people say the
code is '8 bit clean' to mean that you can put 0 char in
a string and the code will still work.

So if the claim is subjective it's backed by considerable
experience in with many different systems.

> There's a writeup somewhere about using JudyL as the base for long (but
> fixed) or variable-length key to value mappers. 

Of course, JudyL can be used to do this, either by making
a tree of JudyL arrays, or better, a hybrid data structure
using some Judy fanout followed by some other data structure
such as a hashtable or just a plain list.

However, I played with my own trie implementation, and had
no trouble with variable length strings, where the length
was an index -- you just have to pass the length around.

In fact 'JudyL' should really be a special case of this
with length = sizeof(void*).

> Now, Perl-type strings allow any char in the value, including a null
> char, and thus are what I call length-associated.  The simple way to
> store these kinds of variable-length keys is to start with a top-level
> JudyL array by length.  Each value points to a subarray of array...  of
> keys all of that same length.

There are certainly ways to use the existing technology, no doubt
about that. The question is why the most general interface
is not provided.

JudyHS stores the right kind of data but there are no iterators,
which limits its utility.

JudySL provides the iterators but limits the data.

> One more note.  I always have to remind MYSELF that, yes, it's fine and
> dandy and efficient to "waste a whole Judy array" when you need it, such
> as for the initial sort-by-length.  That's the beauty of the Judy
> library as an engine for solving higher-level problems.  It's efficient
> in space and time for 0,1,2,... N (peta-level) indexes.

The only reason not to provide a SINGLE interface with variable
bit length keys is optimisation. JudyL + length count then subsumes
Judy1, JudySL and JudyHS. People wishing to use 0 terminated
strings just have to call strlen() on the argument, and learn
to stop using crappy data structures. Secure software never
uses 0 terminated strings .. it's not safe to do so.

More generally, one can envisage a structure vector:

        { fan1, fan2, ... fan_rest }

which are the number of bits in each fanout, however
this might be hard to implement efficiently. in addition
the size of the stored data could be variable: it's a pain
using a pointer to two words of data, since it costs
3 words and create a memory management problem.

Of course the POINT of Judy is to heavily optimise
special cases chosen for their utility, so I'm not
advocating the most general possible interface. ....

Just wishing JudySL used a length count, not null termination,
since 99% of all uses will be storing binary data which
mandates inclusion of null bytes.

Felix strings, for example, are C++ strings, and
are specified as 8 bit clean so I can't put them in 
a JudySL array, and I can't use JudyHS either since
there are no iterators.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Re: var len keys

Reply via email to