John et al,
> Today we have Unicode, and people use binary data a lot. Null
> terminated strings are *out* in quality code, in most places. C++
> doesn't use them, and most code people say the code is '8 bit clean'
> to mean that you can put 0 char in a string and the code will still
> work.
Fine. I think it all depends on context. Ironically, today I'm being
paid to retrofit libJudy to some old C software that is very much valued
in daily production use, and JudySL has a definite place in that
context. (First time since 2002 or so that I've actually used Judy for
something.)
> Of course, JudyL can be used to do this, either by making a tree of
> JudyL arrays, or better, a hybrid data structure using some Judy
> fanout followed by some other data structure such as a hashtable or
> just a plain list.
Right.
> In fact 'JudyL' should really be a special case of this with length =
> sizeof(void*).
As I described, nothing stops you from building what we might call
JudyVL (where V = variable-length key) using JudyL as the engine for
your tree nodes.
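
To make that concrete, here is a rough sketch of the layering, not code
from libJudy. Everything in it is hypothetical: "wordmap" is a trivial
linked-list stand-in for a JudyL array (word in, pointer to a word-sized
value slot out), and vl_ins()/vl_get() are names I made up. I also assume
the first level is indexed by key length, so that a key and its
zero-padded extension never collide; other designs are possible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a JudyL array: an unsorted list of word -> word. */
typedef struct wm_entry {
    uintptr_t index;
    uintptr_t value;
    struct wm_entry *next;
} wm_entry;

typedef wm_entry *wordmap;          /* NULL is the empty map */

/* Return a pointer to the value slot for index, or NULL if absent. */
static uintptr_t *wm_get(wordmap m, uintptr_t index)
{
    for (; m != NULL; m = m->next)
        if (m->index == index)
            return &m->value;
    return NULL;
}

/* Return the value slot for index, creating a zeroed one if needed. */
static uintptr_t *wm_ins(wordmap *m, uintptr_t index)
{
    uintptr_t *slot = wm_get(*m, index);
    if (slot != NULL)
        return slot;
    wm_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->index = index;
    e->value = 0;
    e->next = *m;
    *m = e;
    return &e->value;
}

/* Pack up to one word of key bytes starting at off, zero-padded. */
static uintptr_t chunk_at(const unsigned char *key, size_t len, size_t off)
{
    uintptr_t w = 0;
    assert(off < len);
    size_t n = (len - off < sizeof w) ? len - off : sizeof w;
    memcpy(&w, key + off, n);
    return w;
}

/* Insert a length-counted key; returns its value slot, or NULL on failure.
 * Level 0 is indexed by len; each later level by the next word of the key.
 * Interior slots hold the next-level map; the last slot holds the value. */
static uintptr_t *vl_ins(wordmap *root, const void *key, size_t len)
{
    const unsigned char *k = key;
    uintptr_t *slot = wm_ins(root, (uintptr_t)len);
    for (size_t off = 0; slot != NULL && off < len; off += sizeof(uintptr_t)) {
        wordmap child = (wordmap)*slot;
        uintptr_t *next = wm_ins(&child, chunk_at(k, len, off));
        *slot = (uintptr_t)child;   /* child may have a new head entry */
        slot = next;
    }
    return slot;
}

/* Look up a length-counted key; returns its value slot, or NULL if absent. */
static uintptr_t *vl_get(wordmap root, const void *key, size_t len)
{
    const unsigned char *k = key;
    uintptr_t *slot = wm_get(root, (uintptr_t)len);
    for (size_t off = 0; slot != NULL && off < len; off += sizeof(uintptr_t))
        slot = wm_get((wordmap)*slot, chunk_at(k, len, off));
    return slot;
}
```

Embedded zero bytes are ordinary key bytes here, since nothing looks for
a terminator. With the real library you would presumably swap wm_ins()
and wm_get() for the JudyL insert and get calls.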
> There are certainly ways to use the existing technology, no doubt
> about that. The question is why the most general interface is not
> provided.
Because...
- the original motivation for libJudy was to map words to words, and
that alone is a common and interesting problem
- we wrestled with some very difficult lower-level concepts for a long
time, that had to do with bits, bytes, and words, to get to the Judy
IV that was LGPL'd
- variable-length keys weren't high enough on our list at the time
- we ran out of time (although I did explore the topic, as related
previously)
It might just come down to expectations. If you accept Judy for what it
is, and don't expect it to be more than it is, it still has great value,
both as an implementation and (once you really understand the data
structures and algorithms) as a philosophical education. ("How did they
do that?") The latter I tried to explain as best I could in the Shop
Manual, in lieu of time/energy to create formal academic-style
documents.
> The only reason not to provide a SINGLE interface with variable bit
> length keys is optimisation. JudyL + length count then subsumes
> Judy1, JudySL and JudyHS.
Don't be so quick to judge that. As I learned from Doug, you don't know
anything about performance until you look at what's actually generated
and run. You might find that if you build a JudyVL on top of JudyL, and
don't screw it up somehow by being careless, clever, or not knowing
enough, the resulting code and run-time path look very much like what
you might create if you somehow engineered it all at a lower level
"inside the Judy tree".
That's why I like to say that you can think of JudyL as being very much
like a hash table of hash tables... where the hash algorithm is "index
on next byte" (fast and simple, although the nodes are often
"compressed"), and the synonyms are handled by the next-level hash table.
Turns out the code acts very much like that.
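
A toy illustration of that mental model, in case it helps. This is not
how libJudy is implemented: the nodes below are plain uncompressed
256-entry tables, which wastes enormous amounts of memory, and all the
names (bnode, bleaf, bt_ins, bt_get) are made up for the example. It only
shows the "index on next byte, let the next level sort out the synonyms"
shape for a word-to-word map.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Interior "hash table": 256 buckets, each leading to the next level. */
typedef struct bnode {
    void *child[256];
} bnode;

/* Last level: 256 value slots plus a flag saying which ones are in use. */
typedef struct bleaf {
    uintptr_t value[256];
    unsigned char present[256];
} bleaf;

/* The "hash function": byte number level of index, most significant first. */
static unsigned byte_at(uintptr_t index, size_t level)
{
    return (unsigned)((index >> (8 * (sizeof index - 1 - level))) & 0xFF);
}

/* Insert index; returns a pointer to its value slot, or NULL if out of
 * memory. Relies on calloc() zero bytes reading back as NULL pointers,
 * which holds on common platforms. */
static uintptr_t *bt_ins(void **root, uintptr_t index)
{
    void **link = root;
    for (size_t level = 0; level + 1 < sizeof index; level++) {
        if (*link == NULL && (*link = calloc(1, sizeof(bnode))) == NULL)
            return NULL;
        bnode *n = *link;
        link = &n->child[byte_at(index, level)];
    }
    if (*link == NULL && (*link = calloc(1, sizeof(bleaf))) == NULL)
        return NULL;
    bleaf *leaf = *link;
    unsigned b = byte_at(index, sizeof index - 1);
    leaf->present[b] = 1;
    return &leaf->value[b];
}

/* Look up index; returns its value slot, or NULL if it was never inserted. */
static uintptr_t *bt_get(void *root, uintptr_t index)
{
    void *p = root;
    for (size_t level = 0; p != NULL && level + 1 < sizeof index; level++)
        p = ((bnode *)p)->child[byte_at(index, level)];
    if (p == NULL)
        return NULL;
    bleaf *leaf = p;
    unsigned b = byte_at(index, sizeof index - 1);
    return leaf->present[b] ? &leaf->value[b] : NULL;
}
```

Each lookup is one array index per byte of the word, with no key
comparisons along the way.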
> People wishing to use 0 terminated strings just have to call strlen()
> on the argument, and learn to stop using crappy data structures.
Your quickness to demean anything you don't like grows tiresome.
> Just wishing JudySL used a length count, not null termination, since
> 99% of all uses will be storing binary data which mandates inclusion
> of null bytes.
Then it wouldn't be JudySL. Check out the JudyNL code that I think is
already in the library but undocumented. If I recall correctly, it's in
there, although Doug wouldn't let me fully support and document it.
Alan
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel