> For me the 2 major improvements to judy arrays are:  1 - Using Judy
> arrays for non null terminated strings (this is currently a big
> drawback)...

I'd like to know if that's because you want basic sorting of those
strings as a side effect, or something else.  Basic sorting =
lexicographically (no knowledge of multibyte chars) versus
sort-by-length.

As I've mentioned before, I call this the length-terminated (like a C
string) versus length-associated (like a Perl string) problem.  In
general, any variable-sized key/index (chars, bytes, bits, whatever) is
length-associated, that is, no magic termination value inside the
string.  However, we wrote JudySL for C strings only, where NUL
termination exists, hence you get lexicographical sorting for free.

You can always save length-associated keys/indexes/strings efficiently
by making the first level a JudyL array by length (got that?), but this
array-of-array sorts/groups keys/strings by length first.  "Too bad," if
you want lexicographical sorting, use a different algorithm?

Storing variable-length, length-associated keys sorted lexicographically
is actually a fairly hard problem.  You give up some speed and space by
inserted magic "termination information" between the levels of the tree.
I think I wrote a paper about this that might be hiding somewhere on
SourceForge.

Cheers,
Alan Silverstein

------------------------------------------------------------------------------
Monitor your physical, virtual and cloud infrastructure from a single
web console. Get in-depth insight into apps, servers, databases, vmware,
SAP, cloud infrastructure, etc. Download 30-day Free Trial.
Pricing starts from $795 for 25 servers or applications!
http://p.sf.net/sfu/zoho_dev2dev_nov
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Reply via email to