Re: AW: AW: New Judy version

Doug Baskins Sun, 01 Feb 2009 19:20:34 -0800

John:

I assume that you want it sorted by some means, since JudyHS()
already handles length-controlled keys (unsorted).  I also assume that
you want a lexicographical sort.  If you pass anything other than a
uint8_t string, the results will differ between big- and little-endian
machines.  Things like JudySLNext won't be of much use with non-uint8_t
strings.  Binary (non-uint8_t) sorting becomes tricky and
machine-dependent in any case.  I don't want to spend the rest of my
life explaining that to people.
There are other problems too, but I can't remember them right now.
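
A minimal sketch of the endianness point, in C. The helper names
(store_be32, cmp_as_be_bytes, cmp_as_raw_bytes) are hypothetical and
not part of Judy. Comparing the raw in-memory bytes of an integer
gives a host-dependent order, while encoding it most significant byte
first gives a byte order that matches numeric order on any machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of Judy): write v into out[0..3], most
 * significant byte first, so byte-wise order equals numeric order. */
static void store_be32(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Compare two integers the way a byte-string index would see them after
 * big-endian encoding. Returns <0, 0 or >0, like memcmp(). */
static int cmp_as_be_bytes(uint32_t a, uint32_t b)
{
    uint8_t ka[4], kb[4];
    store_be32(a, ka);
    store_be32(b, kb);
    return memcmp(ka, kb, sizeof ka);
}

/* Compare the raw in-memory bytes. The sign of the result depends on the
 * host byte order, which is the portability problem described above. */
static int cmp_as_raw_bytes(uint32_t a, uint32_t b)
{
    return memcmp(&a, &b, sizeof a);
}
```

For the values 1 and 256, cmp_as_be_bytes() orders 1 first on every
host, whereas cmp_as_raw_bytes() orders 256 first on a little-endian
machine and 1 first on a big-endian one.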

I have already decided to use 4-byte decodes in 64-bit versions
of JudySL in the new version.  JudySL is surprisingly fast because
of the low entropy of sorted text strings, and it gets worse the larger
the number of decode bytes.  Sorted numbers also have very low entropy.
I might use that more effectively with the new Judy -- if I have time?
Well, I have the rest of my life.  My ultimate goal is to finally put hash
methods to rest.

Doug

Doug Baskins <[email protected]>

----- Original Message ----
From: john skaller <[email protected]>
To: Doug Baskins <[email protected]>
Cc: [email protected]; Aleksey Cheusov <[email protected]>
Sent: Sunday, February 1, 2009 8:08:11 PM
Subject: Re: AW: AW: New Judy version

On 01/02/2009, at 6:06 PM, Doug Baskins wrote:

>  What is
> a good API for C++?

BTW: the one thing I'd ask for in a new Judy is a version which accepts
length-controlled strings (instead of, or as well as, null-terminated
strings).
This should be quite easy: you pass the key as two values, a pointer
and a length count. When "recursing" along the string, the internal
API just subtracts one from the length count. When the length is zero,
you're at the end of the string. The *logic* would be identical to the
current JudySL; only the data type of a key would change slightly.
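
The pointer-plus-length idea can be sketched as follows. This is a
plain, uncompressed 256-way trie written only to illustrate the
traversal, not Judy's internal layout or API; the names LcsNode,
lcs_ins, lcs_get and lcs_free are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type: one branch per possible key byte. */
typedef struct LcsNode {
    struct LcsNode *child[256];
    int has_value;              /* non-zero if a key ends at this node */
    long value;
} LcsNode;

/* Return the value slot for key[0..len-1], creating nodes as needed.
 * Each step consumes one byte and subtracts one from the length; a
 * length of zero marks the end of the key. NULL on allocation failure. */
static long *lcs_ins(LcsNode **root, const uint8_t *key, size_t len)
{
    assert(key != NULL || len == 0);
    if (*root == NULL) {
        /* Assumes all-bits-zero is a null pointer, true on common hosts. */
        *root = calloc(1, sizeof **root);
        if (*root == NULL)
            return NULL;
    }
    if (len == 0) {
        (*root)->has_value = 1;
        return &(*root)->value;
    }
    return lcs_ins(&(*root)->child[key[0]], key + 1, len - 1);
}

/* Look up key[0..len-1]; NULL if it was never inserted. */
static const long *lcs_get(const LcsNode *node, const uint8_t *key, size_t len)
{
    while (node != NULL && len > 0) {
        node = node->child[*key++];
        len--;
    }
    return (node != NULL && node->has_value) ? &node->value : NULL;
}

/* Release the whole trie. */
static void lcs_free(LcsNode *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < 256; i++)
        lcs_free(node->child[i]);
    free(node);
}
```

Because the end of a key is signalled by the length rather than by a
terminator, a key such as {0x00, 0x01, 0x00} is stored and found
like any other, and it stays distinct from its prefix {0x00, 0x01}.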

This API allows arbitrary binary keys (not just strings). The only
trick is that the user has to ensure the keys are clean, e.g. padding
bytes are always zero, and the keys are canonical.
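
One way to keep keys clean, sketched under assumptions: rather than
passing a struct's raw bytes (which may contain padding), pack the
fields into a byte buffer in a fixed order. The two-field record and
the names RECORD_KEY_LEN and record_key_encode are hypothetical, chosen
only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RECORD_KEY_LEN = 5 };    /* 1 tag byte + 4 id bytes, no padding */

/* Pack (tag, id) into out[0..4] with the id most significant byte first.
 * Equal field values always give identical bytes, so the key is canonical
 * regardless of struct layout or host byte order. Returns the key length. */
static size_t record_key_encode(uint8_t tag, uint32_t id,
                                uint8_t out[RECORD_KEY_LEN])
{
    assert(out != NULL);
    out[0] = tag;
    out[1] = (uint8_t)(id >> 24);
    out[2] = (uint8_t)(id >> 16);
    out[3] = (uint8_t)(id >> 8);
    out[4] = (uint8_t)id;
    return RECORD_KEY_LEN;
}
```

The resulting buffer and length can then be handed to a
length-controlled interface as the pointer and count described above.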

It would be interesting to compare "Judy LCS with length=4" for 32-bit
data with JudyL on a 64-bit extension of that data: 4-byte keys instead
of 8, but with the overhead of one extra parameter (the length) being
passed around a loop, instead of using a loop unrolled 8 times (using
the program counter to keep track of the tree depth).

--
john skaller
[email protected]

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

