On 02/04/2012, at 11:32 PM, Doug Baskins wrote:
> John:
>
> Glad to hear of your successes with Judy.
>
> I have often thought of how to do a faster Judy[1L]Next(). It has
> always fallen below my priorities list -- partly because I believed
> that modern processor caches would lessen the overhead of a
> complete re-walking of the tree to almost the same place (thus same
> cache-lines).
I believe that is probably right, *provided* the code is in a tight
enough loop.
> However, I have recently discovered another problem with
> modern processors that may be a mitigating factor. That is
> "branch prediction" miss-match of the compiler with the processor.
Yes. Branch prediction causes speculative cache loads of both
instructions and data.
Gcc at least has a way to "hint" to the compiler whether a branch
is likely to be taken. Felix actually emits these hints. However I have
not noticed gcc doing anything with them.
What it is *supposed* to do is reorder the blocks so that branch
prediction will be good
>
> So here is my best shot on how I would approach the problem:
>
> Note: Key & Index words are used interchangeably and mean the same thing.
>
> A walk of the Judy tree often terminates with an object called a Leaf.
> A Leaf has 1..256 Key/Value items in sorted order for an average
> length of around 20+ elements. If Judy[1L]NextX() returned the
> whole Leaf (or partial if required), then that array of 20+ Keys/Values
> could be scanned very quickly. It would be up to the caller to
> determine how long this data would be valid (I.E. no Inserts or Deletes.)
> The caller would have to supply the memory for storing the Keys/Values
> along with how many were returned and the current Key. The caller would
> also have to remember what was the last Key accessed (cursor) out of the
> memory to determine the next Key. The distance from Key (Leaf population)
> to the Value would also be required. There a few other minor details
> I haven't mentioned.
Right. I understand I think. So you'd do a call like
next_block ...
and you'd get a single entry (as usual) OR if you hit a leaf,
an array.
> Actually, it would be fairly easy to implement, but the API would be ugly.
Hmmm ..
> Let me know if this seems worthwhile. It seems it has the potential of being
> much faster.
There's a way to hide this by control inversion:
for_each (Jarray, function);
perhaps more generally:
map ...
fold_left
fold_right
These are traditional functional programming things. Because you have to supply
a callback, it's a lot more of a pain to use: you are the slave of the
iteration,
instead of the master.
BUT .. the big advantage here is that it could use faster sequential
scanning technology .. without exposing the details like the "get a block"
API mentioned above.
A compromise API would be like: "here is an array size 100, fill it
with 100 entries starting at key such-and-such". You could then optimally
process the 100 entries, then get the next 100.
> About the Null terminator problem with JudySL. I think a student in perhaps
> Brazil has written a *Next() to JudyHS(). If I remember correctly, a simple
> #define will change JudyHS() into a sorted (numeric?) order store of the tree
> of trees.
> If your interested, I can research it further.
Well, Felix at least has a use for:
set of words
map of word to word
set of strings
map of string to word
map of string to string
Generally, a map of word to "anything" would be good.
This can be done by map to a word, which is a pointer.
At the cost of an indirection and a memory management problem.
However Felix uses C++ strings which are 8 bit clean.
Text will never contain a null: utf-8 encoded Unicode contains
no nulls. ASCII contains no nulls (in human readable text).
But strings also get used for binary blobs :)
A trivial example: an IP address is 4 bytes, some of which can be 0.
Of course this can be done now, with a JudyL instead of S.
--
john skaller
[email protected]
http://felix-lang.org
------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel