On 11/16/2012 6:47 PM, john skaller wrote:
On 17/11/2012, at 10:43 AM, Tim Margheim wrote:
One of the most frequent operations I need to perform is a prefix
search, i.e. "return the first element that starts with Index, or return
null if nothing does".
I'm currently doing this by calling JudySLNext(). If it returns a
non-null pointer, but the new Index doesn't start with the prefix, I
still return null.
Two questions:
1.) Does anyone know of a more efficient way to do that with the current
set of Judy functions?
The searching is close to optimal. The only cost is copying the unwanted
key, i.e. descending into the Judy tree copying characters as we go,
when we don't actually want that key.
In the first question, I was asking if anyone can think of a more
efficient way to do a prefix search /using/ the current Judy
functions--not asking for something more efficient /than/ the current
functions.
It's hard to see how to improve on "JudySLNext() followed by an
strncmp() on the returned key", but I thought it couldn't hurt to ask.
However subsequent insertion is expensive since the search
must be repeated, even when the position for a new key is known.
Someone made a "cursor" based version of Judy that fixed this,
however that code is not re-entrant and therefore unacceptable
to me at least, because their modified Judy only supported
one cursor.
Hmm, I might have to look into that--my current application isn't
affected much by the insertion performance, but it could matter for
another application that I have in mind. And I could make do with one
cursor.
However, if 0 is an invalid value for someone's JudyTree, then they
don't need to do a search-then-insert-if-it-failed. You can replace the
initial find with an insertion. If the pointed-to-value returned by the
insertion is 0, then you know the key wasn't already in the tree. If
the pointed-to-value is anything other than 0, then you know the key
/was/ already in the tree. So for those kinds of purposes, you don't
need a cursor.
In fact, even if 0 /is/ a valid value, you could get around it if your
application would allow you to store "value+1" instead of "value". The
only cost is an addition/subtraction on storage & retrieval of the
value, plus the slight reduction in the range of values you can store.
Also, if you modified the Judy insert function to accept a pointer to a
bool, then the bool could return whether or not the insertion created a
new element or returned an already-existing one.
2.) Doug, have you thought about making a search function that will give
up as soon as it determines that there are no matches beginning with
Index? What do you think about including it in one of the upcoming
versions?
A more consistent function would be
find-between
Good point. And it would be simple enough to write an inline
prefix-search function that auto-generates the upper-bound string and
wraps a call to find-between. (Though if you have to call the wrapper
function multiple times, it would involve multiple allocations of the
upper-bound buffer, so you might want to use the find-between function
directly in that case. Or make a version of the wrapper where the
upper-bound buffer is passed in and then populated by the wrapper.)
On the other hand, you'd also want to carefully compare the performance
of a find-between with the performance of a prefix search. It might be
that the find-between requires some additional instructions. Still,
find-between would certainly perform better than running Next and then
checking if the returned key is outside your bounds.
---
Tim
------------------------------------------------------------------------------
Monitor your physical, virtual and cloud infrastructure from a single
web console. Get in-depth insight into apps, servers, databases, vmware,
SAP, cloud infrastructure, etc. Download 30-day Free Trial.
Pricing starts from $795 for 25 servers or applications!
http://p.sf.net/sfu/zoho_dev2dev_nov
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel