Matt Quail wrote:
We have a system where I'll be given 10 or 20 unique keys.
I assume you mean you have one unique-key field, and you are given
10-20 values to find for this one field?
Internally I'm creating a new Term and then calling
IndexReader.termDocs() on this term. Then if termdocs.next()
matches then I'll return this document.
Are you calling reader.termDocs() inside a (tight) loop? It might be
better to create one TermEnum, and use "seek". Something like this:
Yes.. this is another approach I was thinking of taking. I was thinking
of building up a list of indexes which had a high probability of holding
the given document and then searching for each of them.
What I'm worried about though is that it would be a bit slower... I'm
just going to have to test out different implementations to see....
<snip>
I'm pretty sure that will work. And if you can avoid the multi-
threading issues, you might try and use the same TermDocs object for
as long as possible (that is, move it up out of as many tight loops
as you can).
Well... that doesn't look like the biggest overhead. The bottleneck
seens to be in seek() and the fact that its using an InputStream to read
bytes off disk. I actually tried to speed that up by crainking
InputSteam.BUFFER_SIZE var higher but that didn't work either though I'm
not sure if its a caching issue. I sent an email to the list about this
earlier but no one responded.
So it seems like my bottleneck is in seek() so It would make sense to
figure out how to limit this.
Is this O(log(N)) btw or is it O(N) ?
Kevin
--
Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
See irc.freenode.net #rojo if you want to chat.
Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
Kevin A. Burton, Location - San Francisco, CA
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]