>> My documents are quite big, sometimes up to 300k tokens.

You could look at indexing them as separate documents using overlapping
sections of text. Erik used this for one of his projects.
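A minimal sketch of that overlapping-section idea, in plain Java (the class name, chunk size, and overlap below are illustrative, not recommendations; each resulting chunk would then be indexed as its own Lucene document):

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapChunker {
    // Split `tokens` into windows of `size` tokens, where consecutive
    // windows share `overlap` tokens so no phrase is lost at a boundary.
    static List<List<String>> chunk(List<String> tokens, int size, int overlap) {
        List<List<String>> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + size, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) break;  // last window reached the end
        }
        return chunks;
    }
}
```

Each chunk stays small enough to highlight cheaply, and the overlap keeps matches that straddle a boundary findable in at least one chunk.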

Cheers
Mark



----- Original Message ----
From: Michael Stoppelman <stop...@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, 3 February, 2009 7:24:06
Subject: Poor QPS with highlighting

Hi all,

My search backends are only able to eke out 13-15 QPS even with the entire
index in memory (this makes it very expensive to scale). According to my
YourKit profiler, 80% of the program's time ends up in highlighting. With
highlighting disabled my backend gets about 45-50 QPS (cheaper scaling)!
We're using Mark's TokenSources contrib to make document reconstruction
quicker. I was contemplating patching the index to store offsets
for every term (instead of just the ordinal positions) so that I could make
the highlighting faster (since you would know where you hit in the document
on the search pass). I saw this thread from 2004:
http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html -
which asks about adding offsets to the index but it was decided against
because it would make the index too large. I can totally understand this,
but as machines get beefier it would probably be nice to make this
optional, since 15 QPS vs. 50 QPS is quite a trade-off right now. Are
other folks seeing this? My documents are quite big, sometimes up to 300k
tokens. Also, my document fields are compressed, which is an additional
time sink for the CPU.
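To make the offset idea concrete, here is a toy, self-contained sketch of what storing character offsets buys you: if each term's (start, end) positions are recorded once at index time, a highlighter can mark hits by direct substring lookup instead of re-tokenizing the stored document. The whitespace tokenizer and class names here are illustrative only, not Lucene APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHighlighter {
    record TermOffset(String term, int start, int end) {}

    // Toy whitespace tokenizer that records each term's character offsets.
    static List<TermOffset> tokenize(String text) {
        List<TermOffset> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new TermOffset(text.substring(start, i).toLowerCase(), start, i));
        }
        return out;
    }

    // Highlight every occurrence of `query` using only the precomputed
    // offsets -- no second tokenization pass over the stored text.
    static String highlight(String text, List<TermOffset> offsets, String query) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (TermOffset t : offsets) {
            if (t.term().equals(query)) {
                sb.append(text, last, t.start()).append("<b>")
                  .append(text, t.start(), t.end()).append("</b>");
                last = t.end();
            }
        }
        sb.append(text.substring(last));
        return sb.toString();
    }
}
```

The CPU savings in the real system would come from skipping the re-analysis step per hit; the trade-off, as the 2004 thread notes, is the extra index space for the stored offsets.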

Please let me know if you need more details, happy to share.

Sincerely,
M

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
