>>My documents are quite big, sometimes up to 300k tokens.

You could look at indexing them as separate documents using overlapping sections of text. Erik used this for one of his projects.
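A minimal sketch of the overlapping-sections idea in plain Java (the class name, parameters, and token-array input are illustrative; the actual step of wrapping each chunk in a Lucene Document is omitted):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Split a large document into overlapping windows of tokens, so each
// window can be indexed as its own (small) document. Highlighting then
// only ever touches one small chunk instead of a 300k-token monster.
public class OverlappingChunker {

    // windowSize tokens per chunk; consecutive chunks share `overlap` tokens
    // so phrase hits spanning a chunk boundary are not lost.
    public static List<String> chunk(String[] tokens, int windowSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = windowSize - overlap;
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + windowSize, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) break; // last window reached the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = "a b c d e f g h".split(" ");
        // windows of 4 tokens, overlapping by 2
        System.out.println(OverlappingChunker.chunk(tokens, 4, 2));
        // → [a b c d, c d e f, e f g h]
    }
}
```

Each chunk would be indexed as a separate document carrying a stored field pointing back to the parent document, so results can be grouped after the search.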
Cheers,
Mark

----- Original Message ----
From: Michael Stoppelman <stop...@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, 3 February, 2009 7:24:06
Subject: Poor QPS with highlighting

Hi all,

My search backends are only able to eke out 13-15 qps even with the entire index in memory (this makes it very expensive to scale). According to my YourKit profiler, 80% of the program's time ends up in highlighting. With highlighting disabled, my backend gets about 45-50 qps (cheaper scaling)! We're using Mark's TokenSources contrib to make reconstructing the document quicker.

I was contemplating patching the index to store offsets for every term (instead of just the ordinal positions) so that I could make highlighting faster, since you would already know where each hit falls in the document after the search pass. I saw this thread from 2004, http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html, which asks about adding offsets to the index; it was decided against because it would make the index too large. I can understand this, but as machines get beefier it would be nice to make this optional, since 15 qps vs. 50 qps is quite a trade-off right now. Are other folks seeing this?

My documents are quite big, sometimes up to 300k tokens. Also, my document fields are compressed, which is another time sink for the CPU.

Please let me know if you need more details; happy to share.

Sincerely,
M
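To illustrate why stored offsets would speed things up: if the search pass already knew the character offsets of each hit, the highlighter could slice the stored text directly instead of re-tokenizing the whole document. A plain-Java sketch (the class name and the hard-coded offset pairs are illustrative; this is not Lucene API):

```java
import java.util.List;

// Given hit offsets as [start, end) character positions (here supplied
// by hand, but in principle read back from the index alongside each
// term), wrap each hit in <b>...</b> without any re-analysis of the text.
public class OffsetHighlighter {

    public static String highlight(String text, List<int[]> hitOffsets) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] off : hitOffsets) {          // offsets assumed sorted, non-overlapping
            sb.append(text, last, off[0])       // untouched text before the hit
              .append("<b>")
              .append(text, off[0], off[1])     // the hit itself
              .append("</b>");
            last = off[1];
        }
        sb.append(text.substring(last));        // remainder after the last hit
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "lucene highlighting is slow without offsets";
        // pretend the search pass reported "highlighting" at [7, 19)
        System.out.println(OffsetHighlighter.highlight(doc, List.of(new int[]{7, 19})));
        // → lucene <b>highlighting</b> is slow without offsets
    }
}
```

The tokenize-and-scan step that dominates the profile disappears entirely; the cost becomes a few substring copies per hit, independent of document length.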