Hi all,

My search backends can only eke out 13-15 qps, even with the entire index in memory (which makes scaling very expensive). According to my YourKit profiler, 80% of the program's time is spent in highlighting; with highlighting disabled, the backend reaches about 45-50 qps (much cheaper to scale). We're already using Mark's TokenSources contrib to make reconstructing the document quicker.

I was contemplating patching the index to store offsets for every term (instead of just the ordinal positions), so that highlighting could be faster: the search pass would already know where each hit falls in the document. I saw this thread from 2004, http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html, which asks about adding offsets to the index; it was decided against because it would make the index too large. I can understand that concern, but as machines get beefier it would be nice to make this optional, since 15 qps vs. 50 qps is quite a trade-off right now.

Are other folks seeing this? My documents are quite big, sometimes up to 300k tokens. My document fields are also compressed, which is another CPU time sink.
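For context, here is roughly what our current setup looks like, sketched against the Lucene 2.x-era API. This is a configuration fragment, not a complete program; the field name "body" and the variables text, reader, and docId are placeholders, not our real schema:

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.search.highlight.TokenSources;

// Index time: store the field compressed, with term vectors that
// include character offsets, so the highlighter can skip re-analysis.
Document doc = new Document();
doc.add(new Field("body", text,
        Field.Store.COMPRESS,                       // CPU cost to decompress at query time
        Field.Index.TOKENIZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));  // offsets live in the term vector

// Query time: rebuild a TokenStream from the stored term vector
// instead of re-tokenizing a 300k-token body.
TermPositionVector tpv =
        (TermPositionVector) reader.getTermFreqVector(docId, "body");
TokenStream tokenStream = TokenSources.getTokenStream(tpv);
```

Even with this, the highlighter still walks the whole reconstructed stream per document; offsets in the postings themselves would let us jump straight to the match positions found during the search pass.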
Please let me know if you need more details; I'm happy to share.

Sincerely,
M