Thanks for the response, Doug.

My working assumption was that whatever analysis was done in evaluating the query would be costly to repeat, but from your breakdown of what is actually required it looks like all of my requirements can be met with calls to IndexReader#docFreq(term), which I would expect to be very quick.
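
As a rough sketch of the idea, assuming the only thing needed from the index is a per-term document frequency and a total document count: the DocFreqSource interface below is a hypothetical stand-in for the index (in Lucene its docFreq method would delegate to IndexReader#docFreq), and IdfFragmentScorer and its methods are illustrative names, not part of the current highlighter. The log(numDocs / (docFreq + 1)) + 1 formula is one common idf variant, chosen here only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: scores a fragment by summing idf weights of the
// distinct query terms it contains.
final class IdfFragmentScorer {

    // Hypothetical stand-in for the index. With Lucene, docFreq(term)
    // would be backed by IndexReader#docFreq and numDocs() by the reader's
    // document count.
    public interface DocFreqSource {
        int docFreq(String term);
        int numDocs();
    }

    private final DocFreqSource source;
    // Cache so each term's document frequency is looked up only once.
    private final Map<String, Double> idfCache = new HashMap<>();

    public IdfFragmentScorer(DocFreqSource source) {
        this.source = source;
    }

    // Rare terms get a larger weight than common ones.
    public double idf(String term) {
        return idfCache.computeIfAbsent(term, t ->
                Math.log((double) source.numDocs() / (source.docFreq(t) + 1)) + 1.0);
    }

    // Sum the idf of each distinct query term that appears in the fragment.
    public double score(List<String> fragmentTokens, Set<String> queryTerms) {
        Set<String> matched = new HashSet<>();
        for (String token : fragmentTokens) {
            if (queryTerms.contains(token)) {
                matched.add(token);
            }
        }
        double total = 0.0;
        for (String term : matched) {
            total += idf(term);
        }
        return total;
    }
}
```

With weights like these, a fragment that contains a rare query term would outrank one that only contains a very common query term, which is the behaviour wanted from fragment selection.
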
As for your suggestion on selecting "best fragments" using RamDirectories: for the purposes of highlighting, the RAM indexing code and the highlighting code (marking up the original text) would need to find a way to share the results of the same tokenization pass if it is to be performant.

Before considering what is involved in coding this, I did some benchmarking to compare processing times for different operations on the same set of 16 KB docs, using the same (stemming) analyzer:

- Tokenization: 86 ms (avg time taken to simply tokenize the doc)
- Highlighting: 90 ms (avg time taken to parse query terms, tokenize, highlight query terms and select best fragments using the current impl)
- RAM indexing: 118 ms (avg time taken to tokenize and index docs only)

As you can see, the RAM indexing approach to highlighting incurs some noticeable overhead in its first step alone, before I consider adding the steps to fragment docs, query and highlight, so I'm not sure this approach is worth pursuing. I am tempted to just add some idf weighting to the current highlighter's fragment selection logic.

Cheers
Mark