On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote:


: What I would LOVE is if I could do it in a standard Lucene search like I
: mentioned earlier.
: Hit.doc[0].getHitTokenList() :confused:
: Something like this...

The Query/Scorer APIs don't provide any mechanism for information like
that to be conveyed back up the call chain -- mainly because it's more
heavyweight than most people need.

If you have custom Query/Scorer implementations, you can keep track of
whatever state you want when executing a Query -- in fact the SpanQuery
family of queries does keep track of exactly the type of info you seem to
want, and after executing a query, you can ask it for the "Spans" of any
matching document. The downside is a loss in query execution performance
(because it takes time/memory to keep track of all the matches).

Even then, if I'm not mistaken, spans track token _positions_, not _offsets_ in the original string.
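
To make the position/offset distinction concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class PositionVsOffset and its whitespace tokenizer are hypothetical stand-ins for an analyzer, used only to show that a token's position (its ordinal in the token stream) and its offsets (character indexes into the original string) are different numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tokenizer (not part of Lucene) that records, for each
// token, both its position in the token stream and its character offsets.
class PositionVsOffset {

    static final class Token {
        final String text;
        final int position;    // ordinal of the token: 0, 1, 2, ...
        final int startOffset; // index of the token's first char in the input
        final int endOffset;   // index one past the token's last char

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Split on whitespace, remembering where each token came from.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<Token>();
        int position = 0;
        int i = 0;
        while (i < input.length()) {
            // Skip any run of whitespace before the next token.
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(input.substring(start, i), position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("the  quick brown fox")) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

A span-style match would report something like "positions 1 to 2"; turning that back into a substring of the original text needs the offsets as well.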

An inverted text index like Lucene is fast precisely because it doesn't have to keep track of this information. I think the best alternative might be to use term vectors, which are essentially a cache of the analyzed tokens for a document.
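
As a rough illustration of the term vector idea, the sketch below keeps a per-document map from each analyzed term to the character offsets where it occurred, so the matching substrings can be looked up after a search. It is plain Java, not the Lucene API: ToyTermVector and offsetsOf are hypothetical names, and the "analysis" is just whitespace splitting plus lowercasing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a per-document term vector (not Lucene code):
// maps each analyzed term to the {startOffset, endOffset} pairs where it
// occurred in the original text.
class ToyTermVector {

    private final Map<String, List<int[]>> offsetsByTerm =
            new HashMap<String, List<int[]>>();

    // "Analyze" the document once, up front, and cache the result.
    ToyTermVector(String text) {
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                List<int[]> offsets = offsetsByTerm.get(term);
                if (offsets == null) {
                    offsets = new ArrayList<int[]>();
                    offsetsByTerm.put(term, offsets);
                }
                offsets.add(new int[] {start, i});
            }
        }
    }

    // Offsets of every occurrence of the term, or an empty list if absent.
    List<int[]> offsetsOf(String term) {
        List<int[]> hits = offsetsByTerm.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<int[]>emptyList() : hits;
    }

    public static void main(String[] args) {
        String doc = "Fox chases fox";
        ToyTermVector tv = new ToyTermVector(doc);
        for (int[] o : tv.offsetsOf("fox")) {
            System.out.println(doc.substring(o[0], o[1]) + " at [" + o[0] + "," + o[1] + ")");
        }
    }
}
```

The cost of analysis is paid once per document at indexing time rather than on every query, and only the documents that actually matched need their cached tokens consulted.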

-Mike
