On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote:
: What I would LOVE is if I could do it in a standard Lucene search like I
: mentioned earlier.
: Hit.doc[0].getHitTokenList() :confused:
: Something like this...
The Query/Scorer APIs don't provide any mechanism for information like
that to be conveyed back up the call chain -- mainly because it's more
heavyweight than most people need.
If you have custom Query/Scorer implementations, you can keep track of
whatever state you want when executing a Query -- in fact the SpanQuery
family of queries does keep track of exactly the type of info you seem to
want, and after executing a query, you can ask it for the "Spans" of any
matching document -- the downside is a loss in query execution
performance (because it takes time/memory to keep track of all the
matches).
Even then, if I'm not mistaken, spans track token _positions_, not
_offsets_ in the original string.
An inverted index like Lucene's is fast precisely because it doesn't
have to keep track of this information. I think the best alternative
might be to use term vectors, which are essentially a per-document cache
of the analyzed tokens and can optionally store each token's position
and character offsets.
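
To make the position/offset distinction concrete, here is a rough sketch
in plain Java. It is NOT the Lucene API: ToyTermVector, Occurrence and
the letters-and-digits tokenizer are all made up for illustration, and a
real Analyzer may tokenize quite differently. It only shows the kind of
per-document data a term vector with positions and offsets would give
you, and how you could map a matched term back to the original string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy stand-in for a per-document term vector; hypothetical, not Lucene. */
public class ToyTermVector {

    /** One occurrence of a term: token position plus character offsets. */
    public static final class Occurrence {
        public final int position;    // 0-based token index (what spans report)
        public final int startOffset; // index of the first char in the text
        public final int endOffset;   // index one past the last char

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return "pos=" + position + " chars=[" + startOffset + "," + endOffset + ")";
        }
    }

    /**
     * Assumed analysis: a token is a maximal run of letters/digits,
     * lowercased. Returns term -> all of its occurrences in the text.
     */
    public static Map<String, List<Occurrence>> build(String text) {
        Map<String, List<Occurrence>> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators; they consume offsets but not positions
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            vector.computeIfAbsent(term, k -> new ArrayList<>())
                  .add(new Occurrence(position++, start, i));
        }
        return vector;
    }

    public static void main(String[] args) {
        String text = "The quick fox, the lazy dog";
        Map<String, List<Occurrence>> vector = build(text);
        // Pretend the query matched the term "the"; recover where it occurs.
        for (Occurrence o : vector.get("the")) {
            System.out.println(o + " -> \""
                    + text.substring(o.startOffset, o.endOffset) + "\"");
        }
    }
}
```

The second "the" is token 3 but starts at character 15, so a position
alone is not enough to highlight the original text unless you also keep
the offsets around somewhere.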
-Mike