Re: Getting tokens from search results. Simple concept

Mike Klaas Fri, 06 Mar 2009 18:45:49 -0800

On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote:

: What I would LOVE is if I could do it in a standard Lucene search like I
: mentioned earlier.
: Hit.doc[0].getHitTokenList() :confused:
: Something like this...

The Query/Scorer APIs don't provide any mechanism for information like
that to be conveyed back up the call chain -- mainly because it's more
heavyweight than most people need.

If you have custom Query/Scorer implementations, you can keep track of
whatever state you want when executing a Query -- in fact the SpanQuery
family of queries do keep track of exactly the type of info you seem to
want, and after executing a query, you can ask it for the "Spans" of any
matching document -- the downside is a loss in performance of query
execution (because it takes time/memory to keep track of all the
matches).

Even then, if I'm not mistaken, spans track token _positions_, not
_offsets_ in the original string.
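
To make the position/offset distinction concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class PositionVsOffset, its Token type and the whitespace-only tokenizer are made up for illustration. A position is the ordinal of the token in the analyzed stream; offsets are character indexes into the original string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tokenizer, NOT part of Lucene: it only illustrates
// the difference between a token's position and its character offsets.
class PositionVsOffset {

    static final class Token {
        final String term;
        final int position;     // ordinal of the token in the token stream
        final int startOffset;  // index of the first character in the source text
        final int endOffset;    // index one past the last character

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Splits on whitespace and records both kinds of information per token.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip any run of whitespace before the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i), position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The double space shifts the offsets but not the positions.
        for (Token t : tokenize("Quick  brown fox")) {
            System.out.println(t.term + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

Knowing only that a match sits at position 1 tells you which token matched, but not where to find it in the original string; for that you need the offsets.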

A reverse text index like Lucene is fast precisely because it doesn't
have to keep track of this information. I think the best alternative
might be to use term vectors, which are essentially a cache of the
analyzed tokens for a document.
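
As a rough illustration of that trade-off, here is a toy model in plain Java. It is an assumption-laden sketch, not Lucene's API or index format: the class TermVectorSketch and its methods are invented, the analyzer is just lowercasing plus whitespace splitting, and the postings are simplified to hold document ids only. The search path consults only the term-to-documents map, while the per-document "term vector" is a separate cache you read after the search to find where the matched term occurred.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model, NOT Lucene: a term -> documents index for searching,
// plus a per-document cache of analyzed tokens with their character offsets.
class TermVectorSketch {
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Search structure: term -> ids of the documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Per-document cache: docId -> (term -> list of {startOffset, endOffset}).
    private final Map<Integer, Map<String, List<int[]>>> termVectors = new HashMap<>();

    void addDocument(int docId, String text) {
        Map<String, List<int[]>> vector = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            vector.computeIfAbsent(term, k -> new ArrayList<>())
                  .add(new int[] {m.start(), m.end()});
        }
        termVectors.put(docId, vector);
    }

    // Fast path: answers "which documents match?" without touching offsets.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<Integer>emptySortedSet() : hits;
    }

    // Slow path, done only for the hits you care about: look up where the
    // term occurred in one document by reading that document's cached tokens.
    List<int[]> offsets(int docId, String term) {
        Map<String, List<int[]>> vector = termVectors.get(docId);
        if (vector == null) {
            return Collections.emptyList();
        }
        List<int[]> found = vector.get(term.toLowerCase(Locale.ROOT));
        return found == null ? Collections.<int[]>emptyList() : found;
    }

    public static void main(String[] args) {
        TermVectorSketch index = new TermVectorSketch();
        index.addDocument(1, "the quick brown fox");
        index.addDocument(2, "The lazy dog the end");
        for (int docId : index.search("the")) {
            for (int[] span : index.offsets(docId, "the")) {
                System.out.println("doc " + docId + ": [" + span[0] + "," + span[1] + ")");
            }
        }
    }
}
```

The point of the split is that the extra bookkeeping is paid per retrieved document rather than during query execution.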


-Mike

