Re: Does Index have a Tokenizer Built into it

John Paul Sondag Wed, 18 Jul 2007 09:14:46 -0700

Is there a way to know beforehand how big to make the array (how many terms
the topic contains in total)?  I'm worried about the efficiency of this, since
I'd have to rebuild every document that is a "hit" on the fly to make a
snippet for each "hit" on the page (say 10 per page).
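
On sizing the array, here is a minimal sketch. It assumes the term vector
data is available as a String[] of terms plus one int[] of positions per
term, which is the shape TermPositionVector's getTerms() and
getTermPositions(int) return. The class and method names below are
hypothetical, and plain arrays stand in for the Lucene objects. The length
is taken as the highest position + 1 rather than the sum of the term
frequencies, since the sum comes up short if the analyzer leaves gaps in
the positions.

```java
import java.util.Arrays;

/** Hypothetical helper: rebuilds a position-indexed array from term-vector style data. */
class PositionArraySizer {

    /** Array length needed to hold every position: highest position + 1 (0 if empty). */
    static int requiredLength(int[][] positionsPerTerm) {
        int max = -1;
        for (int[] positions : positionsPerTerm) {
            for (int p : positions) {
                if (p > max) {
                    max = p;
                }
            }
        }
        return max + 1;
    }

    /** Places each term at its positions; slots with no term stay null. */
    static String[] rebuild(String[] terms, int[][] positionsPerTerm) {
        String[] byPosition = new String[requiredLength(positionsPerTerm)];
        for (int i = 0; i < terms.length; i++) {
            for (int p : positionsPerTerm[i]) {
                byPosition[p] = terms[i];
            }
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // Stand-in data: terms in sorted order, each with its positions.
        String[] terms = {"brown", "fox", "quick", "the"};
        int[][] positions = {{2}, {3}, {1}, {0}};
        System.out.println(Arrays.toString(rebuild(terms, positions)));
    }
}
```

This is one pass over the positions to size the array and one more to fill
it, so the cost per hit grows with the number of tokens in the document.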


Now I have to wonder how storing the TermPositionVectors in the index and
sorting them by position compares to storing the location of the document and
running a tokenizer over it again.  Both give me the result I want in the
end.

Any opinions?

--JP

On 7/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: After indexing I have been able to retrieve the TermPositionVector from the
: index and it has all of the data, but I cannot find a way where given a
: position I can retrieve the term at that position. Which is how I was hoping
: to create my contextual snippets.

there is no easy way to go from a position to a term -- coincidentally there
is a very recent thread on this on java-dev...

http://www.nabble.com/Best-Practices-for-getting-Strings-from-a-position-range-tf4084187.html

...a new API may come out of it, but in the meantime you may be
interested in taking the approach the current highlighter uses (as
mentioned in that thread), of using the TermPositionVector to rebuild the
original TokenStream, then skipping ahead to the positions you are
interested in.
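
A rough sketch of that idea, not the highlighter's actual code: it assumes
the terms and their per-term position arrays have already been pulled out
of the TermPositionVector, flattens them into one list sorted by position,
and then skips ahead to a window around the hit. The class, the Token type,
and the method names are all hypothetical, and plain arrays stand in for
the Lucene objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: rebuild a position-ordered token list and cut a snippet from it. */
class SnippetSketch {

    /** One token: a term and the position it occupied when indexed. */
    static final class Token {
        final int position;
        final String term;

        Token(int position, String term) {
            this.position = position;
            this.term = term;
        }
    }

    /** Flattens per-term position arrays into one list ordered by position. */
    static List<Token> rebuildStream(String[] terms, int[][] positionsPerTerm) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int p : positionsPerTerm[i]) {
                tokens.add(new Token(p, terms[i]));
            }
        }
        tokens.sort(Comparator.comparingInt(t -> t.position));
        return tokens;
    }

    /** Joins the terms whose positions lie within `context` of `hitPosition`. */
    static String snippet(List<Token> stream, int hitPosition, int context) {
        StringBuilder sb = new StringBuilder();
        for (Token t : stream) {
            if (t.position < hitPosition - context) {
                continue; // still before the window: skip ahead
            }
            if (t.position > hitPosition + context) {
                break; // past the window: done
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.term);
        }
        return sb.toString();
    }
}
```

Note that the snippet is built from the indexed terms, so it reflects
whatever the analyzer did to the text (lowercasing, stemming, dropped stop
words) rather than the original wording.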



-Hoss


