Re: Using Lucene for searching tokens, not storing them.

karl wettin Sun, 16 Apr 2006 10:17:07 -0700


On 15 Apr 2006, at 21:32, Paul Elschot wrote:


implements TermPositions {
         public int nextPosition() throws IOException {


This enumerates all positions of the Term in the document,
as returned by the Tokenizer used by the Analyzer.


Aha. And I didn't see the TermPositionVector until now.

This leads me to a new question. How are multiple fields with the same name treated? Are the positions concatenated, or kept in a "z-axis"? I see SpanQuery trouble with both.


Concatenation renders SpanFirst unusable on field instances n > 0:
        [hello,0] [world,1] [foo,2] [bar,3]

"Z-axis" messes up SpanNear, as "hello bar" would then count as a match:
        [hello,0] [world,1]
        [foo,0] [bar,1]
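
To make the trade-off concrete, the two layouts can be modelled in plain Java. This is only a sketch under stated assumptions: the class FieldPositions and its methods are hypothetical, nothing here is Lucene API, and firstWithin and adjacentInOrder are simplified stand-ins for what SpanFirst and SpanNear (zero slop, in order) check on single terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two position layouts; hypothetical, not Lucene API.
class FieldPositions {

    // "Concatenated": positions keep counting across field instances.
    static Map<String, List<Integer>> concatenated(List<List<String>> instances) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (List<String> instance : instances) {
            for (String term : instance) {
                positions.computeIfAbsent(term, t -> new ArrayList<>()).add(pos++);
            }
        }
        return positions;
    }

    // "Z-axis": positions restart at 0 for every field instance.
    static Map<String, List<Integer>> zAxis(List<List<String>> instances) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (List<String> instance : instances) {
            int pos = 0;
            for (String term : instance) {
                positions.computeIfAbsent(term, t -> new ArrayList<>()).add(pos++);
            }
        }
        return positions;
    }

    // SpanFirst-like check: does the term occur at a position below 'end'?
    static boolean firstWithin(Map<String, List<Integer>> positions, String term, int end) {
        for (int p : positions.getOrDefault(term, Collections.emptyList())) {
            if (p < end) {
                return true;
            }
        }
        return false;
    }

    // SpanNear-like check (no slop, in order): is 'second' directly after 'first'?
    static boolean adjacentInOrder(Map<String, List<Integer>> positions,
                                   String first, String second) {
        List<Integer> after = positions.getOrDefault(second, Collections.emptyList());
        for (int p : positions.getOrDefault(first, Collections.emptyList())) {
            if (after.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> instances = Arrays.asList(
                Arrays.asList("hello", "world"),
                Arrays.asList("foo", "bar"));
        Map<String, List<Integer>> c = concatenated(instances);
        Map<String, List<Integer>> z = zAxis(instances);

        System.out.println(firstWithin(c, "foo", 1));             // false: "foo" sits at 2
        System.out.println(firstWithin(z, "foo", 1));             // true
        System.out.println(adjacentInOrder(c, "hello", "bar"));   // false
        System.out.println(adjacentInOrder(z, "hello", "bar"));   // true: the unwanted match
    }
}
```

In this model the concatenated layout loses the "first token of the second instance" match, while the z-axis layout reports "hello bar" as adjacent even though the two terms come from different field instances.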

Hmm... (with double meaning, as this would mean I can't use the term positions to train my hidden Markov models).


Thanks for explaining!

For any interested party: I do this because I have a fairly small corpus under very heavy load. I think there is a lot to win by not creating new instances of whatnot, seeking in the file-centric Directory, parsing pseudo-UTF-8, etc. at query time. I simply store all instances of everything (the index) in a bunch of Lists and Maps. Bits are cheaper than ticks.
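
A minimal sketch of what "the index in a bunch of Lists and Maps" might look like. The class InMemoryIndex and its methods are hypothetical, chosen for illustration; this is not the actual implementation described above and not Lucene API. Everything lives in ordinary collections, so a lookup at query time needs no file seeking or decoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory positional index: term -> document id -> positions.
class InMemoryIndex {

    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Records the position of every token of one document.
    void addDocument(int docId, List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Ids of the documents containing the term, in ascending order.
    Set<Integer> documents(String term) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        if (byDoc == null) {
            return Collections.emptySet();
        }
        return byDoc.keySet();
    }

    // Positions of the term within one document; empty if it does not occur.
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        if (byDoc == null) {
            return Collections.emptyList();
        }
        return byDoc.getOrDefault(docId, Collections.emptyList());
    }
}
```

The obvious cost of this design is memory: every posting is held as a boxed object, which is only reasonable for a small corpus like the one described.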

