On 15 Apr 2006, at 21:32, Paul Elschot wrote:

implements TermPositions {
         public int nextPosition() throws IOException {

This enumerates all positions of the Term in the document
as returned by the Tokenizer used by the Analyzer.

Aha. And I hadn't noticed TermPositionVector until now.

This leads me to a new question: how are multiple fields with the same name treated? Are the positions concatenated, or placed along a "z-axis"? I see SpanQuery trouble with both.

Concatenation renders SpanFirst unusable on field instances n > 0:
        [hello,0] [world,1] [foo,2] [bar,3]

A "z-axis" messes up SpanNear, since "hello bar" would then count as a match:
        [hello,0] [world,1]
        [foo,0] [bar,1]
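To make the two layouts concrete, here is a toy model in plain Java. It is not Lucene code: the class, the `Posting` record and the `spanFirst`/`spanNear` helpers are hypothetical stand-ins that only mimic what SpanFirst and an ordered SpanNear check, assuming the index keeps nothing but a term's position (not which field value it came from).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene code) of position layouts for a multi-valued field. */
public class PositionLayouts {

    /** A term occurrence: which field value it came from, and its position. */
    record Posting(String term, int value, int position) {}

    /** Concatenated layout: positions keep counting across field values. */
    static List<Posting> concatenated(List<List<String>> values) {
        List<Posting> out = new ArrayList<>();
        int pos = 0;
        for (int v = 0; v < values.size(); v++) {
            for (String term : values.get(v)) {
                out.add(new Posting(term, v, pos++));
            }
        }
        return out;
    }

    /** "Z-axis" layout: every field value restarts at position 0. */
    static List<Posting> zAxis(List<List<String>> values) {
        List<Posting> out = new ArrayList<>();
        for (int v = 0; v < values.size(); v++) {
            int pos = 0;
            for (String term : values.get(v)) {
                out.add(new Posting(term, v, pos++));
            }
        }
        return out;
    }

    // The checks below look only at position(), never at value(),
    // modelling an index that stores positions alone.

    /** SpanFirst-like check: does the term occur at a position < end? */
    static boolean spanFirst(List<Posting> postings, String term, int end) {
        for (Posting p : postings) {
            if (p.term().equals(term) && p.position() < end) return true;
        }
        return false;
    }

    /** Ordered SpanNear-like check: b follows a with at most slop positions between. */
    static boolean spanNear(List<Posting> postings, String a, String b, int slop) {
        for (Posting pa : postings) {
            if (!pa.term().equals(a)) continue;
            for (Posting pb : postings) {
                int gap = pb.position() - pa.position() - 1;
                if (pb.term().equals(b) && gap >= 0 && gap <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(
                List.of("hello", "world"),
                List.of("foo", "bar"));

        // Concatenated: "foo" starts the second value but sits at position 2.
        System.out.println("concat spanFirst(foo, 1) = "
                + spanFirst(concatenated(doc), "foo", 1));
        // Z-axis: "hello"@0 and "bar"@1 look adjacent across values.
        System.out.println("zAxis spanNear(hello, bar, 0) = "
                + spanNear(zAxis(doc), "hello", "bar", 0));
    }
}
```

Running `main` prints `concat spanFirst(foo, 1) = false` and `zAxis spanNear(hello, bar, 0) = true`, which are the two failure modes described above.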

Hmm... (with double meaning, since this would mean I can't use the term positions to train my hidden Markov models.)
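As a rough illustration of why position layout matters for sequence models, the sketch below rebuilds token order from (term, position) pairs and counts bigram transitions, the kind of statistic an HMM's transition probabilities are estimated from. `TransitionCounts` and `count` are hypothetical names, not any library's API; the sketch simply gives up when two terms share a position.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy sketch (not Lucene code): bigram transition counts from term positions. */
public class TransitionCounts {

    /**
     * Orders terms by position and counts "a b" transitions between
     * consecutive tokens. Returns null if two terms claim the same
     * position, since their relative order is then unknown.
     */
    static Map<String, Integer> count(List<Map.Entry<String, Integer>> termPositions) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Integer> e : termPositions) {
            if (byPosition.put(e.getValue(), e.getKey()) != null) {
                return null; // position collision, as in the "z-axis" layout
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        String prev = null;
        for (String term : byPosition.values()) { // ascending position order
            if (prev != null) counts.merge(prev + " " + term, 1, Integer::sum);
            prev = term;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Concatenated layout of the values "hello world" and "foo bar".
        System.out.println(count(List.of(
                Map.entry("hello", 0), Map.entry("world", 1),
                Map.entry("foo", 2), Map.entry("bar", 3))));
    }
}
```

With the concatenated layout the sketch counts a spurious "world foo" transition across the boundary between field values; with the z-axis layout it cannot order the tokens at all and returns null.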

Thanks for explaining!

For any interested party: I do this because I have a fairly small corpus under very heavy load. I think there is a lot to gain by not creating new instances of this and that, seeking in the file-centric Directory, parsing pseudo-UTF-8, etc. at query time. I simply keep every instance of everything (the index) in a bunch of Lists and Maps. Bits are cheaper than ticks.
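A minimal sketch of what "the index in a bunch of Lists and Maps" could look like, assuming documents arrive already tokenized. `InMemoryPositions`, `add` and `positions` are hypothetical names, not part of Lucene; the point is that a query-time lookup is just two map gets, with no Directory seeks or string decoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical fully in-memory positional index built from plain Maps and Lists. */
public class InMemoryPositions {

    // term -> (document number, sorted -> positions of the term in that document)
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexes one document given its already-tokenized terms. */
    public void add(int doc, List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                    .computeIfAbsent(doc, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Read-only positions of term in doc; empty if the term is absent. */
    public List<Integer> positions(String term, int doc) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null) return Collections.emptyList();
        return Collections.unmodifiableList(
                docs.getOrDefault(doc, Collections.emptyList()));
    }

    public static void main(String[] args) {
        InMemoryPositions idx = new InMemoryPositions();
        idx.add(0, List.of("hello", "world", "hello"));
        System.out.println(idx.positions("hello", 0)); // [0, 2]
    }
}
```

The trade-off is the one stated above: memory is spent holding every posting as objects so that no work is repeated per query.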