On 15 Apr 2006, at 21:32, Paul Elschot wrote:
implements TermPositions {
public int nextPosition() throws IOException {
This enumerates all positions of the Term in the document,
as returned by the Tokenizer used by the Analyzer.
Aha. And I didn't see the TermPositionVector until now.
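Here is a minimal Java sketch of what the quoted nextPosition() description means, built over an in-memory structure. It does not implement Lucene's TermPositions interface. The class InMemoryPositions, its TermCursor, and the lowercase whitespace split standing in for the Analyzer/Tokenizer are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the per-document part of TermPositions. */
final class InMemoryPositions {

    // term -> ordered list of positions at which it occurs in this one document
    private final Map<String, List<Integer>> positions = new HashMap<>();

    /** Assumption: a lowercase whitespace split stands in for the Analyzer/Tokenizer. */
    InMemoryPositions(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
    }

    /** Cursor over one term's positions, loosely mimicking freq()/nextPosition(). */
    final class TermCursor {
        private final List<Integer> list;
        private int next = 0;

        TermCursor(List<Integer> list) {
            this.list = list;
        }

        /** Number of occurrences of the term in the document. */
        public int freq() {
            return list.size();
        }

        /** Returns the next position in increasing order; call at most freq() times. */
        public int nextPosition() {
            if (next >= list.size()) {
                throw new IllegalStateException("no more positions");
            }
            return list.get(next++);
        }
    }

    /** Returns a cursor for the term (freq() == 0 if the term is absent). */
    public TermCursor termPositions(String term) {
        return new TermCursor(positions.getOrDefault(term, new ArrayList<>()));
    }

    public static void main(String[] args) {
        InMemoryPositions doc = new InMemoryPositions("to be or not to be");
        TermCursor be = doc.termPositions("be");
        for (int i = 0; i < be.freq(); i++) {
            System.out.println(be.nextPosition()); // prints 1, then 5
        }
    }
}
```

Each call to nextPosition() hands back the next position of the term within the one document, in increasing order, so "be" in "to be or not to be" yields 1 and then 5.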
This leads me to a new question. How are multiple fields with the same
name treated? Are the positions concatenated, or laid out along a "z-axis"
(each instance restarting at position 0)? I see SpanQuery troubles with both.
Concatenation renders SpanFirst unusable on field instances n > 0:
[hello,0] [world,1] [foo,2] [bar,3]
A "z-axis" messes up SpanNear, as "hello bar" would then count as a match:
[hello,0] [world,1]
[foo,0] [bar,1]
Hmm... (with double meaning, as this would mean I can't use the term
positions to train my hidden Markov models.)
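The two layouts can be compared with a small hypothetical Java sketch (not Lucene code). FieldInstancePositions.index numbers the tokens of several same-named field instances either by concatenation or by restarting at 0 per instance. first() only approximates a single-term SpanFirst check (position < end), and adjacent() only approximates an in-order SpanNear with zero slop. All names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: numbering positions across same-named field instances. */
final class FieldInstancePositions {

    /**
     * Maps each term to its positions. With concatenate == true, each instance
     * continues where the previous one ended; otherwise every instance restarts
     * at 0 (the "z-axis" layout).
     */
    static Map<String, List<Integer>> index(List<String[]> instances, boolean concatenate) {
        Map<String, List<Integer>> result = new TreeMap<>(); // sorted for readable output
        int offset = 0;
        for (String[] tokens : instances) {
            int base = concatenate ? offset : 0;
            for (int i = 0; i < tokens.length; i++) {
                result.computeIfAbsent(tokens[i], t -> new ArrayList<>()).add(base + i);
            }
            offset += tokens.length;
        }
        return result;
    }

    /** Roughly a single-term SpanFirst: does the term occur before position end? */
    static boolean first(Map<String, List<Integer>> idx, String term, int end) {
        for (int p : idx.getOrDefault(term, Collections.emptyList())) {
            if (p < end) {
                return true;
            }
        }
        return false;
    }

    /** Roughly an in-order SpanNear with slop 0: is b ever directly after a? */
    static boolean adjacent(Map<String, List<Integer>> idx, String a, String b) {
        List<Integer> after = idx.getOrDefault(b, Collections.emptyList());
        for (int p : idx.getOrDefault(a, Collections.emptyList())) {
            if (after.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> instances = Arrays.asList(
                new String[] {"hello", "world"},
                new String[] {"foo", "bar"});
        Map<String, List<Integer>> concat = index(instances, true);
        Map<String, List<Integer>> zAxis = index(instances, false);
        System.out.println(concat);                           // {bar=[3], foo=[2], hello=[0], world=[1]}
        System.out.println(zAxis);                            // {bar=[1], foo=[0], hello=[0], world=[1]}
        System.out.println(first(concat, "foo", 1));          // false
        System.out.println(first(zAxis, "foo", 1));           // true
        System.out.println(adjacent(concat, "hello", "bar")); // false
        System.out.println(adjacent(zAxis, "hello", "bar"));  // true
    }
}
```

With concatenation, foo lands at position 2 and fails first(..., 1). With the "z-axis" layout, hello (0) and bar (1) become adjacent even though they come from different field instances.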
Thanks for explaining!
For any interested party: I'm doing this because I have a fairly small
corpus under very heavy load. I think there is a lot to gain by not
creating new instances of whatnot, seeking in the file-centric
Directory, parsing pseudo-UTF-8, etc. at query time. I simply store
all instances of everything (the index) in a bunch of Lists and Maps.
Bits are cheaper than ticks.
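Here is a minimal sketch of the kind of structure described, assuming analyzed tokens are already available. It is a positional inverted index held entirely in Maps and Lists, mapping term -> document number -> positions. This is a hypothetical illustration: MemoryPostings and its methods are invented here and are not the actual code behind this mail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical all-in-memory positional index: term -> (document number -> positions).
 * Everything is built at indexing time, so query time is plain map and list lookups.
 */
final class MemoryPostings {

    // TreeMap keeps each term's documents in increasing document-number order
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new HashMap<>();

    /** Adds one document; tokens are assumed to be already analyzed. */
    void addDocument(int doc, String... tokens) {
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(doc, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Number of documents containing the term. */
    int docFreq(String term) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Positions of the term in one document, in increasing order (possibly empty). */
    List<Integer> positions(String term, int doc) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null || !docs.containsKey(doc)) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(docs.get(doc));
    }

    public static void main(String[] args) {
        MemoryPostings index = new MemoryPostings();
        index.addDocument(0, "hello", "world");
        index.addDocument(1, "hello", "foo", "hello");
        System.out.println(index.docFreq("hello"));      // 2
        System.out.println(index.positions("hello", 1)); // [0, 2]
        System.out.println(index.positions("foo", 0));   // []
    }
}
```

Trading memory for CPU this way means a query-time lookup is a hash lookup plus a tree lookup, with no file seeks or string decoding.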