Re: positional token info

Steve Rowe Tue, 21 Oct 2003 07:45:45 -0700

Erik,

I've submitted a patch (bug #23730) very similar to yours, in response to a request to fix phrase queries that match where they should not:

<URL:http://mail-archive.com/[EMAIL PROTECTED]/msg04349.html>

Bug #23730:
<URL:http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730>

> But, how would you actually *use* an index that was indexed with the
> holes noted by > 1 position increments?

As the lucene-user email linked above notes, setting the position increment prevents false phrase matches: terms that were separated by a skipped token are no longer indexed at adjacent positions.
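
To make the effect concrete, here is a minimal sketch in plain Python. It is not the Lucene API and not the patch itself, only a toy model of positions: the helper names (positions, phrase_matches) and the choice to start the position counter at -1 are assumptions made for illustration. It indexes "the sky is blue" with "the" and "is" dropped, once with every increment left at 1 and once with the increments widened to cover the dropped words.

```python
def positions(tokens):
    """Turn (term, position_increment) pairs into (term, position) pairs.

    The counter starts at -1 so that a first increment of 1 gives position 0
    (a modelling assumption for this sketch).
    """
    pos = -1
    out = []
    for term, inc in tokens:
        pos += inc
        out.append((term, pos))
    return out


def phrase_matches(indexed, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    by_term = {}
    for term, pos in indexed:
        by_term.setdefault(term, set()).add(pos)
    return any(
        all(start + i in by_term.get(term, set()) for i, term in enumerate(phrase))
        for start in by_term.get(phrase[0], set())
    )


# "the sky is blue" with the stop words "the" and "is" removed.
collapsed = positions([("sky", 1), ("blue", 1)])  # holes discarded
with_gaps = positions([("sky", 2), ("blue", 2)])  # holes kept as increments of 2

print(phrase_matches(collapsed, ["sky", "blue"]))
print(phrase_matches(with_gaps, ["sky", "blue"]))
```

In this model the phrase "sky blue" matches the collapsed index but not the one that keeps the gaps, which is the false match the increment is meant to rule out.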

Steve Rowe

Erik Hatcher wrote:

On Tuesday, October 21, 2003, at 03:36 AM, Pierrick Brihaye wrote:

The basic idea is to have several tokens at the same position (i.e. setPositionIncrement(0)) which are different possible stems for the same word.
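
A rough illustration of that idea, again as a plain-Python toy model rather than Lucene code: the helper names and the example stems ("saw" and "see" for the same word) are hypothetical. An increment of 0 places the second stem at the same position as the first, so a phrase can match through either one.

```python
def index_positions(tokens):
    """Map each term to the set of positions it occupies.

    tokens is a list of (term, position_increment) pairs. The counter starts
    at -1 so that a first increment of 1 gives position 0 (an assumption of
    this sketch).
    """
    index = {}
    pos = -1
    for term, inc in tokens:
        pos += inc
        index.setdefault(term, set()).add(pos)
    return index


def phrase_matches(index, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    first, rest = phrase[0], phrase[1:]
    return any(
        all(start + offset in index.get(term, ())
            for offset, term in enumerate(rest, start=1))
        for start in index.get(first, ())
    )


# "i saw birds": "see" is emitted as a second stem of "saw" with increment 0.
tokens = [("i", 1), ("saw", 1), ("see", 0), ("birds", 1)]
index = index_positions(tokens)

print(phrase_matches(index, ["saw", "birds"]))
print(phrase_matches(index, ["see", "birds"]))
print(phrase_matches(index, ["saw", "see"]))
```

Both "saw birds" and "see birds" match here, while "saw see" does not, because the two stems share one position instead of following each other.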

Right. Like I said, I recognize the benefits of using a position increment of 0.

I certainly see the benefit of putting tokens into zero-increment positions, but are increments of 2 or more at all useful?

Who knows? It may be interesting to keep track of the *presence* of "empty words", e.g. "[the] sky [is] blue", "[the] sky [is] [really] blue", "[the] sky [is] [that] [really] blue". The traditional reduction to "sky blue" may be over-simplistic in some cases...

But, how would you actually *use* an index that was indexed with the holes noted by > 1 position increments?

Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
