I've submitted a patch (Bug #23730) very similar to yours, in response to a request to stop phrases from matching where they should not:
<URL:http://mail-archive.com/[EMAIL PROTECTED]/msg04349.html>
Bug #23730: <URL:http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730>
> But, how would you actually *use* an index that was indexed with the holes noted by > 1 position increments?
As the lucene-user email linked above notes, setting the position increment to account for the removed tokens prevents false phrase matches.
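To make that concrete, here is a rough sketch in plain Python. It is only a toy model, not Lucene code: the helper names positions and phrase_match are made up for illustration, and I am assuming that positions start at -1 and that a phrase matches only when its terms sit at consecutive positions.

```python
def positions(tokens):
    """Turn (term, increment) pairs into (term, absolute position) pairs.

    Assumption for this toy model: the running position starts at -1,
    so a first token with increment 1 lands at position 0.
    """
    pos = -1
    out = []
    for term, inc in tokens:
        pos += inc
        out.append((term, pos))
    return out


def phrase_match(indexed, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    where = {}
    for term, pos in indexed:
        where.setdefault(term, set()).add(pos)
    first, rest = phrase[0], phrase[1:]
    return any(
        all(start + i + 1 in where.get(t, ()) for i, t in enumerate(rest))
        for start in where.get(first, ())
    )


# "the sky is blue" with the stop words "the" and "is" removed.
collapsed = positions([("sky", 1), ("blue", 1)])   # holes thrown away
with_holes = positions([("sky", 2), ("blue", 2)])  # holes kept as increments of 2

print(phrase_match(collapsed, ["sky", "blue"]))    # adjacent, so the phrase matches
print(phrase_match(with_holes, ["sky", "blue"]))   # a gap separates them, so it does not
```

With the holes collapsed, the phrase "sky blue" matches text that never contained those two words side by side. Keeping the holes is what rules that match out.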
Steve Rowe
Erik Hatcher wrote:
On Tuesday, October 21, 2003, at 03:36 AM, Pierrick Brihaye wrote:
The basic idea is to have several tokens at the same position (i.e. setPositionIncrement(0)) which are different possible stems for the same word.
Right. Like I said, I recognize the benefits of using a position increment of 0.
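For readers following along, the zero-increment case can be sketched the same way. Again this is a toy model in plain Python rather than Lucene code; stack_positions and phrase_match are hypothetical helpers, and the "saw"/"see" pair is just an invented example of two candidate stems for one word.

```python
from collections import defaultdict


def stack_positions(tokens):
    """Group terms by absolute position; an increment of 0 stacks a term
    on top of the previous one. Positions start at -1 (toy-model assumption)."""
    by_pos = defaultdict(list)
    pos = -1
    for term, inc in tokens:
        pos += inc
        by_pos[pos].append(term)
    return dict(by_pos)


def phrase_match(by_pos, phrase):
    """Return True if each phrase term is found at consecutive positions."""
    return any(
        all(term in by_pos.get(start + i, ()) for i, term in enumerate(phrase))
        for start in by_pos
    )


# "he saw birds": the surface form "saw" and the alternative stem "see"
# share one position because "see" is emitted with increment 0.
index = stack_positions([("he", 1), ("saw", 1), ("see", 0), ("birds", 1)])

print(index)
print(phrase_match(index, ["see", "birds"]))
print(phrase_match(index, ["saw", "birds"]))
```

Either stem can take part in a phrase query because both occupy the same slot between "he" and "birds".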
I certainly see the benefit of putting tokens into zero-increment positions, but are increments of 2 or more at all useful?
Who knows? It may be interesting to keep track of the *presence* of "empty words", e.g. "[the] sky [is] blue", "[the] sky [is] [really] blue", "[the] sky [is] [that] [really] blue". The traditional reduction to "sky blue" may be over-simplistic in some cases...
But, how would you actually *use* an index that was indexed with the holes noted by > 1 position increments?
Erik