DO NOT REPLY [Bug 23730] New: - Token positioning disallows phrase matching across stopwords

bugzilla Fri, 10 Oct 2003 08:37:28 -0700


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730


           Summary: Token positioning disallows phrase matching across
                    stopwords
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
               URL: http://www.mail-archive.com/lucene-
                    [EMAIL PROTECTED]/msg04349.html
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Analysis
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


The URL above points to an archived Lucene-User mailing list post in which a new
user is surprised that phrase queries still match when stopwords appear between
the phrase tokens in the original text.

I think the default StopFilter.java implementation should adopt the
position-adjusting behavior described in the Lucene API docs:
<URL:http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/Token.html#setPositionIncrement(int)>
"Set [the position increment] to values greater than one to inhibit exact phrase
matches. If, for example, one does not want phrases to match across removed stop
words, then one could build a stop word filter that removes stop words and also
sets the increment to the number of stop words removed before each non-stop
word. Then exact phrase queries will only match when the terms occur with no
intervening stop words."
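
To make the request concrete, here is a minimal, self-contained sketch of the
position-adjusting idea. It is not the StopFilter source and does not use the
Lucene classes; Tok, removeStopWords and positions are hypothetical names made
up for illustration. It also assumes the increment should be 1 plus the number
of stop words removed, so that each removed word leaves a gap of one position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopPositionSketch {

    /** Hypothetical stand-in for a token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Drops stop words and records how many were skipped before each kept word.
     * Assumption: increment = 1 + number of stop words removed just before it.
     */
    static List<Tok> removeStopWords(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String w : words) {
            if (stopWords.contains(w)) {
                skipped++;          // remember the gap instead of closing it
                continue;
            }
            out.add(new Tok(w, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    /** Turns increments into absolute positions; the first token lands at 0. */
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).positionIncrement;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("and"));
        List<Tok> toks = removeStopWords(Arrays.asList("rock", "and", "roll"), stops);
        int[] pos = positions(toks);
        for (int i = 0; i < toks.size(); i++) {
            System.out.println(toks.get(i).term + " @ " + pos[i]);
        }
    }
}
```

With these assumptions, "rock and roll" yields "rock" at position 0 and "roll"
at position 2, so an exact phrase query for "rock roll" (which needs adjacent
positions) would no longer match.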

