DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730

Token positioning disallows phrase matching across stopwords

           Summary: Token positioning disallows phrase matching across stopwords
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
               URL: http://www.mail-archive.com/lucene- [EMAIL PROTECTED]/msg04349.html
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Analysis
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

The URL I gave is to an archived Lucene-User mailing list post, in which a new user describes surprise at phrase queries succeeding when stopwords appear between phrase tokens in the original text.

I think that the default StopFilter.java implementation should implement the position adjusting behavior described in the Lucene API docs:

<URL:http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/Token.html#setPositionIncrement(int)>

"Set [the position increment] to values greater than one to inhibit exact phrase matches. If, for example, one does not want phrases to match across removed stop words, then one could build a stop word filter that removes stop words and also sets the increment to the number of stop words removed before each non-stop word. Then exact phrase queries will only match when the terms occur with no intervening stop words."

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
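
To make the requested behavior concrete, here is a minimal sketch of the position bookkeeping in plain Java. It deliberately does not use the Lucene API: the class PositionAwareStopFilterSketch, the Tok type, and the filter method are hypothetical names made up for illustration, and this is not the actual StopFilter.java code. It also assumes one reading of the quoted docs, namely that each kept token's increment is its normal increment of 1 plus the number of stop words removed just before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PositionAwareStopFilterSketch {

    /** Hypothetical stand-in for a Lucene Token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Drops stop words and records the gap they leave behind.
     * Assumption: increment = 1 + number of stop words removed
     * immediately before the kept token.
     */
    static List<Tok> filter(List<String> terms, Set<String> stopWords) {
        List<Tok> kept = new ArrayList<>();
        int skipped = 0; // stop words removed since the last kept token
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;
                continue;
            }
            kept.add(new Tok(term, 1 + skipped));
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "a"));
        List<String> terms = Arrays.asList("statue", "of", "liberty");
        // Prints [statue(+1), liberty(+2)]
        System.out.println(filter(terms, stopWords));
    }
}
```

With increments like these, "statue" and "liberty" sit two positions apart in the index, so an exact phrase query for "statue liberty" would no longer match text that reads "statue of liberty". A filter that leaves every increment at 1 places them at adjacent positions, which is the surprising match described in the mailing list post.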
