[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900057#comment-16900057 ]
Steve Rowe commented on SOLR-6468:
----------------------------------

Hi Alexander,

I'm not sure about the performance impact; you'd have to test to see how it performs on your own data.

The only downside I know of: since you're removing content prior to tokenization, if the boundaries you use for MappingCharFilter are not the same as those used in tokenization, or if your replacement string impacts tokenization, you may see some differences from the behavior of your analysis chain when using StopFilter.

My recommendation: test using some real world data.

> Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6468
>                 URL: https://issues.apache.org/jira/browse/SOLR-6468
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.8.1, 4.9, 5.3.1, 6.6.2, 7.1
>            Reporter: Alexander S.
>            Priority: Major
>         Attachments: FieldValue.png
>
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
> <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
>     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}
> * Stop words:
> {code}
> http
> https
> ftp
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these does:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here:
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I didn't find a clear explanation how can we upgrade Solr, there's no any replacement or a workaround to this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.
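
For illustration only, here is a rough sketch of what the MappingCharFilter approach discussed in the comment above might look like for the field type quoted from the issue. It is not taken from the issue itself: the mapping file name url_prefix_mapping.txt and its entries are hypothetical, and dropping StopFilterFactory from the chain is an assumption about how the workaround would be applied.

{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- Assumption: strip URL prefixes from the raw text before tokenization,
         so no position gap is left behind for the removed terms -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="url_prefix_mapping.txt" />
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}

A possible url_prefix_mapping.txt (hypothetical contents), mapping each prefix to the empty string:

{code}
"https://" => ""
"http://" => ""
"ftp://" => ""
"www." => ""
{code}

This also shows the caveats raised in the comment. The char filter matches raw character sequences rather than tokens, so an entry such as "www." would be removed wherever that sequence occurs in the text, not only where a tokenizer would have produced a standalone "www" token. The match is also on exact characters, so unlike StopFilterFactory with ignoreCase="true", an upper-case "HTTP://" would not be stripped by the entries above. Testing against real data, as recommended, is the way to find out whether either difference matters.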