[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

Nikhil Dhiman (JIRA) Tue, 13 Feb 2018 22:48:21 -0800

    [ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363554#comment-16363554 ]


Nikhil Dhiman commented on SOLR-6468:
-------------------------------------

Hi [~aheaven]

I am facing the same problem: I cannot get the expected results because of 
the position gap left by StopFilterFactory. You mentioned that it worked for 
you after setting enablePositionIncrements="false" and 
luceneMatchVersion="4.3" on the stop filter factory. I tried the same, but I 
get an "Error instantiating class" error while creating the core. I am 
already on Lucene version 6.1. The exact config changes are below for your 
reference. Please suggest how I can enable the enablePositionIncrements 
option. Thanks in advance!


{panel:title=Config Changes}
<fieldType class="solr.TextField" name="text_general" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="false" luceneMatchVersion="4.3"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specialchars.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="false" luceneMatchVersion="4.3"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <similarity class="solr.IBSimilarityFactory">
    <str name="distribution">LL</str>
    <str name="lambda">DF</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>
{panel}
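
To narrow down which component triggers "Error instantiating class", it may 
help to list every analysis component that carries the two attributes added 
in this change, so that each one can be toggled in isolation. The following 
is only a minimal sketch: find_suspect_filters and SUSPECT_ATTRIBUTES are 
hypothetical names, SAMPLE is a shortened copy of the query analyzer above, 
and the idea that these attributes are what the core rejects is an 
assumption, not something confirmed here.

```python
import xml.etree.ElementTree as ET

# Attributes added in the config change above (assumed to be the suspects).
SUSPECT_ATTRIBUTES = ("enablePositionIncrements", "luceneMatchVersion")

# Shortened, hypothetical copy of the fieldType from the panel above.
SAMPLE = """
<fieldType class="solr.TextField" name="text_general" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="false" luceneMatchVersion="4.3"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""


def find_suspect_filters(field_type_xml):
    """Return (analyzer type, component class, suspect attributes) tuples."""
    root = ET.fromstring(field_type_xml)
    hits = []
    for analyzer in root.iter("analyzer"):
        # Direct children are the charFilter, tokenizer and filter elements.
        for component in analyzer:
            present = [a for a in SUSPECT_ATTRIBUTES if a in component.attrib]
            if present:
                hits.append((analyzer.get("type"), component.get("class"), present))
    return hits


for analyzer_type, class_name, attributes in find_suspect_filters(SAMPLE):
    print(analyzer_type, class_name, ", ".join(attributes))
```

Removing the reported attributes one at a time and reloading the core would 
then show which of them, if any, the error depends on.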


> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6468
>                 URL: https://issues.apache.org/jira/browse/SOLR-6468
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2
>            Reporter: Alexander S.
>            Priority: Major
>         Attachments: FieldValue.png
>
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
> <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
> autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
>     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
> ignoreCase="true" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these do:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: 
> http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: 
> http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: 
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I did not find a clear explanation of how we can upgrade Solr. There is no 
> replacement or workaround for this, so this is not just a major change but 
> a major disrespect to all existing Solr users who are using this feature.
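
The "?" in the debug output quoted above marks a position that the removed 
stop word still occupies. The sketch below is a simplified model of that 
bookkeeping, not Lucene code: analyze and render_phrase are hypothetical 
helpers, the tokenizer is approximated with a regular-expression split on 
[^\w]+, and keep_position_gaps is an illustrative switch standing in for the 
two behaviours discussed in the issue.

```python
import re

# Stop words listed in the issue (url_stopwords.txt).
STOP_WORDS = {"http", "https", "ftp", "www"}


def analyze(text, keep_position_gaps=True):
    """Return (position, term) pairs for text.

    Approximates the words_ngram chain from the issue: split on runs of
    non-word characters, drop stop words ignoring case, then lowercase.
    """
    tokens = [t for t in re.split(r"[^\w]+", text) if t]
    result = []
    position = -1
    increment = 1
    for token in tokens:
        term = token.lower()
        if term in STOP_WORDS:
            if keep_position_gaps:
                increment += 1  # the removed token still consumes a position
            continue
        position += increment
        increment = 1
        result.append((position, term))
    return result


def render_phrase(pairs):
    """Join terms in position order, writing '?' for every empty position."""
    slots = []
    expected = 0
    for position, term in pairs:
        slots.extend("?" * (position - expected))
        slots.append(term)
        expected = position + 1
    return " ".join(slots)


query = "https://twitter.com/testuser"
print(render_phrase(analyze(query)))                            # ? twitter com testuser
print(render_phrase(analyze(query, keep_position_gaps=False)))  # twitter com testuser
```

With gaps kept, "twitter" sits at position 1 in the query, while the indexed 
value twitter.com/testuser has it at position 0, which is the difference 
between the two parsedquery_toString values shown in the issue.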



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

