Uwe Schindler created LUCENE-7444:
-------------------------------------

             Summary: Remove StopFilter from StandardAnalyzer in Lucene-Core
                 Key: LUCENE-7444
                 URL: https://issues.apache.org/jira/browse/LUCENE-7444
             Project: Lucene - Core
          Issue Type: Task
          Components: core/other, modules/analysis
    Affects Versions: 6.2
            Reporter: Uwe Schindler


Yonik said on LUCENE-7318:

{quote}
bq. I think it would make a good default for most Lucene users, and we should 
graduate it from the analyzers module into core, and make it the default for 
IndexWriter.

This "StandardAnalyzer" is specific to English, as it removes English stopwords.
That seems to be an odd choice now for a few reasons:
- It was argued in the past (rather vehemently) that Solr should not prefer 
English in its default "text" field
- AFAIK, removing stopwords is no longer considered best practice.

Given that removal of English stopwords is the only thing that really makes 
this analyzer English-centric (and given the negative impact that can have on 
other languages), it seems like the stopword filter should be removed from 
StandardAnalyzer.
{quote}

When trying to fix the backwards incompatibility issues in LUCENE-7318, it 
looks like most of the unrelated code that was moved from the analysis module 
to core (changing package names along the way!!!! :( ) was related to word list 
loading. If we follow Yonik's suggestion, we can revert all those changes. I 
agree with him: a "universal" analyzer should not have any language-specific 
stop words.
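
To illustrate the "negative impact on other languages" point, here is a minimal 
plain-Java sketch. It does not use Lucene at all: the class StopwordSketch, the 
method analyze() and the ten-word stop list are hypothetical stand-ins for a 
tokenize / lowercase / stop-filter chain, and the list is only an illustrative 
handful of common English stop words, not Lucene's actual set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer chain; NOT Lucene code.
class StopwordSketch {
    // Illustrative subset of English stop words (an assumption, not Lucene's list).
    static final Set<String> ENGLISH_STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "in", "is", "no", "the", "to", "was", "will"));

    // Split on anything that is not a letter or digit, lowercase each token,
    // and drop every token contained in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element on leading separators
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // German: "What does he want in Berlin"
        String german = "Was will er in Berlin";
        // English stop list removes the German words "was", "will" and "in".
        System.out.println(analyze(german, ENGLISH_STOP_WORDS));
        // Without a stop list every token survives.
        System.out.println(analyze(german, new HashSet<>()));
    }
}
```

With the English list, the German question word "was" and the verb "will" are 
dropped only because they happen to be spelled like English stop words; with an 
empty stop set the same chain is language-neutral.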

The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of it 
to Lucene core and leaving the analysis module self-contained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
