[ 
https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452342#comment-15452342
 ] 

Yonik Seeley commented on LUCENE-7318:
--------------------------------------

bq. I think it would make a good default for most Lucene users, and we should 
graduate it from the analyzers module into core, and make it the default for 
IndexWriter.

This "StandardAnalyzer" is specific to English, as it removes English stopwords.
That seems to be an odd choice now for a few reasons:
- It was argued in the past (rather vehemently) that Solr should not prefer 
English in its default "text" field
- AFAIK, removing stopwords is no longer considered best practice.

Given that removal of English stopwords is the only thing that really makes 
this analyzer English-centric (and given the negative impact that can have on 
other languages), it seems like the stopword filter should be removed from 
StandardAnalyzer.
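
To make the impact on other languages concrete, here is a minimal sketch in plain Java (no Lucene dependency). The class name StopwordDemo, the analyze helper, and the whitespace tokenization are hypothetical simplifications, and the word list is my assumed copy of the default English stop set, so check it against the Lucene version in use. It only illustrates how an English stop list can swallow ordinary words from another language:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer chain: lowercase, split, drop stopwords.
final class StopwordDemo {
    // Assumption: mirrors Lucene's default English stop set; verify before relying on it.
    static final Set<String> ENGLISH_STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with");

    // Lowercases the text, splits on whitespace, and removes any token in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // German for "what does he want": "was" and "will" are content-bearing here.
        String german = "was will er";
        System.out.println(analyze(german, ENGLISH_STOP_WORDS)); // prints [er]
        System.out.println(analyze(german, Set.of()));           // prints [was, will, er]
    }
}
```

With Lucene itself, I believe the equivalent of the second call is to construct the analyzer with an empty stop set, i.e. new StandardAnalyzer(CharArraySet.EMPTY_SET), which is what a stopword-free default would amount to.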

> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
>                 Key: LUCENE-7318
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7318
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the 
> analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode 
> Text Segmentation).  It's also much faster than it used to be, since it 
> switched to JFlex a while back.  Many bug fixes, etc.
> I think it would make a good default for most Lucene users, and we should 
> graduate it from the analyzers module into core, and make it the default for 
> {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to 
> get started with Lucene ... we don't make them dig through the codecs module 
> to find a good default codec ...



