[ 
https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-7318:
---------------------------------------
    Attachment: LUCENE-7318.patch

Rote patch, moving {{StandardAnalyzer}} and {{StandardTokenizer}}, and the
utility classes they use, to core's oal.analysis package.

I left {{ClassicAnalyzer}} and {{UAX29URLEmailTokenizer}} in the
analysis/common module.

"ant test" passes but precommit is still angry about some javadocs
... I'll iterate.

The one non-rote change I made was to move
{{ENGLISH_STOP_WORDS_SET}} from {{StopAnalyzer}} (still in the analyzers
module) to {{StandardAnalyzer}}.

I also added a "jflex" target to core's build.xml, to regenerate the
tokenizer.

The factories, like {{ClassicAnalyzer}}, stay in the analysis/common
module.


> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
>                 Key: LUCENE-7318
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7318
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the 
> analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode 
> Text Segmentation).  It's also much faster than it used to be, since it 
> switched to JFlex a while back.  Many bug fixes, etc.
> I think it would make a good default for most Lucene users, and we should 
> graduate it from the analyzers module into core, and make it the default for 
> {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to 
> get started with Lucene ... we don't make them dig through the codecs module 
> to find a good default codec ...


