[jira] [Commented] (LUCENE-7318) Graduate StandardAnalyzer out of analyzers module into core

Uwe Schindler (JIRA) Sun, 11 Sep 2016 08:17:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481905#comment-15481905
 ]


Uwe Schindler commented on LUCENE-7318:
---------------------------------------

bq. Hmm, why not leave StopFilter, etc., in core, and put (deprecated) 
subclasses in the old package names?

I plan to do this for 6.x and 6.2.1, but I won't deprecate the duplicates for 
now. So I will just subclass in analyzers/common, although this is still a lot 
of code duplication (most classes only have ctors that need to be cloned).

All other discussion should take place in LUCENE-7444. Once this is discussed 
and finalized, we can decide in 6.3 which classes to deprecate (if we do this 
at all). My personal opinion is:
- Move StandardTokenizer to core (no package name change, so no backwards layer 
needed)
- Move no-op StandardFilter to core, too, but deprecate it from the beginning 
(no package name change, so no backwards layer needed)
- Add all "original" classes back in analyzers/common by subclassing, but don't 
deprecate
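
The subclassing approach in the last bullet can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the 
actual patch: CoreWordFilter and LegacyWordFilter are hypothetical stand-ins 
(not Lucene classes), and nested classes play the role of the new and old 
packages because a single file cannot declare both. It shows why most of the 
"duplicates" consist only of cloned ctors.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins only: none of these are real Lucene classes.
class BackCompatShimSketch {

    /** Stands in for the implementation that now lives in core. */
    static class CoreWordFilter {
        private final Set<String> stopWords = new HashSet<>();
        private final boolean ignoreCase;

        CoreWordFilter(Set<String> stopWords) {
            this(stopWords, false);
        }

        CoreWordFilter(Set<String> stopWords, boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
            for (String word : stopWords) {
                this.stopWords.add(normalize(word));
            }
        }

        /** Returns true if the term should be kept. */
        boolean accept(String term) {
            return !stopWords.contains(normalize(term));
        }

        private String normalize(String s) {
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }
    }

    /** Stands in for the class kept under the old package name. */
    static class LegacyWordFilter extends CoreWordFilter {
        // Java does not inherit constructors, so every ctor of the core
        // class has to be re-declared here; there is no other logic.
        LegacyWordFilter(Set<String> stopWords) {
            super(stopWords);
        }

        LegacyWordFilter(Set<String> stopWords, boolean ignoreCase) {
            super(stopWords, ignoreCase);
        }
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "a"));
        // Old code keeps compiling against the old name, but the behavior
        // comes entirely from the core class.
        CoreWordFilter viaOldName = new LegacyWordFilter(stops, true);
        System.out.println(viaOldName.accept("The"));    // prints "false"
        System.out.println(viaOldName.accept("lucene")); // prints "true"
    }
}
```

Because the shim adds no behavior of its own, deprecating or removing it 
later would only affect the old name, not the implementation in core.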

Later on (LUCENE-7444):
- Remove StopFilter from StandardAnalyzer. For first-time users, the decision 
whether to use stop words should be simple, and our recommendation is: no stop 
words, please, for something that's called "Standard"
- StopFilter and all its superclasses and utility classes move back into 
analysis/common. I'd also suggest this for LowerCaseFilter, which we can just 
clone in core as a package-private class inside oal/analysis/standard.
- CharacterUtils can stay in core, but should be moved completely to the utils 
package (I have no strong opinion there)

People who want stop words can always define their own Analyzer using 
CustomAnalyzer.

> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
>                 Key: LUCENE-7318
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7318
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: master (7.0), 6.2, 6.2.1
>
>         Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the 
> analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode 
> Text Segmentation).  It's also much faster than it used to be, since it 
> switched to JFlex a while back.  Many bug fixes, etc.
> I think it would make a good default for most Lucene users, and we should 
> graduate it from the analyzers module into core, and make it the default for 
> {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to 
> get started with Lucene ... we don't make them dig through the codecs module 
> to find a good default codec ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

