[
https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481905#comment-15481905
]
Uwe Schindler commented on LUCENE-7318:
---------------------------------------
bq. Hmm, why not leave StopFilter, etc., in core, and put (deprecated)
subclasses in the old package names?
I plan to do this for 6.x and 6.2.1, but I won't deprecate the duplicates for
now. So I will just subclass in analyzers/common, although this is still a lot
of code duplication (most classes only have ctors that need to be cloned).
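To illustrate the subclassing approach, here is a minimal sketch in plain Java. It does not use the real Lucene classes: CoreStopFilter, LegacyStopFilter and accept() are hypothetical stand-ins, assumed only for this example. The point is that the class kept under the old name adds no logic of its own and only has to repeat each constructor, because Java does not inherit constructors.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only, not the real Lucene classes.
final class BackCompatSketch {

  /** Plays the role of the implementation that now lives in core. */
  static class CoreStopFilter {
    private final Set<String> stopWords;

    CoreStopFilter(Set<String> stopWords) {
      this.stopWords = new HashSet<String>(stopWords);
    }

    CoreStopFilter(String... stopWords) {
      this(new HashSet<String>(Arrays.asList(stopWords)));
    }

    /** Returns true if the term is not a stop word. */
    boolean accept(String term) {
      return !stopWords.contains(term);
    }
  }

  /**
   * Plays the role of the class kept under the old name in analyzers/common.
   * All behavior is inherited; only the constructors have to be cloned.
   */
  static class LegacyStopFilter extends CoreStopFilter {
    LegacyStopFilter(Set<String> stopWords) {
      super(stopWords);
    }

    LegacyStopFilter(String... stopWords) {
      super(stopWords);
    }
  }

  public static void main(String[] args) {
    // Code written against the old class keeps compiling and behaves the same.
    CoreStopFilter filter = new LegacyStopFilter("the", "a");
    System.out.println(filter.accept("lucene")); // prints "true"
    System.out.println(filter.accept("the"));    // prints "false"
  }
}
```

Because the legacy class is a subtype of the core one, existing code that references the old name and new code that references the core class can be mixed freely. The duplication is limited to the constructor signatures.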
All other discussion should be placed in LUCENE-7444. Once this is discussed
and finalized, we can decide in 6.3 which classes to deprecate (if we do this
at all). My personal opinion is:
- Move StandardTokenizer to core (no package name change, so no backwards layer
needed)
- Move no-op StandardFilter to core, too, but deprecate it from the beginning
(no package name change, so no backwards layer needed)
- Add all "original" classes back in analyzers/common by subclassing, but don't
deprecate
Later on (LUCENE-7444):
- Remove StopFilter from StandardAnalyzer. For first-time users, the decision
of stop words or not should be simple, and our recommendation is: no stop words
please for something that's called "Standard"
- StopFilter and all its superclasses and utility classes move back into
analysis/common. I'd also suggest this for LowerCaseFilter, and just clone it
in core as a package-private class inside oal/analysis/standard.
- CharacterUtils can stay in core, but should be moved completely to the utils
package (I have no strong opinion there)
People that want to have stopwords can always define their own Analyzer using
CustomAnalyzer.
> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
> Key: LUCENE-7318
> URL: https://issues.apache.org/jira/browse/LUCENE-7318
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Blocker
> Fix For: master (7.0), 6.2, 6.2.1
>
> Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the
> analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode
> Text Segmentation). It's also much faster than it used to be, since it
> switched to JFlex a while back. Many bug fixes, etc.
> I think it would make a good default for most Lucene users, and we should
> graduate it from the analyzers module into core, and make it the default for
> {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to
> get started with Lucene ... we don't make them dig through the codecs module
> to find a good default codec ...