[
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738113#comment-13738113
]
Uwe Schindler edited comment on LUCENE-5170 at 8/13/13 11:58 AM:
-----------------------------------------------------------------
Robert: After reviewing the code:
The fixed, unchangeable "default" in AnalyzerWrapper is PerField, which is a
large overhead and should only be used in classes like PerFieldAnalyzerWrapper
(that class should call super(PerField) in its own ctor). For other use cases
of AnalyzerWrapper I have to use the global strategy or the one of the wrapped
analyzer. It looks like the current impl in AnalyzerWrapper somehow assumes
that you want to wrap per field.
I would suggest making the reuse strategy a mandatory ctor argument in Lucene
trunk, and adding the missing ctor in Lucene 4.x, too. The default ctor should
be deprecated with a hint that relying on this default might be a bad idea.
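A minimal sketch of that ctor layout, assuming simplified stand-in classes
(SketchReuseStrategy, SketchAnalyzerWrapper and both subclasses are
hypothetical names for illustration, not Lucene's actual classes):

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration; these are not Lucene's classes.
enum SketchReuseStrategy { GLOBAL, PER_FIELD }

abstract class SketchAnalyzerWrapper {
    // Package-private so the example can inspect the chosen strategy.
    final SketchReuseStrategy reuseStrategy;

    /** Proposed ctor: every subclass has to pick a strategy explicitly. */
    protected SketchAnalyzerWrapper(SketchReuseStrategy reuseStrategy) {
        this.reuseStrategy = Objects.requireNonNull(reuseStrategy);
    }

    /** Old implicit default, kept only for backwards compatibility. */
    @Deprecated
    protected SketchAnalyzerWrapper() {
        this(SketchReuseStrategy.PER_FIELD);
    }
}

/** The per-field wrapper is the one class that hardcodes PER_FIELD. */
class SketchPerFieldWrapper extends SketchAnalyzerWrapper {
    SketchPerFieldWrapper() {
        super(SketchReuseStrategy.PER_FIELD);
    }
}

/** A wrapper around a single analyzer can opt into GLOBAL reuse. */
class SketchFoldingWrapper extends SketchAnalyzerWrapper {
    SketchFoldingWrapper() {
        super(SketchReuseStrategy.GLOBAL);
    }
}
```

With the strategy passed through the ctor, the per-field cost is only paid by
subclasses that ask for it.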
My use case is:
I have lots of predefined Analyzers for several languages or functions in
my search application. I also have some AnalyzerWrappers that simply turn any
other analyzer into a phonetic or ASCIIFolding one (so I can use that with
another field). So my wrapper just takes one of these per-language Analyzers
and wraps it with an additional TokenFilter. As the underlying Analyzer uses
global reuse, I need to make the wrapper global, too, which is currently
impossible. Per field is a waste of resources in this case.
Only PerFieldAnalyzerWrapper should hardcode the PerField strategy (as it is
per field), not the base class!
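To illustrate why per-field reuse is wasteful for a wrapper around one
analyzer, here is a rough model of the two strategies. It assumes that global
reuse keeps a single set of components while per-field reuse keeps one set
per field name, and it ignores per-thread storage. ComponentCache, GlobalCache
and PerFieldCache are hypothetical names, not Lucene's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two reuse strategies; not Lucene's code.
interface ComponentCache {
    /** Returns the cached components for a field, creating them if needed. */
    Object componentsFor(String fieldName);

    /** Number of component instances created so far. */
    int created();
}

/** Global reuse: one shared instance, regardless of the field name. */
class GlobalCache implements ComponentCache {
    private Object shared;
    private int created;

    @Override
    public Object componentsFor(String fieldName) {
        if (shared == null) {
            shared = new Object();
            created++;
        }
        return shared;
    }

    @Override
    public int created() {
        return created;
    }
}

/** Per-field reuse: one instance per distinct field name. */
class PerFieldCache implements ComponentCache {
    private final Map<String, Object> byField = new HashMap<>();

    @Override
    public Object componentsFor(String fieldName) {
        return byField.computeIfAbsent(fieldName, f -> new Object());
    }

    @Override
    public int created() {
        return byField.size();
    }
}
```

In this model the per-field cache grows with the number of distinct fields,
even when every field would end up with identical components.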
So I would suggest making the base class AnalyzerWrapper copy the ctor of the
superclass Analyzer and deprecating the default ctor in 4.x. For my example
above (wrapping another analyzer), I still need the reuse strategy of the inner
analyzer, so I need a getter on Analyzer.java, too (see current patch).
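As a sketch of how such a getter would be used, assuming simplified stand-in
classes (ModelAnalyzer, LowercaseAnalyzer, FoldingWrapper and the Strategy
enum are hypothetical and only mirror the names in this discussion; the
attached patch may look different):

```java
import java.util.Locale;

// Hypothetical stand-ins; this is not the code from the patch.
enum Strategy { GLOBAL, PER_FIELD }

abstract class ModelAnalyzer {
    private final Strategy reuseStrategy;

    protected ModelAnalyzer(Strategy reuseStrategy) {
        this.reuseStrategy = reuseStrategy;
    }

    /** The getter under discussion: exposes the otherwise private field. */
    public final Strategy getReuseStrategy() {
        return reuseStrategy;
    }

    /** Stand-in for analysis: maps one term to its analyzed form. */
    abstract String analyze(String term);
}

/** A predefined analyzer that uses global reuse. */
class LowercaseAnalyzer extends ModelAnalyzer {
    LowercaseAnalyzer() {
        super(Strategy.GLOBAL);
    }

    @Override
    String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }
}

/** Wraps any analyzer and adds one more processing step on top. */
class FoldingWrapper extends ModelAnalyzer {
    private final ModelAnalyzer inner;

    FoldingWrapper(ModelAnalyzer inner) {
        // Inherit the inner analyzer's strategy instead of a fixed default.
        super(inner.getReuseStrategy());
        this.inner = inner;
    }

    @Override
    String analyze(String term) {
        // Crude stand-in for an ASCII folding filter: handles only e-acute.
        return inner.analyze(term).replace('\u00e9', 'e');
    }
}
```

Here the wrapper never has to know which strategy the inner analyzer was
built with; it simply forwards whatever the getter returns.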
> Add getter for reuse strategy to Analyzer
> -----------------------------------------
>
> Key: LUCENE-5170
> URL: https://issues.apache.org/jira/browse/LUCENE-5170
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5170.patch
>
>
> If you write an Analyzer that wraps another one (but without using
> AnalyzerWrapper) you may need to use the same reuse strategy in your wrapper.
> This is not possible as there is no way to get the reuse strategy (private
> field and no getter).
> An example is ES's NamedAnalyzer, see my comment:
> [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
> This would add a getter, just a 3-liner.