[ https://issues.apache.org/jira/browse/LUCENE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296959#comment-16296959 ]
ASF subversion and git services commented on LUCENE-5803:
---------------------------------------------------------

Commit 9f7f76f267bd46b0069731ba1ae4990d31c33df8 in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9f7f76f ]

LUCENE-5803: Add a Solr test that we reuse analysis components across fields for the same field type

> Add another AnalyzerWrapper class that does not have its own cache, so
> delegate-only wrappers don't create thread local resources several times
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5803
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5803
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.10, 6.0
>
>         Attachments: LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch,
> LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch
>
>
> This is a followup issue for the following Elasticsearch issue:
> https://github.com/elasticsearch/elasticsearch/pull/6714
> Basically the problem is the following:
> - Elasticsearch has a pool of Analyzers that are used for analysis in several
> indexes
> - Each index uses a different PerFieldAnalyzerWrapper
> PerFieldAnalyzerWrapper uses PER_FIELD_REUSE_STRATEGY. Because of this, it
> caches the tokenstreams for every field. With many fields, that adds up to a
> lot of cached objects. In addition, the underlying analyzers may also cache
> tokenstreams, and other PerFieldAnalyzerWrappers do the same, although the
> delegate Analyzer can always return the same components.
> We should add code similar to Elasticsearch's directly to Lucene: If the
> delegating Analyzer just delegates per field or just wraps CharFilters around
> the Reader, there is no need to cache the TokenStreamComponents a second time
> in the delegating Analyzer. A second cache is only needed if the delegating
> Analyzer adds additional TokenFilters (like ShingleAnalyzerWrapper).
> The new class should be named DelegatingAnalyzerWrapper and extend
> AnalyzerWrapper. The wrapComponents method must be final, because we are not
> allowed to add additional TokenFilters, but unlike ES, we don't need to
> disallow wrapping with CharFilters.
> Internally this class uses a private ReuseStrategy that just delegates to the
> underlying analyzer. It does not matter here whether the strategy of the
> delegate is global or per field; that is private to the delegate.
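
The caching problem described above can be illustrated with a small, self-contained model. This is only a sketch in plain Java and does not use Lucene: Components, LeafAnalyzer, CachingWrapper, DelegatingWrapper and countCreated are hypothetical stand-ins invented for this example. It also assumes a single delegate for all fields and leaves out the thread-local storage that the real classes use. It compares how many component objects get built when several wrappers around one pooled analyzer each keep their own per-field cache, versus when the wrappers leave reuse to the delegate.

```java
import java.util.HashMap;
import java.util.Map;

final class DelegatingReuseSketch {

    /** Stand-in for Lucene's TokenStreamComponents (toy type, not the real class). */
    static final class Components {
        final String analyzerName;
        Components(String analyzerName) { this.analyzerName = analyzerName; }
    }

    interface Analyzer {
        Components components(String field);
    }

    /** Pooled analyzer with a single-slot cache; thread-locals are omitted for brevity. */
    static final class LeafAnalyzer implements Analyzer {
        final String name;
        int created = 0;            // how many Components this analyzer has built
        private Components cached;  // "global" reuse: one instance serves every field

        LeafAnalyzer(String name) { this.name = name; }

        Components newComponents() {
            created++;
            return new Components(name);
        }

        @Override public Components components(String field) {
            if (cached == null) cached = newComponents();
            return cached;
        }
    }

    /** Wrapper that keeps its own per-field cache on top of the delegate. */
    static final class CachingWrapper implements Analyzer {
        private final LeafAnalyzer delegate;
        private final Map<String, Components> perField = new HashMap<>();

        CachingWrapper(LeafAnalyzer delegate) { this.delegate = delegate; }

        @Override public Components components(String field) {
            // A miss builds a fresh instance and stores it in the wrapper's own cache.
            return perField.computeIfAbsent(field, f -> delegate.newComponents());
        }
    }

    /** Wrapper with no cache of its own: reuse is left entirely to the delegate. */
    static final class DelegatingWrapper implements Analyzer {
        private final LeafAnalyzer delegate;

        DelegatingWrapper(LeafAnalyzer delegate) { this.delegate = delegate; }

        @Override public Components components(String field) {
            return delegate.components(field);
        }
    }

    /** Wraps one shared analyzer `wrappers` times and requests `fields` fields twice per wrapper. */
    static int countCreated(boolean delegating, int wrappers, int fields) {
        LeafAnalyzer shared = new LeafAnalyzer("standard");
        for (int w = 0; w < wrappers; w++) {
            Analyzer wrapper = delegating ? new DelegatingWrapper(shared) : new CachingWrapper(shared);
            for (int pass = 0; pass < 2; pass++) {
                for (int f = 0; f < fields; f++) wrapper.components("field" + f);
            }
        }
        return shared.created;
    }

    public static void main(String[] args) {
        System.out.println("caching wrappers:    " + countCreated(false, 3, 100));
        System.out.println("delegating wrappers: " + countCreated(true, 3, 100));
    }
}
```

In this toy model, three caching wrappers over 100 fields build 300 component objects, while three delegating wrappers share the single instance held by the pooled analyzer. The real class additionally has to route the reuse decision through the delegate's own ReuseStrategy, which this sketch does not attempt to reproduce.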