[ 
https://issues.apache.org/jira/browse/LUCENE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296959#comment-16296959
 ] 

ASF subversion and git services commented on LUCENE-5803:
---------------------------------------------------------

Commit 9f7f76f267bd46b0069731ba1ae4990d31c33df8 in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9f7f76f ]

LUCENE-5803: Add a Solr test that we reuse analysis components across fields 
for the same field type


> Add another AnalyzerWrapper class that does not have its own cache, so 
> delegate-only wrappers don't create thread local resources several times
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5803
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5803
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.10, 6.0
>
>         Attachments: LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch, 
> LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch
>
>
> This is a followup issue for the following Elasticsearch issue: 
> https://github.com/elasticsearch/elasticsearch/pull/6714
> Basically the problem is the following:
> - Elasticsearch has a pool of Analyzers that are used for analysis in several 
> indexes
> - Each index uses a different PerFieldAnalyzerWrapper
> PerFieldAnalyzerWrapper uses PER_FIELD_REUSE_STRATEGY. Because of this, it 
> caches the TokenStreams for every field; with many fields, that is a lot of 
> cached components. In addition, the underlying analyzers may cache their own 
> TokenStreams, and other PerFieldAnalyzerWrappers do the same, although the 
> delegate Analyzer can always return the same components.
> We should add code similar to Elasticsearch's directly to Lucene: if the 
> delegating Analyzer only delegates per field or only wraps CharFilters around 
> the Reader, there is no need to cache the TokenStreamComponents a second time 
> in the delegating Analyzer. This is only needed if the delegating Analyzer 
> adds additional TokenFilters (like ShingleAnalyzerWrapper).
> We should name this new class DelegatingAnalyzerWrapper, extending 
> AnalyzerWrapper. Its wrapComponents method must be final, because it is not 
> allowed to add additional TokenFilters; unlike ES, however, we don't need to 
> disallow wrapping with CharFilters.
> Internally, this class uses a private ReuseStrategy that just delegates to the 
> underlying analyzer. It does not matter whether the delegate's strategy is 
> global or per field; that is private to the delegate.
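
The reuse problem described in the issue can be illustrated with a small, self-contained Java model. It does not use Lucene: Components, ToyAnalyzer, ReuseStrategy, CachingWrapper, DelegatingWrapper and countBuilds are hypothetical names, and the model is single-threaded, so plain fields stand in for the per-thread storage that Lucene's reuse strategies keep. As a simplifying assumption, CachingWrapper builds fresh components from its delegate for each field and keeps them in its own per-field map, while DelegatingWrapper mirrors the proposed private ReuseStrategy by reading and writing the delegate's cache instead of keeping one of its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy, single-threaded model of analyzer component reuse; not Lucene code. */
public class DelegatingReuseSketch {

    /** Hypothetical stand-in for Lucene's TokenStreamComponents. */
    static final class Components {}

    /** Hypothetical stand-in for Analyzer.ReuseStrategy (plain fields instead of thread-locals). */
    interface ReuseStrategy {
        Components get(ToyAnalyzer analyzer, String field);
        void set(ToyAnalyzer analyzer, String field, Components components);
    }

    /** One cached chain per analyzer, shared by all fields (like GLOBAL_REUSE_STRATEGY). */
    static final ReuseStrategy GLOBAL = new ReuseStrategy() {
        public Components get(ToyAnalyzer a, String field) { return a.globalSlot; }
        public void set(ToyAnalyzer a, String field, Components c) { a.globalSlot = c; }
    };

    /** One cached chain per field (like PER_FIELD_REUSE_STRATEGY). */
    static final ReuseStrategy PER_FIELD = new ReuseStrategy() {
        public Components get(ToyAnalyzer a, String field) { return a.perField.get(field); }
        public void set(ToyAnalyzer a, String field, Components c) { a.perField.put(field, c); }
    };

    /** Strategy of the delegating wrapper: look up and store components in the delegate. */
    static final ReuseStrategy DELEGATING = new ReuseStrategy() {
        public Components get(ToyAnalyzer a, String field) {
            ToyAnalyzer d = ((DelegatingWrapper) a).delegate;
            return d.strategy.get(d, field);
        }
        public void set(ToyAnalyzer a, String field, Components c) {
            ToyAnalyzer d = ((DelegatingWrapper) a).delegate;
            d.strategy.set(d, field, c);
        }
    };

    /** Number of component chains actually built (the expensive step). */
    static int built = 0;

    static abstract class ToyAnalyzer {
        final ReuseStrategy strategy;
        Components globalSlot;                                    // storage used by GLOBAL
        final Map<String, Components> perField = new HashMap<>(); // storage used by PER_FIELD

        ToyAnalyzer(ReuseStrategy strategy) { this.strategy = strategy; }

        abstract Components createComponents(String field);

        /** Reuse cached components if the strategy has them, otherwise build and cache them. */
        final Components components(String field) {
            Components c = strategy.get(this, field);
            if (c == null) {
                c = createComponents(field);
                strategy.set(this, field, c);
            }
            return c;
        }
    }

    /** The pooled analyzer; its chain does not depend on the field name. */
    static final class BaseAnalyzer extends ToyAnalyzer {
        BaseAnalyzer() { super(GLOBAL); }
        Components createComponents(String field) { built++; return new Components(); }
    }

    /** Wrapper with its own per-field cache: builds fresh chains from its delegate. */
    static final class CachingWrapper extends ToyAnalyzer {
        final ToyAnalyzer delegate;
        CachingWrapper(ToyAnalyzer delegate) { super(PER_FIELD); this.delegate = delegate; }
        Components createComponents(String field) { return delegate.createComponents(field); }
    }

    /** Delegate-only wrapper: adds no filters and keeps no cache of its own. */
    static final class DelegatingWrapper extends ToyAnalyzer {
        final ToyAnalyzer delegate;
        DelegatingWrapper(ToyAnalyzer delegate) { super(DELEGATING); this.delegate = delegate; }
        Components createComponents(String field) { return delegate.createComponents(field); }
    }

    /** Analyzes three fields twice each through a wrapper, then uses the pooled analyzer directly. */
    static int countBuilds(boolean useDelegatingWrapper) {
        built = 0;
        BaseAnalyzer pooled = new BaseAnalyzer();
        ToyAnalyzer wrapper = useDelegatingWrapper
            ? new DelegatingWrapper(pooled)
            : new CachingWrapper(pooled);
        for (String field : new String[] {"title", "body", "author"}) {
            wrapper.components(field);
            wrapper.components(field); // second call hits whichever cache the wrapper uses
        }
        pooled.components("title");    // direct use of the pooled analyzer
        return built;
    }

    public static void main(String[] args) {
        System.out.println("caching wrapper:    " + countBuilds(false)); // 4
        System.out.println("delegating wrapper: " + countBuilds(true));  // 1
    }
}
```

In this model the caching wrapper builds one chain per field (three), and the pooled analyzer builds a fourth when it is used directly. The delegating wrapper builds a single chain that both it and the pooled analyzer reuse, because the delegate's own (global) strategy decides what is cached; with a per-field delegate, it would build one chain per field, but only once, inside the delegate.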



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
