[ 
https://issues.apache.org/jira/browse/LUCENE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746380#action_12746380
 ] 

Tim Smith commented on LUCENE-1842:
-----------------------------------

bq. still pay the price for filling the two hashmaps and the cache lookups. 
this would only ever be incurred once per thread (if the same root 
AttributeSource was always used)
the cache lookups would still need to be done at TokenStream.reset() time, 
however they would pretty much always get a hit

the main use case this proposal supports is as follows:

i have a TokenStream that merges multiple sub token streams (i call this out in 
LUCENE-1826)
in order to do this really efficiently, all sub token streams need to share the 
same AttributeSource
then, the "merging" TokenStream can just iterate through its sub streams, 
calling incrementToken() to consume all tokens from each stream
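
as a rough illustration of the merging idea, here is a minimal toy model. it does not use the Lucene classes: TermState, SubStream and MergingStream are hypothetical stand-ins for a shared attribute, a sub TokenStream and the merging TokenStream. the only point is that when every sub stream writes into the same state object, the merger can hand tokens through without copying anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SharedMergeSketch {
    /** Toy stand-in for a term attribute owned by one shared AttributeSource. */
    static final class TermState {
        String term;
    }

    /** Toy stand-in for a sub TokenStream: writes each token into the state it was given. */
    static final class SubStream {
        private final TermState state;
        private final Iterator<String> tokens;

        SubStream(TermState state, String... tokens) {
            this.state = state;
            this.tokens = Arrays.asList(tokens).iterator();
        }

        boolean incrementToken() {
            if (!tokens.hasNext()) {
                return false;
            }
            state.term = tokens.next();
            return true;
        }
    }

    /** Drains each sub stream in turn; nothing is copied because the state is shared. */
    static final class MergingStream {
        private final List<SubStream> subs;
        private int current = 0;

        MergingStream(List<SubStream> subs) {
            this.subs = subs;
        }

        boolean incrementToken() {
            while (current < subs.size()) {
                if (subs.get(current).incrementToken()) {
                    return true;
                }
                current++; // this sub stream is exhausted, move to the next one
            }
            return false;
        }
    }

    /** Builds two sub streams over one shared state and collects the merged tokens. */
    static List<String> demo() {
        TermState shared = new TermState();
        MergingStream merged = new MergingStream(Arrays.asList(
                new SubStream(shared, "a", "b"),
                new SubStream(shared, "c")));
        List<String> out = new ArrayList<>();
        while (merged.incrementToken()) {
            out.add(shared.term); // the consumer reads the same object the sub streams write
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

the consumer only ever reads from the single shared state object, which is the property the merging TokenStream relies on.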

without the ability to reset the "sub" streams' AttributeSource to the same 
AttributeSource used by this merging TokenStream, you have to copy the 
attributes from the sub streams as you iterate
furthermore, the "sub" TokenStreams could potentially be any TokenStream (or 
chain of TokenStreams rooted with a Tokenizer)
without the reset(AttributeSource) method, i would have to create the 
TokenStream chain anew for every "merging" TokenStream (or do the attribute 
copying approach)
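
to make the rebinding concrete, here is a small sketch under simplifying assumptions. Source is a hypothetical stand-in for AttributeSource with a single non-final map (the real class has more state), and attributes are modelled as StringBuilder instances keyed by name. it shows that an existing stream can be pointed at another source after construction, and that attribute references obtained before the reset become stale, which is why addAttribute() would have to be called again afterwards.

```java
import java.util.HashMap;
import java.util.Map;

public class ResetRebindSketch {
    /** Toy stand-in for AttributeSource; the map is non-final so reset() can swap it. */
    static class Source {
        private Map<String, StringBuilder> attributes = new HashMap<>();

        /** Adopts another source's attribute map, loosely mirroring the proposed reset(AttributeSource). */
        void reset(Source input) {
            if (input == null) {
                throw new IllegalArgumentException("input must not be null");
            }
            this.attributes = input.attributes;
        }

        /** Returns the existing attribute instance if present, otherwise creates and registers it. */
        StringBuilder addAttribute(String name) {
            return attributes.computeIfAbsent(name, k -> new StringBuilder());
        }
    }

    /** Returns true if a reference taken before reset is stale and one taken after is shared. */
    static boolean demo() {
        Source root = new Source();
        Source sub = new Source();

        StringBuilder before = sub.addAttribute("term"); // lives in sub's private map
        sub.reset(root);                                  // sub now shares root's map
        StringBuilder after = sub.addAttribute("term");  // created in the shared map

        StringBuilder rootTerm = root.addAttribute("term");
        return before != rootTerm && after == rootTerm;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

the same sub stream instance can be rebound to a different root on each use, so the chain does not have to be rebuilt per merging TokenStream.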


> Add reset(AttributeSource) method to AttributeSource
> ----------------------------------------------------
>
>                 Key: LUCENE-1842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1842
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tim Smith
>            Priority: Minor
>             Fix For: 2.9
>
>
> Originally proposed in LUCENE-1826
> Proposing the addition of the following method to AttributeSource
> {code}
> public void reset(AttributeSource input) {
>     if (input == null) {
>       throw new IllegalArgumentException("input AttributeSource must not be null");
>     }
>     this.attributes = input.attributes;
>     this.attributeImpls = input.attributeImpls;
>     this.factory = input.factory;
> }
> {code}
> Impacts:
> * requires all TokenStreams/TokenFilters/etc to call addAttribute() in their 
> reset() method, not in their constructor
> * requires making AttributeSource.attributes and 
> AttributeSource.attributeImpls non-final
> Advantages:
> Allows creating only a single actual AttributeSource per thread that can then 
> be used for indexing with a multitude of TokenStream/Tokenizer combinations 
> (allowing utmost reuse of TokenStream/Tokenizer instances)
> this results in only a single "attributes"/"attributeImpls" map being 
> required per thread
> addAttribute() calls will almost always return right away (will only be 
> "initialized" once per thread)
