[ https://issues.apache.org/jira/browse/LUCENE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746380#action_12746380 ]
Tim Smith commented on LUCENE-1842:
-----------------------------------

bq. still pay the price for filling the two hashmaps and the cache lookups.

This would only ever be incurred once per thread (if the same root AttributeSource was always used). The cache lookups would still need to be done at TokenStream.reset() time, however they would pretty much always get a hit.

The main use case this proposal supports is as follows:

I have a TokenStream that merges multiple sub token streams (I call this out in LUCENE-1826). In order to do this really efficiently, all sub token streams need to share the same AttributeSource. Then the "merging" TokenStream can just iterate through its sub streams, calling incrementToken() to consume all tokens from each stream.

Without the ability to reset the "sub" streams' AttributeSource to the same AttributeSource used by the merging TokenStream, you have to copy the attributes from the sub streams as you iterate.

Furthermore, the "sub" TokenStreams could potentially be any TokenStream (or chain of TokenStreams rooted with a Tokenizer). Without the reset(AttributeSource) method, I would have to create the TokenStream chain anew for every "merging" TokenStream (or use the attribute copying approach).

> Add reset(AttributeSource) method to AttributeSource
> ----------------------------------------------------
>
>                 Key: LUCENE-1842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1842
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tim Smith
>            Priority: Minor
>             Fix For: 2.9
>
>
> Originally proposed in LUCENE-1826
> Proposing the addition of the following method to AttributeSource
> {code}
> public void reset(AttributeSource input) {
>   if (input == null) {
>     throw new IllegalArgumentException("input AttributeSource must not be null");
>   }
>   this.attributes = input.attributes;
>   this.attributeImpls = input.attributeImpls;
>   this.factory = input.factory;
> }
> {code}
> Impacts:
> * requires all TokenStreams/TokenFilters/etc to call addAttribute() in their reset() method, not in their constructor
> * requires making AttributeSource.attributes and AttributeSource.attributeImpls non-final
> Advantages:
> Allows creating only a single actual AttributeSource per thread that can then be used for indexing with a multitude of TokenStream/Tokenizer combinations (allowing utmost reuse of TokenStream/Tokenizer instances)
> this results in only a single pair of "attributes"/"attributeImpls" maps being required per thread
> addAttribute() calls will almost always return right away (will only be "initialized" once per thread)
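To make the merging use case concrete, here is a minimal sketch of the idea. It does not use Lucene's real classes: Source, ListStream, TermAttr and merge() are hypothetical, simplified stand-ins, and the attribute map holds a single attribute type. It only illustrates how a reset(AttributeSource)-style method would let sub streams write straight into the merging stream's attributes, with addAttribute()-style lookups done at reset time rather than in the constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative only: simplified stand-ins, not Lucene's real classes.
final class SharedSourceSketch {

    /** Hypothetical attribute holding the current term text. */
    static final class TermAttr {
        String term;
    }

    /** Stand-in for AttributeSource with the proposed reset(AttributeSource). */
    static class Source {
        // Non-final so reset(Source) can repoint it, as the proposal requires.
        private Map<Class<?>, Object> attributes = new HashMap<>();

        /** Adopt the attribute map of another source (the proposed method). */
        void reset(Source input) {
            if (input == null) {
                throw new IllegalArgumentException("input AttributeSource must not be null");
            }
            this.attributes = input.attributes;
        }

        /** Return the registered TermAttr, creating it on first use. */
        TermAttr addTermAttr() {
            return (TermAttr) attributes.computeIfAbsent(TermAttr.class, k -> new TermAttr());
        }
    }

    /** Stand-in for a sub TokenStream that emits a fixed list of terms. */
    static final class ListStream extends Source {
        private final List<String> terms;
        private Iterator<String> it;
        private TermAttr termAttr;

        ListStream(String... terms) {
            this.terms = Arrays.asList(terms);
        }

        /** Per the proposal, attributes are looked up in reset(), not in the constructor. */
        @Override
        void reset(Source input) {
            super.reset(input);
            termAttr = addTermAttr();
            it = terms.iterator();
        }

        /** Write the next term into the (shared) attribute; false when exhausted. */
        boolean incrementToken() {
            if (!it.hasNext()) {
                return false;
            }
            termAttr.term = it.next();
            return true;
        }
    }

    /** Drain each sub stream in turn; tokens land in the shared attribute, so nothing is copied. */
    static List<String> merge(Source root, ListStream... subs) {
        TermAttr shared = root.addTermAttr();
        List<String> out = new ArrayList<>();
        for (ListStream sub : subs) {
            sub.reset(root);
            while (sub.incrementToken()) {
                out.add(shared.term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Source root = new Source();
        // prints [a, b, c]
        System.out.println(merge(root, new ListStream("a", "b"), new ListStream("c")));
    }
}
```

Because each sub stream re-resolves its attribute in reset(), the same ListStream instance can later be handed a different root and reused, which is the reuse the proposal is after. Without the shared map, merge() would have to read each sub stream's own attribute and copy its value into the root's attribute on every token.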