[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558832#comment-13558832 ]
Renaud Delbru commented on LUCENE-4642:
---------------------------------------

{quote}
Personally: I think we should remove Tokenizer(AttributeSource): it bloats the APIs and causes ctor explosion.
{quote}

Why not the opposite instead? I.e., remove Tokenizer(AttributeFactory) and keep Tokenizer(AttributeSource), since AttributeFactory is a nested class of AttributeSource? Limiting the API to AttributeFactory alone would restrict it unnecessarily, IMHO.

Our use case is to be able to create "advanced token streams", where one "parent token stream" can have multiple "child token streams". The parent token stream shares its attribute source with the child token streams for performance reasons. Emulating this behaviour by copying the attributes from stream to stream is really inefficient (our throughput is divided by at least 3).

A more concrete use case is the ability to create "specific token streams" for a particular "token type". For example, our parent tokenizer tokenizes a string into a list of tokens, each one having a specific type. Each token is then processed downstream by a "child token stream"; which child token stream processes the token depends on the token type attribute.

> TokenizerFactory should provide a create method with a given AttributeSource
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-4642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4642
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.1
>            Reporter: Renaud Delbru
>            Assignee: Steve Rowe
>              Labels: analysis, attribute, tokenizer
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4642.patch, LUCENE-4642.patch
>
>
> All tokenizer implementations have a constructor that takes a given
> AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory
> does not provide an API to create tokenizers with a given AttributeSource.
> Side note: There are still a lot of tokenizers that do not provide
> constructors that take AttributeSource and AttributeFactory.
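
To make the parent/child use case from the comment more concrete, here is a minimal sketch in plain Java. It deliberately does not use any Lucene classes: `Attributes`, `ParentStream` and `ChildStream` are hypothetical stand-ins invented for illustration, not the Lucene API and not the reporter's actual implementation. It only shows the general idea that children reading from the same attribute instance as the parent need no per-token copying, and that the child is chosen by the token type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedAttributeSketch {

    /** Hypothetical stand-in for an attribute source: mutable per-token state. */
    static final class Attributes {
        String term;
        String type;
    }

    /** Child stream that reads the current token directly from the shared state. */
    static final class ChildStream {
        private final Attributes attrs;              // same instance as the parent's, no copy
        private final List<String> seen = new ArrayList<>();

        ChildStream(Attributes shared) {
            this.attrs = shared;
        }

        /** Records the term currently held in the shared attributes. */
        void process() {
            seen.add(attrs.term);
        }

        List<String> seen() {
            return seen;
        }
    }

    /** Parent stream: sets term and type, then dispatches to a child by type. */
    static final class ParentStream {
        private final Attributes attrs = new Attributes();
        private final Map<String, ChildStream> children = new HashMap<>();

        /** Returns the child for a token type, creating it on first use. */
        ChildStream childFor(String type) {
            return children.computeIfAbsent(type, t -> new ChildStream(attrs));
        }

        /** Each input row is {term, type}; the shared state is overwritten per token. */
        void consume(String[][] typedTokens) {
            for (String[] tok : typedTokens) {
                attrs.term = tok[0];
                attrs.type = tok[1];
                childFor(attrs.type).process();
            }
        }
    }

    public static void main(String[] args) {
        ParentStream parent = new ParentStream();
        parent.consume(new String[][] {
            {"42", "NUM"}, {"lucene", "WORD"}, {"7", "NUM"}
        });
        System.out.println(parent.childFor("NUM").seen());   // prints [42, 7]
        System.out.println(parent.childFor("WORD").seen());  // prints [lucene]
    }
}
```

In this sketch the parent writes each token into one `Attributes` object and the selected child reads it in place. A copy-based variant would instead have to transfer `term` and `type` into a separate object per child for every token, which is the kind of overhead the comment describes.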