[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558832#comment-13558832 ]

Renaud Delbru commented on LUCENE-4642:
---------------------------------------

{quote}
Personally: I think we should remove Tokenizer(AttributeSource): it bloats the 
APIs and causes ctor explosion.
{quote}

Why not do the opposite instead, i.e., remove Tokenizer(AttributeFactory) and 
keep Tokenizer(AttributeSource), since AttributeFactory is a nested class of 
AttributeSource? Limiting the API to AttributeFactory alone would restrict it 
unnecessarily, IMHO.

Our use case is to be able to create "advanced token streams", where one 
"parent token stream" can have multiple "child token streams". The parent token 
stream shares its attribute source with the child token streams for 
performance reasons. Emulating this behaviour by copying the attributes 
from stream to stream is really inefficient (our throughput is divided by at 
least 3).
A more concrete use case is the ability to create "specific token streams" for 
a particular "token type". For example, our parent tokenizer tokenizes a string 
into a list of tokens, each one having a specific type. Then, each token is 
processed downstream by "child token streams". The child token stream that will 
process the token depends on the token type attribute.
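
To make the pattern concrete, here is a minimal sketch of it. It is a simplified 
model in plain Java, not Lucene code: SharedAttributes, ChildStream and 
SharedAttributeDemo are hypothetical names used only for illustration, and the 
"number"/"word" token types are assumptions. The parent writes each token into 
a single shared attribute holder, and the child selected by the token type 
modifies that holder in place, so nothing is copied between streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a shared attribute source (not a Lucene class):
// one mutable holder for the current token's term and type.
final class SharedAttributes {
    String term;
    String type;
}

// Hypothetical child stream: processes the current token in place,
// operating directly on the state it shares with the parent.
interface ChildStream {
    void process(SharedAttributes attrs);
}

final class SharedAttributeDemo {

    // Plays the role of the parent stream: splits the input on whitespace,
    // assigns a type to each token, and dispatches to a child by type.
    static List<String> run(String input) {
        SharedAttributes attrs = new SharedAttributes(); // shared, never copied

        Map<String, ChildStream> children = new HashMap<>();
        // Child for "number" tokens: strips leading zeros.
        children.put("number", a -> a.term = a.term.replaceFirst("^0+(?!$)", ""));
        // Child for "word" tokens: upper-cases the term.
        children.put("word", a -> a.term = a.term.toUpperCase(Locale.ROOT));

        List<String> out = new ArrayList<>();
        for (String raw : input.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            attrs.term = raw;
            attrs.type = raw.chars().allMatch(Character::isDigit) ? "number" : "word";
            // The token type attribute decides which child handles the token.
            children.get(attrs.type).process(attrs);
            out.add(attrs.type + ":" + attrs.term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("42 foo 007"));
    }
}
```

In this model the copy-based alternative would mean giving every child its own 
holder and transferring term and type on each token, which is the per-token 
overhead described above.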
                
> TokenizerFactory should provide a create method with a given AttributeSource
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-4642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4642
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.1
>            Reporter: Renaud Delbru
>            Assignee: Steve Rowe
>              Labels: analysis, attribute, tokenizer
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4642.patch, LUCENE-4642.patch
>
>
> All tokenizer implementations have a constructor that takes a given 
> AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory 
> does not provide an API to create tokenizers with a given AttributeSource.
> Side note: There are still a lot of tokenizers that do not provide 
> constructors that take AttributeSource and AttributeFactory.

