[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562853#comment-13562853
 ] 

Renaud Delbru commented on LUCENE-4642:
---------------------------------------

@steve:

{quote}
have you looked at TeeSinkTokenFilter
{quote}

Yes, and from my current understanding, it is similar to our current 
implementation. The problem with this approach is that attributes are 
exchanged through the AttributeSource.State API, using 
AttributeSource#captureState and AttributeSource#restoreState. This copies the 
values of all attribute implementations that the state contains, which is very 
inefficient because arrays and other objects (e.g., char term arrays) have to 
be copied for every single token.
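
To make the cost concrete, here is a minimal sketch in plain Java. It does not 
use Lucene: TermAttr and AttributeSharingSketch are hypothetical stand-ins, and 
the "capture"/"restore" steps only imitate the copy-on-hand-off behaviour 
described above. It counts how many chars are copied per hand-off when two 
streams each own an attribute, compared with when they share one instance.

```java
import java.util.Arrays;

// Hypothetical stand-in for a term attribute; NOT Lucene's CharTermAttribute.
final class TermAttr {
    char[] buffer = new char[0];
    int length;

    void setTerm(String term) {
        buffer = term.toCharArray();
        length = buffer.length;
    }

    String term() {
        return new String(buffer, 0, length);
    }
}

// Hypothetical helper comparing the two hand-off styles.
final class AttributeSharingSketch {

    // Copy-based hand-off: the producer's term is snapshotted ("capture")
    // and then copied into a separate consumer attribute ("restore"),
    // once per token. Returns the number of chars copied during hand-off.
    static int copyBased(String[] tokens) {
        TermAttr producer = new TermAttr();
        TermAttr consumer = new TermAttr();
        int copied = 0;
        for (String token : tokens) {
            producer.setTerm(token);
            char[] snapshot = Arrays.copyOf(producer.buffer, producer.length);
            consumer.buffer = Arrays.copyOf(snapshot, snapshot.length);
            consumer.length = snapshot.length;
            copied += 2 * snapshot.length;
            if (!consumer.term().equals(token)) {
                throw new IllegalStateException("hand-off lost the term");
            }
        }
        return copied;
    }

    // Shared hand-off: producer and consumer hold the same attribute
    // instance, so the consumer sees each token without any extra copy.
    static int shared(String[] tokens) {
        TermAttr sharedAttr = new TermAttr();
        TermAttr producer = sharedAttr;
        TermAttr consumer = sharedAttr;
        int copied = 0; // nothing is copied during hand-off
        for (String token : tokens) {
            producer.setTerm(token);
            if (!consumer.term().equals(token)) {
                throw new IllegalStateException("shared attribute out of sync");
            }
        }
        return copied;
    }
}
```

In this sketch the copy-based variant copies every term twice per token, 
while the shared variant copies nothing during hand-off. That difference is 
why we want to pass a given AttributeSource to the tokenizer.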

@robert:

Concerning the problem of UOEs (UnsupportedOperationExceptions), Steve's new 
patch reduces their number to just one, which is much more reasonable than my 
first approach. I have looked at the current state of the Lucene trunk, and 
there are already a lot of UOEs in many places. So I would suggest that this 
problem may not be a blocking one (but I might be wrong).

Concerning the problem of constructor explosion, maybe we can reach a 
consensus. Your proposal to remove Tokenizer(AttributeSource) cannot work for 
us, as we need it to share the same AttributeSource across multiple streams. 
However, as I proposed, removing Tokenizer(AttributeFactory) could work, as it 
could be emulated using Tokenizer(AttributeSource).
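
As a rough illustration of that emulation, here is a sketch with simplified, 
hypothetical stand-in classes (SketchFactory, SketchSource and SketchTokenizer 
are not Lucene's classes). It assumes that a source can be constructed from a 
factory, which I believe is the case for AttributeSource. Under that 
assumption, a single source-taking constructor covers both use cases.

```java
// Hypothetical, simplified stand-ins; NOT Lucene's classes.
final class SketchFactory {
    final String name;

    SketchFactory(String name) {
        this.name = name;
    }
}

final class SketchSource {
    final SketchFactory factory;

    // Assumption: a source can be built from a factory.
    SketchSource(SketchFactory factory) {
        this.factory = factory;
    }
}

final class SketchTokenizer {
    final SketchSource source;

    // The only constructor kept: it takes a (possibly shared) source.
    SketchTokenizer(SketchSource source) {
        this.source = source;
    }

    // What a removed factory-taking constructor would have done:
    // wrap the factory in a fresh source and delegate.
    static SketchTokenizer withFactory(SketchFactory factory) {
        return new SketchTokenizer(new SketchSource(factory));
    }
}
```

Two tokenizers built from the same SketchSource share it, which is the case we 
need. A caller that only has a factory goes through withFactory (or wraps the 
factory itself) and loses nothing.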


                
> TokenizerFactory should provide a create method with a given AttributeSource
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-4642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4642
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.1
>            Reporter: Renaud Delbru
>            Assignee: Steve Rowe
>              Labels: analysis, attribute, tokenizer
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4642.patch, LUCENE-4642.patch
>
>
> All tokenizer implementations have a constructor that takes a given 
> AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory 
> does not provide an API to create tokenizers with a given AttributeSource.
> Side note: There are still a lot of tokenizers that do not provide 
> constructors that take AttributeSource and AttributeFactory.
