[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598688#comment-13598688 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi Steve,

I imagine things were busy these past days with the 4.2 release. Do you need help finalising this patch? Thanks.

> TokenizerFactory should provide a create method with a given AttributeSource
>
> Key: LUCENE-4642
> URL: https://issues.apache.org/jira/browse/LUCENE-4642
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 4.1
> Reporter: Renaud Delbru
> Assignee: Steve Rowe
> Labels: analysis, attribute, tokenizer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, TrieTokenizerFactory.java.patch
>
> All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory does not provide an API to create tokenizers with a given AttributeSource.
> Side note: There are still a lot of tokenizers that do not provide constructors that take AttributeSource and AttributeFactory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587283#comment-13587283 ]

Steve Rowe commented on LUCENE-4642:
---

Hi Renaud, I skimmed your patch and it looks good. I'll take a closer look in the next couple of days for completeness and testing.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587269#comment-13587269 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi, are there any updates on the patch? Thanks.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578331#comment-13578331 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi, could this patch be considered for inclusion at some point? Thanks.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569883#comment-13569883 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi, I have submitted a patch which integrates:
- the patch from Uwe
- the removal of the Tokenizer(AttributeSource) constructor
- the addition of a TokenizerFactory.create(AttributeFactory) method
- some of the changes from Steve's previous patch (e.g., the TokenizerFactory.create method throws UOE by default)

All test suites pass.
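The factory shape listed in this comment can be pictured with a minimal sketch. It assumes that the overload taking an attribute factory is the one that throws UnsupportedOperationException by default, which the comment does not spell out. All class names below (DemoAttributeFactory, DemoTokenizer, DemoTokenizerFactory, UpdatedDemoFactory, LegacyDemoFactory) are hypothetical stand-ins and not Lucene's API or the contents of the attached patch.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
class DemoAttributeFactory { }

class DemoTokenizer {
    final DemoAttributeFactory attributeFactory; // null means "use the default"
    final java.io.Reader input;

    DemoTokenizer(DemoAttributeFactory attributeFactory, java.io.Reader input) {
        this.attributeFactory = attributeFactory;
        this.input = input;
    }
}

abstract class DemoTokenizerFactory {
    // Every factory must be able to build a tokenizer with default attributes.
    abstract DemoTokenizer create(java.io.Reader input);

    // Assumed default: a factory that was never updated rejects a custom
    // attribute factory instead of silently ignoring it.
    DemoTokenizer create(DemoAttributeFactory attributeFactory, java.io.Reader input) {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not accept an attribute factory");
    }
}

// A factory that supports both creation paths.
class UpdatedDemoFactory extends DemoTokenizerFactory {
    @Override
    DemoTokenizer create(java.io.Reader input) {
        return new DemoTokenizer(null, input);
    }

    @Override
    DemoTokenizer create(DemoAttributeFactory attributeFactory, java.io.Reader input) {
        return new DemoTokenizer(attributeFactory, input);
    }
}

// A factory that only implements the mandatory method and inherits the UOE default.
class LegacyDemoFactory extends DemoTokenizerFactory {
    @Override
    DemoTokenizer create(java.io.Reader input) {
        return new DemoTokenizer(null, input);
    }
}
```

With this shape, existing factories keep compiling unchanged, and callers that need a custom attribute factory find out at the call site whether a given factory supports it.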
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564422#comment-13564422 ]

Renaud Delbru commented on LUCENE-4642:
---

Great, I think the AttributeFactory hack could work for us. Would you agree to adding a TokenizerFactory.create(AttributeFactory) method? I could prepare a patch for that.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563798#comment-13563798 ]

Robert Muir commented on LUCENE-4642:
---

+1 for Uwe's patch. I think this constructor is dangerous; I don't want it on every tokenizer.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563791#comment-13563791 ]

Uwe Schindler commented on LUCENE-4642:
---

bq. And I guess I was secretly hoping we could remove Tokenizer(AttributeSource) if we fixed the Solr hack.

This is my opinion, too!
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563790#comment-13563790 ]

Uwe Schindler commented on LUCENE-4642:
---

TokenStreams are final and their settings should not be modifiable (the ones that still have setters keep them for backwards compatibility with Lucene 3.x; in 4.0 all settings should be unmodifiable).

It is also impossible to change the AttributeFactory or AttributeSource after construction, because the attributes are created during construction (the addAttribute calls run in the field initializers, as part of the implicit constructor), so changing the AttributeSource/Factory afterwards will not work.
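The construction-time binding described in this comment can be shown with a small sketch. FixedTermAttribute, RegistrySource and BoundTokenizer are hypothetical stand-ins, not Lucene's classes, and the setter is the kind of method proposed earlier in the discussion, included only to show why it has no effect on attributes that are already bound.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
class FixedTermAttribute {
    String term = "";
}

class RegistrySource {
    private final java.util.Map<Class<?>, Object> attributes = new java.util.HashMap<>();

    // Returns the instance already registered for this type, or registers a new one.
    <T> T addAttribute(Class<T> type, java.util.function.Supplier<T> creator) {
        return type.cast(attributes.computeIfAbsent(type, k -> creator.get()));
    }
}

class BoundTokenizer {
    private RegistrySource source;

    // Bound exactly once, while the object is being constructed.
    final FixedTermAttribute termAtt;

    BoundTokenizer(RegistrySource source) {
        this.source = source;
        this.termAtt = source.addAttribute(FixedTermAttribute.class, FixedTermAttribute::new);
    }

    // Hypothetical setter: it swaps the source reference, but termAtt still
    // points at the instance obtained from the original source.
    void setAttributeSource(RegistrySource other) {
        this.source = other;
    }

    void emit(String term) {
        termAtt.term = term;
    }
}
```

After calling setAttributeSource, tokens emitted by the tokenizer still land in the attribute instance owned by the original source, so consumers reading from the new source never see them.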
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563784#comment-13563784 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi Robert,

I understand your point of view. One possible alternative for simplifying the API would be to refactor the constructors that take an AttributeSource/AttributeFactory into setters. After a quick look, this seems compatible with the existing tokenizers and tokenizer factories. Setting the AttributeSource/AttributeFactory on a tokenizer would be transparent (i.e., implementers would not have to declare an explicit constructor), and specific extensions could still be implemented by subclasses (e.g., NumericTokenStream could override the setAttributeFactory method to wrap a given factory with NumericAttributeFactory).

For the tokenizer factories, we could then implement a create method with an AttributeSource/AttributeFactory parameter, which would call the abstract create method and then call setAttributeSource/setAttributeFactory on the newly created tokenizer.

What do you think? Did I miss something in my reasoning that could break something?
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563108#comment-13563108 ]

Robert Muir commented on LUCENE-4642:
---

My problem, I guess, with AttributeSource/AttributeFactory is that they intrude on every single custom tokenizer: the API is not good. I realize it's useful for expert users to be able to plug in their own, but why in the world must *every* tokenizer have ctor explosion (minimum 3) to support this?

And I guess I was secretly hoping we could remove Tokenizer(AttributeSource) if we fixed the Solr hack. :)

Again, my main problem is not with what you want to do; it's with the existing APIs (Tokenizer.java) and where we are heading if we perpetuate this in the analysis factories (TokenizerFactory) too.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562853#comment-13562853 ]

Renaud Delbru commented on LUCENE-4642:
---

@steve:
{quote}
have you looked at TeeSinkTokenFilter
{quote}
Yes, and from my current understanding, it is similar to our current implementation. The problem with this approach is that the exchange of attributes is performed using the AttributeSource.State API with AttributeSource#captureState and AttributeSource#restoreState, which copies the values of all attribute implementations that the state contains. This is very inefficient, as it has to copy arrays and other objects (e.g., char term arrays) for every single token.

@robert:
Concerning the problem of UOEs, Steve's new patch reduces the number of UOEs to just one, which is much more reasonable than my first approach. I have looked at the current state of the Lucene trunk, and there are already a lot of UOEs in many places. So I would suggest that this problem may not be a blocking one (but I might be wrong).

Concerning the problem of constructor explosion, maybe we can find a consensus. Your proposal of removing Tokenizer(AttributeSource) cannot work for us, as we need it to share the same AttributeSource across multiple streams. However, as I proposed, removing Tokenizer(AttributeFactory) could work, as it could be emulated by using Tokenizer(AttributeSource).
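The cost difference raised in this comment, copying attribute values per token versus sharing the attribute instance, can be sketched as follows. ToyCharTerm and ToyHandOff are hypothetical stand-ins; the snapshot method only mimics the general idea of a state capture and is not Lucene's captureState/restoreState implementation.

```java
// Hypothetical stand-in for a term attribute; NOT Lucene's CharTermAttribute.
class ToyCharTerm {
    char[] buffer = new char[0];
    int length = 0;

    void set(String s) {
        buffer = s.toCharArray();
        length = buffer.length;
    }

    // Copy-based capture: every captured token allocates and fills a new array.
    ToyCharTerm snapshot() {
        ToyCharTerm copy = new ToyCharTerm();
        copy.buffer = java.util.Arrays.copyOf(buffer, length);
        copy.length = length;
        return copy;
    }

    String text() {
        return new String(buffer, 0, length);
    }
}

class ToyHandOff {
    // Copy-based hand-off: the child works on a private snapshot of each token.
    static ToyCharTerm byCopy(ToyCharTerm parentTerm) {
        return parentTerm.snapshot();
    }

    // Shared hand-off: the child reads the very same attribute instance.
    static ToyCharTerm byReference(ToyCharTerm parentTerm) {
        return parentTerm;
    }
}
```

In the shared variant the child sees each new token as soon as the parent writes it, with no per-token allocation; the price is that the child must consume the token before the parent advances.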
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562765#comment-13562765 ]

Steve Rowe commented on LUCENE-4642:
---

Renaud, have you looked at [TeeSinkTokenFilter|http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html]? Sounds to me like a good fit for the use case you mentioned.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562745#comment-13562745 ]

Robert Muir commented on LUCENE-4642:
---

I raised a lot of questions. I think they are valid concerns.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562740#comment-13562740 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi, are there still open questions on this issue that block the patch from being committed?
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558850#comment-13558850 ]

Renaud Delbru commented on LUCENE-4642:
---

{quote}
Because it's totally unrelated.
{quote}
Well, I think the user could simply create a new AttributeSource with a given AttributeFactory to emulate Tokenizer(AttributeFactory)? But that might add some burden on the user side.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558842#comment-13558842 ]

Robert Muir commented on LUCENE-4642:
---

{quote}
Why not the contrary instead? I.e., remove Tokenizer(AttributeFactory) and leave Tokenizer(AttributeSource), since AttributeFactory is a nested class of AttributeSource? Limiting the API to only AttributeFactory will restrict it unnecessarily imho.
{quote}
Because it's totally unrelated. AttributeFactory lets you customize the attribute implementations. But the AttributeSource ctor is an even crazier thing: it's sharing actual attribute objects with another AttributeSource.
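The distinction drawn in this comment, choosing attribute implementations versus sharing attribute objects, can be sketched with two constructors. All names below (ToyTermAttribute, PlainToyTerm, LowerCasingToyTerm, ToyAttributeFactory, ToyAttributeSource) are hypothetical stand-ins, not Lucene's classes.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
interface ToyTermAttribute {
    void setTerm(String term);
    String getTerm();
}

class PlainToyTerm implements ToyTermAttribute {
    private String term = "";
    public void setTerm(String term) { this.term = term; }
    public String getTerm() { return term; }
}

// A custom implementation, standing in for something like a collation-aware attribute.
class LowerCasingToyTerm implements ToyTermAttribute {
    private String term = "";
    public void setTerm(String term) { this.term = term.toLowerCase(java.util.Locale.ROOT); }
    public String getTerm() { return term; }
}

// Role of a factory: decide WHICH implementation class gets instantiated.
interface ToyAttributeFactory {
    ToyTermAttribute createTermAttribute();
}

class ToyAttributeSource {
    final ToyTermAttribute term;

    // Factory constructor: this source gets its own, fresh attribute instance.
    ToyAttributeSource(ToyAttributeFactory factory) {
        this.term = factory.createTermAttribute();
    }

    // Sharing constructor: this source aliases the other source's instance.
    ToyAttributeSource(ToyAttributeSource other) {
        this.term = other.term;
    }
}
```

Two sources built from the same factory use the same implementation class but hold independent state, whereas a source built from another source observes every write made through the original.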
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558832#comment-13558832 ]

Renaud Delbru commented on LUCENE-4642:
---

{quote}
Personally: I think we should remove Tokenizer(AttributeSource): it bloats the APIs and causes ctor explosion.
{quote}

Why not the contrary instead? I.e., remove Tokenizer(AttributeFactory) and keep Tokenizer(AttributeSource), since AttributeFactory is a nested class of AttributeSource? Limiting the API to only AttributeFactory would restrict it unnecessarily, imho.

Our use case is to be able to create "advanced token streams", where one "parent token stream" can have multiple "child token streams". The parent token stream shares its attribute source with the child token streams for performance reasons. Emulating this behaviour by copying the attributes from stream to stream is really inefficient (our throughput drops by a factor of at least 3).

A more concrete use case is the ability to create "specific token streams" for a particular "token type". For example, our parent tokenizer tokenizes a string into a list of tokens, each one having a specific type. Then, each token is processed downstream by "child token streams". Which child token stream processes a given token depends on the token type attribute.
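The parent/child arrangement described in this comment can be sketched roughly as follows. This is a toy model only: TokenState, ChildStream and ParentStream are hypothetical names invented for illustration and are not Lucene classes, and the whitespace tokenization and the NUM/WORD types are assumptions. It only shows the idea of a parent writing each token once into a state object that its children read directly, instead of copying attributes from stream to stream.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a bag of token attributes; NOT Lucene's AttributeSource. */
final class TokenState {
    String term = "";
    String type = "";
}

/** Hypothetical child stream: reads the current token from the state it was given. */
final class ChildStream {
    private final TokenState state;
    final List<String> seen = new ArrayList<>();

    ChildStream(TokenState state) {
        this.state = state;
    }

    void process() {
        // Reads the shared state in place; nothing is copied per token.
        seen.add(state.type + ":" + state.term);
    }
}

/** Hypothetical parent stream: writes each token once into a state shared with its children. */
final class ParentStream {
    private final TokenState shared = new TokenState();
    final ChildStream words = new ChildStream(shared);
    final ChildStream numbers = new ChildStream(shared);

    void tokenize(String input) {
        for (String tok : input.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields one empty piece
            }
            shared.term = tok;
            shared.type = tok.chars().allMatch(Character::isDigit) ? "NUM" : "WORD";
            // Dispatch on the token type; the chosen child sees the same state object.
            ("NUM".equals(shared.type) ? numbers : words).process();
        }
    }
}
```

In this sketch the children are constructed with the parent's state object, which is the role a create method taking an AttributeSource would play for factory-built tokenizers.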
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558780#comment-13558780 ]

Robert Muir commented on LUCENE-4642:
-

I'd also like to know the use case here.

Personally: I think we should remove Tokenizer(AttributeSource): it bloats the APIs and causes ctor explosion. There are no real use cases in the lucene/solr codebase: it's only used by a HACK (TrieTokenizerFactory in Solr), which should instead be fixed.

AttributeFactory, on the other hand, is different (e.g. real use cases like numerics and collation).
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558772#comment-13558772 ]

Robert Muir commented on LUCENE-4642:
-

Don't you think adding the AttributeFactory ctor would be more useful? I think it's much more esoteric to provide an AttributeSource to a tokenizer.

{quote}
in order not to break existing factories, I think it would be better to make this new method throw UOE by default instead of being abstract.
{quote}

I don't agree with this for trunk. We should add deprecations or whatever in 4.x, but trunk should be clean, without any UOEs.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1307#comment-1307 ]

Adrien Grand commented on LUCENE-4642:
--

I'm not familiar enough with Lucene analysis to know whether it should be exposed in the factories, but in order not to break existing factories, I think it would be better to make this new method throw UOE (UnsupportedOperationException) by default instead of being abstract.
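For readers unfamiliar with the pattern, the "throw UOE by default" suggestion could look roughly like the sketch below. All class names here (Attrs, SketchTokenizerFactory, LegacyFactory, UpdatedFactory) are hypothetical and the methods return plain strings rather than tokenizers; this is not the signature used in the attached patch, only an illustration of adding a non-abstract overload to an abstract base class.

```java
import java.io.Reader;

/** Hypothetical stand-in for a caller-supplied attribute source; NOT a Lucene class. */
final class Attrs {}

/** Hypothetical base factory; returns a String only to keep the sketch self-contained. */
abstract class SketchTokenizerFactory {
    /** Existing contract that every factory already implements. */
    abstract String create(Reader input);

    /**
     * New overload. It is deliberately not abstract, so existing subclasses keep
     * compiling; it throws UnsupportedOperationException until a subclass overrides it.
     */
    String create(Attrs source, Reader input) {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not support a caller-supplied attribute source");
    }
}

/** A factory written before the overload existed: compiles unchanged. */
final class LegacyFactory extends SketchTokenizerFactory {
    @Override
    String create(Reader input) {
        return "legacy";
    }
}

/** A factory that opts in to the new overload. */
final class UpdatedFactory extends SketchTokenizerFactory {
    @Override
    String create(Reader input) {
        return "updated";
    }

    @Override
    String create(Attrs source, Reader input) {
        return "updated+shared";
    }
}
```

The trade-off is that an unsupported factory fails at run time rather than at compile time, whereas an abstract method would force every factory to be updated before it compiles.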
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555146#comment-13555146 ]

Renaud Delbru commented on LUCENE-4642:
---

Could someone from the team tell us whether this patch may be considered for inclusion at some point? We currently need it in our project, so it is blocking our development.

Thanks.
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542144#comment-13542144 ]

Renaud Delbru commented on LUCENE-4642:
---

Hi,

Is there any plan to commit this patch? Or is there additional work to do first?

Thanks