[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-03-11 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598688#comment-13598688
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi Steve, I imagine things were busy these past days with the 4.2 release. 
Do you need help finalising this patch? Thanks.

> TokenizerFactory should provide a create method with a given AttributeSource
> 
>
> Key: LUCENE-4642
> URL: https://issues.apache.org/jira/browse/LUCENE-4642
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.1
>Reporter: Renaud Delbru
>Assignee: Steve Rowe
>  Labels: analysis, attribute, tokenizer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, 
> TrieTokenizerFactory.java.patch
>
>
> All tokenizer implementations have a constructor that takes a given 
> AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory 
> does not provide an API to create tokenizers with a given AttributeSource.
> Side note: There are still a lot of tokenizers that do not provide 
> constructors that take AttributeSource and AttributeFactory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-02-26 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587283#comment-13587283
 ] 

Steve Rowe commented on LUCENE-4642:


Hi Renaud,

I skimmed your patch and it looks good. I'll take a closer look in the next 
couple of days for completeness and testing.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-02-26 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587269#comment-13587269
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi, any updates on the patch? Thanks.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-02-14 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578331#comment-13578331
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi, could this patch be considered for inclusion at some point? Thanks.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-02-03 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569883#comment-13569883
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi,

I have submitted a patch that integrates:
- the patch from Uwe
- the removal of the Tokenizer(AttributeSource) constructor
- the addition of a TokenizerFactory.create(AttributeFactory) method
- some of the changes from Steve's previous patch (e.g., the 
TokenizerFactory.create method throws UnsupportedOperationException (UOE) by 
default)

All test suites are passing.
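
As an illustration of the last two items, here is a minimal sketch of a factory whose AttributeFactory-aware create overload throws UnsupportedOperationException unless a subclass overrides it. The classes are simplified stand-ins and not Lucene's actual classes. The Reader parameter and the names PlainFactory and FactoryAwareFactory are assumptions made for the example, and the sketch assumes that the UOE default applies to the new overload.

```java
import java.io.Reader;

// Hypothetical stand-in for Lucene's AttributeFactory.
class AttributeFactory {
    final String name;
    AttributeFactory(String name) { this.name = name; }
}

// Hypothetical stand-in for Lucene's Tokenizer.
class Tokenizer {
    final AttributeFactory factory; // null means "use the default factory"
    final Reader input;

    Tokenizer(Reader input) { this(null, input); }

    Tokenizer(AttributeFactory factory, Reader input) {
        this.factory = factory;
        this.input = input;
    }
}

abstract class TokenizerFactory {
    // Every factory has to support plain creation.
    abstract Tokenizer create(Reader input);

    // Opt-in overload: factories that do not override it reject the call.
    Tokenizer create(AttributeFactory factory, Reader input) {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not support AttributeFactory");
    }
}

// A factory that only implements the mandatory method.
class PlainFactory extends TokenizerFactory {
    @Override
    Tokenizer create(Reader input) { return new Tokenizer(input); }
}

// A factory that also passes a caller-supplied AttributeFactory through.
class FactoryAwareFactory extends TokenizerFactory {
    @Override
    Tokenizer create(Reader input) { return new Tokenizer(input); }

    @Override
    Tokenizer create(AttributeFactory factory, Reader input) {
        return new Tokenizer(factory, input);
    }
}
```

With this shape, existing factories keep compiling unchanged, and callers that pass an AttributeFactory to a factory that has not opted in fail at the call rather than silently getting default attributes.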




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-28 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564422#comment-13564422
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Great, I think that AttributeFactory hack could work for us. Would you agree to 
add a TokenizerFactory.create(AttributeFactory) method? I could prepare a 
patch for that.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563798#comment-13563798
 ] 

Robert Muir commented on LUCENE-4642:
-

+1 for Uwe's patch. I think this constructor is dangerous; I don't want it on 
every tokenizer.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563791#comment-13563791
 ] 

Uwe Schindler commented on LUCENE-4642:
---

bq. And I guess I was secretly hoping we could remove 
Tokenizer(AttributeSource) if we fixed the solr hack. 

This is my opinion, too!




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563790#comment-13563790
 ] 

Uwe Schindler commented on LUCENE-4642:
---

TokenStreams are final and their settings should not be modifiable (the ones 
which still have setters are there for backwards compatibility in Lucene 3.x; 
in 4.0 all settings should be unmodifiable). It is also impossible to change 
the AttributeFactory or AttributeSource after construction, because the 
attributes are created during construction (addAttribute is called in the field 
initializers, which run as part of the constructor), so changing the 
AttributeSource/Factory afterwards will not work.
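
The point about construction-time binding can be sketched as follows. This is a simplified model and not Lucene's real code: AttributeSource, TermAttribute and SetterTokenizer are hypothetical stand-ins, and setAttributeSource is the kind of setter proposed in the thread, not an existing method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, heavily simplified stand-in for Lucene's AttributeSource.
class AttributeSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    // Returns the instance already registered for this type, or registers a new one.
    <T> T addAttribute(Class<T> type, Supplier<T> creator) {
        return type.cast(attributes.computeIfAbsent(type, k -> creator.get()));
    }
}

// Hypothetical stand-in for a term attribute.
class TermAttribute {
    String term = "";
}

class SetterTokenizer {
    private AttributeSource source = new AttributeSource();

    // Field initializer: runs during construction and binds termAtt to the
    // attribute instance owned by the ORIGINAL source.
    final TermAttribute termAtt =
        source.addAttribute(TermAttribute.class, TermAttribute::new);

    // Hypothetical setter: swaps the source but cannot rebind termAtt.
    void setAttributeSource(AttributeSource newSource) {
        this.source = newSource;
    }

    // Writes through the reference captured at construction time.
    void emit(String term) {
        termAtt.term = term;
    }
}
```

After calling setAttributeSource, emit still writes to the attribute instance created in the constructor, so a consumer reading from the new source never sees the tokens.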




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-27 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563784#comment-13563784
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi Robert,

I understand your point of view. One possible alternative for simplifying the 
API would be to refactor the constructors that take 
AttributeSource/AttributeFactory into setters. After a quick look, this seems 
compatible with the existing tokenizers and tokenizer factories. 
Setting the AttributeSource/AttributeFactory of a tokenizer would be 
transparent (i.e., tokenizers would not have to explicitly declare a 
constructor), and specific extensions could still be implemented by subclasses 
(e.g., NumericTokenStream could override the setAttributeFactory method to wrap 
a given factory with NumericAttributeFactory).
For the tokenizer factories, we could then implement a create method with an 
AttributeSource/AttributeFactory parameter, which would call the abstract 
method create and then call setAttributeSource/setAttributeFactory on the newly 
created tokenizer.

What do you think? Did I miss something in my reasoning that could break 
something?




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563108#comment-13563108
 ] 

Robert Muir commented on LUCENE-4642:
-

My problem, I guess, with AttributeSource/AttributeFactory is that they invade 
every single custom tokenizer: the API is not good.

I realize it's useful for expert users to be able to plug in their own, but why 
in the world must *every* tokenizer have a ctor explosion (minimum 3) to 
support this? 

And I guess I was secretly hoping we could remove Tokenizer(AttributeSource) if 
we fixed the solr hack. :)

Again, my main problem is not about what you want to do; it's instead related 
to the existing APIs (Tokenizer.java) and where we are heading if we perpetuate 
this to the analysis factories (TokenizerFactory) too.





[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-25 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562853#comment-13562853
 ] 

Renaud Delbru commented on LUCENE-4642:
---

@steve:

{quote}
have you looked at TeeSinkTokenFilter
{quote}

Yes, and from my current understanding, it is similar to our current 
implementation. The problem with this approach is that the exchange of 
attributes is performed using the AttributeSource.State API with 
AttributeSource#captureState and AttributeSource#restoreState, which copies the 
values of all attribute implementations that the state contains. This is 
very inefficient, as it has to copy arrays and other objects (e.g., char term 
arrays, etc.) for every single token.
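
The difference between copying attribute values and sharing attribute instances can be sketched as follows. TermBuffer and ChildStream are hypothetical stand-ins, and copy() only mimics the value-copying effect of a capture/restore round trip; it is not Lucene's State API.

```java
import java.util.Arrays;

// Hypothetical stand-in for a term attribute backed by a char buffer.
class TermBuffer {
    private char[] chars = new char[0];

    void set(String s) {
        chars = s.toCharArray();
    }

    // Copy-based hand-off: allocates and fills a new buffer on every call.
    TermBuffer copy() {
        TermBuffer c = new TermBuffer();
        c.chars = Arrays.copyOf(chars, chars.length);
        return c;
    }

    @Override
    public String toString() {
        return new String(chars);
    }
}

// Hypothetical consumer that reads the current term from whatever it was given.
class ChildStream {
    private final TermBuffer term;

    ChildStream(TermBuffer term) {
        this.term = term;
    }

    String read() {
        return term.toString();
    }
}
```

A child constructed with the parent's own TermBuffer sees each new token without any copying, whereas a child constructed from copy() holds a snapshot that has to be refreshed, with a fresh array copy, for every token.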

@robert:

Concerning the problem of UOEs, Steve's new patch reduces the number of 
UOEs to only one, which is much more reasonable than my first approach. I have 
looked at the current state of the Lucene trunk, and there are already a lot of 
UOEs in many places. So, I would suggest that this problem may not be a 
blocking one (but I might be wrong).

Concerning the problem of constructor explosion, maybe we can find a consensus. 
Your proposal to remove Tokenizer(AttributeSource) cannot work for us, as 
we need it to share the same AttributeSource across multiple streams. However, 
as I proposed, removing Tokenizer(AttributeFactory) could work, as it could be 
emulated by using Tokenizer(AttributeSource).






[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562765#comment-13562765
 ] 

Steve Rowe commented on LUCENE-4642:


Renaud, have you looked at 
[TeeSinkTokenFilter|http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html]?
  Sounds to me like a good fit for the use case you mentioned.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562745#comment-13562745
 ] 

Robert Muir commented on LUCENE-4642:
-

I raised a lot of questions. I think they are valid concerns.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-25 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562740#comment-13562740
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi, 

Are there still open questions on this issue that block the patch from being 
committed?




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-21 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558850#comment-13558850
 ] 

Renaud Delbru commented on LUCENE-4642:
---

{quote}
Because it's totally unrelated.
{quote}

Well, couldn't the user simply create a new AttributeSource with a given 
AttributeFactory to emulate Tokenizer(AttributeFactory)? That might add some 
burden on the user side, though.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558842#comment-13558842
 ] 

Robert Muir commented on LUCENE-4642:
-

{quote}
Why not the opposite instead? I.e., remove Tokenizer(AttributeFactory) and 
leave Tokenizer(AttributeSource), since AttributeFactory is a nested class of 
AttributeSource? Limiting the API to only AttributeFactory will restrict it 
unnecessarily, imho.
{quote}

Because it's totally unrelated.

AttributeFactory lets you customize the attribute implementations.

But the AttributeSource ctor is an even crazier thing: it's sharing actual 
attribute objects with another AttributeSource.
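
The distinction can be sketched with simplified stand-ins (these are not Lucene's classes, and termAttribute() is a hypothetical accessor used only for this example): a factory decides which implementation object gets created, while constructing a source from another source reuses that source's instances.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a term attribute.
class TermAttribute {
    String term = "";
}

// Hypothetical stand-in: decides which implementation object to create.
class AttributeFactory {
    TermAttribute createTerm() {
        return new TermAttribute();
    }
}

// Hypothetical stand-in: owns (or shares) the attribute instances.
class AttributeSource {
    private final AttributeFactory factory;
    private final Map<Class<?>, Object> attributes;

    // Fresh source: its own instances, created through the given factory.
    AttributeSource(AttributeFactory factory) {
        this.factory = factory;
        this.attributes = new HashMap<>();
    }

    // Sharing source: reuses the other source's map, hence the same instances.
    AttributeSource(AttributeSource other) {
        this.factory = other.factory;
        this.attributes = other.attributes;
    }

    // Hypothetical accessor: returns the single TermAttribute held by this source.
    TermAttribute termAttribute() {
        return (TermAttribute) attributes.computeIfAbsent(
            TermAttribute.class, k -> factory.createTerm());
    }
}
```

Two sources built from the same factory stay independent, while a source built from another source observes every write made through the original.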





[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-21 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558832#comment-13558832
 ] 

Renaud Delbru commented on LUCENE-4642:
---

{quote}
Personally: I think we should remove Tokenizer(AttributeSource): it bloats the 
APIs and causes ctor explosion.
{quote}

Why not the contrary instead ? I.e., remove Tokenizer(AttributeFactory) and 
leave Tokenizer(AttributeSource) since AttributeFactory is an enclosed class of 
AttributeSource ? Limiting the API to only AttributeFactory will restrict it 
unnecessarily imho.

Our use case is to be able to create "advanced token streams", where one 
"parent token stream" can have multiple "child token streams". The parent token 
stream shares its attribute source with the child token streams for 
performance reasons. Emulating this behaviour by copying the attributes 
from stream to stream is really inefficient (our throughput is divided by at 
least 3).
A more concrete use case is the ability to create "specific token streams" for 
a particular "token type". For example, our parent tokenizer tokenizes a string 
into a list of tokens, each one having a specific type. Then, each token is 
processed downstream by "child token streams". The child token stream that will 
process the token depends on the token type attribute.
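
To make the sharing idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: AttributeBag, TermAttr, TypeAttr, ChildStream and route are hypothetical stand-ins that only model the behaviour described above, namely that a child built on the parent's attribute source sees the same attribute instances, so a token can be routed by type without any copy step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class SharedAttributesSketch {

    /** Hypothetical stand-in for a term attribute (not a Lucene class). */
    static final class TermAttr {
        String term = "";
    }

    /** Hypothetical stand-in for a token-type attribute (not a Lucene class). */
    static final class TypeAttr {
        String type = "word";
    }

    /** Simplified model of an attribute source: one instance per attribute class. */
    static final class AttributeBag {
        private final Map<Class<?>, Object> attributes;

        /** A bag with its own attribute instances. */
        AttributeBag() {
            this.attributes = new HashMap<>();
        }

        /** A bag backed by the same attribute instances as {@code shared}. */
        AttributeBag(AttributeBag shared) {
            this.attributes = shared.attributes;
        }

        /** Returns the instance registered for {@code type}, creating it on first use. */
        <T> T addAttribute(Class<T> type, Supplier<T> factory) {
            return type.cast(attributes.computeIfAbsent(type, k -> factory.get()));
        }
    }

    /** Child stream that reads tokens straight from the parent's attributes. */
    static final class ChildStream {
        private final String label;
        private final TermAttr term;

        ChildStream(AttributeBag parentAttributes, String label) {
            AttributeBag own = new AttributeBag(parentAttributes);
            this.term = own.addAttribute(TermAttr.class, TermAttr::new);
            this.label = label;
        }

        String process() {
            return label + ":" + term.term;
        }
    }

    /** The parent writes one token, then the child registered for its type handles it. */
    static String route(String token, String tokenType) {
        AttributeBag parent = new AttributeBag();
        TermAttr term = parent.addAttribute(TermAttr.class, TermAttr::new);
        TypeAttr type = parent.addAttribute(TypeAttr.class, TypeAttr::new);

        Map<String, ChildStream> children = new HashMap<>();
        children.put("word", new ChildStream(parent, "word"));
        children.put("uri", new ChildStream(parent, "uri"));

        term.term = token;
        type.type = tokenType;
        // No copy step: the selected child holds the same TermAttr instance.
        return children.get(type.type).process();
    }

    public static void main(String[] args) {
        System.out.println(route("lucene", "word"));
        System.out.println(route("http://example.org", "uri"));
    }
}
```

In this model the alternative would be for each child to own a private bag and for the parent to copy every attribute value across per token, which is the extra work the comment above reports as costly.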




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558780#comment-13558780
 ] 

Robert Muir commented on LUCENE-4642:
-

I'd also like to know the use case here.

Personally: I think we should remove Tokenizer(AttributeSource): it bloats the 
APIs and causes ctor explosion.
There are no real use-cases in the lucene/solr codebase: it's only used by a 
HACK (TrieTokenizerFactory in Solr), which should instead be fixed.

AttributeFactory on the other hand is different (e.g. real use cases like 
numerics and collation).





[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558772#comment-13558772
 ] 

Robert Muir commented on LUCENE-4642:
-

Don't you think adding the AttributeFactory ctor would be more useful? I think 
it's much more esoteric to provide an AttributeSource to a tokenizer.

{quote}
in order not to break existing factories, I think it would be better to make 
this new method throw UOE by default instead of being abstract.
{quote}

I don't agree with this for trunk. We should add deprecations or whatever in 
4.x, but trunk should be clean without any UOEs.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1307#comment-1307
 ] 

Adrien Grand commented on LUCENE-4642:
--

I'm not familiar enough with Lucene analysis to know whether it should be 
exposed in the factories, but in order not to break existing factories, I think 
it would be better to make this new method throw UOE by default instead of 
being abstract.
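
The suggestion can be illustrated with a small sketch in plain Java. The classes below (AttributeSourceStub, TokenizerStub, TokenizerFactoryStub, LegacyFactory, SharingFactory) are hypothetical stand-ins, not the Lucene types or the attached patch. The new overload is a concrete method that throws UnsupportedOperationException, so factories written before the change still compile, and only factories that opt in override it.

```java
import java.io.Reader;
import java.io.StringReader;

public class FactoryDefaultSketch {

    /** Hypothetical stand-in for an attribute source (not a Lucene class). */
    static class AttributeSourceStub {}

    /** Hypothetical stand-in for a tokenizer (not a Lucene class). */
    static class TokenizerStub {
        final AttributeSourceStub source;
        final Reader input;

        TokenizerStub(AttributeSourceStub source, Reader input) {
            this.source = source;
            this.input = input;
        }
    }

    abstract static class TokenizerFactoryStub {
        /** Existing method: every factory already implements it. */
        abstract TokenizerStub create(Reader input);

        /** New overload: concrete, so existing subclasses keep compiling unchanged. */
        TokenizerStub create(AttributeSourceStub source, Reader input) {
            throw new UnsupportedOperationException(
                getClass().getSimpleName() + " cannot use a caller-supplied attribute source");
        }
    }

    /** A factory written before the overload existed; it needs no changes. */
    static class LegacyFactory extends TokenizerFactoryStub {
        @Override
        TokenizerStub create(Reader input) {
            return new TokenizerStub(new AttributeSourceStub(), input);
        }
    }

    /** A factory that opts in to the new overload. */
    static class SharingFactory extends TokenizerFactoryStub {
        @Override
        TokenizerStub create(Reader input) {
            return create(new AttributeSourceStub(), input);
        }

        @Override
        TokenizerStub create(AttributeSourceStub source, Reader input) {
            return new TokenizerStub(source, input);
        }
    }

    public static void main(String[] args) {
        AttributeSourceStub shared = new AttributeSourceStub();
        Reader input = new StringReader("some text");

        TokenizerStub tokenizer = new SharingFactory().create(shared, input);
        System.out.println(tokenizer.source == shared);

        try {
            new LegacyFactory().create(shared, input);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trade-off is that a factory without support fails at run time rather than at compile time, whereas an abstract method would force every existing factory to be updated.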




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-16 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555146#comment-13555146
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Could someone from the team tell us if this patch may be considered for 
inclusion at some point? We currently need it in our project, and it 
is blocking our development. Thanks.




[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource

2013-01-02 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542144#comment-13542144
 ] 

Renaud Delbru commented on LUCENE-4642:
---

Hi,

Are there any plans to commit this patch? Or is there additional work to do 
first?

Thanks.
