[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612051#comment-14612051
 ] 

ASF subversion and git services commented on LUCENE-6653:
-

Commit 1688845 from [~thetaphi] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1688845 ]

Merged revision(s) 1688830 from lucene/dev/trunk:
LUCENE-6653, LUCENE-6652: Refactor TermToBytesRefAttribute; add 
oal.analysis.tokenattributes.BytesTermAttribute; remove code duplication in 
tests

> Cleanup TermToBytesRefAttribute
> ---
>
> Key: LUCENE-6653
> URL: https://issues.apache.org/jira/browse/LUCENE-6653
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 5.3, Trunk
>
> Attachments: LUCENE-6653.patch
>
>
> While working on LUCENE-6652, I found that many tests implemented
> TermToBytesRefAttribute incorrectly. In addition, the whole concept
> dating back to Lucene 4.0 was no longer correct:
> - We don't return the hash code anymore; it is calculated by BytesRefHash.
> - The interface is horrible to use. It tends to reuse the BytesRef instance,
> but the whole thing is not correct.
> Instead we should remove the fillBytesRef() method from the interface and let
> getBytesRef() populate and return the BytesRef. It does not matter whether the
> attribute reuses the BytesRef or returns a new one. It just gets consumed like
> a standard CharTermAttribute: you get a BytesRef and can use it until you
> call incrementToken().
> As TermToBytesRefAttribute is marked experimental, I see no reason why
> we should not change the semantics to be easier to understand and to behave
> like all other attributes. I will add a note to the backwards-incompatible
> changes in Lucene 5.3.
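
The consumer-side difference described in the issue can be sketched roughly as
follows. This is only an illustration, not the committed patch: Bytes,
OldTermBytes, NewTermBytes, OldStream and NewStream are hypothetical stand-ins
written for this example so that it runs without Lucene on the classpath. Only
the method names fillBytesRef(), getBytesRef() and incrementToken() are taken
from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TermBytesSketch {

  /** Hypothetical stand-in for a BytesRef: a slice of a byte array. */
  public static final class Bytes {
    byte[] bytes = new byte[0];
    int offset;
    int length;

    String utf8ToString() {
      return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
  }

  /** Old shape (assumed): fill first, then read the shared instance. */
  public interface OldTermBytes {
    void fillBytesRef();
    Bytes getBytesRef();
  }

  /** Proposed shape: getBytesRef() populates and returns in one call. */
  public interface NewTermBytes {
    Bytes getBytesRef();
  }

  private static void copy(String term, Bytes target) {
    target.bytes = term.getBytes(StandardCharsets.UTF_8);
    target.offset = 0;
    target.length = target.bytes.length;
  }

  /** Hypothetical token source exposing the old two-step contract. */
  public static final class OldStream implements OldTermBytes {
    private final String[] terms;
    private int pos = -1;
    private final Bytes shared = new Bytes();

    public OldStream(String... terms) { this.terms = terms; }

    public boolean incrementToken() { return ++pos < terms.length; }

    public void fillBytesRef() { copy(terms[pos], shared); }

    // Returns the shared instance; its content is stale unless
    // fillBytesRef() was called for the current token.
    public Bytes getBytesRef() { return shared; }
  }

  /** Hypothetical token source exposing the simplified contract. */
  public static final class NewStream implements NewTermBytes {
    private final String[] terms;
    private int pos = -1;
    private final Bytes shared = new Bytes();

    public NewStream(String... terms) { this.terms = terms; }

    public boolean incrementToken() { return ++pos < terms.length; }

    // Populates and returns. Reusing the instance is an implementation
    // detail the consumer does not need to know about.
    public Bytes getBytesRef() {
      copy(terms[pos], shared);
      return shared;
    }
  }

  /** Old consumer loop: fetch the reference once, refill it per token. */
  public static List<String> consumeOld(OldStream stream) {
    List<String> out = new ArrayList<>();
    Bytes ref = stream.getBytesRef();
    while (stream.incrementToken()) {
      stream.fillBytesRef(); // forgetting this call leaves stale bytes
      out.add(ref.utf8ToString());
    }
    return out;
  }

  /** New consumer loop: one call per token, valid until the next token. */
  public static List<String> consumeNew(NewStream stream) {
    List<String> out = new ArrayList<>();
    while (stream.incrementToken()) {
      Bytes ref = stream.getBytesRef();
      out.add(ref.utf8ToString());
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(consumeOld(new OldStream("foo", "bar")));
    System.out.println(consumeNew(new NewStream("foo", "bar")));
  }
}
```

Both loops yield the same terms. The point of the second form is that the
consumer can no longer forget the fill step, and the attribute is read the same
way as a CharTermAttribute would be.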



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611997#comment-14611997
 ] 

ASF subversion and git services commented on LUCENE-6653:
-

Commit 1688830 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1688830 ]

LUCENE-6653, LUCENE-6652: Refactor TermToBytesRefAttribute; add 
oal.analysis.tokenattributes.BytesTermAttribute; remove code duplication in 
tests




[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611955#comment-14611955
 ] 

Michael McCandless commented on LUCENE-6653:


> Mike, are you also fine with the changes to the TermToBytesRefAttribute?

Yes, big +1 to the new, simpler API and to backporting the hard break to 5.x.




[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611863#comment-14611863
 ] 

Robert Muir commented on LUCENE-6653:
-

+1, this patch is great!




[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611735#comment-14611735
 ] 

Uwe Schindler commented on LUCENE-6653:
---

Mike, are you also fine with the changes to the TermToBytesRefAttribute? I 
would backport those and mention the change of "workflow" in the backwards 
incompatible changes. People will get a compile error in any case if they 
define their own attributes using this interface, but it will certainly not 
affect many users (maybe only those who wanted to get binary terms), which is 
now easy :-)




[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611671#comment-14611671
 ] 

Michael McCandless commented on LUCENE-6653:


+1, thanks for cleaning up all those dup'd binary token streams [~thetaphi]!




[jira] [Commented] (LUCENE-6653) Cleanup TermToBytesRefAttribute

2015-07-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611179#comment-14611179
 ] 

Uwe Schindler commented on LUCENE-6653:
---

All tests pass (Lucene + Solr).
