[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174080#comment-15174080 ]

ASF subversion and git services commented on LUCENE-7053:

Commit 3c27980c4ae716ba74b3a0e2c70b3dd1c1d4 in lucene-solr's branch refs/heads/master from [~thetaphi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3c27980 ]

LUCENE-7053: Simplify code to work around Java 8u25 compiler bug

> Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
>
> Key: LUCENE-7053
> URL: https://issues.apache.org/jira/browse/LUCENE-7053
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: master, 6.0
>
> Attachments: LUCENE-7053.patch, LUCENE-7053.patch, LUCENE-7053.patch, LUCENE-7053.patch
>
>
> Followup from LUCENE-7052: This removes the legacy, deprecated getUTF8SortedAsUTF16Comparator() in the BytesRef class. I know originally we added the different comparators to be able to allow the index term dict to be sorted in different order. This never proved to be useful, as many Lucene queries rely on the default order. The only codec that used another byte order internally was the Lucene 3 one (but it used the unicode spaghetti algorithm to reorder its term enums at runtime).
> This patch also removes the BytesRef-Comparator completely and just implements compareTo. So all code can rely on natural ordering.
> This patch also cleans up other usages of natural order comparators, e.g. in ArrayUtil, because Java 8 natively provides a comparator.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
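
For readers unfamiliar with why a "UTF-8 sorted as UTF-16" order differs from the natural byte order mentioned in the issue description, the following is a minimal sketch and not code from the patch. The class name SortOrderDemo and the helper compareUnsigned are hypothetical; only JDK classes are used. It compares one BMP character and one supplementary character under both orders.

```java
import java.nio.charset.StandardCharsets;

final class SortOrderDemo {
  /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    for (int i = 0; i < len; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    // One array is a prefix of the other: the shorter one sorts first.
    return a.length - b.length;
  }

  public static void main(String[] args) {
    String bmp = "\uFFFD";        // U+FFFD, one UTF-16 code unit, UTF-8 bytes EF BF BD
    String supp = "\uD83D\uDE00"; // U+1F600, a surrogate pair, UTF-8 bytes F0 9F 98 80

    // UTF-16 code unit order: the high surrogate 0xD83D is smaller than 0xFFFD.
    System.out.println(supp.compareTo(bmp) < 0); // true

    // Unsigned UTF-8 byte order: 0xF0 is larger than 0xEF, so the order flips.
    byte[] bmpBytes = bmp.getBytes(StandardCharsets.UTF_8);
    byte[] suppBytes = supp.getBytes(StandardCharsets.UTF_8);
    System.out.println(compareUnsigned(suppBytes, bmpBytes) > 0); // true
  }
}
```

Unsigned UTF-8 byte order agrees with Unicode code point order, which is why the two orders only disagree when supplementary characters meet BMP characters above the surrogate range.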
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171680#comment-15171680 ]

Michael McCandless commented on LUCENE-7053:

Thanks [~thetaphi]!
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171596#comment-15171596 ]

ASF subversion and git services commented on LUCENE-7053:

Commit 8ffa436f00d24cb45af49160739f71b3654349ce in lucene-solr's branch refs/heads/master from [~thetaphi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ffa436 ]

LUCENE-7053: Move comparator to better place in code; generalize to use CharSequence instead of String
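
The commit message above refers to a comparator generalized from String to CharSequence; earlier comments in this thread describe it as sorting Strings in code point order. The committed code is not shown here, so the following is only a sketch of what such a comparator might look like. The class name CodePointOrder and the constant CODE_POINT_ORDER are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CodePointOrder {
  /** Compares two character sequences by Unicode code point rather than by UTF-16 code unit. */
  static final Comparator<CharSequence> CODE_POINT_ORDER = (a, b) -> {
    int i = 0;
    int limit = Math.min(a.length(), b.length());
    while (i < limit) {
      int cpA = Character.codePointAt(a, i);
      int cpB = Character.codePointAt(b, i);
      if (cpA != cpB) {
        return Integer.compare(cpA, cpB);
      }
      // Equal code points occupy the same number of chars in both sequences.
      i += Character.charCount(cpA);
    }
    // One sequence is a prefix of the other: the shorter one sorts first.
    return Integer.compare(a.length(), b.length());
  };

  public static void main(String[] args) {
    List<String> terms = Arrays.asList("\uD83D\uDE00", "\uFFFD", "a");
    // A Comparator<CharSequence> is accepted wherever a comparator for Strings is needed.
    terms.sort(CODE_POINT_ORDER);
    // Prints the code points in sorted order: U+0061, U+FFFD, U+1F600.
    for (String term : terms) {
      System.out.println(Integer.toHexString(term.codePointAt(0)));
    }
  }
}
```

Typing the comparator on CharSequence means the same instance can also compare StringBuilder or other CharSequence implementations against Strings.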
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171576#comment-15171576 ]

ASF subversion and git services commented on LUCENE-7053:

Commit f48d23cd1448f20fb1b97ec986ded76a04a7075c in lucene-solr's branch refs/heads/master from [~thetaphi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f48d23c ]

LUCENE-7053: Remove custom comparators from BytesRef class and solely use natural byte[] comparator throughout codebase. It also replaces the natural comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1517#comment-1517 ]

Robert Muir commented on LUCENE-7053:

+1
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171087#comment-15171087 ]

Uwe Schindler commented on LUCENE-7053:

All tests pass.
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator()
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171058#comment-15171058 ]

Robert Muir commented on LUCENE-7053:

Yes, please, and remove BytesRef.COMPARATOR which just duplicates that: naturalOrder() already returns a singleton.

Also in cases like TreeSet creation in the join tests, we should just make {{new TreeSet<>()}} and not pass any comparator in at all.

> Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator()
>
> Key: LUCENE-7053
> URL: https://issues.apache.org/jira/browse/LUCENE-7053
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: master, 6.0
>
> Attachments: LUCENE-7053.patch, LUCENE-7053.patch, LUCENE-7053.patch
>
>
> Followup from LUCENE-7052: This removes the legacy, deprecated getUTF8SortedAsUTF16Comparator() in the BytesRef class. The only left over user was TSTLookup. Moves the code there as private impl detail.
> This also converts the comparators to lambdas for better readability.
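
The two suggestions in the comment above can be tried out with plain JDK types. This is a sketch using String elements rather than BytesRef, and the class and method names (NaturalOrderDemo, sortedTerms) are hypothetical. Note that the Javadoc of Comparator.naturalOrder() does not promise a single shared instance; the identity check below reflects the behaviour of a particular JDK implementation, so its result is printed without an expected value.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.TreeSet;

final class NaturalOrderDemo {
  /** Builds a sorted set that relies on the elements' own compareTo, with no comparator passed in. */
  static TreeSet<String> sortedTerms(Collection<String> terms) {
    TreeSet<String> set = new TreeSet<>();
    set.addAll(terms);
    return set;
  }

  public static void main(String[] args) {
    TreeSet<String> set = sortedTerms(Arrays.asList("b", "c", "a"));
    System.out.println(set);              // [a, b, c]
    System.out.println(set.comparator()); // null, meaning natural ordering is in use

    // Implementation detail: whether both calls return the same object depends on the JDK.
    Comparator<String> first = Comparator.naturalOrder();
    Comparator<String> second = Comparator.naturalOrder();
    System.out.println(first == second);
  }
}
```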
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator()
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171027#comment-15171027 ]

Uwe Schindler commented on LUCENE-7053:

As we implemented {{compareTo}} we could remove the comparator completely. One could use {{Comparator.naturalOrder()}} instead (naturalOrder is defined to use {{compareTo}}). At places like Collections.sort() we could remove the comparator argument completely. Any comments on this?
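
To illustrate the proposal above, here is a minimal sketch with a hypothetical Term class standing in for a byte-sequence type; it is not Lucene's BytesRef, and the names Term, SortWithoutComparator and sorted are assumptions made for this example. Once the type implements Comparable with an unsigned byte-wise compareTo, Collections.sort() needs no comparator argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a byte-sequence term; not Lucene's BytesRef. */
final class Term implements Comparable<Term> {
  final byte[] bytes;

  Term(int... values) {
    bytes = new byte[values.length];
    for (int i = 0; i < values.length; i++) {
      bytes[i] = (byte) values[i];
    }
  }

  /** Natural order: lexicographic, with every byte treated as unsigned. */
  @Override
  public int compareTo(Term other) {
    int len = Math.min(bytes.length, other.bytes.length);
    for (int i = 0; i < len; i++) {
      int diff = (bytes[i] & 0xFF) - (other.bytes[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return bytes.length - other.bytes.length;
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xFF));
    }
    return sb.toString();
  }
}

final class SortWithoutComparator {
  /** Returns a sorted copy, relying on Term.compareTo only. */
  static List<Term> sorted(List<Term> terms) {
    List<Term> copy = new ArrayList<>(terms);
    Collections.sort(copy); // no comparator argument needed
    return copy;
  }

  public static void main(String[] args) {
    List<Term> terms = Arrays.asList(new Term(0x80), new Term(0x7F, 0x00), new Term(0x7F));
    System.out.println(sorted(terms)); // [7f, 7f00, 80]
  }
}
```

Where an API insists on a Comparator, Comparator.naturalOrder() delegates to the same compareTo, so both paths produce the same order.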
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator()
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171012#comment-15171012 ]

Uwe Schindler commented on LUCENE-7053:

bq. We can take this further, e.g. I grep'd for places calling BytesRef.getUTF8SortedAsUnicodeComparator and it turns up silliness in BlockTermsReader that should just be invoking BytesRef.compareTo directly instead, I think?

Yeah. As said, we might not remove the comparator completely, but we should only use it in places where we can't use the {{Comparable}} interface that BytesRef implements.

bq. You can also fix TestUnicodeUtil's custom String -> int[] code points logic maybe?

Will check this, too. I am currently investigating whether Java 8 already has a ready-to-use Comparator for this somewhere, but it does not look like it.
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator()
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171010#comment-15171010 ]

Michael McCandless commented on LUCENE-7053:

+1 to the patch.

You can also fix {{TestUnicodeUtil}}'s custom String -> int[] code points logic maybe?

bq. There is a bit of code duplication in both tests (sorting Strings in code point order), should we maybe move the new comparator to TestUtil?

+1

We can take this further, e.g. I grep'd for places calling {{BytesRef.getUTF8SortedAsUnicodeComparator}} and it turns up silliness in {{BlockTermsReader}} that should just be invoking {{BytesRef.compareTo}} directly instead, I think?
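
The comment above mentions custom String -> int[] code point logic in a test. The thread does not show that code or say how it was changed, so the following is only a sketch of one way to do the conversion with the Java 8 API CharSequence.codePoints(). The class name CodePointsDemo and the method toCodePoints are hypothetical.

```java
import java.util.Arrays;

final class CodePointsDemo {
  /** Converts a character sequence to its Unicode code points, joining surrogate pairs. */
  static int[] toCodePoints(CharSequence s) {
    return s.codePoints().toArray();
  }

  public static void main(String[] args) {
    // 'a' is U+0061 (97); the surrogate pair D83D DE00 is U+1F600 (128512).
    System.out.println(Arrays.toString(toCodePoints("a\uD83D\uDE00"))); // [97, 128512]
  }
}
```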
[jira] [Commented] (LUCENE-7053) Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator()
[ https://issues.apache.org/jira/browse/LUCENE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171008#comment-15171008 ]

Uwe Schindler commented on LUCENE-7053:

There is a bit of code duplication in both tests (sorting Strings in code point order), should we maybe move the new comparator to TestUtil?