[ https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170991#comment-15170991 ]
Uwe Schindler edited comment on LUCENE-7052 at 2/28/16 10:34 AM:
-----------------------------------------------------------------

Hi Mike,

I know that we originally added the different comparators so that the index term dict could be sorted in a different order. This never proved to be useful, as many Lucene queries rely on the default order. The only codec that used another byte order internally was the Lucene 3 one (but it used the unicode spaghetti algorithm to reorder its term enums at runtime). As this is now all gone, I'd suggest also removing the utf8AsUtf16 comparator. Maybe remove the comparators altogether, just implement BytesRef.compareTo(), and use that for sorting?

I checked the code: utf8SortedAsUTF16SortOrder is now used only in TSTLookup and nowhere else (except in some tests that check alternative sort orders - those can be removed).

As a first step I changed the BytesRef code to no longer use inner classes and instead use a lambda to define the comparators. But I'd suggest removing at least the UTF-16 one completely and moving it into TSTLookup as a private impl detail (as it is only used there).

_FYI: The lambda has no speed impact because it is evaluated only once and internally compiles to a class that implements Comparator. It just looks nicer than the horrible comparator classes_

> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7052
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.0
>
>         Attachments: LUCENE-7052-cleanup1.patch, LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.
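
For readers following the thread, here is a minimal sketch of the two sort orders under discussion, written as lambda-defined comparators. It is not the Lucene code: it works on plain byte[] rather than BytesRef, and the class name ByteOrderSketch, the fields UTF8_AS_UNICODE and UTF8_AS_UTF16, and the helper utf8 are made up for illustration. It assumes valid UTF-8 input. Comparing the bytes as unsigned values gives Unicode code point order, while UTF-16 code unit order differs once supplementary characters (surrogate pairs) are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical illustration only; not part of Lucene.
final class ByteOrderSketch {

    // Unsigned lexicographic byte comparison. For valid UTF-8 this
    // is the same as Unicode code point order.
    static final Comparator<byte[]> UTF8_AS_UNICODE = (a, b) -> {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length; // shorter prefix sorts first
    };

    // UTF-16 code unit order. For simplicity this sketch decodes to String
    // and relies on String.compareTo, which compares char by char.
    static final Comparator<byte[]> UTF8_AS_UTF16 = (a, b) ->
        new String(a, StandardCharsets.UTF_8)
            .compareTo(new String(b, StandardCharsets.UTF_8));

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bmp = utf8("\uFFFD");         // U+FFFD, a single UTF-16 code unit
        byte[] supp = utf8("\uD83D\uDE00");  // U+1F600, a surrogate pair in UTF-16

        // Code point order: U+FFFD sorts before U+1F600.
        System.out.println(Integer.signum(UTF8_AS_UNICODE.compare(bmp, supp))); // prints -1
        // UTF-16 order: the high surrogate 0xD83D sorts before 0xFFFD.
        System.out.println(Integer.signum(UTF8_AS_UTF16.compare(bmp, supp)));   // prints 1
    }
}
```

Each lambda is assigned to a static final field, so the comparator object is created once when the class is initialized and then reused for every comparison.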