[
https://issues.apache.org/jira/browse/LUCENE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750978#comment-16750978
]
Alan Woodward commented on LUCENE-8657:
---------------------------------------
Here is a patch. The only minor difference I can see is that the Hunspell
token filter might emit stacked tokens in a slightly different order, but I
don't think that will be an issue anywhere.
> CharsRef.compareTo() should always be in UTF-8 order
> ----------------------------------------------------
>
> Key: LUCENE-8657
> URL: https://issues.apache.org/jira/browse/LUCENE-8657
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8657.patch
>
>
> CharsRef.compareTo() currently compares char values (UTF-16 code units)
> directly. However, every place in the codebase that compares CharsRef
> objects instead uses the deprecated UTF16SortedAsUTF8Comparator static
> comparator. We should just reimplement compareTo() to use UTF-8 order, and
> remove the deprecated methods.
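
To illustrate why the two orderings can disagree, here is a minimal sketch in
plain Java. It is not the patch and not Lucene's implementation: the class
Utf8OrderSketch and its methods are hypothetical names made up for this
example. It assumes well-formed strings (no unpaired surrogates), for which
code point order and unsigned UTF-8 byte order coincide. The two orderings
only differ when a supplementary character (encoded as a surrogate pair in
UTF-16) is compared against a BMP character above the surrogate range.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Lucene.
class Utf8OrderSketch {

    // Compare by raw UTF-16 code units, i.e. a plain char-by-char comparison.
    static int compareUtf16(CharSequence a, CharSequence b) {
        int len = Math.min(a.length(), b.length());
        for (int i = 0; i < len; i++) {
            int diff = a.charAt(i) - b.charAt(i);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    }

    // Compare by Unicode code point. For well-formed input this gives the
    // same result as comparing the UTF-8 encodings as unsigned bytes.
    static int compareCodePoints(CharSequence a, CharSequence b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = Character.codePointAt(a, i);
            int cpB = Character.codePointAt(b, j);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // One input is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Cross-check: encode to UTF-8 and compare the bytes as unsigned values.
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int len = Math.min(x.length, y.length);
        for (int i = 0; i < len; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";         // U+FF61, a BMP character above the surrogate range
        String supp = "\uD800\uDC00";  // U+10000, stored as a surrogate pair

        System.out.println("UTF-16 order:     " + Integer.signum(compareUtf16(bmp, supp)));
        System.out.println("code point order: " + Integer.signum(compareCodePoints(bmp, supp)));
        System.out.println("UTF-8 byte order: " + Integer.signum(compareUtf8Bytes(bmp, supp)));
    }
}
```

In UTF-16 order the high surrogate 0xD800 sorts before 0xFF61, so the
supplementary character comes first; in code point and UTF-8 byte order it
comes last. For strings containing only BMP characters outside this case, the
orderings agree, which is consistent with the expectation that the change has
little visible effect.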
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)