[ 
https://issues.apache.org/jira/browse/LUCENE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750978#comment-16750978
 ] 

Alan Woodward commented on LUCENE-8657:
---------------------------------------

Here is a patch.  The only difference I can see this making is that the 
Hunspell token filter might emit stacked tokens in a slightly different 
order, but I don't think that will be an issue anywhere.

> CharsRef.compareTo() should always be in UTF-8 order
> ----------------------------------------------------
>
>                 Key: LUCENE-8657
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8657
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8657.patch
>
>
> CharsRef.compareTo() currently compares char values (UTF-16 code units) 
> directly.  However, every place in the codebase that compares CharsRef 
> objects instead uses the deprecated UTF16SortedAsUTF8Comparator static 
> comparator.  We should just reimplement compareTo() to use UTF-8 order, and 
> remove the deprecated methods.
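
For readers unfamiliar with the distinction, here is a rough sketch of why 
UTF-16 code unit order and UTF-8 order can disagree.  It is not the code 
from the attached patch: the class Utf8OrderDemo and the method 
compareUtf8Order are hypothetical names, and the sketch assumes well-formed 
UTF-16 input (no unpaired surrogates).  It relies on the fact that UTF-8 
byte order matches Unicode code point order, so comparing code point by 
code point gives the UTF-8 ordering.

```java
// Illustrative sketch only; names are hypothetical and not part of Lucene.
final class Utf8OrderDemo {

    // Compares two sequences by Unicode code point, which matches the order
    // of their UTF-8 encodings. Assumes well-formed UTF-16 input.
    static int compareUtf8Order(CharSequence a, CharSequence b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit) {
            int cpA = Character.codePointAt(a, i);
            int cpB = Character.codePointAt(b, i);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            // Equal code points occupy the same number of chars in both inputs.
            i += Character.charCount(cpA);
        }
        // One sequence is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";          // U+FF5E, a single char above the surrogate range
        String supp = "\uD83D\uDE00";   // U+1F600, encoded as a surrogate pair

        // UTF-16 code unit order: 0xFF5E > 0xD83D, so bmp sorts after supp.
        System.out.println("code unit order, bmp after supp: " + (bmp.compareTo(supp) > 0));

        // Code point (UTF-8) order: U+FF5E < U+1F600, so bmp sorts before supp.
        System.out.println("UTF-8 order, bmp before supp: " + (compareUtf8Order(bmp, supp) < 0));
    }
}
```

The two orderings only differ when one side has a char in U+E000..U+FFFF 
and the other has a surrogate pair at the same position, which is why a 
change like this is expected to affect only rare orderings.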



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

