[ https://issues.apache.org/jira/browse/LUCENE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750978#comment-16750978 ]
Alan Woodward commented on LUCENE-8657:
---------------------------------------

Here is a patch. The only minor difference I can see this making is that the Hunspell token filter might emit stacked tokens in a slightly different order, but I don't think that will be an issue anywhere.

> CharsRef.compareTo() should always be in UTF-8 order
> ----------------------------------------------------
>
>                 Key: LUCENE-8657
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8657
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8657.patch
>
>
> CharsRef.compareTo() currently compares char values (UTF-16 code units)
> directly. However, every place in the codebase that compares CharsRef
> objects instead uses the deprecated UTF16SortedAsUTF8Comparator static
> comparator. We should just reimplement compareTo() to use UTF-8 comparisons
> instead, and remove the deprecated methods.
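
For readers wondering where the two orderings actually diverge, here is a minimal sketch. It is not the attached patch and does not use Lucene's CharsRef; the class name Utf8OrderSketch and its methods are hypothetical, and it assumes well-formed UTF-16 input (no unpaired surrogates) and Java 9+ for Arrays.compareUnsigned. Comparing raw char values and comparing in UTF-8 order only disagree when a supplementary character (encoded as a surrogate pair, 0xD800-0xDFFF) meets a BMP character in the range U+E000-U+FFFF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Lucene code and not the LUCENE-8657 patch.
class Utf8OrderSketch {

    // Compares raw UTF-16 code units, i.e. plain char-by-char comparison.
    static int compareUtf16(CharSequence a, CharSequence b) {
        int len = Math.min(a.length(), b.length());
        for (int i = 0; i < len; i++) {
            int diff = a.charAt(i) - b.charAt(i);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    }

    // Compares by Unicode code point. For well-formed input this gives the
    // same order as comparing the UTF-8 encoded bytes as unsigned values.
    static int compareCodePoints(CharSequence a, CharSequence b) {
        int len = Math.min(a.length(), b.length());
        int i = 0;
        while (i < len) {
            int cpA = Character.codePointAt(a, i);
            int cpB = Character.codePointAt(b, i);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            // Equal code points occupy the same number of chars in both inputs.
            i += Character.charCount(cpA);
        }
        return Integer.compare(a.length(), b.length());
    }

    // Cross-check: encode to UTF-8 and compare the bytes as unsigned values.
    static int compareUtf8Bytes(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";         // U+FF5E, a single char above the surrogate range
        String supp = "\uD83D\uDE00";  // U+1F600, encoded as a surrogate pair

        System.out.println("UTF-16 order:     " + Integer.signum(compareUtf16(bmp, supp)));
        System.out.println("code point order: " + Integer.signum(compareCodePoints(bmp, supp)));
        System.out.println("UTF-8 byte order: " + Integer.signum(compareUtf8Bytes(bmp, supp)));
    }
}
```

With these two inputs the char-by-char comparison puts U+1F600 first (its high surrogate 0xD83D is less than 0xFF5E), while the code point and UTF-8 byte comparisons both put U+FF5E first. For strings that contain no supplementary characters the orderings agree, which fits the expectation above that any visible change would be limited to cases such as the order of stacked tokens.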