[ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683171#action_12683171 ]
Michael McCandless commented on LUCENE-1434:
--------------------------------------------

This looks good. I plan to commit shortly!

> IndexableBinaryStringTools: convert arbitrary byte sequences into Strings
> that can be used as index terms, and vice versa
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1434
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1434
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>    Affects Versions: 2.4
>            Reporter: Steven Rowe
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1434.patch
>
>
> Provides support for converting byte sequences to Strings that can be used as
> index terms, and back again. The resulting Strings preserve the original byte
> sequences' sort order (assuming the bytes are interpreted as unsigned).
>
> The Strings are constructed using a Base 8000h encoding of the original
> binary data - each char of an encoded String represents a 15-bit chunk from
> the byte sequence. Base 8000h was chosen because it allows for all lower 15
> bits of char to be used without restriction; the surrogate range
> [U+D800-U+DFFF] does not represent valid chars, and would require complicated
> handling to avoid them and allow use of char's high bit.
>
> This class is intended to serve as a mechanism to allow CollationKeys to
> serve as index terms.
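To make the 15-bit chunking concrete, here is a minimal Java sketch of an order-preserving Base 8000h codec. It is not the IndexableBinaryStringTools API or the patch's actual char layout: the class name `Base8000hSketch`, its `encode`/`decode` methods, the +16 offset on each data char and the trailing "bits used in the last chunk" char are all assumptions made for this illustration. Bytes are read as unsigned and packed most-significant-bit first into 15-bit chunks; the last chunk is zero-padded. Every data char therefore lands in 0x0010..0x800F, well below the surrogate range, and the trailer (always < 16) keeps a shorter prefix ahead of a longer input and makes decoding unambiguous. The `main` check uses `Arrays.compareUnsigned`, so it needs Java 9 or later.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of an order-preserving "Base 8000h" codec: bytes (read as
 * unsigned) are packed MSB-first into 15-bit chunks, one chunk per char.
 * This is NOT the IndexableBinaryStringTools API or its exact char layout.
 */
public final class Base8000hSketch {
    // Assumption of this sketch: each data char holds a 15-bit chunk plus 16, so data
    // chars span 0x0010..0x800F (below the 0xD800 surrogates) and are always >= 16.
    private static final int OFFSET = 16;

    private Base8000hSketch() {}

    /** Encodes bytes so that String.compareTo order matches unsigned byte order. */
    public static String encode(byte[] input) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0;   // pending bits, right-aligned
        int bitCount = 0; // number of pending bits in buffer (never more than 22)
        for (byte b : input) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitCount += 8;
            if (bitCount >= 15) {
                bitCount -= 15;
                sb.append((char) ((buffer >>> bitCount) + OFFSET)); // top 15 pending bits
                buffer &= (1 << bitCount) - 1;                       // keep the remainder
            }
        }
        int lastChunkBits;
        if (bitCount > 0) {
            // Left-align the leftover bits and zero-pad them to a full 15-bit chunk.
            sb.append((char) ((buffer << (15 - bitCount)) + OFFSET));
            lastChunkBits = bitCount;
        } else {
            lastChunkBits = input.length == 0 ? 0 : 15;
        }
        // Trailer (always < 16): number of real bits in the last chunk. It sorts below
        // every data char, so an input that is a proper prefix of another sorts first.
        sb.append((char) lastChunkBits);
        return sb.toString();
    }

    /** Inverts encode(): rebuilds the original bytes and drops the zero padding. */
    public static byte[] decode(String encoded) {
        if (encoded.isEmpty()) {
            throw new IllegalArgumentException("missing trailer char");
        }
        int chunks = encoded.length() - 1;
        int lastChunkBits = encoded.charAt(chunks);
        int totalBits = chunks == 0 ? 0 : 15 * (chunks - 1) + lastChunkBits;
        byte[] out = new byte[totalBits / 8];
        int buffer = 0;
        int bitCount = 0;
        int pos = 0;
        for (int i = 0; i < chunks && pos < out.length; i++) {
            buffer = (buffer << 15) | (encoded.charAt(i) - OFFSET);
            bitCount += 15;
            while (bitCount >= 8 && pos < out.length) {
                bitCount -= 8;
                out[pos++] = (byte) (buffer >>> bitCount); // next 8 bits, MSB first
                buffer &= (1 << bitCount) - 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] samples = {
            {}, {0}, {0, 0}, {0, 0, 0}, {(byte) 0x7F}, {(byte) 0x80}, {(byte) 0xFF, 1}
        };
        for (byte[] a : samples) {
            if (!Arrays.equals(a, decode(encode(a)))) {
                throw new AssertionError("round trip failed for " + Arrays.toString(a));
            }
            for (byte[] b : samples) {
                int byteOrder = Integer.signum(Arrays.compareUnsigned(a, b));
                int stringOrder = Integer.signum(encode(a).compareTo(encode(b)));
                if (byteOrder != stringOrder) {
                    throw new AssertionError(Arrays.toString(a) + " vs " + Arrays.toString(b));
                }
            }
        }
        System.out.println("order and round trip preserved");
    }
}
```

`String.compareTo` compares char values lexicographically, so when one encoding's data chars are a prefix of another's, its trailer (< 16) meets a data char (>= 16) and sorts first. That is how `[0x00]` stays ahead of `[0x00, 0x00]` even though both zero-pad to the same first chunk.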