[ 
https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646699#action_12646699
 ] 

Michael McCandless commented on LUCENE-1435:
--------------------------------------------

bq. IndexableBinaryStringTools (LUCENE-1434) implements a base-8000h encoding: 
the lower 15 bits of each character have 1-7/8 bytes packed into them. It's 
radically different from the original byte array, at least in terms of looking 
at it with a text viewer like Luke. And I don't think CollationKeys themselves 
are intended for human consumption.

Oh OK.  So once this term conversion is done, you can't really look at or use 
the resulting terms in the index for human consumption (you'd have to store the 
original text yourself).
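
To illustrate why such terms are unreadable, here is a minimal sketch of packing 
bytes into the lower 15 bits of each char. The class and method names are 
hypothetical, and this is not the IndexableBinaryStringTools API or its exact 
output format; it only shows the general idea of 15 payload bits per char.

```java
import java.nio.charset.StandardCharsets;

public class FifteenBitPackingSketch {
    // Hypothetical illustration: packs a byte stream into chars, using only
    // the lower 15 bits of each char (the high bit is always left clear).
    static String pack(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;    // pending bits, right-aligned
        int bitCount = 0;  // number of pending bits in buffer
        for (byte b : bytes) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitCount += 8;
            if (bitCount >= 15) {
                // Emit the 15 oldest pending bits as one char.
                bitCount -= 15;
                out.append((char) ((buffer >>> bitCount) & 0x7FFF));
                buffer &= (1 << bitCount) - 1;
            }
        }
        if (bitCount > 0) {
            // Left-align the leftover bits and pad with zeros.
            out.append((char) ((buffer << (15 - bitCount)) & 0x7FFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String packed = pack("abc".getBytes(StandardCharsets.US_ASCII));
        for (char c : packed.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Even for plain ASCII input, the packed chars land in unrelated Unicode ranges, 
so a viewer like Luke would show text that bears no visible relation to the 
original token.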

bq. Perhaps I'm missing something, but o.a.l.index.TermEnum.skipTo(Term) 
compares the target term using String.compareTo(),

But we could just fix that to pay attention to the Collator for that field, if 
it has one, right?  (Or with flexible indexing I think the implementation really 
should own this method, i.e., it should be abstract in TermEnum.)
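
As a rough illustration of why the comparison matters, the sketch below 
contrasts String.compareTo() with java.text.Collator on the same pair of terms. 
The class and method names are made up for this example, and it does not touch 
TermEnum or any Lucene code.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorOrderSketch {
    // Code-unit order, which is what String.compareTo() gives.
    static int codeUnitOrder(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-aware order as defined by the given Collator.
    static int collatedOrder(Collator collator, String a, String b) {
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        // 'B' (0x42) precedes 'a' (0x61) in code-unit order,
        // while an English collator orders "apple" before "Banana".
        System.out.println(codeUnitOrder("Banana", "apple") < 0);          // prints "true"
        System.out.println(collatedOrder(english, "Banana", "apple") > 0); // prints "true"
    }
}
```

A term dictionary sorted one way and searched with the other comparison would 
disagree on where "Banana" belongs, which is why skipTo() and the index order 
have to use the same ordering.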

I think the external approach is fine for starters... I just think long-term it 
may make sense to have core Lucene respect the Collator, but it really is an 
invasive change.  We should wait until we make progress on flexible indexing at 
which point such a change should be far less costly.

> CollationKeyFilter: convert tokens into CollationKeys encoded using 
> IndexableBinaryStringTools
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1435
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1435
>             Project: Lucene - Java
>          Issue Type: New Feature
>    Affects Versions: 2.4
>            Reporter: Steven Rowe
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1435.patch, LUCENE-1435.patch
>
>
> Converts each token into its CollationKey using the provided collator, and 
> then encodes the CollationKey with IndexableBinaryStringTools, to allow it to 
> be stored as an index term.
> This will allow for efficient range searches and Sorts over fields that need 
> collation for proper ordering.
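
The key-conversion step described above can be sketched with the JDK alone. 
This assumes java.text.Collator and CollationKey.toByteArray(), whose bytes are 
documented to compare (most significant byte first) the same way the keys do. 
The class and helper names are hypothetical, and the encoding of the bytes into 
an index term is left out.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationKeyBytesSketch {
    // Raw CollationKey bytes for a token under the given collator.
    static byte[] keyBytes(Collator collator, String token) {
        return collator.getCollationKey(token).toByteArray();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        byte[] apple = keyBytes(english, "apple");
        byte[] banana = keyBytes(english, "Banana");
        // Both comparisons agree on the sign: "apple" sorts before "Banana".
        System.out.println(english.compare("apple", "Banana") < 0); // prints "true"
        System.out.println(compareUnsigned(apple, banana) < 0);     // prints "true"
    }
}
```

Because the byte order already matches the collation order, any encoding that 
preserves byte order in the resulting term would let plain term comparison 
stand in for the collator at search time.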
