[ https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630894#action_12630894 ]
Grant Ingersoll commented on LUCENE-1279:
-----------------------------------------
{quote}
I think the problem is that every single index term has to be converted to a
CollationKey for every single (range) search.
{quote}
Yes, agreed. The main question is whether that would be faster than the String
comparisons. Basically, is a CollationKey construction plus a bitwise compare
faster than a collated string compare?
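To make the two options concrete, here is a minimal sketch using plain java.text, not Lucene code. The class and method names (CollatedRangeCheck, byCompare, byCollationKey) are hypothetical, Locale.ENGLISH is an arbitrary choice, and this is not a benchmark; it only shows where the per-term cost lands in each approach.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not Lucene's RangeQuery/RangeFilter code.
public class CollatedRangeCheck {

    // Approach 1: compare each term directly against both bounds with the Collator.
    public static List<String> byCompare(List<String> terms, String lower, String upper, Collator collator) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (collator.compare(term, lower) >= 0 && collator.compare(term, upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Approach 2: build a CollationKey for each term (plus one per bound) and compare keys.
    public static List<String> byCollationKey(List<String> terms, String lower, String upper, Collator collator) {
        CollationKey lowerKey = collator.getCollationKey(lower);
        CollationKey upperKey = collator.getCollationKey(upper);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            // Key construction is paid once per term, on every search.
            CollationKey key = collator.getCollationKey(term);
            if (key.compareTo(lowerKey) >= 0 && key.compareTo(upperKey) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> terms = List.of("apple", "Banana", "cherry", "\u00e9clair", "fig");
        System.out.println(byCompare(terms, "b", "f", collator));
        System.out.println(byCollationKey(terms, "b", "f", collator));
    }
}
```

Both approaches return the same terms; the open question in the thread is only which one is cheaper when every index term has to be visited.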
{quote}
Languages, in some cases using the same character repertoire, define different
orderings. Also, I believe some orderings are context dependent - you can't
always compare character by character. So adding this stuff to Lucene would be
to duplicate a lot of the stuff that's already done in the Collator.
{quote}
Makes sense. I was just wondering whether there were some shortcuts to be had,
since we have a very particular case, and I was thinking maybe it would allow us
to narrow down the range to search.
For instance, hypothetically speaking, say your field had words starting with
every letter from A to Z, you knew the ordering problem only occurred between
L and P, and your lower and upper terms were K and Q. Then you could feel
confident that you could skip to K and stop at Q without any ramifications. I
realize this is repeating what is in the Collator, but it would be nice if the
collator exposed that info. Perhaps, if using a RuleBasedCollator, the
getRules() method could be used to optimize. Again, just thinking out loud; I
haven't explored it.
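The "skip to K and stop at Q" idea can be sketched as follows, purely as an illustration. NarrowedRangeScan, fullScan and narrowedScan are hypothetical names; a TreeSet in String.compareTo order stands in for the index's term order; and the narrowed scan is only correct under the assumption that no collated match falls outside [lower, upper] in that order. Nothing here uses getRules().

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration of narrowing a collated range scan.
public class NarrowedRangeScan {

    // Baseline: visit every term and check it with the Collator.
    public static List<String> fullScan(TreeSet<String> terms, String lower, String upper, Collator collator) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (collator.compare(term, lower) >= 0 && collator.compare(term, upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Narrowed: seek to `lower` and stop at `upper` in String.compareTo order,
    // still checking each visited term with the Collator. ASSUMPTION: every
    // collated match lies inside [lower, upper] in String.compareTo order.
    public static List<String> narrowedScan(TreeSet<String> terms, String lower, String upper, Collator collator) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.subSet(lower, true, upper, true)) {
            if (collator.compare(term, lower) >= 0 && collator.compare(term, upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        TreeSet<String> terms = new TreeSet<>(List.of("apple", "kiwi", "Lemon", "mango", "peach", "quince"));
        System.out.println(fullScan(terms, "k", "q", collator));
        System.out.println(narrowedScan(terms, "k", "q", collator));
    }
}
```

With the capitalized "Lemon" in the field, the full scan finds it but the narrowed scan misses it, because "L" sorts before "k" in String.compareTo order. That is exactly the kind of case the collator would need to rule out before such a shortcut is safe.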
I agree, this should still go forward, even as is.
> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>
> Key: LUCENE-1279
> URL: https://issues.apache.org/jira/browse/LUCENE-1279
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.3.1
> Reporter: Steven Rowe
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch,
> LUCENE-1279.patch
>
>
> See [this java-user
> discussion|http://www.nabble.com/lucene-farsi-problem-td16977096.html] of
> problems caused by Unicode code-point comparison, instead of collation, in
> RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a
> java.text.Collator and/or CollationKeys, to handle ranges for languages
> which have alphabet orderings different from those in Unicode.
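The Locale-setter proposal above could look roughly like this minimal sketch. LocaleAwareRange is a hypothetical class, not RangeQuery's API or the attached patch; with no Locale set it falls back to plain String.compareTo ordering, which stands in for the code-point comparison the issue describes, and Locale.FRENCH is just an example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, NOT Lucene's RangeQuery API.
public class LocaleAwareRange {
    private final String lower;
    private final String upper;
    private Collator collator; // null means plain String.compareTo ordering

    public LocaleAwareRange(String lower, String upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // Setter as proposed in the issue: the Locale selects the Collator to use.
    public void setLocale(Locale locale) {
        this.collator = (locale == null) ? null : Collator.getInstance(locale);
    }

    // Inclusive range check, collated if a Locale has been set.
    public boolean includes(String term) {
        if (collator == null) {
            return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
        }
        return collator.compare(term, lower) >= 0 && collator.compare(term, upper) <= 0;
    }

    public static void main(String[] args) {
        LocaleAwareRange range = new LocaleAwareRange("d", "f");
        String term = "\u00e9t\u00e9"; // "été"
        System.out.println("String.compareTo: " + range.includes(term));
        range.setLocale(Locale.FRENCH);
        System.out.println("collated:         " + range.includes(term));
    }
}
```

Without a Locale, "été" falls outside [d, f] because U+00E9 sorts after "f"; with a French Collator it falls inside, since "é" is primarily an "e".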
--
This message is automatically generated by JIRA.