[jira] Commented: (LUCENE-1279) RangeQuery and RangeFilter should use collation to check for range inclusion

Grant Ingersoll (JIRA) Sun, 14 Sep 2008 09:22:43 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630894#action_12630894 ]


Grant Ingersoll commented on LUCENE-1279:
-----------------------------------------

{quote}
I think the problem is that every single index term has to be converted to a 
CollationKey for every single (range) search. 
{quote}

Yes, agreed.  The main question is whether that would be faster than the String 
comparisons.  Basically, is constructing a CollationKey plus doing a bitwise 
compare faster than a string compare?
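To make the trade-off concrete, here is a minimal sketch of the two strategies for a single range check. It assumes the terms arrive as a plain List<String> and that a java.text.Collator is already configured; the class and method names (CollatedRangeCheck, countWithCompare, countWithKeys) are hypothetical, not Lucene API. Strategy A calls Collator.compare twice per term. Strategy B builds the bound keys once but still pays getCollationKey for every term before the cheap key comparison. Which one wins would have to be measured.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: two ways to count terms in an inclusive collated range.
public class CollatedRangeCheck {

    // Strategy A: two Collator.compare calls per term, no keys built.
    static int countWithCompare(Collator collator, List<String> terms,
                                String lower, String upper) {
        int count = 0;
        for (String term : terms) {
            if (collator.compare(term, lower) >= 0
                    && collator.compare(term, upper) <= 0) {
                count++;
            }
        }
        return count;
    }

    // Strategy B: the bound keys are built once, but every term still needs
    // its own CollationKey before the comparison.
    static int countWithKeys(Collator collator, List<String> terms,
                             String lower, String upper) {
        CollationKey lowerKey = collator.getCollationKey(lower);
        CollationKey upperKey = collator.getCollationKey(upper);
        int count = 0;
        for (String term : terms) {
            CollationKey key = collator.getCollationKey(term); // per-term construction
            if (key.compareTo(lowerKey) >= 0 && key.compareTo(upperKey) <= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> terms = List.of("apple", "banana", "cherry", "date");
        System.out.println(countWithCompare(collator, terms, "b", "d")); // 2
        System.out.println(countWithKeys(collator, terms, "b", "d"));    // 2
    }
}
```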


{quote}
Languages, in some cases using the same character repertoire, define different 
orderings. Also, I believe some orderings are context dependent - you can't 
always compare character by character. So adding this stuff to Lucene would be 
to duplicate a lot of the stuff that's already done in the Collator.
{quote}
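A small illustration of the context-dependence point, assuming two hand-written java.text.RuleBasedCollator rule strings (both invented for this example, as is the ContractionOrdering class). The second rule set treats "ch" as a single letter sorting after "c", in the spirit of traditional Spanish collation, so the same character repertoire yields a different order, and "chico" versus "cod" cannot be decided one character at a time.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class ContractionOrdering {

    // Invented rules: each character is collated on its own.
    static final String PLAIN_RULES = "< a < b < c < d < h < i < o";
    // Invented rules: "ch" is one letter sorting between "c" and "d".
    static final String CH_RULES = "< a < b < c < ch < d < h < i < o";

    // Wraps the checked ParseException thrown for malformed rules.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator plain = build(PLAIN_RULES);
        RuleBasedCollator withCh = build(CH_RULES);

        // Code-unit order: 'h' < 'o', so "chico" sorts first.
        System.out.println(Integer.signum("chico".compareTo("cod")));       // -1
        // The plain rules agree with code-unit order here.
        System.out.println(Integer.signum(plain.compare("chico", "cod")));  // -1
        // With the contraction, "ch" sorts after "c", so "cod" sorts first.
        System.out.println(Integer.signum(withCh.compare("chico", "cod"))); // 1
    }
}
```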

Makes sense.  I was just wondering whether there were some shortcuts to be had, 
since we have a very particular case, and I was thinking it might allow us to 
narrow down the range to search.

For instance, hypothetically speaking, say your field had a full range of words 
from A to Z, but you knew the ordering problem only occurred between L and P, and 
your lower and upper terms were K and Q; then you could confidently skip to K and 
stop at Q without any ramifications.  I realize this repeats what is in the 
Collator, but it would be nice if the Collator exposed that information.  
Perhaps, if using a RuleBasedCollator, the getRules() method could be used to 
optimize.  Again, just thinking out loud; I haven't explored it.
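A rough sketch of the narrowing idea, under loudly stated assumptions: the terms are held in String.compareTo order (as a sorted term dictionary would be), and the caller somehow knows a band [problemLow, problemHigh] outside of which collation agrees with that order. How to derive the band (for example from RuleBasedCollator.getRules()) is left open; NarrowedRangeScan, scan, problemLow and problemHigh are hypothetical names. If the assumption is wrong, the narrowed scan can silently miss terms.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of the "skip to K, stop at Q" idea.
public class NarrowedRangeScan {

    // ASSUMPTION: outside [problemLow, problemHigh] (in String.compareTo order),
    // collation order agrees with String.compareTo order. The caller must
    // guarantee this; nothing here verifies it.
    static List<String> scan(NavigableSet<String> sortedTerms, Collator collator,
                             String lower, String upper,
                             String problemLow, String problemHigh) {
        boolean bandInsideBounds = lower.compareTo(upper) <= 0
                && lower.compareTo(problemLow) <= 0
                && problemHigh.compareTo(upper) <= 0;
        // Seek to lower and stop at upper only when the problem band is enclosed
        // by the bounds; otherwise fall back to scanning every term.
        Iterable<String> candidates = bandInsideBounds
                ? sortedTerms.subSet(lower, true, upper, true)
                : sortedTerms;
        List<String> hits = new ArrayList<>();
        for (String term : candidates) {
            // The collator still decides inclusion for every candidate.
            if (collator.compare(term, lower) >= 0
                    && collator.compare(term, upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```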

I agree, this should still go forward, even as is.


> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-1279
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1279
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Steven Rowe
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch, 
> LUCENE-1279.patch
>
>
> See [this java-user 
> discussion|http://www.nabble.com/lucene-farsi-problem-td16977096.html] of 
> problems caused by Unicode code-point comparison, instead of collation, in 
> RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a 
> java.text.Collator and/or CollationKeys, to handle ranges for languages 
> which have alphabet orderings different from those in Unicode.
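A minimal sketch of the setter-based design proposed in the issue, assuming a hypothetical LocaleRange helper rather than any actual RangeQuery API (the inclusive flag is also an assumption of this sketch). With no Locale set it falls back to plain String.compareTo comparison; once setLocale is called it compares through Collator.getInstance(locale).

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical range check with an optional Locale, set via a setter.
public class LocaleRange {
    private final String lower;
    private final String upper;
    private final boolean inclusive;
    private Collator collator; // null means plain String.compareTo comparison

    public LocaleRange(String lower, String upper, boolean inclusive) {
        this.lower = lower;
        this.upper = upper;
        this.inclusive = inclusive;
    }

    // Passing null restores the non-collated comparison.
    public void setLocale(Locale locale) {
        this.collator = (locale == null) ? null : Collator.getInstance(locale);
    }

    private int compare(String a, String b) {
        return (collator == null) ? a.compareTo(b) : collator.compare(a, b);
    }

    public boolean includes(String term) {
        int lo = compare(term, lower);
        int hi = compare(term, upper);
        return inclusive ? (lo >= 0 && hi <= 0) : (lo > 0 && hi < 0);
    }

    public static void main(String[] args) {
        LocaleRange range = new LocaleRange("a", "c", true);
        System.out.println(range.includes("Banana")); // false: 'B' sorts before 'a'
        range.setLocale(Locale.ENGLISH);
        System.out.println(range.includes("Banana")); // true
    }
}
```

With the bounds "a" and "c", "Banana" falls outside the range under code-unit comparison but inside it under the English collator, which is the kind of mismatch the linked discussion describes.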

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


