[jira] Commented: (LUCENE-1461) Cached filter for a single term field

Earwin Burrfoot (JIRA) Wed, 26 Nov 2008 04:05:43 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650974#action_12650974
 ]


Earwin Burrfoot commented on LUCENE-1461:
-----------------------------------------

bq. RangeQuery no longer relies on the sort order of the terms, which means 
tricks like padding numeric terms are no longer needed, I think?
I do rely on sort order for speed and simplicity, though I never used padding 
for numeric/date terms :) All dates/numbers/somethingelsespecial are converted 
to strings using a base-2^15^ encoding (to keep the high bit at 0, as 0xFFFF is 
used somewhere within Lucene's intestines as an EOS marker, darn it!), plus an 
adjustment to preserve sort order for negative numbers in the face of Java's 
unsigned char. This transformation is insanely fast and produces 
well-compressed results (I have FAT read->mem/write->mem+disk indexes).
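
A minimal sketch of what such an encoding could look like for a long, assuming 
five chars of 15 bits each and a flipped sign bit as the "adjustment" for 
negative numbers. The class and method names are hypothetical, and this is not 
the actual code described above, only one way to get the same properties:

```java
import java.util.Arrays;

// Hypothetical sketch: order-preserving long <-> String codec, 15 bits per char.
final class SortableLongCodec {
    private static final int BITS = 15;      // payload bits per char
    private static final int MASK = 0x7FFF;  // keeps the high bit of every char at 0
    private static final int LEN = 5;        // ceil(64 / 15) chars per long

    static String encode(long value) {
        // Flip the sign bit so that signed order becomes unsigned order.
        long u = value ^ Long.MIN_VALUE;
        char[] out = new char[LEN];
        // Fill from the least significant group backwards (big-endian result).
        for (int i = LEN - 1; i >= 0; i--) {
            out[i] = (char) (u & MASK);
            u >>>= BITS;
        }
        return new String(out);
    }

    static long decode(String s) {
        long u = 0;
        for (int i = 0; i < LEN; i++) {
            u = (u << BITS) | s.charAt(i);
        }
        return u ^ Long.MIN_VALUE; // undo the sign-bit flip
    }

    public static void main(String[] args) {
        long[] values = {42L, -1L, Long.MIN_VALUE, 0L, -42L};
        String[] terms = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            terms[i] = encode(values[i]);
        }
        // Plain String ordering now matches numeric ordering.
        Arrays.sort(terms);
        for (String t : terms) {
            System.out.println(decode(t));
        }
    }
}
```

Because every encoded term has the same length and String.compareTo compares 
chars as unsigned 16-bit values, lexicographic order of the terms matches 
numeric order of the values, with no padding involved.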

bq. b) prefix the terms with a precision marker. The prefix is important for 
the sort order, so that all terms of one precision are in one "bunch" and not 
distributed between higher precision terms.
And you can no longer use this field for sorting, as it has more than one term 
for each document.

bq. For my last implementation, based on filters I did not use a BooleanQuery 
with OR'ed ranges because of resource usage
I'm using filters here too.

bq. Allowing each field to provide its own Comparator may still be helpful then
But you still store strings in the index. So essentially you'll convert your 
value from T to String, store it, retrieve it, convert back to T in such a 
custom comparator, and finally compare. Why should I need that second 
conversion and custom comparators, if I can have an order-preserving bijective 
T<->String relation?



> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>         Attachments: DisjointMultiFilter.java, LUCENE-1461.patch, 
> LUCENE-1461a.patch, LUCENE-1461b.patch, RangeMultiFilter.java, 
> RangeMultiFilter.java, TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.
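
The approach in the issue description quoted above can be sketched roughly as 
follows. This is a simplified illustration using plain Java collections, not 
the attached patch: it assumes every document has exactly one term, and it 
returns a list of document numbers instead of implementing a Lucene 
DocIdSetIterator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: range filtering via per-document term numbers.
final class TermOrdinalRangeFilter {
    // Sorted term -> term number mapping.
    private final TreeMap<String, Integer> termToOrd = new TreeMap<>();
    // docOrd[doc] is the term number of that document's single term.
    private final int[] docOrd;

    TermOrdinalRangeFilter(String[] termPerDoc) {
        // Number the distinct terms in sorted order.
        int next = 0;
        for (String term : new TreeSet<>(Arrays.asList(termPerDoc))) {
            termToOrd.put(term, next++);
        }
        docOrd = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docOrd[doc] = termToOrd.get(termPerDoc[doc]);
        }
    }

    // Returns the documents whose term lies in [lower, upper], inclusive.
    List<Integer> docsInRange(String lower, String upper) {
        List<Integer> hits = new ArrayList<>();
        Map.Entry<String, Integer> lo = termToOrd.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToOrd.floorEntry(upper);
        if (lo == null || hi == null) {
            return hits; // range lies entirely outside the indexed terms
        }
        int min = lo.getValue();
        int max = hi.getValue();
        // The per-document check is two integer comparisons.
        for (int doc = 0; doc < docOrd.length; doc++) {
            int ord = docOrd[doc];
            if (ord >= min && ord <= max) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The array and the mapping are built once and can be reused for any number of 
ranges over the same field, which is where the caching benefit comes from.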

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


