Re: Numeric Range Restrictions: Queries vs Filters

Hoss Mon, 22 Nov 2004 18:33:59 -0800

Of course, not only did I manage to forget to include the attachment, but
when I sent a reply with the code, mail.apache.org rejected it because it
was a ZIP file.


So let's see how mail.apache.org feels about 6 separate text files.


: Date: Mon, 22 Nov 2004 18:25:24 -0800 (PST)
: Subject: Numeric Range Restrictions: Queries vs Filters
:
: (NOTE: numbers in [] indicate Footnotes)
:
: I'm rather new to Lucene (and this list), so if I'm grossly
: misunderstanding things, forgive me.
:
: One of my main needs as I investigate Search technologies is to restrict
: results based on Ranges of numeric values.  Looking over the archives of
: this list, it seems that lots of people have run into problems dealing
: with this.  In particular, whenever someone asks a question about "Numeric
: Ranges" the question always seems to involve one (or more) of the
: following:
:
:    (a) Lexical sorting puts 11 in the range "1 TO 5"
:    (b) Dates (or Dates and Times)
:    (c) BooleanQuery$TooManyClauses Exceptions
:    (d) Should I use a filter?
:
: (a) is a solved problem as long as you use a formatter like
: LongField.java[1]
:
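To make (a) concrete, here is a minimal Java sketch of why fixed-width padding fixes the lexical ordering problem. It is not LongField.java: the class name PaddedSortDemo and both helper methods are made up for illustration, and the sketch assumes non-negative ints only.

```java
import java.util.Locale;

/** Hypothetical demo class; not part of Lucene or of LongField.java. */
class PaddedSortDemo {

    /** Left-pads a non-negative int with zeros to a fixed width of 10 digits. */
    static String pad(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("this sketch only handles n >= 0");
        }
        return String.format(Locale.ROOT, "%010d", n);
    }

    /** True if s falls lexically within [low, high], the way index terms compare. */
    static boolean inLexicalRange(String s, String low, String high) {
        return low.compareTo(s) <= 0 && s.compareTo(high) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "11" sorts between "1" and "5", so it wrongly matches.
        System.out.println(inLexicalRange("11", "1", "5"));          // prints true
        // Padded: the string comparison now agrees with numeric order.
        System.out.println(inLexicalRange(pad(11), pad(1), pad(5))); // prints false
    }
}
```

Negative numbers need extra care (a sign prefix and an offset, for example), which this sketch deliberately leaves out.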
: (b) is really nothing more than a special case of dealing with generic
: numeric values.  While there are certainly special-purpose solutions that
: sometimes apply to dealing with Date ranges, any good solution for dealing
: with raw numeric ranges can be applied to Dates (and Times)
:
: (c) is a situation that seems to come up a lot because of the way
: RangeQuery works.  The rewrite method walks all of the Terms in the index
: starting with "lowerTerm" and builds up a BooleanQuery containing a
: separate TermQuery for every Term found, until it reaches the upperTerm.
: This causes a range search of "0001 TO 1000" to generate a BooleanQuery
: with N clauses, where N is the number of unique values in the field which
: are lexically greater than 0001 and lexically less than 1000.  Depending
: on the nature of your data, this might be 0 BooleanClauses, or it might
: be 1000 BooleanClauses; but the list is built before the search is ever
: even executed.
:
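The expansion described in (c) can be simulated without Lucene. The sketch below is a stand-in built on assumptions, not RangeQuery's actual code: a TreeSet plays the role of the field's sorted terms, MAX_CLAUSES stands in for BooleanQuery's clause limit (1024 is, as far as I know, the default), and an IllegalStateException stands in for BooleanQuery$TooManyClauses. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical simulation of range expansion; it does not call Lucene. */
class RangeExpansionSketch {

    /** Assumed stand-in for BooleanQuery's maximum clause count. */
    static final int MAX_CLAUSES = 1024;

    /** Collects one would-be clause per distinct term in [low, high]. */
    static List<String> expand(SortedSet<String> terms, String low, String high) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(low)) {   // start at the first term >= low
            if (term.compareTo(high) > 0) {
                break;                             // walked past the upper term
            }
            if (clauses.size() == MAX_CLAUSES) {
                throw new IllegalStateException("more than " + MAX_CLAUSES + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Builds the terms "0000", "0001", ... for the given count. */
    static SortedSet<String> sampleTerms(int count) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            terms.add(String.format(Locale.ROOT, "%04d", i));
        }
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = sampleTerms(2000);
        System.out.println(expand(terms, "0001", "1000").size()); // prints 1000
        try {
            expand(terms, "0001", "1500");                        // 1500 distinct terms
        } catch (IllegalStateException e) {
            System.out.println("expansion failed: " + e.getMessage());
        }
    }
}
```

The point is that the clause count depends on how many distinct terms the index holds in the range, not on how many documents end up matching.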
: At first, this may seem really strange -- I know I was certainly confused
: -- but there is a very good reason for it: Ultimately RangeQuery still
: provides you with a meaningful score for each document, based on the
: frequency (and quantity) of terms that document has in the range [2].  In
: order to do that, it has to expand itself, but what if you don't care if
: your Range restriction impacts the Score? [3]
:
: Which brings us to...
:
: (d) Filtering.  Filters in general make a lot of sense to me.  They are a
: way to specify (at query time) that only a certain subset of the index
: should be considered for results.  The Filter class has a very
: straightforward API that seems very easy to subclass to get the behavior
: I want.
: The Query API on the other hand ... I freely admit, that I can't make
: heads or tails out of it.  I don't even know where I would begin to try
: and write a new subclass of Query if I wanted to.
:
: I would think that most people who want to do a "numeric range
: restriction" on their data, probably don't care about the Scoring benefits
: of RangeQuery.  Looking at the code base, the way DateFilter works seems
: like it provides an ideal solution to any sort of Range restriction (not
: just Dates) that *should* be more efficient than using RangeQuery when
: dealing with an unbounded value set. (Both approaches need to iterate over
: all of the terms in the specified field using TermEnum, but RangeQuery has
: to build up a BooleanQuery with a TermQuery clause for each matching term,
: and then each of those clauses has to help score the documents.
: DateFilter, on the other hand, only has to maintain a single BitSet of
: documents that it finds as it iterates.)
:
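For comparison, the filter approach can be sketched in the same Lucene-free way. This is only an illustration under stated assumptions: a SortedMap from term to document ids stands in for TermEnum and TermDocs, both bounds are treated as inclusive, and the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical simulation of a range filter; it does not call Lucene. */
class BitSetRangeSketch {

    /** Sets one bit per document that holds any term in [low, high]. */
    static BitSet bits(SortedMap<String, int[]> postings,
                       String low, String high, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : postings.tailMap(low).entrySet()) {
            if (e.getKey().compareTo(high) > 0) {
                break;                    // walked past the upper bound
            }
            for (int doc : e.getValue()) {
                bits.set(doc);            // no per-term query, no scoring
            }
        }
        return bits;
    }

    /** A tiny made-up index: term -> ids of the documents containing it. */
    static SortedMap<String, int[]> samplePostings() {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("0003", new int[] {0, 4});
        postings.put("0007", new int[] {1});
        postings.put("0011", new int[] {2});
        postings.put("0042", new int[] {3});
        return postings;
    }

    public static void main(String[] args) {
        System.out.println(bits(samplePostings(), "0005", "0020", 5)); // prints {1, 2}
    }
}
```

However many terms fall in the range, the only state kept is one BitSet sized to the index, so there is no clause limit to run into.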
: But I was surprised then to see the following quote from "Erik Hatcher" in
: the archives:
:
:   "In fact, DateFilter by itself is practically of no use, I think." [4]
:
: ...Erik goes on to suggest that given "a set of canned date ranges", it
: doesn't really matter if you use a RangeQuery or a DateFilter -- as long
: as you cache them to reuse them (with something like CachingWrapperFilter
: or QueryFilter).  I'm hoping that he might elaborate on that comment?
:
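The caching idea in that suggestion can be sketched generically. This is not the code of any Lucene class: CachedBitsSketch is a hypothetical name, the type parameter R stands in for an open index reader, and the assumption is simply that a filter's BitSet stays valid for as long as the same reader is in use.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-reader cache for an expensive-to-compute BitSet. */
class CachedBitsSketch<R> {

    private final Function<R, BitSet> compute;
    // Weak keys, so an entry can be dropped once its reader is garbage.
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    int computations = 0;   // exposed only so the demo can count the work done

    CachedBitsSketch(Function<R, BitSet> compute) {
        this.compute = compute;
    }

    /** Returns the cached bits for this reader, computing them on first use. */
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = compute.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        Object reader = new Object();   // stand-in for an open index reader
        CachedBitsSketch<Object> filter = new CachedBitsSketch<>(r -> {
            BitSet b = new BitSet();
            b.set(3);                   // pretend doc 3 is in the range
            return b;
        });
        filter.bits(reader);
        filter.bits(reader);
        System.out.println(filter.computations); // prints 1
    }
}
```

With a fixed set of canned ranges, each range pays the term walk once per reader, which may be why the choice between a query and a filter matters less in that setup.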
: As a test, I wrote a "RangeFilter" which borrows heavily from DateFilter
: to both convince myself it could work, and to do a comparison between it
: and RangeQuery. [5] Based on my limited tests, using a Filter to restrict
: to a Range is a lot faster than using RangeQuery -- independent of
: caching.
:
: The attachment contains my RangeFilter, a unit test that demonstrates it,
: and a Benchmarking unit test that does a side-by-side comparison with
: RangeQuery [6].  If developers feel that this class is useful, then by all
: means roll it into the code base.  (90% of it is cut/pasted from
: DateFilter/RangeQuery anyway)
:
:
:     Comments? ... Questions? ... Answers?
:
:
:
: Footnotes:
:
: [1] It seems to me this class is extremely useful, does anyone know
:     if there's a particular reason it hasn't been added to the main Lucene
:     codebase?
:     http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04790.html
:
: [2] Take a look at RangeQueryScoreDemo.java in the attachment, which
:     produces output something like this...
:        Range Search for: 'apple' TO 'dog'
:        0.40924072 ... bed dog emu
:        0.38014847 ... DOG
:        0.2825246 ... cat
:        0.17657787 ... apple emu
:        0.12671615 ... dog
:
: [3] According to the list archives "Matt Quail" mentioned in May that
:     he was working on a "QuickRangeQuery" class that wouldn't have the
:     BooleanQuery limitation, at the expense of always scoring "1.0",
:     but I haven't seen any mention of anything like it since.  Is Matt
:     still an active list member?  Matt, is this something you're still
:     pursuing?
:     http://nagoya.apache.org/eyebrowse/ReadMsg?msgId=1659395
:
: [4] http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg07015.html
:
: [5] The only major difference between my RangeFilter and DateFilter is
:     that RangeFilter supports options for inclusion/exclusion
:     (individually for the low/high terms I might add).  But for the
:     purposes of a benchmark, doing the same thing with DateFilter would
:     have worked fine.
:
: [6] If you have ant, see "ant -projecthelp"; otherwise, read the top
:     of build.xml
:
:
:
: --
:
: -------------------------------------------------------------------
: "Oh, you're a tricky one."                        Chris M Hostetter
:      -- Trisha Weir                    [EMAIL PROTECTED]
:
:



<project name="proto-rangefilter" default="classes">
 <description>
  A RangeFilter class for Lucene, along with some simple JUnit
  classes to test it and provide a comparison benchmark against RangeQuery.

  You will need to modify the "my.classpath" declaration in this
  build.xml before it will do much for you.

  Included Files:

    RangeQueryScoreDemo.java   - standalone demo of RangeQuery's scoring
    RangeFilter.java           - main code of the Filter
    BaseTestRangeFilter.java   - base class JUnit test that builds a RAM index
    BenchTestRangeFilter.java  - benchmark written as a JUnit test
    TestRangeFilter.java       - JUnit test of RangeFilter
    build.xml                  - this file

 </description>
  
 <path id="my.classpath">
  <!-- modify classpath as necessary -->
  <!-- the main things needed are lucene 1.4.2 and junit -->
  <pathelement path="."/>
  <pathelement location="../lucene/lucene-1.4.2/lucene-1.4.2.jar"/>
  <fileset dir="../../code/cvs/ssa/java/dist/" includes="**/*.jar" />
 </path>

 <target name="classes" description="Compiles all the code." >
  <javac srcdir="." destdir="." debug="true">
    <classpath refid="my.classpath" />
  </javac>
 </target>

 <target name="test" depends="classes" description="Run unit tests">
  <junit printsummary="on">
    <test name="TestRangeFilter" />
    <formatter type="plain" usefile="false" />
    <classpath refid="my.classpath" />
  </junit>
 </target>

 <target name="bench" depends="classes"
         description="Run simple benchmark of RangeQuery vs RangeFilter">
  <junit printsummary="on">
    <test name="BenchTestRangeFilter" />
    <formatter type="plain" usefile="false" />
    <classpath refid="my.classpath" />
  </junit>
 </target> 

 <target name="demo" depends="classes"
         description="Run a demo of RangeQuery and the way it scores">
  <java classname="RangeQueryScoreDemo">
    <classpath refid="my.classpath" />
  </java>
 </target> 

</project>

import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.DateField;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.DateFilter;

import java.io.IOException;
import java.util.Random;

import junit.framework.TestCase;

public class BaseTestRangeFilter extends TestCase {

    public static final boolean F = false;
    public static final boolean T = true;
    
    RAMDirectory index = new RAMDirectory();
    Random rand = new Random(101); // use a fixed seed so the test is deterministic
    
    int maxR = Integer.MIN_VALUE;
    int minR = Integer.MAX_VALUE;

    int minId = 0;
    int maxId = 10000;

    static final int intLength = Integer.toString(Integer.MAX_VALUE).length();
    
    /**
     * a simple padding function that should work with any int
     */
    public static String pad(int n) {
        StringBuffer b = new StringBuffer(40);
        String p = "0";
        if (n < 0) {
            p = "-";
            n = Integer.MAX_VALUE + n + 1;
        }
        b.append(p);
        String s = Integer.toString(n);
        for (int i = s.length(); i <= intLength; i++) {
            b.append("0");
        }
        b.append(s);
        
        return b.toString();
    }

    public BaseTestRangeFilter(String name) {
	super(name);
        build();
    }
    public BaseTestRangeFilter() {
        build();
    }
    
    private void build() {
        try {
            
            /* build an index */
            IndexWriter writer = new IndexWriter(index,
                                                 new SimpleAnalyzer(), T);

            for (int d = minId; d <= maxId; d++) {
                Document doc = new Document();
                doc.add(Field.Keyword("id",pad(d)));
                int r= rand.nextInt();
                if (maxR < r) {
                    maxR = r;
                }
                if (r < minR) {
                    minR = r;
                }
                doc.add(Field.Keyword("rand",pad(r)));
                doc.add(Field.Keyword("body","body"));
                writer.addDocument(doc);
            }
            
            writer.optimize();
            writer.close();

        } catch (Exception e) {
            throw new RuntimeException("can't build index", e);
        }

    }

    public void testPad() {

        int[] tests = new int[] {
            -9999999, -99560, -100, -3, -1, 0, 3, 9, 10, 1000, 999999999
        };
        for (int i = 0; i < tests.length - 1; i++) {
            int a = tests[i];
            int b = tests[i+1];
            String aa = pad(a);
            String bb = pad(b);
            String label = a + ":" + aa + " vs " + b + ":" + bb;
            assertEquals("length of " + label, aa.length(), bb.length());
            assertTrue("compare less than " + label, aa.compareTo(bb) < 0);
        }

    }

}

import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.DateField;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.DateFilter;

import java.io.IOException;
import java.util.Random;

import junit.framework.TestCase;

public class BenchTestRangeFilter extends BaseTestRangeFilter {

    /** use a fixed seed so tests are deterministic and comparable */
    public static final int SEED = 23;

    public static final int ITERS = 100;
    
    public BenchTestRangeFilter(String name) {
	super(name);
    }
    public BenchTestRangeFilter() {
        super();
    }

    /**
     * test the execution of several queries using the RangeFilter
     */
    public void testRangeFilter() throws IOException {

        doTest
            (new TestSearcher () {
                    public Hits search(IndexSearcher s, Query q, String f,
                                       String low, String high,
                                       boolean inclusive) throws IOException {
                        Filter ff = new RangeFilter(f, low, high,
                                                    inclusive, inclusive);
                        return s.search(q,ff);
                    }
                });
    }
    
    /**
     * test the execution of several queries using RangeQuery
     */
    public void testRangeQuery() throws IOException {
        
        /* make sure RangeQuery will work with the max possible range size */
        BooleanQuery.setMaxClauseCount(maxId + 1);
        
        doTest
            (new TestSearcher () {
                    public Hits search(IndexSearcher s, Query q, String f,
                                       String low, String high,
                                       boolean inclusive) throws IOException {
                        Query r = new RangeQuery(new Term(f, low),
                                                 new Term(f, high),
                                                 inclusive);
                        BooleanQuery qq = new BooleanQuery();
                        qq.add(q,true,false);
                        qq.add(r,true,false);
                        return s.search(qq);
                    }
                });

    }

    protected void doTest(TestSearcher tester) throws IOException {

        Random r = new Random(SEED);
        IndexReader reader = IndexReader.open(index);
	IndexSearcher search = new IndexSearcher(reader);

        int[] counts = new int[ITERS];

        /* pick some random Id ranges that are small and search them */
        for (int i =0; i < ITERS; i++) {
            int a = minId + r.nextInt((maxId - minId) / 2);
            int b = a + 100;
            
            String aa = pad(a);
            String bb = pad(b);
            
            Hits result;
            Query q = new TermQuery(new Term("body","body"));

            result = tester.search(search,q,"id",aa,bb,true);
            counts[i] = result.length();
        }
            

        /* pick some random Id ranges that are (on average) half
         * the size of the index and search them
         */
        for (int i =0; i < ITERS; i++) {
            int a = minId + r.nextInt((maxId - minId) / 2);
            int b = maxId - r.nextInt((maxId - minId) / 2);
            
            String aa = pad(a);
            String bb = pad(b);
            
            Hits result;
            Query q = new TermQuery(new Term("body","body"));

            result = tester.search(search,q,"id",aa,bb,true);
            counts[i] = result.length();
        }

        /* pick some random "rand" field ranges that are small and search them */
        for (int i =0; i < ITERS; i++) {
            int a = minR + r.nextInt(maxR / 100);
            int b = a + 500;
            
            String aa = pad(a);
            String bb = pad(b);
            
            Hits result;
            Query q = new TermQuery(new Term("body","body"));

            result = tester.search(search,q,"rand",aa,bb,true);
            counts[i] = result.length();
        }

        /* pick some random "rand" field ranges that are (on average) half
         * the size of the index and search them
         */
        for (int i =0; i < ITERS; i++) {
            int a = minR + r.nextInt((maxR / 2) - (minR / 2) - 1);
            int b = maxR - r.nextInt((maxR / 2) - (minR / 2) - 1);
            
            String aa = pad(a);
            String bb = pad(b);
            
            Hits result;
            Query q = new TermQuery(new Term("body","body"));

            result = tester.search(search,q,"rand",aa,bb,true);
            counts[i] = result.length();
        }
        
        
    }
    
    public interface TestSearcher {
        public Hits search(IndexSearcher s, Query q, String f,
                           String low, String high, boolean inclusive)
            throws IOException;
    }
    
}


/**
 *
 * This code borrows heavily from RangeQuery.java and DateFilter.java
 * available here...
 *   http://jakarta.apache.org/lucene/
 *
 * This is the (c) and licence from those files....
 *
 *
 * Copyright 2004 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.BitSet;
import java.util.Date;
import java.io.IOException;

import org.apache.lucene.search.Filter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.IndexReader;

/**
 * A Filter that restricts search results to a range of values in a given
 * field.
 * 
 * <p>
 * This code borrows heavily from RangeQuery, but is implemented as a Filter
 * (much like DateFilter).
 * </p>
 * <p>
 * In theory, the advantage of using RangeFilter instead of a RangeQuery
 * is that while both use a TermEnum to walk the index from the low end
 * of the range to the high end, RangeQuery does this to build up a list
 * of queries, which then all have to be searched against (to compute
 * score).  A Filter on the other hand doesn't care about scoring (and
 * in my mind -- neither do most people who want to do range restrictions
 * in their searches) so as it walks the terms it can just build up the
 * resulting BitSet of docs that contain terms in the range.
 * </p>
 *
 * <p>
 * Since DateFilter's constructor only deals with strings, you could just
 * use it instead; but this class has the added advantage of allowing you
 * to specify whether the upper or lower bounds should be included --
 * independently.
 * </p>
 */
public class RangeFilter extends Filter {
    
    private String f;
    private String low;
    private String upp;
    private boolean inclLower;
    private boolean inclUpper;

    /**
     * @param field The field this range applies to
     * @param lower The lower bound on this range
     * @param upper The upper bound on this range
     * @param includeLower Does this range include the lower bound?
     * @param includeUpper Does this range include the upper bound?
     */
    public RangeFilter(String field, String lower, String upper,
                       boolean includeLower, boolean includeUpper) {
        f = field;
        low = lower;
        upp = upper;
        inclLower = includeLower;
        inclUpper = includeUpper;
        
        if (null == low && null == upp) {
            throw new IllegalArgumentException
                ("At least one value must be non-null");
        }
        if (inclLower && null == low) {
            throw new IllegalArgumentException
                ("The lower bound must be non-null to be inclusive");
        }
        if (inclUpper && null == upp) {
            throw new IllegalArgumentException
                ("The upper bound must be non-null to be inclusive");
        }
    }
    
    /**
     * Constructs a filter for field <code>field</code> matching
     * less than or equal to <code>upper</code>
     */
    public static RangeFilter Less(String field, String upper) {
        return new RangeFilter(field, null, upper, false, true);
    }

    /**
     * Constructs a filter for field <code>field</code> matching
     * greater than or equal to <code>lower</code>
     */
    public static RangeFilter More(String field, String lower) {
        return new RangeFilter(field, lower, null, true, false);
    }
    
    /**
     * Returns a BitSet with true for documents which should be
     * permitted in search results, and false for those that should
     * not.
     */
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        TermEnum enumerator =
            (null != low
             ? reader.terms(new Term(f, low))
             : reader.terms(new Term(f,"")));
        
        try {
            
            if (enumerator.term() == null) {
                return bits;
            }
            
            boolean checkLower = false;
            if (!inclLower) // make adjustments to set to exclusive
                checkLower = true;
        
            TermDocs termDocs = reader.termDocs();
            try {
                
                do {
                    Term term = enumerator.term();
                    if (term != null && term.field().equals(f)) {
                        if (!checkLower || null==low || term.text().compareTo(low) > 0) {
                            checkLower = false;
                            if (upp != null) {
                                int compare = upp.compareTo(term.text());
                                /* if beyond the upper term, or is exclusive and
                                 * this is equal to the upper term, break out */
                                if ((compare < 0) ||
                                    (!inclUpper && compare==0)) {
                                    break;
                                }
                            }
                            /* we have a good term, find the docs */
                            
                            termDocs.seek(enumerator.term());
                            while (termDocs.next()) {
                                bits.set(termDocs.doc());
                            }
                        }
                    } else {
                        break;
                    }
                }
                while (enumerator.next());
                
            } finally {
                termDocs.close();
            }
        } finally {
            enumerator.close();
        }

        return bits;
    }
    
    public String toString() {
        StringBuffer buffer = new StringBuffer();
        buffer.append(f);
        buffer.append(":");
        buffer.append(inclLower ? "[" : "{");
        if (null != low) {
            buffer.append(low);
        }
        buffer.append("-");
        if (null != upp) {
            buffer.append(upp);
        }
        buffer.append(inclUpper ? "]" : "}");
        return buffer.toString();
    }
}

import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;

import java.io.IOException;
import java.util.Random;

/**
 * Simple demo of *why* range query expands itself out into a
 * BooleanQuery containing many, many, MANY terms (all of the known
 * terms between the low/high terms of the range)
 */
public class RangeQueryScoreDemo {

    public static void main(String argv[]) throws IOException {

        Document doc;
        RAMDirectory d = new RAMDirectory();
        IndexWriter w = new IndexWriter(d, new SimpleAnalyzer(), true);
        doc = new Document();
        doc.add(Field.Text("words", "not in range"));
        w.addDocument(doc);
        doc = new Document();
        doc.add(Field.Text("words", "apple emu"));
        w.addDocument(doc);
        doc = new Document();
        doc.add(Field.Text("words", "bed dog emu"));
        w.addDocument(doc);
        doc = new Document();
        doc.add(Field.Text("words", "dog"));
        w.addDocument(doc);
        doc = new Document();
        doc.add(Field.Text("words", "cat"));
        w.addDocument(doc);
        doc = new Document();
        Field f = Field.Text("words", "DOG"); /* caps denote the boost */
        f.setBoost(3.0f);
        doc.add(f);
        w.addDocument(doc);
        w.optimize();
        w.close();

        IndexSearcher s = new IndexSearcher(d);
        System.out.println("Range Search for: 'apple' TO 'dog'");
        Hits h = s.search(new RangeQuery(new Term("words", "apple"),
                                         new Term("words", "dog"),
                                         true));
        for (int i = 0; i < h.length(); i++) {
            System.out.println(h.score(i) + " ... " + h.doc(i).get("words")); 
        }

    }
}

import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.DateField;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.DateFilter;

import java.io.IOException;
import java.util.Random;

import junit.framework.TestCase;

/**
 * A basic 'positive' Unit test class for the RangeFilter class.
 *
 * <p>
 * NOTE: at the moment, this class only tests for 'positive' results,
 * it does not verify the results to ensure there are no 'false positives',
 * nor does it adequately test 'negative' results.  It also does not test
 * that garbage in results in an Exception.
 */
public class TestRangeFilter extends BaseTestRangeFilter {

    public TestRangeFilter(String name) {
	super(name);
    }
    public TestRangeFilter() {
        super();
    }

    public void testRangeFilterId() throws IOException {

        IndexReader reader = IndexReader.open(index);
	IndexSearcher search = new IndexSearcher(reader);

        int medId = minId + ((maxId - minId) / 2); // midpoint, valid even if minId != 0
        
        String minIP = pad(minId);
        String maxIP = pad(maxId);
        String medIP = pad(medId);
    
        int numDocs = reader.numDocs();
        
        assertEquals("num of docs", numDocs, 1+ maxId - minId);
        
	Hits result;
        Query q = new TermQuery(new Term("body","body"));

        // test id, bounded on both ends
        
	result = search.search(q,new RangeFilter("id",minIP,maxIP,T,T));
	assertEquals("find all", numDocs, result.length());

	result = search.search(q,new RangeFilter("id",minIP,maxIP,T,F));
	assertEquals("all but last", numDocs-1, result.length());

	result = search.search(q,new RangeFilter("id",minIP,maxIP,F,T));
	assertEquals("all but first", numDocs-1, result.length());
        
	result = search.search(q,new RangeFilter("id",minIP,maxIP,F,F));
        assertEquals("all but ends", numDocs-2, result.length());
    
        result = search.search(q,new RangeFilter("id",medIP,maxIP,T,T));
        assertEquals("med and up", 1+ maxId-medId, result.length());
        
        result = search.search(q,new RangeFilter("id",minIP,medIP,T,T));
        assertEquals("up to med", 1+ medId-minId, result.length());

        // unbounded id

	result = search.search(q,new RangeFilter("id",minIP,null,T,F));
	assertEquals("min and up", numDocs, result.length());

	result = search.search(q,new RangeFilter("id",null,maxIP,F,T));
	assertEquals("max and down", numDocs, result.length());

	result = search.search(q,new RangeFilter("id",minIP,null,F,F));
	assertEquals("not min, but up", numDocs-1, result.length());
        
	result = search.search(q,new RangeFilter("id",null,maxIP,F,F));
	assertEquals("not max, but down", numDocs-1, result.length());
        
        result = search.search(q,new RangeFilter("id",medIP,maxIP,T,F));
        assertEquals("med and up, not max", maxId-medId, result.length());
        
        result = search.search(q,new RangeFilter("id",minIP,medIP,F,T));
        assertEquals("not min, up to med", medId-minId, result.length());

        // very small sets

	result = search.search(q,new RangeFilter("id",minIP,minIP,F,F));
	assertEquals("min,min,F,F", 0, result.length());
	result = search.search(q,new RangeFilter("id",medIP,medIP,F,F));
	assertEquals("med,med,F,F", 0, result.length());
	result = search.search(q,new RangeFilter("id",maxIP,maxIP,F,F));
	assertEquals("max,max,F,F", 0, result.length());
                     
	result = search.search(q,new RangeFilter("id",minIP,minIP,T,T));
	assertEquals("min,min,T,T", 1, result.length());
	result = search.search(q,new RangeFilter("id",null,minIP,F,T));
	assertEquals("nul,min,F,T", 1, result.length());

	result = search.search(q,new RangeFilter("id",maxIP,maxIP,T,T));
	assertEquals("max,max,T,T", 1, result.length());
	result = search.search(q,new RangeFilter("id",maxIP,null,T,F));
	assertEquals("max,nul,T,F", 1, result.length());

	result = search.search(q,new RangeFilter("id",medIP,medIP,T,T));
	assertEquals("med,med,T,T", 1, result.length());
        
    }

    public void testRangeFilterRand() throws IOException {

        IndexReader reader = IndexReader.open(index);
	IndexSearcher search = new IndexSearcher(reader);

        String minRP = pad(minR);
        String maxRP = pad(maxR);
    
        int numDocs = reader.numDocs();
        
        assertEquals("num of docs", numDocs, 1+ maxId - minId);
        
	Hits result;
        Query q = new TermQuery(new Term("body","body"));

        // test extremes, bounded on both ends
        
	result = search.search(q,new RangeFilter("rand",minRP,maxRP,T,T));
	assertEquals("find all", numDocs, result.length());

	result = search.search(q,new RangeFilter("rand",minRP,maxRP,T,F));
	assertEquals("all but biggest", numDocs-1, result.length());

	result = search.search(q,new RangeFilter("rand",minRP,maxRP,F,T));
	assertEquals("all but smallest", numDocs-1, result.length());
        
	result = search.search(q,new RangeFilter("rand",minRP,maxRP,F,F));
        assertEquals("all but extremes", numDocs-2, result.length());
    
        // unbounded

	result = search.search(q,new RangeFilter("rand",minRP,null,T,F));
	assertEquals("smallest and up", numDocs, result.length());

	result = search.search(q,new RangeFilter("rand",null,maxRP,F,T));
	assertEquals("biggest and down", numDocs, result.length());

	result = search.search(q,new RangeFilter("rand",minRP,null,F,F));
	assertEquals("not smallest, but up", numDocs-1, result.length());
        
	result = search.search(q,new RangeFilter("rand",null,maxRP,F,F));
	assertEquals("not biggest, but down", numDocs-1, result.length());
        
        // very small sets

	result = search.search(q,new RangeFilter("rand",minRP,minRP,F,F));
	assertEquals("min,min,F,F", 0, result.length());
	result = search.search(q,new RangeFilter("rand",maxRP,maxRP,F,F));
	assertEquals("max,max,F,F", 0, result.length());
                     
	result = search.search(q,new RangeFilter("rand",minRP,minRP,T,T));
	assertEquals("min,min,T,T", 1, result.length());
	result = search.search(q,new RangeFilter("rand",null,minRP,F,T));
	assertEquals("nul,min,F,T", 1, result.length());

	result = search.search(q,new RangeFilter("rand",maxRP,maxRP,T,T));
	assertEquals("max,max,T,T", 1, result.length());
	result = search.search(q,new RangeFilter("rand",maxRP,null,T,F));
	assertEquals("max,nul,T,F", 1, result.length());
        
    }

}
