Thanks Erick & Paul,

I also found a great example of a custom filter in LIA (section 6.4,
"Using a custom filter").

Here's my updated test case if anybody is interested...

===== QueryParserTest.java ================================================ 
...
public class QueryParserTest extends LuceneTestCase {
        ...
        private static int MAX_HITS = 10;

        public void testCatchTooManyClauses() throws Exception {
                System.out.println("===>testCatchTooManyClauses");
                // start with an empty list so the assert below fails cleanly
                // (instead of a NullPointerException) if nothing is thrown
                Vector docList = new Vector(MAX_HITS);
                try {
                        causeTooManyClauses();
                }
                catch(BooleanQuery.TooManyClauses ex) {
                        Term term = new Term(field, queryStr);
                        final BitSet bs = new BitSet(reader.maxDoc());
                        TermDocs termDocs = reader.termDocs();
                        WildcardTermEnum wte = new WildcardTermEnum(reader, term);
                        /*
                        the methods termDocs.next() and reader.document() go to
                        different places in the Lucene index so this will send
                        the disk head up and down.
                        see http://lucene.apache.org/java/docs/fileformats.html
                        */
                        // bs.cardinality() counts distinct docs, so a doc matched
                        // by several terms is not counted twice
                        for (; (term = wte.term()) != null && bs.cardinality() < MAX_HITS; wte.next()) {
                                // get doc ids from .frq file
                                termDocs.seek(term);
                                while (bs.cardinality() < MAX_HITS && termDocs.next()) {
                                        bs.set(termDocs.doc());
                                }
                        }
                        termDocs.close();
                        wte.close();
                        // retrieve the Documents in numerical order
                        for (int i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
                                docList.add(reader.document(i));
                        }
                }
                System.out.println("found:" + docList.size());
                assertEquals(MAX_HITS, docList.size());
        }
...
===== QueryParserTest.java ================================================
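
For anybody who wants to play with the collect-first, fetch-later idea without
an index at hand, here is a minimal plain-Java sketch. It is not Lucene code:
the class CappedDocIdCollector, its methods collect() and inDocOrder(), and the
int[][] "postings" array (a stand-in for what TermDocs would return per term)
are all made up for illustration. It only shows why capping on the number of
distinct set bits behaves differently from capping on a running counter when
one document matches several terms.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the TermDocs loop; no Lucene classes involved.
class CappedDocIdCollector {

    // postings[i] holds the doc ids for the i-th matching term (assumed data).
    // Stops as soon as maxHits distinct doc ids have been collected.
    static BitSet collect(int[][] postings, int maxDoc, int maxHits) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] termPostings : postings) {
            for (int doc : termPostings) {
                if (bits.cardinality() >= maxHits) {
                    return bits;
                }
                // setting an already-set bit does not change the cardinality
                bits.set(doc);
            }
        }
        return bits;
    }

    // Walks the set bits in ascending order, i.e. in doc id order.
    static List<Integer> inDocOrder(BitSet bits) {
        List<Integer> ids = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        // doc 2 matches the first two terms; a plain counter would count it twice
        int[][] postings = { {2, 7, 9}, {2, 4}, {1, 5, 9} };
        BitSet bits = collect(postings, 10, 4);
        System.out.println(inDocOrder(bits)); // prints [2, 4, 7, 9]
    }
}
```

With a running counter instead of cardinality(), the loop would stop after the
second posting for doc 2 and end up with only three distinct documents.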
 

> -----Original Message-----
> From: Paul Elschot [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, 16 April 2006 5:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: Catching BooleanQuery.TooManyClauses
> 
> 
> On Saturday 15 April 2006 13:44, Erick Erickson wrote:
> > With the warning that I'm not the most experienced Lucene user in the
> > world...
> > 
> > I *think* that rather than search for each term, it's more efficient to
> > just use IndexReader.termDocs..... i.e.
> > 
> > IndexReader ir = <whatever>;
> > TermDocs termDocs = ir.termDocs();
> > WildcardTermEnum wildEnum = <whatever>;
> > 
> > for (Term term = null; (term = wildEnum.term()) != null; wildEnum.next()) {
> >       termDocs.seek(term);
> 
> This avoids the buffer space needed for each TermDocs by using each term
> separately. A BooleanQuery over all the terms will use the termDocs.next() and
> termDocs.doc() for all terms at the same time. It has to, because more terms
> might match each document and it has to compute the query score for each
> document.
> 
> >       while (termDocs.next()) {
> >             Document doc = ir.document(termDocs.doc());
> 
> The methods termDocs.next() and reader.document()
> go to different places in the Lucene index (see the index format),
> so this will send the disk head up and down.
> It's better to collect the termDocs.doc() values first, for example in a
> BitSet, and then retrieve the Documents in numerical order.
> Btw., this is what the ConstantScoreRangeQuery does to avoid using all terms
> at the same time.
> 
> >       }
> > }
> > 
> > I know that for loop looks odd, but I just peeked at the source code for the
> > TermEnum classes and see why it works.
> > 
> > One warning, as the folks on the board have pointed out to me is that the
> > Hits object is not entirely efficient when you fetch lots of docs (more than
> > 100 has been mentioned) and you should think about TopDocs or some such.
> > 
> > Also, if you can avoid fetching the document (i.e. get everything you want
> > from the index) you'll add efficiency. I have no clue how much you're
> > returning to the user, so I don't know whether that would work for you.....
> 
> In other words, one can use the above BitSet in a Filter later on
> during an IndexSearcher.search() (or in a ConstantScoreQuery),
> and use Hits or TopDocs for document retrieval.
> 
> Regards,
> Paul Elschot.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


