My index gets rebuilt every night so I probably can afford
to construct the filters right after the index is rebuilt.  How
do I check each document (for empty fields) though? Would
I use an IndexReader to loop through the documents? If so,
which method(s) in the IndexReader class should I use?
termDocs()??? Thank you.

From: "Erick Erickson" <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Empty fields ...
Date: Tue, 18 Jul 2006 13:08:53 -0400

Quoting the guys "it depends" <G>...

At root, a filter is a bitset. So size-wise, you are using 1 bit/doc (plus
some small overhead). Both the storage required and the time to construct
are dependent on the characteristics of your corpus. I guess the only way
you can answer that for your particular situation is to test with your
corpus. I can say that I was surprised at how very fast constructing a
filter was in my situation. Which has no relevance to your situation <G>....

More of "it depends" is the fluidity of your index. If you construct it once
and don't modify it, you could consider storing your filters permanently.
Either in files or as "special documents" in your index or perhaps even in a
meta-data index. You can store documents of meta-data just by putting in
fields that are in none of your other documents..... Deletions/additions and
re-optimizations will affect the internal lucene doc IDs, so you've got to
be careful here about synchronization...

You could consider constructing your filters all in a bunch when you open
your searcher. Again, depending upon whether you modify your searcher often
will determine whether you want to do this or not.

What I'd really recommend is that you start by constructing your filters on
the fly, without even a caching wrapper and get some timings, mostly for
your peace of mind. I'd also do some timings when combining filters, just
for yucks.. There's no reason not to use a caching wrapper if you expect to
use these filters, which will load the first user with a delay, but you can
warm up your filters by issuing some canned queries upon startup....

Only if constructing any filters on the fly and using a caching wrapper
proves unsatisfactory would I move on to any kind of permanent storage.
Premature optimization and all that....

So, I don't have a good answer since I don't have a detailed knowledge of
your problem, but it should be relatively easy for you to get a sense of
whether this is a reasonable approach or not.

Hope this helps
Erick

_________________________________________________________________
FREE pop-up blocking with the new MSN Toolbar – get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to