It'd be great to get this into Lucene.

Does FieldCacheTermsFilter let you specify a set of arbitrary terms to filter for, like TermsFilter in contrib/queries? And it's space/time efficient once FieldCache is populated?

Mike

Tim Sturge wrote:

Mike, Mike,

I have an implementation of FieldCacheTermsFilter (which uses the field
cache to filter for a predefined set of terms) around if either of you are
interested. It is faster than materializing the filter roughly when the
filter matches more than 1% of the documents.

So it's not better for a large set of small filters (which you can
materialize on the spot), but it is better for a small set (but more than
32) of large filters.
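
A minimal sketch of the idea in plain Java, not the implementation offered above and not written against the Lucene API. It assumes the field cache can be modelled as a sorted array of the field's distinct terms plus a per-document array of ordinals into it; the class name OrdinalTermsFilterSketch and both arrays are hypothetical. The wanted terms are resolved to ordinals once, so checking a document is a single array read and bit test, and no per-filter bitset over all documents is built.

```java
import java.util.Arrays;
import java.util.BitSet;

final class OrdinalTermsFilterSketch {
    // docToOrd[doc] is the index of that document's term in sortedTerms
    // (a stand-in for what a populated field cache would provide).
    private final int[] docToOrd;
    // One bit per distinct term, set for the terms we are filtering for.
    private final BitSet acceptedOrds;

    OrdinalTermsFilterSketch(String[] sortedTerms, int[] docToOrd, String... wanted) {
        this.docToOrd = docToOrd;
        this.acceptedOrds = new BitSet(sortedTerms.length);
        for (String term : wanted) {
            int ord = Arrays.binarySearch(sortedTerms, term);
            if (ord >= 0) {
                acceptedOrds.set(ord); // terms absent from the field are ignored
            }
        }
    }

    // Constant-time check; sized by distinct terms, not by documents.
    boolean matches(int doc) {
        return acceptedOrds.get(docToOrd[doc]);
    }

    public static void main(String[] args) {
        String[] colors = {"blue", "green", "red"}; // must be sorted
        int[] docToOrd = {2, 0, 2, 1};              // doc 0 is red, doc 1 is blue, ...
        OrdinalTermsFilterSketch filter =
                new OrdinalTermsFilterSketch(colors, docToOrd, "red", "green");
        for (int doc = 0; doc < docToOrd.length; doc++) {
            System.out.println(doc + " -> " + filter.matches(doc));
        }
    }
}
```

The per-document cost is the same however many documents the filter matches, which is consistent with the trade-off described above: it pays off for large filters, while small ones may still be cheaper to materialize.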

Let me know if you're interested and I'll send it in.

Tim

On 12/10/08 3:34 AM, "Michael McCandless" <[EMAIL PROTECTED]> wrote:


In your approach, roughly how many filters do you have cached?  It
seems like it could be quite a few (one for each color, one for each
type, etc)?

You might be able to modify the new (on Lucene trunk)
FieldCacheRangeFilter to achieve this same filtering without actually
having to materialize the full bitset for each.

Mike

Michael Stoppelman wrote:

Yeah, looks similar to what we've implemented for ourselves (although I
haven't looked at the implementation). We've got quite a custom version of
Lucene at this point. Using Solr at this point really isn't a viable
option, but thanks for pointing this out.

M

On Tue, Dec 9, 2008 at 1:47 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:


This use case sounds a lot like faceted navigation, which Solr provides.

Mike


Michael Stoppelman wrote:

Hi all,

I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying to
integrate the new DocIdSet changes, since the o.a.l.search.Filter#bits()
method is now deprecated. For our app we rely heavily on the bits from the
Filter to do post-query filtering (I explain why below).

For example, someone searches for product: "ipod" and then filters by
type: "nano" (e.g. mini/nano/regular) AND color: "red" (e.g.
red/yellow/blue). In our current model the results are gathered in the
following way:

1) "ipod" w/o attributes is run and the results are stored in a
   HitCollector
2) "ipod" results are now filtered for color="red" AND type="nano" using
   the Lucene Filters
3) The filtered results are returned to the user.
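
The three steps above can be sketched with java.util.BitSet standing in for both the collected hits and the bits returned by each Filter. This is an illustration under assumed data, not our production code: the class PostQueryFilterSketch, the helper bits(), and the document numbers are all hypothetical.

```java
import java.util.BitSet;

final class PostQueryFilterSketch {
    // Step 2: keep only the hits that every attribute filter also accepts.
    // The inputs are left unmodified so the unfiltered hits stay available.
    static BitSet applyFilters(BitSet hits, BitSet... filters) {
        BitSet result = (BitSet) hits.clone();
        for (BitSet filter : filters) {
            result.and(filter);
        }
        return result;
    }

    // Convenience helper for building a bit set from document numbers.
    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) {
            set.set(doc);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet ipodHits = bits(0, 1, 2, 5); // step 1: unfiltered "ipod" results
        BitSet red = bits(1, 2, 3);         // documents with color="red"
        BitSet nano = bits(2, 4, 5);        // documents with type="nano"
        // Step 3: the filtered results returned to the user.
        System.out.println(applyFilters(ipodHits, red, nano)); // prints {2}
    }
}
```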

The reason the attributes are filtered post-query is so that we can
return the other types and colors the user can filter by in the future,
meaning the UI would be able to show "blue", "green", "pink", etc. If we
pre-filtered results by color and type beforehand, we wouldn't know what
the other filter options would be for the broader result set.
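
To make the point concrete, here is a rough sketch of how the remaining options could be derived from the unfiltered hits. It assumes a per-document array of attribute values is available; the class FacetOptionsSketch, the array, and the sample data are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

final class FacetOptionsSketch {
    // Counts how often each attribute value occurs among the given hits.
    // valueByDoc[doc] holds the attribute value (e.g. the color) of that document.
    static Map<String, Integer> countValues(BitSet hits, String[] valueByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            counts.merge(valueByDoc[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] colorByDoc = {"red", "blue", "red", "green", "pink"};
        BitSet ipodHits = new BitSet();
        ipodHits.set(0);
        ipodHits.set(1);
        ipodHits.set(2);
        ipodHits.set(4);
        // Counting over the unfiltered hits keeps every color visible to the UI.
        System.out.println(countValues(ipodHits, colorByDoc)); // prints {blue=1, pink=1, red=2}
    }
}
```

Had the hits been restricted to color="red" first, the only key left in the map would be "red", which is exactly the problem with pre-filtering.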

Does anyone else have this use case? I'd imagine other folks are probably
doing similar things to accomplish this.

M



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



