On Feb 8, 2007, at 8:28 AM, Kainth, Sachin wrote:

This email is meant for Chris Hostetter and of course anyone else who
may know about this,

I wonder if I can ask you a question. I have been reading about how you at
CNET have implemented categorisation and counting, so that if I type
"Kodak EasyShare" in the reviews section, you not only get a big list of all
documents about it but also a list of the categories in which
"Kodak EasyShare" appears. For example, it will say that there are
documents in the "Digital cameras" category which contain "Kodak
EasyShare", and also documents in "Peripherals" matching the same query.
I'd like to do the same thing, and I'm not sure I've fully
understood the explanations I've read so far. I know you have
described using lots of bitsets to do this, but I'm not too clear on the
details.

Let me explain what I want to do.  It is very simple.  I have a Lucene
index containing just 3 fields (I mean field in the sense that you can
use the fieldName:searchTerm query syntax to search for the value
searchTerm in fieldName). The fields are "artist", "track" and "album".
What I want is this: if the user searches for the text "love" in the
track field, they get a list of all the artists who have a track with
"love" in the title, plus a list of all the albums with the word "love"
in the title. Along with these album and artist names I want a count of
the number of songs in each category. If the user clicks on one of
these categories, then that result subset is returned.

At the moment I just return the full list of artists, albums and tracks,
which I still want as well. What I've described above will be in a top bar
which will allow the user to refine their search.

What I'm asking then is for some specific information about how I can
perform the categorisation and counts.

There are two ways to go about this:

1) Use Solr.

2) If the number of unique artists and albums is reasonably small, build a Map of BitSets in memory, one BitSet per artist and per album. When someone searches for "love" (and who doesn't?), get a BitSet of the matching documents (using a HitCollector or a QueryFilter) and intersect it with each BitSet in the Map; the cardinality of each intersection is the count for that category. I use this same scheme (though I'm working on phasing it into a better fit with Solr and scalability) on Collex at <http://www.nines.org/collex> for all the facets on the right. I do leverage some of Solr's goodies now: I keep TermQuerys in memory and rely on Solr's DocSet caching instead of BitSets. But my original implementation successfully used the BitSet-Map-in-memory scheme, and it's very fast! The downside of doing all this yourself is that when things get bigger you gotta change how it works to scale, and Solr already has a lot of infrastructure in place for this eventuality.
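A minimal sketch of the BitSet-Map scheme from option 2, written against plain java.util.BitSet so it runs without Lucene. Assumptions: documents are identified by integer doc ids, the class and method names (FacetCounter, add, counts, refine) are hypothetical, and the caller supplies the BitSet of query matches. In a real index that BitSet would come from a HitCollector or QueryFilter, and the per-value BitSets would be built once, up front.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one cached BitSet per facet value (e.g. per artist).
public class FacetCounter {

    // facet value -> doc ids that carry that value
    private final Map<String, BitSet> facetBits = new HashMap<>();

    // Record that document docId has the given facet value.
    public void add(String value, int docId) {
        facetBits.computeIfAbsent(value, k -> new BitSet()).set(docId);
    }

    // For each facet value, count how many matching docs carry it.
    // Values with a zero count are left out of the result.
    public Map<String, Integer> counts(BitSet matches) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : facetBits.entrySet()) {
            // clone so the cached BitSet is not modified by and()
            BitSet intersection = (BitSet) e.getValue().clone();
            intersection.and(matches);
            int n = intersection.cardinality();
            if (n > 0) {
                result.put(e.getKey(), n);
            }
        }
        return result;
    }

    // Drill-down: the matching docs that also carry the chosen value.
    public BitSet refine(BitSet matches, String value) {
        BitSet subset = new BitSet();
        BitSet bits = facetBits.get(value);
        if (bits != null) {
            subset.or(bits);
            subset.and(matches);
        }
        return subset;
    }

    public static void main(String[] args) {
        FacetCounter artists = new FacetCounter();
        artists.add("Artist A", 0);
        artists.add("Artist A", 1);
        artists.add("Artist B", 2);
        artists.add("Artist C", 3);

        // pretend the query track:love matched docs 0, 1 and 2
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(1);
        matches.set(2);

        System.out.println(artists.counts(matches));             // {Artist A=2, Artist B=1}
        System.out.println(artists.refine(matches, "Artist A")); // {0, 1}
    }
}
```

Cloning before and() keeps the cached sets reusable across queries. The cost is one intersection per distinct value, which is why this only stays cheap while the number of unique artists and albums is reasonable.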

        Erik

