Re: Counting and Categorisation

Erik Hatcher Thu, 08 Feb 2007 05:48:03 -0800


On Feb 8, 2007, at 8:28 AM, Kainth, Sachin wrote:

This email is meant for Chris Hostetter and of course anyone else who
may know about this.

I wonder if I can ask you a question. I have been reading of how you at
CNET have implemented categorisation and counting so that if I type
"Kodak EasyShare" in the reviews section you not only get a big list of
all documents about this but you also get a list of categories in which
"Kodak EasyShare" appears.  So for example it will say that there are
documents in the "Digital cameras" category which contain "Kodak
EasyShare" and also documents in "Peripherals" with that same query.
I'd like to do the same thing as this and I'm not sure I've fully
understood the explanations I've read so far.  I know you have
described using lots of bitsets to do this but I'm not too clear on the
details.

Let me explain what I want to do.  It is very simple.  I have a Lucene
index containing just 3 fields (I mean field in the sense that you can
use the fieldName:searchTerm query syntax to search for the value
searchTerm in fieldName). The fields are "artist", "track" and "album".
What I want to do is if the user searches for the text "love" in the
track field they get a list of all the artists who have a track with
"love" in the title plus a list of all the albums with the word "love"
in the title. Along with these album and artist names I want a count of
the number of songs in each category.  If the user clicks on one of
these categories then that result subset is returned.

At the moment I just return the full list of artists, albums and tracks
which I want as well.  What I've described above will be in a top bar
which will allow the user to refine their search.

What I'm asking then is for some specific information about how I can
perform the categorisation and counts.


There are two ways to go about this:

  1) Use Solr.

  2) If the number of unique artists and albums is reasonable enough,
     build BitSets in memory into a Map. When someone searches for
     "love" (and who doesn't?) get a BitSet of the matching documents
     (using a HitCollector, or QueryFilter) and intersect it with all
     the ones in the Map. I use this same scheme (though I'm working on
     phasing things into a better fit with Solr and scalability) on
     Collex at <http://www.nines.org/collex> for all the facets on the
     right. I do leverage some of Solr's goodies, but my original
     implementation successfully used the BitSet-Map-in-memory scheme
     (now I use TermQuerys in memory, and leverage Solr's DocSet
     caching instead of BitSets). It's very fast! The con to doing all
     this yourself is that when things get bigger you gotta change how
     it works to scale, and Solr already has a lot of infrastructure in
     place for this eventuality.
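
A minimal sketch of the intersect-and-count step in option 2, using only
java.util.BitSet. The class name FacetCounter, the helper bits() and the
document numbers are made up for illustration. In a real application the
per-artist BitSets and the BitSet of matching documents would be built
from the Lucene index (for example with a HitCollector), which is not
shown here.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCounter {

    /**
     * For each facet value (e.g. an artist name), counts how many of the
     * matching documents carry that value. Values with no matches are skipped.
     */
    static Map<String, Integer> count(Map<String, BitSet> facets, BitSet matches) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : facets.entrySet()) {
            // and() modifies the receiver, so work on a copy of the cached BitSet
            BitSet both = (BitSet) e.getValue().clone();
            both.and(matches);
            int n = both.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    /** Hypothetical helper: builds a BitSet with the given document numbers set. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Built once and kept in memory: artist name -> documents by that artist
        Map<String, BitSet> artists = new LinkedHashMap<>();
        artists.put("The Beatles", bits(0, 1, 2));
        artists.put("Queen", bits(3, 4));
        artists.put("Nirvana", bits(5));

        // Stand-in for the documents matching track:love
        BitSet love = bits(1, 2, 4);

        System.out.println(count(artists, love)); // prints {The Beatles=2, Queen=1}
    }
}
```

The same count() call would be repeated with a second Map for albums.
When the user clicks a category, the intersected BitSet for that entry
is the refined result set.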


        Erik

