Hello Vijay,

I'm not sure whether such a change is feasible for you, but Solr has supported facets for some time now. Solr is a search server built on top of Lucene that provides a number of valuable features not found in Lucene itself: caching, FunctionQueries, and facets, to name a few. It exposes an XML-over-HTTP interface, so it can be accessed from any language.
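
To give a rough idea of what a faceted request to Solr looks like, here is a minimal sketch that only builds the request URL. The `facet=true` and `facet.field` parameters are Solr's standard faceting parameters; the host, port, `/select` path, and the field name `category` are assumptions for illustration and depend on your installation and schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: base URL, "/select" path and the "category" field are assumed values.
class SolrFacetUrlSketch {

    // Builds a Solr query URL that also asks for per-value counts of one field.
    static String buildFacetUrl(String baseUrl, String userQuery, String facetField) {
        try {
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode(userQuery, "UTF-8")
                    + "&facet=true"
                    + "&facet.field=" + URLEncoder.encode(facetField, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                buildFacetUrl("http://localhost:8983/solr", "diamond ring", "category"));
    }
}
```

The XML response to such a request carries the matching documents plus a list of category values with their hit counts, so a single request replaces the "number of categories + 1" searches.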
Best regards,
-Graham

> -----Original Message-----
> From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 15, 2007 11:43 PM
> To: java-user@lucene.apache.org
> Subject: Re: My Category Search Problem
>
> Hi Vijay,
> I have hit the same problem in the past and have evaluated
> various techniques to solve it.
>
> 1. Using the QueryFilter
> The idea is to
> a) create BitSets for each category once initially
> b) run the search and extract the BitSet for the search results
> c) logically "AND" the result set with the category sets
> d) find the cardinality of each such result and finally display
> This was working just fine in my scenario but was not
> scalable. The performance decreased with the increase in the
> number of categories.
> (because of the "AND"ing in the loop)
>
> 2. Override the collect method of the HitCollector.
> This method is called by Lucene for every document in the
> search results.
> The idea is to:
> a) override the method to use a HashMap (this works just fine for
> me) for the category to count (hits) mapping
> b) just keep incrementing the count for each category as
> and when it is encountered in the search results.
> c) the HashMap can be blank in the beginning and new
> categories can be added to it when encountered.
>
> I am currently using the second method and it works.
>
> Hope this helps.
>
> Regards,
> kapilChhabra
>
>
> Vijay Santhanam wrote:
> > Hi Lucene Users!
> >
> > I've been playing around with dotLucene on a few projects for
> > about 4 months, and I've found Lucene to be exceptionally powerful,
> > speedy and, thanks to LIA, really easy to use.
> >
> > But I've hit a problem that I fear will pose a performance problem for
> > our architecture and Lucene installation.
> >
> > We have an index of about 100,000 documents with about 30 fields,
> > built from our database.
> >
> > Each document in the index contains a TOKENIZED field of Category
> > Names, so that each document can belong to many categories. The
> > category field is a tokenized string field.
> >
> > We have a new requirement to not only allow searches across the whole
> > index, but to return the number of documents in each of the (150)
> > possible categories. This is like in an Amazon search
> > (http://amazon.com/s/ref=nb_ss_gw/105-0072880-3737226?url=search-alias%3Daps&field-keywords=diamond&Go.x=0&Go.y=0&Go=Go),
> > where a category list is presented on the left with the number of
> > results in each category.
> >
> > So far, I can think of two possible ways to implement this:
> >
> > 1. Create a QueryFilter for the user entered query, and perform a
> > category field search for each category.
> > 2. Create a separate index for each category, and sequentially (or
> > concurrently) search across all the indexes.
> >
> > Does anyone know which solution is better than the other?
> >
> > Both solutions seem taxing to me because they both involve
> > "number of categories + 1" searches.
> >
> > Regards,
> >
> > -V
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
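
For reference, the two counting techniques Kapil describes can be sketched in plain Java as below. This is not Lucene code and not necessarily how Kapil implemented it: the class, the method names, and the `categoriesByDoc` array (document id to category names) are hypothetical stand-ins. In a real Lucene application the hit bit set would presumably come from a QueryFilter, and the body of the second loop would sit inside an overridden HitCollector.collect, with the categories looked up from the index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical, none are Lucene API.
class CategoryCountSketch {

    // Step 1a: build one bit set per category, once, up front.
    static Map<String, BitSet> buildCategoryBits(String[][] categoriesByDoc) {
        Map<String, BitSet> bits = new HashMap<String, BitSet>();
        for (int docId = 0; docId < categoriesByDoc.length; docId++) {
            for (String category : categoriesByDoc[docId]) {
                BitSet set = bits.get(category);
                if (set == null) {
                    set = new BitSet(categoriesByDoc.length);
                    bits.put(category, set);
                }
                set.set(docId);
            }
        }
        return bits;
    }

    // Technique 1: AND the result bit set with every category bit set
    // and take the cardinality. Cost grows with the number of categories.
    static Map<String, Integer> countByIntersection(BitSet hits, Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            BitSet both = (BitSet) hits.clone(); // copy so the hit set is not modified
            both.and(entry.getValue());
            counts.put(entry.getKey(), both.cardinality());
        }
        return counts;
    }

    // Technique 2: visit each hit once and increment a counter per category.
    // The map starts empty; categories are added when first encountered.
    static Map<String, Integer> countByCollecting(int[] hitDocIds, String[][] categoriesByDoc) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int docId : hitDocIds) {
            for (String category : categoriesByDoc[docId]) {
                Integer seen = counts.get(category);
                counts.put(category, seen == null ? 1 : seen + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] categoriesByDoc = {
            {"books"}, {"books", "jewelry"}, {"jewelry"}, {"music"}, {"books"}
        };
        int[] hitDocIds = {1, 2, 4};
        BitSet hits = new BitSet();
        for (int docId : hitDocIds) {
            hits.set(docId);
        }
        System.out.println(countByIntersection(hits, buildCategoryBits(categoriesByDoc)));
        System.out.println(countByCollecting(hitDocIds, categoriesByDoc));
    }
}
```

One difference worth noting: the intersection variant reports every known category, including those with a count of zero, while the collecting variant only contains categories that actually occur in the hits. The collecting variant does work proportional to the number of hits rather than to the number of categories.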