RE: My Category Search Problem

Graham Stead Tue, 16 Jan 2007 06:58:30 -0800

Hello Vijay,

I'm not sure whether such a change is feasible for you, but Solr has
supported facets for some time now. Solr is a front-end for Lucene that
provides a number of valuable features not found in Lucene itself: caching,
FunctionQueries, and facets, to name a few. It has an XML interface and can
be accessed from any language.
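
As a rough illustration only: a faceted request is an ordinary HTTP GET, so
the client side can be as small as building a URL. The sketch below assumes a
Solr version that accepts the facet and facet.field request parameters, a
server at http://localhost:8983/solr, and a field named "category"; the class
and method names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrFacetUrl {
    // Builds a select URL that asks for per-value counts on one field.
    // rows=0 requests only the counts, not the matching documents.
    static String build(String baseUrl, String userQuery, String facetField) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(userQuery, "UTF-8")
                    + "&rows=0&facet=true&facet.field="
                    + URLEncoder.encode(facetField, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed host, port and field name
        System.out.println(build("http://localhost:8983/solr", "diamond", "category"));
    }
}
```

The response is XML, so any language with an HTTP client and an XML parser
can read the counts.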


Best regards,
-Graham

> -----Original Message-----
> From: Kapil Chhabra [mailto:[EMAIL PROTECTED] 
> Sent: Monday, January 15, 2007 11:43 PM
> To: java-user@lucene.apache.org
> Subject: Re: My Category Search Problem
> 
> Hi Vijay,
> I have hit the same problem in the past and have evaluated 
> various techniques to solve the same.
> 1. Using the QueryFilter
> The idea is to
>     a) create BitSets for each category once initially
>     b) run the search and extract the BitSet for the search results
>     c) Logically "AND" the result set with the category sets
>     d) find the cardinality of each such result and finally 
> display it.
> This was working just fine in my scenario but was not 
> scalable. The performance decreased as the number of 
> categories increased 
> (because of the "AND"ing in the loop).
> 
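A minimal sketch of steps a) to d) above, not Kapil's actual code. It uses
plain java.util.BitSet so it runs without Lucene; in the Lucene releases of
this period, Filter.bits(IndexReader) returns a java.util.BitSet, so the
category and result sets could come from there. The class name, method name
and sample data are made up for the example.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class BitSetCategoryCounter {
    // For each category, AND its bit set with the result bit set and
    // count the bits that survive.
    static Map<String, Integer> countPerCategory(BitSet results,
                                                 Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            BitSet both = (BitSet) results.clone(); // leave the inputs untouched
            both.and(e.getValue());
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: documents 0, 2, 3 and 5 matched the query
        BitSet results = new BitSet();
        results.set(0); results.set(2); results.set(3); results.set(5);

        BitSet books = new BitSet();
        books.set(0, 3);   // documents 0..2
        BitSet music = new BitSet();
        music.set(2, 6);   // documents 2..5

        Map<String, BitSet> categories = new LinkedHashMap<String, BitSet>();
        categories.put("books", books);
        categories.put("music", music);

        System.out.println(countPerCategory(results, categories));
    }
}
```

The loop does one clone and one AND per category, which is the per-category
cost described above.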
> 2. Override the collect method of the HitCollector.
> This method is called by lucene for every document in the 
> search results.
> The idea is to:
>     a) override the method to use a HashMap (this works just fine for
> me) for the category to count (hits) mapping
>     b) just keep incrementing the count for each category as 
> and when it is encountered in the search results.
>     c) the HashMap can be blank in the beginning and new 
> categories can be added to it when encountered.
> 
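A rough sketch of the counting idea in a) to c), again not Kapil's code. To
stay runnable without Lucene, the class below only mirrors the shape of
HitCollector.collect(int doc, float score); in a real application it would
extend org.apache.lucene.search.HitCollector and be passed to the search
call. The categoriesByDoc lookup is an assumption of the example: the message
does not say how the categories of a document are obtained inside collect.

```java
import java.util.HashMap;
import java.util.Map;

class CategoryCountingCollector {
    // Hypothetical cache: categoriesByDoc[docId] lists that document's categories
    private final String[][] categoriesByDoc;
    // Starts empty; categories are added the first time they are seen
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    CategoryCountingCollector(String[][] categoriesByDoc) {
        this.categoriesByDoc = categoriesByDoc;
    }

    // Called once per matching document; score is unused for counting
    void collect(int doc, float score) {
        for (String category : categoriesByDoc[doc]) {
            Integer old = counts.get(category);
            counts.put(category, old == null ? 1 : old + 1);
        }
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    public static void main(String[] args) {
        String[][] categoriesByDoc = {
            {"books"}, {"books", "music"}, {"music"}, {}
        };
        CategoryCountingCollector collector =
                new CategoryCountingCollector(categoriesByDoc);
        // Pretend documents 0, 1 and 2 matched the query
        collector.collect(0, 1.0f);
        collector.collect(1, 0.5f);
        collector.collect(2, 0.2f);
        System.out.println(collector.getCounts());
    }
}
```

The work here grows with the number of hits rather than with the number of
categories, in contrast to the per-category loop of the first method.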
> I am currently using the second method and it works.
> 
> Hope this helps.
> 
> Regards,
> kapilChhabra
> 
> 
> Vijay Santhanam wrote:
> > Hi Lucene Users!
> >
> >  
> >
> > I've been playing around with dotLucene on a few projects for 
> > about 4 months, and I've found Lucene to be exceptionally powerful, 
> > speedy and thanks to LIA, really easy to use.
> >
> > But I've hit a problem that I fear will pose a performance 
> problem for 
> > our architecture and Lucene installation.
> >
> >  
> >
> > We have an index of about 100,000 documents with about 30 fields, 
> > built from our database.
> >
> > Each document in the index contains a TOKENIZED field of Category 
> > Names, so that each document can belong to many categories. The 
> > category field is a tokenized string field.
> >
> >  
> >
> > We have a new requirement to not only allow searches across 
> the whole 
> > index, but to return the number of documents in each of the (150) 
> > possible categories. This is like in an Amazon search 
> > 
> (http://amazon.com/s/ref=nb_ss_gw/105-0072880-3737226?url=search-alias
> > %3Daps &field-keywords=diamond&Go.x=0&Go.y=0&Go=Go), where 
> a category 
> > list is presented on the left with the number of results in each 
> > category.
> >
> >  
> >
> > So far, I can think of two possible ways to implement this:
> >
> >  
> >
> > 1.  Create a QueryFilter for the user entered query, and perform a
> > category field search for each category.
> > 2.  Create a separate index for each category, and sequentially (or
> > concurrently) search across all the indexes. 
> >
> >  
> >
> > Does anyone know which solution is better than the other? 
> >
> >  
> >
> > Both solutions seem taxing to me because they both involve 
> "number of 
> > categories + 1" searches.
> >
> >  
> >
> > Regards,
> >
> >  -V
> >
> >  
> >
> >  
> >
> >
> >   
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



