RE: Aggregating category hits

Ramana Jelda Mon, 22 May 2006 05:52:41 -0700

I think that if you dig a little into what Lucene does when asked to sort,
you will find the information you are looking for.


Here is some help.
Lucene uses TopFieldDocCollector for sorting (look at the implementation
of IndexSearcher).
So your HitCollector should extend TopFieldDocCollector: override collect()
to do whatever work you want, then call super.collect() so that
TopFieldDocCollector still does its own work (sorting). I don't think I need
to explain more.

Then you are done. 
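
A minimal sketch of that idea, in case it helps. It is plain Java with no
Lucene dependency: Collector and SortingCollector are hypothetical stand-ins
for HitCollector and TopFieldDocCollector, and the sort keys and category
array are made-up sample data. docId2CategoryId plays the role of the array
from the FieldCache step quoted below. Against a real index you would extend
TopFieldDocCollector itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedCountingSketch {

    /** Hypothetical stand-in for Lucene's HitCollector callback. */
    abstract static class Collector {
        abstract void collect(int doc, float score);
    }

    /** Hypothetical stand-in for TopFieldDocCollector: orders hits by a per-doc sort key. */
    static class SortingCollector extends Collector {
        private final String[] sortKeys;                 // sort key for each doc id
        private final List<Integer> docs = new ArrayList<>();

        SortingCollector(String[] sortKeys) {
            this.sortKeys = sortKeys;
        }

        @Override
        void collect(int doc, float score) {
            docs.add(doc);
        }

        List<Integer> sortedDocs() {
            List<Integer> out = new ArrayList<>(docs);
            out.sort((a, b) -> sortKeys[a].compareTo(sortKeys[b]));
            return out;
        }
    }

    /** Counts hits per category, then lets the parent class do its sorting work. */
    static class CountingSortingCollector extends SortingCollector {
        private final String[] docId2CategoryId;         // category id for each doc id
        final Map<String, Integer> categoryCounts = new HashMap<>();

        CountingSortingCollector(String[] sortKeys, String[] docId2CategoryId) {
            super(sortKeys);
            this.docId2CategoryId = docId2CategoryId;
        }

        @Override
        void collect(int doc, float score) {
            // Our own work: bump the counter for this hit's category.
            categoryCounts.merge(docId2CategoryId[doc], 1, Integer::sum);
            // The parent's work: remember the hit so it can be sorted.
            super.collect(doc, score);
        }
    }

    public static void main(String[] args) {
        String[] titles = {"pear", "apple", "fig", "kiwi"};
        String[] categories = {"Books", "Easter", "Books", "Misc"};
        CountingSortingCollector c = new CountingSortingCollector(titles, categories);
        for (int doc : new int[] {0, 1, 2}) {            // pretend docs 0..2 matched the query
            c.collect(doc, 1.0f);
        }
        System.out.println(c.sortedDocs());
        System.out.println(c.categoryCounts);
    }
}
```

One pass over the hits gives you both the sorted results and the category
counts, so the query does not have to be run twice.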

Have fun,
Jelda




> -----Original Message-----
> From: Kapil Chhabra [mailto:[EMAIL PROTECTED] 
> Sent: Monday, May 22, 2006 2:07 AM
> To: java-user@lucene.apache.org
> Subject: Re: Aggregating category hits
> 
> Hi Jelda,
> Is there any way by which I can achieve sorting of search 
> results along with overriding the collect method of the 
> HitCollector in this case?
> I have been using
> 
> srch.search(query,sort);
> 
> If I replace it with srch.search(query, new HitCollector(){ 
> impl of the collect method to collect counts }), I will have 
> no way to sort my results.
> 
> Any pointers?
> 
> Regards,
> kapilChhabra
> 
> Kapil Chhabra wrote:
> > Thanks a lot Jelda.
> > I'll try this and get back with the performance comparison chart.
> >
> > Regards,
> > kapilChhabra
> >
> > Ramana Jelda wrote:
> >> Hi Kapil,
> >> As I remember, FieldCache has been in the Lucene API since 1.4.
> >> OK. Anyhow, here is pseudocode that can help.
> >>
> >> // 1. When the reader is opened, initialize the documentId-to-categoryId
> >> //    relation as below. Depending on your requirement you can use
> >> //    getStringIndex() instead; I get a StringIndex in my project.
> >>
> >> String[] docId2CategoryIdRelation =
> >>     FieldCache.DEFAULT.getStrings(reader, categoryFieldName);
> >>
> >> // 2. Cache it.
> >> // 3. Search as usual with your Query, providing your own HitCollector.
> >> // 4. Use docId2CategoryIdRelation to retrieve the category id for each
> >> //    result document:
> >> String yourCategoryId = docId2CategoryIdRelation[resultDocId];
> >> // 5. Increment the count for yourCategoryId (lazily initialize the
> >> //    categoryCounts holder).
> >>
> >> // 6. You are done.. :)
> >>
> >> All the best,
> >> Jelda
> >>
> >>
> >>
> >>
> >>  
> >>> -----Original Message-----
> >>> From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
> >>> Sent: Tuesday, May 16, 2006 11:50 AM
> >>> To: java-user@lucene.apache.org
> >>> Subject: Re: Aggregating category hits
> >>>
> >>> Hi Jelda,
> >>> I have not yet migrated to Lucene 1.9 and I guess FieldCache has 
> >>> been introduced in this release.
> >>> Can you please give me a pointer to your strategy of FieldCache?
> >>>
> >>> Thanks & Regards,
> >>> Kapil Chhabra
> >>>
> >>>
> >>> Ramana Jelda wrote:
> >>>    
> >>>> But this BitSet strategy consumes more memory, mainly if you have
> >>>> millions of documents and thousands of categories.
> >>>> So I preferred the FieldCache strategy in my project.
> >>>>
> >>>> Jelda
> >>>>
> >>>>        
> >>>>> -----Original Message-----
> >>>>> From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
> >>>>> Sent: Tuesday, May 16, 2006 7:38 AM
> >>>>> To: java-user@lucene.apache.org
> >>>>> Subject: Re: Aggregating category hits
> >>>>>
> >>>>> I am doing the same in my application.
> >>>>> Once a day, all the filters [for different categories] are
> >>>>> initialized. Each time a query is fired, the Query BitSet is ANDed
> >>>>> with the BitSet of each filter. The cardinality obtained is the
> >>>>> desired output.
> >>>>> @Erik: I would like to know more about the implementation with
> >>>>> DocSet in place of BitSet.
> >>>>>
> >>>>> Regards,
> >>>>> kapilChhabra
> >>>>>
> >>>>>
> >>>>> Erik Hatcher wrote:
> >>>>>            
> >>>>>> On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote:
> >>>>>>                
> >>>>>>> If you needed to know not just the total number of hits, but the
> >>>>>>> number of hits in each "category", how would you handle that?
> >>>>>>>
> >>>>>>> For instance, a search for "egg" would have to produce the 20 most
> >>>>>>> relevant documents for "egg", but also a list like this:
> >>>>>>>
> >>>>>>> Holiday & Seasonal / Easter 75
> >>>>>>> Books / Cooking 52
> >>>>>>> Miscellaneous 44
> >>>>>>> Kitchen Collectibles 43
> >>>>>>> Hobbies / Crafts 17
> >>>>>>> [...]
> >>>>>>>
> >>>>>>> It seems to me that you'd have to retrieve each hit's stored fields
> >>>>>>> and examine the contents of a "category" field. That's a lot of
> >>>>>>> overhead. Is there another way?
> >>>>>>>                     
> >>>>>> My first implementation of faceted browsing uses BitSets that get
> >>>>>> pre-loaded for each category value (each unique term in a "category"
> >>>>>> field, for example). And to intersect that with an actual Query, it
> >>>>>> gets run through the QueryFilter to get its BitSet and then AND'd
> >>>>>> together with each of the category BitSets. Sounds like a lot, but
> >>>>>> for my applications there are not tons of these BitSets and the
> >>>>>> performance has been outstanding. Now that I'm doing more with Solr,
> >>>>>> I'm beginning to leverage its amazing caching infrastructure and
> >>>>>> replacing BitSets with DocSets.
> >>>>>>
> >>>>>> Erik
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>                 
> >>> 
> --------------------------------------------------------------------
> >>> -
> >>>    
> >>>>>            
> >>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>>>> For additional commands, e-mail: 
> [EMAIL PROTECTED]
> >>>>>>
> >>>>>>
> >>>>>>                 
> >>>>>
> >>>>>             
> >>>>
> >>>>       
> >>>>
> >>>>
> >>>>         
> >>>     
> >>
> >>
> >>
> >>
> >>   
> >
> >
> 
> 
> 
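
For comparison, the BitSet strategy that Erik and Kapil describe in the
quoted messages above can be sketched with java.util.BitSet alone. This is
only an illustration under stated assumptions: the class and method names are
hypothetical, and the document ids and category sets are invented sample
data. In Lucene the bit sets would come from the filters mentioned in the
thread.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class BitSetCategoryCounts {

    /** ANDs the query's BitSet with each category BitSet and returns the cardinalities. */
    static Map<String, Integer> countPerCategory(BitSet queryBits,
                                                 Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            BitSet hits = (BitSet) queryBits.clone();    // and() mutates, so work on a copy
            hits.and(e.getValue());
            counts.put(e.getKey(), hits.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Pre-loaded once: one BitSet per category, one bit per document id.
        BitSet easter = new BitSet();
        easter.set(0);
        easter.set(1);
        easter.set(4);
        BitSet cooking = new BitSet();
        cooking.set(2);
        cooking.set(3);
        Map<String, BitSet> categories = new LinkedHashMap<>();
        categories.put("Holiday & Seasonal / Easter", easter);
        categories.put("Books / Cooking", cooking);

        // Per query: the set of matching document ids.
        BitSet query = new BitSet();
        query.set(1);
        query.set(2);
        query.set(4);

        System.out.println(countPerCategory(query, categories));
        // prints {Holiday & Seasonal / Easter=2, Books / Cooking=1}
    }
}
```

This keeps one bit per document for every category, which is the memory cost
mentioned above for millions of documents and thousands of categories.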


