Re: Aggregating category hits

Kapil Chhabra Tue, 16 May 2006 03:12:52 -0700

Thanks a lot Jelda.
I'll try this and get back with a performance comparison chart.


Regards,
kapilChhabra

Ramana Jelda wrote:

Hi Kapil,
As I remember, FieldCache has been in the Lucene API since 1.4.
OK. Anyhow, here is some pseudocode that can help.

// 1. On opening the reader, initialize the documentId-to-categoryId relation
//    as below. Depending on your requirements you can use either getStrings()
//    or getStringIndex(); I use getStringIndex() in my project.

String[] docId2CategoryIdRelation =
    FieldCache.DEFAULT.getStrings(reader, categoryFieldName);

// 2. Cache it.
// 3. Search as usual with your Query, providing your own HitCollector.
// 4. Use docId2CategoryIdRelation to retrieve the category id for each
//    result document:
String yourCategoryId = docId2CategoryIdRelation[resultDocId];
// 5. Increment the count for yourCategoryId (initialize the categoryCounts
//    holder lazily).

// 6. You are done.. :)
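
Steps 4 and 5 above can be sketched in plain Java roughly as follows. This is only a minimal sketch, not code from the thread: the class name CategoryCounter and its methods are hypothetical, the String[] stands in for the array that FieldCache would return, and collect(int) is meant to be called once per matching document from your own HitCollector.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; no Lucene classes are used here.
class CategoryCounter {
    // docIdToCategory[docId] is the category id of that document
    // (a stand-in for the cached docId-to-categoryId array).
    private final String[] docIdToCategory;

    // Created lazily, on the first hit that has a category.
    private Map<String, Integer> counts;

    CategoryCounter(String[] docIdToCategory) {
        this.docIdToCategory = docIdToCategory;
    }

    // Call once for every document that matches the query.
    void collect(int docId) {
        String category = docIdToCategory[docId];
        if (category == null) {
            return; // assumption: null means the document has no category
        }
        if (counts == null) {
            counts = new HashMap<>();
        }
        counts.merge(category, 1, Integer::sum);
    }

    // Hit count per category; empty if nothing was collected.
    Map<String, Integer> counts() {
        return counts == null ? Collections.<String, Integer>emptyMap() : counts;
    }
}
```

The point of the design is that each hit costs one array lookup and one map update, so no stored fields have to be loaded to find a document's category.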

All the best,
Jelda
-----Original Message-----
From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 11:50 AM
To: [email protected]
Subject: Re: Aggregating category hits

Hi Jelda,
I have not yet migrated to Lucene 1.9, and I guess FieldCache was introduced in that release.
Can you please give me a pointer to your FieldCache strategy?

Thanks & Regards,
Kapil Chhabra


Ramana Jelda wrote:
But this BitSet strategy consumes more memory, mainly if you have
millions of documents and thousands of categories.
So I preferred the FieldCache strategy in my project.
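
A rough back-of-the-envelope comparison of the two strategies, as a minimal sketch. The class and method names are hypothetical, and the figures are simplifying assumptions rather than measurements: one uncompressed bit per document per category for the BitSets, and one reference per document (4 or 8 bytes depending on the JVM) for the array, ignoring the category strings themselves and any object overhead.

```java
// Hypothetical estimate only; real memory use depends on the JVM and on Lucene.
class FacetMemoryEstimate {
    // One BitSet per category, one bit per document in each BitSet.
    static long bitSetBytes(long numDocs, long numCategories) {
        return numDocs * numCategories / 8;
    }

    // One array slot per document, each slot holding a reference.
    static long arrayBytes(long numDocs, int referenceBytes) {
        return numDocs * referenceBytes;
    }
}
```

Under these assumptions, 1,000,000 documents and 1,000 categories give bitSetBytes = 125,000,000 bytes, against arrayBytes = 4,000,000 bytes with 4-byte references.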

Jelda
-----Original Message-----
From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 7:38 AM
To: [email protected]
Subject: Re: Aggregating category hits

I am doing the same in my application.
Once a day, all the filters [for different categories] are initialized.
Each time a query is fired, the Query BitSet is ANDed with the BitSet of
each filter. The cardinality obtained is the desired output.

@Erik: I would like to know more about the implementation with DocSet
in place of BitSet.
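
The AND-and-count step described here can be sketched with java.util.BitSet alone. This is a minimal sketch, not the application's code: the class and method names are hypothetical, and it assumes the query BitSet and the per-category BitSets have already been built (for example from filters), with bit n set when document n matches.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; no Lucene classes are used here.
class BitSetFacets {
    // Returns, for each category, how many query hits fall in that category.
    static Map<String, Integer> countPerCategory(BitSet queryBits,
                                                 Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            // and() modifies its receiver, so work on a copy of the query bits.
            BitSet intersection = (BitSet) queryBits.clone();
            intersection.and(entry.getValue());
            counts.put(entry.getKey(), intersection.cardinality());
        }
        return counts;
    }
}
```

Cloning keeps both the query BitSet and the cached category BitSets unchanged, so the category BitSets can be reused across queries.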

Regards,
kapilChhabra


Erik Hatcher wrote:
On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote:
If you needed to know not just the total number of hits, but the
number of hits in each "category", how would you handle that?
For instance, a search for "egg" would have to produce the 20 most
relevant documents for "egg", but also a list like this:

Holiday & Seasonal / Easter 75
Books / Cooking 52
Miscellaneous 44
Kitchen Collectibles 43
Hobbies / Crafts 17
[...]

It seems to me that you'd have to retrieve each hit's stored fields
and examine the contents of a "category" field. That's a lot of
overhead. Is there another way?

My first implementation of faceted browsing uses BitSet's that get
pre-loaded for each category value (each unique term in a "category"
field, for example). And to intersect that with an actual Query, it
gets run through the QueryFilter to get its BitSet and then AND'd
together with each of the category BitSet's. Sounds like a lot, but
for my applications there are not tons of these BitSet's and the
performance has been outstanding. Now that I'm doing more with Solr,
I'm beginning to leverage its amazing caching infrastructure and
replacing BitSet's with DocSet's.
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
