Check out Chris Hostetter's methodology for doing this at CNET:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200508.mbox/[EMAIL PROTECTED]
This sounds like it matches your requirements.

cheers,
j

On 12/7/05, Ching-Pei Hsing <[EMAIL PROTECTED]> wrote:
> Has anyone solved the following problem, or does anyone have good suggestions?
>
> Each document is assigned to one or more category nodes in a hierarchy.
> For example,
>
> Document1: /Computer/Desktop
> Document2: /Computer/Notebook; /Salesforce/ExtremePortable
> Document3: /Computer/Server
> ......
>
> For each search operation, we present not only the list of matching
> documents but also the list of categories containing those documents,
> along with the number of matching documents in each:
>
> /Computer/Desktop(30)
> /Computer/Notebook(12)
> /Computer/Accessories(51)
>
> This is really useful because it can "guide" the user in refining the
> search criteria and quickly reduce the size of the result.
> I know we can do this by brute force: go through the entire result set,
> retrieve the category field of each document, and aggregate the counts.
> That doesn't scale, though, when the number of documents to go through
> is high. Under load it can also cause performance problems, because each
> execution thread holds on to the index reader for a long time while it
> walks through all those documents.
>
> Is there any API or approach we can leverage at search time? Is there
> anything we can do at indexing time? Or is there some technology we
> need to integrate, such as those used for data warehousing? Any comments
> or pointers will be greatly appreciated.
>
> Thanks
>
> Ching-pei
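One common way to avoid reading the category field of every hit is to precompute, once per index reader, a bitset for each category whose bit i is set when document i belongs to that category. At search time you intersect each category's bitset with the bitset of documents matching the query, and the facet count is the cardinality of the intersection. The Java sketch below shows only that counting step, using plain `java.util.BitSet`. It is a minimal illustration under stated assumptions, not Lucene's API and not necessarily the exact approach in the linked post: the class name `FacetCountSketch`, the helpers `countFacets` and `bits`, and the assumption that both the per-category sets and the hit set are already available as `BitSet`s indexed by document id are all hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetCountSketch {

    // Hypothetical helper: categoryDocs maps a category path to a BitSet in which
    // bit i is set if document i is assigned to that category. Assumed to be built
    // once when the index reader is opened and reused across searches.
    static Map<String, Integer> countFacets(Map<String, BitSet> categoryDocs, BitSet hits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryDocs.entrySet()) {
            // Copy so the cached category bitset is never modified.
            BitSet both = (BitSet) e.getValue().clone();
            // Keep only documents that match the query AND are in this category.
            both.and(hits);
            int n = both.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    // Hypothetical convenience: build a BitSet from a list of document ids.
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Doc ids 0, 1, 2 stand for Document1, Document2, Document3 from the question.
        Map<String, BitSet> categoryDocs = new LinkedHashMap<>();
        categoryDocs.put("/Computer/Desktop", bits(0));
        categoryDocs.put("/Computer/Notebook", bits(1));
        categoryDocs.put("/Salesforce/ExtremePortable", bits(1));
        categoryDocs.put("/Computer/Server", bits(2));

        // Suppose the query matched Document1 and Document2.
        BitSet hits = bits(0, 1);
        System.out.println(countFacets(categoryDocs, hits));
        // prints {/Computer/Desktop=1, /Computer/Notebook=1, /Salesforce/ExtremePortable=1}
    }
}
```

The per-search cost is roughly proportional to the number of categories times the bitset length in words, and no stored fields are read, which addresses the concern about threads holding the reader while walking every hit. Parent nodes such as /Computer could be handled the same way, for example by OR-ing their children's bitsets when the cache is built (again an assumption, not something the thread specifies).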