chris sleeman wrote:
> Pike,
>
> Thanks for your quick response. However, I was looking for something slightly
> different.
>
> I understand the concept of query filtering, but what I really need is some
> sort of "category counting" to refine searches.
>
> For example, my documents can have a field named "location", which could be any
> city in a country. I want to display the documents (and a count) that match
> the search query for each city, so that the user can then search within the
> search results. The names of the cities are not known in advance.
>
> An example of something similar is:
> http://reviews.cnet.com/4566-6501_7-0.html
>
> I just wanted to know whether anyone has tried doing this using Nutch, and
> if so, I would be glad to get some pointers.

For the general principle of how to implement it using Lucene, please see the thread on the Lucene java-user list about "Aggregating category hits", started on May 15 2006. This subject has been discussed many times.

From the point of view of Nutch, you can implement all necessary modifications within org.apache.nutch.searcher.IndexSearcher or LuceneQueryOptimizer. Then, if you don't want to change the DistributedSearch protocol, you could extend the o.a.n.s.Hits class to pass aggregated category info from the back-ends to the front-end.

For a certain project I implemented two methods of faceted search, one based on random sampling of search results, the other based on bitset intersections. Both methods work reasonably fast, although they differ in how they trade accuracy for speed. Unfortunately the code is not public, but the task is certainly doable and doesn't require major changes.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
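
To make the two approaches mentioned in the reply a little more concrete, here is a minimal sketch of category counting by bitset intersection and by random sampling. It is not the non-public code referred to above, and it does not use any Nutch or Lucene API: it relies only on java.util, and every name in it (FacetCountSketch, buildCategoryBitSets, countByIntersection, estimateBySampling) is hypothetical. It assumes that document ids are small integers, that each document has exactly one category value, and that the hits of a query are already available as a BitSet.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only: plain java.util, no Nutch or Lucene classes.
public class FacetCountSketch {

    // Build one bitset per category value; bit i is set when document i has that value.
    // The category values (e.g. city names) do not need to be known in advance.
    public static Map<String, BitSet> buildCategoryBitSets(String[] docCategory) {
        Map<String, BitSet> byCategory = new LinkedHashMap<>();
        for (int doc = 0; doc < docCategory.length; doc++) {
            byCategory.computeIfAbsent(docCategory[doc], k -> new BitSet()).set(doc);
        }
        return byCategory;
    }

    // Exact counts: intersect the query's hit bitset with each category's bitset.
    public static Map<String, Integer> countByIntersection(BitSet hits,
                                                           Map<String, BitSet> categoryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryDocs.entrySet()) {
            BitSet both = (BitSet) hits.clone(); // copy so the caller's bitset is untouched
            both.and(e.getValue());
            int n = both.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    // Estimated counts: look at a random sample of the hits and scale the counts up.
    // Cheaper for large result sets, at the cost of accuracy.
    public static Map<String, Integer> estimateBySampling(BitSet hits, String[] docCategory,
                                                          int sampleSize, Random rnd) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int[] ids = hits.stream().toArray();
        int n = Math.min(sampleSize, ids.length);
        if (n == 0) {
            return counts;
        }
        // Partial Fisher-Yates shuffle: the first n slots become a uniform random sample.
        for (int i = 0; i < n; i++) {
            int j = i + rnd.nextInt(ids.length - i);
            int tmp = ids[i];
            ids[i] = ids[j];
            ids[j] = tmp;
            counts.merge(docCategory[ids[i]], 1, Integer::sum);
        }
        double scale = (double) ids.length / n;
        counts.replaceAll((category, c) -> (int) Math.round(c * scale));
        return counts;
    }

    public static void main(String[] args) {
        String[] city = {"Pune", "Pune", "Delhi", "Delhi", "Delhi", "Mumbai"};
        BitSet hits = new BitSet();
        for (int doc : new int[] {0, 2, 3, 5}) {
            hits.set(doc);
        }
        System.out.println(countByIntersection(hits, buildCategoryBitSets(city)));
        // prints {Pune=1, Delhi=2, Mumbai=1}
    }
}
```

In a distributed setup, each back-end could compute such a map for its own index segment, and the front-end could sum the maps per category. That is one possible reading of passing aggregated category info along with the hits, not a description of how Nutch does it.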
