chris sleeman wrote:
> Pike,
> Thanks for your quick response. However I was looking for something slightly
> different.
> I understand the concept of query filtering, but what I really need is some
> sort of "category counting" to refine searches.
> 
> For example, my documents can have a field named "location", which could be
> any city in a country. For each city, I want to display the documents (and
> their count) that match the search query, so that the user can then search
> within the search results. The names of the cities are not known in advance.
> 
> An example of something similar is:
> http://reviews.cnet.com/4566-6501_7-0.html
> 
> I just wanted to know whether anyone has tried doing this using Nutch, and
> if so, I would be glad to get some pointers.

For the general principle of how to implement it using Lucene, please
see the thread on the Lucene java-user list about "Aggregating category
hits", started on May 15 2006. This subject has been discussed many times.

From the point of view of Nutch - you can implement all necessary
modifications within org.apache.nutch.searcher.IndexSearcher or 
LuceneQueryOptimizer. Then, if you don't want to change the 
DistributedSearch protocol, you could extend the o.a.n.s.Hits class to 
pass aggregated category info from back-ends to the front-end.

For a certain project I implemented two methods of faceted search, one 
based on random sampling of search results, the other based on bitset 
intersections. Both methods work reasonably fast, although they differ 
in accuracy vs. speed balance. Unfortunately the code is not public - 
but the task is certainly doable, and doesn't require major changes.
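
Since that code is not available, here is only a minimal sketch of the two
ideas, not the implementation mentioned above. It uses plain java.util.BitSet
rather than any Lucene or Nutch API, and every name in it (FacetCounts,
countByIntersection, estimateBySampling, the per-category bitsets and the
docId-to-category array) is hypothetical. It assumes you have already built
one bitset of document ids per city, or an array giving the city of each
document id.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper for illustration; not part of Nutch or Lucene. */
final class FacetCounts {

    /** Exact counts: intersect the query's hit bitset with each category's bitset. */
    static Map<String, Integer> countByIntersection(BitSet hits,
                                                    Map<String, BitSet> categoryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryDocs.entrySet()) {
            BitSet both = (BitSet) hits.clone(); // leave the caller's bitset untouched
            both.and(e.getValue());              // docs that match the query AND the category
            int n = both.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);       // skip categories with no matching docs
            }
        }
        return counts;
    }

    /** Estimated counts: check the category of a random sample of hits, then scale up. */
    static Map<String, Integer> estimateBySampling(int[] hitDocIds, String[] categoryOfDoc,
                                                   int sampleSize, Random rnd) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int n = Math.min(sampleSize, hitDocIds.length);
        if (n <= 0) {
            return counts;
        }
        int[] ids = hitDocIds.clone();
        // Partial Fisher-Yates shuffle: draws n distinct hits without replacement.
        for (int i = 0; i < n; i++) {
            int j = i + rnd.nextInt(ids.length - i);
            int tmp = ids[i];
            ids[i] = ids[j];
            ids[j] = tmp;
            counts.merge(categoryOfDoc[ids[i]], 1, Integer::sum);
        }
        // Scale sample counts up to the size of the full result set.
        double scale = (double) hitDocIds.length / n;
        counts.replaceAll((city, c) -> (int) Math.round(c * scale));
        return counts;
    }
}
```

The intersection variant is exact but costs one bitset AND per category; the
sampling variant touches at most sampleSize hits, so it is cheaper on large
result sets but only approximate (it becomes exact when the sample covers all
hits).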


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

