Bugs item #989844, was opened at 2004-07-12 20:32
Message generated for change (Comment added) made by ericholman
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=989844&group_id=59548

Category: searcher
Group: None
Status: Open
Resolution: None
Priority: 7
Submitted By: Stefan Groschupf (joa23)
Assigned to: Doug Cutting (cutting)
Summary: host grouping

Initial Comment:
This patch groups hits by host on each result page.

If you like it, _please_ vote for it. ;)

The hits of each host are grouped _per page_ (not across the
complete result set) into a <code>HostHit</code> object.
Each HostHit object holds at least one Hit object (1+n).
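To make the per-page grouping concrete, here is a minimal sketch under stated assumptions. `RawHit`, `HostGroup` and `PerPageGrouping` are hypothetical stand-ins, not the patch's `Hit`/`HostHit` classes. The sketch assumes the hits of one page arrive sorted by descending score, so the first hit seen for a host becomes that group's main hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search hit (NOT Nutch's Hit class).
record RawHit(String host, String url, float score) {}

// Hypothetical stand-in for a HostHit: all hits of one host on this page.
record HostGroup(String host, List<RawHit> hits) {
  // Every group holds at least one hit; index 0 is the best-scoring one.
  RawHit mainHit() { return hits.get(0); }
}

final class PerPageGrouping {
  private PerPageGrouping() {}

  /**
   * Groups one page of hits by host. Assumes the input is sorted by
   * descending score; groups keep the order of each host's first hit.
   */
  static List<HostGroup> groupPage(List<RawHit> page) {
    // LinkedHashMap preserves insertion order, i.e. the order of best hits.
    Map<String, List<RawHit>> byHost = new LinkedHashMap<>();
    for (RawHit hit : page) {
      byHost.computeIfAbsent(hit.host(), h -> new ArrayList<>()).add(hit);
    }
    List<HostGroup> groups = new ArrayList<>();
    for (Map.Entry<String, List<RawHit>> e : byHost.entrySet()) {
      groups.add(new HostGroup(e.getKey(), e.getValue()));
    }
    return groups;
  }
}
```

Because only one page is grouped, a host whose hits are spread over several pages will appear once on each of those pages.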

The patch provides a new API alongside the old one:
HostHits NutchBean.search(Query query, int numHits, int hitsPerPage)

This patch allows you to implement different scenarios, such as:

+ show only one hit per host on each page
+ show all hits from a host below the hit with the highest score
and indent them, similar to what Google does
+ show one hit per host and list the URLs of the host's other hits
below it
+ let users switch host grouping on or off
+ and more
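The display scenarios listed above differ only in how each host group is rendered. The following is a minimal sketch, assuming hypothetical `ShownHit`/`ShownGroup` types and a hypothetical `GroupingMode` switch (none of these come from the patch); in the real setup this logic would live in the JSP page.

```java
import java.util.List;

// Hypothetical, simplified view of a hit for display purposes only.
record ShownHit(String url, String title) {}

// Hypothetical: all hits from one host on the current page, best first.
record ShownGroup(String host, List<ShownHit> hits) {}

enum GroupingMode {
  ONE_PER_HOST, // show only the best hit of each host
  INDENTED,     // show every hit, indenting the host's extra hits
  URLS_BELOW    // show the best hit, then list the other URLs beneath it
}

final class GroupRenderer {
  private GroupRenderer() {}

  static String render(List<ShownGroup> groups, GroupingMode mode) {
    StringBuilder out = new StringBuilder();
    for (ShownGroup group : groups) {
      // Every group has at least one hit; the first is the main hit.
      ShownHit main = group.hits().get(0);
      out.append(main.title()).append(" (").append(main.url()).append(")\n");
      if (mode == GroupingMode.ONE_PER_HOST) {
        continue; // extra hits of this host are hidden
      }
      for (ShownHit extra : group.hits().subList(1, group.hits().size())) {
        if (mode == GroupingMode.INDENTED) {
          // Full entry (title and URL), indented under the main hit.
          out.append("    ").append(extra.title())
             .append(" (").append(extra.url()).append(")\n");
        } else {
          // URLS_BELOW: only the URL, indented under the main hit.
          out.append("    ").append(extra.url()).append('\n');
        }
      }
    }
    return out.toString();
  }
}
```

Turning grouping off for a user could then simply mean calling the old, ungrouped search API instead.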

Whatever you wish to do, you need to implement it in the JSP page,
using the new method call and the HostHits, HostHit and Hit classes.

Some code snippets to give an idea of how you can do that:
HostHits hits = bean.search(query, start + hitsPerPage, hitsPerPage);
HostHit[] show = hits.getHostHits(start, length);
...
if (hits.getTotal() <= start) {
  start = (int) (hits.getTotal() / hitsPerPage - 0.49);
}
...
Hit mainHit = show[i].getHit(0);
HitDetails detail = bean.getDetails(mainHit);
String title = detail.getValue("title");
String url = detail.getValue("url");
String summary = bean.getSummary(detail, query);
...

int hostHitsCount = show[i].getHits().length;
if (hostHitsCount > 1) {
  // hit 0 is the main hit; walk the host's remaining hits
  for (int j = 1; j < hostHitsCount; j++) {
    HitDetails hostHitDetail = bean.getDetails(show[i].getHit(j));
    String hostHitUrl = hostHitDetail.getValue("url");
    ...
  }
}
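The clamp on `start` in the snippet above seems meant to move back onto the last page when `start` runs past the total. As written, however, it appears to yield something closer to a page index than a hit offset (with integer division, 25 total hits at 10 per page gives 1). Below is a minimal sketch of the offset arithmetic, assuming `start` is a zero-based hit offset. `Paging`, `lastPageStart` and `clampStart` are hypothetical helpers, not part of the patch.

```java
// Hypothetical paging helpers; not part of the patch or of Nutch.
final class Paging {
  private Paging() {}

  /**
   * Zero-based offset of the first hit on the last page.
   * Returns 0 when there are no hits at all.
   */
  static int lastPageStart(long total, int hitsPerPage) {
    if (hitsPerPage <= 0) {
      throw new IllegalArgumentException("hitsPerPage must be positive");
    }
    if (total <= 0) {
      return 0;
    }
    long lastPage = (total - 1) / hitsPerPage; // zero-based index of the last page
    return (int) (lastPage * hitsPerPage);
  }

  /** Keeps start if it points at an existing hit, else jumps to the last page. */
  static int clampStart(int start, long total, int hitsPerPage) {
    return total <= start ? lastPageStart(total, hitsPerPage) : start;
  }
}
```

For example, with 25 hits and 10 hits per page, a `start` of 30 is clamped to 20, the first hit of the third page.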

You also need to add the following property to nutch-default.xml:

<property>
  <name>search.page.raw.hits.factor</name>
  <value>2</value>
  <description>
  A factor used to determine the number of raw hits fetched initially,
  before host grouping is done.
  </description>
</property>
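One plausible reading of `search.page.raw.hits.factor`, sketched under assumptions: the searcher fetches `numHits * factor` raw hits so that, after grouping, a page can still be filled even when several top hits come from the same host. The patch's actual use of the factor may differ. `RawHitsFactorSketch` and its methods are hypothetical, and hits are reduced to host names for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the patch or of Nutch.
final class RawHitsFactorSketch {
  private RawHitsFactorSketch() {}

  /** Assumed rule: number of raw hits to fetch before grouping. */
  static int rawHitsToFetch(int numHits, int factor) {
    if (numHits < 0 || factor < 1) {
      throw new IllegalArgumentException("numHits must be >= 0 and factor >= 1");
    }
    return Math.multiplyExact(numHits, factor);
  }

  /**
   * Builds the first page from hits ranked best-first (given as host names):
   * takes rawHitsToFetch(hitsPerPage, factor) raw hits, groups them by host,
   * and keeps at most hitsPerPage host groups.
   * Returns host -> number of grouped hits, in display order.
   */
  static Map<String, Integer> buildFirstPage(List<String> rankedHosts, int hitsPerPage, int factor) {
    int raw = Math.min(rankedHosts.size(), rawHitsToFetch(hitsPerPage, factor));
    Map<String, Integer> groups = new LinkedHashMap<>();
    for (String host : rankedHosts.subList(0, raw)) {
      if (!groups.containsKey(host) && groups.size() == hitsPerPage) {
        continue; // page already holds hitsPerPage hosts; skip new hosts
      }
      groups.merge(host, 1, Integer::sum);
    }
    return groups;
  }
}
```

If the top three hits all come from one host and three hosts fit on a page, a factor of 1 leaves a single group. A factor of 2 fetches enough extra hits to fill the page, at the cost of retrieving more hits per query.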


----------------------------------------------------------------------

Comment By: holman (ericholman)
Date: 2004-07-12 22:54

Message:
Logged In: YES 
user_id=1015664

I vote for it too. However, I would ultimately like to see
grouping across the entire result set, rather than just per
page.
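The difference between the two grouping scopes can be shown with a small sketch. Hits are reduced to host names ranked best-first, and `GroupingScope` is a hypothetical helper, not code from the patch. Per-page grouping splits the ranked list into pages and then groups each page, so a host can reappear on later pages. Grouping the whole result set first collects all of a host's hits into one group, but it requires the full ranked list up front, which costs more for large result sets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison of grouping scopes; not part of the patch.
final class GroupingScope {
  private GroupingScope() {}

  /** Groups hosts in order of first appearance; values are hit counts. */
  static Map<String, Integer> group(List<String> hosts) {
    Map<String, Integer> groups = new LinkedHashMap<>();
    for (String host : hosts) {
      groups.merge(host, 1, Integer::sum);
    }
    return groups;
  }

  /** Per page: split the ranked hits into pages first, then group each page. */
  static List<Map<String, Integer>> perPage(List<String> rankedHosts, int hitsPerPage) {
    List<Map<String, Integer>> pages = new ArrayList<>();
    for (int i = 0; i < rankedHosts.size(); i += hitsPerPage) {
      int end = Math.min(i + hitsPerPage, rankedHosts.size());
      pages.add(group(rankedHosts.subList(i, end)));
    }
    return pages;
  }

  /** Whole result set: group everything first, then split the groups into pages. */
  static List<Map<String, Integer>> global(List<String> rankedHosts, int groupsPerPage) {
    List<Map.Entry<String, Integer>> all = new ArrayList<>(group(rankedHosts).entrySet());
    List<Map<String, Integer>> pages = new ArrayList<>();
    for (int i = 0; i < all.size(); i += groupsPerPage) {
      Map<String, Integer> page = new LinkedHashMap<>();
      for (Map.Entry<String, Integer> e : all.subList(i, Math.min(i + groupsPerPage, all.size()))) {
        page.put(e.getKey(), e.getValue());
      }
      pages.add(page);
    }
    return pages;
  }
}
```

For hosts ranked `a, b, c, a, d, a` with three entries per page, per-page grouping shows host `a` on both pages. Whole-set grouping puts all three of its hits on the first page.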

----------------------------------------------------------------------

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
