Bugs item #752168, was opened at 2003-06-10 13:44
Message generated for change (Comment added) made by cutting
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=752168&group_id=59548

Category: web ui
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 7
Submitted By: Doug Cutting (cutting)
Assigned to: Doug Cutting (cutting)
Summary: combine same-host hits

Initial Comment:
We should combine multiple hits from the same host.
The standard way to do this is to show the highest-scoring
hit from a site in its normal place in the results.
Then, if other hits from the same site would be shown
on the same page, show the second with the first, along
with a "More hits from this site" button.
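
The display rule above might be sketched as follows. This is only an
illustration, not Nutch code: the function name group_by_host, the
(score, url) hit shape, and the "more" flag are hypothetical, and it
assumes the button is shown whenever a host has more than one hit on
the page.

```python
from urllib.parse import urlsplit


def group_by_host(hits, max_per_host=2):
    """Group one page of hits by host.

    hits: list of (score, url) pairs, sorted by descending score.
    Each host appears once, at the position of its best hit, with at
    most max_per_host urls shown. "more" is True when the host has
    more than one hit on the page (assumed trigger for the button).
    """
    groups = {}  # dicts keep insertion order, so rank order is preserved
    counts = {}
    for _score, url in hits:
        host = urlsplit(url).hostname
        group = groups.setdefault(host, {"host": host, "shown": [], "more": False})
        counts[host] = counts.get(host, 0) + 1
        if len(group["shown"]) < max_per_host:
            group["shown"].append(url)
        group["more"] = counts[host] > 1
    return list(groups.values())
```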

Doing this efficiently will require some care.  The
construction of a result page for hits number M through
N has three phases:
  1. broadcast query to backends, merging the top-N
from each to find the top-N overall.
  2. get the title and url for M through N of the
overall top N
  3. get the summary for M through N of the overall top N

Each step is done as a parallel network call.  The
title/url lookup and the summary computation both
generally require a random disk access.
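
Phase 1 can be pictured as a k-way merge of per-backend result lists.
The sketch below is an assumption-laden stand-in: the backends are
plain in-memory lists rather than parallel network calls, and the
name merge_top_n and the (score, doc_id) hit shape are hypothetical.

```python
import heapq
import itertools


def merge_top_n(backend_results, n):
    """Merge per-backend hit lists into the overall top n.

    backend_results: one list per backend of (score, doc_id) pairs,
    each already sorted by descending score.
    """
    merged = heapq.merge(*backend_results, key=lambda hit: hit[0], reverse=True)
    # Only the first n merged hits are ever pulled from the backends' lists.
    return list(itertools.islice(merged, n))
```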

To implement this new feature, we'll probably want to
modify this somewhat, into something like:

  1. broadcast query to backends, merging the top-N+K
from each to find the top-N+K overall, where K is an
estimate of the number of hits which will have
non-unique host names.
  2. get the title and url for hits starting at M,
continuing until N-M unique host names are encountered.
 Start by making a parallel network call for the first
(N-M)+L urls, where L is the number of extra urls
fetched to allow for duplicate hosts.  If too few unique
hosts are found, increase K and go back to step (1).
  3. get the summaries for the hits to be displayed, as
before.
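
A rough sketch of steps (1) and (2) of the revised procedure, under
several assumptions that are not in the proposal itself: search and
fetch_urls are hypothetical callables standing in for the two network
stages, K is doubled on each retry, and on a retry L is set to K so
that urls are fetched for every retrieved hit. Hosts already shown on
earlier pages are ignored here.

```python
from urllib.parse import urlsplit


def unique_host_page(search, fetch_urls, m, n, k, l):
    """Return up to n - m hits starting at rank m, one per host.

    search(count)       -> doc ids, best first (stage 1, cheap)
    fetch_urls(doc_ids) -> urls in the same order (stage 2, costly)
    k: extra hits requested in stage 1
    l: extra urls fetched in stage 2
    """
    wanted = n - m
    while True:
        ranked = search(n + k)            # stage 1: over-fetch by K
        candidates = ranked[m:n + l]      # (N-M)+L hits starting at M
        urls = fetch_urls(candidates)     # stage 2: one batched call
        page, seen = [], set()
        for doc_id, url in zip(candidates, urls):
            host = urlsplit(url).hostname
            if host in seen:
                continue
            seen.add(host)
            page.append((doc_id, url))
            if len(page) == wanted:
                return page
        # Stop when the index is exhausted and every hit from M on was examined.
        if len(ranked) < n + k and n + l >= len(ranked):
            return page
        k = max(1, 2 * k)  # assumption: double K on each retry
        l = k              # assumption: then fetch urls for all retrieved hits
```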

Or something like that...  Getting extra hits in stage
(1) is cheap.  Getting extra urls in stage (2) is not.
So if F is the rate at which we expect to see
duplicate hosts, K should be significantly larger than
(N-M)*F, while L should be close to (N-M)*F.
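
As a worked example of that sizing rule: the helper below is
hypothetical, and the multiplier k_factor is an assumed stand-in for
"significantly larger", which the text above does not quantify.

```python
import math


def estimate_extras(m, n, dup_rate, k_factor=3.0):
    """Return (K, L) for a page covering hits m through n.

    dup_rate is F, the expected fraction of hits with duplicate hosts.
    L is set close to (N-M)*F; K is k_factor times that amount.
    """
    page_size = n - m
    l = math.ceil(page_size * dup_rate)
    k = math.ceil(k_factor * page_size * dup_rate)
    return k, l


# With 20 hits per page and F = 0.25, this yields K = 15 and L = 5.
k, l = estimate_extras(10, 30, 0.25)
```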




----------------------------------------------------------------------

>Comment By: Doug Cutting (cutting)
Date: 2004-08-03 13:54

Message:

Fixed.  See bug 989844 for details.

----------------------------------------------------------------------

Comment By: Doug Cutting (cutting)
Date: 2003-07-22 13:40

Message:

This should get fixed before a public launch.

----------------------------------------------------------------------

Comment By: Doug Cutting (cutting)
Date: 2003-06-10 13:44

Message:

Oops.  This is a web ui bug, not a web db bug.

----------------------------------------------------------------------