Well, the quick/simple explanation is:

If you have 5 URLs with their associated Nutch scores:

http://a.com/something1 = 5.0
http://b.com/something2 = 4.0
http://c.com/something3 = 3.0
http://d.com/something4 = 2.0
http://e.com/something5 = 1.0

and you run a Nutch crawl with topN = 3, then a, b and c will be fetched
and d and e will not. It just means "give me the 3 highest-scoring URLs"
from the current crawl database.
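As a rough illustration, the selection described above behaves like the following Python sketch. It is a simplified model, not Nutch's actual generator code: `select_top_n` and the `crawl_db` dictionary are hypothetical stand-ins, and the sketch ranks purely by score, ignoring any other checks the real generator may apply (for example, whether a URL is due to be fetched).

```python
from typing import Dict, List


def select_top_n(scores: Dict[str, float], top_n: int) -> List[str]:
    """Return the top_n URLs with the highest score, best first.

    Hypothetical helper: a simplified model of topN that ranks by score only.
    """
    # Sort (url, score) pairs from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Keep only the URLs of the first top_n entries.
    return [url for url, _score in ranked[:top_n]]


# Hypothetical stand-in for the crawl database, using the scores from the example.
crawl_db = {
    "http://a.com/something1": 5.0,
    "http://b.com/something2": 4.0,
    "http://c.com/something3": 3.0,
    "http://d.com/something4": 2.0,
    "http://e.com/something5": 1.0,
}

selected = select_top_n(crawl_db, 3)
print(selected)
```

With topN = 3 this picks the a, b and c URLs; d and e are left for a later round.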

On 6/8/07, monkeynuts84 <[EMAIL PROTECTED]> wrote:
>
> Can someone give me an explanation of what topN does? I've seen various
> pieces of info but some of them seem to be conflicting. I've noticed in my
> crawls that certain sites are crawled more than others in each iteration of a
> fetch. Is this caused by topN?
>
> Thanks.
> --
> View this message in context: 
> http://www.nabble.com/Explanation-of-topN-tf3891964.html#a11033441
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>


-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
