Well, the quick/simple explanation is this. Say you have 5 URLs with their associated Nutch scores:

http://a.com/something1 = 5.0
http://b.com/something2 = 4.0
http://c.com/something3 = 3.0
http://d.com/something4 = 2.0
http://e.com/something5 = 1.0

Now set Nutch to crawl with topN = 3: a, b, and c will be fetched, and d and e will not. It just means "give me the 3 best-ranking URLs" from the current crawl database.

On 6/8/07, monkeynuts84 <[EMAIL PROTECTED]> wrote:
>
> Can someone give me an explanation of what topN does? I've seen various
> pieces of info but some of them seem to be conflicting. I've noticed in my
> crawls that certain sites are crawled more than other in each iteration of a
> fetch. Is this caused by topN?
>
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/Explanation-of-topN-tf3891964.html#a11033441
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
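
To make the topN example from this message concrete, here is a minimal Python sketch of the selection it describes. This is only a toy model and not Nutch's actual code: the `select_top_n` helper is hypothetical, and it assumes the only thing that matters is the score (the URLs and scores are the ones from the example above).

```python
# Toy model of the topN selection described in this message.
# NOT Nutch's real implementation; select_top_n is a hypothetical helper.

# URL -> score, taken from the example in the message.
scores = {
    "http://a.com/something1": 5.0,
    "http://b.com/something2": 4.0,
    "http://c.com/something3": 3.0,
    "http://d.com/something4": 2.0,
    "http://e.com/something5": 1.0,
}


def select_top_n(url_scores, top_n):
    """Return the top_n highest-scoring URLs, best first."""
    ranked = sorted(url_scores, key=url_scores.get, reverse=True)
    return ranked[:top_n]


selected = select_top_n(scores, 3)
skipped = [url for url in scores if url not in selected]

print("fetched:", selected)
print("not fetched:", skipped)
```

With topN = 3 the a, b, and c URLs end up in `selected`, while d and e land in `skipped`, which matches the behaviour described above.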
