[ https://issues.apache.org/jira/browse/NUTCH-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2003.
------------------------------------
    Fix Version/s: 2.5
       Resolution: Auto Closed

Closing 2.5 issues as branch is no longer maintained.

> topN is not work correctly
> --------------------------
>
>                 Key: NUTCH-2003
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2003
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Talat Uyarer
>            Priority: Minor
>             Fix For: 2.5
>
>
> I want to crawl the top 1000 URLs from the webpage table, ordered by score.
> It doesn't work correctly.
> When I use the topN parameter, it is divided by the number of map tasks
> (topN / maptaskcounts = maptasktopN), and every map task generates only its
> own maptasktopN highest-scored URLs. Assume I have 25 map tasks and set topN
> to 1000; maptasktopN is then calculated as 40. As a result, we do not get the
> 1000 highest-scored URLs overall; we get 1000 URLs made up of the 40
> highest-scored URLs from each of the 25 map tasks.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
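The difference between a global top-N and the per-map-task split described in the report can be sketched in plain Java. This is a hypothetical, self-contained simulation, not Nutch's generator code: the class `TopNSplitDemo`, its methods, and the score values are made up for illustration. Each inner list stands in for the scores one map task sees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical simulation; not taken from Nutch.
public class TopNSplitDemo {

    // Global top-N: merge the scores from all tasks, sort descending, keep n.
    public static List<Double> globalTopN(List<List<Double>> partitions, int n) {
        return partitions.stream()
                .flatMap(List::stream)
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    // Behaviour described in the report: each map task keeps only its own
    // top (n / numberOfTasks) scores, and the per-task results are combined.
    public static List<Double> perTaskTopN(List<List<Double>> partitions, int n) {
        int perTask = n / partitions.size(); // integer division, e.g. 1000 / 25 = 40
        List<Double> result = new ArrayList<>();
        for (List<Double> partition : partitions) {
            partition.stream()
                    .sorted(Comparator.reverseOrder())
                    .limit(perTask)
                    .forEach(result::add);
        }
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks with skewed scores: all of the
        // high-scoring entries happen to land in the first task.
        List<List<Double>> partitions = List.of(
                List.of(9.0, 8.0, 7.0, 6.0),
                List.of(1.0, 0.5, 0.2, 0.1));
        int topN = 4;
        System.out.println("global:   " + globalTopN(partitions, topN));
        System.out.println("per task: " + perTaskTopN(partitions, topN));
    }
}
```

With these toy scores the two selections differ: the global top 4 is [9.0, 8.0, 7.0, 6.0], while taking 4 / 2 = 2 per task yields [9.0, 8.0, 1.0, 0.5]. The same split applied to the numbers in the report gives 1000 / 25 = 40 entries per task, so any task holding more than 40 of the overall top 1000 loses the excess.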