[ https://issues.apache.org/jira/browse/NUTCH-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2003.
------------------------------------
    Fix Version/s: 2.5
       Resolution: Auto Closed

Closing 2.5 issues as branch is no longer maintained.

> topN does not work correctly
> ----------------------------
>
>                 Key: NUTCH-2003
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2003
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Talat Uyarer
>            Priority: Minor
>             Fix For: 2.5
>
>
> I want to crawl the top 1000 URLs, ordered by score, from the webpage table.
> This does not work correctly.
> When I use the topN parameter, it is divided by the number of map tasks
> (topN / maptaskcounts = maptasktopN), and every map task generates
> maptasktopN URLs. For example, with 25 map tasks and topN set to 1000,
> maptasktopN is calculated as 40. As a result, we do not get the 1000
> highest-scored URLs overall; we get 1000 URLs made up of the 40
> highest-scored URLs from each of the 25 map tasks.
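
The difference described in the report can be illustrated with a small standalone sketch. This is not Nutch code: the function names, the example URLs, and the skewed score distribution are all hypothetical, chosen only to show how splitting topN evenly across map tasks can diverge from a true global top-N selection.

```python
import heapq
import random


def per_task_top_n(partitions, top_n):
    """Mimic the reported behavior: each map task keeps top_n // num_tasks URLs."""
    per_task = top_n // len(partitions)  # e.g. 1000 // 25 == 40
    selected = []
    for part in partitions:
        # Each task only sees its own partition and keeps its local best.
        selected.extend(heapq.nlargest(per_task, part, key=lambda row: row[1]))
    return selected


def global_top_n(partitions, top_n):
    """What the reporter expected: the top_n highest-scored URLs overall."""
    all_rows = [row for part in partitions for row in part]
    return heapq.nlargest(top_n, all_rows, key=lambda row: row[1])


# Hypothetical data: 25 partitions of 200 (url, score) rows each.
# Partition 0 is deliberately skewed so that all of its URLs outscore the rest.
rng = random.Random(0)
partitions = []
for task in range(25):
    base = 10.0 if task == 0 else 0.0
    partitions.append(
        [(f"http://example.org/{task}/{i}", base + rng.random()) for i in range(200)]
    )

split = per_task_top_n(partitions, 1000)
best = global_top_n(partitions, 1000)

# Count how many URLs the two selections have in common.
overlap = len({url for url, _ in split} & {url for url, _ in best})
print(len(split), len(best), overlap)
```

Both selections contain 1000 URLs, but the per-task variant keeps only 40 of the 200 high-scoring URLs in partition 0, so the two sets differ whenever scores are unevenly distributed across map tasks.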



--
This message was sent by Atlassian Jira
(v8.3.4#803005)