[ 
https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516602
 ] 

Emmanuel Joke commented on NUTCH-530:
-------------------------------------

I'm sure to follow your point regarding the outlinks number. 

I don't think it's relevant to take the number of inlinks into account. A URL 
can have inlinks from different segments, so doing this properly would mean 
updating the db from all segments in a single update. So far, updateDb is run 
on only a single segment at a time. 
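
To illustrate what a combiner can and cannot do here, below is a rough sketch 
in plain Java, not the attached patch and not actual Nutch or Hadoop code. The 
class name CombinerSketch, the Entry type and the combine() method are all 
made up for the example. It assumes the map output is a list of (url, status) 
records and that duplicate "linked" records for the same URL can be collapsed 
into one. Since it only sees the output of one map task on one segment, it 
cannot know how many inlinks a URL has across other segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CombinerSketch {
    // Hypothetical stand-in for one map output record: a URL and its status.
    static final class Entry {
        final String url;
        final String status;

        Entry(String url, String status) {
            this.url = url;
            this.status = status;
        }
    }

    // Collapse repeated "linked" records for the same URL into a single
    // record. Records with any other status are passed through unchanged.
    static List<Entry> combine(List<Entry> mapOutput) {
        Set<String> seenLinked = new HashSet<String>();
        List<Entry> combined = new ArrayList<Entry>();
        for (Entry e : mapOutput) {
            // Set.add returns false when the URL was already recorded.
            if ("linked".equals(e.status) && !seenLinked.add(e.url)) {
                continue; // duplicate "linked" record, drop it
            }
            combined.add(e);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Assumed sample data: output of one map task on one segment.
        List<Entry> mapOutput = Arrays.asList(
            new Entry("http://example.com/a", "linked"),
            new Entry("http://example.com/a", "linked"),
            new Entry("http://example.com/b", "linked"),
            new Entry("http://example.com/a", "fetched"));
        for (Entry e : combine(mapOutput)) {
            System.out.println(e.url + "\t" + e.status);
        }
    }
}
```

The reduce step still has to merge records coming from different map tasks, so 
this only reduces the volume of data sent to it.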


> Add a combiner to improve performance on updatedb
> -------------------------------------------------
>
>                 Key: NUTCH-530
>                 URL: https://issues.apache.org/jira/browse/NUTCH-530
>             Project: Nutch
>          Issue Type: Improvement
>         Environment: java 1.6
>            Reporter: Emmanuel Joke
>            Assignee: Emmanuel Joke
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-530.patch
>
>
> We have a lot of similar links with status "linked" generated at the output of 
> the map task when we try to update the crawldb based on the segment fetched.
> We can use a combiner to improve the performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.