[ https://issues.apache.org/jira/browse/NUTCH-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783239#action_12783239 ]
Hudson commented on NUTCH-761:
------------------------------

Integrated in Nutch-trunk #995 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/995/])
    Fix a bug resulting from over-eager optimization in .
Avoid cloning CrawlDatum in CrawlDbReducer.

> Avoid cloning CrawlDatum in CrawlDbReducer
> ------------------------------------------
>
>                 Key: NUTCH-761
>                 URL: https://issues.apache.org/jira/browse/NUTCH-761
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Julien Nioche
>            Assignee: Andrzej Bialecki
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: optiCrawlReducer.patch
>
>
> In the vast majority of cases the CrawlDbReducer gets a single CrawlDatum for
> a key in its reduce phase; these are the entries that come from the crawlDB
> and are not present in the segments.
> The attached patch optimizes the reduce step by avoiding an unnecessary
> cloning of the CrawlDatum fields when there is only one CrawlDatum in the
> values.
> This has more impact as the crawlDB gets larger; we noticed an improvement
> of around 25-30% in the time spent in the reduce phase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
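
The optimization described in the issue can be illustrated with a minimal, self-contained sketch. This is not the code from optiCrawlReducer.patch and it does not use the Nutch or Hadoop APIs: the Datum class, its set() copy method, the copies counter, and the "keep the latest fetchTime" merge rule are all hypothetical stand-ins chosen for illustration. The only idea taken from the issue is that the field-by-field copy is skipped when the reducer receives exactly one value.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SingleValueReduceSketch {

    /** Hypothetical stand-in for CrawlDatum: a small mutable record. */
    static final class Datum {
        long fetchTime;
        int status;
        // Counts field-by-field copies; exists only to make the saving visible.
        static int copies = 0;

        Datum(long fetchTime, int status) {
            this.fetchTime = fetchTime;
            this.status = status;
        }

        /** Copies every field of other into this instance. */
        void set(Datum other) {
            this.fetchTime = other.fetchTime;
            this.status = other.status;
            copies++;
        }
    }

    /**
     * Reduces the values for one key. The merge rule used here (keep the
     * entry with the latest fetchTime) is an assumption for the sketch,
     * not the real CrawlDbReducer logic.
     */
    static Datum reduce(Iterator<Datum> values) {
        Datum first = values.next();
        if (!values.hasNext()) {
            // Only one value for this key: use it directly, no copy needed.
            return first;
        }
        // Several values: copy defensively, since a framework may reuse
        // the object handed out by the iterator on the next call.
        Datum best = new Datum(0L, 0);
        best.set(first);
        while (values.hasNext()) {
            Datum next = values.next();
            if (next.fetchTime > best.fetchTime) {
                best.set(next);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Datum only = new Datum(5L, 1);
        Datum single = reduce(Arrays.asList(only).iterator());
        System.out.println("same instance: " + (single == only)
                + ", copies: " + Datum.copies);

        Datum merged = reduce(
                Arrays.asList(new Datum(1L, 1), new Datum(9L, 2)).iterator());
        System.out.println("merged fetchTime: " + merged.fetchTime
                + ", copies: " + Datum.copies);
    }
}
```

In the single-value branch the iterator is never advanced again, so returning the object as-is is safe in this sketch even if the caller reuses value instances between calls; the multi-value branch keeps the defensive copy.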