[ https://issues.apache.org/jira/browse/NUTCH-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540128#comment-17540128 ]
Markus Jelsma commented on NUTCH-2950:
--------------------------------------

I've seen the patch, there's no need to split it up into smaller changes if you ask me. The changes are good!

+1

> UpdateHostDb: performance improvements
> --------------------------------------
>
>                 Key: NUTCH-2950
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2950
>             Project: Nutch
>          Issue Type: Improvement
>          Components: hostdb
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> This issue addresses a couple of performance improvements when creating the
> HostDb:
> - avoid needless conversions between hostname and URL
> - improvements of HostDb serialization (write and read)
> - parametrize logging and log less on level INFO
> - do not create DNS resolver threads if DNS look-ups are not requested by
>   command-line options
> A patch/PR is ready. Depending on the chosen command-line options, a 10-20%
> speed-up should be visible if DNS look-ups, normalization and filtering are
> off.
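
For readers who have not looked at the patch, the last item in the list (no DNS resolver threads unless look-ups are requested) can be pictured roughly as below. This is only a sketch under assumptions and is not taken from the NUTCH-2950 patch: the class name LazyResolverPoolSketch, the checkDns flag and the submitLookup method are all hypothetical, and the real UpdateHostDb code may be organized quite differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, NOT the code from the NUTCH-2950 patch.
class LazyResolverPoolSketch {

  private final boolean checkDns;  // assumed flag derived from command-line options
  private final int numThreads;    // assumed size of the resolver pool
  private ExecutorService pool;    // stays null until a look-up is actually submitted

  LazyResolverPoolSketch(boolean checkDns, int numThreads) {
    this.checkDns = checkDns;
    this.numThreads = numThreads;
  }

  /** Returns true if the task was handed to the pool, false if DNS checks are off. */
  synchronized boolean submitLookup(Runnable lookup) {
    if (!checkDns) {
      return false; // no executor and no threads are created on this path
    }
    if (pool == null) {
      pool = Executors.newFixedThreadPool(numThreads);
    }
    pool.execute(lookup);
    return true;
  }

  synchronized boolean poolCreated() {
    return pool != null;
  }

  /** Shuts the pool down (if one was created) and waits for submitted tasks. */
  synchronized void close() {
    if (pool == null) {
      return;
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    LazyResolverPoolSketch off = new LazyResolverPoolSketch(false, 4);
    off.submitLookup(() -> {});
    System.out.println("pool created with DNS checks off: " + off.poolCreated());
    off.close();

    AtomicInteger done = new AtomicInteger();
    LazyResolverPoolSketch on = new LazyResolverPoolSketch(true, 2);
    on.submitLookup(() -> done.incrementAndGet());
    on.close();
    System.out.println("pool created with DNS checks on: " + on.poolCreated());
  }
}
```

The point of the sketch is only that the executor is allocated on first use, so a run with DNS checks disabled never starts the worker threads.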