Sebastian Nagel created NUTCH-2950:
--------------------------------------

             Summary: UpdateHostDb: performance improvements
                 Key: NUTCH-2950
                 URL: https://issues.apache.org/jira/browse/NUTCH-2950
             Project: Nutch
          Issue Type: Improvement
          Components: hostdb
    Affects Versions: 1.18
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
             Fix For: 1.19

This issue addresses a couple of performance improvements when creating the HostDb:
- avoid needless conversions between hostname and URL
- improve HostDb serialization (write and read)
- use parameterized logging and log less at level INFO
- do not create DNS resolver threads if DNS look-ups are not requested by command-line options

A patch/PR is ready. Depending on the chosen command-line options, a 10-20% speed-up should be visible if DNS look-ups, normalization and filtering are off.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)