This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
from 873d7bf Merge pull request #473 from sebastian-nagel/NUTCH-2381-text-prof-signature-lexicographic-sorting
new f02c98e NUTCH-2737 Generator: count and log reason of rejections during selection
  - add counters for rejections in Generator's SelectorMapper
  - parameterize log messages to simplify code
new e46232d NUTCH-2738 Generator: document property generate.restrict.status
  - add generate.restrict.status to nutch-default.xml
  - get status (byte) from status name in setConf() to speed up comparison in SelectorMapper
new 8d21260 Generator: fix logging of hostdb path
new 35da06f NUTCH-2737 Generator: count and log reason of rejections during selection
  - count rejections by `generate.max.count`
    * number of hosts (resp. domains) affected
    * number of URLs skipped total (for all hosts)
new 44ded9b Generator: apply formatting
new 4d68c08 NUTCH-2740 Generator: generate.max.count overflow not logged
new 2f310ae Generator: improve description of crawl.gen.delay
new a2762f0 Merge pull request #477 from sebastian-nagel/NUTCH-2737-generator-log-selection
The 8 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
 conf/nutch-default.xml                          |  17 +-
 src/java/org/apache/nutch/crawl/CrawlDatum.java |   9 +
 src/java/org/apache/nutch/crawl/Generator.java  | 837 ++++++++++++------------
 3 files changed, 456 insertions(+), 407 deletions(-)