[
https://issues.apache.org/jira/browse/NUTCH-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649533#comment-16649533
]
Sebastian Nagel commented on NUTCH-1842:
----------------------------------------
Hi [~yossi], thanks for bringing this back to the agenda and thanks for the PR.
I looked into the source code history: the implementation has always used days for
{{crawl.gen.delay}} (since
[71c0cae|https://github.com/apache/nutch/commit/71c0cae1190c7176e5ff68bbf5236e22c1a1f723]);
the "wrong" description in nutch-default.xml was added later
([133cc06|https://github.com/apache/nutch/commit/133cc0696d588d373f4657ea5d334105513f2976]).
That does not make it easier to decide what to fix, but any fix is certainly
better than the current state. We should add a note/warning to the change log.
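To make the size of the mismatch concrete, here is a minimal, stand-alone sketch of the arithmetic. It only mirrors the multiplication quoted in the issue description; the class and method names (GenDelayUnits, genDelayMillis) are hypothetical and not part of Nutch.

```java
public class GenDelayUnits {
    // Number of milliseconds in one day.
    static final long MILLIS_PER_DAY = 24L * 3600L * 1000L;

    // Hypothetical helper mirroring the expression quoted from Generator:
    // the configured value is interpreted as a number of days.
    static long genDelayMillis(long configuredValue) {
        return configuredValue * 3600L * 24L * 1000L;
    }

    public static void main(String[] args) {
        // Fallback used in code: 7 (days) -> 604800000 ms, i.e. one week.
        long fromCodeDefault = genDelayMillis(7L);
        // Value shipped in nutch-default.xml: 604800000, also read as days.
        long fromXmlDefault = genDelayMillis(604800000L);

        System.out.println("code default, ms:  " + fromCodeDefault);
        System.out.println("xml default, days: " + fromXmlDefault / MILLIS_PER_DAY);
    }
}
```

Under these assumptions the shipped default yields a lock of 604800000 days (well over a million years) rather than 7 days, so records selected for fetching would effectively never be unlocked by the timeout alone.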
> crawl.gen.delay has a wrong default value in nutch-default.xml or is being
> parsed incorrectly
> ----------------------------------------------------------------------------------------------
>
> Key: NUTCH-1842
> URL: https://issues.apache.org/jira/browse/NUTCH-1842
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 1.9
> Reporter: kaveh minooie
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.16
>
>
> this is from nutch-default.xml:
> <property>
> <name>crawl.gen.delay</name>
> <value>604800000</value>
> <description>
> This value, expressed in milliseconds, defines how long we should keep the lock on records
> in CrawlDb that were just selected for fetching. If these records are not updated
> in the meantime, the lock is canceled, i.e. they become eligible for selecting.
> Default value of this is 7 days (604800000 ms).
> </description>
> </property>
> this is from o.a.n.crawl.Generator.configure(JobConf job):
> genDelay = job.getLong(GENERATOR_DELAY, 7L) * 3600L * 24L * 1000L;
> The value in the config file is in milliseconds, but the code expects it to be in
> days. I reported this a couple of years ago on the mailing list as well. I
> didn't post a patch because I am not sure which one needs to be fixed.
> Considering that all the other values in the config file are in milliseconds, it can be
> argued that consistency matters, but 'day' is a much more reasonable unit
> for this property.
> Also, is this value not being used in 2.x?