timelimit.mins is invalid when depth greater than 1
---------------------------------------------------

                 Key: NUTCH-957
                 URL: https://issues.apache.org/jira/browse/NUTCH-957
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.2
         Environment: openSUSE 11.3, jdk-1.6, ant-1.8, tomcat-6.0, nutch-1.2
            Reporter: Wade Lau
             Fix For: 1.2


The configured value of fetcher.timelimit.mins is only honored in the first round when running ./bin/nutch crawl with depth=n (n>1). The reason is that the fetcher overwrites the property with an absolute timestamp in the following code (org.apache.nutch.fetcher.Fetcher.java):

    long timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
    if (timelimit != -1) {
      timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
      LOG.info("Fetcher Timelimit set for : " + timelimit);
      getConf().setLong("fetcher.timelimit.mins", timelimit);
    }

When the crawler goes down to the next depth, it reads back the value stored by the previous round. That value is no longer a number of minutes: it is the previous currentTimeMillis plus the limit in milliseconds, and it gets treated as minutes again.

Some logs look like:

depth=1
Fetcher: starting at 2011-01-16 20:58:53
Fetcher: segment: crawl/segments/20110116205851
Fetcher Timelimit set for : 1295182793540
now is:[1295182733540] timelimit:[1] new.sum:[1295182793540]

depth=2
Fetcher: starting at 2011-01-16 21:00:20
Fetcher: segment: crawl/segments/20110116210018
Fetcher Timelimit set for : 77712262795220167
now is:[1295182820167] timelimit:[1295182793540] new.sum:[77712262795220167]

A simple solution is to remember the original value under a separate key and read from that key in later rounds:

    // Prefer the saved original value (in minutes), if an earlier round stored it.
    long timelimit = getConf().getLong("fetcher.timelimit.mins.init", -1);
    if (timelimit == -1) {
      // First round: read the configured minutes and save them for later rounds.
      timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
      getConf().setLong("fetcher.timelimit.mins.init", timelimit);
    }
    if (timelimit != -1) {
      timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
      LOG.info("Fetcher Timelimit set for : " + timelimit);
      getConf().setLong("fetcher.timelimit.mins", timelimit);
    }

Hope this will be helpful for the next release and save time for others.

Reference: http://ufqi.com/exp/x1183.html?title=apache.nutch.timelimit.bug
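
The behaviour described above can be reproduced outside Nutch with a small standalone sketch. It is only an illustration under stated assumptions: a plain Map stands in for the Hadoop Configuration object, the clock is passed in as a parameter so the numbers from the logs can be replayed, and the key "fetcher.timelimit.deadline" is a hypothetical name that is not part of Nutch. The second method shows one possible alternative to the fix proposed above: leave the minutes value untouched and store the computed deadline under a separate key.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone sketch; a Map stands in for the Hadoop Configuration (assumption). */
final class TimelimitSketch {
    static final String LIMIT_KEY = "fetcher.timelimit.mins";
    // Hypothetical key, not part of Nutch: holds the computed absolute deadline.
    static final String DEADLINE_KEY = "fetcher.timelimit.deadline";

    /** Mirrors the reported code: the minutes value is overwritten with a timestamp. */
    static long buggyRound(Map<String, Long> conf, long nowMillis) {
        long timelimit = conf.getOrDefault(LIMIT_KEY, -1L);
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put(LIMIT_KEY, timelimit); // the next round reads this back as "minutes"
        }
        return timelimit;
    }

    /** Alternative sketch: keep the minutes intact, store the deadline separately. */
    static long fixedRound(Map<String, Long> conf, long nowMillis) {
        long minutes = conf.getOrDefault(LIMIT_KEY, -1L);
        if (minutes == -1) {
            return -1; // no time limit configured
        }
        long deadline = nowMillis + (minutes * 60 * 1000);
        conf.put(DEADLINE_KEY, deadline);
        return deadline;
    }

    public static void main(String[] args) {
        // Clock values taken from the "now is:[...]" entries in the logs above.
        long round1 = 1295182733540L;
        long round2 = 1295182820167L;

        Map<String, Long> buggy = new HashMap<>();
        buggy.put(LIMIT_KEY, 1L);
        System.out.println(buggyRound(buggy, round1)); // 1295182793540
        System.out.println(buggyRound(buggy, round2)); // 77712262795220167

        Map<String, Long> fixed = new HashMap<>();
        fixed.put(LIMIT_KEY, 1L);
        System.out.println(fixedRound(fixed, round1)); // 1295182793540
        System.out.println(fixedRound(fixed, round2)); // 1295182880167
    }
}
```

The first two printed values match the "Fetcher Timelimit set for" entries in the logs for depth=1 and depth=2. With the separate key, the second round again ends up one minute after its own start time.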