[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-1370:
---------------------------------

    Priority: Minor  (was: Major)

Running in pseudo-distributed mode gives you more information if you look at the Hadoop web interface. You get the number of items passed to the mappers and reducers etc... You can of course add a message like this in the logs, won't do any harm :-)

> Expose exact number of urls injected @runtime
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: 1.4, nutchgora
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.5, 2.1
>
>
> Example: When using trunk, currently we see
> {code}
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls
> 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in:
> {code}
> I would like to see
> {code}
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls
> 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Injected N urls to crawl/crawldb
> 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in:
> {code}
> This would make debugging easier and would help those who end up getting
> {code}
> 2012-05-22 09:04:04,850 WARN crawl.Generator - Generator: 0 records selected for fetching, exiting ...
> {code}
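
The log line requested in the issue could be produced along the following lines. This is only a minimal standalone sketch, not the Nutch Injector code: the class `InjectCountSketch` and its methods are hypothetical names, and it assumes that blank lines and lines starting with `#` in the seed list are not injected. It also counts before any URL normalization or filtering, so in a real crawl the number actually written to the crawl db may be lower; inside a Hadoop job the count would more likely come from the job counters mentioned in the comment above.

```java
import java.util.List;

// Hypothetical helper, NOT part of Nutch: illustrates the requested log message only.
public class InjectCountSketch {

    /**
     * Counts seed lines that look like URLs to inject.
     * Assumption: blank lines and '#' comment lines are skipped.
     */
    public static long countSeedUrls(List<String> seedLines) {
        return seedLines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .count();
    }

    /** Builds the message text in the format proposed in the issue. */
    public static String injectedMessage(long count, String crawlDb) {
        return "Injector: Injected " + count + " urls to " + crawlDb;
    }

    public static void main(String[] args) {
        // Illustrative seed list; the URLs are placeholders.
        List<String> seeds = List.of(
                "# seed list",
                "http://nutch.apache.org/",
                "",
                "http://example.com/");
        // Prints: Injector: Injected 2 urls to crawl/crawldb
        System.out.println(injectedMessage(countSeedUrls(seeds), "crawl/crawldb"));
    }
}
```

Seeing a count of 0 here before the Generator runs would point straight at an empty or fully commented-out seed list, which is the debugging case the issue describes.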