[ https://issues.apache.org/jira/browse/NUTCH-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937642#comment-13937642 ]
Lewis John McGibbney commented on NUTCH-1738:
---------------------------------------------

This concept could also be ported to 1.X. AFAIK we do not know the number of URLs generated explicitly, but rely upon a restrictive value being set for the generate.max.count property in nutch-default.xml. It is of course advised to generate smaller, more frequent fetchlists*; however, the logging is still valuable as it indicates how many URLs _should/could_ have been fetched per round.

*Please note I am referring to fetchlists and batch IDs as equivalent entities here.

> Expose number of URLs generated per batch in GeneratorJob
> ---------------------------------------------------------
>
>                 Key: NUTCH-1738
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1738
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 2.2.1
>            Reporter: Lewis John McGibbney
>             Fix For: 2.3
>
>
> GeneratorJob contains one trivial line of logging
> {code:title=GeneratorJob.java|borderStyle=solid}
>     LOG.info("GeneratorJob: generated batch id: " + batchId);
> {code}
> I propose to improve this logging by exposing how many URLs are contained
> within the generated batch. Something like
> {code:title=GeneratorJob.java|borderStyle=solid}
>     LOG.info("GeneratorJob: generated batch id: " + batchId + " containing " + $numOfURLs + " URLs");
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
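
For illustration only, the proposed message could be assembled roughly as below. This is a minimal standalone sketch, not the actual GeneratorJob code: the class name BatchLogSketch, the helper batchMessage, the sample batch id and the sample URLs are all hypothetical, and how the URL count would really be obtained inside the job (presumably from a count kept during the generate step) is an assumption that the issue does not specify.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch; not part of Nutch.
public class BatchLogSketch {

    // Builds the log message proposed in the issue.
    // numOfUrls is assumed to have been counted by the caller.
    static String batchMessage(String batchId, long numOfUrls) {
        return "GeneratorJob: generated batch id: " + batchId
            + " containing " + numOfUrls + " URLs";
    }

    public static void main(String[] args) {
        // Made-up stand-in for the URLs selected in one generate round.
        List<String> generated = Arrays.asList(
            "http://example.org/a",
            "http://example.org/b",
            "http://example.org/c");

        // The batch id here is an arbitrary placeholder value.
        System.out.println(batchMessage("batch-001", generated.size()));
    }
}
```

Keeping the message construction in one small helper would let the same wording be reused if the logging is ported to 1.X, where the count would refer to a fetchlist rather than a batch id.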