[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498689#comment-14498689 ]
Michael Joyce commented on NUTCH-1911:
--------------------------------------

Hey folks,

Here's what the output from this looks like:

{code}
Usage: DomainStatistics inputDirs outDir mode [numOfReducer]
    inputDirs        Comma separated list of crawldb input directories
                     E.g.: crawl/crawldb/current/
    outDir           Output directory where results should be dumped
    mode             Set statistics gathering mode
                         host    Gather statistics by host
                         domain  Gather statistics by domain
                         suffix  Gather statistics by suffix
                         tld     Gather statistics by top level directory
    [numOfReducers]  Optional number of reduce jobs to use. Defaults to 1.
{code}

> Improve DomainStatistics tool command line parsing
> ---------------------------------------------------
>
>                 Key: NUTCH-1911
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1911
>             Project: Nutch
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 1.9, 2.2.1
>            Reporter: Lewis John McGibbney
>            Priority: Trivial
>             Fix For: 1.11
>
>
> The DomainStatistics tool could be improved based on the comments made
> in [this mail
> thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html].
> For convenience, I've also pasted them below:
> {quote}
> You cannot just tell it where the crawldb is, you need to tell it where the
> directory is, so specifying current is ok, but not part-*
> {quote}
> The patch should be trivial work.
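The issue asks for stricter command line parsing. Below is a minimal, hypothetical Java sketch of how the arguments in the usage text above could be validated. The class `DomainStatsArgs`, its `Mode` enum and its `parse` method are illustrative names only, not the actual Nutch `DomainStatistics` code. It assumes a comma-separated `inputDirs`, a `mode` restricted to host/domain/suffix/tld, an optional reducer count that defaults to 1, and, following the quoted mail-thread advice, it rejects input paths that point at `part-*` files instead of a crawldb directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Nutch's actual DomainStatistics code: validates
// "inputDirs outDir mode [numOfReducers]" as described in the usage text.
public class DomainStatsArgs {

  /** Statistics gathering modes listed in the usage text. */
  public enum Mode { HOST, DOMAIN, SUFFIX, TLD }

  public final List<String> inputDirs;
  public final String outDir;
  public final Mode mode;
  public final int numOfReducers;

  private DomainStatsArgs(List<String> inputDirs, String outDir, Mode mode, int numOfReducers) {
    this.inputDirs = inputDirs;
    this.outDir = outDir;
    this.mode = mode;
    this.numOfReducers = numOfReducers;
  }

  /** Parses the arguments, throwing IllegalArgumentException on bad input. */
  public static DomainStatsArgs parse(String[] args) {
    if (args.length < 3 || args.length > 4) {
      throw new IllegalArgumentException(
          "Usage: DomainStatistics inputDirs outDir mode [numOfReducers]");
    }

    // inputDirs is a comma-separated list of crawldb directories.
    List<String> dirs = new ArrayList<>();
    for (String raw : args[0].split(",")) {
      String dir = raw.trim();
      if (dir.isEmpty()) {
        throw new IllegalArgumentException("Empty entry in inputDirs: " + args[0]);
      }
      // Per the mail-thread advice: pass a directory such as
      // crawl/crawldb/current/, not individual part-* files.
      String stripped = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
      String lastComponent = stripped.substring(stripped.lastIndexOf('/') + 1);
      if (lastComponent.startsWith("part-")) {
        throw new IllegalArgumentException(
            "Expected a crawldb directory, not a part file: " + dir);
      }
      dirs.add(dir);
    }

    // mode must be one of host, domain, suffix, tld (matched case-insensitively here).
    Mode mode;
    try {
      mode = Mode.valueOf(args[2].toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Unknown mode '" + args[2] + "'; expected host, domain, suffix or tld");
    }

    // numOfReducers is optional and defaults to 1, as stated in the usage text.
    int reducers = 1;
    if (args.length == 4) {
      try {
        reducers = Integer.parseInt(args[3]);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("numOfReducers must be an integer: " + args[3]);
      }
      if (reducers < 1) {
        throw new IllegalArgumentException("numOfReducers must be >= 1: " + reducers);
      }
    }

    return new DomainStatsArgs(Collections.unmodifiableList(dirs), args[1], mode, reducers);
  }
}
```

Throwing `IllegalArgumentException` rather than exiting directly keeps the parser easy to test; a tool's `main` method could catch it, print the message together with the usage text, and exit with a non-zero status.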