[ 
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498689#comment-14498689
 ] 

Michael Joyce commented on NUTCH-1911:
--------------------------------------

Hey folks,

Here's what the output from this looks like:

{code}
Usage: DomainStatistics inputDirs outDir mode [numOfReducer]
        inputDirs       Comma separated list of crawldb input directories
                        E.g.: crawl/crawldb/current/
        outDir          Output directory where results should be dumped
        mode            Set statistics gathering mode
                                host    Gather statistics by host
                                domain  Gather statistics by domain
                                suffix  Gather statistics by suffix
                                tld     Gather statistics by top level directory
        [numOfReducers] Optional number of reduce jobs to use. Defaults to 1.
{code}
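
For anyone who wants to see how the checks implied by that usage text fit together, here is a minimal sketch. It is not the code from the patch: the class name DomainStatsArgs, its fields, and the error messages are all hypothetical, and it only assumes what the usage output says (three required arguments, four allowed modes, an optional reducer count that defaults to 1).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Nutch: validates the arguments
// described in the usage text (inputDirs outDir mode [numOfReducers]).
final class DomainStatsArgs {
    // Modes listed in the usage output.
    static final List<String> MODES = Arrays.asList("host", "domain", "suffix", "tld");

    final String[] inputDirs;
    final String outDir;
    final String mode;
    final int numReducers;

    private DomainStatsArgs(String[] inputDirs, String outDir, String mode, int numReducers) {
        this.inputDirs = inputDirs;
        this.outDir = outDir;
        this.mode = mode;
        this.numReducers = numReducers;
    }

    static DomainStatsArgs parse(String[] args) {
        if (args.length < 3 || args.length > 4) {
            throw new IllegalArgumentException(
                "expected: inputDirs outDir mode [numOfReducers]");
        }
        // inputDirs is a comma separated list of crawldb directories.
        String[] inputDirs = args[0].split(",");
        String mode = args[2];
        if (!MODES.contains(mode)) {
            throw new IllegalArgumentException("unknown mode: " + mode);
        }
        int numReducers = 1; // default stated in the usage text
        if (args.length == 4) {
            // A non-numeric value raises NumberFormatException,
            // which is itself an IllegalArgumentException.
            numReducers = Integer.parseInt(args[3]);
            if (numReducers < 1) {
                throw new IllegalArgumentException("numOfReducers must be at least 1");
            }
        }
        return new DomainStatsArgs(inputDirs, args[1], mode, numReducers);
    }
}
```

A caller would catch IllegalArgumentException from parse, print the usage text above, and exit with a non-zero status.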

> Improve DomainStatistics tool command line parsing
> --------------------------------------------------
>
>                 Key: NUTCH-1911
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1911
>             Project: Nutch
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 1.9, 2.2.1
>            Reporter: Lewis John McGibbney
>            Priority: Trivial
>             Fix For: 1.11
>
>
> The DomainStatistics tool could be improved based on the comments raised in
> [this mail thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html].
> For convenience, I've also pasted them below:
> {quote}
> You cannot just tell it where the crawldb is, you need to tell it where the 
> directory is, so specifying current is ok, but not part-*
> {quote}
> The patch should be trivial work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
