Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/nutch readdb" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20readdb?action=diff&rev1=1&rev2=2

Readdb is an alias for org.apache.nutch.crawl.CrawlDbReader
+
+ == Nutch 1.x ==
The CrawlDbReader implements all the read-only parts of accessing our web database. It provides us with a read utility for the crawldb.
@@ -24, +26 @@
'''-url <url>''': This simply prints information of any particular <url> to System.out.
+ == Nutch 2.x ==
+ {{{
+ Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
+                       [-crawlId <id>] [-content] [-headers] [-links] [-text]
+     -crawlId <id>  - the id to prefix the schemas to operate on,
+                      (default: storage.crawl.id)
+     -stats [-sort] - print overall statistics to System.out
+     [-sort]        - list status sorted by host
+     -url <url>     - print information on <url> to System.out
+     -dump <out_dir> [-regex regex] - dump the webtable to a text file in
+                      <out_dir>
+     -content       - dump also raw content
+     -headers       - dump protocol headers
+     -links         - dump links
+     -text          - dump extracted text
+     [-regex]       - filter on the URL of the webtable entry
+
+ }}}
CommandLineOptions
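The Nutch 2.x usage text above defines a small option grammar: exactly one of `-stats`, `-url` or `-dump` must be chosen, `-sort` belongs to `-stats`, `-regex` belongs to `-dump`, and `-crawlId`, `-content`, `-headers`, `-links` and `-text` are listed outside that group. The sketch below only mirrors that grammar to assemble a `bin/nutch readdb` argument list. The helper name `build_readdb_args` is hypothetical and is not part of Nutch, and the rule that the content flags may accompany any mode is an assumption read off the usage line, not verified against WebTableReader itself.

```python
from typing import List, Optional


def build_readdb_args(
    stats: bool = False,
    sort: bool = False,
    url: Optional[str] = None,
    dump_dir: Optional[str] = None,
    regex: Optional[str] = None,
    crawl_id: Optional[str] = None,
    content: bool = False,
    headers: bool = False,
    links: bool = False,
    text: bool = False,
) -> List[str]:
    """Build an argv list for Nutch 2.x `bin/nutch readdb` (WebTableReader).

    Hypothetical helper: it only mirrors the published usage text.
    """
    # Exactly one mode: -stats, -url <url> or -dump <out_dir>.
    modes = [stats, url is not None, dump_dir is not None]
    if sum(modes) != 1:
        raise ValueError("exactly one of -stats, -url or -dump is required")
    if sort and not stats:
        raise ValueError("-sort is only valid together with -stats")
    if regex is not None and dump_dir is None:
        raise ValueError("-regex is only valid together with -dump")

    args = ["bin/nutch", "readdb"]
    if stats:
        args.append("-stats")
        if sort:
            args.append("-sort")
    elif url is not None:
        args += ["-url", url]
    else:
        args += ["-dump", dump_dir]
        if regex is not None:
            args += ["-regex", regex]

    if crawl_id is not None:
        args += ["-crawlId", crawl_id]
    # Optional flags that the usage line lists outside the mode group
    # (assumed here to be accepted with any mode).
    for flag, enabled in (("-content", content), ("-headers", headers),
                          ("-links", links), ("-text", text)):
        if enabled:
            args.append(flag)
    return args


if __name__ == "__main__":
    # Dump entries whose URL matches a regex, including extracted text.
    print(build_readdb_args(dump_dir="out", regex=".*apache.*", text=True))
```

Running the script prints `['bin/nutch', 'readdb', '-dump', 'out', '-regex', '.*apache.*', '-text']`. Such a list could then be handed to `subprocess.run(args, check=True)` from a Nutch 2.x runtime directory (assumed to contain `bin/nutch`).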