[Nutch Wiki] Update of "bin/nutch readdb" by kiranchitturi

Apache Wiki Wed, 20 Mar 2013 14:22:51 -0700

The "bin/nutch readdb" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20readdb?action=diff&rev1=1&rev2=2

  Readdb is an alias for org.apache.nutch.crawl.CrawlDbReader
+ 
+ == Nutch 1.x ==
  
  The CrawlDbReader implements the read-only operations on the crawl database 
(crawldb) and serves as a read utility for it.
  
@@ -24, +26 @@

  
  '''-url <url>''': Prints information about the given <url> to System.out.
  
+ == Nutch 2.x ==
  
+ {{{
+ Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex]) 
+                     [-crawlId <id>] [-content] [-headers] [-links] [-text]
+     -crawlId <id>  - the id to prefix the schemas to operate on, 
+                    (default: storage.crawl.id)
+     -stats [-sort] - print overall statistics to System.out
+     [-sort]        - list status sorted by host
+     -url <url>     - print information on <url> to System.out
+     -dump <out_dir> [-regex regex] - dump the webtable to a text file in 
+                    <out_dir>
+     -content       - dump also raw content
+     -headers       - dump protocol headers
+     -links         - dump links
+     -text          - dump extracted text
+     [-regex]       - filter on the URL of the webtable entry
+ 
+ }}}
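
As a rough illustration of how the options in the usage text above combine, here is a minimal Python sketch that only assembles an argument list and prints it; it does not run Nutch. The helper name `build_readdb_command` and its parameters are hypothetical, and the sketch assumes that `bin/nutch readdb` is invoked from the Nutch runtime directory and forwards its arguments to WebTableReader on 2.x, as the page title and section suggest. Only flags listed in the usage text are used, and they are emitted in the same order as the synopsis.

```python
import shlex
from typing import List, Optional, Sequence

# Assumption: the command is run from the Nutch runtime directory.
NUTCH_BIN = "bin/nutch"

# Dump-only switches, taken verbatim from the usage text.
DUMP_FLAGS = ("content", "headers", "links", "text")


def build_readdb_command(
    mode: str,
    target: Optional[str] = None,
    crawl_id: Optional[str] = None,
    regex: Optional[str] = None,
    sort: bool = False,
    dump_flags: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: build an argv list for `bin/nutch readdb` (2.x)."""
    if mode not in ("stats", "url", "dump"):
        raise ValueError("mode must be 'stats', 'url' or 'dump'")

    cmd = [NUTCH_BIN, "readdb", "-" + mode]

    if mode == "stats":
        if sort:
            cmd.append("-sort")  # -stats [-sort]
    else:
        if not target:
            raise ValueError("-%s needs a URL or an output directory" % mode)
        cmd.append(target)  # -url <url> or -dump <out_dir>

    if mode == "dump" and regex:
        cmd += ["-regex", regex]  # filter on the URL of the entry

    if crawl_id:
        cmd += ["-crawlId", crawl_id]

    if mode == "dump":
        for flag in dump_flags:
            if flag not in DUMP_FLAGS:
                raise ValueError("unknown dump flag: " + flag)
            cmd.append("-" + flag)

    return cmd


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(build_readdb_command("stats", sort=True)))
    print(shlex.join(build_readdb_command("dump", "out_dir", dump_flags=["links"])))
```

To actually run one of these commands, the returned list could be passed to `subprocess.run`; that step is left out here so the sketch stays independent of a Nutch installation.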
  
  CommandLineOptions
