Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch parse" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20parse?action=diff&rev1=1&rev2=2

  Parse is an alias for org.apache.nutch.parse.ParseSegment
+ 
+ === Nutch 1.x ===
  
  The class parses the contents of one segment. It assumes that the given 
segment contains ./fetcher_output/, which is typically generated by a 
non-parsing fetcher run (i.e., the fetcher is started with the -noParsing 
option, or parsing during fetching is left at its default boolean value of 
'false' as specified in nutch-default.xml).
  
@@ -24, +26 @@

  
  '''<segmentdir>''': The path to the segment directory containing the data 
to be parsed. 
  
+ === Nutch 2.x ===
  
+ {{{
+ Usage: ParserJob (<batchId> | -all) [-crawlId <id>] [-resume] [-force]
+     <batchId>     - symbolic batch ID created by Generator
+     -crawlId <id> - the id to prefix the schemas to operate on, 
+                   (default: storage.crawl.id)
+     -all          - consider pages from all crawl jobs
+     -resume       - resume a previous incomplete job
+     -force        - force re-parsing even if a page is already parsed
+ }}}
  CommandLineOptions
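
As an illustration of the Nutch 2.x usage string, here is a minimal sketch of a shell helper that assembles a parse command line and prints it without running anything. The function name nutch2_parse_cmd, the NUTCH_BIN variable, and the sample batch ID and crawl ID are hypothetical and chosen only for this example. It also assumes that "bin/nutch parse" dispatches to ParserJob in a 2.x installation, as the page title suggests.

```shell
# Assumed location of the Nutch launcher script, relative to the install dir.
NUTCH_BIN="bin/nutch"

# Hypothetical helper: prints (does not execute) a Nutch 2.x parse command.
#   $1 - a batch ID created by the Generator, or the literal -all
#   $2 - optional crawl ID, passed through as -crawlId <id>
nutch2_parse_cmd() {
    target="$1"
    crawl_id="$2"
    if [ -z "$target" ]; then
        echo "usage: nutch2_parse_cmd (<batchId> | -all) [crawlId]" >&2
        return 1
    fi
    cmd="$NUTCH_BIN parse $target"
    if [ -n "$crawl_id" ]; then
        cmd="$cmd -crawlId $crawl_id"
    fi
    echo "$cmd"
}

# Parse pages from all crawl jobs.
nutch2_parse_cmd -all
# Parse one batch (made-up ID) under a made-up crawl ID.
nutch2_parse_cmd 1234567890-12345 mycrawl
```

Printing the command first makes it easy to check the arguments before running it. The -resume and -force flags from the usage text could be appended in the same way.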
  
