The "bin/nutch parse" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20parse?action=diff&rev1=1&rev2=2

Parse is an alias for org.apache.nutch.parse.ParseSegment
+
+ === Nutch 1.x ===
The class parses the contents of a single segment. It assumes that ./fetcher_output/ exists under the given segment. This directory is typically generated by a fetcher run without parsing (i.e., the fetcher was started with the -noParsing option, or parsing was left at its default value of 'false' as specified in nutch-default.xml).
@@ -24, +26 @@
'''<segmentdir>''': The path to the segment directory containing the data to be parsed.
+ === Nutch 2.x ===
+ {{{
+ Usage: ParserJob (<batchId> | -all) [-crawlId <id>] [-resume] [-force]
+     <batchId>     - symbolic batch ID created by Generator
+     -crawlId <id> - the id to prefix the schemas to operate on,
+                     (default: storage.crawl.id)
+     -all          - consider pages from all crawl jobs
+     -resume       - resume a previous incomplete job
+     -force        - force re-parsing even if a page is already parsed
+ }}}
CommandLineOptions
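The following Python code is a minimal sketch of how the two usages above translate into a command line. It only assembles the argument list and does not run Nutch.

Several details are assumptions made for illustration:
- The helper names `parse_args_1x` and `parse_args_2x` are hypothetical.
- The launcher path `bin/nutch` is assumed to be relative to the Nutch runtime directory.
- The sketch does not check which options a particular Nutch release actually accepts.

For 2.x it enforces the `(<batchId> | -all)` choice from the usage string and places that argument first.

```python
from typing import List, Optional

# Assumed location of the Nutch launcher script (adjust for your install).
NUTCH = "bin/nutch"


def parse_args_1x(segment_dir: str) -> List[str]:
    """Nutch 1.x (hypothetical helper): parse one segment fetched without parsing."""
    return [NUTCH, "parse", segment_dir]


def parse_args_2x(
    batch_id: Optional[str] = None,
    all_batches: bool = False,
    crawl_id: Optional[str] = None,
    resume: bool = False,
    force: bool = False,
) -> List[str]:
    """Nutch 2.x (hypothetical helper), following
    'ParserJob (<batchId> | -all) [-crawlId <id>] [-resume] [-force]'."""
    # Exactly one of <batchId> or -all must be supplied.
    if (batch_id is not None) == all_batches:
        raise ValueError("pass either batch_id or all_batches=True, not both or neither")
    args = [NUTCH, "parse", "-all" if all_batches else batch_id]
    if crawl_id is not None:
        # When -crawlId is omitted, the usage text says storage.crawl.id is used.
        args += ["-crawlId", crawl_id]
    if resume:
        args.append("-resume")
    if force:
        args.append("-force")
    return args


if __name__ == "__main__":
    # Prints the command only; pass the list to subprocess.run(...) to execute it.
    print(" ".join(parse_args_2x(all_batches=True, force=True)))
```

Running the script prints `bin/nutch parse -all -force`. Returning a list rather than a single string lets it be handed directly to `subprocess.run` without shell quoting.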