Anton Beza wrote:
> Hello,
> 
> I'm trying to find a way to re-parse the pages stored through Nutch.
> 
> I want to be able to access the pages Nutch has already processed and
> stored, apply a new parser, and replace the old content with the new.
> 
> Is this possible in Nutch 0.8, or will it have to be altered to achieve
> this?

Just remove the following directories from each segment: crawl_parse,
parse_text, and parse_data. Then run bin/nutch parse on each of these
segments.
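
As a rough sketch of those two steps, assuming the segments live on the
local filesystem (segments kept in DFS would need the equivalent DFS
removal command instead of rm). The function name reparse_segments and the
crawl/segments path in the usage note are illustrative assumptions, not
part of Nutch:

```shell
#!/bin/sh
# Sketch only: re-parse every segment found under a local segments directory.
# reparse_segments is a hypothetical helper, not something Nutch ships.

reparse_segments() {
    segments_dir="$1"          # directory holding the segments (assumed local)
    nutch="${2:-bin/nutch}"    # nutch launcher; defaults to bin/nutch

    for segment in "$segments_dir"/*; do
        # Skip anything that is not a segment directory.
        [ -d "$segment" ] || continue

        # Drop the old parse output; the fetched content is left untouched.
        rm -rf "$segment/crawl_parse" \
               "$segment/parse_text" \
               "$segment/parse_data"

        # Parse the stored content again, one segment at a time.
        "$nutch" parse "$segment" || return 1
    done
}
```

From the Nutch install directory this could be invoked as
`reparse_segments crawl/segments`, where crawl/segments stands in for
wherever your crawl actually keeps its segments. It is worth trying on a
copy of one segment first, since the old parse data is deleted.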

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
