Anton Beza wrote:
> Hello,
>
> I'm trying to find a way to re-parse the pages stored through Nutch.
>
> I want to be able to access the pages Nutch has already processed and
> stored, apply a new parser, and replace the old content with the new.
>
> Is this possible in Nutch 0.8, or will it have to be altered to achieve
> this?
Just remove the following directories from each segment: crawl_parse, parse_text, and parse_data. Then run bin/nutch parse on these segments.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
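
For anyone scripting this over many segments, here is a minimal sketch of the two steps in Python. It assumes the segments sit on the local filesystem (not a distributed one) and that the script is started from the Nutch install directory, so that bin/nutch resolves. The function names remove_parse_output and reparse are hypothetical helpers, not part of Nutch.

```python
import shutil
import subprocess
from pathlib import Path

# The three parse output directories named in the reply.
PARSE_DIRS = ("crawl_parse", "parse_text", "parse_data")


def remove_parse_output(segment):
    """Delete old parse output from one segment; return the names removed."""
    removed = []
    for name in PARSE_DIRS:
        target = Path(segment) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


def reparse(segment, nutch="bin/nutch"):
    """Remove old parse output, then run `bin/nutch parse` on the segment.

    Assumption: `nutch` points at the launcher script of a working install.
    """
    remove_parse_output(segment)
    # check=True raises CalledProcessError if the parse job fails.
    subprocess.run([nutch, "parse", str(segment)], check=True)
```

Calling reparse once per segment directory covers the "each segment" part. Everything else in the segment, including the fetched content, is left untouched, and that is what the parse step reads. Since the deletion cannot be undone, it may be worth trying this on a copy of one segment first.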
