Thanks!
I'd like to automate this. Do you know which Java class does the actual
parsing?
Thanks again,
Anton
On 7/26/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> Anton Beza wrote:
> > Hello,
> >
> > I'm trying to find a way to re-parse the pages stored by Nutch.
> >
> > I want to be able to access the pages Nutch has already processed and
> > stored, apply a new parser, and replace the old content with the new.
> >
> > Is this possible in Nutch 0.8, or will it have to be altered to achieve
> > this?
>
> Just remove the following directories from each segment: crawl_parse,
> parse_text, parse_data, and then run bin/nutch parse on these segments.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
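A minimal Java sketch of automating the two steps from the quoted reply. It assumes the segments live on the local filesystem (Nutch running in local mode, not on HDFS) and that the Nutch install directory and the segments directory are passed on the command line; that layout and the class name ReparseSegments are hypothetical. For each segment it deletes crawl_parse, parse_text and parse_data, then shells out to bin/nutch parse. It does not call Nutch classes directly. As far as I recall, the 0.8 bin/nutch script maps "parse" to org.apache.nutch.parse.ParseSegment, which could be invoked instead, but check your own bin/nutch to confirm.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReparseSegments {

    // Parse outputs named in the reply; fetched content (e.g. "content") is left alone.
    static final String[] PARSE_DIRS = {"crawl_parse", "parse_text", "parse_data"};

    // Recursively deletes one segment's parse outputs (local filesystem only, an assumption).
    static void removeParseOutputs(Path segment) throws IOException {
        for (String name : PARSE_DIRS) {
            Path dir = segment.resolve(name);
            if (!Files.exists(dir)) {
                continue; // segment was never parsed, or already cleaned
            }
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order puts children before parents, so directories are empty when deleted.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    // Builds "<nutchHome>/bin/nutch parse <segment>"; nutchHome is a hypothetical parameter.
    static List<String> parseCommand(Path nutchHome, Path segment) {
        return List.of(nutchHome.resolve("bin/nutch").toString(), "parse", segment.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: ReparseSegments <nutchHome> <segmentsDir>");
            System.exit(1);
        }
        Path nutchHome = Paths.get(args[0]);
        Path segmentsDir = Paths.get(args[1]); // e.g. crawl/segments
        List<Path> segments;
        try (Stream<Path> s = Files.list(segmentsDir)) {
            segments = s.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
        for (Path segment : segments) {
            removeParseOutputs(segment);
            // Run the parse step and stream its output to this console.
            Process p = new ProcessBuilder(parseCommand(nutchHome, segment)).inheritIO().start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("parse failed for " + segment + " (exit " + exit + ")");
            }
        }
    }
}
```

Usage, under the same assumptions: java ReparseSegments /path/to/nutch crawl/segments. Deleting first and re-parsing segment by segment means a failure stops at the first broken segment instead of leaving every segment unparsed.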
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general