Thanks!

I'd like to automate this.  Do you know which Java class does the actual
parsing?

Thanks again,
Anton

On 7/26/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> Anton Beza wrote:
> > Hello,
> >
> > I'm trying to find a way to re-parse the pages stored through Nutch.
> >
> > I want to be able to access the pages Nutch has already processed and
> > stored, apply a new parser, and replace the old content with the new.
> >
> > Is this possible in Nutch 0.8, or will it have to be altered to achieve
> > this?
>
> Just remove the following directories from each segment: crawl_parse,
> parse_text, parse_data, and then run bin/nutch parse on these segments.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
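
The steps in the quoted reply (delete crawl_parse, parse_text and parse_data from each segment, then run bin/nutch parse on it) could be automated roughly as follows. This is only a sketch under assumptions not stated in the thread: the segments live on the local filesystem under a single directory, the script is started from the Nutch install directory so that "bin/nutch" resolves, and the function name reparse_segments is made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Parse output directories named in the quoted reply.
PARSE_DIRS = ("crawl_parse", "parse_text", "parse_data")


def reparse_segments(segments_root, nutch_cmd="bin/nutch", run=subprocess.run):
    """Delete old parse output in each segment, then re-run the parse command.

    segments_root and nutch_cmd are assumptions about the local layout.
    Returns the list of segment paths that were processed.
    """
    processed = []
    for segment in sorted(Path(segments_root).iterdir()):
        if not segment.is_dir():
            continue
        for name in PARSE_DIRS:
            # ignore_errors covers segments that were never parsed.
            shutil.rmtree(segment / name, ignore_errors=True)
        # Equivalent to typing: bin/nutch parse <segment>
        run([nutch_cmd, "parse", str(segment)], check=True)
        processed.append(str(segment))
    return processed
```

Calling reparse_segments("crawl/segments") would process every segment under that (hypothetical) path and stop at the first segment whose parse command fails. As for the Java class: from memory, the bin/nutch script in 0.8 maps the parse command to org.apache.nutch.parse.ParseSegment, but that is worth confirming by reading the script in your own copy.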
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
