Hi:

I am sorry to say that you need to fetch that last segment again.
I know the feeling :-( AFAIK there is no way in 0.8 to restart a failed
crawl. I have found that keeping segments small, i.e. generating small
fetch lists and merging all the segments later, is the only way to avoid
this situation.
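
For example, the loop looks roughly like this (a sketch only; the
crawldb/segments paths and the -topN value are placeholders, and you
should check that your 0.8.x build includes the mergesegs command):

    # generate a small fetch list (here: at most 100k URLs per segment)
    bin/nutch generate crawl/crawldb crawl/segments -topN 100000

    # pick up the newly generated segment directory (name is timestamp-based)
    SEGMENT=`ls -d crawl/segments/* | tail -1`

    # fetch (and parse) just that small segment
    bin/nutch fetch $SEGMENT

    # update the crawldb so the next generate skips what was already fetched
    bin/nutch updatedb crawl/crawldb $SEGMENT

    # ... repeat generate/fetch/updatedb until done, then merge everything
    bin/nutch mergesegs crawl/segments_merged -dir crawl/segments

This way a crash or a full disk only costs you one small segment instead
of the whole crawl.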

Regards

On 2/25/07, Mathijs Homminga <[EMAIL PROTECTED]> wrote:
> Hi,
>
> While fetching a segment with 4M documents, we ran out of disk space.
> We guess that the fetcher has fetched (and parsed) about 80 percent of
> the documents, so it would be great if we could continue our crawl somehow.
>
> The segment directory does not contain a crawl_fetch subdirectory yet.
> But we have a /tmp/hadoop/mapred/ (Local FS) directory.
>
> Is there some way we can use the data in the temporary mapred directory
> to create the crawl_fetch data in order to continue our crawl?
>
> Thanks!
> Mathijs
>
>
