Hi,

While fetching a segment with 4M documents, we ran out of disk space. We estimate that the fetcher had already fetched (and parsed) about 80 percent of the documents, so it would be great if we could somehow continue our crawl instead of refetching everything.
The segment directory does not yet contain a crawl_fetch subdirectory, but we do have a /tmp/hadoop/mapred/ directory on the local filesystem. Is there some way we can use the data in that temporary mapred directory to create the crawl_fetch data, so that we can continue our crawl?
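For illustration, here is a minimal sketch (plain Java, nothing Nutch-specific) of how one could check which of the standard segment parts are already present on disk; the segment path below is a placeholder, not our real one:

import java.io.File;

public class SegmentCheck {
    public static void main(String[] args) {
        // Placeholder segment path; substitute the real segment directory.
        File segment = new File("crawl/segments/20070101000000");
        // The standard parts of a Nutch segment.
        String[] parts = {"content", "crawl_generate", "crawl_fetch",
                          "crawl_parse", "parse_data", "parse_text"};
        for (String part : parts) {
            File dir = new File(segment, part);
            System.out.println(part + ": "
                    + (dir.isDirectory() ? "present" : "missing"));
        }
    }
}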
Thanks!

Mathijs