Robin Haswell wrote:
> Hi there
>
> My fetch process must have nearly finished, and now it's slaying the
> server. I have a horrible feeling it's hung. I have the parse option
> enabled in the configuration so it could be doing that - I've fetched a
> lot of documents (it took 1 week at 250KB/s)
>
> I guess I have two questions:
>
> 1. Am I going to have to kill the fetch and start again? It's running at
> 100% CPU and 68% memory - this has only spiked when the debug messages
> about fetching ceased appearing, and it's taken a few hours after
> finishing to get to this CPU and memory usage
>
> 2. Can I resume an unfinished fetch where I left off before? If I have
> to kill this I can't bear the thought of waiting another week to fetch.
>   

Ad 1.

I suspect that it's sorting the reduce output now. In 0.8.x this 
operation performs poorly, especially when run on a single server. 
So I advise patience, and giving the process as much CPU and RAM as 
possible. In future, it's also much better to run the fetcher in 
non-parsing mode and run "nutch parse" afterwards as a separate step.
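
As a rough sketch of that two-step approach: the snippet below only prints 
the commands rather than running them. The segment path and the NUTCH 
variable are placeholders I made up for illustration, and the -noParsing 
switch is what I recall the 0.8.x fetcher accepting, so check the usage 
message of "bin/nutch fetch" on your install before relying on it.

```shell
# Hypothetical segment path; substitute the segment you generated.
SEGMENT="crawl/segments/20061012000000"

# NUTCH is an assumed variable pointing at the Nutch launcher script.
NUTCH="${NUTCH:-bin/nutch}"

# Print the two steps: fetch without parsing, then parse separately.
# This helper is illustrative only; remove the 'echo' to really run them.
fetch_then_parse() {
  echo "$NUTCH fetch $1 -noParsing"
  echo "$NUTCH parse $1"
}

fetch_then_parse "$SEGMENT"
```

Keeping the parse as its own step means that a parse that hangs or fails 
can be rerun on the segment without repeating the download.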

If the disk is mounted with the default options, you may try changing 
them on the fly to "async,noatime"; check the "mount" man page for 
details on how to do this on a live system. Of course, the price is 
that if the system crashes you are likely to lose a lot more data.
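
For illustration, on Linux the remount would look roughly like the command 
built below. The /data mount point and the remount_cmd helper are 
assumptions of mine, not anything from your setup, and the snippet only 
prints the command so you can review it before running it as root.

```shell
# Build the remount command for a given mount point (hypothetical helper).
remount_cmd() {
  printf 'mount -o remount,async,noatime %s\n' "$1"
}

# /data is an assumed mount point holding the Nutch segments.
remount_cmd /data

# After reviewing the printed command, run it as root, for example:
#   sudo sh -c "$(remount_cmd /data)"
```

Running "mount" with no arguments afterwards should list the new options 
for that filesystem.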

Ad 2.

Unfortunately, for now it's not possible to keep partial results.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general