Robin Haswell wrote:
> Hi there
>
> My fetch process must have nearly finished, and now it's slaying the
> server. I have a horrible feeling it's hung. I have the parse option
> enabled in the configuration so it could be doing that - I've fetched a
> lot of documents (it took 1 week at 250KB/s)
>
> I guess I have two questions:
>
> 1. Am I going to have to kill the fetch and start again? It's running at
> 100% CPU and 68% memory - this has only spiked when the debug messages
> about fetching ceased appearing, and it's taken a few hours after
> finishing to get to this CPU and memory usage
>
> 2. Can I resume an unfinished fetch where I left off before? If I have
> to kill this I can't bear the thought of waiting another week to fetch.
>
Ad 1. I suspect that it's sorting the reduce output now. In 0.8.x this operation performs poorly, especially when run on a single server. So I advise patience, and giving the process as much CPU and RAM as possible.

For the future, it's also much better to run the fetcher in non-parsing mode and run "nutch parse" afterwards as a separate step.

If the disk is mounted in the default mode, you may try changing it on the fly to "async,noatime"; check the "mount" man page for details on how to do this on a live system. Of course, the price is that if the system crashes you are likely to lose a lot more data.

Ad 2. Unfortunately, it's not possible for now to keep partial results.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
