--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Does the fetcher ever complete?

It just kept slowing down, so I killed it before it could complete.

> If not, some of the fetcher threads could be stuck. There was a bug in
> the PDF parser which caused it to hang on some documents. So, as you
> encounter more of such pages, your crawl would slow as more threads
> get stuck.

Possible. I didn't analyze the output to see if I fetched any PDFs.

> You might also try more threads. I've noticed that it sometimes takes
> a few minutes for the fetcher to "settle down", so that its initial
> performance is not representative of the overall.

Yes. However, this was a clear downward pattern after several hours of
fetching.

Threads could always get stuck for some reason. I wonder if there should
be a general thread monitor that kills/recreates them after a while.

Otis

> [EMAIL PROTECTED] wrote:
> > Hello,
> >
> > I am using Nutch 0.5 and am wondering whether anyone has noticed that
> > the fetcher sometimes continuously slows down, from the moment it was
> > started?
> >
> > I am using 10 threads, and I noticed that the fetcher started with
> > about 100 KB/second, went up to 200 KB/second, and then the crawling
> > rate started continuously going down. After half a day it was crawling
> > at a rate of 30 KB/second. The fetch list consists of a number of
> > random hosts, so I don't think this should be caused by the delay
> > between requests to the same host. There was no other network traffic
> > on my server. Of course, there could be something external to my
> > machine and network card, but I couldn't check that.
> >
> > Has anyone seen this with Nutch? Should I suspect Nutch, or something
> > local to my installation or even external to my machine?
> >
> > Thanks,
> > Otis

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
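
The "general thread monitor" idea raised in the thread could be sketched
roughly as follows. This is a hypothetical illustration, not Nutch code:
ThreadMonitor, checkOnce, pause and the heartbeat callback are invented
names, and the sketch assumes each worker calls its heartbeat after every
unit of progress (for example, after each fetched page). Note that
interrupt() only frees a thread blocked in an interruptible call; a parser
spinning in a CPU loop would be abandoned and replaced, not actually killed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch: replaces workers that stop reporting progress. */
final class ThreadMonitor {
    private final long timeoutMillis;
    private final Consumer<Runnable> task;   // work loop; receives a heartbeat callback
    private final Thread[] workers;
    private final AtomicLong[] lastBeat;     // last progress time per slot

    ThreadMonitor(int slots, long timeoutMillis, Consumer<Runnable> task) {
        this.timeoutMillis = timeoutMillis;
        this.task = task;
        this.workers = new Thread[slots];
        this.lastBeat = new AtomicLong[slots];
        for (int i = 0; i < slots; i++) start(i);
    }

    /** Starts a fresh worker in the slot, with its own heartbeat timestamp. */
    private void start(int slot) {
        AtomicLong beat = new AtomicLong(System.currentTimeMillis());
        Runnable heartbeat = () -> beat.set(System.currentTimeMillis());
        Thread t = new Thread(() -> task.accept(heartbeat), "worker-" + slot);
        t.setDaemon(true);                   // abandoned threads must not block JVM exit
        lastBeat[slot] = beat;
        workers[slot] = t;
        t.start();
    }

    /** One monitoring pass; returns how many stalled workers were replaced. */
    synchronized int checkOnce() {
        long now = System.currentTimeMillis();
        int replaced = 0;
        for (int i = 0; i < workers.length; i++) {
            if (!workers[i].isAlive()) continue;          // finished normally
            if (now - lastBeat[i].get() > timeoutMillis) {
                workers[i].interrupt();                   // best effort only
                start(i);
                replaced++;
            }
        }
        return replaced;
    }

    /** Sleeps; returns false if the thread was interrupted meanwhile. */
    static boolean pause(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger started = new AtomicInteger();
        // Simulated fetch loop: the first worker hangs, later ones keep beating.
        Consumer<Runnable> fetchLoop = heartbeat -> {
            if (started.getAndIncrement() == 0) { pause(60_000); return; }
            do { heartbeat.run(); } while (pause(10));
        };
        ThreadMonitor monitor = new ThreadMonitor(2, 200, fetchLoop);
        pause(500);
        System.out.println("replaced: " + monitor.checkOnce());
        pause(500);
        System.out.println("replaced: " + monitor.checkOnce());
    }
}
```

In the demo, the first pass should normally find the one hung worker and
replace it, and the second pass should find nothing to do. In a real
fetcher, checkOnce() would be called periodically from a timer, and the
timeout would need to be well above the slowest legitimate fetch.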
