cesar voulgaris wrote:
> OK, so DB_unfetched holds pages that are still to be fetched. Thanks.
> What I still don't get is why, if I increase the number of threads in a
> fetch process (to speed it up), at the end of the process (some
> predefined depth) I get fewer "DB_fetched" pages than when I fetch with
> fewer threads (exactly the same crawl!). After all, the point of
> speeding up the process is to have a certain number of "fetched" pages
> in a given lapse of time. I started a crawl 22 hrs ago with ~500,000
> pages in the db and ~100,000 DB_fetched. fetcher.threads.fetch is set
> to 20. I'll try the next run for 24 hrs with 10 threads and report the
> results. Perhaps there's something about the stack or the JVM that I
> don't understand and that affects threading performance.

Most likely you can't fetch pages that fast because of the "politeness" settings, i.e. the maximum number of threads accessing a single host and the delay between requests. Look for "Exceeded http.max.delays" errors in your log.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
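
To illustrate the politeness argument, here is a rough back-of-the-envelope sketch in Python. It is not Nutch code and does not reproduce the fetcher's actual scheduling; the names `active_hosts`, `threads_per_host`, `server_delay_s` and `fetch_time_s` are hypothetical parameters chosen for this example, and the numbers are made up. It only shows that once every host with pending pages already has its allowed number of threads, adding more fetcher threads cannot raise the fetch rate.

```python
def max_useful_threads(active_hosts: int, threads_per_host: int = 1) -> int:
    """Threads beyond this count can only wait for a busy host."""
    return active_hosts * threads_per_host


def fetch_rate_upper_bound(
    threads: int,
    active_hosts: int,
    server_delay_s: float,
    fetch_time_s: float,
    threads_per_host: int = 1,
) -> float:
    """Rough upper bound on pages per second under per-host politeness.

    Assumption: each busy thread finishes one page per
    (fetch time + politeness delay) on the host it is assigned to.
    """
    busy = min(threads, max_useful_threads(active_hosts, threads_per_host))
    return busy / (fetch_time_s + server_delay_s)


if __name__ == "__main__":
    # Hypothetical crawl segment: pages spread over only 8 hosts.
    for threads in (10, 20):
        rate = fetch_rate_upper_bound(
            threads, active_hosts=8, server_delay_s=5.0, fetch_time_s=1.0
        )
        print(f"{threads} threads: at most {rate:.2f} pages/s")
```

In this model 10 and 20 threads give the same bound, because only 8 threads can be busy at once. The surplus threads spend their time waiting on busy hosts, and if a waiting thread gives up on a page after too many delays (which is what an "Exceeded http.max.delays" log entry would suggest), a run with more threads could plausibly end with fewer fetched pages.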
