The settings look OK to me. I think the stack trace is from Twisted's thread pool, and most likely that's what idle threads in the pool look like (that's fine, we'd expect that). Take a look at the stack traces that look different from this one: what are they doing?
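If it helps to spot the odd ones out, here is a rough sketch that groups live threads by identical stack trace, so the idle pool threads collapse into one group and the unusual stacks stand out. It uses only the standard library and is not part of Scrapy or Twisted; `group_thread_stacks` and `print_stack_groups` are names I made up, and `sys._current_frames()` is CPython-specific.

```python
import sys
import threading
import traceback
from collections import defaultdict


def group_thread_stacks():
    """Map each distinct formatted stack trace to the names of the threads showing it."""
    # Hypothetical helper: translate thread idents into readable names.
    names = {t.ident: t.name for t in threading.enumerate()}
    groups = defaultdict(list)
    # sys._current_frames() is CPython-specific: thread ident -> current frame.
    for ident, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        groups[stack].append(names.get(ident, str(ident)))
    return groups


def print_stack_groups(groups):
    """Print the groups, rarest stacks first."""
    # Stacks shared by few threads are the ones worth reading closely.
    for stack, thread_names in sorted(groups.items(), key=lambda kv: len(kv[1])):
        print("%d thread(s): %s" % (len(thread_names), ", ".join(thread_names)))
        print(stack)
```

Calling `print_stack_groups(group_thread_stacks())` from wherever you currently dump the traces should put the many identical idle-thread stacks in a single group at the bottom.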
Maybe you could experiment to narrow it down? For example, set the max depth to 1 (to make sure we aren't stuck being polite to too few domains), disable most pipelines, etc.

On 4 September 2014 09:22, Davide Setti <[email protected]> wrote:
> Hi,
> I'm trying to use scrapy to do a broad crawl. What I'm doing is to follow
> every link I find on every page, if the domain matches a rule. I feed the
> Spider with a few public directories of websites, and I use very high
> concurrency. At the beginning it's fast (first minute: 1381 pages/min), but
> then the speed decreases every minute down to 50 pages/min after 6 minutes.
> Then it's slow and stable ;)
>
> CPU, memory, network and disk usages are very low, after the initial peak.
>
> I noticed more than 100k requests in the queue after a few minutes, and
> just a few thousand crawled pages.
>
> I tried different settings for CONCURRENT_REQUESTS,
> CONCURRENT_REQUESTS_PER_IP and added more start_urls, but it only increased
> the initial peak.
>
> Is there something wrong with my settings?
> https://gist.github.com/vad/3c3859ee17c07bcb3636
>
> In the gist I also put the stack trace I see in every thread (I used the
> debugging middleware).
>
> Regards
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
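To make the experiment suggested above concrete, something like the following could go in the project's settings.py for a test run. This is only a sketch: `DEPTH_LIMIT` and `ITEM_PIPELINES` are standard Scrapy settings, but the values are my guess at a useful starting point, and an empty `ITEM_PIPELINES` turns off all item pipelines rather than "most" of them.

```python
# Temporary overrides for a diagnostic crawl (assumed values, adjust as needed).

# Only follow links one hop away from the start URLs, so the request queue
# cannot fill up with deep links from a handful of domains.
DEPTH_LIMIT = 1

# Disable every item pipeline to rule out slow item processing.
# Re-enable them one at a time if the crawl speed recovers.
ITEM_PIPELINES = {}
```

If the crawl rate stays high with these overrides, put the original values back one at a time to see which one brings the slowdown back.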
