The settings look ok to me.

I think the stack trace is from Twisted's thread pool, and most likely
that's what idle threads in the pool look like (that's OK, we'd expect
that). Take a look at the stack traces that differ from this one: what are
they doing?
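
If it helps to spot the odd ones, here is a minimal sketch that groups live
threads by identical stack trace, so the idle pool threads collapse into one
group and the unusual stacks stand out. It uses only the standard library;
the function names are mine (hypothetical), not part of Scrapy or Twisted,
and you would have to call it from inside the running crawler process.

```python
import sys
import threading
import traceback
from collections import defaultdict


def group_thread_stacks():
    """Group live threads by the text of their current stack trace."""
    names = {t.ident: t.name for t in threading.enumerate()}
    groups = defaultdict(list)
    for ident, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        groups[stack].append(names.get(ident, str(ident)))
    return groups


def print_unusual_stacks(groups):
    """Print the rarest stacks first; idle threads tend to share one trace."""
    for stack, threads in sorted(groups.items(), key=lambda kv: len(kv[1])):
        print("%d thread(s): %s" % (len(threads), ", ".join(threads)))
        print(stack)
```

Calling print_unusual_stacks(group_thread_stacks()) lists the least common
stacks at the top, which are the ones worth reading first.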

Maybe you could experiment and see if you can narrow it down? For example,
set the max depth to 1 (to make sure we aren't stuck waiting on politeness
limits for just a few domains), disable most pipelines, etc.
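
As a rough sketch of that experiment, assuming a standard settings.py in
your project (the values are only a starting point for narrowing things
down, not a recommended configuration):

```python
# settings.py (experiment only)

# Follow links at most one hop from the start URLs, so the queue cannot
# fill up with deep links from just a handful of domains.
DEPTH_LIMIT = 1

# Temporarily disable all item pipelines to rule them out as the bottleneck.
ITEM_PIPELINES = {}
```

If the crawl rate stays high with these settings, re-enable the pipelines
one at a time and raise the depth limit again to see which change brings
the slowdown back.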




On 4 September 2014 09:22, Davide Setti <[email protected]> wrote:

> Hi,
> I'm trying to use scrapy to do a broad crawl. What I'm doing is to follow
> every link I find on every page, if the domain matches a rule. I feed the
> Spider with a few public directories of websites, and I use very high
> concurrency. At the beginning it's fast (first minute: 1381 pages/min), but
> then the speed decreases every minute down to 50 pages/min after 6 minutes.
> Then it's slow and stable ;)
>
> CPU, memory, network and disk usages are very low, after the initial peak.
>
> I noticed more than 100k requests in the queue after a few minutes, and
> just a few thousand crawled pages.
>
> I tried different settings for CONCURRENT_REQUESTS,
> CONCURRENT_REQUESTS_PER_IP and added more start_urls, but it only increased
> the initial peak.
>
> Is there something wrong with my settings?
> https://gist.github.com/vad/3c3859ee17c07bcb3636
>
> In the gist I also put the stack trace I see in every thread (I used the
> debugging middleware).
>
> Regards
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
