Re: I have to use distributed scrapy?

Steven Almeroth Fri, 04 Mar 2016 19:39:55 -0800

Scrapy runs its crawl in a single thread on top of Twisted's event loop, so 
the GIL is unlikely to be the bottleneck. Is the machine CPU-bound, or does 
it appear idle?


Try playing with some settings:

   - LOG_ENABLED = False
   - CONCURRENT_REQUESTS = 100
   - CONCURRENT_REQUESTS_PER_IP = 8
   - DNSCACHE_ENABLED = True
   - DOWNLOAD_DELAY = 0
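
For reference, here is a rough sketch of how the settings listed above could be written down in Python. The dict name TUNED_SETTINGS is only an illustration (it is not a Scrapy name), and the values are simply the ones suggested in this message, not measured optima, so treat them as a starting point.

```python
# Hypothetical container name; the keys are real Scrapy setting names and
# the values are the ones suggested in this thread.
TUNED_SETTINGS = {
    "LOG_ENABLED": False,             # skip logging overhead during the crawl
    "CONCURRENT_REQUESTS": 100,       # global cap on in-flight requests
    "CONCURRENT_REQUESTS_PER_IP": 8,  # cap on in-flight requests per remote IP
    "DNSCACHE_ENABLED": True,         # keep the in-memory DNS cache on
    "DOWNLOAD_DELAY": 0,              # no pause between requests to one site
}

# Print the equivalent module-level lines for a project's settings.py.
for name, value in TUNED_SETTINGS.items():
    print(f"{name} = {value!r}")
```

The same dict could also be assigned to a spider's custom_settings class attribute if you would rather not touch the project-wide settings.py.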

Try the built-in benchmark (the "scrapy bench" command) to see how fast 
Scrapy can crawl on that machine: 
http://doc.scrapy.org/en/latest/topics/benchmarking.html

On Thursday, March 3, 2016 at 1:13:01 AM UTC-7, Berkant AYDIN wrote:
>
> Hi everyone,
>
> I have to do real-time scraping. I tried the optimization options in the 
> documentation, but it is still slow: crawling 1600 pages takes 9 seconds. 
> Yes, that is very fast, but still not fast enough. It is an 860 mb/s AWS 
> machine. How can I increase performance? Do I have to use the distributed 
> options? If so, which one? Is it a GIL problem? Should I continue with 
> PyPy?
>
>

