Scrapy is single-threaded, so the GIL is unlikely to be the cause. Is the machine CPU-bound, or does it appear idle?
Try playing with some settings:

- LOG_ENABLED = False
- CONCURRENT_REQUESTS = 100
- CONCURRENT_REQUESTS_PER_IP = 8
- DNSCACHE_ENABLED = True
- DOWNLOAD_DELAY = 0

Try the built-in benchmarking: http://doc.scrapy.org/en/latest/topics/benchmarking.html

On Thursday, March 3, 2016 at 1:13:01 AM UTC-7, Berkant AYDIN wrote:
> Hi everyone,
>
> I have to do real-time scraping. I tried the optimization options in the
> documentation, but it is still slow. Crawling 1600 pages takes only 9
> seconds. Yes, that is very fast, but still not fast enough. The machine is
> an 860 mb/s AWS instance. How can I increase performance? Do I have to use
> the distributed options? If so, which one? Is it a GIL problem? Do I have
> to continue with PyPy?
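For convenience, here is a rough sketch of the settings suggested in this reply, collected into a Python dict of the kind you could paste into a spider's `custom_settings` or copy into `settings.py`. The values are just the starting points listed earlier, not tuned recommendations. The helper `as_cli_args` and the spider name `myspider` are made up for illustration; the helper only formats the dict as `-s NAME=VALUE` overrides for `scrapy crawl`, so you can try different values without editing files.

```python
# Starting values from the settings list in this reply; adjust for your crawl.
TUNED_SETTINGS = {
    "LOG_ENABLED": False,
    "CONCURRENT_REQUESTS": 100,
    "CONCURRENT_REQUESTS_PER_IP": 8,
    "DNSCACHE_ENABLED": True,
    "DOWNLOAD_DELAY": 0,
}


def as_cli_args(settings):
    """Render a settings dict as `-s NAME=VALUE` arguments (hypothetical helper)."""
    args = []
    for name, value in settings.items():
        args.extend(["-s", f"{name}={value}"])
    return args


if __name__ == "__main__":
    # "myspider" is a placeholder spider name, not something from the thread.
    print(" ".join(["scrapy", "crawl", "myspider"] + as_cli_args(TUNED_SETTINGS)))
```

Changing one value at a time and re-running the crawl should make it easier to see which setting, if any, actually affects throughput.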
