I am using Scrapy to scrape multiple sites and Scrapyd to schedule jobs and collect logs.
I have written 7 spiders, and each spider processes at least 50 start URLs. In total I have around 7,000 URLs, roughly 1,000 URLs per spider.
I place jobs in Scrapyd with 50 start URLs per job. Initially all spiders respond fine, but after a while they slow down considerably: the response time for each start URL becomes very long.
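For reference, the way I split each spider's ~1,000 URLs into jobs of 50 start URLs looks roughly like this (the spider name is a placeholder, and passing the batch as a JSON-encoded `start_urls` spider argument is an assumption about how my spiders read their input; `schedule.json` itself is Scrapyd's standard scheduling endpoint):

```python
import json
import urllib.parse
import urllib.request

def batch(urls, size=50):
    """Split a flat list of start URLs into job-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def schedule_jobs(urls, project='service_scraper', spider='spider_one',
                  scrapyd='http://localhost:6800'):
    """Schedule one Scrapyd job per batch of 50 start URLs."""
    for chunk in batch(urls):
        data = urllib.parse.urlencode({
            'project': project,
            'spider': spider,
            # assumption: the spider reads its start URLs from this argument
            'start_urls': json.dumps(chunk),
        }).encode()
        urllib.request.urlopen(scrapyd + '/schedule.json', data=data)
```

With ~1,000 URLs per spider this yields about 20 jobs per spider.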
I have searched quite a bit already; my settings look like this:
BOT_NAME = 'service_scraper'
SPIDER_MODULES = ['service_scraper.spiders']
NEWSPIDER_MODULE = 'service_scraper.spiders'
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'
ROBOTSTXT_OBEY = False
CONCURRENT_REQUESTS = 30
DOWNLOAD_DELAY = 0
CONCURRENT_REQUESTS_PER_DOMAIN = 1000
ITEM_PIPELINES = {
    'service_scraper.pipelines.MongoInsert': 300,
}
MONGO_URL = "mongodb://52.66.134.142:27017"
EXTENSIONS = {'scrapy.contrib.feedexport.FeedExporter': None}
HTTPCACHE_ENABLED = True
I have tried changing CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN, but nothing works.
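When experimenting with these two values I override them per job through the repeatable `setting` parameter of Scrapyd's schedule.json, rather than editing settings.py each time. A minimal sketch of how I build that request body (the spider name is a placeholder):

```python
import urllib.parse

def build_schedule_body(project, spider, overrides):
    """Build a schedule.json POST body with per-job Scrapy setting overrides."""
    params = [('project', project), ('spider', spider)]
    for key, value in overrides.items():
        # Scrapyd accepts repeated 'setting' parameters of the form NAME=value
        params.append(('setting', '%s=%s' % (key, value)))
    return urllib.parse.urlencode(params)
```

The resulting body is POSTed to http://localhost:6800/schedule.json.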
Thanks in advance
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.