I am using Scrapy to scrape multiple sites and Scrapyd to schedule jobs and 
handle logging.

I have written 7 spiders, and each spider processes at least 50 start URLs. I 
have around 7,000 URLs in total, about 1,000 URLs per spider.

I start placing jobs in Scrapyd with 50 start URLs per job. Initially 
all spiders respond fine, but then they suddenly start working really slowly.
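For reference, the batching described above (1,000 URLs per spider, split into jobs of 50 start URLs each) looks roughly like this; the `chunk` helper and the placeholder URLs are illustrative, not my actual code:

```python
def chunk(urls, size=50):
    """Split a flat URL list into fixed-size batches, one batch per Scrapyd job."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# Placeholder URLs standing in for one spider's ~1,000 start URLs
urls = ["http://example.com/page/%d" % n for n in range(1000)]
batches = chunk(urls)
print(len(batches))  # 20 jobs of 50 start URLs each
```

Each batch is then submitted to Scrapyd as a separate job.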

After some time, the response time for each start URL becomes really slow.

I have searched around enough; my settings look like this:

BOT_NAME = 'service_scraper'

SPIDER_MODULES = ['service_scraper.spiders']
NEWSPIDER_MODULE = 'service_scraper.spiders'


USER_AGENT = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'

ROBOTSTXT_OBEY = False

CONCURRENT_REQUESTS = 30

DOWNLOAD_DELAY = 0

CONCURRENT_REQUESTS_PER_DOMAIN = 1000


ITEM_PIPELINES = {
    'service_scraper.pipelines.MongoInsert': 300,
}

MONGO_URL = "mongodb://52.66.134.142:27017"


EXTENSIONS = {'scrapy.contrib.feedexport.FeedExporter': None}


HTTPCACHE_ENABLED = True

I have tried changing CONCURRENT_REQUESTS and 
CONCURRENT_REQUESTS_PER_DOMAIN, but nothing is working.

Thanks in advance
