add delay to start_urls

michael Tue, 07 Oct 2014 13:17:54 -0700

It looks like Scrapy fetches all start_urls at the same time. How do I tell 
Scrapy to fetch url1, wait 30 seconds, and then fetch url2?


Here are my settings:

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_DEBUG = True

DOWNLOAD_DELAY = 60
DOWNLOAD_TIMEOUT = 30
CONCURRENT_REQUESTS_PER_DOMAIN = 1
AUTOTHROTTLE_START_DELAY = 10

 
And this is the spider:

    start_urls = [
        "url1",
        "url2",
        "url3",
        "url4",
        "url5",
    ]


Here is the log:

2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url1> (referer: None)
2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url2> (referer: None)
2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url3> (referer: None)
2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url4> (referer: None)
2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url5> (referer: None)
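
One thing that may be worth checking, offered as a possibility rather than a confirmed diagnosis: DOWNLOAD_DELAY applies between consecutive requests to the same site, and as far as I understand the downloader groups requests by hostname by default. If the five start URLs sit on different subdomains, they would not wait for each other. The sketch below uses only the standard library to group URLs by hostname. The function name group_by_hostname and the example.org URLs are made up for illustration, since the real start_urls are not shown.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_hostname(urls):
    """Group URLs by hostname (hypothetical helper, not part of Scrapy)."""
    groups = defaultdict(list)
    for url in urls:
        # hostname is None for strings without a scheme, such as "url1"
        host = urlparse(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


# Hypothetical URLs for illustration only; substitute the real start_urls.
example_urls = [
    "http://city-a.example.org/search?page=1",
    "http://city-a.example.org/search?page=2",
    "http://city-b.example.org/search?page=1",
]

if __name__ == "__main__":
    for host, urls in group_by_hostname(example_urls).items():
        # More than one hostname here would mean the delay is not shared
        # across all of the URLs, under the assumption stated above.
        print(host, len(urls))
```

If every URL lands in the same group and the requests still go out in the same second, the cause is probably elsewhere, for example the settings file not being picked up by the project.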
