Sometimes I have to visit a URL to populate a cookie, then go for other URLs. The solution is to have only the first URL in start_urls, then have the parse method return the list of other URLs to visit.
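A minimal sketch of that approach, assuming the follow-up URLs are known up front (the spider name, domains, and parse_page callback below are placeholders, not from the original thread):

    import scrapy


    class CookieFirstSpider(scrapy.Spider):
        name = "cookie_first"
        # Only the URL that sets the cookie goes in start_urls.
        start_urls = ["http://example.com/landing"]

        # Hypothetical follow-up URLs to visit once the cookie is set.
        other_urls = [
            "http://example.com/page1",
            "http://example.com/page2",
        ]

        def parse(self, response):
            # Scrapy's cookie middleware keeps the cookie from the first
            # response, so the follow-up requests reuse it automatically.
            for url in self.other_urls:
                yield scrapy.Request(url, callback=self.parse_page)

        def parse_page(self, response):
            # Handle the pages that required the cookie.
            self.logger.info("Visited %s", response.url)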
On Monday, October 13, 2014 at 14:56:30 UTC-3, Nicolás Alejandro Ramírez Quiros wrote:
>
> If they are from different domains, override start_requests and use
> meta['download_slot'] = <some_name>
>
> On Tuesday, October 7, 2014 at 18:17:11 UTC-2, [email protected] wrote:
>>
>> It looks like Scrapy just runs all start_urls at the same time. How do I
>> tell Scrapy to start with url1, wait 30s, then fetch url2?
>>
>> Here are my settings:
>>
>> AUTOTHROTTLE_ENABLED = True
>> AUTOTHROTTLE_DEBUG = True
>>
>> DOWNLOAD_DELAY = 60
>> DOWNLOAD_TIMEOUT = 30
>> CONCURRENT_REQUESTS_PER_DOMAIN = 1
>> AUTOTHROTTLE_START_DELAY = 10
>>
>> And this is the spider:
>>
>> start_urls = [
>>     "url1",
>>     "url2",
>>     "url3",
>>     "url4",
>>     "url5",
>> ]
>>
>> Here is the log:
>>
>> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url1> (referer: None)
>> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url2> (referer: None)
>> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url3> (referer: None)
>> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url4> (referer: None)
>> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url5> (referer: None)
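A sketch of that download_slot suggestion, assuming the start URLs really are on different domains (the spider name, the placeholder domains, and the slot label "shared" are mine, not from the thread):

    import scrapy


    class ThrottledSpider(scrapy.Spider):
        name = "throttled"
        # Placeholder URLs standing in for url1..url5 from the question.
        start_urls = [
            "http://site-a.example/page",
            "http://site-b.example/page",
            "http://site-c.example/page",
        ]

        custom_settings = {
            # DOWNLOAD_DELAY is enforced per download slot.
            "DOWNLOAD_DELAY": 30,
        }

        def start_requests(self):
            for url in self.start_urls:
                # Force every request into the same slot so the 30s delay
                # applies between them even though the domains differ.
                yield scrapy.Request(url, meta={"download_slot": "shared"})

Scrapy keys DOWNLOAD_DELAY and per-domain concurrency on the download slot (normally one slot per domain), so pinning all requests to one slot serializes them with the delay between fetches.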
