Yes. However, the Scrapy engine takes requests from the scheduler first, not from start_urls.
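To make the ordering concrete, here is a rough toy model in plain Python. It is not Scrapy's actual engine code: the function `simulate`, the FIFO queue, and the `*.example` URLs are all made up for illustration, and real downloads run concurrently rather than one at a time. It only shows the rule discussed in this thread: a new start request is pulled only when the scheduler has nothing pending.

```python
from collections import deque


def simulate(start_urls, links_per_page, max_steps=100):
    """Toy model of the ordering: scheduler first, start requests second.

    Hypothetical helper, not part of Scrapy. Each start page "yields"
    links_per_page follow-up links; follow-up pages yield nothing.
    """
    start_requests = iter(start_urls)  # consumed lazily, one at a time
    scheduler = deque()                # pending follow-up requests (FIFO here)
    order = []                         # order in which URLs are processed

    while len(order) < max_steps:
        if scheduler:
            # Pending scheduler requests always win.
            url = scheduler.popleft()
        else:
            # Only when the scheduler is empty is a start request taken.
            url = next(start_requests, None)
            if url is None:
                break  # nothing left anywhere
        order.append(url)

        # Start pages (no "/" in this toy naming scheme) produce follow-up links.
        if "/" not in url:
            for i in range(links_per_page):
                scheduler.append(f"{url}/{i}")

    return order


if __name__ == "__main__":
    for url in simulate(["a.example", "b.example"], links_per_page=2):
        print(url)
```

In this model the second start URL is not touched until every link extracted from the first one has been processed, which is the effect described in the original question: with many start URLs on different domains, extracted links keep the engine busy on a few domains while the remaining start URLs wait.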
On Saturday, April 2, 2016 at 7:59:53 PM UTC+8, Dimitris Kouzis - Loukas wrote:
> Are you asking for http://doc.scrapy.org/en/latest/topics/broad-crawls.html ? Finishing all the start_urls before going wide?
>
> On Wednesday, March 30, 2016 at 10:52:13 AM UTC+1, Jianhao Chen wrote:
>> From HERE <https://github.com/scrapy/scrapy/blob/master/scrapy/core/engine.py#L121> I found that the Scrapy engine fetches requests from the scheduler before the ones generated from start_urls.
>>
>> In my usage, I enqueued thousands of start URLs (from various domains) and the crawl is not very fast (maybe networking issues). The problem I ran into is that the spider itself extracts links and follows them, so Scrapy fetches those requests from the scheduler first. This lowers the concurrency.
>>
>> I would like to learn about the design purpose of this mechanism.
>> BRs.
