Are you asking for http://doc.scrapy.org/en/latest/topics/broad-crawls.html ? Finishing all the start_urls before going wide?
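That broad-crawls page mostly boils down to a handful of settings. A sketch of a `settings.py` along those lines — the values are illustrative examples from that doc, not tuned recommendations:

```python
# settings.py -- illustrative broad-crawl tuning; values are examples only.

# Raise global concurrency: broad crawls over many domains are usually
# limited by this cap rather than by per-domain politeness.
CONCURRENT_REQUESTS = 100

# Crawl breadth-first (FIFO) instead of the default depth-first (LIFO),
# so earlier-scheduled requests such as start_urls are fetched first.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

# Cut per-request overhead on large crawls.
COOKIES_ENABLED = False
RETRY_ENABLED = False
DOWNLOAD_TIMEOUT = 15
REACTOR_THREADPOOL_MAXSIZE = 20
```

The FIFO queue swap is the part most relevant to your question: it changes which request the scheduler hands back first.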
On Wednesday, March 30, 2016 at 10:52:13 AM UTC+1, Jianhao Chen wrote:

> From HERE
> <https://github.com/scrapy/scrapy/blob/master/scrapy/core/engine.py#L121>
> I found that the Scrapy engine fetches requests from the scheduler before
> the ones generated from start_urls.
>
> In my usage, I enqueued thousands of start URLs (from various domains),
> and the crawl is not very fast (possibly networking issues). The problem
> I ran into is that the spider itself extracts links and follows them, and
> Scrapy then fetches those requests from the scheduler first. This lowers
> concurrency.
>
> I would like to learn about the design purpose of this mechanism.
> BRs.
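The behavior you're seeing follows from two things combined: the engine pulls from the scheduler before consuming more of `start_requests`, and the default in-memory scheduler queue is LIFO. A minimal plain-Python sketch of the LIFO effect (not Scrapy internals — just a toy stand-in to show the dequeue order):

```python
# Toy stand-in for Scrapy's default LIFO memory queue, to illustrate why
# links extracted mid-crawl are fetched before older start_urls requests.

class LifoScheduler:
    """Last-in, first-out queue, like Scrapy's default memory queue."""

    def __init__(self):
        self._stack = []

    def enqueue(self, request):
        self._stack.append(request)

    def next_request(self):
        # The most recently enqueued request comes out first.
        return self._stack.pop() if self._stack else None


scheduler = LifoScheduler()

# Start URLs go in first...
scheduler.enqueue("start_url_1")
scheduler.enqueue("start_url_2")
# ...then the spider extracts a link while processing a response:
scheduler.enqueue("extracted_link")

order = [scheduler.next_request() for _ in range(3)]
print(order)  # ['extracted_link', 'start_url_2', 'start_url_1']
```

Switching to FIFO queues (as on the broad-crawls page), or yielding your start requests with a higher `priority`, changes that order so the start URLs drain first.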
