Are you asking for http://doc.scrapy.org/en/latest/topics/broad-crawls.html ? Finishing all the start_urls before going wide?
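That broad-crawls page mostly boils down to a handful of settings. A sketch of a `settings.py` along those lines — the values are illustrative examples from that doc, not tuned recommendations:

```python
# settings.py -- illustrative broad-crawl tuning; values are examples only.

# Raise global concurrency: broad crawls over many domains are usually
# limited by this cap rather than by per-domain politeness.
CONCURRENT_REQUESTS = 100

# Crawl breadth-first (FIFO) instead of the default depth-first (LIFO),
# so earlier-scheduled requests such as start_urls are fetched first.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

# Cut per-request overhead on large crawls.
COOKIES_ENABLED = False
RETRY_ENABLED = False
DOWNLOAD_TIMEOUT = 15
REACTOR_THREADPOOL_MAXSIZE = 20
```

The FIFO queue swap is the part most relevant to your question: it changes which request the scheduler hands back first.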
On Wednesday, March 30, 2016 at 10:52:13 AM UTC+1, Jianhao Chen wrote:

> From HERE
> <https://github.com/scrapy/scrapy/blob/master/scrapy/core/engine.py#L121>
> I found that the Scrapy engine fetches requests from the scheduler before
> the ones generated from start_urls.
>
> In my usage, I enqueued thousands of start URLs (from various domains),
> and the crawl is not very fast (possibly networking issues). The problem
> I ran into is that the spider itself extracts links and follows them, and
> Scrapy then fetches those requests from the scheduler first. This lowers
> concurrency.
>
> I would like to learn about the design purpose of this mechanism.
> BRs.
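The behavior you're seeing follows from two things combined: the engine pulls from the scheduler before consuming more of `start_requests`, and the default in-memory scheduler queue is LIFO. A minimal plain-Python sketch of the LIFO effect (not Scrapy internals — just a toy stand-in to show the dequeue order):

```python
# Toy stand-in for Scrapy's default LIFO memory queue, to illustrate why
# links extracted mid-crawl are fetched before older start_urls requests.

class LifoScheduler:
    """Last-in, first-out queue, like Scrapy's default memory queue."""

    def __init__(self):
        self._stack = []

    def enqueue(self, request):
        self._stack.append(request)

    def next_request(self):
        # The most recently enqueued request comes out first.
        return self._stack.pop() if self._stack else None


scheduler = LifoScheduler()

# Start URLs go in first...
scheduler.enqueue("start_url_1")
scheduler.enqueue("start_url_2")
# ...then the spider extracts a link while processing a response:
scheduler.enqueue("extracted_link")

order = [scheduler.next_request() for _ in range(3)]
print(order)  # ['extracted_link', 'start_url_2', 'start_url_1']
```

Switching to FIFO queues (as on the broad-crawls page), or yielding your start requests with a higher `priority`, changes that order so the start URLs drain first.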
