From HERE
<https://github.com/scrapy/scrapy/blob/master/scrapy/core/engine.py#L121> I
found that the Scrapy engine fetches requests from the scheduler before the
ones generated from start_urls.


In my usage, I enqueue thousands of start URLs (from various domains), and
the crawl does not go very fast (maybe networking issues). The problem I ran
into is that the spider itself extracts links and follows them, and Scrapy
then fetches those requests from the scheduler before the remaining start
URLs. This lowers the concurrency.
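To make the behavior I am describing concrete, here is a minimal sketch (not Scrapy's actual code; the function and variable names are my own) of a next-request loop that drains the scheduler queue before pulling from the start_urls iterator, which matches what the linked engine code seems to do:

```python
from collections import deque

def next_request(scheduler_queue, start_requests):
    """Sketch of the observed behavior: the scheduler queue is
    drained before the start_requests iterator is consumed."""
    if scheduler_queue:
        return scheduler_queue.popleft()
    try:
        return next(start_requests)
    except StopIteration:
        return None  # nothing left to crawl

# A link extracted during the crawl jumps ahead of remaining start URLs.
queue = deque(["extracted-link-1"])
starts = iter(["start-url-1", "start-url-2"])
order = []
while True:
    req = next_request(queue, starts)
    if req is None:
        break
    order.append(req)
# order == ["extracted-link-1", "start-url-1", "start-url-2"]
```

So with many slow start URLs, newly extracted links keep cutting in line, which is what I observe.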


I would like to learn about the design purpose of this mechanism.
BRs.

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
