If you are looking to crawl one URL per process, why not just use 
python-requests? Scrapy is a high-performance scraping framework; it works 
best when you feed it a batch of URLs. Passing a single URL as an argument 
is not an option, but there are alternatives such as scrapy-redis.

https://github.com/darkrho/scrapy-redis
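
For illustration, here is a rough sketch of that model with scrapy-redis 
(the spider name, Redis key, and URL below are just placeholders, so adapt 
them to your project). The idea is that the spider stays up and idle, 
popping start URLs from a Redis list, so you never launch a process per URL:

    # myspider.py -- assumes the scrapy-redis package and a running Redis server
    from scrapy_redis.spiders import RedisSpider

    class BackendSpider(RedisSpider):
        name = "backend_spider"
        # scrapy-redis pops start URLs from this Redis list
        redis_key = "backend_spider:start_urls"

        def parse(self, response):
            # placeholder parsing logic; extract whatever fields you need here
            yield {"url": response.url}

    # settings.py -- the scheduler/dupefilter settings scrapy-redis documents
    # SCHEDULER = "scrapy_redis.scheduler.Scheduler"
    # DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
    # REDIS_URL = "redis://localhost:6379"

Your backend (for example, the RabbitMQ consumer) then only has to push each 
incoming URL onto that list, e.g. with redis-py:

    import redis

    r = redis.StrictRedis(host="localhost", port=6379)
    # call this from your message handler for each URL to be crawled
    r.lpush("backend_spider:start_urls", "http://example.com/some/dynamic/path")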

On Wednesday, December 3, 2014 at 09:02:35 UTC-2, [email protected] 
wrote:
>
> Hi,
> I am looking for a web-scraping framework that is both easy to 
> use/configure and can also be part of a large backend. The backend gets 
> thousands of requests per second and needs to do scraping as part of its 
> logic. We use RabbitMQ, and I wonder if Scrapy can be part of such a 
> system. Each request carries a different URL (there is a small set of 
> domains, but the path/query etc. is dynamic).
>
> So, I wonder about the following questions:
> 1. Can I pass the URL as an argument? I mean, the spider is defined by the 
> domain, but the URL is dynamic, so the spider has to receive it 
> dynamically.
> 2. Integration, performance and scale: I've read that in a running system 
> Scrapy can be invoked via the scrapyd JSON API, which actually launches a 
> process. So, in my system, which passes lots of RabbitMQ messages around, 
> each scraping task would issue a JSON request and we'd end up with lots of 
> concurrent processes. Imagine a single spider takes 2 seconds; the number 
> of processes would then grow until it chokes the server. I fear this model 
> is problematic.
>
> I'd appreciate your advice.
>
