This might be slightly off topic, but since you had no success with Scrapy:

For Google there is an open source PHP scraper that works very well for 
this purpose: http://scraping.compunect.com
I also came across a few others, but this one is currently the best I have 
found and it is kept up to date.

It has proper IP management, local caching, DOM parsing and other features, 
which is pretty much everything you need.

In general, scraping Google is not impossible, but they tend to block IP 
addresses very quickly if they are abused for automated access. That PHP 
scraper uses proxies and a hard rate limit to avoid annoying Google.
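
To illustrate the proxy and rate-limit idea in Python (since you are on 
Scrapy), here is a minimal sketch. It is not how the PHP scraper is 
implemented; the class name ProxyRotator, the min_delay parameter and the 
proxy addresses are all made up for the example. It hands out proxies 
round-robin and waits if the same proxy was used too recently.

```python
import itertools
import time


class ProxyRotator:
    """Hypothetical helper: round-robin over proxies with a per-proxy delay."""

    def __init__(self, proxies, min_delay, clock=time.monotonic, sleep=time.sleep):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._min_delay = min_delay   # seconds between two uses of one proxy
        self._last_used = {}          # proxy -> clock value at last use
        self._clock = clock           # injectable so the logic can be tested
        self._sleep = sleep

    def acquire(self):
        """Return the next proxy, sleeping first if it was used too recently."""
        proxy = next(self._cycle)
        last = self._last_used.get(proxy)
        if last is not None:
            wait = self._min_delay - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)
        self._last_used[proxy] = self._clock()
        return proxy


if __name__ == "__main__":
    # Placeholder addresses (documentation range), not real proxies.
    rotator = ProxyRotator(["http://192.0.2.10:8080", "http://192.0.2.11:8080"],
                           min_delay=2.0)
    for _ in range(4):
        print(rotator.acquire())
```

Each call to acquire() returns the address to use for the next request, so 
you can pass it to whatever HTTP client you use. What delay is safe is 
something you would have to find out by experiment; I am not aware of any 
published limit.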


On Tuesday, 12 February 2013 20:12:26 UTC+1, elio wrote:
>
> Hi, 
>
> I have to find websites that expose the usage of certain cloud 
> services in their HTML, and I am trying with Scrapy. 
>
> piece of code: 
> class GoogleSpider(BaseSpider): 
>     name = "google" 
>     url = 'https://www.google.com/#q=elio' 
>     rules = (Rule(SgmlLinkExtractor('//h3'),callback="parse_item"),) 
>
>     def parse_item(self, response): 
>         hxs = HtmlXPathSelector(response) 
>
> With: 
> BOT_NAME = 'MOZILLA' 
> BOT_VERSION = '7.0' 
> But Google gives no results. 
>
> Thanks, 
> Elio 
>
>
