Re: trouble getting google through urllib

2006-12-20 Thread BJörn Lindqvist
   Google doesn't like Python scripts. You will need to pretend to be a
   browser by setting the user-agent string in the HTTP header.
  
  and possibly also run the risk of having your system blocked by Google if
  they figure out you are lying to them?

 It is possible. I wrote a 'googlewhack' (remember them?) script a while
 ago, which pretty much downloaded as many Google pages as my ADSL could
 handle. And they didn't punish me for it. Although apparently they do
 issue short-term bans on IPs that abuse their service.

For Google, that load must be piss in the ocean. I bet that for Google
to even notice the abuse, it would have to be something really, really
severe.

-- 
mvh Björn
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: trouble getting google through urllib

2006-12-20 Thread Fredrik Lundh
BJörn Lindqvist wrote:

 For Google, that load must be piss in the ocean. I bet that for Google
 to even notice the abuse, it would have to be something really, really
 severe.

like, say, business?

http://scripting.wordpress.com/2006/12/19/scripting-news-for-12192006/#comment-25891

/F



Re: trouble getting google through urllib

2006-12-19 Thread Will McGugan
Dr. Locke Z2A wrote:

 Does anyone know how I would get the bot to have permission to get the
 url? When I put the url in on Firefox it works fine. I noticed that in
 the output HTML that Google gave me it replaced some of the characters
 in the url with different stuff like the &amp; and %7C, so I'm
 thinking that's the problem, does anyone know how I would make it keep
 the url as I intended it to be?
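
For what it's worth, those two substitutions are probably harmless and
not the cause of the error. %7C is just the percent-encoded form of "|",
and &amp; is how "&" is written when a URL is echoed inside an HTML page.
A small Python 3 sketch (the helper name decode_echoed_url is made up for
illustration, and the query string is shortened) that undoes both layers
and shows the parameters come back as intended:

```python
import html
from urllib.parse import parse_qs, unquote

def decode_echoed_url(echoed_query):
    """Undo HTML escaping, then parse the query string into a dict."""
    raw_query = html.unescape(echoed_query)   # '&amp;' -> '&'
    return raw_query, parse_qs(raw_query)     # parse_qs also percent-decodes

# Query string as it would appear inside the HTML error page (shortened).
echoed = "hl=en&amp;langpair=es%7Cen&amp;tbb=1"
raw, params = decode_echoed_url(echoed)
print(raw)                     # hl=en&langpair=es%7Cen&tbb=1
print(params["langpair"][0])   # es|en
print(unquote("%7C"))          # |
```

So the server most likely received the URL as written; the refusal has to
come from something else in the request.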
 

Google doesn't like Python scripts. You will need to pretend to be a 
browser by setting the user-agent string in the HTTP header.

Will McGugan
-- 
blog: http://www.willmcgugan.com


Re: trouble getting google through urllib

2006-12-19 Thread Duncan Booth
Will McGugan wrote:

 Dr. Locke Z2A wrote:
 
 Does anyone know how I would get the bot to have permission to get the
 url? When I put the url in on Firefox it works fine. I noticed that in
 the output HTML that Google gave me it replaced some of the characters
 in the url with different stuff like the &amp; and %7C, so I'm
 thinking that's the problem, does anyone know how I would make it keep
 the url as I intended it to be?
 
 
 Google doesn't like Python scripts. You will need to pretend to be a 
 browser by setting the user-agent string in the HTTP header.
 
and possibly also run the risk of having your system blocked by Google if 
they figure out you are lying to them?


Re: trouble getting google through urllib

2006-12-19 Thread Fredrik Lundh
Dr. Locke Z2A wrote:

 <H1>Forbidden</H1>
 Your client does not have permission to get URL
 <code>/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1</code>
 from this server.
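
A response like the one quoted above is an HTTP 403, which Python 3's
urllib.request raises as urllib.error.HTTPError. A minimal sketch of
catching it and reading the status code instead of crashing follows. The
function name fetch_status and its opener parameter are hypothetical,
added only so the function can be tried without touching the network:

```python
import urllib.error
import urllib.request

def fetch_status(url, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes); HTTP errors are reported, not raised."""
    try:
        with opener(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # err.code is 403 for a "Forbidden" response; the error object
        # is file-like, so the server's error page can still be read.
        return err.code, err.read()
```

Catching the error only tells the script it was refused; it does not
change whether the server is willing to answer.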

 Does anyone know how I would get the bot to have permission to get the
 url? 

 http://www.google.com/terms_of_service.html

 You may not send automated queries of any sort to Google's
 system without express permission in advance from Google.

official APIs are available here:

 http://code.google.com/

/F



Re: trouble getting google through urllib

2006-12-19 Thread Paul Rubin
Dr. Locke Z2A writes:
 Does anyone know how I would get the bot to have permission to get the url?

That's what this was for:

http://code.google.com/apis/soapsearch/


Re: trouble getting google through urllib

2006-12-19 Thread Will McGugan

Duncan Booth wrote:

 
  Google doesn't like Python scripts. You will need to pretend to be a
  browser by setting the user-agent string in the HTTP header.
 
 and possibly also run the risk of having your system blocked by Google if
 they figure out you are lying to them?

It is possible. I wrote a 'googlewhack' (remember them?) script a while
ago, which pretty much downloaded as many Google pages as my ADSL could
handle. And they didn't punish me for it. Although apparently they do
issue short-term bans on IPs that abuse their service.

It is best to play nice of course. I would recommend using their
official APIs if possible!


Will McGugan
--
http://www.willmcgugan.com



Re: trouble getting google through urllib

2006-12-19 Thread Dr. Locke Z2A
I looked at those APIs and it would appear that SOAP isn't around
anymore and there are no APIs for Google Translate :(  Can anyone tell
me how to set the user-agent string in the HTTP header?



Re: trouble getting google through urllib

2006-12-19 Thread Amit Khemka
On 19 Dec 2006 16:12:59 -0800, Dr. Locke Z2A wrote:
 I looked at those APIs and it would appear that SOAP isn't around
 anymore and there are no APIs for Google Translate :(  Can anyone tell
 me how to set the user-agent string in the HTTP header?

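# Python 2 (urllib2)
import urllib2
req = urllib2.Request('http://www.google.com')
# add 'some' user agent header; the string is split across two
# adjacent literals so the line wrap cannot break it
req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; '
               'rv:1.7.8) Gecko/20050524 Fedora/1.5 Firefox/1.5')
# send the request with the custom header attached
up = urllib2.urlopen(req)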
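
In Python 3, urllib2 was split into urllib.request and urllib.error, so
the same idea looks roughly like the sketch below. The helper name
build_request is made up for illustration, and the user-agent string is
simply the one from the snippet above. Setting the header only changes
how the client identifies itself; it does not give a script permission
under Google's terms of service.

```python
import urllib.request

# Any browser-like string works here; this one is copied from the post above.
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) "
              "Gecko/20050524 Fedora/1.5 Firefox/1.5")

def build_request(url, user_agent=USER_AGENT):
    """Create a Request carrying a custom User-Agent header (no network I/O)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.google.com")
print(req.get_header("User-agent"))  # header names are stored capitalized
# To actually send it: urllib.request.urlopen(req).read()
```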

cheers,
amit
-- 

Amit Khemka -- onyomo.com
Home Page: www.cse.iitd.ernet.in/~csd00377
Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.