Re: trouble getting google through urllib
>>> Google doesn't like Python scripts. You will need to pretend to be a browser by setting the user-agent string in the HTTP header.
>>
>> and possibly also run the risk of having your system blocked by Google if they figure out you are lying to them?
>
> It is possible. I wrote a 'googlewhack' (remember them?) script a while ago, which pretty much downloaded as many Google pages as my ADSL could handle. And they didn't punish me for it. Although apparently they do issue short-term bans on IPs that abuse their service.

For Google, that load must be piss in the ocean. I bet that for Google to even notice the abuse, it would have to be something really, really severe.

--
Regards, Björn
--
http://mail.python.org/mailman/listinfo/python-list
Re: trouble getting google through urllib
BJörn Lindqvist wrote:
> For Google, that load must be piss in the ocean. I bet that for Google to even notice the abuse, it would have to be something really, really severe.

Like, say, business?

http://scripting.wordpress.com/2006/12/19/scripting-news-for-12192006/#comment-25891

/F
Re: trouble getting google through urllib
Dr. Locke Z2A wrote:
> Does anyone know how I would get the bot to have permission to get the URL? When I put the URL in Firefox it works fine. I noticed that in the output HTML that Google gave me, it replaced some of the characters in the URL with different stuff like the amp and %7C, so I'm thinking that's the problem. Does anyone know how I would make it keep the URL as I intended it to be?

Google doesn't like Python scripts. You will need to pretend to be a browser by setting the user-agent string in the HTTP header.

Will McGugan
--
blog: http://www.willmcgugan.com
Re: trouble getting google through urllib
Will McGugan [EMAIL PROTECTED] wrote:
> Dr. Locke Z2A wrote:
>> Does anyone know how I would get the bot to have permission to get the URL? When I put the URL in Firefox it works fine. I noticed that in the output HTML that Google gave me, it replaced some of the characters in the URL with different stuff like the amp and %7C, so I'm thinking that's the problem. Does anyone know how I would make it keep the URL as I intended it to be?
>
> Google doesn't like Python scripts. You will need to pretend to be a browser by setting the user-agent string in the HTTP header.

and possibly also run the risk of having your system blocked by Google if they figure out you are lying to them?
Re: trouble getting google through urllib
Dr. Locke Z2A wrote:
> <H1>Forbidden</H1>
> Your client does not have permission to get URL <code>/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1</code> from this server.
>
> Does anyone know how I would get the bot to have permission to get the URL?

http://www.google.com/terms_of_service.html

    "You may not send automated queries of any sort to Google's system without express permission in advance from Google."

Official APIs are available here:

http://code.google.com/

/F
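On the question of the "amp" and %7C in the error page: these look like standard encodings rather than Google rewriting the request, so they are most likely not what caused the 403. &amp; is how an ampersand is written inside HTML (the Forbidden page is HTML), and %7C is the percent-encoding of the | character. The minimal Python 3 sketch below uses only the standard library (html.unescape, urllib.parse.unquote and urllib.parse.urlencode) to undo both encodings on the path copied from the error page; it is illustrative only and sends no request.

```python
import html
from urllib.parse import unquote, urlencode

# The path exactly as it appears inside the HTML of the 403 page.
escaped = "/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1"

# Step 1: undo the HTML escaping (&amp; -> &). This recovers the URL that was requested.
path = html.unescape(escaped)

# Step 2: undo the percent-encoding (%20 -> space, %7C -> |) to see the raw values.
decoded = unquote(path)

# Going the other way: urlencode() applies the same percent-encoding,
# so the '|' in langpair comes out as %7C, which is normal and expected.
query = urlencode({"hl": "en", "langpair": "es|en"})

print(path)
print(decoded)
print(query)
```

In other words, the URL sent by the script and the one shown in the error page are the same URL; only its written form differs.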
Re: trouble getting google through urllib
Dr. Locke Z2A [EMAIL PROTECTED] writes:
> Does anyone know how I would get the bot to have permission to get the URL?

That's what this was for: http://code.google.com/apis/soapsearch/
Re: trouble getting google through urllib
Duncan Booth wrote:
>> Google doesn't like Python scripts. You will need to pretend to be a browser by setting the user-agent string in the HTTP header.
>
> and possibly also run the risk of having your system blocked by Google if they figure out you are lying to them?

It is possible. I wrote a 'googlewhack' (remember them?) script a while ago, which pretty much downloaded as many Google pages as my ADSL could handle. And they didn't punish me for it. Although apparently they do issue short-term bans on IPs that abuse their service.

It is best to play nice, of course. I would recommend using their official APIs if possible!

Will McGugan
--
http://www.willmcgugan.com
Re: trouble getting google through urllib
I looked at those APIs, and it would appear that the SOAP API isn't around anymore and that there is no API for Google Translate. :(

Can anyone tell me how to set the user-agent string in the HTTP header?
Re: trouble getting google through urllib
On 19 Dec 2006 16:12:59 -0800, Dr. Locke Z2A [EMAIL PROTECTED] wrote:
> I looked at those APIs and it would appear that SOAP isn't around anymore and there are no APIs for google translate :( Can anyone tell me how to set the user-agent string in the HTTP header?

    import urllib2

    req = urllib2.Request('http://www.google.com')
    # add 'some' user agent header
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.5 Firefox/1.5')
    # urlopen() sends the request with the header above and returns a file-like response
    up = urllib2.urlopen(req)
    # read the response body
    page = up.read()

cheers,
amit
--
Amit Khemka -- onyomo.com
Home Page: www.cse.iitd.ernet.in/~csd00377
Endless the world's turn, endless the sun's spinning,
Endless the quest;
I turn again, back to my own beginning,
And here, find rest.
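For readers on Python 3, where urllib2 was folded into urllib.request, the same technique looks roughly like the sketch below. It is a minimal illustration under a few assumptions: the User-Agent string is an arbitrary browser-like placeholder, and the helper names build_request and fetch are hypothetical. The request is only built here, not sent, since Google's terms of service (quoted above) forbid automated queries without permission.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent; any string can be used here.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Return a Request that will send the given User-Agent header (hypothetical helper)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url: str) -> bytes:
    """Send the request and return the raw body (needs network access; hypothetical helper)."""
    with urlopen(build_request(url)) as resp:
        return resp.read()


req = build_request("http://www.google.com")

# urllib.request stores header names via str.capitalize(), so the key is 'User-agent'.
print(req.get_header("User-agent"))
```

Passing the header through the Request constructor is equivalent to calling req.add_header() afterwards, as in the urllib2 example; both end up in the same header dictionary.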