Try:

import re
import urllib2

url = 'http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
req = urllib2.Request(url, None, headers)

file_source = open("google_source.txt", 'w')
file_source.write(urllib2.urlopen(req).read())
file_source.close()
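For anyone on Python 3, where urllib2 was folded into urllib.request, here is a rough sketch of the same idea. The helper names (default_user_agent, build_request, save_page) are made up for illustration and are not part of any library, and whether Google accepts this particular User-Agent string is an assumption, not something I have verified. The sketch only builds and inspects the request; the network call is left commented out.

```python
import urllib.error
import urllib.request

# Same search URL and browser-like User-Agent string as in the snippet above.
URL = "http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search"
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def default_user_agent():
    """Return the User-Agent urllib sends when none is set (hypothetical helper)."""
    return dict(urllib.request.OpenerDirector().addheaders)["User-agent"]


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request carrying an explicit User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def save_page(request, path):
    """Fetch the request and write the raw response bytes to path (hypothetical helper)."""
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        # A 403 here means the server still refused the request.
        print("HTTP error", err.code, err.reason)
        return False
    with open(path, "wb") as out:
        out.write(body)
    return True


if __name__ == "__main__":
    print("default User-Agent:", default_user_agent())
    req = build_request(URL)
    # Request stores header names capitalized as "User-agent".
    print("sent User-Agent:", req.get_header("User-agent"))
    # Uncomment to perform the actual network request:
    # save_page(req, "google_source.txt")
```

Printing the default next to the overridden header makes it easy to see what the server would have received without the override.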
I think Google blocks the default User-Agent that urllib2 sends
("Python-urllib/2.5" on your install), which is why the request above
sets a browser-like one.

--Jonas Galvez, http://jonasgalvez.com.br/log

On Thu, Jul 3, 2008 at 3:52 AM, spandana g <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have written some code to get the page source of the Google search
> page. It works for other URLs, but I have a problem with this one:
>
> import re
> from urllib2 import urlopen
> string='http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'
> file_source=file("google_source.txt",'w')
> file_source.write(urlopen(string).read())
> page_content=file_source.readlines()
>
> Traceback (most recent call last):
>   File "C:/Python25/google.py", line 5, in <module>
>     file_source.write(urlopen(string).read())
>   File "C:\Python25\lib\urllib2.py", line 124, in urlopen
>     return _opener.open(url, data)
>   File "C:\Python25\lib\urllib2.py", line 387, in open
>     response = meth(req, response)
>   File "C:\Python25\lib\urllib2.py", line 498, in http_response
>     'http', request, response, code, msg, hdrs)
>   File "C:\Python25\lib\urllib2.py", line 425, in error
>     return self._call_chain(*args)
>   File "C:\Python25\lib\urllib2.py", line 360, in _call_chain
>     result = func(*args)
>   File "C:\Python25\lib\urllib2.py", line 506, in http_error_default
>     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
> HTTPError: HTTP Error 403: Forbidden
>
> Actually, urlopen works for the Google Labs Sets page but not for
> google.com, and I have the same problem with Wikipedia. Please let me
> know if any of you have any idea about this.
>
> Thank you,
> Spandana.
>
> --
> http://mail.python.org/mailman/listinfo/python-list