>>>>> Dave Angel <da...@ieee.org> (DA) wrote:

>DA> Piet van Oostrum wrote:
>>>>>>>> <snip>
>>> <snip>
>DA> But the raw page didn't have any javascript. So what about that original
>DA> raw page triggered additional stuff to be loaded?
>DA> Is it "user agent", as someone else brought out? And is there somewhere I
>DA> can read more about that aspect of things? I've mostly built very static
>DA> html pages, where the server yields the same page to everybody. And some
>DA> form stuff, where the user clicks on a 'submit' button to trigger a script
>DA> that's not shown on the URL line.
>>>
>>> Yes, if you specify a 'normal' web browser as user agent you do get the
>>> Javascript:
>>>
>>> import urllib2
>>>
>>> request = urllib2.Request('http://www.marketwatch.com/story/mondays-biggest-gaining-and-declining-stocks-2009-07-27')
>>> request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13')
>>>
>>> opener = urllib2.build_opener()
>>> page = opener.open(request).read()
>>> print page
>>>
>DA> Thanks much. That's a key I didn't understand.
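
For anyone trying the quoted snippet on Python 3, where urllib2 was folded into urllib.request, here is a rough equivalent. It is only a sketch: the helper name build_request and the two constants are made up for illustration, and the fetch itself is left commented out because the 2009 URL may no longer serve the same page. It also prints the User-Agent that urllib sends by default, which is presumably what made the server hand back the plain page in the first place.

```python
import urllib.request

# URL and browser string copied from the quoted example; the constant
# names are hypothetical.
URL = ('http://www.marketwatch.com/story/'
       'mondays-biggest-gaining-and-declining-stocks-2009-07-27')
BROWSER_UA = ('Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; '
              'rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13')


def build_request(url, user_agent):
    """Return a Request that announces itself with the given User-Agent.

    Hypothetical helper, not part of urllib.
    """
    request = urllib.request.Request(url)
    request.add_header('User-Agent', user_agent)
    return request


request = build_request(URL, BROWSER_UA)

# urllib stores header names capitalized, hence 'User-agent' here.
sent_ua = request.get_header('User-agent')

# What an opener sends when the request carries no User-Agent of its own.
default_ua = dict(urllib.request.build_opener().addheaders)['User-agent']

print('default:', default_ua)
print('request:', sent_ua)

# Uncomment to actually fetch the page (needs network access):
# page = urllib.request.urlopen(request).read()
```

Comparing the two printed values shows the difference the server sees: a "Python-urllib/..." string by default versus the browser string once the header is set.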
You can even specify the headers in the Request constructor:

url = 'http://www.marketwatch.com/story/mondays-biggest-gaining-and-declining-stocks-2009-07-27'
hdr = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13'}
request = urllib2.Request(url = url, headers = hdr)
--
Piet van Oostrum <p...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org
--
http://mail.python.org/mailman/listinfo/python-list