>>>>> jitu <nair.jiten...@gmail.com> (j) wrote:

>j> Hi,
>j> An HTML page contains anchor elements whose href attribute has
>j> a semicolon in the URL. When I fetch the page using
>j> urllib2.urlopen, all such hrefs containing semicolons are
>j> truncated.
>j> For example, the href
>j> http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i;_ylt=AlWSqpkpqhICp1lMgChtJkCdGWoL
>j> gets truncated to
>j> http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i
>j> The page I am talking about can be fetched from
>j> http://travel.yahoo.com/p-travelguide-485468-pune_india_vacations-i;_ylc=X3oDMTFka28zOGNuBF9TAzI3NjY2NzkEX3MDOTY5NTUzMjUEc2VjA3NzcC1kZXN0BHNsawN0aXRsZQ--

It's not Python that causes this. It is the server that sends you the URLs without these parameters (the part after the semicolon is a URL parameter). To get them, you have to tell the server that you are a respectable browser by sending browser-like request headers. For example:

    import urllib2

    # Either URL shows the effect; the second assignment overrides the first.
    url = 'http://travel.yahoo.com/p-travelguide-6901959-pune_restaurants-i;_ylt=AlWSqpkpqhICp1lMgChtJkCdGWoL'
    url = 'http://travel.yahoo.com/p-travelguide-485468-pune_india_vacations-i;_ylc=X3oDMTFka28zOGNuBF9TAzI3NjY2NzkEX3MDOTY5NTUzMjUEc2VjA3NzcC1kZXN0BHNsawN0aXRsZQ--'
    # Headers that make the request look like it comes from Firefox.
    hdrs = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13',
            'Accept': 'image/*'}
    request = urllib2.Request(url=url, headers=hdrs)
    page = urllib2.urlopen(request).read()

-- 
Piet van Oostrum <p...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org
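For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, here is a rough sketch of the same idea. It is not Piet's code: the helper names (HrefCollector, extract_hrefs, build_request, fetch, path_params) are hypothetical, the headers are copied from the example above, and whether the Yahoo server still reacts to them this way is an assumption. The offline helpers show that nothing on the client side drops the ';...' part: urllib keeps it in the request URL, html.parser keeps it in each href, and urllib.parse reports it as the "params" component.

```python
# Hypothetical Python 3 sketch; urllib2.Request/urlopen live in urllib.request now.
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlparse

# Browser-like headers, copied from the urllib2 example above.
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; '
                   'rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13'),
    'Accept': 'image/*',
}


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag exactly as it appears in the HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all anchor hrefs; semicolons and what follows them are kept."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def path_params(url):
    """Return the ';...' part of the last path segment (urlparse calls it params)."""
    return urlparse(url).params


def build_request(url, headers=HEADERS):
    """Build a Request carrying the full URL and the browser-like headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url):
    """Fetch url with the browser-like headers and decode the body (needs network)."""
    with urllib.request.urlopen(build_request(url)) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

Usage, which needs network access: `for href in extract_hrefs(fetch(url)): print(href)` lists the links the server actually sent, so you can check whether the `;_ylt=...` parameters are present with and without the custom headers.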