New submission from Olemis Lang <ole...@gmail.com>:

Hello ...
The first thing I have to say is that I searched the open issues and found nothing similar to what I am about to report. If this ticket is a duplicate, I apologize ...

Yesterday I was testing how to access the wiki pages in a Trac [1]_ site and I realized that something wrong was happening (a bug? ...)

Initially the behavior was as follows:

{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
'Environment not found'
>>> u.close()
}}}

And tracd reported a line like this:

{{{
127.0.0.1 - - [25/Jan/2009 17:32:08] "GET http://localhost:8000/trac-dev HTTP/1.0" 404 -
}}}

which means that a 404 'Not found' status was sent back to the urllib client. I tried to access the same page from my browser and tracd reported:

{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}

The problem is obvious ... urllib was sending the full URL after GET, when it should send only the part of the URL that follows the network location.

I applied the following patch to urllib (yours will be better, I am sure about that ;)

{{{
#!diff
--- /usr/lib/python2.5/urllib.py 2008-07-31 13:40:40.000000000 -0500
+++ /media/urllib_unix.py 2009-01-26 09:48:54.000000000 -0500
@@ -270,6 +270,7 @@
     def open_http(self, url, data=None):
         """Use HTTP protocol."""
         import httplib
+        from urlparse import urlparse
         user_passwd = None
         proxy_passwd= None
         if isinstance(url, str):
@@ -312,12 +313,17 @@
         else:
             auth = None
         h = httplib.HTTP(host)
+        target = ''.join(sep + part for sep, part in \
+                         zip(['', ';', '?', '#'], \
+                             urlparse(selector)[2:]) \
+                         if part)
+        print target
         if data is not None:
-            h.putrequest('POST', selector)
+            h.putrequest('POST', target)
             h.putheader('Content-Type', 'application/x-www-form-urlencoded')
             h.putheader('Content-Length', '%d' % len(data))
         else:
-            h.putrequest('GET', selector)
+            h.putrequest('GET', target)
         if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
         if auth: h.putheader('Authorization', 'Basic %s' % auth)
         if realhost: h.putheader('Host', realhost)
}}}

And everything was «back» to normal ...

{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
... # Lots of beautiful HTML code ;)
>>> u.close()
}}}

... tracd outputted ...

{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}

The behavior is the same with both Python 2.5.1 and 2.5.2 ... I have not installed Python 2.6.x, so I am not sure whether this issue has propagated to newer versions of Python ... and I don't know either whether this issue is also present in urllib2 ...

... so further research is needed, but IMO this is a serious bug :(

PS: If this is a bug ... how could it have stayed hidden so far? Is there any test case written to assert this kind of thing? I checked the `test.test_urllib` and `test.test_urllibnet` modules and I saw nothing at all ...

.. [1] Trac (http://trac.edgewall.org)

----------
components: Library (Lib)
messages: 80586
nosy: olemis
severity: normal
status: open
title: urllib.open sends full URL after GET command instead of local path
type: behavior
versions: Python 2.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5072>
_______________________________________
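
For anyone who wants to try the idea behind the patch without editing urllib itself, here is a minimal standalone sketch of the same transformation: take a full URL and keep only what follows the network location. It is written for Python 3 (where `urlparse` lives in `urllib.parse`), not for the Python 2.5 used in the report, and `request_target` is a hypothetical helper name, not part of urllib. Unlike the patch above, this sketch leaves out the `#fragment` part, on the assumption that a fragment is only meaningful to the client.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus query, if any) that follows the network location.

    Hypothetical helper for illustration only; not a urllib API.
    """
    parts = urlsplit(url)
    # An empty path (e.g. "http://localhost:8000") is sent as "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is deliberately dropped (assumption stated above).
    return target


if __name__ == "__main__":
    print(request_target("http://localhost:8000/trac-dev"))  # /trac-dev
```

With this helper, the URL from the report yields `/trac-dev`, which matches the request line that tracd logged for the browser.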