[issue4631] urlopen returns extra, spurious bytes
Antoine Pitrou pit...@free.fr added the comment:

I took a look at the patch and it looks ok, apart from the _checkClosed() hack (but I don't think there's any immediate solution).

It should be noted that HTTPResponse.readline() will be awfully slow since, as HTTPResponse doesn't have peek(), readline() will call read() one byte at a time... (slow I/O is nothing new in py3k, however :-))

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4631
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
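To illustrate the readline() cost described in this comment, here is a minimal sketch, not the actual IOBase code, of a readline() that has no peek() to look ahead with and therefore has to call read(1) once per byte. CountingStream and readline_bytewise are hypothetical names introduced only for this illustration.

```python
import io


class CountingStream(io.BytesIO):
    """In-memory stream that counts how many times read() is called."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.read_calls = 0

    def read(self, size: int = -1) -> bytes:
        self.read_calls += 1
        return super().read(size)


def readline_bytewise(stream) -> bytes:
    """Read one line using only read(1), as a peek()-less stream must."""
    line = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            break  # end of stream before a newline was seen
        line += byte
        if byte == b"\n":
            break
    return bytes(line)


stream = CountingStream(b"hello world\nrest of body")
line = readline_bytewise(stream)
print(line, stream.read_calls)  # b'hello world\n' 12
```

One read() call per byte is harmless on an in-memory buffer, but on a socket-backed response each call carries real overhead, which is presumably the slowness the comment refers to.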
[issue4631] urlopen returns extra, spurious bytes
Antoine Pitrou pit...@free.fr added the comment:

Here is a patch without the _checkClosed() hack. The solution is simply to remove redundant _checkClosed() calls in IOBase (for example, readline() doesn't need to do an explicit `closed` check as it calls read()).

Added file: http://bugs.python.org/file13021/urllib-chunked2.diff
[issue4631] urlopen returns extra, spurious bytes
Changes by Antoine Pitrou pit...@free.fr:

----------
resolution:  -> accepted
status: open -> pending
[issue4631] urlopen returns extra, spurious bytes
Antoine Pitrou pit...@free.fr added the comment:

Committed in r69513, r69514. Thanks everyone!

----------
resolution: accepted -> fixed
status: pending -> closed
[issue4631] urlopen returns extra, spurious bytes
Antoine Pitrou pit...@free.fr added the comment:

In principle, the test looks good. If you want to avoid the 'if "%" in value' hack, you can use the named-parameter form of string formatting:

    >>> "localhost:%(port)s" % dict(port=8080)
    'localhost:8080'
    >>> "localhost" % dict(port=8080)
    'localhost'
[issue4631] urlopen returns extra, spurious bytes
Daniel Diniz aja...@gmail.com added the comment:

Antoine,

Thanks for reviewing, here's an updated version.

Added file: http://bugs.python.org/file12988/test_urllib_chunked2.diff
[issue4631] urlopen returns extra, spurious bytes
Changes by Daniel Diniz aja...@gmail.com:

Removed file: http://bugs.python.org/file12975/test_urllib_chunked.diff
[issue4631] urlopen returns extra, spurious bytes
Antoine Pitrou pit...@free.fr added the comment:

The test looks good to me. I can't comment on the bugfix patch, but if it's ok to you, you can go ahead :)
[issue4631] urlopen returns extra, spurious bytes
Daniel Diniz aja...@gmail.com added the comment:

Here's a test (in test_urllib2_localnet) that fails before the patch and passes after, mostly lifted from test_httplib:

    def test_chunked(self):
        expected_response = b"hello world"
        chunked_start = (
            b'a\r\n'
            b'hello worl\r\n'
            b'1\r\n'
            b'd\r\n'
        )
        response = [(200, [("Transfer-Encoding", "chunked")], chunked_start)]
        handler = self.start_server(response)
        data = self.urlopen("http://localhost:%s/" % handler.port)
        self.assertEquals(data, expected_response)

Output:

    test test_urllib2_localnet failed -- Traceback (most recent call last):
      File "~/py3k/Lib/test/test_urllib2_localnet.py", line 390, in test_chunked
        self.assertEquals(data, expected_response)
    AssertionError: b'a\r\nhello worl\r\n1\r\nd\r\n' != b'hello world'

To allow this test to work, the attached patch also touches FakeHTTPRequestHandler and TestUrlopen.urlopen.

Added file: http://bugs.python.org/file12975/test_urllib_chunked.diff
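For readers unfamiliar with chunked transfer encoding, the following is a minimal sketch of how the body used in this test decodes to b'hello world'. decode_chunked is a hypothetical helper written for illustration; it is not the code in http.client or in the attached patch, and it ignores trailers.

```python
import io


def decode_chunked(raw: bytes) -> bytes:
    """Decode a chunked HTTP body (illustrative helper, trailers ignored)."""
    stream = io.BytesIO(raw)
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            break  # input ended without a terminating zero-size chunk
        # The chunk size is hexadecimal; drop any optional ";extension" part.
        size_field = size_line.split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        body += stream.read(size)
        stream.readline()  # consume the CRLF that follows the chunk data
    return bytes(body)


print(decode_chunked(b"a\r\nhello worl\r\n1\r\nd\r\n"))  # b'hello world'
```

The size line b'a\r\n' announces a 10-byte chunk ('hello worl') and b'1\r\n' a 1-byte chunk ('d'). The failing assertion above shows exactly these size lines leaking into the data returned by urlopen.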
[issue4631] urlopen returns extra, spurious bytes
Changes by Te-jé Rodgers cont...@tejerodgers.com:

----------
nosy: +trodgers
[issue4631] urlopen returns extra, spurious bytes
Changes by Jean-Paul Calderone exar...@divmod.com:

----------
nosy: -exarkun
[issue4631] urlopen returns extra, spurious bytes
Changes by Craig Holmquist craigh...@gmail.com:

----------
nosy: +craigh
[issue4631] urlopen returns extra, spurious bytes
Changes by Martin v. Löwis mar...@v.loewis.de:

----------
priority: critical -> release blocker
[issue4631] urlopen returns extra, spurious bytes
Antoine Pitrou pit...@free.fr added the comment:

The patch should have at least a test so that we don't have a regression on this one.
[issue4631] urlopen returns extra, spurious bytes
Jeremy Hylton jer...@alum.mit.edu added the comment:

I have a patch here that seems to work for the specific url and that passes all the tests. Can anyone check whether it works for a larger set of cases?

I'm a little concerned because I don't understand the new io library in much detail. There's an override for _checkClosed() in the HTTPResponse that seems a little dodgy. I'll try to get someone to review that specifically.

Added file: http://bugs.python.org/file12361/urllib-chunked.diff
[issue4631] urlopen returns extra, spurious bytes
Daniel Diniz aja...@gmail.com added the comment:

I think your patch is good, but there may be another bug around: I wrote a script to check results of 3.x against 2.x, but many pages (http://groups.google.com/, http://en.wikipedia.org/) give 403: Forbidden for 3.x... but work with 2.x! If you think of this as a bug in 3.x, it could retry the request identifying as 2.x on 403.

Other than that, your patch gives me identical results to 2.5/2.6 for 128 sites I tested (only a read(100) for each).

Interestingly, my patched version gives a file closer to the buggy version in size, at 12700 bytes versus 12707. Your version agrees with 2.x and simple maths (128 x 100) in giving a 12799 bytes result. I have no idea why.

HTH,
Daniel
[issue4631] urlopen returns extra, spurious bytes
Adeodato Simó d...@net.com.org.es added the comment:

> Does the same thing happen with 2.6?

No, I can't reproduce with 2.6.1.
[issue4631] urlopen returns extra, spurious bytes
Changes by Antoine Pitrou pit...@free.fr:

----------
priority:  -> critical
type:  -> behavior
[issue4631] urlopen returns extra, spurious bytes
Resul Cetin resul-ce...@gmx.net added the comment:

I have the same problem with this code (replace USERNAME with your delicious username and PASSWORD with your delicious password):

    import urllib.request
    auth_handler = urllib.request.HTTPBasicAuthHandler()
    auth_handler.add_password('del.icio.us API', 'api.del.icio.us', USERNAME, PASSWORD)
    opener = urllib.request.build_opener(auth_handler)
    print(str(opener.open('https://api.del.icio.us/v1/posts/all').read(20), "utf-8"))

And I don't use a proxy or anything like that. This makes python 3 completely unusable for me. And python 2.6 gives me what I want (the content of that virtual file) without any extra data in front or in the middle of the content.

----------
nosy: +ResulCetin
[issue4631] urlopen returns extra, spurious bytes
Adeodato Simó d...@net.com.org.es added the comment:

> FWIW, there are trailing spurious bytes too

And in the middle of the document as well. Each time there's a chunk, I guess?
[issue4631] urlopen returns extra, spurious bytes
Daniel Diniz aja...@gmail.com added the comment:

Clarifying the diagnosis, the offending spurious bytes are only present when we use 3.0's GET above. That's because urllib.request.HTTPHandler asks for a vanilla http.client.HTTPConnection, which uses HTTP 1.1.

IIUC, either we change the request version back to 1.0 (attached patch) or correct the way the response is processed (is it at all?). I think HTTPSHandler will also suffer from this, perhaps [Fancy]URLopener too.

[Antoine: cool, an edit conflict that agrees with what I was about to post :D]

----------
keywords: +patch
Added file: http://bugs.python.org/file12351/urllib_bytes.diff
[issue4631] urlopen returns extra, spurious bytes
Changes by Jeremy Hylton jer...@alum.mit.edu:

----------
assignee:  -> jhylton
[issue4631] urlopen returns extra, spurious bytes
Jeremy Hylton jer...@alum.mit.edu added the comment:

Brief update: The Python 2.x code works because readline() is provided by socket._fileobject. The Python 3.x code fails because it grabs the HTTPResponse.fp instance variable at the end of AbstractHTTPHandler.do_open. That method needs to pass the response to addinfourl(), but needs to have support for readline / readlines before it can do that.
Re: [issue4631] urlopen returns extra, spurious bytes
Does the same thing happen with 2.6?

Jeremy

On Thu, Dec 11, 2008 at 8:53 AM, Jean-Paul Calderone rep...@bugs.python.org wrote:
>
> Jean-Paul Calderone exar...@divmod.com added the comment:
>
> The f65 is the chunk length for the first chunk returned when requesting
> that URL. A proxy could easily hide this by switching to a different
> transfer encoding.
>
> ----------
> nosy: +exarkun
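As a small worked example of the explanation quoted here: in chunked transfer encoding each chunk begins with its length written in hexadecimal, so the b'f65\r\n' seen in the original report can be read as a size line. parse_chunk_size below is a hypothetical helper for illustration only.

```python
def parse_chunk_size(size_line: bytes) -> int:
    """Return the chunk length encoded in a chunked-encoding size line."""
    # Drop any optional ";extension" part, then strip the trailing CRLF.
    size_field = size_line.split(b";", 1)[0].strip()
    return int(size_field.decode("ascii"), 16)


print(parse_chunk_size(b"f65\r\n"))  # 3941
```

In other words, the first chunk of that response carries 3941 bytes of data, and the size line itself is protocol framing rather than part of the document.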
[issue4631] urlopen returns extra, spurious bytes
New submission from Adeodato Simó [EMAIL PROTECTED]:

This is very odd, but it was reproduced by people in #python as well. Compare, in python 2.5:

    >>> urllib.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readline()
    'From [EMAIL PROTECTED] Tue Dec 11 11:32:47 2001\n'

To the equivalent in python 3.0:

    >>> urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readline()
    b'f65\r\n'

----------
components: Library (Lib)
messages: 77603
nosy: dato
severity: normal
status: open
title: urlopen returns extra, spurious bytes
versions: Python 3.0
[issue4631] urlopen returns extra, spurious bytes
Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment:

I don't reproduce the problem:

    >>> urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readline()
    b'From [EMAIL PROTECTED] Tue Dec 11 11:32:47 2001\n'

I connect through a http proxy.

----------
nosy: +amaury.forgeotdarc