Rogério Brito wrote:
> I believe that you meant to file this as a Python bug and I think that the
> severity is, quite frankly, lower than normal...
I don't think this is a Python bug. It's reasonable for Python's gzip
library to fail when presented with corrupted data. It does not know it's
being used to download a URL. Perhaps it should have a mode where it tries
to extract as much data as it can, in case its caller wants to try to be
robust.

I think this is a bug in youtube-dl though, because of this code:

std_headers = {
    ...
    'Accept-Encoding': 'gzip, deflate',
}

if resp.headers.get('Content-encoding', '') == 'gzip':
    content = resp.read()
    gz = gzip.GzipFile(fileobj=io.BytesIO(content), mode='rb')
    try:
        uncompressed = io.BytesIO(gz.read())
    except IOError as original_ioerror:
        # There may be junk add the end of the file
        # See http://stackoverflow.com/q/4928560/35070 for details
        for i in range(1, 1024):
            try:
                gz = gzip.GzipFile(fileobj=io.BytesIO(content[:-i]), mode='rb')
                uncompressed = io.BytesIO(gz.read())
            except IOError:
                continue
            break
        else:
            raise original_ioerror

It's encouraging gzip to be used (rather than deflate or no compression),
and it already contains workarounds for similar problems. This code smells.
There is probably a Python library that implements this robustly.

I tried python-urllib3:

joey@darkstar:~>python
Python 2.7.14 (default, Sep 17 2017, 18:50:44)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> headers = {'Accept-Encoding': 'gzip'}
>>> r = http.request('GET', 'http://www.debian.org/', headers=headers)
>>> r.headers.get("Content-Encoding")
'gzip'
>>> len(r.data)
14871

So that seems to work. I think that's because it uses zlib to decompress
the data, not gzip.
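If youtube-dl wanted to keep doing the decompression itself rather than
switch libraries, something like this untested sketch of the zlib approach
ought to be more robust than retrying GzipFile in a loop (gunzip_best_effort
is just a name I made up, not anything from youtube-dl or urllib3):

import io
import zlib

def gunzip_best_effort(data):
    # wbits=16+MAX_WBITS makes zlib parse the gzip wrapper itself.
    # Unlike gzip.GzipFile, the decompressor stops at the end of the
    # compressed stream: trailing junk is left in d.unused_data rather
    # than raising an error, and a truncated stream simply yields
    # whatever data could be decoded.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return d.decompress(data) + d.flush()

# With the same `content` bytes as in the youtube-dl code above:
uncompressed = io.BytesIO(gunzip_best_effort(content))

That would replace the whole 1024-iteration retry loop.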
-- 
see shy jo