Package: youtube-dl
Version: 2017.05.18.1-1
Severity: normal

joey@darkstar:~>youtube-dl http://debian.org/
[generic] debian: Requesting header
[redirect] Following redirect to http://www.debian.org/
[generic] www.debian: Requesting header
WARNING: Falling back on generic information extractor.
[generic] www.debian: Downloading webpage
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2017.5.18.1', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3/dist-packages/youtube_dl/__init__.py", line 465, in main
    _real_main(argv)
  File "/usr/lib/python3/dist-packages/youtube_dl/__init__.py", line 455, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1896, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 771, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 832, in process_ie_result
    extra_info=extra_info)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 760, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/generic.py", line 1942, in _real_extract
    full_response = self._request_webpage(request, video_id)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 502, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 2106, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/lib/python3/dist-packages/youtube_dl/utils.py", line 981, in http_response
    uncompressed = io.BytesIO(gz.read())
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/gzip.py", line 480, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
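The final EOFError in the traceback can be reproduced offline. This is a minimal sketch under a loudly stated assumption: that the proxy sends a gzip stream which was flushed with zlib's Z_SYNC_FLUSH but never finished (no final block, no gzip trailer). That is a guess about the proxy, not something confirmed here. make_unterminated_gzip and read_like_youtube_dl are hypothetical helper names; the second one only mirrors the GzipFile(...).read() call pattern visible in the traceback.

```python
import gzip
import io
import zlib


def make_unterminated_gzip(payload: bytes) -> bytes:
    """Hypothetical helper: gzip-compress payload but never finish the stream.

    Assumption: this mimics what the transparent proxy might be sending.
    """
    # 16 + MAX_WBITS tells zlib to wrap the DEFLATE data in a gzip header/trailer.
    comp = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    # Z_SYNC_FLUSH emits all pending output plus an empty stored block
    # (bytes 00 00 FF FF), but writes neither the final block nor the trailer.
    return comp.compress(payload) + comp.flush(zlib.Z_SYNC_FLUSH)


def read_like_youtube_dl(data: bytes) -> bytes:
    """Hypothetical helper: same call pattern as the traceback."""
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gz:
        return gz.read()


if __name__ == "__main__":
    body = b"<html>" + b"debian " * 2000 + b"</html>"
    data = make_unterminated_gzip(body)
    print(data[-4:].hex())  # the trailing sync-flush marker
    try:
        read_like_youtube_dl(data)
    except EOFError as exc:
        print("EOFError:", exc)
```

Running it prints the trailing sync-flush bytes and then the same EOFError message seen in the traceback, even though every byte of the payload was decodable.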
I'm able to reproduce this bug over an Excede satellite internet connection, but not from a VPS. There's some transparent proxying involved, which is apparently confusing the gzip Content-Encoding support in youtube-dl. (I have not seen the transparent proxying cause any problems with other programs.) Only http URLs cause the problem, since https bypasses the transparent proxy.

I edited the code to dump out the gzip-compressed content it received before it tries to decompress it.

joey@darkstar:~>file dump
dump: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)
joey@darkstar:~>ls -l dump
-rw-r--r-- 1 joey joey 4744 Sep 4 21:00 dump
joey@darkstar:~>zcat < dump > data
gzip: stdin: unexpected end of file
joey@darkstar:~>curl --compressed -so raw http://www.debian.org/
joey@darkstar:~>cmp data raw
joey@darkstar:~>

So it has apparently downloaded a gzip-compressed chunk of data that contains the whole page, but the gzip data is somehow shady, although not in a way that prevents decompressing the whole page content.

I've attached the `dump` file to this bug report. I've also attached a `wireshark.pcapng` capture, which has the curl traffic first, followed by the youtube-dl traffic.

I suspect that the gzip-compressed data is missing its gzip footer. Normally, the last 8 bytes of `dump` would be the gzip footer. Those are:

    93 C6 FF 00 00 00 FF FF

If that were a footer, the size field would be 00 00 FF FF (0xFFFF0000, since gzip stores the size little-endian), which is not the actual size. And changing any of these bytes except the last one exposes parts of the compression dictionary, so they must not be part of a footer; they seem instead to be part of the DEFLATE data.

Similarly, the last 8 bytes of the HTTP response to curl are:

    9D 7B D7 66 E6 3B 00 00

which again does not look like a gzip footer. curl seems to follow Postel's law in handling this, so perhaps youtube-dl should too?
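Following the Postel's-law suggestion, a decoder could accept a gzip body that simply stops early, as long as the bytes it did receive decode cleanly. This is a minimal sketch, not youtube-dl's actual code; lenient_gunzip and trailer_matches are hypothetical names. It relies on zlib.decompressobj, which, unlike gzip.GzipFile.read, returns whatever it could decompress without raising when the end-of-stream marker is missing. One plausible reading of the dump, offered only as an assumption: its last four bytes, 00 00 FF FF, are exactly the empty stored block that zlib's Z_SYNC_FLUSH appends, which would mean the proxy flushed its compressor but never finished the stream.

```python
import gzip
import struct
import zlib
from typing import Tuple


def lenient_gunzip(data: bytes) -> Tuple[bytes, bool]:
    """Hypothetical helper: decode a gzip body, tolerating a missing end.

    Returns (decoded_bytes, complete); complete is True only if the DEFLATE
    end-of-stream marker and a valid gzip trailer were both seen.
    A corrupt header or corrupt DEFLATE data still raises zlib.error.
    """
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect the gzip wrapper
    out = d.decompress(data)
    out += d.flush()  # does not raise if the stream is merely truncated
    return out, d.eof


def trailer_matches(data: bytes, decoded: bytes) -> bool:
    """Check whether the last 8 bytes form a valid gzip trailer for decoded.

    RFC 1952 trailer: CRC32, then ISIZE (length mod 2**32), both little-endian.
    """
    if len(data) < 8:
        return False
    crc, isize = struct.unpack("<II", data[-8:])
    return crc == zlib.crc32(decoded) and isize == len(decoded) % 2**32


if __name__ == "__main__":
    body = b"<html>" + b"debian " * 2000 + b"</html>"
    good = gzip.compress(body)
    # Assumed proxy behaviour: sync-flushed, never finished.
    comp = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    bad = comp.compress(body) + comp.flush(zlib.Z_SYNC_FLUSH)
    for name, data in (("complete", good), ("unterminated", bad)):
        decoded, complete = lenient_gunzip(data)
        print(name, decoded == body, complete, trailer_matches(data, decoded))
```

Returning the complete flag lets a caller accept the body while still logging a warning, rather than silently treating it as whole. Note that a decoder cannot tell a missing footer from a page that was genuinely cut off mid-stream; the comparison with curl's output above is what shows the content is complete in this case.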
-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.11.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages youtube-dl depends on:
ii  dpkg                   1.18.24
ii  python3                3.5.3-3
ii  python3-pkg-resources  36.2.7-2

Versions of packages youtube-dl recommends:
ii  aria2            1.32.0-1
ii  ca-certificates  20170717
ii  curl             7.55.1-1
ii  ffmpeg           7:3.3.3-3
ii  libav-tools      7:3.3.3-3
ii  mplayer          2:1.3.0-6+b4
ii  mpv              0.26.0-3
ii  rtmpdump         2.4+20151223.gitfa8646d.1-1+b1
ii  wget             1.19.1-4

youtube-dl suggests no packages.

-- no debconf information

-- 
see shy jo