Package: youtube-dl
Version: 2017.05.18.1-1
Severity: normal

joey@darkstar:~>youtube-dl  http://debian.org/
[generic] debian: Requesting header
[redirect] Following redirect to http://www.debian.org/
[generic] www.debian: Requesting header
WARNING: Falling back on generic information extractor.
[generic] www.debian: Downloading webpage
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2017.5.18.1', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3/dist-packages/youtube_dl/__init__.py", line 465, in main
    _real_main(argv)
  File "/usr/lib/python3/dist-packages/youtube_dl/__init__.py", line 455, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1896, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 771, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 832, in process_ie_result
    extra_info=extra_info)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 760, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/generic.py", line 1942, in _real_extract
    full_response = self._request_webpage(request, video_id)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 502, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 2106, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/lib/python3/dist-packages/youtube_dl/utils.py", line 981, in http_response
    uncompressed = io.BytesIO(gz.read())
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/gzip.py", line 480, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

I'm able to reproduce this over an Exede satellite internet connection,
but not from a VPS. There's some transparent proxying involved,
which is apparently confusing the gzip Content-Encoding support in
youtube-dl. (I have not seen the transparent proxying cause any
problems with other programs.) Only http URLs cause the problem,
since https bypasses the transparent proxy.

I edited the code to dump out the gzip compressed content it received
before it tries to decompress it.

joey@darkstar:~>file dump
dump: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)
joey@darkstar:~>ls -l dump
-rw-r--r-- 1 joey joey 4744 Sep  4 21:00 dump
joey@darkstar:~>zcat < dump > data
gzip: stdin: unexpected end of file
joey@darkstar:~>curl --compressed -so raw http://www.debian.org/
joey@darkstar:~>cmp data raw
joey@darkstar:~>

So, it has apparently downloaded a gzip compressed chunk of data
which contains the whole page, but the gzip stream is somehow malformed,
although not in a way that prevents decompressing the whole page
content. I've attached the `dump` file to this bug report.

I've also attached a `wireshark.pcapng` which has the curl traffic
first followed by youtube-dl.

I suspect that the gzip compressed data has a missing gzip footer.
Normally, the last 8 bytes of `dump` would be the gzip footer. Those are:
93 C6 FF 00 00 00 FF FF
If that were a footer, the size field (stored little-endian) would be
0xFFFF0000, which is not the actual size. And, changing any of these
bytes except for the last one
exposes parts of the compression dictionary, so they must not be
part of the footer, and seem to instead be part of the DEFLATE data.

Similarly, in the HTTP response to curl, the last 8 bytes are
9D 7B D7 66 E6 3B 00 00
which again does not look like a gzip footer.
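
For reference, here is a rough sketch of the footer check I did by hand,
assuming the standard gzip layout (last 8 bytes are CRC32 followed by
ISIZE, both little-endian). The function name `footer_matches` is made up
for illustration and is not part of youtube-dl.

```python
import struct
import zlib


def footer_matches(blob, data):
    """Return True if the last 8 bytes of `blob` are a valid gzip footer
    for the already-decompressed bytes `data`.

    Hypothetical helper for illustration only. The gzip footer is the
    CRC32 of the uncompressed data followed by its length modulo 2**32,
    both stored as little-endian 32-bit integers.
    """
    if len(blob) < 8:
        return False
    crc, isize = struct.unpack('<II', blob[-8:])
    return (crc == (zlib.crc32(data) & 0xFFFFFFFF)
            and isize == (len(data) & 0xFFFFFFFF))


# The last 8 bytes of `dump`, as quoted above.
tail = bytes.fromhex('93C6FF000000FFFF')
crc, isize = struct.unpack('<II', tail)
print(hex(crc), hex(isize))  # prints 0xffc693 0xffff0000
```

Read that way, the size field would claim roughly 4 GB of uncompressed
data, which a single web page clearly is not.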

curl seems to follow Postel's law in handling this, so perhaps youtube-dl
should too?
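
A minimal sketch of what a more tolerant decoder could look like, assuming
it is acceptable to return whatever decompresses cleanly when the stream
ends early. `read_gzip_body` is a hypothetical name, not the actual
youtube-dl code; only the standard library `gzip`, `io` and `zlib` modules
are used.

```python
import gzip
import io
import zlib


def read_gzip_body(content):
    """Decompress a gzip HTTP body, tolerating a missing end-of-stream
    marker or footer (hypothetical helper, for illustration only)."""
    try:
        # Strict path: fails with EOFError on a truncated stream.
        return gzip.GzipFile(fileobj=io.BytesIO(content), mode='rb').read()
    except EOFError:
        # Lenient path: 16 + MAX_WBITS makes zlib expect a gzip header.
        # decompress() returns the data it could decode and does not
        # raise when the input simply stops early.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        return d.decompress(content)
```

The strict path is tried first so that well-formed responses are still
fully verified; the fallback only applies when the stream ends before the
end-of-stream marker. A caller that cares could also check the
decompressor's `eof` attribute to log that the stream was incomplete.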

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.11.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages youtube-dl depends on:
ii  dpkg                   1.18.24
ii  python3                3.5.3-3
ii  python3-pkg-resources  36.2.7-2

Versions of packages youtube-dl recommends:
ii  aria2            1.32.0-1
ii  ca-certificates  20170717
ii  curl             7.55.1-1
ii  ffmpeg           7:3.3.3-3
ii  libav-tools      7:3.3.3-3
ii  mplayer          2:1.3.0-6+b4
ii  mpv              0.26.0-3
ii  rtmpdump         2.4+20151223.gitfa8646d.1-1+b1
ii  wget             1.19.1-4

youtube-dl suggests no packages.

-- no debconf information

-- 
see shy jo

Attachment: dump
Description: Binary data

Attachment: wireshark.pcapng
Description: Binary data

Attachment: signature.asc
Description: PGP signature
