Bug#913530: [Python-modules-team] Bug#913530: crashes because of html5lib incompatibility

2018-11-11 Thread Nicolas Dandrimont
Control: tags -1 + unreproducible moreinfo

Hi!

* Antoine Beaupre  [2018-11-11 17:14:52 -0500]:

> Package: python3-bleach
> Version: 2.1.3-1
> Severity: critical
> 
> In current Debian buster, with the Python 3.6 interpreter, bleach
> completely fails to load as a module:
> 
> $ python3
> Python 3.6.7 (default, Oct 21 2018, 08:08:16) 
> [GCC 8.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import bleach
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python3/dist-packages/bleach/__init__.py", line 8, in <module>
>     from bleach.linkifier import (
>   File "/usr/lib/python3/dist-packages/bleach/linkifier.py", line 7, in <module>
>     from html5lib.filters.sanitizer import allowed_protocols
> ImportError: cannot import name 'allowed_protocols'

I can't reproduce this on an up-to-date sid system, which has at least the
same package versions as yours:

$ python3.6
Python 3.6.7 (default, Oct 21 2018, 08:08:16) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bleach
>>> bleach.__file__
'/usr/lib/python3/dist-packages/bleach/__init__.py'
>>> import html5lib
>>> html5lib.__file__
'/usr/lib/python3/dist-packages/html5lib/__init__.py'
>>> from html5lib.filters.sanitizer import allowed_protocols
>>> allowed_protocols
frozenset({'news', 'callto', 'telnet', 'xmpp', 'rsync', 'data', 'nntp', 'irc', 
'gopher', 'feed', 'ed2k', 'https', 'webcal', 'urn', 'ftp', 'rtsp', 'afs', 
'http', 'tag', 'mailto', 'ssh', 'aim', 'sftp'})

The debci tests for python-readme-renderer, which itself imports bleach, are
also happy (and have always been since the upload of readme-renderer in April).

Could you please show what html5lib.__file__ returns on your system?
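
A minimal sketch of that check, for convenience. The helper names
module_origin and looks_pip_installed are made up for illustration, and the
path test is only a heuristic based on where Debian and pip usually put
things:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def looks_pip_installed(path):
    """Heuristic (assumption): Debian packages live under /usr/lib,
    while pip installs end up under /usr/local or ~/.local."""
    return path is not None and ("/usr/local/" in path or "/.local/" in path)


for name in ("bleach", "html5lib"):
    origin = module_origin(name)
    note = "(possibly pip-installed)" if looks_pip_installed(origin) else ""
    print(name, origin, note)
```

For a top-level name, find_spec only locates the module without importing it,
so this should work even when "import bleach" itself raises.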

> [...]
>
> The simplest fix for this would probably be to upgrade bleach to the
> latest release. Indeed, this command works around the problem
> completely:
> 
> sudo pip install bleach

I have the feeling this may be your problem: don't ever install packages
system-wide with pip, as they will shadow the modules installed by Debian and
will never be upgraded, which leads to all kinds of weirdness.

If my intuition is correct, and an old html5lib has been installed by pip,
you'll want to purge the contents of /usr/local/lib/python3*/dist-packages/ (or
~/.local/lib/python3.6/site-packages) to fix your system.
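
To see whether a second copy is shadowing the Debian one, something along
these lines should do. It is only a sketch: find_copies is a hypothetical
helper, and it only recognises regular packages that have an __init__.py:

```python
import os
import sys


def find_copies(package, search_path=None):
    """List every copy of `package` found on the search path, in import order."""
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        # An empty sys.path entry means the current directory.
        candidate = os.path.join(entry or ".", package)
        if os.path.isfile(os.path.join(candidate, "__init__.py")) and candidate not in hits:
            hits.append(candidate)
    return hits


copies = find_copies("html5lib")
for index, path in enumerate(copies):
    # The first hit is the one "import html5lib" picks up; the rest are shadowed.
    print("used    " if index == 0 else "shadowed", path)
```

If the "used" line points below /usr/local or ~/.local and a "shadowed" line
points at /usr/lib/python3/dist-packages, the pip copy is winning.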

Hope this helps,
-- 
Nicolas Dandrimont

The nice thing about Windows is - It does not just crash, it displays a
dialog box and lets you press 'OK' first.
(Arno Schaefer's .sig)




Bug#913530: crashes because of html5lib incompatibility

2018-11-11 Thread Antoine Beaupre
Package: python3-bleach
Version: 2.1.3-1
Severity: critical

In current Debian buster, with the Python 3.6 interpreter, bleach
completely fails to load as a module:

$ python3
Python 3.6.7 (default, Oct 21 2018, 08:08:16) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bleach
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/bleach/__init__.py", line 8, in <module>
    from bleach.linkifier import (
  File "/usr/lib/python3/dist-packages/bleach/linkifier.py", line 7, in <module>
    from html5lib.filters.sanitizer import allowed_protocols
ImportError: cannot import name 'allowed_protocols'

This wouldn't be such a big problem if bleach weren't imported by other
packages, like readme_renderer, which hooks into distutils when it is
installed. This means that basically *any* setup.py script that looks for
extra packages will crash, completely breaking unrelated software on
Debian (hence the "critical" severity). For example, here's feed2exec
failing to run its test suite under tox:

curie:feed2exec130$ tox
GLOB sdist-make: /home/anarcat/src/feed2exec/setup.py
ERROR: invocation failed (exit code 1), logfile: 
/home/anarcat/src/feed2exec/.tox/log/tox-0.log
ERROR: actionid: tox
msg: packaging
cmdargs: ['/usr/bin/python3', local('/home/anarcat/src/feed2exec/setup.py'), 
'sdist', '--formats=zip', '--dist-dir', 
local('/home/anarcat/src/feed2exec/.tox/dist')]
env: None

running sdist
running egg_info
writing feed2exec.egg-info/PKG-INFO
writing dependency_links to feed2exec.egg-info/dependency_links.txt
writing entry points to feed2exec.egg-info/entry_points.txt
writing requirements to feed2exec.egg-info/requires.txt
writing top-level names to feed2exec.egg-info/top_level.txt
writing manifest file 'feed2exec.egg-info/SOURCES.txt'
running check
Traceback (most recent call last):
  File "setup.py", line 153, in <module>
    classifiers=classifiers,
  File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 140, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/sdist.py", line 52, in run
    self.run_command(cmd_name)
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 972, in run_command
    cmd_obj = self.get_command_obj(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 846, in get_command_obj
    klass = self.get_command_class(command)
  File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 635, in get_command_class
    self.cmdclass[command] = cmdclass = ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2343, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2349, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3/dist-packages/readme_renderer/integration/distutils.py", line 24, in <module>
    from ..rst import render
  File "/usr/lib/python3/dist-packages/readme_renderer/rst.py", line 23, in <module>
    from .clean import clean
  File "/usr/lib/python3/dist-packages/readme_renderer/clean.py", line 18, in <module>
    import bleach
  File "/usr/lib/python3/dist-packages/bleach/__init__.py", line 8, in <module>
    from bleach.linkifier import (
  File "/usr/lib/python3/dist-packages/bleach/linkifier.py", line 7, in <module>
    from html5lib.filters.sanitizer import allowed_protocols
ImportError: cannot import name 'allowed_protocols'

ERROR: FAIL could not package project - v = InvocationError('/usr/bin/python3 
/home/anarcat/src/feed2exec/setup.py sdist --formats=zip --dist-dir 
/home/anarcat/src/feed2exec/.tox/dist (see 
/home/anarcat/src/feed2exec/.tox/log/tox-0.log)', 1)

(And before we point the finger at python3-readme-renderer, let's just
remember it's a dependency of twine, which is an important tool for
talking to PyPI. Uninstalling it is possible, but severely handicaps
developers as well.)
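
For anyone wanting to see which installed packages hook into distutils
this way, here is a rough sketch. It assumes the hooks are registered as
entry points in the "distutils.commands" group, which is what the
setuptools/pkg_resources frames in the traceback above suggest; the
function name distutils_command_plugins is made up for illustration:

```python
from importlib import metadata


def distutils_command_plugins():
    """Return sorted (command, target) pairs registered under "distutils.commands"."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons return an object with a select() method.
        group = eps.select(group="distutils.commands")
    else:
        # Older Pythons (3.8, 3.9) return a plain dict keyed by group name.
        group = eps.get("distutils.commands", [])
    return sorted((ep.name, ep.value) for ep in group)


for command, target in distutils_command_plugins():
    print(command, "->", target)
```

This only reads the installed metadata and does not import the plugins,
so it should keep working while bleach itself fails to load.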

html5lib seems to like to change its public API like this
gratuitously. I've seen similar errors in unrelated packages in my
search for this bug:

https://github.com/ArchiveTeam/wpull/issues/332
https://github.com/tensorflow/tensorboard/issues/588

The former "fixed" the issue by limiting html5lib to pre-1.0
releases, the latter by vendoring html5lib, neither of which seems like a
satisfactory solution.

Upstream bleach also "fixed" this by vendoring html5lib 1.0.1, in
their 3.0 version released earlier in October:

https://github.com/mozilla/bleach/issues/386

I can confirm that the `allowed_protocols` name is not exported by the
1.0.1 version of