2013/11/16 Nick Coghlan <ncogh...@gmail.com>:
> To address Serhiy's security concerns with the compression codecs (which are
> technically independent of the question of restoring the aliases), I also
> plan to document how to systematically blacklist particular codecs in an
> application by setting attributes on the encodings module and/or appropriate
> entries in sys.modules.

It would be simpler and safer to blacklist bytes=>bytes and str=>str codecs
from bytes.decode() and str.encode() directly. Marc Andre Lemburg proposed
adding new attributes to CodecInfo to specify the input and output types.

> The only functional *change* I'd still like to make for 3.4 is to restore
> the shorthand aliases for the non-Unicode codecs (to ease the migration for
> folks coming from Python 2), but this thread has convinced me I likely need
> to write the PEP *before* doing that, and I still have to integrate
> ensurepip into pyvenv before the beta 1 deadline.
>
> So unless you and Victor are prepared to +1 the restoration of the codec
> aliases (closing issue 7475) in anticipation of that codecs infrastructure
> documentation PEP, the change to restore the aliases probably won't be in
> 3.4. (I *might* get the PEP written in time regardless, but I'm not betting
> on it at this point).

Using the StackOverflow search engine, I found some posts where people ask
for a "hex" codec on Python 3. There are two answers: use the binascii module
or use codecs.encode(). So even though codecs.encode() was never documented,
it looks like it is used. I now agree that documenting it would not make
the situation worse.

Adding transform()/untransform() methods to bytes and str is a non-trivial
change and not everybody likes them. Anyway, it's too late for Python 3.4.

In my opinion, the best option is to add new input_type/output_type
attributes to CodecInfo right now, and modify the codecs so that
"abc".encode("hex") raises a LookupError (instead of a tricky error message
with some evil low-level hacks on the traceback and the exception, which was
my initial concern in this mail thread). It also fixes the security
vulnerability.

To keep backward compatibility (even with custom codecs registered
manually), if input_type/output_type is not defined, we should consider
that the codec is a classical text encoding (encode str=>bytes, decode
bytes=>str).
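
To make the idea concrete, here is a rough sketch in Python of how such a
check could work. The input_type/output_type attribute names come from the
proposal above; is_text_encoding(), str_encode() and the hand-built hex_info
object are hypothetical names used only for illustration, and this is not
the actual patch.

```python
import codecs


def is_text_encoding(info):
    """Return True if the codec encodes str=>bytes.

    Hypothetical helper: a codec without input_type/output_type is
    treated as a classical text encoding, for backward compatibility.
    """
    input_type = getattr(info, "input_type", str)
    output_type = getattr(info, "output_type", bytes)
    return input_type is str and output_type is bytes


def str_encode(text, info):
    """Hypothetical stand-in for what str.encode() would do after lookup."""
    if not is_text_encoding(info):
        raise LookupError("%r is not a text encoding" % info.name)
    encoded, _consumed = info.encode(text)
    return encoded


# Build a private CodecInfo for the hex codec and tag it as bytes=>bytes.
# (Assumption: the real change would set these inside the codec module.)
_hex = codecs.lookup("hex_codec")
hex_info = codecs.CodecInfo(encode=_hex.encode, decode=_hex.decode, name="hex")
hex_info.input_type = bytes
hex_info.output_type = bytes

# An untagged codec, standing in for any existing or custom text encoding.
utf8_info = codecs.lookup("utf-8")
```

With this sketch, str_encode("abc", utf8_info) returns b"abc" as before,
while str_encode("abc", hex_info) raises LookupError without ever calling
the codec. The codec is still usable on bytes through hex_info.encode().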

The type of the codecs.encode() result is my least concern in this topic.

I created the following issue to implement my idea:
http://bugs.python.org/issue19619

Victor