STINNER Victor added the comment:

> The reason for the problem is the UTF-8 decoder (and other
> decoders) expecting an extension to the codec decoder API,
> which are not implemented in its StreamReader class (it simply
> uses the base class). It's not a problem of the base class, but
> that of the codec.
>
> And no: it doesn't have anything to do with codec.open()
> or the StreamReaderWriter class.
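For context, here is a minimal sketch of what an incremental decoder does when a multibyte UTF-8 sequence is split across two reads. The sample character and the variable names are assumptions chosen for illustration; they are not taken from the issue itself.

```python
import codecs

# "€" encodes to three bytes in UTF-8: b'\xe2\x82\xac'.
data = "€".encode("utf-8")
first, second = data[:2], data[2:]   # simulate a read boundary inside the character

# An incremental decoder keeps the incomplete sequence in its internal state.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
part1 = decoder.decode(first)                # nothing can be emitted yet
part2 = decoder.decode(second, final=True)   # the sequence is now complete

# A stateless decode of the same truncated chunk has no state to fall back on.
try:
    first.decode("utf-8")
    stateless_failed = False
except UnicodeDecodeError:
    stateless_failed = True

print(repr(part1), repr(part2), stateless_failed)
```

The point of the sketch is only that the decoder state, not the caller, carries the partial character between calls.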
open("document.txt", encoding="utf-8") uses the IncrementalDecoder of encodings.utf_8. This object doesn't seem to have the issue under discussion.

IMHO the issue is that StreamReader doesn't use an incremental decoder. I don't see how it could support multibyte encodings and error handlers without one.

I like the TextIOWrapper design because it only handles codecs and text buffering. Bytes buffering is done at a lower level, in a different object. I'm not comfortable modifying StreamReader because it combines the roles of TextIOWrapper and BufferedReader, and so is more complex.

>> I propose to modify codecs.open() to reuse the io module: call io.open()
>> with newline=''. The io module is now battle-tested and handles well many
>> corner cases of incremental codecs with multibyte encodings.
>
> -1. People who want to use the io module should use it directly.

When porting code to Python 3, many people chose codecs.open() to get text files with a single code base for Python 2 and Python 3. Once the code is ported, I don't expect anyone to replace codecs.open() with io.open(). You know, nobody cares about technical debt...

>> The next step would be to deprecate the codecs.StreamReaderWriter class and
>> the codecs.open(). But my latest attempt to deprecate them was the PEP 400
>> and it wasn't a full success, so I now prefer to move step by step :-)
>
> I'm still -1 on the deprecations in PEP 400. You are essentially
> suggesting to replace the complete codecs subsystem with the
> io module, but forgetting that all codecs use StreamWriter and
> StreamReader as base classes.

Can you elaborate on "all codecs use StreamWriter and StreamReader as base classes"? Only codecs.open() uses StreamReader and StreamWriter, no? All codecs implement a StreamReader and a StreamWriter class, but my question is: how are these classes used?

> The codecs sub system has a clean design. If used correctly
> and maintained with more care, it works really well.
It seems like we lack such a maintainer: since I wrote the PEP, many issues are still open:

http://bugs.python.org/issue7262
http://bugs.python.org/issue8630
http://bugs.python.org/issue10344
http://bugs.python.org/issue12508
http://bugs.python.org/issue12512

See also issue #5445 (wontfix, whereas TextIOWrapper.writelines() uses "for line in lines") and issue #12513 (this one is not fair, io also has the same bug: issue #12215 :-)).

> I'm tired of having to fight these fights every few years.
> Can't we just stop having them, please ?

The status quo is to do nothing, but as a consequence, these bugs remain unfixed and users are still affected by them :-( I'm trying to find a solution.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29783>
_______________________________________