OK, I am no longer interested in this topic. If you can't reach agreement, so be it, and then the status quo prevails. I am going to mute this thread. There's no need to explain to me why I am wrong.
On Wed, Jan 31, 2018 at 9:48 AM, Serhiy Storchaka <storch...@gmail.com> wrote:

> 31.01.18 18:36, Guido van Rossum wrote:
>
>> On Wed, Jan 31, 2018 at 3:03 AM, Serhiy Storchaka <storch...@gmail.com
>> <mailto:storch...@gmail.com>> wrote:
>>
>>> 19.01.18 05:51, Guido van Rossum wrote:
>>>
>>>> Can someone explain to me why this is such a controversial issue?
>>>>
>>>> It seems reasonable to me to add new encodings to the stdlib
>>>> that do the roundtripping requested in the first message of the
>>>> thread. As long as they have new names that seems to fall under
>>>> "practicality beats purity". (Modifying existing encodings seems
>>>> wrong -- did the feature request somehow transmogrify into that?)
>>>
>>> In any case you need to change your code. If you add a new error
>>> handler, you need to change the decoding code to use it:
>>>
>>>     text = data.decode(encoding, 'whatwgreplace')
>>>
>>> If you add new encodings, you need to maintain an alias table that
>>> maps standard encoding names to the corresponding WHATWG encoding
>>> names:
>>>
>>>     aliases = {'windows_1252': 'windows-1252-whatwg',
>>>                'windows_1251': 'windows-1251-whatwg',
>>>                'utf_8': 'utf-8-whatwg',  # utf-8 + surrogatepass
>>>                ...
>>>                }
>>>     ...
>>>     text = data.decode(aliases.get(normalize_encoding(encoding),
>>>                                    encoding))
>>>
>>> I don't see an advantage of the second approach for the end user.
>>> And of course it is more costly for maintainers, because we will
>>> need to implement around 20 new encodings, and it adds a cognitive
>>> burden for new Python users, who now have more tables of encodings
>>> in the documentation.
>>
>> Hm. As a user, unless I run into problems with a specific encoding, I
>> never care about how many encodings we have, so I don't see how adding
>> extra encodings bothers those users who have no need for them.
>
> The codecs module documentation contains several tables of encodings:
> standard encodings, Python-specific text encodings, binary transforms and
> text transforms (a single one). This will add yet another large table. A
> user learning Python will need to learn how these encodings differ from
> the others and how to use them correctly. A new user doesn't know what
> is important for them and what they can ignore until they need it (or
> how to tell that they need it).
>
>> There's a reason to prefer new encoding names (maybe augmented with alias
>> table) over a new error handler: there are lots of places where encodings
>> are passed around via text files, Internet protocols, RPC calls, layers and
>> layers of function calls. Many of these treat the encoding as a string, not
>> as a (string, errorhandler) pair. So there may be situations where there is
>> no way in a given API to preserve the need for using a special error
>> handler, while the API would not have a problem preserving just the
>> encoding name.
>
> The encoding name that is passed differs from the name of the new Python
> encoding. It is just 'windows-1252', not 'windows-1252-whatwg'. If we just
> change the existing encoding, this can break other code that expects the
> standard 'windows-1252'. Thus every time you need 'windows-1252-whatwg'
> instead of the 'windows-1252' passed with the text, you need to map
> encoding names. How does this differ from using a special error handler?
>
> One more problem is that we actually need two error handlers. WHATWG
> specifies two behaviors for unmapped codes outside of the C0-C1 range:
> replacing with a special character, or raising an error. This corresponds
> to the standard Python handlers 'replace' and 'strict'. Thus we need to
> add either two new error handlers, 'whatwgreplace' and 'whatwgstrict', or
> *two* sets of new encodings (more than 70 encodings in total!).
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
--Guido van Rossum (python.org/~guido)
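
For readers comparing the two approaches discussed in the quoted message, here is a minimal sketch of both, not a proposed stdlib implementation. The handler name 'whatwgreplace' and the codec name 'windows-1252-whatwg' are taken from the thread, but neither exists in Python; the handler body, the `_search` function, `ALIASES`, and `decode_via_alias` are hypothetical and only approximate the behavior described above (unmapped C1 bytes map to the same code points, anything else to U+FFFD).

```python
import codecs
from encodings import normalize_encoding


def whatwg_replace(exc):
    """Hypothetical handler: keep unmapped C1 bytes as the same code points."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only covers decoding
    bad = exc.object[exc.start:exc.end]
    # Bytes 0x80-0x9F become U+0080-U+009F; anything else becomes U+FFFD.
    text = "".join(chr(b) if 0x80 <= b <= 0x9F else "\ufffd" for b in bad)
    return text, exc.end


# Approach 1: a new error handler, named as in the quoted example.
codecs.register_error("whatwgreplace", whatwg_replace)


def _search(name):
    """Approach 2: expose a hypothetical 'windows-1252-whatwg' codec."""
    # Newer Pythons pass the name with underscores, older ones with hyphens.
    if name.replace("-", "_") != "windows_1252_whatwg":
        return None
    base = codecs.lookup("cp1252")

    def decode(data, errors="strict"):
        # Simplification: the caller's `errors` argument is ignored.
        return base.decode(data, "whatwgreplace")

    return codecs.CodecInfo(encode=base.encode, decode=decode,
                            name="windows-1252-whatwg")


codecs.register(_search)

# Alias table from standard names to the hypothetical WHATWG variants.
ALIASES = {"windows_1252": "windows-1252-whatwg"}


def decode_via_alias(data, encoding):
    """Map a standard encoding name through the alias table, then decode."""
    key = normalize_encoding(encoding).lower()
    return data.decode(ALIASES.get(key, encoding))


raw = b"caf\xe9 \x81"  # 0x81 is unmapped in Python's cp1252
via_handler = raw.decode("windows-1252", "whatwgreplace")
via_alias = decode_via_alias(raw, "windows-1252")
```

In this sketch both calls produce the same text. The difference is where the choice lives: with the handler, every call site must pass the extra argument, while with the codec name the choice travels as a single string, at the cost of mapping 'windows-1252' to the new name somewhere.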