Marc-Andre Lemburg added the comment:

On 27.12.2015 02:05, Serhiy Storchaka wrote:
>>> I wonder why this does not trigger the exception.
>
> Because in case of utf-8 and iso-8859-1 decoding and encoding steps are
> omitted.
>
> In the general case the input is decoded from the specified encoding and
> then encoded to UTF-8 for the parser. But for the utf-8 and iso-8859-1
> encodings the parser gets the raw data.
Right, but since the tokenizer doesn't know about "utf8", it should reach
out to the codec registry to get a properly encoded version of the source
code (even though this is an unnecessary round-trip).

There are a few other aliases for UTF-8 which would likely trigger the
same problem:

    # utf_8 codec
    'u8'        : 'utf_8',
    'utf'       : 'utf_8',
    'utf8'      : 'utf_8',
    'utf8_ucs2' : 'utf_8',
    'utf8_ucs4' : 'utf_8',

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25937>
_______________________________________
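As a side note, the codec-registry lookup described above is easy to demonstrate: a minimal sketch showing how codecs.lookup() resolves these aliases to the canonical codec name (normalized_encoding is a hypothetical helper, not anything in the tokenizer):

```python
import codecs

def normalized_encoding(name):
    # codecs.lookup() consults the codec registry (including the alias
    # table from Lib/encodings/aliases.py) and returns a CodecInfo
    # object whose .name attribute is the canonical encoding name.
    return codecs.lookup(name).name

# All of these aliases resolve to the same canonical codec:
for alias in ("u8", "utf", "utf8", "UTF-8"):
    print(alias, "->", normalized_encoding(alias))
```

Each of the aliases above resolves to "utf-8", which is why comparing the raw encoding string (as the tokenizer does) misses them while a registry lookup would not.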