STINNER Victor <victor.stin...@haypocalc.com> added the comment:

On Friday 19 November 2010 20:42:53 you wrote:
> Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
>
> I don't understand Victor's argument in msg115889. According to UTF-8 RFC,
> <http://www.ietf.org/rfc/rfc2279.txt>:
>
>   - US-ASCII values do not appear otherwise in a UTF-8 encoded
>     character stream.  This provides compatibility with file systems
>     or other software (e.g. the printf() function in C libraries) that
>     parse based on US-ASCII values but are transparent to other
>     values.
Most C functions, including printf(), work on multi*byte* strings, not on (wide) character strings. PyUnicode_FromFormatV(), by contrast, converts the format string (bytes) to unicode (characters). As a comparison in C, it's like doing printf() and mbstowcs() in the same function.

> This means that printf-like formatters should not care whether the format
> string is in UTF-8, Latin1, or any other ASCII-compatible 8-bit encoding.

That may be true with bytes input and bytes output (e.g. PyString_FromFormatV() in Python 2), but it is no longer true with bytes input and str output (e.g. PyUnicode_FromFormatV() in Python 3).

> It is also fairly simple to sanity-check for UTF-8 if necessary, but in
> case of PyUnicode_FromFormat, the resulting string will be decoded as
> UTF-8, so all characters in the format string will be checked anyways.

I chose to use ASCII instead of UTF-8 because a UTF-8 decoder is long (210 lines) and complex (see PyUnicode_DecodeUTF8Stateful()), whereas ASCII decoding is just "unicode_char = (Py_UNICODE)byte;" preceded by an if that checks 0 <= byte <= 127.

Nobody noticed my change simply because the whole Python code base only passes ASCII strings as the format argument of PyUnicode_FromFormatV().

Victor

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9769>
_______________________________________