[issue9769] PyUnicode_FromFormatV() doesn't handle non-ascii text correctly

Alexander Belopolsky Fri, 19 Nov 2010 11:42:59 -0800

Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:


I don't understand Victor's argument in msg115889.  According to the UTF-8 RFC, 
<http://www.ietf.org/rfc/rfc2279.txt>:

   -  US-ASCII values do not appear otherwise in a UTF-8 encoded
      character stream.  This provides compatibility with file systems
      or other software (e.g. the printf() function in C libraries) that
      parse based on US-ASCII values but are transparent to other
      values.

This means that printf-like formatters should not care whether the format 
string is in UTF-8, Latin-1, or any other ASCII-compatible 8-bit encoding.  
(Passing in a multibyte encoding pretending to be bytes would of course lead to 
havoc, but the C type system will protect you from that.)
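
To illustrate the point, here is a minimal sketch of a byte-wise scan for the 
'%' character.  The helper name count_percent_bytes is hypothetical and is not 
part of CPython; it only shows that a scanner comparing bytes against ASCII 
values sees the same conversion specifiers whether the surrounding non-ASCII 
text is UTF-8 or Latin-1, because every byte of a multibyte UTF-8 sequence is 
0x80 or above.

```c
#include <stddef.h>

/* Hypothetical helper (not a CPython function): count the '%' bytes in a
 * NUL-terminated format string by looking at raw bytes only.
 * Bytes >= 0x80 are skipped untouched, so they can never be mistaken
 * for '%' (0x25) or any other ASCII character. */
static size_t count_percent_bytes(const char *fmt)
{
    size_t n = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)fmt; *p != '\0'; p++) {
        if (*p == '%')
            n++;
    }
    return n;
}
```

Called on "caf\xc3\xa9 %s" (UTF-8) and on "caf\xe9 %s" (Latin-1), the scan 
finds exactly one '%' in both cases.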

It is also fairly simple to sanity-check for UTF-8 if necessary, but in the case of 
PyUnicode_FromFormat, the resulting string will be decoded as UTF-8, so all 
characters in the format string will be checked anyway.
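
For reference, a sanity check along those lines could look roughly like the 
sketch below.  The function name looks_like_utf8 is made up for this example, 
and the limits it enforces (at most four bytes per character, no overlong 
forms, no surrogates) are my assumption of what "sane" should mean here; this 
is not what PyUnicode_FromFormat does internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a CPython function): return true if the
 * NUL-terminated string s is well-formed UTF-8. */
static bool looks_like_utf8(const char *s)
{
    /* Smallest code point that legitimately needs 1, 2 or 3
     * continuation bytes; anything lower is an overlong encoding. */
    static const unsigned long min_cp[] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        size_t need, i;
        unsigned long cp;

        if (*p < 0x80) {                 /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {
            need = 1; cp = *p & 0x1F;
        } else if ((*p & 0xF0) == 0xE0) {
            need = 2; cp = *p & 0x0F;
        } else if ((*p & 0xF8) == 0xF0) {
            need = 3; cp = *p & 0x07;
        } else {
            return false;                /* stray continuation or bad lead byte */
        }
        p++;

        for (i = 0; i < need; i++, p++) {
            /* A NUL terminator fails this test too, so a truncated
             * sequence is rejected without reading past the end. */
            if ((*p & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[need] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                /* overlong, out of range, or surrogate */
    }
    return true;
}
```

With this check, "caf\xc3\xa9 %s" is accepted, while the Latin-1 bytes 
"caf\xe9 %s" are rejected because 0xE9 announces a three-byte sequence that 
never follows.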

Am I missing something?

----------
nosy: +belopolsky
status: closed -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9769>
_______________________________________