[issue9198] Should repr() print unicode characters outside the BMP?

Ezio Melotti Thu, 08 Jul 2010 03:24:50 -0700

Ezio Melotti <ezio.melo...@gmail.com> added the comment:

Regarding the fonts, I think that whoever actually uses or needs to use 
characters outside the BMP probably has (now, or will have in a few 
months/years) a font able to display them.
I also tried to print the printable chars from U+FFFF to U+1FFFF on my Linux 
terminal and about half of them were rendered correctly (the default font is 
DejaVu Sans Mono).


The question then is whether we do more harm by hiding these chars behind 
escape sequences for the people who use them, or by hiding the escape 
sequences behind boxes for the people who don't use them and don't have the 
right font.


Regarding the categories that should be considered printable, I agree that the 
Zx categories could be considered printable, so the non-printable chars could 
be limited to the Cx categories.
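
As a rough illustration of that rule (a sketch only, not part of any patch), 
the check could be written with unicodedata.category(). The helper name 
is_printable_by_category is hypothetical; str.isprintable() is shown next to 
it because it treats the Zx categories (except the ASCII space) as 
non-printable.

```python
import unicodedata

def is_printable_by_category(ch):
    """Hypothetical rule: only the C* (Other) categories are non-printable."""
    return not unicodedata.category(ch).startswith("C")

samples = ("a", "\u00e9", "\u00a0", "\u2028", "\n", "\u200b", "\U00010000")
for ch in samples:
    # ascii() keeps the output safe on terminals that cannot encode the char.
    print(ascii(ch), unicodedata.category(ch),
          is_printable_by_category(ch), ch.isprintable())
```

The two checks disagree only on separators such as U+00A0 (Zs) and U+2028 
(Zl); control and format characters are rejected by both.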

> Since we can't apply this check based on a per character basis,
> I think we should only allow non-ASCII code points to pass through
> if sys.stdout/sys.stderr is set to utf-8, utf-16 or utf-32.

If I understood correctly, you are suggesting to look at the 
sys.stdout/sys.stderr encoding and:
 * if it's a UTF-* encoding: allow all the non-ASCII (printable) codepoints 
(because they are the only encodings that can represent all the Unicode 
characters);
 * if it's not a UTF-* encoding: allow only ASCII (printable) codepoints.
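
To make that reading concrete, here is a minimal sketch of such a policy. The 
helpers allows_non_ascii and repr_for_stream are hypothetical names, and 
matching the normalized codec name against "utf" is an assumption about how 
"a UTF-* encoding" would be detected.

```python
import codecs
import sys

def allows_non_ascii(stream):
    """Hypothetical check: True if the stream's encoding is a UTF-* codec."""
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        # codecs.lookup() normalizes aliases, e.g. "UTF8" -> "utf-8".
        name = codecs.lookup(encoding).name
    except LookupError:
        return False
    return name.startswith("utf")

def repr_for_stream(obj, stream):
    """Use repr() for UTF-* streams and the ASCII-only ascii() otherwise."""
    return repr(obj) if allows_non_ascii(stream) else ascii(obj)

print(repr_for_stream("caf\u00e9", sys.stdout))
```

Under this policy a cp1252 stream gets the escaped form even though cp1252 can 
encode U+00E9.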

This would, however, introduce a regression. For example, on Windows (where the 
encoding is usually not a UTF-* one) I would expect accented characters (at 
least the ones in the codepage I'm using -- and usually it matches the native 
language of the user) to be displayed correctly.
A more accurate approach would be to actually try to encode the string and 
escape only the chars that can't be encoded (and also the ones that are not 
printable, of course), but this can't be done in repr() because repr() returns 
a Unicode string (in #5110 I did it in sys.displayhook), and encoding the 
string there would mean encoding it twice: once in repr() and again when it is 
written out.
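
The "encode and escape only what fails" idea can be sketched outside repr() 
like this. It assumes the standard "backslashreplace" error handler is an 
acceptable escape format; the helper name escape_unencodable is hypothetical 
and this is not claimed to be the code from #5110.

```python
def escape_unencodable(text, encoding):
    """Escape only the characters that `encoding` cannot represent."""
    # Unencodable chars become \xNN, \uNNNN or \UNNNNNNNN escapes; the rest
    # survive the encode/decode round trip unchanged.
    return text.encode(encoding, "backslashreplace").decode(encoding)

sample = repr("\u00e9\U00010000")
# With ASCII as the target, every non-ASCII char in the repr is escaped.
print(escape_unencodable(sample, "ascii"))
```

With "cp1252" as the target only U+10000 would be escaped, and with "utf-8" 
the string would pass through unchanged.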

Also note that I might want to use repr() to get a representation of the object 
without necessarily sending it through sys.stdout. For example, I could write 
it to a file or send it via mail (Roundup reports errors via mail showing a 
repr of the variables), and in both cases I might use/want UTF-8 even if 
sys.stdout is ASCII.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9198>
_______________________________________