[issue13997] Clearly explain the bare minimum Python 3 users should know about Unicode

Terry J. Reedy Fri, 17 Feb 2012 14:25:32 -0800

Terry J. Reedy <tjre...@udel.edu> added the comment:

I agree with no new builtin and appreciate that being taken off the table.


I think the right place for this is the Unicode How-to, and I think that
document should be renamed Encodings and Unicode How-to. The reasons are 1) one
first has to understand the concept of encoding characters and text as numbers,
and 2) this issue (and the python-ideas discussion) is not about Unicode itself,
but about using pre-Unicode and non-Unicode encodings with Python 3's bytes and
str types, and how that differs from using Python 2's unicode and str types. If
only Unicode encodings were used, with UTF-8 dominant on the Internet (it is now
the most common encoding for web pages), the problems of concern here would not
exist.
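
To make the bytes-versus-text distinction concrete, here is a minimal Python 3
sketch. The sample word and the choice of latin-1 as the legacy encoding are
only illustrative assumptions, not something taken from the issue itself.

```python
# Text (str) is a sequence of characters; bytes are what an encoding
# turns that text into. The sample word is an arbitrary illustration.
text = "caf\u00e9"  # 'café'

utf8_bytes = text.encode("utf-8")      # é becomes two bytes
latin1_bytes = text.encode("latin-1")  # é becomes one byte

# Decoding with the encoding that produced the bytes round-trips cleanly.
roundtrip = latin1_bytes.decode("latin-1")

# Decoding legacy bytes as UTF-8 fails here, because a lone 0xE9 byte
# is not a valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# Decoding UTF-8 bytes as latin-1 never fails, but gives the wrong text.
mojibake = utf8_bytes.decode("latin-1")
```

The last line is the silent failure mode: every byte sequence is valid
latin-1, so the wrong guess produces garbage rather than an exception.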

Learning about Unicode would mean learning about code units versus code points,
normal versus surrogate chars, BMP versus extended chars (all of which are
non-issues in wide builds and Py 3.3), 65,536-char planes, BOMs, normalization
forms, and character properties. While sometimes useful, these subjects are not
the issue here.
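
For readers who have not met these terms, a short sketch of a few of them
follows. It assumes Python 3.3 or later (or an earlier wide build), and the
sample characters are arbitrary choices for illustration.

```python
import unicodedata

# A character outside the BMP: one code point, but two UTF-16 code units
# (a surrogate pair).
ch = "\U0001F600"
code_points = len(ch)  # 1 on Python 3.3+ and on earlier wide builds
utf16_units = len(ch.encode("utf-16-le")) // 2

# The plain "utf-16" codec prepends a BOM in the platform's byte order.
bom = "a".encode("utf-16")[:2]

# Normalization: the same visible character can be spelled two ways.
composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Character properties: 'Lu' means "Letter, uppercase".
category = unicodedata.category("A")
```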

----------
nosy: +terry.reedy

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13997>
_______________________________________