[issue21667] Clarify status of O(1) indexing semantics of str objects

Nick Coghlan Thu, 05 Jun 2014 05:34:39 -0700

Nick Coghlan added the comment:

If someone doesn't understand what "Unicode code point" means, that's going to 
be the least of their problems when it comes to writing a conformant Python 
implementation. We could link to 
http://unicode.org/glossary/#code_point, but that doesn't really add much 
beyond "value from 0 to 0x10FFFF". If you try to dive into the formal Unicode 
spec instead, you end up in a twisty maze of definitions of things that are all 
closely related, but generally not the same thing (code positions, code units, 
code spaces, abstract characters, glyphs, graphemes, etc).
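
As a rough illustration of why those terms are not interchangeable, here is a 
minimal sketch. It assumes CPython 3.3 or later, where len() counts code 
points; the variable names are purely illustrative.

```python
import unicodedata

# One user-perceived character built from two code points:
# 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# One code point outside the Basic Multilingual Plane (U+1F600).
astral = "\U0001F600"
# Code units depend on the encoding form: UTF-8 uses 1-byte units,
# UTF-16 uses 2-byte units.
utf8_units = len(astral.encode("utf-8"))
utf16_units = len(astral.encode("utf-16-le")) // 2

print(len(decomposed), len(composed))       # code point counts
print(len(astral), utf8_units, utf16_units)  # code points vs code units
```

The same visible text can therefore have different lengths depending on 
whether you count code points, code units, or what a reader would call a 
character.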


The main advantage of using the more formal "code point" over the informal 
"character" is that it discourages people from assuming they already know what 
code points are (the usual mistaken assumption being that Unicode code points 
correspond directly to glyphs, the way ASCII and Extended ASCII printable 
characters correspond to their glyphs). The rest of the paragraph then provides 
the mechanical details of the two meaningful representations of code points in 
Python (as length 1 strings and as numbers in a particular range) and the 
operations for translating between those two representations (chr and ord).
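
A minimal sketch of those two representations and the round trip between them 
(MAX_CODE_POINT and the other names are just labels chosen for this example, 
not anything defined by Python):

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space

text = "na\u00efve"        # contains U+00EF LATIN SMALL LETTER I WITH DIAERESIS

# Indexing a str yields a length 1 string holding one code point.
item = text[2]

# ord() maps a length 1 string to its code point number; chr() goes back.
number = ord(item)
round_trip = chr(number)

# chr() rejects values outside the range 0 to 0x10FFFF.
try:
    chr(MAX_CODE_POINT + 1)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

print(repr(item), hex(number), round_trip == item, out_of_range_rejected)
```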

Fair point about the slicing - it may be better to just talk about indexing.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21667>
_______________________________________