[issue21667] Clarify status of O(1) indexing semantics of str objects

Jim Jewett Tue, 10 Jun 2014 16:21:06 -0700

Jim Jewett added the comment:

And even my rewrite showed path dependency; a slight further improvement is to 
re-order encoding ahead of bytes.  I also added a paragraph that I hope answers 
the speed issue.


Proposal:

A string is a sequence of Unicode code points.  Strings can include any 
sequence of code points, including some that are semantically meaningless or 
explicitly undefined.
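
As an illustration of the paragraph above (a minimal sketch, not part of the 
proposed wording; the variable names are only for this example), a string can 
hold a lone surrogate or a noncharacter even though neither is meaningful text:

```python
import unicodedata

lone_surrogate = "\ud800"   # a high surrogate with no partner
noncharacter = "\uffff"     # permanently reserved, never assigned

# Both are accepted as ordinary code points inside a str.
s = "a" + lone_surrogate + noncharacter
length = len(s)

# The Unicode database classifies them as surrogate (Cs) and unassigned (Cn).
categories = [unicodedata.category(c) for c in s]

# Being storable does not make them encodable: UTF-8 rejects lone surrogates.
try:
    s.encode("utf-8")
    utf8_rejected = False
except UnicodeEncodeError:
    utf8_rejected = True
```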

Python doesn't have a :c:type:`char` type; a single code point is represented 
as a string of length ``1``.  The built-in function :func:`chr` translates an 
integer in the range ``0`` to ``0x10FFFF`` (code points ``U+0000`` through 
``U+10FFFF``) to the corresponding length ``1`` string object, and :func:`ord` 
does the reverse.
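
The round trip described above can be sketched as follows (an illustrative 
example only; the chosen code point and variable names are arbitrary):

```python
code_point = 0x1F600          # any integer from 0 through 0x10FFFF

ch = chr(code_point)          # a str of length 1, even above U+FFFF
round_trip = ord(ch)          # back to the original integer

# Integers outside the Unicode range are rejected.
try:
    chr(0x110000)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

# ord() accepts only a single code point, not a longer string.
try:
    ord("ab")
    long_string_rejected = False
except TypeError:
    long_string_rejected = True
```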

:meth:`str.encode` provides a concrete representation (in the given text 
encoding) as a :class:`bytes` object suitable for transport and communication 
with non-Python utilities.  :meth:`bytes.decode` decodes such byte sequences 
into text strings.
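
A short sketch of the encode/decode pairing (the sample text and the choice of 
encodings are assumptions made for illustration):

```python
text = "na\u00efve caf\u00e9"           # 10 code points

# The same text has different concrete byte representations per encoding.
utf8_bytes = text.encode("utf-8")       # non-ASCII code points take 2 bytes
utf16_bytes = text.encode("utf-16-le")  # every code point here takes 2 bytes

# Decoding with the matching encoding recovers the original string.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16-le")
```

The byte lengths differ between encodings, while ``len(text)`` stays the same 
because it counts code points rather than bytes.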

.. impl-detail::  There are no methods exposing the internal representation of 
   code points within a string.  While the C-API imposes some additional 
   constraints on CPython, other implementations are free to use any 
   representation that treats code points (as opposed to either code units or 
   some normalized form of characters) as the unit of measure.
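
To make "code points as the unit of measure" concrete, here is a hedged sketch 
(not part of the proposed wording) contrasting code points with UTF-16 code 
units and with normalized characters:

```python
import unicodedata

# One code point above U+FFFF: length 1, although UTF-16 needs two code units.
astral = "\U0001F600"
astral_len = len(astral)
astral_utf16_units = len(astral.encode("utf-16-le")) // 2

# "e" followed by a combining acute accent: one visible character,
# but two code points, so len() reports 2.
decomposed = "e\u0301"
decomposed_len = len(decomposed)

# NFC normalization yields the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
composed_len = len(composed)

# str comparison is by code point, so the two forms are not equal.
forms_equal = decomposed == composed
```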

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21667>
_______________________________________