For Jython and IronPython, UTF-16 may be the best internal encoding. Recent languages (Swiffy, Golang, Rust) chose UTF-8 as their internal encoding. Using UTF-8 is simple and efficient: for example, there is no need to make a UTF-8 copy of the string when writing it to a file or serializing it to JSON.
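The trade-off can be sketched in Python as a string type that stores its text as UTF-8 bytes. The class `Utf8Str` and its `utf8()` method are hypothetical and exist only for illustration. `isize()`, `ifind()` and `islice()` borrow the names of the internal-position API proposed in the next paragraph and are likewise not part of any real Python implementation. With this layout, serializing is free because the buffer is already UTF-8, but `len()` needs an O(N) scan. `utf16_units()` shows what a UTF-16-based implementation would store instead.

```python
class Utf8Str:
    """Hypothetical string type whose internal buffer is UTF-8 bytes."""

    def __init__(self, text: str) -> None:
        self._buf = text.encode("utf-8")

    def __len__(self) -> int:
        # Code-point length: count every byte that is not a UTF-8
        # continuation byte (0b10xxxxxx). This is an O(N) scan.
        return sum(1 for b in self._buf if (b & 0xC0) != 0x80)

    def isize(self) -> int:
        # Internal length, in bytes.
        return len(self._buf)

    def ifind(self, sub: str) -> int:
        # Internal (byte) offset of sub, or -1 if absent.
        return self._buf.find(sub.encode("utf-8"))

    def islice(self, start: int) -> str:
        # Slice from an internal offset. start must be a code point
        # boundary (e.g. a value returned by ifind), otherwise decoding fails.
        return self._buf[start:].decode("utf-8")

    def utf8(self) -> bytes:
        # The buffer is already UTF-8, so it can be written to a file or
        # a JSON stream without transcoding.
        return self._buf


def utf16_units(text: str) -> int:
    # Number of UTF-16 code units a UTF-16-based implementation would store.
    return len(text.encode("utf-16-le")) // 2


s = Utf8Str("\U00100000x")
print(len(s), s.isize(), s.ifind("x"), s.islice(s.ifind("x")))  # 2 5 4 x
print(utf16_units("\U00100000x"))  # 3
```

Searching the raw bytes in `ifind()` is safe because UTF-8 is self-synchronizing: a valid UTF-8 needle can only match at a code point boundary, so the returned offset never points into the middle of a multi-byte sequence.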
When implementing Python in these languages, UTF-8 will be the best internal encoding. To let Python implementations other than CPython use UTF-8 or UTF-16 as their internal encoding efficiently, I think adding an API based on internal positions is the best solution:

>>> s = "\U00100000x"
>>> len(s)
2
>>> s[1:]
'x'
>>> s.find('x')
1
>>> # s.isize()  # internal length: 5 for UTF-8, 3 for UTF-16
>>> # s.ifind('x')  # internal position: 4 for UTF-8, 2 for UTF-16
>>> # s.islice(s.ifind('x'))  => 'x'

(I like the design of Golang and Rust. I hope CPython will use UTF-8 as its internal encoding in the future, but that is off-topic.)

On Wed, Jun 4, 2014 at 4:41 PM, Jeff Allen <ja...@farowl.co.uk> wrote:
> Jython uses UTF-16 internally -- probably the only sensible choice in a
> Python that can call Java. Indexing is O(N), fundamentally. By
> "fundamentally", I mean for those strings that have not yet noticed that
> they contain no supplementary (>0xffff) characters.
>
> I've toyed with making this O(1) universally. Like Steven, I understand this
> to be a freedom afforded to implementers, rather than an issue of
> conformity.
>
> Jeff Allen
>
>
> On 04/06/2014 02:17, Steven D'Aprano wrote:
>>
>> There is a discussion over at MicroPython about the internal
>> representation of Unicode strings.
>
> ...
>
>> My own feeling is that O(1) string indexing operations are a quality of
>> implementation issue, not a deal breaker to call it a Python. I can't
>> see any requirement in the docs that str[n] must take O(1) time, but
>> perhaps I have missed something.
>>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com

--
INADA Naoki <songofaca...@gmail.com>