On 10/12/2016 5:57 PM, Elliot Gorokhovsky wrote:
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith <n...@pobox.com> wrote:

    But this isn't relevant to Python's str, because Python's str never
    uses UTF-8.


Really? I thought in python 3, strings are all unicode...

They are ...

so what encoding do they use, then?

Since 3.3, essentially ASCII, Latin-1, UTF-16 without surrogates (UCS-2), or UTF-32, depending on the highest code point in the string. This is the 'kind' field. If we go this route, I suspect that optimizing string sorting will take some experimentation. If the initial item is a str, it might be worthwhile to record the highest 'kind' during the type scan, so that strncmp can be used if all are ASCII or Latin-1.
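
To make the 'kind' idea concrete, here is a rough Python sketch, not the actual C implementation. The names kind_width, widest_kind and measured_width are made up for illustration. It assumes CPython 3.3 or later, where the per-character width is 1, 2 or 4 bytes (ASCII and Latin-1 both use 1 byte and differ only by a flag, which this sketch ignores). The sys.getsizeof measurement is CPython-specific and may not work on other implementations.

```python
import sys


def kind_width(s):
    """Bytes per code point that the flexible string layout would need for s."""
    highest = max(map(ord, s), default=0)
    if highest < 0x100:      # ASCII or Latin-1
        return 1
    if highest < 0x10000:    # UCS-2 range
        return 2
    return 4                 # full UCS-4 / UTF-32


def widest_kind(items):
    """Hypothetical pre-sort scan: widest kind seen, or None if any item is not a str."""
    widest = 1
    for item in items:
        if type(item) is not str:
            return None
        widest = max(widest, kind_width(item))
    return widest


def measured_width(ch, n=1000):
    """Empirical bytes per character, taken from the size difference of two strings."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    for ch in ("a", "\xe9", "\u20ac", "\U0001F600"):
        print(repr(ch), kind_width(ch), measured_width(ch))
    print(widest_kind(["spam", "caf\xe9"]))
```

If widest_kind returns 1 for a list, every string in it is stored one byte per character, which is the case where a byte-wise comparison could be tried.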


--
Terry Jan Reedy

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
