On Thu, Oct 13, 2016 at 5:17 PM, Stephen J. Turnbull
<turnbull.stephen...@u.tsukuba.ac.jp> wrote:
> Chris Angelico writes:
>
>  > I'm not sure what you mean by "strcmp-able"; do you mean that the
>  > lexical ordering of two Unicode strings is guaranteed to be the same
>  > as the byte-wise ordering of their UTF-8 encodings?
>
> This is definitely not true for the Han characters.  In Japanese, the
> most commonly used lexical ordering is based on the pronunciation,
> meaning that there are few characters (perhaps none) in common use
> that have a unique place in lexical ordering (most individual
> characters have multiple pronunciations, and even many whole personal
> names do).

Yeah, and even just with Latin-1 characters, you have (a) non-ASCII
characters that sort between ASCII characters, and (b) characters that
have different meanings in different languages, and should be sorted
differently. So a linguistically correct lexicographical ordering is
impossible in a generic string sort that doesn't know the language.
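
To make that concrete, here's a rough sketch of how three orderings can
disagree on the same words. The two key functions (german_like_key and
swedish_like_key) are hypothetical helpers written just for this
illustration; they are crude approximations, not real collation rules,
and they only handle lowercase letters. Proper collation needs
locale-aware tools.

```python
import unicodedata

# Escapes are used so the precomposed form of each letter is unambiguous.
words = ["zebra", "\u00e4pple", "apple"]

# Default str comparison is by code point, which gives the same order
# as comparing the UTF-8 encoded bytes.
by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))


def german_like_key(s):
    """Hypothetical key: ignore accents, so a-umlaut sorts next to 'a'."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The original string is a tie-breaker for otherwise equal keys.
    return (stripped, s)


# Map the three extra Swedish letters to ASCII characters that follow 'z',
# in Swedish alphabet order (a-ring, a-umlaut, o-umlaut).
_SWEDISH_TAIL = str.maketrans({"\u00e5": "{", "\u00e4": "|", "\u00f6": "}"})


def swedish_like_key(s):
    """Hypothetical key: treat the three extra letters as coming after 'z'."""
    return s.translate(_SWEDISH_TAIL)


german_like = sorted(words, key=german_like_key)
swedish_like = sorted(words, key=swedish_like_key)

print(by_code_point)
print(german_like)
print(swedish_like)
```

The accent-stripping key puts the a-umlaut word between "apple" and
"zebra", while the Swedish-style key puts it after "zebra", so no
single ordering can be right for both languages.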

ChrisA
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
