Ned Batchelder <n...@nedbatchelder.com> writes:

> This is where my knowledge about Unicode gets fuzzy. Isn't it the
> case that some grapheme clusters (or whatever the right word is) can't
> be normalized down to a single code point? Characters can accept many
> accents, for example.

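As a rough illustration of the case being asked about, here is a minimal sketch using the standard library's `unicodedata.normalize`. The helper name `nfc` and the two sample strings are chosen only for this example and come from nowhere in the thread; the sketch assumes Python 3, where `str` is a sequence of code points.

```python
import unicodedata


def nfc(text):
    """Return text in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC collapses the pair to a single code point.
e_acute = nfc("e\u0301")

# "q" + COMBINING TILDE has no precomposed form in Unicode,
# so NFC leaves it as two code points.
q_tilde = nfc("q\u0303")

print(len(e_acute), len(q_tilde))  # prints: 1 2
```

So a single user-perceived character can remain several code points long after normalisation, and `len()` or indexing on it still counts code points.
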
That's true, but doesn't affect the point being made: that one can have
both “sequence of Unicode code points” in Python's ‘unicode’ (now ‘str’)
type, and also deal with “sequence of text the reader will see”.

> In that case, you can't always normalize and use the existing string
> methods, but would need more specialized code.

Specialised code may not be needed. It will at least be true that “any
two code-point sequences which normalise to the same value will be
visually the same for the reader”, which is an important assertion for
addressing the complaints from Mortoray's article.

-- 
 \     “Pray, v. To ask that the laws of the universe be annulled in |
  `\     behalf of a single petitioner confessedly unworthy.” —Ambrose |
_o__)                         Bierce, _The Devil's Dictionary_, 1906 |
Ben Finney