Chris Angelico <ros...@gmail.com>:

> On Fri, Jul 14, 2017 at 6:53 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>> Chris Angelico <ros...@gmail.com>:
>> Then, why bother with Unicode to begin with? Why not just use bytes?
>> After all, Python3's strings have the very same pitfalls:
>>
>>  - you don't know the length of a text in characters
>>  - chr(n) doesn't return a character
>>  - you can't easily find the 7th character in a piece of text
>
> First you have to define "character".

I'm referring to grapheme clusters, a.k.a. real characters.

>>  - you can't compare the equality of two pieces of text
>>  - you can't use a piece of text as a reliable dict key
>
> (Dict key usage is defined in terms of equality, so these two are the
> same concern.)

Ideally, yes. However, someone might say, "don't use == to compare
equality; use unicode.textually_equal() instead". That advice might
satisfy the first requirement but not the second.

> Yes, you can. For most purposes, textual equality should be defined in
> terms of NFC or NFD normalization. Python already gives you that. You
> could argue that a string should always be stored NFC (or NFD, take
> your pick), and then the equality operator would handle this; but I'm
> not sure the benefit is worth it.

As I said, Python3's strings are neither here nor there. They don't
quite solve the problem Python2's strings had. They will push the
internationalization problems a bit farther out but fall short of the
mark. The developer still has to worry a lot. Unicode seemingly solved
one problem only to present the developer with a bagful of new problems.

And if Python3's strings are a half-measure, why not stick to bytes?

> If you're trying to use strings as identifiers in any way (say, file
> names, or document lookup references), using the NFC/NFD normalized
> form of the string should be sufficient.

Show me ten Python3 database applications, and I'll show you ten Python3
database applications that don't normalize their primary keys.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list
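
For readers who want to see the pitfalls discussed above in concrete terms, here is a minimal sketch using only the standard library's unicodedata module. It is an illustration, not code from the thread: the helper name nfc_key is hypothetical, and the choice of NFC over NFD is arbitrary ("take your pick", as quoted above).

```python
import unicodedata

# Two spellings of the same visible text, "é":
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def nfc_key(text):
    """Hypothetical helper: normalize to NFC before comparing or storing."""
    return unicodedata.normalize("NFC", text)


# len() counts code points, not grapheme clusters.
lengths = (len(composed), len(decomposed))

# == compares code points, so the two spellings are unequal...
raw_equal = composed == decomposed
# ...and they therefore occupy two separate dict keys.
raw_keys = len({composed: 1, decomposed: 2})

# After normalization both spellings compare equal and share one key.
normalized_equal = nfc_key(composed) == nfc_key(decomposed)
normalized_keys = len({nfc_key(composed): 1, nfc_key(decomposed): 2})

# chr(n) can return a lone combining mark, which is not a character
# in the grapheme-cluster sense.
mark_is_combining = unicodedata.combining(chr(0x0301)) != 0

print(lengths, raw_equal, raw_keys, normalized_equal, normalized_keys,
      mark_is_combining)
```

Normalizing keys this way addresses the equality and dict-key points only if every writer and reader of the data does it consistently. It does not give grapheme-cluster indexing; as far as I know the standard library has no grapheme segmentation, so the "7th character" point is outside what this sketch covers.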