Arnaud Clère (25 January 2019 10:59) wrote:
> Most user code I have written or seen handles text data naively and is
> incorrect in some respect but I think only a minority of it is leading
> to real problems because input data will rarely trigger them.

That depends a lot on who's supplying your data. The same rationale was given for "making do" with old 8-bit encodings, which meant programs worked for various rich nations' primary languages and didn't for anyone else's. Then we switched to UTF-16, which let us continue not thinking about what we're really doing, while reaching a larger slice of the world. Still, that leaves us complicit in suppressing various minority cultures by making software that works for the dominant culture around them, but not for them.

Until we get into the habit of thinking of text properly (and I still don't even know the terminology, so I have a way to go on this, just like anyone) instead of as a sequence of evenly-sized units, we're going to continue either being inefficient (because we use units that are bigger than needed for many use-cases - arguably true of UTF-16) or failing to properly support cultures whose scripts are relegated to the outer planes of Unicode. One example is the Chakma language's number system, which QLocale currently can't represent (QTBUG-69324) because the digits don't fit in a single UTF-16 unit (as QLocaleData expects of digits, signs and quotes, though it understands that most of its other locale-specific texts might be longer). As a result, we can't support any Chakma locale.

By all means, let's make sure the internals are efficient for the more common languages and scripts; but it's way past time to start doing Unicode properly, so that all cultures are well-served by default when the software folk are using is built on Qt.

	Eddy.
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
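
To make the point about digits and UTF-16 units concrete, here is a minimal sketch in plain C++ (no Qt involved). It assumes the Chakma digits sit at U+11136 to U+1113F in the Unicode code charts; the helper names (utf16Units, highSurrogate, lowSurrogate) are made up for illustration and are not part of QLocale or any Qt API. It only shows why such a digit cannot be stored in one 16-bit unit, not how QLocaleData is actually laid out.

```cpp
// Number of UTF-16 code units needed to encode one Unicode code point.
// Anything above U+FFFF lies outside the Basic Multilingual Plane and
// needs a surrogate pair, i.e. two 16-bit units.
constexpr int utf16Units(char32_t cp)
{
    return cp > 0xFFFF ? 2 : 1;
}

// Leading (high) surrogate of the pair; only meaningful for cp > U+FFFF.
constexpr char16_t highSurrogate(char32_t cp)
{
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

// Trailing (low) surrogate of the pair; only meaningful for cp > U+FFFF.
constexpr char16_t lowSurrogate(char32_t cp)
{
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

constexpr char32_t asciiZero = U'0';      // U+0030 DIGIT ZERO
constexpr char32_t chakmaZero = 0x11136;  // U+11136 CHAKMA DIGIT ZERO (assumed code point)

// Checked at compile time: the ASCII digit fits in one unit, the Chakma
// digit needs the surrogate pair D804 DD36.
static_assert(utf16Units(asciiZero) == 1, "ASCII zero fits in one UTF-16 unit");
static_assert(utf16Units(chakmaZero) == 2, "Chakma zero needs two UTF-16 units");
static_assert(highSurrogate(chakmaZero) == 0xD804, "high surrogate of U+11136");
static_assert(lowSurrogate(chakmaZero) == 0xDD36, "low surrogate of U+11136");
```

Any table that reserves exactly one 16-bit slot per digit can hold U+0030 but has nowhere to put the second half of D804 DD36, which is the kind of limitation described above.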