On Wednesday 11 February 2015 11:22:59 Julien Blanc wrote:
> On 11/02/2015 10:32, Bo Thorsen wrote:
> > 2) length() returns the number of chars I see on the screen, not a
> > random implementation detail of the chosen encoding.
>
> How’s that supposed to work with combining characters, which are part of
> unicode ?
That's true. Add to that the zero-width characters, and the characters that are double-width. Also, QString::length() returns the number of UTF-16 code units, not the number of UCS-4 characters, so it reports 2 for a surrogate pair, not 1. If you really want to know the width of the string as seen on screen, you need to use QFontMetrics, even with a monospace font.

> > 3) at(int) and [] gives the unicode char, not a random encoding char.
>
> Same problem with combining characters. What do you expect :
>
> QString s = QString::fromWCharArray(L"n\u0303");
> s.length(); // 1 or 2 ??
> s[0]; // n or ñ ??

Yet, unlike std::u16string, QString can convert from NFD to NFC:

    QString s = QString::fromUtf16(u"n\u0303")
                    .normalized(QString::NormalizationForm_C);
    s.length();   // 1
    s[0];         // U+00F1 'ñ'

> > Another note: Latin1 is the worst idea for i18n ever invented, and it's
> > by now useless, irrelevant and only a source for bugs once you start to
> > truly support i18n outside of USA and Western Europe. I would be one
> > step closer to total happiness if C++17 and Qt7 makes this "encoding"
> > completely unsupported.
>
> Could not agree more with that part.
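To make the code-unit vs code-point distinction concrete, here is a minimal sketch in standard C++, using std::u16string as a stand-in for QString's UTF-16 storage. The helpers codeUnits and codePoints are hypothetical names for illustration, not Qt API. Note that neither count is what you see on screen: "n\u0303" is two code points but renders as a single glyph.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units: what QString::length() and
// std::u16string::size() report.
std::size_t codeUnits(const std::u16string &s)
{
    return s.size();
}

// Number of Unicode code points (UCS-4 characters). A surrogate pair
// (high surrogate 0xD800..0xDBFF followed by low surrogate 0xDC00..0xDFFF)
// counts as one; an unpaired surrogate is counted on its own.
std::size_t codePoints(const std::u16string &s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
                && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;  // skip the low half of the pair
        ++count;
    }
    return count;
}
```

For example, u"\U0001F600" (one emoji, outside the BMP) has two code units but one code point, while both u"n\u0303" (NFD) and u"\u00F1" (NFC) display as "ñ" yet have two and one code points respectively.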
There are two reasons we keep Latin1 in the API:

 1) it's a superset of US-ASCII, so toAscii and fromAscii are just calls to the Latin1 functions with the note "behaviour is undefined if the string contains non-ASCII characters"
 2) converting between it and UTF-16 is dead easy

As I was explaining yesterday to some people, the core of the loop converting from Latin1 to UTF-16 is *two* AVX2 instructions:

     b36:   vpmovzxbw (%rax,%rsi,1),%ymm0
     b3c:   vmovdqu %ymm0,(%rdi,%rax,2)

[plus the loop overhead itself]

The conversion from UTF-16 to Latin1 is a little more complex because non-Latin1 characters must be replaced with '?', so it takes a few more instructions with AVX-512F:

   1c60a:   vpmovzxwd 0x0(%r13,%rdx,2),%zmm2
   1c612:   vpcmpnltud %zmm1,%zmm2,%k1
   1c61d:   vpblendmd %zmm0,%zmm2,%zmm3{%k1}
   1c623:   vpmovdb %zmm3,(%rdx,%rdi,1)

Without AVX-512F (which no one has yet), it expands to more code.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
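For readers who don't read AVX, the two conversion loops described above can be sketched as plain scalar C++. This is a minimal sketch, not QString's implementation: fromLatin1 and toLatin1 here are hypothetical free functions over std::string and std::u16string. A compiler may auto-vectorize loops like these into something resembling the listings, but that is not guaranteed.

```cpp
#include <cstddef>
#include <string>

// Latin1 -> UTF-16: byte values 0x00..0xFF are the same Unicode code
// points, so the conversion is a plain zero-extension per byte
// (the job vpmovzxbw does for many bytes at once).
std::u16string fromLatin1(const std::string &in)
{
    std::u16string out(in.size(), u'\0');
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<unsigned char>(in[i]);  // zero-extend, never sign-extend
    return out;
}

// UTF-16 -> Latin1: a code unit above 0xFF has no Latin1 equivalent and
// is replaced with '?' (the compare + blend step), then narrowed to a byte
// (the truncating store).
std::string toLatin1(const std::u16string &in)
{
    std::string out(in.size(), '\0');
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t c = in[i];
        out[i] = static_cast<char>(c > 0xFF ? u'?' : c);
    }
    return out;
}
```

Because each UTF-16 code unit is handled independently, a surrogate pair comes out as two '?' characters in this sketch, and any Latin1 string survives a fromLatin1/toLatin1 round trip unchanged.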