Re: [Development] HEADS-UP: QStringLiteral

Mutz, Marc via Development Thu, 22 Aug 2019 05:50:40 -0700

On 2019-08-22 13:42, Lars Knoll wrote:

That's why we are not removing QLatin1String: the Latin1 algorithm is as fast
as memcpy. The only thing better than that is zero copies.


We could also turn this around: Are we over-optimising here? Do we
have the right balance between ease of use and performance? Converting
utf8 is a bit more costly than latin1, but would that ever matter in
real world use cases?

Once we have proper support for u8 (in Qt, and C++ (char8_t)), we can certainly think about phasing out QLatin1String. Personally, I don't think the decoding performance between L1 and UTF-8 is the key here.

UTF-8 even has the nice property that it's closed under all text transformations in all locales, unlike L1 (toupper('ß') == ẞ ∉ L1, tolower('I') @ tr_TR = ı ∉ L1, ...). QUtfXXX would also greatly reduce the number of overloads of core string functions we need to provide (the same way as QStringView does already, if you consider QT_STRINGVIEW_LEVEL >= 2).

For me, the problem is QUtf8XXX::size() - what should that return?! IOW: what's the meaning of an index into a UTF-8 string? That extends to mid(), left(), right(), split(), ... In all current Qt string classes, size() returns the number of characters (ignoring surrogate pairs in QString, which we probably can live with because there are different ways to spell an ä in Unicode, too (ä, a + ¨), such that any serious text processing is anyway far removed from the simplistic 1 code point = 1 glyph pov, so surrogate pairs aren't much of an issue anymore). Whatever we do here, it will be downhill from where we are. Either size() is O(N) or a string (view) is no longer the size of a pointer (or two). That's 2x (50%+0) O(1) memory per string (view), and such stuff adds up over 1000s of strings...
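To make the size() question concrete, here is a minimal sketch using std::string_view as a stand-in for a UTF-8 view. The function names are made up for illustration and are not Qt API, and the code assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 code units (bytes): O(1), it is just the stored length.
constexpr std::size_t codeUnitCount(std::string_view utf8) noexcept
{
    return utf8.size();
}

// Number of code points: O(N), because every byte has to be inspected.
// Assumes well-formed UTF-8, where continuation bytes look like 0b10xxxxxx
// and every other byte starts a new code point.
constexpr std::size_t codePointCount(std::string_view utf8) noexcept
{
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the precomposed ä (bytes C3 A4) the two functions return 2 and 1; for a + combining diaeresis (bytes 61 CC 88) they return 3 and 2. Neither number is the count of user-perceived characters, which is 1 in both cases.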

So, maybe, at some point in the future, we can axe QLatin1String. But we need to seriously up UTF-8 support in Qt before that. QString is kind of in the way here, as UTF-16 has the bad side effect of endian dependence. If, say, .qm files were stored in UTF-8, tr() could return a QUtf8View. That's not possible with QString, unless apps come with two .qm files, one LE and one BE.
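As an illustration of the endian dependence (a sketch with hypothetical helper names, not how .qm files are actually read or written): the same UTF-16 code unit serialises to two different byte sequences depending on byte order, whereas UTF-8 is defined as a byte sequence and looks the same on every host.

```cpp
#include <array>
#include <cstdint>

// Serialise one UTF-16 code unit with the low byte first (little-endian).
constexpr std::array<std::uint8_t, 2> utf16LittleEndian(char16_t c) noexcept
{
    return {{ static_cast<std::uint8_t>(c & 0xFF),
              static_cast<std::uint8_t>(c >> 8) }};
}

// Serialise one UTF-16 code unit with the high byte first (big-endian).
constexpr std::array<std::uint8_t, 2> utf16BigEndian(char16_t c) noexcept
{
    return {{ static_cast<std::uint8_t>(c >> 8),
              static_cast<std::uint8_t>(c & 0xFF) }};
}
```

For ä (U+00E4) this gives E4 00 in LE and 00 E4 in BE, so a file of raw UTF-16 can only be mapped into memory and used directly on hosts of matching endianness. The UTF-8 encoding of the same character is C3 A4 everywhere.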

One way to get out of this history pit was mentioned here and there on this ML before: we could have a QAnyString(View) (all names subject to bikeshedding), a string (view) that type-erases the encoding (like a std::variant<QUtf8String(View), QLatin1String(View), QString(View)>), which would be the type used in higher-level APIs (QLineEdit::setText(QAnyStringView)). I think std::filesystem::path got that quite right: you can feed it UTF-8 or UTF-16, and it will transparently convert to and from the native API's encoding as needed.

But such a type has to be an _addition_ to, not a replacement of, encoding-dependent string types (proof: how do you process a QAnyString(View) if you're given one? Probably, keeping the std::variant simile, with a visitation mechanism, and the visitor is overloaded on the type. Sure, you can use (char8_t*, qsizetype) and (char16_t*, qsizetype) for that, but then we're back to a place we thought we'd never go back to after we got views: C-like string manipulation APIs).
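To spell out the std::variant simile, a rough sketch in plain C++17. Utf8View, Latin1View, Utf16View and AnyStringView are hypothetical stand-ins invented for this example, not existing or proposed Qt classes.

```cpp
#include <cstddef>
#include <string_view>
#include <variant>

// Hypothetical per-encoding view types; thin wrappers so that the
// encoding is part of the static type.
struct Utf8View   { std::string_view data; };
struct Latin1View { std::string_view data; };
struct Utf16View  { std::u16string_view data; };

// The type-erased view a higher-level API would accept.
using AnyStringView = std::variant<Utf8View, Latin1View, Utf16View>;

// A visitor overloaded on the encoding-dependent type.
struct EncodingName {
    const char *operator()(Utf8View) const noexcept   { return "UTF-8"; }
    const char *operator()(Latin1View) const noexcept { return "Latin-1"; }
    const char *operator()(Utf16View) const noexcept  { return "UTF-16"; }
};

inline const char *encodingName(const AnyStringView &s)
{
    return std::visit(EncodingName{}, s);
}

// A generic visitor works too, as long as every alternative offers the
// same interface (here: a 'data' member with size() and operator[]).
inline std::size_t byteSize(const AnyStringView &s)
{
    return std::visit([](auto v) { return v.data.size() * sizeof(v.data[0]); }, s);
}
```

Each overload of the visitor receives a typed view, so the actual processing still happens on encoding-dependent types. That is the reason the type-erased view can only complement them.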


Flame away...

Thanks,
Marc
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
