[Development] QUtf8String{, View} (was: Re: QString and related changes for Qt 6)

Marc Mutz via Development Thu, 14 May 2020 07:44:25 -0700

Hi Lars,

On 2020-05-12 09:49, Lars Knoll wrote:
[...]

One open question is whether we should add a QUtf8String with a
char8_t. I am not yet convinced that we actually need the class
though.

[...]

I positively want to stop using QByteArray as the QUtf8String that it currently is. QByteArray should lose all notion of string-ness (deprecate toLower() etc, remove in Qt 7) and be a QVector<std::byte>. Not sure we'll get there for Qt 6, not sure we'll get there with the name QByteArray, but that should be the end game for this class.

The networking code is full of uses of QByteArray and, due to the lack of QByteArrayRef (QStringRef) or QByteArrayView (QStringView), its splitting and substringing are much less performant than they could be.


Also, given a function like

   setFoo(const QByteArray &);

what does this actually expect? A UTF-8 string? A local 8-bit string? An octet stream? A Latin-1 string? QByteArray is the jack of all these, master of none.

So, assuming the premise that QByteArray should not be string-ish anymore, what do we want to have as the result type of QString::toUtf8() and QString::toLatin1()? Do we really want mere bytes?


I don't think so.

If Unicode succeeds, most I/O will be in the form of UTF-8. File names on Unix are UTF-8 (for all intents and purposes these days), not UTF-16 (as they are on Windows). It makes a _ton_ of sense to have a container for this, and C++20 tempts us with char8_t to do exactly that. I'd love to do string processing in UTF-8 without potentially doubling the storage requirements by first converting it to UTF-16, then doing the processing, then converting it back.
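To put a number on the "doubling" for mostly-ASCII data such as Unix paths, here is a minimal sketch using only standard-library strings. payloadBytes is a hypothetical helper, not Qt API, and plain char stands in for UTF-8 code units so that the snippet does not need C++20.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the code units of a string,
// ignoring the terminator and any allocator overhead.
template <typename Char>
std::size_t payloadBytes(const std::basic_string<Char> &s)
{
    return s.size() * sizeof(Char);
}
```

For the eight-character path "/usr/lib", the char-based string carries 8 bytes of payload, while the same text held as char16_t carries 16. Text outside ASCII narrows the gap, so the factor of two is the worst case for UTF-16 relative to UTF-8, not a constant.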


Qt should have a strong story not just for UTF-16, but also for UTF-8.

I've talked about this at QtWS, but here's the TL;DV of it:

value_type             container      view               string-ish API?

char / QLatin1Char     QLatin1String  QLatin1StringView  yes
char8_t / qchar8       QUtf8String    QUtf8StringView    yes
char16_t / QChar       QString        QStringView        yes
(char32_t              QUtf32String   QUtf32StringView   yes)

std::byte              QByteArray     QByteArrayView     NO

I'm not sure we need the utf32 one, and I'm ok with dropping the L1 one, provided a) we can depend on char8_t (i.e. Qt 7) and b) utf-8 <-> utf16 operations are not much slower than L1 <-> utf16 ones (I heard Lars' team has them down to within 5% of each other, not sure that's possible). Anyway, we'd have two class templates, and they'd just be instantiated with different Char types to flesh out all of the above, with the exception of the byte array ones:


  using QUtf8String = QBasicString<char8_t>;
  using QString = QBasicString<char16_t>;
  using QLatin1String = QBasicString<char>;
  (using QByteArray = QVector<std::byte>;)
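To illustrate the "one class template, several instantiations" idea with standard-library types only, here is a minimal sketch. It is not Qt API: BasicStr, Latin1Str, Utf16Str and toUpperAscii are hypothetical names, and a real QBasicString would of course need proper Unicode-aware case mapping.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a QBasicString<Char>: one template,
// instantiated once per code-unit type.
template <typename Char>
class BasicStr
{
public:
    BasicStr(const Char *s) : m_data(s) {}

    std::size_t size() const { return m_data.size(); }  // in code units
    const std::basic_string<Char> &data() const { return m_data; }

    // String-ish API written once, available in every instantiation.
    // Only ASCII letters are touched, to keep the sketch short.
    BasicStr toUpperAscii() const
    {
        BasicStr result(*this);
        for (Char &c : result.m_data) {
            if (c >= Char('a') && c <= Char('z'))
                c = Char(c - Char('a') + Char('A'));
        }
        return result;
    }

private:
    std::basic_string<Char> m_data;
};

using Latin1Str = BasicStr<char>;      // hypothetical, cf. QLatin1String
using Utf16Str  = BasicStr<char16_t>;  // hypothetical, cf. QString
```

The point of the sketch is only that the string-ish member functions are written once and shared, while a byte container would simply not be an instantiation of this template.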

If, after getting all of the above running, we _then_ want The One String (View) To Rule Them All, then I'd suggest QAnyString{,View} (not sure we need a QAnyString), which can contain any of the 2-4 string (view) classes above (but not QByteArray(View)), but which doesn't have string-ish API. Instead, you need to inspect it to extract the actual string class (QLatin1String, QUtf8String, QString) contained, or simply ask for the one you want, and it will convert, if necessary.


With this, your typical Qt function taking strings would look like this:

   void QLineEdit::setText(QAnyStringView text)
   {
       Q_D(QLineEdit);

       if (text == d->text) // mixed-mode comparisons are supported out of the box
           return;

       d->text = text.toString(); // centralized conversion to QString (in library, not user code)
                                  // also available: toLatin1(), toUtf8()

       update();
   }

Callers now have total freedom in what to pass:

   le->setText("Hi");
   le->setText(u"Hi");
   le->setText(u8"Hi");
   le->setText(u"Hi"s);
   le->setText(u8"Hi"sv);
   le->setText(QVarLengthArray{'H', 'i'});
   le->setText("Hello" % ", World"); // QStringBuilder

and they'd all result in optimal code, because QAnyStringView is a trivial type (in the C++ sense), which means, unlike QString, it can be passed in CPU registers instead of on the stack.
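What "trivial type" buys can be checked at compile time. The following is a minimal sketch of a two-word tagged view, assuming only Latin-1 and UTF-16 payloads and an encoding tag packed into the top bit of the size. AnyView and all its members are hypothetical names, not the proposed QAnyStringView layout.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical tagged view: one pointer plus one word holding size and tag.
struct AnyView
{
    // Top bit of the size word marks UTF-16; clear means Latin-1.
    static constexpr std::size_t Utf16Tag = ~(~std::size_t(0) >> 1);

    const void *data;
    std::size_t sizeAndTag;

    static AnyView fromLatin1(std::string_view s) noexcept
    { return AnyView{s.data(), s.size()}; }
    static AnyView fromUtf16(std::u16string_view s) noexcept
    { return AnyView{s.data(), s.size() | Utf16Tag}; }

    bool isUtf16() const noexcept { return (sizeAndTag & Utf16Tag) != 0; }
    std::size_t size() const noexcept { return sizeAndTag & ~Utf16Tag; }  // code units
};

// The view is trivially copyable and destructible; an owning string is not.
static_assert(std::is_trivially_copyable_v<AnyView>);
static_assert(std::is_trivially_destructible_v<AnyView>);
static_assert(!std::is_trivially_copyable_v<std::u16string>);
static_assert(sizeof(AnyView) == sizeof(const void *) + sizeof(std::size_t));
```

Whether such a two-word aggregate actually travels in registers is decided by the platform ABI, not by the language; the static_asserts only establish the precondition that calling conventions typically require.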


Likewise, parsing code could do

   Meep parseMeep(QAnyStringView str)
   {
       return str.visit([](auto str) {
           Meep meep;
           for (auto me : str.tokenize(u'\n'))
              meep += parse(me);
           return meep;
       });
   }

iow: instead of a bunch of overloads, you write your code as a template and let QAnyStringView instantiate your lambda with the actual type of string view passed.
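The same pattern can be tried today with std::variant and std::visit. This is a minimal sketch, assuming a variant of two standard view types as a stand-in for QAnyStringView; AnyView and countLines are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <variant>

// Hypothetical stand-in for QAnyStringView: holds one of several view types.
using AnyView = std::variant<std::string_view, std::u16string_view>;

// The body is written once as a generic lambda; std::visit instantiates it
// for whichever view type is actually stored.
std::size_t countLines(AnyView str)
{
    return std::visit([](auto s) -> std::size_t {
        using Char = typename decltype(s)::value_type;
        if (s.empty())
            return 0;
        const auto newlines = std::count(s.begin(), s.end(), Char('\n'));
        // A trailing newline ends the last line instead of starting another.
        return static_cast<std::size_t>(newlines)
               + (s.back() == Char('\n') ? 0 : 1);
    }, str);
}
```

Callers pass either view type and get the code path specialised for it, with no overload set to maintain.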


As a further example, here's op== for QAnyStringView (provided by Qt):

   bool operator==(QAnyStringView lhs, QAnyStringView rhs) noexcept
   {
       return lhs.visit([rhs](auto lhs) {
           return rhs.visit([lhs](auto rhs) {
               return lhs == rhs;
           });
       });
   }
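For reference, the double dispatch can be reproduced with the standard library, where std::visit accepts both operands at once. This is a minimal sketch restricted to Latin-1 and UTF-16 views, where widening one code unit at a time is exact; a UTF-8 alternative would need real decoding. AnyView, widen and equals are hypothetical names.

```cpp
#include <algorithm>
#include <string_view>
#include <variant>

// Hypothetical stand-in for QAnyStringView (Latin-1 or UTF-16 only).
using AnyView = std::variant<std::string_view, std::u16string_view>;

// Widen one code unit to char16_t. For Latin-1 this is exact, because its
// code points coincide with the first 256 Unicode code points.
inline char16_t widen(char c) { return static_cast<unsigned char>(c); }
inline char16_t widen(char16_t c) { return c; }

// Mixed-mode equality: the lambda is instantiated for all four
// combinations of stored view types.
bool equals(AnyView lhs, AnyView rhs)
{
    return std::visit([](auto l, auto r) {
        return std::equal(l.begin(), l.end(), r.begin(), r.end(),
                          [](auto a, auto b) { return widen(a) == widen(b); });
    }, lhs, rhs);
}
```

The four-iterator form of std::equal also handles differing lengths, so no separate size check is needed.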

Last year, I heard someone (don't remember whom) suggest this for QString. That is: allow QString to hold UTF-16 or UTF-8 data. I'd classify this idea as another over-my-dead-body (which, btw, is semi-official ISO speak for "strong objection"). As I'm wont to say: An API doesn't become easy to use by minimizing the number of classes, but by minimizing the number of responsibilities per class, even if that means many more small classes than one big.

I would add, as I've done before, and even Matthew said, that I'd be very wary of folding QStringView into QString. I can understand the urge to not have to go and s/QString/QStringView/ in many places (or s/QString/QAnyStringView/), but it is my firm belief that it would make Qt much easier and more convenient to use if we didn't put all those responsibilities on QString.

There's only our own laziness that stands in the way of this better alternative.


Thanks,
Marc
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
