Re: [Development] QString and related changes for Qt 6

2020-05-14 Thread Elvis Stansvik
On Thu, 14 May 2020 at 15:46, Marc Mutz via Development
<development@qt-project.org> wrote:

> On 2020-05-13 17:17, Matthew Woehlke wrote:
> [...]
> > Non-owning QString is dangerous. QStringLiteral is less dangerous
> > because it is almost never used with non-rodata storage (and indeed, I
> > would consider any such usage highly suspect, if not outright broken).
> > QString::fromRawData is dangerous, but "obviously" so.
> >
> > We should not implement any way of creating a non-owning QString that
> > is not explicit, and if we adhere to that, I don't see us *not*
> > wanting QStringView in many instances.
>
> I must be crazy, but ... +1!
>

*chuckle* :)


> Thanks,
> Marc
> ___
> Development mailing list
> Development@qt-project.org
> https://lists.qt-project.org/listinfo/development
>
___
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development


Re: [Development] QString and related changes for Qt 6

2020-05-14 Thread Marc Mutz via Development

On 2020-05-13 17:17, Matthew Woehlke wrote:
[...]
> Non-owning QString is dangerous. QStringLiteral is less dangerous
> because it is almost never used with non-rodata storage (and indeed, I
> would consider any such usage highly suspect, if not outright broken).
> QString::fromRawData is dangerous, but "obviously" so.
>
> We should not implement any way of creating a non-owning QString that
> is not explicit, and if we adhere to that, I don't see us *not*
> wanting QStringView in many instances.

I must be crazy, but ... +1!

Thanks,
Marc


Re: [Development] QString and related changes for Qt 6

2020-05-14 Thread Marc Mutz via Development

On 2020-05-13 20:48, Jaroslaw Kobus wrote:
>> From: Development on behalf of Thiago Macieira
>> Sent: Wednesday, May 13, 2020 6:21 PM
>> To: development@qt-project.org
>> Subject: Re: [Development] QString and related changes for Qt 6
>>
>> On Tuesday, 12 May 2020 22:57:31 PDT Jaroslaw Kobus wrote:
>> > That's why I've mentioned the better option: aggregation: QStringView could
>> > be a member of QString. However, the downside would be that every time you
>> > want to call a const method for QString, you would need to first get access
>> > to the QStringView member. The advantage is that in this way you may easily
>> > integrate different interfaces inside one class.
>>
>> This is more or less what we want to do. QString in Qt 6 is {begin, size, d}
>> and QStringView has always been {begin, size}. So, yeah, it can be done.
>>
>> The idea is indeed to offload the majority of the non-mutating methods to the
>> same functions, from inline code. There's no reason to have both
>> QString::indexOf and QStringView::indexOf entry points in the library.
>
> Good to hear. And I hope that Marc will resurrect soon after his veto.

Had you looked into qstring.cpp (I know it hurts!), you'd've seen that
it's already implemented that way. But neither does QString aggregate a
QStringView nor does it inherit it.


So, there's no resurrection coming because no death was caused.

Thanks,
Marc


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Jaroslaw Kobus
> From: Development on behalf of Thiago Macieira
> Sent: Wednesday, May 13, 2020 6:21 PM
> To: development@qt-project.org
> Subject: Re: [Development] QString and related changes for Qt 6
> 
> On Tuesday, 12 May 2020 22:57:31 PDT Jaroslaw Kobus wrote:
> > That's why I've mentioned the better option: aggregation: QStringView could
> > be a member of QString. However, the downside would be that every time you
> > want to call a const method for QString, you would need to first get access
> > to the QStringView member. The advantage is that in this way you may easily
> > integrate different interfaces inside one class.
> 
> This is more or less what we want to do. QString in Qt 6 is {begin, size, d}
> and QStringView has always been {begin, size}. So, yeah, it can be done.
> 
> The idea is indeed to offload the majority of the non-mutating methods to the
> same functions, from inline code. There's no reason to have both
> QString::indexOf and QStringView::indexOf entry points in the library.

Good to hear. And I hope that Marc will resurrect soon after his veto.

Regards

Jarek


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Thiago Macieira
On Tuesday, 12 May 2020 22:57:31 PDT Jaroslaw Kobus wrote:
> That's why I've mentioned the better option: aggregation: QStringView could
> be a member of QString. However, the downside would be that every time you
> want to call a const method for QString, you would need to first get access
> to the QStringView member. The advantage is that in this way you may easily
> integrate different interfaces inside one class.

This is more or less what we want to do. QString in Qt 6 is {begin, size, d} 
and QStringView has always been {begin, size}. So, yeah, it can be done.

The idea is indeed to offload the majority of the non-mutating methods to the 
same functions, from inline code. There's no reason to have both 
QString::indexOf and QStringView::indexOf entry points in the library.
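
For illustration only, here is a minimal sketch of that "one entry point, inline forwarders" idea. It uses std::u16string and std::u16string_view rather than the Qt classes, and the names View, Text and indexOfImpl are hypothetical; this is not how qstring.cpp is actually organised.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical shared entry point: the only real search routine.
inline std::ptrdiff_t indexOfImpl(std::u16string_view haystack,
                                  std::u16string_view needle) noexcept
{
    const auto pos = haystack.find(needle);
    return pos == std::u16string_view::npos ? -1 : static_cast<std::ptrdiff_t>(pos);
}

// Stand-in for a non-owning view: {begin, size}.
class View {
public:
    constexpr View(const char16_t *begin, std::size_t size) noexcept
        : m_begin(begin), m_size(size) {}
    std::ptrdiff_t indexOf(View needle) const noexcept
    {
        return indexOfImpl(std::u16string_view(m_begin, m_size),
                           std::u16string_view(needle.m_begin, needle.m_size));
    }
private:
    const char16_t *m_begin;
    std::size_t m_size;
};

// Stand-in for an owning string: its const method forwards inline to the view.
class Text {
public:
    explicit Text(std::u16string s) : m_data(std::move(s)) {}
    View view() const noexcept { return View(m_data.data(), m_data.size()); }
    std::ptrdiff_t indexOf(View needle) const noexcept { return view().indexOf(needle); }
private:
    std::u16string m_data;
};

inline void usage()
{
    const Text t(u"hello world");
    assert(t.indexOf(View(u"world", 5)) == 6);
}
```

Both indexOf() members end up in the same function, so only one search routine would need to be exported.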

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Matthew Woehlke

On 13/05/2020 11.49, Giuseppe D'Angelo wrote:
> On 13/05/20 16:44, Matthew Woehlke wrote:
>>> Note that adding the QString(char16_t*) constructor
>>
>> Pedantic, but surely you meant `char16_t const*`.
>
> Hey, you can't nitpick here ...
>
>> This can be solved with a third overload:
>>
>>    template <size_t N>
>>    void foo(char16_t (&s)[N]) { foo(QStringView{s, N}); }
>
> ... and then do the same mistake in the same email >:-)

Touché :-D. I fixed it in my godbolt experiment, but yup, missed it here.


--
Matthew


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Giuseppe D'Angelo via Development

On 13/05/20 16:44, Matthew Woehlke wrote:
>> Note that adding the QString(char16_t*) constructor
>
> Pedantic, but surely you meant `char16_t const*`.

Hey, you can't nitpick here ...

> This can be solved with a third overload:
>
>    template <size_t N>
>    void foo(char16_t (&s)[N]) { foo(QStringView{s, N}); }

... and then do the same mistake in the same email >:-)

--
Giuseppe D'Angelo | giuseppe.dang...@kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts





Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Matthew Woehlke

On 12/05/2020 17.21, Thiago Macieira wrote:

> On Tuesday, 12 May 2020 08:42:28 PDT Matthew Woehlke wrote:
>> How will this work? As I understand, the main advantage to
>> QStringLiteral is that it statically encodes the *length* as well as the
>> data. This isn't possible with raw literals, which are merely
>> NUL-terminated.
>
> Black magic!
>
> I mean, templates and constexpr.


Yeah... I'm not sure what I was thinking when I wrote that...

Oh, wait...


>> I don't see us ever getting rid of some form of QString
>> literal short of templatizing *everything* that takes a T* (for T in
>> char, char16_t, etc.) to take a T(&)[N] instead.


...I was thinking this. You might be able to escape this for methods 
that don't take both QString *and* QStringView. Otherwise, well, see my 
later message on that point.


And on that note...


> But QStringView(u"foo") should call that first constructor. Doesn't it? I
> never remember if the literal decays to pointer before the overload
> resolution.


Uh... no, actually it doesn't. (Which TBH smells a bit like a defect to 
me, but we're stuck with it for now.)


Note: https://godbolt.org/z/FbjQkM

(That was experimenting with QString/QStringView overload 
disambiguation, but also includes the relevant ctors. Comment out the 
templated overload of `foo` and one of the others, and you'll see that 
the invocation with a literal calls the "wrong" ctor.)


So, we either need to retain literals in some form, or, as I was saying, 
every method needs to have a templated flavor for string literals.
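
A minimal sketch of the overload behaviour described above, assuming a simplified stand-in type with one templated array constructor and one plain pointer constructor. The names View and Ctor are hypothetical; the real QStringView constructors are constrained differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Ctor { Array, Pointer };

// Simplified stand-in: records which constructor overload was selected.
struct View {
    const char16_t *data;
    std::size_t size;
    Ctor used;

    // Array overload: length known from the type (drops the literal's NUL).
    template <std::size_t N>
    constexpr View(const char16_t (&s)[N]) noexcept
        : data(s), size(N - 1), used(Ctor::Array) {}

    // Pointer overload: has to scan for the terminator.
    View(const char16_t *s) noexcept
        : data(s), size(std::char_traits<char16_t>::length(s)), used(Ctor::Pointer) {}
};

inline void usage()
{
    const View v(u"foo");
    // Array-to-pointer conversion ranks as an exact match, so the two
    // overloads tie and the non-template one wins the tie-break.
    assert(v.used == Ctor::Pointer);
    assert(v.size == 3);
}
```

With these two signatures the literal never reaches the array overload, which is why the length ends up being computed by scanning.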



>> The "nice" thing about QStringView is that it does not have ownership;
>> you have to be careful about how long you hold onto it lest it turn into
>> a dangling pointer. You can't construct a QString from any old bag of
>> byt^Wcharacters because a QString is implicitly valid until it is destroyed.
>
> That's the problem we've had with QStringLiteral and QString::fromRawData().
>
> You *can* create it from read-only data and tell it never to try to modify.
> The trick is guaranteeing that it remains valid until the last user finished
> using it. Because of copy-on-write, that last user can be much later than the
> statement that created the QString in the first place.


Right, but if you're using QStringLiteral / QString::fromRawData, you 
"know" you're taking on that responsibility. (And for QStringLiteral, 
you only run into problems in some instances with library unloading, 
which is a non-issue for many applications.)


What I worry about with trying to avoid QStringView is that we either 
lose the ability to avoid copies when the input is a *temporary* (e.g. 
stack-allocated) buffer, or else we silently accept such uses and 
produce broken programs.


Note that you can't rely on adding non-const overloads as a work-around; 
the string might be coming from an intermediate function that doesn't 
have a non-const overload, but was called with a (non-const) temporary 
buffer.


Example:

  void foo(char const* s)
  {
    ...
    method_taking_qt_string(s);
    ...
  }

  void bar()
  {
    char buffer[MAX_SIZE];
    ...do stuff to put data in buffer...
    foo(buffer);
  }

Non-owning QString is dangerous. QStringLiteral is less dangerous 
because it is almost never used with non-rodata storage (and indeed, I 
would consider any such usage highly suspect, if not outright broken). 
QString::fromRawData is dangerous, but "obviously" so.


We should not implement any way of creating a non-owning QString that is 
not explicit, and if we adhere to that, I don't see us *not* wanting 
QStringView in many instances.
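
To make the temporary-buffer case concrete, here is a small self-contained sketch using std::string_view as a stand-in for QStringView. The function countDigits is hypothetical and only reads its argument during the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string_view>

// Only reads its argument during the call, so a non-owning view is enough.
inline std::size_t countDigits(std::string_view s) noexcept
{
    std::size_t n = 0;
    for (char c : s)
        if (c >= '0' && c <= '9')
            ++n;
    return n;
}

// Intermediate function with only a const-pointer signature, as in the example.
inline std::size_t foo(const char *s) { return countDigits(s); }

inline std::size_t bar()
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "id-%d-%d", 12, 345);
    return foo(buffer); // safe: the view does not outlive `buffer`
}

inline void usage()
{
    assert(bar() == 5); // "id-12-345" contains five digits
}
```

No copy of the buffer is made, and nothing dangles because the view is never stored. A silently non-owning string built from `s` would be unsafe here as soon as the callee kept it.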


--
Matthew


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Matthew Woehlke

On 13/05/2020 02.33, Lars Knoll wrote:

> On 12 May 2020, at 23:09, Thiago Macieira wrote:
>> I want rules that determine what the API should be without looking at the
>> implementation of those two functions.


You may be disappointed, at least as far as parameters.


> This is one reason why I think we should simply use QString in most of those
> cases.
>
> Additionally, QString is a class that owns its data, making it the
> class that’s easiest to use and safest. QStringView doesn’t own its
> data and as such there are always lifetime considerations that need
> to be taken into account when using it. So using it would make using
> the API harder and more error prone.


That might be true for return values. For parameters, if the *user* 
needs to care whether the function takes a QString vs. QStringView, 
we're doing something wrong. The onus to properly handle a QStringView 
in that case should be entirely on the *implementer* of the API.


...but yeah, if we're talking about return values, that's a whole other 
kettle of fish.


--
Matthew


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Matthew Woehlke

On 12/05/2020 13.48, Giuseppe D'Angelo via Development wrote:

> On 5/12/20 6:12 PM, Иван Комиссаров wrote:
>> So the question is - is it possible to allow to construct QString from
>> unicode literal?
>
> "Not yet", but adding a constructor from char16_t to QString makes sense.
>
> This creates a problem down the line: today you have a
>
>    f(QString)
>
> and you call it with f(u"whatever"). Then, later on, you realize that
> QString is not needed and QStringView suffices. (This is the case all
> over existing Qt code.)
>
> What do you do? Adding a QStringView overload will make calls ambiguous,
> removing the QString one will be an ABI break. We need an established
> solution for these cases as they'll pop up during the Qt 6 lifetime.


This can be solved with a third overload:

  template <size_t N>
  void foo(char16_t (&s)[N]) { foo(QStringView{s, N}); }

Of course, this isn't quite right; we actually want:

  QStringView{s, s[N - 1] ? N : N - 1}

...so that we correctly handle both NUL-terminated literals and also raw 
arrays (which may not be NUL-terminated!). There is the slight caveat 
that we will ignore a final NUL in a raw array, but a) I think that's 
reasonable, and b) I don't see a way around that short of a language 
change to give string literals a distinct type. Also note that 
reasonable compilers should optimize away the conditional, so there is 
no added overhead.
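
A self-contained sketch of that length computation, with std::u16string_view standing in for QStringView. The helper name viewOf is hypothetical, and the parameter is taken as const here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Stand-in for QStringView in this sketch.
using View = std::u16string_view;

// Views the array contents, dropping one trailing NUL if there is one.
template <std::size_t N>
constexpr View viewOf(const char16_t (&s)[N]) noexcept
{
    return View(s, s[N - 1] ? N : N - 1);
}

// String literal: the terminator is not counted.
static_assert(viewOf(u"foo").size() == 3);

// Raw array without a terminator: every element is counted.
constexpr char16_t raw[] = {u'a', u'b'};
static_assert(viewOf(raw).size() == 2);

inline void usage()
{
    const char16_t buffer[3] = {u'x', u'y', u'z'}; // raw array, no terminator
    assert(viewOf(buffer) == View(u"xyz"));
}
```

As noted above, a raw array that genuinely ends in a NUL loses that final element under this rule.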



> Note that adding the QString(char16_t*) constructor


Pedantic, but surely you meant `char16_t const*`. (Also, please provide 
the templated overload so calling strlen is not needed!)


--
Matthew


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Ville Voutilainen
On Wed, 13 May 2020 at 12:50, Tor Arne Vestbø  wrote:
>
>
>
> > On 13 May 2020, at 10:12, Edward Welbourne  wrote:
> >
> >> Note that adding the QString(char16_t*) constructor introduces this
> >> ambiguity for the functions that are already overloaded on
> >> QString+QStringView (and thus today are using QStringView).
> >
> > Would it suffice to skip the QString(char16_t *) constructor and,
> > instead, have a QString(QStringView) constructor ?
> >
> > I guess calls to functions taking QString would have to make one of the
> > steps explicit, when passing a u"...", i.e. either call
> > f(QString(u"...")) or f(QStringView(u"...")), preferring the latter (as
> > it's future-proof against f changing signature from QString to
> > QStringView later; note that this concern applies to Qt-using code, which
> > may allow itself such ABI-breaks, not just Qt itself, which wouldn't, at
> > least not once the old API has appeared in a public release).  I suppose
> > both forms are capable of exploiting constexpr and happening at
> > compile-time, when the compiler deigns to make it so.
>
> Whatever we end up with, _please_ avoid the
> explicitness/verboseness/boilerplate of having to wrap every "foo" in some
> QPreferredStringTypeOfTheWeek("foo")
>
> I expect my code to look like this:
>
>   foo.bar("baz")
>
> Or if the allocations and conversions are really a performance issue for
> this particular piece of code:
>
>  foo.bar(u"baz")
>
> Anything else should be reserved for corner cases where the explicitness is
> warranted.

That's all well and good, but if foo.bar(a) and foo.bar(b) have different
semantics on whether the class copies or views what I pass in, I am going to
hurt you. :) Meaning that if it sometimes stores a copy, then it should always
store a copy, instead of sometimes storing a copy and sometimes storing a view,
in which case I need to be insanely careful about calling such functionality.
If the class doesn't store the argument, I don't care. If it does, it should
decide whether it stores a copy or a view.

Overloads in an overload set should have the same semantics, otherwise that
API is a vector, where for some incoming types it does A and for others it
does B, and the code can no longer be read without looking at the
API documentation for every call.
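
A minimal sketch of an overload set with uniform semantics, using standard library types as stand-ins. The class Label is hypothetical. Both overloads store an owning copy, so the caller never has to know which one was picked.

```cpp
#include <cassert>
#include <string>
#include <string_view>

class Label {
public:
    // Both overloads copy: the stored text never refers to caller memory.
    void setText(const std::u16string &text) { m_text = text; }
    void setText(std::u16string_view text) { m_text.assign(text.data(), text.size()); }

    const std::u16string &text() const noexcept { return m_text; }

private:
    std::u16string m_text;
};

inline void usage()
{
    Label label;
    {
        char16_t buffer[] = u"temporary";
        label.setText(std::u16string_view(buffer));
    } // buffer is gone, the label still holds its own copy
    assert(label.text() == u"temporary");
}
```

The view is wrapped explicitly in this sketch because passing a bare u"..." literal to this pair of overloads would be ambiguous, which is the problem discussed earlier in the thread.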


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Tor Arne Vestbø


> On 13 May 2020, at 10:12, Edward Welbourne  wrote:
> 
>> Note that adding the QString(char16_t*) constructor introduces this
>> ambiguity for the functions that are already overloaded on
>> QString+QStringView (and thus today are using QStringView).
> 
> Would it suffice to skip the QString(char16_t *) constructor and,
> instead, have a QString(QStringView) constructor ?
> 
> I guess calls to functions taking QString would have to make one of the
> steps explicit, when passing a u"...", i.e. either call
> f(QString(u"...")) or f(QStringView(u"...")), preferring the latter (as
> it's future-proof against f changing signature from QString to
> QStringView later; note that this concern applies to Qt-using code, which
> may allow itself such ABI-breaks, not just Qt itself, which wouldn't, at
> least not once the old API has appeared in a public release).  I suppose
> both forms are capable of exploiting constexpr and happening at
> compile-time, when the compiler deigns to make it so.

Whatever we end up with, _please_ avoid the
explicitness/verboseness/boilerplate of having to wrap every "foo" in some
QPreferredStringTypeOfTheWeek("foo")

I expect my code to look like this:

  foo.bar("baz")

Or if the allocations and conversions are really a performance issue for this
particular piece of code:

 foo.bar(u"baz")

Anything else should be reserved for corner cases where the explicitness is
warranted.

Tor Arne 



Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread André Pönitz
On Tue, May 12, 2020 at 02:09:21PM -0700, Thiago Macieira wrote:
> On Tuesday, 12 May 2020 10:48:24 PDT Giuseppe D'Angelo via Development wrote:
> > What do you do? Adding a QStringView overload will make calls ambiguous,
> > removing the QString one will be an ABI break. We need an established
> > solution for these cases as they'll pop up during the Qt 6 lifetime.
> 
> Indeed.
> 
> And the API policy must be one such that it doesn't depend on what the method 
> does *today* and it doesn't create a mess. Functions change.
> 
> [Good regexp example snipped]
> 
> I want rules that determine what the API should be without looking at the 
> implementation of those two functions.

Same for me.

And I think this is an important point, even to the degree that a clear, uniform
API is worth more than a handful of cycles.

Most of the API changes that are currently discussed or even done "for
performance reasons" *do not matter in practice*.

If a real world Qt application has a performance problem, this is *not* solved
by changing QRegularExpression::pattern() from returning a QString to returning
QStringView.

There are very few cases in repeatedly used low level functions where it
actually *does* make sense, but there it's actually ok to have a duplicated
interface.

The "overload" problem would also be solvable, by not using overloads, but
differently named functions, e.g. by sth like  .midView() instead of .mid().
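
As a rough sketch of the "different names instead of overloads" idea, with standard library types as stand-ins: the class Text is hypothetical, and midView() is the name floated above, not an existing Qt function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

class Text {
public:
    explicit Text(std::u16string s) : m_data(std::move(s)) {}

    // Owning result: safe to keep after this Text is gone.
    std::u16string mid(std::size_t pos, std::size_t n) const
    {
        return m_data.substr(pos, n);
    }

    // Non-owning result: the name tells the caller it refers into this Text.
    std::u16string_view midView(std::size_t pos, std::size_t n) const
    {
        return std::u16string_view(m_data).substr(pos, n);
    }

private:
    std::u16string m_data;
};

inline void usage()
{
    const Text t(u"hello world");
    assert(t.mid(6, 5) == u"world");
    assert(t.midView(0, 5) == u"hello");
}
```

Since the two functions have different names, no call can become ambiguous, and the lifetime obligation shows at the call site.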

Andre'


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Edward Welbourne
On 5/12/20 6:12 PM, Иван Комиссаров wrote:
>> So the question is - is it possible to allow to construct QString from 
>> unicode literal?

Giuseppe D'Angelo (12 May 2020 19:48) replied:
> "Not yet", but adding a constructor from char16_t to QString makes sense.
>
> This creates a problem down the line: today you have a
>
>   f(QString)
>
> and you call it with f(u"whatever"). Then, later on, you realize that
> QString is not needed and QStringView suffices. (This is the case all
> over existing Qt code.)
>
> What do you do? Adding a QStringView overload will make calls ambiguous,
> removing the QString one will be an ABI break. We need an established
> solution for these cases as they'll pop up during the Qt 6 lifetime.
>
> Note that adding the QString(char16_t*) constructor introduces this
> ambiguity for the functions that are already overloaded on
> QString+QStringView (and thus today are using QStringView).

Would it suffice to skip the QString(char16_t *) constructor and,
instead, have a QString(QStringView) constructor ?

I guess calls to functions taking QString would have to make one of the
steps explicit, when passing a u"...", i.e. either call
f(QString(u"...")) or f(QStringView(u"...")), preferring the latter (as
it's future-proof against f changing signature from QString to
QStringView later; note that this concern applies to Qt-using code, which
may allow itself such ABI-breaks, not just Qt itself, which wouldn't, at
least not once the old API has appeared in a public release).  I suppose
both forms are capable of exploiting constexpr and happening at
compile-time, when the compiler deigns to make it so.

Eddy.


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Lars Knoll
> On 12 May 2020, at 23:09, Thiago Macieira  wrote:
> 
> On Tuesday, 12 May 2020 10:48:24 PDT Giuseppe D'Angelo via Development wrote:
>> What do you do? Adding a QStringView overload will make calls ambiguous,
>> removing the QString one will be an ABI break. We need an established
>> solution for these cases as they'll pop up during the Qt 6 lifetime.
> 
> Indeed.
> 
> And the API policy must be one such that it doesn't depend on what the method 
> does *today* and it doesn't create a mess. Functions change.
> 
> Let's take an example with QRegularExpression's pattern (not picking on 
> Giuseppe). Today it is:
> 
>     QString pattern() const;
>     void setPattern(const QString &pattern);
> 
> QString QRegularExpression::pattern() const
> {
>     return d->pattern;
> }
> 
> Since this is returning a stored QString, someone might feel that it should 
> instead return a QStringView. But if it's storing, then the setter should 
> remain const QString &. That would be:
> 
>     QStringView pattern() const;
>     void setPattern(const QString &pattern);
> 
> But suppose that there's a pcre2_get_pattern_16() function. Then someone
> might be tempted to say that since PCRE stores the pattern, we don't need to.
> That would mean QRegularExpression::pattern() ought to be written as:
> 
> QString QRegularExpression::pattern() const
> {
>     qsizetype len = pcre2_get_pattern_length_16(d->compiledPattern);
>     QString retval(len, Qt::Uninitialized);
>     pcre2_get_pattern_16(d->compiledPattern, retval.data(), len);
>     return retval;
> }
> 
> But if PCRE is going to store the pattern and PCRE doesn't use QString, then 
> setPattern could take a QStringView instead. That would be:
> 
>     QString pattern() const;
>     void setPattern(QStringView pattern);
> 
> That's the opposite of the previous one.
> 
> I want rules that determine what the API should be without looking at the 
> implementation of those two functions.

This is one reason why I think we should simply use QString in most of those 
cases. 

Additionally, QString is a class that owns its data, making it the class 
that’s easiest to use and safest. QStringView doesn’t own its data and as such 
there are always lifetime considerations that need to be taken into account 
when using it. So using it would make using the API harder and more error prone.

Cheers,
Lars



Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Lars Knoll
> On 13 May 2020, at 08:14, André Somers  wrote:
> 
> 
> On 12-05-20 22:42, Thiago Macieira wrote:
>> 
>> QStringView::mid(), for example, returns QStringView, but QString::mid()
>> returns QString.
> _Should_ QString::mid be returning a QString though? Perhaps it should return 
> a QStringView?

That’s a separate question, but I agree it’s something we should investigate. 
Most likely it would break a large amount of code however (mid() being a 
method that’s extremely widely used).

Cheers,
Lars




Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Lars Knoll

> On 12 May 2020, at 23:21, Thiago Macieira  wrote:
> 
> On Tuesday, 12 May 2020 08:42:28 PDT Matthew Woehlke wrote:
>> How will this work? As I understand, the main advantage to
>> QStringLiteral is that it statically encodes the *length* as well as the
>> data. This isn't possible with raw literals, which are merely
>> NUL-terminated.
> 
> Black magic!
> 
> I mean, templates and constexpr. QStringView has these two constructors:
> 
>     template <typename Char, size_t N>
>     Q_DECL_CONSTEXPR QStringView(const Char (&str)[N]) noexcept;
> 
>     template <typename Char>
>     Q_DECL_CONSTEXPR QStringView(const Char *str) noexcept;
> 
> The first one has a clear-cut size and can be initialised from a character 
> literal. The second one can attempt to determine at constexpr time what the 
> string length is.
> 
> It can't do so today (5.15) because of the lack of if constexpr. But Qt 6.0 
> will require C++17, so it can use if constexpr and implement a scan-for-NUL 
> at constexpr time if the payload is also constexpr. If it isn't, then it
> falls back to calling qustrlen().
> 
>> Even std::string wants literals for this reason. A UDL would obviously
>> be superior, but I don't see us ever getting rid of some form of QString
>> literal short of templatizing *everything* that takes a T* (for T in
>> char, char16_t, etc.) to take a T(&)[N] instead.
> 
>   u"foo"_qs
>   u"foo"_qsv;
> 
> But QStringView(u"foo") should call that first constructor. Doesn't it? I 
> never remember if the literal decays to pointer before the overload 
> resolution.
> 
>>> In most other places we should by default only use QString, unless
>>> there are very significant performance benefits to be had from using
>>> QStringView. This helps us keep an API that’s both easy to use and
>>> maintain. With the ideas above, you can still create a read-only
>>> string, so data copies can in many cases be avoided if required.
>> 
>> Really? How?
>> 
>> The "nice" thing about QStringView is that it does not have ownership;
>> you have to be careful about how long you hold onto it lest it turn into
>> a dangling pointer. You can't construct a QString from any old bag of
>> byt^Wcharacters because a QString is implicitly valid until it is destroyed.
> 
> That's the problem we've had with QStringLiteral and QString::fromRawData().
> 
> You *can* create it from read-only data and tell it never to try to modify. 
> The trick is guaranteeing that it remains valid until the last user finished 
> using it. Because of copy-on-write, that last user can be much later than the 
> statement that created the QString in the first place.
> 
> One way to ensure that guarantee is to never unload/free the memory block in 
> the first place. We already don't unload plugins for this and similar reasons.

I have partial patches (they still need some more work) where we can create a 
QString from read-only data. This is possible because QString in Qt 6 has a 
begin/end pointer in the class itself (not in the d-pointer).

So a read-only QString would contain a null d-pointer plus the pointer to data 
and size/end.
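
To illustrate the layout described here (and only as a sketch, since the patches themselves are not shown in this thread), a string type holding {begin, size, d} can use a null d to mean "does not own the data". The class Text and its members are hypothetical, and std::shared_ptr stands in for the implicitly shared d-pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

class Text {
public:
    // Non-owning: the caller guarantees the NUL-terminated literal
    // outlives every copy of the returned object.
    template <std::size_t N>
    static Text fromStatic(const char16_t (&literal)[N])
    {
        return Text(literal, N - 1, nullptr);
    }

    // Owning: copies the characters into shared storage.
    static Text fromCopy(std::u16string_view s)
    {
        auto d = std::make_shared<std::u16string>(s);
        return Text(d->data(), d->size(), d);
    }

    bool ownsData() const noexcept { return m_d != nullptr; }
    std::u16string_view view() const noexcept { return {m_begin, m_size}; }

private:
    Text(const char16_t *begin, std::size_t size, std::shared_ptr<std::u16string> d)
        : m_begin(begin), m_size(size), m_d(std::move(d)) {}

    const char16_t *m_begin;             // begin pointer lives in the object itself
    std::size_t m_size;
    std::shared_ptr<std::u16string> m_d; // null for read-only, non-owned data
};

inline void usage()
{
    const Text lit = Text::fromStatic(u"abc");
    assert(!lit.ownsData());
    assert(lit.view() == u"abc");
}
```

Copies of an owning Text share the same storage, while copies of a non-owning one just duplicate the pointer and size. The plugin-unloading concern applies only to the non-owning kind.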

To avoid problems with plugins, we have two options. Either we continue not 
unloading them (safe bet), or we disable those constructors when compiling 
plugin code, and enforce a copy of the data in that case. 
> 
> One thing Lars and I agree on is that those literals must be null-terminated, 
> unlike QStringView. Whether it's simply an API contract or whether we test/
> enforce remains to be seen. On the platforms where Qt runs, we can almost 
> always read past the end of the string to see if the terminator is there, 
> even if it means writing assembly code.

Ideally, we can check this at compile time for most cases. We have been making 
that assumption, but not checking it in Qt5’s QString (you could get a non zero 
terminated string by using fromRawData()). 

Cheers,
Lars

> 
>> That said, I think I understand the reasoning here; make it up front
>> that the input is going to wind up in *a* QString. If the user's input
>> is *already* a QString, the function can make a shared copy rather than
>> constructing a brand new one. However, it would be nice for such
>> functions to offer r-value reference overloads for cases where a QString
>> needs to be created, or if the user is done with their copy. (Actually,
>> a possibly-owning reference wrapper could be useful here...)
> 
> -- 
> Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel System Software Products
> 
> 
> 


Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread Lars Knoll


> On 12 May 2020, at 22:42, Thiago Macieira  wrote:
> 
> On Tuesday, 12 May 2020 09:34:40 PDT Marc Mutz via Development wrote:
>> On 2020-05-12 11:31, Jaroslaw Kobus wrote:
>>> So, just an idea: instead of repeating the common API part in QString
>>> and QStringView, what about making it one common? E.g. what about:
>>> - deriving QString from QStringView (and adding mutator API)
>>> or (maybe even better):
>>> - aggregating QStringView object as a part of QString API and giving
>> 
>>> accesor for it, like:
>> Vetoed. Over my dead body™. No inheriting of non-polymorphic types from
>> each other. What we have is static polymorphism, and that's what we
>> should continue to have.
> 
> Agreed, but also because many of the methods in QStringView are not 
> applicable to QString.
> 
> QStringView::mid(), for example, returns QStringView, but QString::mid() 
> returns QString.
> 
> QString is neither a specialisation nor a broadening of QStringView.

Agreed as well. Those are two separate classes, but they can share the 
implementation of many methods.

Lars



Re: [Development] QString and related changes for Qt 6

2020-05-13 Thread André Somers


On 12-05-20 22:42, Thiago Macieira wrote:


> QStringView::mid(), for example, returns QStringView, but QString::mid()
> returns QString.

_Should_ QString::mid be returning a QString though? Perhaps it should 
return a QStringView?


André



Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Jaroslaw Kobus



> From: Development on behalf of Thiago Macieira
> Sent: Tuesday, May 12, 2020 10:42 PM
> To: development@qt-project.org
> Subject: Re: [Development] QString and related changes for Qt 6

> > On 2020-05-12 11:31, Jaroslaw Kobus wrote:
> > >So, just an idea: instead of repeating the common API part in QString
> > > and QStringView, what about making it one common? E.g. what about:

[...]

> > > or (maybe even better):
> > > - aggregating QStringView object as a part of QString API and giving

[...]
>
> QStringView::mid(), for example, returns QStringView, but QString::mid()
> returns QString.
> 
> QString is neither a specialisation nor a broadening of QStringView.

The first option (inheritance) was only meant as a simple, imperfect
solution.

That's why I mentioned the better option, aggregation: QStringView could be a
member of QString. The downside is that every time you want to call a const
method on a QString, you would first need to get access to the QStringView
member. The advantage is that this way you can easily integrate different
interfaces inside one class.

Anyway, if you are saying the APIs of QString and QStringView are not the same,
and that they should continue to differ, then forget about the above.

Regards

Jarek


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Thiago Macieira
On Tuesday, 12 May 2020 08:42:28 PDT Matthew Woehlke wrote:
> How will this work? As I understand, the main advantage to
> QStringLiteral is that it statically encodes the *length* as well as the
> data. This isn't possible with raw literals, which are merely
> NUL-terminated.

Black magic!

I mean, templates and constexpr. QStringView has these two constructors:

template <typename Char, size_t N>
Q_DECL_CONSTEXPR QStringView(const Char (&)[N]) noexcept;

template <typename Char>
Q_DECL_CONSTEXPR QStringView(const Char *str) noexcept;

The first one has a clear-cut size and can be initialised from a string
literal. The second one can attempt to determine at constexpr time what the
string length is.

It can't do so today (5.15) because of the lack of if constexpr. But Qt 6.0 
will require C++17, so it can use if constexpr and implement a scan-for-NUL at 
constexpr time if the payload is also constexpr. If it isn't, then it falls 
back to calling qustrlen().
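
As a rough illustration of the two constructors described above, here is a
minimal C++17 sketch. The helper names lengthFromArray and lengthFromPointer
are made up for this example and are not Qt functions; the real QStringView
constructors are more involved, and the sketch does not model the fallback to
qustrlen().

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; these are not Qt functions.

// Array overload: the length is part of the argument's type, so no scan
// is needed.
template <typename Char, std::size_t N>
constexpr std::size_t lengthFromArray(const Char (&)[N]) noexcept
{
    return N - 1; // assumes a literal with exactly one trailing NUL
}

// Pointer overload: scan for the terminator. Because the function is
// constexpr, the loop can run at compile time when the call appears in a
// constant expression, and runs at run time otherwise.
template <typename Char>
constexpr std::size_t lengthFromPointer(const Char *str) noexcept
{
    std::size_t n = 0;
    while (str[n] != Char(0))
        ++n;
    return n;
}

static_assert(lengthFromArray(u"foo") == 3, "length taken from the array type");
static_assert(lengthFromPointer(u"foo") == 3, "scan performed at compile time");
```

The two helpers have different names on purpose, so the sketch stays clear of
the overload-resolution question raised further down in this message.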

> Even std::string wants literals for this reason. A UDL would obviously
> be superior, but I don't see us ever getting rid of some form of QString
> literal short of templatizing *everything* that takes a T* (for T in
> char, char16_t, etc.) to take a T(&)[N] instead.

u"foo"_qs
u"foo"_qsv;

But QStringView(u"foo") should call that first constructor. Doesn't it? I 
never remember if the literal decays to pointer before the overload 
resolution.

> > In most other places we should by default only use QString, unless
> > there are very significant performance benefits to be had from using
> > QStringView. This helps us keep an API that’s both easy to use and
> > maintain. With the ideas above, you can still create a read-only
> > string, so data copies can in many cases be avoided if required.
> 
> Really? How?
> 
> The "nice" thing about QStringView is that it does not have ownership;
> you have to be careful about how long you hold onto it lest it turn into
> a dangling pointer. You can't construct a QString from any old bag of
> byt^Wcharacters because a QString is implicitly valid until it is destroyed.

That's the problem we've had with QStringLiteral and QString::fromRawData().

You *can* create it from read-only data and tell it never to try to modify. 
The trick is guaranteeing that it remains valid until the last user finished 
using it. Because of copy-on-write, that last user can be much later than the 
statement that created the QString in the first place.

One way to ensure that guarantee is to never unload/free the memory block in 
the first place. We already don't unload plugins for this and similar reasons.

One thing Lars and I agree on is that those literals must be null-terminated,
unlike QStringView. Whether it's simply an API contract or whether we test/
enforce remains to be seen. On the platforms where Qt runs, we can almost 
always read past the end of the string to see if the terminator is there, even 
if it means writing assembly code.

> That said, I think I understand the reasoning here; make it up front
> that the input is going to wind up in *a* QString. If the user's input
> is *already* a QString, the function can make a shared copy rather than
> constructing a brand new one. However, it would be nice for such
> functions to offer r-value reference overloads for cases where a QString
> needs to be created, or if the user is done with their copy. (Actually,
> a possibly-owning reference wrapper could be useful here...)

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Thiago Macieira
On Tuesday, 12 May 2020 10:48:24 PDT Giuseppe D'Angelo via Development wrote:
> What do you do? Adding a QStringView overload will make calls ambiguous,
> removing the QString one will be an ABI break. We need an established
> solution for these cases as they'll pop up during the Qt 6 lifetime.

Indeed.

And the API policy must be one such that it doesn't depend on what the method 
does *today* and it doesn't create a mess. Functions change.

Let's take an example with QRegularExpression's pattern (not picking on 
Giuseppe). Today it is:

QString pattern() const;
void setPattern(const QString &pattern);

QString QRegularExpression::pattern() const
{
    return d->pattern;
}

Since this is returning a stored QString, someone might feel that it should 
instead return a QStringView. But if it's storing, then the setter should 
remain const QString &. That would be:

QStringView pattern() const;
void setPattern(const QString &pattern);

But suppose that there's a pcre2_get_pattern_16() function. Then someone might 
be tempted to say that since PCRE stores the pattern, we don't need to. That 
would mean QRegularExpression::pattern() ought to be written as:

QString QRegularExpression::pattern() const
{
    qsizetype len = pcre2_get_pattern_length_16(d->compiledPattern);
    QString retval(len, Qt::Uninitialized);
    pcre2_get_pattern_16(d->compiledPattern, retval.data(), len);
    return retval;
}

But if PCRE is going to store the pattern and PCRE doesn't use QString, then 
setPattern could take a QStringView instead. That would be:

QString pattern() const;
void setPattern(QStringView pattern);

That's the opposite of the previous one.

I want rules that determine what the API should be without looking at the 
implementation of those two functions.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Thiago Macieira
On Tuesday, 12 May 2020 09:34:40 PDT Marc Mutz via Development wrote:
> On 2020-05-12 11:31, Jaroslaw Kobus wrote:
> > So, just an idea: instead of repeating the common API part in QString
> > and QStringView, what about making it one common? E.g. what about:
> > - deriving QString from QStringView (and adding mutator API)
> > or (maybe even better):
> > - aggregating QStringView object as a part of QString API and giving
> > accessor for it, like:
> Vetoed. Over my dead body™. No inheriting of non-polymorphic types from
> each other. What we have is static polymorphism, and that's what we
> should continue to have.

Agreed, but also because many of the methods in QStringView are not applicable 
to QString.

QStringView::mid(), for example, returns QStringView, but QString::mid() 
returns QString.

QString is neither a specialisation nor a broadening of QStringView.
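
The mid() point can be shown with standard-library types. This is only an
analogy, assuming std::u16string as a stand-in for an owning string and
std::u16string_view as a stand-in for a view; substr() plays the role of
mid(). The aliases Owning and View are names invented for this sketch.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Stand-ins, not Qt types: an owning string and a non-owning view.
using Owning = std::u16string;
using View = std::u16string_view;

// The "same" operation returns a different type on each class: the owning
// string hands back a new owning string, the view hands back another view.
static_assert(std::is_same_v<
    decltype(std::declval<const Owning &>().substr(1, 2)), Owning>);
static_assert(std::is_same_v<
    decltype(std::declval<const View &>().substr(1, 2)), View>);
```

If one class derived from the other, one of these two signatures would have
to give way, which is the mismatch described above.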

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Thiago Macieira
On Tuesday, 12 May 2020 02:04:35 PDT Tor Arne Vestbø wrote:
> During the contributor summit we were talking about just assuming “foo” is
> utf-8, now that our source code is utf-8. Is that not possible?

We've been doing that since 5.0.

But UTF-8 to UTF-16 requires a conversion. u"" wouldn't and in some cases, we 
would be able to use it without memory allocations either -- that is, 
QStringLiteral().

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Thiago Macieira
On Tuesday, 12 May 2020 02:01:45 PDT Edward Welbourne wrote:
> I largely agree, with the exception of: supporting an 8-bit string view
> type for comparisons (including startsWith(), find()/indexOf() and
> similar) can save client code a factor of two on the size of many string
> literals.  I'm fine with limiting its use to the QString(View) API,
> though.  So QUtf8View would replace QLatin1String as that 8-bit view
> type, with a much more limited scope.
> 
> While we can simply ask folk to stick a u on the front of their strings,
> doubling the size of each, it would be a kindness to those with lots of
> string literals to allow them to use u8 instead and avoid that doubling.
> Meanwhile, the many situations where data from an outside source arrives
> in UTF-8 make a case for providing a view type that can wrap such data
> and make it "presentable" for interaction with QString(View), tagged
> with the right semantics (i.e. the knowledge that it's UTF-8) in the
> type system.

I think we need some more data before we do that. First of all, char8_t 
doesn't exist before C++20. u8"" has existed since C++11, but it didn't 
produce char8_t literals until C++20. So we have to be careful with 
recommending people use it.
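
The change in what u8"" produces can be checked at compile time. The sketch
below assumes a C++17 or later compiler and relies on the standard
__cpp_char8_t feature-test macro; the alias name U8Unit is made up for this
example.

```cpp
#include <type_traits>

// Element type of a u8 literal, with const and the reference stripped.
using U8Unit = std::remove_cv_t<std::remove_reference_t<decltype(u8"x"[0])>>;

#if defined(__cpp_char8_t)
// C++20 with char8_t enabled: u8"" yields char8_t elements.
static_assert(std::is_same_v<U8Unit, char8_t>);
#else
// C++11 to C++17, or char8_t disabled: u8"" yields plain char elements.
static_assert(std::is_same_v<U8Unit, char>);
#endif
```

Code that passes u8"" literals to a function taking const char * therefore
compiles in one mode and not in the other, which is the reason for caution.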

The APIs we add using char8_t, if any, will exist with C++20 only. But for Qt, 
everything char is already UTF-8, so we don't need char8_t.

The problem with QUtf8View is how it may be used. Unlike with QLatin1String,
direct UTF-16-to-UTF-8 comparisons are not as easy, so the QString methods that
would take QUtf8View are necessarily slower. If space is a constraint but
runtime is not, it might be best to just use the QString constructor.
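
To illustrate why the Latin-1 case is the easy one, here is a minimal C++17
sketch using standard-library views as stand-ins. The function equalsLatin1 is
hypothetical and is not how Qt implements its comparisons.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares UTF-16 text with Latin-1 text. Every Latin-1
// byte maps to the Unicode code point of the same value, so widening each
// byte is the entire "conversion".
constexpr bool equalsLatin1(std::u16string_view utf16,
                            std::string_view latin1) noexcept
{
    if (utf16.size() != latin1.size())
        return false;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        const auto widened =
            static_cast<char16_t>(static_cast<unsigned char>(latin1[i]));
        if (utf16[i] != widened)
            return false;
    }
    return true;
}

// U+00E9 is one UTF-16 code unit and one Latin-1 byte (0xE9) ...
static_assert(equalsLatin1(u"caf\u00e9", "caf\xe9"));
// ... but two bytes (0xC3 0xA9) in UTF-8, so the lengths no longer line up
// and a unit-by-unit comparison is not enough.
static_assert(std::u16string_view(u"caf\u00e9").size() == 4);
static_assert(std::string_view("caf\xc3\xa9").size() == 5);
```

A UTF-8 comparison has to decode multi-byte sequences (or encode the UTF-16
side) as it goes, which is where the extra cost comes from.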

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Giuseppe D'Angelo via Development

On 5/12/20 6:12 PM, Иван Комиссаров wrote:
> So the question is - is it possible to allow to construct QString from unicode
> literal?

"Not yet", but adding a constructor from char16_t to QString makes sense.

This creates a problem down the line: today you have a

  f(QString)

and you call it with f(u"whatever"). Then, later on, you realize that 
QString is not needed and QStringView suffices. (This is the case all 
over existing Qt code.)


What do you do? Adding a QStringView overload will make calls ambiguous, 
removing the QString one will be an ABI break. We need an established 
solution for these cases as they'll pop up during the Qt 6 lifetime.



Note that adding the QString(char16_t*) constructor introduces this 
ambiguity for the functions that are already overloaded on 
QString+QStringView (and thus today are using QStringView).
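
The ambiguity can be reproduced without Qt. In the sketch below, OwningString
and StringView are hypothetical stand-ins for QString and QStringView, both
given a constructor from a UTF-16 literal; f and Picked are likewise invented
for the example.

```cpp
// Hypothetical stand-ins, both implicitly constructible from const char16_t *.
struct OwningString { constexpr OwningString(const char16_t *) noexcept {} };
struct StringView   { constexpr StringView(const char16_t *) noexcept {} };

enum class Picked { Owning, View };

constexpr Picked f(const OwningString &) noexcept { return Picked::Owning; }
constexpr Picked f(StringView) noexcept { return Picked::View; }

// f(u"whatever");  // does not compile: each overload needs one user-defined
//                  // conversion, so neither is a better match

// Callers have to name the type they want:
static_assert(f(StringView(u"whatever")) == Picked::View);
static_assert(f(OwningString(u"whatever")) == Picked::Owning);
```

With only one of the two constructors present, the commented-out call would
compile and pick the remaining viable overload.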


Thanks,
--
Giuseppe D'Angelo | giuseppe.dang...@kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Marc Mutz via Development

On 2020-05-12 16:12, Giuseppe D'Angelo via Development wrote:
> On 5/12/20 12:20 PM, Иван Комиссаров wrote:
>>> * Exceptions can be done where significant performance gains can be
>>> demonstrated and the API will by design not require a copy of the
>>> data (e.g. XML writer, stream writers, date time handling)
>> Let me disagree here. The decision should be taken on the fact if the
>> object takes ownership of the string (and thus QString is used) or it
>> only «looks» into it.
>
> I agree. This however leaves us with questions regarding the API. E.g.:
>
> class Attribute {
> public:
>   // OK: takes ownership
>   void addAttribute(const QString &, const QString &);


Such code can take QAnyStringView which would be, essentially, 
std::variantchar32_t)>.


And while I still think that char[] should be deprecated once we can 
depend on char8_t, for the time being, that would work with "foo" (and 
convert to QUtf8StringView).
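
The "variant of views" idea can be sketched with standard-library types. This
is an assumption-laden illustration, not the QAnyStringView design: the alias
AnyStringView and the function codeUnitCount are invented here, and only two
encodings are modelled.

```cpp
#include <cstddef>
#include <string_view>
#include <variant>

// Hypothetical stand-in: one parameter type that can hold a view over either
// 8-bit or 16-bit text.
using AnyStringView = std::variant<std::string_view, std::u16string_view>;

// A function taking AnyStringView needs no per-encoding overloads; it
// dispatches on the stored alternative instead.
constexpr std::size_t codeUnitCount(const AnyStringView &s)
{
    return std::visit(
        [](auto view) -> std::size_t { return view.size(); }, s);
}

static_assert(codeUnitCount(AnyStringView(std::string_view("foo"))) == 3);
static_assert(codeUnitCount(AnyStringView(std::u16string_view(u"foo"))) == 3);
```

The trade-off is that the dispatch moves from overload resolution at the call
site into the callee.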


Thanks,
Marc


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Marc Mutz via Development

On 2020-05-12 11:31, Jaroslaw Kobus wrote:
> So, just an idea: instead of repeating the common API part in QString
> and QStringView, what about making it one common? E.g. what about:
> - deriving QString from QStringView (and adding mutator API)
> or (maybe even better):
> - aggregating QStringView object as a part of QString API and giving
> accessor for it, like:

Vetoed. Over my dead body™. No inheriting of non-polymorphic types from
each other. What we have is static polymorphism, and that's what we
should continue to have.


Sorry,
Marc


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Иван Комиссаров
Good question!

Personally, I think that both should accept u"foo" as input. However, the 
following code does not compile:

QString s(u"foo");

I have no idea if this is intentional or not and if there will be problems with
QString/QStringView overloads.
However, since the overloads are going to be revisited anyway, maybe it is
possible to remove some QString overloads in favor of the QStringView ones and
thus allow accepting a Unicode literal in QString as well.

I don’t think that accepting char* should be the desired use-case. Yes, it 
works in the first case because QT_NO_CAST_FROM_ASCII 
is disabled by default, but I don’t think we should encourage that use-case - 
if the unicode literal is working for both cases, that should become 
the «right way» to go.

So the question is - is it possible to allow to construct QString from unicode 
literal?

Ivan

> 12 мая 2020 г., в 16:12, Giuseppe D'Angelo via Development 
>  написал(а):
> 
> On 5/12/20 12:20 PM, Иван Комиссаров wrote:
>>> * Exceptions can be done where significant performance gains can be 
>>> demonstrated and the API will by design not require a copy of the data 
>>> (e.g. XML writer, stream writers, date time handling)
>> Let me disagree here. The decision should be taken on the fact if the object 
>> takes ownership of the string (and thus QString is used) or it only «looks» 
>> into it.
> 
> I agree. This however leaves us with questions regarding the API. E.g.:
> 
> class Attribute {
> public:
>  // OK: takes ownership
>  void addAttribute(const QString &, const QString &);
> 
>  // does not take ownership
>  bool hasAttribute(QStringView key) const;
> };
> 
> Is it OK that you can call addAttribute("foo", "bar") but not 
> hasAttribute("foo")? (And similar)
> 
> Thanks,
> -- 
> Giuseppe D'Angelo | giuseppe.dang...@kdab.com | Senior Software Engineer
> KDAB (France) S.A.S., a KDAB Group company
> Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
> KDAB - The Qt, C++ and OpenGL Experts
> 



Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Matthew Woehlke

On 12/05/2020 03.49, Lars Knoll wrote:
> * QStringLiteral should turn into a small wrapper around u”…”, and
> probably also get deprecated. Maybe we could add a user defined
> literal for it instead that returns a read-only QString (QString s =
> “…”_q;). So u”…” would lead to a QStringView, u”…”_q to a read-only
> QString.

How will this work? As I understand, the main advantage to
QStringLiteral is that it statically encodes the *length* as well as the 
data. This isn't possible with raw literals, which are merely 
NUL-terminated.


Even std::string wants literals for this reason. A UDL would obviously 
be superior, but I don't see us ever getting rid of some form of QString 
literal short of templatizing *everything* that takes a T* (for T in 
char, char16_t, etc.) to take a T(&)[N] instead.



> In most other places we should by default only use QString, unless
> there are very significant performance benefits to be had from using
> QStringView. This helps us keep an API that’s both easy to use and
> maintain. With the ideas above, you can still create a read-only
> string, so data copies can in many cases be avoided if required.

Really? How?

The "nice" thing about QStringView is that it does not have ownership; 
you have to be careful about how long you hold onto it lest it turn into 
a dangling pointer. You can't construct a QString from any old bag of 
byt^Wcharacters because a QString is implicitly valid until it is destroyed.


That said, I think I understand the reasoning here; make it up front 
that the input is going to wind up in *a* QString. If the user's input 
is *already* a QString, the function can make a shared copy rather than 
constructing a brand new one. However, it would be nice for such 
functions to offer r-value reference overloads for cases where a QString 
needs to be created, or if the user is done with their copy. (Actually, 
a possibly-owning reference wrapper could be useful here...)


--
Matthew


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Giuseppe D'Angelo via Development

On 5/12/20 12:20 PM, Иван Комиссаров wrote:
>> * Exceptions can be done where significant performance gains can be
>> demonstrated and the API will by design not require a copy of the data (e.g.
>> XML writer, stream writers, date time handling)
>
> Let me disagree here. The decision should be taken on the fact if the object
> takes ownership of the string (and thus QString is used) or it only «looks»
> into it.

I agree. This however leaves us with questions regarding the API. E.g.:

class Attribute {
public:
  // OK: takes ownership
  void addAttribute(const QString &, const QString &);

  // does not take ownership
  bool hasAttribute(QStringView key) const;
};

Is it OK that you can call addAttribute("foo", "bar") but not 
hasAttribute("foo")? (And similar)


Thanks,
--
Giuseppe D'Angelo | giuseppe.dang...@kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts





Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Henry Skoglund

On 2020-05-12 12:36, Lars Knoll wrote:
> ...
> Leaving things behind simplifies our lives and in the longer term also
> our users' lives. And yes, non-Unicode encodings are legacy in today's
> world. They need to disappear, and most people are working towards
> that goal. We can and should do our part.
>
> Lars

+1

I still have some .exe files around I wrote for Windows 3.0 in 1991,
while they run nicely today (GDI graphics) on my Windows 10 PC, the 
Swedish characters display wrong, because somewhere along the way 
Microsoft decided that the kosher codepage for Windows programs would 
cease to be 850 and instead be 1252. Yes in 1991 CP 850 was hot, today
not so much. So I'd prefer if Qt would require UTF-8 even on Windows.


P.S. Consider a similar type of "technical debt" being settled by Qt: 
I'm thinking of the "DPI awareness" setting in 5.14, i.e. for a default 
widgets program, Qt nowadays tells Windows that it's "DPI aware" and
wants the truth about screen coordinates, even on those portable PCs 
with high DPIs that have Scale set to 125% or 150%.  On the Qt forum 
I've seen lot of heat/complaints about QLabels being shoehorned in with 
the QLineEdits because the fonts are too big for those 125% or 150% 
screens, I'd answer: create a qt.conf file with the contents:

[Platforms]
WindowsArguments = dpiawareness=0

and your legacy widgets program will go back to display fine, albeit a 
bit blurry and bloated.
But! If you're asking (with that qt.conf file present) what the screen 
size is (e.g. QGuiApplication::screens().at(0)->geometry() etc.) Windows will
lie to you and scale "backwards" so that a normal 2560x1440 screen is 
reported as "QRect(0,0,1707,960)". So using dpiawareness=0 is a bad 
long-term solution :-(




Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Lars Knoll
> On 12 May 2020, at 11:34, André Pönitz  wrote:
> 
> On Tue, May 12, 2020 at 07:49:06AM +, Lars Knoll wrote:
>> I believe it’s important to leave the non Unicode world behind us [...]
> 
> Is that meant to be a convincing technical argument?

This is nothing technical per se.
> 
>>   * We have extensive support for legacy text encodings in Qt Core, that
>>   should not be there anymore in 2020
> 
> To clarify, since this kind of things is easily misread:
> 
>  "It should not be _in Qt Core_, but it should be somewhere else _in Qt_."
> 
> Getting easy access to encodings is a valuable feature of Qt.

A separate library that uses ICU behind the scenes is something I agree with.
QTextCodec in its current form not so much.
> 
>>   * We offer options to generate HTML or XML in legacy encodings, even
>>   though the standard clearly says that those are deprecated
>> 
>>   * to/fromLocal8Bit() should be equivalent to to/fromUtf8() on all but
>>   Windows (where we’re still a few years away from fully getting rid of
>>   this)
>> 
>>   * source code encoding is undefined
>>   Cleaning this up has progressed quite a bit, and a lot of changes in
>>   various classes have been merged. There’s a large set of changes
>>   currently being reviewed to remove QTextCodec as a dependency in Qt
>>   (it’ll get moved to libQt5Compat), and introduce a new QStringConverter
>>   class, that can handle transcoding between Unicode encodings, Latin1
>>   and the system locale. For all systems except Windows, we make the
>>   additional assumption that the system locale is UTF-8 (see also my
>>   other mail about UTF-8 as System locale on Windows).
> 
> libQt5Compat is something that's likely to go away in Qt 7. I don't see the
> general need for text codecs going away. So it would make more sense to have
> them in a module of their own from the beginning.

See above, and the new QStringEncoder/Decoder can support additional encodings 
(though that’s not yet implemented).
> 
>>   A next step is to change the build system, so that it (by default)
>>   assumes that source code is encoded in UTF-8. We already do set
>>   compiler flags to ensure this when building Qt itself, but are not
>>   doing this yet for user code.
> 
> Which makes sense, because it's not up to a library to dictate how user
> code has to look like.

Funny, how most other programming languages actually ‘dictate’ that. gcc and 
clang have both switched to making this the default already for quite some time 
(even if your system locale happens to not be utf8). I’ve not seen complaints 
about this anywhere.
> 
>>   But gcc and clang do already treat all source code as UTF-8 by default
>>   (and I believe ICC does the same at least on platforms other than
>>   Windows). MSVC will require a /utf-8 flag to enable this, something
>>   that I want to add to the default config for both qmake and cmake when
>>   compiling a Qt app. Without it, MSVC will still assume the source code
>>   is encoded in the current ANSI code page and u”…” or u8”…” will result
>>   in garbage. Worse it’ll lead to non portable code, that might compile
>>   correctly on one developer machine and create garbage on the next one
>>   (as it uses a different locale).
> 
>>   Changing this also for our users will make source code written for Qt
>>   more portable and bring Qt on par with most other programming languages
>>   in the world that already mandate utf8 as the source encoding (JS,
>>   Swift, Java, etc).
> 
> "Bringing on par" by cutting functionality that is.
> 
> To me it is unclear how relevant citing other languages here is. If anything
> at all, Standard C++ would be relevant, which does *not* mandate UTF-8.

We are talking about what the default is. If someone really wants a different
encoding, they can still do that. And the default is already UTF-8 on all but
Windows (where it’s the current ANSI code page, which means anything but ASCII
is not well defined at all).
> 
>>   [...]
> 
>>   Comments are welcome, [...]
> 
> I buy a "codecs are too big for Qt Core, they should be separate" argument
> (that was not made here unless I overlooked it) and I buy the "there should
> not be multiple overloads for the mass of string-taking functions in the API"
> argument. I'd even buy a "we don't have resources to even keep it around".
> 
> I don't understand the motivation for the "legacy", "believe", "important to
> leave behind" line of reasoning.

Leaving things behind simplifies our lives and in the longer term also our
users' lives. And yes, non-Unicode encodings are legacy in today's world. They
need to disappear, and most people are working towards that goal. We can and
should do our part.

Lars






Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Иван Комиссаров

> 12 мая 2020 г., в 09:49, Lars Knoll  написал(а):
> 
> Hi all,
> 

First of all, the plan sounds great!

> 
> Most other classes:
> 
> * Only take and return QString
> * Exceptions can be done where significant performance gains can be 
> demonstrated and the API will by design not require a copy of the data (e.g. 
> XML writer, stream writers, date time handling)

Let me disagree here. The decision should be taken on the fact if the object 
takes ownership of the string (and thus QString is used) or it only «looks» 
into it.

Otherwise, QString gets propagated all over the place:

void addSuffix(const QString &suffix) // can’t use view here!
{
    // no QStringView overload, can’t use QStringView in the API
    m_memberString.append(suffix);
}

Ok, we aim to have a QString::append(QStringView) overload, so the example is
not that good.

Another one:

QMimeType findMimeType(const QString &name) // can’t use view here!
{
    // no QStringView overload, the API propagates QString through all the code
    return QMimeDatabase().mimeTypeForName(name);
}

I hope the idea is clear.

PS: it is not that easy to fix QMimeDatabase to take QStringView (I looked into
the possibility), but the aim should be to take QStringView where it is
possible, not where it is *faster*.

Ivan


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Lars Knoll
> On 12 May 2020, at 11:04, Tor Arne Vestbø  wrote:
> 
> 
>> On 12 May 2020, at 09:49, Lars Knoll  wrote:
>> 
>> * Our QLatin1String uses are in most cases about pure ASCII strings. In any 
>> case, we should consider mass porting them over to u”…” instead.
> 
> During the contributor summit we were talking about just assuming “foo” is 
> utf-8, now that our source code is utf-8. Is that not possible? 

It is, but we’d need to copy the data to create a QString. With 16bit data, we 
could avoid many of the copies.

Cheers,
Lars



Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Jaroslaw Kobus
> From: Development  on behalf of Lars 
> Knoll 
> Sent: Tuesday, May 12, 2020 9:49 AM
> To: Qt development mailing list
> Subject: [Development] QString and related changes for Qt 6
>
>
> * QStringView and QByteArrayView need to be completed to implement all const 
> methods of QString/QByteArray

Wondering about this point. Looks like we aim for:

QString API = QStringView API (const API) + mutator API

So, just an idea: instead of repeating the common API part in QString and 
QStringView, what about making it one common? E.g. what about:
- deriving QString from QStringView (and adding mutator API)
or (maybe even better):
- aggregating QStringView object as a part of QString API and giving accessor
for it, like:

QStringView QString::stringView();

In this way we get access to the read-only part of the QString API, and we no
longer have to worry about manually keeping the const part of the QString API
in sync with the QStringView API. The same of course applies to QByteArray and
QByteArrayView...

Jarek


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Tor Arne Vestbø

> On 12 May 2020, at 09:49, Lars Knoll  wrote:
> 
> * Our QLatin1String uses are in most cases about pure ASCII strings. In any 
> case, we should consider mass porting them over to u”…” instead.

During the contributor summit we were talking about just assuming “foo” is 
utf-8, now that our source code is utf-8. Is that not possible? 

Tor Arne 


Re: [Development] QString and related changes for Qt 6

2020-05-12 Thread Edward Welbourne
Lars Knoll (12 May 2020 09:49) wrote:
> My high level goal for the string classes in Qt 6 was to complete the
> Unicode related changes that we started for Qt 5.0, where we made utf8
> and utf16 the main encodings, and simplify things. I believe it’s
> important to leave the non Unicode world behind us, and offer an as
> consistent cross-platform story here as we can.

+1

> A next step is to change the build system, so that it (by default)
> assumes that source code is encoded in UTF-8. We [already] do set
> compiler flags to ensure this when building Qt itself,

... except on Windows, which we plan to fix; good plan.

> Our string handling classes currently consist of the following
> classes: QByteArray, QString, QStringView, QStringRef, QStringLiteral
> and QLatin1String. The set is too large, inconsistent and needs
> cleaning up:

Indeed, we recently documented which should be used when; doing so made
it clear that we'd be better with fewer of these, if only for the sake
of making it easier to explain !

> * QByteArray’s methods like toUpper() will only handle ASCII
>   characters (they assume Latin1 in Qt5).

We should document that doing even this is under sufferance and we wish
folk would stop using QByteArray for it.  It's an operation that
implicates the semantics of the bytes, so should be done using a class
that believes it knows the semantics of the bytes - which QByteArray
should steadfastly refuse to do.  Aim to remove at Qt 7.

> This would leave us with 4 string-related classes: QByteArray(View)
> and QString(View).

Sounds much better; and clearer.

> One open question is whether we should add a QUtf8String with a
> char8_t. I am not yet convinced that we actually need the class
> though.

How about a QUtf8View, replacing QLatin1String, as the way to pass
single-byte-encoded literals into our string APIs?  See below.

> The next question is what we do with our API methods. Currently we
> have many places where we have three to 4 overloads for the same
> methods (taking a QString, a QStringView, a QStringRef and a
> QLatin1String). We can’t have 4 overloads for each method in all of
> Qt, so we need to restrict overloads to the places where it is
> required. IMO this is mainly the string related classes
> themselves. And even there we can probably cut down on the number of
> overloads.

I largely agree, with the exception of: supporting an 8-bit string view
type for comparisons (including startsWith(), find()/indexOf() and
similar) can save client code a factor of two on the size of many string
literals.  I'm fine with limiting its use to the QString(View) API,
though.  So QUtf8View would replace QLatin1String as that 8-bit view
type, with a much more limited scope.

While we can simply ask folk to stick a u on the front of their strings,
doubling the size of each, it would be a kindness to those with lots of
string literals to allow them to use u8 instead and avoid that doubling.
Meanwhile, the many situations where data from an outside source arrives
in UTF-8 make a case for providing a view type that can wrap such data
and make it "presentable" for interaction with QString(View), tagged
with the right semantics (i.e. the knowledge that it's UTF-8) in the
type system.
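The size argument can be checked at compile time with nothing but standard C++. This is only an illustration with an arbitrary literal; the constant names are made up for the example.

```cpp
#include <cstddef>

// sizeof on a string literal includes the terminating NUL code unit.
// "startsWith" is 10 characters, so each literal holds 11 code units.
constexpr std::size_t utf16Bytes = sizeof(u"startsWith");  // 11 char16_t units
constexpr std::size_t utf8Bytes = sizeof(u8"startsWith");  // 11 one-byte units

static_assert(utf8Bytes == 11, "one byte per ASCII character plus NUL");
static_assert(utf16Bytes == 11 * sizeof(char16_t), "one char16_t per character plus NUL");
```

The factor of two only holds for ASCII content. Characters outside ASCII take two to four bytes in UTF-8, so for such text the saving shrinks or disappears.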

Eddy.


[Development] QString and related changes for Qt 6

2020-05-12 Thread Lars Knoll
Hi all,

I’ve had a longer chat with Thiago about how to evolve QString for Qt 6 last 
week.

Some work has already happened, so both QString and QByteArray now share the 
data structure with QList/QVector, enabling zero-copy conversion between the 
types. There are also some pending changes to transition those classes to
qsizetype and remove the 32-bit limitations we currently have.

My high level goal for the string classes in Qt 6 was to complete the Unicode 
related changes that we started for Qt 5.0, where we made utf8 and utf16 the 
main encodings, and simplify things. I believe it’s important to leave the
non-Unicode world behind us, and offer as consistent a cross-platform story
here as we can.

Qt 5.x still has some left-overs from the pre-Unicode world:

* QTextStream encodes in Latin1 by default, so do a couple of classes in some 
places
* While we assume Utf8 as the source encoding for Qt, we still use 
QLatin1String all over the place
* We have extensive support for legacy text encodings in Qt Core, that should 
not be there anymore in 2020
* We offer options to generate HTML or XML in legacy encodings, even though the 
standard clearly says that those are deprecated
* to/fromLocal8Bit() should be equivalent to to/fromUtf8() on all but Windows 
(where we’re still a few years away from fully getting rid of this)
* source code encoding is undefined

Cleaning this up has progressed quite a bit, and a lot of changes in various 
classes have been merged. There’s a large set of changes currently being
reviewed that remove QTextCodec as a dependency in Qt (it’ll get moved to
libQt5Compat), and introduce a new QStringConverter class, that can handle
transcoding between Unicode encodings, Latin1 and the system locale. For all 
systems except Windows, we make the additional assumption that the system 
locale is UTF-8 (see also my other mail about UTF-8 as System locale on 
Windows).


A next step is to change the build system, so that it (by default) assumes that 
source code is encoded in UTF-8. We already set compiler flags to ensure
this when building Qt itself, but are not doing this yet for user code.

But gcc and clang do already treat all source code as UTF-8 by default (and I 
believe ICC does the same at least on platforms other than Windows). MSVC will 
require a /utf-8 flag to enable this, something that I want to add to the 
default config for both qmake and cmake when compiling a Qt app. Without it, 
MSVC will still assume the source code is encoded in the current ANSI code page 
and u"…" or u8"…" will result in garbage. Worse, it’ll lead to non-portable
code that might compile correctly on one developer machine and create garbage
on the next one (as it uses a different locale).
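One way a project could guard against this today is a compile-time check on a non-ASCII literal. The sketch below is plain C++ and assumes the file itself is saved as UTF-8 without a BOM; the constant names are invented for the example. If the compiler decodes the file as, say, code page 1252, the two bytes of "é" are read as two separate characters, the literals come out larger, and the build fails instead of silently producing garbage.

```cpp
#include <cstddef>

// "é" is U+00E9: two bytes in UTF-8, one code unit in UTF-16.
// sizeof also counts the terminating NUL code unit.
constexpr std::size_t eAcuteUtf8Size = sizeof(u8"é");
constexpr std::size_t eAcuteUtf16Size = sizeof(u"é");

static_assert(eAcuteUtf8Size == 3, "source file was not decoded as UTF-8");
static_assert(eAcuteUtf16Size == 2 * sizeof(char16_t), "source file was not decoded as UTF-8");
```

With gcc and clang this passes out of the box. With MSVC it should need either the /utf-8 flag mentioned above or a BOM in the file.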

Changing this also for our users will make source code written for Qt more 
portable and bring Qt on par with most other programming languages in the world 
that already mandate utf8 as the source encoding (JS, Swift, Java, etc).


Our string handling classes currently consist of the following classes: 
QByteArray, QString, QStringView, QStringRef, QStringLiteral and QLatin1String. 
The set is too large, inconsistent and needs cleaning up:

* With the source code encoding being utf8, QLatin1String makes a lot less 
sense, and my goal is to deprecate/deprioritize it in Qt 6. Instead, I would
like to advocate the use of u"…" to directly encode the string as UTF-16.
* QStringRef has been superseded by QStringView and should get deprecated. The 
main hurdle here is its use in QXmlStream. The plan is to extend QXmlStringRef
(yes, that one exists as well…) to cover the use case. Both QXmlStringRef and 
QStringRef will get a cast operator to QStringView. With that we can then 
remove all API that takes a QStringRef and replace it with API taking either a 
QString or a QStringView
* QStringLiteral should turn into a small wrapper around u"…", and probably
also get deprecated. Maybe we could add a user defined literal for it instead
that returns a read-only QString (QString s = "…"_q;). So u"…" would lead to a
QStringView, u"…"_q to a read-only QString.
* We should add a QByteArrayView to keep symmetry between the QString and 
QByteArray APIs. This is somewhat independent from the rest though and lower 
priority.
* QStringView and QByteArrayView need to be completed to implement all const 
methods of QString/QByteArray
* A basic difference between QString and QStringView will be that the view
class can contain non-zero-terminated data and is read-only, while QString
will guarantee a zero termination (I checked whether we can remove that 
enforcement, but it will break too much code). Sidenote: Currently, 
fromRawData() together with utf16() can break this assumption, we should fix 
this
* QByteArray’s methods like toUpper() will only handle ASCII characters (they 
assume Latin1 in Qt5).
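For readers who have not used user-defined literals, the "_q" idea floated in the list above could look roughly like the following sketch. It uses only standard C++, with std::u16string standing in for QString and std::u16string_view for QStringView, so it is not the proposed Qt API. In particular this stand-in copies the characters, whereas the proposal is for a read-only QString that refers to the literal's data.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the proposed "_q" suffix. std::u16string plays
// the role of QString here, so unlike the proposal this copies the data.
inline std::u16string operator""_q(const char16_t *str, std::size_t len)
{
    return std::u16string(str, len);
}

// A plain u"…" literal converts to a non-owning view without copying.
constexpr std::u16string_view greetingView = u"hello";

// The suffixed literal yields an owning string object.
inline std::u16string greetingString()
{
    return u"hello"_q;
}
```

The suffix starts with an underscore because suffixes without one are reserved for the standard library.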

This would leave us with 4 string related classes: QByteArray(View) and 
QString(View).
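The owning/view split and the zero-termination point can be illustrated with the standard library's equivalents. This is a sketch only: std::u16string and std::u16string_view stand in for QString and QStringView, all function names are made up for the example, and the Qt classes may differ in detail.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// One signature serves literals, owning strings and substrings alike,
// because each of them converts to a view without copying.
inline std::size_t countSpaces(std::u16string_view text)
{
    std::size_t n = 0;
    for (char16_t c : text) {
        if (c == u' ')
            ++n;
    }
    return n;
}

// An owning string always has a terminator after its last character.
inline bool ownerIsTerminated(const std::u16string &s)
{
    return s.c_str()[s.size()] == u'\0';
}

// A view into the middle of a buffer is followed by whatever comes next in
// that buffer, not by a terminator. The caller must keep pos + len within
// buffer.size() so that the read stays inside the owning string.
inline char16_t unitAfterView(const std::u16string &buffer, std::size_t pos, std::size_t len)
{
    const std::u16string_view view = std::u16string_view(buffer).substr(pos, len);
    return view.data()[view.size()];
}
```

For a buffer holding u"Qt six", the view over the first two characters is followed by a space rather than a NUL, which is why code that hands a view's data() to a C-style API expecting termination is unsafe.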

Another step that is