Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 14/10/15 23:51, "Bubke Marco" wrote: >On October 14, 2015 23:10:26 Thiago Macieira >wrote: > >> On Wednesday 14 October 2015 20:52:12 Bubke Marco wrote: >>> On October 14, 2015 22:13:11 Thiago Macieira >>> >> wrote: >>> And I don't want an utf 8 baked >>> QString. For my use cases implicit sharing is overkill. Move semantics >>> would be enough. I want localAwareCompare(const char *s1, const char >>>*s2). >> >> Do it on your own. You just said that ICU has the function you want, so >>use >> it. > >So Qt is always shipping with ICU? No, we wanted to do this at some point, but it turns out that it’s not possible to rely on it on all platforms. > > >> Qt does not have to provide a comparator that operates on something >>other than >> its native string type. > >Isn't Qt a framework to help developers? Sorry your argumentation is >sounds not very empirical. Of course our aim should be to help developers. But there will always be some use cases which we will not cover. The question is whether this is one of them or not. > > >> >>> Maybe windows and mac os will bring support to the standard library so >>>we >>> don't need it but in the mean time it would be very helpful. >>> >>> A utf 8 based QTextDocument would be maybe nice too. >> >> What for? It needs to keep a lot of extra structures, so the cost of >> conversion and extra memory is minimal. And besides, QTextDocument >>really >> needs a seekable string, not UTF-8. > >Is UTF 16 seekable? You still have surrogates and you can merge merge >code points. For the most parts. When it comes to positioning cursors inside the text, you’ll always need to take care of complex text layouting, diacritics and (in the case of utf16) surrogates. Still, a lot of the seeking is probably easier with utf16 than with utf8. > > >Lets describe an example. I send the QTextDocument content to an library >which expect utf8 content and gives me back positions. This gets >interesting if you use non >ASCII signs. 
Actually the new clang code model works that way. We also have the opposite case, where we need to send utf16 to a 3rd party or system library, and get back positions. Unfortunately, not all APIs take the same encoding. > >> >> Even if we provide UTF-8 support classes, those will not propagate to >>the GUI. >> Forget it. > >What about compressing UTF 16 like python is doing it for UTF 32. If you >are only using ascii you set a flag and you can remove all that useless >zeros. It would be have implications for data() but maybe we should not >provide access to the internal representation. If you use UTF 32 as a >base you don't need anymore surrogates. That’s back to a mixed representation in QString. I personally think that combines the worst of both worlds. Cheers, Lars ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
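The position mismatch discussed above (a library that reports UTF-8 byte offsets for a document stored as UTF-16) can be illustrated with a small offset translation. This is only a sketch in standard C++: `utf8OffsetToUtf16` is a hypothetical helper, not a Qt API, and it assumes the input is valid UTF-8 and that the offset falls on a code point boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): translate a byte offset into a
// valid UTF-8 string into the matching UTF-16 code-unit offset.
// Assumes `byteOffset` lies on a code point boundary.
std::size_t utf8OffsetToUtf16(const std::string &utf8, std::size_t byteOffset)
{
    assert(byteOffset <= utf8.size());
    std::size_t units = 0;
    for (std::size_t i = 0; i < byteOffset; ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) == 0x80)
            continue;            // continuation byte: counted with its lead byte
        // A 4-byte lead (11110xxx) encodes a code point above U+FFFF,
        // which needs a surrogate pair, i.e. two UTF-16 code units.
        units += (c >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

The translation is linear in the offset, which is the cost that appears whenever positions cross an encoding boundary, whichever encoding the document itself uses.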
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
> On Oct 14, 2015, at 5:14 PM, Matthew Woehlke wrote:
>
> On 2015-10-14 10:30, André Somers wrote:
>> On 14-10-2015 at 15:59, Matthew Woehlke wrote:
>>> STL should change. In Qt and Python, you can use negative indices to
>>> refer to a distance (length) relative to the end (length) of the string.
>>> In STL you can't do that, which is a significant limitation by
>>> comparison. Please don't drop this useful functionality!
>>
>> I'm not so sure anymore. Do you really think that for instance passing
>> in a negative _from_ in QString::indexOf to search from the back of the
>> string is intuitive API? I don't.
>
> Huh? Of course it is.
>
>   s.indexOf('c', 5);  // find 'c', forward, starting at offset 5
>   s.indexOf('c', -5); // find 'c', forward, starting at offset N-5

So from where does s.indexOf('c', i-2) search?
This is similar to integer overflow, and I think utilizing that in an API leads to less readable and potentially unexpectedly behaving code.

Anyhow this seems to be only vaguely related to the things that are discussed in this thread.

Br, Eike

> A negative offset -K is exactly the same as N + 1 - K (N = length of
> string). It just saves having to write that out yourself. It *doesn't*
> change the behavior of the function. (I think you are confusing
> tail-relative offset with reverse operation, which is totally different
> and orthogonal.)
>
> Even STL supports this, partially, for -1; string::npos is generally
> equivalent to -1 in Qt.
>
> Bah. Okay, apparently Qt actually *doesn't* support tail-relative, but
> just treats n<0 like string::npos. That could be improved for Qt 6
> though, but only if n is signed.
>
> --
> Matthew

--
Eike Ziller, Senior Software Engineer - The Qt Company GmbH

The Qt Company GmbH, Rudower Chaussee 13, D-12489 Berlin
Managing Directors: Mika Pälsi, Juha Varelius, Tuula Haataja
Registered office: Berlin, Court of registration: Amtsgericht Charlottenburg, HRB 144331 B
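To make the two positions in this exchange concrete, here is a minimal sketch of a forward search with a tail-relative start offset. It is written against `std::u16string` rather than QString; `indexOfFrom` is a hypothetical helper, and it assumes the Python convention that a negative offset -K means N - K (the behaviour of the real QString::indexOf should be checked in the Qt documentation).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt or STL function): forward search that
// accepts a signed start offset. A negative `from` is counted from the
// end, following the Python convention: -K means N - K.
std::ptrdiff_t indexOfFrom(const std::u16string &s, char16_t ch, std::ptrdiff_t from)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(s.size());
    if (from < 0)
        from += n;               // tail-relative: -5 becomes N - 5
    if (from < 0)
        from = 0;                // clamp offsets that reach before the start
    for (std::ptrdiff_t i = from; i < n; ++i) {
        if (s[static_cast<std::size_t>(i)] == ch)
            return i;
    }
    return -1;                   // not found
}
```

The concern raised in the reply shows up directly with this helper: a computed argument such as `i - 2` with `i == 1` silently becomes -1 and searches from the last character, instead of being rejected as out of range.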
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Thursday 15 October 2015 06:05:34 Thiago Macieira wrote:
> On Thursday 15 October 2015 02:22:50 Marc Mutz wrote:
> > On Thursday 15 October 2015 00:27:14 Thiago Macieira wrote:
> > > Way too much code would break if we did that because we allow people
> > > access to the data pointer in QString and to iterate directly
> > > (std::{,w,u16}string don't allow that, which makes parsing them
> > > actually a lot more cumbersome).
> >
> > Just chiming in to say: It does:
> > http://en.cppreference.com/w/cpp/string/basic_string/data
>
> Ah, right. The mutable pointer is the one missing...

    char *data = &*str.begin();

There might not be explicit API for it, but it's not forbidden to use that idiom (if !str.empty(), of course).

--
Marc Mutz | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
Tel: +49-30-521325470
KDAB - The Qt Experts
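The idiom mentioned above can be shown in a few lines of standard C++. This is a sketch only; `replaceInPlace` is a hypothetical helper introduced for illustration, and it relies on std::string storage being contiguous, which is guaranteed since C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the idiom from the message above: obtain a mutable pointer
// to a std::string's buffer without a dedicated API. Dereferencing
// begin() is only valid for a non-empty string, hence the guard.
// `replaceInPlace` is a hypothetical helper name.
void replaceInPlace(std::string &str, char from, char to)
{
    if (str.empty())
        return;
    char *data = &*str.begin();  // contiguous storage since C++11
    for (std::size_t i = 0; i < str.size(); ++i) {
        if (data[i] == from)
            data[i] = to;
    }
}
```

Since C++17, `std::string::data()` also has a non-const overload, which makes the workaround unnecessary on newer compilers.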
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Thursday 15 October 2015 02:22:50 Marc Mutz wrote: > On Thursday 15 October 2015 00:27:14 Thiago Macieira wrote: > > Way too much code would break if we did that because we allow people > > access > > to the data pointer in QString and to iterate directly > > (std::{,w,u16}string don't allow that, which makes parsing them actually a > > lot more cumbersome). > > Just chiming in to say: It does: > http://en.cppreference.com/w/cpp/string/basic_string/data Ah, right. The mutable pointer is the one missing... -- Thiago Macieira - thiago.macieira (AT) intel.com Software Architect - Intel Open Source Technology Center ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Thursday 15 October 2015 00:27:14 Thiago Macieira wrote: > Way too much code would break if we did that because we allow people access > to the data pointer in QString and to iterate directly > (std::{,w,u16}string don't allow that, which makes parsing them actually a > lot more cumbersome). Just chiming in to say: It does: http://en.cppreference.com/w/cpp/string/basic_string/data -- Marc Mutz | Senior Software Engineer KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company Tel: +49-30-521325470 KDAB - The Qt Experts ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
Hi Andre, On Wednesday 14 October 2015 22:37:01 André Pönitz wrote: > That's why I'd like to propose the following: > > Since experiments within Qt proper are difficult due to the BC > and SC guarantees we give and the practical impossibility to un-do > additions we should simply not do it there. > > Instead, we could (and should) use part of Qt Creator's code base, > specifically some of 'leaf' plugins (i.e. plugins with no known > downstream users), to play with the idea, and develop a solid > understanding of the pros and cons of the idea of using *View > classes in interfaces until Qt 6 comes. > > The way forward could be to add e.g. 'Utils::[Q]StringView' > and 'Utils::[Q]ByteArrayView' in implementation src/libs/utils > and start using these in a few 'harmless' plugins. > > The advantages here are less restrictions due to lower compatibility > guarantees, less restrictions imposed by older compilers, less harm > done if the experiment fails (i.e. if the *Views turn out to not be > beneficial) and generally more flexibility when e.g. comparing competing > implementations. > > Opinions? I disagree that QtC is a better place to try out QStringView. The user base of Qt APIs is orders of magnitude larger than that of QtC APIs, and we should encourage outside experiments, not prevent them. Just like QStringBuilder, we can make QStringView opt-in for now (which means providing a QString overload :( - but we can start with existing API). Thanks, Marc -- Marc Mutz | Senior Software Engineer KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company Tel: +49-30-521325470 KDAB - The Qt Experts ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 21:51:23 Bubke Marco wrote:
> On October 14, 2015 23:10:26 Thiago Macieira wrote:
> > Do it on your own. You just said that ICU has the function you want, so
> > use it.
>
> So Qt is always shipping with ICU?

It can be disabled on Windows. On OS X there's no point since it's part of the system. On Linux, if you disable it, you're going to have some other features reduced, so don't disable it.

> > Qt does not have to provide a comparator that operates on something other
> > than its native string type.
>
> Isn't Qt a framework to help developers? Sorry your argumentation is sounds
> not very empirical.

Yes, it is. But Qt's goal is not to support every single use-case and corner-case out there. Qt should make 90% easy and 9% possible. That means there's a 1% of the realm of possibilities that Qt does not address. If your use-case falls into this group, use the fact that Qt is native code and just call other libraries. That's one of the two main advantages of native code. There's no sandbox to escape from.

Qt already supports doing locale-aware comparison. We even have a class for it, so it can be done efficiently: QCollator, and it supports our native string type (QString). Providing extra support for a character encoding that is not what QString uses falls in that 1%. Just use ICU.

> >> Maybe windows and mac os will bring support to the standard library so we
> >> don't need it but in the mean time it would be very helpful.
> >>
> >> A utf 8 based QTextDocument would be maybe nice too.
> >
> > What for? It needs to keep a lot of extra structures, so the cost of
> > conversion and extra memory is minimal. And besides, QTextDocument really
> > needs a seekable string, not UTF-8.
>
> Is UTF 16 seekable? You still have surrogates and you can merge merge code
> points.

Seekable enough. It's much easier to deal with than UTF-8.

A surrogate pair, as its name says, appears *only* in pairs, so you always know if you're on the first or on the second. Moreover, all living languages are encoded in the Basic Multilingual Plane, so no surrogate pairs are required for any of them. Handling of surrogate pairs can be moved to non-critical codepaths.

As for combining code points, that's something different and usually one or more layers removed from the seeking, alongside zero- and full-width code points. QTextDocument also handles fonts with variable width glyphs, so you can never simply convert a byte index to pixel just like that. (not to mention those pesky line breaks...)

> Lets describe an example. I send the QTextDocument content to an library
> which expect utf8 content and gives me back positions. This gets
> interesting if you use non ASCII signs. Actually the new clang code model
> works that way.

That example shows how UTF-16 is better. See above on seekability of UTF-16 vs UTF-8.

The solution for this is to fix the library to accept UTF-16. When we were doing Qt 5.0, we needed PCRE to support UTF-16. Their developers were very welcoming and wrote the version that supports UTF-16, so Qt does not need to reallocate.

> > Even if we provide UTF-8 support classes, those will not propagate to the
> > GUI. Forget it.
>
> What about compressing UTF 16 like python is doing it for UTF 32. If you are
> only using ascii you set a flag and you can remove all that useless zeros.
> It would be have implications for data() but maybe we should not provide
> access to the internal representation. If you use UTF 32 as a base you
> don't need anymore surrogates.

That's what Lars called a "hybrid solution" and vetoed. I second that.

Way too much code would break if we did that because we allow people access to the data pointer in QString and to iterate directly (std::{,w,u16}string don't allow that, which makes parsing them actually a lot more cumbersome).

As for UTF-32/UCS-4, it occupies twice as much space as it needs for all text written with living languages.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
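The point about surrogate pairs being self-describing can be sketched in standard C++. The helper names below are hypothetical (QChar offers comparable checks, but this sketch deliberately avoids Qt), and the code assumes well-formed UTF-16 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// High (leading) surrogates occupy U+D800..U+DBFF, low (trailing)
// surrogates U+DC00..U+DFFF, so a single code unit tells which half it is.
inline bool isHighSurrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
inline bool isLowSurrogate(char16_t c)  { return (c & 0xFC00) == 0xDC00; }

// Hypothetical helper: move an arbitrary code-unit index back to the
// start of the code point it falls in. At most one step is needed.
std::size_t codePointStart(const std::u16string &s, std::size_t i)
{
    assert(i < s.size());
    if (i > 0 && isLowSurrogate(s[i]) && isHighSurrogate(s[i - 1]))
        return i - 1;
    return i;
}
```

In UTF-8 the equivalent resynchronisation may need to step back over up to three continuation bytes, which is one way to read the "seekable enough" remark; combining sequences are not handled at this level in either encoding.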
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On October 14, 2015 23:10:26 Thiago Macieira wrote: > On Wednesday 14 October 2015 20:52:12 Bubke Marco wrote: >> On October 14, 2015 22:13:11 Thiago Macieira > wrote: >> And I don't want an utf 8 baked >> QString. For my use cases implicit sharing is overkill. Move semantics >> would be enough. I want localAwareCompare(const char *s1, const char *s2). > > Do it on your own. You just said that ICU has the function you want, so use > it. So Qt is always shipping with ICU? > Qt does not have to provide a comparator that operates on something other > than > its native string type. Isn't Qt a framework to help developers? Sorry your argumentation is sounds not very empirical. > >> Maybe windows and mac os will bring support to the standard library so we >> don't need it but in the mean time it would be very helpful. >> >> A utf 8 based QTextDocument would be maybe nice too. > > What for? It needs to keep a lot of extra structures, so the cost of > conversion and extra memory is minimal. And besides, QTextDocument really > needs a seekable string, not UTF-8. Is UTF 16 seekable? You still have surrogates and you can merge merge code points. Lets describe an example. I send the QTextDocument content to an library which expect utf8 content and gives me back positions. This gets interesting if you use non ASCII signs. Actually the new clang code model works that way. > > Even if we provide UTF-8 support classes, those will not propagate to the > GUI. > Forget it. What about compressing UTF 16 like python is doing it for UTF 32. If you are only using ascii you set a flag and you can remove all that useless zeros. It would be have implications for data() but maybe we should not provide access to the internal representation. If you use UTF 32 as a base you don't need anymore surrogates. ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 22:56:15 André Pönitz wrote: > > I think there’s actually quite a few of those. In addition, it might be > > tricky to use QStringView in signals and slots. > > One could try to be clever and go through an intermediate QString > object at least in queued connections. Or even always. -2 on any signal-slot special-casing for some types. It's possible that the string in question *is* retained and there's no need to copy. We can't know that in QObject::activate, so we shouldn't try. -- Thiago Macieira - thiago.macieira (AT) intel.com Software Architect - Intel Open Source Technology Center ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 20:52:12 Bubke Marco wrote: > On October 14, 2015 22:13:11 Thiago Macieira wrote: > > On Wednesday 14 October 2015 17:55:34 Bubke Marco wrote: > >> Think about a local aware compare which is called very very often. You > >> don't want malloc in between. In in most cases you get an const char* or > >> const shor* in this cases It would be nice if your interface would > >> support UTF-8 and not only UTF-16. > > > > Three of the four implementations of QString::localeAwareCompare operate > > on > > UTF-16 (Win32 CompareStringW, CoreFoundation's CFStringCompare and ICU > > ucol_strcoll). That's another reason for keeping QString as UTF-16. > > Thiago, to my understanding ICU is supporting UTF 8 too. I don't ask for UTF > 8 support because I like it but I need it. There's ucol_strcollUTF8 since ICU 50, indeed. Quite a few systems are still running older versions today, but that wouldn't be an argument for Qt 6. > And I don't want an utf 8 baked > QString. For my use cases implicit sharing is overkill. Move semantics > would be enough. I want localAwareCompare(const char *s1, const char *s2). Do it on your own. You just said that ICU has the function you want, so use it. Qt does not have to provide a comparator that operates on something other than its native string type. > Maybe windows and mac os will bring support to the standard library so we > don't need it but in the mean time it would be very helpful. > > A utf 8 based QTextDocument would be maybe nice too. What for? It needs to keep a lot of extra structures, so the cost of conversion and extra memory is minimal. And besides, QTextDocument really needs a seekable string, not UTF-8. Even if we provide UTF-8 support classes, those will not propagate to the GUI. Forget it. 
-- Thiago Macieira - thiago.macieira (AT) intel.com Software Architect - Intel Open Source Technology Center
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On October 14, 2015 22:13:11 Thiago Macieira wrote:
> On Wednesday 14 October 2015 17:55:34 Bubke Marco wrote:
>> Think about a local aware compare which is called very very often. You don't
>> want malloc in between. In in most cases you get an const char* or const
>> shor* in this cases It would be nice if your interface would support UTF-8
>> and not only UTF-16.
>
> Three of the four implementations of QString::localeAwareCompare operate on
> UTF-16 (Win32 CompareStringW, CoreFoundation's CFStringCompare and ICU
> ucol_strcoll). That's another reason for keeping QString as UTF-16.

Thiago, to my understanding ICU supports UTF-8 too. I am not asking for UTF-8 support because I like it, but because I need it. And I don't want a UTF-8-backed QString. For my use cases implicit sharing is overkill. Move semantics would be enough. I want localeAwareCompare(const char *s1, const char *s2).

Maybe Windows and Mac OS will bring support to the standard library so we don't need it, but in the meantime it would be very helpful.

A UTF-8-based QTextDocument would maybe be nice too.
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wed, Oct 14, 2015 at 11:15:55AM +, Knoll Lars wrote: > >That leaves classes which simply store the string. You cited QUrl. I > >don't see > >a problem providing QString overloads for these, esp. considering that > >we're > >starting out with an all-QString API here. Then again, once we have > >QStringView overloads, we can simply disable the QString overloads and > >see the > >effect. > > I think there’s actually quite a few of those. In addition, it might be > tricky to use QStringView in signals and slots. One could try to be clever and go through an intermediate QString object at least in queued connections. Or even always. > [...] > Of course we don’t know all it’s uses. But many uses outside of QtCore are > clearly less critical. QLineEdit::setText is clearly not called in tight > loops, and once you set the text it has to do lots of other work. There > are many similar APIs in Qt, where I don’t think we’ll ever see a benefit > of a QStringView, and the simplicity of passing in a const QString ref is > probably preferrable. Right. OTOH there are instances where it provably *does* matter, e.g. everything in the vicinity of QFileInfo, or: > >Take QDateTime as a warning. ... > I am certainly in favor of experimenting with this. Let’s start in a > branch or behind an ifdef. Or in a safer place. See my other mail. Andre' ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wed, Oct 14, 2015 at 06:37:19AM +, Knoll Lars wrote: > Agree here as well. We can’t make QString utf-8 backed without breaking > way too much code. I also don’t see the need for it. The native encoding > on Windows and Mac (Cocoa) is utf-16 as well, on Linux it’s utf-8. So no > matter which platform we’re on, we won’t avoid some conversions. I am afraid that "the native encoding on Windows and Mac (Cocoa) is UTF-16" argument does not carry much weight in my daily work. I read/write files a lot, talk to processes/services/whatever. Almost all of that uses some 8-bit encoding, often enough something compatible with UTF-8, also on Windows and Mac. Even small fry like settings keys is usually plain English 7-bit clean. QString's UTF-16 is pretty much the antithesis of a good compromise in that area. It generates line noise in the sources and wastes cycles at runtime. > And I will strongly oppose any attempts to make QString some sort of > hybrid supporting both. The added complexity in maintaining the code base > is simply not worth it. I don't think a hybrid would be better, either. But that is not part of this RFC. I think Marc's proposal of using *View classes in interfaces has some merits. How much exactly I am unsure about. I only know that the (non-)performance of QString based interfaces has bitten me often enough to justify at least some experiments. That's why I'd like to propose the following: Since experiments within Qt proper are difficult due to the BC and SC guarantees we give and the practical impossibility to un-do additions we should simply not do it there. Instead, we could (and should) use part of Qt Creator's code base, specifically some of 'leaf' plugins (i.e. plugins with no known downstream users), to play with the idea, and develop a solid understanding of the pros and cons of the idea of using *View classes in interfaces until Qt 6 comes. The way forward could be to add e.g. 
'Utils::[Q]StringView' and 'Utils::[Q]ByteArrayView' in implementation src/libs/utils and start using these in a few 'harmless' plugins. The advantages here are less restrictions due to lower compatibility guarantees, less restrictions imposed by older compilers, less harm done if the experiment fails (i.e. if the *Views turn out to not be beneficial) and generally more flexibility when e.g. comparing competing implementations. Opinions? Andre' ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 17:55:34 Bubke Marco wrote: > Think about a local aware compare which is called very very often. You don't > want malloc in between. In in most cases you get an const char* or const > shor* in this cases It would be nice if your interface would support UTF-8 > and not only UTF-16. Three of the four implementations of QString::localeAwareCompare operate on UTF-16 (Win32 CompareStringW, CoreFoundation's CFStringCompare and ICU ucol_strcoll). That's another reason for keeping QString as UTF-16. I don't think any of those even allocates memory, but it's impossible to tell for sure with the CoreFoundation API. -- Thiago Macieira - thiago.macieira (AT) intel.com Software Architect - Intel Open Source Technology Center ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 20:09:56 Marc Mutz wrote: > On Wednesday 14 October 2015 12:41:12 Allan Sandfeld Jensen wrote: > > Why not a QCharArray? With external data constructor, that should be the > > same, shouldn't it? > > If you propose something like QString/QByteArray::fromRawData(), then that > allocates the control block, so no, not really an option. Which is also solved by the null d-pointer. In other words QStringLiteral("foo") === QString::fromRawData(u"foo", 3); In theory. In practice, there may be some dragons hidden somewhere. -- Thiago Macieira - thiago.macieira (AT) intel.com Software Architect - Intel Open Source Technology Center ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 20:04:12 Marc Mutz wrote: > On Wednesday 14 October 2015 18:11:26 Thiago Macieira wrote: > > and the fact that QStringLiterals don't share will cause the > > innocent-looking above code require 64 bytes of read-only data. > > They are shared, because it seems that lambdas within the same function have > the same type. At least last I checked, that was what GCC implemented. GCC 5.2, 6: 2 lambdas, data duplicated Clang 3.7, 3.8: 2 lambdas, data duplicated ICC 16: 2 lambdas, data duplicated You can see from the disassembly that they are two different types. > > movq_ZN10QArrayData18shared_static_dataE@GOTPCREL(%rip), %rax > > And you want the nullptr to get rid of this relocation. Yes, but more importantly because it speeds up the check for when reference counting should be done. Right now, it needs to check bit 9 inside d->flags, which means dereferencing the pointer (hitting another cacheline) and the compiler never knows that test is constant with QStringLiterals. With a null pointer, the check is very trivial (a TEST instruction, for both the null and the ~1 check) and the compiler should be able to optimise the destructor away. 
Here's the entire function, as it is today with one QStringLiteral only:
(compiled with GCC 6 -fno-exceptions, rearranged/edited for clarity)

    ; load the literal:
    movq    _ZN10QArrayData18shared_static_dataE@GOTPCREL(%rip), %rax ; d
    movl    $3, 16(%rsp)            ; str.d.size = 3
    movq    %rax, (%rsp)            ; str.d.d = &QArrayData::shared_static_data
    leaq    .LC0(%rip), %rax        ; u"foo"
    movq    %rax, 8(%rsp)           ; str.d.b = u"foo"
    ; make the call:
    movq    %rsp, %rdi
    call    _Z1fRK7QString@PLT
    ; inlined QString::~QString
    movq    (%rsp), %rax            ; reload the d pointer
    testl   $512, (%rax)            ; d->flags & QArrayData::ImmutableHeader
    je      .L8
    addq    $40, %rsp
    ret
    ; this is the dead code, it never gets run:
.L8:
    lock subl $1, 4(%rax)           ; d->ref_.deref()
    jne     .L5
    movq    (%rsp), %rdi            ; load d pointer
    movl    $16, %edx               ; alignof(QTypedArrayData)
    movl    $2, %esi                ; sizeof(QChar)
    call    _ZN10QArrayData10deallocateEPS_mm@PLT
    addq    $40, %rsp
    ret

A hacky implementation that uses a null pointer instead:

    ; load the literal:
    leaq    .LC0(%rip), %rax        ; u"foo"
    movq    $0, (%rsp)              ; str.d.d = nullptr
    movq    %rax, 8(%rsp)           ; str.d.b = u"foo"
    movl    $3, 16(%rsp)            ; str.d.size = 3
    ; make the call
    movq    %rsp, %rdi
    call    _Z1fRK7QString@PLT
    addq    $40, %rsp
    ret

The QString::~QString destructor expanded to empty with GCC. Unfortunately, Clang and ICC retained the check (they must be assuming the callee modified the const parameter).

Unfortunately, if I change the isStatic to check for LSB set for the SSO case, even GCC gets thrown off and brings back the dead code.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
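The effect of a null d-pointer on the destructor can be sketched outside of Qt. Everything below is a hypothetical, heavily simplified stand-in: `Header`, `StringRef` and the deallocation counter are assumptions made for illustration and do not reflect the real QArrayData layout.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a reference-counted array header.
struct Header {
    std::atomic<int> ref;
    Header() : ref(1) {}
};

static int g_deallocations = 0;  // counts frees, for demonstration only

// Hypothetical stand-in for the (d, b, size) triple discussed above.
struct StringRef {
    Header *d;                   // nullptr marks static (literal) data
    const char16_t *b;           // start of the characters
    int size;

    StringRef(Header *header, const char16_t *chars, int n)
        : d(header), b(chars), size(n) {}
    StringRef(const StringRef &) = delete;
    StringRef &operator=(const StringRef &) = delete;

    ~StringRef()
    {
        // With a null d-pointer the "is this static data?" test is a plain
        // pointer comparison; no header has to be loaded from memory.
        if (!d)
            return;
        if (d->ref.fetch_sub(1) == 1) {
            delete d;
            ++g_deallocations;
        }
    }
};
```

When the object is built from a literal, the compiler can see that `d` is a constant null pointer, which is what gives it the chance to drop the destructor body entirely, as observed with GCC in the message above.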
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
Marc Mutz wrote:
> I'm not optimising. I'm decoupling the concept of a "QString" from the owning
> implementation "QString", so that we don't need to either convert from/to
> QString quite so often or you can use "foreign types"
> (std::basic_string, char16_t[], ...) in lieu of QString. That is
> important when you need to interface with 3rd-party libraries.

Think about a locale-aware compare that is called very, very often. You don't want a malloc in between. In most cases you get a const char* or const short*; in those cases it would be nice if your interface supported UTF-8 and not only UTF-16.

Incorporating ideas from http://utfcpp.sourceforge.net/ could be useful.
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 12:41:12 Allan Sandfeld Jensen wrote: > Why not a QCharArray? With external data constructor, that should be the > same, shouldn't it? If you propose something like QString/QByteArray::fromRawData(), then that allocates the control block, so no, not really an option. > Anyway, I doubt this is really something that needs optimizing, QString is > neat because it is simple and easy to remember. If anything we need to > use QByteArray in more places where QStrings are only 8-bit strings. I'm not optimising. I'm decoupling the concept of a "QString" from the owning implementation "QString", so that we don't need to either convert from/to QString quite so often or you can use "foreign types" (std::basic_string, char16_t[], ...) in lieu of QString. That is important when you need to interface with 3rd-party libraries. Thanks, Marc -- Marc Mutz | Senior Software Engineer KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company Tel: +49-30-521325470 KDAB - The Qt Experts ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
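The decoupling described above can be illustrated with a minimal non-owning view. This is a sketch under stated assumptions: `Utf16View` and `countChar` are hypothetical names, and the type is far smaller than whatever a real QStringView would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal view type (not the proposed QStringView API):
// a non-owning pointer/length pair over UTF-16 code units.
struct Utf16View {
    const char16_t *data;
    std::size_t size;

    Utf16View(const std::u16string &s) : data(s.data()), size(s.size()) {}

    // Array overload for literals such as u"foo"; drops the terminating NUL.
    template <std::size_t N>
    Utf16View(const char16_t (&literal)[N]) : data(literal), size(N - 1) {}
};

// A function taking the view accepts any of the "foreign" sources
// without allocating or converting to an owning string first.
std::size_t countChar(Utf16View text, char16_t ch)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < text.size; ++i)
        n += (text.data[i] == ch) ? 1 : 0;
    return n;
}
```

A call such as `countChar(u"banana", u'a')` or `countChar(someU16String, u'l')` then goes through the same function, and neither call site has to construct an owning string. The view is only valid while the underlying buffer is alive.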
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 18:11:26 Thiago Macieira wrote: > and the fact that QStringLiterals don't share will cause the > innocent-looking above code require 64 bytes of read-only data. They are shared, because it seems that lambdas within the same function have the same type. At least last I checked, that was what GCC implemented. > movq_ZN10QArrayData18shared_static_dataE@GOTPCREL(%rip), %rax And you want the nullptr to get rid of this relocation. I like it! Thanks, Marc -- Marc Mutz | Senior Software Engineer KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company Tel: +49-30-521325470 KDAB - The Qt Experts ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 15:59:21 Matthew Woehlke wrote: > On 2015-10-14 07:15, Knoll Lars wrote: > > In addition, it might be tricky to use QStringView in signals and > > slots. > > As I previously stated, I'm pretty sure you *CAN'T* use QStringView to > call slots, except for direct call. In any other case, you risk the > backing data being modified or, worse, deallocated, before the slot > dispatches (this is *especially* dangerous with cross-thread dispatch, > since now you have thread safety to worry about). The only way around > that would be for QStringView to take a COW reference to the underlying > data. > > We already have a class like that. It's called QString. > > What *might* work is if the event dispatcher, when it makes copies of > the arguments, makes a deep copy of QStringView into a QString. I'm not > sure if this is possible, though, and anyway then you're in the same > boat of making a (potentially) unnecessary deep copy if you had a > QString in the first place. This is nothing new. You cannot pass reference types through cross-thread signal/slot connections. In fact, you cannot pass any non-reentrant type, either. That doesn't prevent API such as QPrintPreviewDialog::paintRequested() from cropping up, and still being useful. Thanks, Marc -- Marc Mutz | Senior Software Engineer KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company Tel: +49-30-521325470 KDAB - The Qt Experts ___ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development
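The lifetime problem with reference types in deferred calls can be sketched without Qt. The queue below is a hypothetical stand-in for a queued connection, not the QObject machinery; it only shows why the argument has to be stored as an owning copy when the call outlives the caller.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued connection: calls are stored and
// run later, after the caller's stack frame (and its buffers) are gone.
static std::vector<std::function<void()>> g_queue;
static std::u16string g_received;

void slot(const std::u16string &text) { g_received = text; }

// Queue the call with an owning copy of the argument. Capturing a
// pointer/length view of `local` instead would dangle once it dies.
void emitQueued()
{
    std::u16string local = u"temporary";
    g_queue.push_back([local]() { slot(local); });   // deep copy into the closure
}

// Run every pending call, as an event loop would do at a later time.
void processQueue()
{
    for (auto &call : g_queue)
        call();
    g_queue.clear();
}
```

With a direct call the view would still be valid when the slot runs, which matches the observation in the thread that only direct invocation is safe for non-owning argument types.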
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 08:51:11 Thiago Macieira wrote:
> The separation of the string itself from the size and the d pointer allows
> the compiler, if it wants to, to share strings. In fact, disassembly of
>
>   f(QStringLiteral("foo"), QStringLiteral("foo"))
>
> produces one copy of u"foo" only.

Let me expand on this. Current Qt5 QStringLiteral("foo") produces a data block
of size

    sizeof(QArrayData) + sizeof(u"foo") = 24 + 8 = 32 bytes

and the fact that QStringLiterals don't share will cause the innocent-looking
code above to require 64 bytes of read-only data.

My current code expands to 8 bytes of read-only data, at the expense of a
little more code.

Current code:

    leaq    _ZZZ1fvENKUlvE0_clEvE15qstring_literal(%rip), %rax
    movq    %rax, 16(%rsp)
    leaq    _ZZZ1fvENKUlvE_clEvE15qstring_literal(%rip), %rax
    movq    %rax, (%rsp)
    ; followed by the call:
    movq    %rsp, %rdi
    leaq    16(%rsp), %rsi
    call    _Z1fRK7QStringS1_@PLT

My code:

    leaq    .LC0(%rip), %rcx        ; u"foo"
    movq    _ZN10QArrayData18shared_static_dataE@GOTPCREL(%rip), %rax
    movq    %rcx, 40(%rsp)
    movl    $3, 48(%rsp)
    movq    %rcx, 8(%rsp)
    movq    %rax, 32(%rsp)
    movq    %rax, (%rsp)
    movl    $3, 16(%rsp)
    ; the call itself is unchanged:
    movq    %rsp, %rdi
    leaq    32(%rsp), %rsi
    call    _Z1fRK7QStringS1_@PLT

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
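
The size arithmetic in the message above can be checked with a small C++ sketch. FakeArrayData is a hypothetical stand-in for the per-literal header, laid out only so that it occupies 24 bytes on a typical 64-bit ABI; it is not Qt's actual QArrayData, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-literal header. Assumption: 64-bit ABI
// with 4-byte int and 8-byte ptrdiff_t, which gives 24 bytes with padding.
struct FakeArrayData {
    int ref;
    int size;
    unsigned alloc;
    std::ptrdiff_t offset;
};

// Read-only bytes when each of `count` literals carries its own header
// followed by its own copy of the character data.
constexpr std::size_t perLiteralBytes(std::size_t count, std::size_t payload) {
    return count * (sizeof(FakeArrayData) + payload);
}

// Read-only bytes when only the character data is emitted and the compiler
// merges identical literals into one copy.
constexpr std::size_t sharedBytes(std::size_t payload) {
    return payload;
}

// u"foo" is three UTF-16 code units plus the terminator.
constexpr std::size_t fooPayload = sizeof(u"foo");
```

On such a platform, perLiteralBytes(2, fooPayload) gives the 64 bytes quoted for the two-literal call, and sharedBytes(fooPayload) gives the 8 bytes of the merged layout.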
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 09:59:28 Matthew Woehlke wrote:
> On 2015-10-14 06:16, Marc Mutz wrote:
> > First, afaiu from what Thiago mentions in reviews, Q6String will have SSO
> > (small-string-optimisation) which makes many short strings expensive to
> > copy (if you think that copying 24 bytes is slower than upping an atomic
> > int through an indirection) or cheap to copy (if you think the opposite).
> > In any case, small strings will be very cheap to create (no allocation),
> > so for many strings there will be not much difference between passing a
> > QStringView or passing a QString.
>
> Atomic operations are expensive (I think I heard once 'on the order of
> 100 instruction cycles', but that's highly apocryphal), mainly I would
> guess due to the need to maintain cache coherency. A small copy might
> happen entirely in local hot cache. 24 bytes is a whole three registers
> on a modern 64-bit machine. That's probably not going to be very slow.

This discussion is a red herring. It's not a choice between copying and
atomically incrementing a reference counter. It's a choice between copying
and copying plus incrementing the reference counter.

    QString s2 = s1;

needs to copy those bytes that are sizeof(QString) *anyway*, regardless of
whether in addition to that it will increment the refcount.

> >> Yes, signed please. We can discuss whether it should be 64bit for Qt 6.
> >
> > The current std API uses size_t. Do you (= both of you) expect that ever
> > to change? If it doesn't, Qt will forever be the odd one out, until we
> > finally drop QVector etc for std::vector etc and then porting will be a
> > horror because of MSVC's annoying warnings.
>
> STL should change. In Qt and Python, you can use negative indices to
> refer to a distance (length) relative to the end (length) of the string.
> In STL you can't do that, which is a significant limitation by
> comparison. Please don't drop this useful functionality!

And see discussions in the std-discussions and std-proposals mailing list. I
repeat what I said: the current *committee* stance is that you should use
signed for everything, except when you need well-defined overflow behaviour.

The only problem, which I raised there and did not get resolved, is which type
to use.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
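
To make the "copying versus copying plus incrementing" point concrete, here is a minimal sketch of a three-word reference-counted handle. SharedBlock and Handle are hypothetical names invented for this illustration and do not reflect Qt's implementation; the point is only that the copy constructor always copies the members and the atomic increment comes on top of that.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical shared block standing in for a reference-counted data header.
struct SharedBlock {
    std::atomic<int> ref{1};
};

// Hypothetical three-word handle: owner pointer, begin pointer, size.
struct Handle {
    SharedBlock *d;
    const char16_t *b;
    std::ptrdiff_t size;

    // Adopts the block's initial reference.
    Handle(SharedBlock *block, const char16_t *data, std::ptrdiff_t n)
        : d(block), b(data), size(n) {}

    // The three members are copied in every case; the refcount bump is extra.
    Handle(const Handle &other) : d(other.d), b(other.b), size(other.size) {
        if (d)
            d->ref.fetch_add(1, std::memory_order_relaxed);
    }

    Handle &operator=(const Handle &) = delete; // left out of the sketch

    ~Handle() {
        if (d)
            d->ref.fetch_sub(1, std::memory_order_acq_rel);
    }
};

// Returns the reference count observed while one copy is alive.
inline int refCountWhileCopied() {
    static const char16_t text[] = u"hello";
    SharedBlock block;
    Handle s1(&block, text, 5);
    Handle s2 = s1; // member copy plus one atomic increment
    return block.ref.load();
}
```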
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 12:16:47 Marc Mutz wrote:
> > >But as a condition to be even considered, it needs to be only for the
> > >methods that do not hold a copy of the string. That is, methods that
> > >immediately consume the string and no longer need to reference its
> > >contents.
>
> Thiago, I think it would help the discussion if you quickly summarised your
> planned changes to QString in Qt 6.
>
> AFAIK, the size and offset will move into the object, so I expected that
> Q6String would subsume QStringRef, because each QString could provide a
> separate view on the shared underlying data. I also was led to believe that
> Q6String would use SSO, which, given its increased sizeof(), would make a
> lot of sense, imo.

Indeed, that's the biggest gain. QString will contain a QStringPrivate, which
is

    struct QStringPrivate {
        QArrayData *d;
        ushort *b;
        qsize size; // let's bikeshed what qsize is later
    };

My current code initialises a QStringLiteral like so:

    # define QStringLiteral(str) \
        ([]() -> QString { \
            QStringPrivate holder = { \
                QArrayData::sharedStatic(), \
                reinterpret_cast<ushort *>(const_cast<qunicodechar *>(QT_UNICODE_LITERAL(str))), \
                sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
            return QString(holder); \
        }()) \

The separation of the string itself from the size and the d pointer allows the
compiler, if it wants to, to share strings. In fact, disassembly of

    f(QStringLiteral("foo"), QStringLiteral("foo"))

produces one copy of u"foo" only.

Like you said, QString can become its own QStringView/QStringRef/QSubString.
QString::left/mid/right can simply copy the d pointer, increment the refcount,
then adjust b and size. This solves the issue I had with your proposal:
passing a QStringView to a method that decides to copy it, so it wouldn't
participate in reference counting. The drawback with this is the pathological
case where a short substring is holding a large data block hostage.

My next objective, not yet achieved due to lack of time, is to make that
QArrayData::sharedStatic() actually be a null pointer. That is, for anything
that we didn't allocate memory for, the d pointer should be null. That implies
a much faster loading of constant QStringLiterals and much faster handling of
the decrement case. The biggest pain point in the code above in my current
version is what happens after the call to f(): the compiler generates 2x bit
testing of d->flags and calls to QArrayData::deallocate(), which are dead code
and will never be run.

After that, implement SSO, which should hold 11 UTF-16 characters, including
the null terminator. If we benchmark and find that we could use more, we can
simply artificially increase sizeof(QString) to 32, which may have some extra
benefits of its own, including the fact that the 24-byte short QString will be
at odds with the null d pointer -- the if (d) check instead becomes
if (quintptr(d) & ~quintptr(1))

[also note how the order of the members in QStringPrivate needs to change for
big-endian architectures]

[and note everything I say about QString also applies to QByteArray and
QVector]

> And then I thought, QString would be converted to hold UTF-8. I saw
> wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any updates
> since 2012.

That was when we converted the QString methods taking const char* from Latin1
to UTF-8. The backing store has never changed.

My version of QString stores an extra flag that indicates whether the string
is US-ASCII, in which case we can run the unchecked to-Latin1 algorithm in both
toLatin1 and toUtf8. Another idea I had but haven't investigated is to cache
that result, which requires the returned QByteArray to share the d pointer
with the QString.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
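
The "QString as its own substring view" idea described above (copy the d pointer, bump the refcount, adjust b and size) can be sketched as follows. This is a minimal model under stated assumptions: SharedString is a hypothetical class, and std::shared_ptr stands in for the reference-counted QArrayData; none of it is Qt API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model of the d / b / size layout; not Qt code.
class SharedString {
public:
    explicit SharedString(std::u16string text)
        : d_(std::make_shared<const std::u16string>(std::move(text))),
          b_(d_->data()),
          size_(static_cast<std::ptrdiff_t>(d_->size())) {}

    // Substring: copy d (which bumps the refcount), then adjust b and size.
    // No character data is copied.
    SharedString mid(std::ptrdiff_t pos, std::ptrdiff_t n) const {
        SharedString result(*this);
        if (pos < 0) pos = 0;
        if (pos > size_) pos = size_;
        if (n < 0 || n > size_ - pos) n = size_ - pos;
        result.b_ += pos;
        result.size_ = n;
        return result;
    }

    std::u16string toStdString() const {
        return std::u16string(b_, static_cast<std::size_t>(size_));
    }
    std::ptrdiff_t size() const { return size_; }
    long useCount() const { return d_.use_count(); }

    // Size of the allocation this handle keeps alive, however short the view.
    std::size_t blockSize() const { return d_->size(); }

private:
    std::shared_ptr<const std::u16string> d_;
    const char16_t *b_;
    std::ptrdiff_t size_;
};
```

The blockSize() accessor makes the drawback visible: a four-character substring of a long string still pins the whole allocation for as long as the substring lives.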
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 2015-10-14 10:30, André Somers wrote:
> On 14-10-2015 at 15:59, Matthew Woehlke wrote:
>> STL should change. In Qt and Python, you can use negative indices to
>> refer to a distance (length) relative to the end (length) of the string.
>> In STL you can't do that, which is a significant limitation by
>> comparison. Please don't drop this useful functionality!
>
> I'm not so sure anymore. Do you really think that for instance passing
> in a negative _from_ in QString::indexOf to search from the back of the
> string is intuitive API? I don't.

Huh? Of course it is.

    s.indexOf('c', 5);  // find 'c', forward, starting at offset 5
    s.indexOf('c', -5); // find 'c', forward, starting at offset N-5

A negative offset -K is exactly the same as N - K (N = length of string). It
just saves having to write that out yourself. It *doesn't* change the behavior
of the function. (I think you are confusing tail-relative offset with reverse
operation, which is totally different and orthogonal.)

Even STL supports this, partially, for -1; string::npos is generally
equivalent to -1 in Qt.

Bah. Okay, apparently Qt actually *doesn't* support tail-relative, but just
treats n<0 like string::npos. That could be improved for Qt 6 though, but only
if n is signed.

-- 
Matthew
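
A tail-relative start offset of the kind argued for above could look roughly like this. indexOfFrom is a hypothetical helper written against std::u16string, not a Qt or standard-library function; it also shows why the index parameter has to be signed for the idea to work at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: forward search for `ch`, where a negative `from`
// counts back from the end of the string. Returns -1 when nothing is found.
inline std::ptrdiff_t indexOfFrom(const std::u16string &s, char16_t ch,
                                  std::ptrdiff_t from) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(s.size());
    if (from < 0)
        from += n;      // -K becomes N - K
    if (from < 0)
        from = 0;       // clamp offsets that reach past the front
    if (from >= n)
        return -1;      // nothing left to search
    const std::size_t pos = s.find(ch, static_cast<std::size_t>(from));
    return pos == std::u16string::npos ? -1 : static_cast<std::ptrdiff_t>(pos);
}
```

The search direction stays forward in both cases; only the starting point is expressed relative to the tail.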
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 14-10-2015 at 15:59, Matthew Woehlke wrote:
>>> Yes, signed please. We can discuss whether it should be 64bit for Qt 6.
>> The current std API uses size_t. Do you (= both of you) expect that ever to
>> change? If it doesn't, Qt will forever be the odd one out, until we finally
>> drop QVector etc for std::vector etc and then porting will be a horror
>> because of MSVC's annoying warnings.
> STL should change. In Qt and Python, you can use negative indices to
> refer to a distance (length) relative to the end (length) of the string.
> In STL you can't do that, which is a significant limitation by
> comparison. Please don't drop this useful functionality!

I'm not so sure anymore. Do you really think that for instance passing in a
negative _from_ in QString::indexOf to search from the back of the string is
intuitive API? I don't. I would rather have a specific indexOfBackwards or
something like that. Or one could just use the iterator API with a standard
algorithm I guess.

André
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 2015-10-14 06:16, Marc Mutz wrote:
> First, afaiu from what Thiago mentions in reviews, Q6String will have SSO
> (small-string-optimisation) which makes many short strings expensive to copy
> (if you think that copying 24 bytes is slower than upping an atomic int
> through an indirection) or cheap to copy (if you think the opposite). In any
> case, small strings will be very cheap to create (no allocation), so for many
> strings there will be not much difference between passing a QStringView or
> passing a QString.

Atomic operations are expensive (I think I heard once 'on the order of
100 instruction cycles', but that's highly apocryphal), mainly I would
guess due to the need to maintain cache coherency. A small copy might
happen entirely in local hot cache. 24 bytes is a whole three registers
on a modern 64-bit machine. That's probably not going to be very slow.

(Mind, atomics still blow full mutexes out of the water, but they're
still an order of magnitude slower than small stack allocations and most
single machine instructions.)

>> Yes, signed please. We can discuss whether it should be 64bit for Qt 6.
>
> The current std API uses size_t. Do you (= both of you) expect that ever to
> change? If it doesn't, Qt will forever be the odd one out, until we finally
> drop QVector etc for std::vector etc and then porting will be a horror
> because of MSVC's annoying warnings.

STL should change. In Qt and Python, you can use negative indices to
refer to a distance (length) relative to the end (length) of the string.
In STL you can't do that, which is a significant limitation by
comparison. Please don't drop this useful functionality!

> array_view cannot compete with QByteArray's API. E.g. there's no toInt().

...and it *shouldn't*. Never mind that you're talking about a function
that deals with *strings*, it's debatable whether that sort of thing
belongs as class methods at all. Anyway, they aren't "missing" in the
standard library; they're free functions.

(That said, the CSL could use better flavors, and there was some talk of
that, but AFAIK it didn't get anywhere. I can pretty well guarantee you
the committee isn't going to be adding that sort of thing to array_view,
or even string_view, any time soon.)

-- 
Matthew
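
As a sketch of the "free function over a view" style mentioned above: the helper below parses an integer from a non-owning std::string_view. The name toInt is borrowed from the thread for illustration only; the implementation relies on std::from_chars, a C++17 facility that postdates this discussion, so treat it as one possible shape rather than what anyone here had in mind.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical free function: parses the whole view as a base-10 int and
// returns nothing on failure. No allocation, no member function needed.
inline std::optional<int> toInt(std::string_view text) {
    int value = 0;
    const char *first = text.data();
    const char *last = first + text.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last)
        return std::nullopt; // not a number, out of range, or trailing junk
    return value;
}
```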
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 2015-10-14 07:15, Knoll Lars wrote:
> In addition, it might be tricky to use QStringView in signals and
> slots.

As I previously stated, I'm pretty sure you *CAN'T* use QStringView to
call slots, except for direct call. In any other case, you risk the
backing data being modified or, worse, deallocated, before the slot
dispatches (this is *especially* dangerous with cross-thread dispatch,
since now you have thread safety to worry about). The only way around
that would be for QStringView to take a COW reference to the underlying
data.

We already have a class like that. It's called QString.

What *might* work is if the event dispatcher, when it makes copies of
the arguments, makes a deep copy of QStringView into a QString. I'm not
sure if this is possible, though, and anyway then you're in the same
boat of making a (potentially) unnecessary deep copy if you had a
QString in the first place.

-- 
Matthew
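
The "deep copy at enqueue time" idea from the last paragraph above can be sketched with a miniature call queue. CallQueue is a hypothetical class written with std::u16string_view (C++17); it is not Qt's event dispatcher and only illustrates why a queued call must own its argument.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical miniature event queue; not Qt's dispatcher.
class CallQueue {
public:
    // Promote the non-owning view to an owning string when the call is
    // queued, so the call no longer depends on the caller's buffer.
    void post(std::function<void(std::u16string_view)> slot,
              std::u16string_view arg) {
        std::u16string owned(arg); // the deep copy
        calls_.push([slot = std::move(slot), owned = std::move(owned)]() {
            slot(owned);
        });
    }

    // Run everything that was queued, in order.
    void dispatch() {
        while (!calls_.empty()) {
            calls_.front()();
            calls_.pop();
        }
    }

private:
    std::queue<std::function<void()>> calls_;
};
```

Had post() stored the view itself, a caller whose buffer goes out of scope before dispatch() runs would leave the slot reading freed memory. The cost of the safe version is exactly the extra copy described above.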
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wed, Oct 14, 2015 at 11:15:55AM +, Knoll Lars wrote:
> >> A: Once QString is backed by UTF-8, [...]
>
> It’s worthwhile discussing, but any such change would have huge
> implications on our QString API.
>
indeed.

> In any case, it’s nothing we can do in Qt 5.
>
i don't think this is true. see
http://lists.qt-project.org/pipermail/development/2015-February/020111.html
and the surrounding discussion.
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 14/10/15 12:16, "Marc Mutz" wrote: >On Wednesday 14 October 2015 08:37:19 Knoll Lars wrote: >> I’m not a huge fan of having different overloads with QString, >>QStringRef >> and QLatin1String and in some cases (QChar *, int) for many methods >> neither. But while your proposal solves some problems it introduces >>others. >> >> A QStringView class would only work for methods that read the data >> contained in it, but don’t try to modify it or take a copy (as Thiago >> pointed out). > >I do not agree with that statement. > >First, afaiu from what Thiago mentions in reviews, Q6String will have SSO >(small-string-optimisation) which makes many short strings expensive to >copy >(if you think that copying 24 bytes is slower than upping an atomic int >through an indirection) or cheap to copy (if you think the opposite). In >any >case, small strings will be very cheap to create (no allocation), so for >many >strings there will be not much difference between passing a QStringView >or >passing a QString. I think Thiago should expand a bit on hit plans so we see how the different pieces will fit together. > >Second, upon modification, the QString will detach (make a copy), and >_then_ >perform the operation. With a QStringView and an efficient base of >operations, >those two operations can be folded into one (basically as the const >versions >of QString methods, where they exist, do (or should do)). Unless the >operation >can and will actually be done in the original allocation (ie. incl. no >detach), the const methods should be faster. That will never be the case >when >you pass QString by const-&, because there will always be the lvalue >parameter >attached to the QString instance. For modification in-place to work, you >need >to pass by rvalue ref. So for typical functions modifying the string, >there's >also no difference between QString and QStringView. Agreed. 
As soon as you modify the string, it makes no difference, unless maybe you at the same time hand over the string to the called function. > >That leaves classes which simply store the string. You cited QUrl. I >don't see >a problem providing QString overloads for these, esp. considering that >we're >starting out with an all-QString API here. Then again, once we have >QStringView overloads, we can simply disable the QString overloads and >see the >effect. I think there’s actually quite a few of those. In addition, it might be tricky to use QStringView in signals and slots. > >BTW: functions storing a passed QString as-is should provide a QString&& >overload, and that might be a good idea even when otherwise using >QStringView >only. Yes, agree with this. > >> And you certainly can’t keep the pointer to the data around >> for longer than the lifetime of the QStringView, so it’s to some extent >>an >> advanced class you have to be careful when using in your own APIs. > >It's like the distinction between QModelIndex and QPersistentModelIndex. >The >first is an interface type, the latter the storage type. Neither is more >"advanced" than the other. They are complements. > >> So it can work nicely for methods such as QString::indexOf and similar, >> but will never be good for methods that need to copy the string (e.g. >> QUrl::setHostName). >> >> >> Another thing I wonder about is whether we shouldn’t deprecate >> QLatin1String moving forward. We have QStringLiteral, and even though >>it’s >> implementation is not ideal, we should be able to get it working >> everywhere now with Qt 5.7. Let’s think about how and whether we can >> improve it’s implementation to fix the remaining issues. Then we could >> remove/deprecate QLatin1String. > >There are problems in QStringLiteral that cannot be solved. Common data >sharing will never happen with the current syntax. 
I'd suggest a >QStaticString, a fully constexpr wrapper around QStaticStringData, >basically to determine the N transparently, which can be used as a >variable at >namespace scope in lieu of the current need to pack all QStringLiterals >into >static inline functions. But that's outside the scope of this thread, so >let's >not go there. Yes, that’s what I originally wanted with QStringLiteral. Unfortunately it’s semantics then got changed to return a full QString object. > >> On 13/10/15 23:01, "Thiago Macieira" wrote: >> >On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote: >> >> Q: What mistakes do you refer to? >> >> >> >> >> >> >> >> A: The fact that it has copy ctor and assignment operator, so it's >>not a >> >> trivally-copyable type and thus cannot efficiently passed by-value. >>It >> >> >> >>may >> >> >> >> also be too large for pass-by-value due to the rather useless QString >> >> pointer (should have been QStringData*, if any). Neither can be fixed >> >> before Qt 6. >> > >> >Not even in Qt 6. The reason why it uses a QString pointer is that it >> >follows >> >the QString through reallocations. If the QString is mutated, the >> >QStringRef >> >will still be val
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Tuesday 13 October 2015, Marc Mutz wrote:
> Hi,
>
> After looking quite a bit into the current state of string handling in Qt
> for my QtWS talk last week, I have become frustrated by the state of
> string handling in Qt.
>
> We have such powerful tools for string handling (QStringRef,
> QStringBuilder), but all APIs outside QString and its immediate
> surroundings only deal in QString. The correct way would be to overload
> every function taking QString with QLatin1String and QStringRef versions,
> and then, for some other rare cases, const QChar *, int size. Let alone
> std::basic_string.
>
> I would therefore like to propose to abandon QString for new API (and over
> time phase it out of existing API), and only provide (const QChar*, size_t)
> as the most general form. I would propose to package the two into a class,
> called - you guessed it - QStringView.
>
Why not a QCharArray?

With external data constructor, that should be the same, shouldn't it?

Anyway, I doubt this is really something that needs optimizing, QString is
neat because it is simple and easy to remember. If anything we need to use
QByteArray in more places where QStrings are only 8-bit strings.

`Allan
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Wednesday 14 October 2015 08:37:19 Knoll Lars wrote: > I’m not a huge fan of having different overloads with QString, QStringRef > and QLatin1String and in some cases (QChar *, int) for many methods > neither. But while your proposal solves some problems it introduces others. > > A QStringView class would only work for methods that read the data > contained in it, but don’t try to modify it or take a copy (as Thiago > pointed out). I do not agree with that statement. First, afaiu from what Thiago mentions in reviews, Q6String will have SSO (small-string-optimisation) which makes many short strings expensive to copy (if you think that copying 24 bytes is slower than upping an atomic int through an indirection) or cheap to copy (if you think the opposite). In any case, small strings will be very cheap to create (no allocation), so for many strings there will be not much difference between passing a QStringView or passing a QString. Second, upon modification, the QString will detach (make a copy), and _then_ perform the operation. With a QStringView and an efficient base of operations, those two operations can be folded into one (basically as the const versions of QString methods, where they exist, do (or should do)). Unless the operation can and will actually be done in the original allocation (ie. incl. no detach), the const methods should be faster. That will never be the case when you pass QString by const-&, because there will always be the lvalue parameter attached to the QString instance. For modification in-place to work, you need to pass by rvalue ref. So for typical functions modifying the string, there's also no difference between QString and QStringView. That leaves classes which simply store the string. You cited QUrl. I don't see a problem providing QString overloads for these, esp. considering that we're starting out with an all-QString API here. 
Then again, once we have QStringView overloads, we can simply disable the QString overloads and see the effect. BTW: functions storing a passed QString as-is should provide a QString&& overload, and that might be a good idea even when otherwise using QStringView only. > And you certainly can’t keep the pointer to the data around > for longer than the lifetime of the QStringView, so it’s to some extent an > advanced class you have to be careful when using in your own APIs. It's like the distinction between QModelIndex and QPersistentModelIndex. The first is an interface type, the latter the storage type. Neither is more "advanced" than the other. They are complements. > So it can work nicely for methods such as QString::indexOf and similar, > but will never be good for methods that need to copy the string (e.g. > QUrl::setHostName). > > > Another thing I wonder about is whether we shouldn’t deprecate > QLatin1String moving forward. We have QStringLiteral, and even though it’s > implementation is not ideal, we should be able to get it working > everywhere now with Qt 5.7. Let’s think about how and whether we can > improve it’s implementation to fix the remaining issues. Then we could > remove/deprecate QLatin1String. There are problems in QStringLiteral that cannot be solved. Common data sharing will never happen with the current syntax. I'd suggest a QStaticString, a fully constexpr wrapper around QStaticStringData, basically to determine the N transparently, which can be used as a variable at namespace scope in lieu of the current need to pack all QStringLiterals into static inline functions. But that's outside the scope of this thread, so let's not go there. > On 13/10/15 23:01, "Thiago Macieira" wrote: > >On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote: > >> Q: What mistakes do you refer to? > >> > >> > >> > >> A: The fact that it has copy ctor and assignment operator, so it's not a > >> trivally-copyable type and thus cannot efficiently passed by-value. 
> >> It may also be too large for pass-by-value due to the rather useless
> >> QString pointer (should have been QStringData*, if any). Neither can be
> >> fixed before Qt 6.
> >
> >Not even in Qt 6. The reason why it uses a QString pointer is that it
> >follows the QString through reallocations. If the QString is mutated, the
> >QStringRef will still be valid (provided it isn't shortened beyond the
> >substring the QStringRef points to). There's a lot of code that depends on
> >this, so we can't change it.

    QString foo = "foo";
    QStringRef ref = foo.midRef(1); // ref == "oo"
    foo = "bar";                    // oops, ref == "ar"

We could change it to hold QString::Data* instead, though, right? And make it
share ownership of the QString::Data, in which case we have a QString that has
position and size inline.

Or, if it doesn't participate in the ownership, we can start returning
QStringRef from QStringLiteral(Ref?), killing one major QSL problem
(out-of-line QString dtor litter).

> Only by deprecating QStringRef and not using it ourselve
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
Hi Lars

Knoll Lars wrote:
> Agree here as well. We can’t make QString utf-8 backed without breaking
> way too much code. I also don’t see the need for it. The native encoding
> on Windows and Mac (Cocoa) is utf-16 as well, on Linux it’s utf-8. So no
> matter which platform we’re on, we won’t avoid some conversions.

By native, do you mean the OS APIs? Many other APIs, such as databases,
prefer UTF-8 for performance and/or size reasons. Most text from the web is
in UTF-8 because the overhead for Chinese characters is still lower than the
savings for the embedded tags around them.

I don't think we should orient ourselves towards the OS APIs, but rather
towards the most performance-demanding ones.

So why do we not provide a QUtf8String and use it, for example, in
networking? We don't need to change everything at once, but we should provide
UTF-8 support so that our users do not have to reinvent the wheel again and
again like we do in Creator.
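
The size argument about markup-heavy text can be illustrated with a small calculation. The helpers below are hypothetical names written for this sketch; they only count how many bytes each encoding needs per Unicode scalar value, using the standard UTF-8 and UTF-16 length rules, and the sample string is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Bytes needed to encode one Unicode scalar value in UTF-8.
constexpr std::size_t utf8Bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed in UTF-16: one 16-bit code unit, or two for a surrogate pair.
constexpr std::size_t utf16Bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
};

// Sums the per-character costs over a string of scalar values.
inline EncodedSizes encodedSizes(std::u32string_view text) {
    EncodedSizes sizes{0, 0};
    for (char32_t cp : text) {
        sizes.utf8 += utf8Bytes(cp);
        sizes.utf16 += utf16Bytes(cp);
    }
    return sizes;
}
```

For two CJK characters on their own, UTF-16 is smaller (4 bytes against 6). Once they are wrapped in a `<p>` element the ASCII tags dominate and UTF-8 wins (13 bytes against 18), which is the effect described above.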
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Tuesday 13. October 2015 22:46:36 Marc Mutz wrote:
> I would therefore like to propose to abandon QString for new API (and over
> time phase it out of existing API), and only provide (const QChar*, size_t)
> as the most general form. I would propose to package the two into a class,
> called - you guessed it - QStringView.

+1

I think we indeed need QStringView, QByteArrayView and even QVectorView.
And functions that take strings without taking ownership of them (i.e: not
setters) should use that.

-- 
Olivier

Woboq - Qt services and support - http://woboq.com - http://code.woboq.org
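
The split suggested above (views for functions that only read, owning strings for setters) can be sketched with standard-library types. countChar and Host are hypothetical names, and std::u16string_view (C++17) merely stands in for the proposed QStringView.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Read-only consumer: takes a non-owning view, so any contiguous UTF-16
// source can be passed without first building an owning string.
inline std::ptrdiff_t countChar(std::u16string_view text, char16_t ch) {
    std::ptrdiff_t n = 0;
    for (char16_t c : text)
        if (c == ch)
            ++n;
    return n;
}

// Hypothetical class with a setter: it keeps the string, so it takes an
// owning type by value and moves it into place.
class Host {
public:
    void setName(std::u16string name) { name_ = std::move(name); }
    const std::u16string &name() const { return name_; }

private:
    std::u16string name_;
};
```

A caller can hand countChar a literal, a whole string, or a slice of one without allocating, while setName makes the ownership transfer explicit in its signature.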
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 13/10/15 22:46, "Matthew Woehlke" wrote:

>On 2015-10-13 15:59, Jake Petroules wrote:
>> On Oct 13, 2015, at 1:46 PM, Marc Mutz wrote:
>>> I would therefore like to propose to abandon QString for new API (and
>>> over time phase it out of existing API), and only provide (const QChar*,
>>> size_t) as the most general form. I would propose to package the two
>>> into a class, called - you guessed it - QStringView.
>>
>> In general this sounds like a dangerous idea because it carries over
>> all the old API concepts (i.e. (QChar *, size_t) is an extremely
>> broken abstraction). You need to read and truly comprehend
>> https://developer.apple.com/swift/blog/?id=30 before suggesting any
>> changes to string-related APIs for the next major version of Qt,
>> because if anything, THAT is what it should look like. Anything but
>> that is a near-useless wrapper around binary data, not a true string
>> class.

From a conceptual point of view I fully agree with the article. Handling
unicode data is difficult, and that is what’s required to make it as seamless
as possible.

But the approach Swift is taking is not trivial or even 100% unambiguous.
Afaik they always work with a certain normalization form (composed). But it
poses certain problems as well. With their API, you can always add a combining
character (like an accent) to an existing letter in the string, but you can
never remove it. This creates a certain asymmetry that can in some cases pose
problems as well.

>
>While I don't necessarily disagree with that article, I think that the
>points being made are orthogonal to what Marc is proposing.

Yes, to a good degree it’s orthogonal. What both Marc’s proposal and the
article above show is that we should rethink some parts of our unicode
handling with Qt 6. QString is very good in many ways, but it still shows its
history as a vector of UTF-16 code units.

>
>The idea of QStringView would, I presume, be similar to that of
>std::string_view; namely, to provide an abstraction over a bag of
>"characters" (using that term rather loosely). It does NOT in any way
>relate to doing any sort of operations (besides slicing) on a "string".
>The idea is to be able to inexpensively pass around "text", whether it
>comes from QString, QStringRef, wchar_t*, or what have you, without
>having to perform superfluous memory allocations to convert to One True
>Form (i.e. QString) when the consumer doesn't actually care.
>
>That said... I note that slots probably still need to take QString,
>because a queued call with a QStringView is horribly broken (for reasons
>which I hope are obvious). At least unless the event dispatcher is
>clever enough to promote these to QString in the event.

Yes, as will many other methods. QStringView would IMO mainly be something we
can use in places where we use the data in a read-only fashion and where
performance is critical.

Cheers,
Lars
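
The gap between "a vector of UTF-16 units" and what a user thinks of as characters shows up as soon as a string contains text outside the Basic Multilingual Plane. The sketch below counts code points in a UTF-16 view; countCodePoints is a hypothetical helper, and it deliberately stops at code points, so combining sequences of the kind the Swift article discusses would still count as several.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of code points in a UTF-16 sequence. A valid
// high/low surrogate pair counts once; a lone surrogate counts as one unit.
inline std::size_t countCodePoints(std::u16string_view s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i; // skip the low half of the pair
        ++n;
    }
    return n;
}
```

A string such as u"a\U0001F600b" has a size() of four 16-bit units but only three code points, which is why indexing by unit is not the same as indexing by character.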
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
I’m not a huge fan of having different overloads for QString, QStringRef
and QLatin1String, and in some cases (QChar *, int), on many methods
either. But while your proposal solves some problems, it introduces
others.

A QStringView class would only work for methods that read the data
contained in it, but don’t try to modify it or take a copy (as Thiago
pointed out). And you certainly can’t keep the pointer to the data around
for longer than the lifetime of the QStringView, so it’s to some extent an
advanced class that you have to be careful with when using it in your own
APIs.

So it can work nicely for methods such as QString::indexOf and similar,
but will never be good for methods that need to copy the string (e.g.
QUrl::setHost).

Another thing I wonder about is whether we shouldn’t deprecate
QLatin1String moving forward. We have QStringLiteral, and even though its
implementation is not ideal, we should be able to get it working
everywhere now with Qt 5.7. Let’s think about how and whether we can
improve its implementation to fix the remaining issues. Then we could
remove/deprecate QLatin1String.

On 13/10/15 23:01, "Thiago Macieira" wrote:

>On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote:
>> Q: What mistakes do you refer to?
>>
>> A: The fact that it has copy ctor and assignment operator, so it's not a
>> trivially-copyable type and thus cannot be efficiently passed by-value.
>> It may also be too large for pass-by-value due to the rather useless
>> QString pointer (should have been QStringData*, if any). Neither can be
>> fixed before Qt 6.
>
>Not even in Qt 6. The reason why it uses a QString pointer is that it
>follows the QString through reallocations. If the QString is mutated, the
>QStringRef will still be valid (provided it isn't shortened beyond the
>substring the QStringRef points to). There's a lot of code that depends on
>this, so we can't change it.

Only by deprecating QStringRef and not using it ourselves anymore. But
it’s used quite a lot in Qt, so this is no easy job and will certainly
break source compatibility in places such as the XML stream reader.

>> Q: Why size_t?
>>
>> A: The intent of QStringView (and std::experimental::string_view) is to
>> act as an interface between modules written with different compilers and
>> different flags. A std::string will never be compatible between
>> compilers or even just different flags, but a simple struct {char*,
>> size_t} will always be, by way of its C compatibility.
>>
>> So the goal is not just to accept QString, QStringRef, and (QChar*,int)
>> (and QVarLengthArray!) as input to QStringView, but also
>> std::basic_string and std::vector.
>
>The C++ committee's current stance on signed vs unsigned is that you
>should use signed for everything, except when you want to have modulo-2
>overflows. We're not overflowing, so it should be signed.

Yes, signed please. We can discuss whether it should be 64bit for Qt 6.

>> Q: What future do you have in mind for QStringRef?
>>
>> A: None in particular, though I have found a need for an owning
>> QStringRef in some places. But I expect Qt 6's QString to be able to
>> provide a restricted view on shared data, such that it would subsume
>> QStringRef completely.
>
>We should deprecate it if QStringView comes into being.

Agree.

>> Q: What about QLatin1String?
>>
>> A: Once QString is backed by UTF-8, latin-1 ceases to be a special
>> charset. We might want something like QUsAsciiString, but it would just
>> be a UTF-8 string, so it could be packed into QStringView.
>
>Since QString will not be backed by UTF-8, the answer is irrelevant.

Agree here as well. We can’t make QString utf-8 backed without breaking
way too much code. I also don’t see the need for it. The native encoding
on Windows and Mac (Cocoa) is utf-16 as well; on Linux it’s utf-8. So no
matter which platform we’re on, we won’t avoid some conversions.

And I will strongly oppose any attempts to make QString some sort of
hybrid supporting both. The added complexity in maintaining the code base
is simply not worth it.

>> Q: What about QByteArray, QVector?
>>
>> A: I'm unsure about QByteArrayView. It might not pull its weight
>> compared to std::(experimental::)string_view, but I also note that we're
>> currently missing a QByteArrayRef, so a QBAView might make sense while
>> we wait for the std one to become available to us.
>
>Given the mistakes that you and I are pointing out in QStringRef, we
>should not add QByteArrayRef. Instead, it should be in the new-style, in
>which case I wonder whether we should add a class in the first place. And
>moreover, how often is this needed? std::array_view should be plenty for
>QByteArray and QVector where needed.

Agreed as well.

>> I'm actively opposed to a QArrayView, because I don't think it provides
>> us with anything std::(experimental::)array_view doesn't already.
>
>Right.
>
>> Q: What do you mean when you say "abandon QString"?
>>
>> A: I mean t
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote:
> Q: What mistakes do you refer to?
>
> A: The fact that it has copy ctor and assignment operator, so it's not a
> trivially-copyable type and thus cannot be efficiently passed by-value.
> It may also be too large for pass-by-value due to the rather useless
> QString pointer (should have been QStringData*, if any). Neither can be
> fixed before Qt 6.

Not even in Qt 6. The reason why it uses a QString pointer is that it
follows the QString through reallocations. If the QString is mutated, the
QStringRef will still be valid (provided it isn't shortened beyond the
substring the QStringRef points to). There's a lot of code that depends on
this, so we can't change it.

> Q: Why size_t?
>
> A: The intent of QStringView (and std::experimental::string_view) is to
> act as an interface between modules written with different compilers and
> different flags. A std::string will never be compatible between compilers
> or even just different flags, but a simple struct {char*, size_t} will
> always be, by way of its C compatibility.
>
> So the goal is not just to accept QString, QStringRef, and (QChar*,int)
> (and QVarLengthArray!) as input to QStringView, but also
> std::basic_string and std::vector.

The C++ committee's current stance on signed vs unsigned is that you
should use signed for everything, except when you want to have modulo-2
overflows. We're not overflowing, so it should be signed.

> Q: What future do you have in mind for QStringRef?
>
> A: None in particular, though I have found a need for an owning
> QStringRef in some places. But I expect Qt 6's QString to be able to
> provide a restricted view on shared data, such that it would subsume
> QStringRef completely.

We should deprecate it if QStringView comes into being.

> Q: What about QLatin1String?
>
> A: Once QString is backed by UTF-8, latin-1 ceases to be a special
> charset. We might want something like QUsAsciiString, but it would just
> be a UTF-8 string, so it could be packed into QStringView.

Since QString will not be backed by UTF-8, the answer is irrelevant.

> Q: What about QByteArray, QVector?
>
> A: I'm unsure about QByteArrayView. It might not pull its weight compared
> to std::(experimental::)string_view, but I also note that we're currently
> missing a QByteArrayRef, so a QBAView might make sense while we wait for
> the std one to become available to us.

Given the mistakes that you and I are pointing out in QStringRef, we
should not add QByteArrayRef. Instead, it should be in the new-style, in
which case I wonder whether we should add a class in the first place. And
moreover, how often is this needed? std::array_view should be plenty for
QByteArray and QVector where needed.

> I'm actively opposed to a QArrayView, because I don't think it provides
> us with anything std::(experimental::)array_view doesn't already.

Right.

> Q: What do you mean when you say "abandon QString"?
>
> A: I mean that functions should not take QStrings as arguments, but
> QStringViews. Then users can transparently pass QString, QStringRef and
> any of a number of other "string" types without overloading the function
> on each of them.
>
> I do not mean to abandon QString, the class. Only QString, the interface
> type.

I'm not agreeing to the proposal just yet. But as a condition to be even
considered, it needs to be only for the methods that do not hold a copy of
the string. That is, methods that immediately consume the string and no
longer need to reference its contents.

Methods that keep a copy for any reason (e.g., QFile::setFileName) should
still keep a QString API so that they can participate in the reference
counting.

> Q: What API should QStringView have?
>
> A: Since it's mainly an interface type, it should have implicit
> conversions from all kinds of "string" types, but explicit conversion
> _to_ those string types. It should carry all the API from QString that
> can be implemented on just a (QChar*, size_t) (e.g. trimmed(), left(),
> mid(), section(), split(), but not append(), replace() (except maybe the
> (QChar,QChar) overload)). Corresponding QString/Ref API could
> (eventually) just forward to the QStringView one.

That makes sense.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
___
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
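The split between methods that immediately consume a string and methods that keep a copy can be sketched in plain C++17. This is only an illustration under stated assumptions: std::u16string_view stands in for the proposed QStringView, std::u16string stands in for QString (so the by-value parameter models ownership, not Qt's reference counting), and countChar and FileLike are hypothetical names that exist in neither Qt nor the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Assumption: stand-in for the proposed QStringView, a non-owning
// (pointer, length) pair over UTF-16 code units.
using StringView = std::u16string_view;

// Consumes its argument immediately, so a view parameter is safe:
// nothing refers to the caller's buffer after the function returns.
std::size_t countChar(StringView text, char16_t needle)
{
    std::size_t n = 0;
    for (char16_t c : text)
        if (c == needle)
            ++n;
    return n;
}

// Hypothetical class that keeps the string beyond the call (compare
// QFile::setFileName). It takes an owning string, so the stored value
// never depends on the lifetime of the caller's buffer.
class FileLike
{
public:
    void setFileName(std::u16string name) { m_name = std::move(name); }
    const std::u16string &fileName() const { return m_name; }

private:
    std::u16string m_name;
};
```

With a real QString the setter's argument would share the caller's data through the reference count instead of copying it, which is the reason given above for keeping a QString API on such methods.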
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
On 2015-10-13 15:59, Jake Petroules wrote:
> On Oct 13, 2015, at 1:46 PM, Marc Mutz wrote:
>> I would therefore like to propose to abandon QString for new API (and
>> over time phase it out of existing API), and only provide (const QChar*,
>> size_t) as the most general form. I would propose to package the two
>> into a class, called - you guessed it - QStringView.
>
> In general this sounds like a dangerous idea because it carries over
> all the old API concepts (i.e. (QChar *, size_t) is an extremely
> broken abstraction). You need to read and truly comprehend
> https://developer.apple.com/swift/blog/?id=30 before suggesting any
> changes to string-related APIs for the next major version of Qt,
> because if anything, THAT is what it should look like. Anything but
> that is a near-useless wrapper around binary data, not a true string
> class.

While I don't necessarily disagree with that article, I think that the
points being made are orthogonal to what Marc is proposing.

The idea of QStringView would, I presume, be similar to that of
std::string_view; namely, to provide an abstraction over a bag of
"characters" (using that term rather loosely). It does NOT in any way
relate to doing any sort of operations (besides slicing) on a "string".
The idea is to be able to inexpensively pass around "text", whether it
comes from QString, QStringRef, wchar_t*, or what have you, without having
to perform superfluous memory allocations to convert to One True Form
(i.e. QString) when the consumer doesn't actually care.

That said... I note that slots probably still need to take QString,
because a queued call with a QStringView is horribly broken (for reasons
which I hope are obvious). At least unless the event dispatcher is clever
enough to promote these to QString in the event.

--
Matthew
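The queued-call problem, and the "promote to an owning string" workaround, can be sketched without Qt. Everything here is an assumption made for illustration: CallQueue is a hypothetical stand-in for an event queue and does not model Qt's real dispatcher, and std::u16string_view / std::u16string stand in for QStringView / QString.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event queue; not Qt's real dispatcher.
class CallQueue
{
public:
    // Accepts a view for the caller's convenience, but promotes it to an
    // owning std::u16string before storing the deferred call. Storing the
    // view itself would leave a dangling pointer once the caller's buffer
    // is destroyed.
    void post(std::function<void(const std::u16string &)> slot,
              std::u16string_view text)
    {
        std::u16string owned(text); // deep copy made at post time
        m_pending.push_back(
            [slot = std::move(slot), owned = std::move(owned)] { slot(owned); });
    }

    // Runs every deferred call, typically long after post() returned.
    void processAll()
    {
        std::vector<std::function<void()>> pending;
        pending.swap(m_pending);
        for (auto &call : pending)
            call();
    }

private:
    std::vector<std::function<void()>> m_pending;
};
```

The copy in post() is the price of deferring the call: the view is only guaranteed valid while the caller's buffer is alive, and a queued call by definition runs later.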
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
I like the idea of dividing the job of manipulating data and the job of
passing data around between different classes. I often get strings from
different sources, in different formats and with different ownership, and
for performance reasons you don't want to copy or convert those strings.
Many sources, such as databases, provide UTF-8 for performance reasons, so
we should definitely support it.

How do you want to handle ownership? I think it should not be included in
the type. What about move semantics? If the data is moved around, it
doesn't need to be copied before it can be manipulated.
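The move-semantics point can be sketched with a standard string type. This assumes std::u16string as a stand-in for an owning string class without implicit sharing; appendSuffix is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Sink function: takes ownership of the buffer and edits it in place.
// A caller that passes an rvalue (std::move) hands over its buffer, so no
// copy of the existing characters is made before the manipulation.
std::u16string appendSuffix(std::u16string text, std::u16string_view suffix)
{
    text.append(suffix); // modifies the buffer this function now owns
    return text;         // a by-value parameter is moved out on return
}
```

A caller would write `appendSuffix(std::move(row), u"!")` and must not rely on the contents of `row` afterwards, since a moved-from string is left in a valid but unspecified state.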
Re: [Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
> On Oct 13, 2015, at 1:46 PM, Marc Mutz wrote:
>
> Hi,
>
> After looking quite a bit into the current state of string handling in Qt
> for my QtWS talk last week, I have become frustrated by the state of
> string handling in Qt.
>
> We have such powerful tools for string handling (QStringRef,
> QStringBuilder), but all APIs outside QString and its immediate
> surroundings only deal in QString. The correct way would be to overload
> every function taking QString with QLatin1String and QStringRef versions,
> and then, for some other rare cases, const QChar *, int size. Let alone
> std::basic_string.
>
> I would therefore like to propose to abandon QString for new API (and
> over time phase it out of existing API), and only provide (const QChar*,
> size_t) as the most general form. I would propose to package the two into
> a class, called - you guessed it - QStringView.
>
> =FAQ=
>
> Q: Why not just use QStringRef?
>
> A: QStringRef is tied to QString. E.g. you can't create a QStringRef from
> a pair of QChar*, int. It also is kind of stuck in historic mistakes
> making it undesirable as a cheap-to-pass parameter type.
>
> Q: What mistakes do you refer to?
>
> A: The fact that it has copy ctor and assignment operator, so it's not a
> trivially-copyable type and thus cannot be efficiently passed by-value.
> It may also be too large for pass-by-value due to the rather useless
> QString pointer (should have been QStringData*, if any). Neither can be
> fixed before Qt 6.
>
> Q: Why size_t?
>
> A: The intent of QStringView (and std::experimental::string_view) is to
> act as an interface between modules written with different compilers and
> different flags. A std::string will never be compatible between compilers
> or even just different flags, but a simple struct {char*, size_t} will
> always be, by way of its C compatibility.
>
> So the goal is not just to accept QString, QStringRef, and (QChar*,int)
> (and QVarLengthArray!) as input to QStringView, but also
> std::basic_string and std::vector.
>
> Q: What about the plans to make QString UTF-8-backed?
>
> A: QStringView-using code will need to be ported just as QString-using
> code will.
>
> Q: What future do you have in mind for QStringRef?
>
> A: None in particular, though I have found a need for an owning
> QStringRef in some places. But I expect Qt 6's QString to be able to
> provide a restricted view on shared data, such that it would subsume
> QStringRef completely.
>
> Q: What about QLatin1String?
>
> A: Once QString is backed by UTF-8, latin-1 ceases to be a special
> charset. We might want something like QUsAsciiString, but it would just
> be a UTF-8 string, so it could be packed into QStringView.
>
> Q: What about QByteArray, QVector?
>
> A: I'm unsure about QByteArrayView. It might not pull its weight compared
> to std::(experimental::)string_view, but I also note that we're currently
> missing a QByteArrayRef, so a QBAView might make sense while we wait for
> the std one to become available to us.
>
> I'm actively opposed to a QArrayView, because I don't think it provides
> us with anything std::(experimental::)array_view doesn't already.
>
> Q: What about a rope?
>
> A: A rope is a more complex string that can provide complex views on
> existing data as well as store rules for generating stretches of data (as
> opposed to the data itself).
>
> A rope is a very complex data structure and would not work as a universal
> interface type. It would be cool if Qt had a rope, but that is outside
> the scope of my proposal.
>
> Q: What do you mean when you say "abandon QString"?
>
> A: I mean that functions should not take QStrings as arguments, but
> QStringViews. Then users can transparently pass QString, QStringRef and
> any of a number of other "string" types without overloading the function
> on each of them.
>
> I do not mean to abandon QString, the class. Only QString, the interface
> type.
>
> Q: What API should QStringView have?
>
> A: Since it's mainly an interface type, it should have implicit
> conversions from all kinds of "string" types, but explicit conversion
> _to_ those string types. It should carry all the API from QString that
> can be implemented on just a (QChar*, size_t) (e.g. trimmed(), left(),
> mid(), section(), split(), but not append(), replace() (except maybe the
> (QChar,QChar) overload)). Corresponding QString/Ref API could
> (eventually) just forward to the QStringView one.
>
> Thanks, now fire away,
> Marc
>
> --
> Marc Mutz | Senior Software Engineer
> KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
> Tel: +49-30-521325470
> KDAB - The Qt Experts

In general this sounds like a dangerous idea because it carries over all the old API concepts (i.e. (QChar *, size_t) i