[Development] Renaming quint128
I was working on extended integers and added qint128 and quint128 to qglobal.h (qtypes.h), but when I tried to rebuild all of Qt today, I found out that QtBluetooth has this in qbluetoothuuid.h:

    struct quint128
    {
        quint8 data[16];
    };

It's used in the API, with a constructor and a toUInt128(), but that's all. It's also not documented.

I'd like to move it away so I can add the proper integer. There's a way to replace it without breaking BC or SC:

1) on 64-bit systems with GCC and Clang, use the actual integer type
2) everywhere else, use the struct
3) for QtBluetooth's own build, add a removed_api.cpp that also #undef __SIZEOF_INT128__

It might be a good idea to move that backup definition to QtCore, so QtBluetooth isn't depending on just how qtypes.h does it.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering

___
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
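A minimal sketch of steps 1 and 2 from the message above, in standard C++ without any Qt headers. The names Uint128Bytes, Uint128 and toBytes are hypothetical and chosen only for illustration; this is not the actual content of qtypes.h or qbluetoothuuid.h, and step 3 (the removed_api.cpp build) is not shown.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical fallback with the same layout as the struct quoted above.
struct Uint128Bytes {
    std::uint8_t data[16];
};

#if defined(__SIZEOF_INT128__)
// GCC and Clang on 64-bit targets predefine __SIZEOF_INT128__:
// use the real integer type there.
__extension__ typedef unsigned __int128 Uint128;
#else
// Everywhere else: keep the byte-array struct.
typedef Uint128Bytes Uint128;
#endif

// Both definitions must occupy the same 16 bytes.
static_assert(sizeof(Uint128) == 16, "Uint128 must be 16 bytes wide");

// Copies the 16 bytes of either representation into the byte-array form.
inline Uint128Bytes toBytes(const Uint128 &value)
{
    Uint128Bytes out;
    std::memcpy(out.data, &value, sizeof(out.data));
    return out;
}
```

Code that only passes the value around or copies its bytes, as toBytes does, compiles unchanged against either definition.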
Re: [Development] How qAsConst and qExchange lead to qNN
On Thursday, 17 November 2022 10:32:50 PST Elvis Stansvik wrote:
> Fermat's Last QString Vectorization Update :p

Everything is already sent to Gerrit. What I haven't done is benchmark it to confirm the theoretical runs in LLVM-MCA.

It starts at https://codereview.qt-project.org/c/qt/qtbase/+/386952

See the search at
https://codereview.qt-project.org/q/is:open+owner:thiago.macieira%2540intel.com+message:QString

The changes are mostly organised as "reorganise the pre-AVX code", then "rewrite AVX2 code", then "add AVX512VL code" for each of the functions.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
Re: [Development] How qAsConst and qExchange lead to qNN
On Thursday, 17 November 2022 10:24:35 PST Volker Hilsheimer via Development wrote:
> > Though I am postponing the QString vectorisation update to 6.6 because I
> > don't have time to collect the benchmarks to prove I'm right before
> > feature freeze next Friday.
>
> Next Friday is the platform & module freeze. Feature freeze is not until
> December 9th, i.e. another 3 weeks to go.

Next Friday is also the day after Thanksgiving here in the US.

I don't expect I can finish the benchmarking in 3 weeks, not when I also need to finish the IPC work, which includes starting a couple of changes that I haven't begun yet (like the ability to clean up after itself).

For the benchmarking, I've already collected the data by instrumenting each of the functions in question and running a Qt build, a Qt Creator start and a Qt build inside Qt Creator:

    qt-build-data.tar.xz:         1197.3 MB
    qtcreator-nosession.tar.xz:   2690.0 MB
    qtcreator-session.tar.xz:    35134.6 MB

The data retains its intra-cacheline alignment.

The way I see it, for each of the algorithm generations I need to:

1) find the asymptotic limits, given L1, L2 and L3 cache sizes

That is, the algorithms should be fast enough that the bottleneck is the transfer of data. There's no way that running qustrchr on 35 GB is going to be bound by anything other than RAM bandwidth or, in my laptop's case, the NVMe. So what are those limits?

2) benchmark at several data set sizes (half to 75% of L1, half to 75% of L2) on several generations

Confirm that the algorithm is running close to or better than the ideal run that LLVM-MCA showed when I designed them. I know I can benchmark throughput to see if we're reaching the target bytes/cycle processing, but I don't know if I can benchmark the latency. I also don't know if it matters.

3) benchmark at several input sizes (i.e., strings of 4 characters, 8 characters, etc.)

Same as #2, but instead of running over the sample that adds up to a certain data size, select the input such that the strings always have the same size.

4) compare to the previous generation's algorithm to confirm it's actually better

Different instructions have different pros and cons; what might work for one at a given data size may not for another.

The algorithms available are:
* baseline SSE2: no comparisons
* SSE 4.1: compare to baseline SSE2, current SSE 4.1
* AVX2: compare to new SSE 4.1, current AVX2
* AVX512 with 256-bit vectors ("Avx256"): compare to new AVX2

I plan on collecting data on 3 laptop processors (Haswell, Skylake and Tiger Lake) and 2 desktop processors (Coffee Lake and Skylake Extreme). The Skylake should match the performance of almost all the Skylake and derivatives since 2016; the Coffee Lake NUC has the same processor as my Mac Mini; the Tiger Lake should represent the performance of modern processors.

The Skylake Extreme and the Tiger Lake can run the AVX512 code too. I don't know if the AVX512 code on Skylake will show a performance gain or a loss, because despite using only 256 bits, it may need to power on the OpMask registers. If it doesn't show a gain, I will adjust the feature detection to only apply to Ice Lakes and later.

I have a new Alder Lake which would be nice to benchmark, to get the performance on both the Golden Cove P-core and the Gracemont E-core, but the thing runs Windows and the IT-mandated virus scans, so I will not bother.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
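The throughput measurement in step 2 of the message above could be sketched roughly as follows. This is only an illustration under stated assumptions: findChar16 is a hypothetical scalar stand-in, not Qt's qustrchr or any of the vectorised implementations, and scanThroughput reports bytes per nanosecond from a wall clock rather than the bytes/cycle figure the message targets.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the function under test: returns the index of
// the first occurrence of c in s[0..n), or n if c does not occur.
inline std::size_t findChar16(const char16_t *s, std::size_t n, char16_t c)
{
    for (std::size_t i = 0; i < n; ++i)
        if (s[i] == c)
            return i;
    return n;
}

// Scans a working set of `bytes` bytes `repeats` times and returns the
// observed throughput in bytes per nanosecond (0.0 if the clock did not advance).
inline double scanThroughput(std::size_t bytes, int repeats)
{
    std::vector<char16_t> buf(bytes / sizeof(char16_t), u'a');
    volatile std::size_t sink = 0; // keeps the scan from being optimised away
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        sink = sink + findChar16(buf.data(), buf.size(), u'z'); // never found: full scan
    const auto stop = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns > 0 ? static_cast<double>(bytes) * repeats / ns : 0.0;
}
```

Calling scanThroughput with a working set of, say, half the L1 data cache size and then half the L2 size gives one data point per cache level; the actual cache sizes depend on the processor and are not taken from the message.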
Re: [Development] How qAsConst and qExchange lead to qNN
On Thu, 17 Nov 2022 at 18:46, Thiago Macieira wrote:
>
> On Thursday, 17 November 2022 02:04:54 PST Marc Mutz via Development wrote:
> > > Also, sometimes I wonder if all the work you and I do to optimise these
> > > things matter, in the end. We may save 0.5% of the CPU time, only for
> > > that to be dwarfed by whatever QtGui, QtQml are doing.
> >
> > I hear you, but I'm not ready to give in just yet.
>
> Nor am I.
>
> Though I am postponing the QString vectorisation update to 6.6 because I don't
> have time to collect the benchmarks to prove I'm right before feature freeze
> next Friday.

Fermat's Last QString Vectorization Update :p

Elvis
Re: [Development] How qAsConst and qExchange lead to qNN
> On 17 Nov 2022, at 18:45, Thiago Macieira wrote:
>
> On Thursday, 17 November 2022 02:04:54 PST Marc Mutz via Development wrote:
>>> Also, sometimes I wonder if all the work you and I do to optimise these
>>> things matter, in the end. We may save 0.5% of the CPU time, only for
>>> that to be dwarfed by whatever QtGui, QtQml are doing.
>>
>> I hear you, but I'm not ready to give in just yet.
>
> Nor am I.
>
> Though I am postponing the QString vectorisation update to 6.6 because I
> don't have time to collect the benchmarks to prove I'm right before
> feature freeze next Friday.

Next Friday is the platform & module freeze. Feature freeze is not until December 9th, i.e. another 3 weeks to go.

Volker
Re: [Development] How qAsConst and qExchange lead to qNN
On Thursday, 17 November 2022 02:04:54 PST Marc Mutz via Development wrote:
> > Also, sometimes I wonder if all the work you and I do to optimise these
> > things matter, in the end. We may save 0.5% of the CPU time, only for
> > that to be dwarfed by whatever QtGui, QtQml are doing.
>
> I hear you, but I'm not ready to give in just yet.

Nor am I.

Though I am postponing the QString vectorisation update to 6.6 because I don't have time to collect the benchmarks to prove I'm right before feature freeze next Friday.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
Re: [Development] How qAsConst and qExchange lead to qNN
Hi Thiago,

On 16.11.22 18:50, Thiago Macieira wrote:
> On Tuesday, 15 November 2022 23:50:38 PST Marc Mutz via Development wrote:
>>> in a thread-safe manner (such that if something in
>>> the same thread or another thread-safely modifies that map, the original
>>> user isn't affected).
>>
>> The above isn't thread-safe, it isn't even re-entrant, in the same way
>> that iteration using iterators isn't. This is a known issue whenever you
>> hand out references, and it's nothing that violates our
>> const-is-thread-safe promise,
>
> No, but it moves the responsibility for avoiding this problem to the user.
>
> Right now, you can do:
>     for (auto elem : object.keyList()) {
>         operate(); // may recurse back into object and modify it
>     }
>
> If you use a generator paradigm to return this key list, then the *user* must
> know that they must create a local container with the items to be generated
> and iterate over that. Performance-wise, this is no different than if the Qt code
> created the container and returned it, but it has two drawbacks:
>
> 1) the responsibility for knowing this

Not necessarily. E.g. when the co-routine implementation uses the equivalent of an indexed loop, it immunizes itself from changes to the container while it's suspended. It can also post a re-entrancy guard in the class' data, like we sometimes already do in event handlers and often do in slots, to at least detect and mitigate the issue. This isn't different from emitting signals or calling virtual functions while iterating, and the solutions are the same, and, largely, if not completely, under the control of the co-routine implementation.

That said, it's not entirely clear to me how widespread such issues are. After all, the user of a generator sees the potentially-re-entering code, it's in the function he's presently writing/analyzing, and not hidden in the way signal/slot connections or even virtual functions hide the issue by having far-removed code cause the problem.

So I don't know whether the benefits of lazy evaluation outweigh or are dwarfed by this issue.

> 2) if the Qt object already has a QList with this, then using a generator
> paradigm enforces the need of a deep copy, when implicit would have been
> cheaper

I hasten to interject here that the code you wrote above actually does deep-copy in that case (hidden detach in the for loop).

Apart from that, we're circling back to the assumption that a class would hold or return a QList for the sake of QList. For holding, and also for returning, if one must return an owning container, a QVLA or otherwise SBO'd container would be more appropriate in many cases. The lack of such containers in Qt begets the use of QList in the first place. To get off this treadmill, one needs to look outside the Qt echo chamber, to std C++ (std::u16string, std::pmr), Folly (F14 (hash table), fbstring (SSO, CoW only for large strings)), Python (strings are QAnyString with SSO there), LLVM (llvm::SmallVector, StringRef, ArrayRef), Mozilla's JS strings (L1/UTF-16 QAnyString, SSO). Then work backwards from these kinds of containers to how we can enable them in Qt.

>>> Because you pointed to QStringTokenizer and that implicitly-
>>> copies a QString.
>>
>> That's imprecise. QStringTokenizer extends rvalue lifetimes ("rvalue
>> pinning") so's to make this safe:
>>
>>     for (auto part : qTokenize(label->text(), u';'))
>
> BTW, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2012r2.pdf is
> accepted for C++23 and moves the end of the temporaries' lifetimes to the end
> of the full for statement.

Hallelujah! Thanks, Nico!

> Though we still need to work with C++17 and 20 for a while.
>
> Also, sometimes I wonder if all the work you and I do to optimise these things
> matter, in the end. We may save 0.5% of the CPU time, only for that to be
> dwarfed by whatever QtGui, QtQml are doing.

I hear you, but I'm not ready to give in just yet.

Thanks,
Marc

-- 
Marc Mutz
Principal Software Engineer

The Qt Company
Erich-Thilo-Str. 10 12489
Berlin, Germany
www.qt.io

Managing directors: Mika Pälsi, Juha Varelius, Jouni Lintunen
Registered office: Berlin, court of registration: Amtsgericht Charlottenburg, HRB 144331 B
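The "equivalent of an indexed loop" mentioned in the message above might look like the following sketch. It uses std::vector instead of a Qt container, and visitByIndex is a hypothetical helper, not an existing Qt or standard-library function; it only illustrates why re-reading the size and indexing on every step tolerates a callback that modifies the container.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Visits elements by index, re-reading size() on every step. A callback that
// appends to or shrinks the container cannot leave a dangling iterator behind,
// because no iterator or reference is held across the call.
inline int visitByIndex(std::vector<int> &items,
                        const std::function<void(int)> &visit)
{
    int visited = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        const int value = items[i]; // copy the element before calling out
        visit(value);               // may re-enter and modify `items`
        ++visited;
    }
    return visited;
}
```

With iterators, a push_back inside the callback could reallocate the storage and invalidate the loop's position; here the loop simply sees the new size on its next check. Whether elements appended during the traversal should be visited at all is a separate design decision that this sketch does not settle.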