[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #18 from Jonathan Wakely --- No.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 cqwrteur changed: What|Removed |Added CC||unlvsur at live dot com --- Comment #17 from cqwrteur --- (In reply to Jonathan Wakely from comment #14) > I have a patch to remove indirections in std::array which I'll commit for > GCC 13. shouldn't it also add [[__gnu__::__artificial__]] attribute?
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #16 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:22399ad6edcd4a2903b05196b59eec3159ceaa38 commit r13-114-g22399ad6edcd4a2903b05196b59eec3159ceaa38 Author: Jonathan Wakely Date: Wed Apr 27 16:09:06 2022 +0100 libstdc++: Add always_inline to the simplest std::array accessors [PR104719] libstdc++-v3/ChangeLog: PR libstdc++/104719 * include/std/array (array::size(), array::max_size()) (array::empty(), array::data()): Add always_inline attribute.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #15 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:ef8d5ac08b5e60f35c52087d88c0235c8ce6b65b commit r13-113-gef8d5ac08b5e60f35c52087d88c0235c8ce6b65b Author: Jonathan Wakely Date: Fri Mar 25 10:28:28 2022 + libstdc++: Simplify std::array accessors [PR104719] This removes the __array_traits::_S_ref and __array_traits::_S_ptr accessors, which only exist to make the special case of std::array syntactically well-formed. By changing the empty type used as the std::array::_M_elems data member to support operator[] and conversion to a pointer, we can write code using the natural syntax. The indirection through _S_ref and _S_ptr is removed for the common case, and a function call is only used for the special case of zero-size arrays. The invalid member access for zero-sized arrays is changed to use __builtin_trap() instead of a null dereference. This guarantees a runtime error if it ever gets called, instead of undefined behaviour that is likely to get optimized out as unreachable. libstdc++-v3/ChangeLog: PR libstdc++/104719 * include/std/array (__array_traits::_S_ref): Remove. (__array_traits::_S_ptr): Remove. (__array_traits::_Type): Define operator[] and operator T* to provide an array-like API. (array::_AT_Type): Remove public typeef. (array::operator[], array::at, array::front, array::back): Use index operator to access _M_elems instead of _S_ref. (array::data): Use implicit conversion from _M_elems to pointer. (swap(array&, array&)): Use __enable_if_t helper. (get): Use index operator to access _M_elems. * testsuite/23_containers/array/tuple_interface/get_neg.cc: Adjust dg-error line numbers.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #14 from Jonathan Wakely --- I have a patch to remove indirections in std::array which I'll commit for GCC 13.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 Jason Merrill changed: What|Removed |Added Resolution|--- |DUPLICATE Target Milestone|--- |12.0 Status|NEW |RESOLVED --- Comment #13 from Jason Merrill --- I believe the primary concern of this PR is resolved by Patrick's patch for PR96780; you can now use -ffold-simple-inlines to avoid calls to std::move at -O0. Please open separate PRs for individual standard library abstraction penalty issues. *** This bug has been marked as a duplicate of bug 96780 ***
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 Jason Merrill changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #12 from Jason Merrill --- (In reply to Vittorio Romeo from comment #9) > - For the `operator[]` benchmark, when using `-Og` after applying > `[[gnu::always_inline]]` to all the functions touched by the benchmark, we > reduce the overhead from 34% to around 11%. Quoting from your gist: -Og, without `[[gnu::always_inline]]` on `operator[]` carray_squareop_mean 440 ns 439 ns3 vector_squareop_mean 661 ns 662 ns3 -Og, with `[[gnu::always_inline]]` on `operator[]` vector_squareop_mean 494 ns 491 ns3 Which looks significant... ...but I don't see this when I run your test myself; the vector_squareop results (and the generated code) are unaffected by adding always_inline. In particular there are no calls to operator[]. carray_squareop 547 ns 546 ns 1272023 vector_squareop 591 ns 589 ns 1227215 carray: movslq %edx, %rax leaq(%rbx,%rax,4), %rcx movl(%rcx), %eax addl$1, %eax movl%eax, (%rcx) movl%eax, (%rcx) addl$1, %edx vector: movslq %ecx, %rax salq$2, %rax addq(%rsp), %rax movl(%rax), %esi leal1(%rsi), %edx movl%edx, (%rax) movl%edx, (%rax) addl$1, %ecx It seems the main difference between the two is that the vector version needs to keep loading the base pointer from the stack (%rsp) for some reason, rather than keep it in a register %rbx. This doesn't seem like an inlining issue at all. The double move in both versions is curious.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #11 from Jonathan Wakely --- Stoopid autocorrect. s/kitty/it/
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #10 from Jonathan Wakely --- There is a downside to always_inline, which might be similar to the stack usage problem mentioned in the llvm ticket. If a function grows too large, the optimiser can just give up and stop inlining. But for always_inline that gives a fatal error with no way to fix it. The attribute tells the compiler kitty isn't allowed to give up, the call must *always* be inline. If we liberally sprinkle it all over the std::lib and it results in compiler errors, users can't do anything about it. The code won't compile, period. Annotations that usually improve performance, but then sometimes give unavoidable errors are not a good idea.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #9 from Vittorio Romeo --- I have done some benchmarking for three use cases, both with `-O0` and `-Og`, hacking my `libstdc++` headers to add `[[gnu::always_inline]]` where deemed appropriate. --- The use cases were: 1. `operator[]` benchmark -- `vector_squareop` and `carray_squareop` as seen above 2. Iterator benchmark -- `vector_iter` and `carray_iter` as seen above 3. Algorithm benchmark -- `std::accumulate` versus a raw `for` loop --- All the benchmark results, benchmarking rig specs, and used code available here: https://gist.github.com/vittorioromeo/efa005d44ccd4ec7279181768a0c1f0b --- In short, these are the results: - For all benchmarks, when using `-O0` without any modification to `libstdc++`, the overhead of the Standard Library can be huge (+25-400%). - For all benchmarks, when using `-Og` without any modification to `libstdc++`, the overhead of the Standard Library is small (+5-15%). - For the `operator[]` benchmark, when using `-O0` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 25% to around 10%. - For the `operator[]` benchmark, when using `-Og` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 34% to around 11%. - For the iterator benchmark, when using `-O0` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 302% to around 186%. - For the iterator benchmark, when using `-Og` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 11% to around 8%. - For the algorithm benchmark, when using `-O0` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 304% to around 47%. - For the algorithm benchmark, when using `-Og`, independently of whether we modify `libstdc++` or not, the overhead is around 36%.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Severity|normal |enhancement Last reconfirmed||2022-02-28 --- Comment #8 from Jonathan Wakely --- (In reply to Vittorio Romeo from comment #6) > > The request is to replace it with some kind of magic that does the same as > > std::move without actually writing std::move. > > More generally speaking, ensure that function such as `std::move`, > `std::forward`, `std::vector::operator[]`, > `std::vector::iterator::operator*`, and so on never appear in debugging > call stacks and do not affect performance in `-Og` (or even `-O0`. They will still appear in debugging stacks, because GCC emits debug info even for inline functions. That means that GDB makes it look like the function was entered and exited, even if the actual code is completely inlined. That was the original topic of PR 96780 (because I don't think it's useful for std::move and std::forward). I think what *should* matters is actual raw performance, not how many stack frames you see in the debugger. However, reading twitter and reddit, I think people would complain about insane levels of complexity in the std::lib *even if it has zero overhead*. > I think the title for my issue is a bit too specific, but I'd like to make > this a wider discussion in how to mitigate debug performance overhead caused > by C++ standard library abstractions. OK, confirming as an enhancement, but somebody will need to do measurements and propose specific patches. I'm not aware of any low-hanging fruit for runtime perf, so the problems need to be identified before they can be fixed.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #7 from Jonathan Wakely --- (In reply to Vittorio Romeo from comment #4) > I like the idea of having the compiler itself fold calls to things like > `std::move` and `std::forward` as suggested in the linked > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780. That benefits user code too, so that users won't have to use static_cast in their own code because "std::move is slow". > But I think this issue I opened should be more general for any standard > library function that ends up impacting debug performance. Another common > example in the gamedev community is `std::vector`. Does the gamedev community actually use -Og though? Because if they only use -O0 (as is certainly the case for some) then the solution is "don't do that". Do they even use GCC/libstdc++ at all? Would they actually be more likely to if we change anything here? For those who are using -Og, adding always_inline to trivial accessors shouldn't be necessary, because GCC _should_ always inline them anyway. > In this benchmark, which uses `-Og`, you can notice a large performance > difference between a `std::vector` and `int*` dynamic array for > operations that I believe should have equal performance: > - https://quick-bench.com/q/lrS4I-lmDJ3VFP8L8rG2YHGXO-8 > - https://quick-bench.com/q/Uf-t79n7uYWAKdThOL_wxSp12Y0 > > Are the above results also something that should be handled on the compiler > side of things? Or would, for example, marking `std::vector::operator[]` and > `std::vector::iterator::operator*` as `always_inline` remove the performance > discrepancy? Somebody will have to investigate whether lack of inlining is the problem there. My guess would be that it's not due to trivial functions like op[] failing to be inlined, and so always_inline wouldn't help (except at -O0, but "don't do that" if you need performance :-)
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #6 from Vittorio Romeo --- > The request is to replace it with some kind of magic that does the same as > std::move without actually writing std::move. More generally speaking, ensure that function such as `std::move`, `std::forward`, `std::vector::operator[]`, `std::vector::iterator::operator*`, and so on never appear in debugging call stacks and do not affect performance in `-Og` (or even `-O0`. I think the title for my issue is a bit too specific, but I'd like to make this a wider discussion in how to mitigate debug performance overhead caused by C++ standard library abstractions.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 Jonathan Wakely changed: What|Removed |Added See Also|https://gcc.gnu.org/bugzill | |a/show_bug.cgi?id=67906,| |https://gcc.gnu.org/bugzill | |a/show_bug.cgi?id=81159,| |https://gcc.gnu.org/bugzill | |a/show_bug.cgi?id=90428 | --- Comment #5 from Jonathan Wakely --- (In reply to Eric Gallager from comment #3) > Seems related to some of the requests for more warnings about uses of > std::move that might degrade performance, e.g. bug 67906, bug 81159, bug > 90428 > > Also, Marek's blog post about std::move seems relevant here: > https://developers.redhat.com/blog/2019/04/12/understanding-when-not-to- > stdmove-in-c I don't think any of those are relevant, they're about *unnecessary* uses of std::move where that is either entirely redundant or doesn't change semantics but prevents the optimizer doing better on its own. This is about *necessary* uses of std::move, and removing the move here would change semantics and potentially break the code. The request is to replace it with some kind of magic that does the same as std::move without actually writing std::move.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #4 from Vittorio Romeo --- I see that `std::move` is indeed inlined with `-Og`, my apologies on not noticing that. I like the idea of having the compiler itself fold calls to things like `std::move` and `std::forward` as suggested in the linked https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780. But I think this issue I opened should be more general for any standard library function that ends up impacting debug performance. Another common example in the gamedev community is `std::vector`. In this benchmark, which uses `-Og`, you can notice a large performance difference between a `std::vector` and `int*` dynamic array for operations that I believe should have equal performance: - https://quick-bench.com/q/lrS4I-lmDJ3VFP8L8rG2YHGXO-8 - https://quick-bench.com/q/Uf-t79n7uYWAKdThOL_wxSp12Y0 Are the above results also something that should be handled on the compiler side of things? Or would, for example, marking `std::vector::operator[]` and `std::vector::iterator::operator*` as `always_inline` remove the performance discrepancy?
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 Eric Gallager changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=67906, ||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=81159, ||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=90428 CC||egallager at gcc dot gnu.org, ||mpolacek at gcc dot gnu.org --- Comment #3 from Eric Gallager --- Seems related to some of the requests for more warnings about uses of std::move that might degrade performance, e.g. bug 67906, bug 81159, bug 90428 Also, Marek's blog post about std::move seems relevant here: https://developers.redhat.com/blog/2019/04/12/understanding-when-not-to-stdmove-in-c
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 --- Comment #2 from Jonathan Wakely --- (In reply to Vittorio Romeo from comment #0) > Would it be possible to replace `std::move` calls internal to `libstdc++` > with a cast, No, absolutely not. > or some sort of compiler intrinsic? No, but the compiler could just fold it away (see PR 96780). > Or maybe mark `std::move` > as "always inline" even without optimizations enabled? Maybe, although since your benchmark shows there is no problem with GCC at -Og I think this is a Clang problem, not libstdc++ or libc++.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719 Jonathan Wakely changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=96780 --- Comment #1 from Jonathan Wakely --- If it's not inlined at -Og then that's a compiler problem, GCC inlines it at -Og: https://quick-bench.com/q/d1H4VjD_Xns4ef7iOfweIy6rPNM