[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-05-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #18 from Jonathan Wakely  ---
No.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-05-04 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

cqwrteur  changed:

   What|Removed |Added

 CC||unlvsur at live dot com

--- Comment #17 from cqwrteur  ---
(In reply to Jonathan Wakely from comment #14)
> I have a patch to remove indirections in std::array which I'll commit for
> GCC 13.

shouldn't it also add [[__gnu__::__artificial__]] attribute?

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-05-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:22399ad6edcd4a2903b05196b59eec3159ceaa38

commit r13-114-g22399ad6edcd4a2903b05196b59eec3159ceaa38
Author: Jonathan Wakely 
Date:   Wed Apr 27 16:09:06 2022 +0100

libstdc++: Add always_inline to the simplest std::array accessors
[PR104719]

libstdc++-v3/ChangeLog:

PR libstdc++/104719
* include/std/array (array::size(), array::max_size())
(array::empty(), array::data()): Add  always_inline attribute.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-05-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:ef8d5ac08b5e60f35c52087d88c0235c8ce6b65b

commit r13-113-gef8d5ac08b5e60f35c52087d88c0235c8ce6b65b
Author: Jonathan Wakely 
Date:   Fri Mar 25 10:28:28 2022 +

libstdc++: Simplify std::array accessors [PR104719]

This removes the __array_traits::_S_ref and __array_traits::_S_ptr
accessors, which only exist to make the special case of std::array
syntactically well-formed.

By changing the empty type used as the std::array::_M_elems data
member to support operator[] and conversion to a pointer, we can write
code using the natural syntax. The indirection through _S_ref and
_S_ptr is removed for the common case, and a function call is only used
for the special case of zero-size arrays.

The invalid member access for zero-sized arrays is changed to use
__builtin_trap() instead of a null dereference. This guarantees a
runtime error if it ever gets called, instead of undefined behaviour
that is likely to get optimized out as unreachable.

libstdc++-v3/ChangeLog:

PR libstdc++/104719
* include/std/array (__array_traits::_S_ref): Remove.
(__array_traits::_S_ptr): Remove.
(__array_traits::_Type): Define operator[] and operator T*
to provide an array-like API.
(array::_AT_Type): Remove public typeef.
(array::operator[], array::at, array::front, array::back): Use
index operator to access _M_elems instead of _S_ref.
(array::data): Use implicit conversion from _M_elems to pointer.
(swap(array&, array&)): Use __enable_if_t helper.
(get): Use index operator to access _M_elems.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-03-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #14 from Jonathan Wakely  ---
I have a patch to remove indirections in std::array which I'll commit for GCC
13.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-03-26 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Jason Merrill  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
   Target Milestone|--- |12.0
 Status|NEW |RESOLVED

--- Comment #13 from Jason Merrill  ---
I believe the primary concern of this PR is resolved by Patrick's patch for
PR96780; you can now use -ffold-simple-inlines to avoid calls to std::move at
-O0.  Please open separate PRs for individual standard library abstraction
penalty issues.

*** This bug has been marked as a duplicate of bug 96780 ***

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-03-10 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Jason Merrill  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #12 from Jason Merrill  ---
(In reply to Vittorio Romeo from comment #9)
> - For the `operator[]` benchmark, when using `-Og` after applying
> `[[gnu::always_inline]]` to all the functions touched by the benchmark, we
> reduce the overhead from 34% to around 11%. 

Quoting from your gist:

-Og, without `[[gnu::always_inline]]` on `operator[]`
carray_squareop_mean  440 ns  439 ns3
vector_squareop_mean  661 ns  662 ns3
-Og, with `[[gnu::always_inline]]` on `operator[]`
vector_squareop_mean  494 ns  491 ns3

Which looks significant...

...but I don't see this when I run your test myself; the vector_squareop
results (and the generated code) are unaffected by adding always_inline.  In
particular there are no calls to operator[].

carray_squareop  547 ns  546 ns  1272023
vector_squareop  591 ns  589 ns  1227215

carray:

movslq  %edx, %rax
leaq(%rbx,%rax,4), %rcx
movl(%rcx), %eax
addl$1, %eax
movl%eax, (%rcx)
movl%eax, (%rcx)
addl$1, %edx

vector:

movslq  %ecx, %rax
salq$2, %rax
addq(%rsp), %rax
movl(%rax), %esi
leal1(%rsi), %edx
movl%edx, (%rax)
movl%edx, (%rax)
addl$1, %ecx

It seems the main difference between the two is that the vector version needs
to keep loading the base pointer from the stack (%rsp) for some reason, rather
than keep it in a register %rbx.  This doesn't seem like an inlining issue at
all.

The double move in both versions is curious.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-03-01 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #11 from Jonathan Wakely  ---
Stoopid autocorrect. s/kitty/it/

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-03-01 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #10 from Jonathan Wakely  ---
There is a downside to always_inline, which might be similar to the stack usage
problem mentioned in the llvm ticket. If a function grows too large, the
optimiser can just give up and stop inlining. But for always_inline that gives
a fatal error with no way to fix it. The attribute tells the compiler kitty
isn't allowed to give up, the call must *always* be inline. If we liberally
sprinkle it all over the std::lib and it results in compiler errors, users
can't do anything about it. The code won't compile, period. Annotations that
usually improve performance, but then sometimes give unavoidable errors are not
a good idea.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #9 from Vittorio Romeo  ---
I have done some benchmarking for three use cases, both with `-O0` and `-Og`,
hacking my `libstdc++` headers to add `[[gnu::always_inline]]` where deemed
appropriate. 

---

The use cases were:

1. `operator[]` benchmark -- `vector_squareop` and `carray_squareop` as seen
above

2. Iterator benchmark -- `vector_iter` and `carray_iter` as seen above

3. Algorithm benchmark -- `std::accumulate` versus a raw `for` loop

---

All the benchmark results, benchmarking rig specs, and used code available
here:
https://gist.github.com/vittorioromeo/efa005d44ccd4ec7279181768a0c1f0b

---

In short, these are the results:

- For all benchmarks, when using `-O0` without any modification to `libstdc++`,
the overhead of the Standard Library can be huge (+25-400%).

- For all benchmarks, when using `-Og` without any modification to `libstdc++`,
the overhead of the Standard Library is small (+5-15%).

- For the `operator[]` benchmark, when using `-O0` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 25% to around 10%.

- For the `operator[]` benchmark, when using `-Og` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 34% to around 11%. 

- For the iterator benchmark, when using `-O0` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 302% to around 186%. 

- For the iterator benchmark, when using `-Og` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 11% to around 8%. 

- For the algorithm benchmark, when using `-O0` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 304% to around 47%.

- For the algorithm benchmark, when using `-Og`, independently of whether we
modify `libstdc++` or not, the overhead is around 36%.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Severity|normal  |enhancement
   Last reconfirmed||2022-02-28

--- Comment #8 from Jonathan Wakely  ---
(In reply to Vittorio Romeo from comment #6)
> > The request is to replace it with some kind of magic that does the same as 
> > std::move without actually writing std::move.
> 
> More generally speaking, ensure that function such as `std::move`,
> `std::forward`, `std::vector::operator[]`,
> `std::vector::iterator::operator*`, and so on never appear in debugging
> call stacks and do not affect performance in `-Og` (or even `-O0`. 

They will still appear in debugging stacks, because GCC emits debug info even
for inline functions. That means that GDB makes it look like the function was
entered and exited, even if the actual code is completely inlined. That was the
original topic of PR 96780 (because I don't think it's useful for std::move and
std::forward). I think what *should* matters is actual raw performance, not how
many stack frames you see in the debugger. However, reading twitter and reddit,
I think people would complain about insane levels of complexity in the std::lib
*even if it has zero overhead*.

> I think the title for my issue is a bit too specific, but I'd like to make
> this a wider discussion in how to mitigate debug performance overhead caused
> by C++ standard library abstractions.

OK, confirming as an enhancement, but somebody will need to do measurements and
propose specific patches. I'm not aware of any low-hanging fruit for runtime
perf, so the problems need to be identified before they can be fixed.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #7 from Jonathan Wakely  ---
(In reply to Vittorio Romeo from comment #4)
> I like the idea of having the compiler itself fold calls to things like
> `std::move` and `std::forward` as suggested in the linked
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780. 

That benefits user code too, so that users won't have to use static_cast
in their own code because "std::move is slow".

> But I think this issue I opened should be more general for any standard
> library function that ends up impacting debug performance. Another common
> example in the gamedev community is `std::vector`.

Does the gamedev community actually use -Og though? Because if they only use
-O0 (as is certainly the case for some) then the solution is "don't do that".
Do they even use GCC/libstdc++ at all? Would they actually be more likely to if
we change anything here?

For those who are using -Og, adding always_inline to trivial accessors
shouldn't be necessary, because GCC _should_ always inline them anyway.

> In this benchmark, which uses `-Og`, you can notice a large performance
> difference between a `std::vector` and `int*` dynamic array for
> operations that I believe should have equal performance:
> - https://quick-bench.com/q/lrS4I-lmDJ3VFP8L8rG2YHGXO-8
> - https://quick-bench.com/q/Uf-t79n7uYWAKdThOL_wxSp12Y0
> 
> Are the above results also something that should be handled on the compiler
> side of things? Or would, for example, marking `std::vector::operator[]` and
> `std::vector::iterator::operator*` as `always_inline` remove the performance
> discrepancy?

Somebody will have to investigate whether lack of inlining is the problem
there. My guess would be that it's not due to trivial functions like op[]
failing to be inlined, and so always_inline wouldn't help (except at -O0, but
"don't do that" if you need performance :-)

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #6 from Vittorio Romeo  ---
> The request is to replace it with some kind of magic that does the same as 
> std::move without actually writing std::move.

More generally speaking, ensure that function such as `std::move`,
`std::forward`, `std::vector::operator[]`,
`std::vector::iterator::operator*`, and so on never appear in debugging call
stacks and do not affect performance in `-Og` (or even `-O0`. 

I think the title for my issue is a bit too specific, but I'd like to make this
a wider discussion in how to mitigate debug performance overhead caused by C++
standard library abstractions.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Jonathan Wakely  changed:

   What|Removed |Added

   See Also|https://gcc.gnu.org/bugzill |
   |a/show_bug.cgi?id=67906,|
   |https://gcc.gnu.org/bugzill |
   |a/show_bug.cgi?id=81159,|
   |https://gcc.gnu.org/bugzill |
   |a/show_bug.cgi?id=90428 |

--- Comment #5 from Jonathan Wakely  ---
(In reply to Eric Gallager from comment #3)
> Seems related to some of the requests for more warnings about uses of
> std::move that might degrade performance, e.g. bug 67906, bug 81159, bug
> 90428
> 
> Also, Marek's blog post about std::move seems relevant here:
> https://developers.redhat.com/blog/2019/04/12/understanding-when-not-to-
> stdmove-in-c

I don't think any of those are relevant, they're about *unnecessary* uses of
std::move where that is either entirely redundant or doesn't change semantics
but prevents the optimizer doing better on its own. This is about *necessary*
uses of std::move, and removing the move here would change semantics and
potentially break the code. The request is to replace it with some kind of
magic that does the same as std::move without actually writing std::move.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #4 from Vittorio Romeo  ---
I see that `std::move` is indeed inlined with `-Og`, my apologies on not
noticing that. 

I like the idea of having the compiler itself fold calls to things like
`std::move` and `std::forward` as suggested in the linked
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780. 

But I think this issue I opened should be more general for any standard library
function that ends up impacting debug performance. Another common example in
the gamedev community is `std::vector`.

In this benchmark, which uses `-Og`, you can notice a large performance
difference between a `std::vector` and `int*` dynamic array for operations
that I believe should have equal performance:
- https://quick-bench.com/q/lrS4I-lmDJ3VFP8L8rG2YHGXO-8
- https://quick-bench.com/q/Uf-t79n7uYWAKdThOL_wxSp12Y0

Are the above results also something that should be handled on the compiler
side of things? Or would, for example, marking `std::vector::operator[]` and
`std::vector::iterator::operator*` as `always_inline` remove the performance
discrepancy?

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Eric Gallager  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=67906,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=81159,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=90428
 CC||egallager at gcc dot gnu.org,
   ||mpolacek at gcc dot gnu.org

--- Comment #3 from Eric Gallager  ---
Seems related to some of the requests for more warnings about uses of std::move
that might degrade performance, e.g. bug 67906, bug 81159, bug 90428

Also, Marek's blog post about std::move seems relevant here:
https://developers.redhat.com/blog/2019/04/12/understanding-when-not-to-stdmove-in-c

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #2 from Jonathan Wakely  ---
(In reply to Vittorio Romeo from comment #0)
> Would it be possible to replace `std::move` calls internal to `libstdc++`
> with a cast,

No, absolutely not.

> or some sort of compiler intrinsic?

No, but the compiler could just fold it away (see PR 96780).

> Or maybe mark `std::move`
> as "always inline" even without optimizations enabled? 

Maybe, although since your benchmark shows there is no problem with GCC at -Og
I think this is a Clang problem, not libstdc++ or libc++.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Jonathan Wakely  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=96780

--- Comment #1 from Jonathan Wakely  ---
If it's not inlined at -Og then that's a compiler problem, GCC inlines it at
-Og:
https://quick-bench.com/q/d1H4VjD_Xns4ef7iOfweIy6rPNM