[Bug c++/109387] New: "definition of explicitly-defaulted" error with explicit template instantiation

2023-04-03 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109387

Bug ID: 109387
   Summary: "definition of explicitly-defaulted" error with
explicit template instantiation
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vittorio.romeo at outlook dot com
  Target Milestone: ---

Given this code:

template <typename> struct S { S(); };
// extern template struct S<int>;
template <typename T> S<T>::S() = default;
template S<int>::S();

GCC fails to compile with this error:

error: definition of explicitly-defaulted 'S< <template-parameter-1-1>
>::S() [with <template-parameter-1-1> = int]'
    3 | template <typename T> S<T>::S() = default;
      |                       ^~~~

Uncommenting the extern template fixes compilation. Clang compiles the original
code without any issue.

Live example on Compiler Explorer: 
- https://gcc.godbolt.org/z/YoPhvKnxa

Related StackOverflow thread:
- https://stackoverflow.com/questions/75913223

[Bug target/109380] inline member function symbol not exported with explicit template instantiation declaration on MinGW

2023-04-02 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109380

--- Comment #2 from Vittorio Romeo  ---
Hmm, you might be correct. Seeing that the issue has not been looked at since
2017, are you aware of any workaround besides `-Wl,--export-all-symbols`? 

The issue is preventing me from applying explicit template instantiations in
the SFML codebase for commonly used template types.

[Bug c++/109380] New: inline member function symbol not exported with explicit template instantiation declaration on MinGW

2023-04-02 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109380

Bug ID: 109380
   Summary: inline member function symbol not exported with
explicit template instantiation declaration on MinGW
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vittorio.romeo at outlook dot com
  Target Milestone: ---

Created attachment 54801
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54801&action=edit
Full example (lib.h, lib.cpp, main.cpp, build.sh)

Given the following code:

//
// lib.h
template <typename T>
struct S
{
    void f();
    void g() { }
};

template <typename T> void S<T>::f() { }
extern template struct __declspec(dllexport) S<int>;

//
// lib.cpp
#include "lib.h"
template struct S<int>;

//
// main.cpp
#include "lib.h"
int main() { S<int>{}.g(); }


When building with:

g++ -c -o main.o main.cpp && \
g++ -c -o lib.o lib.cpp && \
g++ -shared -o lib.dll lib.o -Wl,--out-implib,liblib.dll.a && \
g++ -o main.exe main.o -L. -llib


This linker error is erroneously produced:

   
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
main.o:main.cpp:(.text+0x15): undefined reference to `S<int>::g()'
collect2.exe: error: ld returned 1 exit status


This is likely the same as bug #89088:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89088


And probably related to this Clang PR and review:

 https://reviews.llvm.org/D61118


I bumped into this issue today, using GCC version 12.2.0, on MinGW/MSYS2.
That earlier bug report has been UNCONFIRMED since 2019.

[Bug c++/89088] Dllexport for explicit template instantiation missing inline methods

2023-04-02 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89088

Vittorio Romeo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vittorio.romeo at outlook dot com

--- Comment #2 from Vittorio Romeo  ---
Bumped into this issue today. I confirm that it is still present on gcc version
12.2.0, on MSYS2.

Any workaround that does not require exporting all symbols via
'-Wl,--export-all-symbols'?

[Bug c++/107105] New: Consider folding `__and_`, `__or_`, and `__not_` at the front-end level

2022-09-30 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107105

Bug ID: 107105
   Summary: Consider folding `__and_`, `__or_`, and `__not_` at
the front-end level
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vittorio.romeo at outlook dot com
  Target Milestone: ---

This is another possible compilation speed improvement that came to mind after
running ClangBuildAnalyzer on a few open source projects (gzdoom, SFML, some of
my own, ...) and noticing results like these:

 Template sets that took longest to instantiate:
35407 ms: std::__and_<$> (20262 times, avg 1 ms)
17745 ms: std::unique_ptr<$> (916 times, avg 19 ms)
14302 ms: std::__uniq_ptr_data<$> (916 times, avg 15 ms)
14153 ms: std::__uniq_ptr_impl<$> (916 times, avg 15 ms)
13537 ms: std::__or_<$> (15100 times, avg 0 ms)
13046 ms: std::basic_string<$> (2248 times, avg 5 ms)
11706 ms: std::_Hashtable<$> (1051 times, avg 11 ms)
10527 ms: std::unordered_map<$> (545 times, avg 19 ms)
10379 ms: std::is_convertible<$> (11737 times, avg 0 ms)

It looks like `__and_`, `__or_`, and `__not_` are widely used throughout
libstdc++'s implementation, and are used to implement most type traits. 

I was wondering whether it would be possible and somewhat easy to fold these in
the front-end, similarly to what has been done for `std::move` and similar
functions. Another option is to use a compiler intrinsic.

I have not done any research, but I suppose that if this is possible, reducing
the number of instantiations of these small helpers would benefit pretty much
every project using libstdc++. Just an idea -- feel free to close this ticket
if this is not possible or not worth the effort.
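
To illustrate why helpers like these can dominate instantiation counts, here is a minimal sketch contrasting a recursive conjunction with a fold-based one. The names `recursive_and` and `folded_and` are hypothetical, and the recursive version is only a simplified stand-in, not libstdc++'s actual `__and_`.

```cpp
#include <type_traits>

// Recursive conjunction: evaluating it instantiates a chain of class
// template specializations, roughly one per operand that is examined.
// Hypothetical helper; not libstdc++'s actual __and_.
template <typename... Bs>
struct recursive_and : std::true_type {};

template <typename B, typename... Rest>
struct recursive_and<B, Rest...>
    : std::conditional<B::value, recursive_and<Rest...>, std::false_type>::type {};

// Fold-based conjunction (C++17): no helper class templates are needed,
// but every Bs::value is instantiated, so it does not short-circuit.
template <typename... Bs>
constexpr bool folded_and = (Bs::value && ...);

static_assert(recursive_and<std::true_type, std::is_integral<int>>::value, "");
static_assert(!folded_and<std::true_type, std::is_void<int>>, "");
```

A front-end fold or intrinsic, as suggested above, would presumably aim to get the cost profile of the second form while keeping the short-circuiting of the first.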

[Bug c++/100157] Support `__type_pack_element` like Clang

2022-06-30 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100157

--- Comment #9 from Vittorio Romeo  ---
(In reply to Jonathan Wakely from comment #8)
> (In reply to Vittorio Romeo from comment #6)
> > worthwhile to keep the same name as Clang for compatibility,
> 
> No, that's not an option. Clang's is a built-in template, GCC's can't be (it
> would require considerable internal reworking to support that).
> 
> That's also why we have __integer_pack(N)... instead of __make_integer_seq<>.
> 
> Since GCC's built-in has to use different syntax, it would be a disaster to
> use the same name.
> 
> #if __has_builtin(__type_pack_element)
> // now what? is it a template or a function?
> #endif

Got it, I didn't realize that they had to be wildly different. I guess that as
long as a library developer can wrap either under a portable macro, it should
be fine.
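
A rough sketch of what such a portable wrapper could look like follows. The names `MYLIB_HAS_TYPE_PACK_ELEMENT`, `nth_type_impl`, and `nth_type` are hypothetical, and the sketch assumes that any compiler reporting `__has_builtin(__type_pack_element)` accepts Clang's template-like syntax, which is exactly the assumption comment #8 warns may not hold.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Detect the builtin without breaking compilers that lack __has_builtin.
#if defined(__has_builtin)
#  if __has_builtin(__type_pack_element)
#    define MYLIB_HAS_TYPE_PACK_ELEMENT 1
#  endif
#endif

#if defined(MYLIB_HAS_TYPE_PACK_ELEMENT)
// Assumption: the builtin uses Clang's syntax, __type_pack_element<I, Ts...>.
template <std::size_t I, typename... Ts>
struct nth_type_impl { using type = __type_pack_element<I, Ts...>; };
#else
// Portable fallback through the standard library.
template <std::size_t I, typename... Ts>
struct nth_type_impl
{
    using type = typename std::tuple_element<I, std::tuple<Ts...>>::type;
};
#endif

// User-facing alias: the I-th type of the pack Ts...
template <std::size_t I, typename... Ts>
using nth_type = typename nth_type_impl<I, Ts...>::type;

static_assert(std::is_same<nth_type<1, int, char, double>, char>::value, "");
```

Library code would then use `nth_type<I, Ts...>` everywhere and never name the builtin directly.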

[Bug c++/100157] Support `__type_pack_element` like Clang

2022-06-30 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100157

--- Comment #6 from Vittorio Romeo  ---
Thank you, Jonathan, for looking into this. I feel like it might be worthwhile
to keep the same name as Clang for compatibility, or maybe talk to some Clang
developers and see if there can be an agreement on naming and design that works
for both compilers -- would be nice to have something that works for both GCC
and Clang in the same way.

[Bug c++/96780] debuginfo for std::move and std::forward isn't useful

2022-03-01 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780

--- Comment #7 from Vittorio Romeo  ---
> As discussed on IRC, we might not want to do this folding at -O0 (although 
> I'd personally be happy with it unconditionally).

I think you should reconsider this as discussed in these places:
- https://github.com/llvm/llvm-project/issues/53689
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Compiling in `-O0` is a valid choice when trying to maximize compilation speed
and debuggability, yet pretty much everyone seems to agree that they'd like to
never see `std::move`/`std::forward` in their debugger nor have them introduce
any performance overhead, even in `-O0`.

I would also suggest, as an extension, considering a more general approach for
other standard library functions. As an example, there are good gains to be
made in terms of debug performance for things like `std::unique_ptr` (see
https://github.com/llvm/llvm-project/issues/53689#issuecomment-1055669228) or
`std::vector::iterator`.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #9 from Vittorio Romeo  ---
I have done some benchmarking for three use cases, both with `-O0` and `-Og`,
hacking my `libstdc++` headers to add `[[gnu::always_inline]]` where deemed
appropriate. 

---

The use cases were:

1. `operator[]` benchmark -- `vector_squareop` and `carray_squareop` as seen
above

2. Iterator benchmark -- `vector_iter` and `carray_iter` as seen above

3. Algorithm benchmark -- `std::accumulate` versus a raw `for` loop

---

All the benchmark results, the benchmarking rig specs, and the code used are
available here:
https://gist.github.com/vittorioromeo/efa005d44ccd4ec7279181768a0c1f0b

---

In short, these are the results:

- For all benchmarks, when using `-O0` without any modification to `libstdc++`,
the overhead of the Standard Library can be huge (+25-400%).

- For all benchmarks, when using `-Og` without any modification to `libstdc++`,
the overhead of the Standard Library is small (+5-15%).

- For the `operator[]` benchmark, when using `-O0` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 25% to around 10%.

- For the `operator[]` benchmark, when using `-Og` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 34% to around 11%. 

- For the iterator benchmark, when using `-O0` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 302% to around 186%. 

- For the iterator benchmark, when using `-Og` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 11% to around 8%. 

- For the algorithm benchmark, when using `-O0` after applying
`[[gnu::always_inline]]` to all the functions touched by the benchmark, we
reduce the overhead from 304% to around 47%.

- For the algorithm benchmark, when using `-Og`, independently of whether we
modify `libstdc++` or not, the overhead is around 36%.
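
For reference, this is roughly what applying the attribute looks like on a small accessor. The `array_view` type below is a hypothetical stand-in written for this sketch; it is not one of the libstdc++ functions that were patched for the benchmarks, and `[[gnu::always_inline]]` is a GCC/Clang attribute that other compilers may ignore or warn about.

```cpp
#include <cstddef>

// Hypothetical minimal view type; it is not part of libstdc++.
template <typename T>
class array_view
{
public:
    constexpr array_view(const T* data, std::size_t size) noexcept
        : data_(data), size_(size) {}

    // Ask the compiler to inline the accessor even without optimizations,
    // so that element access does not go through a real function call.
    [[gnu::always_inline]] constexpr const T& operator[](std::size_t i) const noexcept
    {
        return data_[i];
    }

    [[gnu::always_inline]] constexpr std::size_t size() const noexcept
    {
        return size_;
    }

private:
    const T* data_;
    std::size_t size_;
};

// Small compile-time check that the accessors behave as expected.
constexpr int sample_values[] = {3, 5, 7};
constexpr array_view<int> sample_view{sample_values, 3};
static_assert(sample_view[1] == 5, "");
```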

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #6 from Vittorio Romeo  ---
> The request is to replace it with some kind of magic that does the same as 
> std::move without actually writing std::move.

More generally speaking, ensure that functions such as `std::move`,
`std::forward`, `std::vector::operator[]`,
`std::vector::iterator::operator*`, and so on never appear in debugging call
stacks and do not affect performance in `-Og` (or even `-O0`).

I think the title for my issue is a bit too specific, but I'd like to make this
a wider discussion on how to mitigate debug performance overhead caused by C++
standard library abstractions.

[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #4 from Vittorio Romeo  ---
I see that `std::move` is indeed inlined with `-Og`, my apologies for not
noticing that.

I like the idea of having the compiler itself fold calls to things like
`std::move` and `std::forward` as suggested in the linked
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780. 

But I think this issue I opened should be more general for any standard library
function that ends up impacting debug performance. Another common example in
the gamedev community is `std::vector`.

In this benchmark, which uses `-Og`, you can notice a large performance
difference between a `std::vector` and `int*` dynamic array for operations
that I believe should have equal performance:
- https://quick-bench.com/q/lrS4I-lmDJ3VFP8L8rG2YHGXO-8
- https://quick-bench.com/q/Uf-t79n7uYWAKdThOL_wxSp12Y0

Are the above results also something that should be handled on the compiler
side of things? Or would, for example, marking `std::vector::operator[]` and
`std::vector::iterator::operator*` as `always_inline` remove the performance
discrepancy?

[Bug libstdc++/104719] New: Use of `std::move` in libstdc++ leads to worsened debug performance

2022-02-28 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Bug ID: 104719
   Summary: Use of `std::move` in libstdc++ leads to worsened
debug performance
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vittorio.romeo at outlook dot com
  Target Milestone: ---

`std::accumulate` is defined as follows in `libstdc++`:

```
  template<typename _InputIterator, typename _Tp>
    _GLIBCXX20_CONSTEXPR
    inline _Tp
    accumulate(_InputIterator __first, _InputIterator __last, _Tp __init)
    {
      // concept requirements
      __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
      __glibcxx_requires_valid_range(__first, __last);

      for (; __first != __last; ++__first)
        __init = _GLIBCXX_MOVE_IF_20(__init) + *__first;
      return __init;
    }
```

Where `_GLIBCXX_MOVE_IF_20` is:

```
#if __cplusplus > 201703L
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// DR 2055. std::move in std::accumulate and other algorithms
# define _GLIBCXX_MOVE_IF_20(_E) std::move(_E)
#else
# define _GLIBCXX_MOVE_IF_20(_E) _E
#endif
```

When compiling a program using `std::accumulate` in debug mode, under `-Og`,
there is a noticeable performance impact due to the presence of `std::move`.
- With `std::move`: https://quick-bench.com/q/h_M_AUs3pgBE3bYr82rsA1_VtjU
- Without `std::move`: https://quick-bench.com/q/ysis2b1CgIZkRsO2cqfjZm9Jkio

This performance degradation is one example of why many people (especially in
the gamedev community) are not adopting standard library algorithms and modern
C++ more widely. 

Would it be possible to replace `std::move` calls internal to `libstdc++` with
a cast, or some sort of compiler intrinsic? Or maybe mark `std::move` as
"always inline" even without optimizations enabled? 
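
As a sketch of the "replace with a cast" option: `std::move` is specified as a `static_cast` to an rvalue reference, so a macro can spell out the same cast without introducing a function call. `MOVE_CAST` and `accumulate_with_cast` are hypothetical names used only for illustration; this is not a proposed libstdc++ patch.

```cpp
#include <type_traits>
#include <utility>

// Same effect as std::move(e), but expressed as a plain cast, so there is
// no function call for the compiler to (not) inline at -O0.
// Hypothetical macro, for illustration only.
#define MOVE_CAST(e) \
    static_cast<typename std::remove_reference<decltype(e)>::type&&>(e)

// Simplified accumulate using the cast instead of std::move.
template <typename InputIt, typename T>
constexpr T accumulate_with_cast(InputIt first, InputIt last, T init)
{
    for (; first != last; ++first)
        init = MOVE_CAST(init) + *first;
    return init;
}

// Small compile-time check of the simplified algorithm.
constexpr int sample_values[] = {1, 2, 3, 4};
static_assert(accumulate_with_cast(sample_values, sample_values + 4, 0) == 10, "");
```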

Related issue for libc++:
https://github.com/llvm/llvm-project/issues/53689

[Bug c++/100157] New: Support `__type_pack_element` like Clang

2021-04-20 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100157

Bug ID: 100157
   Summary: Support `__type_pack_element` like Clang
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vittorio.romeo at outlook dot com
  Target Milestone: ---

Clang provides a `__type_pack_element` builtin which allows efficient indexing
of parameter packs in variadic templates, and it seems that GCC has no
equivalent.

This forces users interested in minimizing compilation times to resort to
arcane implementations such as these:
https://github.com/kvasir-io/mpl/blob/development/src/kvasir/mpl/sequence/lookup.hpp

A builtin like `__type_pack_element` would not only allow user code to compile
faster, but also anything inside libstdc++ that needs to index a type list
(e.g. `std::tuple_element_t`) would benefit from it.

This is the Clang test driver, which shows a usage example:
https://github.com/llvm-mirror/clang/blob/master/test/SemaCXX/type_pack_element.cpp

This is the Clang pull request:
https://reviews.llvm.org/D15421

[Bug libstdc++/100008] New: std::clamp generates suboptimal assembly for primitive types

2021-04-09 Thread vittorio.romeo at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100008

Bug ID: 100008
   Summary: std::clamp generates suboptimal assembly for primitive
types
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vittorio.romeo at outlook dot com
  Target Milestone: ---

`std::clamp` generates poor assembly compared to a hand-written counterpart for
primitive types like `float`, even with `-Ofast -ffast-math`:

stdclamp(float, float, float):
        comiss  xmm0, xmm1
        jb      .L2
        movaps  xmm1, xmm0
        minss   xmm1, xmm2
.L2:
        movaps  xmm0, xmm1
        ret

myclamp(float, float, float):
        maxss   xmm0, xmm1
        minss   xmm0, xmm2
        ret

Live example:
https://godbolt.org/z/5oxvocevK

More information on:
https://secret.club/2021/04/09/std-clamp.html
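
The report above only shows the assembly for `myclamp`, not its source. One plausible hand-written form that matches a `maxss`/`minss` pair is a max followed by a min; the definition below is an assumption for illustration, not the code from the linked example.

```cpp
#include <algorithm>

// Hypothetical reconstruction: the report shows only the generated
// assembly, so this body is an assumption. Clamp v into [lo, hi] by
// taking the max with lo, then the min with hi.
constexpr float myclamp(float v, float lo, float hi)
{
    return std::min(std::max(v, lo), hi);
}

static_assert(myclamp(5.0f, 0.0f, 1.0f) == 1.0f, "");
```

Note that this form returns by value rather than by reference and may treat NaN inputs differently from `std::clamp`, so the two are not drop-in replacements for each other in every case.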