[Bug c++/109387] New: "definition of explicitly-defaulted" error with explicit template instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109387

Bug ID: 109387
Summary: "definition of explicitly-defaulted" error with explicit template instantiation
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: vittorio.romeo at outlook dot com
Target Milestone: ---

Given this code:

    template <typename> struct S { S(); };
    // extern template struct S<int>;
    template <typename T> S<T>::S() = default;
    template S<int>::S();

GCC fails to compile with this error:

    error: definition of explicitly-defaulted 'S< <template-parameter-1-1> >::S() [with <template-parameter-1-1> = int]'
        3 | template <typename T> S<T>::S() = default;
          |                       ^~~~

Uncommenting the extern template fixes compilation. Clang compiles the original code without any issue.

Live example on Compiler Explorer:
- https://gcc.godbolt.org/z/YoPhvKnxa

Related StackOverflow thread:
- https://stackoverflow.com/questions/75913223
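The workaround mentioned in the report can be sketched as follows. This is a minimal sketch, assuming (as the report states) that placing the explicit instantiation declaration before the out-of-class defaulted definition is enough for GCC to accept the code. `make_s` is a hypothetical helper that is not part of the report; it only forces a use of the constructor.

```cpp
// Sketch of the workaround from the report: the explicit instantiation
// declaration (`extern template`) appears before the defaulted definition.
template <typename>
struct S { S(); };

extern template struct S<int>;              // uncommented, per the report

template <typename T> S<T>::S() = default;  // out-of-class defaulted definition
template S<int>::S();                       // explicit instantiation definition

// Hypothetical helper (not in the report): uses the constructor so that the
// explicit instantiation is actually exercised.
inline S<int> make_s() { return S<int>{}; }
```

The explicit instantiation definition still follows the declaration within the same translation unit, so the constructor is emitted exactly once.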
[Bug target/109380] inline member function symbol not exported with explicit template instantiation declaration on MinGW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109380 --- Comment #2 from Vittorio Romeo --- Hmm, you might be correct. Seeing that the issue has not been looked at since 2017, are you aware of any workaround besides `-Wl,--export-all-symbols`? The issue is preventing me from applying explicit template instantiations in the SFML codebase for commonly used template types.
[Bug c++/109380] New: inline member function symbol not exported with explicit template instantiation declaration on MinGW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109380

Bug ID: 109380
Summary: inline member function symbol not exported with explicit template instantiation declaration on MinGW
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: vittorio.romeo at outlook dot com
Target Milestone: ---

Created attachment 54801
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54801&action=edit
Full example (lib.h, lib.cpp, main.cpp, build.sh)

Given the following code:

    //
    // lib.h
    template <typename T>
    struct S { void f(); void g() { } };

    template <typename T>
    void S<T>::f() { }

    extern template struct __declspec(dllexport) S<int>;

    //
    // lib.cpp
    #include "lib.h"
    template struct S<int>;

    //
    // main.cpp
    #include "lib.h"
    int main() { S<int>{}.g(); }

When building with:

    g++ -c -o main.o main.cpp && \
    g++ -c -o lib.o lib.cpp && \
    g++ -shared -o lib.dll lib.o -Wl,--out-implib,liblib.dll.a && \
    g++ -o main.exe main.o -L. -llib

This linker error is erroneously produced:

    C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: main.o:main.cpp:(.text+0x15): undefined reference to `S<int>::g()'
    collect2.exe: error: ld returned 1 exit status

This is likely the same as bug #89088: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89088

And probably related to this Clang PR and review: https://reviews.llvm.org/D61118

I bumped into this issue today, using GCC version 12.2.0, on MinGW/MSYS2. The last bug report has been UNCONFIRMED since 2019.
[Bug c++/89088] Dllexport for explicit template instantiation missing inline methods
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89088

Vittorio Romeo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vittorio.romeo at outlook dot com

--- Comment #2 from Vittorio Romeo ---
Bumped into this issue today. I confirm that it is still present on gcc version 12.2.0, on MSYS2.

Any workaround that does not require exporting all symbols via '-Wl,--export-all-symbols'?
[Bug c++/107105] New: Consider folding `__and_`, `__or_`, and `__not_` at the front-end level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107105

Bug ID: 107105
Summary: Consider folding `__and_`, `__or_`, and `__not_` at the front-end level
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: vittorio.romeo at outlook dot com
Target Milestone: ---

This is another possible compilation speed improvement that came to mind after running ClangBuildAnalyzer on a few open source projects (gzdoom, SFML, some of my own, ...) and noticing results like these:

    Template sets that took longest to instantiate:
    35407 ms: std::__and_<$> (20262 times, avg 1 ms)
    17745 ms: std::unique_ptr<$> (916 times, avg 19 ms)
    14302 ms: std::__uniq_ptr_data<$> (916 times, avg 15 ms)
    14153 ms: std::__uniq_ptr_impl<$> (916 times, avg 15 ms)
    13537 ms: std::__or_<$> (15100 times, avg 0 ms)
    13046 ms: std::basic_string<$> (2248 times, avg 5 ms)
    11706 ms: std::_Hashtable<$> (1051 times, avg 11 ms)
    10527 ms: std::unordered_map<$> (545 times, avg 19 ms)
    10379 ms: std::is_convertible<$> (11737 times, avg 0 ms)

It looks like `__and_`, `__or_`, and `__not_` are widely used throughout libstdc++'s implementation, and are used to implement most type traits.

I was wondering whether it would be possible and somewhat easy to fold these in the front-end, similarly to what has been done for `std::move` and similar functions. Another option is to use a compiler intrinsic.

I have not done any research, but I suppose that if this is possible, reducing the number of instantiations of these small helpers would benefit pretty much every project using libstdc++.

Just an idea -- feel free to close this ticket if this is not possible or not worth the effort.
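To illustrate where the instantiation count comes from, here is a rough sketch contrasting a recursive conjunction with a flat one. Both `recursive_and` and `folded_and_v` are hypothetical names invented for this example; neither is libstdc++'s actual implementation of `__and_`, and the fold expression is only one possible library-level alternative to the front-end folding proposed above.

```cpp
#include <type_traits>

// Hypothetical recursive conjunction, similar in spirit to a helper such as
// `__and_`: each additional operand can cost one more class template
// instantiation.
template <typename... Bs>
struct recursive_and : std::true_type {};

template <typename B, typename... Bs>
struct recursive_and<B, Bs...>
    : std::conditional_t<B::value, recursive_and<Bs...>, std::false_type> {};

// Hypothetical flat alternative (C++17): a single variable template
// instantiation and a fold expression. Unlike the recursive form, it
// evaluates every `Bs::value`, so it does not short-circuit instantiation.
template <typename... Bs>
inline constexpr bool folded_and_v = (Bs::value && ...);

static_assert(recursive_and<std::is_integral<int>, std::is_signed<int>>::value);
static_assert(!recursive_and<std::is_integral<int>, std::is_pointer<int>>::value);
static_assert(folded_and_v<std::is_integral<int>, std::is_signed<int>>);
static_assert(!folded_and_v<std::is_integral<int>, std::is_pointer<int>>);
static_assert(folded_and_v<>);  // an empty pack folds to true
```

The two forms are not interchangeable where short-circuiting matters, which is one reason a compiler-level solution may be preferable to rewriting the helpers.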
[Bug c++/100157] Support `__type_pack_element` like Clang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100157

--- Comment #9 from Vittorio Romeo ---
(In reply to Jonathan Wakely from comment #8)
> (In reply to Vittorio Romeo from comment #6)
> > worthwhile to keep the same name as Clang for compatibility,
>
> No, that's not an option. Clang's is a built-in template, GCC's can't be (it
> would require considerable internal reworking to support that).
>
> That's also why we have __integer_pack(N)... instead of __make_integer_seq<>.
>
> Since GCC's built-in has to use different syntax, it would be a disaster to
> use the same name.
>
> #if __has_builtin(__type_pack_element)
> // now what? is it a template or a function?
> #endif

Got it, I didn't realize that they had to be wildly different. I guess that as long as a library developer can wrap either under a portable macro, it should be fine.
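A portable wrapper of the kind mentioned above might look like the following sketch. `EXAMPLE_HAS_TYPE_PACK_ELEMENT`, `nth_type`, and `nth_type_t` are hypothetical names chosen for illustration. The builtin branch assumes the Clang-style, template-like spelling; a compiler exposing the feature with a different syntax would need its own branch.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical feature macro: defined only when the compiler reports the builtin.
#if defined(__has_builtin)
#  if __has_builtin(__type_pack_element)
#    define EXAMPLE_HAS_TYPE_PACK_ELEMENT 1
#  endif
#endif

#if defined(EXAMPLE_HAS_TYPE_PACK_ELEMENT)
// Assumes the Clang-style, template-like spelling of the builtin.
template <std::size_t I, typename... Ts>
struct nth_type { using type = __type_pack_element<I, Ts...>; };
#else
// Portable fallback through the standard library.
template <std::size_t I, typename... Ts>
struct nth_type { using type = std::tuple_element_t<I, std::tuple<Ts...>>; };
#endif

// User-facing alias: the same spelling regardless of which branch was taken.
template <std::size_t I, typename... Ts>
using nth_type_t = typename nth_type<I, Ts...>::type;

static_assert(std::is_same_v<nth_type_t<0, int, float, char>, int>);
static_assert(std::is_same_v<nth_type_t<2, int, float, char>, char>);
```

Keeping the builtin behind a struct rather than exposing it directly means user code only ever names `nth_type_t`, so the underlying spelling can change per compiler.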
[Bug c++/100157] Support `__type_pack_element` like Clang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100157 --- Comment #6 from Vittorio Romeo --- Thank you, Jonathan, for looking into this. I feel like it might be worthwhile to keep the same name as Clang for compatibility, or maybe talk to some Clang developers and see if there can be an agreement on naming and design that works for both compilers -- would be nice to have something that works for both GCC and Clang in the same way.
[Bug c++/96780] debuginfo for std::move and std::forward isn't useful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780

--- Comment #7 from Vittorio Romeo ---
> As discussed on IRC, we might not want to do this folding at -O0 (although
> I'd personally be happy with it unconditionally).

I think you should reconsider this as discussed in these places:

- https://github.com/llvm/llvm-project/issues/53689
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Compiling in `-O0` is a valid choice when trying to maximize compilation speed and debuggability, yet pretty much everyone seems to agree that they'd like to never see `std::move`/`std::forward` in their debugger nor have them introduce any performance overhead, even in `-O0`.

I would also suggest, as an extension, to consider a more general approach for other standard library functions. As an example, there are good gains to be made in terms of debug performance for things like `std::unique_ptr` (see https://github.com/llvm/llvm-project/issues/53689#issuecomment-1055669228) or `std::vector::iterator`.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #9 from Vittorio Romeo ---
I have done some benchmarking for three use cases, both with `-O0` and `-Og`, hacking my `libstdc++` headers to add `[[gnu::always_inline]]` where deemed appropriate.

---

The use cases were:

1. `operator[]` benchmark -- `vector_squareop` and `carray_squareop` as seen above
2. Iterator benchmark -- `vector_iter` and `carray_iter` as seen above
3. Algorithm benchmark -- `std::accumulate` versus a raw `for` loop

---

All the benchmark results, benchmarking rig specs, and the code used are available here: https://gist.github.com/vittorioromeo/efa005d44ccd4ec7279181768a0c1f0b

---

In short, these are the results:

- For all benchmarks, when using `-O0` without any modification to `libstdc++`, the overhead of the Standard Library can be huge (+25-400%).
- For all benchmarks, when using `-Og` without any modification to `libstdc++`, the overhead of the Standard Library is small (+5-15%).
- For the `operator[]` benchmark, when using `-O0` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 25% to around 10%.
- For the `operator[]` benchmark, when using `-Og` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 34% to around 11%.
- For the iterator benchmark, when using `-O0` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 302% to around 186%.
- For the iterator benchmark, when using `-Og` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 11% to around 8%.
- For the algorithm benchmark, when using `-O0` after applying `[[gnu::always_inline]]` to all the functions touched by the benchmark, we reduce the overhead from 304% to around 47%.
- For the algorithm benchmark, when using `-Og`, independently of whether we modify `libstdc++` or not, the overhead is around 36%.
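For readers unfamiliar with the attribute used in these experiments, the following is a small sketch of how `[[gnu::always_inline]]` can be applied to an accessor. `int_view` and `sum` are hypothetical names made up for this example; this is not the patch applied to the `libstdc++` headers, and it makes no claim about the resulting performance.

```cpp
#include <cstddef>

// Hypothetical minimal array view, used only to illustrate the attribute;
// this is not libstdc++ code.
struct int_view {
    int* data;
    std::size_t size;

    // `[[gnu::always_inline]]` is a GCC/Clang attribute asking the compiler
    // to inline the call even when optimizations are disabled.
    [[gnu::always_inline]] inline int& operator[](std::size_t i) const {
        return data[i];
    }
};

// Sums the elements through the always-inlined accessor.
inline long sum(const int_view& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size; ++i) total += v[i];
    return total;
}
```

With the attribute in place, a debugger stepping through `sum` at `-O0` should not need to enter a separate `operator[]` frame, which is the effect the benchmarks above try to measure.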
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #6 from Vittorio Romeo ---
> The request is to replace it with some kind of magic that does the same as
> std::move without actually writing std::move.

More generally speaking, ensure that functions such as `std::move`, `std::forward`, `std::vector::operator[]`, `std::vector::iterator::operator*`, and so on never appear in debugging call stacks and do not affect performance in `-Og` (or even `-O0`).

I think the title for my issue is a bit too specific, but I'd like to make this a wider discussion about how to mitigate debug performance overhead caused by C++ standard library abstractions.
[Bug libstdc++/104719] Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

--- Comment #4 from Vittorio Romeo ---
I see that `std::move` is indeed inlined with `-Og`, my apologies for not noticing that.

I like the idea of having the compiler itself fold calls to things like `std::move` and `std::forward` as suggested in the linked https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780.

But I think this issue I opened should be more general for any standard library function that ends up impacting debug performance. Another common example in the gamedev community is `std::vector`.

In this benchmark, which uses `-Og`, you can notice a large performance difference between a `std::vector` and `int*` dynamic array for operations that I believe should have equal performance:

- https://quick-bench.com/q/lrS4I-lmDJ3VFP8L8rG2YHGXO-8
- https://quick-bench.com/q/Uf-t79n7uYWAKdThOL_wxSp12Y0

Are the above results also something that should be handled on the compiler side of things? Or would, for example, marking `std::vector::operator[]` and `std::vector::iterator::operator*` as `always_inline` remove the performance discrepancy?
[Bug libstdc++/104719] New: Use of `std::move` in libstdc++ leads to worsened debug performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104719

Bug ID: 104719
Summary: Use of `std::move` in libstdc++ leads to worsened debug performance
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: vittorio.romeo at outlook dot com
Target Milestone: ---

`std::accumulate` is defined as follows in `libstdc++`:

```
template<typename _InputIterator, typename _Tp>
  _GLIBCXX20_CONSTEXPR
  inline _Tp
  accumulate(_InputIterator __first, _InputIterator __last, _Tp __init)
  {
    // concept requirements
    __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
    __glibcxx_requires_valid_range(__first, __last);

    for (; __first != __last; ++__first)
      __init = _GLIBCXX_MOVE_IF_20(__init) + *__first;
    return __init;
  }
```

Where `_GLIBCXX_MOVE_IF_20` is:

```
#if __cplusplus > 201703L
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// DR 2055. std::move in std::accumulate and other algorithms
# define _GLIBCXX_MOVE_IF_20(_E) std::move(_E)
#else
# define _GLIBCXX_MOVE_IF_20(_E) _E
#endif
```

When compiling a program using `std::accumulate` in debug mode, under `-Og`, there is a noticeable performance impact due to the presence of `std::move`.

- With `std::move`: https://quick-bench.com/q/h_M_AUs3pgBE3bYr82rsA1_VtjU
- Without `std::move`: https://quick-bench.com/q/ysis2b1CgIZkRsO2cqfjZm9Jkio

This performance degradation is one example of why many people (especially in the gamedev community) are not adopting standard library algorithms and modern C++ more widely.

Would it be possible to replace `std::move` calls internal to `libstdc++` with a cast, or some sort of compiler intrinsic? Or maybe mark `std::move` as "always inline" even without optimizations enabled?

Related issue for libc++: https://github.com/llvm/llvm-project/issues/53689
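The "replace `std::move` with a cast" option can be sketched roughly as below. `accumulate_with_cast` and `sum_example` are hypothetical names introduced for this illustration; this is not a proposed patch to `libstdc++`, only a demonstration that a `static_cast` to an rvalue reference has the same effect as `std::move` without introducing a function call.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for `std::accumulate`: the `static_cast` has the
// same effect as `std::move(init)` but involves no function call, so there
// is nothing to inline or step into when optimizations are disabled.
template <typename InputIt, typename T>
constexpr T accumulate_with_cast(InputIt first, InputIt last, T init) {
    for (; first != last; ++first)
        init = static_cast<T&&>(init) + *first;
    return init;
}

// Hypothetical usage: sums a small vector of integers.
inline int sum_example() {
    std::vector<int> v{1, 2, 3};
    return accumulate_with_cast(v.begin(), v.end(), 0);
}

// Hypothetical usage: concatenates strings, where moving `init` avoids
// copying the accumulated buffer on every iteration.
inline std::string concat_example() {
    std::vector<std::string> v{"a", "b", "c"};
    return accumulate_with_cast(v.begin(), v.end(), std::string{});
}
```

Because `T` is deduced from a by-value parameter, it is never a reference type, so `static_cast<T&&>` always yields an rvalue here.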
[Bug c++/100157] New: Support `__type_pack_element` like Clang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100157

Bug ID: 100157
Summary: Support `__type_pack_element` like Clang
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: vittorio.romeo at outlook dot com
Target Milestone: ---

Clang provides a `__type_pack_element` builtin which allows efficient indexing of parameter packs in variadic templates, and it seems that GCC has no equivalent. This forces users interested in minimizing compilation times to resort to arcane implementations such as these: https://github.com/kvasir-io/mpl/blob/development/src/kvasir/mpl/sequence/lookup.hpp

A builtin like `__type_pack_element` would not only allow user code to compile faster, but also anything inside libstdc++ that needs to index a type list (e.g. `std::tuple_element_t`) would benefit from it.

This is the Clang test driver, which shows a usage example: https://github.com/llvm-mirror/clang/blob/master/test/SemaCXX/type_pack_element.cpp

This is the Clang pull request: https://reviews.llvm.org/D15421
[Bug libstdc++/100008] New: std::clamp generates suboptimal assembly for primitive types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100008

Bug ID: 100008
Summary: std::clamp generates suboptimal assembly for primitive types
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: vittorio.romeo at outlook dot com
Target Milestone: ---

`std::clamp` generates poor assembly compared to a hand-written counterpart for primitive types like `float`, even with `-Ofast -ffast-math`:

    stdclamp(float, float, float):
            comiss  xmm0, xmm1
            jb      .L2
            movaps  xmm1, xmm0
            minss   xmm1, xmm2
    .L2:
            movaps  xmm0, xmm1
            ret
    myclamp(float, float, float):
            maxss   xmm0, xmm1
            minss   xmm0, xmm2
            ret

Live example: https://godbolt.org/z/5oxvocevK

More information on: https://secret.club/2021/04/09/std-clamp.html
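The report shows only the assembly, so the source below is an assumption: a plausible sketch of what the two compared functions could look like, reusing the `stdclamp` and `myclamp` names from the listing. The exact code behind the linked Compiler Explorer example may differ, and no claim is made here about which instructions a given compiler emits for it.

```cpp
#include <algorithm>

// Hypothetical hand-written clamp; the report only shows its assembly, so
// this source form is an assumption. Its behaviour differs from `std::clamp`
// for NaN inputs and when `lo > hi`.
inline float myclamp(float v, float lo, float hi) {
    return std::min(std::max(v, lo), hi);
}

// Wrapper around the standard algorithm (C++17), for side-by-side comparison.
inline float stdclamp(float v, float lo, float hi) {
    return std::clamp(v, lo, hi);
}
```

For ordinary inputs with `lo <= hi`, both functions return the same value; the difference discussed in the report is in the generated code, not in the result.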