[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099 --- Comment #13 from andysem at mail dot ru --- (In reply to Jonathan Wakely from comment #12) > (In reply to andysem from comment #11) > > > I'm not sure what you mean by "the compiler is free to generate code that > > > takes it into account." Takes what into account? What problem does that > > > freedom cause? > > > > I mean the compiler could move (some part of) dynamic_cast to precede the > > check for the standard facet. I.e. of something like this: > > > > template< typename _Facet > > > const _Facet* __try_use_facet(locale const& loc) > > { > > const size_t __i = _Facet::id._M_id(); > > const locale::facet** __facets = __loc._M_impl->_M_facets; > > const _Facet* __dyn_facet = __dynamic_cast< const _Facet* > > >(__facets[__i]); > > > > // checks for every standard facet type > > if (__is_same(_Facet, ...)) > > return static_cast(__facets[__i]); > > > > return __dyn_facet; > > } > > But why? Maybe I'm being slow but I still don't understand what problem is > being solved here. > > Also this would defeat the optimization that avoids the unnecessary overhead > of dynamic_cast for standard facets. I have seen gcc sometimes reorder code like this (i.e. move code from under a branch before it), presumably to improve ILP or prefetch data, I'm not sure. Yes, that defeats the branch, if it is used as an optimization, and I had to prevent this by adding compiler fences in those cases. Granted, in my case it happened with inlined code, but I imagine LTO makes it possible to perform similar code transformations with out-of-line code as well. I'm not saying this is what actually happens. I'm just pointing out that even the code that is apparently unreachable may influence codegen and cause ODR issues. > > AFAIK, the standard or libstdc++ docs do not require RTTI for std::locale to > > function. > > The standard requires RTTI, period. Using -fno-rtti is completely > non-standard and so the standard has nothing to say about it. Yes, but the standard is written with implementations in mind.
[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099 --- Comment #11 from andysem at mail dot ru --- > I'm not sure what you mean by "the compiler is free to generate code that > takes it into account." Takes what into account? What problem does that > freedom cause? I mean the compiler could move (some part of) dynamic_cast to precede the check for the standard facet. I.e. of something like this: template< typename _Facet > const _Facet* __try_use_facet(locale const& loc) { const size_t __i = _Facet::id._M_id(); const locale::facet** __facets = __loc._M_impl->_M_facets; const _Facet* __dyn_facet = __dynamic_cast< const _Facet* >(__facets[__i]); // checks for every standard facet type if (__is_same(_Facet, ...)) return static_cast(__facets[__i]); return __dyn_facet; } >> I think, a failing dynamic_cast would not be useful as this would make >> std::use_facet unusable with -fno-rtti. > > I don't see a problem with that. If you want to use a polymorphic container > of facets, you need RTTI. The standard facets will work, but it doesn't seem > reasonable to expect it to work for program-defined facets. AFAIK, the standard or libstdc++ docs do not require RTTI for std::locale to function. In fact, the facet::id stuff seems to exist precisely to make RTTI optional in the implementations. Besides, the code, as it was written, seems to intend to work without RTTI by switching to static_cast instead of dynamic_cast. So making std::use_facet always fail would go against that intention. > And a std::use_facet that fails seems better than one that segfaults, doesn't > it? No, not really. It still means users cannot use locale without RTTI, just for a different reason. > And I guess if the user's derived facet uses virtual inheritance from > std::locale::facet, then this could break with the > static_cast*> in libstdc++ today. Hmm. std::ctype non-virtually derives from std::locale::facet, so no, that would not break. What wouldn't work is this: class my_facet : virtual public std::ctype< char > {}; my_facet const& fac = std::use_facet< my_facet >(loc); But this would fail in compile time with no RTTI, which *is* better than a segfault or an exception in runtime.
[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099 --- Comment #4 from andysem at mail dot ru --- > It's mostly OK to mix code with -frtti and -fno-rtti, but sometimes it bites > you. Note that in this case, it is the standard library that is built with -frtti and the rest of the code is built with -fno-rtti. I.e. you *always* get this sort of mix when you specify -fno-rtti.
[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099 --- Comment #3 from andysem at mail dot ru --- I think, a failing dynamic_cast would not be useful as this would make std::use_facet unusable with -fno-rtti. Re. ODR violation in the latest code, while it is true that the dynamic/static_cast is not reachable for the standard facets, it is still part of the function definition and the compiler is free to generate code that takes it into account. Note that the shortcuts for the standard facets are implemented with conditional if-constexpr, which will turn into regular ifs if libstdc++ is compiled in pre-C++17 mode. Which, I think, is the default, isn't it? I think, the ODR violation is still worth fixing.
[Bug libstdc++/113099] New: locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099 Bug ID: 113099 Summary: locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2 Product: gcc Version: 11.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider this test case: ``` #include class __attribute__((__visibility__("default"))) my_codecvt final : public std::codecvt< wchar_t, char, std::mbstate_t > { public: explicit my_codecvt(std::size_t refs = 0) : std::codecvt< wchar_t, char, std::mbstate_t >(refs) { } protected: bool do_always_noconv() const noexcept override { return false; } int do_encoding() const noexcept override { return 0; } std::codecvt_base::result do_in(std::mbstate_t&, const char*, const char*, const char*&, wchar_t*, wchar_t*, wchar_t*&) const override { return ok; } std::codecvt_base::result do_out(std::mbstate_t&, const wchar_t*, const wchar_t*, const wchar_t*&, char*, char*, char*&) const override { return ok; } std::codecvt_base::result do_unshift(std::mbstate_t&, char*, char*, char*&) const override { return ok; } int do_length(std::mbstate_t&, const char*, const char*, std::size_t) const override { return 0; } int do_max_length() const noexcept override { return 0; } }; int main() { std::locale loc(std::locale(), new my_codecvt()); auto const& fac = std::use_facet< std::codecvt< wchar_t, char, std::mbstate_t > >(loc); (void)fac; } ``` ``` g++ -std=c++17 -fno-rtti -o locale_no_rtti locale_no_rtti.cpp ``` When compiled with RTTI disabled, with the command line above, this code crashes with the following backtrace: ``` #0 0x77caccd1 in __dynamic_cast () from /lib/x86_64-linux-gnu/libstdc++.so.6 #1 0x77d5307f in std::codecvt const& std::use_facet >(std::locale const&) () from /lib/x86_64-linux-gnu/libstdc++.so.6 #2 0x539b in main () ``` This reproduces on gcc 11.4 and 12 at least. ``` $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) ``` This was apparently fixed by accident in gcc 13.2 by this commit: https://github.com/gcc-mirror/gcc/commit/b3ac43a3c05744d62a963d656bed782fc867ad79 The commit introduces shortcuts that use static_casts for the standard facets, which allows to avoid the crash, but that still retains an ODR violation, where the explicitly instantiated __try_use_facet templates in libstdc++ library use dynamic_cast (even if unreachable) and the template definition that is visible to user's code uses static_cast. The problematic code is here: https://github.com/gcc-mirror/gcc/blob/d7e9ae4fa94afd5517536b4dfc7d6be0b3e8c2c3/libstdc%2B%2B-v3/include/bits/locale_classes.tcc#L142-L146 When libstdc++ is compiled, RTTI is enabled and __cpp_rtti is defined, but when user's code is compiled with RTTI disabled, that macro is not defined, so the __try_use_facet template definition is different. It doesn't seem like the commit I mentioned above intended to fix the original issue with dynamic_cast anyway, so I thought it was worth creating this bug report, even though the original test case passes on the latest gcc. I think, the code
[Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096 andysem at mail dot ru changed: What|Removed |Added CC||andysem at mail dot ru --- Comment #11 from andysem at mail dot ru --- (In reply to Andrew Pinski from comment #10) > (In reply to Peter Dimov from comment #9) > > I don't think I want WFE here, based on what I read about it. Putting the > > core to sleep seems like something to do in an embedded system where I have > > full control of what cores do, not something to do on the application level, > > in a portable C++ library. > > No, WFE can be used in userspace just fine and in fact it will be > interrupted every once in a while. yield only sleeps for a few cycles and > then wakes up, while wfe will sleep until an event happens (also WFE is very > hypervisor friendly too). Spin locks are used when latency is a concern and when the protected region is extremely small (i.e. a few instructions). Putting the core to sleep until the next interrupt does not seem appropriate for this purpose. x86 pause and ARM yield are better suited exactly because they wait for a time in the order of cycles (up to about a hundred on recent x86) rather than microseconds or more. I can add that in most spin lock implementations I have seen, either yield, nop or nothing is used for wasting CPU cycles. A few examples: https://www.boost.org/doc/libs/1_82_0/boost/fiber/detail/cpu_relax.hpp https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/vdso/processor.h#L11 https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/vdso/processor.h#L12 https://chromium.googlesource.com/chromium/src/third_party/WebKit/Source/wtf/+/823d62cdecdbd5f161634177e130e5ac01eb7b48/SpinLock.cpp The first link has instructions for a few other architectures besides ARM and x86. I agree with Peter that an architecture-neutral intrinsic could be useful to avoid this kind of code duplicated in various projects. Although I realize that specifying the exact behavior of this intrinsic would be difficult, since even the underlying instructions are defined rather vaguely. However, one thing is certain: this intrinsic must be a full compiler fence.
[Bug c++/109761] Nested class destructor's noexcept specification incorrectly considered as too loose compared to the outer class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109761 --- Comment #2 from andysem at mail dot ru --- I don't see how completeness of outer is related to nested's destructor. Or put it another way, how nested's destructor noexcept specification has anything to do with outer, whether it is completed or not.
[Bug c++/109761] New: Nested class destructor's noexcept specification incorrectly considered as too loose compared to the outer class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109761 Bug ID: 109761 Summary: Nested class destructor's noexcept specification incorrectly considered as too loose compared to the outer class Product: gcc Version: 11.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following piece of code: template< typename T > T declval() noexcept; struct base { virtual ~base() {} }; class outer : public base { protected: class nested { private: outer& m_outer; public: explicit nested(outer& o) noexcept(noexcept(o.on_nested_ctor())) : m_outer(o) { o.on_nested_ctor(); } nested(nested const&) = delete; nested& operator= (nested const&) = delete; ~nested() noexcept(noexcept(declval().on_nested_dtor())) { m_outer.on_nested_dtor(); } }; public: ~outer() {} protected: void on_nested_ctor(); void on_nested_dtor(); }; This wrongly fails to compile: $ g++ -std=gnu++17 -o nested_noexcept_dtor.o -c nested_noexcept_dtor.cpp nested_noexcept_dtor.cpp:28:9: error: looser exception specification on overriding virtual function ‘outer::nested::~nested() noexcept (false)’ 28 | ~nested() noexcept(noexcept(declval().on_nested_dtor())) | ^ nested_noexcept_dtor.cpp:6:13: note: overridden function is ‘virtual base::~base() noexcept’ 6 | virtual ~base() {} | ^ The compiler tries to match the ~nested() noexcept specification against the ~base()'s as if nested was derived from base, but it is not. ~nested() noexcept specification has no bearing on the outer class hierarchy. $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)
[Bug target/98612] _mm_comieq_sd has wrong semantics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98612 andysem at mail dot ru changed: What|Removed |Added CC||andysem at mail dot ru --- Comment #9 from andysem at mail dot ru --- Some relevant discussion here: https://stackoverflow.com/questions/75818896/mm-comieq-ss-difference-between-clang-and-gcc Indeed, the description for _mm_comieq_ss in the Intrinsics Guide 3.6.5 as of 2022-01-24 takes NaNs into account: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=comiss_expand=1424 I think, the point that the intrinsic should correspond to the naked instruction is not correct in this case. The intrinsic corresponds to the same instruction either way, the question is only which flags are considered the result of the intrinsic. And it seems that the intended behavior (as indicated by the updated docs, icc and clang) is to check ZF & PF, not just ZF as gcc is currently doing.
[Bug target/108401] gcc defeats vector constant generation with intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401 --- Comment #7 from andysem at mail dot ru --- To be clear, I'm not asking the compiler to recognize the particular pattern of alternating 0x00 and 0xFF bytes. Because hardcoding this particular pattern won't improve generated code in other cases. Rather, I'm asking to tune down code transformations for intrinsics. If the developer wrote a sequence of intrinsics to generate a constant then he probably wanted that sequence instead of a simple _mm_set1_epi32 or a load from memory. But, if you're going to improve constant generation, please make it so that it can recognize not only the particular pattern described in this bug. More importantly, it should recognize the all-ones case (as a single pcmpeq) as a starting point. Then it can apply shifts to achieve the final result from the all-ones vector - shifts of any width, length or direction, including psrldq/pslldq. This would improve generated code in a wider range of cases.
[Bug target/108401] gcc defeats vector constant generation with intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401 --- Comment #6 from andysem at mail dot ru --- (In reply to Andrew Pinski from comment #1) > >and gcc 12 generates a worse code: > > it is not worse really; depending on the how fast moving between the > register sets is. I meant "worse" compared to vpcmpeq+vpsrlw pair. (Side note about the broadcast version: it could have been smaller if it used a 32-bit constant and vpbroadcastd. vpcmpeq+vpsrlw would still be better in this particular case, but if broadcast is needed, a smaller footprint code is preferred.)
[Bug target/108401] New: gcc defeats vector constant generation with intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401 Bug ID: 108401 Summary: gcc defeats vector constant generation with intrinsics Product: gcc Version: 11.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following code: #include __m256i load_00FF() { __m256i mm = _mm256_setzero_si256(); return _mm256_srli_epi16(_mm256_cmpeq_epi64(mm, mm), 8); } This function generates a vector constant of alternating 0xFF and 0x00 bytes. The code is written this way to avoid a load from memory, which may cause a cache miss. The expected generated code is this: vpcmpeqqymm0, ymm0, ymm0 vpsrlw ymm0, ymm0, 8 ret which is almost exactly what gcc 8 generates (it uses vpcmpeqd instead of vpcmpeqq, which is fine). However, gcc 9 through 11 generates a memory load instead, defeating the attempt to avoid it: vmovdqa ymm0, YMMWORD PTR .LC0[rip] ret and gcc 12 generates a worse code: movabs rax, 71777214294589695 vmovq xmm1, rax vpbroadcastqymm0, xmm1 ret In all cases, the compiler flags are: -O3 -march=haswell Code on godbolt.org: https://gcc.godbolt.org/z/sfT787PY9 I think the compiler should follow the code in intrinsics more closely since despite the apparent equivalence, the choice of instructions can have performance implications. The original code that is written by the developer is better anyway, so it's not clear why the compiler is being so creative in this case.
[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978 --- Comment #6 from andysem at mail dot ru --- (In reply to Jonathan Wakely from comment #5) > (In reply to andysem from comment #3) > > Is there no way to improve standard components implementation? I'd imagine > > you could provide the new implementation in the new version inline namespace > > and still support the old ABI for backward compatibility. > > To give a more complete answer: Inline namespaces don't help, that's a myth. > > struct X { std::optional b; }; > > Now one translation unit has X using the old ABI and one has X using the new > ABI, but they're the same X. The fact that the two versions of optional are > in different namespaces is no help at all. You still have an ABI break. > > We did it for std::string and it was a multi-year effort that caused > disruption across the industry. It's not worth doing that again to save a > few bytes in std::optional. You could include a hash of all members' namespaces in the X's mangled name. Or maybe add an attribute that would propagate a tag to the outer class' mangled name. Yes, that's also an ABI change in itself, but a useful one that would allow inline namespaces to become useful. I think a way to improve standard library components is essential. If not inline namespaces then something else. Otherwise there is no evolution path of the standard library components.
[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808 --- Comment #10 from andysem at mail dot ru --- (In reply to andysem from comment #8) > (In reply to Jonathan Wakely from comment #7) > > > > Do you want ODR violations? Because that's how you get ODR violations. > > I understand this, but my point is that this is a breaking change, > apparently even with the constructor being marked explicit... I take it back, with the constructor marked explicit it doesn't break. Before, I tested the preprocessed source from gcc 11.2, which didn't have explicit. Adding the explicit fixes compilation. Sorry for the confusion.
[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808 --- Comment #8 from andysem at mail dot ru --- (In reply to Jonathan Wakely from comment #7) > (In reply to andysem from comment #6) > > So do you think this is a problem in Boost.Filesystem? > > I don't know yet, I can't reproduce it with the Boost in Fedora 36, and I > haven't looked further. I have added a workaround to the current Boost.Filesystem develop and master, and Boost 1.80 (the latest official release) didn't support std::string_view, so is not affected either. In order to reproduce you would need the revision I posted in the original problem description. > > I would say this is a regression in string_view as the code is valid in > > pre-C++23, and I would expect it to stay valid in C++23 onwards. > > That range constructor is explicit in the C++23 draft now, but that change > hasn't been backordered to GCC 12 yet, and apparently doesn't change > anything if you still see this in trunk. > > > Shouldn't > > the range constructor be simply disabled when fs::path::iterator is not > > defined? > > Do you want ODR violations? Because that's how you get ODR violations. I understand this, but my point is that this is a breaking change, apparently even with the constructor being marked explicit, and it breaks in a rather surprising way. Also, apparently MSVC is able to not break somehow as it doesn't have this issue. So maybe there is a solution that does not introduce ODR violations and yet still works. I think it would be preferable to fix std::string_view rather than all its uses like in std/Boost.Filesystem.
[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808 --- Comment #6 from andysem at mail dot ru --- So do you think this is a problem in Boost.Filesystem? I would say this is a regression in string_view as the code is valid in pre-C++23, and I would expect it to stay valid in C++23 onwards. Shouldn't the range constructor be simply disabled when fs::path::iterator is not defined?
[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808 --- Comment #4 from andysem at mail dot ru --- It fails the same way with 12.2 and trunk on godbolt: https://gcc.godbolt.org/z/rT6TWhhvP
[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808 --- Comment #3 from andysem at mail dot ru --- I don't have the environment to build gcc locally, so I can't readily test trunk. But I have been told the same issue reproduces with gcc-12 20220319-1ubuntu1 from Ubuntu 22.04: https://github.com/boostorg/multiprecision/runs/8117686288?check_suite_focus=true#step:17:248
[Bug libstdc++/106808] New: std::string_view range concept requirement causes compile error with Boost.Filesystem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808 Bug ID: 106808 Summary: std::string_view range concept requirement causes compile error with Boost.Filesystem Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Created attachment 53528 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53528=edit Preprocessed code of the reproducer. The following code fails to compile in C++23 mode: #include int main() { boost::filesystem::path p = "foo"; boost::filesystem::path p2 = p; } $ g++ -std=c++23 -I. -c fs_concepts_issue.cpp -o fs_concepts_issue.o In file included from /usr/include/c++/11/bits/stl_iterator_base_types.h:71, from /usr/include/c++/11/bits/stl_algobase.h:65, from /usr/include/c++/11/bits/char_traits.h:39, from /usr/include/c++/11/string:40, from /usr/include/c++/11/bits/locale_classes.h:40, from /usr/include/c++/11/locale:39, from ./boost/filesystem/detail/path_traits.hpp:18, from ./boost/filesystem/path.hpp:21, from fs_concepts_issue.cpp:1: /usr/include/c++/11/bits/ranges_base.h: In substitution of ‘template requires (__maybe_borrowed_range<_Tp>) && ((is_array_v::type>) || (__member_begin<_Tp>) || (__adl_begin<_Tp>)) constexpr auto std::ranges::__cust_access::_Begin::operator()(_Tp&&) const [with _Tp = const boost::filesystem::path&]’: /usr/include/c++/11/bits/ranges_base.h:576:15: required by substitution of ‘template requires !(is_same_v<_DRange, std::basic_string_view>) && (contiguous_range<_Range>) && (sized_range<_Range>) && (is_same_v)()))>::type, std::indirectly_readable_traits)()))>::type> >::__iter_traits)()))>::type, std::indirectly_readable_traits)()))>::type> >::value_type, _CharT>) && !(is_convertible_v<_Range, const _CharT*>) && !requires(_DRange& __d) {__d->__conv_op ();} && (!requires{typename _DRange::traits_type;} || (is_same_v)) constexpr std::basic_string_view::basic_string_view(_Range&&) [with _Range = wchar_t; _DRange = std::char_traits]’ ./boost/filesystem/detail/path_traits.hpp:514:93: required from ‘constexpr const bool boost::filesystem::detail::path_traits::is_convertible_to_std_string_view::value’ ./boost/type_traits/disjunction.hpp:31:27: required from ‘struct boost::disjunction, boost::filesystem::detail::path_traits::is_convertible_to_path_source_non_std_string_view >’ ./boost/filesystem/detail/path_traits.hpp:537:8: required from ‘struct boost::filesystem::detail::path_traits::is_convertible_to_path_source’ ./boost/type_traits/disjunction.hpp:26:8: required from ‘struct boost::disjunction >’ ./boost/type_traits/disjunction.hpp:30:8: required from ‘struct boost::disjunction, boost::filesystem::detail::path_traits::is_convertible_to_path_source >’ ./boost/type_traits/conjunction.hpp:31:27: required from ‘struct boost::conjunction, boost::filesystem::detail::path_traits::is_convertible_to_path_source >, boost::negation > >’ ./boost/filesystem/path.hpp:249:9: required by substitution of ‘template boost::filesystem::path::path(const Source&) [with Source = boost::filesystem::path; = ]’ fs_concepts_issue.cpp:6:34: required from here /usr/include/c++/11/bits/iterator_concepts.h:942:15: required for the satisfaction of ‘__member_begin<_Tp>’ [with _Tp = const boost::filesystem::path&] /usr/include/c++/11/bits/iterator_concepts.h:942:32: in requirements with ‘_Tp& __t’ [with _Tp = const boost::filesystem::path&] /usr/include/c++/11/bits/iterator_concepts.h:942:32: error: satisfaction value of atomic constraint ‘requires(_Tp& __t) {{std::ranges::__cust_access::__decay_copy(__t->begin())} -> decltype(auto) [requires std::input_or_output_iterator<, >];} [with _Tp = const boost::filesystem::path&]’ changed from ‘false’ to ‘true’ 942 | concept __member_begin = requires(_Tp& __t) |^~ 943 | { | ~ 944 | { __decay_copy(__t.begin()) } -> input_or_output_iterator; | ~~ 945 | }; | ~ In file included from /usr/include/c++/11/string_view:48, from /usr/include/c++/11/bits/basic_string.h:48, from /usr/include/c++/11/string:55, from /usr/include/c++/11/bits/locale_classes.h:40, from /usr/include/c++/11/locale:39, from ./boost/filesystem/detail/path_traits.hpp:18, from ./boost/filesystem/path.hpp:21, from fs_concepts_issue.cpp:1: /usr/include/c++/11/bits/ranges_base.h:576:22: note: satisfaction
[Bug libstdc++/105857] codecvt::do_length causes unexpected buffer overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857 --- Comment #2 from andysem at mail dot ru --- > outside the [s, s + max_size) range This should be [from, from_to) range. Sorry, posted a little too soon.
[Bug libstdc++/105857] codecvt::do_length causes unexpected buffer overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857 --- Comment #1 from andysem at mail dot ru --- Created attachment 53089 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53089=edit Test case to reproduce the problem.
[Bug libstdc++/105857] New: codecvt::do_length causes unexpected buffer overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857 Bug ID: 105857 Summary: codecvt::do_length causes unexpected buffer overflow Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following test case: #include #include const std::size_t max_size = 10u; const char text[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"; int main() { std::locale loc; std::codecvt< wchar_t, char, std::mbstate_t > const& fac = std::use_facet< std::codecvt< wchar_t, char, std::mbstate_t > >(loc); std::mbstate_t mbs = std::mbstate_t(); const char* from = text; const char* from_to = from + max_size; std::size_t max = ~static_cast< std::size_t >(0u); return static_cast< std::size_t >(fac.length(mbs, from, from_to, max)); } $ g++ -g2 -O0 -o codecvt_length_bug codecvt_length_bug.cpp Running this causes a crash with a buffer overflow: Program received signal SIGABRT, Aborted. __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140737348011840, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x77b56476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x77b3c7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x77b9d6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x77cef943 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155 #6 0x77c4a76a in __GI___fortify_fail (msg=msg@entry=0x77cef8e9 "buffer overflow detected") at ./debug/fortify_fail.c:26 #7 0x77c490c6 in __GI___chk_fail () at ./debug/chk_fail.c:28 #8 0x77c4a199 in __mbsnrtowcs_chk (dst=, src=, nmc=, len=, ps=, dstlen=) at ./debug/mbsnrtowcs_chk.c:27 #9 0x77e290d2 in std::codecvt::do_length(__mbstate_t&, char const*, char const*, unsigned long) const () from /lib/x86_64-linux-gnu/libstdc++.so.6 #10 0x52d3 in std::__codecvt_abstract_base::length (this=0x77f86090, __state=..., __from=0x6040 " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __end=0x604a "*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __max=18446744073709551615) at /usr/include/c++/11/bits/codecvt.h:219 #11 0x523d in main () at codecvt_length_bug.cpp:14 The problem appears to be that std::codecvt< wchar_t, char, std::mbstate_t >::do_length() accesses characters outside the [s, s + max_size) range, apparently using the ~static_cast< std::size_t >(0u) as the size limit. This is against the do_length() definition in the C++ standard, see [locale.codecvt.virtuals]/12-14 (http://eel.is/c++draft/locale.codecvt.virtuals#lib:codecvt,do_length): Effects: The effect on the state argument is as if it called do_in(state, from, from_end, from, to, to+max, to) for to pointing to a buffer of at least max elements. That is, max is only referred to as the size of the potential output buffer, and the source buffer is specified as [from, from_end). There is no requirement for max to be within [from, from_end) bounds. If I change max to (sizeof(text) - 1u) then the buffer overflow does not happen. (As to the purpose of this code, it is supposed to calculate the size, in bytes, of the initial sequence of complete characters not larger than max_size.) $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.2.0-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
[Bug c++/88165] error: default member initializer for 'A::B::m' required before the end of its enclosing class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88165 --- Comment #13 from andysem at mail dot ru --- (In reply to Jonathan Wakely from comment #10) > (In reply to Fedor Chelnokov from comment #7) > > This struct definition: > > ``` > > struct A { > > struct B { > > int i = 0; > > B() {} > > This declares default constructor, meaning that the type is default > constructible. Period. > > By declaring B(); you assert that B is default constructible. > > > }; > > A(B = {}); > > This is valid, because the user-provided default constructor for B is all > the compiler needs to see to know that B={} is valid. > > > }; > > ``` > > is accepted by GCC, but another one ({} replaced with = default) is not: > > ``` > > struct C { > > struct D { > > int i = 0; > > D() = default; > > This declares a default constructor that might be defined implicitly by the > compiler, **or** it might get deleted if the member definitions of D would > make the implicit default constructor ill-formed. This is obviously very > different from the case where you declare B(); There is no assertion that D > is default constructible, the compiler has to deduce whether or not that's > true. That depends on the default member initializer for D::i. The > initializer (which is just '0' here) is a "complete class context" which > means it is not processed until the class D is complete (this allows you to > use other members, or e.g. sizeof(D) as the initializer). > > A nested class like C::D is not complete until its enclosing class is > complete. This means the initializer for C::D::i is compiled after C is > complete. This means whether C::D is default constructible is not known > until C is complete. > > > > }; > > C(D = {}); > > This requires checking whether C::D is default constructible, but we are > still in the body of C, so C is not complete, which means C::D is not > complete, which means we don't know if C::D is default constructible. Does it? I thought the default arguments were only evaluated at the point where they are actually used, i.e. in this case - when `C` constructor is invoked with no arguments. > > }; > > ``` > > Demo: https://gcc.godbolt.org/z/WTPdTn1Yf > > > > Could you please explain why? I though that both must be same accepted or > > same rejected. > > I hope the explanation above helps. Thank you for the explanation. If this is how the standard specifies compiler behavior, I wonder if this should be considered as a defect (DR). There is nothing in `std::numeric_limits::max()` that depends on the definition of `C`, so there is no reason to prevent it from compiling.
[Bug c++/102293] [10/11/12 Regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293 --- Comment #6 from andysem at mail dot ru --- Ok, thanks for confirming.
[Bug c++/102293] [10/11/12 Regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293 --- Comment #4 from andysem at mail dot ru --- (In reply to Martin Liška from comment #3) > Dup. > > *** This bug has been marked as a duplicate of bug 100161 *** This bug isn't about a warning but about an ICE. Are you positive this is a duplicate?
[Bug c++/102293] [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293 --- Comment #2 from andysem at mail dot ru --- The code compiles if -std=c++03 is replaced with e.g. -std=c++11.
[Bug c++/102293] [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293 --- Comment #1 from andysem at mail dot ru --- I forgot to mention that the same code compiles successfully with gcc 9.4.0. And here is the output from gcc 11.1.0: In file included from ./boost/assert.hpp:58, from ./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:16, from ./boost/xpressive/detail/core/matchers.hpp:46, from ./boost/xpressive/detail/core/linker.hpp:35, from ./boost/xpressive/detail/static/static.hpp:19, from ./boost/xpressive/detail/static/transforms/as_matcher.hpp:18, from ./boost/xpressive/detail/static/grammar.hpp:21, from ./boost/xpressive/basic_regex.hpp:28, from ./boost/xpressive/regex_compiler.hpp:30, from ./boost/xpressive/xpressive_dynamic.hpp:17, from libs/log/test/run/filt_matches_xpressive.cpp:18: ./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp: In instantiation of ‘boost::xpressive::detail::simple_repeat_matcher::simple_repeat_matcher(const Xpr&, unsigned int, unsigned int, std::size_t) [with Xpr = boost::xpressive::detail::matcher_wrapper >, mpl_::bool_, boost::xpressive::detail::compound_charset > > > >; Greedy = mpl_::bool_; std::size_t = long unsigned int]’: ./boost/xpressive/detail/dynamic/dynamic.hpp:216:48: required from ‘void boost::xpressive::detail::make_simple_repeat(const boost::xpressive::detail::quant_spec&, boost::xpressive::detail::sequence&, const Xpr&) [with BidiIter = const char*; Xpr = boost::xpressive::detail::matcher_wrapper >, mpl_::bool_, boost::xpressive::detail::compound_charset > > > >]’ ./boost/xpressive/detail/dynamic/dynamic.hpp:130:31: required from ‘void boost::xpressive::detail::dynamic_xpression::repeat_(const boost::xpressive::detail::quant_spec&, boost::xpressive::detail::sequence&, mpl_::int_<1>, mpl_::false_) const [with Matcher = boost::xpressive::detail::charset_matcher >, mpl_::bool_, boost::xpressive::detail::compound_charset > > >; BidiIter = const char*; mpl_::false_ = mpl_::bool_]’ ./boost/xpressive/detail/dynamic/dynamic.hpp:96:22: required from ‘void boost::xpressive::detail::dynamic_xpression::repeat(const boost::xpressive::detail::quant_spec&, boost::xpressive::detail::sequence&) const [with Matcher = boost::xpressive::detail::charset_matcher >, mpl_::bool_, boost::xpressive::detail::compound_charset > > >; BidiIter = const char*]’ ./boost/xpressive/detail/dynamic/dynamic.hpp:94:18: required from here ./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:81:37: internal compiler error: in value_dependent_expression_p, at cp/pt.c:26978 81 | BOOST_ASSERT(0 != width && unknown_width() != width); | ~~~^~~ ./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:81:13: note: in expansion of macro ‘BOOST_ASSERT’ 81 | BOOST_ASSERT(0 != width && unknown_width() != width); | ^~~~ 0xe306b3 internal_error(char const*, ...) ???:0 0xe27039 fancy_abort(char const*, int, char const*) ???:0 0xf5208e value_dependent_expression_p(tree_node*) ???:0 0xf520a9 value_dependent_expression_p(tree_node*) ???:0 0x10ed97e tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) ???:0 0x10eddd9 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) ???:0 0x10ee17a tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) ???:0 0x1178f37 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool) ???:0 0x117910c tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool) ???:0 0x1179047 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool) ???:0 0x1178f95 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool) ???:0 0x1179047 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool) ???:0 0x1178f95 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool) ???:0 0x122150a instantiate_decl(tree_node*, bool, bool) ???:0 0xfa3850 instantiate_pending_templates(int) ???:0 0xf9f8f3 c_parse_final_cleanups() ???:0 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See for instructions. "g++-11" -ftemplate-depth-1024 -std=c++03 -fPIC -m64 -pthread -O0 -fno-inline -Wall -g -fno-strict-aliasing -DBOOST_ALL_NO_LIB=1 -DBOOST_ATOMIC_DYN_LINK=1 -DBOOST_CHRONO_DYN_LINK=1 -DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_LOG_DYN_LINK=1 -DBOOST_LOG_SETUP_DYN_LINK=1 -DBOOST_LOG_USE_AVX2 -DBOOST_LOG_USE_SSSE3 -DBOOST_TEST_DYN_LINK=1 -DBOOST_TEST_NO_AUTO_LINK=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_THREAD_POSIX -DBOOST_THREAD_USE_DLL=1 -D_XOPEN_SOURCE=600 -I"." -I"libs/log/test/common" -c -o
[Bug c++/102293] New: [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293 Bug ID: 102293 Summary: [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730) Product: gcc Version: 10.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Created attachment 51443 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51443=edit Preprocessed output of the failing test. I'm seeing ICE with gcc 10.3.0 and gcc 11.1.0 when compiling Boost.Log tests that use Boost.Xpressive. The ICE points at Boost.Xpressive code: $ "g++-10" -save-temps -ftemplate-depth-1024 -std=c++03 -fPIC -m64 -pthread -O0 -fno-inline -Wall -g -fno-strict-aliasing -DBOOST_ALL_NO_LIB=1 -DBOOST_ATOMIC_DYN_LINK=1 -DBOOST_CHRONO_DYN_LINK=1 -DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_LOG_DYN_LINK=1 -DBOOST_LOG_SETUP_DYN_LINK=1 -DBOOST_LOG_USE_AVX2 -DBOOST_LOG_USE_SSSE3 -DBOOST_TEST_DYN_LINK=1 -DBOOST_TEST_NO_AUTO_LINK=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_THREAD_POSIX -DBOOST_THREAD_USE_DLL=1 -D_XOPEN_SOURCE=600 -I"." -I"libs/log/test/common" -c -o "bin.v2/libs/log/test/filt_matches_xpressive.test/gcc-11/debug/cxxstd-03-iso/threadapi-pthread/threading-multi/run/filt_matches_xpressive.o" "libs/log/test/run/filt_matches_xpressive.cpp" In file included from ./boost/xpressive/detail/core/matchers.hpp:46, from ./boost/xpressive/detail/core/linker.hpp:35, from ./boost/xpressive/detail/static/static.hpp:19, from ./boost/xpressive/detail/static/transforms/as_matcher.hpp:18, from ./boost/xpressive/detail/static/grammar.hpp:21, from ./boost/xpressive/basic_regex.hpp:28, from ./boost/xpressive/regex_compiler.hpp:30, from ./boost/xpressive/xpressive_dynamic.hpp:17, from libs/log/test/run/filt_matches_xpressive.cpp:18: ./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp: In instantiation of ‘boost::xpressive::detail::simple_repeat_matcher::simple_repeat_matcher(const Xpr&, unsigned int, unsigned int, std::size_t) [with Xpr = boost::xpressive::detail::matcher_wrapper >, mpl_::bool_, boost::xpressive::detail::compound_charset > > > >; Greedy = mpl_::bool_; std::size_t = long unsigned int]’: ./boost/xpressive/detail/dynamic/dynamic.hpp:216:48: required from ‘void boost::xpressive::detail::make_simple_repeat(const boost::xpressive::detail::quant_spec&, boost::xpressive::detail::sequence&, const Xpr&) [with BidiIter = const char*; Xpr = boost::xpressive::detail::matcher_wrapper >, mpl_::bool_, boost::xpressive::detail::compound_charset > > > >]’ ./boost/xpressive/detail/dynamic/dynamic.hpp:130:31: required from ‘void boost::xpressive::detail::dynamic_xpression::repeat_(const boost::xpressive::detail::quant_spec&, boost::xpressive::detail::sequence&, mpl_::int_<1>, mpl_::false_) const [with Matcher = boost::xpressive::detail::charset_matcher >, mpl_::bool_, boost::xpressive::detail::compound_charset > > >; BidiIter = const char*; mpl_::false_ = mpl_::bool_]’ ./boost/xpressive/detail/dynamic/dynamic.hpp:96:22: required from ‘void boost::xpressive::detail::dynamic_xpression::repeat(const boost::xpressive::detail::quant_spec&, boost::xpressive::detail::sequence&) const [with Matcher = boost::xpressive::detail::charset_matcher >, mpl_::bool_, boost::xpressive::detail::compound_charset > > >; BidiIter = const char*]’ ./boost/xpressive/detail/dynamic/dynamic.hpp:94:18: required from here ./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:81:23: internal compiler error: in value_dependent_expression_p, at cp/pt.c:26730 81 | BOOST_ASSERT(0 != width && unknown_width() != width); | ~~^~~ Please submit a full bug report, with preprocessed source if appropriate. See for instructions. $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.3.0-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release
[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477 --- Comment #9 from andysem at mail dot ru --- (In reply to Martin Sebor from comment #8) > > Submitting a bug for the LTO problem is only helpful if it comes with a test > case to reproduce it. I have heard about problems suppressing warnings in > LTO builds but I'm not aware of it being a design limitation, and other than > the mention of the inlining caveat (the -Wstringop-overflow example) I can't > find it mentioned in the LTO section of the manual pointed to by the link. The warning is very likely appearing as a result of inlining, and since the compiler inlines based on its own decisions it is difficult to create a test case that would reproduce the problem. The bottom line is that diagnostic pragmas don't work (reliably) with LTO, and fixing that requires the compiler to track pragmas past inlining.
[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477 --- Comment #7 from andysem at mail dot ru --- (In reply to Martin Sebor from comment #6) > In C/C++ the size of the largest object is PTRDIFF_MAX - 1 bytes. This is > necessary so that the difference between the address of its first byte and > that just past its last byte fits in ptrdiff_t. Sure, but memset, as well as other string operations, takes size_t as an argument and doesn't have this limitation. They must be able to operate on the full range of size_t worth of bytes. I don't remember neither C nor C++ declaring a narrowing limit on the allowed size_t range wrt. these functions. And please note that string functions are often used on raw storage that is obtained through means other than defined by C++ as operations creating objects. That storage may be larger than the maximum C++ object size. > A number of GCC warnings > point out code that breaks this assumption, including allocations in excess > of the maximum or calls to functions like memset or memcpy that would access > more than the maximum. None of these warnings considers the conditions > that must be true in order for control to reach the problem statement, so if > the statement hasn't been eliminated from in the IL (i.e., can be seen in > the output of -fdump-tree-optimized or whatever pass the warning runs in) > the warning triggers. Taking the conditions into consideration and issuing > "maybe" kinds of warnings (like -Wmaybe-uninitialized) is a work in progress > but as of now no warning other thanb -Wmaybe-uninitialized does. Ok, then please consider this bug as a case for adding consideration of conditions. > No such distinction as "there's definitely a bug" vs "there may be a bug" is > feasible because there is, in general, no way to identify functions that are > defined but never called, or those that are only called with arguments that > make certain code paths unreachable. As a result, with few exceptions every > GCC warning is of the latter kind. To clarify, what I mean by "there's definitely a bug" kind of warning is when the compiler has enough data to prove that the operation will result in UB (for example, cause a buffer overflow). Such as: char* alloc_string(size_t n) { char* p = new char[n]; // forgot about the byte for the terminating 0 std::memset(p, 0, n + 1); return p; } This is notably different from compiler trying to reason about a possible value of n based on size_t range and issuing warnings based on that. I would like to see warnings of the former kind but not so much of the latter kind. If there is no way to separate the two kinds of warnings, and the compiler does not consider prior conditions that are written *specifically* to prevent UB that triggers the warning, I'm afraid, I don't see the use for such warnings. > If you're interested, the following > two-part article discusses some of the challenges in this area: > > https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings/ > https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings- > part-2/ Thanks, I will read it. I understand that compiler warnings are not easy to implement right. > If the #pragma suppression doesn't work with LTO please report that as a > separate bug. There's nothing we can do to avoid the warning in this case. The fact that pragmas don't work with LTO is documented in the -flto description[1]. My impression is that gcc simply doesn't support warning control through pragmas when LTO is enabled. Is this not the case? I can create a bug, but I cannot provide a small repro, since the warning triggers in a large code base, but not necessarily a small test case. Would such a bug report be useful? [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto
[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477 --- Comment #5 from andysem at mail dot ru --- (In reply to Martin Sebor from comment #4) > The case of _size being very large and n very small may be handled correctly > by the original code thanks to the check for _capacity but because its value > isn't known it affects neither the codegen nor the warning. > > The warning is designed to flag bounds greater than PTRDIFF_MAX But sizes above PTRDIFF_MAX are valid sizes. IMHO, triggering the warning based on possible sizes is just not the right approach to begin with. If what you're trying to detect is an integral overflow then *that* is what should be detected, and pointed to by the warning, not the unrelated memset. As an illustration to my point, I worked around this warning by replacing memset with std::fill_n with no other changes to the code. The optimizer will likely convert it to memset anyway. This shows that (1) if the supposed bug is an overflow in `_size + 4` then it is no longer being detected and (2) there is no problem in filling more than PTRDIFF_MAX bytes (as expected, of course). > and that's > just what it sees here as is evident from the output of the > -fdump-tree-optimized option. There is nothing to fix here. Code that's > unreachable as a result of preconditions GCC cannot prove may be susceptible > to false positives. That's a problem shared by all flow-sensitive warnings, > not just in GCC but in all static analyzers with flow analysis. > > In general, GCC warnings are designed to "report constructions that are not > inherently erroneous but that are risky or suggest there may have been an > error." Not every instance of every warning necessarily corresponds to an > error, and some may even be false positives. False positives is a very good reason for the warnings to be disabled or ignored, meaning that they are useless. Is there any reliable detection implemented in -Wstringop-overflow? Meaning that, when the compiler can tell "there's definitely a bug", as opposed to "there may be something fishy going on"? If yes, then it may be sensible to separate the reliable and speculative parts into different warnings. Personally, I would like to have a warning when the compiler is certain that something is wrong, but not the false positives, like shown in this bug. That's the only thing stopping me from just disabling this warning globally. > Unhelpful warnings can be > disabled either globally, on the command line, or on a case by case basis by > #pragma GCC diagnostic. #pragma GCC diagnostic doesn't work with LTO. I was actually suppressing that warning in the real code base with a #pragma, but it no longer works when LTO is enabled by default in my distro. > Adding preconditions like 'if (_size >= __PTRDIFF_MAX__ / 4) > __builtin_unreachable ();' (the 4 should be replaced by sizeof (value_type) > in the original test case) often helps not just warnings but also codegen. > They're not required but can be helpful and preferable to suppression. Again, that precondition is incorrect.
[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477 --- Comment #3 from andysem at mail dot ru --- To put it another way, the case of _size being large and n small is valid, and memset (and resize in general) do behave correctly in this case. Which is why the warning is bogus and the suggested assert is incorrect. Whether n became small is a separate matter unrelated to memset. You could argue that the line where n becomes small *as a result of overflow* could issue a warning. For example, if size_t was augmented with some sort of an attribute. But that is not what -Wstringop-overflow is.
[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477 andysem at mail dot ru changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|INVALID |--- --- Comment #2 from andysem at mail dot ru --- (In reply to Martin Sebor from comment #1) > > Since n is set to _size + 4 in test(), (n < _size) holds only if the > addition wraps around zero, implying _size is excessively large. If `_size + 4` does overflow, the result of memset is still valid (i.e. it is filling what it is supposed to fill), and the warning is incorrect. > The > warning can be avoided by asserting that that isn't so, e.g., by adding the > following > > if (_size >= __PTRDIFF_MAX__ / 4) > __builtin_unreachable (); > > just before the memset call. I don't think users should be required to insert non-portable asserts like these to be able to use memset without warnings. Besides, this assert is incorrect as it will prevent memset to be called if (_size >= __PTRDIFF_MAX__ / 4). (Why `/ 4`, BTW?) I'm reopening because I think, as it currently works, the warning is bogus and should be fixed. It is not actionable on the user's side and the suggested workaround is not practical.
[Bug middle-end/100477] New: Bogus -Wstringop-overflow warning on memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477 Bug ID: 100477 Summary: Bogus -Wstringop-overflow warning on memset Product: gcc Version: 10.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following test case: #include #include class Container { public: typedef unsigned char value_type; typedef value_type* pointer; typedef std::size_t size_type; Container(); void clear(); void reserve(size_type); size_type size() const { return _size; } void resize(size_type n) { if(n == 0) { clear(); return; } if(n > _capacity) { reserve(n); } else if(_owned && n < _size) { std::memset(_buf + n, 0, (_size - n) * sizeof(value_type)); } _size = n; } private: pointer _buf; size_type _size; size_type _capacity; int _shrinkCounter; bool _owned; }; void test(Container& c, int v) { Container::size_type position = c.size(); c.resize(position + sizeof(int)); } $ g++ -Wall -O3 -o memset_warning.o -c memset_warning.cpp In file included from /usr/include/string.h:519, from /usr/include/c++/10/cstring:42, from memset_warning.cpp:2: In function ‘void* memset(void*, int, size_t)’, inlined from ‘void Container::resize(Container::size_type)’ at memset_warning.cpp:32:24, inlined from ‘void test(Container&, int)’ at memset_warning.cpp:48:13: /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:33: warning: ‘void* __builtin_memset(void*, int, long unsigned int)’ specified bound 18446744073709551612 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=] 59 | return __builtin___memset_chk (__dest, __ch, __len, | ~~~^ 60 | __glibc_objsize0 (__dest)); | ~~ Here, the reported constant 18446744073709551612 is -4, which implies the compiler is ignoring the `n < _size` check and calculates `(_size - n)` for memset even though it will never be called. $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.3.0-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #10 from andysem at mail dot ru --- Thanks. Will this be backported to 10 and 11 branches?
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #6 from andysem at mail dot ru --- Hmm, it looks like the original code has changed enough so that the problem no longer reproduces, with or without __restrict__. I don't have the older version of the code, so I can't tell what changed exactly. Data alignment most probably did change, but data layout of A (its equivalent in the original code) as well as the operation on it certainly didn't. Sorry for the confusion.
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #3 from andysem at mail dot ru --- I tried adding __restrict__ to the equivalents of x, y1 and y2 in the original larger code base and it didn't help. The compiler (gcc 10.2) would still generate the same half-vectorized code.
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #1 from andysem at mail dot ru --- For reference, an ideal version of this code should look something like this: test(A&, A const&, A const&): movdqu (%rsi), %xmm0 movdqu (%rdi), %xmm1 movdqu (%rdx), %xmm2 paddd %xmm1, %xmm0 psubd %xmm2, %xmm0 movups %xmm0, (%rdi) ret
[Bug tree-optimization/99971] New: GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 Bug ID: 99971 Summary: GCC generates partially vectorized and scalar code at once Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following code sample: struct A { unsigned int a, b, c, d; A& operator+= (A const& that) { a += that.a; b += that.b; c += that.c; d += that.d; return *this; } A& operator-= (A const& that) { a -= that.a; b -= that.b; c -= that.c; d -= that.d; return *this; } }; void test(A& x, A const& y1, A const& y2) { x += y1; x -= y2; } The code, when compiled with options "-O3 -march=nehalem", generates: test(A&, A const&, A const&): pushq %rbp movdqu (%rdi), %xmm1 pushq %rbx movl4(%rsi), %r8d movdqu (%rsi), %xmm0 movl(%rsi), %r9d paddd %xmm1, %xmm0 movl8(%rsi), %ecx movl12(%rsi), %eax movl%r8d, %esi movl(%rdi), %ebp movl4(%rdi), %ebx movl8(%rdi), %r11d movl12(%rdi), %r10d movups %xmm0, (%rdi) subl(%rdx), %r9d subl4(%rdx), %esi subl8(%rdx), %ecx subl12(%rdx), %eax addl%ebp, %r9d addl%ebx, %esi movl%r9d, (%rdi) popq%rbx addl%r11d, %ecx popq%rbp movl%esi, 4(%rdi) addl%r10d, %eax movl%ecx, 8(%rdi) movl%eax, 12(%rdi) ret https://gcc.godbolt.org/z/Mzchj8bxG Here you can see that the compiler has partially vectorized the test function - it converted "x += y1" to paddd, as expected, but failed to vectorize "x -= y2". But at the same time the compiler also generated scalar code, including for the already vectorized "x += y1" line, basically duplicating it. Note that when either "x += y1" or "x -= y2" is commented, the compiler is able to vectorize the line that is left. It is also able to vectorize both lines when the += and -= operators are applied to different objects instead of x. This is reproducible since gcc 8 up to and including 10.2. gcc 7 doesn't vectorize this code. With the current trunk on godbolt the generated code is different: test(A&, A const&, A const&): movdqu (%rsi), %xmm0 movdqu (%rdi), %xmm1 paddd %xmm1, %xmm0 movups %xmm0, (%rdi) movd%xmm0, %eax subl(%rdx), %eax movl%eax, (%rdi) pextrd $1, %xmm0, %eax subl4(%rdx), %eax movl%eax, 4(%rdi) pextrd $2, %xmm0, %eax subl8(%rdx), %eax movl%eax, 8(%rdi) pextrd $3, %xmm0, %eax subl12(%rdx), %eax movl%eax, 12(%rdi) ret Here the compiler is able to vectorize "x += y1" but not "x -= y2". At least, it removed the duplicate scalar version of "x += y1". Given that the compiler is able to vectorize each line in isolation, I would expect it to be able to vectorize them combined. Generating duplicate versions of code is certainly not expected.
[Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563 --- Comment #8 from andysem at mail dot ru --- Thanks again.
[Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563 --- Comment #5 from andysem at mail dot ru --- Thanks. Will there be a fix in branch 10?
[Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563 Bug ID: 99563 Summary: Code miscompilation caused by _mm256_zeroupper() Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following code: #include constexpr unsigned int block_size = 8u; float compute_generic(const double* data, unsigned int width, unsigned int height); inline __attribute__((always_inline)) float compute_avx(const double* data, unsigned int width, unsigned int height) { __m128d mm_res = _mm_setzero_pd(); unsigned long block_count = static_cast< unsigned long >((width + block_size - 1) / block_size) * static_cast< unsigned long >((height + block_size - 1) / block_size); float res = static_cast< float >(_mm_cvtsd_f64(mm_res) / static_cast< double >(block_count)); _mm256_zeroupper(); return res; } float compute(const double* data, unsigned int width, unsigned int height) { if (width >= 16 && height >= block_size) { return compute_avx(data, width, height); } else { return compute_generic(data, width, height); } } $ g++ -O2 -march=sandybridge -mno-vzeroupper -o test.o test.cpp https://gcc.godbolt.org/z/dhr7an The code compiles to: compute(double const*, unsigned int, unsigned int): cmp esi, 15 jbe .L2 cmp edx, 7 jbe .L2 vzeroupper ret .L2: jmp compute_generic(double const*, unsigned int, unsigned int) which leaves the result of compute() uninitialized if AVX path is taken. The problem disappears if one of the following is done: - -O2 is replaced with -O1 - -mno-vzeroupper is removed - _mm256_zeroupper(); call is removed (the upper bits of vector registers is left dirty, though) This is a regression in gcc 10 branch and later, gcc 9.x compiles this correctly.
[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978 --- Comment #3 from andysem at mail dot ru --- (In reply to Jonathan Wakely from comment #1) > This would be an ABI break, and so not going to happen. Is there no way to improve standard components implementation? I'd imagine you could provide the new implementation in the new version inline namespace and still support the old ABI for backward compatibility. (In reply to Jonathan Wakely from comment #2) > If we were going to do this, we could also make std::optional occupy a > single byte, using one bit for the value and one for the engaged flag. This would be more problematic as emplace(), value() and operator*() need to return T&, which would not be possible.
[Bug libstdc++/98978] New: Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978 Bug ID: 98978 Summary: Consider packing _M_Engaged in the tail padding of T in optional<> Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Using std::optional with some types may considerably increase object sizes since it adds alignof(T) bytes worth of overhead. Sometimes it is possible to avoid this overhead if the flag indicating presence of the stored value (_M_Engaged in libstdc++ sources) is placed in the tail padding of the T object. This can be done if std::optional constructs an object of a type that derives from T, which has an additional bool data member that is initialized to true upon construction. The below code roughly illustrates the idea: template< typename T > struct _Optional_payload_base { struct _PresentT : T { const bool _M_Engaged = true; // Forwarding ctors and other members }; static constexpr size_t engaged_offset = offsetof(_PresentT, _M_Engaged); struct _AbsentT { unsigned char _M_Offset[engaged_offset]; const bool _M_Engaged = false; }; union _Storage { _AbsentT _M_Empty; _PresentT _M_Value; _Storage() : _M_Empty() {} // Forwarding ctors and other members }; _Storage _M_payload bool is_engaged() const noexcept { return *reinterpret_cast< const bool* >(reinterpret_cast< const unsigned char* >(&_M_payload) + engaged_offset); } }; The above relies on some implementation details, such as: - offsetof works for the type T. It does for many types in gcc, beyond what is required by the C++ standard. Maybe there is a way to avoid offsetof, I just didn't immediately see it. - The location of _M_Engaged in both _PresentT and _AbsentT is the same. This is a property of the target ABI, and AFAICS it should be true at least on x86 psABI and I think Microsoft ABI. The above will only work for non-final class types, for other types, and where the above requirements don't hold true, the current code with a separate _M_Engaged flag would work.
[Bug target/97891] [x86] Consider using registers on large initializations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891 --- Comment #5 from andysem at mail dot ru --- Using a register is beneficial even for bytes and words if there are multiple of mov instructions. But there has to be a single reg0 for all movs. I'm not very knowlegeable about gcc internals, but would it be beneficial to implement this on a higher level than instruction transformation? I.e. so that instead of this: a = 0; b = 0; c = 0; we have: any reg0 = 0; // any represents a type compatible with any fundamental or enum type a = reg0; b = reg0; c = reg0; This way, reg0 would be in a single register, and that xorl instruction could be subject to other tree optimizations. With tree-level optimization, another thing to note is vectorizer. I know gcc can sometimes merge adjacent initializations without padding to a larger single instruction initializazion. For example: struct A { long a1; long a2; A() : a1(0), a2(0) { } }; void test(A* p, unsigned int count) { for (unsigned int i = 0; i < count; ++i) { p[i] = A(); } } test(A*, unsigned int): testl %esi, %esi je .L1 leal-1(%rsi), %eax pxor%xmm0, %xmm0 salq$4, %rax leaq16(%rdi,%rax), %rax .L3: movups %xmm0, (%rdi) addq$16, %rdi cmpq%rax, %rdi jne .L3 .L1: ret I would like this to still work.
[Bug target/97891] [x86] Consider using registers on large initializations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891 --- Comment #1 from andysem at mail dot ru --- As a side note, the "xorl %edx, %edx" in the original code should have been moved outside the loop, as it was in the code with __asm__ block.
[Bug target/97891] New: [x86] Consider using registers on large initializations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891 Bug ID: 97891 Summary: [x86] Consider using registers on large initializations Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andysem at mail dot ru Target Milestone: --- Consider the following example code: struct A { long a; short b; int c; char d; long x; bool y; int z; char* p; A() : a(0), b(0), c(0), d(0), x(0), y(false), z(0), p(0) {} }; void test(A* p, unsigned int count) { for (unsigned int i = 0; i < count; ++i) { p[i] = A(); } } When compiled with "-O3 -march=nehalem" the generated code is: test(A*, unsigned int): testl %esi, %esi je .L1 leal-1(%rsi), %eax leaq(%rax,%rax,2), %rax salq$4, %rax leaq48(%rdi,%rax), %rax .L3: xorl%edx, %edx movq$0, (%rdi) addq$48, %rdi movw%dx, -40(%rdi) movl$0, -36(%rdi) movb$0, -32(%rdi) movq$0, -24(%rdi) movb$0, -16(%rdi) movl$0, -12(%rdi) movq$0, -8(%rdi) cmpq%rax, %rdi jne .L3 .L1: ret https://gcc.godbolt.org/z/TrfWYr Here, the main loop body between .L3 and .L1 is 60 bytes large, with a significant amount of space wasted on the $0 constants encoded in mov instructions. It would be more efficient to use a single zero register in all member initializations, especially given that %edx is already used like that. A loop rewritten like this: for (unsigned int i = 0; i < count; ++i) { __asm__ ( "movq%q1, (%0)\n\t" "movw%w1, 8(%0)\n\t" "movl%1, 12(%0)\n\t" "movb%b1, 16(%0)\n\t" "movq%q1, 24(%0)\n\t" "movb%b1, 32(%0)\n\t" "movl%1, 36(%0)\n\t" "movq%q1, 40(%0)\n\t" : : "r" (p + i), "q" (0) ); } compiles to: test(A*, unsigned int): testl %esi, %esi je .L1 leal-1(%rsi), %eax leaq(%rax,%rax,2), %rax salq$4, %rax leaq48(%rdi,%rax), %rdx xorl%eax, %eax .L3: movq%rax, (%rdi) movw%ax, 8(%rdi) movl%eax, 12(%rdi) movb%al, 16(%rdi) movq%rax, 24(%rdi) movb%al, 32(%rdi) movl%eax, 36(%rdi) movq%rax, 40(%rdi) addq$48, %rdi cmpq%rdx, %rdi jne .L3 .L1: ret Here, the loop between .L3 and .L1 only takes 34 bytes, which is nearly half the original size. Constant (for example, zero) initialization is a frequently used pattern to initialize structures, so the sequences like the above are quite wide spread. Converting cases like this to the use of registers could save some code size and reduce cache pressure.
[Bug libstdc++/59325] Provide a way to disable deprecated warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59325 --- Comment #8 from andysem at mail dot ru --- (In reply to Jonathan Wakely from comment #6) > (In reply to andysem from comment #2) > > (In reply to Jonathan Wakely from comment #1) > > > You can use a #pragma to disable -Wdeprecated locally > > > > But the legacy C++ is used in the library, which code I'd like to avoid > > changing. > > Is this still a problem? Uses of deprecated features within the library > itself should no longer emit warnings (I think I've disabled them with > pragmas). > > So the only uses should be in our own code, which you can add pragmas to. Yes, this is still a problem. To be clear, the library I was referring to was not libstdc++ but another library which I have no control of and which is still using C++03 features. I don't think it will move away from std::auto_ptr any time soon as it is part of the API. > If you want to ignore the warning because you know you're not going to use a > different implementation or a newer standard, you can use -Wno-deprecated to > disable all such warnings globally or use #pragma to disable them locally. I don't want to disable _all_ deprecated warnings universally, only those emitted by libstdc++. For this reason I cannot use -Wno-deprecated. #pragma also doesn't really work because libstdc++ can be included from any header, including those from other libraries, for which I don't want to disable warnings.