[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2

2023-12-26 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099

--- Comment #13 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #12)
> (In reply to andysem from comment #11)
> > > I'm not sure what you mean by "the compiler is free to generate code that 
> > > takes it into account." Takes what into account? What problem does that 
> > > freedom cause?
> > 
> > I mean the compiler could move (some part of) dynamic_cast to precede the
> > check for the standard facet. I.e. of something like this:
> > 
> >   template< typename _Facet >
> >   const _Facet* __try_use_facet(locale const& loc)
> >   {
> > const size_t __i = _Facet::id._M_id();
> > const locale::facet** __facets = __loc._M_impl->_M_facets;
> > const _Facet* __dyn_facet = __dynamic_cast< const _Facet*
> > >(__facets[__i]);
> > 
> > // checks for every standard facet type
> > if (__is_same(_Facet, ...))
> >   return static_cast(__facets[__i]);
> > 
> > return __dyn_facet;
> >   }
> 
> But why? Maybe I'm being slow but I still don't understand what problem is
> being solved here.
> 
> Also this would defeat the optimization that avoids the unnecessary overhead
> of dynamic_cast for standard facets.

I have seen gcc sometimes reorder code like this (i.e. move code from under a
branch before it), presumably to improve ILP or prefetch data, I'm not sure.
Yes, that defeats the branch, if it is used as an optimization, and I had to
prevent this by adding compiler fences in those cases. Granted, in my case it
happened with inlined code, but I imagine LTO makes it possible to perform
similar code transformations with out-of-line code as well.

I'm not saying this is what actually happens. I'm just pointing out that even
the code that is apparently unreachable may influence codegen and cause ODR
issues.

> > AFAIK, the standard or libstdc++ docs do not require RTTI for std::locale to
> > function.
> 
> The standard requires RTTI, period. Using -fno-rtti is completely
> non-standard and so the standard has nothing to say about it.

Yes, but the standard is written with implementations in mind.

[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2

2023-12-24 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099

--- Comment #11 from andysem at mail dot ru ---
> I'm not sure what you mean by "the compiler is free to generate code that 
> takes it into account." Takes what into account? What problem does that 
> freedom cause?

I mean the compiler could move (some part of) dynamic_cast to precede the check
for the standard facet. I.e. of something like this:

  template< typename _Facet >
  const _Facet* __try_use_facet(locale const& loc)
  {
const size_t __i = _Facet::id._M_id();
const locale::facet** __facets = __loc._M_impl->_M_facets;
const _Facet* __dyn_facet = __dynamic_cast< const _Facet* >(__facets[__i]);

// checks for every standard facet type
if (__is_same(_Facet, ...))
  return static_cast(__facets[__i]);

return __dyn_facet;
  }

>> I think, a failing dynamic_cast would not be useful as this would make
>> std::use_facet unusable with -fno-rtti.
>
> I don't see a problem with that. If you want to use a polymorphic container 
> of facets, you need RTTI. The standard facets will work, but it doesn't seem 
> reasonable to expect it to work for program-defined facets.

AFAIK, the standard or libstdc++ docs do not require RTTI for std::locale to
function. In fact, the facet::id stuff seems to exist precisely to make RTTI
optional in the implementations. Besides, the code, as it was written, seems to
intend to work without RTTI by switching to static_cast instead of
dynamic_cast. So making std::use_facet always fail would go against that
intention.

> And a std::use_facet that fails seems better than one that segfaults, doesn't 
> it?

No, not really. It still means users cannot use locale without RTTI, just for a
different reason.

> And I guess if the user's derived facet uses virtual inheritance from 
> std::locale::facet, then this could break with the 
> static_cast*> in libstdc++ today. Hmm.

std::ctype non-virtually derives from std::locale::facet, so no, that would not
break. What wouldn't work is this:

   class my_facet : virtual public std::ctype< char > {};

   my_facet const& fac = std::use_facet< my_facet >(loc);

But this would fail in compile time with no RTTI, which *is* better than a
segfault or an exception in runtime.

[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2

2023-12-23 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099

--- Comment #4 from andysem at mail dot ru ---
> It's mostly OK to mix code with -frtti and -fno-rtti, but sometimes it bites 
> you.

Note that in this case, it is the standard library that is built with -frtti
and the rest of the code is built with -fno-rtti. I.e. you *always* get this
sort of mix when you specify -fno-rtti.

[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2

2023-12-23 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099

--- Comment #3 from andysem at mail dot ru ---
I think, a failing dynamic_cast would not be useful as this would make
std::use_facet unusable with -fno-rtti.

Re. ODR violation in the latest code, while it is true that the
dynamic/static_cast is not reachable for the standard facets, it is still part
of the function definition and the compiler is free to generate code that takes
it into account. Note that the shortcuts for the standard facets are
implemented with conditional if-constexpr, which will turn into regular ifs if
libstdc++ is compiled in pre-C++17 mode. Which, I think, is the default, isn't
it? I think, the ODR violation is still worth fixing.

[Bug libstdc++/113099] New: locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2

2023-12-20 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099

Bug ID: 113099
   Summary: locale without RTTI uses dynamic_cast before gcc 13.2
or has ODR violation since gcc 13.2
   Product: gcc
   Version: 11.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider this test case:

```
#include 

class __attribute__((__visibility__("default"))) my_codecvt final :
public std::codecvt< wchar_t, char, std::mbstate_t >
{
public:
explicit my_codecvt(std::size_t refs = 0) :
std::codecvt< wchar_t, char, std::mbstate_t >(refs)
{
}

protected:
bool do_always_noconv() const noexcept override { return false; }

int do_encoding() const noexcept override { return 0; }
std::codecvt_base::result do_in(std::mbstate_t&, const char*,
const char*, const char*&, wchar_t*, wchar_t*, wchar_t*&)
const override
{ return ok; }
std::codecvt_base::result do_out(std::mbstate_t&, const wchar_t*,
const wchar_t*, const wchar_t*&, char*, char*, char*&)
const override
{ return ok; }
std::codecvt_base::result do_unshift(std::mbstate_t&, char*,
char*, char*&) const override
{ return ok; }
int do_length(std::mbstate_t&, const char*, const char*,
std::size_t) const override
{ return 0; }
int do_max_length() const noexcept override { return 0; }
};

int main()
{
std::locale loc(std::locale(), new my_codecvt());
auto const& fac = std::use_facet<
std::codecvt< wchar_t, char, std::mbstate_t > >(loc);
(void)fac;
}
```

```
g++ -std=c++17 -fno-rtti -o locale_no_rtti locale_no_rtti.cpp
```

When compiled with RTTI disabled, with the command line above, this code
crashes with the following backtrace:

```
#0  0x77caccd1 in __dynamic_cast () from
/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x77d5307f in std::codecvt const&
std::use_facet >(std::locale const&)
() from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x539b in main ()
```

This reproduces on gcc 11.4 and 12 at least.

```
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 
```

This was apparently fixed by accident in gcc 13.2 by this commit:

https://github.com/gcc-mirror/gcc/commit/b3ac43a3c05744d62a963d656bed782fc867ad79

The commit introduces shortcuts that use static_casts for the standard facets,
which allows to avoid the crash, but that still retains an ODR violation, where
the explicitly instantiated __try_use_facet templates in libstdc++ library use
dynamic_cast (even if unreachable) and the template definition that is visible
to user's code uses static_cast. The problematic code is here:

https://github.com/gcc-mirror/gcc/blob/d7e9ae4fa94afd5517536b4dfc7d6be0b3e8c2c3/libstdc%2B%2B-v3/include/bits/locale_classes.tcc#L142-L146

When libstdc++ is compiled, RTTI is enabled and __cpp_rtti is defined, but when
user's code is compiled with RTTI disabled, that macro is not defined, so the
__try_use_facet template definition is different.

It doesn't seem like the commit I mentioned above intended to fix the original
issue with dynamic_cast anyway, so I thought it was worth creating this bug
report, even though the original test case passes on the latest gcc.

I think, the code 

[Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM

2023-06-02 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

andysem at mail dot ru changed:

   What|Removed |Added

 CC||andysem at mail dot ru

--- Comment #11 from andysem at mail dot ru ---
(In reply to Andrew Pinski from comment #10)
> (In reply to Peter Dimov from comment #9)
> > I don't think I want WFE here, based on what I read about it. Putting the
> > core to sleep seems like something to do in an embedded system where I have
> > full control of what cores do, not something to do on the application level,
> > in a portable C++ library.
> 
> No, WFE can be used in userspace just fine and in fact it will be
> interrupted every once in a while. yield only sleeps for a few cycles and
> then wakes up, while wfe will sleep until an event happens (also WFE is very
> hypervisor friendly too).

Spin locks are used when latency is a concern and when the protected region is
extremely small (i.e. a few instructions). Putting the core to sleep until the
next interrupt does not seem appropriate for this purpose. x86 pause and ARM
yield are better suited exactly because they wait for a time in the order of
cycles (up to about a hundred on recent x86) rather than microseconds or more.

I can add that in most spin lock implementations I have seen, either yield, nop
or nothing is used for wasting CPU cycles. A few examples:

https://www.boost.org/doc/libs/1_82_0/boost/fiber/detail/cpu_relax.hpp
https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/vdso/processor.h#L11
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/vdso/processor.h#L12
https://chromium.googlesource.com/chromium/src/third_party/WebKit/Source/wtf/+/823d62cdecdbd5f161634177e130e5ac01eb7b48/SpinLock.cpp

The first link has instructions for a few other architectures besides ARM and
x86. I agree with Peter that an architecture-neutral intrinsic could be useful
to avoid this kind of code duplicated in various projects. Although I realize
that specifying the exact behavior of this intrinsic would be difficult, since
even the underlying instructions are defined rather vaguely. However, one thing
is certain: this intrinsic must be a full compiler fence.

[Bug c++/109761] Nested class destructor's noexcept specification incorrectly considered as too loose compared to the outer class

2023-05-06 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109761

--- Comment #2 from andysem at mail dot ru ---
I don't see how completeness of outer is related to nested's destructor. Or put
it another way, how nested's destructor noexcept specification has anything to
do with outer, whether it is completed or not.

[Bug c++/109761] New: Nested class destructor's noexcept specification incorrectly considered as too loose compared to the outer class

2023-05-06 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109761

Bug ID: 109761
   Summary: Nested class destructor's noexcept specification
incorrectly considered as too loose compared to the
outer class
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following piece of code:

template< typename T >
T declval() noexcept;

struct base
{
virtual ~base() {}
};

class outer :
public base
{
protected:
class nested
{
private:
outer& m_outer;

public:
explicit nested(outer& o) noexcept(noexcept(o.on_nested_ctor())) :
m_outer(o)
{
o.on_nested_ctor();
}

nested(nested const&) = delete;
nested& operator= (nested const&) = delete;

~nested() noexcept(noexcept(declval().on_nested_dtor()))
{
m_outer.on_nested_dtor();
}
};

public:
~outer() {}

protected:
void on_nested_ctor();
void on_nested_dtor();
};

This wrongly fails to compile:

$ g++ -std=gnu++17 -o nested_noexcept_dtor.o -c nested_noexcept_dtor.cpp 
nested_noexcept_dtor.cpp:28:9: error: looser exception specification on
overriding virtual function ‘outer::nested::~nested() noexcept (false)’
   28 | ~nested()
noexcept(noexcept(declval().on_nested_dtor()))
  | ^
nested_noexcept_dtor.cpp:6:13: note: overridden function is ‘virtual
base::~base() noexcept’
6 | virtual ~base() {}
  | ^

The compiler tries to match the ~nested() noexcept specification against the
~base()'s as if nested was derived from base, but it is not. ~nested() noexcept
specification has no bearing on the outer class hierarchy.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)

[Bug target/98612] _mm_comieq_sd has wrong semantics

2023-03-24 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98612

andysem at mail dot ru changed:

   What|Removed |Added

 CC||andysem at mail dot ru

--- Comment #9 from andysem at mail dot ru ---
Some relevant discussion here:

https://stackoverflow.com/questions/75818896/mm-comieq-ss-difference-between-clang-and-gcc

Indeed, the description for _mm_comieq_ss in the Intrinsics Guide 3.6.5 as of
2022-01-24 takes NaNs into account:

https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=comiss_expand=1424

I think, the point that the intrinsic should correspond to the naked
instruction is not correct in this case. The intrinsic corresponds to the same
instruction either way, the question is only which flags are considered the
result of the intrinsic. And it seems that the intended behavior (as indicated
by the updated docs, icc and clang) is to check ZF & PF, not just ZF as gcc is
currently doing.

[Bug target/108401] gcc defeats vector constant generation with intrinsics

2023-01-16 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401

--- Comment #7 from andysem at mail dot ru ---
To be clear, I'm not asking the compiler to recognize the particular pattern of
alternating 0x00 and 0xFF bytes. Because hardcoding this particular pattern
won't improve generated code in other cases.

Rather, I'm asking to tune down code transformations for intrinsics. If the
developer wrote a sequence of intrinsics to generate a constant then he
probably wanted that sequence instead of a simple _mm_set1_epi32 or a load from
memory.

But, if you're going to improve constant generation, please make it so that it
can recognize not only the particular pattern described in this bug. More
importantly, it should recognize the all-ones case (as a single pcmpeq) as a
starting point. Then it can apply shifts to achieve the final result from the
all-ones vector - shifts of any width, length or direction, including
psrldq/pslldq. This would improve generated code in a wider range of cases.

[Bug target/108401] gcc defeats vector constant generation with intrinsics

2023-01-16 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401

--- Comment #6 from andysem at mail dot ru ---
(In reply to Andrew Pinski from comment #1)
> >and gcc 12 generates a worse code:
> 
> it is not worse really; depending on the how fast moving between the
> register sets is.

I meant "worse" compared to vpcmpeq+vpsrlw pair.

(Side note about the broadcast version: it could have been smaller if it used a
32-bit constant and vpbroadcastd. vpcmpeq+vpsrlw would still be better in this
particular case, but if broadcast is needed, a smaller footprint code is
preferred.)

[Bug target/108401] New: gcc defeats vector constant generation with intrinsics

2023-01-13 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401

Bug ID: 108401
   Summary: gcc defeats vector constant generation with intrinsics
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following code:

#include 

__m256i load_00FF()
{
__m256i mm = _mm256_setzero_si256();
return _mm256_srli_epi16(_mm256_cmpeq_epi64(mm, mm), 8);
}

This function generates a vector constant of alternating 0xFF and 0x00 bytes.
The code is written this way to avoid a load from memory, which may cause a
cache miss. The expected generated code is this:

vpcmpeqqymm0, ymm0, ymm0
vpsrlw  ymm0, ymm0, 8
ret

which is almost exactly what gcc 8 generates (it uses vpcmpeqd instead of
vpcmpeqq, which is fine). However, gcc 9 through 11 generates a memory load
instead, defeating the attempt to avoid it:

vmovdqa ymm0, YMMWORD PTR .LC0[rip]
ret

and gcc 12 generates a worse code:

movabs  rax, 71777214294589695
vmovq   xmm1, rax
vpbroadcastqymm0, xmm1
ret

In all cases, the compiler flags are: -O3 -march=haswell

Code on godbolt.org: https://gcc.godbolt.org/z/sfT787PY9

I think the compiler should follow the code in intrinsics more closely since
despite the apparent equivalence, the choice of instructions can have
performance implications. The original code that is written by the developer is
better anyway, so it's not clear why the compiler is being so creative in this
case.

[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>

2022-09-18 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

--- Comment #6 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #5)
> (In reply to andysem from comment #3)
> > Is there no way to improve standard components implementation? I'd imagine
> > you could provide the new implementation in the new version inline namespace
> > and still support the old ABI for backward compatibility.
> 
> To give a more complete answer: Inline namespaces don't help, that's a myth.
> 
> struct X { std::optional b; };
> 
> Now one translation unit has X using the old ABI and one has X using the new
> ABI, but they're the same X. The fact that the two versions of optional are
> in different namespaces is no help at all. You still have an ABI break.
> 
> We did it for std::string and it was a multi-year effort that caused
> disruption across the industry. It's not worth doing that again to save a
> few bytes in std::optional.

You could include a hash of all members' namespaces in the X's mangled name. Or
maybe add an attribute that would propagate a tag to the outer class' mangled
name. Yes, that's also an ABI change in itself, but a useful one that would
allow inline namespaces to become useful.

I think a way to improve standard library components is essential. If not
inline namespaces then something else. Otherwise there is no evolution path of
the standard library components.

[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem

2022-09-02 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808

--- Comment #10 from andysem at mail dot ru ---
(In reply to andysem from comment #8)
> (In reply to Jonathan Wakely from comment #7)
> > 
> > Do you want ODR violations? Because that's how you get ODR violations.
> 
> I understand this, but my point is that this is a breaking change,
> apparently even with the constructor being marked explicit...

I take it back, with the constructor marked explicit it doesn't break. Before,
I tested the preprocessed source from gcc 11.2, which didn't have explicit.
Adding the explicit fixes compilation. Sorry for the confusion.

[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem

2022-09-02 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808

--- Comment #8 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #7)
> (In reply to andysem from comment #6)
> > So do you think this is a problem in Boost.Filesystem?
> 
> I don't know yet, I can't reproduce it with the Boost in Fedora 36, and I
> haven't looked further.

I have added a workaround to the current Boost.Filesystem develop and master,
and Boost 1.80 (the latest official release) didn't support std::string_view,
so is not affected either. In order to reproduce you would need the revision I
posted in the original problem description.

> > I would say this is a regression in string_view as the code is valid in
> > pre-C++23, and I would expect it to stay valid in C++23 onwards.
> 
> That range constructor is explicit in the C++23 draft now, but that change
> hasn't been backordered to GCC 12 yet, and apparently doesn't change
> anything if you still see this in trunk.
> 
> > Shouldn't
> > the range constructor be simply disabled when fs::path::iterator is not
> > defined?
> 
> Do you want ODR violations? Because that's how you get ODR violations.

I understand this, but my point is that this is a breaking change, apparently
even with the constructor being marked explicit, and it breaks in a rather
surprising way. Also, apparently MSVC is able to not break somehow as it
doesn't have this issue. So maybe there is a solution that does not introduce
ODR violations and yet still works. I think it would be preferable to fix
std::string_view rather than all its uses like in std/Boost.Filesystem.

[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem

2022-09-01 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808

--- Comment #6 from andysem at mail dot ru ---
So do you think this is a problem in Boost.Filesystem?

I would say this is a regression in string_view as the code is valid in
pre-C++23, and I would expect it to stay valid in C++23 onwards. Shouldn't the
range constructor be simply disabled when fs::path::iterator is not defined?

[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem

2022-09-01 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808

--- Comment #4 from andysem at mail dot ru ---
It fails the same way with 12.2 and trunk on godbolt:

https://gcc.godbolt.org/z/rT6TWhhvP

[Bug libstdc++/106808] std::string_view range concept requirement causes compile error with Boost.Filesystem

2022-09-01 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808

--- Comment #3 from andysem at mail dot ru ---
I don't have the environment to build gcc locally, so I can't readily test
trunk. But I have been told the same issue reproduces with gcc-12
20220319-1ubuntu1 from Ubuntu 22.04:

https://github.com/boostorg/multiprecision/runs/8117686288?check_suite_focus=true#step:17:248

[Bug libstdc++/106808] New: std::string_view range concept requirement causes compile error with Boost.Filesystem

2022-09-01 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106808

Bug ID: 106808
   Summary: std::string_view range concept requirement causes
compile error with Boost.Filesystem
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Created attachment 53528
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53528=edit
Preprocessed code of the reproducer.

The following code fails to compile in C++23 mode:

#include 

int main()
{
boost::filesystem::path p = "foo";
boost::filesystem::path p2 = p;
}

$ g++ -std=c++23 -I. -c fs_concepts_issue.cpp -o fs_concepts_issue.o

In file included from /usr/include/c++/11/bits/stl_iterator_base_types.h:71,
 from /usr/include/c++/11/bits/stl_algobase.h:65,
 from /usr/include/c++/11/bits/char_traits.h:39,
 from /usr/include/c++/11/string:40,
 from /usr/include/c++/11/bits/locale_classes.h:40,
 from /usr/include/c++/11/locale:39,
 from ./boost/filesystem/detail/path_traits.hpp:18,
 from ./boost/filesystem/path.hpp:21,
 from fs_concepts_issue.cpp:1:
/usr/include/c++/11/bits/ranges_base.h: In substitution of ‘template
 requires (__maybe_borrowed_range<_Tp>) && ((is_array_v::type>) || (__member_begin<_Tp>) ||
(__adl_begin<_Tp>)) constexpr auto
std::ranges::__cust_access::_Begin::operator()(_Tp&&) const [with _Tp = const
boost::filesystem::path&]’:
/usr/include/c++/11/bits/ranges_base.h:576:15:   required by substitution of
‘template  requires !(is_same_v<_DRange,
std::basic_string_view>) && (contiguous_range<_Range>) && (sized_range<_Range>)
&& (is_same_v)()))>::type,
std::indirectly_readable_traits)()))>::type>
>::__iter_traits)()))>::type,
std::indirectly_readable_traits)()))>::type>
>::value_type, _CharT>) && !(is_convertible_v<_Range, const _CharT*>) &&
!requires(_DRange& __d) {__d->__conv_op ();} && (!requires{typename
_DRange::traits_type;} || (is_same_v))
constexpr std::basic_string_view::basic_string_view(_Range&&) [with
_Range = wchar_t; _DRange = std::char_traits]’
./boost/filesystem/detail/path_traits.hpp:514:93:   required from ‘constexpr
const bool
boost::filesystem::detail::path_traits::is_convertible_to_std_string_view::value’
./boost/type_traits/disjunction.hpp:31:27:   required from ‘struct
boost::disjunction,
boost::filesystem::detail::path_traits::is_convertible_to_path_source_non_std_string_view
>’
./boost/filesystem/detail/path_traits.hpp:537:8:   required from ‘struct
boost::filesystem::detail::path_traits::is_convertible_to_path_source’
./boost/type_traits/disjunction.hpp:26:8:   required from ‘struct
boost::disjunction
>’
./boost/type_traits/disjunction.hpp:30:8:   required from ‘struct
boost::disjunction,
boost::filesystem::detail::path_traits::is_convertible_to_path_source
>’
./boost/type_traits/conjunction.hpp:31:27:   required from ‘struct
boost::conjunction,
boost::filesystem::detail::path_traits::is_convertible_to_path_source
>,
boost::negation
> >’
./boost/filesystem/path.hpp:249:9:   required by substitution of
‘template boost::filesystem::path::path(const Source&)
[with Source = boost::filesystem::path;  = ]’
fs_concepts_issue.cpp:6:34:   required from here
/usr/include/c++/11/bits/iterator_concepts.h:942:15:   required for the
satisfaction of ‘__member_begin<_Tp>’ [with _Tp = const
boost::filesystem::path&]
/usr/include/c++/11/bits/iterator_concepts.h:942:32:   in requirements with
‘_Tp& __t’ [with _Tp = const boost::filesystem::path&]
/usr/include/c++/11/bits/iterator_concepts.h:942:32: error: satisfaction value
of atomic constraint ‘requires(_Tp& __t)
{{std::ranges::__cust_access::__decay_copy(__t->begin())} -> decltype(auto)
[requires std::input_or_output_iterator<, >];} [with _Tp = const
boost::filesystem::path&]’ changed from ‘false’ to ‘true’
  942 |   concept __member_begin = requires(_Tp& __t)
  |^~
  943 | {
  | ~   
  944 |   { __decay_copy(__t.begin()) } -> input_or_output_iterator;
  |   ~~
  945 | };
  | ~   
In file included from /usr/include/c++/11/string_view:48,
 from /usr/include/c++/11/bits/basic_string.h:48,
 from /usr/include/c++/11/string:55,
 from /usr/include/c++/11/bits/locale_classes.h:40,
 from /usr/include/c++/11/locale:39,
 from ./boost/filesystem/detail/path_traits.hpp:18,
 from ./boost/filesystem/path.hpp:21,
 from fs_concepts_issue.cpp:1:
/usr/include/c++/11/bits/ranges_base.h:576:22: note: satisfaction 

[Bug libstdc++/105857] codecvt::do_length causes unexpected buffer overflow

2022-06-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857

--- Comment #2 from andysem at mail dot ru ---
> outside the [s, s + max_size) range

This should be [from, from_to) range. Sorry, posted a little too soon.

[Bug libstdc++/105857] codecvt::do_length causes unexpected buffer overflow

2022-06-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857

--- Comment #1 from andysem at mail dot ru ---
Created attachment 53089
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53089=edit
Test case to reproduce the problem.

[Bug libstdc++/105857] New: codecvt::do_length causes unexpected buffer overflow

2022-06-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857

Bug ID: 105857
   Summary: codecvt::do_length causes unexpected buffer overflow
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following test case:

#include 
#include 

const std::size_t max_size = 10u;
const char text[] = "
!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

int main()
{
std::locale loc;
std::codecvt< wchar_t, char, std::mbstate_t > const& fac =
std::use_facet< std::codecvt< wchar_t, char, std::mbstate_t > >(loc);
std::mbstate_t mbs = std::mbstate_t();
const char* from = text;
const char* from_to = from + max_size;
std::size_t max = ~static_cast< std::size_t >(0u);
return static_cast< std::size_t >(fac.length(mbs, from, from_to, max));
}

$ g++ -g2 -O0 -o codecvt_length_bug codecvt_length_bug.cpp

Running this causes a crash with a buffer overflow:

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at
./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840)
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737348011840) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737348011840, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x77b56476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x77b3c7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x77b9d6f6 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x77cef943 "*** %s ***: terminated\n") at
../sysdeps/posix/libc_fatal.c:155
#6  0x77c4a76a in __GI___fortify_fail (msg=msg@entry=0x77cef8e9
"buffer overflow detected") at ./debug/fortify_fail.c:26
#7  0x77c490c6 in __GI___chk_fail () at ./debug/chk_fail.c:28
#8  0x77c4a199 in __mbsnrtowcs_chk (dst=, src=, nmc=, len=, ps=,
dstlen=) at ./debug/mbsnrtowcs_chk.c:27
#9  0x77e290d2 in std::codecvt::do_length(__mbstate_t&, char const*, char const*, unsigned long)
const () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x52d3 in std::__codecvt_abstract_base::length (this=0x77f86090, __state=..., __from=0x6040
 "
!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~",
 
__end=0x604a 
"*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~",
__max=18446744073709551615) at /usr/include/c++/11/bits/codecvt.h:219
#11 0x523d in main () at codecvt_length_bug.cpp:14

The problem appears to be that std::codecvt< wchar_t, char, std::mbstate_t
>::do_length() accesses characters outside the [s, s + max_size) range,
apparently using the ~static_cast< std::size_t >(0u) as the size limit. This is
against the do_length() definition in the C++ standard, see
[locale.codecvt.virtuals]/12-14
(http://eel.is/c++draft/locale.codecvt.virtuals#lib:codecvt,do_length):

Effects: The effect on the state argument is as if it called do_­in(state,
from, from_­end, from, to, to+max, to) for to pointing to a buffer of at least
max elements.

That is, max is only referred to as the size of the potential output buffer,
and the source buffer is specified as [from, from_end). There is no requirement
for max to be within [from, from_end) bounds. If I change max to (sizeof(text)
- 1u) then the buffer overflow does not happen.

(As to the purpose of this code, it is supposed to calculate the size, in
bytes, of the initial sequence of complete characters not larger than
max_size.)

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.2.0-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet

[Bug c++/88165] error: default member initializer for 'A::B::m' required before the end of its enclosing class

2022-04-22 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88165

--- Comment #13 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #10)
> (In reply to Fedor Chelnokov from comment #7)
> > This struct definition:
> > ```
> > struct A {
> > struct B {
> > int i = 0;
> > B() {}
> 
> This declares default constructor, meaning that the type is default
> constructible. Period.
> 
> By declaring B(); you assert that B is default constructible.
> 
> > };
> > A(B = {});
> 
> This is valid, because the user-provided default constructor for B is all
> the compiler needs to see to know that B={} is valid.
> 
> > };
> > ```
> > is accepted by GCC, but another one ({} replaced with = default) is not:
> > ```
> > struct C {
> > struct D {
> > int i = 0;
> > D() = default;
> 
> This declares a default constructor that might be defined implicitly by the
> compiler, **or** it might get deleted if the member definitions of D would
> make the implicit default constructor ill-formed. This is obviously very
> different from the case where you declare B(); There is no assertion that D
> is default constructible, the compiler has to deduce whether or not that's
> true. That depends on the default member initializer for D::i. The
> initializer (which is just '0' here) is a "complete class context" which
> means it is not processed until the class D is complete (this allows you to
> use other members, or e.g. sizeof(D) as the initializer).
> 
> A nested class like C::D is not complete until its enclosing class is
> complete. This means the initializer for C::D::i is compiled after C is
> complete. This means whether C::D is default constructible is not known
> until C is complete.
> 
> 
> > };
> > C(D = {});
> 
> This requires checking whether C::D is default constructible, but we are
> still in the body of C, so C is not complete, which means C::D is not
> complete, which means we don't know if C::D is default constructible.

Does it? I thought the default arguments were only evaluated at the point where
they are actually used, i.e. in this case - when `C` constructor is invoked
with no arguments.

> > };
> > ```
> > Demo: https://gcc.godbolt.org/z/WTPdTn1Yf
> > 
> > Could you please explain why? I though that both must be same accepted or
> > same rejected.
> 
> I hope the explanation above helps.

Thank you for the explanation.

If this is how the standard specifies compiler behavior, I wonder if this
should be considered as a defect (DR). There is nothing in
`std::numeric_limits::max()` that depends on the definition of `C`, so
there is no reason to prevent it from compiling.

[Bug c++/102293] [10/11/12 Regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)

2021-09-13 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293

--- Comment #6 from andysem at mail dot ru ---
Ok, thanks for confirming.

[Bug c++/102293] [10/11/12 Regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)

2021-09-13 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293

--- Comment #4 from andysem at mail dot ru ---
(In reply to Martin Liška from comment #3)
> Dup.
> 
> *** This bug has been marked as a duplicate of bug 100161 ***

This bug isn't about a warning but about an ICE. Are you positive this is a
duplicate?

[Bug c++/102293] [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)

2021-09-12 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293

--- Comment #2 from andysem at mail dot ru ---
The code compiles if -std=c++03 is replaced with e.g. -std=c++11.

[Bug c++/102293] [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)

2021-09-12 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293

--- Comment #1 from andysem at mail dot ru ---
I forgot to mention that the same code compiles successfully with gcc 9.4.0.

And here is the output from gcc 11.1.0:

In file included from ./boost/assert.hpp:58,
 from
./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:16,
 from ./boost/xpressive/detail/core/matchers.hpp:46,
 from ./boost/xpressive/detail/core/linker.hpp:35,
 from ./boost/xpressive/detail/static/static.hpp:19,
 from
./boost/xpressive/detail/static/transforms/as_matcher.hpp:18,
 from ./boost/xpressive/detail/static/grammar.hpp:21,
 from ./boost/xpressive/basic_regex.hpp:28,
 from ./boost/xpressive/regex_compiler.hpp:30,
 from ./boost/xpressive/xpressive_dynamic.hpp:17,
 from libs/log/test/run/filt_matches_xpressive.cpp:18:
./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp: In
instantiation of ‘boost::xpressive::detail::simple_repeat_matcher::simple_repeat_matcher(const Xpr&, unsigned int, unsigned int,
std::size_t) [with Xpr =
boost::xpressive::detail::matcher_wrapper >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > > >; Greedy = mpl_::bool_;
std::size_t = long unsigned int]’:
./boost/xpressive/detail/dynamic/dynamic.hpp:216:48:   required from ‘void
boost::xpressive::detail::make_simple_repeat(const
boost::xpressive::detail::quant_spec&,
boost::xpressive::detail::sequence&, const Xpr&) [with BidiIter =
const char*; Xpr =
boost::xpressive::detail::matcher_wrapper >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > > >]’
./boost/xpressive/detail/dynamic/dynamic.hpp:130:31:   required from ‘void
boost::xpressive::detail::dynamic_xpression::repeat_(const
boost::xpressive::detail::quant_spec&,
boost::xpressive::detail::sequence&, mpl_::int_<1>, mpl_::false_)
const [with Matcher =
boost::xpressive::detail::charset_matcher >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > >; BidiIter = const char*;
mpl_::false_ = mpl_::bool_]’
./boost/xpressive/detail/dynamic/dynamic.hpp:96:22:   required from ‘void
boost::xpressive::detail::dynamic_xpression::repeat(const
boost::xpressive::detail::quant_spec&,
boost::xpressive::detail::sequence&) const [with Matcher =
boost::xpressive::detail::charset_matcher >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > >; BidiIter = const char*]’
./boost/xpressive/detail/dynamic/dynamic.hpp:94:18:   required from here
./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:81:37: internal
compiler error: in value_dependent_expression_p, at cp/pt.c:26978
   81 | BOOST_ASSERT(0 != width && unknown_width() != width);
  |  ~~~^~~
./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:81:13: note: in
expansion of macro ‘BOOST_ASSERT’
   81 | BOOST_ASSERT(0 != width && unknown_width() != width);
  | ^~~~
0xe306b3 internal_error(char const*, ...)
???:0
0xe27039 fancy_abort(char const*, int, char const*)
???:0
0xf5208e value_dependent_expression_p(tree_node*)
???:0
0xf520a9 value_dependent_expression_p(tree_node*)
???:0
0x10ed97e tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
???:0
0x10eddd9 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
???:0
0x10ee17a tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
???:0
0x1178f37 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
???:0
0x117910c tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
???:0
0x1179047 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
???:0
0x1178f95 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
???:0
0x1179047 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
???:0
0x1178f95 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
???:0
0x122150a instantiate_decl(tree_node*, bool, bool)
???:0
0xfa3850 instantiate_pending_templates(int)
???:0
0xf9f8f3 c_parse_final_cleanups()
???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

"g++-11"  -ftemplate-depth-1024 -std=c++03 -fPIC -m64 -pthread -O0
-fno-inline -Wall -g -fno-strict-aliasing -DBOOST_ALL_NO_LIB=1
-DBOOST_ATOMIC_DYN_LINK=1 -DBOOST_CHRONO_DYN_LINK=1
-DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_LOG_DYN_LINK=1
-DBOOST_LOG_SETUP_DYN_LINK=1 -DBOOST_LOG_USE_AVX2 -DBOOST_LOG_USE_SSSE3
-DBOOST_TEST_DYN_LINK=1 -DBOOST_TEST_NO_AUTO_LINK=1 -DBOOST_THREAD_BUILD_DLL=1
-DBOOST_THREAD_POSIX -DBOOST_THREAD_USE_DLL=1 -D_XOPEN_SOURCE=600  -I"."
-I"libs/log/test/common"  -c -o

[Bug c++/102293] New: [10 regression] ICE when compiling Boost.Xpressive (in value_dependent_expression_p, at cp/pt.c:26730)

2021-09-12 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102293

Bug ID: 102293
   Summary: [10 regression] ICE when compiling Boost.Xpressive (in
value_dependent_expression_p, at cp/pt.c:26730)
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Created attachment 51443
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51443=edit
Preprocessed output of the failing test.

I'm seeing ICE with gcc 10.3.0 and gcc 11.1.0 when compiling Boost.Log tests
that use Boost.Xpressive. The ICE points at Boost.Xpressive code:

$ "g++-10" -save-temps  -ftemplate-depth-1024 -std=c++03 -fPIC -m64 -pthread
-O0 -fno-inline -Wall -g -fno-strict-aliasing -DBOOST_ALL_NO_LIB=1
-DBOOST_ATOMIC_DYN_LINK=1 -DBOOST_CHRONO_DYN_LINK=1
-DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_LOG_DYN_LINK=1
-DBOOST_LOG_SETUP_DYN_LINK=1 -DBOOST_LOG_USE_AVX2 -DBOOST_LOG_USE_SSSE3
-DBOOST_TEST_DYN_LINK=1 -DBOOST_TEST_NO_AUTO_LINK=1 -DBOOST_THREAD_BUILD_DLL=1
-DBOOST_THREAD_POSIX -DBOOST_THREAD_USE_DLL=1 -D_XOPEN_SOURCE=600  -I"."
-I"libs/log/test/common"  -c -o
"bin.v2/libs/log/test/filt_matches_xpressive.test/gcc-11/debug/cxxstd-03-iso/threadapi-pthread/threading-multi/run/filt_matches_xpressive.o"
"libs/log/test/run/filt_matches_xpressive.cpp"
In file included from ./boost/xpressive/detail/core/matchers.hpp:46,
 from ./boost/xpressive/detail/core/linker.hpp:35,
 from ./boost/xpressive/detail/static/static.hpp:19,
 from
./boost/xpressive/detail/static/transforms/as_matcher.hpp:18,
 from ./boost/xpressive/detail/static/grammar.hpp:21,
 from ./boost/xpressive/basic_regex.hpp:28,
 from ./boost/xpressive/regex_compiler.hpp:30,
 from ./boost/xpressive/xpressive_dynamic.hpp:17,
 from libs/log/test/run/filt_matches_xpressive.cpp:18:
./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp: In
instantiation of ‘boost::xpressive::detail::simple_repeat_matcher::simple_repeat_matcher(const Xpr&, unsigned int, unsigned int,
std::size_t) [with Xpr =
boost::xpressive::detail::matcher_wrapper >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > > >; Greedy = mpl_::bool_;
std::size_t = long unsigned int]’:
./boost/xpressive/detail/dynamic/dynamic.hpp:216:48:   required from ‘void
boost::xpressive::detail::make_simple_repeat(const
boost::xpressive::detail::quant_spec&,
boost::xpressive::detail::sequence&, const Xpr&) [with BidiIter =
const char*; Xpr =
boost::xpressive::detail::matcher_wrapper >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > > >]’
./boost/xpressive/detail/dynamic/dynamic.hpp:130:31:   required from ‘void
boost::xpressive::detail::dynamic_xpression::repeat_(const
boost::xpressive::detail::quant_spec&,
boost::xpressive::detail::sequence&, mpl_::int_<1>, mpl_::false_)
const [with Matcher =
boost::xpressive::detail::charset_matcher >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > >; BidiIter = const char*;
mpl_::false_ = mpl_::bool_]’
./boost/xpressive/detail/dynamic/dynamic.hpp:96:22:   required from ‘void
boost::xpressive::detail::dynamic_xpression::repeat(const
boost::xpressive::detail::quant_spec&,
boost::xpressive::detail::sequence&) const [with Matcher =
boost::xpressive::detail::charset_matcher >, mpl_::bool_,
boost::xpressive::detail::compound_charset > > >; BidiIter = const char*]’
./boost/xpressive/detail/dynamic/dynamic.hpp:94:18:   required from here
./boost/xpressive/detail/core/matcher/simple_repeat_matcher.hpp:81:23: internal
compiler error: in value_dependent_expression_p, at cp/pt.c:26730
   81 | BOOST_ASSERT(0 != width && unknown_width() != width);
  | ~~^~~
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.3.0-1ubuntu1'
--with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release 

[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset

2021-05-12 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477

--- Comment #9 from andysem at mail dot ru ---
(In reply to Martin Sebor from comment #8)
> 
> Submitting a bug for the LTO problem is only helpful if it comes with a test
> case to reproduce it.  I have heard about problems suppressing warnings in
> LTO builds but I'm not aware of it being a design limitation, and other than
> the mention of the inlining caveat (the -Wstringop-overflow example) I can't
> find it mentioned in the LTO section of the manual pointed to by the link.

The warning is very likely appearing as a result of inlining, and since the
compiler inlines based on its own decisions it is difficult to create a test
case that would reproduce the problem.

The bottom line is that diagnostic pragmas don't work (reliably) with LTO, and
fixing that requires the compiler to track pragmas past inlining.

[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset

2021-05-10 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477

--- Comment #7 from andysem at mail dot ru ---
(In reply to Martin Sebor from comment #6)
> In C/C++ the size of the largest object is PTRDIFF_MAX - 1 bytes.  This is
> necessary so that the difference between the address of its first byte and
> that just past its last byte fits in ptrdiff_t.

Sure, but memset, as well as other string operations, takes size_t as an
argument and doesn't have this limitation. They must be able to operate on the
full range of size_t worth of bytes. I don't remember neither C nor C++
declaring a narrowing limit on the allowed size_t range wrt. these functions.

And please note that string functions are often used on raw storage that is
obtained through means other than defined by C++ as operations creating
objects. That storage may be larger than the maximum C++ object size.

> A number of GCC warnings
> point out code that breaks this assumption, including allocations in excess
> of the maximum or calls to functions like memset or memcpy that would access
> more than the maximum.   None of these warnings considers the conditions
> that must be true in order for control to reach the problem statement, so if
> the statement hasn't been eliminated from in the IL (i.e., can be seen in
> the output of -fdump-tree-optimized or whatever pass the warning runs in)
> the warning triggers.  Taking the conditions into consideration and issuing
> "maybe" kinds of warnings (like -Wmaybe-uninitialized) is a work in progress
> but as of now no warning other thanb -Wmaybe-uninitialized does.

Ok, then please consider this bug as a case for adding consideration of
conditions.

> No such distinction as "there's definitely a bug" vs "there may be a bug" is
> feasible because there is, in general, no way to identify functions that are
> defined but never called, or those that are only called with arguments that
> make certain code paths unreachable.  As a result, with few exceptions every
> GCC warning is of the latter kind.

To clarify, what I mean by "there's definitely a bug" kind of warning is when
the compiler has enough data to prove that the operation will result in UB (for
example, cause a buffer overflow). Such as:

char* alloc_string(size_t n)
{
char* p = new char[n]; // forgot about the byte for the terminating 0
std::memset(p, 0, n + 1);
return p;
}

This is notably different from compiler trying to reason about a possible value
of n based on size_t range and issuing warnings based on that. I would like to
see warnings of the former kind but not so much of the latter kind.

If there is no way to separate the two kinds of warnings, and the compiler does
not consider prior conditions that are written *specifically* to prevent UB
that triggers the warning, I'm afraid, I don't see the use for such warnings.

> If you're interested, the following
> two-part article discusses some of the challenges in this area:
> 
> https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings/
> https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-
> part-2/

Thanks, I will read it. I understand that compiler warnings are not easy to
implement right.

> If the #pragma suppression doesn't work with LTO please report that as a
> separate bug.  There's nothing we can do to avoid the warning in this case.

The fact that pragmas don't work with LTO is documented in the -flto
description[1]. My impression is that gcc simply doesn't support warning
control through pragmas when LTO is enabled. Is this not the case?

I can create a bug, but I cannot provide a small repro, since the warning
triggers in a large code base, but not necessarily a small test case. Would
such a bug report be useful?

[1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto

[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset

2021-05-10 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477

--- Comment #5 from andysem at mail dot ru ---
(In reply to Martin Sebor from comment #4)
> The case of _size being very large and n very small may be handled correctly
> by the original code thanks to the check for _capacity but because its value
> isn't known it affects neither the codegen nor the warning.
> 
> The warning is designed to flag bounds greater than PTRDIFF_MAX

But sizes above PTRDIFF_MAX are valid sizes. IMHO, triggering the warning based
on possible sizes is just not the right approach to begin with. If what you're
trying to detect is an integral overflow then *that* is what should be
detected, and pointed to by the warning, not the unrelated memset.

As an illustration to my point, I worked around this warning by replacing
memset with std::fill_n with no other changes to the code. The optimizer will
likely convert it to memset anyway. This shows that (1) if the supposed bug is
an overflow in `_size + 4` then it is no longer being detected and (2) there is
no problem in filling more than PTRDIFF_MAX bytes (as expected, of course).

> and that's
> just what it sees here as is evident from the output of the
> -fdump-tree-optimized option.  There is nothing to fix here.  Code that's
> unreachable as a result of preconditions GCC cannot prove may be susceptible
> to false positives.  That's a problem shared by all flow-sensitive warnings,
> not just in GCC but in all static analyzers with flow analysis.
> 
> In general, GCC warnings are designed to "report constructions that are not
> inherently erroneous but that are risky or suggest there may have been an
> error."  Not every instance of every warning necessarily corresponds to an
> error, and some may even be false positives.

False positives is a very good reason for the warnings to be disabled or
ignored, meaning that they are useless.

Is there any reliable detection implemented in -Wstringop-overflow? Meaning
that, when the compiler can tell "there's definitely a bug", as opposed to
"there may be something fishy going on"? If yes, then it may be sensible to
separate the reliable and speculative parts into different warnings.
Personally, I would like to have a warning when the compiler is certain that
something is wrong, but not the false positives, like shown in this bug. That's
the only thing stopping me from just disabling this warning globally.

> Unhelpful warnings can be
> disabled either globally, on the command line, or on a case by case basis by
> #pragma GCC diagnostic.

#pragma GCC diagnostic doesn't work with LTO. I was actually suppressing that
warning in the real code base with a #pragma, but it no longer works when LTO
is enabled by default in my distro.

> Adding preconditions like 'if (_size >= __PTRDIFF_MAX__ / 4)
> __builtin_unreachable ();' (the 4 should be replaced by sizeof (value_type)
> in the original test case) often helps not just warnings but also codegen. 
> They're not required but can be helpful and preferable to suppression.

Again, that precondition is incorrect.

[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset

2021-05-10 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477

--- Comment #3 from andysem at mail dot ru ---
To put it another way, the case of _size being large and n small is valid, and
memset (and resize in general) do behave correctly in this case. Which is why
the warning is bogus and the suggested assert is incorrect. Whether n became
small is a separate matter unrelated to memset.

You could argue that the line where n becomes small *as a result of overflow*
could issue a warning. For example, if size_t was augmented with some sort of
an attribute. But that is not what -Wstringop-overflow is.

[Bug middle-end/100477] Bogus -Wstringop-overflow warning on memset

2021-05-10 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477

andysem at mail dot ru changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #2 from andysem at mail dot ru ---
(In reply to Martin Sebor from comment #1)
> 
> Since n is set to _size + 4 in test(), (n < _size) holds only if the
> addition wraps around zero, implying _size is excessively large.

If `_size + 4` does overflow, the result of memset is still valid (i.e. it is
filling what it is supposed to fill), and the warning is incorrect.

> The
> warning can be avoided by asserting that that isn't so, e.g., by adding the
> following
> 
>   if (_size >= __PTRDIFF_MAX__ / 4)
> __builtin_unreachable ();
> 
> just before the memset call.

I don't think users should be required to insert non-portable asserts like
these to be able to use memset without warnings.

Besides, this assert is incorrect as it will prevent memset to be called if
(_size >= __PTRDIFF_MAX__ / 4). (Why `/ 4`, BTW?)

I'm reopening because I think, as it currently works, the warning is bogus and
should be fixed. It is not actionable on the user's side and the suggested
workaround is not practical.

[Bug middle-end/100477] New: Bogus -Wstringop-overflow warning on memset

2021-05-07 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100477

Bug ID: 100477
   Summary: Bogus -Wstringop-overflow warning on memset
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following test case:

#include 
#include 

class Container
{
public:
typedef unsigned char value_type;
typedef value_type* pointer;
typedef std::size_t size_type;

Container();

void clear();
void reserve(size_type);

size_type size() const { return _size; }

void resize(size_type n)
{
if(n == 0)
{
clear();
return;
}

if(n > _capacity)
{
reserve(n);
}
else if(_owned && n < _size)
{
std::memset(_buf + n, 0, (_size - n) * sizeof(value_type));
}
_size = n;
}

private:
pointer _buf;
size_type _size;
size_type _capacity;
int _shrinkCounter;
bool _owned;
};

void test(Container& c, int v)
{
Container::size_type position = c.size();
c.resize(position + sizeof(int));
}

$ g++ -Wall -O3 -o memset_warning.o -c memset_warning.cpp 
In file included from /usr/include/string.h:519,
 from /usr/include/c++/10/cstring:42,
 from memset_warning.cpp:2:
In function ‘void* memset(void*, int, size_t)’,
inlined from ‘void Container::resize(Container::size_type)’ at
memset_warning.cpp:32:24,
inlined from ‘void test(Container&, int)’ at memset_warning.cpp:48:13:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:33: warning: ‘void*
__builtin_memset(void*, int, long unsigned int)’ specified bound
18446744073709551612 exceeds maximum object size 9223372036854775807
[-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
  |  ~~~^
   60 |  __glibc_objsize0 (__dest));
  |  ~~

Here, the reported constant 18446744073709551612 is -4, which implies the
compiler is ignoring the `n < _size` check and calculates `(_size - n)` for
memset even though it will never be called.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.3.0-1ubuntu1'
--with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)

[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-23 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #10 from andysem at mail dot ru ---
Thanks. Will this be backported to 10 and 11 branches?

[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-15 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #6 from andysem at mail dot ru ---
Hmm, it looks like the original code has changed enough so that the problem no
longer reproduces, with or without __restrict__. I don't have the older version
of the code, so I can't tell what changed exactly. Data alignment most probably
did change, but data layout of A (its equivalent in the original code) as well
as the operation on it certainly didn't. Sorry for the confusion.

[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-15 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #3 from andysem at mail dot ru ---
I tried adding __restrict__ to the equivalents of x, y1 and y2 in the original
larger code base and it didn't help. The compiler (gcc 10.2) would still
generate the same half-vectorized code.

[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-08 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #1 from andysem at mail dot ru ---
For reference, an ideal version of this code should look something like this:

test(A&, A const&, A const&):
movdqu  (%rsi), %xmm0
movdqu  (%rdi), %xmm1
movdqu  (%rdx), %xmm2
paddd   %xmm1, %xmm0
psubd   %xmm2, %xmm0
movups  %xmm0, (%rdi)
ret

[Bug tree-optimization/99971] New: GCC generates partially vectorized and scalar code at once

2021-04-08 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

Bug ID: 99971
   Summary: GCC generates partially vectorized and scalar code at
once
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following code sample:

struct A
{
unsigned int a, b, c, d;

A& operator+= (A const& that)
{
a += that.a;
b += that.b;
c += that.c;
d += that.d;
return *this;
}

A& operator-= (A const& that)
{
a -= that.a;
b -= that.b;
c -= that.c;
d -= that.d;
return *this;
}
};

void test(A& x, A const& y1, A const& y2)
{
x += y1;
x -= y2;
}

The code, when compiled with options "-O3 -march=nehalem", generates:

test(A&, A const&, A const&):
pushq   %rbp
movdqu  (%rdi), %xmm1
pushq   %rbx
movl4(%rsi), %r8d
movdqu  (%rsi), %xmm0
movl(%rsi), %r9d
paddd   %xmm1, %xmm0
movl8(%rsi), %ecx
movl12(%rsi), %eax
movl%r8d, %esi
movl(%rdi), %ebp
movl4(%rdi), %ebx
movl8(%rdi), %r11d
movl12(%rdi), %r10d
movups  %xmm0, (%rdi)
subl(%rdx), %r9d
subl4(%rdx), %esi
subl8(%rdx), %ecx
subl12(%rdx), %eax
addl%ebp, %r9d
addl%ebx, %esi
movl%r9d, (%rdi)
popq%rbx
addl%r11d, %ecx
popq%rbp
movl%esi, 4(%rdi)
addl%r10d, %eax
movl%ecx, 8(%rdi)
movl%eax, 12(%rdi)
ret

https://gcc.godbolt.org/z/Mzchj8bxG

Here you can see that the compiler has partially vectorized the test function -
it converted "x += y1" to paddd, as expected, but failed to vectorize "x -=
y2". But at the same time the compiler also generated scalar code, including
for the already vectorized "x += y1" line, basically duplicating it.

Note that when either "x += y1" or "x -= y2" is commented, the compiler is able
to vectorize the line that is left. It is also able to vectorize both lines
when the += and -= operators are applied to different objects instead of x.

This is reproducible since gcc 8 up to and including 10.2. gcc 7 doesn't
vectorize this code. With the current trunk on godbolt the generated code is
different:

test(A&, A const&, A const&):
movdqu  (%rsi), %xmm0
movdqu  (%rdi), %xmm1
paddd   %xmm1, %xmm0
movups  %xmm0, (%rdi)
movd%xmm0, %eax
subl(%rdx), %eax
movl%eax, (%rdi)
pextrd  $1, %xmm0, %eax
subl4(%rdx), %eax
movl%eax, 4(%rdi)
pextrd  $2, %xmm0, %eax
subl8(%rdx), %eax
movl%eax, 8(%rdi)
pextrd  $3, %xmm0, %eax
subl12(%rdx), %eax
movl%eax, 12(%rdi)
ret

Here the compiler is able to vectorize "x += y1" but not "x -= y2". At least,
it removed the duplicate scalar version of "x += y1".

Given that the compiler is able to vectorize each line in isolation, I would
expect it to be able to vectorize them combined. Generating duplicate versions
of code is certainly not expected.

[Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()

2021-03-20 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563

--- Comment #8 from andysem at mail dot ru ---
Thanks again.

[Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()

2021-03-16 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563

--- Comment #5 from andysem at mail dot ru ---
Thanks. Will there be a fix in branch 10?

[Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper()

2021-03-12 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563

Bug ID: 99563
   Summary: Code miscompilation caused by _mm256_zeroupper()
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following code:

#include 

constexpr unsigned int block_size = 8u;

float compute_generic(const double* data, unsigned int width, unsigned int
height);

inline __attribute__((always_inline))
float compute_avx(const double* data, unsigned int width, unsigned int height)
{
__m128d mm_res = _mm_setzero_pd();
unsigned long block_count = static_cast< unsigned long >((width +
block_size - 1) / block_size)
* static_cast< unsigned long >((height + block_size - 1) / block_size);

float res = static_cast< float >(_mm_cvtsd_f64(mm_res) / static_cast<
double >(block_count));

_mm256_zeroupper();

return res;
}

float compute(const double* data, unsigned int width, unsigned int height)
{
if (width >= 16 && height >= block_size)
{
return compute_avx(data, width, height);
}
else
{
return compute_generic(data, width, height);
}
}

$ g++ -O2 -march=sandybridge -mno-vzeroupper -o test.o test.cpp

https://gcc.godbolt.org/z/dhr7an

The code compiles to:

compute(double const*, unsigned int, unsigned int):
cmp esi, 15
jbe .L2
cmp edx, 7
jbe .L2
vzeroupper
ret
.L2:
jmp compute_generic(double const*, unsigned int, unsigned int)

which leaves the result of compute() uninitialized if AVX path is taken. The
problem disappears if one of the following is done:

- -O2 is replaced with -O1
- -mno-vzeroupper is removed
- _mm256_zeroupper(); call is removed (the upper bits of vector registers is
left dirty, though)

This is a regression in gcc 10 branch and later, gcc 9.x compiles this
correctly.

[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>

2021-02-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

--- Comment #3 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #1)
> This would be an ABI break, and so not going to happen.

Is there no way to improve standard components implementation? I'd imagine you
could provide the new implementation in the new version inline namespace and
still support the old ABI for backward compatibility.

(In reply to Jonathan Wakely from comment #2)
> If we were going to do this, we could also make std::optional occupy a
> single byte, using one bit for the value and one for the engaged flag.

This would be more problematic as emplace(), value() and operator*() need to
return T&, which would not be possible.

[Bug libstdc++/98978] New: Consider packing _M_Engaged in the tail padding of T in optional<>

2021-02-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

Bug ID: 98978
   Summary: Consider packing _M_Engaged in the tail padding of T
in optional<>
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Using std::optional with some types may considerably increase object sizes
since it adds alignof(T) bytes worth of overhead. Sometimes it is possible to
avoid this overhead if the flag indicating presence of the stored value
(_M_Engaged in libstdc++ sources) is placed in the tail padding of the T
object. This can be done if std::optional constructs an object of a type that
derives from T, which has an additional bool data member that is initialized to
true upon construction. The below code roughly illustrates the idea:

template< typename T >
struct _Optional_payload_base
{
  struct _PresentT : T
  {
const bool _M_Engaged = true;

// Forwarding ctors and other members
  };

  static constexpr size_t engaged_offset = offsetof(_PresentT, _M_Engaged);

  struct _AbsentT
  {
unsigned char _M_Offset[engaged_offset];
const bool _M_Engaged = false;
  };

  union _Storage
  {
_AbsentT _M_Empty;
_PresentT _M_Value;

_Storage() : _M_Empty() {}

// Forwarding ctors and other members
  };

  _Storage _M_payload

  bool is_engaged() const noexcept
  {
return *reinterpret_cast< const bool* >(reinterpret_cast< const unsigned
char* >(&_M_payload) + engaged_offset);
  }
};

The above relies on some implementation details, such as:

- offsetof works for the type T. It does for many types in gcc, beyond what is
required by the C++ standard. Maybe there is a way to avoid offsetof, I just
didn't immediately see it.
- The location of _M_Engaged in both _PresentT and _AbsentT is the same. This
is a property of the target ABI, and AFAICS it should be true at least on x86
psABI and I think Microsoft ABI.

The above will only work for non-final class types, for other types, and where
the above requirements don't hold true, the current code with a separate
_M_Engaged flag would work.

[Bug target/97891] [x86] Consider using registers on large initializations

2020-11-19 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891

--- Comment #5 from andysem at mail dot ru ---
Using a register is beneficial even for bytes and words if there are multiple
of mov instructions. But there has to be a single reg0 for all movs.

I'm not very knowlegeable about gcc internals, but would it be beneficial to
implement this on a higher level than instruction transformation? I.e. so that
instead of this:

a = 0;
b = 0;
c = 0;

we have:

any reg0 = 0; // any represents a type compatible with any fundamental or
enum type
a = reg0;
b = reg0;
c = reg0;

This way, reg0 would be in a single register, and that xorl instruction could
be subject to other tree optimizations.

With tree-level optimization, another thing to note is vectorizer. I know gcc
can sometimes merge adjacent initializations without padding to a larger single
instruction initializazion. For example:

struct A
{
long a1;
long a2;

A() :
a1(0), a2(0)
{
}
};

void test(A* p, unsigned int count)
{
for (unsigned int i = 0; i < count; ++i)
{
p[i] = A();
}
}

test(A*, unsigned int):
testl   %esi, %esi
je  .L1
leal-1(%rsi), %eax
pxor%xmm0, %xmm0
salq$4, %rax
leaq16(%rdi,%rax), %rax
.L3:
movups  %xmm0, (%rdi)
addq$16, %rdi
cmpq%rax, %rdi
jne .L3
.L1:
ret

I would like this to still work.

[Bug target/97891] [x86] Consider using registers on large initializations

2020-11-18 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891

--- Comment #1 from andysem at mail dot ru ---
As a side note, the "xorl %edx, %edx" in the original code should have been
moved outside the loop, as it was in the code with __asm__ block.

[Bug target/97891] New: [x86] Consider using registers on large initializations

2020-11-18 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891

Bug ID: 97891
   Summary: [x86] Consider using registers on large
initializations
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following example code:

struct A
{
long a;
short b;
int c;
char d;
long x;
bool y;
int z;
char* p;

A() :
a(0), b(0), c(0), d(0), x(0), y(false), z(0), p(0)
{}
};

void test(A* p, unsigned int count)
{
for (unsigned int i = 0; i < count; ++i)
{
p[i] = A();
}
}

When compiled with "-O3 -march=nehalem" the generated code is:

test(A*, unsigned int):
testl   %esi, %esi
je  .L1
leal-1(%rsi), %eax
leaq(%rax,%rax,2), %rax
salq$4, %rax
leaq48(%rdi,%rax), %rax
.L3:
xorl%edx, %edx
movq$0, (%rdi)
addq$48, %rdi
movw%dx, -40(%rdi)
movl$0, -36(%rdi)
movb$0, -32(%rdi)
movq$0, -24(%rdi)
movb$0, -16(%rdi)
movl$0, -12(%rdi)
movq$0, -8(%rdi)
cmpq%rax, %rdi
jne .L3
.L1:
ret

https://gcc.godbolt.org/z/TrfWYr

Here, the main loop body between .L3 and .L1 is 60 bytes large, with a
significant amount of space wasted on the $0 constants encoded in mov
instructions. It would be more efficient to use a single zero register in all
member initializations, especially given that %edx is already used like that.

A loop rewritten like this:

for (unsigned int i = 0; i < count; ++i)
{
__asm__
(
"movq%q1, (%0)\n\t"
"movw%w1, 8(%0)\n\t"
"movl%1, 12(%0)\n\t"
"movb%b1, 16(%0)\n\t"
"movq%q1, 24(%0)\n\t"
"movb%b1, 32(%0)\n\t"
"movl%1, 36(%0)\n\t"
"movq%q1, 40(%0)\n\t"
: : "r" (p + i), "q" (0)
);
}

compiles to:

test(A*, unsigned int):
testl   %esi, %esi
je  .L1
leal-1(%rsi), %eax
leaq(%rax,%rax,2), %rax
salq$4, %rax
leaq48(%rdi,%rax), %rdx
xorl%eax, %eax
.L3:
movq%rax, (%rdi)
movw%ax, 8(%rdi)
movl%eax, 12(%rdi)
movb%al, 16(%rdi)
movq%rax, 24(%rdi)
movb%al, 32(%rdi)
movl%eax, 36(%rdi)
movq%rax, 40(%rdi)

addq$48, %rdi
cmpq%rdx, %rdi
jne .L3
.L1:
ret

Here, the loop between .L3 and .L1 only takes 34 bytes, which is nearly half
the original size.

Constant (for example, zero) initialization is a frequently used pattern to
initialize structures, so the sequences like the above are quite wide spread.
Converting cases like this to the use of registers could save some code size
and reduce cache pressure.

[Bug libstdc++/59325] Provide a way to disable deprecated warnings

2020-10-20 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59325

--- Comment #8 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #6)
> (In reply to andysem from comment #2)
> > (In reply to Jonathan Wakely from comment #1)
> > > You can use a #pragma to disable -Wdeprecated locally
> > 
> > But the legacy C++ is used in the library, which code I'd like to avoid
> > changing.
> 
> Is this still a problem? Uses of deprecated features within the library
> itself should no longer emit warnings (I think I've disabled them with
> pragmas).
> 
> So the only uses should be in our own code, which you can add pragmas to.

Yes, this is still a problem. To be clear, the library I was referring to was
not libstdc++ but another library which I have no control of and which is still
using C++03 features. I don't think it will move away from std::auto_ptr any
time soon as it is part of the API.

> If you want to ignore the warning because you know you're not going to use a
> different implementation or a newer standard, you can use -Wno-deprecated to
> disable all such warnings globally or use #pragma to disable them locally.

I don't want to disable _all_ deprecated warnings universally, only those
emitted by libstdc++. For this reason I cannot use -Wno-deprecated. #pragma
also doesn't really work because libstdc++ can be included from any header,
including those from other libraries, for which I don't want to disable
warnings.