[Bug tree-optimization/113718] New: std::bit_cast making the compiler generate unnecessary code.

2024-02-02 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113718

Bug ID: 113718
   Summary: std::bit_cast making the compiler generate unnecessary
code.
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

#include <bit>

void f();

auto const p1 = &f;
auto const p2 = std::bit_cast<void(*)()>(&f);

bool a() {
  return p1 == p2;
}

The code emitted for `a` should be the same as-if `return true;` but the usage
of a "no-op" `std::bit_cast` muddies the waters and the compiler generates:

a():
  cmp QWORD PTR p2[rip], OFFSET FLAT:_Z1fv
  sete al
  ret

FWIW: The following changes make the compiler generate more efficient code:

1. Move `p1` and `p2` inside the body of `a`.
2. Replace `std::bit_cast` with `static_cast`.
3. Remove the cast altogether.

Things get terribly worse if `p1` and `p2` are made `static` and moved inside
the body of `a`.

Given that the compiler can get confused by a "no-op" `std::bit_cast`, I wonder
if it would do the same for more interesting code than this toy example.

https://godbolt.org/z/daWe5Yod8

[Bug middle-end/110906] New: __attribute__((optimize("no-math-errno"))) has no effect.

2023-08-04 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

Bug ID: 110906
   Summary: __attribute__((optimize("no-math-errno"))) has no
effect.
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider this C++ code compiled with -O3:

#include <cmath>

double g(double x) {
  return std::sqrt(x);
}

Usually this does call the library function std::sqrt because x might be
negative and errno needs to be set accordingly. Moreover, with -fno-math-errno
a single sqrtsd instruction is emitted. However, annotating g with

__attribute__((optimize("no-math-errno")))

has no effect. This attribute (and #pragma GCC optimize("no-math-errno") ) used
to work up to gcc 5.5.

https://godbolt.org/z/T1nb11bv5

[Bug tree-optimization/107564] New: Fail to recognize overflow check for addition of __uint128_t operands

2022-11-07 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107564

Bug ID: 107564
   Summary: Fail to recognize overflow check for addition of
__uint128_t operands
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

char f128(__uint128_t m, __uint128_t n) {
#if !defined(USE_BUILTIN_ADD_OVERFLOW)
  m += n;
  return m < n;
#else
  __uint128_t r;
  return __builtin_add_overflow(m, n, &r);
#endif
}

When USE_BUILTIN_ADD_OVERFLOW is undefined, GCC fails to recognise this is an
overflow check and with -O3 generates this:

mov r8, rdi
mov rax, rsi
mov rdi, rax
mov rsi, r8
mov rax, rdx
mov rdx, rcx
add rsi, rax
adc rdi, rcx
cmp rsi, rax
mov rcx, rdi
sbb rcx, rdx
setc al
ret

When USE_BUILTIN_ADD_OVERFLOW is defined, it generates better code but still
suboptimal:

mov r8, rdi
mov rax, rsi
mov rsi, r8
mov rdi, rax
add rsi, rdx
adc rdi, rcx
setc al
ret

For other unsigned integer types GCC generates the same optimal code for both
methods. For instance for uint64_t:

add rdi, rsi
setc al
ret

https://godbolt.org/z/bj4M5no4j
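
For readers unfamiliar with the idiom, here is a minimal sketch of why `m += n; return m < n;` is an overflow check. It uses 32-bit operands so that the exact sum fits in a standard 64-bit type. The function names are illustrative and not part of the report.

```cpp
#include <cstdint>

// Wrap-around idiom from the report, shown here for 32-bit operands.
constexpr bool add_overflows_idiom(std::uint32_t m, std::uint32_t n) {
    m += n;        // unsigned addition wraps modulo 2^32
    return m < n;  // the wrapped sum is below an operand iff a carry occurred
}

// Reference: compute the exact sum in a wider type and compare to the maximum.
constexpr bool add_overflows_wide(std::uint32_t m, std::uint32_t n) {
    return std::uint64_t{m} + std::uint64_t{n} > 0xFFFFFFFFu;
}

static_assert(add_overflows_idiom(0xFFFFFFFFu, 1u) ==
              add_overflows_wide(0xFFFFFFFFu, 1u),
              "both checks agree when the sum wraps");
```

The same reasoning applies to __uint128_t, where no wider type is available and the carry flag is the only cheap way to detect the wrap.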

[Bug tree-optimization/104539] Failed to inline a very simple template function when it's explicit instantiated.

2022-02-14 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104539

--- Comment #1 from Cassio Neri  ---
Sorry, the last snippet above should be

template 
inline
int f() {
return 0;
}

[Bug tree-optimization/104539] New: Failed to inline a very simple template function when it's explicit instantiated.

2022-02-14 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104539

Bug ID: 104539
   Summary: Failed to inline a very simple template function when
it's explicit instantiated.
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

template 
//inline
int f() {
return 0;
}

int g() {
return f<0>() + 1;
}

Using -O3, I'd expect f to be inlined in g and this is indeed the case:

g():
  mov eax, 1
  ret

However, if f is explicitly instantiated:

template unsigned f<0>();

then we get a function call (or a jmp if tail call optimisation is possible)

g():
  sub rsp, 8
  call int f<0>()
  add rsp, 8
  add eax, 1
  ret

A (quite unusual, IMHO) workaround is declaring f as inline:

template 
inline
unsigned f() {
return n;
}

https://godbolt.org/z/TarsTY3zb

[Bug tree-optimization/104444] New: Missing constant folding in shift expression.

2022-02-08 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104444

Bug ID: 104444
   Summary: Missing constant folding in shift expression.
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

#include <cstdint>

inline bool f(uint32_t m, int n) {
  return (m >> n) != 0;
}

bool g(int n) {
  return f(1 << 24, n);
}

g can be optimised to "return n <= 24". LLVM does that but gcc doesn't.
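
The claimed equivalence can be checked at compile time for every shift count that has defined behaviour. This is only a sketch: f and g are copied from the report (with constexpr added), and the helper g_matches_comparison is a hypothetical name.

```cpp
#include <cstdint>

constexpr bool f(std::uint32_t m, int n) {
    return (m >> n) != 0;
}

constexpr bool g(int n) {
    return f(1 << 24, n);
}

// Compares g(n) with (n <= 24) for each shift count in [0, 32),
// the only range in which the shift is defined for a 32-bit operand.
constexpr bool g_matches_comparison() {
    for (int n = 0; n < 32; ++n)
        if (g(n) != (n <= 24))
            return false;
    return true;
}

static_assert(g_matches_comparison(), "g(n) equals n <= 24 on [0, 32)");
```

Outside [0, 32) the shift is undefined, so a compiler is free to return n <= 24 there as well.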

The example above led me to another missed optimisation opportunity based on
undefined behaviour. (Perhaps a matter for another report?)

bool h(uint32_t m, int n) {
  return (n >= 0 && n < 32) || (m >> n) != 0;
}

If (n >= 0 && n < 32) is false, then (m >> n) is UB (in C++, probably also in
C). Therefore, h can be optimised to "return true" but gcc doesn't do that
(neither does LLVM).

See here: https://godbolt.org/z/hx9vGe6Kj

If confirmed, these bugs could be added to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987

Potentially related:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95817
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94789#c1

[Bug tree-optimization/101436] Yet another bogus "array subscript is partly outside array bounds"

2021-07-13 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101436

--- Comment #3 from Cassio Neri  ---
Because of the typeid check the unsafe static_cast never happens and I think
the compiler should not be warning about a problem that doesn't exist. Besides,
there's no array involved in this code. I appreciate the attempt to emit a good
warning that might improve my code but the message is completely misleading and
makes me scratch my head. Here the code is minimal and it is obvious that
there's no array. In a large code base I could spend a long time looking for
an array that doesn't exist or I could find an array that has no issue but the
compiler makes me think it has.

Re using a dynamic_cast: I could surely use a dynamic_cast in real code but
this is a compiler test case. IMHO, it should be minimal, straight to the point
at the expense of neglecting other aspects of the language (e.g. better
practices) that could otherwise divert the attention. As I said the virtual
destructor in A and the typeid check were there to avoid obvious UB that would
happen had I unconditionally performed the static_cast. In that UB case
(provided the message were clearer and not misleading talking about arrays) I'd
be very grateful for getting the warning. 

Notice also that I provided a couple of changes that don't make the code any
better w.r.t. an unsafe static_cast (which, again, is never performed). These
changes make the spurious warning go away (which is good) and this shows
that there's certainly something wrong with the logic that decides to emit the
warning for the code as originally posted.

What about this example which involves no virtual method and where dynamic_cast
cannot help?

struct A {
  int type;
};

struct C1 {
  int i;
  int j;
};

struct C2 {
  int i;
};

template <class T, int i>
struct B : A {
  B() : A{i} {}
  T x;
};

using BC1 = B<C1, 1>;
using BC2 = B<C2, 2>;

void do_something(int);
BC2 get_BC();

void h(A& a) {
  if (a.type == 1) {
    BC1& b = static_cast<BC1&>(a);
    int i = b.x.i;
    do_something(i);
  }
  else if (a.type == 2) {
    BC2& b = static_cast<BC2&>(a);
    int i = b.x.i;
    do_something(i);
  }
}

void foo() {
  auto x = get_BC();
  h(x);
}

Here again, there are changes that make the code no better w.r.t. a potential
unsafe cast but do make the warning go away:

1) Change the return type of get_BC to BC1.
2) Remove C1::j.
3) Remove the extra level of indirection given by template class B. (See [2].)

[1] Example above: https://godbolt.org/z/Tha3M6xq3
[2] Example with no template: https://godbolt.org/z/nWsPvTrYr

[Bug tree-optimization/101436] New: Yet another bogus "array subscript is partly outside array bounds"

2021-07-13 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101436

Bug ID: 101436
   Summary: Yet another bogus "array subscript is partly outside
array bounds"
   Product: gcc
   Version: 11.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

This bogus warning was reported at least twice recently: #98266 and #101374.
Below is a new case that, it seems, hasn't been addressed yet.

#include <typeinfo>

struct A {
  virtual ~A();
};

template <class T>
struct B : A {
  T x;
};

struct C1 {
  int i;
  double j;
};

struct C2 {
  int i;
};

void do_something(int);
B<C2> get_BC2();

void h(A& a) {
  if (typeid(a) == typeid(B<C1>)) {
    B<C1>& b = static_cast<B<C1>&>(a);
    int i = b.x.i;
    do_something(i);
  }
}

void foo() {
  B<C2> x = get_BC2();
  h(x);
}

Compiled with '-O3 -Warray-bounds' yields:

<source>: In function 'void foo()':
<source>:27:9: warning: array subscript 'B<C1>[0]' is partly outside array
bounds of 'B<C2> [1]' [-Warray-bounds]
   27 |     int i = b.x.i;
      |         ^
<source>:33:9: note: while referencing 'x'
   33 |   B<C2> x = get_BC2();

FWIW:

1) This is a regression from GCC 10.3.

2) The warning goes away if any of the following changes are made:
  * Remove C1::j.
  * Change type of C1::j to any of int, char, bool, unsigned or float. (Perhaps
any type T such that sizeof(T) <= sizeof(int)).
  * Compile with '-fPIC' (however, if h is marked inline then the warning comes
back).

3) If b is declared as B<C1> (as opposed to B<C1>&), then the warning points to
line 'struct B : A {'.

4) The test case could be simplified further by removing A's virtual destructor
and the typeid check. However, this would make the code invoke UB and I hope
the code above doesn't.

5) #98266 regards virtual inheritance, which does not appear here, and a test
case therein issues no warning when compiled with GCC 11.1.

6) IIUC the warning reported by #101374 happens in GCC's own code and was
caused by some recent change that is not part of GCC 11.1. Indeed a test case
reported therein compiles fine with GCC 11.1 whereas the one above doesn't.

See also:

Test case above: https://godbolt.org/z/n4obaohPs
Test case from  #98266: https://godbolt.org/z/PEjfhs3T6
Test case from #101374: https://godbolt.org/z/Ebb8YszT5

[Bug tree-optimization/101225] New: Example where y % 16 == 0 seems more expensive than y % 400 == 0.

2021-06-26 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101225

Bug ID: 101225
   Summary: Example where y % 16 == 0 seems more expensive than y
% 400 == 0.
   Product: gcc
   Version: 11.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider this implementation of is_leap_year:

bool is_leap_year_1(short year) {
  return year % 100 == 0 ? year % 400 == 0 : year % 4 == 0;
}

If a number is multiple of 100, then it's divisible by 400 if and only if it's
divisible by 16. Since checking divisibility by 16 is cheap, one would expect
the following version to be more efficient (at least, not worse):

bool is_leap_year_2(short year) {
  return year % 100 == 0 ? year % 16 == 0 : year % 4 == 0;
}

According to [1] the latter is 1.4x slower than the former.
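
The equivalence of the two versions can be sanity-checked at compile time. The sketch below repeats both functions (with constexpr added) and compares them on every multiple of 100 representable in a 16-bit short, which is the only place where they take different paths. The helper name versions_agree_on_centuries is hypothetical.

```cpp
constexpr bool is_leap_year_1(short year) {
    return year % 100 == 0 ? year % 400 == 0 : year % 4 == 0;
}

constexpr bool is_leap_year_2(short year) {
    return year % 100 == 0 ? year % 16 == 0 : year % 4 == 0;
}

// A multiple of 100 is a multiple of 25, and 400 = 16 * 25 with 16 and 25
// coprime, so for such years "divisible by 400" and "divisible by 16" agree.
constexpr bool versions_agree_on_centuries() {
    for (int y = -32700; y <= 32700; y += 100)
        if (is_leap_year_1(static_cast<short>(y)) !=
            is_leap_year_2(static_cast<short>(y)))
            return false;
    return true;
}

static_assert(versions_agree_on_centuries(), "both versions agree");
```

For years that are not multiples of 100, both functions evaluate the same expression, so no check is needed there.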

The emitted instructions with -O3 [2] don't seem bad and, except for a leal vs.
addw, the difference is a localized strength-reduction from "y % 400 == 0" to
"y % 16 == 0":

is_leap_year_1(short):
  imulw $23593, %di, %ax
  leal 1308(%rax), %edx
  rorw $2, %dx
  cmpw $654, %dx
  ja .L2
  addw $1296, %ax # Begin: year % 400 == 0
  rorw $4, %ax    #
  cmpw $162, %ax  #
  setbe %al       # End  : year % 400 == 0
  ret
.L2:
  andl $3, %edi
  sete %al
  ret

is_leap_year_2(short):
  imulw $23593, %di, %ax
  addw $1308, %ax
  rorw $2, %ax
  cmpw $654, %ax
  ja .L6
  andl $15, %edi # Begin: y % 16 == 0
  sete %al       # End  : y % 16 == 0
  ret
.L6:
  andl $3, %edi
  sete %al
  ret

FWIW: My educated **guess** is that the issue is the choice of registers: for
version 1 just after leal, the register rax/ax/al is free and regardless of the
branch taken, the CPU can continue the calculation of "y % 100 == 0" in
parallel with the other divisibility check, up to "sete %al". For version 2,
rax/ax/al is busy during the whole execution of "y % 100" and "sete %al" can't
be preemptively executed. As a test for my theory I reimplemented half of
is_leap_year_2 in inline asm (see in [1] and [2]) using similar choices of
registers as in is_leap_year_1 and I got the performance boost that I was
expecting.

[1] https://quick-bench.com/q/3U8t4qzXxtSpsehbWNOh3SWxBGQ
[2] https://godbolt.org/z/jfK3j5777

Note: [1] runs GCC 10.2 but the same happens on GCC 11.0.0.

[Bug tree-optimization/88797] [9 Regression] Unneeded branch added when function is inlined (function runs faster if not inlined)

2021-05-14 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797

--- Comment #13 from Cassio Neri  ---
FWIW: This seems to have been fixed since 10.1. As we can see in [1], on
version 10.1, test_f has no unnecessary branches, as opposed to version 9.3.

[1] https://godbolt.org/z/h87Efbanb

As far as I'm concerned, you could close the ticket.

[Bug middle-end/93634] Improving modular calculations (e.g. divisibility tests).

2020-02-11 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634

--- Comment #1 from Cassio Neri  ---
FYI, this is what clang trunk generates:
  imull $-1431655765, %edi, %eax # imm = 0xAAAAAAAB
  addl $1431655764, %eax # imm = 0x55555554
  rorl %eax
  cmpl $715827882, %eax # imm = 0x2AAAAAAA
  setb %al
  retq

[Bug middle-end/93634] New: Improving modular calculations (e.g. divisibility tests).

2020-02-08 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634

Bug ID: 93634
   Summary: Improving modular calculations (e.g. divisibility
tests).
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

bool f(unsigned n) { return n % 6 == 4; }

at -O3 the code generated for x86_64 is

mov    %edi,%eax
mov    $0xaaaaaaab,%edx
imul   %rdx,%rax
shr    $0x22,%rax
lea    (%rax,%rax,2),%eax
add    %eax,%eax
sub    %eax,%edi
cmp    $0x4,%edi
sete   %al
retq

whereas it could be

sub    $0x4,%edi
imul   $0xaaaaaaab,%edi,%edi
ror    %edi
cmp    $0x2aaaaaa9,%edi
setbe  %al
retq

Notice the latter is quite similar to what gcc generates for n % 6 == 3:

imul   $0xaaaaaaab,%edi,%edi
sub    $0x1,%edi
ror    %edi
cmp    $0x2aaaaaaa,%edi
setbe  %al
retq

It's true that there's a small mathematical difference for the cases r <= 3 and
r >= 4 but not enough to throw away the faster algorithm. I reckon this is not
obvious and I refer to
https://accu.org/var/uploads/journals/Overload154.pdf#page=13 which presents
the overall idea and some benchmarks. In addition, it makes some comments on
gcc's generated code for other cases of n % d == r. References therein provide
mathematical proofs and extra benchmarks.
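
As a rough illustration, the proposed instruction sequence for n % 6 == 4 can be transcribed into C++ and compared against the plain remainder. This is a sketch assuming 32-bit unsigned arithmetic; the names rotate_right_1, is_4_mod_6 and agrees_on are hypothetical and not taken from gcc or the referenced article.

```cpp
#include <cstdint>

// Rotate a 32-bit value right by one bit (the "ror" instruction).
constexpr std::uint32_t rotate_right_1(std::uint32_t x) {
    return (x >> 1) | (x << 31);
}

// sub 4; imul 0xaaaaaaab; ror 1; cmp 0x2aaaaaa9; setbe
// 0xAAAAAAAB is the inverse of 3 modulo 2^32, so a multiple of 6 is mapped
// to its quotient and every other value lands above the threshold.
constexpr bool is_4_mod_6(std::uint32_t n) {
    return rotate_right_1((n - 4u) * 0xAAAAAAABu) <= 0x2AAAAAA9u;
}

// Compares the fast test with n % 6 == 4 on `count` consecutive values
// starting at `first` (the range is allowed to wrap around 2^32).
constexpr bool agrees_on(std::uint32_t first, std::uint32_t count) {
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::uint32_t n = first + i;
        if (is_4_mod_6(n) != (n % 6 == 4))
            return false;
    }
    return true;
}

static_assert(agrees_on(0u, 1000u), "agrees on small values");
```

The threshold is 0x2aaaaaa9 rather than 0x2aaaaaaa because for n < 4 the subtraction wraps, and n = 0 wraps to 0xFFFFFFFC, which is a multiple of 6 and must be rejected.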

FWIW:

1) This relates to bug 82853 and bug 12849 and to a lesser extent bug 89845.

2) Specifically, it confirms the idea (for unsigned integers) described by Orr
Shalom Dvory in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853#c33

[Bug libstdc++/92124] New: std::vector copy-assigning when it should move-assign.

2019-10-16 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92124

Bug ID: 92124
   Summary: std::vector copy-assigning when it should move-assign.
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider two vectors a and rv. In the situation below a = std::move(rv)
copy-assigns elements of rv into a, violating
[container.requirements.general]/4, Table 83: "All existing elements of a are
either move assigned to or destroyed" (See [1].)

It happens with std::vector<X, A<X>> such that:

1) X's move-constructor might throw (though I'm assigning and not
constructing);
2) A does not propagate on move-assignment and allocators used by source and
target vectors do not compare equal.

The following MCVE contains some boilerplate and the most important parts are
indicated by comments.

#include <cstdio>
#include <memory>
#include <type_traits>
#include <vector>

struct X {
    X() = default;
    X(const X&) = default;

    // Move constructor might throw
    X(X&&) noexcept(false) {} // "= default" changes reported behaviour

    // Tracking calls to assignment functions
    X& operator=(const X&) {
        putchar('c'); return *this;
    }
    X& operator=(X&&) noexcept(true) {
        putchar('m'); return *this;
    }
};

unsigned counter = 0;

template <class T>
struct A : std::allocator<T> {

    template <class U>
    struct rebind { using other = A<U>; };

    A() : std::allocator<T>(), id(++counter) {}

    // Does not propagate
    using propagate_on_container_move_assignment = std::false_type;

    // Does not always compare equal
    using is_always_equal = std::false_type;
    bool operator ==(const A& o) { return id == o.id; }
    bool operator !=(const A& o) { return id != o.id; }

    unsigned id;
};

int main() {
    std::vector<X, A<X>> a(2), rv(2);
    a = std::move(rv);
}

Running the code above outputs "cc" (instead of "mm") confirming the two
elements of rv are copy-assigned into a.

See relevant discussion in [2] (with link to possible culprit lines of code in
libstdc++) and a live example of the code above in [3].

[1]
https://timsong-cpp.github.io/cppwp/n4659/container.requirements#tab:containers.container.requirements
[2]
https://stackoverflow.com/questions/58378051/issue-when-compiling-libstdc-with-clang?noredirect=1#comment103136248_58378051
[3] https://godbolt.org/z/EgkPrP

[Bug c++/91158] "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))"

2019-07-14 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91158

--- Comment #7 from Cassio Neri  ---
(In reply to Jakub Jelinek from comment #4)
Got it! Thank you, Mark and Jonathan. Please, feel free to close the ticket.

[Bug c++/91158] "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))"

2019-07-14 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91158

--- Comment #3 from Cassio Neri  ---
Forget my use case and comments on dead code elimination. That was a
digression. (My bad.) In general, I don't expect `if` and `if constexpr` to
behave the same but I do in this particular case. (I might be wrong.) Finally,
since `__builtin_constant_p` is not standard this ticket is a feature request,
not a bug report.

My reasoning is this: when the compiler sees `static_assert(f1(1));` it enters
a "constexpr evaluation mode" (not sure this is the right terminology but you
get the point). At this moment, regardless of optimization level the compiler
must propagate `1` to `f1` otherwise (generally speaking) it cannot evaluate
`f1(1)` and decide whether the `static_assert` passes or not. Therefore, when
it enters `f1` and sees `if constexpr (__builtin_constant_p(n))` it is already
in "constexpr evaluation mode" (so `constexpr` here is redundant) and it knows
`n == 1`. Hence, it should evaluate `__builtin_constant_p(n)` to `1`.

To make clear that my point is not that `if` and `if constexpr` should always
work the same, please, contrast with this program:

int main() {
    if (f0(1))           puts("if   : yes");
    else                 puts("if   : no");
    if constexpr (f0(1)) puts("if ce: yes");
    else                 puts("if ce: no");
}

The output in -O0 mode is `if   : no` and `if ce: yes`. Since the first `if` is
not `constexpr`, the compiler doesn't need to enter "constexpr evaluation mode"
and -O0 is too low for `1` to be propagated to `f0`. The second `if`, on the
other hand, is `constexpr`. The compiler enters "constexpr evaluation mode",
propagates `1` to `f0` and evaluates `if (__builtin_constant_p(n))` to `1`
regardless that this `if` is not `constexpr`.

Also, to make clear I'm OK with `if constexpr (__builtin_constant_p(n))`
evaluating to `0` even in -O3 level, consider this:

int main() {
    if (f0(1))   puts("if   : yes");
    else         puts("if   : no");
}

The output is `if   : no`. Since the `if` is not `constexpr`, constant
propagation is up to the optimizer (QoI issue) and I'm OK if it enters
"constexpr evaluation mode" only inside `f1` (when it sees `if constexpr
(__builtin_constant_p(n))`), at which point it is too late to know the value of
`n`, and considers `n` as non constant.

Finally, I would like to link this issue with bug 70552 comment 5. Martin
Sebor, commenting on a patch for another related issue says:

"The patch referenced from it sets a precedent for the intrinsic treating
constant expressions as constant despite its late evaluation under "normal"
circumstances".

IIUC, it says that `__builtin_constant_p(expr)` always evaluates to `1` if
expr is a C++ constant expression (e.g. a call to a `constexpr` function).
Similarly, I believe that in "constexpr evaluation mode", almost every
evaluation of `__builtin_constant_p(expr)` in the taken path should yield `1`.
(There are exceptions, notably, when `expr` is a non `constexpr` local
variable.)

[Bug c++/91158] New: "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))"

2019-07-13 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91158

Bug ID: 91158
   Summary: "if (__builtin_constant_p(n))" versus "if constexpr
(__builtin_constant_p(n))"
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

constexpr bool f0(int n) {
    if (__builtin_constant_p(n))
        return true;
    return false;
    // Alternatively:
    // return __builtin_constant_p(n) ? true : false;
    // return __builtin_constant_p(n);
}
constexpr bool f1(int n) {
    if constexpr (__builtin_constant_p(n))
        return true;
    return false;
}
static_assert( f0(1));
static_assert( f1(1)); // gcc 9.1 fails

I would expect both static_asserts to pass, that is, I would expect no
difference in behaviour between f0 and f1. (FWIW, for gcc 9.1, -O0, -O1, -O2
and -O3 all behave the same.)

This might be an issue with different moments where 'if constexpr' and
__builtin_constant_p are evaluated. (Similarly to bug 19449 comment 2.) In any
case, I find f1 very misleading. My real use case is like

if (__builtin_constant_p(n))
    // efficient code
else
    // less efficient code

However, branching at runtime is unacceptable and, if the compiler does not
know the value of n it's preferable to drop the 'if-else' altogether and live
with the less efficient code. Willing to avoid branching at runtime is a big
hint for using 'if constexpr' but, as things stand, this implies *never* using
the more efficient code.

Although a regular 'if' does what I want, I don't get the assurance that 'if
constexpr' provides about no branching at runtime. Instead, I need to rely on
the optimizer rather than on the semantics of C++.

See also bug 54021.

[Bug middle-end/12849] testing divisibility by constant

2019-05-31 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849

--- Comment #7 from Cassio Neri  ---
Thanks for implementing the modular inverse algorithm in gcc. However, the
implementation has an issue. In some cases, for no obvious reason, the compiler
falls back to the old algorithm. For instance,

bool f1(unsigned n) { return n % 10 == 5; }

as expected, uses the modular inverse algorithm and translates to

f1(unsigned int):
  imull $-858993459, %edi, %edi
  subl $1, %edi
  rorl %edi
  cmpl $429496729, %edi
  setbe %al
  ret

whereas

bool f2(unsigned n) { return n % 10 == 6; }

doesn't use the modular inverse algorithm and is the same as in older versions
of gcc:

f2(unsigned int):
  movl %edi, %eax
  movl $3435973837, %edx
  imulq %rdx, %rax
  shrq $35, %rax
  leal (%rax,%rax,4), %eax
  addl %eax, %eax
  subl %eax, %edi
  cmpl $6, %edi
  sete %al
  ret

See on godbolt: https://godbolt.org/z/u-C54I

I would like to make another observation. For some divisors (e.g. 7, 19, 21) the
modular inverse algorithm seems to be faster than the traditional one even when
the remainder r (in n % d == r) is not a compile time constant. In general this
happens in cases where the "magic number" M used by the traditional algorithm
to replace the division "n / d" with "n * M >> k" is such that M doesn't fit in
a register and extra operations are required to overcome this problem. In other
words, these are the divisors for which '"Add" indicator' in
https://www.hackersdelight.org/magic.htm shows 1.
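
For comparison, the traditional sequence emitted for f2 can be transcribed into C++ roughly as follows. The constant 3435973837 and the shift by 35 are read off the assembly above; the function names div10 and is_6_mod_10 are hypothetical, and the sketch assumes 32-bit unsigned.

```cpp
#include <cstdint>

// Quotient n / 10 computed as (n * M) >> 35 with M = 3435973837,
// mirroring the movl / imulq / shrq instructions in f2.
constexpr std::uint32_t div10(std::uint32_t n) {
    return static_cast<std::uint32_t>((std::uint64_t{n} * 3435973837u) >> 35);
}

// Remainder recovered as n - 10 * (n / 10) and compared with 6,
// mirroring the leal / addl / subl / cmpl instructions in f2.
constexpr bool is_6_mod_10(std::uint32_t n) {
    return n - div10(n) * 10u == 6u;
}

static_assert(div10(4294967295u) == 429496729u, "quotient of the largest input");
static_assert(is_6_mod_10(16u) && !is_6_mod_10(15u), "spot checks");
```

Here M fits in 32 bits, so a single widening multiply suffices. The divisors mentioned above (7, 19, 21) are those where it does not, which is what makes the extra fix-up instructions necessary.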

I made some measurements and I hope to make my results available for your
consideration soon.

[Bug tree-optimization/90447] Missed opportunities to use adc (worse when -1 is involved)

2019-05-13 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90447

--- Comment #1 from Cassio Neri  ---
Forgot to mention this discussion on SO:

https://stackoverflow.com/questions/56101507/is-there-anything-special-about-1-0x-regarding-adc

[Bug tree-optimization/90447] New: Missed opportunities to use adc (worse when -1 is involved)

2019-05-12 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90447

Bug ID: 90447
   Summary: Missed opportunities to use adc (worse when -1 is
involved)
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

The following are three attempts to get gcc to generate adc instructions from
C++:

#include <immintrin.h>

unsigned constexpr X = 0;

unsigned f1(unsigned a, unsigned b) {
    b += a;
    auto c = b < a;
    b += X + c;
    return b;
}

unsigned f2(unsigned a, unsigned b) {
    b += a;
    b += X + (b < a);
    return b;
}

unsigned f3(unsigned a, unsigned b) {
    b += a;
    unsigned char c = b < a;
    _addcarry_u32(c, b, X, &b);
    return b;
}

The 3 functions above (-O3 -std=c++17) generate:

  addl %edi, %esi
  movl %esi, %eax
  adcl $0, %eax
  ret

This is great and I would expect that changing X would only affect the
immediate value and nothing more. I was wrong. Changing X to 1, makes f1 and f3
change as I expected but f2 becomes:

f2(unsigned int, unsigned int):
  xorl %eax, %eax
  addl %edi, %esi
  setc %al
  addl $1, %eax
  addl %esi, %eax
  ret

I thought I could blame "b += X + (b < a);" for being undefined behaviour.
However, I believe that, at least in c++17 this is not the case given the
addition of this sentence:

"The right operand is sequenced before the left operand."

to [expr.ass]. As far as Standard C++ is concerned, I expect f1 to be
equivalent to f2.
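
A small compile-time check illustrates that f1 and f2 compute the same value for the three constants discussed. This is a sketch only: turning X into a template parameter and marking the functions constexpr are my changes, made so that several values of X can be tested side by side.

```cpp
// f1 from the report, with the global constant X turned into a template
// parameter (an assumption made for this illustration).
template <unsigned X>
constexpr unsigned f1(unsigned a, unsigned b) {
    b += a;
    bool c = b < a;   // carry out of the first addition
    b += X + c;
    return b;
}

// f2 from the report: the carry is computed inside the compound assignment.
template <unsigned X>
constexpr unsigned f2(unsigned a, unsigned b) {
    b += a;
    b += X + (b < a);
    return b;
}

// With a = ~0u and b = 2 the first addition wraps to 1 and produces a carry.
static_assert(f1<0u>(~0u, 2u) == 2u && f2<0u>(~0u, 2u) == 2u, "X == 0");
static_assert(f1<1u>(~0u, 2u) == 3u && f2<1u>(~0u, 2u) == 3u, "X == 1");
static_assert(f1<~0u>(~0u, 2u) == 1u && f2<~0u>(~0u, 2u) == 1u, "X == -1");
```

Since the results are identical, any difference in the generated code is a matter of optimisation rather than semantics.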

Things got worse when X == -1:

f1(unsigned int, unsigned int):
  xorl %eax, %eax
  addl %edi, %esi
  setc %al
  leal -1(%rax,%rsi), %eax
  ret
f2(unsigned int, unsigned int):
  xorl %eax, %eax
  addl %edi, %esi
  setnc %al
  subl %eax, %esi
  movl %esi, %eax
  ret
f3(unsigned int, unsigned int):
  addl %esi, %edi
  movl $-1, %eax
  setc %dl
  addb $-1, %dl
  adcl %edi, %eax
  ret

No adc whatsoever. I'm not an assembly guy but if I understand f3 correctly,
"setc %dl / addb $-1, %dl" is simply storing the CF in dl and adding dl to 0xff
to force CF to get the same value it already had before instruction setc was
executed. Basically, this is a convoluted-register-wasteful nop.

I thought the problem could be related to issue [1] but that one has already
been resolved in trunk, where this issue also happens, and -fno-split-paths
doesn't seem to change anything.

The example on godbolt is https://godbolt.org/z/3GUyLj but if you play with the
site's settings (particularly, lib.f) be aware of their issue [2].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797
[2] https://github.com/mattgodbolt/compiler-explorer/issues/1377

[Bug c++/89960] New: Implicit derived to base conversion considered type punning.

2019-04-04 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89960

Bug ID: 89960
   Summary: Implicit derived to base conversion considered type
punning.
   Product: gcc
   Version: 8.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

struct base {
    int i;
    void f(){}
};

template <void (base::*F)()>
struct derived : base {
    void g1() {
        return (this->*F)();
    }
    void g2() {
        base* p = this;
        return (p->*F)();
    }
};

void h() {
    derived<&base::f> x;
    x.g1();
    x.g2();
}

Compiling with -O2 -Wstrict-aliasing gives a warning

warning: dereferencing type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
 return (this->*F)();
   ~~^~

It looks like the implicit conversion from derived to base is considered
type-punning. 

Remarks: The warning goes away if either:
1) -O2 is not used.
2) -Wstrict-aliasing is not used.
3) base has no non-static data members.
4) F is not a template parameter.
5) x.g1() is not called. (In contrast, x.g2() compiles fine and this is a
workaround for the issue.)
6) if another compiler is used (other vendor's but also gcc 4.6.4 or earlier)

[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)

2019-01-11 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797

--- Comment #5 from Cassio Neri  ---
There's a (fragile) workaround:

void use(unsigned);
#define VERSION 0
bool f(unsigned x, unsigned y) {
#if VERSION == 0
return x <  + (y <= );
#else
bool b = y <= ;
return x <  + b;
#endif
}
void test_f(unsigned x, unsigned y) {
for (unsigned i = 0; i < ; ++i)
use(f(x++, y++));
}

f is still the same. Version 0 of test_f has 4 jumps whereas version 1 has only
one.

https://godbolt.org/z/gZZQ2f

[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)

2019-01-10 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797

--- Comment #4 from Cassio Neri  ---
Comment on attachment 45408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45408
Running example

The magic numbers 4, 6, 7, 0x24924924u and 0xb6db6db7u were chosen in an
attempt to maximize the probability of making branch prediction harder and the
difference in performance clearer.

[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)

2019-01-10 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797

--- Comment #3 from Cassio Neri  ---
The attached file is a running example showing that performance suffers.
The code runs faster when test_f calls g instead of f, where g is
bool g(unsigned x, unsigned y) {
if (x >= y) return false;
return f(x, y);
}
even in the case where x < y and g does call f.

Depending on #defines the example runs either f, g or both. These are the
timings:

$ g++ -O3 -o gcc_issue gcc_issue.cpp -D RUN_SIMPLE && time ./gcc_issue
Running simple function...
real    0m3.646s
user    0m3.645s
sys     0m0.000s

$ g++ -O3 -o gcc_issue gcc_issue.cpp -D RUN_COMPLEX && time ./gcc_issue
Running complex function...
real    0m1.165s
user    0m1.161s
sys     0m0.003s

$ g++ -O3 -o gcc_issue gcc_issue.cpp -D RUN_BOTH && time ./gcc_issue
Running simple function...
Running complex function...
real    0m3.059s
user    0m3.051s
sys     0m0.007s

Notice that running both is faster than running f only! This is because the
compiler then gives up inlining and calls the (good) code generated for f in
isolation.

[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)

2019-01-10 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797

--- Comment #2 from Cassio Neri  ---
Created attachment 45408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45408&action=edit
Running example

[Bug rtl-optimization/88797] New: Unneeded branch added when function is inlined (function runs faster if not inlined)

2019-01-10 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797

Bug ID: 88797
   Summary: Unneeded branch added when function is inlined
(function runs faster if not inlined)
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

void use(unsigned);
bool f(unsigned x, unsigned y) {
return x <  + (y <= );
}
void test_f(unsigned x, unsigned y) {
for (unsigned i = 0; i < ; ++i)
use(f(x++, y++));
}

The generated code for f seems fine and there's no branch to test y <=
:

f(unsigned int, unsigned int):
  xorl %eax, %eax
  cmpl $, %esi
  setbe %al
  addl $, %eax
  cmpl %edi, %eax
  seta %al
  ret

However, when f is inlined in test_f, a branch is introduced to decide whether
x should be compared to  or 1112 (code cut for brevity)

test_f(unsigned int, unsigned int):
  [...]
  jmp .L6
.L14:
  cmpl $, %eax
.L12:
  [...]
.L6:
  [...]
  cmpl $, %ebx
  jbe .L14
  cmpl $1110, %eax
  jmp .L12
  [...]

See https://godbolt.org/z/_EC992 (use -O3).

This seems to be a regression: it used to be OK up to 6.3 and then degraded in
7.1 (according to godbolt).

[Bug middle-end/12849] testing divisibility by constant

2018-03-14 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849

Cassio Neri  changed:

   What|Removed |Added

 CC||cassio.neri at gmail dot com

--- Comment #4 from Cassio Neri  ---
A simple mathematical proof that the algorithm works is found here:

http://clomont.com/efficient-divisibility-testing/

See also https://stackoverflow.com/a/49264279/1137388.
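
For illustration, here is a minimal sketch of one well-known way to test
divisibility by an odd constant without dividing at run time, assuming the
algorithm discussed is the multiplicative-inverse test on 32-bit unsigned
integers. The names mod_inverse, inverse and divisible_by_odd are hypothetical
and not taken from GCC.

```cpp
#include <cstdint>

// Newton iteration for the inverse of an odd d modulo 2^32: every step doubles
// the number of correct low-order bits. Unsigned arithmetic wraps modulo 2^32
// (assuming unsigned int is 32 bits wide, as on common targets).
constexpr std::uint32_t mod_inverse(std::uint32_t d, std::uint32_t x, int steps) {
  return steps == 0 ? x : mod_inverse(d, x * (2u - d * x), steps - 1);
}

// x = d is already correct to 3 bits (d * d == 1 mod 8 for odd d), so four
// steps give 48 >= 32 correct bits.
constexpr std::uint32_t inverse(std::uint32_t d) {
  return mod_inverse(d, d, 4);
}

// For odd d: n is a multiple of d iff n * inverse(d) <= UINT32_MAX / d.
// When d is a compile-time constant, both constants can be precomputed.
constexpr bool divisible_by_odd(std::uint32_t n, std::uint32_t d) {
  return n * inverse(d) <= UINT32_MAX / d;
}

static_assert(inverse(3) == 0xAAAAAAABu, "3 * 0xAAAAAAAB == 1 (mod 2^32)");
static_assert(divisible_by_odd(21, 7), "21 is a multiple of 7");
static_assert(!divisible_by_odd(22, 7), "22 is not a multiple of 7");
```

Multiplying by the inverse permutes the 32-bit values and sends each multiple
k * d to k, so exactly the multiples of d land in [0, UINT32_MAX / d].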

[Bug tree-optimization/84648] New: Missed optimization : loop not removed.

2018-03-01 Thread cassio.neri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84648

Bug ID: 84648
   Summary: Missed optimization : loop not removed.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

The loop below is not eliminated:

int main() {
for (unsigned i = 0; i < (1u << 31); ++i) {
}
return 0;
}

Compiled with -O3:

main:
  xor eax, eax
.L2:
  add eax, 1
  jns .L2
  xor eax, eax
  ret

The loop is removed for other bounds, e.g. (1u << 31) + 1 or (1u << 31) - 1, or
when < is replaced with <=.

Allow me to make a guess at the underlying problem: The optimization that uses
jns to detect when i reaches (10...0)_2 ends up blocking the other
optimization that eliminates the loop altogether.

Same issue when using unsigned long long and (1ull << 63).

FWIW: clang has the same issue (in C but not in C++).

[Bug c++/59238] New: Dynamic allocating a list-initialized object of a type with private destructor fails.

2013-11-21 Thread cassio.neri at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59238

Bug ID: 59238
   Summary: Dynamic allocating a list-initialized object of a type
with private destructor fails.
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com

Consider:

class foo {
  ~foo() {}
};

int main() { 
  new foo;   // OK
  new foo(); // OK
  new foo{}; // error: 'foo::~foo()' is private
}

The last line shouldn't fail to compile since the destructor is not invoked.

FWIW, it compiles fine with clang. It also compiles fine with gcc 4.9.0
20131109 if foo has a user declared default constructor.
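
The workaround mentioned above can be sketched as follows, assuming a
user-provided default constructor is acceptable. The destroy member is a
hypothetical helper, added only so that the object can be released without a
public destructor.

```cpp
#include <type_traits>

class foo {
public:
  foo() {}                                   // user-declared default constructor
  static void destroy(foo* p) { delete p; }  // hypothetical helper: members may call ~foo()
private:
  ~foo() {}
};

// The private destructor is not accessible from outside the class...
static_assert(!std::is_destructible<foo>::value, "~foo() is private");

// ...but creating an object with new does not need to invoke it.
inline void create_and_destroy() {
  foo* p = new foo{};
  foo::destroy(p);
}
```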


[Bug c++/58170] New: Crash when aliasing a template class that is a member of its template base class.

2013-08-15 Thread cassio.neri at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58170

Bug ID: 58170
   Summary: Crash when aliasing a template class that is a member
of its template base class.
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com

The code below crashes gcc 4.8.1 (coincidentally, it also crashes clang 3.3).

-

template <typename T, typename U>
struct base {

  template <typename V>
  struct derived;

};

template <typename T, typename U>
template <typename V>
struct base<T, U>::derived : public base<T, V> {
};

// This (wrong?) alias declaration provokes the crash.
template <typename T, typename U, typename V>
using derived = base<T, U>::derived<V>;

// This one works:
// template <typename T, typename U, typename V>
// using derived = typename base<T, U>::template derived<V>;

template <typename T>
void f() {
  derived<T, bool, char> m{};
  (void) m;
}

int main() {
  f<int>();
}

-

$ g++ -v -save-temps -std=c++11 -Wall -pedantic main.cpp 
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.8-20130725/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl
--disable-cloog-version-check --enable-lto --enable-gold --enable-ld=default
--enable-plugin --with-plugin-ld=ld.gold --with-linker-hash-style=gnu
--disable-install-libiberty --disable-multilib --disable-libssp
--disable-werror --enable-checking=release
Thread model: posix
gcc version 4.8.1 20130725 (prerelease) (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-std=c++11' '-Wall' '-Wpedantic'
'-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/cc1plus -E -quiet -v -D_GNU_SOURCE
main.cpp -mtune=generic -march=x86-64 -std=c++11 -Wall -Wpedantic
-fpch-preprocess -o main.ii
ignoring nonexistent directory
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../x86_64-unknown-linux-gnu/include
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../include/c++/4.8.1

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../include/c++/4.8.1/x86_64-unknown-linux-gnu

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../include/c++/4.8.1/backward
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/include
 /usr/local/include
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-std=c++11' '-Wall' '-Wpedantic'
'-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/cc1plus -fpreprocessed main.ii
-quiet -dumpbase main.cpp -mtune=generic -march=x86-64 -auxbase main -Wall
-Wpedantic -std=c++11 -version -o main.s
GNU C++ (GCC) version 4.8.1 20130725 (prerelease) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.8.1 20130725 (prerelease), GMP version
5.1.2, MPFR version 3.1.2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.8.1 20130725 (prerelease) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.8.1 20130725 (prerelease), GMP version
5.1.2, MPFR version 3.1.2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: fcec9480bd3d120c7e8d40d79394317d
main.cpp: In substitution of ‘template<class T, class U, class V> using derived
= base<T, U>::derived<V> [with T = T; U = bool; V = char]’:
main.cpp:24:24:   required from here
main.cpp:16:39: internal compiler error: Segmentation fault
 using derived = base<T, U>::derived<V>;
   ^
Please submit a full bug report,
with preprocessed source if appropriate.
See https://bugs.archlinux.org/ for instructions.

[Bug c++/56693] New: Fail to ignore const qualification on top of a function type.

2013-03-22 Thread cassio.neri at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56693

 Bug #: 56693
   Summary: Fail to ignore const qualification on top of a
function type.
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cassio.n...@gmail.com

The following code:

void f() {}

template <typename T> void g(const T*) { }

int main() {
  g(f);
}

raises an error with this note:

types 'const T' and 'void()' have incompatible cv-qualifiers

Attempting to instantiate g creates a function that takes a pointer to a const
T where T = void(). Since there's no such thing as a const function, this
explains the note. However, C++11 8.3.5/6 says

The effect of a cv-qualifier-seq in a function declarator is not the same as
adding cv-qualification on top of the function type. In the latter case, the
cv-qualifiers are ignored.

Hence, the const qualifier should be ignored and the code should compile. (It
does compile with clang and visual studio.)

For more information see:
http://stackoverflow.com/questions/15578298/can-a-const-t-match-a-pointer-to-free-function

[Bug c++/55101] New: Invalid implicit conversion in initialization when source type is a template argument type

2012-10-27 Thread cassio.neri at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55101

 Bug #: 55101
   Summary: Invalid implicit conversion in initialization when
source type is a template argument type
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cassio.n...@gmail.com

Calling f(b) below implies an implicit call to an explicit conversion operator.
This is accepted by gcc 4.8.0 but rejected by clang 3.1 and 3.2 (trunk).
Notice that gcc complains (as it should) in other similar circumstances.

struct A { };

struct B {
  explicit operator int() const { return 1; }
  explicit operator A() const { return A(); }
};

template <typename T> void f(T b) { int x = b; }
template <typename T> void g(T b) {   A y = b; }

int main() {

  B b;

  //int x = b; // Error: cannot convert 'B' to 'int' in initialization
  f(b);        // OK for gcc 4.8.0, even though 'int x = b;' occurs inside f
               // Error for clang 3.1 and 3.2.

  //A y = b;   // Error: conversion from 'B' to non-scalar type 'A' requested
  //g(b);      // Error: conversion from 'B' to non-scalar type 'A' requested
}
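
The difference between the two forms of initialization can be made explicit
with type traits. A minimal sketch, assuming C++11 <type_traits>; the constant
names are hypothetical. std::is_convertible models copy-initialization such as
'int x = b;', which must not use explicit conversion operators, whereas
std::is_constructible models direct-initialization such as 'int x(b);', which
may.

```cpp
#include <type_traits>

struct A { };

struct B {
  explicit operator int() const { return 1; }
  explicit operator A() const { return A(); }
};

// Copy-initialization: explicit conversion operators are not candidates.
constexpr bool b_to_int_implicit = std::is_convertible<B, int>::value;
constexpr bool b_to_a_implicit   = std::is_convertible<B, A>::value;

// Direct-initialization: the explicit operator int() may be used.
constexpr bool b_to_int_direct = std::is_constructible<int, B>::value;

static_assert(!b_to_int_implicit && !b_to_a_implicit,
              "no implicit conversion through explicit operators");
static_assert(b_to_int_direct, "direct-initialization may use explicit operators");
```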


[Bug libstdc++/54722] New: std::is_nothrow_default_constructible<T>::value depends on whether destructor throws or not.

2012-09-26 Thread cassio.neri at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54722

 Bug #: 54722
   Summary: std::is_nothrow_default_constructible<T>::value
depends on whether destructor throws or not.
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cassio.n...@gmail.com

Consider:

#include <iostream>
#include <type_traits>

struct foo {
  foo() noexcept {}
  ~foo() {}
};

int main() {
  std::cout << std::boolalpha;
  std::cout << std::is_nothrow_default_constructible<foo>::value << std::endl;
  return 0;
}

This should output 'true' but it outputs 'false'. Adding a 'noexcept'
specification to ~foo() makes the code output the expected result.

Looking at the source, I guess the reason lies in the implementation of this
helper class:

template<typename _Tp>
  struct __is_nt_default_constructible_atom
  : public integral_constant<bool, noexcept(_Tp())>
  { };

Indeed, the expression '_Tp()', if executed, creates a temporary of type _Tp
whose lifetime ends immediately, so ~_Tp is called. Therefore,
'noexcept(_Tp())' is 'true' if and only if neither the constructor nor the
destructor throws.

I believe the implementation below fixes the problem (at least it does for the
example above):

template<typename _Tp>
  struct __is_nt_default_constructible_atom
  : public std::integral_constant<bool, noexcept(new (std::nothrow) _Tp)>
  { };
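
The two expressions can be compared directly with the noexcept operator. A
minimal sketch, with one assumption that differs from the example above:
current compilers treat '~foo() {}' as implicitly noexcept, so the destructor
is marked noexcept(false) here to reproduce the situation described. The
constant names are hypothetical.

```cpp
#include <new>

struct foo {
  foo() noexcept {}
  ~foo() noexcept(false) {}  // explicitly potentially-throwing destructor
};

// Creates and destroys a temporary, so the destructor is taken into account.
constexpr bool via_temporary = noexcept(foo());

// Only allocates (non-throwing form) and constructs; no destructor call.
constexpr bool via_nothrow_new = noexcept(new (std::nothrow) foo);

static_assert(!via_temporary, "the throwing destructor affects noexcept(foo())");
static_assert(via_nothrow_new, "the new-expression ignores the destructor");
```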