[Bug lto/97508] lto1: internal compiler error: decompressed stream: Destination buffer is too small

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97508

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-10-21
           Keywords|                            |ice-on-valid-code, lto
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
It works for me.  What compression scheme are you using?  I have

Supported LTO compression algorithms: zlib

Are you using zstd?  If so, which version?

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #28 from Christophe Leroy  ---
Looks like we have a way to do it. Works at least with GCC 5.5, 8.2, 9.2, 10.1

unsigned long g(unsigned long a, unsigned long b)
{
    unsigned long long s = (unsigned long long)a + (unsigned long long)b;

    return (s >> 32) + s;
}

0020 :
  20:   7c 63 20 14 addc    r3,r3,r4
  24:   7c 63 01 94 addze   r3,r3
  28:   4e 80 00 20 blr



Though GCC 4.9.4 does:

0014 :
  14:   7c 69 1b 78 mr  r9,r3
  18:   7c 8b 23 78 mr  r11,r4
  1c:   39 00 00 00 li  r8,0
  20:   39 40 00 00 li  r10,0
  24:   7d 6b 48 14 addc    r11,r11,r9
  28:   7d 4a 41 14 adde    r10,r10,r8
  2c:   7c 6a 5a 14 add r3,r10,r11
  30:   4e 80 00 20 blr
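
As a cross-check of the trick above, here is a minimal sketch that puts the widened form next to the obvious carry test. The fixed-width types are an assumption on my side (so that it behaves the same on hosts where unsigned long is 64 bits), and both function names are made up for illustration.

```c
#include <stdint.h>

/* Widened form from the comment above: add in 64 bits, then fold the
   carry (bit 32 of the sum) back into the low 32 bits.  */
uint32_t add_carry_widened(uint32_t a, uint32_t b)
{
    uint64_t s = (uint64_t)a + (uint64_t)b;
    return (uint32_t)((s >> 32) + s);
}

/* Reference form: detect the carry with an unsigned comparison.  */
uint32_t add_carry_reference(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    if (sum < a)   /* the sum wrapped around, so a carry was produced */
        sum++;
    return sum;
}
```

Both functions should agree for every pair of inputs; only the first one is the shape that the newer compilers listed above turn into addc/addze.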

[Bug target/97506] [11 Regression] ICE: in extract_insn, at recog.c:2294 (unrecognizable insn) with -mavx512vbmi -mavx512vl

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97506

--- Comment #3 from Richard Biener  ---
Targets shouldn't ICE on unsimplified stuff - the testcase explicitly disables
constant propagation so I guess we get what was asked for.

[Bug ada/97504] [11 Regression] Ada bootstrap error after r11-4029

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97504

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
           Keywords|                            |build
            Summary|[11 regress] Ada bootstrap  |[11 Regression] Ada
                   |error after r11-4029        |bootstrap error after
                   |                            |r11-4029

[Bug fortran/83118] [8/9/10/11 Regression] Bad intrinsic assignment of class(*) array component of derived type

2020-10-20 Thread pault at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83118

--- Comment #36 from Paul Thomas  ---
Created attachment 49412
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49412&action=edit
An updated patch

The patch has been evolving... slowly.

I found that dependency_57.f90 segfaulted at runtime, so I have fixed that.

I believe that I know how to resolve Tobias's query. I hope to submit a
complete patch in the coming days.

Paul

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-20 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #3 from Jim Wilson  ---
The basic idea here is that the movqi pattern in riscv.md currently emits RTL
for a load that looks like this
  (set (reg:QI target) (mem:QI (address)))
As an experiment, we want to try changing that to something like this
  (set (reg:DI temp) (zero_extend:DI (mem:QI (address))))
  (set (reg:QI target) (subreg:QI (reg:DI temp) 0))
The hope is that the optimizer will combine the subreg with following
operations resulting in smaller faster code at the end.  And this should also
solve the volatile load optimization problem.  So we need a patch, and then we
need experiments to see if the patch actually produces better code on real
programs.  It should be fairly easy to write the patch even if you don't have
any gcc experience.  The testing part of this is probably more work than the
patch writing.

The movqi pattern calls riscv_legitimize_move in riscv.c, so that would have to
be modified to look for qimode loads from memory, allocate a temporary
register, do a zero_extending load into the temp reg, and then a subreg copy
into the target register.

You will probably also need to handle cases where both the target and source
are memory locations, in which case this already gets split into two
instructions, a load followed by a store.

You can look at the movqi pattern in arm.md file to see an example of how to do
this, where it calls gen_zero_extendqisi2.  Though for RISC-V, we would want
gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32.

If the movqi change works, then we would want similar changes for movhi and
maybe also movsi for rv64.

It might also be worth checking whether zero-extend or sign-extend is the
better choice.  We zero extend char by default, so that should be best.  For
rv64, the hardware sign extends simode to dimode by default, so sign-extend is
probably best for that.  For himode I'm not sure, I think we prefer sign-extend
by default, but that isn't necessarily the best choice for loads.  This would
have to be tested.

You can see rtl dumps by using -fdump-rtl-all.  The combiner is the pass that
should be optimizing away the unnecessary zero-extend.  You can see details of
what the combiner pass is doing by using -fdump-rtl-combine-all.
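
For anyone picking this up, a minimal sketch of the kind of test case to feed through those dumps. This is a hypothetical reproducer written for this comment, not the one attached to the bug; the names are made up. Compile it with a RISC-V compiler at -O2 together with -fdump-rtl-combine-all and look at what follows the byte load.

```c
#include <stdbool.h>

/* Hypothetical reproducer: a volatile byte-sized flag that is read once
   and widened to int.  A byte load that zero-extends (lbu on RISC-V)
   already clears the upper bits, so a following "andi" with 0xff would
   be redundant.  */
volatile bool active;

int read_active(void)
{
    return active;
}
```

The function itself is trivial on purpose: the interesting part is the RTL around the load, not the C semantics.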

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-10-21
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #7 from Andreas Krebbel  ---
The vec_cmp* expanders in vx-builtins.md are only supposed to be used for
expanding the builtins. Unfortunately the names appear to collide with the rtx
standard names to some degree. I will try to implement the standard name
patterns and direct builtin expansion to them instead.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #47 from Christophe Leroy  ---
(In reply to Segher Boessenkool from comment #46)
> (In reply to Christophe Leroy from comment #43)
> > int g(int x)
> > {
> > return __builtin_clz(0);
> > }
> > 
> > Gives
> > 
> > 0018 :
> >   18:   38 60 00 20 li  r3,32
> >   1c:   4e 80 00 20 blr
> 
> That is because rs6000 has
> 
> /* The cntlzw and cntlzd instructions return 32 and 64 for input of zero.  */
> #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
>   ((VALUE) = GET_MODE_BITSIZE (MODE), 2)
> 
> This says that at RTL level and in the optabs, clz of 0 *is* defined,
> for rs6000.  But the builtin is not valid with an arg of 0!

I opened bug #97503 for that

[Bug target/97506] [11 Regression] ICE: in extract_insn, at recog.c:2294 (unrecognizable insn) with -mavx512vbmi -mavx512vl

2020-10-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97506

Hongtao.liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #2 from Hongtao.liu  ---
Could be fixed by

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e6f8b314f18..19c12df4401 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3525,6 +3525,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
   machine_mode mode = GET_MODE (dest);
   machine_mode cmpmode = GET_MODE (cmp);

+  /* Simplify this trivial compare, avoid ICE error in pr97506.  */
+  if (rtx_equal_p (op_true, op_false))
+    {
+      emit_move_insn (dest, op_true);
+      return;
+    }
+
   /* In AVX512F the result of comparison is an integer mask.  */
   bool maskcmp = mode != cmpmode && ix86_valid_mask_cmp_mode (mode);

But shouldn't the middle end also simplify such a trivial VCOND_EXPR?

  _4 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _6 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = .VCONDU (_6, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, _4, {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, 113);
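
To spell out why the early exit in the patch is safe, here is a scalar model of a 16-lane byte select. It is only an illustration under my own assumptions (the function names and the bit-per-lane mask layout are chosen for the sketch, not taken from the i386 backend): when both arms hold the same values, the result does not depend on the mask, so a plain move is equivalent.

```c
#include <stdint.h>

#define NELTS 16

/* Scalar model of a 16-lane byte select: lane i comes from op_true when
   bit i of mask is set and from op_false otherwise.  */
void select_lanes(uint8_t dest[NELTS], const uint8_t op_true[NELTS],
                  const uint8_t op_false[NELTS], uint16_t mask)
{
    for (int i = 0; i < NELTS; i++)
        dest[i] = ((mask >> i) & 1) ? op_true[i] : op_false[i];
}

/* Returns 1 when both vectors hold the same values lane by lane.  */
int lanes_equal(const uint8_t a[NELTS], const uint8_t b[NELTS])
{
    for (int i = 0; i < NELTS; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Running select_lanes with two all-zero arms and any two masks gives the same destination both times, which is the case the testcase ends up in.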

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-20 Thread jiawei at iscas dot ac.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

jiawei  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jiawei at iscas dot ac.cn

--- Comment #2 from jiawei  ---
(In reply to Jim Wilson from comment #1)
> Comparing with the ARM port, I see that in the ARM port, the movqi pattern
> emits
> (insn 8 7 9 2 (set (reg:SI 117)
> (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8])))
> "tmp.c":7:7 -1
>  (nil))
> (insn 9 8 10 2 (set (reg:QI 116)
> (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1
>  (nil))
> and then later it combines the subreg operation with the following
> zero_extend and cancels them out.
> 
> Whereas in the RISC-V port, the movqi pattern emits
> (insn 9 7 10 2 (set (reg:QI 76)
>   (mem/v/c:QI (lo_sum:DI (reg:DI 74)
> (symbol_ref:DI ("active") [flags 0xc4] <var_decl 0x7f9f03103120 active>)) [1 active+0 S1 A8])) "tmp.c":7:7 -1
>  (nil))
> and then combine refuses to combine the following zero-extend with this insn
> as the memory operation is volatile.
> 
> So it seems we need to rewrite the RISC-V port to make movqi and movhi zero
> extend to si/di mode and then subreg.  That probably will require cascading
> changes to avoid code size and performance regressions.
> 
> Looks like a tractable small to medium size project, but will need to wait
> for a volunteer to work on it.

Hi Jim, my name is Jiawei Chen and I am from the PLCT Lab. I have reproduced
this bug and would like to help fix it. What should I modify? Do you have any
suggestions?

[Bug rtl-optimization/66706] Redundant bitmask instruction on x >> (n & 32)

2020-10-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66706
Bug 66706 depends on bug 66552, which changed state.

Bug 66552 Summary: Missed optimization when shift amount is result of signed modulus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

[Bug rtl-optimization/66552] Missed optimization when shift amount is result of signed modulus

2020-10-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552

Jiu Fu Guo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |guojiufu at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #16 from Jiu Fu Guo  ---
Just confirmed the fix is ready in the trunk.

[Bug libstdc++/95322] std::list | take | transform, expression does not work cbegin() == end()

2020-10-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95322

--- Comment #18 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Patrick Palka:

https://gcc.gnu.org/g:574ab3c85bb393e0ed0171b96eb42e0dd1e91de4

commit r10-8927-g574ab3c85bb393e0ed0171b96eb42e0dd1e91de4
Author: Patrick Palka 
Date:   Wed Aug 26 21:51:48 2020 -0400

libstdc++: Implement remaining piece of LWG 3448

Almost all of the proposed resolution for LWG 3448 is already
implemented; the only part left is to adjust the return type of
transform_view::sentinel::operator-.

libstdc++-v3/ChangeLog:

PR libstdc++/95322
* include/std/ranges (transform_view::sentinel::__distance_from):
Give this a deduced return type.
(transform_view::sentinel::operator-): Adjust the return type so
that it's based on the constness of the iterator rather than
that of the sentinel.
* testsuite/std/ranges/adaptors/95322.cc: Refer to LWG 3448.

(cherry picked from commit 3ae0cd94abc15e33dc06ca7a5f76f14b1d74129f)

[Bug libstdc++/55394] Using call_once without -lpthread compiles without warning

2020-10-20 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55394

--- Comment #12 from Sergei Trofimovich  ---
Aha, makes sense.

My hack did not survive bootstrap anyway, as libgcc.a started referring to
pthread_once() as well.

[Bug tree-optimization/97505] [11 Regression] ICE in extract_range_basic, at vr-values.c:1439 since r11-4130-g16e4f1ad44e3c00b8b73c9e4ade3d236ea7044a8

2020-10-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97505

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Andrew Macleod:

https://gcc.gnu.org/g:292c92715b282f7c6617c94351d3e38ec027d637

commit r11-4141-g292c92715b282f7c6617c94351d3e38ec027d637
Author: Andrew MacLeod 
Date:   Tue Oct 20 16:55:14 2020 -0400

Temporarily disable trap in extract_range_builtin check.

Until we figure out how to adjust ubsan for symbolics, disable the trap.

gcc/ChangeLog:

PR tree-optimization/97505
* vr-values.c (vr_values::extract_range_basic): Trap if
vr_values version disagrees with range_of_builtin_call.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #46 from Segher Boessenkool  ---
(In reply to Christophe Leroy from comment #43)
> int g(int x)
> {
>   return __builtin_clz(0);
> }
> 
> Gives
> 
> 0018 :
>   18: 38 60 00 20 li  r3,32
>   1c: 4e 80 00 20 blr

That is because rs6000 has

/* The cntlzw and cntlzd instructions return 32 and 64 for input of zero.  */
#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
  ((VALUE) = GET_MODE_BITSIZE (MODE), 2)

This says that at RTL level and in the optabs, clz of 0 *is* defined,
for rs6000.  But the builtin is not valid with an arg of 0!
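
In source terms that means the zero case has to be written out by the caller. A minimal sketch, assuming GCC or a compatible compiler for `__builtin_clz`; the wrapper name is made up for illustration:

```c
#include <limits.h>

/* __builtin_clz has an undefined result for an argument of 0, so test
   for zero first and return the operand width, which is the value the
   cntlzw instruction itself yields for a zero input.  */
unsigned clz_or_width(unsigned x)
{
    const unsigned width = sizeof(unsigned) * CHAR_BIT;
    return x ? (unsigned)__builtin_clz(x) : width;
}
```

Whether the compiler then folds the test away and emits a bare cntlzw is a separate optimization question; the wrapper only makes the C code well defined.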

[Bug lto/97508] New: lto1: internal compiler error: decompressed stream: Destination buffer is too small

2020-10-20 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97508

Bug ID: 97508
   Summary: lto1: internal compiler error: decompressed stream:
Destination buffer is too small
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

[hjl@gnu-skx-1 tmp]$ cat pr15323a.c
int main (void)
{
  return 0;
}
[hjl@gnu-skx-1 tmp]$ cat doit
CFLAGS="-flto -fno-profile-use -O2"
cc $CFLAGS -c pr15323a.c -o pr15323a.o
cc $CFLAGS -r -nostdlib pr15323a.o -o pr15323a-r.o
cc $CFLAGS  -o pr15323a.exe pr15323a-r.o
[hjl@gnu-skx-1 tmp]$ sh doit
during IPA pass: cp
lto1: internal compiler error: decompressed stream: Destination buffer is too
small
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
lto-wrapper: fatal error: cc returned 1 exit status
compilation terminated.
/usr/local/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
[hjl@gnu-skx-1 tmp]$

[Bug tree-optimization/97360] [11 Regression] ICE in range_on_exit

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360

--- Comment #35 from Segher Boessenkool  ---
Send it to gcc-patches@ please, with explanation and everything?

[Bug gcov-profile/97507] New: Move __gcov_exit from per-object .fini_array.00100 to libgcov

2020-10-20 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97507

Bug ID: 97507
   Summary: Move __gcov_exit from per-object .fini_array.00100 to
libgcov
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Per object file .fini_array.00100 wastes space. __gcov_exit can be called in
libgcov. It can be registered via atexit (if first run) in __gcov_init.

The Linux kernel does not call destructors and currently discards .fini_array
and .fini_array.*. `gcc -fprofile-arcs` is currently one reason that
.fini_array needs to be discarded (another reason is kasan; I don't know of
other reasons).
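
A minimal sketch of the registration scheme proposed above, with hypothetical stand-ins for the libgcov routines (`profile_init` and `profile_exit_hook` are names invented here, not libgcov symbols): every object calls the init routine, but only the first call registers the exit hook via atexit.

```c
#include <stdlib.h>

static int exit_hook_registered;   /* set once the hook is installed */
static int init_calls;             /* how many objects called the init */

/* Hypothetical stand-in for the dump-at-exit routine.  */
static void profile_exit_hook(void)
{
    /* A real runtime would flush its counters here.  */
}

/* Hypothetical stand-in for the per-object init: every object calls it,
   but only the first call registers the exit hook.  */
int profile_init(void)
{
    init_calls++;
    if (!exit_hook_registered) {
        if (atexit(profile_exit_hook) != 0)
            return -1;             /* registration failed */
        exit_hook_registered = 1;
    }
    return 0;
}
```

With this shape no per-object .fini_array entry is needed; the cost is one flag test per init call.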

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-20 Thread joakim.tjernlund at infinera dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #27 from Joakim Tjernlund  ---
It has been 10 years, it is not that hard :)

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #26 from Segher Boessenkool  ---
It isn't easy to do.  Feel free to try your hand at it :-)

[Bug target/97506] [11 Regression] ICE: in extract_insn, at recog.c:2294 (unrecognizable insn) with -mavx512vbmi -mavx512vl

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97506

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-10-20
   Target Milestone|---                         |11.0
     Ever confirmed|0                           |1
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Jakub Jelinek  ---
Started with r11-2577-g229752afe3156a3990dacaedb94c76846cebf132

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

Christophe Leroy  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christophe.leroy at csgroup dot eu

--- Comment #25 from Christophe Leroy  ---
With GCC 10.1, I still get:

 :
   0:   7c 63 22 14 add r3,r3,r4
   4:   7c 84 18 10 subfc   r4,r4,r3
   8:   7d 29 49 10 subfe   r9,r9,r9
   c:   7c 69 18 50 subf    r3,r9,r3
  10:   4e 80 00 20 blr

Any plan to get the expected adde/addze instead?

[Bug tree-optimization/97505] [11 Regression] ICE in extract_range_basic, at vr-values.c:1439 since r11-4130-g16e4f1ad44e3c00b8b73c9e4ade3d236ea7044a8

2020-10-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97505

--- Comment #2 from Aldy Hernandez  ---
Created attachment 49411
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49411&action=edit
proposed patch

We should disable the assert while this PR is being fixed, so it doesn't hold
anyone else up.

Patch needs a testcase.

[Bug tree-optimization/97505] [11 Regression] ICE in extract_range_basic, at vr-values.c:1439 since r11-4130-g16e4f1ad44e3c00b8b73c9e4ade3d236ea7044a8

2020-10-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97505

--- Comment #1 from Aldy Hernandez  ---
We are calculating ranges for the following:

(gdb) dd stmt
_18 = .UBSAN_CHECK_SUB (_58, _57);

which gets turned into a MINUS_EXPR.  Then we call
extract_range_from_binary_expr on the MINUS_EXPR:

  /* Pretend the arithmetics is wrapping.  If there is
 any overflow, we'll complain, but will actually do
 wrapping operation.  */
  flag_wrapv = 1;
  extract_range_from_binary_expr (vr, subcode, type,
  gimple_call_arg (stmt, 0),
  gimple_call_arg (stmt, 1));
  flag_wrapv = saved_flag_wrapv;

In extract_range_from_binary_expr, we calculate the range for _58 and _57
respectively as:

(gdb) dd vr0
integer(kind=8) [-INF, _57 - 1]
(gdb) dd vr1
integer(kind=8) [_58 + 1, +INF]

Which extract_range_from_binary_expr can then use to reduce the MINUS_EXPR to
~[0,0]:

 /* If we didn't derive a range for MINUS_EXPR, and
 op1's range is ~[op0,op0] or vice-versa, then we
 can derive a non-null range.  This happens often for
 pointer subtraction.  */
  if (vr->varying_p ()
  && (code == MINUS_EXPR || code == POINTER_DIFF_EXPR)
  && TREE_CODE (op0) == SSA_NAME
  && ((vr0.kind () == VR_ANTI_RANGE
   && vr0.min () == op1
   && vr0.min () == vr0.max ())
  || (vr1.kind () == VR_ANTI_RANGE
  && vr1.min () == op0
  && vr1.min () == vr1.max ())))
{
  vr->set_nonzero (expr_type);
  vr->equiv_clear ();
}

The ranger version is not handling these symbolics.
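
The arithmetic fact behind that ~[0,0] result can be checked in isolation. This is a sketch of the reasoning only, not of the vr-values code: if one operand is strictly less than the other, the two differ, so their difference is nonzero even with wrap-around. The helper names are made up.

```c
#include <stdint.h>

/* Subtraction with wrap-around semantics, as under flag_wrapv.  Done in
   unsigned arithmetic so the C code itself has no signed overflow; the
   conversion back to int64_t is implementation-defined and wraps on GCC.  */
int64_t wrapping_sub(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a - (uint64_t)b);
}

/* If a < b, the operands differ, so the wrapped difference cannot be 0.  */
int difference_is_nonzero(int64_t a, int64_t b)
{
    return wrapping_sub(a, b) != 0;
}
```

Even the extreme pair (most negative value, most positive value) wraps to a nonzero difference.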

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-20 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459

--- Comment #9 from Thomas Koenig  ---
(In reply to Jakub Jelinek from comment #7)
> So, can we use this for anything but modulo 3, or 5, or 17, or 257 (all of
> those have 2^32 mod N == 2^64 mod N == 2^128 mod N == 1)

I think so, too.

> probably also
> keyed on the target having corresponding uaddv4_optab handler, normal
> expansion not being able to handle it and emitting a libcall?

Again, yes.
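
To make the condition concrete, a minimal sketch of the folding step for any divisor n with 2^64 mod n == 1. It assumes a 64-bit GCC-compatible compiler (for `__uint128_t` and `__builtin_add_overflow`); the function name is invented for the example.

```c
#include <stdint.h>

/* Remainder of a 128-bit value by n, valid only when 2^64 mod n == 1
   (true for 3, 5, 17 and 257).  Then hi * 2^64 + lo is congruent to
   hi + lo, and a carry out of the 64-bit sum counts as one more.  */
unsigned rem_by_folding(__uint128_t x, unsigned n)
{
    uint64_t hi = (uint64_t)(x >> 64);
    uint64_t lo = (uint64_t)x;
    uint64_t sum;

    if (__builtin_add_overflow(hi, lo, &sum))
        sum++;                      /* the lost 2^64 is congruent to 1 */
    return (unsigned)(sum % n);
}
```

The increment after a carry cannot overflow again: the wrapped sum is at most 2^64 - 2. What remains is a single 64-bit remainder, which needs no libcall.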

This can also be used as a building block for handling division
and remainder base 10.

Here's a benchmark for this (it uses the sum of digits base 10
instead). qsum_v1 uses the standard method, which you can find
(for example) in libgfortran.

div_rem_5_v2 first calculates the remainder of the division by 5 using this
method, then does an exact division by multiplying with the modular inverse
of 5 modulo 2^128.

div_rem_10_v2 then uses div_rem_5_v2 to calculate the value and
remainder of the division by 10, and qsum_v2 uses that to
calculate the sum of digits.

The timings are about a factor of 2 faster than the straightforward
libcall version:

s = 360398898 qsum_v1: 1.091621 s
s = 360398898 qsum_v2: 0.485509 s


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>

#define ONE ((__uint128_t) 1)
#define TWO_64 (ONE << 64)

typedef __uint128_t mytype;

double this_time ()
{
  struct timeval tv;
  gettimeofday (&tv, NULL);
  return tv.tv_sec + tv.tv_usec * 1e-6;
}

unsigned
qsum_v1 (mytype n)
{
  unsigned ret;
  ret = 0;
  while (n > 0)
    {
      ret += n % 10;
      n = n / 10;
    }
  return ret;
}

static void inline __attribute__((always_inline))
div_rem_5_v2 (mytype n, mytype *div, unsigned *rem)
{
  unsigned long a, b, c;
  /* The modular inverse to 5 modulo 2^128  */
  const mytype magic = (0xCCCCCCCCCCCCCCCC * TWO_64
                        + 0xCCCCCCCCCCCCCCCD * ONE);
  b = n >> 64;
  c = n;
  if (__builtin_add_overflow (b, c, &a))
    a++;

  *rem = a % 5;
  *div = (n-*rem) * magic;
}

static void inline __attribute__((always_inline))
div_rem_10_v2 (mytype n, mytype *div, unsigned *rem)
{
  mytype n5;
  unsigned rem5;
  div_rem_5_v2 (n, &n5, &rem5);
  *rem = rem5 + (n5 % 2) * 5;
  *div = n5/2;
}

unsigned
qsum_v2 (mytype n)
{
  unsigned ret;
  unsigned rem;
  mytype n_new;
  ret = 0;
  while (n > 0)
    {
      div_rem_10_v2 (n, &n_new, &rem);
      ret += rem;
      n = n_new;
    }
  return ret;
}

#define N 1000

int main()
{
  mytype *a;
  unsigned long int s;
  double t1, t2;
  int fd;
  long int i;
  a = malloc (sizeof (*a) * N);
  fd = open ("/dev/urandom", O_RDONLY);
  read (fd, a, sizeof (*a) * N);

  s = 0;
  t1 = this_time();
  for (i=0; i

[Bug target/97506] New: [11 Regression] ICE: in extract_insn, at recog.c:2294 (unrecognizable insn) with -mavx512vbmi -mavx512vl

2020-10-20 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97506

Bug ID: 97506
   Summary: [11 Regression] ICE: in extract_insn, at recog.c:2294
(unrecognizable insn) with -mavx512vbmi -mavx512vl
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 49410
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49410&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -Og -finline-functions-called-once -fno-tree-ccp
-mavx512vbmi -mavx512vl testcase.c 
testcase.c: In function 'foo':
testcase.c:15:1: error: unrecognizable insn:
   15 | }
  | ^
(insn 9 8 10 2 (set (reg:V16QI 84 [ _8 ])
(vec_merge:V16QI (const_vector:V16QI [
(const_int 0 [0]) repeated x16
])
(const_vector:V16QI [
(const_int 0 [0]) repeated x16
])
(reg:HI 90))) "testcase.c":8:17 -1
 (nil))
during RTL pass: vregs
testcase.c:15:1: internal compiler error: in extract_insn, at recog.c:2294
0x6ef463 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/repo/gcc-trunk/gcc/rtl-error.c:108
0x6ef4e6 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/repo/gcc-trunk/gcc/rtl-error.c:116
0x6de26e extract_insn(rtx_insn*)
/repo/gcc-trunk/gcc/recog.c:2294
0xce62b3 instantiate_virtual_regs_in_insn
/repo/gcc-trunk/gcc/function.c:1607
0xce62b3 instantiate_virtual_regs
/repo/gcc-trunk/gcc/function.c:1977
0xce62b3 execute
/repo/gcc-trunk/gcc/function.c:2026
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r11-4031-20201019090534-g04ffed2ef29-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu
--with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r11-4031-20201019090534-g04ffed2ef29-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.0.0 20201019 (experimental) (GCC)

[Bug tree-optimization/97505] [11 Regression] ICE in extract_range_basic, at vr-values.c:1439 since r11-4130-g16e4f1ad44e3c00b8b73c9e4ade3d236ea7044a8

2020-10-20 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97505

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.2.0
      Known to fail|                            |11.0
     Ever confirmed|0                           |1
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |11.0
   Last reconfirmed|                            |2020-10-20

[Bug tree-optimization/97505] New: [11 Regression] ICE in extract_range_basic, at vr-values.c:1439 since r11-4130-g16e4f1ad44e3c00b8b73c9e4ade3d236ea7044a8

2020-10-20 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97505

Bug ID: 97505
   Summary: [11 Regression] ICE in extract_range_basic, at
vr-values.c:1439 since
r11-4130-g16e4f1ad44e3c00b8b73c9e4ade3d236ea7044a8
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: aldyh at gcc dot gnu.org, amacleod at redhat dot com
  Target Milestone: ---

The following fails:

./xgcc -B.
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/alloc_comp_assign_8.f90
-Os -fsanitize=signed-integer-overflow -c
during GIMPLE pass: dom
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/alloc_comp_assign_8.f90:20:0:

   20 |   subroutine at_from_at(b,a)
  | 
internal compiler error: in extract_range_basic, at vr-values.c:1439
0x757c74 vr_values::extract_range_basic(value_range_equiv*, gimple*)
/home/marxin/Programming/gcc/gcc/vr-values.c:1439
0x17b28e3 evrp_range_analyzer::record_ranges_from_stmt(gimple*, bool)
/home/marxin/Programming/gcc/gcc/gimple-ssa-evrp-analyze.c:304
0x103536f record_temporary_equivalences_from_stmts_at_dest
/home/marxin/Programming/gcc/gcc/tree-ssa-threadedge.c:292
0x10359ca thread_through_normal_block
/home/marxin/Programming/gcc/gcc/tree-ssa-threadedge.c:1061
0x103747d thread_through_normal_block
/home/marxin/Programming/gcc/gcc/tree-ssa-threadedge.c:1302
0x103747d thread_across_edge
/home/marxin/Programming/gcc/gcc/tree-ssa-threadedge.c:1259
0xf4f034 dom_opt_dom_walker::after_dom_children(basic_block_def*)
/home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:1504
0x177c947 dom_walker::walk(basic_block_def*)
/home/marxin/Programming/gcc/gcc/domwalk.c:352
0xf510af execute
/home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:724
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #45 from Jakub Jelinek  ---
> > __attribute__ ((noinline))
> > int
> > my_fls64 (__u64 x)
> > {
> >   asm volatile ("movl $-1, %eax");
> >   return (__builtin_clzll (x) ^ 63) + 1;
> > }
> 
> Aha, bsr is not doing anything if parameter is 0, so pattern is correct
> (just the instruction is undefined for 0 which makes sense).
> But with that pattern GCC can't synthetize the code sequence above :)

The docs explicitly say that bsf/bsr DEST is undefined if ZF is set (operand is
zero), so relying on it preserving the previous value of the register is quite
dangerous.
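
A portable alternative is to handle the zero case in C and leave the instruction choice to the compiler. A minimal sketch, assuming GCC or a compatible compiler for `__builtin_clzll` and a 64-bit unsigned long long; the function name is made up:

```c
/* Find-last-set for a 64-bit value: 1-based index of the highest set
   bit, or 0 when no bit is set.  The zero case is handled in C rather
   than by relying on what the instruction leaves in the register.  */
int fls64_checked(unsigned long long x)
{
    if (x == 0)
        return 0;
    return (__builtin_clzll(x) ^ 63) + 1;
}
```

For a nonzero argument the count lies between 0 and 63, so XOR with 63 is the same as subtracting it from 63.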

[Bug c/97503] Suboptimal use of cntlzw and cntlzd

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
On the RTL side, there is simplify_cond_clz_ctz that should simplify it and
noce_try_ifelse_collapse that should be matching it (it does on x86 with -mbmi
-mlzcnt).

[Bug c/97503] Suboptimal use of cntlzw and cntlzd

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #1 from Jakub Jelinek  ---
Created attachment 49409
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49409&action=edit
gcc11-pr97503.patch

While that is something that can and often is done during ifcvt, I think for
various architectures we can do it at the GIMPLE level too, as done in this
patch (untested so far).

[Bug libstdc++/59325] Provide a way to disable deprecated warnings

2020-10-20 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59325

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
           Keywords|                            |diagnostic

--- Comment #9 from Jonathan Wakely  ---
Confirming as an enhancement request.

[Bug tree-optimization/97360] [11 Regression] ICE in range_on_exit

2020-10-20 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360

--- Comment #34 from Peter Bergner  ---
(In reply to Peter Bergner from comment #32)
> (In reply to Richard Biener from comment #31)
> > > > Is this really the correct fix?
> > 
> > Yes.
> 
> Just to verify, this is an approval for Andrew's patch above?
> If so, I can push it to trunk for Andrew.

Heh, and of course this is a rs6000 port file, so Segher, do you have issues
with the above patch?

[Bug tree-optimization/97501] [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops since r11-3685-gfcae5121154d1c33

2020-10-20 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

--- Comment #5 from Andrew Macleod  ---
Just to further annotate, for posterity, why this is the correct fix:

evrp/vrp calls adjust_range_with_scev:

  /* Start with the input range... */
  tree vrmin = vr->min ();
  tree vrmax = vr->max ();

  /* ...and narrow it down with what we got from SCEV.  */
  if (compare_values (min, vrmin) == 1)
    vrmin = min;
  if (compare_values (max, vrmax) == -1)
    vrmax = max;

  vr->update (vrmin, vrmax);

which adjusts an existing range.  The gimple_range routine merely returns any
range SCEV finds, which is similar to passing in [MIN, MAX] as the range to be
adjusted.

Furthermore, when compare_values is called and either operand has overflowed,
it returns -2:

  if (TREE_CODE (val1) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
    {
      /* We cannot compare overflowed values.  */
      if (TREE_OVERFLOW (val1) || TREE_OVERFLOW (val2))
        return -2;

Since EVRP will not change either end bound if -2 is returned, saturating the
end points on an overflow is tantamount to passing [MIN, MAX] as the range to
be adjusted.

So this should produce the same results for integers.
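
To make the equivalence concrete, here is a minimal C model of the two code
paths. It is only a sketch and not GCC code: struct bound, cmp_bounds,
narrow_with_scev, saturate and same_result are hypothetical stand-ins for
INTEGER_CST trees, compare_values, the narrowing step quoted above and the
saturation done by the patch, and plain int replaces the tree type.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for an INTEGER_CST plus its TREE_OVERFLOW flag.  */
struct bound { int value; bool overflow; };

/* Models compare_values: -1, 0 or 1, or -2 when either bound overflowed
   and the two therefore cannot be compared.  */
static int cmp_bounds (struct bound a, struct bound b)
{
  if (a.overflow || b.overflow)
    return -2;
  return (a.value > b.value) - (a.value < b.value);
}

/* Models the narrowing step: only a definite 1 / -1 changes a bound,
   so a -2 result leaves the existing bound alone.  */
static void narrow_with_scev (struct bound *vrmin, struct bound *vrmax,
                              struct bound min, struct bound max)
{
  assert (vrmin && vrmax);
  if (cmp_bounds (min, *vrmin) == 1)
    *vrmin = min;
  if (cmp_bounds (max, *vrmax) == -1)
    *vrmax = max;
}

/* Models the saturation: an overflowed bound becomes the type limit.  */
static struct bound saturate (struct bound b, int limit)
{
  if (b.overflow)
    return (struct bound) { limit, false };
  return b;
}

/* Starting from [INT_MIN, INT_MAX], returns true if narrowing with the raw
   SCEV bounds and narrowing with saturated bounds give the same range.  */
static bool same_result (struct bound min, struct bound max)
{
  struct bound lo1 = { INT_MIN, false }, hi1 = { INT_MAX, false };
  struct bound lo2 = lo1, hi2 = hi1;

  narrow_with_scev (&lo1, &hi1, min, max);
  narrow_with_scev (&lo2, &hi2, saturate (min, INT_MIN),
                    saturate (max, INT_MAX));
  return lo1.value == lo2.value && hi1.value == hi2.value;
}
```

In this model an overflowed bound never moves the corresponding end point in
either path, which is the behaviour described above.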

[Bug c++/90629] Support for -Wmismatched-new-delete

2020-10-20 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90629

Martin Sebor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |msebor at gcc dot gnu.org

--- Comment #3 from Martin Sebor  ---
With the patch I'm testing GCC reports the following for the test case in
comment #2:

90629-c2.C: In function ‘void f()’:
90629-c2.C:8:16: warning: ‘void operator delete(void*, long unsigned int)’
called on pointer returned from a mismatched allocation function
[-Wmismatched-new-delete]
8 | delete p;
  |^
90629-c2.C:7:33: note: returned from a call to ‘void* operator new [](long
unsigned int)’
7 | char * p = new char [ 10];
  | ^
90629-c2.C:11:19: warning: ‘void operator delete [](void*)’ called on pointer
returned from a mismatched allocation function [-Wmismatched-new-delete]
   11 | delete [] p2;
  |   ^~
90629-c2.C:10:25: note: returned from a call to ‘void* operator new(long
unsigned int)’
   10 | char * p2 = new char;
  | ^~~~
90629-c2.C:17:19: warning: ‘void operator delete [](void*)’ called on pointer
returned from a mismatched allocation function [-Wmismatched-new-delete]
   17 | delete [] q2;
  |   ^~
90629-c2.C:16:36: note: returned from a call to ‘void* malloc(size_t)’
   16 | char * q2 = (char *) malloc( 10);
  |  ~~^


The call to free(q) isn't diagnosed in the simple test case because it's
eliminated by the cddce pass along with the new expression.

[Bug ada/97504] [11 regress] Ada bootstrap error after r11-4029

2020-10-20 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97504

--- Comment #1 from Andreas Schwab  ---
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556477.html

[Bug tree-optimization/97501] [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops since r11-3685-gfcae5121154d1c33

2020-10-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Aldy Hernandez  ---
fixed

[Bug tree-optimization/97501] [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops since r11-3685-gfcae5121154d1c33

2020-10-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Aldy Hernandez :

https://gcc.gnu.org/g:5d53ec27015b916640171e891870adf2c6fdfd4c

commit r11-4129-g5d53ec27015b916640171e891870adf2c6fdfd4c
Author: Aldy Hernandez 
Date:   Tue Oct 20 15:25:20 2020 +0200

Saturate overflows return from SCEV in ranger.

bounds_of_var_in_loop is returning an overflowed int, which is causing
us to create a range for which we can't compare the bounds causing
an ICE in verify_range.

Overflowed bounds cause compare_values() to return -2, which we
don't handle in verify_range.

We don't represent overflowed ranges in irange, so this patch just
saturates any overflowed end-points to MIN or MAX.

gcc/ChangeLog:

PR tree-optimization/97501
* gimple-range.cc
(gimple_ranger::range_of_ssa_name_with_loop_info):
Saturate overflows returned from SCEV.

gcc/testsuite/ChangeLog:

* gcc.dg/pr97501.c: New test.

[Bug ada/97504] New: [11 regress] Ada bootstrap error after r11-4029

2020-10-20 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97504

Bug ID: 97504
   Summary: [11 regress] Ada bootstrap error after r11-4029
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:1e70b1a358b6ce3b894f284d88fbb90518d45cc0, r11-4029 

/home/seurer/gcc/git/build/gcc-test/./gcc/xgcc
-B/home/seurer/gcc/git/build/gcc-test/./gcc/
-B/home/seurer/gcc/git/install/gcc-test/powerpc64le-unknown-linux-gnu/bin/
-B/home/seurer/gcc/git/install/gcc-test/powerpc64le-unknown-linux-gnu/lib/
-isystem
/home/seurer/gcc/git/install/gcc-test/powerpc64le-unknown-linux-gnu/include
-isystem
/home/seurer/gcc/git/install/gcc-test/powerpc64le-unknown-linux-gnu/sys-include
   -c -g -O2  -fPIC  -W -Wall -gnatpg -nostdinc   a-nallfl.ads -o a-nallfl.o
a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:48:13: warning: profile of "Sin" doesn't match the builtin it
binds
a-nallfl.ads:51:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:51:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:51:13: warning: profile of "Cos" doesn't match the builtin it
binds
a-nallfl.ads:54:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:54:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:54:13: warning: profile of "Tan" doesn't match the builtin it
binds
a-nallfl.ads:57:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:57:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:57:13: warning: profile of "Exp" doesn't match the builtin it
binds
a-nallfl.ads:60:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:60:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:60:13: warning: profile of "Sqrt" doesn't match the builtin it
binds
a-nallfl.ads:63:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:63:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:63:13: warning: profile of "Log" doesn't match the builtin it
binds
a-nallfl.ads:66:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:66:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:66:13: warning: profile of "Acos" doesn't match the builtin it
binds
a-nallfl.ads:69:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:69:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:69:13: warning: profile of "Asin" doesn't match the builtin it
binds
a-nallfl.ads:72:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:72:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:72:13: warning: profile of "Atan" doesn't match the builtin it
binds
a-nallfl.ads:75:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:75:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:75:13: warning: profile of "Sinh" doesn't match the builtin it
binds
a-nallfl.ads:78:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:78:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:78:13: warning: profile of "Cosh" doesn't match the builtin it
binds
a-nallfl.ads:81:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:81:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:81:13: warning: profile of "Tanh" doesn't match the builtin it
binds
a-nallfl.ads:84:13: warning: intrinsic binding type mismatch on return value
a-nallfl.ads:84:13: warning: intrinsic binding type mismatch on argument 1
a-nallfl.ads:84:13: warning: profile of "Pow" doesn't match the builtin it
binds
../gcc-interface/Makefile:302: recipe for target 'a-nallfl.o' failed
make[7]: *** [a-nallfl.o] Error 1

commit 1e70b1a358b6ce3b894f284d88fbb90518d45cc0 (HEAD)
Author: Alexandre Oliva 
Date:   Sun Oct 18 17:19:53 2020 -0300

[Bug preprocessor/97471] [11 Regression] ICE on using function-like macro as a non function-like macro since r11-338-g2a0225e47868fbfc

2020-10-20 Thread nathan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97471

Nathan Sidwell  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Nathan Sidwell  ---
dbcc6b1577b 2020-10-20 | preprocessor: Further fix for EOF in macro args
[PR97471]


[Bug libstdc++/59325] Provide a way to disable deprecated warnings

2020-10-20 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59325

--- Comment #8 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #6)
> (In reply to andysem from comment #2)
> > (In reply to Jonathan Wakely from comment #1)
> > > You can use a #pragma to disable -Wdeprecated locally
> > 
> > But the legacy C++ is used in the library, which code I'd like to avoid
> > changing.
> 
> Is this still a problem? Uses of deprecated features within the library
> itself should no longer emit warnings (I think I've disabled them with
> pragmas).
> 
> So the only uses should be in our own code, which you can add pragmas to.

Yes, this is still a problem. To be clear, the library I was referring to was
not libstdc++ but another library which I have no control of and which is still
using C++03 features. I don't think it will move away from std::auto_ptr any
time soon as it is part of the API.

> If you want to ignore the warning because you know you're not going to use a
> different implementation or a newer standard, you can use -Wno-deprecated to
> disable all such warnings globally or use #pragma to disable them locally.

I don't want to disable _all_ deprecated warnings universally, only those
emitted by libstdc++. For this reason I cannot use -Wno-deprecated. #pragma
also doesn't really work because libstdc++ can be included from any header,
including those from other libraries, for which I don't want to disable
warnings.

[Bug c/97503] New: Suboptimal use of cntlzw and cntlzd

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

Bug ID: 97503
   Summary: Suboptimal use of cntlzw and cntlzd
   Product: gcc
   Version: 10.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.leroy at csgroup dot eu
  Target Milestone: ---

int f(int x)
{
return x ? __builtin_clz(x) : 32;
}

Is built as

 :
   0:   2c 03 00 00 cmpwi   r3,0
   4:   40 82 00 0c bne 10 
   8:   38 60 00 20 li  r3,32
   c:   4e 80 00 20 blr
  10:   7c 63 00 34 cntlzw  r3,r3
  14:   4e 80 00 20 blr


I would expect

 :
   0:   7c 63 00 34 cntlzw  r3,r3
   4:   4e 80 00 20 blr

Because cntlzw (Count Leading Zeros Word) is documented in the PowerPC
instruction set as returning 0 to 32 inclusive.

The same applies to the 64-bit version:

long f(long x)
{
return x ? __builtin_clzll(x) : 64;
}

 <.f>:
   0:   2c 23 00 00 cmpdi   r3,0
   4:   41 82 00 0c beq 10 <.f+0x10>
   8:   7c 63 00 74 cntlzd  r3,r3
   c:   4e 80 00 20 blr
  10:   38 60 00 40 li  r3,64
  14:   4e 80 00 20 blr
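
For readers who want to try the source-level idiom from this report, a small
C sketch follows. The function names are hypothetical, and the bit-by-bit
reference is only there to cross-check the builtin-based version; nothing
here says what code a given GCC version will emit for it.

```c
#include <assert.h>
#include <limits.h>

/* The sketch assumes the widths used in the report.  */
static_assert (sizeof (unsigned int) * CHAR_BIT == 32,
               "unsigned int is assumed to be 32 bits");
static_assert (sizeof (unsigned long long) * CHAR_BIT == 64,
               "unsigned long long is assumed to be 64 bits");

/* Leading-zero count that is defined for 0 (result: 32).  The explicit
   test keeps __builtin_clz away from its undefined input.  */
static int clz32_defined (unsigned int x)
{
  return x ? __builtin_clz (x) : 32;
}

/* 64-bit counterpart (result for 0: 64).  */
static int clz64_defined (unsigned long long x)
{
  return x ? __builtin_clzll (x) : 64;
}

/* Bit-by-bit reference, used only to cross-check clz32_defined.  */
static int clz32_reference (unsigned int x)
{
  int n = 0;
  for (unsigned int bit = 1u << 31; bit && !(x & bit); bit >>= 1)
    n++;
  return n;
}
```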

[Bug tree-optimization/97360] [11 Regression] ICE in range_on_exit

2020-10-20 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360

--- Comment #33 from rguenther at suse dot de  ---
On October 20, 2020 4:16:37 PM GMT+02:00, "bergner at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360
>
>--- Comment #32 from Peter Bergner  ---
>(In reply to Richard Biener from comment #31)
>> (In reply to Andrew Macleod from comment #30)
>> > On 10/19/20 6:40 PM, bergner at gcc dot gnu.org wrote:
>> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360
>> > >
>> > > --- Comment #28 from Peter Bergner  ---
>> > > (In reply to Andrew Macleod from comment #25)
>> > >> Wonder if it was suppose to be something like:
>> > >>
>> > >>
>> > >> /* Vector pair and vector quad support.  */
>> > >> if (TARGET_EXTRA_BUILTINS)
>> > >>   {
>> > >> -  tree oi_uns_type = make_unsigned_type (256);
>> > >> -  vector_pair_type_node = build_distinct_type_copy (oi_uns_type);
>> > >> +  vector_pair_type_node = make_unsigned_type (256);
>> > >> SET_TYPE_MODE (vector_pair_type_node, POImode);
>> > >> layout_type (vector_pair_type_node);
>> > >> lang_hooks.types.register_builtin_type (vector_pair_type_node,
>> > >>"__vector_pair");
>> > >>   
>> > >> -  tree xi_uns_type = make_unsigned_type (512);
>> > >> -  vector_quad_type_node = build_distinct_type_copy (xi_uns_type);
>> > >> +  vector_quad_type_node = make_unsigned_type (512);
>> > >> SET_TYPE_MODE (vector_quad_type_node, PXImode);
>> > >> layout_type (vector_quad_type_node);
>> > >> lang_hooks.types.register_builtin_type (vector_quad_type_node,
>> > > So this passed bootstrap and regtesting with no regressions.
>> > >
>> > > Is this really the correct fix?
>> 
>> Yes.
>
>Just to verify, this is an approval for Andrew's patch above?

Yes. 

>If so, I can push it to trunk for Andrew.

[Bug c++/82239] Parentheses around constexpr template member break static_assert

2020-10-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82239

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:953277ba3fa39a9285cf89f59932b0169e7f6b9d

commit r11-4126-g953277ba3fa39a9285cf89f59932b0169e7f6b9d
Author: Marek Polacek 
Date:   Tue Oct 20 10:15:41 2020 -0400

c++: Add fixed test [PR82239]

This test was fixed by r256550 but that commit was fixing another issue,
and just happened to fix this too.

gcc/testsuite/ChangeLog:

PR c++/82239
* g++.dg/cpp0x/static_assert16.C: New test.

[Bug c++/82239] Parentheses around constexpr template member break static_assert

2020-10-20 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82239

Marek Polacek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Marek Polacek  ---
Fixed.

[Bug libstdc++/59325] Provide a way to disable deprecated warnings

2020-10-20 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59325

--- Comment #7 from Jonathan Wakely  ---
(In reply to andysem from comment #4)
> I just think that all these hoops could be avoided if libstdc++ was a little
> more friendly in this regard. After all, there's no harm in using e.g.
> auto_ptr in C++11 code, it surely won't disappear from STL any time soon, so
> the warning is a bit overreactive anyway.

Well it is gone from the recent standards, and is no longer defined by other
std::lib implementations when compiling with newer standards, e.g. libc++
doesn't define it in C++17 mode. That's exactly what the warning is telling
you: you're using a feature which is going away.

If you want to ignore the warning because you know you're not going to use a
different implementation or a newer standard, you can use -Wno-deprecated to
disable all such warnings globally or use #pragma to disable them locally.
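
As an illustration of the local #pragma approach, here is a minimal sketch
in C. The functions legacy_api and call_legacy_quietly are hypothetical, and
the sketch assumes the warning comes from a deprecated attribute, which GCC
reports under -Wdeprecated-declarations; other deprecation warnings may need
a different option name in the pragma.

```c
#include <assert.h>

/* Hypothetical legacy function, marked deprecated for illustration.
   It doubles a non-negative value.  */
__attribute__ ((deprecated ("use a newer API instead")))
static int legacy_api (int x)
{
  assert (x >= 0);
  return x * 2;
}

/* Calls the deprecated function with the warning silenced only for the
   statements between push and pop.  */
static int call_legacy_quietly (int x)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  int r = legacy_api (x);
#pragma GCC diagnostic pop
  return r;
}
```

Calls to legacy_api outside the push/pop region would still be diagnosed.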

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #44 from Jakub Jelinek  ---
Then perhaps some backends need to be improved.
Try e.g.:
void bar (void);

void
foo (int x)
{
  if (__builtin_clz (x) == 32)
bar ();
}
with trunk GCC if you don't trust me.

[Bug libstdc++/59325] Provide a way to disable deprecated warnings

2020-10-20 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59325

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2020-10-20
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #6 from Jonathan Wakely  ---
(In reply to andysem from comment #2)
> (In reply to Jonathan Wakely from comment #1)
> > You can use a #pragma to disable -Wdeprecated locally
> 
> But the legacy C++ is used in the library, which code I'd like to avoid
> changing.

Is this still a problem? Uses of deprecated features within the library itself
should no longer emit warnings (I think I've disabled them with pragmas).

So the only uses should be in our own code, which you can add pragmas to.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #43 from Christophe Leroy  ---
(In reply to Christophe Leroy from comment #42)
> (In reply to Jakub Jelinek from comment #41)
> > It is documented to be undefined:
> >  -- Built-in Function: int __builtin_clz (unsigned int x)
> >  Returns the number of leading 0-bits in X, starting at the most
> >  significant bit position.  If X is 0, the result is undefined.
> > Especially GCC 11 (but e.g. clang too) will e.g. during value range
> > propagation assume that e.g. the builtin return value will be only 0 to 31,
> > not to 32, etc.
> > The portable way how to write this is x ? __builtin_clz (x) :
> > whatever_value_you_want_for_clz_0;
> > and the compiler should recognize that and if the instruction is well
> > defined for 0 and matches your choice, use optimal sequence.
> 
> int f(int x)
> {
>   return x ? __builtin_clz(x) : 32;
> }
> 
> Is compiled into (with -O2):
> 
>  :
>0: 2c 03 00 00 cmpwi   r3,0
>4: 40 82 00 0c bne 10 
>8: 38 60 00 20 li  r3,32
>c: 4e 80 00 20 blr
>   10: 7c 63 00 34 cntlzw  r3,r3
>   14: 4e 80 00 20 blr
> 
> 
> 
> Although
> 
> int g(void)
> {

int g(int x)
{
return __builtin_clz(0);
}

Gives

0018 :
  18:   38 60 00 20 li  r3,32
  1c:   4e 80 00 20 blr

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #42 from Christophe Leroy  ---
(In reply to Jakub Jelinek from comment #41)
> It is documented to be undefined:
>  -- Built-in Function: int __builtin_clz (unsigned int x)
>  Returns the number of leading 0-bits in X, starting at the most
>  significant bit position.  If X is 0, the result is undefined.
> Especially GCC 11 (but e.g. clang too) will e.g. during value range
> propagation assume that e.g. the builtin return value will be only 0 to 31,
> not to 32, etc.
> The portable way how to write this is x ? __builtin_clz (x) :
> whatever_value_you_want_for_clz_0;
> and the compiler should recognize that and if the instruction is well
> defined for 0 and matches your choice, use optimal sequence.

int f(int x)
{
return x ? __builtin_clz(x) : 32;
}

Is compiled into (with -O2):

 :
   0:   2c 03 00 00 cmpwi   r3,0
   4:   40 82 00 0c bne 10 
   8:   38 60 00 20 li  r3,32
   c:   4e 80 00 20 blr
  10:   7c 63 00 34 cntlzw  r3,r3
  14:   4e 80 00 20 blr



Although

int g(void)
{

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #41 from Jakub Jelinek  ---
It is documented to be undefined:
 -- Built-in Function: int __builtin_clz (unsigned int x)
 Returns the number of leading 0-bits in X, starting at the most
 significant bit position.  If X is 0, the result is undefined.
Especially GCC 11 (but e.g. clang too) will e.g. during value range propagation
assume that e.g. the builtin return value will be only 0 to 31, not to 32, etc.
The portable way how to write this is x ? __builtin_clz (x) :
whatever_value_you_want_for_clz_0;
and the compiler should recognize that and if the instruction is well defined
for 0 and matches your choice, use optimal sequence.

[Bug tree-optimization/97360] [11 Regression] ICE in range_on_exit

2020-10-20 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360

--- Comment #32 from Peter Bergner  ---
(In reply to Richard Biener from comment #31)
> (In reply to Andrew Macleod from comment #30)
> > On 10/19/20 6:40 PM, bergner at gcc dot gnu.org wrote:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360
> > >
> > > --- Comment #28 from Peter Bergner  ---
> > > (In reply to Andrew Macleod from comment #25)
> > >> Wonder if it was suppose to be something like:
> > >>
> > >>
> > >> /* Vector pair and vector quad support.  */
> > >> if (TARGET_EXTRA_BUILTINS)
> > >>   {
> > >> -  tree oi_uns_type = make_unsigned_type (256);
> > >> -  vector_pair_type_node = build_distinct_type_copy (oi_uns_type);
> > >> +  vector_pair_type_node = make_unsigned_type (256);
> > >> SET_TYPE_MODE (vector_pair_type_node, POImode);
> > >> layout_type (vector_pair_type_node);
> > >> lang_hooks.types.register_builtin_type (vector_pair_type_node,
> > >>"__vector_pair");
> > >>   
> > >> -  tree xi_uns_type = make_unsigned_type (512);
> > >> -  vector_quad_type_node = build_distinct_type_copy (xi_uns_type);
> > >> +  vector_quad_type_node = make_unsigned_type (512);
> > >> SET_TYPE_MODE (vector_quad_type_node, PXImode);
> > >> layout_type (vector_quad_type_node);
> > >> lang_hooks.types.register_builtin_type (vector_quad_type_node,
> > > So this passed bootstrap and regtesting with no regressions.
> > >
> > > Is this really the correct fix?
> 
> Yes.

Just to verify, this is an approval for Andrew's patch above?
If so, I can push it to trunk for Andrew.

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

--- Comment #6 from Jakub Jelinek  ---
With
--- gcc/config/s390/vx-builtins.md.jj   2020-04-30 09:26:01.0 +0200
+++ gcc/config/s390/vx-builtins.md  2020-10-20 16:08:31.847698827 +0200
@@ -812,6 +812,16 @@
   DONE;
 })

+(define_expand "vec_cmp"
+  [(set (match_operand:VI_HW   0 "register_operand" "=v")
+   (intcmp:VI_HW (match_operand:VI_HW 1 "register_operand"  "v")
+ (match_operand:VI_HW 2 "register_operand"  "v")))]
+  "TARGET_VX"
+{
+  s390_expand_vec_compare (operands[0], , operands[1],
operands[2]);
+  DONE;
+})
+
 (define_expand "vec_cmp"
   [(set (match_operand:  0 "register_operand" "=v")
(fpcmp: (match_operand:VF_HW 1 "register_operand"  "v")
(couldn't just change the existing pattern easily as s390.c refers to the
current names) the testcase doesn't ICE anymore, but whether it does the right
thing is unclear.  Anyway, deferring to s390x maintainers now.

[Bug c++/82239] Parentheses around constexpr template member break static_assert

2020-10-20 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82239

Marek Polacek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
 CC||mpolacek at gcc dot gnu.org
   Last reconfirmed||2020-10-20
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
Fixed by r256550 but the test is useful.

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

--- Comment #5 from Jakub Jelinek  ---
The vector comparison optabs are:
OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
therefore they need two modes in their names - the first one is the value mode
and the second one is the mask mode.
But then the backend defines named patterns like:
vec_cmpeqv16qi
rather than
vec_cmpeqv16qiv16qi
I'd expect.
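
The naming rule can be sketched in a few lines of C. The helper two_mode_name
is hypothetical and is not how GCC generates optab names; it only shows the
base name followed by the value mode and then the mask mode, as described
above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a "$a$b"-style name by appending the value
   mode and then the mask mode to the base name.  Returns 0 on success and
   -1 if the result does not fit in the buffer.  */
static int two_mode_name (char *buf, size_t size, const char *base,
                          const char *value_mode, const char *mask_mode)
{
  assert (buf && base && value_mode && mask_mode);

  size_t need = strlen (base) + strlen (value_mode)
                + strlen (mask_mode) + 1;
  if (need > size)
    return -1;

  snprintf (buf, size, "%s%s%s", base, value_mode, mask_mode);
  return 0;
}
```

With base "vec_cmpeq" and both modes "v16qi", this yields the two-mode
spelling vec_cmpeqv16qiv16qi rather than vec_cmpeqv16qi.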

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

--- Comment #4 from Richard Biener  ---
Or rather, it doesn't make sense to not have vec_cmp[u]{,eq} optabs.

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

Richard Biener  changed:

   What|Removed |Added

 CC||krebbel at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
So it looks like vcond_ expands to a vector compare + vsel, so it doesn't make
sense that vcond_ accepts cases that the vcmp_ expanders reject.

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

--- Comment #2 from Richard Biener  ---
So the following is problematic:

  vector(16)  _12;
  vector(16)  _14;
  vector(16)  _32;

  _14 = vect__1.7_3 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _12 = vect__2.10_15 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _32 = VEC_COND_EXPR <_12, _14, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0 }>;

using a VEC_COND_EXPR to combine two vector bools to another vector bool.
The above could be folded to ~_12 & _14 instead.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #40 from Christophe Leroy  ---
(In reply to Jakub Jelinek from comment #39)
> (In reply to Christophe Leroy from comment #38)
> > But on powerpc that's already the case and it doesn't solve the issue.
> > 
> > static inline int fls(unsigned int x)
> > {
> > return 32 - __builtin_clz(x);
> > }
> > 
> > static inline int fls64(__u64 x)
> > {
> > return 64 - __builtin_clzll(x);
> > }
> 
> That is clearly a kernel bug (__builtin_clz* is documented undefined for 0,
> while fls* wants to be well defined there), and shouldn't change anything,
> because
> in the if (__builtin_constant_p (size))
> case get_order doesn't use fls*, but ilog2.  And it is ilog2 that should be
> changed.

What's the bug?

int f(int x)
{
  return __builtin_clz(x);
}

Compiles into

:
  cntlzw r3, r3
  blr


The 32-bit PowerPC documentation says:

cntlzw : Count Leading Zeros Word

A count of the number of consecutive zero bits starting at bit 0 of rS is
placed into rA. This number ranges from 0 to 32, inclusive.

I can't see a problem when x == 0

[Bug c++/97010] C++20 ADL and function templates that are not visible (P0846R0) fails on call with templated type

2020-10-20 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97010

Marek Polacek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Marek Polacek  ---
Fixed.

[Bug c++/97010] C++20 ADL and function templates that are not visible (P0846R0) fails on call with templated type

2020-10-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97010

--- Comment #4 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Marek Polacek
:

https://gcc.gnu.org/g:257bbf154cd2b83eb02c99db73537e3b8ba3a6e7

commit r10-8915-g257bbf154cd2b83eb02c99db73537e3b8ba3a6e7
Author: Marek Polacek 
Date:   Thu Sep 10 17:27:43 2020 -0400

c++: Fix P0846 (ADL and function templates) in template [PR97010]

To quickly recap, P0846 says that a name is also considered to refer to
a template if it is an unqualified-id followed by a < and name lookup
finds either one or more functions or finds nothing.

In a template, when parsing a function call that has type-dependent
arguments, we can't perform ADL right away so we set KOENIG_LOOKUP_P in
the call to remember to do it when instantiating the call
(tsubst_copy_and_build/CALL_EXPR).  When the called function is a
function template, we represent the call with a TEMPLATE_ID_EXPR;
usually the operand is an OVERLOAD.

In the P0846 case though, the operand can be an IDENTIFIER_NODE, when
name lookup found nothing when parsing the template name.  But we
weren't handling this correctly in tsubst_copy_and_build.  First
we need to pass the FUNCTION_P argument from  to
, otherwise we give a bogus error.  And then in
 we need to perform ADL.  The rest of the changes is to
give better errors when ADL didn't find anything.

gcc/cp/ChangeLog:

PR c++/97010
* pt.c (tsubst_copy_and_build) : Call
tsubst_copy_and_build explicitly instead of using the RECUR macro.
Handle a TEMPLATE_ID_EXPR with an IDENTIFIER_NODE as its operand.
: Perform ADL for a TEMPLATE_ID_EXPR with an
IDENTIFIER_NODE as its operand.

gcc/testsuite/ChangeLog:

PR c++/97010
* g++.dg/cpp2a/fn-template21.C: New test.
* g++.dg/cpp2a/fn-template22.C: New test.

(cherry picked from commit 635072248a426c933c74ef4431e82401249b6218)

[Bug tree-optimization/97501] [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops since r11-3685-gfcae5121154d1c33

2020-10-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

Aldy Hernandez  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Aldy Hernandez  ---
bounds_of_var_in_loop is returning an overflowed int, which is causing
us to create a range for which we can't compare the bounds causing
an ICE in verify_range.

Overflowed bounds cause compare_values() to return -2, which we
don't handle in verify_range.

We don't represent overflowed ranges in irange, so this patch just
saturates any overflowed end-points to MIN or MAX.

Testing the following patch:

gcc/ChangeLog:

PR 97501/tree-optimization
* gimple-range.cc
(gimple_ranger::range_of_ssa_name_with_loop_info):
Saturate overflows returned from SCEV.

gcc/testsuite/ChangeLog:

* gcc.dg/pr97501.c: New test.

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 999d631c5ee..6ce9e52c691 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1140,9 +1140,9 @@ gimple_ranger::range_of_ssa_name_with_loop_info (irange
&r, tree name,
   // ?? We could do better here.  Since MIN/MAX can only be an
   // SSA, SSA +- INTEGER_CST, or INTEGER_CST, we could easily call
   // the ranger and solve anything not an integer.
-  if (TREE_CODE (min) != INTEGER_CST)
+  if (TREE_CODE (min) != INTEGER_CST || TREE_OVERFLOW (min))
min = vrp_val_min (type);
-  if (TREE_CODE (max) != INTEGER_CST)
+  if (TREE_CODE (max) != INTEGER_CST || TREE_OVERFLOW (max))
max = vrp_val_max (type);
   r.set (min, max);
 }
diff --git a/gcc/testsuite/gcc.dg/pr97501.c b/gcc/testsuite/gcc.dg/pr97501.c
new file mode 100644
index 000..aedac83962d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97501.c
@@ -0,0 +1,14 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+
+static int c = 0;
+
+int main() {
+  int b = 0;
+  if (c) {
+  for (;; b--)
+do
+  b++;
+while (b);
+  }
+}

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #39 from Jakub Jelinek  ---
(In reply to Christophe Leroy from comment #38)
> But on powerpc that's already the case and it doesn't solve the issue.
> 
> static inline int fls(unsigned int x)
> {
>   return 32 - __builtin_clz(x);
> }
> 
> static inline int fls64(__u64 x)
> {
>   return 64 - __builtin_clzll(x);
> }

That is clearly a kernel bug (__builtin_clz* is documented undefined for 0,
while fls* wants to be well defined there), and shouldn't change anything,
because
in the if (__builtin_constant_p (size))
case get_order doesn't use fls*, but ilog2.  And it is ilog2 that should be
changed.
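
For reference, a variant of the quoted helpers that is well defined for 0
might look like the sketch below. The names fls_defined and fls64_defined
are hypothetical and this is not the kernel's code; unsigned long long
stands in for __u64.

```c
#include <assert.h>
#include <limits.h>

static_assert (sizeof (unsigned int) * CHAR_BIT == 32,
               "unsigned int is assumed to be 32 bits");
static_assert (sizeof (unsigned long long) * CHAR_BIT == 64,
               "unsigned long long is assumed to be 64 bits");

/* Find last set bit, counting from 1; returns 0 for x == 0.  The explicit
   zero test keeps __builtin_clz away from its undefined input.  */
static int fls_defined (unsigned int x)
{
  return x ? 32 - __builtin_clz (x) : 0;
}

/* 64-bit counterpart.  */
static int fls64_defined (unsigned long long x)
{
  return x ? 64 - __builtin_clzll (x) : 0;
}
```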

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread christophe.leroy at csgroup dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #38 from Christophe Leroy  ---
(In reply to Jan Hubicka from comment #32)
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> > 
> > --- Comment #31 from Segher Boessenkool  ---
> > (In reply to Jan Hubicka from comment #27)
> > > It is because --param inline-insns-single was reduced for -O2 from 200
> > > to 70.  GCC 10 has newly different set of parameters for -O2 and -O3 and
> > > enables auto-inlining at -O2.
> > > 
> > > Problem with inlining functions declared inline is that C++ codebases
> > > tends to abuse this keyword for things that are really too large (and
> > > get_order would be such example if it did not have builtin_constant_p
> > > check which inliner does not understand well). So having same limit at
> > > -O2 and -O3 turned out to be problematic with respect to code size and
> > > especially with respect to LTO, where a lot more inlining opportunities
> > > appear.
> > 
> > Do the heuristics account for that not inlining a "static inline" results
> > in multiple copies?
> 
> It prevents inlining only when there are multiple calls in the unit
> being compiled (there is no way to know that the same inline function is
> duplicated in other units).
> This is what happens here: there are multiple calls so inliner concludes
> inlining would cost too much of code size and later they are optimized
> away.
> 
> get_order is a wrapper around ffs64.  This can be implemented w/o asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
>   if (!x)
>   return 0;
>   return 64 - __builtin_clzl (x);
> }
> 
> This results in longer assembly than the kernel asm implementation. If
> that matters I would replace builtin_constant_p part of get_order by this
> implementation that is more transparent to the code size estimation and
> things will get inlined.
> 

But on powerpc that's already the case and it doesn't solve the issue.

static inline int fls(unsigned int x)
{
    return 32 - __builtin_clz(x);
}

static inline int fls64(__u64 x)
{
    return 64 - __builtin_clzll(x);
}

[Bug target/97323] [10/11 Regression] ICE 'verify_type' failed on arm-linux-gnueabihf

2020-10-20 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

--- Comment #7 from Matthias Klose  ---
commit 872d5034baa1007606d405e37937908602fbbe51

[Bug tree-optimization/96129] [11 regression] gcc.dg/vect/vect-alias-check.c etc. FAIL

2020-10-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96129

--- Comment #4 from Kewen Lin  ---
Given the regressed failures, this is highly suspected to be a duplicate of PR96376.

[Bug target/97323] [10/11 Regression] ICE 'verify_type' failed on arm-linux-gnueabihf

2020-10-20 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

--- Comment #6 from Matthias Klose  ---
this is triggered by:

2015-05-19  Jan Hubicka  

   * tree.c (verify_type_variant): Fix #undef.
   (gimple_canonical_types_compatible_p): Move here from lto.c
   (verify_type): Verify TYPE_CANONICAL compatibility.
   * tree.h (gimple_canonical_types_compatible_p): Declare.

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #1 from Martin Liška  ---
Maybe a dup of PR97270?

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #37 from Jan Hubicka  ---
Hi,
this implements the heuristics increasing bounds for functions having
__builtin_constant_p on parameters.  Note that for get_order this is
still not enough, since we double the bound if the hint applies, so
it goes from 70 to 140 and not to the 190 needed; however, it will handle
other similar cases.
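
The arithmetic behind those numbers, as a small sketch.  The helper
scaled_inline_limit is hypothetical; it only mirrors the percentages quoted
in this message and is not the code in ipa-inline.c.

```c
/* Scale a base inlining limit by a hint percentage.
   Hypothetical helper: 200 doubles the limit, 300 triples it.  */
static int scaled_inline_limit(int base_limit, int hint_percent)
{
    return base_limit * hint_percent / 100;
}
```

With a base of 70, a 200% hint gives 140, which is below the 190 that
get_order needs, while 300% gives 210, which is above it.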

If the hint weight is increased to 300% (so the bound becomes 210), I get:
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build10/gcc$ ./xgcc -B ./ -O2
-Winline pipe.i --param inline-heuristics-hint-percent=300
In file included from fs/pipe.c:11:
./include/linux/slab.h: In function ‘alloc_pipe_info’:
./include/linux/slab.h:586:121: warning: inlining failed in call to
‘kmalloc_array.constprop’: --param max-inline-insns-single limit reached
[-Winline]
./include/linux/slab.h:605:9: note: called from here
./include/linux/slab.h: In function ‘pipe_resize_ring’:
./include/linux/slab.h:586:121: warning: inlining failed in call to
‘kmalloc_array.constprop’: --param max-inline-insns-single limit reached
[-Winline]
./include/linux/slab.h:605:9: note: called from here

So the problem only shifts to kmalloc_array not being inlined
(that is why it would be nice to update the kernel with the simpler
get_order).

However, it shows a different problem: ipa-cp produces a clone of
kmalloc_array, since it is always called with a constant size, but the clone
does not update the predicates, so we lose track of the parameter
being constant, and that is why we optimize it out only late.

Martin, I think this is caused by the long-standing TODO in
ipa_fn_summary_t::duplicate, and we should probably implement it: based
on the known partial assignment of params to constants, we should fold the
conditions in predicates.

Indeed, with ./xgcc -B ./ -O2 -Winline pipe.i -fno-ipa-cp --param
inline-heuristics-hint-percent=300
the warning goes away.  We still need the stronger hint, though.

Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread Jan Hubicka
gcc/ChangeLog:

2020-10-20  Jan Hubicka  

PR c/97445
* ipa-fnsummary.c (ipa_dump_hints): Handle
INLINE_HINT_builtin_constant_p.
(ipa_fn_summary::~ipa_fn_summary): Free builtin_constant_p_parms.
(ipa_fn_summary_t::duplicate): Copy builtin_constant_p_parms.
(ipa_dump_fn_summary): Dump builtin_constant_p_parms.
(set_cond_stmt_execution_predicate): Compute builtin_constant_p_parms.
(ipa_call_context::estimate_size_and_time): Set
INLINE_HINT_builtin_constant_p.
(ipa_merge_fn_summary_after_inlining): Merge builtin_constant_p_parms.
(inline_read_section): Stream builtin_constant_p_parms.
(ipa_fn_summary_write): Stream builtin_constant_p_parms.
* ipa-fnsummary.h (enum ipa_hints_vals): Add
INLINE_HINT_builtin_constant_p.
(ipa_fn_summary): Add builtin_constant_p_parms.
* ipa-inline.c (want_inline_small_function_p): Handle
INLINE_HINT_builtin_constant_p.
(edge_badness): Handle INLINE_HINT_builtin_constant_p.

gcc/testsuite/ChangeLog:

2020-10-20  Jan Hubicka  

* gcc.dg/ipa/inlinehint-5.c: New test.


diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 9e3eda4d3cb..4292f1f5fe7 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -141,6 +141,11 @@ ipa_dump_hints (FILE *f, ipa_hints hints)
       hints &= ~INLINE_HINT_known_hot;
       fprintf (f, " known_hot");
     }
+  if (hints & INLINE_HINT_builtin_constant_p)
+    {
+      hints &= ~INLINE_HINT_builtin_constant_p;
+      fprintf (f, " builtin_constant_p");
+    }
   gcc_assert (!hints);
 }
 
@@ -751,6 +756,7 @@ ipa_fn_summary::~ipa_fn_summary ()
   vec_free (call_size_time_table);
   vec_free (loop_iterations);
   vec_free (loop_strides);
+  vec_free (builtin_constant_p_parms);
 }
 
 void
@@ -805,6 +811,10 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
  that are known to be false or true.  */
   info->conds = vec_safe_copy (info->conds);
 
+  if (info->builtin_constant_p_parms)
+    info->builtin_constant_p_parms
+      = vec_safe_copy (info->builtin_constant_p_parms);
+
   /* When there are any replacements in the function body, see if we can figure
  out that something was optimized out.  */
   if (ipa_node_params_sum && dst->clone.tree_map)
@@ -1066,6 +1076,13 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
fprintf (f, " inlinable");
  if (s->fp_expressions)
fprintf (f, " fp_expression");
+ if (s->builtin_constant_p_parms)
+   {
+ fprintf (f, " builtin_constant_p_parms");
+ for (unsigned int i = 0;
+  i < s->builtin_constant_p_parms->length (); i++)
+   fprintf (f, " %i", (*s->builtin_constant_p_parms)[i]);
+   }
  fprintf (f, "\n  global time: %f\n", s->time.to_double ());
  fprintf (f, "  self size:   %i\n", ss->self_size);
  fprintf (f, "  global size: %i\n", ss->size);
@@ -1598,6 +1615,8 @@ set_cond_stmt_execution_predicate (struct ipa_func_body_info *fbi,
   op2 = gimple_call_arg (set_stmt, 0);
   if (!decompose_param_ex

[Bug tree-optimization/97501] [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops since r11-3685-gfcae5121154d1c33

2020-10-20 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2020-10-20
 CC||amacleod at redhat dot com,
   ||marxin at gcc dot gnu.org
  Known to work||10.2.0
Summary|[11 Regression] ICE in  |[11 Regression] ICE in
   |verify_range, at|verify_range, at
   |value-range.cc:369, -O2 on  |value-range.cc:369, -O2 on
   |dead overflowing nested |dead overflowing nested
   |loops   |loops since
   ||r11-3685-gfcae5121154d1c33
  Known to fail||11.0

--- Comment #1 from Martin Liška  ---
Thanks for the report, started with r11-3685-gfcae5121154d1c33.

[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
   Priority|P3  |P1

[Bug bootstrap/97502] New: [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502

Bug ID: 97502
   Summary: [11 Regression] PGO bootstrap failure on s390x-linux
with -march=z13 starting with r11-3426
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

In profiledbootstrap on s390x-linux --with-arch=z13, I get an ICE during
feedback compilation of s390.c:
/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/./prev-gcc/xg++ \
  -B/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/./prev-gcc/ \
  -B/usr/s390x-redhat-linux/bin/ -nostdinc++ \
  -B/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/prev-s390x-redhat-linux/libstdc++-v3/src/.libs \
  -B/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/prev-s390x-redhat-linux/libstdc++-v3/libsupc++/.libs \
  -I/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/prev-s390x-redhat-linux/libstdc++-v3/include/s390x-redhat-linux \
  -I/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/prev-s390x-redhat-linux/libstdc++-v3/include \
  -I/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/libstdc++-v3/libsupc++ \
  -L/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/prev-s390x-redhat-linux/libstdc++-v3/src/.libs \
  -L/home/nfs/jakub/rpmbuild/BUILD/gcc-11.0.0-20201019/obj-s390x-redhat-linux/prev-s390x-redhat-linux/libstdc++-v3/libsupc++/.libs \
  -fno-PIE -c -O2 -g -Wall -Wformat-security -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions \
  -fstack-protector-strong -grecord-gcc-switches -march=z13 -mtune=z13 \
  -fasynchronous-unwind-tables -fstack-clash-protection -fprofile-use -DIN_GCC \
  -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing \
  -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute \
  -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros \
  -Wno-overlength-strings -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/. \
  -I../../gcc/../include -I../../gcc/../libcpp/include -I../../gcc/../libdecnumber \
  -I../../gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/../libbacktrace \
  -o s390.o -MT s390.o -MMD -MP -MF ./.deps/s390.TPo ../../gcc/config/s390/s390.c
...
during RTL pass: expand
../../gcc/config/s390/s390.c: In function 'void s390_register_info()':
../../gcc/config/s390/s390.c:9761:1: internal compiler error: in do_store_flag, at expr.c:12394
 9761 | s390_register_info ()
  | ^~
0x163e15d do_store_flag
../../gcc/expr.c:12394
0x163e959 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../gcc/expr.c:9627
0x16475f9 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:10171
0x16488b9 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc/expr.c:8486
0x1756145 expand_normal
../../gcc/expr.h:288
0x1756145 expand_vect_cond_optab_fn
../../gcc/internal-fn.c:2604
0x1756145 expand_VCONDU
../../gcc/internal-fn.def:144
0x14d085b expand_gimple_stmt
../../gcc/cfgexpand.c:3851
0x14d085b expand_gimple_basic_block
../../gcc/cfgexpand.c:5892
0x14d31ad execute
../../gcc/cfgexpand.c:6576
Please submit a full bug report,
with preprocessed source if appropriate.

I've managed to reduce it to a small self-contained testcase that fortunately
doesn't even need PGO:

extern char v[54];
void bar (char *);
void
foo (void)
{
  int i;
  char c[32];
  bar (c);
  for (i = 0; i < 32; i++)
    c[i] = c[i] && !v[i];
  bar (c);
}

ICEs the same way with -O3 -march=z13 starting with
r11-3426-g10843f8303509fcba880c6c05c08e4b4ccd24f36

[Bug c/97493] generate wrong code with "-Os -fno-toplevel-reorder -frename-registers"

2020-10-20 Thread suochenyao at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97493

suochenyao at 163 dot com  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

[Bug c/97493] generate wrong code with "-Os -fno-toplevel-reorder -frename-registers"

2020-10-20 Thread suochenyao at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97493

--- Comment #4 from suochenyao at 163 dot com  ---
I made a mistake; Jelinek is right.
This testcase is invalid, and I cannot reproduce the problem either.
Sorry for wasting your time, my fault.

[Bug tree-optimization/97501] [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
   Priority|P3  |P1
 CC||aldyh at gcc dot gnu.org

[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers

2020-10-20 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #6 from Andreas Krebbel  ---
Alternatively I could also mark r12 as preserved across function calls for
-fpic in the backend. In fact all the bits we care about are preserved. Since
the register is fixed all the accesses do come from the backend itself.

That's similar to what I was trying with the fixed_regs hack. But I agree that
this might not be correct in general.

The full fix is probably to track the exact parts of partially clobbered regs
which stay live but this would be a major change.

[Bug tree-optimization/96129] [11 regression] gcc.dg/vect/vect-alias-check.c etc. FAIL

2020-10-20 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96129

--- Comment #3 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #2 from Richard Biener  ---
> Quite some revs, two vectorizer changes.  Do the FAILs still occur?

Both still do.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #36 from Jan Hubicka  ---
> Find attached the temporary files for net/core/skbuff.c, as it is indeed what
> initially triggered my focus on this issue. I don't expect such a function at
> all:
> 
> 0cc8 :
>  cc8:   48 00 00 00 b   cc8 
> cc8: R_PPC_REL24__csum_partial
> 
> 
> Every function should call __csum_partial() directly.

Here csum_partial is:

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__)) __wsum csum_add(__wsum csum, __wsum
addend)
{
 if (__builtin_constant_p(csum) && csum == 0)
  return addend;
 if (__builtin_constant_p(addend) && addend == 0)
  return csum;
 asm("addc %0,%0,%1;"
 "addze %0,%0;"
 : "+r" (csum) : "r" (addend) : "xer");
 return csum;
}
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__)) __wsum csum_partial(const void
*buff, int len, __wsum sum)
{
 if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
  if (len == 2)
   sum = csum_add(sum, ( __wsum)*(const u16 *)buff);
  if (len >= 4)
   sum = csum_add(sum, ( __wsum)*(const u32 *)buff);
  if (len == 6)
   sum = csum_add(sum, ( __wsum)
 *(const u16 *)(buff + 4));
  if (len >= 8)
   sum = csum_add(sum, ( __wsum)
 *(const u32 *)(buff + 4));
  if (len == 10)
   sum = csum_add(sum, ( __wsum)
 *(const u16 *)(buff + 8));
  if (len >= 12)
   sum = csum_add(sum, ( __wsum)
 *(const u32 *)(buff + 8));
  if (len == 14)
   sum = csum_add(sum, ( __wsum)
 *(const u16 *)(buff + 12));
  if (len >= 16)
   sum = csum_add(sum, ( __wsum)
 *(const u32 *)(buff + 12));
 } else if (__builtin_constant_p(len) && (len & 3) == 0) {
  sum = csum_add(sum, ip_fast_csum_nofold(buff, len >> 2));
 } else {
  sum = __csum_partial(buff, len, sum);
 }
 return sum;
}
So again it expands to a really large decision tree with many
__builtin_constant_p checks, which makes the inliner give up.
You should see all such cases easily with -Winline.
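
For reference, the end-around-carry addition that csum_add performs with
addc/addze can also be written without an asm statement.  This is only a
sketch and not something proposed in this comment: csum_add_c is a
hypothetical name, and uint32_t stands in for the kernel's __wsum.

```c
#include <stdint.h>

/* 32-bit ones' complement addition: add in 64 bits, then fold the
   carry out of bit 31 back into bit 0.  The fold cannot carry again,
   because the low word of the sum is at most 0xfffffffe whenever a
   carry occurred.  */
static inline uint32_t csum_add_c(uint32_t csum, uint32_t addend)
{
    uint64_t s = (uint64_t)csum + addend;
    return (uint32_t)(s + (s >> 32));
}
```

Whether the compiler turns this into the same two instructions depends on
the target and the GCC version.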

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #35 from Jan Hubicka  ---
> 
> Original asm is:
> 
> __attribute__ ((noinline))
> int fls64(__u64 x)
> {
>  int bitpos = -1;
>  asm("bsrq %1,%q0"
>  : "+r" (bitpos)
>  : "rm" (x));
>  return bitpos + 1;
> }
> 
> There seems to be bug in bsr{q} pattern.  I can make GCC produce same
> code with:
> 
> __attribute__ ((noinline))
> int
> my_fls64 (__u64 x)
> {
>   asm volatile ("movl $-1, %eax");
>   return (__builtin_clzll (x) ^ 63) + 1;
> }

Aha, bsr does nothing if the parameter is 0, so the pattern is correct
(just the instruction is undefined for 0, which makes sense).
But with that pattern GCC can't synthesize the code sequence above :)

Honza
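
A portable model of what the quoted asm relies on, offered as a sketch under
the assumption stated in the comment above: the destination keeps its preset
-1 when the source is zero.  The names fls64_model and
xor_matches_subtraction are made up for illustration.

```c
#include <stdint.h>

/* bitpos starts at -1 and is only overwritten for a nonzero input,
   mimicking a bsr that leaves its destination untouched for zero.  */
static int fls64_model(uint64_t x)
{
    int bitpos = -1;
    if (x)
        bitpos = __builtin_clzll(x) ^ 63;  /* equals 63 - clz for 0..63 */
    return bitpos + 1;
}

/* Check that n ^ 63 and 63 - n agree for every possible clz result,
   which is why the quoted my_fls64 can use the xor form.  */
static int xor_matches_subtraction(void)
{
    for (int n = 0; n < 64; n++)
        if ((n ^ 63) != 63 - n)
            return 0;
    return 1;
}
```

Unlike the asm version, the model needs an explicit branch, because the
builtin gives no defined result for zero.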

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #34 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> 
> --- Comment #33 from Jakub Jelinek  ---
> (In reply to Jan Hubicka from comment #32)
> > get_order is a wrapper around ffs64.  This can be implemented w/o asm
> > statement as follows:
> > int
> > my_fls64 (__u64 x)
> > {
> >   if (!x)
> >   return 0;
> >   return 64 - __builtin_clzl (x);
> > }
> > 
> > This results in longer assembly than the kernel asm implementation. If
> > that matters I would replace builtin_constant_p part of get_order by this
> > implementation that is more transparent to the code size estimation and
> > things will get inlined.
> 
> Better __builtin_clzll so that it works also on 32-bit arches.
> Anyway, if kernel's fls64 results in better code than the my_fls64, we should
> look at GCC's code generation for that case.

Original asm is:

__attribute__ ((noinline))
int fls64(__u64 x)
{
 int bitpos = -1;
 asm("bsrq %1,%q0"
 : "+r" (bitpos)
 : "rm" (x));
 return bitpos + 1;
}

There seems to be a bug in the bsr{q} pattern.  I can make GCC produce the same
code with:

__attribute__ ((noinline))
int
my_fls64 (__u64 x)
{
  asm volatile ("movl $-1, %eax");
  return (__builtin_clzll (x) ^ 63) + 1;
}

But obviously the volatile asm should not be needed.  I think bsrq is
incorrectly modelled as returning the full register:

(define_insn "bsr_rex64"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (minus:DI (const_int 63)
                  (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_64BIT"
  "bsr{q}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "prefix_0f" "1")
   (set_attr "znver1_decode" "vector")
   (set_attr "mode" "DI")])

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #33 from Jakub Jelinek  ---
(In reply to Jan Hubicka from comment #32)
> get_order is a wrapper around ffs64.  This can be implemented w/o asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
>   if (!x)
>   return 0;
>   return 64 - __builtin_clzl (x);
> }
> 
> This results in longer assembly than the kernel asm implementation. If
> that matters I would replace builtin_constant_p part of get_order by this
> implementation that is more transparent to the code size estimation and
> things will get inlined.

Better __builtin_clzll so that it works also on 32-bit arches.
Anyway, if kernel's fls64 results in better code than the my_fls64, we should
look at GCC's code generation for that case.

And, perhaps kernel's const_ilog2 should be reimplemented using __builtin_clz*?
Or, maybe even better, keep const_ilog2 as is because as it is declared it
should be usable even in pedantic C constant expressions, and just change ilog2
to:
#define ilog2(n) \
( \
__builtin_constant_p(n) ?   \
((n) < 2 ? 0 : 63 - __builtin_clzll (n)) : \
(sizeof(n) <= 4) ?  \
__ilog2_u32(n) :\
__ilog2_u64(n)  \
 )
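
A self-contained way to try the suggested macro outside the kernel.  This is
a sketch: ilog2_demo, ilog2_u32_demo and ilog2_u64_demo are hypothetical
stand-ins for ilog2, __ilog2_u32 and __ilog2_u64, and the helper bodies are
an assumption rather than the kernel's implementations.

```c
#include <stdint.h>

/* Stand-ins for the kernel's run-time helpers; they return -1 for 0.  */
static int ilog2_u32_demo(uint32_t n)
{
    return n ? 31 - __builtin_clz(n) : -1;
}

static int ilog2_u64_demo(uint64_t n)
{
    return n ? 63 - __builtin_clzll(n) : -1;
}

/* Constant arguments take the __builtin_clzll branch, which the
   compiler can fold; everything else goes to a sized helper.  */
#define ilog2_demo(n)                               \
(                                                   \
    __builtin_constant_p(n) ?                       \
        ((n) < 2 ? 0 : 63 - __builtin_clzll(n)) :   \
    (sizeof(n) <= 4) ?                              \
        ilog2_u32_demo(n) :                         \
        ilog2_u64_demo(n)                           \
)
```

The (n) < 2 test keeps a constant zero away from __builtin_clzll, so the
constant branch never hits the undefined case.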

[Bug libstdc++/95917] coroutine functions leak under freestanding mode causing dependencies and binary bloat.

2020-10-20 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95917

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Jonathan Wakely  ---
.

[Bug libstdc++/95917] coroutine functions leak under freestanding mode causing dependencies and binary bloat.

2020-10-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95917

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:94fd05f1f76faca9dc9033b55d44c960155d38e9

commit r11-4120-g94fd05f1f76faca9dc9033b55d44c960155d38e9
Author: Jonathan Wakely 
Date:   Tue Oct 20 11:19:58 2020 +0100

libstdc++: Define noop coroutine details private and inline [PR 95917]

This moves the __noop_coro_frame type, the __noop_coro_fr global
variable, and the __dummy_resume_destroy function from namespace scope,
replacing them with private members of the specialization
coroutine_handle.

The function and variable are also declared inline, so that they
generate no code unless used.

libstdc++-v3/ChangeLog:

PR libstdc++/95917
* include/std/coroutine (__noop_coro_frame): Replace with
noop_coroutine_handle::__frame.
(__dummy_resume_destroy): Define inline in __frame.
(__noop_coro_fr): Replace with noop_coroutine_handle::_S_fr
and define as inline.
* testsuite/18_support/coroutines/95917.cc: New test.

[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers

2020-10-20 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #5 from rsandifo at gcc dot gnu.org  
---
I think the problem is a disconnect between compute_transp
and the code in gcse.c itself.  compute_transp considers %r12
to be transparent in all blocks despite the partial clobbers.
But whether that's true is context-dependent.

I think the fix is to make transp also handle partial clobbers
in a conservative way.  I don't have a specific suggestion how
to do that yet.

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #32 from Jan Hubicka  ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> 
> --- Comment #31 from Segher Boessenkool  ---
> (In reply to Jan Hubicka from comment #27)
> > It is because --param inline-insns-single was reduced for -O2 from 200
> > to 70.  GCC 10 has newly different set of parameters for -O2 and -O3 and
> > enables auto-inlining at -O2.
> > 
> > Problem with inlining functions declared inline is that C++ codebases
> > tends to abuse this keyword for things that are really too large (and
> > get_order would be such example if it did not have builtin_constant_p
> > check which inliner does not understand well). So having same limit at
> > -O2 and -O3 turned out to be problematic with respect to code size and
> > especially with respect to LTO, where a lot more inlining opportunities
> > appear.
> 
> Do the heuristics account for that not inlining a "static inline" results
> in multiple copies?

It prevents inlining only when there are multiple calls in the unit
being compiled (there is no way to know that the same inline function is
duplicated in other units).
This is what happens here: there are multiple calls so inliner concludes
inlining would cost too much of code size and later they are optimized
away.

get_order is a wrapper around ffs64.  This can be implemented w/o asm
statement as follows:
int
my_fls64 (__u64 x)
{
  if (!x)
  return 0;
  return 64 - __builtin_clzl (x);
}

This results in longer assembly than the kernel asm implementation. If
that matters, I would replace the builtin_constant_p part of get_order with this
implementation, which is more transparent to the code size estimation, and
things will get inlined.

Honza

Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread Jan Hubicka
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> 
> --- Comment #31 from Segher Boessenkool  ---
> (In reply to Jan Hubicka from comment #27)
> > It is because --param inline-insns-single was reduced for -O2 from 200
> > to 70.  GCC 10 has newly different set of parameters for -O2 and -O3 and
> > enables auto-inlining at -O2.
> > 
> > Problem with inlininig funtions declared inline is that C++ codebases
> > tends to abuse this keyword for things that are really too large (and
> > get_order would be such example if it did not have builtin_constant_p
> > check which inliner does not understand well). So having same limit at
> > -O2 and -O3 turned out to be problematic with respect to code size and
> > especially with respect to LTO, where a lot more inlining oppurtunities
> > appear.
> 
> Do the heuristics account for that not inlining a "static inline" results
> in multiple copies?

It prevents inlining only when there are multiple calls in the unit
being compiled (there is no way to know that the same inline function is
duplicated in other units).
This is what happens here: there are multiple calls so inliner concludes
inlining would cost too much of code size and later they are optimized
away.

get_order is a wrapper around ffs64.  This can be implemented w/o asm
statement as follows:
int
my_fls64 (__u64 x)
{
  if (!x)
  return 0;
  return 64 - __builtin_clzl (x);
}

This results in longer assembly than the kernel asm implementation. If
that matters I would replace the builtin_constant_p part of get_order by this
implementation, which is more transparent to the code size estimation, and
things will get inlined.
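
As a rough illustration of the builtin-based approach above, here is a
self-contained variant.  It is a sketch under assumptions, not the kernel's
code: it assumes GCC or Clang, uses __builtin_clzll instead of __builtin_clzl
so the argument stays 64 bits wide where long is 32 bits, and substitutes
uint64_t for __u64.  fls64_ref and check_fls64 are hypothetical helpers added
only to cross-check the result.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 when x == 0.
   Assumes a compiler that provides __builtin_clzll (GCC, Clang).  */
static inline int
my_fls64 (uint64_t x)
{
  if (!x)
    return 0;
  return 64 - __builtin_clzll (x);
}

/* Hypothetical reference: count how many shifts it takes to reach zero.  */
static int
fls64_ref (uint64_t x)
{
  int n = 0;
  while (x)
    {
      n++;
      x >>= 1;
    }
  return n;
}

/* Cross-check the builtin version against the reference on every
   single-bit value and a few of its neighbours.  */
static void
check_fls64 (void)
{
  assert (my_fls64 (0) == fls64_ref (0));
  for (int i = 0; i < 64; i++)
    {
      uint64_t v = (uint64_t) 1 << i;
      assert (my_fls64 (v) == fls64_ref (v));
      assert (my_fls64 (v - 1) == fls64_ref (v - 1));
      assert (my_fls64 (v | 1) == fls64_ref (v | 1));
    }
}
```

The explicit zero check is needed because the result of __builtin_clzll is
undefined for a zero argument.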

Honza


[Bug libstdc++/95917] coroutine functions leak under freestanding mode causing dependencies and binary bloat.

2020-10-20 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95917

Jonathan Wakely  changed:

           What    |Removed                       |Added

     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2020-10-20
          Component|c++                           |libstdc++
   Target Milestone|---                           |11.0
           Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org

[Bug sanitizer/97414] AddressSanitizer CHECK failed: detect_stack_use_after_return and detect_invalid_pointer_pairs

2020-10-20 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97414

Martin Liška  changed:

           What    |Removed                       |Added

             Status|ASSIGNED                      |WAITING

--- Comment #4 from Martin Liška  ---
Waiting for upstream to apply the fix.

[Bug tree-optimization/97501] New: [11 Regression] ICE in verify_range, at value-range.cc:369, -O2 on dead overflowing nested loops

2020-10-20 Thread mwindsor at imperial dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97501

Bug ID: 97501
   Summary: [11 Regression] ICE in verify_range, at
value-range.cc:369, -O2 on dead overflowing nested
loops
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mwindsor at imperial dot ac.uk
  Target Milestone: ---

When compiling code including an (unreachable) nested pair of over/underflowing
loops, recent git versions of GCC11 experience an ICE in value-range.cc:369.
This seems to be a different bug from #96430, #97317, and #97467: e.g. it
contains no bitshifts or casts, and exhibits on snapshots after the most recent
of those bugs was fixed.

I am seeing this on git revision `f0c0f124ebe28b71abccbd7247678c9ac608b649`
(fetched 2020-10-20) on x86_64-pc-linux-gnu, on which this report is based.  I
have also seen this on, for instance, revision
`8949b985dbaf07d433bd57d2883e1e5414f20e75` (fetched 2020-10-13) on
powerpc64le-unknown-linux-gnu.  GCC 10.2.0 on x86_64-apple-darwin19 does NOT
exhibit the ICE, so I presume this is a GCC11 regression.  I'm unsure of the
first revision to exhibit this at the moment.

Consider the following example (`mwe-full.c`; preprocessing gives the same
file modulo line directives):

===

static int c = 0;

int main() {
  int b = 0;
  if (c) {
    for (;; b--)
      do
        b++;
      while (b);
  }
}

===

(This comes from a C-Reduced version which didn't have the check on `c`; I
added this to avoid run-time-live UB, but the bug exhibits without it)

`gcc -O2 -v -save-temps mwe-full.c` gives me:

===

Using built-in specs.
COLLECT_GCC=/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/bin/gcc
COLLECT_LTO_WRAPPER=/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /data/mwindsor/old_compilers/gcc-snapshots/git/configure
--prefix=/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20201020 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
'-dumpdir' 'a-'

/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/cc1
-E -quiet -v -imultiarch x86_64-linux-gnu mwe-full.c -mtune=generic
-march=x86-64 -O2 -fpch-preprocess -o a-mwe-full.i
ignoring non-existent directory "/usr/local/include/x86_64-linux-gnu"
ignoring non-existent directory
"/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:

/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/lib/gcc/x86_64-pc-linux-gnu/11.0.0/include
 /usr/local/include
 /data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/include

/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/lib/gcc/x86_64-pc-linux-gnu/11.0.0/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O2' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
'-dumpdir' 'a-'

/data/mwindsor/old_compilers/gcc-snapshots/2020-10-20/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/cc1
-fpreprocessed a-mwe-full.i -quiet -dumpdir a- -dumpbase mwe-full.c
-dumpbase-ext .c -mtune=generic -march=x86-64 -O2 -version -o a-mwe-full.s
GNU C17 (GCC) version 11.0.0 20201020 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 11.0.0 20201020 (experimental), GMP version
6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C17 (GCC) version 11.0.0 20201020 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 11.0.0 20201020 (experimental), GMP version
6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 8980ce301bb16f6f8206916710982a55
during GIMPLE pass: evrp
mwe-full.c: In function ‘main’:
mwe-full.c:11:1: internal compiler error: in verify_range, at
value-range.cc:369
   11 | }
  | ^
0x78f55b irange::verify_range()
/data/mwindsor/old_compilers/gcc-snapshots/git/gcc/value-range.cc:369
0x10d9d03 irange::irange_set(tree_node*, tree_node*)
/data/mwindsor/old_compilers/gcc-snapshots/git/gcc/value-range.cc:172
0x10d9d03 irange::set(tree_node*, tree_node*, value_range_kind)
/data/mwindsor/old_compilers/gcc-snapshots/git/gcc/value-range.cc:226
0x10d8f4c irange::operator=(irange const&)
/d

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #31 from Segher Boessenkool  ---
(In reply to Jan Hubicka from comment #27)
> It is because --param inline-insns-single was reduced for -O2 from 200
> to 70.  GCC 10 has newly different set of parameters for -O2 and -O3 and
> enables auto-inlining at -O2.
> 
> Problem with inlining functions declared inline is that C++ codebases
> tend to abuse this keyword for things that are really too large (and
> get_order would be such example if it did not have builtin_constant_p
> check which inliner does not understand well). So having same limit at
> -O2 and -O3 turned out to be problematic with respect to code size and
> especially with respect to LTO, where a lot more inlining opportunities
> appear.

Do the heuristics account for that not inlining a "static inline" results
in multiple copies?

> I will implement the heuristics to push up inline limits of functions
> having builtin_constant_p of parameter which should help a bit in this
> case

Thank you!

> (but not very systematically: as discussed in the PR log it is quite
> a hard problem to get builtin_constant_p right in the code size metrics
> used by inliner before it knows exactly what is going to be constant and
> what is not).

That is true for many other inlining things as well...  builtin_constant_p
is worse than most I guess ;-)
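
To make the difficulty concrete, here is a minimal sketch of the pattern under
discussion.  It is loosely modelled on get_order but is not the kernel's code:
order_of, order_runtime and SKETCH_PAGE_SHIFT are hypothetical names, and the
4 KiB page size is an assumption.  When the argument is a compile-time constant
the loop branch folds away to a single constant; otherwise only the short
run-time branch survives.  The inliner has to estimate the size of the function
before it knows which of the two cases applies.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Run-time path: a handful of instructions around a count-leading-zeros.  */
static inline int
order_runtime (uint64_t size)
{
  uint64_t pages = (size - 1) >> SKETCH_PAGE_SHIFT;
  return pages ? 64 - __builtin_clzll (pages) : 0;
}

/* Smallest order such that (1 << (order + SKETCH_PAGE_SHIFT)) >= size.  */
static inline int
order_of (uint64_t size)
{
  if (__builtin_constant_p (size))
    {
      /* Constant path: looks bulky in the source, but once SIZE is known
	 the whole loop can be evaluated at compile time.  */
      uint64_t pages = (size - 1) >> SKETCH_PAGE_SHIFT;
      int order = 0;
      while (pages)
	{
	  order++;
	  pages >>= 1;
	}
      return order;
    }
  return order_runtime (size);
}
```

Both branches compute the same value, so whether the compiler answers
__builtin_constant_p with 0 or 1 changes only the generated code, not the
result.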

[Bug tree-optimization/97500] [11 Regression] ICE in vect_schedule_slp_instance, at tree-vect-slp.c:5094 since r11-3823-g126ed72b9f48f853

2020-10-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97500

Richard Biener  changed:

           What    |Removed                       |Added

             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
I have a patch.

[Bug libstdc++/55394] Using call_once without -lpthread compiles without warning

2020-10-20 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55394

--- Comment #11 from Jonathan Wakely  ---
It's not plausible because it doesn't work for non-pthreads targets where
gthr-default.h is not gthr-posix.h

We can't use pthread_once anyway, see PR 66146, so I'm rewriting it entirely in
terms of either futexes or condition variables.

[Bug middle-end/97487] [8/9/10/11 Regression] ICE in expand_simple_binop, at optabs.c:939 since r8-3977

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97487

Jakub Jelinek  changed:

           What    |Removed                     |Added

           Keywords|openmp                      |
            Summary|[10/11 Regression] ICE in   |[8/9/10/11 Regression] ICE
                   |expand_simple_binop, at     |in expand_simple_binop, at
                   |optabs.c:939 since          |optabs.c:939 since r8-3977
                   |r10-1420-g744fd446c321f78f  |

--- Comment #3 from Jakub Jelinek  ---
Also, the PR has nothing to do with OpenMP:
-O2 --param max-rtl-if-conversion-unpredictable-cost=0 on x86_64-linux ICEs
too:
typedef long long int V __attribute__((vector_size (16)));

long long int
foo (V x, V y)
{
  long long int t1 = y[0];
  long long int t2 = x[0];
  long long int t3;
  if (t2 < 0)
    t3 = t1;
  else
    t3 = 0;
  return t3;
}
And that started with r8-3977-gef9eec0b599d533b58e29fe0c0bf6435e5368378 (latent
before because of different costs).

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #6 from Hongtao.liu  ---
(In reply to Alexander Monakov from comment #5)
> afaict LRA is just following IRA decisions, and IRA allocates that pseudo to
> memory due to costs.
> 
> Not sure where strange cost is coming from, but it depends on x86 tuning
> options: with -mtune=skylake we get the expected code, with -mtune=haswell
> we get 128-bit vectors right and extra load for 256-bit, with -mtune=generic
> both cases have extra loads.

in 

  /* If this insn loads a parameter from its stack slot, then it
     represents a savings, rather than a cost, if the parameter is
     stored in memory.  Record this fact.

     Similarly if we're loading other constants from memory (constant
     pool, TOC references, small data areas, etc) and this is the only
     assignment to the destination pseudo.

     Don't do this if SET_SRC (set) isn't a general operand, if it is
     a memory requiring special instructions to load it, decreasing
     mem_cost might result in it being loaded using the specialized
     instruction into a register, then stored into stack and loaded
     again from the stack.  See PR52208.

     Don't do this if SET_SRC (set) has side effect.  See PR56124.  */
  if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
      && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
      && ((MEM_P (XEXP (note, 0))
           && !side_effects_p (SET_SRC (set)))
          || (CONSTANT_P (XEXP (note, 0))
              && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
                                                XEXP (note, 0))
              && REG_N_SETS (REGNO (SET_DEST (set))) == 1))
      && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set)))
      /* LRA does not use equiv with a symbol for PIC code.  */
      && (! ira_use_lra_p || ! pic_offset_table_rtx
          || ! contains_symbol_ref_p (XEXP (note, 0))))
    {
      enum reg_class cl = GENERAL_REGS;
      rtx reg = SET_DEST (set);
      int num = COST_INDEX (REGNO (reg));

      COSTS (costs, num)->mem_cost
        -= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;
      record_address_regs (GET_MODE (SET_SRC (set)),
                           MEM_ADDR_SPACE (SET_SRC (set)),
                           XEXP (SET_SRC (set), 0), 0, MEM, SCRATCH,
                           frequency * 2);
      counted_mem = true;
    }
---

for 

(insn 9 8 11 3 (set (reg:V2DI 88 [ _16 ])
(mem:V2DI (plus:DI (reg/v/f:DI 91 [ input ])
(reg:DI 89 [ ivtmp.11 ])) [0 MEM[(const __m128i *
{ref-all})input_7(D) + ivtmp.11_40 * 1]+0 S16 A128]))
"/export/users2/liuhongt/tools-build/build_gcc11_master_debug/gcc/include/emmintrin.h":697:10
1405 {movv2di_internal}

The mem_cost for r88 has ira_memory_move_cost[V2DImode][GENERAL_REGS][1]
subtracted from it, which gives -11808 as its initial value, but in reality it
should have ira_memory_move_cost[V2DImode][SSE_REGS][1] subtracted, which would
give -5905 as the initial value. It seems this adds too much preference for
memory here.

Then in the later record_operand_costs, when IRA finds that r88 is also used
in shift and ior instructions, the mem_cost for r88 increases, but it is still
smaller than the cost of SSE_REGS because of the excessive preference for
memory added above. Finally, IRA chooses memory for r88 because it has the
lowest cost, which is suboptimal.

a10(r88,l1) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 SSE_REGS:0,0
MEM:-984,-984
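
The arithmetic behind that bias can be modelled in a few lines.  This is only a
toy model, not IRA code: initial_mem_cost and the sketch_* names are
hypothetical, and the per-move costs (12 for GENERAL_REGS, 6 for SSE_REGS) and
the frequency of 984 are assumptions, picked because 12 * 984 reproduces the
-11808 quoted above.  With the same assumptions the SSE_REGS figure comes out
as -5904, one off from the quoted -5905.

```c
#include <assert.h>

enum sketch_class
{
  SKETCH_GENERAL_REGS,
  SKETCH_SSE_REGS,
  SKETCH_NUM_CLASSES
};

/* Assumed cost of loading one V2DI value from memory into each class.  */
static const int sketch_memory_move_cost[SKETCH_NUM_CLASSES] = { 12, 6 };

/* Models "mem_cost -= ira_memory_move_cost[mode][cl][1] * frequency",
   starting from a mem_cost of zero.  */
static int
initial_mem_cost (enum sketch_class cl, int frequency)
{
  assert (frequency >= 0);
  return -(sketch_memory_move_cost[cl] * frequency);
}
```

Under these assumptions, picking GENERAL_REGS rather than SSE_REGS doubles the
initial credit given to memory, which is the excess preference described
above.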

[Bug libstdc++/55394] Using call_once without -lpthread compiles without warning

2020-10-20 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55394

--- Comment #10 from Sergei Trofimovich  ---
How about something as simple as:

--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -697,7 +697,12 @@ static inline int
 __gthread_once (__gthread_once_t *__once, void (*__func) (void))
 {
   if (__gthread_active_p ())
-    return __gthrw_(pthread_once) (__once, __func);
+    /*
+     * Call non-weak form of a symbol to force linker
+     * resolve real implementation of pthread_once even
+     * for as-needed case: https://gcc.gnu.org/PR55394.
+     */
+    return pthread_once (__once, __func);
   else
     return -1;
 }

A superficial test shows that this fixes the "-flto -Wl,--as-needed" case
above for me.

The fix leaves '__gthread_active_p ()' to probe weak symbols but calls the
non-weak form of pthread_once().

If it's a plausible path I can go through all static inline functions
and convert weak calls to non-weak calls where glibc-2.32 does not define the
symbol in libc.so.6.
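
For readers unfamiliar with the weak/non-weak distinction, here is a minimal
sketch of the probing idiom, assuming GCC or Clang on an ELF target.
optional_once, init_fn and call_once_if_available are hypothetical names and
are not part of libgcc.  A weak undefined reference does not pull in a library
by itself, so if nothing defines the symbol its address is null and the call
has to be skipped; a plain (strong) reference, as in the patch above, makes the
linker resolve the real implementation or fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical optional function.  Declared weak: if no object or library
   in the link defines it, its address is null instead of a link error.  */
extern int optional_once (void (*func) (void)) __attribute__ ((weak));

static int calls;

/* Trivial initialiser used to observe whether the call happened.  */
static void
init_fn (void)
{
  calls++;
}

/* Same shape as __gthread_once: probe the weak symbol, call it when it is
   present, and report -1 when it is not.  */
static int
call_once_if_available (void (*func) (void))
{
  if (optional_once != NULL)
    return optional_once (func);
  return -1;
}
```

In a program that never defines optional_once, call_once_if_available
(init_fn) takes the -1 branch and init_fn is not run.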

[Bug sanitizer/97478] Cross Build from windows to linux. It looks like the sys/timeb.h header file does not exist in latest glibc any more

2020-10-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97478

--- Comment #4 from Jakub Jelinek  ---
Well, it looks like glibc might be reverting that change:
https://sourceware.org/pipermail/libc-alpha/2020-October/118791.html
Apparently that removal also breaks SPEC2k6 and SPEC2017.
So let's wait and see.
