[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-10 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #36 from Richard Henderson  ---
(In reply to Mayshao-oc from comment #34)
> (In reply to Jakub Jelinek from comment #17)
> > Fixed for AMD on the library side too.
> > We need a statement from Zhaoxin and VIA for their CPUs.
> 
> Sorry for the late reply.
> We guarantee that VMOVDQA will be an atomic load or store provided 128 bit
> aligned address in Zhaoxin processors, provided that the memory type is WB.
> Can we extend this patch to Zhaoxin processors as well?

Is VMOVDQU atomic, provided the address is aligned in Zhaoxin processors?

In QEMU, we make use of this additional guarantee from AMD.
We also reference this gcc bugzilla entry for documentation.  :-)

[Bug tree-optimization/112296] __builtin_constant_p doesn't propagate through member functions

2023-10-31 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112296

--- Comment #9 from Richard Henderson  ---
> Thanks.  So yes,
> 
> macro(x++);
> 
> incrementing x twice would have been odd - but that's the usual bug
> in this kind of macro definition.  Fixing it by throwing away
> side-effects (and always going the out_of_line_function (x) path!)
> for the __builtin_constant_p argument is an odd choice.

In the beginning __builtin_constant_p was resolved immediately,
so formulating this as

#define macro(x) \
  ({ __typeof(x) _x = (x); \
     __builtin_constant_p(_x); })

would always return false, defeating the purpose.

> The execute.exp testcase suggests the intention but the testcases
> verification is somewhat incomplete (it lacks verifying the side-effects
> are gone).

That's probably my omission.  ;-)

[Bug tree-optimization/112296] __builtin_constant_p doesn't propagate through member functions

2023-10-31 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112296

--- Comment #7 from Richard Henderson  ---
(In reply to Richard Biener from comment #5)
> int bad1(void) { return __builtin_constant_p(global++); }
...
> Joseph, Richard, do you have anything to add or remember discussions about
> this semantic detail of __builtin_constant_p?

Since it has been 25 years, I don't recall any specific discussions.

The intended use-case at the time was more like

#define macro(x) \
  (__builtin_constant_p(x) \
   ? inline_expression(x)  \
   : out_of_line_function(x))

So I would have expected side effects to have been ignored for
the builtin and expanded via one of the two arms.
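
As a rough sketch of that use-case (the names mul10, MUL10_INLINE and
mul10_out_of_line are hypothetical, not taken from the report), a macro
can pick a foldable expression for constant arguments and a function
call otherwise:

```c
/* Hypothetical out-of-line helper; stands in for out_of_line_function(). */
static unsigned mul10_out_of_line(unsigned x)
{
    return x * 10u;
}

/* Hypothetical inline form; stands in for inline_expression(). */
#define MUL10_INLINE(x)  ((unsigned)(x) * 10u)

/* Dispatch on __builtin_constant_p, as in the macro quoted above. */
#define mul10(x)                      \
    (__builtin_constant_p(x)          \
     ? MUL10_INLINE(x)                \
     : mul10_out_of_line(x))
```

Both arms compute the same value here, so the choice only affects how
the code is generated, not the result.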

[Bug ipa/108470] New: Missing documentation for alternate uses of __attribute__((noinline))

2023-01-19 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108470

Bug ID: 108470
   Summary: Missing documentation for alternate uses of
__attribute__((noinline))
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

The noinline attribute affects decisions made by ipa-split.cc
and ipa-icf.cc that are not immediately obvious.

At least the ipa-split choice affects code correctness for QEMU
(in that we expect __builtin_return_address to be used in very
specific contexts, and the transformation done by ipa-split
invalidates that).  Using noinline on the affected functions
prevents the ipa-split optimization and restores functionality.
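
A minimal sketch of the kind of function involved, assuming a helper
named caller_return_address (hypothetical; this is not QEMU code and it
does not reproduce the ipa-split transformation itself):

```c
#include <stddef.h>

/* Hypothetical helper: reports the address this call will return to.
   noinline keeps the function a real, separate call, so the value
   refers to the call site the programmer wrote. */
__attribute__((noinline))
void *caller_return_address(void)
{
    return __builtin_return_address(0);
}
```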

It would be nice to document the effect, so that the workaround
is not affected in future gcc versions.

[Bug middle-end/107389] Always propagate __builtin_assume_aligned

2022-10-26 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389

Richard Henderson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
Summary|Alignment not inferred from |Always propagate
   |type at -O0 |__builtin_assume_aligned
Version|unknown |12.2.1
 Ever confirmed|0   |1
   Last reconfirmed||2022-10-26
   Severity|normal  |enhancement

--- Comment #4 from Richard Henderson  ---
Rename and re-categorise as enhancement.

As mentioned in the comments above, we need to
be able to rely on __builtin_assume_aligned
even with -O0 to avoid link errors when
not including libatomic.

[Bug c/107389] Alignment not inferred from type at -O0

2022-10-25 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389

--- Comment #3 from Richard Henderson  ---
If __builtin_assume_aligned were to work at -O0,
that would also work for me.  Better, probably,
than fiddling with __attribute__((aligned)).
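
For reference, a sketch of how the hint is written.  This uses a plain,
non-atomic 16-byte copy so that it links without libatomic; the names
pair64 and load_pair_aligned are hypothetical, and whether the hint is
honoured at -O0 is exactly what this report is about:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte value; not a type from the report. */
struct pair64 { uint64_t lo, hi; };

/* Plain (non-atomic) 16-byte load.  __builtin_assume_aligned returns
   its argument and promises the compiler that it is 16-byte aligned;
   the caller must guarantee that this is true. */
struct pair64 load_pair_aligned(const void *p)
{
    const void *q = __builtin_assume_aligned(p, 16);
    struct pair64 v;
    memcpy(&v, q, sizeof v);
    return v;
}
```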

[Bug c/107389] New: Alignment not inferred from type at -O0

2022-10-25 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389

Bug ID: 107389
   Summary: Alignment not inferred from type at -O0
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

Consider

typedef __uint128_t aligned_type __attribute__((aligned(16)));
_Static_assert(__alignof(aligned_type) == 16);
__uint128_t foo(aligned_type *p) { return __atomic_load_n(p, 0); }

For s390x, atomic_loadti should expand this to LPQ.

For my purposes, it must also do this at -O0, not just with
optimization.  But the alignment seen by gen_atomic_loadti
is only 8, so it FAILs the expansion and falls back to libatomic.

The following appears to solve the problem:

--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -468,8 +468,11 @@ get_pointer_alignment_1
}
   else
{
+ /* Assume alignment from the type. */
+ tree ptr_type = TREE_TYPE (exp);
+ tree obj_type = TREE_TYPE (ptr_type);
+ *alignp = TYPE_ALIGN (obj_type);
  *bitposp = 0;
- *alignp = BITS_PER_UNIT;
  return false;
}
 }

but I have an inkling that would have undesired effects
for other usages.  If so, perhaps a special case could be
made for the usage in get_builtin_sync_mem.

[Bug c/105131] New: Warning for mismatched declaration/definition with enum

2022-04-01 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105131

Bug ID: 105131
   Summary: Warning for mismatched declaration/definition with
enum
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

For a testcase such as

enum E { l = -1, z = 0, g = 1 };
int foo(void);
enum E foo(void) { return z; }

If the implementation type of 'enum E' is not 'int',
we will correctly emit an error (e.g. -fshort-enums).

It would be desirable to emit a warning for this testcase even
when no error is emitted, because it is probably a mistake and
definitely a portability error.

[Bug middle-end/99696] lto looks past aliases to initializers

2021-03-21 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99696

--- Comment #1 from Richard Henderson  ---
Actually, I can reproduce this with gcc 9.3 as well.
My upstream bug report simply uses gcc 11, so I assumed.

[Bug middle-end/99696] New: lto looks past aliases to initializers

2021-03-21 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99696

Bug ID: 99696
   Summary: lto looks past aliases to initializers
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

The following is a c-ish version of

  const int y = init();

which no longer works with gcc 11.

The intended advantage to the program from which this is
extracted is that references to Y may be cse'd across calls.

IMO this should work fine with LTO, so long as it does not
apply the constant initializer optimization to const variables
that are aliased.

Compile: gcc -O2 -flto y?.c

--- y1.c ---
#include <assert.h>
extern const int y;
int main(void)
{
    assert(y == 1);
    return 0;
}

--- y2.c ---
static int x;
extern const int y __attribute__((alias("x")));
static void __attribute__((constructor)) init(void)
{
    x = 1;
}

[Bug target/97323] [10/11 Regression] ICE 'verify_type' failed on arm-linux-gnueabihf

2020-10-30 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

--- Comment #10 from Richard Henderson  ---
Created attachment 49473
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49473&action=edit
rfc patch

The following fixes the ICE.
It seems like a hack, done at the wrong level.

Should we have in fact set TYPE_STRUCTURAL_EQUALITY_P all the way
back on the unaligned 'a' type, before we even try to create an
array of 'a'?  If so, that would have properly triggered the test
here in build_array_type_1 that would have bypassed the problem.

[Bug target/97323] [10/11 Regression] ICE 'verify_type' failed on arm-linux-gnueabihf

2020-10-30 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97323

--- Comment #9 from Richard Henderson  ---
As a data point, this problem can be seen with any
strict-alignment target -- e.g. sparc.

[Bug middle-end/94543] New: missed optimization with MIN and AND with type promotion

2020-04-09 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94543

Bug ID: 94543
   Summary: missed optimization with MIN and AND with type
promotion
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

unsigned f(unsigned short x) { return (x > 0xff ? 0xff : x) & 0xff; }

cmpw    $255, %di
movl    $255, %eax
cmova   %eax, %edi
movzwl  %di, %eax
ret

The final AND is of course redundant.  The optimizer removes it
for wider types, but fails to do so when promoting from short.
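
The redundancy is easy to confirm by brute force.  A small sketch (the
function names are made up for illustration) that compares the
expression with and without the final AND over every 16-bit input:

```c
/* The expression from the report. */
static unsigned clamp_then_mask(unsigned short x)
{
    return (x > 0xff ? 0xff : x) & 0xff;
}

/* The same expression without the final AND. */
static unsigned clamp_only(unsigned short x)
{
    return x > 0xff ? 0xff : x;
}

/* Returns 1 if both forms agree for every value in 0..0xffff. */
int and_is_redundant(void)
{
    for (unsigned v = 0; v <= 0xffffu; v++) {
        if (clamp_then_mask((unsigned short)v)
            != clamp_only((unsigned short)v))
            return 0;
    }
    return 1;
}
```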

[Bug target/94174] Missed ccmp optimizations

2020-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94174

--- Comment #2 from Richard Henderson  ---
Case 3:

void test3(__int128 a, unsigned long l)
{
  if ((__int128_t)a - l <= 1)
    doit();
}

currently generates as

subs    x0, x0, x2
sbc     x1, x1, xzr
cmp     x1, 0
ble     .L11
.L7:
ret
.L11:
bne     .L10
cmp     x0, 1
bhi     .L7
.L10:
b       doit

but at least the bne + cmp can be 

ccmp x0, 1, #2, eq

Note that clang attempts a branchless double-word comparison

subs    x8, x0, x2
sbcs    x9, x1, xzr
cmp     x8, #1
cset    w8, hi
cmp     x9, #0
cset    w9, gt
csel    w8, w8, w9, eq
tbnz    w8, #0, .LBB0_2

we can do better than that:

subs    x8, x0, x2
sbcs    x9, x1, xzr
// x9 < 0 || (x9 == 0 && x8 <= 1)
cset    x10, lt
ccmp    x8, #1, #2, ne    (nzCv: eq -> hi)
ccmp    x10, #0, #4, hi   (nZcv: ls -> eq)
b.eq    .L10

It's not 100% clear this is better than the 2 branch
version (with the ccmp), but at least it's no larger.
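
The condition in the comment above, hi < 0 || (hi == 0 && lo <= 1), can
be checked against the direct comparison in C.  A sketch, assuming a
64-bit target with __int128 and an arithmetic right shift of signed
values (as GCC provides); the function names are hypothetical:

```c
#include <stdint.h>

/* Direct double-word comparison. */
int le1_direct(__int128 d)
{
    return d <= 1;
}

/* The same test on the two 64-bit halves:
   hi < 0 || (hi == 0 && lo <= 1), with lo compared as unsigned. */
int le1_split(__int128 d)
{
    uint64_t lo = (uint64_t)d;
    int64_t hi = (int64_t)(d >> 64);
    return hi < 0 || (hi == 0 && lo <= 1);
}
```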

[Bug target/94174] Missed ccmp optimizations

2020-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94174

Richard Henderson  changed:

   What|Removed |Added

Summary|__builtin_add_overflow vs   |Missed ccmp optimizations
   |ccmp|
 Target||aarch64-*

--- Comment #1 from Richard Henderson  ---
Case 2:

void test2(unsigned long a, unsigned long l)
{
  if (l + 1 == 0 || a <= l + 1)
    doit();
}

currently generates as

cmn x1, #1
beq .L13
add x1, x1, 1
cmp x1, x0
bcc .L12

but could be

adds    x2, x1, #1
ccmp    x0, x2, #0, ne
b.hi    .L12

[Bug target/94174] New: __builtin_add_overflow vs ccmp

2020-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94174

Bug ID: 94174
   Summary: __builtin_add_overflow vs ccmp
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

(1) Case 1:

void doit(void);
void test(unsigned long a, unsigned long l)
{
  if (!__builtin_add_overflow(a, 8 - 1, &a) && a <= l)
    doit();
}

currently generates as

adds    x0, x0, #7
cset    x2, cs
cmp     x0, x1
eor     w2, w2, 1
cset    w0, ls
tst     w0, w2
bne     .L22

but could be

adds    x2, x0, #7
ccmp    x2, x1, #0, cc
b.ls    .L22

[Bug target/93768] New: Use vpternlog for composite logical operations

2020-02-16 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93768

Bug ID: 93768
   Summary: Use vpternlog for composite logical operations
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

We should pattern-match multiple logical operations
into the ternary logical operator.  While there are
lots of obscure combinations available, probably the
most useful are

  Two-input inverted logicals:
  0x11  ~(B|C)
  0x77  ~(B&C)
  0x99  ~(B^C)
  0xbb  C|~B
  0xdd  B|~C

  Three-input simple logicals:
  0x80  A&B&C
  0x96  A^B^C
  0xfe  A|B|C

  Multiple alternatives of the ?: operation, which allows
  the memory-capable operand, C, in various positions, and
  allows the input-output operand, A, in various positions:
  0xe2  B?A:C
  0xe4  C?A:B
  0xb8  B?C:A
  0xd8  C?B:A
  0xca  A?B:C
  0xac  A?C:B
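
The immediates above are 8-bit truth tables.  A sketch of how one can
be derived, assuming the bit index is (A << 2 | B << 1 | C), which is
consistent with the entries listed; ternlog_imm and the op_* helpers are
hypothetical names:

```c
#include <stdint.h>

/* Type of a three-input boolean function on single bits (0 or 1). */
typedef unsigned (*logic3_fn)(unsigned a, unsigned b, unsigned c);

/* Build the 8-bit truth table: bit number (a << 2 | b << 1 | c)
   holds the low bit of f(a, b, c). */
uint8_t ternlog_imm(logic3_fn f)
{
    uint8_t imm = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        unsigned a = (idx >> 2) & 1, b = (idx >> 1) & 1, c = idx & 1;
        if (f(a, b, c) & 1)
            imm |= (uint8_t)(1u << idx);
    }
    return imm;
}

/* A few of the operations listed above. */
unsigned op_nor_bc(unsigned a, unsigned b, unsigned c) { (void)a; return ~(b | c); }
unsigned op_xor3(unsigned a, unsigned b, unsigned c)   { return a ^ b ^ c; }
unsigned op_sel_a(unsigned a, unsigned b, unsigned c)  { return a ? b : c; }
unsigned op_sel_b(unsigned a, unsigned b, unsigned c)  { return b ? a : c; }
```

With these helpers, ternlog_imm(op_nor_bc) gives 0x11, op_xor3 gives
0x96, op_sel_a gives 0xca and op_sel_b gives 0xe2.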

[Bug target/92902] jump tables are put into the text section

2019-12-11 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92902

--- Comment #14 from Richard Henderson  ---
The only reason I can think for jump tables to be put into the text
section is the old aout format, which didn't have a separate read
only data section.  There should be no reason to do that these days.

[Bug target/68543] [AArch64] Implement overflow arithmetic standard names {u,}{add,sub,mul}v4 and/or negv3

2019-11-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68543

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |9.0

--- Comment #8 from Richard Henderson  ---
This feature was added in r262890, included in gcc 9.

[Bug target/91833] [10 Regression] [AArch64] LSE atomics depends on glibc specific sys/auxv.h

2019-09-25 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91833

--- Comment #6 from Richard Henderson  ---
Author: rth
Date: Wed Sep 25 22:51:55 2019
New Revision: 276134

URL: https://gcc.gnu.org/viewcvs?rev=276134&root=gcc&view=rev
Log:
aarch64: Configure for sys/auxv.h in libgcc for lse-init.c

PR target/91833
* config/aarch64/lse-init.c: Include auto-target.h.  Disable
initialization if !HAVE_SYS_AUXV_H.
* configure.ac (AC_CHECK_HEADERS): Add sys/auxv.h.
* config.in, configure: Rebuild.

Modified:
trunk/libgcc/ChangeLog
trunk/libgcc/config.in
trunk/libgcc/config/aarch64/lse-init.c
trunk/libgcc/configure   (contents, props changed)
trunk/libgcc/configure.ac

Propchange: trunk/libgcc/configure
('svn:executable' added)

[Bug target/91834] [10 Regression ] [AArch64] LSE atomics, warnings due to unpredictable behavior with strx and the same register for s and t

2019-09-25 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91834

--- Comment #5 from Richard Henderson  ---
Author: rth
Date: Wed Sep 25 21:48:41 2019
New Revision: 276133

URL: https://gcc.gnu.org/viewcvs?rev=276133&root=gcc&view=rev
Log:
aarch64: Fix store-exclusive in load-operate LSE helpers

PR target/91834
* config/aarch64/lse.S (LDNM): Ensure STXR output does not
overlap the inputs.

Modified:
trunk/libgcc/ChangeLog
trunk/libgcc/config/aarch64/lse.S

[Bug target/91833] [10 Regression] [AArch64] LSE atomics depends on glibc specific sys/auxv.h

2019-09-20 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91833

--- Comment #5 from Richard Henderson  ---
Ah, I've been using the old time one tree build.
I'll try building aarch64-elf in pieces as I fix.

[Bug target/91833] [10 Regression] [AArch64] LSE atomics depends on glibc specific sys/auxv.h

2019-09-20 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91833

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #2 from Richard Henderson  ---
Hmm.  When I built aarch64-elf for newlib, configure automatically
sets inhibit_libc, which avoids the whole issue.

While a test vs __GLIBC__ might work, a configure test vs sys/auxv.h
is probably better.  I see current musl also provides this header.

Pekka, what library are you using?

[Bug target/91834] [10 Regression ] [AArch64] LSE atomics, warnings due to unpredictable behavior with strx and the same register for s and t

2019-09-20 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91834

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #4 from Richard Henderson  ---
Ack.

[Bug c/91765] New: -Wredundant-decls conflicts with __attribute__((alias))

2019-09-13 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91765

Bug ID: 91765
   Summary: -Wredundant-decls conflicts with
__attribute__((alias))
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

/* Header file */
extern int bar;

/* Source file */
static int foo;
extern int bar __attribute__((alias("foo")));

--

For this test case, "gcc -c -Wredundant-decls z.c" produces

z.c:6:12: warning: redundant redeclaration of ‘bar’ [-Wredundant-decls]
    6 | extern int bar __attribute__((alias("foo")));
      |            ^~~
z.c:2:12: note: previous declaration of ‘bar’ was here
    2 | extern int bar;
      |            ^~~

However, the alias line is not a redundant decl, but rather the
actual definition of the symbol "bar".  The syntax of the alias,
for whatever reason, requires the use of the extern keyword.

Remove the extern from the alias and we get

z.c:6:5: error: ‘bar’ defined both normally and as ‘alias’ attribute
    6 | int bar __attribute__((alias("foo")));
      |     ^~~

This also seems like a bug, but it's probably years too late to
change the syntax of the definition of aliases.
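
To illustrate that the alias line really is the definition of "bar", a
sketch assuming a target that supports the alias attribute (for example
ELF); set_foo and get_bar are hypothetical accessors added only to show
that both names share storage:

```c
/* The object that actually holds the storage. */
static int foo = 42;

/* Defines the symbol "bar" as another name for foo.  GCC requires the
   extern keyword here even though this line is the definition. */
extern int bar __attribute__((alias("foo")));

/* Hypothetical accessors: write through one name, read the other. */
void set_foo(int v) { foo = v; }
int get_bar(void)   { return bar; }
```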

[Bug tree-optimization/91504] New: Inlining misses some logical operation folding

2019-08-20 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91504

Bug ID: 91504
   Summary: Inlining misses some logical operation folding
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

In the following test case,

static inline unsigned deposit32(unsigned value, int start, int length,
                                 unsigned fieldval)
{
    unsigned mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

unsigned foo(unsigned value)
{
    return deposit32(value, 10, 1, 1);
}

unsigned bar(unsigned value)
{
    int start = 10;
    int length = 1;
    unsigned fieldval = 1;
    unsigned mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

One would expect FOO and BAR to compile to the same code, since
the latter is the trivial inlining of the former, but that does
not happen.

gcc -O2 -S z.c gives

foo:
mvn w1, w0
and w1, w1, 1024
eor w0, w1, w0
ret

bar:
orr w0, w0, 1024
ret

The problem appears independent of target, as it is visible on
both x86 and aarch64 targets.

[Bug target/88547] New: missed optimization for vector comparisons

2018-12-18 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

Bug ID: 88547
   Summary: missed optimization for vector comparisons
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

typedef signed svec __attribute__((vector_size(16)));
typedef unsigned uvec __attribute__((vector_size(16)));

svec les(svec x, svec y) {
    return x <= y;
}

uvec leu(uvec x, uvec y) {
    return x <= y;
}

currently assemble to 

les:
vpcmpgtd  %xmm1, %xmm0, %xmm0
vpcmpeqd  %xmm1, %xmm1, %xmm1
vpandn    %xmm1, %xmm0, %xmm0

leu:
vmovdqa64 .LC0(%rip), %xmm2
vpsubd    %xmm2, %xmm1, %xmm1
vpsubd    %xmm2, %xmm0, %xmm0
vpcmpgtd  %xmm1, %xmm0, %xmm0
vpcmpeqd  %xmm1, %xmm1, %xmm1
vpandn    %xmm1, %xmm0, %xmm0

By using the transformation min(x, y) == x we can produce

les:
vpminsd   %xmm1, %xmm0, %xmm1
vpcmpeqd  %xmm1, %xmm0, %xmm0

leu:
vpminud   %xmm1, %xmm0, %xmm1
vpcmpeqd  %xmm0, %xmm1, %xmm0

This can be used to reduce unsigned comparisons without requiring
the use of a constant bias vector.  At least when the given min insn
is available in the architecture.
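
The identity x <= y <=> min(x, y) == x can be checked lane by lane in
scalar C.  A sketch with hypothetical function names, using four 32-bit
lanes to mirror a 16-byte vector; it does not use the vector extension
itself:

```c
#include <stdint.h>

enum { LANES = 4 };   /* four 32-bit lanes, as in a 16-byte vector */

/* Reference: lane-wise unsigned x <= y, producing all-ones or zero. */
void leu_ref(const uint32_t *x, const uint32_t *y, uint32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = x[i] <= y[i] ? UINT32_MAX : 0;
}

/* Same result via min(x, y) == x, with no bias constant needed. */
void leu_min(const uint32_t *x, const uint32_t *y, uint32_t *out)
{
    for (int i = 0; i < LANES; i++) {
        uint32_t m = x[i] < y[i] ? x[i] : y[i];
        out[i] = m == x[i] ? UINT32_MAX : 0;
    }
}
```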

[Bug target/86774] Alpha port needs updating for CVE-2017-5753

2018-08-01 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86774

Richard Henderson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-08-01
 CC||mattst88 at gmail dot com,
   ||uros at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Richard Henderson  ---
Browsing back through old manuals, I think the only way to ensure
a barrier against speculation is the HW_REI/STALL instruction, but
that is available only to PALcode.

In an ideal world, we'd roll out a new PALcode entry point that
did nothing but issue a memory barrier and return with a stall.
Sadly, there will never again be a firmware update for Alphas.

Probably the best we can really do is just a trap barrier plus
a memory barrier.  It's not a complete fix, but it would narrow
the window.

Unless someone can find an existing non-privileged PALcode entry
point that just so happens to end in HW_REI/STALL?  I'm having
trouble tracking down any sources at the moment -- many of the
old domains have lapsed.

[Bug target/86541] Use SSE to emulate __attribute__((vector_size(8)))

2018-07-17 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86541

--- Comment #2 from Richard Henderson  ---
(In reply to Richard Biener from comment #1)
> Given that we have a target pass that makes use of SSE regs for scalar
> operations I wonder if it would make more sense to attack this at the
> target level by claiming native support for vector_size(8) and using
> a target pass to make that work.  As you said the most simple way is to
> movlhps %xmmN, %xmmN at strategic places.  That very thing could be
> also done by tree-vect-generic.c of course.

I was really thinking to support V8QImode et al in the md file.

Consider e.g. mulv16qi3, for which there is no 8-bit multiply
support in the ISA.  We expand to 2 unpacks, 2 mulv8qi3,
2 zero-extend, 1 repack.  By expanding mulv8qi3 in the backend,
we can halve the amount of work.

However, if we "lower" at the generic level, we'll not be able
to see that half of the V16QImode expansion is dead, and wind
up doing twice as much work as necessary.

However, I can also see the value in not replicating *all* of
those patterns in the backend, for a feature of limited use.

[Bug target/86541] New: Use SSE to emulate __attribute__((vector_size(8)))

2018-07-16 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86541

Bug ID: 86541
   Summary: Use SSE to emulate __attribute__((vector_size(8)))
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

In order to be more compatible across platforms, it would be
helpful if vector_size(8) was better supported for i386/x86_64.

The vast majority of the operations can be supported easily
with existing vector_size(16) instructions, and using either
(V)MOVQ to zero-extend the input or VPBROADCASTD/MOVDDUP to
replicate the input across the xmm register.

For integer operations it probably doesn't matter, but fp
operations would have different exception characteristics
with a zero-extension.  Replicating the inputs across the
lanes would avoid extra fp exceptions.

[Bug target/84010] [sparc64] Problematic TLS code generation

2018-01-24 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #6 from Richard Henderson  ---
For better rematerialization, I wonder if it wouldn't be better
to represent this as

(set (reg:P tmp)
 (const:P (unspec [(symbol_ref "xxx")] UNSPEC_TLSIE)))

prior to reload, and split to sethi+add+ld only after reload.

That makes the rhs CONSTANT_P, which gets us into the right
sort of paths in lra and reload.  It also means we'll never
spill just part of the expression.

[Bug target/84010] [sparc64] Bad TLS code generation

2018-01-23 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #3 from Richard Henderson  ---
(In reply to James Clarke from comment #2)
> Here is a completely untested patch which should in theory resolve this
> series of issues. This doesn't introduce rematerialization for them (or, if
> it's supposed to already, resolve whatever's stopping it), but that's a
> future optimisation. I'm surprised these are even being scheduled so far
> apart and spilled, but hey.

Modulo the exact pattern names, and handling tls modes beyond IE,
this is pretty much the same patch that I have just tested.

Thanks.

[Bug target/84010] New: [sparc64] Bad TLS code generation

2018-01-23 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

Bug ID: 84010
   Summary: [sparc64] Bad TLS code generation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

Created attachment 43224
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43224&action=edit
preprocessed input and assembly output

Seen in top-of-tree qemu:

lduw    [%fp+1823], %g1
...
ldx     [%l7 + %g1], %g1, %tie_ldx(tcg_ctx)
...
ldx     [%g7+%g1], %o1

Linker relaxation of R_SPARC_TLS_IE_LDX requires the input to be the
output of R_SPARC_TLS_IE_HI/LO.  However, the tie_ld64 pattern uses
an SImode input.  This led to spill/fill that dropped the sign-extension
of the value.

In addition to fixing the mode, we should think about letting reload
reconstitute the value -- it's just a sethi+or pair.
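
The effect of spilling through a 32-bit slot can be illustrated in C.
This is only a sketch: spill_reload_32 is a hypothetical name, and any
value passed to it is an arbitrary stand-in, not an offset taken from
the report:

```c
#include <stdint.h>

/* Spill a 64-bit value through a 32-bit slot and reload it with zero
   extension, as a 32-bit unsigned load such as lduw does. */
int64_t spill_reload_32(int64_t v)
{
    uint32_t slot = (uint32_t)v;   /* only the low 32 bits are saved */
    return (int64_t)slot;          /* upper 32 bits come back as zero */
}
```

A small non-negative value survives the round trip, while a negative
one such as -16 comes back as 0xfffffff0.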

command-line:

cc -I/home/rth/qemu/bld/sparc64-softmmu/../target/sparc -Itarget/sparc
-I/home/rth/qemu/qemu/tcg -I/home/rth/qemu/qemu/tcg/sparc
-I/home/rth/qemu/qemu/linux-headers -I/home/rth/qemu/bld/linux-headers -I.
-I/home/rth/qemu/qemu -I/home/rth/qemu/qemu/accel/tcg
-I/home/rth/qemu/qemu/include -I/usr/include/pixman-1
-I/home/rth/qemu/qemu/dtc/libfdt -Werror -pthread -I/usr/include/glib-2.0
-I/usr/lib/sparc64-linux-gnu/glib-2.0/include -m64 -mcpu=ultrasparc
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes
-fno-strict-aliasing -fno-common -fwrapv  -Wexpansion-to-defined -Wendif-labels
-Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body
-Wnested-externs -Wformat-security -Wformat-y2k -Winit-self
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition
-Wtype-limits -fstack-protector-strong -I/home/rth/qemu/qemu/capstone/include 
-I../linux-headers -I.. -I/home/rth/qemu/qemu/target/sparc -DNEED_CPU_H
-I/home/rth/qemu/qemu/include  -MMD -MP -MT target/sparc/translate.o -MF
target/sparc/translate.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g   -c -o
target/sparc/translate.o /home/rth/qemu/qemu/target/sparc/translate.c
--save-temps

[Bug target/80848] /crtend.o(.eh_frame); no .eh_frame_hdr table will be created

2017-05-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80848

Richard Henderson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||rth at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #3 from Richard Henderson  ---
Dup.

*** This bug has been marked as a duplicate of bug 80037 ***

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-05-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

Richard Henderson  changed:

   What|Removed |Added

 CC||mitalis at iiitd dot ac.in

--- Comment #8 from Richard Henderson  ---
*** Bug 80848 has been marked as a duplicate of this bug. ***

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-05-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |6.5

--- Comment #6 from Richard Henderson  ---
Fixed.

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-05-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

--- Comment #5 from Richard Henderson  ---
Author: rth
Date: Fri May 26 19:33:19 2017
New Revision: 248526

URL: https://gcc.gnu.org/viewcvs?rev=248526&root=gcc&view=rev
Log:
PR libgcc/80037

 Backport from mainline
 * config/alpha/t-alpha (CRTSTUFF_T_CFLAGS): New.


Modified:
branches/gcc-6-branch/libgcc/ChangeLog
branches/gcc-6-branch/libgcc/config/alpha/t-alpha

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-05-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

--- Comment #4 from Richard Henderson  ---
Author: rth
Date: Fri May 26 19:29:46 2017
New Revision: 248525

URL: https://gcc.gnu.org/viewcvs?rev=248525&root=gcc&view=rev
Log:
PR libgcc/80037

 Backport from mainline
 * config/alpha/t-alpha (CRTSTUFF_T_CFLAGS): New.


Modified:
branches/gcc-7-branch/libgcc/ChangeLog
branches/gcc-7-branch/libgcc/config/alpha/t-alpha

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-05-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

--- Comment #3 from Richard Henderson  ---
Author: rth
Date: Fri May 26 18:45:59 2017
New Revision: 248522

URL: https://gcc.gnu.org/viewcvs?rev=248522&root=gcc&view=rev
Log:
PR libgcc/80037

 * config/alpha/t-alpha (CRTSTUFF_T_CFLAGS): New.

Modified:
trunk/libgcc/ChangeLog
trunk/libgcc/config/alpha/t-alpha

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-03-13 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

Richard Henderson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-03-13
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Richard Henderson  ---
Created attachment 40967
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40967&action=edit
proposed patch

Testing this as a fix.

[Bug libgcc/80037] New: Bad .eh_frame data in crtend.o

2017-03-13 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

Bug ID: 80037
   Summary: Bad .eh_frame data in crtend.o
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

Looks similar to PR40332, but targeting alpha.

The cause is gcc writing gas directives for unwind for
__do_global_ctors_aux, and also writing the terminator
for the .eh_frame section via the static variable __FRAME_END__.

This results in one CIE entry after the terminator, causing
ld to complain.

For normal applications, the program still runs ok, although
the lack of .eh_frame_hdr is much less than ideal if c++
exceptions are involved.

However, dejagnu treats this extra output as an error.
So test results for alpha are unusable at present.

[Bug tree-optimization/71916] [6/7 Regression] ICE at -O3 on valid code on x86_64-linux-gnu in "maybe_record_trace_start"

2016-07-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71916

--- Comment #8 from Richard Henderson  ---
(gdb) call debug_cfi_row(cur_row)
.cfi_def_cfa 7, 16
.cfi_offset 3, -16
.cfi_offset 16, -8
(gdb) call debug_cfi_row(ti->beg_row)
.cfi_def_cfa 7, 8
.cfi_offset 16, -8

(gdb) call debug_rtx(start)
(code_label 377 83 109 25 46 "" [3 uses])

(gdb) call debug_rtx(origin)
(jump_insn:TI 187 186 395 36 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 377)
(pc))) z.c:16 635 {*jcc_1}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 9500 (nil)))
 -> 377)

On one code path we've saved RBX, and on another code path we haven't.
The label in question appears in jump2, presumably jump threading, but
I haven't actually traced the blocks to be sure this is true.

I imagine it's got something to do with the infinite loop vs shrink-wrapping.
We perhaps ought to have added a restore of RBX, but didn't because it
doesn't appear to be needed.
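The disagreement between the two rows dumped above can be expressed as a comparison of per-register save state. The following is a minimal sketch, not the data structures used in dwarf2cfi.c: `struct cfi_row_model` and `cfi_rows_equal` are hypothetical names, and the register numbers are the DWARF column numbers printed by the dumps (3 is RBX, 7 is RSP, 16 is the return address column on x86_64).

```c
#include <stdbool.h>

#define MODEL_NUM_REGS 17   /* covers DWARF columns 0..16 */

/* Simplified model of one CFI row: the CFA rule plus, for each register,
   whether it has been saved and at which offset from the CFA.  */
struct cfi_row_model {
    int cfa_reg;
    int cfa_offset;
    bool saved[MODEL_NUM_REGS];
    int save_offset[MODEL_NUM_REGS];
};

/* Two rows agree only if the CFA rule and every register rule match.  */
bool cfi_rows_equal(const struct cfi_row_model *a,
                    const struct cfi_row_model *b)
{
    if (a->cfa_reg != b->cfa_reg || a->cfa_offset != b->cfa_offset)
        return false;
    for (int r = 0; r < MODEL_NUM_REGS; r++) {
        if (a->saved[r] != b->saved[r])
            return false;
        if (a->saved[r] && a->save_offset[r] != b->save_offset[r])
            return false;
    }
    return true;
}
```

Filling in the two rows from the dumps makes the comparison fail twice over: the CFA offsets differ (16 versus 8) and column 3 is saved in only one of them.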

[Bug rtl-optimization/71636] Missed optimization in variable alignment test

2016-06-23 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71636

--- Comment #1 from Richard Henderson  ---
Oh, and I meant to mention -- if the target doesn't have an andnot
insn, both formulations are identical in complexity and minimal path.

Which might suggest *always* performing the transformation at a
high level, letting the andnot be used if it happens to be available.

[Bug rtl-optimization/71636] New: Missed optimization in variable alignment test

2016-06-23 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71636

Bug ID: 71636
   Summary: Missed optimization in variable alignment test
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

The following functions are equivalent,

unsigned f(unsigned x, unsigned b)
{
  return x & ((1U << b) - 1);
}

unsigned g(unsigned x, unsigned b)
{
  return x & ~(~0U << b);
}

If the target has an andnot insn, G is shorter:

aarch64:
f:      mov     w2, 1
        lsl     w1, w2, w1
        sub     w2, w1, #1
        and     w0, w2, w0

g:      mov     w2, -1
        lsl     w1, w2, w1
        bic     w0, w0, w1


x86_64 (-march=haswell):
f:      movl    $1, %edx
        shlx    %esi, %edx, %eax
        subl    $1, %eax
        andl    %edi, %eax

g:      movl    $-1, %edx
        shlx    %esi, %edx, %esi
        andn    %edi, %esi, %eax
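The claimed equivalence of the two formulations can be checked mechanically. The sketch below restates f and g under the hypothetical names `mask_low_sub` and `mask_low_andnot` and compares them for every shift count below the width of `unsigned`; larger counts are undefined behaviour in both forms, so they are not tested. The sample values are arbitrary.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Formulation f from the report: build the mask by subtraction.  */
unsigned mask_low_sub(unsigned x, unsigned b)
{
    return x & ((1U << b) - 1);
}

/* Formulation g from the report: build the inverted mask by shifting.  */
unsigned mask_low_andnot(unsigned x, unsigned b)
{
    return x & ~(~0U << b);
}

/* Compare both formulations for every defined shift count.  */
bool masks_agree(void)
{
    const unsigned samples[] = { 0U, 1U, 0x1234U, 0xffffU, ~0U };
    const unsigned width = sizeof(unsigned) * CHAR_BIT;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned b = 0; b < width; b++)
            if (mask_low_sub(samples[i], b) != mask_low_andnot(samples[i], b))
                return false;
    return true;
}
```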

[Bug preprocessor/69391] [5 Regression] Incorrect __LINE__ expansion with -ftrack-macro-expansion=0 on g++5.2

2016-04-06 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69391

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
Summary|[5/6 Regression] Incorrect  |[5 Regression] Incorrect
   |__LINE__ expansion with |__LINE__ expansion with
   |-ftrack-macro-expansion=0   |-ftrack-macro-expansion=0
   |on g++5.2   |on g++5.2

--- Comment #7 from Richard Henderson  ---
Fixed for gcc6.

[Bug preprocessor/60723] Line directives with incorrect system header flag

2016-04-06 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60723
Bug 60723 depends on bug 61817, which changed state.

Bug 61817 Summary: Inconsistent location of tokens in the expansion list of a 
built-in macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61817

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug preprocessor/61817] Inconsistent location of tokens in the expansion list of a built-in macro

2016-04-06 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61817

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Richard Henderson  ---
Fixed.

[Bug preprocessor/61817] Inconsistent location of tokens in the expansion list of a built-in macro

2016-04-06 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61817

--- Comment #4 from Richard Henderson  ---
Author: rth
Date: Wed Apr  6 18:35:16 2016
New Revision: 234794

URL: https://gcc.gnu.org/viewcvs?rev=234794&root=gcc&view=rev
Log:
PR preprocessor/61817
PR preprocessor/69391

  * internal.h (_cpp_builtin_macro_text): Update decl.
  * macro.c (_cpp_builtin_macro_text): Accept location for __LINE__.
  (builtin_macro): Accept a second location for __LINE__.
  (enter_macro_context): Compute both virtual and real expansion
  locations for the macro.

  * gcc.dg/pr61817-1.c: New test.
  * gcc.dg/pr61817-2.c: New test.
  * gcc.dg/pr69391-1.c: New test.
  * gcc.dg/pr69391-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr61817-1.c
trunk/gcc/testsuite/gcc.dg/pr61817-2.c
trunk/gcc/testsuite/gcc.dg/pr69391-1.c
trunk/gcc/testsuite/gcc.dg/pr69391-2.c
Modified:
trunk/gcc/testsuite/ChangeLog
trunk/libcpp/ChangeLog
trunk/libcpp/internal.h
trunk/libcpp/macro.c

[Bug preprocessor/69391] [5/6 Regression] Incorrect __LINE__ expansion with -ftrack-macro-expansion=0 on g++5.2

2016-04-06 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69391

--- Comment #6 from Richard Henderson  ---
Author: rth
Date: Wed Apr  6 18:35:16 2016
New Revision: 234794

URL: https://gcc.gnu.org/viewcvs?rev=234794&root=gcc&view=rev
Log:
PR preprocessor/61817
PR preprocessor/69391

  * internal.h (_cpp_builtin_macro_text): Update decl.
  * macro.c (_cpp_builtin_macro_text): Accept location for __LINE__.
  (builtin_macro): Accept a second location for __LINE__.
  (enter_macro_context): Compute both virtual and real expansion
  locations for the macro.

  * gcc.dg/pr61817-1.c: New test.
  * gcc.dg/pr61817-2.c: New test.
  * gcc.dg/pr69391-1.c: New test.
  * gcc.dg/pr69391-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr61817-1.c
trunk/gcc/testsuite/gcc.dg/pr61817-2.c
trunk/gcc/testsuite/gcc.dg/pr69391-1.c
trunk/gcc/testsuite/gcc.dg/pr69391-2.c
Modified:
trunk/gcc/testsuite/ChangeLog
trunk/libcpp/ChangeLog
trunk/libcpp/internal.h
trunk/libcpp/macro.c

[Bug preprocessor/69391] [5/6 Regression] Incorrect __LINE__ expansion with -ftrack-macro-expansion=0 on g++5.2

2016-03-29 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69391

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

[Bug target/70355] [5/6 Regression] ICE: in simplify_subreg_concatn, at lower-subreg.c:617 with -funroll-loops -mavx512f

2016-03-29 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70355

--- Comment #3 from Richard Henderson  ---
Author: rth
Date: Tue Mar 29 15:19:00 2016
New Revision: 234524

URL: https://gcc.gnu.org/viewcvs?rev=234524&root=gcc&view=rev
Log:
PR middle-end/70355

  * lower-subreg.c (simplify_subreg_concatn): Reject paradoxical subregs.

Added:
trunk/gcc/testsuite/gcc.c-torture/compile/pr70355.c
trunk/gcc/testsuite/gcc.target/i386/pr70355.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lower-subreg.c

[Bug target/70355] [5 Regression] ICE: in simplify_subreg_concatn, at lower-subreg.c:617 with -funroll-loops -mavx512f

2016-03-29 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70355

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
Summary|[5/6 Regression] ICE: in|[5 Regression] ICE: in
   |simplify_subreg_concatn, at |simplify_subreg_concatn, at
   |lower-subreg.c:617 with |lower-subreg.c:617 with
   |-funroll-loops -mavx512f|-funroll-loops -mavx512f

--- Comment #4 from Richard Henderson  ---
Fixed for gcc6.

[Bug target/70355] [5/6 Regression] ICE: in simplify_subreg_concatn, at lower-subreg.c:617 with -funroll-loops -mavx512f

2016-03-28 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70355

--- Comment #2 from Richard Henderson  ---
Created attachment 38113
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38113&action=edit
proposed patch

Testing the following, which works on the reduced test case.

As a missed-optimization, we really ought to be handling logic operations on
these wide types via normal sse logic insns.  Perhaps not for V1TI, but
definitely for V2TI and V4TI.  There's no point in breaking them down into
4 and 8 DImode operations, respectively.

[Bug target/64971] [5/6 Regression] gcc.c-torture/compile/pr37433.c ICEs with -mabi=ilp32

2016-03-28 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971

--- Comment #8 from Richard Henderson  ---
Created attachment 38112
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38112&action=edit
proposed patch

Andrew's approach to force the SYMBOL_REF to DImode is certainly one way
to approach it; another is to accept the SImode SYMBOL_REF.  Which can be
done relatively easily with a define_special_predicate.

Whichever solution is chosen, there appears to be a disconnect between 
the call and sibcall patterns:

 (1) sibcalls fail to check for aarch64_islong_call_p.
 (2) sibcalls use a combined pattern with aarch64_call_insn_operand
 and Usf constraint, whereas calls use two separate patterns.

This disconnect should be fixed at the same time.

[Bug target/70355] [5/6 Regression] ICE: in simplify_subreg_concatn, at lower-subreg.c:617 with -funroll-loops -mavx512f

2016-03-22 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70355

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

[Bug middle-end/69845] [4.9/5/6 Regression] Expression getting incorrectly optimized after being rewritten by compiler

2016-03-22 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69845

--- Comment #7 from Richard Henderson  ---
Proposed patch
  https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01255.html

[Bug middle-end/70199] [5 Regression] Crash at -O2 when using labels.

2016-03-21 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70199
Bug 70199 depends on bug 70273, which changed state.

Bug 70273 Summary: [6 regression] FAIL: g++.dg/ext/label13a.C  -std=gnu++98 
execution test / scan-assembler _ZN1CC4Ev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70273

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/70273] [6 regression] FAIL: g++.dg/ext/label13a.C -std=gnu++98 execution test / scan-assembler _ZN1CC4Ev

2016-03-21 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70273

--- Comment #11 from Richard Henderson  ---
Author: rth
Date: Mon Mar 21 23:03:56 2016
New Revision: 234386

URL: https://gcc.gnu.org/viewcvs?rev=234386&root=gcc&view=rev
Log:
PR c++/70273

  * decl.c (notice_forced_label_r): New.
  (cp_finish_decl): Use it.

Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/decl.c

[Bug middle-end/70273] [6 regression] FAIL: g++.dg/ext/label13a.C -std=gnu++98 execution test / scan-assembler _ZN1CC4Ev

2016-03-21 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70273

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Richard Henderson  ---
Fixed.

[Bug middle-end/69845] [4.9/5/6 Regression] Expression getting incorrectly optimized after being rewritten by compiler

2016-03-21 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69845

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

[Bug middle-end/70199] [5/6 Regression] Crash at -O2 when using labels.

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70199

--- Comment #6 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 16:50:18 2016
New Revision: 234261

URL: https://gcc.gnu.org/viewcvs?rev=234261&root=gcc&view=rev
Log:
PR middle-end/70199

 * function.h (struct function): Add has_forced_label_in_static.
 * gimplify.c (force_labels_r): Set it.
 * lto-streamer-in.c (input_struct_function_base): Read it.
 * lto-streamer-out.c (output_struct_function_base): Write it.
 * tree-inline.c (has_label_address_in_static_1): Remove.
 (copy_forbidden): Remove fndecl parameter; test
 has_forced_label_in_static.
 (inline_forbidden_p): Update call to copy_forbidden.
 (tree_versionable_function_p): Likewise.
 * ipa-chkp.c (chkp_instrumentable_p): Likewise.
 (chkp_versioning): Likewise.
 * tree-inline.h (copy_forbidden): Update decl.

testsuite/
 * gcc.c-torture/compile/pr70199.c: New.

Added:
trunk/gcc/testsuite/gcc.c-torture/compile/pr70199.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/function.h
trunk/gcc/gimplify.c
trunk/gcc/ipa-chkp.c
trunk/gcc/lto-streamer-in.c
trunk/gcc/lto-streamer-out.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-inline.c
trunk/gcc/tree-inline.h

[Bug middle-end/70273] [6 regression] FAIL: g++.dg/ext/label13a.C -std=gnu++98 execution test / scan-assembler _ZN1CC4Ev

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70273

Richard Henderson  changed:

   What|Removed |Added

  Attachment #38003|0   |1
is obsolete||

--- Comment #10 from Richard Henderson  ---
Created attachment 38034
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38034&action=edit
second patch

The first patch fails full testing.  We fail to remap
the label for some reason without leaving it on the 
local decl list.

[Bug rtl-optimization/70261] [6 Regression] r234265 causes fails on rs6000

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70261

--- Comment #3 from Richard Henderson  ---
Created attachment 37993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37993&action=edit
aarch64 pbase_type_info.ii

This will ICE with just cc1plus -O.

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #11 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:01 2016
New Revision: 234271

URL: https://gcc.gnu.org/viewcvs?rev=234271&root=gcc&view=rev
Log:
Gimplify vec_cond_expr with condition inside

  PR middle-end/70240
  PR middle-end/68215
  PR tree-opt/68714
  * gimplify.c (gimplify_expr) [VEC_COND_EXPR]: Gimplify the
  first operand as is_gimple_condexpr.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c

[Bug middle-end/70240] [6 Regression] ICE: in gimplify_modify_expr, at gimplify.c:4854 with -ftree-vectorize

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70240

--- Comment #11 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:18 2016
New Revision: 234273

URL: https://gcc.gnu.org/viewcvs?rev=234273&root=gcc&view=rev
Log:
PR middle-end/70240

  * gcc.c-torture/compile/pr70240.c: New.

Added:
trunk/gcc/testsuite/gcc.c-torture/compile/pr70240.c
Modified:
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/70240] [6 Regression] ICE: in gimplify_modify_expr, at gimplify.c:4854 with -ftree-vectorize

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70240

--- Comment #10 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:10 2016
New Revision: 234272

URL: https://gcc.gnu.org/viewcvs?rev=234272&root=gcc&view=rev
Log:
Revert r231575

  PR middle-end/70240
  PR middle-end/68215
  2015-12-11  Eric Botcazou  
  * tree-vect-generic.c (tree_vec_extract): Remove GSI parameter.
  Do not gimplify the result.
  (do_unop): Adjust call to tree_vec_extract.
  (do_binop): Likewise.
  (do_compare): Likewise.
  (do_plus_minus): Likewise.
  (do_negate): Likewise.
  (expand_vector_condition): Likewise.
  (do_cond): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-generic.c

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #24 from Richard Henderson  ---
Regression resolved.
Create a new PR if desired to track the enhancements.

[Bug middle-end/70240] [6 Regression] ICE: in gimplify_modify_expr, at gimplify.c:4854 with -ftree-vectorize

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70240

--- Comment #9 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:01 2016
New Revision: 234271

URL: https://gcc.gnu.org/viewcvs?rev=234271&root=gcc&view=rev
Log:
Gimplify vec_cond_expr with condition inside

  PR middle-end/70240
  PR middle-end/68215
  PR tree-opt/68714
  * gimplify.c (gimplify_expr) [VEC_COND_EXPR]: Gimplify the
  first operand as is_gimple_condexpr.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c

[Bug middle-end/70240] [6 Regression] ICE: in gimplify_modify_expr, at gimplify.c:4854 with -ftree-vectorize

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70240

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Richard Henderson  ---
Fixed.

[Bug middle-end/70199] [5 Regression] Crash at -O2 when using labels.

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70199

Richard Henderson  changed:

   What|Removed |Added

Summary|[5/6 Regression] Crash at   |[5 Regression] Crash at -O2
   |-O2 when using labels.  |when using labels.

--- Comment #7 from Richard Henderson  ---
Fixed for gcc6.

[Bug middle-end/68215] [6 regression] FAIL: c-c++-common/opaque-vector.c -std=c++11 (internal compiler error)

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68215

--- Comment #5 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:01 2016
New Revision: 234271

URL: https://gcc.gnu.org/viewcvs?rev=234271&root=gcc&view=rev
Log:
Gimplify vec_cond_expr with condition inside

  PR middle-end/70240
  PR middle-end/68215
  PR tree-opt/68714
  * gimplify.c (gimplify_expr) [VEC_COND_EXPR]: Gimplify the
  first operand as is_gimple_condexpr.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c

[Bug middle-end/70273] [6 regression] FAIL: g++.dg/ext/label13a.C -std=gnu++98 execution test / scan-assembler _ZN1CC4Ev

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70273

Richard Henderson  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #8 from Richard Henderson  ---
(In reply to Richard Biener from comment #2)
> Looks like FE constructor cloning "breaks" this by not having the new flag
> set before gimplification.

During review you were concerned about placing decls on the local_decl
list before BIND_EXPR lowering.  This ICE implies that the FE is putting
decls on that list even before gimplification?

[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120

--- Comment #9 from Richard Henderson  ---
Ah right, -ffunction-sections.

That requires a more extensive, though less hackish, fix.
Will post a new patch later this afternoon.

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #23 from Richard Henderson  ---
(In reply to Jiong Wang from comment #21)
> Please check the documentation at
> http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/
> Cortex_A57_Software_Optimization_Guide_external.pdf, page 14, the line
> describe "Load register, register offset, scale by 2".

Interesting that only HImode loads suffer that penalty, and that
QI, SI and DImode loads (scale by 4/8) don't.  But never mind.


> Agreed, while double check the code, for the performance related "scale by
> 2" situation, aarch64 backend has already made it a illegitimate address.
> 
> There is the following check in aarch64_classify_address, the "GET_MODE_SIZE
> (mode) != 16" is catching that.
> 
>   bool allow_reg_index_p =
> !load_store_pair_p
> && (GET_MODE_SIZE (mode) != 16 || aarch64_vector_mode_supported_p (mode))
> && !aarch64_vect_struct_mode_p (mode);
> 
> So if the address is something like (base for short + index * 2), then it
> will go through the aarch64_legitimize_address.

Um, no, it won't.  That's a 16 byte test, not 16 bit.
So: Not a load/store_pair & not 16 bytes & not a vect mode = allow_reg_index.
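To make the bytes-versus-bits point concrete, the quoted condition can be restated as a standalone predicate. This is a minimal sketch, not the aarch64 backend code: `allow_reg_index_model` and its parameters are hypothetical stand-ins for the three tests in the quoted expression, with the mode size given in bytes as GET_MODE_SIZE reports it.

```c
#include <stdbool.h>

/* Standalone restatement of the quoted allow_reg_index_p condition.
   mode_size_bytes is the access size in bytes: 2 for HImode, 16 for
   TImode.  The other arguments stand in for load_store_pair_p,
   aarch64_vector_mode_supported_p and aarch64_vect_struct_mode_p.  */
bool allow_reg_index_model(bool load_store_pair, unsigned mode_size_bytes,
                           bool vector_mode_supported, bool vect_struct_mode)
{
    return !load_store_pair
           && (mode_size_bytes != 16 || vector_mode_supported)
           && !vect_struct_mode;
}
```

With a 2-byte (HImode) access the predicate is true, so a register index remains allowed; only a 16-byte non-vector access is rejected by the size test.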


> Thus I think your second
> patch at #c10 with my minor modification still is the proper fix for current
> stage.

That said, I agree.  I don't think we should change move expanders at this time.

Patch committed as approved.

[Bug middle-end/70273] [6 regression] FAIL: g++.dg/ext/label13a.C -std=gnu++98 execution test / scan-assembler _ZN1CC4Ev

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70273

--- Comment #9 from Richard Henderson  ---
Created attachment 38003
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38003&action=edit
proposed patch

Alternately, instead of setting local_decls early (and doing
other tri-state-ish things in copy_forbidden), set
has_forced_label_in_static early.

Starting full testing now...

[Bug middle-end/68215] [6 regression] FAIL: c-c++-common/opaque-vector.c -std=c++11 (internal compiler error)

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68215

--- Comment #6 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:10 2016
New Revision: 234272

URL: https://gcc.gnu.org/viewcvs?rev=234272&root=gcc&view=rev
Log:
Revert r231575

  PR middle-end/70240
  PR middle-end/68215
  2015-12-11  Eric Botcazou  
  * tree-vect-generic.c (tree_vec_extract): Remove GSI parameter.
  Do not gimplify the result.
  (do_unop): Adjust call to tree_vec_extract.
  (do_binop): Likewise.
  (do_compare): Likewise.
  (do_plus_minus): Likewise.
  (do_negate): Likewise.
  (expand_vector_condition): Likewise.
  (do_cond): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-generic.c

[Bug rtl-optimization/70261] [6 Regression] r234265 causes fails on rs6000

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70261

Richard Henderson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-03-17
 CC||rth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Richard Henderson  ---
This also fails on aarch64 stage1 libstdc++, also during combine.

#1  0x0100dc90 in replace_rtx (x=0x3ffad2b58f8, from=0x3ffad2b1cb0, 
to=0x3ffad2b5910) at ../../git-rh/gcc/rtlanal.c:2969
2969  gcc_assert (GET_MODE (x) == GET_MODE (from));
(gdb) call debug_rtx(x)
(reg:CC_NZ 66 cc)
(gdb) call debug_rtx(from)
(reg:CC 66 cc)

I'll attach the preprocessed for reference.

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #22 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 21:23:05 2016
New Revision: 234269

URL: https://gcc.gnu.org/viewcvs?rev=234269&root=gcc&view=rev
Log:
PR target/70048

  * config/aarch64/aarch64.c (virt_or_elim_regno_p): New.
  (aarch64_classify_address): Use it.
  (aarch64_legitimize_address): Force all subexpressions of PLUS
  into registers.  Simplify as (sfp+const)+reg or (reg+reg)+const.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-15 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #19 from Richard Henderson  ---
(In reply to Jiong Wang from comment #16)
> But there is a performance issue as described at
>  
>   https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00281.html
> 
>   "this patch forces register scaling expression out of memory ref, so that
>RTL CSE pass can handle common register scaling expressions"
> 
> This is particularly performance critical if a group of instructions are
> using the same "scaled register" inside hot loop. CSE can reduce redundant
> calculations.

I wish that message had been a bit more complete with the description
of the performance issue.  I must guess from this...

>   ldr dst1, [reg_base1, reg_index, #lsl 1]
>   ldr dst2, [reg_base2, reg_index, #lsl 1]
>   ldr dst3, [reg_base3, reg_index, #lsl 1]
> 
> into
> 
>   reg_index = reg_index << 1;
>   ldr dst1, [reg_base1, reg_index]
>   ldr dst2, [reg_base2, reg_index]
>   ldr dst3, [reg_base3, reg_index]

that it must have something to do with the smaller cpus, e.g. exynosm1,
based on the address cost tables.

I'll note for the record that you cannot hope to solve this with
the legitimize_address hook alone for the simple reason that it's not
called for legitimate addresses, of which (base + index * 2) is
a member.  The hook is only being called for illegitimate addresses.

To include legitimate addresses, you'd have to force out the address
components somewhere else.  Perhaps in the mov expanders, since that's
one of the very few places mem's are allowed.  You'd want to do this
only if !cse_not_expected.

OTOH, it's also the sort of thing that one would hope that CSE itself
would be able to handle.  Looking across various addresses, computing
sums of costs, and breaking out subexpressions as necessary.
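The point about the hook can be shown with a toy model of the calling convention. This is a sketch only: `struct addr_model`, `legitimize_hook_model` and `prepare_address_model` are hypothetical names and do not correspond to GCC's actual interfaces; the model merely records how often the hook runs.

```c
#include <stdbool.h>

/* Toy model of an address as seen by the caller of the hook.  */
struct addr_model {
    bool legitimate;        /* already accepted by the target as-is */
    int legitimize_calls;   /* how often the hook ran for this address */
};

/* Hypothetical stand-in for the target's legitimize_address hook.  */
void legitimize_hook_model(struct addr_model *a)
{
    a->legitimize_calls++;
    a->legitimate = true;
}

/* Hypothetical stand-in for the caller: the hook is consulted only
   when the address is not already legitimate.  */
void prepare_address_model(struct addr_model *a)
{
    if (!a->legitimate)
        legitimize_hook_model(a);
}
```

An address of the form (base + index * 2) enters this model with `legitimate` already set, so the hook never sees it, which is why a rewrite placed only in the hook cannot affect such addresses.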

[Bug middle-end/70240] [6 Regression] ICE: in gimplify_modify_expr, at gimplify.c:4854 with -ftree-vectorize

2016-03-15 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70240

Richard Henderson  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #7 from Richard Henderson  ---
It appears that if we take some advice from PR68714 #c6,
adjusting the gimplification of VEC_COND_EXPR, that alone
fixes the original PR68215.

If we then revert r231575, this bug is resolved.

Thoughts?

[Bug middle-end/70240] [6 Regression] ICE: in gimplify_modify_expr, at gimplify.c:4854 with -ftree-vectorize

2016-03-15 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70240

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #3 from Richard Henderson  ---
Mine.

[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment

2016-03-15 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120

--- Comment #6 from Richard Henderson  ---
Created attachment 37975
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37975&action=edit
proposed patch

This is kind of a hack, but not too bad.

Zdenek, could you please test on that third testcase that
you didn't post?  I expect it'll be fixed too, but...

[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment

2016-03-15 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #5 from Richard Henderson  ---
The problem here is the literal pool isn't a multiple of the
instruction size, so the Letext label is "misaligned", which
affects the encoding of the dwarf2 info.
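The arithmetic behind the misalignment is simple to state. The sketch below is not the fix from any attached patch; `align_up` and `pool_padding` are hypothetical helpers that compute how many padding bytes a literal pool of a given size would need before the next label lands on an instruction boundary (4 bytes on AArch64).

```c
#include <stddef.h>

/* Round size up to the next multiple of align.
   align must be a nonzero power of two.  */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Padding needed after a literal pool of pool_bytes so that whatever
   follows starts on a multiple of insn_bytes.  */
size_t pool_padding(size_t pool_bytes, size_t insn_bytes)
{
    return align_up(pool_bytes, insn_bytes) - pool_bytes;
}
```

For example, a 6-byte pool followed directly by a label leaves that label 2 bytes short of a 4-byte boundary.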

[Bug middle-end/70199] [5/6 Regression] Crash at -O2 when using labels.

2016-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70199

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #5 from Richard Henderson  ---
Mine.

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Richard Henderson  ---
Fixed.

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #8 from Richard Henderson  ---
Author: rth
Date: Mon Mar 14 20:48:15 2016
New Revision: 234196

URL: https://gcc.gnu.org/viewcvs?rev=234196&root=gcc&view=rev
Log:
PR tree-opt/68714

  * tree-ssa-reassoc.c (ovce_extract_ops, optimize_vec_cond_expr): New.
  (can_reassociate_p): Allow ANY_INTEGRAL_TYPE_P.
  (reassociate_bb): Use optimize_vec_cond_expr; avoid
  optimize_range_tests, attempt_builtin_copysign and attempt_builtin_powi
  on vectors.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68714.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-reassoc.c

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-09 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #13 from Richard Henderson  ---
Created attachment 37911
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37911&action=edit
aggressive patch

Consider something like this, whereby we allow (sfp + scale + const)
as an address all the way until register allocation.  LRA already knows
how to decompose this address in order to make it become valid, so
for your bar example in #c11 we get

bar:
        stp     x29, x30, [sp, -80]!
        add     x29, sp, 0
        str     x19, [sp, 16]
        mov     w19, w0
        add     x0, x29, 32
        bl      g
        add     x0, x29, 48
        bl      g
        add     x0, x29, 64
        bl      g
        add     x0, x29, 32
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        add     x0, x29, 48
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        add     x0, x29, 64
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 80
        ret

So, three more instructions than trunk, no extra saved registers
like with the proposed patch.  The extra instructions are simply
a choice that LRA makes during decomposition.  If we look at a
different example,

void baz(int i, int j, int k)
{
  char A[10];
  g(A);
  h(A[i], A[j], A[k]);
} 

wherein the offsets are the same but the scale differs,

        add     x0, x29, 48
        ldrb    w2, [x0, w21, sxtw]
        ldrb    w1, [x0, w20, sxtw]
        ldrb    w0, [x0, w19, sxtw]
        bl      h

where post-reload-cse unifies the three x29+48 insns.
Compare that to trunk, which produces

        add     x0, x29, 64
        add     x21, x0, x21, sxtw
        add     x20, x0, x20, sxtw
        add     x19, x0, x19, sxtw
        ldrb    w2, [x21, -16]
        ldrb    w1, [x20, -16]
        ldrb    w0, [x19, -16]
        bl      h

At some point an AArch64 maintainer is going to have to decide
what to do with this PR.  If the answer is to defer all to gcc7,
then we should downgrade the priority to P4.

[Bug tree-optimization/70128] Linux kernel div patching optimized away

2016-03-07 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70128

--- Comment #7 from Richard Henderson  ---
(In reply to Andrew Pinski from comment #5)
> I still say this is undefined even with -fno-strict-aliasing because
> patching a function is undefined.

Oh please.  I think that's short-sighted.

I don't see how this differs materially from e.g.

  const int x;
  void f(void) { *(int *)&x = 1; }

and we don't delete that store.

We need a mode in which it's possible to do things that aren't
valid in "normal" C.  We've more-or-less settled on -fn-s-a as
an escape whereby treating all memory as a collection of bytes
is valid.

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-07 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

Richard Henderson  changed:

   What|Removed |Added

  Attachment #37886|0   |1
is obsolete||

--- Comment #10 from Richard Henderson  ---
Created attachment 37890
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37890&action=edit
second patch

Still going through full testing, but I wanted to post this
before the end of the day.

This update includes a virt_or_elim_regno_p, as discussed in #c7/#c8.

It also updates aarch64_legitimize_address to treat R0+R1+C as a special
case of R0+(R1*S)+C.  All of the arguments wrt scaling apply to unscaled
indices as well.

As a minor point, doing some of the expansion in a slightly different
order results in less garbage rtl being generated in the process.

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-07 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #9 from Richard Henderson  ---
While I fully believe in CSE'ing "base + reg*scale" when talking about
non-stack-based pointers, when it comes to stack-based data access I'm
less certain about the proper approach.

All things work out "best" when there's no (or little) offset applied
during register elimination.  When this can be true, all of the rtl
optimizations see the final address and can do the right thing.

This isn't easy to do for AArch64, however.  So we need to accept that
some concessions must be made so that it's not too difficult to turn
reg + scale + c1 + c2 into a final address without extra steps.

We already special case the eliminable frame registers in
aarch64_classify_address to allow arbitrary offset, and we're prepared
to split to a proper offset during RA.  It wouldn't be out of the
question to allow "reg + scale + c" as well.  We can probably come up
with some good heuristics for splitting into a number of cases based
on the generalized "((reg + hi_c) + scale) + lo_c".
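
One way to picture the "((reg + hi_c) + scale) + lo_c" form is the sketch
below.  It is not the code from the attached patch.  It assumes a byte access
whose immediate field holds an unsigned 12-bit offset (0..4095), and the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration, not GCC code: split a non-negative frame
   offset C into HI + LO.  LO fits the assumed unsigned 12-bit immediate
   of the memory operand; HI is folded into the base with one add.  */
struct split_offset {
    int64_t hi;   /* added to the base register */
    int64_t lo;   /* left in the load/store immediate */
};

struct split_offset split_frame_offset(int64_t c)
{
    struct split_offset s;

    assert(c >= 0);
    s.lo = c & 0xfff;   /* low 12 bits stay in the memory operand */
    s.hi = c - s.lo;    /* remainder is a multiple of 4096 */
    return s;
}

/* Address computed by ((reg + hi) + index * scale) + lo.  */
int64_t effective_address(int64_t reg, int64_t index, int64_t scale, int64_t c)
{
    struct split_offset s = split_frame_offset(c);

    return ((reg + s.hi) + index * scale) + s.lo;
}
```

For an offset such as 48, hi is zero and no extra add is needed, which is the
common case for small frames.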

But the patch we take for stage4 must be less than the full solution.

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-07 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

Richard Henderson changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
                 CC|                              |rth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |rth at gcc dot gnu.org

--- Comment #6 from Richard Henderson  ---
Created attachment 37886
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37886&action=edit
proposed patch

I agree -- at minimum virtual and eliminable frame registers ought to be
special-cased.  If we separate the constants too far, we'll never be able
to fold the constant plus the adjustment back together.

If the statement in #c4 is taken at face value -- that r233136 was applied
to simplify frame-based array accesses...   Well, I simply don't believe that.

I can see how the patch would aid reduction of access to members of a
structure that are in an array which in turn is *not* on the stack.  But
for the average stack-based access I can't see it doing anything but hurt.

[Bug rtl-optimization/70061] [6 Regression] ICE: SIGSEGV in delete_insn_chain() with unused label

2016-03-07 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70061

Richard Henderson changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED

--- Comment #5 from Richard Henderson  ---
Fixed.

[Bug rtl-optimization/70061] [6 Regression] ICE: SIGSEGV in delete_insn_chain() with unused label

2016-03-07 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70061

--- Comment #4 from Richard Henderson  ---
Author: rth
Date: Mon Mar  7 11:48:57 2016
New Revision: 234025

URL: https://gcc.gnu.org/viewcvs?rev=234025&root=gcc&view=rev
Log:
PR rtl-opt/70061

  * tree-outof-ssa.c (emit_partition_copy): Flush pending stack adjust.
  (insert_value_copy_on_edge): Likewise.

  * gcc.c-torture/compile/pr70061.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/compile/pr70061.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-outof-ssa.c

[Bug rtl-optimization/70061] [6 Regression] ICE: SIGSEGV in delete_insn_chain() with unused label

2016-03-05 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70061

--- Comment #3 from Richard Henderson  ---
Created attachment 37875
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37875&action=edit
proposed patch

Thanks, Jeff, the errant stack adjustment was a good hint.

The problem is that we are emitting copy sequences to edges without caring
for deferred popping of arguments.  Thus when we began emitting code for the
first block, which in this case starts with a label, we had 32 bytes of
pending stack adjustment, which emit_label flushed.  The ICE is due to the
label not being the first insn in the BB.

There are two equivalent ways to fix this: we can either save/restore
inhibit_defer_pop around these sequences, or we can manually flush any
pending stack adjustment.

This patch does the latter.  Just starting full testing.
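
The failure mode and the "flush" fix can be pictured with a toy model.
Nothing below is GCC code: the types and function names are hypothetical
stand-ins, and the model only assumes what is described above, namely that
emitting a label first flushes any pending stack adjustment.

```c
#include <stddef.h>

/* Toy model, not GCC code.  */
enum insn_kind { INSN_LABEL, INSN_STACK_ADJUST, INSN_COPY };

struct seq {
    enum insn_kind insns[8];
    size_t n;
};

struct ctx {
    int pending_adjust;   /* bytes of deferred argument popping */
    struct seq edge;      /* copy sequence inserted on an edge */
    struct seq block;     /* body of the first basic block */
};

static void emit(struct seq *s, enum insn_kind k)
{
    if (s->n < 8)
        s->insns[s->n++] = k;
}

/* Emit the deferred adjustment into S, if there is one.  */
static void flush_pending_adjust(struct ctx *c, struct seq *s)
{
    if (c->pending_adjust != 0) {
        emit(s, INSN_STACK_ADJUST);
        c->pending_adjust = 0;
    }
}

/* Emit a copy on the edge that leaves 32 bytes of popping deferred.
   With FLUSH nonzero the adjustment is emitted as part of the edge
   sequence instead of leaking into whatever is emitted next.  */
void emit_edge_copy(struct ctx *c, int flush)
{
    emit(&c->edge, INSN_COPY);
    c->pending_adjust += 32;
    if (flush)
        flush_pending_adjust(c, &c->edge);
}

/* Start the block with a label; as in the description above, emitting
   the label flushes any pending adjustment first.  */
void start_block(struct ctx *c)
{
    flush_pending_adjust(c, &c->block);
    emit(&c->block, INSN_LABEL);
}

/* Nonzero if the block begins with its label.  */
int label_is_first(const struct ctx *c)
{
    return c->block.n > 0 && c->block.insns[0] == INSN_LABEL;
}
```

Without the flush, the stack adjustment lands in the block ahead of the
label; with it, the adjustment stays in the edge sequence and the label is
the first insn.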

[Bug rtl-optimization/70061] [6 Regression] ICE: SIGSEGV in delete_insn_chain() with unused label

2016-03-04 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70061

Richard Henderson changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
                 CC|                              |rth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |rth at gcc dot gnu.org

[Bug middle-end/70069] New: Uninitialized value default to zero, plus warning

2016-03-03 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70069

Bug ID: 70069
   Summary: Uninitialized value default to zero, plus warning
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rth at gcc dot gnu.org
  Target Milestone: ---

Quoting Ingo Molnar, via an LKML discussion:

=
It could be combined with the following 'safe' runtime behavior: when built
with -Ow then all uninitialized values are initialized to 0. This should be
relatively easy to implement, as it does not depend on any optimization.
After all is said and done, there's two cases:

  - a 0-initialization gets optimized out by an optimization pass. This is
    the common case.

  - a variable gets initialized to 0 unnecessarily. (If a warning got ignored.)

having some runtime overhead for zero initialization is much preferred for
many projects.

The warning could even be generated at this late stage: i.e. the warning
would simply warn about remaining 0-initializations that previous passes
were unable to eliminate.

This way no undeterministic, random, uninitialized (and worst-case: attacker
controlled) values can ever enter the program flow (from the stack) - in the
worst case (where a warning was ignored) a 0 value is set implicitly - which
is still deterministic behavior.
==

I imagine a marked value of some sort (e.g. a flag on existing *_CST, or
maybe a new UNINIT_CST) with such an initialization being applied to all
auto variables that aren't already initialized at declaration.  Optimize
as usual, but don't discard the marked value at PHIs.  Warn if any persist
during expansion.

All controlled by -fnew-flag, so that it's opt-in.
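
To make the intended effect concrete, here is a sketch of what the proposed
mode would amount to at the source level.  The function name is hypothetical,
and the explicit "= 0" stands in for the marked initialization that the
compiler would insert on its own.

```c
/* As written by the programmer, V is indeterminate when FLAG is zero:

       int v;
       if (flag)
           v = 42;
       return v;

   Under the proposal the compiler would behave as if the declaration
   carried a marked zero initializer, as spelled out below.  */
int pick(int flag)
{
    int v = 0;      /* stands in for the implicit, marked initialization */

    if (flag)
        v = 42;
    return v;       /* the marked 0 can reach here, so a warning would fire */
}
```

When every path assigns the variable, the marked zero is dead, ordinary
optimization removes it, and no warning remains.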

[Bug libffi/70024] [5/6 Regression] libffi ABI change w/o SONAME bump

2016-03-02 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70024

Richard Henderson changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED

--- Comment #7 from Richard Henderson  ---
FIXED for gcc6, WONTFIX for gcc5.

[Bug libffi/70024] [5/6 Regression] libffi ABI change w/o SONAME bump

2016-03-02 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70024

--- Comment #6 from Richard Henderson  ---
Author: rth
Date: Thu Mar  3 01:40:29 2016
New Revision: 233926

URL: https://gcc.gnu.org/viewcvs?rev=233926&root=gcc&view=rev
Log:
PR libffi/70024

  * Makefile.am (libffi_version_script): Look in cwd for libffi.map.
  (libffi_version_dep, libffi.map-sun): Likewise.
  (libffi.map): New target.
  * libffi.map.in: Rename from libffi.map.  Add required defines,
  includes, and conditionals.

Added:
trunk/libffi/libffi.map.in
  - copied, changed from r233925, trunk/libffi/libffi.map
Removed:
trunk/libffi/libffi.map
Modified:
trunk/libffi/ChangeLog
trunk/libffi/Makefile.am
trunk/libffi/Makefile.in

[Bug libffi/70024] [5/6 Regression] libffi ABI change w/o SONAME bump

2016-03-02 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70024

--- Comment #5 from Richard Henderson  ---
Author: rth
Date: Wed Mar  2 23:28:11 2016
New Revision: 233921

URL: https://gcc.gnu.org/viewcvs?rev=233921&root=gcc&view=rev
Log:
PR libffi/70024

  * Makefile.am (libffi_version_script): New.
  (libffi_version_dep): New.
  (libffi_version_info): New.
  (libffi_la_LDFLAGS): Include libffi_version_info, libffi_version_script.
  (libffi_la_DEPENDENCIES): Include libffi_version_dep.
  * acinclude.m4 (LIBAT_ENABLE, LIBAT_CHECK_LINKER_FEATURES): New.
  (LIBAT_ENABLE_SYMVERS, LIBAT_BUILD_VERSIONED_SHLIB): New.
  (LIBAT_BUILD_VERSIONED_SHLIB_GNU): New.
  (LIBAT_BUILD_VERSIONED_SHLIB_SUN): New.
  * configure.ac: Invoke LIBAT_ENABLE_SYMVERS.
  * libffi.map: New file.
  * libtool-version: Increase to 5.0.0.
  * Makefile.in, configure: Rebuild.
  * man/Makefile.in, testsuite/Makefile.in: Rebuild.

Added:
trunk/libffi/libffi.map
Modified:
trunk/libffi/ChangeLog
trunk/libffi/Makefile.am
trunk/libffi/Makefile.in
trunk/libffi/acinclude.m4
trunk/libffi/configure
trunk/libffi/configure.ac
trunk/libffi/include/Makefile.in
trunk/libffi/libtool-version
trunk/libffi/man/Makefile.in
trunk/libffi/testsuite/Makefile.in

[Bug libffi/70024] [5/6 Regression] libffi ABI change w/o SONAME bump

2016-03-02 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70024

Richard Henderson changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rth at gcc dot gnu.org
