[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register

2023-06-02 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

--- Comment #14 from Ulrich Weigand  ---
(In reply to Georg-Johann Lay from comment #13)
> Also I don't have a test case for your scenario.  I can reproduce the bug
> back to v5 on avr and maybe it is even older.  As it appears, this PR lead
> to no hickups on any other target, so for now I'd like to keep the fix
> restricted to what I can test.

I agree that your patch looks correct and unlikely to cause any new problems,
so I won't object to it being committed.  I just wanted to point out that it
might not be a complete fix.

[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register

2023-06-02 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

Ulrich Weigand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uweigand at gcc dot gnu.org

--- Comment #12 from Ulrich Weigand  ---
Sorry for not responding earlier, I've been out on vacation.

I think your root cause analysis is correct.  In this part of code:

  if (success)
    delete_insn (insn);
  changed |= success;
  insn = next;
  move2add_record_mode (reg);
  reg_offset[regno]
    = trunc_int_for_mode (added_offset + base_offset,
                          mode);
  continue;

the intent seems to be to manually update the move2add data structures to
account for the effects of "next", because the default logic is now skipped for
the "next" insn.  That's why in particular the reg mode and offset are manually
calculated.

This manual logic however is really only correct if "next" is actually just a
simple SET.  Reading the comment before the whole loop:
  /* For simplicity, we only perform this optimization on
     straightforward SETs.  */
makes me suspect the original author assumed that "next" is in fact a
straightforward SET here as well.  This is however not true due to behavior of
the "single_set" extractor.  (I'm wondering if "single_set" used to be defined
differently back in the days?)

Your fix does look correct to me as far as handling parallel CLOBBERs goes.
However, looking at "single_set", it seems there is yet another case: the
extractor also accepts a parallel of two or more SETs, as long as all except
one of those SETs have destinations that are dead.  These cases would still not
be handled correctly with your patch, I think.

I'm wondering whether it is even worthwhile to attempt to cover those cases. 
Maybe a more straightforward fix would be to keep in line with the
above-mentioned comment about "straightforward SETs" and just check for a
single SET directly instead of using "single_set" here.  Do you think this
would miss any important optimizations?
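
To make the distinction concrete, here is a minimal sketch in C.  It does not
use GCC's RTL API: the toy_* names and the pattern structure are hypothetical
stand-ins that only model the shapes discussed above (a bare SET, a PARALLEL
with CLOBBERs, and a PARALLEL of several SETs where all but one destination is
dead).  The lenient predicate is loosely modelled on the "single_set" behavior
described in this comment, not copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; NOT GCC's real data structures.  */
enum toy_code { TOY_SET, TOY_CLOBBER };

struct toy_elt
{
  enum toy_code code;
  bool dest_dead;   /* For a SET: is its destination unused afterwards?  */
};

/* A bare SET has in_parallel == false and n_elts == 1; otherwise the
   pattern models a PARALLEL of its elements.  */
struct toy_pattern
{
  bool in_parallel;
  size_t n_elts;
  const struct toy_elt *elts;
};

/* Strict check: accept only a bare SET.  */
bool
toy_is_plain_set (const struct toy_pattern *pat)
{
  return !pat->in_parallel
         && pat->n_elts == 1
         && pat->elts[0].code == TOY_SET;
}

/* Lenient check: ignore CLOBBERs, and accept either a lone SET or
   several SETs of which exactly one has a live destination.  */
bool
toy_single_set_like (const struct toy_pattern *pat)
{
  size_t n_sets = 0, live_sets = 0;

  assert (pat->n_elts > 0);
  for (size_t i = 0; i < pat->n_elts; i++)
    if (pat->elts[i].code == TOY_SET)
      {
        n_sets++;
        if (!pat->elts[i].dest_dead)
          live_sets++;
      }
  return n_sets == 1 || live_sets == 1;
}
```

In this model, a SET wrapped in a PARALLEL with a CLOBBER, and a PARALLEL of
two SETs with one dead destination, both pass the lenient check but fail the
strict one.  Those are exactly the inputs for which the manual bookkeeping
would need extra care.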

[Bug debug/108996] Proposal for adding DWARF call site information in GCC with -O0

2023-03-07 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108996

--- Comment #9 from Ulrich Weigand  ---
(In reply to Andrew Pinski from comment #7)
> (In reply to Ulrich Weigand from comment #4)
> > (In reply to Jakub Jelinek from comment #3)
> > > What is done on other arches?
> > 
> > That depends on the platform ABI.  On some arches, including x86/x86_64 and
> > arm/aarch64, the ABI requires the generated code reloads the return buffer
> > pointer into a defined register at function exit (either the same it was in
> > on function entry, or some other ABI-defined register).  On those arches,
> > GDB can at least inspect the return value at the point the function return
> > happens.
> 
> aarch64 does not require that. GCC produces it yes but that is a missed
> optimization, see PR 103010 which I filed against GCC for that case.

Well, I was looking at GDB code that at least *assumes* that the aarch64 ABI
does require that:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c;h=5b1b9921f87e588f8251a77d858f8f312be1e5ac;hb=HEAD#l2500

If this is incorrect, I guess GDB would have to be fixed.

[Bug debug/108996] Proposal for adding DWARF call site information in GCC with -O0

2023-03-07 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108996

--- Comment #8 from Ulrich Weigand  ---
(In reply to Jakub Jelinek from comment #5)
> Though, relying on DW_OP_entry_value is not reliable, if e.g. tail calls are
> (or could be) involved, then GDB needs to punt.

The only way a tail call could happen is if the return value is
passed through directly to the (caller's) caller, so the return
buffer address should still be correct, right?

> So, I wonder if we just shouldn't ask for a DWARF 6 extension here, have
> some way for the compiler to specify DW_AT_location for the return value.
> Then for -O1+ -g with var-tracking that address could be for PowerPC r3
> register in such functions or wherever its initial value is tracked
> (including DW_OP_entry_value).
> While for -O0, we'd see we've spilled that parameter to stack and would set
> DW_AT_location to that place spilled on the stack.

I don't think it is possible to track the value in the callee - the value may
not be available *anywhere* because it is no longer needed.  (Also, I don't
think the implicit return buffer address is guaranteed to be spilled to the
stack even at -O0.)

[Bug debug/108996] Proposal for adding DWARF call site information in GCC with -O0

2023-03-03 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108996

--- Comment #4 from Ulrich Weigand  ---
(In reply to Jakub Jelinek from comment #3)
> What is done on other arches?

That depends on the platform ABI.  On some arches, including x86/x86_64 and
arm/aarch64, the ABI requires the generated code reloads the return buffer
pointer into a defined register at function exit (either the same it was in on
function entry, or some other ABI-defined register).  On those arches, GDB can
at least inspect the return value at the point the function return happens.

On a few arches, in particular SPARC and RISC-V, the ABI even guarantees that
the return buffer pointer register remains valid throughout execution of the
function, so that GDB can inspect and/or modify the return value at any point.

But on most other arches, including s390x and ppc/ppc64, the ABI does not
guarantee anything, so GDB simply cannot access the function return value at
all (after the point the return buffer pointer register is no longer needed by
generated code and the register has been reused).

However, *if* the debug info contains an entry-value record for that register
at the call site in the current caller, then the return buffer can be accessed
at any time, on all arches.   Given that in this specific case, most callers
will actually just point the return buffer register to a local stack buffer
(i.e. set the register to "stack pointer plus some constant"), generating an
entry-value record for these special cases should actually be quite
straightforward for the compiler, without requiring a lot of value-tracking
machinery.

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2022-10-25 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #22 from Ulrich Weigand  ---
(In reply to Jakub Jelinek from comment #15)
> PowerPC I think does, not sure about s390.

For s390x see here:
https://github.com/IBM/s390x-abi

[Bug debug/104194] No way to distinguish IEEE and IBM long double in debug info

2022-07-25 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104194

Ulrich Weigand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uweigand at gcc dot gnu.org

--- Comment #8 from Ulrich Weigand  ---
(In reply to Jakub Jelinek from comment #7)
> A temporary workaround now applied.

It turns out this workaround is not transparent to users of the debugger, for
example if you define a variable as
   long double x;
and then issue the "ptype x" command in GDB, you'll now get "_Float128" - which
is quite surprising if you've never even used that type in your source code. 
(This also causes a few GDB test suite failures.)

> The dwarf-discuss thread seems to prefer using separate DW_ATE_* values
> instead of DW_AT_precision/DW_AT_minimum_exponent, but hasn't converged yet.

When I discussed this back in 2017:
https://slideslive.com/38902369/precise-target-floatingpoint-emulation-in-gdb
(see page 16 in the slides), my suggestion was a simple
  DW_AT_encoding_variant
which would have let the details of the floating-point format remain
platform-defined (unspecified by DWARF), but simply allow a platform to define
multiple different formats of the same size if required.

[Bug tree-optimization/97970] [11 regression] 'gcc.dg/gomp/pr82374.c scan-tree-dump-times vect "vectorized 1 loops" 2' for 32-bit x86

2020-11-24 Thread uweigand at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97970

--- Comment #2 from Ulrich Weigand  ---
The patch did not handle flag_excess_precision correctly.  I've reverted for
now and will look into a proper fix.  Sorry for the breakage.

[Bug target/96559] Wrong code with -march=z900 -mtune=z9-109

2020-08-11 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96559

--- Comment #1 from Ulrich Weigand  ---
> [...] as __clzdi2 points to the very same place as _Z11CeilingLog2v.

How do you get to that conclusion?  Nothing in that assembler source sets
__clzdi2 to point to the same place as _Z11CeilingLog2v.  The ".globl" simply
declares that there is a globally visible definition, from someplace outside
this file.

And in fact if I compile all the way to an object file and look at it using
"objdump --disassemble --reloc", I see:

 <_Z11CeilingLog2v>:
   0:   eb cf f0 60 00 24       stmg    %r12,%r15,96(%r15)
   6:   a7 fb ff 60             aghi    %r15,-160
   a:   c0 c0 00 00 00 00       larl    %r12,a <_Z11CeilingLog2v+0xa>
                        c: R_390_PC32DBL        .bss+0x2
  10:   e3 20 c0 00 00 04       lg      %r2,0(%r12)
  16:   c0 e5 00 00 00 00       brasl   %r14,16 <_Z11CeilingLog2v+0x16>
                        18: R_390_PC32DBL       __clzdi2+0x2
  1c:   42 20 c0 08             stc     %r2,8(%r12)
  20:   e3 40 f1 10 00 04       lg      %r4,272(%r15)
  26:   eb cf f1 00 00 04       lmg     %r12,%r15,256(%r15)
  2c:   07 f4                   br      %r4

And "nm" shows:
 U __clzdi2
0008 B compute___trans_tmp_3
 B CountLeadingZeroes64_aValue
 T _Z11CeilingLog2v

So if the call to __clzdi2 ends up going to the wrong place, something must
have gone wrong at the link stage.

[Bug target/69286] trunk/libgcc/config/s390/tpf-unwind.h: 28 redundant condition ?

2019-09-28 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69286

--- Comment #2 from Ulrich Weigand  ---
Yes, it does appear I checked in this code, but the tpf-unwind.h changes were
actually provided by Jim Johnston on the IBM TPF team:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg02104.html

In any case, feel free to make the obvious change here :-)

[Bug target/86772] [meta-bug] tracking port status for CVE-2017-5753

2018-08-06 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86772
Bug 86772 depends on bug 86807, which changed state.

Bug 86807 Summary: spu port needs updating for CVE-2017-5753
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86807

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

[Bug target/86807] spu port needs updating for CVE-2017-5753

2018-08-06 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86807

Ulrich Weigand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
           Assignee|unassigned at gcc dot gnu.org |uweigand at gcc dot gnu.org

--- Comment #2 from Ulrich Weigand  ---
Fixed.

[Bug target/86807] spu port needs updating for CVE-2017-5753

2018-08-06 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86807

--- Comment #1 from Ulrich Weigand  ---
Author: uweigand
Date: Mon Aug  6 14:40:56 2018
New Revision: 263335

URL: https://gcc.gnu.org/viewcvs?rev=263335&root=gcc&view=rev
Log:
[spu, commit] Define TARGET_HAVE_SPECULATION_SAFE_VALUE

The SPU processor is not affected by speculation, so this macro can
safely be defined as speculation_safe_value_not_needed.

gcc/ChangeLog:

PR target/86807
* config/spu/spu.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
Define to speculation_safe_value_not_needed.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/spu/spu.c

[Bug target/85075] powerpc: ICE in iszero testcase

2018-04-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85075

--- Comment #3 from Ulrich Weigand  ---
Maybe I'm confused, but: How does this even build?

_Float128 is a C-only extension, this type is not supposed to be available at
all in C++ mode as far as I know.

[Bug rtl-optimization/85180] New: Infinite (?) loop in RTL DSE optimizer

2018-04-03 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180

Bug ID: 85180
   Summary: Infinite (?) loop in RTL DSE optimizer
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: uweigand at gcc dot gnu.org
CC: krebbel at gcc dot gnu.org
  Target Milestone: ---
Target: s390x-ibm-linux

Created attachment 43828
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43828&action=edit
Test case - run with "cc1plus -O"

When attempting to compile the attached testcase simply with "cc1plus -O" on an
s390x-ibm-linux target, the compilation process never terminates.  The problem
appears to originate in the dse.c pass; building with -fno-dse makes the problem
go away.

I'm not completely sure that this is really an infinite loop, strictly
speaking, or just some exponential time behavior somewhere.  In any case, at
the time the compiler hangs, it sits in a long chain of find_base_term calls
ultimately originating at the canon_output_dependence in dse.c:1593
(record_store).

[Bug bootstrap/83396] [8 Regression] Bootstrap failures with Statement Frontiers

2017-12-14 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83396

--- Comment #64 from Ulrich Weigand  ---
I'm seeing the same error on spu-elf when building newlib with GCC revision
255614.  In case this isn't fixed by more recent changes already, here's a
reduced test case (build with -O -g):

const char *
test (const char *s)
{
  for (; ; s++)
    switch (*s)
      {
      case '-':
      case 0:
        return 0;

      case '\t':
      case '\n':
      case '\v':
      case '\f':
      case '\r':
      case ' ':
        continue;

      default:
        goto break2;
      }
break2:

  for (; *s >= '0' && *s <= '9'; s++)
    ;

  return s;
}

[Bug target/82960] spu_machine_dependent_reorg does not handle jump_table_data insn

2017-12-08 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82960

Ulrich Weigand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Ulrich Weigand  ---
I've now checked in a slightly different fix (which causes fewer changes to
processing -- I think there were some cases where we actually need to handle
jump_table_insn in pad_bb to avoid miscounting insns ...).

This fixes the ICE for me.

[Bug target/82960] spu_machine_dependent_reorg does not handle jump_table_data insn

2017-12-08 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82960

--- Comment #5 from Ulrich Weigand  ---
Author: uweigand
Date: Fri Dec  8 11:33:09 2017
New Revision: 255508

URL: https://gcc.gnu.org/viewcvs?rev=255508&root=gcc&view=rev
Log:

gcc/
PR target/82960
* config/spu/spu.c (pad_bb): Only check INSN_CODE when INSN_P is true.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/spu/spu.c

[Bug target/82960] spu_machine_dependent_reorg does not handle jump_table_data insn

2017-11-20 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82960

--- Comment #3 from Ulrich Weigand  ---
I'll have a look.   I still need to get my SPU build environment back up and
running, the build currently fails due to unrelated issues.

I remember looking at this a few years back:
https://gcc.gnu.org/ml/gcc-patches/2013-04/msg00151.html

This seemed to have fixed the problem back then, not sure why this no longer
works.

[Bug sanitizer/79341] Many Asan tests fail on s390

2017-02-15 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341

--- Comment #71 from Ulrich Weigand  ---
(In reply to Dominik Vogt from comment #70)
> If funny line information is the only consequence, no.  Is it safe to assume
> that libsanitizer won't crash or produce garbege because of this?

Why should line information be "funny"?  With the odd addresses (decremented by
one), line information should identify the line of the call, otherwise we'd get
the line *after* the call.  IMO identifying the call is actually better ...

[Bug sanitizer/79341] Many Asan tests fail on s390

2017-02-15 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341

--- Comment #60 from Ulrich Weigand  ---
... well, as Florian said as well :-)

[Bug sanitizer/79341] Many Asan tests fail on s390

2017-02-15 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341

--- Comment #59 from Ulrich Weigand  ---
(In reply to Dominik Vogt from comment #57)
> libsanitizer miscalculates the Pcs in the backtrace:
> 
> #0 0x1000839 in NullDeref
> #1 0x10006c1 in main
> #2 0x3fff6e23069 in __libc_start_main
> #3 0x100073d
> 
> These are all odd addresses, pointing to the last byte of the previous
> instruction.  In case of null-deref-1.c that byte belongs to some
> instrumentation code that is associated with line 11.

Normally you should decrement the return address by one for normal frames (in
order to identify the call instruction), but you should not decrement the
return address for signal frames (since the address already identifies the
faulting instruction).

That's why there's usually a bit to distinguish signal frames from normal
frames during unwinding.  Maybe this somehow doesn't work correctly with the
libsanitizer unwinding?
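
As a rough illustration of that rule, here is a small C sketch.  The function
name and its is_signal_frame parameter are hypothetical; this is not code from
libsanitizer or from any unwinder, only the adjustment described above written
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: map the PC recorded for a frame to the address
   that should be used for symbol and line lookup.  */
uintptr_t
lookup_pc_for_frame (uintptr_t frame_pc, bool is_signal_frame)
{
  assert (frame_pc != 0);

  /* A signal frame's PC already names the faulting instruction.  */
  if (is_signal_frame)
    return frame_pc;

  /* A normal frame's PC is a return address, i.e. the instruction after
     the call.  Stepping back one byte lands inside the call insn.  */
  return frame_pc - 1;
}
```

If the signal-frame bit is lost somewhere, every frame takes the second path,
which would match a backtrace where even the faulting frame shows an odd
address.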

[Bug sanitizer/79341] Many Asan tests fail on s390

2017-02-11 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341

--- Comment #48 from Ulrich Weigand  ---
s390(x) has -fasynchronous-unwind-tables on by default anyway, and .eh_frame
based DWARF unwinding is the only way to create stack backtraces that always
works.

However, I understood that asan deliberately doesn't want to use DWARF
unwinding for the malloc/free case since it can be slow.  That's why Marcin
actually added -mbackchain to LLVM in the first place.  (We've had -mbackchain
in GCC forever, but it has defaulted to off for a very long time.)

I don't think we should switch to *always* using backchain unwinding in asan,
since system libraries on s390 will be built without backchain.  However,
switching -mbackchain on by default when building for asan might make sense.

[Bug sanitizer/79341] Many Asan tests fail on s390

2017-02-03 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341

--- Comment #22 from Ulrich Weigand  ---
(In reply to Jakub Jelinek from comment #21)
> Could libsanitizer call __tls_get_offset instead, after setting %r12 or
> whatever else is needed for it to make work and then perhaps adjust the
> result if needed?
> E.g. on s390x __tls_get_offset is internally:
> __tls_get_offset:\n\
> la  %r2,0(%r2,%r12)\n\
> jg  __tls_get_addr\n\
> and in the interceptor:
> #ifdef __s390x__
>   "la %r2, 0(%r2,%r12)\n"
>   "jg __interceptor___tls_get_addr_internal_protected\n"
> #else
> at which point the original %r2 and %r12 is lost and it is hard to call the
> original __tls_get_offset, it might be better to pass the original %r2 and
> %r12 values to some C function and from that compute the r2 + r12 the code
> perhaps needs for its own thing, but then we could (again in assembly) call
> the original __tls_get_offset again if needed.

Yes, it would appear to be safer to call __tls_get_offset instead.
You probably do not even need the original %r12, but simply subtract
%r12 (whatever it currently is) from %r2 before calling the original
__tls_get_offset.  The value of %r12 is not used for anything except
adding it to %r2.
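
The arithmetic behind that suggestion can be sketched in C.  This is only a
model: the real routines are assembly, the toy_* names are made up, and
toy_tls_get_addr simply returns its argument so the effect is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for __tls_get_addr: returns its argument unchanged.  */
uint64_t
toy_tls_get_addr (uint64_t arg)
{
  return arg;
}

/* Mirrors "la %r2,0(%r2,%r12); jg __tls_get_addr": %r12 is used for
   nothing except being added to %r2.  */
uint64_t
toy_tls_get_offset (uint64_t r2, uint64_t r12)
{
  return toy_tls_get_addr (r2 + r12);
}

/* Interceptor side: only the sum is still known.  Subtracting whatever
   %r12 currently holds rebuilds an %r2 that makes the original routine
   compute the same sum again.  */
uint64_t
toy_call_original (uint64_t sum, uint64_t current_r12)
{
  uint64_t rebuilt_r2 = sum - current_r12;

  /* Holds for any current_r12, since unsigned arithmetic wraps.  */
  assert (rebuilt_r2 + current_r12 == sum);
  return toy_tls_get_offset (rebuilt_r2, current_r12);
}
```

Because the subtraction and the later addition cancel, the value of %r12 at
the time of the call does not need to match the original one.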

> That said, if asan wants to intercept also what dlsym will internally call,
> then that will not really work.  But does libasan on other targets rely on
> dlsym calling __tls_get_addr internally in those cases?  That would be yet
> another reliance on glibc internals.

As I understand it, they do make that assumption; libsanitizer must get
involved at the point any new TLS data section is allocated.  Since this
allocation may happen as a result of a dlsym call, those cases have to
be intercepted as well.

[Bug rtl-optimization/78812] New: Wrong code generation due to hoisting memory load across function call

2016-12-14 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78812

Bug ID: 78812
   Summary: Wrong code generation due to hoisting memory load
across function call
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: uweigand at gcc dot gnu.org
  Target Milestone: ---

Compiling the following test case with g++ -Os -fpic on s390x-ibm-linux results
in abort() being called unexpectedly:

extern "C" void abort (void) __attribute__ ((__noreturn__));

class Transaction
{
public:
  bool Aborted;

  Transaction () : Aborted (false) { }

  ~Transaction ()
  {
    if (!Aborted)
      abort ();
  }
};

void test (Transaction &Trans) __attribute__ ((noinline));
void test (Transaction &Trans)
{
  Trans.Aborted = true;
}

int main (void)
{
  Transaction T;
  test (T);
}


What happens is that the destructor for Transaction is inlined into main
(twice, once in the regular exit and once in the exception path).  The load of
the Aborted member is then moved by the code hoisting pass (pass_rtl_hoist) in
gcse.c to a single location in basic block 2.

However, that basic block ends with the call to the "test" routine, and for
some reason, code hoisting thinks it therefore needs to move the hoisted
instruction to *before* that call (see the CALL_P case of
insert_insn_end_basic_block).  This causes wrong code generation since that
call actually modifies the memory that is being loaded.

There seems to be other code that appears intended to recognize that case and
avoid hoisting then (see prune_expressions), but that apparently only looks for
abnormal edges, which we don't have here.

Not sure how this is supposed to work correctly ...

[Bug bootstrap/77359] [7 Regression] AIX bootstrap failure due to alignment of stack pointer + STACK_DYNAMIC_OFFSET

2016-10-26 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359

--- Comment #19 from Ulrich Weigand  ---
I've been looking into this a bit with Dominik, and here's what I understand of
the problem so far:

- It really all starts with emit-rtl.c:init_emit doing:
  REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM) = STACK_BOUNDARY;
  which makes the compiler assume that the address of the dynamic area
  is a multiple of STACK_BOUNDARY / BITS_PER_BYTE.

- Now, at least on AIX, STACK_BOUNDARY is always 128 (16 bytes):
  /* 32-bit and 64-bit AIX stack boundary is 128.  */
  #undef  STACK_BOUNDARY
  #define STACK_BOUNDARY 128

- On the other hand, the actual offset of the dynamic area is
  computed from the stack pointer + STACK_DYNAMIC_OFFSET.
  Since the stack pointer is always aligned to STACK_BOUNDARY,
  we must ensure that STACK_DYNAMIC_OFFSET is likewise aligned.
  (At least if calls_alloca is true; otherwise this doesn't matter.)

- However, the current definition (shared by Linux and AIX) is:
  #define STACK_DYNAMIC_OFFSET(FUNDECL) \
    (RS6000_ALIGN (crtl->outgoing_args_size, \
                   (TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8) \
     + (STACK_POINTER_OFFSET))
  This has (at least) two problems: it doesn't always align to
  STACK_BOUNDARY (which is 16 even if !TARGET_ALTIVEC), and even
  if it does, it then adds STACK_POINTER_OFFSET, which *itself*
  is not aligned to 16 on 32-bit AIX.

  Your experimental patch fixes the second problem, but not the first.

- Of course, STACK_DYNAMIC_OFFSET cannot be simply changed on its
  own. Its value must agree with the value of STARTING_FRAME_OFFSET if
  !FRAME_GROWS_DOWNWARD, and the latter value must agree with the value
  info->fixed_size + info->parm_size as computed by rs6000_stack_info.

  I do not believe your current patch ensures this in all cases.


I'm not really familiar enough with the history of the various rs6000 subtargets
to understand where exactly the problem should be fixed.  For example, why does
rs6000_stack_info align info->parm_size to 16 in the first place?  This seems
pointless if the parm area *starts* at an offset that is not a multiple of 16
anyway ...
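
The two alignment problems can be shown with plain arithmetic.  The sketch
below is standalone C, not rs6000 code: the function names are invented, and
the sizes used in the note after it (32 bytes of outgoing args, a fixed offset
of 24) are made-up numbers chosen only because 24 is not a multiple of 16.
The second function is one conceivable shape for an aligned result, not a
proposed patch; it ignores the constraint that the value must agree with
STARTING_FRAME_OFFSET and rs6000_stack_info.

```c
#include <assert.h>

/* Round N up to a multiple of ALIGN, which must be a power of two.  */
unsigned long
align_up (unsigned long n, unsigned long align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

/* Shape of the current definition: align the outgoing-args size, then
   add a fixed offset.  The result is only as aligned as that offset.  */
unsigned long
dynamic_offset_current (unsigned long args_size, unsigned long args_align,
                        unsigned long fixed_offset)
{
  return align_up (args_size, args_align) + fixed_offset;
}

/* Hypothetical alternative: align the sum instead, so the result is a
   multiple of the boundary by construction.  */
unsigned long
dynamic_offset_aligned_sum (unsigned long args_size,
                            unsigned long fixed_offset,
                            unsigned long boundary_bytes)
{
  return align_up (args_size + fixed_offset, boundary_bytes);
}
```

With those made-up numbers, dynamic_offset_current (32, 8, 24) gives 56, and
using 16 instead of 8 for the args alignment still gives 56, so neither is a
multiple of 16.  dynamic_offset_aligned_sum (32, 24, 16) gives 64.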

[Bug target/70168] [5 Regression] Wrong code generation in __sync_val_compare_and_swap on PowerPC

2016-03-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70168

Ulrich Weigand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Ulrich Weigand  ---
Fixed.

[Bug target/70168] [5 Regression] Wrong code generation in __sync_val_compare_and_swap on PowerPC

2016-03-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70168

--- Comment #5 from Ulrich Weigand  ---
Author: uweigand
Date: Thu Mar 10 23:59:20 2016
New Revision: 234127

URL: https://gcc.gnu.org/viewcvs?rev=234127&root=gcc&view=rev
Log:
PR target/70168
* config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
Handle overlapping retval and newval.

Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/rs6000/rs6000.c

[Bug target/70168] [5 Regression] Wrong code generation in __sync_val_compare_and_swap on PowerPC

2016-03-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70168

--- Comment #4 from Ulrich Weigand  ---
Author: uweigand
Date: Thu Mar 10 23:58:44 2016
New Revision: 234126

URL: https://gcc.gnu.org/viewcvs?rev=234126&root=gcc&view=rev
Log:
PR target/70168
* config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
Handle overlapping retval and newval.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/rs6000.c

[Bug target/70168] [5 Regression] Wrong code generation in __sync_val_compare_and_swap on PowerPC

2016-03-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70168

Ulrich Weigand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                URL|                            |https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00671.html
          Component|rtl-optimization            |target
           Assignee|unassigned at gcc dot gnu.org |uweigand at gcc dot gnu.org

--- Comment #3 from Ulrich Weigand  ---
Patch posted.

[Bug rtl-optimization/70168] [5 Regression] Wrong code generation in __sync_val_compare_and_swap on PowerPC

2016-03-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70168

--- Comment #1 from Ulrich Weigand  ---
Created attachment 37925
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37925&action=edit
Patch to add retval vs. newval overlap check

This patch fixes the problem for me with the GCC 5 branch.  Not fully
regression tested yet.

[Bug rtl-optimization/70168] New: [5 Regression] Wrong code generation in __sync_val_compare_and_swap on PowerPC

2016-03-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70168

Bug ID: 70168
   Summary: [5 Regression] Wrong code generation in
__sync_val_compare_and_swap on PowerPC
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: uweigand at gcc dot gnu.org
CC: amodra at gcc dot gnu.org, dje at gcc dot gnu.org
  Target Milestone: ---
Target: powerpc64le-linux

Building the following test case on powerpc64le-linux with -O2 using the
current GCC 5 branch:

unsigned long atomicAND (volatile unsigned long *memRef, unsigned long mask)
{
  unsigned long oldValue, newValue, prevValue;

  for (oldValue = *memRef; ; oldValue = prevValue)
    {
      newValue = oldValue & mask;
      if (newValue == oldValue)
        break;

      prevValue = __sync_val_compare_and_swap (memRef, oldValue, newValue);
      if (prevValue == oldValue)
        break;
    }

  return oldValue;
}

results in this assembler output:

atomicAND:
        ld 9,0(3)
        b .L6
        .p2align 4,,15
.L10:
        sync
.L3:
        ldarx 10,0,3
        cmpd 0,10,9
        bne- 0,.L4
        stdcx. 9,0,3
        bne- 0,.L3
.L4:
        isync
        cmpld 7,9,10
        beq 7,.L7
        mr 9,10
.L6:
        and 10,9,4
        cmpld 7,9,10
        bne 7,.L10
.L7:
        mr 3,10
        blr

Note how the stdcx. stores r9, which holds the original value, not the masked
value.  Therefore, this will usually succeed without updating the memory
location.

Debugging this problem, it turns out that rs6000_expand_atomic_compare_and_swap
is called with retval (operands[1]) equal to newval (operands[4]), and the
expander then proceeds to clobber newval before using it.

There is code to detect overlap of retval with oldval (operands[3]), but not
with newval.  Adding the equivalent detection code fixes the problem for me.

For some reason, I can reproduce this problem only with GCC 5; the same problem
should still be latently present in mainline since there's still no overlap
check, but I seem unable to construct a test case that would actually cause
retval == newval with mainline.
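
The effect of the overlap can be reproduced with a toy model in plain C.  This
is not the rs6000 expander: the toy_cas_* functions are invented for
illustration, and "registers" are modelled as variables passed by address so
that two operands can share one of them.

```c
#include <assert.h>
#include <stddef.h>

/* Naive sequence: load into retval, compare, store newval.  If retval
   and newval share a register, the load clobbers newval.  */
void
toy_cas_naive (unsigned long *mem, unsigned long *retval,
               const unsigned long *oldval, const unsigned long *newval)
{
  assert (mem != NULL && retval != NULL);
  *retval = *mem;
  if (*retval == *oldval)
    *mem = *newval;     /* May now store the value just loaded.  */
}

/* Same sequence with an overlap check: when newval shares a register
   with retval, copy it aside before retval is written.  */
void
toy_cas_checked (unsigned long *mem, unsigned long *retval,
                 const unsigned long *oldval, const unsigned long *newval)
{
  unsigned long saved_new;

  assert (mem != NULL && retval != NULL);
  if (retval == newval)
    {
      saved_new = *newval;
      newval = &saved_new;
    }
  *retval = *mem;
  if (*retval == *oldval)
    *mem = *newval;
}
```

Calling toy_cas_naive with the same variable as both retval and newval leaves
memory unchanged even though the comparison succeeds, which mirrors the
symptom seen in the assembler output above.  toy_cas_checked stores the
intended new value in the same situation.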

[Bug target/70117] ppc long double isinf() is wrong?

2016-03-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70117

--- Comment #7 from Ulrich Weigand  ---
Ah, OK.  I didn't realize this value didn't fit into a 106-bit mantissa.

I agree that it probably doesn't make sense to change the internal
representation to allow larger mantissas.  First of all, there's nothing really
special about 107 bits; there can be IBM long double values that would require
a much larger mantissa in the internal representation, since we can have many
implicit zero bits.  But more problematical, if we change the internal
representation to a mantissa larger than 106 bits, there will be values in that
internal format that cannot be represented directly in the target IBM long
double format.

In any case, I certainly agree that the is* routines for IBM long double should
simply operate on the high double of the pair.

I still think that it would be better for gnulib to use the same LDBL_MAX as
GCC, which means gnulib should probably be changed to use the 106-bit value.
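
For reference, "operate on the high double" can be sketched as below.  The
struct and the toy_ibm128_* names are hypothetical and not taken from GCC,
glibc or gnulib; the assert in the constructor encodes the rule quoted earlier
in this PR that the high part is the value rounded to the nearest double, and
it assumes plain IEEE double evaluation without excess precision.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Toy representation of an IBM long double: the value is hi + lo.  */
struct toy_ibm128
{
  double hi;   /* The value rounded to the nearest double.  */
  double lo;   /* The remainder.  */
};

struct toy_ibm128
toy_ibm128_make (double hi, double lo)
{
  /* For finite values, adding lo must not change hi once the sum is
     rounded back to double.  */
  assert (isnan (hi) || isinf (hi) || hi + lo == hi);
  struct toy_ibm128 x = { hi, lo };
  return x;
}

/* Classification looks at the high double only.  */
bool
toy_ibm128_isinf (struct toy_ibm128 x)
{
  return isinf (x.hi) != 0;
}

bool
toy_ibm128_isnan (struct toy_ibm128 x)
{
  return isnan (x.hi) != 0;
}
```

Under this scheme the low double never influences the classification, so the
result does not depend on which low part LDBL_MAX is defined with.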

[Bug target/70117] ppc long double isinf() is wrong?

2016-03-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70117

--- Comment #4 from Ulrich Weigand  ---
(In reply to Alan Modra from comment #3)
> > while with GCC, we get:
> > 
> >   high double: 7FEF 
> >   low  double: 7C8F FFFE
> 
> Right.  This is 0x1.f78p+1023
> 
> gnulib isn't correct here.  As the comment says the high double must be the
> value of the long double correctly rounded to double (to nearest since that
> is the only mode supported for IBM extended double).  Any long double value
> higher than the above will round up the high double to inf.

Well, what I don't quite understand is that the gnulib value, which is

0x1.f7cp+1023

likewise should round to the same double value, shouldn't it?  I notice that if
I actually attempt to use that value in C source code, the compiler does indeed
round it to inf -- but I don't see why it actually should do so ...

[Bug target/70117] ppc long double isinf() is wrong?

2016-03-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70117

Ulrich Weigand  changed:

   What|Removed |Added

 CC||amodra at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||uweigand at gcc dot gnu.org

--- Comment #1 from Ulrich Weigand  ---
Hmm.  For some reason, the gnulib definition of LDBL_MAX differs from GCC's
definition.   With gnulib, we get:

  high double: 7FEF 
  low  double: 7C8F 

while with GCC, we get:

  high double: 7FEF 
  low  double: 7C8F FFFE

In any case, someone is wrong here -- values of LDBL_MAX should certainly
agree.

While I'm not completely sure why GCC chooses the value it does, I notice that
GCC's choice is certainly deliberate; there's extra code to flip the last bit:

real.c:get_max_float

  if (fmt->pnan < fmt->p)
    {
      /* This is an IBM extended double format made up of two IEEE
         doubles.  The value of the long double is the sum of the
         values of the two parts.  The most significant part is
         required to be the value of the long double rounded to the
         nearest double.  Rounding means we need a slightly smaller
         value for LDBL_MAX.  */
      buf[4 + fmt->pnan / 4] = "7bde"[fmt->pnan % 4];
    }

and similarly real.c:real_maxval

  if (fmt->pnan < fmt->p)
    /* This is an IBM extended double format made up of two IEEE
       doubles.  The value of the long double is the sum of the
       values of the two parts.  The most significant part is
       required to be the value of the long double rounded to the
       nearest double.  Rounding means we need a slightly smaller
       value for LDBL_MAX.  */
    clear_significand_bit (r, SIGNIFICAND_BITS - fmt->pnan - 1);

This code was originally added by Alan back in 2004 ... adding Alan on CC if he
recalls what this was all about.

[Bug bootstrap/69464] [6 Regression]: bootstrap failure on CentOS 5.11: error: ‘swap’ is not a member of ‘std’

2016-01-25 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69464

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #7 from Ulrich Weigand  ---
I see the same problem in my SPU daily build (running on a RHEL 5 system using
the gcc 4.1.2 host compiler).

[Bug target/68759] [6 Regression] Linux kernel build failure on ppc64le

2016-01-13 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68759

Ulrich Weigand  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |INVALID

--- Comment #8 from Ulrich Weigand  ---
Yes, this is a kernel issue.  It is not an actual bug on the GCC side.

[Bug target/68759] [6 Regression] Linux kernel build failure on ppc64le

2016-01-13 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68759

--- Comment #6 from Ulrich Weigand  ---
FYI, two patches to fix this issue have just been committed to powerpc-next:

https://git.kernel.org/powerpc/c/2e50c4bef77511b42cc226865d

https://git.kernel.org/powerpc/c/a61674bdfc7c2bf909c4010699

[Bug target/68759] [6 Regression] Linux kernel build failure on ppc64le

2015-12-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68759

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-12-07
 CC||amodra at gcc dot gnu.org,
   ||dje.gcc at gmail dot com
   Assignee|unassigned at gcc dot gnu.org  |uweigand at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #4 from Ulrich Weigand  ---
OK, with this .config I was able to recreate the problem.  Thanks!

The immediate cause for the ERROR (which is reported by the kernel's modpost
tool) is that object files contain undefined references against symbols that
are actually *defined* in the same object file as local symbols.

This weird situation is the result of some very special object file magic
performed by the perl script ./scripts/recordmcount.pl, which is apparently
supposed to record all call sites to the _mcount routine into a special
section, and uses rather complex logic to do so, which involves parsing the
output of objdump.

This objdump parsing in turn seems to get thrown out of sync if the object
file's .text sections contain data blocks outside of functions (or at least,
before the very first function).  For the sparc64 target, there is special code
in the recordmcount.pl script to handle this situation, but not for powerpc.

This didn't matter so far since on powerpc, this could never happen.  However,
it turns out that kernel modules are built with -mcmodel=large for some reason,
and therefore after my patch, we do indeed see a data block before the first
function.

Using the same trick as on sparc64 makes the build go through again:

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 826470d..96e2486 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -263,7 +263,8 @@ if ($arch eq "x86_64") {

 } elsif ($arch eq "powerpc") {
 $local_regex = "^[0-9a-fA-F]+\\s+t\\s+(\\.?\\S+)";
-$function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?.*?)>:";
+# See comment in the sparc64 section for why we use '\w'.
+$function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?\\w*?)>:";
 $mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\s\\.?_mcount\$";

 if ($bits == 64) {


Alan, have you seen this recordmcount.pl script before?  Is there really no
simpler way to achieve what this wants to do?

B.t.w. given that kernel modules are built with -mcmodel=large, this probably
means that we'll now have to teach the kernel module loader to handle the
R_PPC64_ENTRY reloc.  I guess we need to talk to the kernel folks.

[Bug target/68759] [6 Regression] Linux kernel build failure on ppc64le

2015-12-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68759

--- Comment #1 from Ulrich Weigand  ---
I've tried to reproduce this, but with no success so far.  This is primarily
because I cannot manage to get a current mainline Linux kernel built with
allyesconfig for powerpc64le in the first place (various issues, but mostly it
ends up too big and we get relocation overflows in built-in.o).  With other
settings (like defconfig or allmodconfig), it builds fine for me with the new
compiler as well.

Could you provide more information on the exact source tree and .config
settings you're attempting to build?   Also, which binutils are you using?

[Bug tree-optimization/68306] [6 Regression] ICE: in vectorizable_store, at tree-vect-stmts.c:5651

2015-11-16 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68306

--- Comment #14 from Ulrich Weigand  ---
Building the following reduced test case with
  -O2 -ftree-vectorize -fcx-fortran-rules
with an spu-elf cross-cc1 shows the ICE.

void
test (_Complex float *dest,
      _Complex float scale, int count)
{
  for (int x = 0; x < count; x++)
    dest[x] *= scale;
}

[Bug tree-optimization/68306] [6 Regression] ICE: in vectorizable_store, at tree-vect-stmts.c:5651

2015-11-16 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68306

Ulrich Weigand  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #12 from Ulrich Weigand  ---
Unfortunately, it seems that your second commit brought back the failure on
spu-elf that had already been fixed by the first commit ...

/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/libgfortran/generated/matmul_c4.c:
In function 'matmul_c4':
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/libgfortran/generated/matmul_c4.c:79:1:
internal compiler error: in vectorizable_store, at tree-vect-stmts.c:5655
 matmul_c4 (gfc_array_c4 * const restrict retarray,
 ^

0x10b19c83 vectorizable_store
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-vect-stmts.c:5655
0x10b21db3 vect_transform_stmt(gimple*, gimple_stmt_iterator*, bool*,
_slp_tree*, _slp_instance*)
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-vect-stmts.c:8007
0x10b49ecf vect_schedule_slp_instance
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-vect-slp.c:3608
0x10b503ab vect_schedule_slp(vec_info*)
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-vect-slp.c:3673
0x10b2dffb vect_transform_loop(_loop_vec_info*)
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-vect-loop.c:6773
0x10b57ab3 vectorize_loops()
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-vectorizer.c:533
0x109fa5d7 execute
   
/home/uweigand/dailybuild/spu-tc-2015-11-13/gcc-head/src/gcc/tree-ssa-loop.c:273

[Bug c/68062] [4.9/5/6 Regression] ICE when comparing vectors

2015-11-12 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68062

--- Comment #8 from Ulrich Weigand  ---
(In reply to Richard Biener from comment #7)
> I think there was some inconsistencies in C vs. C++ FEs in this area (but as
> usual I don't remember exactly but I remember Uli complaining about it again
> at the Caulrdon).
> 
> I believe it was sort-of automatic integer promotion rules should apply if
> they don't change vector sizes (thus, the sign promotion parts should apply).
> 
> That's not "ignoring" signs but doing the appropriate (view-)conversions.

Actually, the C vs. C++ FE inconsistency was about binary operators (+, -,
...), not comparisons.

For both binary and relational operators, the various applicable standards
(AltiVec + extensions, System z vector extensions, OpenCL) all agree that if
the two operands differ in signedness, the operation is not valid and should
result in an error.  However, GCC has never done this, but has always accepted
these combinations (both C and C++).  (At some point, we might want to change
this, but then we have to care that we don't break "vector bool" handling for
those platforms that support it.)

The difference between C and C++ comes in when determining what to use as the
*result type* of a binary operator whose operands differ in signedness.  This
does not apply to comparisons since those have a result type different from the
input types in any case.
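
To illustrate the last point with GCC's generic vector extension: a
comparison yields a signed integer vector regardless of the signedness of
its operands, so no choice between the operand types is needed.  This is a
minimal sketch; the typedef and function names are made up for the example,
and it deliberately uses operands of the same signedness.

```c
/* GCC generic vector types (also accepted by Clang).  */
typedef unsigned int v4su __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise unsigned comparison.  Each result element is -1 where
   a[i] < b[i] holds and 0 otherwise; the result type is the signed
   vector even though both operands are unsigned.  */
static v4si
less_than_unsigned (v4su a, v4su b)
{
  return a < b;
}
```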

[Bug tree-optimization/68306] [6 Regression] ICE: in vectorizable_store, at tree-vect-stmts.c:5651

2015-11-12 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68306

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #4 from Ulrich Weigand  ---
I see the same ICE in a spu-elf libgfortran build:

/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/libgfortran/generated/matmul_c8.c:
In function 'matmul_c8':
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/libgfortran/generated/matmul_c8.c:79:1:
internal compiler error: in vectorizable_store, at tree-vect-stmts.c:5651
 matmul_c8 (gfc_array_c8 * const restrict retarray,
 ^

0x10b10373 vectorizable_store
   
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/gcc/tree-vect-stmts.c:5651
0x10b1e553 vect_transform_stmt(gimple*, gimple_stmt_iterator*, bool*,
_slp_tree*, _slp_instance*)
   
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/gcc/tree-vect-stmts.c:8003
0x10b48b6f vect_schedule_slp_instance
   
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/gcc/tree-vect-slp.c:3484
0x10b4afeb vect_schedule_slp(vec_info*)
   
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/gcc/tree-vect-slp.c:3549
0x10b4f2f7 vect_slp_bb(basic_block_def*)
   
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/gcc/tree-vect-slp.c:2543
0x10b502c7 execute
   
/home/uweigand/dailybuild/spu-tc-2015-11-11/gcc-head/src/gcc/tree-vectorizer.c:734

[Bug bootstrap/68231] [6 Regression] bootstrap failure after placement new

2015-11-06 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68231

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #5 from Ulrich Weigand  ---
Same on spu-elf.

[Bug debug/66728] [5/6 Regression] CONST_WIDE_INT causes corrupted DWARF debug info

2015-08-20 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66728

--- Comment #6 from Ulrich Weigand  ---
(In reply to rsand...@gcc.gnu.org from comment #4)
> Testing a patch.  It involves tightening the mode of the rtx returned
> by rtl_for_decl_location, as well as new asserts, so some fallout is
> likely...

Hi Richard, just a quick ping ...  Were you able to make any progress with
this?


[Bug debug/66728] CONST_WIDE_INT causes corrupted DWARF debug info

2015-07-01 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66728

--- Comment #1 from Ulrich Weigand  ---
A bit of debugging shows that what's going on here is this:

add_const_value_attribute is called with the following constant RTL:
(const_wide_int 0x8000)

The routine then does:
 add_AT_wide (die, DW_AT_const_value,
  std::make_pair (rtl, GET_MODE (rtl)));

Note that GET_MODE (rtl) is VOIDmode.  This apparently causes creation of a
wide_int value with precision 0:
{ = {val = {0, -9223372036854775808, 2}, len = 2, precision =
0}

This seems already wrong, but doesn't quite explain the inconsistent output.

However, dwarf2out.c:get_full_len returns 0 if the precision is 0. 
Subsequently, when the DIE is emitted in output_die, we have:

    int len = get_full_len (*a->dw_attr_val.v.val_wide);
    int l = HOST_BITS_PER_WIDE_INT / HOST_BITS_PER_CHAR;
    if (len * HOST_BITS_PER_WIDE_INT > 64)
      dw2_asm_output_data (1, get_full_len (*a->dw_attr_val.v.val_wide) * l,
                           NULL);

    if (WORDS_BIG_ENDIAN)
      for (i = len - 1; i >= 0; --i)
        {
          dw2_asm_output_data (l, a->dw_attr_val.v.val_wide->elt (i),
                               "%s", name);
          name = NULL;
        }

When get_full_len is 0, the "if" is false, and thus no length is emitted.  In
addition, the loop count is 0, so nothing is emitted at all.

On the other hand, when the abbrev is emitted, value_format does:

    case dw_val_class_wide_int:
      switch (get_full_len (*a->dw_attr_val.v.val_wide)
              * HOST_BITS_PER_WIDE_INT)
        {
        case 8:
          return DW_FORM_data1;
        case 16:
          return DW_FORM_data2;
        case 32:
          return DW_FORM_data4;
        case 64:
          return DW_FORM_data8;
        default:
          return DW_FORM_block1;
        }

so for a length of 0 we fall into the default case and assume DW_FORM_block1.
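
The mismatch between the two code paths can be reproduced in a small
stand-alone model.  This is only a sketch of the logic quoted above, not
the dwarf2out.c code itself: the model_* names are hypothetical, and a
64-bit HOST_WIDE_INT with 8-bit chars is assumed.

```c
enum {
    MODEL_HOST_BITS_PER_WIDE_INT = 64,  /* assumption */
    MODEL_HOST_BITS_PER_CHAR = 8        /* assumption */
};

enum model_form {
    MODEL_FORM_DATA1,
    MODEL_FORM_DATA2,
    MODEL_FORM_DATA4,
    MODEL_FORM_DATA8,
    MODEL_FORM_BLOCK1
};

/* Mirrors the switch in value_format: form chosen for the abbrev.  */
static enum model_form model_value_format(int full_len)
{
    switch (full_len * MODEL_HOST_BITS_PER_WIDE_INT) {
    case 8:  return MODEL_FORM_DATA1;
    case 16: return MODEL_FORM_DATA2;
    case 32: return MODEL_FORM_DATA4;
    case 64: return MODEL_FORM_DATA8;
    default: return MODEL_FORM_BLOCK1;
    }
}

/* Mirrors output_die: number of bytes actually written for the value.  */
static int model_bytes_emitted(int full_len)
{
    int l = MODEL_HOST_BITS_PER_WIDE_INT / MODEL_HOST_BITS_PER_CHAR;
    int bytes = 0;
    if (full_len * MODEL_HOST_BITS_PER_WIDE_INT > 64)
        bytes += 1;                 /* block length byte */
    bytes += full_len * l;          /* payload */
    return bytes;
}

/* A block1 value needs at least its length byte; a data8 value needs
   exactly 8 bytes.  Smaller data forms cannot occur with 64-bit words.  */
static int model_is_consistent(int full_len)
{
    enum model_form form = model_value_format(full_len);
    int emitted = model_bytes_emitted(full_len);
    if (form == MODEL_FORM_BLOCK1)
        return emitted >= 1;
    return form == MODEL_FORM_DATA8 && emitted == 8;
}
```

In this model a length of 0 selects the block form while emitting zero
bytes, whereas lengths of 1 and 2 stay consistent.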

Any suggestions how to fix those (two?) problems?


[Bug debug/66728] New: CONST_WIDE_INT causes corrupted DWARF debug info

2015-07-01 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66728

Bug ID: 66728
   Summary: CONST_WIDE_INT causes corrupted DWARF debug info
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: uweigand at gcc dot gnu.org
  Target Milestone: ---

Compiling the following test case:

__uint128_t test(void)
{
  static const __uint128_t two127 = ((__uint128_t) 1) << 127;

  return two127;
}

on x86_64-linux (or ppc64le-linux) with -O -S -g -dA results in:

.uleb128 0x2# (DIE (0x2d) DW_TAG_subprogram)
# DW_AT_external
.long   .LASF3  # DW_AT_name: "test"
[snip]
.uleb128 0x3# (DIE (0x4e) DW_TAG_variable)
.long   .LASF4  # DW_AT_name: "two127"
.byte   0x1 # DW_AT_decl_file (xxx.i)
.byte   0x4 # DW_AT_decl_line
.long   0x61# DW_AT_type
.byte   0   # end of children of DIE 0x2d

But abbreviation 3 (used for the 0x4e DIE) reads:
.uleb128 0x3# (abbrev code)
.uleb128 0x34   # (TAG: DW_TAG_variable)
.byte   0   # DW_children_no
.uleb128 0x3# (DW_AT_name)
.uleb128 0xe# (DW_FORM_strp)
.uleb128 0x3a   # (DW_AT_decl_file)
.uleb128 0xb# (DW_FORM_data1)
.uleb128 0x3b   # (DW_AT_decl_line)
.uleb128 0xb# (DW_FORM_data1)
.uleb128 0x49   # (DW_AT_type)
.uleb128 0x13   # (DW_FORM_ref4)
.uleb128 0x1c   # (DW_AT_const_value)
.uleb128 0xa# (DW_FORM_block1)
.byte   0
.byte   0

So the variable DIE should have a DW_AT_const_value attribute encoded as
DW_FORM_block1.  This makes sense, since the variable was optimized away due to
being constant, and the size of the constant is 16 bytes.

However, the code in .debug_info to construct the DIE does not emit any
DW_FORM_block1; in fact, it does not emit *anything* where the
DW_AT_const_value is expected.  This causes the resulting debug info to be
corrupted, and tools operating on this info will emit errors (or even crash).

On ppc64le-linux (but not x86_64-linux), the same problem occurs with GCC 5 as
well.  The difference seems to be that x86_64 in GCC 5 uses a CONST_DOUBLE
instead of a CONST_WIDE_INT to represent that 128-bit constant.


[Bug target/65408] New: powerpc64 function argument passing may access invalid memory

2015-03-12 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65408

Bug ID: 65408
   Summary: powerpc64 function argument passing may access invalid
memory
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: uweigand at gcc dot gnu.org
CC: amodra at gcc dot gnu.org, bergner at gcc dot gnu.org,
meissner at gcc dot gnu.org
Target: powerpc64-linux, powerpc64le-linux

The following simple test case:

struct test
{
  int x;
  int y;
  int z;
};

void func(struct test);

void foo(struct test *ptr)
{
  func(*ptr);
}

generates this code for "foo":
ld 4,8(3)
ld 3,0(3)
bl func

Note how *16 bytes* of memory are accessed here.   This is wrong, since "struct
test" is only 12 bytes in size with 4-byte alignment, and if you have an array
of those, the last element may happen to reside just 12 bytes before a page
boundary, so accessing 16 bytes may in fact crash.
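
The arithmetic behind this can be spelled out in a short sketch.  It
assumes a 4-byte int (as on powerpc64) and models the two "ld"
instructions as a single 16-byte read; bytes_read_past_end is a
hypothetical helper for illustration only.

```c
#include <stddef.h>

struct test
{
  int x;
  int y;
  int z;
};

/* Number of bytes read beyond the end of an array of N elements when
   element I is fetched with two 8-byte loads at offsets 0 and 8.  */
static size_t
bytes_read_past_end (size_t n, size_t i)
{
  size_t array_end = n * sizeof (struct test);
  size_t load_end = i * sizeof (struct test) + 16;
  return load_end > array_end ? load_end - array_end : 0;
}
```

For the last element of any such array the result is 4, which is exactly
the over-read that can cross a page boundary.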

When using the -mstrict-align compiler option, we get instead:
lwz 0,0(3)
lwz 4,8(3)
lwz 3,4(3)
sldi 0,0,32
or 3,3,0
sldi 4,4,32
bl func
which is less than optimal, but at least correct.

This bug seems to be present in all compiler versions I've tested (BE or LE),
modulo those that default to -mstrict-align (e.g. LE with -mtune=power7).


[Bug libstdc++/64638] Build failure with recent futex changes in libstdc++, likely all non-gthreads targets

2015-01-17 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64638

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #3 from Ulrich Weigand  ---
I see the same failure on spu-elf (another non-gthreads target).


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-17 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |uweigand at gcc dot gnu.org

--- Comment #14 from Ulrich Weigand  ---
Fixed.


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-17 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

--- Comment #13 from Ulrich Weigand  ---
Since this has been in mainline for two weeks without reported issues, and it
should in general be a safe change, I've backported the patch to 4.9 now.


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-17 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

--- Comment #12 from Ulrich Weigand  ---
Author: uweigand
Date: Wed Dec 17 15:07:28 2014
New Revision: 218821

URL: https://gcc.gnu.org/viewcvs?rev=218821&root=gcc&view=rev
Log:
2014-12-17  Ulrich Weigand  

Backport from mainline
2014-12-03  Ulrich Weigand  

PR rtl-optimization/64010
* reload.c (push_reload): Before reusing a register contained
in an operand as input reload register, ensure that it is not
used in CALL_INSN_FUNCTION_USAGE.


Modified:
branches/gcc-4_9-branch/gcc/ChangeLog
branches/gcc-4_9-branch/gcc/reload.c


[Bug target/64160] msp430 code generation error adding 32-bit integers

2014-12-09 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64160

--- Comment #4 from Ulrich Weigand  ---
The usual test in such scenarios involves reg_overlap_mentioned_p:

/* Nonzero if modifying X will affect IN.  [...]  */
int
reg_overlap_mentioned_p (const_rtx x, const_rtx in)

which also handles cases where the modified register is used as a base
register in a MEM in the second operand (not sure whether this can happen on
your platform).

A condition along the lines of

 "!reg_overlap_mentioned_p (msp430_subreg (HImode, operands[0], SImode, 0),
msp430_subreg (HImode, operands[1], SImode, 2))
  && !reg_overlap_mentioned_p (msp430_subreg (HImode, operands[0], SImode, 0),
   msp430_subreg (HImode, operands[2], SImode, 2))"

should probably catch all invalid cases (i.e. if modifying op3 will affect op7
and/or op8, the split is invalid).

Or, to avoid the duplicate msp430_subreg computation, you might simply instead
add a FAIL at the end of the split instructions:

  if (reg_overlap_mentioned_p (operands[3], operands[7])
      || reg_overlap_mentioned_p (operands[3], operands[8]))
    FAIL;

B.t.w. is there a particular reason why the target-specific msp430_subreg is
needed instead of the usual operand_subword?

As to your predicate question, msp430_nonsubreg_operand is defined as:
(define_predicate "msp430_nonsubreg_operand"
  (match_code "reg,mem"))
so it is true if and only if the operand is a REG or a MEM, which means it
would indeed reject SUBREGs.  (Of course a register operand can still *have*
subregs, it just cannot itself *be* a subreg.)

Again, it's not completely clear to me why this special predicate is needed in
the first place; it seems to be solely used in this one splitter.  (Maybe this
is related to some restriction in msp430_subreg, which goes back to the
question of why *that* is needed instead of operand_subword, which ought to
handle all general-operand cases.)


[Bug target/64160] msp430 code generation error adding 32-bit integers

2014-12-05 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64160

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #1 from Ulrich Weigand  ---
This is not a reload problem, it's a bug in the addsi3 splitter where it
doesn't account for overlapping source and destination.  We have initially:

(insn 11 10 12 2 (set (reg:SI 13 R13)
(plus:SI (reg:SI 12 R12)
(mem:SI (reg/v/f:HI 28 [ ap ]) [2 ap_4(D)->duration+0 S4 A16])))
bug.i:10 20 {addsi3}
 (expr_list:REG_DEAD (reg:HI 12 R12)
(nil)))

Note how the destination (reg:SI 13) partially overlaps the source (reg:SI
12).

After split 1 we have:

(insn 22 10 23 2 (parallel [
(set (reg:HI 13 R13)
(plus:HI (reg:HI 12 R12)
(mem:HI (reg/v/f:HI 28 [ ap ]) [2 ap_4(D)->duration+0 S2
A16])))
(set (reg:BI 2 R2)
(truncate:BI (lshiftrt:SI (plus:SI (zero_extend:SI (reg:HI 12
R12))
(zero_extend:SI (mem:HI (reg/v/f:HI 28 [ ap ]) [2
ap_4(D)->duration+0 S2 A16])))
(const_int 16 [0x10]
]) bug.i:10 -1
 (nil))
(insn 23 22 12 2 (set (reg:HI 14 R14 [+2 ])
(plus:HI (plus:HI (reg:HI 13 R13 [+2 ])
(mem:HI (plus:HI (reg/v/f:HI 28 [ ap ])
(const_int 2 [0x2])) [2 ap_4(D)->duration+2 S2 A16]))
(zero_extend:HI (reg:BI 2 R2 bug.i:10 -1
 (nil))

Note how the first insn of the pair now clobbers the part of the source that is
still needed for the second half.
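
The effect can be reproduced in a small stand-alone model of a 16-bit
register file.  This is only a sketch: the regfile type and both helper
functions are hypothetical and do not correspond to code in the msp430
backend; a 32-bit value in register N is assumed to occupy N (low half)
and N + 1 (high half).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of sixteen 16-bit hard registers.  */
typedef struct {
    uint16_t r[16];
} regfile;

/* True if writing the low half of DST destroys the high half of SRC.  */
static bool low_dest_clobbers_high_src(int dst, int src)
{
    return dst == src + 1;
}

/* Naive split of dst:SI = src:SI + mem into two 16-bit additions, low
   half first, in the same order as insns 22 and 23 above.  */
static uint32_t split_add_naive(regfile *rf, int dst, int src, uint32_t mem)
{
    uint32_t lo = (uint32_t) rf->r[src] + (mem & 0xFFFF);
    rf->r[dst] = (uint16_t) lo;          /* may overwrite r[src + 1] */
    uint32_t carry = lo >> 16;
    rf->r[dst + 1] = (uint16_t) (rf->r[src + 1] + (mem >> 16) + carry);
    return (uint32_t) rf->r[dst] | ((uint32_t) rf->r[dst + 1] << 16);
}
```

With the source in R12/R13 and the destination in R13/R14, the high half
is computed from the already overwritten R13, so the result is wrong; with
a non-overlapping destination the same sequence is correct.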


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-03 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

--- Comment #11 from Ulrich Weigand  ---
Hi Nick,

I've checked this in to mainline now. I'd like to wait for a couple of days to
see if anything breaks before backporting ...


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-03 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

--- Comment #10 from Ulrich Weigand  ---
Author: uweigand
Date: Wed Dec  3 21:59:10 2014
New Revision: 218335

URL: https://gcc.gnu.org/viewcvs?rev=218335&root=gcc&view=rev
Log:
PR rtl-optimization/64010
* reload.c (push_reload): Before reusing a register contained
in an operand as input reload register, ensure that it is not
used in CALL_INSN_FUNCTION_USAGE.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/reload.c


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

--- Comment #5 from Ulrich Weigand  ---
Created attachment 34170
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34170&action=edit
Do not clobber function argument registers


[Bug rtl-optimization/64010] [msp430-elf] struct function dereference clobbers parameter passed to function

2014-12-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64010

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #4 from Ulrich Weigand  ---
Yes, this seems to be a generic reload bug.  The comment ahead of the lines
you're adding says:

If [...] the operand contains a register that dies in this insn *and is used
nowhere else* [...]

which is supposed to be implemented by this check:

&& ! refers_to_regno_for_reload_p (regno,
   end_hard_regno (rel_mode,
   regno),
   PATTERN (this_insn), inloc)

But this doesn't look into registers used as function arguments.

I'm not sure why this hasn't occurred elsewhere ... however, in your particular
case, it is triggered by a call insn pattern using memory-indirect addressing,
which is probably not available on many targets.

Your patch is a little too conservative, however: it rejects any register that
could potentially be used as function argument, even if it isn't actually used
in this particular call.

Can you check whether this alternative patch (using find_reg_fusage) also fixes
the problem for you?
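
As a rough model of the difference between the two checks, consider a call
insn described by two register sets.  This is a sketch only: struct
call_model and both predicates are hypothetical and merely imitate the
idea of also consulting the call's function-usage list; they are not the
reload.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a call insn, with registers as bit masks.  */
struct call_model {
    uint32_t pattern_regs;  /* registers referenced in the insn pattern,
                               outside the operand being reloaded */
    uint32_t fusage_regs;   /* registers the call uses as arguments */
};

/* Existing check: only the insn pattern is examined.  */
static bool reusable_pattern_only(const struct call_model *c, int regno)
{
    return (c->pattern_regs & (UINT32_C(1) << regno)) == 0;
}

/* Stricter check: additionally reject registers that this particular
   call actually uses as function arguments.  */
static bool reusable_with_fusage(const struct call_model *c, int regno)
{
    return reusable_pattern_only(c, regno)
           && (c->fusage_regs & (UINT32_C(1) << regno)) == 0;
}
```

Only registers actually listed for the call are rejected, so an argument
register that this call does not use remains available.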


[Bug target/64115] [4.9/5 Regression] ICE: : in rs6000_delegitimize_address, at config/rs6000/rs6000.c:7051

2014-12-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64115

Ulrich Weigand  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Ulrich Weigand  ---
Fixed everywhere.


[Bug target/64115] [4.9/5 Regression] ICE: : in rs6000_delegitimize_address, at config/rs6000/rs6000.c:7051

2014-12-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64115

--- Comment #8 from Ulrich Weigand  ---
Author: uweigand
Date: Tue Dec  2 14:33:00 2014
New Revision: 218275

URL: https://gcc.gnu.org/viewcvs?rev=218275&root=gcc&view=rev
Log:
PR target/64115
* config/rs6000/rs6000.c (rs6000_delegitimize_address): Remove
invalid UNSPEC_TOCREL sanity check under ENABLE_CHECKING.

Modified:
branches/gcc-4_8-branch/gcc/ChangeLog
branches/gcc-4_8-branch/gcc/config/rs6000/rs6000.c


[Bug target/64115] [4.9/5 Regression] ICE: : in rs6000_delegitimize_address, at config/rs6000/rs6000.c:7051

2014-12-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64115

--- Comment #7 from Ulrich Weigand  ---
Author: uweigand
Date: Tue Dec  2 14:30:47 2014
New Revision: 218274

URL: https://gcc.gnu.org/viewcvs?rev=218274&root=gcc&view=rev
Log:
PR target/64115
* config/rs6000/rs6000.c (rs6000_delegitimize_address): Remove
invalid UNSPEC_TOCREL sanity check under ENABLE_CHECKING.

Modified:
branches/gcc-4_9-branch/gcc/ChangeLog
branches/gcc-4_9-branch/gcc/config/rs6000/rs6000.c


[Bug target/64115] [4.9/5 Regression] ICE: : in rs6000_delegitimize_address, at config/rs6000/rs6000.c:7051

2014-12-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64115

--- Comment #6 from Ulrich Weigand  ---
Author: uweigand
Date: Tue Dec  2 14:27:46 2014
New Revision: 218273

URL: https://gcc.gnu.org/viewcvs?rev=218273&root=gcc&view=rev
Log:
PR target/64115
* config/rs6000/rs6000.c (rs6000_delegitimize_address): Remove
invalid UNSPEC_TOCREL sanity check under ENABLE_CHECKING.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/rs6000.c


[Bug target/64115] [4.9/5 Regression] ICE: : in rs6000_delegitimize_address, at config/rs6000/rs6000.c:7051

2014-12-01 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64115

Ulrich Weigand  changed:

   What|Removed |Added

 CC||dje.gcc at gmail dot com

--- Comment #3 from Ulrich Weigand  ---
The ICE is triggered by checking code in rs6000_delegitimize_address:

  if (GET_CODE (y) == UNSPEC
      && XINT (y, 1) == UNSPEC_TOCREL)
    {
#ifdef ENABLE_CHECKING
      if (REG_P (XVECEXP (y, 0, 1))
          && REGNO (XVECEXP (y, 0, 1)) == TOC_REGISTER)
        {
          /* All good.  */
        }
      else if (GET_CODE (XVECEXP (y, 0, 1)) == DEBUG_EXPR)
        {
          /* Weirdness alert.  df_note_compute can replace r2 with a
             debug_expr when this unspec is in a debug_insn.
             Seen in gcc.dg/pr51957-1.c  */
        }
      else
        {
          debug_rtx (orig_x);
          abort ();
        }
#endif

which attempts to ensure that the second argument of UNSPEC_TOCREL is the TOC
register.   However, this check seems fragile; in debug code, we can get RTX
simplifications that replace the TOC register by some equivalent expression.

The code already recognizes one such case; this bug shows another case, where
the TOC register is replaced by a MEM RTX for the TOC save slot holding the TOC
value.  [ This case is probably made more likely by the change in my ELFv2 ABI
preparation patch, which has the effect of making TOC moves into the save slot
more explicit at the RTL level, allowing var-tracking code to detect that
equivalence. ]

One could try to make this check in rs6000_delegitimize_address more generic by
allowing some (or all) MEM RTXes.  However, I'm wondering what exactly that check
is supposed to achieve in the first place; for the purposes of this routine,
the second argument of UNSPEC_TOCREL is really irrelevant.

David, would you be OK with simply removing the check (everything enclosed with
ENABLE_CHECKING in the above code)?


[Bug target/64115] [4.9/5 Regression] ICE: : in rs6000_delegitimize_address, at config/rs6000/rs6000.c:7051

2014-12-01 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64115

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2014-12-01
   Assignee|unassigned at gcc dot gnu.org  |uweigand at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Ulrich Weigand  ---
Confirmed.


[Bug rtl-optimization/63952] [5 Regression] bootstrap failure (ICE in prepare_cmp_insn) on s390x in libjava

2014-11-21 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63952

Ulrich Weigand  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Ulrich Weigand  ---
Fixed.


[Bug rtl-optimization/63952] [5 Regression] bootstrap failure (ICE in prepare_cmp_insn) on s390x in libjava

2014-11-21 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63952

--- Comment #4 from Ulrich Weigand  ---
Author: uweigand
Date: Fri Nov 21 15:33:27 2014
New Revision: 217929

URL: https://gcc.gnu.org/viewcvs?rev=217929&root=gcc&view=rev
Log:
PR rtl-optimization/63952
* optabs.c (prepare_cmp_insn): Do not call can_compare_p for CCmode.
* config/s390/s390.md ("cbranchcc4"): Accept any s390_comparison.
Remove incorrect TARGET_HARD_FLOAT check and no-op expander code.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/s390/s390.md
trunk/gcc/optabs.c


[Bug rtl-optimization/63952] [5 Regression] bootstrap failure (ICE in prepare_cmp_insn) on s390x in libjava

2014-11-19 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63952

Ulrich Weigand  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||uweigand at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |uweigand at gcc dot gnu.org

--- Comment #3 from Ulrich Weigand  ---
I'll have a look.


[Bug tree-optimization/63748] [4.9/5 Regression] wrong may be used uninitialized warning (abnormal edges)

2014-11-10 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63748

--- Comment #6 from Ulrich Weigand  ---
I guess I can see why there might be an abnormal edge starting at bb 3, or at
least, that the compiler might not be easily able to deduce that it isn't
necessary.

However, I do not understand why any of the abnormal edges *target* bb5
*before* the setjmp call.  Shouldn't an abnormal edge due to a longjmp end up
*after* the setjmp?  After all, the setjmp itself (including the preparation of
its arguments) is *not* executed twice; the effect of the longjmp is simply to
make the setjmp *return* twice.
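
To illustrate the last point, here is a minimal sketch in standard C.  It is
not taken from the test case; the function and counter names are made up for
this illustration.  The statement before the setjmp call runs once, while the
code after the call runs twice, because longjmp resumes execution at the
return from setjmp rather than before the call.

```c
#include <setjmp.h>

static jmp_buf env;

/* Static volatile counters, so their values are well defined after the
   longjmp.  Both names are hypothetical.  */
static volatile int before_setjmp;
static volatile int after_setjmp;

/* Returns 1 if the code before setjmp ran once and the code after it
   ran twice.  */
int
setjmp_returns_twice (void)
{
  before_setjmp = 0;
  after_setjmp = 0;

  before_setjmp++;          /* Executed exactly once.  */

  if (setjmp (env) == 0)
    {
      after_setjmp++;       /* First return from setjmp.  */
      longjmp (env, 1);     /* Makes the same setjmp call return again.  */
    }
  else
    after_setjmp++;         /* Second return, triggered by longjmp.  */

  return before_setjmp == 1 && after_setjmp == 2;
}
```

Calling setjmp_returns_twice () from a small driver is enough to check the
behaviour on a given platform.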


[Bug tree-optimization/63748] [4.9/5 Regression] may be used uninitialized warning on variable definition with initializer

2014-11-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63748

--- Comment #4 from Ulrich Weigand  ---
(In reply to Andrew Pinski from comment #3)
> I think this is hard warning to avoid in the compiler as we don't know if
> foo calls longjmp or not.  Note we don't know if alloc_jmp_buf does a push
> somewhere else.

Huh?  No matter what, "buf" is in fact never used uninitialized in this
function, and that should be trivial to see; the only use of "buf" occurs in
line 17 immediately after the definition of "buf" in line 16, whether any
longjmp is ever called or not.

Also, if any of the gotos in this function is removed, the warning disappears. 
The problem seems to be related to some call-graph optimizations (note that the
if (noside) check is partially redundant).


[Bug tree-optimization/63748] [4.9/5 Regression] may be used uninitialized warning on variable definition with initializer

2014-11-07 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63748

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #2 from Ulrich Weigand  ---
Much simplified test case:

typedef struct __jmp_buf_tag jmp_buf;
extern int setjmp (jmp_buf *);
jmp_buf *alloc_jmp_buf ();
int foo (void *);

int
test (int op, int noside)
{
  void *argvec = 0;

  if (op)
    {
      jmp_buf *buf = alloc_jmp_buf ();
      setjmp (buf);

      if (noside)
        goto nosideret;

    do_call_it:

      if (noside)
        goto nosideret;

      return foo (argvec);
    }

  argvec = __builtin_alloca (1);
  goto do_call_it;

nosideret:
  return 1;
}

results in:

xxx.i: In function ‘test’:
xxx.i:14:16: error: ‘buf’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   jmp_buf *buf = alloc_jmp_buf ();

[Bug libstdc++/62259] atomic class doesn't enforce required alignment on powerpc64

2014-09-03 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62259

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #1 from Ulrich Weigand  ---
Indeed, when running a simple test program:

#include <stdio.h>
#include <atomic>

struct twoints {
  int a;
  int b;
};

int main(void) {
   printf("%d\n", __alignof__ (twoints));
   printf("%d\n", __alignof__ (std::atomic<twoints>));
   return 0;
}

we see that GCC only requires 4 bytes of alignment for the atomic type.

However, with the equivalent C11 code using the _Atomic keyword

#include <stdio.h>
#include <stdatomic.h>

struct twoints {
  int a;
  int b;
};

int main() {
   printf("%d\n", __alignof__ (struct twoints));
   printf("%d\n", __alignof__ (_Atomic (struct twoints)));
   return 0;
}

we get an alignment requirement of 8 bytes for the atomic type.

In the C case, this is done by the compiler front-end where it implements the
_Atomic keyword.  In the C++ case, it seems the compiler doesn't really get
involved, as it's all done in plain C++ in standard library code ...

I suspect the intent was that for C++, we likewise ought to have an increased
alignment requirement for the type, but I'm not sure how to implement this in
the library.  Need some of the library experts to comment here.
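
For anyone wanting to repeat the C-side measurement without going through
printf, the comparison can be sketched as two helper functions.  This is only
an illustration: the helper names are invented here, and it uses the standard
C11 _Alignof operator instead of the __alignof__ extension used above.

```c
#include <stddef.h>

struct twoints
{
  int a;
  int b;
};

/* Alignment of the plain structure.  */
size_t
plain_alignment (void)
{
  return _Alignof (struct twoints);
}

/* Alignment of the atomic version of the same structure; the C front end
   is free to raise it so that lock-free 8-byte accesses become possible.  */
size_t
atomic_alignment (void)
{
  return _Alignof (_Atomic (struct twoints));
}
```

On the configuration discussed in this report, the two helpers would be
expected to mirror the 4-byte and 8-byte figures quoted above; other targets
may differ.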


[Bug target/53854] ICE in find_constant_pool_ref

2014-09-01 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53854

--- Comment #9 from Ulrich Weigand  ---
I just noticed that this bug has disappeared on mainline.  Binary search showed
that this happens with rev. 211007, which checks in this patch:
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01263.html
which originated as part of this patch:
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg01234.html
which implements the -fuse-caller-save feature.

This is weird, since that feature isn't even active in this test case ...

Looking at the IRA dumps reveals the following change in cost computation (note
that r49 and r50 are the two pseudos originally holding the input to the inline
asm):

Before 211007, we have:
  Allocno a1r50 of GENERAL_REGS(15) has 1 avail. regs  13, node:  13 (confl
regs =  0-5 14-36)
  Allocno a2r49 of GENERAL_REGS(15) has 1 avail. regs  13, node:  13 (confl
regs =  0-5 14-36)

At 211007, we have instead:
  Allocno a1r50 of GENERAL_REGS(15) has 8 avail. regs  6-13, node:  6-13
(confl regs =  0-5 14-36)
  Allocno a2r49 of GENERAL_REGS(15) has 8 avail. regs  6-13, node:  6-13
(confl regs =  0-5 14-36)


So it seems that before this patch, IRA thought the only possible register to
hold these values was r13, while after the patch, r6..r12 are allowable as
well.  The former seems obviously bogus.


Looking closer at the changes introduced in 211007, it seems that there is
actually a bug in that patch, which explains the change in behavior even though
-fuse-caller-save is not actually active:

Note that in ira_tune_allocno_costs, the patch changes the code to only add the
extra penalty for IRA_HARD_REGNO_ADD_COST_MULTIPLIER if the register is
call-clobbered.  This is weird, since IRA_HARD_REGNO_ADD_COST_MULTIPLIER is not
supposed to have anything to do with calls, and doesn't before the patch.

And indeed, moving the IRA_HARD_REGNO_ADD_COST_MULTIPLIER logic outside the
outer if (ira_hard_reg_set_intersection_p) re-introduces the bug.


This made me take a closer look at the definition of
IRA_HARD_REGNO_ADD_COST_MULTIPLIER, which happens to be defined solely on s390:

/* In some case register allocation order is not enough for IRA to
   generate a good code.  The following macro (if defined) increases
   cost of REGNO for a pseudo approximately by pseudo usage frequency
   multiplied by the macro value.

   We avoid usage of BASE_REGNUM by nonzero macro value because the
   reload can decide not to use the hard register because some
   constant was forced to be in memory.  */
#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(regno)   \
  (regno == BASE_REGNUM ? 0.0 : 0.5)

Interestingly, the comment says BASE_REGNUM should be avoided, but the actual
implementation of the macro avoids *all* registers *but* BASE_REGNUM ...  This
simply seems to be a bug.

Reverting the logic in that macro leads to this IRA cost calculation:
  Allocno a1r50 of GENERAL_REGS(15) has 7 avail. regs  6-12, node:  6-12
(confl regs =  0-5 14-36)
  Allocno a2r49 of GENERAL_REGS(15) has 7 avail. regs  6-12, node:  6-12
(confl regs =  0-5 14-36)

So it avoids r13 (BASE_REGNUM), but allows r6 .. r12.  This again makes the
test case pass.
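
The difference between the two variants can be sketched outside of GCC as
follows.  This is an illustration only: the function names are made up,
BASE_REGNUM is set to 13 as stated above, and the "reversed" form is an
assumption about what the corrected macro would look like, not the committed
fix.

```c
/* BASE_REGNUM is r13 on s390, per the discussion above.  */
#define BASE_REGNUM 13

/* The macro as currently written: every register except BASE_REGNUM
   is penalized.  */
double
multiplier_as_written (int regno)
{
  return regno == BASE_REGNUM ? 0.0 : 0.5;
}

/* Assumed reversed logic: only BASE_REGNUM is penalized, which is what
   the comment on the macro describes.  */
double
multiplier_reversed (int regno)
{
  return regno == BASE_REGNUM ? 0.5 : 0.0;
}

/* Per the macro's comment, the extra cost is roughly the pseudo's usage
   frequency multiplied by the macro value.  */
double
extra_cost (double multiplier, int freq)
{
  return multiplier * freq;
}
```

With the macro as written, r6 .. r12 each pick up an extra cost while r13 gets
none, which matches the "only r13 available" result in the first dump.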


I'd suggest fixing the ira_tune_allocno_costs bug introduced by 211007, and
also fixing the s390 definition of IRA_HARD_REGNO_ADD_COST_MULTIPLIER (and
probably backporting the latter fix to the branches).  I'll start discussing
this on the list.

This still doesn't solve the underlying problem, but should make its appearance
again as rare as it used to be ...


[Bug libobjc/61920] [4.8/4.9/4.10 Regression] libobjc has undefined symbols on powerpc*-linux-gnu

2014-07-28 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61920

--- Comment #8 from Ulrich Weigand  ---
Author: uweigand
Date: Mon Jul 28 14:33:20 2014
New Revision: 213128

URL: https://gcc.gnu.org/viewcvs?rev=213128&root=gcc&view=rev
Log:
PR libobjc/61920
* encoding.c (rs6000_special_adjust_field_align_p): Use definition
that matches the 4.8 branch ABI.

Modified:
branches/gcc-4_8-branch/libobjc/ChangeLog
branches/gcc-4_8-branch/libobjc/encoding.c


[Bug libobjc/61920] [4.8/4.9/4.10 Regression] libobjc has undefined symbols on powerpc*-linux-gnu

2014-07-28 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61920

--- Comment #7 from Ulrich Weigand  ---
Author: uweigand
Date: Mon Jul 28 14:32:13 2014
New Revision: 213127

URL: https://gcc.gnu.org/viewcvs?rev=213127&root=gcc&view=rev
Log:
PR libobjc/61920
* encoding.c (rs6000_special_adjust_field_align_p): Use definition
that matches the 4.9 branch ABI.

Modified:
branches/gcc-4_9-branch/libobjc/ChangeLog
branches/gcc-4_9-branch/libobjc/encoding.c


[Bug libobjc/61920] [4.8/4.9/4.10 Regression] libobjc has undefined symbols on powerpc*-linux-gnu

2014-07-28 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61920

--- Comment #6 from Ulrich Weigand  ---
Since we didn't backport the actual ABI change to the branches, only the
warning, I think it would be consistent to use something like this on the
branches:

#define rs6000_special_adjust_field_align_p(FIELD, COMPUTED) \
  (TARGET_ALTIVEC && TREE_CODE (TREE_TYPE (FIELD)) == VECTOR_TYPE)

rather than the #define ... 0 that is appropriate for mainline.


[Bug middle-end/60102] [4.9/4.10 Regression] powerpc fp-bit ices at dwf_regno

2014-06-03 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60102

--- Comment #12 from Ulrich Weigand  ---
(In reply to Sandra Loosemore from comment #9)
> I've been looking at this a little bit more.
> 
> DWARF_FRAME_REGNUM is specifically documented to take a hard register number
> as its operand, so the assertion in dwf_regno is at least consistent with
> that.  The one in dbx_reg_number is more dubious, since neither
> LEAF_REG_REMAP or DBX_REGISTER_NUMBER are documented to require a hard
> register number.
> 
> So: either the powerpc backend is broken to be using a pseudo in this
> context, or else the documentation for DWARF_FRAME_REGNUM should be changed
> to permit this and the assertions (as necessary) moved into the
> target-specific implementations of these macros.

All those routines are supposed to implement mappings from GCC internal hard
register numbers to some externally-defined number scheme (DWARF, DBX, ...), so
it is consistent that they only accept hard register numbers.

The rs6000 back-end isn't actually attempting to use a "pseudo", they're just
using a quick hack there: apparently, with SPE registers, GCC internal hard
registers 0 .. 31 may be backed by a pair of external registers in debug info,
where the high element of the pair gets a DWARF number in the 1200 ... 1231
range.

The rs6000_dwarf_register_span attempts to implement that by returning a
PARALLEL of two registers.  Now, according to the rules, those both ought to be
hard registers.  However, there is no GCC internal number defined for the high
part of the pair.  The back-end used to hack around that by simply using the
DWARF number (in the 1200 ... 1231 range) as "hard" regno, combined with
another hack in rs6000_dbx_register_number that then just passes that number
through unchanged to the DWARF assembler output.

This all worked as long as the middle-end didn't look too closely at those
PARALLELs ... but with those extra asserts, it now fails.
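
A stand-alone model of the hack makes the failure easy to see.  Everything
below is hypothetical: the function names are invented, the limit is a
placeholder rather than the real FIRST_PSEUDO_REGISTER of the rs6000 port, and
the check is only assumed to resemble the new assertions.

```c
/* First DWARF number used for the SPE high parts, per the text above.  */
#define SPE_HIGH_DWARF_BASE 1200

/* Placeholder for the number of GCC internal hard registers; any value
   below 1200 shows the same effect.  */
#define SKETCH_FIRST_PSEUDO_REGISTER 128

/* The number the back-end put into the PARALLEL for the high part of
   GPR REGNO (0 .. 31): a DWARF number used as if it were a hard regno.  */
int
spe_high_part_number (int regno)
{
  return SPE_HIGH_DWARF_BASE + regno;
}

/* Model of a "must be a hard register" assertion.  */
int
passes_hard_regno_assert (int regno)
{
  return regno >= 0 && regno < SKETCH_FIRST_PSEUDO_REGISTER;
}
```

The low part (for example regno 5) passes the check, while the number returned
by spe_high_part_number (5), i.e. 1205, does not.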

I guess I don't quite understand why there aren't real GCC hard regnos to cover
those SPE high parts ... that seems the cleanest solution.   Not sure if this
was done to avoid some drawback I'm not seeing right now ...


[Bug target/61300] powerpc64le miscompile with K&R-style function definition at -O0

2014-06-02 Thread uweigand at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61300

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #6 from Ulrich Weigand  ---
Note that either of the proposed changes (comment #3 or comment #5) would
result in an incompatible ABI change, since other compilers already implement
the parameter save area rules as defined by the ELFv2 ABI.

The basic rule is: A function body may expect the parameter save area to have
been provided on its caller's stack *iff* the function has either some argument
that is passed in memory according to calling convention rules, or the function
has a variable argument list.

This is the only correct rule to be used when generating code for a function
body, no matter whether the function definition uses K&R style or not, or
whether there is a prototype in scope at the definition site or not.


Now, when generating a function *call*, we of course may follow the same rule,
which we can do if we have a prototype in scope, or else we may opt to always
provide the parameter save area (which is the only safe option if there is *no*
prototype in scope).
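
The two rules can be written down as a pair of predicates.  This is a sketch
with made-up names and boolean inputs, not GCC code; the caller-side function
shows one permissible policy (follow the callee rule when a prototype is
visible, otherwise always allocate).

```c
#include <stdbool.h>

/* Callee side: the only correct rule when compiling a function body,
   independent of K&R style or of prototypes in scope.  */
bool
callee_may_expect_save_area (bool has_memory_arg, bool is_variadic)
{
  return has_memory_arg || is_variadic;
}

/* Caller side: with a prototype we may apply the same rule; without one,
   always providing the area is the only safe choice.  */
bool
caller_provides_save_area (bool prototype_in_scope,
                           bool has_memory_arg, bool is_variadic)
{
  if (!prototype_in_scope)
    return true;
  return callee_may_expect_save_area (has_memory_arg, is_variadic);
}
```

Note that the callee predicate takes no prototype_in_scope input at all, which
is exactly the distinction a shared REG_PARM_STACK_SPACE macro cannot
currently make.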


The problem is that the same macro REG_PARM_STACK_SPACE is currently invoked
for both function calls and function definitions, so the test for prototype_p
is wrong if we're currently compiling a function definition.  Is there a way to
inspect the REG_PARM_STACK_SPACE argument to distinguish those cases?


[Bug go/60870] go interface methods broken on ppc64le (bug296.go)

2014-04-17 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60870

--- Comment #5 from Ulrich Weigand  ---
(In reply to Ian Lance Taylor from comment #4)
> I don't have a PPC system.  Can you see if the attached patch to
> gcc/go/gofrontend/expressions.cc fixes the problem?

Yes, this makes bug296.go PASS again on powerpc64le.

Thanks for the quick fix!


[Bug go/60870] go interface methods broken on ppc64le (bug296.go)

2014-04-17 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60870

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-04-17
 CC||uweigand at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Ulrich Weigand  ---
Confirmed.

This commit seems to have reverted the effects of the bug fix here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02994.html


[Bug target/57363] IBM long double: adding NaN and number raises inexact exception

2013-12-03 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57363

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Ulrich Weigand  ---
Fixed.
http://gcc.gnu.org/ml/gcc-cvs/2013-12/msg00087.html


[Bug target/57949] [powerpc64] Structure parameter alignment issue with vector extensions

2013-11-15 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57949

--- Comment #9 from Ulrich Weigand  ---
Author: uweigand
Date: Fri Nov 15 23:39:50 2013
New Revision: 204870

URL: http://gcc.gnu.org/viewcvs?rev=204870&root=gcc&view=rev
Log:
gcc:

2013-11-15  Ulrich Weigand  

Backport from mainline r201750.
Note: Default setting of -mcompat-align-parm inverted!

2013-08-14  Bill Schmidt  

PR target/57949
* doc/invoke.texi: Add documentation of mcompat-align-parm
option.
* config/rs6000/rs6000.opt: Add mcompat-align-parm option.
* config/rs6000/rs6000.c (rs6000_function_arg_boundary): For AIX
and Linux, correct BLKmode alignment when 128-bit alignment is
required and compatibility flag is not set.
(rs6000_gimplify_va_arg): For AIX and Linux, honor specified
alignment for zero-size arguments when compatibility flag is not
set.

gcc/testsuite:

2013-11-15  Ulrich Weigand  

Backport from mainline r201750.
Note: Default setting of -mcompat-align-parm inverted!

2013-08-14  Bill Schmidt  

PR target/57949
* gcc.target/powerpc/pr57949-1.c: New.
* gcc.target/powerpc/pr57949-2.c: New.


Added:
branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/pr57949-1.c
branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/pr57949-2.c
Modified:
branches/ibm/gcc-4_8-branch/gcc/ChangeLog.ibm
branches/ibm/gcc-4_8-branch/gcc/config/rs6000/rs6000.c
branches/ibm/gcc-4_8-branch/gcc/config/rs6000/rs6000.opt
branches/ibm/gcc-4_8-branch/gcc/doc/invoke.texi
branches/ibm/gcc-4_8-branch/gcc/testsuite/ChangeLog.ibm


[Bug target/57363] IBM long double: adding NaN and number raises inexact exception

2013-11-13 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57363

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #3 from Ulrich Weigand  ---
Hi Adhemerval, I'm also seeing that this patch fixes some glibc failures.

What's the status of this?  Were you planning to submit it for inclusion?

B.t.w. I'm wondering if we don't need to use

+  if (fabs (z) != inf())
+   return z;

instead; z could still be minus infinity, right?
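
The difference between the two tests can be checked in isolation.  The sketch
below is not the patch itself: the function names are invented, z is assumed
to be a double, and the GCC built-ins __builtin_inf and __builtin_fabs stand
in for the inf () and fabs calls quoted above.

```c
/* Test that only compares against plus infinity: minus infinity is
   treated like an ordinary value.  */
int
differs_from_plus_inf (double z)
{
  return z != __builtin_inf ();
}

/* Test using the absolute value: both infinities are recognized.  */
int
differs_from_any_inf (double z)
{
  return __builtin_fabs (z) != __builtin_inf ();
}
```

For z equal to minus infinity the first function returns 1 and the second
returns 0, which is the case the question above is about.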


[Bug middle-end/59119] New: Segfault in -fisolate-erroneous-paths pass

2013-11-13 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59119

Bug ID: 59119
   Summary: Segfault in -fisolate-erroneous-paths pass
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: uweigand at gcc dot gnu.org

Building the following test case (reduced from Python 2.7.5) with -O2 -g:

extern void *memmove (void *, const void *, __SIZE_TYPE__);
extern void *memset (void *, int, __SIZE_TYPE__);

typedef struct {
    long n_prefix;
    long n_spadding;
} NumberFieldWidths;

void
fill_number(char *buf, const NumberFieldWidths *spec)
{
    if (spec->n_prefix) {
        memmove(buf,
                (char *) 0,
                spec->n_prefix * sizeof(char));
        buf += spec->n_prefix;
    }
    if (spec->n_spadding) {
        memset(buf, 0, spec->n_spadding);
        buf += spec->n_spadding;
    }
}

crashes the compiler with:

formatter_string.i: In function ‘fill_number’:
formatter_string.i:11:1: internal compiler error: Segmentation fault
 fill_number(char *buf, const NumberFieldWidths *spec)
 ^
0x1064f147 crash_signal
/home/uweigand/src/gcc/gcc/toplev.c:334
0x1090806c ptrofftype_p
/home/uweigand/src/gcc/gcc/tree.h:4463
0x1090806c build2_stat(tree_code, tree_node*, tree_node*, tree_node*)
/home/uweigand/src/gcc/gcc/tree.c:4151
0x1022c0db gimple_assign_rhs_to_tree(gimple_statement_d*)
/home/uweigand/src/gcc/gcc/cfgexpand.c:103
0x10861e67 insert_debug_temp_for_var_def(gimple_stmt_iterator_d*, tree_node*)
/home/uweigand/src/gcc/gcc/tree-ssa.c:442
0x10862397 insert_debug_temps_for_defs(gimple_stmt_iterator_d*)
/home/uweigand/src/gcc/gcc/tree-ssa.c:549
0x103ff49b gsi_remove(gimple_stmt_iterator_d*, bool)
/home/uweigand/src/gcc/gcc/gimple-iterator.c:563
0x10ba8083 insert_trap_and_remove_trailing_statements
/home/uweigand/src/gcc/gcc/gimple-ssa-isolate-paths.c:110
0x10ba8a47 gimple_ssa_isolate_erroneous_paths
/home/uweigand/src/gcc/gcc/gimple-ssa-isolate-paths.c:305
0x10ba8a47 execute
/home/uweigand/src/gcc/gcc/gimple-ssa-isolate-paths.c:370



[Bug target/56184] [4.8 Regression] Internal compiler error in push_reload during bootstrap stage 2

2013-02-06 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56184

Ulrich Weigand  changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #6 from Ulrich Weigand  2013-02-06 
19:40:30 UTC ---
The problem occurs with the following insn:

(insn 539 383 384 46
 (set (reg:DI 355 [313]) (const_int 256 [0x100]))
 test.ii:128 643 {*movdi_vfp}
 (expr_list:REG_EQUIV (const_int 256 [0x100])
 (nil)))

Register 355 is recognized as always-equal to the constant 256, and insn 539 is
the insn that originally sets up the equivalence.  If the register doesn't get
a hard reg, what ought to happen is that users of reg 355 get replaced by the
constant, and the insn setting the equivalence ought to be deleted.  Because
the insn will get deleted anyway, it also ought to be skipped for find_reloads.

To achieve that, reg_equiv_constant(355) should hold the constant, and
reg_equiv_init(355) should point to the above insn.  However, what actually
happens in this test case is that reg_equiv_init(355) is NULL.  Therefore, the
insn is *not* skipped for find_reloads, which then aborts since it tries to
push an output reload for an always-constant register, which is not supposed to
happen.

Now the register is somewhat special in that it was created by IRA via live
range splitting.  The original register was reg 313; and this still has
reg_equiv_init(313) pointing to the above insn.  However, reg_equiv_init(355)
is NULL.  There is a routine fix_reg_equiv_init in ira.c which appears to be
intended to fix the reg_equiv_init settings of new registers created by live
range splitting.  However, this doesn't seem to have worked in this case ...

Unfortunately I'm not really familiar with the live range splitting code; maybe
Vladimir can help with this?
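
For readers not familiar with reload, the skip condition described above can
be modelled with a single predicate.  This is a heavily simplified sketch, not
reload's actual code: the function name and its scalar parameters are made up,
and an equiv_init_uid of 0 stands for a NULL reg_equiv_init entry.

```c
/* Returns nonzero if the insn with uid INSN_UID would be skipped for
   find_reloads: the register got no hard reg, it is equivalent to a
   constant, and INSN_UID is recorded as the insn that initializes the
   equivalence.  */
int
insn_skipped_for_find_reloads (int got_hard_reg, int has_equiv_constant,
                               int equiv_init_uid, int insn_uid)
{
  if (got_hard_reg || !has_equiv_constant)
    return 0;
  return equiv_init_uid != 0 && equiv_init_uid == insn_uid;
}
```

With the settings of reg 313 (init insn 539) the predicate holds for insn 539;
with the settings observed for reg 355 (no init insn recorded) it does not, so
the insn reaches find_reloads.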


[Bug target/56184] [4.8 Regression] Internal compiler error in push_reload during bootstrap stage 2

2013-02-06 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56184

--- Comment #5 from Ulrich Weigand  2013-02-06 
19:27:31 UTC ---
Depending on configure tests of the installed (cross-)assembler, the ICE may
not occur.  In those cases, I'm now able to reliably reproduce the ICE by using
-fno-section-anchors (in addition to the flags given above).


[Bug target/56184] [4.8 Regression] Internal compiler error in push_reload during bootstrap stage 2

2013-02-05 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56184

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #4 from Ulrich Weigand  2013-02-05 
13:51:24 UTC ---
This is weird; I cannot reproduce the behaviour even with the exact configure
and command lines you specify.  I've been using SVN rev. 195717; which revision
do you see the problem with?

In the generated test.ii.208r.ira file I get, I see different register uses
even before IRA, compared to your version.

Would you mind sending me (offline) a full set of the dump files so I can see
where my compile run starts to diverge from yours?


[Bug middle-end/54957] Two crashes introduced by rev192488

2012-10-23 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54957

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #14 from Ulrich Weigand  2012-10-23 
17:10:11 UTC ---
I'm getting the same crash when building libstdc++ for spu-elf:

Program received signal SIGSEGV, Segmentation fault.
emit_case_dispatch_table (index_expr=0xf601b1c0, index_type=0xf5e70420,
case_list=0x110001c0, default_label=0xeda05398, minval=0xf5dd0bc0,
maxval=0xf5dd3740, 
range=0xf5dd3740, stmt_bb=0x0) at
/home/uweigand/fsf/gcc-head/gcc/stmt.c:1919
1919  edge default_edge = EDGE_SUCC(stmt_bb, 0);
(gdb) bt
#0  emit_case_dispatch_table (index_expr=0xf601b1c0, index_type=0xf5e70420,
case_list=0x110001c0, default_label=0xeda05398, minval=0xf5dd0bc0,
maxval=0xf5dd3740, 
range=0xf5dd3740, stmt_bb=0x0) at
/home/uweigand/fsf/gcc-head/gcc/stmt.c:1919
#1  0x1079240c in expand_sjlj_dispatch_table (dispatch_index=, dispatch_table=0x10fe6108) at /home/uweigand/fsf/gcc-head/gcc/stmt.c:2292
#2  0x104ac3c4 in sjlj_emit_dispatch_table (dispatch_label=0xeda03980,
num_dispatch=8) at /home/uweigand/fsf/gcc-head/gcc/except.c:1363
#3  0x104ac6f0 in sjlj_build_landing_pads () at
/home/uweigand/fsf/gcc-head/gcc/except.c:1420
#4  0x104acb4c in finish_eh_generation () at
/home/uweigand/fsf/gcc-head/gcc/except.c:1454
#5  0x103ddc24 in gimple_expand_cfg () at
/home/uweigand/fsf/gcc-head/gcc/cfgexpand.c:4579
#6  0x106e1608 in execute_one_pass (pass=0x10ec19b4) at
/home/uweigand/fsf/gcc-head/gcc/passes.c:2320
#7  0x106e1cb4 in execute_pass_list (pass=0x10ec19b4) at
/home/uweigand/fsf/gcc-head/gcc/passes.c:2381
#8  0x10406770 in expand_function (node=0xf1998e50) at
/home/uweigand/fsf/gcc-head/gcc/cgraphunit.c:1601
#9  0x10407b44 in expand_all_functions () at
/home/uweigand/fsf/gcc-head/gcc/cgraphunit.c:1705
#10 0x10408060 in compile () at
/home/uweigand/fsf/gcc-head/gcc/cgraphunit.c:2003
#11 0x1040942c in finalize_compilation_unit () at
/home/uweigand/fsf/gcc-head/gcc/cgraphunit.c:2080
#12 0x101c5fc0 in cp_write_global_declarations () at
/home/uweigand/fsf/gcc-head/gcc/cp/decl2.c:4286
#13 0x107a6ef4 in compile_file () at
/home/uweigand/fsf/gcc-head/gcc/toplev.c:560
#14 0x107a77ec in do_compile () at
/home/uweigand/fsf/gcc-head/gcc/toplev.c:1866
#15 0x107a85bc in toplev_main (argc=23, argv=0xffabf8c4) at
/home/uweigand/fsf/gcc-head/gcc/toplev.c:1942
#16 0x10c0f6d0 in main (argc=, argv=)
at /home/uweigand/fsf/gcc-head/gcc/main.c:36


[Bug rtl-optimization/54739] [4.8 regression] FAIL: gcc.dg/lower-subreg-1.c scan-rtl-dump subreg1 "Splitting reg"

2012-10-01 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54739

--- Comment #3 from Ulrich Weigand  2012-10-01 
12:16:53 UTC ---
It seems all three of those targets have an "iordi3" pattern that triggers even
for 32-bit compiles.  In this case, the lower-subreg pass now no longer splits
the register, so that the DImode pattern is actually used.  (Prior to my patch,
the register would have been split anyway.)

The test case is intended to run on 32-bit targets where an ior:DI operation is
supposed to be split; it will now fail on targets with an iordi3 pattern.

For those targets, I guess it's up to the target maintainers to decide whether:

- you want the iordi3 pattern to trigger since it gives better code than having
lower-subreg split the operation: in this case, just disable the test case for
your target (this is what I did for ARM)

or

- you'd really prefer to have lower-subreg split the operation, in which case
you should remove the iordi3 pattern
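
For reference, what "splitting" an ior:DI operation means at the source level
can be sketched as follows.  This is plain C written for illustration, with
invented function names; it is not what lower-subreg emits, only the same
arithmetic expressed on two 32-bit halves.

```c
#include <stdint.h>

/* The operation done in one piece, as a DImode iordi3 pattern would.  */
uint64_t
ior_di_whole (uint64_t a, uint64_t b)
{
  return a | b;
}

/* The same operation split into two independent 32-bit ORs, one per
   word, which is the form a 32-bit target without iordi3 ends up with.  */
uint64_t
ior_di_split (uint64_t a, uint64_t b)
{
  uint32_t lo = (uint32_t) a | (uint32_t) b;
  uint32_t hi = (uint32_t) (a >> 32) | (uint32_t) (b >> 32);
  return ((uint64_t) hi << 32) | lo;
}
```

Both functions compute the same value; the choice between them is purely a
code-quality question for the target, as described above.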


[Bug testsuite/49443] gcc.dg/vect/vect-peel-3.c and vect-peel-4.c fail on IA64 after testsuite change

2012-08-10 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49443

--- Comment #7 from Ulrich Weigand  2012-08-10 
13:26:51 UTC ---
Author: uweigand
Date: Fri Aug 10 13:26:44 2012
New Revision: 190296

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190296
Log:
ChangeLog:

Backport from mainline
2012-07-30  Ulrich Weigand  
Richard Earnshaw  

* target.def (vector_alignment): New target hook.
* doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook.
* doc/tm.texi: Regenerate.
* targhooks.c (default_vector_alignment): New function.
* targhooks.h (default_vector_alignment): Add prototype.
* stor-layout.c (layout_type): Use targetm.vector_alignment.
* config/arm/arm.c (arm_vector_alignment): New function.
(TARGET_VECTOR_ALIGNMENT): Define.

* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use
vector type alignment instead of size.
* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use
element type size directly instead of computing it from alignment.
Fix variable naming and comment.


testsuite/ChangeLog:

Backport from mainline
2012-07-30  Ulrich Weigand  

* lib/target-supports.exp
(check_effective_target_vect_natural_alignment): New function.
* gcc.dg/align-2.c: Only run on targets with natural alignment
of vector types.
* gcc.dg/vect/slp-25.c: Adjust tests for targets without natural
alignment of vector types.

2011-12-21  Michael Zolotukhin  

* gcc.dg/vect/vect-peel-1.c: Adjust test diag-scans to fix fail on AVX.
* gcc.dg/vect/vect-peel-2.c: Ditto.

2011-06-21  Ira Rosen  

PR testsuite/49443
* gcc.dg/vect/vect-peel-3.c: Expect to fail on vect_no_align
targets.
* gcc.dg/vect/vect-peel-4.c: Likewise.

2011-06-14  Ira Rosen  

* gcc.dg/vect/vect-peel-3.c: Adjust misalignment values
for double-word vectors.
* gcc.dg/vect/vect-peel-4.c: Likewise.

Modified:
branches/gcc-4_6-branch/gcc/ChangeLog
branches/gcc-4_6-branch/gcc/config/arm/arm.c
branches/gcc-4_6-branch/gcc/doc/tm.texi
branches/gcc-4_6-branch/gcc/doc/tm.texi.in
branches/gcc-4_6-branch/gcc/stor-layout.c
branches/gcc-4_6-branch/gcc/target.def
branches/gcc-4_6-branch/gcc/targhooks.c
branches/gcc-4_6-branch/gcc/targhooks.h
branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/align-2.c
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/slp-25.c
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/vect-peel-1.c
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/vect-peel-2.c
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/vect-peel-3.c
branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/vect-peel-4.c
branches/gcc-4_6-branch/gcc/testsuite/lib/target-supports.exp
branches/gcc-4_6-branch/gcc/tree-vect-data-refs.c
branches/gcc-4_6-branch/gcc/tree-vect-loop-manip.c


[Bug target/53854] ICE in find_constant_pool_ref

2012-07-04 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53854

--- Comment #2 from Ulrich Weigand  2012-07-04 
17:17:22 UTC ---
The problem with this is that there was a reason why I originally supported
only a single constant pool reference per instruction: there needs to be an
upper bound on the number of bytes consumed in the text section (code +
literals) by a single insn, otherwise the "pool chunkify" mechanism doesn't
work.

As an obvious extreme example, if the asm were to refer to more than 4096 bytes
of literals, it would be impossible to refer to them all using a single literal
pool base pointer.   As another obvious extreme example, if the asm *code* were
to span more than 4096 bytes, it would be impossible to have even a single
literal in the same chunk as the asm and thus be referenced from it (using the
current chunkify algorithm).
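
The 4096-byte limit follows from base + displacement addressing with a 12-bit
unsigned displacement, which is the assumption behind the sketch below.  The
names are made up for illustration, and the sketch ignores everything else the
real chunkify code has to track.

```c
/* Assumption: literals are addressed as base + displacement with a
   12-bit unsigned displacement, i.e. offsets 0 .. 4095.  */
#define MAX_DISPLACEMENT 4095UL

/* Can a literal at byte OFFSET from the pool base pointer be addressed?  */
int
literal_reachable (unsigned long offset)
{
  return offset <= MAX_DISPLACEMENT;
}

/* Can all literals of a pool of POOL_BYTES bytes be addressed from one
   base pointer placed at the start of the pool?  */
int
pool_fully_reachable (unsigned long pool_bytes)
{
  return pool_bytes <= MAX_DISPLACEMENT + 1;
}
```

A pool of exactly 4096 bytes is still fully reachable under this model; one
more literal appended behind it is not.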

All this is less of an issue in 64-bit code since it's much easier to address
literals, but we should still be correct for 31-bit on old machines too ...

Why do the literals end up in the pool anyway in your example, as opposed to a
register?  They did with older compilers; this seems to have changed recently
due to different IRA cost computations ...   Maybe it would be better to
prevent asm statements from generating pool references; it is likely that the
asm will expect this to happen anyway?


[Bug regression/53729] [4.8 regression] PR53636 fix caused bb-slp-16.c to FAIL on sparc64 and powerpc64

2012-06-26 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53729

Ulrich Weigand  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED

--- Comment #3 from Ulrich Weigand  2012-06-26 
09:09:28 UTC ---
Fixed.


[Bug regression/53729] [4.8 regression] PR53636 fix caused bb-slp-16.c to FAIL on sparc64 and powerpc64

2012-06-26 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53729

--- Comment #2 from Ulrich Weigand  2012-06-26 
09:05:55 UTC ---
Author: uweigand
Date: Tue Jun 26 09:05:48 2012
New Revision: 188979

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188979
Log:
PR tree-optimization/53729
PR tree-optimization/53636
* tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to
vect_verify_datarefs_alignment until after statements have
been marked as relevant/irrelevant.
* tree-vect-data-refs.c (vect_verify_datarefs_alignment):
Skip irrelevant statements.
(vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P
instead of STMT_VINFO_RELEVANT.
(vect_get_data_access_cost): Do not check for supportable
alignment before calling vect_get_load_cost/vect_get_store_cost.
* tree-vect-stmts.c (vect_get_store_cost): Do not abort when
handling unsupported alignment.
(vect_get_load_cost): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-data-refs.c
trunk/gcc/tree-vect-slp.c
trunk/gcc/tree-vect-stmts.c


[Bug tree-optimization/53636] SLP may create invalid unaligned memory accesses

2012-06-26 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53636

--- Comment #3 from Ulrich Weigand  2012-06-26 
09:05:56 UTC ---
Author: uweigand
Date: Tue Jun 26 09:05:48 2012
New Revision: 188979

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188979
Log:
PR tree-optimization/53729
PR tree-optimization/53636
* tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to
vect_verify_datarefs_alignment until after statements have
been marked as relevant/irrelevant.
* tree-vect-data-refs.c (vect_verify_datarefs_alignment):
Skip irrelevant statements.
(vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P
instead of STMT_VINFO_RELEVANT.
(vect_get_data_access_cost): Do not check for supportable
alignment before calling vect_get_load_cost/vect_get_store_cost.
* tree-vect-stmts.c (vect_get_store_cost): Do not abort when
handling unsupported alignment.
(vect_get_load_cost): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-data-refs.c
trunk/gcc/tree-vect-slp.c
trunk/gcc/tree-vect-stmts.c


[Bug rtl-optimization/53706] [4.8 Regression] Bootstrap failure due to "Invalid write of size 8 at 0xBDC35E: variable_htab_free (var-tracking.c:1418)

2012-06-21 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53706

Ulrich Weigand  changed:

   What|Removed |Added

 CC||uweigand at gcc dot gnu.org

--- Comment #11 from Ulrich Weigand  2012-06-21 
13:18:53 UTC ---
I'm seeing what appears to be a similar issue bootstrapping powerpc64:

*** glibc detected ***
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus: corrupted
double-linked list: 0x11afd980 ***
=== Backtrace: =
/lib64/power6x/libc.so.6[0x80f2ff9bd0]
/lib64/power6x/libc.so.6[0x80f2ffba4c]
/lib64/power6x/libc.so.6(cfree-0x117fc8)[0x80f2ffc050]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z16empty_alloc_poolP14alloc_pool_def-0xd186ac)[0x1035da6c]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z15free_alloc_poolP14alloc_pool_def-0xd18638)[0x1035daf8]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus[0x109b81dc]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z22variable_tracking_mainv-0x6e1734)[0x109cd57c]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z16execute_one_passP8opt_pass-0xa1248c)[0x1067f7cc]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z17execute_pass_listP8opt_pass-0xa1203c)[0x1067fc34]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z17execute_pass_listP8opt_pass-0xa12024)[0x1067fc4c]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z17execute_pass_listP8opt_pass-0xa12024)[0x1067fc4c]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus[0x103e4ea8]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z7compilev-0xc935c4)[0x103e7474]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z25finalize_compilation_unitv-0xc92e84)[0x103e7bcc]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z28cp_write_global_declarationsv-0xeb57fc)[0x101b5414]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus[0x1074b894]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(_Z11toplev_mainiPPc-0x94c944)[0x1074da44]
/home/uweigand/fsf/gcc-head-build-ppu/./prev-gcc/cc1plus(main-0x436788)[0x10c8d4b0]
/lib64/power6x/libc.so.6[0x80f2f99d34]
/lib64/power6x/libc.so.6(__libc_start_main-0x176cf0)[0x80f2f99fd0]


[Bug regression/53729] [4.8 regression] PR53636 fix caused bb-slp-16.c to FAIL on sparc64 and powerpc64

2012-06-20 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53729

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2012-06-20
 AssignedTo|unassigned at gcc dot   |uweigand at gcc dot gnu.org
   |gnu.org |
 Ever Confirmed|0   |1

--- Comment #1 from Ulrich Weigand  2012-06-20 
15:22:45 UTC ---
The problem is that SLP tests *all* accesses within the basic block for
alignment, even those that aren't actually part of a SLP instance.

This is of course broken, but that bug had been hidden by the PR53636 problem
(due to which accesses were considered aligned that actually are not).

The fix for this problem is to only check *relevant* accesses for alignment. 
(This requires moving the alignment check until after relevant statements are
actually marked ...)

I'm testing a fix.
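
To illustrate the difference between the two checks, here is a minimal
sketch in plain C.  It does not use the vectorizer's real data structures:
"struct access", its fields, and both function names are hypothetical and
only model the idea of skipping accesses that are not part of an SLP
instance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one memory access in a basic block.  */
struct access
{
  size_t misalignment;  /* Byte offset from the required alignment.  */
  int relevant;         /* Nonzero if part of an SLP instance.  */
};

/* Old behaviour as described above: reject the block if *any* access
   is misaligned, relevant or not.  */
int
all_accesses_aligned (const struct access *a, size_t n)
{
  size_t i;

  assert (a != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (a[i].misalignment != 0)
      return 0;
  return 1;
}

/* Intended behaviour: only accesses marked relevant are checked.  */
int
relevant_accesses_aligned (const struct access *a, size_t n)
{
  size_t i;

  assert (a != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (a[i].relevant && a[i].misalignment != 0)
      return 0;
  return 1;
}
```

With one aligned relevant access and one misaligned irrelevant access, the
first check rejects the block while the second accepts it.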


[Bug tree-optimization/53636] SLP may create invalid unaligned memory accesses

2012-06-15 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53636

--- Comment #2 from Ulrich Weigand  2012-06-15 
15:11:51 UTC ---
Now fixed on mainline; still fails on 4.7.

(While the bug is probably latent even earlier, this particular test case does
not crash on 4.6.)


[Bug tree-optimization/53636] SLP may create invalid unaligned memory accesses

2012-06-15 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53636

--- Comment #1 from Ulrich Weigand  2012-06-15 
13:30:40 UTC ---
Author: uweigand
Date: Fri Jun 15 13:30:36 2012
New Revision: 188661

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188661
Log:
gcc/
PR tree-optimization/53636
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
stride when doing basic-block vectorization.

gcc/testsuite/
PR tree-optimization/53636
* gcc.target/arm/pr53636.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/arm/pr53636.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-data-refs.c


[Bug tree-optimization/53636] SLP may create invalid unaligned memory accesses

2012-06-11 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53636

Ulrich Weigand  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2012-06-11
 AssignedTo|unassigned at gcc dot   |uweigand at gcc dot gnu.org
   |gnu.org |
 Ever Confirmed|0   |1


[Bug tree-optimization/53636] New: SLP may create invalid unaligned memory accesses

2012-06-11 Thread uweigand at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53636

 Bug #: 53636
   Summary: SLP may create invalid unaligned memory accesses
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: uweig...@gcc.gnu.org


The following test case:

void test (unsigned char *dst)
{
  short tmp[11 * 8], *tptr;
  int i;

  fill (tmp);

  tptr = tmp;
  for (i = 0; i < 8; i++)
    {
      dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
      dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
      dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
      dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
      dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
      dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
      dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
      dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;

      dst += 8;
      tptr += 11;
    }
}

when built on ARM with -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -O
-ftree-vectorize creates code that uses a VLDR instruction to access unaligned
memory, which causes a Bus error at runtime.

The problem seems to be that the check in vect_compute_data_ref_alignment is
not enough for SLP.  Even though SLP only considers a basic block, the data-ref
analysis still looks at innermost loops to compute scalar evolutions.  This
results in concluding that the access "tptr[0]" is based on "tmp", which is
aligned to 8 bytes, using a step of 22 bytes.

The alignment check currently only verifies that the *base* is aligned.  This is OK
if we're actually vectorizing the loop.  But in the SLP case, we really need to
verify instead that the access is aligned on *every* iteration through the loop
...
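
The arithmetic can be checked with a small standalone sketch.  It is not
the vectorizer code; the function names and parameters are hypothetical
and only model a base misalignment, a byte step, and a required alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Models a check that looks at the base only.  */
int
base_is_aligned (size_t base_misalign, size_t align)
{
  assert (align > 0);
  return base_misalign % align == 0;
}

/* Models a check for the SLP case: the address
   base + i * step must be aligned for every iteration i.  */
int
aligned_on_every_iteration (size_t base_misalign, size_t step,
                            size_t align, unsigned niters)
{
  unsigned i;

  assert (align > 0);
  for (i = 0; i < niters; i++)
    if ((base_misalign + (size_t) i * step) % align != 0)
      return 0;
  return 1;
}
```

For the test case above (base aligned to 8 bytes, step 22 bytes, 8
iterations) the byte offsets are 0, 22, 44, 66, 88, 110, 132, 154, i.e.
0, 6, 4, 2, 0, 6, 4, 2 modulo 8.  The base-only check succeeds, but the
access is 8-byte aligned only on iterations 0 and 4, so the per-iteration
check fails.  It would succeed if the step were a multiple of the
alignment, e.g. 16.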

