from:"segher at gcc dot gnu.org"

[Bug target/97329] POWER9 default cache and line sizes appear to be wrong

2021-03-23 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329

--- Comment #10 from Segher Boessenkool  ---
GCC 11 stage 4 will be fine.

I doubt you can ever measure a difference, but you can try :-)

[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux

2021-03-23 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #3 from Segher Boessenkool  ---
The only such __SIZEOF_* macro that is not about a standards-required type
is for int128.  Not the best example ;-)

There are no predefines for __SIZEOF_FLOAT128__ etc. either.

In an ideal world the user can just assume those types exist always.  In a
less ideal world, use autoconf?  You have to anyway, if you want to support
older compilers at all.
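In the "use autoconf" world, the configure script compiles a small test program and defines a macro only if the type actually works. Here is a minimal sketch of the consumer side. It assumes a hypothetical HAVE_INT128 macro set by such a check, and mulhi_u64 and mulhi_u64_portable are also hypothetical names. The code uses unsigned __int128 when the check passed and a portable fallback otherwise:

```c
#include <assert.h>
#include <stdint.h>

/* Portable high half of a 64x64->128-bit unsigned multiply,
   built from 32-bit partial products.  */
static uint64_t mulhi_u64_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Sum of the middle 32-bit column; its carry goes to the high word.
       This cannot overflow 64 bits.  */
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* HAVE_INT128 is hypothetical: a configure-time check would define it
   after successfully compiling a program that uses unsigned __int128.  */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
#ifdef HAVE_INT128
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
#else
    return mulhi_u64_portable(a, b);
#endif
}
```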

[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux

2021-03-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #1 from Segher Boessenkool  ---
Yes, the __SIZEOF_* macros do not say whether some type can be used.  This is
true for all targets!

What would it be useful for to define these macros?  They all are equivalent to

#define SIXTEEN 16

:-)

[Bug testsuite/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2021-03-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

Segher Boessenkool  changed:

   What|Removed |Added

  Component|target  |testsuite
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Segher Boessenkool  ---
Fixed.

[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526

2021-03-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #14 from Segher Boessenkool  ---
Well, V=m-o (not the same thing, these are sets) -- but, it is clear that "o"
should be a subset of "m":

(define_memory_constraint "TARGET_MEM_CONSTRAINT"
  "Matches any valid memory."

(define_memory_constraint "o"
  "Matches an offsettable memory reference."

So yeah, it should get the memory_address_addr_space_p thing.
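A toy model of the subset relation helps here. It assumes a plain base+displacement address that is valid when the displacement fits in a signed 16-bit field. All function names are hypothetical, and this is not GCC's recog code. The point is that an "o" check that skips the "m" check can accept an address that "m" rejects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy "m": base register + displacement D, valid if D fits in 16 bits.  */
static bool valid_mem_p(int64_t d)
{
    return d >= -32768 && d <= 32767;
}

/* Toy "o": the whole SIZE-byte access must be reachable with small
   offsets.  Doing the "m" check first keeps "o" a subset of "m".  */
static bool offsettable_mem_p(int64_t d, int64_t size)
{
    assert(size > 0);
    return valid_mem_p(d) && valid_mem_p(d + size - 1);
}

/* Hypothetical broken variant: only checks the far end of the access,
   so it can accept addresses that valid_mem_p rejects.  */
static bool offsettable_mem_buggy_p(int64_t d, int64_t size)
{
    assert(size > 0);
    return valid_mem_p(d + size - 1);
}
```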

[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2021-03-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

--- Comment #5 from Segher Boessenkool  ---
It helps if you test the compiler you just built, not something old.  Sigh.

Patch is testing.

[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2021-03-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

Segher Boessenkool  changed:

   What|Removed |Added

   Assignee|acsawdey at gcc dot gnu.org|segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
That is not where the UNGE and UNLE come from.  I have no idea where they
*do* come from though :-/

[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn) since r11-4623

2021-03-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092

Segher Boessenkool  changed:

   What|Removed |Added

  Attachment #50040 is obsolete|0   |1

--- Comment #6 from Segher Boessenkool  ---
Created attachment 50401
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50401&action=edit
Patch

[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526

2021-03-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #7 from Segher Boessenkool  ---
From the offending patch:

-/* Return true if the eliminated form of AD is a legitimate target address.  */
+/* Return true if the eliminated form of AD is a legitimate target address.
+   If OP is a MEM, AD is the address within OP, otherwise OP should be
+   ignored.  CONSTRAINT is one constraint that the operand may need
+   to meet.  */
 static bool
-valid_address_p (struct address_info *ad)
+valid_address_p (rtx op, struct address_info *ad,
+enum constraint_num constraint)

The addition of those extra args makes clear that the function is no
longer just testing if it is a valid address.  It should be renamed.
And perhaps most callers should still use the old version, the one that
actually tests if something is a valid address?

[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557

2021-03-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496

--- Comment #13 from Segher Boessenkool  ---
Hi Nathan,

I think you didn't push the branch that is on?

[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526

2021-03-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #5 from Segher Boessenkool  ---
Thanks Vladimir.  It is indeed a problem in LRA (or triggered by it).
We have
8: {[r121:DI+low(unspec[`*.LANCHOR0',%2:DI]
47+0x92a4)]=asm_operands;clobber

so this offset is too big for a machine instruction; those can only take
displacements in -32768..32767.

Changing the constraint to "m" you get in LRA
Inserting insn reload before:
   13: r121:DI=high(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)

but this doesn't happen if you keep it "o", and it dies later.
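The displacement 0x92a4 (37540) is outside -32768..32767, so it has to be split. The high part goes into a register (the high(...) set that LRA inserts with "m"), and only a signed 16-bit low part stays in the memory reference. Here is a minimal sketch of that split on a plain integer offset, in the style of the assembler's @ha/@l operators. The helper names are hypothetical, and the real address here is a symbolic unspec plus offset:

```c
#include <assert.h>
#include <stdint.h>

/* Low part: the low 16 bits of X read as a signed value; this is what
   fits in the instruction's displacement field (like @l).  */
static int64_t lo16(int64_t x)
{
    int64_t lo = (int64_t)((uint64_t)x & 0xffff);
    return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* High part, adjusted for the sign of the low part (like @ha), so that
   ha16 (x) * 0x10000 + lo16 (x) == x.  */
static int64_t ha16(int64_t x)
{
    return (x - lo16(x)) / 0x10000;   /* exact division */
}
```

For 0x92a4 the low part is negative (-27996), so the high part is rounded up to 1.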

[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong

2021-03-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Segher Boessenkool  ---
commit c60ad1c5fe0249f48362be0f989184ca447f9d17
Author: Segher Boessenkool 
Date:   Wed Mar 3 20:34:32 2021 +

rs6000: Fix check_effective_target_sqrt_insn (PR99352)

The previous version returned true for all PowerPC.  This is incorrect.
We only support floating point square root instructions if a) we support
floating point instructions at all, and b) we have _ARCH_PPCSQ defined.

2020-03-09  Segher Boessenkool  

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_powerpc_sqrt): New.
(check_effective_target_sqrt_insn): Use it.

[Bug target/98959] ICE in extract_constrain_insn, at recog.c:2670

2021-03-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98959

--- Comment #20 from Segher Boessenkool  ---
(In reply to Bill Schmidt from comment #14)
> We should definitely not be allowing the AltiVec "& ~16" flavors into these
> patterns.  I'm not certain whether your fix is the best way to achieve that,
> but it could well be; I'll defer to Segher on that.

Hey, it works, so it is okay for now at least.  Longer term we should
probably think of something more elegant and less failure-prone.

[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557

2021-03-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
Just FYI:

There are four Power Linux systems in the cfarm (as well as some AIX).

gcc110  POWER7  BE
gcc203  POWER8  BE
gcc112  POWER8  LE
gcc135  POWER9  LE

The last one is by far the most powerful of these.

[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong

2021-03-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

--- Comment #3 from Segher Boessenkool  ---
rs6000 has check_effective_target_powerpc_fprs already (with slightly
different semantics).

[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong

2021-03-02 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Segher Boessenkool  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Target||powerpc*-*-*
   Last reconfirmed||2021-03-02
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Segher Boessenkool  ---
Mine.

[Bug testsuite/99352] New: check_effective_target_sqrt_insn for powerpc is wrong

2021-03-02 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Bug ID: 99352
   Summary: check_effective_target_sqrt_insn for powerpc is wrong
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

It just says
  [istarget powerpc*-*-*]
but it should test whether the preprocessor symbol "_ARCH_PPCSQ" is defined.
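In C terms, the test being asked for is a preprocessor check on _ARCH_PPCSQ. Below is a minimal sketch, where have_sqrt_insn is a hypothetical name. The actual fix is the Tcl predicate in target-supports.exp, and the commit also requires hardware floating point:

```c
#include <assert.h>
#include <stdbool.h>

/* True only when the compiler says the target has hardware square root;
   the rs6000 port defines _ARCH_PPCSQ in that case.  */
static bool have_sqrt_insn(void)
{
#ifdef _ARCH_PPCSQ
    return true;
#else
    return false;
#endif
}
```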

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-03-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #9 from Segher Boessenkool  ---
The i386 port has

===
(define_insn "trap"
  [(trap_if (const_int 1) (const_int 6))]
  ""
{
#ifdef HAVE_AS_IX86_UD2
  return "ud2";
#else
  return ASM_SHORT "0x0b0f";
#endif
}
  [(set_attr "length" "2")])
===

which implements __builtin_trap, and can implement __builtin_trap_no_abort
just fine as well, if your OS kernel (or similar) can return after a ud2.

If clang uses terribly confusing names (or semantics, or syntax, etc.) we
should not copy that from them.  *Especially* when that already conflicts
with names they copied from us.

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-03-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #7 from Segher Boessenkool  ---
(In reply to Franz Sirl from comment #5)
> For the naming I suggest __builtin_debugtrap() to align with clang. Maybe
> with an aliased __debugbreak() on Windows platforms.

Those are terrible names.  This would *not* be used more often than
__builtin_trap, for debugging.

In general, builtins should say what they *do*, not what you imagine they
will be used for.

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-03-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #6 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #4)
> I'm not sure what your proposed not noreturn trap() would do in terms of
> IL semantics compared to a not specially annotated general call?

Nothing I think?  But __builtin_trap *is* very different: it ends BBs.

> "recoverable" likely means resuming after the trap, not on an exception
> path (so it'll not be a throw())?

"recoverable" is super unclear.  For example, on Power the hardware has a
concept of "recoverable interrupt", which sets MSR[RI]=1, and traps never do.
This is a very different concept from what is wanted here, which has nothing
to do with recoverability, and is simply about not being an abort() (which
__builtin_trap *is*!)

> The only thing that might be useful to the middle-end would be marking
> the function as not altering the memory state.  But I suppose it should
> still serve as a barrier for code motion of both loads and stores, even
> of those loads/stores are known to not trap.  The only magic we'd have
> for this would be __attribute__((const,returns_twice)).  Which likely
> will be more detrimental to general optimization.
> 
> So - what's the "sub-optimal code generation" you refer to from the
> (presumably) volatile asm() you use for the trap?
> 
> [yeah, asm() on GIMPLE is less optimized than a call]

The rs6000 backend can optimise the used instructions: we have trap_if
instructions, both with registers and with immediates.  A single
instruction can do a comparison and a conditional trap.  This works great
with __builtin_trap, *if* the kernel's trap handler has abort() semantics.

__builtin_trap_no_abort() maybe?
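The pattern meant here is a compare followed by a trap. Below is a minimal sketch, where checked_div is a hypothetical example function. On rs6000, GCC can emit the test plus the __builtin_trap as one conditional trap instruction (tw/td family). Code after the trap is treated as unreachable, which is fine only if the trap handler has abort() semantics:

```c
#include <assert.h>

/* Compare + trap.  Because __builtin_trap is noreturn, the compiler
   may assume execution never continues past it.  */
static int checked_div(int a, int b)
{
    if (b == 0)
        __builtin_trap();
    return a / b;
}
```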

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-02-27 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #3 from Segher Boessenkool  ---
Ah, thank you.  Well except there is no keyword called that?

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-02-27 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-02-27
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
[ Do we not have a keyword for feature requests, btw?  I don't see one. ]

The only thing needed for GCC is to have a __builtin_trap_no_noreturn (or
something with a less horrible name ;-) ), that does exactly that: it's the
same as __builtin_trap, just not noreturn.  This is useful on most
architectures, not just PowerPC.

[Bug target/93353] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)

2021-02-27 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93353

--- Comment #9 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #7)
>   if (low_int >= 0x8000 - extra)
> is not true and 0x7fff - -1 is 0x8000 (with UB on the compiler side).

These are HWIs, so there is no UB.

> But also
>   && ((unsigned HOST_WIDE_INT) (INTVAL (XEXP (x, 1)) + 0x8000)
> can invoke UB in the compiler, shouldn't it be just
>   && ((UINTVAL (XEXP (x, 1)) + 0x8000)
> ?

That sounds right, yes.
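The idiom here checks a signed 16-bit range with a single unsigned compare. Below is a minimal sketch, with int64_t/uint64_t standing in for HOST_WIDE_INT and unsigned HOST_WIDE_INT. That substitution is an assumption for the example, and fits_s16 is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if V fits in a signed 16-bit displacement (-32768..32767).
   Adding 0x8000 maps that range onto 0..0xffff.  The addition is done
   in uint64_t, which wraps modulo 2^64 instead of overflowing, so it is
   well defined even for V near INT64_MAX (unlike a signed V + 0x8000).  */
static bool fits_s16(int64_t v)
{
    return (uint64_t)v + 0x8000 < 0x10000;
}
```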

[Bug target/93353] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)

2021-02-27 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93353

--- Comment #8 from Segher Boessenkool  ---
(In reply to Arseny Solokha from comment #5)
> (In reply to Segher Boessenkool from comment #4)
> > I cannot get the reduced testcase to fail.  Are any special options needed?
> 
> If you've been asking me:

Well you reported this, so probably?  :-)

> no, the compiler invocation posted in comment 0 is
> explicit, but maybe you need -m32 -f{,no-}PIC -f{,no-}stack-protector which
> one often does for reproducing my PRs because of the configuration I use.

Maybe that is why I so often cannot reproduce your PRs?  Please always state
the exact compiler configuration / invocation needed.

> I'll certainly test the current snapshot, but I won't be able to do so at
> least one more week.

Jakub in comment 7 seems to have found the problem.  I cc:ed Alan, who did
4c69e61f4307, which seems to have fixed it on trunk (there is no PR for that
so far?)

[Bug middle-end/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10

2021-02-26 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-02-27
 Ever confirmed|0   |1

--- Comment #2 from Segher Boessenkool  ---
It generates non-optimal code for older CPUs as well (it does two splats
instead of one):

xxpermdi 0,35,35,3   # 7[c=4 l=4]  vsx_extract_v2di/1
xxpermdi 35,0,0,0# 9[c=4 l=4]  vsx_splat_v2di_reg/0
vrlq 2,2,3

This is because we get things like

Trying 7 -> 9:
7: r117:DI=vec_select(r127:V1TI#0,parallel)
  REG_DEAD r127:V1TI
9: r124:V2DI=vec_duplicate(r117:DI)
  REG_DEAD r117:DI
Failed to match this instruction:
(set (reg:V2DI 124)
(vec_duplicate:V2DI (vec_select:DI (subreg:V2DI (reg:V1TI 127) 0)
(parallel [
(const_int 0 [0])
]

(the patterns we do have use vec_concat instead).

Confirmed.
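The operation that combine fails to match is "take element 0 and broadcast it to both lanes". Below is a minimal sketch of that operation using GCC's generic vector extension rather than the AltiVec vec_splat intrinsic from the PR; splat_elt0 is a hypothetical name. Ideally this becomes one permute/splat instruction, not an extract followed by a separate splat:

```c
#include <assert.h>

typedef long long v2di __attribute__((vector_size(16)));

/* Extract element 0, then duplicate it into both lanes.  */
static v2di splat_elt0(v2di x)
{
    long long e = x[0];
    return (v2di){ e, e };
}
```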

[Bug target/93353] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)

2021-02-26 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93353

--- Comment #4 from Segher Boessenkool  ---
I cannot get the reduced testcase to fail.  Are any special options needed?

[Bug bootstrap/98181] Add support for FreeBSD on powerpc64le

2021-02-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98181

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #3 from Segher Boessenkool  ---
I should have looked if there was a PR for this, sorry.  This was:


commit 2a4183234a45ba28db5ce16cf3ccdd70cdef3b7c
Author: Piotr Kubaj 
AuthorDate: Wed Dec 16 22:26:18 2020 +
Commit: Segher Boessenkool 
CommitDate: Wed Dec 16 22:54:51 2020 +

rs6000: Add support for powerpc64le-unknown-freebsd

This implements support for powerpc64le architecture on FreeBSD.  Since
we don't have powerpcle (32-bit), I did not add support for powerpcle
here. This remains to be changed if there is powerpcle support in the
future.

2020-12-15  Piotr Kubaj  

gcc/
* config.gcc (powerpc*le-*-freebsd*): Add.
* configure.ac (powerpc*le-*-freebsd*): Ditto.
* configure: Regenerate.
* config/rs6000/freebsd64.h (ASM_SPEC_COMMON): Use ENDIAN_SELECT.
(DEFAULT_ASM_ENDIAN): Add little endian support.
(LINK_OS_FREEBSD_SPEC64): Ditto.

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-02-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #26 from Segher Boessenkool  ---
Can you show the code you tried in comment 23?  It is near impossible to see
what happened there without that.

[Bug tree-optimization/99068] Missed PowerPC lhau optimization

2021-02-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068

--- Comment #8 from Segher Boessenkool  ---
Using update form instructions constrains register allocation and scheduling.
It is *not* always a good idea.

That is one of the reasons why we currently use update form instructions only
when insns just happen to land close together (in the same basic block).  See
auto-inc-dec.c.

We also do some work to make it more likely that loops will use these
constructs.  We don't go out of our way to use update form insns at the cost
of everything else.  You will see them more often if you use -Os or -O1.

[Bug tree-optimization/99068] Missed PowerPC lhau optimization

2021-02-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068

--- Comment #6 from Segher Boessenkool  ---
(In reply to Brian Grayson from comment #4)
> (In reply to Segher Boessenkool from comment #3)
> > Then you get
> > 
> > addi 9,9,-2
> > lhau 10,2(9)
> > addi 9,9,2
> > 
> > which is worse than just
> > 
> > lha 10,0(9)
> > addi 9,9,2
> 
> Why is the second addi needed, in your example?

Because I typoed it.

> And note that if a
> pre-decrement "addi 9,9,-2" is needed to pre-bias the pointer, it is done
> once outside the loop, and not in every iteration of the loop.

You cannot do that without changing the loop structure.  There are various
non-trivial other paths into and out of the loop body.

Since ivopts has decided to not use pre-increment here (because it is more
expensive than not using it), we do not use it.

Why do you think having a lhau is better?

[Bug tree-optimization/99068] Missed PowerPC lhau optimization

2021-02-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Segher Boessenkool  ---
a) The code does not compile (it is not complete source code, some includes
are needed);
b) When you fix that, the compiler will tell you it is invalid code (you
shadow "a" with an incompatible type);
c) You do not say which target you used.

So let's try this (on powerpc64-linux, -O3 since you want that, everything
else default):

===
#include <stdint.h>

int found_zero_ptr(int16_t *a, int N)
{
for (int16_t *p = a; p < a + N; p++)
if (*p == 0)
return 1;
return 0;
}
===

which as core has

===
.L21:
lha 9,2(3)
addi 3,3,4
cmpld 7,3,4
cmpwi 0,9,0
beq 0,.L5
bge 7,.L4
.L3:
lha 9,0(3)
cmpwi 0,9,0
bne 0,.L21
===

You cannot use lhau in this without making the generated code worse.

(Marking this invalid *again*.)

[Bug tree-optimization/99068] Missed PowerPC lhau optimization

2021-02-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Segher Boessenkool  ---
Then you get

addi 9,9,-2
lhau 10,2(9)
addi 9,9,2

which is worse than just

lha 10,0(9)
addi 9,9,2

[Bug target/98468] [9 regression] test case gcc.target/powerpc/rlwimi-2.c fails starting with r9-3594

2021-02-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98468

--- Comment #3 from Segher Boessenkool  ---
git tag -l 'releases*' --contains 8d2d39587d94

[Bug target/99048] __gcc_qadd produces spurious NaN

2021-02-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99048

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-02-12
 Ever confirmed|0   |1

--- Comment #3 from Segher Boessenkool  ---
Yup, something like that.  It should not have any infinities here afaics.

[Bug target/99048] __gcc_qadd produces spurious NaN

2021-02-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99048

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #1 from Segher Boessenkool  ---
IBM long double ("double-double") is not an IEEE floating point format,
so all these rules do not apply, but you are right it is surprising.
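A double-double value is the unevaluated sum hi + lo of two doubles. Below is a textbook sketch of the error-free TwoSum step that such additions build on. This is not libgcc's __gcc_qadd, and two_sum is a hypothetical name. It also shows one generic way a NaN can appear: if a + b overflows, the error term becomes inf - inf. Whether that is what happens in this PR is not established here:

```c
#include <assert.h>

/* TwoSum (Knuth): *s is the rounded sum, *err the exact rounding error,
   so a + b == *s + *err exactly, unless a + b overflows.  */
static void two_sum(double a, double b, double *s, double *err)
{
    double sum = a + b;
    double bb = sum - a;
    *err = (a - (sum - bb)) + (b - bb);
    *s = sum;
}
```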

[Bug tree-optimization/99068] Missed PowerPC lhau optimization

2021-02-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID
 CC||segher at gcc dot gnu.org

--- Comment #1 from Segher Boessenkool  ---
Because it would be incorrect?  lhau is pre-modify (like all update
form instructions).
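To make the pre-modify semantics concrete, here is a minimal C model of what "lhau rt,d(ra)" does (lhau_model is a hypothetical helper). The new address is written back to the base register before the load happens. With d == 2, it behaves like *++p, not *p++:

```c
#include <assert.h>
#include <stdint.h>

/* EA = ra + d; ra = EA; rt = sign-extended halfword at EA.  */
static int lhau_model(const int16_t **ra, long d)
{
    const char *ea = (const char *)*ra + d;
    *ra = (const int16_t *)ea;   /* base register updated first */
    return **ra;                 /* then the (sign-extended) load */
}
```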

[Bug target/99041] combine creates invalid address which ICEs in decompose_normal_address

2021-02-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99041

--- Comment #7 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #6)
> The mma_assemble_pair/mma_assemble_acc patterns both generate lxv or lxvp
> at, which both use a DQ offset and we already have function to
> test for that.  The following change fixes the ICE, so I'll give it a spin
> on regtesting.

That looks fine; if that is the only change you need it is pre-approved
for trunk.  Thanks!

[Bug rtl-optimization/98986] Try matching both orders of commutative RTX operations when there is no canonical order

2021-02-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986

--- Comment #6 from Segher Boessenkool  ---
(In reply to rguent...@suse.de from comment #4)
> So this is where the "autogenerated" part comes in.  We should have
> an idea what might be useful and what isn't even worth trying by
> looking at the machine description (which might require exposing
> costs in such form for this case of constants).
> 
> For commutative operands maybe recog itself can be relaxed and
> accept the insn with the "wrong" commutation (or fix it up
> itself) for example.  Or maybe genrecog can magically emit
> commutated variants (like genmatch does for :c annotated
> expression branches).

We could probably derive what things in an RTL expression are commutative (even
if there are many quantities in play), but only allowing the canonical forms in
that is a daunting task.  Something like :c could help; we already have % in
RTL, but we need something more general than that (examples: a+b+c and a*b+c*d
should both be handled some way, since such cases (structure, not necessarily
those exact ops) happen a lot in practice).

[Bug rtl-optimization/98986] Try matching both orders of commutative RTX operations when there is no canonical order

2021-02-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986

--- Comment #5 from Segher Boessenkool  ---
(In reply to rsand...@gcc.gnu.org from comment #3)
> FWIW, another similar thing I've wanted in the past is to try
> recognising multiple possible constants in an (and X (const_int N))
> when X is known to have some bits clear.  Often we try to make N contain
> as few bits as possible, but that can give worse results than a fuller mask.

This could be done in the target machine description, where it makes a lot more
sense to do anyway, *if* nonzero_bits was generally usable there.  I have Plans
for that for GCC 12, but don't depend on it please :-)
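The quoted idea works like this: once some bits of X are known to be zero, several constants N give the same (and X N). Below is a minimal sketch of that equivalence test. The nonzero mask stands in for what nonzero_bits would report, and and_masks_equivalent is a hypothetical name, not combine's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NONZERO has a 0 bit wherever X is known to be zero.  Then
   (and X N1) == (and X N2) whenever N1 and N2 agree on every bit
   that can be nonzero in X.  */
static bool and_masks_equivalent(uint64_t n1, uint64_t n2, uint64_t nonzero)
{
    return ((n1 ^ n2) & nonzero) == 0;
}
```

For example, with X = y << 2 the low two bits are known zero, so 0xfc and the fuller 0xff are interchangeable.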

[Bug rtl-optimization/98692] Unitialized Values reported only with -Os

2021-02-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692

--- Comment #24 from Segher Boessenkool  ---
I do see the problems for savegpr/restgpr with that suggestion, but maybe
something in that vein can be done.

[Bug rtl-optimization/98692] Unitialized Values reported only with -Os

2021-02-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692

--- Comment #23 from Segher Boessenkool  ---
savegpr/restgpr are special ABI-defined functions that do not follow all the
same calling conventions as normal functions.  They indeed write into the
parent's frame (the red zone, in this case).

Maybe you should allow this always when a function has not established a new
frame?  That always has to be done with a stdu 1,...(1) insn (in 64-bit; stwu
in 32-bit, but the 32-bit Linux ABI has no red zone anyway), so it probably
isn't too hard to detect.  Normally only leaf functions do not establish a new
frame (but establishing it can happen later in the function, esp. with
shrink-wrapping).

A frame is unstacked by most other things that write to r1, often addi 1,1,...
and sometimes ld 1,0(1) (there probably are other cases too that I am
forgetting here).  Maybe you should invalidate the red zone whenever r1 is
changed, instead?

[Bug rtl-optimization/98692] Unitialized Values reported only with -Os

2021-02-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692

--- Comment #16 from Segher Boessenkool  ---
(In reply to Mark Wielaard from comment #13)
> ==25741== Use of uninitialised value of size 8
> ==25741==at 0x1504: main (pr9862.C:16)

r4 is argv here
>0x14f0 <+16>:  ld  r3,0(r4)
r3 = argv[0];
>0x14f4 <+20>:  mr  r31,r4
r31 = argv; // because we need it after the call, save it in a non-volatile reg
>0x14f8 <+24>:  std r0,16(r1)
>0x14fc <+28>:  stdu r1,-48(r1)
>0x1500 <+32>:  bl  0x16b4 
The call; after this we have to load argv[0] again, the called function might
have changed it.
>0x1504 <+36>:  ld  r3,0(r31)
r3 = argv[0];

So it is funny that the exact same insn four insns earlier (in the program
text) worked fine, but this one fails.

The ABI says (taken from the ELFv1 ABI; the ELFv2 doc is not nice for
copy/paste):


Here is a sample implementation of _savegpr0_N and _restgpr0_N.

  _savegpr0_14:  std  r14,-144(r1)
  _savegpr0_15:  std  r15,-136(r1)
  _savegpr0_16:  std  r16,-128(r1)
  _savegpr0_17:  std  r17,-120(r1)
  _savegpr0_18:  std  r18,-112(r1)
  _savegpr0_19:  std  r19,-104(r1)
  _savegpr0_20:  std  r20,-96(r1)
  _savegpr0_21:  std  r21,-88(r1)
  _savegpr0_22:  std  r22,-80(r1)
  _savegpr0_23:  std  r23,-72(r1)
  _savegpr0_24:  std  r24,-64(r1)
  _savegpr0_25:  std  r25,-56(r1)
  _savegpr0_26:  std  r26,-48(r1)
  _savegpr0_27:  std  r27,-40(r1)
  _savegpr0_28:  std  r28,-32(r1)
  _savegpr0_29:  std  r29,-24(r1)
  _savegpr0_30:  std  r30,-16(r1)
  _savegpr0_31:  std  r31,-8(r1)
 std  r0, 16(r1)
 blr


  _restgpr0_14:  ld   r14,-144(r1)
  _restgpr0_15:  ld   r15,-136(r1)
  _restgpr0_16:  ld   r16,-128(r1)
  _restgpr0_17:  ld   r17,-120(r1)
  _restgpr0_18:  ld   r18,-112(r1)
  _restgpr0_19:  ld   r19,-104(r1)
  _restgpr0_20:  ld   r20,-96(r1)
  _restgpr0_21:  ld   r21,-88(r1)
  _restgpr0_22:  ld   r22,-80(r1)
  _restgpr0_23:  ld   r23,-72(r1)
  _restgpr0_24:  ld   r24,-64(r1)
  _restgpr0_25:  ld   r25,-56(r1)
  _restgpr0_26:  ld   r26,-48(r1)
  _restgpr0_27:  ld   r27,-40(r1)
  _restgpr0_28:  ld   r28,-32(r1)
  _restgpr0_29:  ld   r0, 16(r1)
 ld   r29,-24(r1)
 mtlr r0
 ld   r30,-16(r1)
 ld   r31,-8(r1)
 blr
  _restgpr0_30:  ld   r30,-16(r1)
  _restgpr0_31:  ld   r0, 16(r1)
 ld   r31,-8(r1)
 mtlr r0
 blr

So this is one function with many entry points you could say.  Maybe that is
what confused valgrind?
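The offsets in that listing follow a simple rule: rN is saved at -8 * (32 - N) from r1. That puts the save slots below the stack pointer, in the red zone. Below is a small sketch of the arithmetic, where savegpr0_slot is a hypothetical name. The lowest slot, -144 for r14, fits within the 288 bytes below r1 that the 64-bit ELF ABIs protect; the 288 figure is an assumption stated here, not taken from the listing:

```c
#include <assert.h>

/* Offset from r1 at which _savegpr0_N stores rN (N = 14..31).  */
static int savegpr0_slot(int n)
{
    assert(n >= 14 && n <= 31);
    return -8 * (32 - n);
}
```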

[Bug rtl-optimization/98692] Unitialized Values reported only with -Os

2021-02-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692

--- Comment #15 from Segher Boessenkool  ---
(In reply to Will Schmidt from comment #14)
> The _restgpr* and _savegpr* functions are not referenced when the test is
> built at other optimization levels.  (I've looked at disassembly from -O0 ..
> -O4).

Right, it is a size optimisation.

> I do note that the _restgpr and _savegpr functions are called differently. 
> savegpr is called with bl while the restgpr is called via a direct branch; i
> can't immediately tell if this is by design or if it is in error.

It is by design: these are special functions defined by the ABI, specifically
to save some code space.

[Bug rtl-optimization/99041] combine creates invalid address which ICEs in decompose_normal_address

2021-02-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99041

--- Comment #5 from Segher Boessenkool  ---
(As Jakub said; I'm just slow).

[Bug rtl-optimization/99041] combine creates invalid address which ICEs in decompose_normal_address

2021-02-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99041

--- Comment #4 from Segher Boessenkool  ---
combine always asks recog(), so that must have said it is okay?

[Bug rtl-optimization/98986] Try matching both orders of commutative RTX operations when there is no canonical order

2021-02-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986

--- Comment #2 from Segher Boessenkool  ---
I agree it makes sense to have the one arm with vec_duplicate first in the
canonical order.  Problem is that this is deep in the arms, but it can be
done of course.

Autogenerating part of combine?  Nonononono please.  Or, what part do you
mean?  Something in simplify-rtx would make sense, and something in recog
would make a *lot* of sense.  For the latter, we probably want some more
syntax in the machine description; things like % are too restrictive (and
that is really only meant for RA).  For example, a common pattern is the
sum of three things, which there is no good way to express right now.

[Bug libgcc/98952] powerpc*: __trampoline_setup inverted test for trampoline size

2021-02-03 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98952

--- Comment #2 from Segher Boessenkool  ---
And after that it always copies r4 bytes, too (rounded down to a multiple
of four bytes).
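To spell out the copy behaviour described here: the length comes from the caller (r4), and only whole 4-byte words are copied. Below is a minimal sketch. copy_words is a hypothetical name, and this is not the actual libgcc code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies LEN bytes as whole words only, i.e. LEN rounded down to a
   multiple of four; returns the number of bytes actually copied.  */
static size_t copy_words(unsigned char *dst, const unsigned char *src,
                         size_t len)
{
    size_t n = len & ~(size_t)3;   /* round down to a multiple of 4 */
    memcpy(dst, src, n);
    return n;
}
```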

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-02-02 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #22 from Segher Boessenkool  ---
Don't replace the constraints.  For one thing, this is very hard to do
correctly.  Just make the "m" constraint not allow prefixed memory in
asms, like I said above.  (So all "general_operand" even!)

[Bug target/98093] ICE in gen_vsx_set_v2df, at config/rs6000/vsx.md:3276

2021-02-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98093

--- Comment #6 from Segher Boessenkool  ---
(In reply to Martin Liška from comment #5)
> It's fixed on master, can we close it now or do we need a backport to active
> branches?

If someone filled in the known-to-work / known-to-fail fields we would know!

[Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads

2021-02-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053

--- Comment #11 from Segher Boessenkool  ---
Please open a separate bug for x86 problems.

[Bug target/98210] [11 Regression] SHF_GNU_RETAIN breaks gold linker generated binaries

2021-01-29 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98210

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #7 from Segher Boessenkool  ---
This also needs a backport to 10?  Can someone please fill in the
known_to_{work,fail} fields?

[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case

2021-01-27 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #26 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #23)
> (that combine number prevails on trunk as well, I can't spot any code
> that disables combine on large BBs so not sure what goes on here)

There is no such thing, indeed.  And the instruction combiner is
"mostly linear", so it shouldn't actually matter.

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #8 from Segher Boessenkool  ---
I say nothing like that.  I say that
  .text.hot.
is nasty (is easily mistaken for .text.hot).

I also say that named per-function sections are better as
  .text%name
than as
  .text.name
(just as they were long ago), because this doesn't conflict with things like
  .text.hot
(and there is a very long history of such conflicts giving real-world
problems).
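A small way to see the conflict: on a typical ELF target, -ffunction-sections puts a function `foo` that is not known to be hot or cold into `.text.foo`, while GCC uses `.text.hot` (or `.text.hot.` plus the name) for code it considers hot. A function that merely happens to be named `hot` therefore ends up in `.text.hot`. The file below is a hypothetical example; build it with something like `gcc -O2 -c -ffunction-sections hot.c` and inspect it with `readelf -S hot.o`.

```c
/* Hypothetical example.  Nothing marks this function hot, yet with
   -ffunction-sections on ELF its section is named ".text.hot", the same
   name a linker script would treat as hot text.  With the old "%"
   separator it would instead be ".text%hot", which cannot collide.  */
int hot (int x)
{
  return x + 1;
}
```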

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #6 from Segher Boessenkool  ---
I was under the impression this unique section thing needed the trailing
dot thing.  This probably is not true.

I still think the old "%" thing is much superior to the trailing dot thing,
but that then is orthogonal to the "unique section" thing, so let's ignore
it now :-)

It still remains that this flag needs a name that says what it *does*, as I
mentioned at the end of Comment 4.

[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn)

2021-01-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092

--- Comment #3 from Segher Boessenkool  ---
Created attachment 50040
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50040&action=edit
Patch

Patch in testing.

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #19 from Segher Boessenkool  ---
We cannot allow "m" to allow pcrel memory accesses, because most
existing inline assembler code will break then.  So we then need
some way to tell the compiler that some instruction *does* allow
pcrel memory (or even *requires* it).

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #22 from Segher Boessenkool  ---
Fixed with

commit f8c66617ab91826af1d950b00d853eaff622
Author: Segher Boessenkool 
Date:   Tue Jan 19 23:43:56 2021 +

rs6000: Fix rs6000_emit_le_vsx_store (PR98549)

One of the advantages of LRA is that you can create new pseudos from it
just fine.  The code in rs6000_emit_le_vsx_store was not aware of this.
This patch changes that, in the process fixing PR98549 (where it is
shown that we do call rs6000_emit_le_vsx_store during LRA, which we
used to assert can not happen).

2021-01-20  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Change assert.
Adjust comment.  Simplify code.

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

Segher Boessenkool  changed:

   What|Removed |Added

  Attachment #49996|0   |1
is obsolete||

--- Comment #21 from Segher Boessenkool  ---
Created attachment 50007
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50007&action=edit
Better patch

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #18 from Segher Boessenkool  ---
Created attachment 49996
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49996&action=edit
Patch

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #17 from Segher Boessenkool  ---
(In reply to jos...@codesourcery.com from comment #15)
> Only if the undefined behavior is a property of the program, or of all 
> possible executions of the program, as opposed to a property of a 
> particular execution of the program.  See C90 DR#109.  "A conforming 
> implementation must not fail to translate a strictly conforming program 
> simply because *some* possible execution of that program would result in 
> undefined behavior.".

Yeah, good point.

But we do not have a complete program here at all, so this doesn't
say much.  If this was a complete program likely *every* execution
of it would be UB; but of course it is also possible to make one
where no execution has UB.

Since the main routine in this snippet unconditionally has undefined
behaviour, there is no way I can call this valid code.


Anyway, the attached patch fixes the problem in this testcase.  Not
sure yet it is actually correct ;-)
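To make the DR#109 distinction concrete, here is a hypothetical function whose behaviour is undefined only on some executions: `divide (10, 2)` is fine, while `divide (10, 0)` is undefined. Per the quoted text, a conforming implementation still has to translate it. The commented-out `broken` function, by contrast, has undefined behaviour on every call, which is the situation described above for the snippet's main routine.

```c
/* UB depends on the execution: well defined unless d == 0 (or the
   INT_MIN / -1 overflow case), so the compiler must accept it.  */
int divide (int n, int d)
{
  return n / d;
}

/* By contrast, every call of this would read an indeterminate value:
     int broken (void) { int i; return i; }
   Kept in a comment so this file itself stays valid.  */
```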

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #16 from Segher Boessenkool  ---
Needs -mcpu=power8.  Confirmed with that (and the given options).

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #14 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #13)
> For UB at runtime, we can warn, but shouldn't error because the code might
> never be invoked at runtime.

As far as I can see at least the C standard disagrees with this:
  NOTE Possible undefined behavior ranges from ignoring the situation
  completely with unpredictable results, to behaving during translation
  or program execution in a documented manner characteristic of the
  environment (with or without the issuance of a diagnostic message), to
  terminating a translation or execution (with the issuance of a
  diagnostic message).

So we are allowed to error.


It doesn't seem we will ever agree whether this is valid code.  This is
not a very useful discussion anyway: let's just make a small (valid code)
testcase and be done with it :-)

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #12 from Segher Boessenkool  ---
for (long i; i != compress_n_blocks; ++i)

"i" is uninitialized; accessing it is UB.  So this is ice-on-invalid.

I have no doubt there is an actual bug somewhere here.  We just do not
have valid code yet as testcase (preferably shorter than this, and C
code, so that it is easier and can run on more systems).
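For comparison, a minimal sketch of what the quoted loop presumably intended: `i` gets a defined starting value before its first comparison. The function name `count_blocks` and the loop body are hypothetical stand-ins, since the rest of the testcase is not reproduced here.

```c
/* Hypothetical reduction of the quoted loop.  The only change that
   matters is "long i = 0": the original read i before ever writing it.  */
long count_blocks (long compress_n_blocks)
{
  long done = 0;
  for (long i = 0; i != compress_n_blocks; ++i)   /* i initialised: no UB */
    done++;
  return done;
}
```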

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #2 from Segher Boessenkool  ---
Can't we use ".text%name" for -ffunction-sections, like we did originally,
in 1996?  See cf4403481dd6.  This does not conflict with other section
names, and does not have all the problems you get from doing anything that
is not a simple prefix.

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #10 from Segher Boessenkool  ---
(And that new test case is full of obvious invalid code as well, fwiw.)

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #9 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #6)
> The warning often warns on dead code.
> But even if the warning is right, that doesn't make it ice-on-invalid-code.
> The code may have UB at runtime, but that UB doesn't need to be ever
> triggered when running the program.

That does not make it valid code.

> ice-on-invalid-code stands for code that should be rejected (diagnosed with
> errors, not warnings), but instead of giving the error we ICE on it instead.
> That is not the case here.

The documentation says
  ice-on-invalid-code   ICE on code that is not valid
which is true here.

Anyway:
  unsigned long xor_buf_y[1];
  ...
  typecast_copy(xor_buf_y, in, 4);
which obviously is an out-of-bounds access.  But there are even worse things:
  char *__trans_tmp_2;
  memcpy(__trans_tmp_2, S2, 32);
(accessing an uninitialised variable).

So no, there is no way I can consider this a P1.
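For reference, a sketch of what valid versions of the two quoted fragments could look like. `typecast_copy` is not defined in this bug, so the stand-in `copy_ulongs` below assumes it copies `n` elements of the destination type from a byte buffer; both that helper and `demo` are hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in for typecast_copy: copies n unsigned longs
   from a byte buffer (assumed semantics, not taken from the bug).  */
static void copy_ulongs (unsigned long *out, const unsigned char *in, size_t n)
{
  memcpy (out, in, n * sizeof *out);
}

/* Valid counterpart: the array has room for the 4 elements copied into
   it, and the 32-byte memcpy writes into real storage instead of through
   an uninitialised pointer.  Returns the first element read back.  */
unsigned long demo (const unsigned char *in, const unsigned char *s2,
                    unsigned char out2[32])
{
  unsigned long xor_buf_y[4];          /* was [1]: too small for 4 elements */
  copy_ulongs (xor_buf_y, in, 4);
  memcpy (out2, s2, 32);               /* destination really has 32 bytes */
  return xor_buf_y[0];
}
```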

[Bug rtl-optimization/98692] Unitialized Values reported only with -Os

2021-01-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692

Segher Boessenkool  changed:

   What|Removed |Added

 CC||acsawdey at gcc dot gnu.org

--- Comment #5 from Segher Boessenkool  ---
Have you tried a new valgrind?

Either this is (or was) a known problem in valgrind, or it is related to
one.  Cc:ing Aaron, he might know more (he wrote the GCC optimisations
that expose the problem).

[Bug rtl-optimization/98692] Unitialized Values reported only with -Os

2021-01-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692

--- Comment #4 from Segher Boessenkool  ---
Are you sure that target is correct?!

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-14 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

--- Comment #5 from Segher Boessenkool  ---
The "warning" says
  warning: ‘void* memcpy(void*, const void*, long unsigned int)’ writing 32
bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]

It says it is wrong, so it is not a warning, it is an error.

Perhaps that warning is just completely broken, it is lying to the user?

[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu

2021-01-14 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549

Segher Boessenkool  changed:

   What|Removed |Added

   Priority|P1  |P4

--- Comment #3 from Segher Boessenkool  ---
It is an ICE-on-invalid, so it cannot be P1.

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #17 from Segher Boessenkool  ---
(What I was referring to in Comment 4 was asm_operand_ok in recog.c --
it may need some surgery if we need to hook into that).

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #16 from Segher Boessenkool  ---
No, this cannot be fixed in this hook, or in any other hook.  The compiler
can never see *at all* what instructions there are, the template is just a
piece of text to it (there could be assembler macros in play, if you need
to see a practical reason).

We just need new constraints, as Bill and Peter agree.

[Bug testsuite/98643] [11 regression] r11-6615 causes failure in gcc.target/powerpc/fold-vec-extract-char.p7.c

2021-01-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98643

Segher Boessenkool  changed:

   What|Removed |Added

   Last reconfirmed||2021-01-13
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Segher Boessenkool  ---
Yeah, the last addi in the new addi/add/addi sequences is superfluous.

Confirmed.

[Bug c++/98645] C++ modules support does not work on PowerPC with IEEE 128-bit long double

2021-01-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98645

--- Comment #1 from Segher Boessenkool  ---
(In reply to Michael Meissner from comment #0)
> I am tuning up the final patches for providing support to enable the PowerPC
> server compilers to change the default long double from using the IBM
> 128-bit double double format to IEEE 128-bit.

You mean "change the default for powerpc64le-*" I hope?  Most other
configurations we cannot change, certainly not before we allow IEEE QP
float everywhere.

> When the default long double is IEEE 128-bit, the powerpc backend needs to
> create a new type (__ibm128) to allow access to the old IBM 128-bit format. 
> It looks like the gcc/cp/module.cc code does not have a method of dealing
> with target specific floating point types.

If that is true, this should be a P1.  Please figure out if it is true!

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-05 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #11 from Segher Boessenkool  ---
(In reply to Bill Schmidt from comment #10)
> But it seems we would also need a new constraint that does permit
> PC-relative addresses, since new code will/may not have a TOC.

How could that work?  You need different assembler code for pcrel
accesses!  *Sometimes* just prefixing a "p" is enough, maybe we
should do something for that, but we cannot magically fix the
general problem.

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #8 from Segher Boessenkool  ---
Yes, "m" can not allow PC-relative, in inline asm (just think of all existing
code that uses "m").

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #6 from Segher Boessenkool  ---
You cannot look at the instruction, ever.  The inline asm template is
just text, nothing else.  You cannot assume it is valid instructions.
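A minimal illustration of that point, as a hypothetical GNU C example: the template below is empty, and GCC hands whatever text is there to the assembler without parsing it. All the compiler knows about this asm comes from the operand list, here a single "m" (memory) input, and it chooses an address form it considers valid for "m" without any idea which instruction (if any) will consume it.

```c
/* Hypothetical example.  The template string is opaque text to GCC;
   only the "m" constraint tells it anything, namely that operand 0
   must be some memory reference the asm may read.  */
static int value = 42;

int read_through_asm (void)
{
  __asm__ volatile ("" : : "m" (value));
  return value;
}
```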

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2021-01-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #4 from Segher Boessenkool  ---
"m" is already handled differently for inline asm, so perhaps we can just
extend that?  ("m" in machine descriptions is "m<>" in asm, for example).

[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-28 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

--- Comment #6 from Segher Boessenkool  ---
(In reply to Fangrui Song from comment #5)
> Please read my first comment why copy relocs is a bad name.

Since I reply to some of that (namely, your argument 1)), you could assume I
have read your comment already ;-)

> The compiler
> behavior is whether the external data symbol is accessed
> directly/indirectly.

Not really, no.  It isn't clear at all what "directly" even means!

> Copy relocs is just the inferred ELF linker behavior
> (in -no-pie/-pie link mode) when the symbol is external. The option name
> should mention the direct behavior, instead of the inferred behavior at the
> linking stage.

Yes.  But your proposed solution just makes this worse :-(

> -fdirect-access-external-data makes sense on other binary formats, though I
> won't ask GCC to
> implement relevant behaviors for other binary formats.

But what does that *mean*?  "direct access"?  (And, "external data", for that
matter!  This isn't as obvious as it was thirty years ago.)

> * For example, on COFF, the behavior is like always
> -fdirect-access-external-data.  __declspec(dllimport) is needed to use
> indirect access.

I don't know what "declspec" is.  Something something mswindows?

> * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
> (only available on arm) and the opposite for -fpic.

So what you want is that objects that are globally visible will be implemented
as-is?  For if you do not do whole-program optimisation, for example?  So that
a) those objects will actually *exist*, and b) they will be laid out in the way
the program expects?

> If you don't want to think of non-ELF, feel free to make the option specific
> to ELF.

The problem is not that I don't want to think about it, but that the way it
seems to be defined only applies to ELF (and to some specific (sub-)targets
using ELF, even).

> > You want to have this a generic option, while it is
> > not clear at all what it would mean, what it would *do*, which is especially
> > important if you want this to be an option used by multiple compilers: if it
> > is not clear to every user what simple, sensible thing a flag is the knob
> > for, that flag simply cannot be used at all -- or worse, some users *will*
> > use it, but then their intentions are not clear to humans, and different
> > compilers can (and will!) think the user wanted something else!
> 
> To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC

Huh?  That isn't a user-visible thing at all, it's an implementation detail.
It is a quite straight-forward auto thing, defined to true if the loader
passes some specific test.

- o - o -

So, what you want is to attach the attribute ((used)) variable attribute to all
data (or at least the data not explicitly made static) automatically?
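For reference, this is what the `used` variable attribute looks like when written by hand on a single object (a hypothetical table): GCC must emit the object even if nothing in the translation unit appears to reference it.

```c
/* Hypothetical example of the `used` variable attribute.  Without it,
   an unreferenced static table like this may be discarded when
   optimising; with it, GCC always emits the object as written.  */
static const int lookup_table[4] __attribute__ ((used)) = { 2, 3, 5, 7 };
```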

[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-27 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
(In reply to Fangrui Song from comment #3)
> Are you happy with the option name -f[no-]direct-access-external-data ?

Not at all, no :-(

The name does not explain its purpose at all, and the whole concept only
makes sense for a fraction of all targets.  A -mcopy-relocs ("generate copy
relocations if that is a good idea"), defined *per target*, would be a lot
better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie),
or something like that.  You want to have this a generic option, while it is
not clear at all what it would mean, what it would *do*, which is especially
important if you want this to be an option used by multiple compilers: if it
is not clear to every user what simple, sensible thing a flag is the knob
for, that flag simply cannot be used at all -- or worse, some users *will*
use it, but then their intentions are not clear to humans, and different
compilers can (and will!) think the user wanted something else!

[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to

2020-12-11 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326

--- Comment #20 from Segher Boessenkool  ---
Yes, that is clear...  But we have ***double*** x in that example even,
as the declared type of the parameter, so converting that to float is
almost certainly a bad idea?

[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to

2020-12-11 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326

--- Comment #18 from Segher Boessenkool  ---
Why is it correct to convert the double x to single precision here?!

[Bug target/98020] PPC: mfvsrwz+extsw not merged to mtvsrwa

2020-12-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-12-08
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
mtvsrwa is the wrong way around, and mfvsrwa does not exist.  Am I missing
anything?

[Bug rtl-optimization/98178] Combine splitter does not split to single instruction

2020-12-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98178

--- Comment #3 from Segher Boessenkool  ---
Yup, this is true in general, we almost never say why we don't combine so
far.  Patches welcome!  (Make sure you use TDF_DETAILS for such prints).
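A sketch of the usual guard for such diagnostics, so they only appear in detailed dumps (for example -fdump-rtl-combine-details). To keep it compilable outside GCC, `dump_file`, `dump_flags` and `TDF_DETAILS` are local stand-ins here, and the `TDF_DETAILS` value is made up; `note_no_combine` is a hypothetical helper, not an existing combine.c function.

```c
#include <stdio.h>

/* Stand-ins so this builds outside GCC; in GCC these come from
   dumpfile.h, and the TDF_DETAILS value below is made up.  */
static FILE *dump_file;
static int dump_flags;
#define TDF_DETAILS (1 << 0)

/* Hypothetical helper: say why two insns were not combined, but only
   when a details dump was requested.  Returns 1 if it printed.  */
static int note_no_combine (int i2_uid, int i3_uid, const char *reason)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
    {
      fprintf (dump_file, "Not combining %d into %d: %s\n",
               i2_uid, i3_uid, reason);
      return 1;
    }
  return 0;
}
```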

[Bug rtl-optimization/98179] New: gcc.dg/pr97954.c fails on (at least) BE powerpc

2020-12-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98179

Bug ID: 98179
   Summary: gcc.dg/pr97954.c fails on (at least) BE powerpc
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

/home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c: In function 'foo':
/home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: error: too many
outgoing branch edges from bb 4
during RTL pass: loop2_invariant
/home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: internal compiler
error: verify_flow_info failed
0x10435cb3 verify_flow_info()
/home/segher/src/gcc/gcc/cfghooks.c:269
0x10876cc7 checking_verify_flow_info
/home/segher/src/gcc/gcc/cfghooks.h:212
0x10876cc7 move_loop_invariants()
/home/segher/src/gcc/gcc/loop-invariant.c:2299
0x1087142f execute
/home/segher/src/gcc/gcc/loop-init.c:530

This happens because this pass moved insn 8 from bb 4 to 2:

(jump_insn 8 2 22 2 (parallel [
            (set (reg:SI 118 [ x ])
                (asm_operands:SI ("") ("=r") 0 []
                     []
                     [
                        (label_ref:DI 22)
                    ] pr97954.c:10))
            (clobber (reg:SI 98 ca))
        ]) "pr97954.c":10:3 -1
     (expr_list:REG_UNUSED (reg:SI 98 ca)
        (nil))
 -> 22)

We shouldn't allow such a move at all (not of any jump_insn!)

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #23 from Segher Boessenkool  ---
Changing the ABI (silently, even!) is never an expected thing.  All of the
four 32-bit ABIs we support have an AltiVec variant that isn't fully
compatible with the non-AltiVec base variant.  It would be a huge disservice
to the user to change the ABI from under his/her feet.

Anyway, patch in testing.

[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972

--- Comment #3 from Segher Boessenkool  ---
#0  moving_insn_creates_bookkeeping_block_p (through_insn=0x3fffb5b23138, 
insn=0x3fffb5b736c0) at /home/segher/src/gcc/gcc/sel-sched.c:2031

It crashes here because the insn is not in any BB; which is correct
actually, because the insn has been deleted!

It is deleted in sel-sched, and it was created there as well.  I don't
see anything wrong in the earlier debug dump; afaics this was just
exposed by the 2-2 combine thing.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #20 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #18)
> So why don't we default to the Altivec ABI with -m32 on cpus that have
> Altivec and VSX units???

History.  I'm not sure all our ABIs are compatible with vectors enabled,
either.

Since always, you have needed to use -mabi=altivec on 32-bit.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #19 from Segher Boessenkool  ---
(In reply to Arseny Solokha from comment #17)
> (In reply to Segher Boessenkool from comment #16)
> > Oh, it's a different testcase, in comment 6.  Yeah a new PR would
> > have been better ;-/
> 
> Do you want me to reopen PR97963 and copy comment 14 there until it's not
> too late?

Nah, it already is too late...  Just keep it in mind for the future :-)

It is easy to join two PRs.  It is very hard / annoying to separate PRs;
it is much easier if separate bugs just start out separate, so don't
piggy-back it onto a PR that you think may have to do with it (you can
always point to the existing PR!)

[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972

--- Comment #2 from Segher Boessenkool  ---
Confirmed.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-24 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #16 from Segher Boessenkool  ---
Oh, it's a different testcase, in comment 6.  Yeah a new PR would
have been better ;-/

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-24 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #15 from Segher Boessenkool  ---
Why does that compiler default to -mcpu=power10?

[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2020-11-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

--- Comment #1 from Segher Boessenkool  ---
Confirmed (needs -O0).

[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976

2020-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847

--- Comment #4 from Segher Boessenkool  ---
This was caused (or exposed) by e3b3b59683c1:

commit e3b3b59683c1e7d31a9d313dd97394abebf644be
Author: Vladimir N. Makarov 
Date:   Fri Nov 13 12:45:59 2020 -0500

[PATCH] Implementation of asm goto outputs

[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976

2020-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847

--- Comment #3 from Segher Boessenkool  ---
I can now reproduce it, with a compiler built yesterday (previous was a
few days older), and -O0.

Confirmed.

[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to

2020-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #8 from Segher Boessenkool  ---
The fmadd;frsp sequence is correct for this source code.  It does double
rounding of the result (first to DP float, then to SP float), so using
just fmadds is only correct for -ffast-math or similar.
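A sketch of the source shape behind this, with a hypothetical function name: float inputs, arithmetic written in double, result converted back to float. The exact product of two floats fits in a double, so the multiply-add is rounded once to double (fmadd) and then again to float (frsp). A single fmadds rounds only once, which can differ in the last bit, so it is not a valid replacement without -ffast-math or similar.

```c
/* Hypothetical example.  Rounding 1 happens when the double-precision
   multiply-add result is formed; rounding 2 is the conversion to float.
   Both are required by the source as written.  */
float madd_via_double (float a, float b, float c)
{
  double t = (double) a * (double) b + (double) c;  /* rounding 1: to double */
  return (float) t;                                 /* rounding 2: to float  */
}
```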

[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976

2020-11-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #1 from Segher Boessenkool  ---
I cannot reproduce this?  Not with any -mcpu= either, or any -O option.

[Bug target/97784] Expressions evaluated as long chain instead of as tree or the like

2020-11-11 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #6 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #3)
> There is targetm.sched.reassociation_width which specifies how re-association
> should make such sequence "wide".

Ah cool, thank you :-)

> Andrew is correct that we don't do this
> for any types that are TYPE_OVERFLOW_UNDEFINED.

Yes; but I see the sub-optimal behaviour for unsigned, too.

> And powerpc has
> 
> static int
> rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
>                             machine_mode mode)
> {
>   switch (rs6000_tune)
>     {
>     case PROCESSOR_POWER8:
>     case PROCESSOR_POWER9:
>     case PROCESSOR_POWER10:
>       if (DECIMAL_FLOAT_MODE_P (mode))
>         return 1;
>       if (VECTOR_MODE_P (mode))
>         return 4;
>       if (INTEGRAL_MODE_P (mode))
>         return 1;

Yeah this last 1 is the problem :-)

> thus you get width 1 which means a linear chain (even if the user wrote
> a tree).

Yup.

> Note RTL doesn't do any such thing like re-association (I guess in principle
> scheduling could, and that's the only place where it would make sense
> on RTL).

RTL unrolling can, actually!  "Variable expansion" is its horrible name
(and it makes a lot of sense there: it allows breaking a big linear chain
into pieces).
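To illustrate the two shapes under discussion, here is the same eight-term unsigned sum written as a linear chain and as a balanced tree (hypothetical function names). Unsigned arithmetic wraps, so both always give the same result and the compiler may pick either. The chain has a dependency depth of 7 adds and the tree a depth of 3; per the discussion above, a reassociation width of 1 yields the linear chain even when the user wrote the tree.

```c
/* Linear chain: every add waits for the previous one (depth 7).  */
unsigned long sum_chain (const unsigned long *a)
{
  return ((((((a[0] + a[1]) + a[2]) + a[3]) + a[4]) + a[5]) + a[6]) + a[7];
}

/* Balanced tree: independent pairs can execute in parallel (depth 3).  */
unsigned long sum_tree (const unsigned long *a)
{
  return ((a[0] + a[1]) + (a[2] + a[3])) + ((a[4] + a[5]) + (a[6] + a[7]));
}
```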

[Bug target/97786] New: rs6000 isinf etc. are pretty horrible

2020-11-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97786

Bug ID: 97786
   Summary: rs6000 isinf etc. are pretty horrible
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

int isfinite(double x) { return __builtin_isfinite (x); }
int isinf(double x) { return __builtin_isinf (x); }
int isinf_sign(double x) { return __builtin_isinf_sign (x); }
int isnan(double x) { return __builtin_isnan (x); }
int isnormal(double x) { return __builtin_isnormal (x); }
int fpclassify(double x) { return __builtin_fpclassify (5, 6, 7, 8, 9, x); }

We can generate much better code for all these than the generic code
we use now.
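One possible direction, sketched here only as an illustration and not as what GCC emits or should emit, is to classify the value purely with integer operations on its IEEE binary64 bit pattern, with no floating-point compares. The sketch assumes `double` is IEEE binary64 (true on powerpc*-linux); the `my_` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define EXP_MASK  0x7ff0000000000000ULL   /* 11 exponent bits */
#define FRAC_MASK 0x000fffffffffffffULL   /* 52 fraction bits */

/* Well-defined type pun of the double's bit pattern.  */
static uint64_t bits_of (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Exponent all ones and fraction non-zero.  */
int my_isnan (double x)
{
  uint64_t u = bits_of (x);
  return (u & EXP_MASK) == EXP_MASK && (u & FRAC_MASK) != 0;
}

/* Ignoring the sign bit, exactly the infinity pattern.  */
int my_isinf (double x)
{
  return (bits_of (x) & ~(1ULL << 63)) == EXP_MASK;
}

/* Anything whose exponent is not all ones.  */
int my_isfinite (double x)
{
  return (bits_of (x) & EXP_MASK) != EXP_MASK;
}

/* Exponent neither all zeros (zero/subnormal) nor all ones.  */
int my_isnormal (double x)
{
  uint64_t e = bits_of (x) & EXP_MASK;
  return e != 0 && e != EXP_MASK;
}
```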

[Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like

2020-11-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #2 from Segher Boessenkool  ---
No, it is exactly the same with unsigned types :-(

Use  -Dlong="unsigned long"  or use  #define O ^  (as in my original test).
I forgot about this signed thing, but it has nothing to do with it (that
matters on gimple level, sure, but the problem exists in pure RTL as well).
