[Bug target/97329] POWER9 default cache and line sizes appear to be wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329 --- Comment #10 from Segher Boessenkool --- GCC 11 stage 4 will be fine. I doubt you can ever measure a difference, but you can try :-)
[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708 --- Comment #3 from Segher Boessenkool --- The only such __SIZEOF_* macro that is not about a standards-required type is for int128. Not the best example ;-) There are not predefines for __SIZEOF_FLOAT128__ etc. either. In an ideal world the user can just assume those types exist always. In a less ideal world, use autoconf? You have to anyway, if you want to support older compilers at all.
[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708 --- Comment #1 from Segher Boessenkool --- Yes, the __SIZEOF_* macros do not say whether some type can be used. This is true for all targets! What would it be useful for to define these macros? They all are equivalent to #define SIXTEEN 16 :-)
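A minimal C sketch of the point about `__SIZEOF_*` macros, assuming a GCC-compatible compiler (which predefines `__SIZEOF_LONG_DOUBLE__` and, on targets that have `__int128`, `__SIZEOF_INT128__`). The function names are made up for illustration. The macros only carry a number; whether a type is usable still has to be probed some other way, for example with a configure-time compile test.

```c
#include <stddef.h>

/* Value of __SIZEOF_INT128__ if the compiler predefines it, else 0.
   The macro is just a number; #ifdef works here only because this
   particular macro is absent on targets without __int128. */
size_t sizeof_int128_macro(void)
{
#ifdef __SIZEOF_INT128__
    return __SIZEOF_INT128__;
#else
    return 0;
#endif
}

/* Assumes a GCC-compatible compiler, which predefines this macro. */
size_t sizeof_long_double_macro(void)
{
    return __SIZEOF_LONG_DOUBLE__;
}
```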
[Bug testsuite/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926 Segher Boessenkool changed: What|Removed |Added Component|target |testsuite Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Segher Boessenkool --- Fixed.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #14 from Segher Boessenkool --- Well, V=m-o (not the same thing, these are sets) -- but, it is clear that "o" should be a subset of "m": (define_memory_constraint "TARGET_MEM_CONSTRAINT" "Matches any valid memory." (define_memory_constraint "o" "Matches an offsettable memory reference." So yeah, it should get the memory_address_addr_space_p thing.
[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926 --- Comment #5 from Segher Boessenkool --- It helps if you test the compiler you just built, not something old. Sigh. Patch is testing.
[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926 Segher Boessenkool changed: What|Removed |Added Assignee|acsawdey at gcc dot gnu.org|segher at gcc dot gnu.org --- Comment #4 from Segher Boessenkool --- That is not where the UNGE and UNLE come from. I have no idea where they *do* come from though :-/
[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn) since r11-4623
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092 Segher Boessenkool changed: What|Removed |Added Attachment #50040|0 |1 is obsolete|| --- Comment #6 from Segher Boessenkool --- Created attachment 50401 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50401&action=edit Patch
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #7 from Segher Boessenkool ---
From the offending patch:

-/* Return true if the eliminated form of AD is a legitimate target address. */
+/* Return true if the eliminated form of AD is a legitimate target address.
+ If OP is a MEM, AD is the address within OP, otherwise OP should be
+ ignored. CONSTRAINT is one constraint that the operand may need
+ to meet. */
 static bool
-valid_address_p (struct address_info *ad)
+valid_address_p (rtx op, struct address_info *ad,
+enum constraint_num constraint)

The addition of those extra args makes clear that the function is no longer just testing if it is a valid address. It should be renamed. And perhaps most callers should still use the old version, the one that actually tests if something is a valid address?
[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496 --- Comment #13 from Segher Boessenkool --- Hi Nathan, I think you didn't push the branch that is on?
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #5 from Segher Boessenkool --- Thanks Vladimir. It is indeed a problem in LRA (or triggered by it). We have

    8: {[r121:DI+low(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)]=asm_operands;clobber

so this is an offset that is too big for a machine instruction, those can take -32768..32767. Changing the constraint to "m" you get in LRA

Inserting insn reload before:
   13: r121:DI=high(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)

but this doesn't happen if you keep it "o", and it dies later.
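The range argument above can be checked with a small C sketch. It assumes the usual convention in which the low part of an offset is its sign-extended bottom 16 bits and the high part is the remainder; the function names are hypothetical and this is not code from GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if OFF fits the signed 16-bit displacement range -32768..32767. */
bool fits_signed16(int64_t off)
{
    return off >= -32768 && off <= 32767;
}

/* Sign-extended low 16 bits of OFF (assumed "low" part). */
int64_t low_part(int64_t off)
{
    return ((off & 0xffff) ^ 0x8000) - 0x8000;
}

/* Remainder that has to be added separately (assumed "high" part). */
int64_t high_part(int64_t off)
{
    return off - low_part(off);
}
```

For the offset 0x92a4 from the dump, `fits_signed16` is false, which is why the address needs the extra reload of the high part.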
[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352 Segher Boessenkool changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Segher Boessenkool --- commit c60ad1c5fe0249f48362be0f989184ca447f9d17 Author: Segher Boessenkool Date: Wed Mar 3 20:34:32 2021 + rs6000: Fix check_effective_target_sqrt_insn (PR99352) The previous version returned true for all PowerPC. This is incorrect. We only support floating point square root instructions if a) we support floating point instructions at all, and b) we have _ARCH_PPCSQ defined. 2020-03-09 Segher Boessenkool gcc/testsuite/ * lib/target-supports.exp (check_effective_target_powerpc_sqrt): New. (check_effective_target_sqrt_insn): Use it.
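What such a test has to look at, seen from the C side: a sketch that only checks the `_ARCH_PPCSQ` preprocessor symbol named in this PR. The function name is made up, and the real check lives in the Tcl testsuite support code, not in C like this.

```c
/* Returns 1 when the compiler predefines _ARCH_PPCSQ, i.e. when
   floating point square root instructions may be used; 0 otherwise
   (including on every non-PowerPC target). */
int have_ppc_sqrt_insn(void)
{
#ifdef _ARCH_PPCSQ
    return 1;
#else
    return 0;
#endif
}
```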
[Bug target/98959] ICE in extract_constrain_insn, at recog.c:2670
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98959 --- Comment #20 from Segher Boessenkool --- (In reply to Bill Schmidt from comment #14) > We should definitely not be allowing the AltiVec "& ~16" flavors into these > patterns. I'm not certain whether your fix is the best way to achieve that, > but it could well be; I'll defer to Segher on that. Hey, it works, so it is okay for now at least. Longer term we should probably think of something more elegant and less failure-prone.
[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #2 from Segher Boessenkool --- Just FYI: There are four Power Linux systems in the cfarm (as well as some AIX).

gcc110  POWER7 BE
gcc203  POWER8 BE
gcc112  POWER8 LE
gcc135  POWER9 LE

The last one is by far the most powerful of these.
[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352 --- Comment #3 from Segher Boessenkool --- rs6000 has check_effective_target_powerpc_fprs already (with slightly different semantics).
[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352 Segher Boessenkool changed: What|Removed |Added Ever confirmed|0 |1 Target||powerpc*-*-* Last reconfirmed||2021-03-02 Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED --- Comment #1 from Segher Boessenkool --- Mine.
[Bug testsuite/99352] New: check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352 Bug ID: 99352 Summary: check_effective_target_sqrt_insn for powerpc is wrong Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- It just says [istarget powerpc*-*-*] but it should test whether the preprocessor symbol "_ARCH_PPCSQ" is defined.
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299 --- Comment #9 from Segher Boessenkool --- The i386 port has
===
(define_insn "trap"
  [(trap_if (const_int 1) (const_int 6))]
  ""
{
#ifdef HAVE_AS_IX86_UD2
  return "ud2";
#else
  return ASM_SHORT "0x0b0f";
#endif
}
  [(set_attr "length" "2")])
===
which implements __builtin_trap, and can implement __builtin_trap_no_abort just fine as well, if your OS kernel (or similar) can return after a ud2.

If clang uses terribly confusing names (or semantics, or syntax, etc.) we should not copy that from them. *Especially* when that already conflicts with names they copied from us.
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299 --- Comment #7 from Segher Boessenkool --- (In reply to Franz Sirl from comment #5) > For the naming I suggest __builtin_debugtrap() to align with clang. Maybe > with an aliased __debugbreak() on Windows platforms. Those are terrible names. This would *not* be used more often than __builtin_trap, for debugging. In general, builtins should say what they *do*, not what you imagine they will be used for.
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299 --- Comment #6 from Segher Boessenkool --- (In reply to Richard Biener from comment #4) > I'm not sure what your proposed not noreturn trap() would do in terms of > IL semantics compared to a not specially annotated general call? Nothing I think? But __builtin_trap *is* very different: it ends BBs. > "recoverable" likely means resuming after the trap, not on an exception > path (so it'll not be a throw())? "recoverable" is super unclear. For example, on Power the hardware has a concept "recoverable interrupt", which sets MSR[RI]=1, and traps never do. This is a very different concept from what is wanted here, which has nothing to do with recoverability, and is simply about not being an abort() (which __builtin_trap *is*!) > The only thing that might be useful to the middle-end would be marking > the function as not altering the memory state. But I suppose it should > still serve as a barrier for code motion of both loads and stores, even > of those loads/stores are known to not trap. The only magic we'd have > for this would be __attribute__((const,returns_twice)). Which likely > will be more detrimental to general optimization. > > So - what's the "sub-optimal code generation" you refer to from the > (presumably) volatile asm() you use for the trap? > > [yeah, asm() on GIMPLE is less optimized than a call] The rs6000 backend can optimise the used instructions: we have trap_if instructions, both with registers and with immediates. A single instruction can do a comparison and a conditional trap. This works great with __builtin_trap, *if* the kernel's trap handler has abort() semantics. __builtin_trap_no_abort() maybe?
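The "comparison plus conditional trap" shape mentioned above looks like this in C. This is only a sketch with a made-up function name; it uses the existing `__builtin_trap`, which is noreturn, because the `__builtin_trap_no_abort` discussed in this PR is a proposal and not an existing builtin.

```c
/* Divide A by B, trapping on a zero divisor.  A compare followed by
   __builtin_trap is the kind of code a backend with conditional trap
   instructions can emit as a single insn.  Since __builtin_trap does
   not return, the division is only reached when B is nonzero. */
int checked_div(int a, int b)
{
    if (b == 0)
        __builtin_trap();
    return a / b;
}
```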
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299 --- Comment #3 from Segher Boessenkool --- Ah, thank you. Well except there is no keyword called that?
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-02-27 Ever confirmed|0 |1 --- Comment #1 from Segher Boessenkool --- [ Do we not have a keyword for feature requests, btw? I don't see one. ] The only thing needed for GCC is to have a __builtin_trap_no_noreturn (or something with a less horrible name ;-) ), that does exactly that: it's the same as __builtin_trap, just not noreturn. This is useful on most architectures, not just PowerPC.
[Bug target/93353] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93353 --- Comment #9 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #7) > if (low_int >= 0x8000 - extra) > is not true and 0x7fff - -1 is 0x8000 (with UB on the compiler side). These are HWIs, so there is no UB. > But also > && ((unsigned HOST_WIDE_INT) (INTVAL (XEXP (x, 1)) + 0x8000) > can invoke UB in the compiler, shouldn't it be just > && ((UINTVAL (XEXP (x, 1)) + 0x8000) > ? That sounds right, yes.
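The idea behind switching to UINTVAL can be sketched in plain C: do the biasing addition in unsigned arithmetic, where wraparound is well defined, and compare the result against the size of the range. The quoted condition is cut off in the comment, so the `< 0x10000` bound and the function name here are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V is in -32768..32767.  The addition is done on the
   unsigned type, so it cannot overflow in the undefined sense even
   when V is close to INT64_MAX. */
bool is_signed16(int64_t v)
{
    return (uint64_t) v + 0x8000u < 0x10000u;
}
```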
[Bug target/93353] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93353 --- Comment #8 from Segher Boessenkool --- (In reply to Arseny Solokha from comment #5) > (In reply to Segher Boessenkool from comment #4) > > I cannot get the reduced testcase to fail. Are any special options needed? > > If you've been asking me: Well you reported this, so probably? :-) > no, the compiler invocation posted in comment 0 is > explicit, but maybe you need -m32 -f{,no-}PIC -f{,no-}stack-protector which > one often does for reproducing my PRs because of the configuration I use. Maybe that is why I so often cannot reproduce your PRs? Please always state the exact compiler configuration / invocation needed. > I'll certainly test the current snapshot, but I won't be able to do so at > least one more week. Jakub in comment 7 seems to have found the problem. I cc:ed Alan, who did 4c69e61f4307, which seems to have fixed it on trunk (there is no PR for that so far?)
[Bug middle-end/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-02-27 Ever confirmed|0 |1 --- Comment #2 from Segher Boessenkool --- It generates non-optimal code for older CPUs as well (it does two splats instead of one):

        xxpermdi 0,35,35,3       # 7    [c=4 l=4]  vsx_extract_v2di/1
        xxpermdi 35,0,0,0        # 9    [c=4 l=4]  vsx_splat_v2di_reg/0
        vrlq 2,2,3

This is because we get things like

Trying 7 -> 9:
    7: r117:DI=vec_select(r127:V1TI#0,parallel)
      REG_DEAD r127:V1TI
    9: r124:V2DI=vec_duplicate(r117:DI)
      REG_DEAD r117:DI
Failed to match this instruction:
(set (reg:V2DI 124)
    (vec_duplicate:V2DI (vec_select:DI (subreg:V2DI (reg:V1TI 127) 0)
            (parallel [
                    (const_int 0 [0])
                ]))))

(the patterns we do have use vec_concat instead). Confirmed.
[Bug target/93353] ICE: in final_scan_insn_1, at final.c:3073 (error: could not split insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93353 --- Comment #4 from Segher Boessenkool --- I cannot get the reduced testcase to fail. Are any special options needed?
[Bug bootstrap/98181] Add support for FreeBSD on powerpc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98181 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #3 from Segher Boessenkool --- I should have looked if there was a PR for this, sorry. This was: commit 2a4183234a45ba28db5ce16cf3ccdd70cdef3b7c Author: Piotr Kubaj AuthorDate: Wed Dec 16 22:26:18 2020 + Commit: Segher Boessenkool CommitDate: Wed Dec 16 22:54:51 2020 + rs6000: Add support for powerpc64le-unknown-freebsd This implements support for powerpc64le architecture on FreeBSD. Since we don't have powerpcle (32-bit), I did not add support for powerpcle here. This remains to be changed if there is powerpcle support in the future. 2020-12-15 Piotr Kubaj gcc/ * config.gcc (powerpc*le-*-freebsd*): Add. * configure.ac (powerpc*le-*-freebsd*): Ditto. * configure: Regenerate. * config/rs6000/freebsd64.h (ASM_SPEC_COMMON): Use ENDIAN_SELECT. (DEFAULT_ASM_ENDIAN): Add little endian support. (LINK_OS_FREEBSD_SPEC64): Ditto.
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #26 from Segher Boessenkool --- Can you show the code you tried in comment 23? It is near impossible to see what happened there without that.
[Bug tree-optimization/99068] Missed PowerPC lhau optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068 --- Comment #8 from Segher Boessenkool --- Using update form instructions constrains register allocation and scheduling. It is *not* always a good idea. That is one of the reasons why we currently use update form instructions only when insns just happen to land close (in the same basic block). See auto_inc_dec.c . We also do some work to make it more likely that loops will use these constructs. We don't go out of our way to use update form insns at the cost of everything else. You will see them more often if you use -Os or -O1.
[Bug tree-optimization/99068] Missed PowerPC lhau optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068 --- Comment #6 from Segher Boessenkool --- (In reply to Brian Grayson from comment #4) > (In reply to Segher Boessenkool from comment #3) > > Then you get > > > > addi 9,9,-2 > > lhau 10,2(9) > > addi 9,9,2 > > > > which is worse than just > > > > lha 10,0(9) > > addi 9,9,2 > > Why is the second addi needed, in your example? Because I typoed it. > And note that if a > pre-decrement "addi 9,9,-2" is needed to pre-bias the pointer, it is done > once outside the loop, and not in every iteration of the loop. You cannot do that without changing the loop structure. There are various non-trivial other paths into and out of the loop body. Since ivopts has decided to not use pre-increment here (because it is more expensive than not using it), we do not use it. Why do you think having a lhau is better?
[Bug tree-optimization/99068] Missed PowerPC lhau optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #5 from Segher Boessenkool ---
a) The code does not compile (it is not complete source code, some includes are needed);
b) When you fix that, the compiler will tell you it is invalid code (you shadow "a" with an incompatible type).
c) You do not say which target you used.

So let's try this (on powerpc64-linux, -O3 since you want that, everything else default):
===
#include <stdint.h>

int found_zero_ptr(int16_t *a, int N) {
  for (int16_t *p = a; p < a + N; p++)
    if (*p == 0)
      return 1;
  return 0;
}
===
which as core has
===
.L21:
        lha 9,2(3)
        addi 3,3,4
        cmpld 7,3,4
        cmpwi 0,9,0
        beq 0,.L5
        bge 7,.L4
.L3:
        lha 9,0(3)
        cmpwi 0,9,0
        bne 0,.L21
===
You cannot use lhau in this without making the generated code worse.

(Marking this invalid *again*.)
[Bug tree-optimization/99068] Missed PowerPC lhau optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #3 from Segher Boessenkool --- Then you get

        addi 9,9,-2
        lhau 10,2(9)
        addi 9,9,2

which is worse than just

        lha 10,0(9)
        addi 9,9,2
[Bug target/98468] [9 regression] test case gcc.target/powerpc/rlwimi-2.c fails starting with r9-3594
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98468 --- Comment #3 from Segher Boessenkool --- git tag -l 'releases*' --contains 8d2d39587d94
[Bug target/99048] __gcc_qadd produces spurious NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99048 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-02-12 Ever confirmed|0 |1 --- Comment #3 from Segher Boessenkool --- Yup, something like that. It should not have any infinities here afaics.
[Bug target/99048] __gcc_qadd produces spurious NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99048 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #1 from Segher Boessenkool --- IBM long double ("double-double") is not an IEEE floating point format, so all these rules do not apply, but you are right it is surprising.
[Bug tree-optimization/99068] Missed PowerPC lhau optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99068 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID CC||segher at gcc dot gnu.org --- Comment #1 from Segher Boessenkool --- Because it would be incorrect? lhau is pre-modify (like all update form instructions).
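The difference between pre-modify and post-increment addressing can be modelled in C. This is a sketch with hypothetical helper names: the first function mimics what an update form load like lhau does (advance the pointer, then load from the new address), the second is what a `*p++` loop needs (load, then advance).

```c
#include <stdint.h>

/* Pre-modify: bump the pointer first, then load from the new address.
   This models the behaviour of an update form load. */
int16_t load_pre_update(const int16_t **p)
{
    *p += 1;
    return **p;
}

/* Post-increment: load from the current address, then bump. */
int16_t load_post_increment(const int16_t **p)
{
    int16_t value = **p;
    *p += 1;
    return value;
}
```

Starting from the same pointer the two return different elements, so one cannot simply be substituted for the other without first biasing the pointer.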
[Bug target/99041] combine creates invalid address which ICEs in decompose_normal_address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99041 --- Comment #7 from Segher Boessenkool --- (In reply to Peter Bergner from comment #6) > The mma_assemble_pair/mma_assemble_acc patterns both generate lxv or lxvp > at, which both use a DQ offset and we already have function to > test for that. The following change fixes the ICE, so I'll give it a spin > on regtesting. That looks fine; if that is the only change you need it is pre-approved for trunk. Thanks!
[Bug rtl-optimization/98986] Try matching both orders of commutative RTX operations when there is no canonical order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986 --- Comment #6 from Segher Boessenkool --- (In reply to rguent...@suse.de from comment #4) > So this is where the "autogenerated" part comes in. We should have > an idea what might be useful and what isn't even worth trying by > looking at the machine description (which might require exposing > costs in such form for this case of constants). > > For commutative operands maybe recog itself can be relaxed and > accept the insn with the "wrong" commutation (or fix it up > itself) for example. Or maybe genrecog can magically emit > commutated variants (like genmatch does for :c annotated > expression branches). We could probably derive what things in an RTL expression are commutative (even if there are many quantities in play), but only allowing the canonical forms in that is a daunting task. Something like :c could help; we already have % in RTL, but we need something more general than that (examples: a+b+c and a*b+c*d should both be handled some way, since such cases (structure, not necessarily those exact ops) happen a lot in practice).
[Bug rtl-optimization/98986] Try matching both orders of commutative RTX operations when there is no canonical order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986 --- Comment #5 from Segher Boessenkool --- (In reply to rsand...@gcc.gnu.org from comment #3) > FWIW, another similar thing I've wanted in the past is to try > recognising multiple possible constants in an (and X (const_int N)) > when X is known to have some bits clear. Often we try to make N contain > as few bits as possible, but that can give worse results than a fuller mask. This could be done in the target machine description, where it makes a lot more sense to do anyway, *if* nonzero_bits was generally usable there. I have Plans for that for GCC 12, but don't depend on it please :-)
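Why several constants can be valid for the same `(and X (const_int N))`: if some bits of X are known to be zero, any two masks that agree on the possibly-set bits give the same result. A C sketch, where `possibly_set` is a plain parameter standing in for that knowledge (it is not GCC's nonzero_bits function) and the function name is invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if (x & n1) == (x & n2) for every x whose set bits are all
   within POSSIBLY_SET.  Only the mask bits that overlap the
   possibly-set bits matter. */
bool masks_equivalent(uint64_t possibly_set, uint64_t n1, uint64_t n2)
{
    return (n1 & possibly_set) == (n2 & possibly_set);
}
```

With only the low 8 bits possibly set, the narrow mask 0x0f and the fuller mask 0xffffff0f are interchangeable, so a pattern could be tried with either.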
[Bug rtl-optimization/98692] Unitialized Values reported only with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692 --- Comment #24 from Segher Boessenkool --- I do see the problems for savegpr/restgpr with that suggestion, but maybe something in that vein can be done.
[Bug rtl-optimization/98692] Unitialized Values reported only with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692 --- Comment #23 from Segher Boessenkool --- savegpr/restgpr are special ABI-defined functions that do not have all the same ABI calling conventions as normal functions. They indeed write into the parent's frame (red zone, in this case). Maybe you should allow this always when a function has not established a new frame? That always has to be done with a stdu 1,...(1) insn (in 64-bit; stwu in 32-bit, but the 32-bit Linux ABI has no red zone anyway) so it probably isn't too hard to detect. Only leaf functions will not establish a new frame normally (but that can happen later in the function, esp. with shrink-wrapping). Unstacking a frame is most other things that write to r1, often addi 1,1,... and sometimes ld 1,0(1) (there probably are other cases too that I am forgetting here). Maybe you should invalidate the red zone whenever r1 is changed, instead?
[Bug rtl-optimization/98692] Unitialized Values reported only with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692 --- Comment #16 from Segher Boessenkool --- (In reply to Mark Wielaard from comment #13)
> ==25741== Use of uninitialised value of size 8
> ==25741==    at 0x1504: main (pr9862.C:16)

r4 is argv here

>    0x14f0 <+16>: ld r3,0(r4)

r3 = argv[0];

>    0x14f4 <+20>: mr r31,r4

r31 = argv; // because we need it after the call, save it in a non-volatile reg

>    0x14f8 <+24>: std r0,16(r1)
>    0x14fc <+28>: stdu r1,-48(r1)
>    0x1500 <+32>: bl 0x16b4

The call; after this we have to load argv[0] again, the called function might have changed it.

>    0x1504 <+36>: ld r3,0(r31)

r3 = argv[0];

So it is funny that the exact same insn four insns earlier (in the program text) worked fine, but this one fails.

The ABI says (taken from the ELFv1 ABI, the ELFv2 doc is not nice for copy/paste):

Here is a sample implementation of _savegpr0_N and _restgpr0_N.

_savegpr0_14: std r14,-144(r1)
_savegpr0_15: std r15,-136(r1)
_savegpr0_16: std r16,-128(r1)
_savegpr0_17: std r17,-120(r1)
_savegpr0_18: std r18,-112(r1)
_savegpr0_19: std r19,-104(r1)
_savegpr0_20: std r20,-96(r1)
_savegpr0_21: std r21,-88(r1)
_savegpr0_22: std r22,-80(r1)
_savegpr0_23: std r23,-72(r1)
_savegpr0_24: std r24,-64(r1)
_savegpr0_25: std r25,-56(r1)
_savegpr0_26: std r26,-48(r1)
_savegpr0_27: std r27,-40(r1)
_savegpr0_28: std r28,-32(r1)
_savegpr0_29: std r29,-24(r1)
_savegpr0_30: std r30,-16(r1)
_savegpr0_31: std r31,-8(r1)
              std r0, 16(r1)
              blr

_restgpr0_14: ld r14,-144(r1)
_restgpr0_15: ld r15,-136(r1)
_restgpr0_16: ld r16,-128(r1)
_restgpr0_17: ld r17,-120(r1)
_restgpr0_18: ld r18,-112(r1)
_restgpr0_19: ld r19,-104(r1)
_restgpr0_20: ld r20,-96(r1)
_restgpr0_21: ld r21,-88(r1)
_restgpr0_22: ld r22,-80(r1)
_restgpr0_23: ld r23,-72(r1)
_restgpr0_24: ld r24,-64(r1)
_restgpr0_25: ld r25,-56(r1)
_restgpr0_26: ld r26,-48(r1)
_restgpr0_27: ld r27,-40(r1)
_restgpr0_28: ld r28,-32(r1)
_restgpr0_29: ld r0, 16(r1)
              ld r29,-24(r1)
              mtlr r0
              ld r30,-16(r1)
              ld r31,-8(r1)
              blr
_restgpr0_30: ld r30,-16(r1)
_restgpr0_31: ld r0, 16(r1)
              ld r31,-8(r1)
              mtlr r0
              blr

So this is one function with many entry points you could say. Maybe that is what confused valgrind?
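The slot layout in the listing above follows a simple rule, which this C sketch spells out: register rN (14 <= N <= 31) lives at 8 bytes per register below the caller's r1, counted down from r31. The function name is hypothetical; the formula is just read off the quoted listing.

```c
#include <assert.h>

/* Offset from r1 of the save slot for GPR REGNO, as used by the
   _savegpr0_N / _restgpr0_N sample code: r31 at -8, r30 at -16,
   and so on down to r14 at -144. */
int savegpr0_slot_offset(int regno)
{
    assert(regno >= 14 && regno <= 31);
    return -8 * (32 - regno);
}
```

All of these offsets are negative, so the stores land below the stack pointer of the function that calls the routine, which is the write into the red zone discussed in this PR.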
[Bug rtl-optimization/98692] Unitialized Values reported only with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692 --- Comment #15 from Segher Boessenkool --- (In reply to Will Schmidt from comment #14) > The _restgpr* and _savegpr* functions are not referenced when the test is > built at other optimization levels. (I've looked at disassembly from -O0 .. > -O4). Right, it is a size optimisation. > I do note that the _restgpr and _savegpr functions are called differently. > savegpr is called with bl while the restgpr is called via a direct branch; i > can't immediately tell if this is by design or if it is in error. It is by design: these are special functions defined by the ABI, specifically to save some code space.
[Bug rtl-optimization/99041] combine creates invalid address which ICEs in decompose_normal_address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99041 --- Comment #5 from Segher Boessenkool --- (As Jakub said; I'm just slow).
[Bug rtl-optimization/99041] combine creates invalid address which ICEs in decompose_normal_address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99041 --- Comment #4 from Segher Boessenkool --- combine always asks recog(), so that must have said it is okay?
[Bug rtl-optimization/98986] Try matching both orders of commutative RTX operations when there is no canonical order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986 --- Comment #2 from Segher Boessenkool --- I agree it makes sense to have the one arm with vec_duplicate first in the canonical order. Problem is that this is deep in the arms, but it can be done of course. Autogenerating part of combine? Nonononono please. Or, what part do you mean? Something in rtx-simplify would make sense, and something in recog would make a *lot* of sense. For the latter, we probably want some more syntax in the machine description, things like % are too restrictive (and that is really only meant for RA). For example, a common pattern is the sum of three things, which we have no good way of expressing right now.
[Bug libgcc/98952] powerpc*: __trampoline_setup inverted test for trampoline size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98952 --- Comment #2 from Segher Boessenkool --- And after that it always copies r4 bytes, too (rounded down to a multiple of four bytes).
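"Rounded down to a multiple of four bytes" amounts to clearing the two low bits of the byte count. A one-function C sketch (the name is made up, and this is not the libgcc assembly itself):

```c
#include <stddef.h>

/* Largest multiple of four that is not greater than N. */
size_t round_down_to_word(size_t n)
{
    return n & ~(size_t) 3;
}
```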
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #22 from Segher Boessenkool --- Don't replace the constraints. For one thing, this is very hard to do correctly. Just make the "m" constraint not allow prefixed memory in asms, like I said above. (So all "general_operand" even!)
[Bug target/98093] ICE in gen_vsx_set_v2df, at config/rs6000/vsx.md:3276
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98093 --- Comment #6 from Segher Boessenkool --- (In reply to Martin Liška from comment #5) > It's fixed on master, can we close it now or do we need a backport to active > branches? If someone filled in the known-to-work / known-to-fail fields we would know!
[Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053 --- Comment #11 from Segher Boessenkool --- Please open a separate bug for x86 problems.
[Bug target/98210] [11 Regression] SHF_GNU_RETAIN breaks gold linker generated binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98210 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #7 from Segher Boessenkool --- This also needs a backport to 10? Can someone please fill in the known_to_{work,fail} fields?
[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960 --- Comment #26 from Segher Boessenkool --- (In reply to Richard Biener from comment #23) > (that combine number prevails on trunk as well, I can't spot any code > that disables combine on large BBs so not sure what goes on here) There is no such thing, indeed. And the instruction combiner is "mostly linear", so it shouldn't actually matter.
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #8 from Segher Boessenkool --- I say nothing like that. I say that .text.hot. is nasty (is easily mistaken for .text.hot). I also say that named-per-function sections are better as .text%name than as .text.name (just as they were long ago), because this doesn't conflict with things like .text.hot (and there is a very long history of such conflicts giving real-world problems).
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #6 from Segher Boessenkool --- I was under the impression this unique section thing needed the trailing dot thing. This probably is not true. I still think the old "%" thing is much superior to the trailing dot thing, but that then is orthogonal to the "unique section" thing, so let's ignore it now :-) It still remains that this flag needs a name that says what it *does*, as I mentioned at the end of Comment 4.
[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092 --- Comment #3 from Segher Boessenkool --- Created attachment 50040 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50040&action=edit Patch Patch in testing.
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #19 from Segher Boessenkool --- We cannot allow "m" to allow pcrel memory accesses, because most existing inline assembler code will break then. So we then need some way to tell the compiler that some instruction *does* allow pcrel memory (or even *requires* it).
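For readers less familiar with the "m" constraint: it hands the asm a memory operand and leaves the choice of address form to the compiler, so the asm text has to work with whatever form is picked. A target-independent C sketch, assuming a GCC-compatible compiler; the template is deliberately empty so that no real instruction (and no particular addressing mode) is involved, and the function name is invented.

```c
/* Passes SLOT to an asm as a read-write memory operand ("+m").
   The compiler must keep the value in memory around the asm and may
   use any address form that is valid for "m".  With a real
   instruction in the template, that instruction would have to accept
   every such form. */
int through_memory(int value)
{
    int slot = value;
    __asm__ volatile ("" : "+m" (slot));
    return slot;
}
```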
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 Segher Boessenkool changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #22 from Segher Boessenkool --- Fixed with commit f8c66617ab91826af1d950b00d853eaff622 Author: Segher Boessenkool Date: Tue Jan 19 23:43:56 2021 + rs6000: Fix rs6000_emit_le_vsx_store (PR98549) One of the advantages of LRA is that you can create new pseudos from it just fine. The code in rs6000_emit_le_vsx_store was not aware of this. This patch changes that, in the process fixing PR98549 (where it is shown that we do call rs6000_emit_le_vsx_store during LRA, which we used to assert can not happen). 2021-01-20 Segher Boessenkool * config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Change assert. Adjust comment. Simplify code.
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 Segher Boessenkool changed: What|Removed |Added Attachment #49996|0 |1 is obsolete|| --- Comment #21 from Segher Boessenkool --- Created attachment 50007 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50007&action=edit Better patch
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #18 from Segher Boessenkool --- Created attachment 49996 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49996&action=edit Patch
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #17 from Segher Boessenkool --- (In reply to jos...@codesourcery.com from comment #15) > Only if the undefined behavior is a property of the program, or of all > possible executions of the program, as opposed to a property of a > particular execution of the program. See C90 DR#109. "A conforming > implementation must not fail to translate a strictly conforming program > simply because *some* possible execution of that program would result in > undefined behavior.". Yeah, good point. But we do not have a complete program here at all, so this doesn't say much. If this was a complete program likely *every* execution of it would be UB; but of course it is also possible to make one where no execution has UB. Since the main routine in this snippet unconditionally has undefined behaviour, there is no way I can call this valid code. Anyway, the attached patch fixes the problem in this testcase. Not sure yet it is actually correct ;-)
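The distinction drawn in DR#109, between undefined behaviour of the whole program and undefined behaviour of one particular execution, can be sketched as follows. The function names are hypothetical and do not come from the PR's testcase; this is only an illustration of the general point.

```c
#include <limits.h>

/* Hypothetical helper, not taken from the PR.  The division is undefined
   only for b == 0 (or a == INT_MIN with b == -1), so the undefined
   behaviour belongs to particular executions, not to the program as a
   whole.  A conforming compiler still has to translate this.  */
int
quotient (int a, int b)
{
  return a / b;
}

/* Guarded variant: no execution of this function divides by zero or
   overflows.  */
int
safe_quotient (int a, int b, int fallback)
{
  if (b == 0 || (a == INT_MIN && b == -1))
    return fallback;
  return a / b;
}
```

A routine that reads an uninitialised object on every path is a different case, since no call to it can avoid the undefined behaviour.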
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #16 from Segher Boessenkool --- Needs -mcpu=power8. Confirmed with that (and the given options).
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #14 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #13) > For UB at runtime, we can warn, but shouldn't error because the code might > never be invoked at runtime. As far as I can see at least the C standard disagrees with this: NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). So we are allowed to error. It doesn't seem we will ever agree whether this is valid code. This is not a very useful discussion anyway: let's just make a small (valid code) testcase and be done with it :-)
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #12 from Segher Boessenkool --- for (long i; i != compress_n_blocks; ++i) "i" is uninitialized; accessing it is UB. So this is ice-on-invalid. I have no doubt there is an actual bug somewhere here. We just do not have valid code yet as testcase (preferably shorter than this, and C code, so that it is easier and can run on more systems).
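For comparison, a valid form of that loop header only needs an initialiser. The sketch below is a made-up reduction: the function name, the array, and the summation are assumptions for illustration, and only the shape of the loop mirrors the quoted line.

```c
/* Hypothetical reduction of the loop shape quoted above; the names are
   illustrative and do not come from the original testcase.  */
long
sum_blocks (const long *blocks, long n_blocks)
{
  long sum = 0;
  /* "i" starts at a known value, so reading it is well defined.  */
  for (long i = 0; i != n_blocks; ++i)
    sum += blocks[i];
  return sum;
}
```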
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #2 from Segher Boessenkool --- Can't we use ".text%name" for -ffunction-sections, like we did originally, in 1996? See cf4403481dd6. This does not conflict with other section names, and does not have all the problems you get from doing anything that is not a simple prefix.
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #10 from Segher Boessenkool --- (And that new test case is full of obvious invalid code as well, fwiw.)
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #9 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #6) > The warning often warns on dead code. > But even if the warning is right, that doesn't make it ice-on-invalid-code. > The code may have UB at runtime, but that UB doesn't need to be ever > triggered when running the program. That does not make it valid code. > ice-on-invalid-code stands for code that should be rejected (diagnosed with > errors, not warnings), but instead of giving the error we ICE on it instead. > That is not the case here. The documentation says ice-on-invalid-code ICE on code that is not valid which is true here. Anyway: unsigned long xor_buf_y[1]; ... typecast_copy(xor_buf_y, in, 4); which obviously is an out-of-bounds access. But there are even worse things: char *__trans_tmp_2; memcpy(__trans_tmp_2, S2, 32); (accessing an uninitialised variable). So no, there is no way I can consider this a P1.
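A valid counterpart of the two snippets might look like the sketch below. The signature of typecast_copy is an assumption (a word-counted copy, which matches the "writing 32 bytes into a region of size 8" diagnostic on a 64-bit target), and xor_words and copy_s2 are hypothetical wrappers added for illustration.

```c
#include <string.h>

/* Stand-in for the testcase's typecast_copy: copies "count" words.  The
   real helper's signature is an assumption.  */
static void
typecast_copy (unsigned long *dst, const unsigned long *src, size_t count)
{
  memcpy (dst, src, count * sizeof *dst);
}

unsigned long
xor_words (const unsigned long *in)
{
  unsigned long xor_buf_y[4];   /* room for all four words copied below */
  typecast_copy (xor_buf_y, in, 4);
  return xor_buf_y[0] ^ xor_buf_y[1] ^ xor_buf_y[2] ^ xor_buf_y[3];
}

void
copy_s2 (char *dst, const char S2[32])
{
  /* dst points at real storage supplied by the caller, unlike the
     uninitialised __trans_tmp_2 in the report.  */
  memcpy (dst, S2, 32);
}
```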
[Bug rtl-optimization/98692] Unitialized Values reported only with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692 Segher Boessenkool changed: What|Removed |Added CC||acsawdey at gcc dot gnu.org --- Comment #5 from Segher Boessenkool --- Have you tried a new valgrind? Either this is (or was) a known problem in valgrind, or it is related to one. Cc:ing Aaron, he might know more (he wrote the GCC optimisations that expose the problem).
[Bug rtl-optimization/98692] Unitialized Values reported only with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98692 --- Comment #4 from Segher Boessenkool --- Are you sure that target is correct?!
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 --- Comment #5 from Segher Boessenkool --- The "warning" says warning: ‘void* memcpy(void*, const void*, long unsigned int)’ writing 32 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=] It says it is wrong, so it is not a warning, it is an error. Perhaps that warning is just completely broken, it is lying to the user?
[Bug target/98549] [11 Regression] ICE in rs6000_emit_le_vsx_store, at config/rs6000/rs6000.c:9938 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98549 Segher Boessenkool changed: What|Removed |Added Priority|P1 |P4 --- Comment #3 from Segher Boessenkool --- It is an ICE-on-invalid, so it cannot be P1.
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #17 from Segher Boessenkool --- (What I was referring to in Comment 4 was asm_operand_ok in recog.c -- it may need some surgery if we need to hook into that).
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #16 from Segher Boessenkool --- No, this cannot be fixed in this hook, or in any other hook. The compiler can never see *at all* what instructions there are, the template is just a piece of text to it (there could be assembler macros in play, if you need to see a practical reason). We just need new constraints, as Bill and Peter agree.
[Bug testsuite/98643] [11 regression] r11-6615 causes failure in gcc.target/powerpc/fold-vec-extract-char.p7.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98643 Segher Boessenkool changed: What|Removed |Added Last reconfirmed||2021-01-13 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Segher Boessenkool --- Yeah, the last addi in the new addi/add/addi sequences is superfluous. Confirmed.
[Bug c++/98645] C++ modules support does not work on PowerPC with IEEE 128-bit long double
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98645 --- Comment #1 from Segher Boessenkool --- (In reply to Michael Meissner from comment #0) > I am tuning up the final patches for providing support to enable the PowerPC > server compilers to change the default long double from using the IBM > 128-bit double double format to IEEE 128-bit. You mean "change the default for powerpc64le-*" I hope? Most other configurations we cannot change, certainly not before we allow IEEE QP float everywhere. > When the default long double is IEEE 128-bit, the powerpc backend needs to > create a new type (__ibm128) to allow access to the old IBM 128-bit format. > It looks like the gcc/cp/module.cc code does not have a method of dealing > with target specific floating point types. If that is true, this should be a P1. Please figure out if it is true!
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #11 from Segher Boessenkool --- (In reply to Bill Schmidt from comment #10) > But it seems we would also need a new constraint that does permit > PC-relative addresses, since new code will/may not have a TOC. How could that work? You need different assembler code for pcrel accesses! *Sometimes* just prefixing a "p" is enough, maybe we should do something for that, but we cannot magically fix the general problem.
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #8 from Segher Boessenkool --- Yes, "m" can not allow PC-relative, in inline asm (just think of all existing code that uses "m").
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #6 from Segher Boessenkool --- You cannot look at the instruction, ever. The inline asm template is just text, nothing else. You cannot assume it is valid instructions.
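A small sketch of that point, using the GNU C extended asm syntax: the template below is an empty string, yet the statement is accepted, because the compiler only substitutes operands into the text and relies on the constraints to describe them. The function name is hypothetical.

```c
/* Minimal sketch (GNU C extension).  The template is empty, so there is no
   instruction for the compiler to look at; the "+m" constraint alone tells
   it that *p must be a memory operand that the asm may read and write.  */
int
reload_through_memory (int *p)
{
  __asm__ volatile ("" : "+m" (*p));
  return *p;
}
```

Whatever addressing form the compiler picks for the "m" operand has to be acceptable to any instruction the text might contain, which is why widening "m" to include new address forms would affect existing code.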
[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 --- Comment #4 from Segher Boessenkool --- "m" is already handled differently for inline asm, so perhaps we can just extend that? ("m" in machine descriptions is "m<>" in asm, for example).
[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 --- Comment #6 from Segher Boessenkool --- (In reply to Fangrui Song from comment #5) > Please read my first comment why copy relocs is a bad name. Since I reply to some of that (namely, your argument 1)), you could assume I have read your comment already ;-) > The compiler > behavior is whether the external data symbol is accessed > directly/indirectly. Not really, no. It isn't clear at all what "directly" even means! > Copy relocs is just the inferred ELF linker behavior > (in -no-pie/-pie link mode) when the symbol is external. The option name > should mention the direct behavior, instead of the inferred behavior at the > linking stage. Yes. But your proposed solution just makes this worse :-( > -fdirect-access-external-data makes sense on other binary formats, though I > won't ask GCC to > implement relevant behaviors for other binary formats. But what does that *mean*? "direct access"? (And, "external data", for that matter! This isn't as obvious as it was thirty years ago.) > * For example, on COFF, the behavior is like always > -fdirect-access-external-data. __declspec(dllimport) is needed to use > indirect access. I don't know what "declspec" is. Something something mswindows? > * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic > (only available on arm) and the opposite for -fpic. So what you want is that objects that are globally visible will be implemented as-is? For if you do not do whole-program optimisation, for example? So that a) those objects will actually *exist*, and b) they will be laid out in the way the program expects? > If you don't want to think of non-ELF, feel free to make the option specific > to ELF. The problem is not that I don't want to think about it, but that the way it seems to be defined only applies to ELF (and to some specific (sub-)targets using ELF, even). 
> > You want to have this a generic option, while it is > > not clear at all what it would mean, what it would *do*, which is especially > > important if you want this to be an option used by multiple compilers: if it > > is not clear to every user what simple, sensible thing a flag is the knob > > for, that flag simply cannot be used at all -- or worse, some users *will* > > use it, but then their intentions are not clear to humans, and different > > compilers can (and will!) think the user wanted something else! > > To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC Huh? That isn't a user-visible thing at all, it's an implementation detail. It is a quite straight-forward auto thing, defined to true if the loader passes some specific test. - o - o - So, what you want is to attach the __attribute__ ((used)) variable attribute to all data (or at least the data not explicitly made static) automatically?
[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #4 from Segher Boessenkool --- (In reply to Fangrui Song from comment #3) > Are you happy with the option name -f[no-]direct-access-external-data ? Not at all, no :-( The name does not explain its purpose at all, and the whole concept only makes sense for a fraction of all targets. A -mcopy-relocs ("generate copy relocations if that is a good idea"), defined *per target*, would be a lot better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie), or something like that. You want to have this a generic option, while it is not clear at all what it would mean, what it would *do*, which is especially important if you want this to be an option used by multiple compilers: if it is not clear to every user what simple, sensible thing a flag is the knob for, that flag simply cannot be used at all -- or worse, some users *will* use it, but then their intentions are not clear to humans, and different compilers can (and will!) think the user wanted something else!
[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326 --- Comment #20 from Segher Boessenkool --- Yes, that is clear... But we have ***double*** x in that example even, as the declared type of the parameter, so converting that to float is almost certainly a bad idea?
[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326 --- Comment #18 from Segher Boessenkool --- Why is it correct to convert the double x to single precision here?!
[Bug target/98020] PPC: mfvsrwz+extsw not merged to mtvsrwa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2020-12-08 Ever confirmed|0 |1 --- Comment #1 from Segher Boessenkool --- mtvsrwa is the wrong way around, and mfvsrwa does not exist. Am I missing anything?
[Bug rtl-optimization/98178] Combine splitter does not split to single instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98178 --- Comment #3 from Segher Boessenkool --- Yup, this is true in general, we almost never say why we don't combine so far. Patches welcome! (Make sure you use TDF_DETAILS for such prints).
[Bug rtl-optimization/98179] New: gcc.dg/pr97954.c fails on (at least) BE powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98179 Bug ID: 98179 Summary: gcc.dg/pr97954.c fails on (at least) BE powerpc Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- /home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c: In function 'foo': /home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: error: too many outgoing branch edges from bb 4 during RTL pass: loop2_invariant /home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: internal compiler error: verify_flow_info failed 0x10435cb3 verify_flow_info() /home/segher/src/gcc/gcc/cfghooks.c:269 0x10876cc7 checking_verify_flow_info /home/segher/src/gcc/gcc/cfghooks.h:212 0x10876cc7 move_loop_invariants() /home/segher/src/gcc/gcc/loop-invariant.c:2299 0x1087142f execute /home/segher/src/gcc/gcc/loop-init.c:530 This happens because this pass moved insn 8 from bb 4 to 2: (jump_insn 8 2 22 2 (parallel [ (set (reg:SI 118 [ x ]) (asm_operands:SI ("") ("=r") 0 [] [] [ (label_ref:DI 22) ] pr97954.c:10)) (clobber (reg:SI 98 ca)) ]) "pr97954.c":10:3 -1 (expr_list:REG_UNUSED (reg:SI 98 ca) (nil)) -> 22) We shouldn't allow such a move at all (not of any jump_insn!)
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #23 from Segher Boessenkool --- Changing the ABI (silently, even!) is never an expected thing. All of the four 32-bit ABIs we support have an AltiVec variant that isn't fully compatible to the non-AltiVec base variant. It would be a huge disservice to the user to change the ABI from under his/her feet. Anyway, patch in testing.
[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972 --- Comment #3 from Segher Boessenkool --- #0 moving_insn_creates_bookkeeping_block_p (through_insn=0x3fffb5b23138, insn=0x3fffb5b736c0) at /home/segher/src/gcc/gcc/sel-sched.c:2031 It crashes here because the insn is not in any BB; which is correct actually, because the insn has been deleted! It is deleted in sel-sched, and it was created there as well. I don't see anything wrong in the earlier debug dump; afaics this was just exposed by the 2-2 combine thing.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #20 from Segher Boessenkool --- (In reply to Peter Bergner from comment #18) > So why don't we default to the Altivec ABI with -m32 on cpus that have > Altivec and VSX units??? History. I'm not sure all our ABIs are compatible with vectors enabled, either. Since always, you have needed to use -mabi=altivec on 32-bit.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #19 from Segher Boessenkool --- (In reply to Arseny Solokha from comment #17) > (In reply to Segher Boessenkool from comment #16) > > Oh, it's a different testcase, in comment 6. Yeah a new PR would > > have been better ;-/ > > Do you want me to reopen PR97963 and copy comment 14 there until it's not > too late? Nah, it already is too late... Just keep it in mind for the future :-) It is easy to join two PRs. It is very hard / annoying to separate PRs; it is much easier if separate bugs just start out separate, so don't piggy-back it onto a PR that you think may have to do with it (you can always point to the existing PR!)
[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972 --- Comment #2 from Segher Boessenkool --- Confirmed.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #16 from Segher Boessenkool --- Oh, it's a different testcase, in comment 6. Yeah a new PR would have been better ;-/
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #15 from Segher Boessenkool --- Why does that compiler default to -mcpu=power10?
[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926 --- Comment #1 from Segher Boessenkool --- Confirmed (needs -O0).
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 --- Comment #4 from Segher Boessenkool --- This was caused (or exposed) by e3b3b59683c1: commit e3b3b59683c1e7d31a9d313dd97394abebf644be Author: Vladimir N. Makarov Date: Fri Nov 13 12:45:59 2020 -0500 [PATCH] Implementation of asm goto outputs
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 --- Comment #3 from Segher Boessenkool --- I can now reproduce it, with a compiler built yesterday (previous was a few days older), and -O0. Confirmed.
[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #8 from Segher Boessenkool --- The fmadd;frsp sequence is correct for this source code. It does double rounding of the result (first to DP float, then to SP float), so using just fmadds is only correct for -ffast-math or similar.
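The effect of the double rounding can be shown with a small sketch. The function name and the inputs are chosen for this illustration and are not taken from the PR; the code assumes IEEE 754 float and double with round-to-nearest-even.

```c
/* Illustration only; not from the PR.  Computes a * b + c in double
   precision and then rounds the result to float, which is what the
   fmadd;frsp pair does.  */
float
mul_add_via_double (float a, float b, float c)
{
  /* volatile forces the intermediate to be rounded to double precision
     before the conversion to float.  */
  volatile double wide = (double) a * (double) b + (double) c;
  return (float) wide;
}
```

With a = 4097 * 2^-30 and b = 16773121 * 2^-30 (both exactly representable as float) the product is exactly 2^-24 + 2^-60. Adding c = 1.0f gives a value just above the midpoint between 1.0 and 1.0 + 2^-23. Rounding to double first drops the 2^-60 term and leaves an exact tie, which then rounds to 1.0f; a single rounding to float, as fmadds would do, gives 1.0 + 2^-23 instead.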
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 Segher Boessenkool changed: What|Removed |Added Status|NEW |WAITING --- Comment #1 from Segher Boessenkool --- I cannot reproduce this? Not with any -mcpu= either, or any -O option.
[Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784 --- Comment #6 from Segher Boessenkool --- (In reply to Richard Biener from comment #3) > There is targetm.sched.reassociation_width which specifies how re-assocation > should make such sequence "wide". Ah cool, thank you :-) > Andrew is correct that we don't do this > for any types that are TYPE_OVERFLOW_UNDEFINED. Yes; but I see the sub-optimal behaviour for unsigned, too. > And powerpc has > > static int > rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED, > machine_mode mode) > { > switch (rs6000_tune) > { > case PROCESSOR_POWER8: > case PROCESSOR_POWER9: > case PROCESSOR_POWER10: > if (DECIMAL_FLOAT_MODE_P (mode)) > return 1; > if (VECTOR_MODE_P (mode)) > return 4; > if (INTEGRAL_MODE_P (mode)) > return 1; Yeah this last 1 is the problem :-) > thus you get width 1 which means a linear chain (even if the user wrote > a tree). Yup. > Note RTL doesn't do any such thing like re-assocation (I guess in principle > scheduling could, and that's the only place where it would make sense > on RTL). RTL unrolling can, actually! "Variable expansion" is its horrible name (and it makes a lot of sense there: it allows breaking a big linear chain into pieces).
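The difference between a linear chain and a tree can be sketched in C as below. Both functions are hypothetical and compute the same value; only the dependency shape differs. Whether the compiler keeps either shape depends on the reassociation width discussed above.

```c
/* Both functions compute the same value; only the dependency shape
   differs.  Names are illustrative.  */
unsigned long
xor_chain (const unsigned long v[8])
{
  /* Each operation depends on the previous one: depth 7.  */
  return ((((((v[0] ^ v[1]) ^ v[2]) ^ v[3]) ^ v[4]) ^ v[5]) ^ v[6]) ^ v[7];
}

unsigned long
xor_tree (const unsigned long v[8])
{
  /* Independent pairs can be computed in parallel: depth 3.  */
  unsigned long a = v[0] ^ v[1], b = v[2] ^ v[3];
  unsigned long c = v[4] ^ v[5], d = v[6] ^ v[7];
  return (a ^ b) ^ (c ^ d);
}
```

With a reassociation width of 1 for integer modes, even the second form can end up as a chain like the first.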
[Bug target/97786] New: rs6000 isinf etc. are pretty horrible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97786 Bug ID: 97786 Summary: rs6000 isinf etc. are pretty horrible Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- int isfinite(double x) { return __builtin_isfinite (x); } int isinf(double x) { return __builtin_isinf (x); } int isinf_sign(double x) { return __builtin_isinf_sign (x); } int isnan(double x) { return __builtin_isnan (x); } int isnormal(double x) { return __builtin_isnormal (x); } int fpclassify(double x) { return __builtin_fpclassify (5, 6, 7, 8, 9, x); } We can generate much better code for all these than the generic code we use now.
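As a rough idea of how little work these classifications need, here is a sketch that tests the bit pattern directly. It assumes double is IEEE 754 binary64, the function names are made up, and it is only one possible formulation; it says nothing about what GCC emits now or would emit after a fix.

```c
#include <stdint.h>
#include <string.h>

/* Sketch under the assumption that double is IEEE 754 binary64.  This is
   one possible formulation, not what GCC emits or will emit.  */
static uint64_t
double_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

int
isinf_bits (double x)
{
  /* Drop the sign bit; infinity has all exponent bits set and a zero
     fraction.  */
  return (double_bits (x) & UINT64_C (0x7fffffffffffffff))
         == UINT64_C (0x7ff0000000000000);
}

int
isnan_bits (double x)
{
  /* NaN has all exponent bits set and a nonzero fraction.  */
  return (double_bits (x) & UINT64_C (0x7fffffffffffffff))
         > UINT64_C (0x7ff0000000000000);
}
```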
[Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784 --- Comment #2 from Segher Boessenkool --- No, it is exactly the same with unsigned types :-( Use -Dlong="unsigned long" or use #define O ^ (as in my original test). I forgot about this signed thing, but it has nothing to do with it (that matters on gimple level, sure, but the problem exists in pure RTL as well).