[Bug rtl-optimization/86901] [AArch64] Suboptimal register allocation for int/float reinterpret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86901 --- Comment #4 from Richard Earnshaw ---
But why not:

f2:
        fmov    w1, s0
        ubfx    w1, w1, 20, 11
        cmp     w1, 1015
        bhi     .L7
        fmul    s0, s0, s0
        str     s0, [x0]
        ret
.L7:
        b       g

? There's no need to be using X regs here, W is just fine.
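For readers without the testcase to hand, the following is a minimal C sketch of the kind of source that could produce the sequence above. The original testcase is not quoted in this comment, so `float_bits`, `high_field` and `f2_sketch` are hypothetical names, and the out-of-range path (a tail call to `g` in the assembly) is modelled here as a plain return value.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer
   (the "fmov w1, s0" step in the assembly). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Extract 11 bits starting at bit 20, as "ubfx w1, w1, 20, 11" does. */
static uint32_t high_field(float f)
{
    return (float_bits(f) >> 20) & 0x7ffu;
}

/* Hypothetical stand-in for f2: square and store when the field is
   at most 1015. Returns 1 if it stored, 0 if it took the other path. */
static int f2_sketch(float *out, float x)
{
    if (high_field(x) > 1015u)
        return 0;
    *out = x * x;
    return 1;
}
```

Everything here fits in 32 bits, which is the point of the comment: nothing in the computation needs an X register.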
[Bug target/96373] [11 Regression] SVE miscompilation on vectorized division loop, leading to FP exception
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96373 --- Comment #25 from Richard Earnshaw ---
(In reply to Kewen Lin from comment #24)
> OK, thanks for the comments, I'll mark PR108977 as won't fix then.

It would be more normal to mark it as fixed, but set the fix version to the earliest release with the fix.
[Bug target/115611] mve: vsetq_lane for 64-bits has wrong codegen when setting lane 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115611 Richard Earnshaw changed: What|Removed |Added Target Milestone|--- |11.5
[Bug target/105090] BFI instructions are not generated on arm-none-eabi-g++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105090 --- Comment #9 from Richard Earnshaw --- It looks like the compiler now merges b into a rather than a into b. The result is the same, though, and we don't need an lsr this way. Technically it ought to be better. But we do end up in a dance with the registers this way at present. I suspect it's due to not splitting DFmode regs as aggressively as we do DImode and then ending up trying to re-form them later on for register allocation purposes. Anyway, I don't think the lsr is essential to the test, so let's just remove that from the test.
[Bug target/103100] [11 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100 Richard Earnshaw changed: What|Removed |Added Target Milestone|11.5|12.5
[Bug c/115770] Undefined arm instruction (udf #255) is generated when optimizer is on O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115770 --- Comment #2 from Richard Earnshaw --- Correction: the option to add is -fno-delete-null-pointer-checks Sorry for the confusion.
[Bug c/115770] Undefined arm instruction (udf #255) is generated when optimizer is on O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115770 Richard Earnshaw changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from Richard Earnshaw --- Address 0 is the same as the null pointer value and you haven't told the compiler you're operating in a 'stand-alone' environment (in a hosted environment dereferencing null is considered undefined behaviour). If you change the address to 4, you'll see what I mean. You can add -ffreestanding to force the compiler to treat 0 as a valid address, but this has other side-effects on the compilation as well.
[Bug target/115732] Arm32 architecture definitions for v8+ appear to have wrong FPU/SIMD defaults
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115732 Richard Earnshaw changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |WONTFIX --- Comment #4 from Richard Earnshaw --- In Arm v8+ fp and SIMD are just one option. You can't add FP without adding SIMD and you can't remove FP without removing SIMD. Since older versions of the architecture essentially support SIMD as an extension of the FP extension, we chose to make the options +simd and +nofp, these being the outer envelope of the previous options.
[Bug target/115353] [14 regression] Missed thumb2 table branch instruction optimisations since r14-4946-g7006e5d2d7b5b2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353 Richard Earnshaw changed: What|Removed |Added Target Milestone|--- |14.2
[Bug target/115360] cmse_nonsecure_call wrapper on arm missing STT_FUNCTION
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115360 Richard Earnshaw changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-06-05 Summary|cmse_nonsecure_call wrapper |cmse_nonsecure_call wrapper |missing STT_FUNCTION|on arm missing STT_FUNCTION --- Comment #1 from Richard Earnshaw --- Confirmed by observation.
[Bug target/115353] [14/15 regression] Missed thumb2 table branch instruction optimisations since r14-4946-g7006e5d2d7b5b2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2024-06-05 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Richard Earnshaw --- Confirmed.
[Bug tree-optimization/115157] incorrect TBAA for derived types involving enum types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115157 --- Comment #4 from Richard Earnshaw ---
The tests in the last patch fail on arm-eabi. The tests assume that sizeof(enum) == sizeof(int), which is not true if -fshort-enums is the default.

+ Changes for ./gcc/testsuite/gcc/gcc.sum.sent
+ New tests that FAIL (6 tests):

arm-qemu/-mthumb: gcc: gcc.dg/enum-alias-1.c (test for excess errors)
arm-qemu/-mthumb: gcc: gcc.dg/enum-alias-2.c execution test
arm-qemu/-mthumb: gcc: gcc.dg/enum-alias-3.c execution test
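The size assumption can be illustrated with a small C sketch. The enum-alias tests themselves are not quoted here, so `enum small_range`, `enum_size` and `enum_is_int_sized` are hypothetical names chosen for illustration. With -fshort-enums the compiler allocates only as many bytes as the declared range needs, so the second function may return 0 on such a target.

```c
#include <stddef.h>

/* A hypothetical enum whose values all fit in a single byte. */
enum small_range { RED, GREEN, BLUE };

/* Size the compiler actually chose for the enum on this target. */
static size_t enum_size(void)
{
    return sizeof(enum small_range);
}

/* Nonzero only when the enum has the same size as int, which is the
   property the failing tests are reported to rely on. */
static int enum_is_int_sized(void)
{
    return sizeof(enum small_range) == sizeof(int);
}
```

A test that wants to stay portable could guard on a check like `enum_is_int_sized()` rather than assume the result.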
[Bug target/115086] bic is not used when the non-not part is a constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115086 --- Comment #2 from Richard Earnshaw --- And perhaps more importantly the mov can even be hoisted outside of a loop.
[Bug target/115083] undefined reference for aarch64-w64-mingw32 target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115083 --- Comment #5 from Richard Earnshaw --- Please give the port developers time to finish working on the port. Only the initial patches have been pushed so far and there is plenty of work left to do.
[Bug target/115058] on target arm -mcpu=cortex-a78ae does not allow use pauth and dot product
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115058 Richard Earnshaw changed: What|Removed |Added Resolution|--- |INVALID Status|WAITING |RESOLVED --- Comment #7 from Richard Earnshaw --- This is a bug in GNU Binutils. These system registers are incorrectly described as being part of armv8.3-a, rather than part of the Pauth extension. Please can you raise a bug there: https://sourceware.org/bugzilla (select product binutils).
[Bug target/115058] on target arm -mcpu=cortex-a78ae does not allow use pauth and dot product
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115058 Richard Earnshaw changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |WAITING Last reconfirmed||2024-05-13 --- Comment #1 from Richard Earnshaw --- It looks like those messages are coming from the assembler, not the compiler, but without a testcase it's difficult to be exactly sure what your problem is. Please attach a small program that demonstrates your problem and state the /exact/ command line you used.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #34 from Richard Earnshaw --- To be honest, I'm more concerned that we aren't eliminating a lot of these copies during the gimple optimization phase. The memcpy is really a type punning step (that's strictly ISO C compliant, rather than using the GCC union extension), so ideally we'd recognize that and eliminate as many of the copies as possible (perhaps using some form of view_convert or whatever gimple is appropriate for changing the view without changing the contents). But that's for another day...
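As an illustration of the memcpy-as-type-punning idiom discussed above, here is a minimal sketch. It is not the code from the testcase: `struct vec128` is a hypothetical stand-in for the Vec128 type, `shift_lanes_left8` is an invented helper, and unsigned 16-bit lanes are used (rather than short int) so that the shift is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte vector container. */
struct vec128 {
    uint8_t raw[16];
};

/* View the 16 bytes as eight 16-bit lanes, shift each lane left by 8,
   and write the result back. Both memcpy calls only change the view of
   the data; ideally neither would survive as a real copy. */
static void shift_lanes_left8(struct vec128 *v)
{
    uint16_t lanes[8];
    memcpy(lanes, v->raw, sizeof lanes);
    for (int i = 0; i < 8; i++)
        lanes[i] = (uint16_t)(lanes[i] << 8);
    memcpy(v->raw, lanes, sizeof lanes);
}
```

The copies into and out of `lanes` correspond to the kind of redundant stores mentioned in the comment: they exist only to make the access legal under the aliasing rules.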
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #31 from Richard Earnshaw --- While that does seem to fix the bug, it's at the cost of 6 additional stores in the problematic test that are redundant other than changing the alias set view.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #29 from Richard Earnshaw ---
Sorry, I was looking at the wrong pair of insns. The earlier store to that location was insn 111.

111: [r212:SI (1 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI}

It appears that the problem is a disagreement between alias_set_subset_of () and alias_sets_conflict_p(). The former thinks sets 1 and 2 have a permissible subset relationship (2 is a subset of 1), so removes the later store during postreload. The latter is then used by alias_sets_conflict_p which thinks there is no conflict between the two sets and fails to add a scheduling dependency before sched2.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #27 from Richard Earnshaw ---
(In reply to Richard Earnshaw from comment #26)
> (In reply to Richard Biener from comment #25)
> > I think it's more interesting why
> >
> > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> > {r0:SI..r3:SI}
> >
> > isn't considered as dependence? Why does the earlier insn even come into
> > play? What's the breaking transform? I guess insn 119 and 120 are
> > exchanged?
>
> Because 119 was deleted by postreload. Doh! I should have spotted that.

But that ought to be ok, insn 115 is a store in alias set 0, so is picked up by later alias analysis. It's just that the compiler then digs deeper and decides that that isn't an addressable object (at the gimple level) so there can't really be a dependency.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #26 from Richard Earnshaw ---
(In reply to Richard Biener from comment #25)
> I think it's more interesting why
>
> * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> {r0:SI..r3:SI}
>
> isn't considered as dependence? Why does the earlier insn even come into
> play? What's the breaking transform? I guess insn 119 and 120 are
> exchanged?

Because 119 was deleted by postreload. Doh! I should have spotted that.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #23 from Richard Earnshaw ---
#0 ptr_deref_may_alias_decl_p (ptr=0x75e0c678, decl=0x75dff000) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:295
#1 0x01768173 in indirect_ref_may_alias_decl_p (ref1=0x75e9ad98, base1=0x75e9ad98, offset1=..., max_size1=..., size1=..., ref1_alias_set=3, base1_alias_set=3, ref2=0x75deae60, base2=0x75dff000, offset2=..., max_size2=..., size2=..., ref2_alias_set=0, base2_alias_set=0, tbaa_p=false) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2102
#2 0x01769541 in refs_may_alias_p_2 (ref1=0x7fffceb0, ref2=0x7fffce70, tbaa_p=false) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2505
#3 0x0176968a in refs_may_alias_p_1 (ref1=0x7fffce70, ref2=0x7fffceb0, tbaa_p=false) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2534
#4 0x00f7bf7d in rtx_refs_may_alias_p (x=0x75ed3b40, mem=0x75e9c9d8, tbaa_p=true) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/alias.cc:366
#5 0x00f8243b in true_dependence_1 (mem=0x75e9c9d8, mem_mode=E_SImode, mem_addr=0x75e9c9c0, x=0x75ed3b40, x_addr=0x75ed3b28, mem_canonicalized=false)

Where (in true_dependence_1):

p mem
$96 = (const_rtx) 0x75e9c9d8
(gdb) pr
(mem/c:SI (plus:SI (reg/f:SI 14 lr [214]) (const_int 4 [0x4])) [0 MEM [(char * {ref-all})]+4 S4 A32])
p x
$97 = (const_rtx) 0x75ed3b40
(gdb) pr
(mem/c:V8HI (plus:SI (reg/f:SI 13 sp) (const_int 256 [0x100])) [3 MEM [(short int *)_179]+0 S16 A64])

in refs_may_alias_p_1:

p *ref1
$99 = {ref = 0x75e9ad98, base = 0x75e9ad98, offset = {> = {coeffs = {0}}, }, size = {> = {coeffs = {128}}, }, max_size = {> = {coeffs = {128}}, }, ref_alias_set = 3, base_alias_set = 3, volatile_p = false}
p *ref2
$100 = {ref = 0x75deae60, base = 0x75dff000, offset = {> = {coeffs = {32}}, }, size = {> = {coeffs = {32}}, }, max_size = {> = {coeffs = {128}}, }, ref_alias_set = 0, base_alias_set = 0, volatile_p = false}
p ref1->ref
$101 = (tree) 0x75e9ad98
(gdb) pt
unit-size align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x77405498 precision:16 min max pointer_to_this reference_to_this > V8HI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x7752d7e0 nunits:8 pointer_to_this > arg:0 sizes-gimplified public unsigned type_6 SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type 0x7740c150 pointer_to_this reference_to_this > var def_stmt version:179 ptr-info 0x75e71468> arg:1 constant 0>> p ref1->base $102 = (tree) 0x75e9ad98 (gdb) pt unit-size align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x77405498 precision:16 min max pointer_to_this reference_to_this > V8HI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x7752d7e0 nunits:8 pointer_to_this > arg:0 sizes-gimplified public unsigned type_6 SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type 0x7740c150 pointer_to_this reference_to_this > var def_stmt version:179 ptr-info 0x75e71468> arg:1 constant 0>> p ref2->ref $103 = (tree) 0x75deae60 (gdb) pt unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77405348 precision:8 min max > BLK size unit-size user align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76322d20 domain sizes-gimplified public type_6 SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76b33d20 precision:32 min max > pointer_to_this > arg:0 public unsigned SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x766db5e8> arg:0 used ignored BLK ../hwy-pr111231-cpp.cc:4461:27 size unit-size align:64 warn_if_not_align:0 context abstract_origin (mem/c:BLK (plus:SI (reg/f:SI 109 virtual-stack-vars) (const_int -96 [0xffa0])) [2 D.33805+0 S16 A64])> ../hwy-pr111231-cpp.cc:4346:16 start: ../hwy-pr111231-cpp.cc:4346:3 finish: ../hwy-pr111231-cpp.cc:4346:24> arg:1 constant 0>> p ref2->base $104 = (tree) 
0x75dff000 (gdb) pt unit-size align:16
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #22 from Richard Earnshaw --- (Previous analysis is based on gcc-13 branch)
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 Richard Earnshaw changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #21 from Richard Earnshaw ---
With my new testcase, compiled on an arm-none-eabi cross with

cc1plus ../hwy-pr111231-cpp.cc -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee -marm -mlibarch=armv7-a+neon-vfpv4 -march=armv7-a+neon-vfpv4 -O2 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -fmerge-all-constants -fmath-errno -fno-exceptions

The critical sequence, at the end of gimple optimization is:

v = b;
MEM [(char * {ref-all})] = MEM [(char * {ref-all})];
v ={v} {CLOBBER(eol)};
v = D.33805;
vect__239.652_700 = MEM [(short int *)];
vect__240.653_702 = vect__239.652_700 << 8;

This generates the following (pseudo) rtl:

; D.33805 = _179
  113: r215:SI=r109:SI-0x10
  114: {r0:SI..r3:SI} = [r215:SI (0 MEM [(char * {ref-all})_179]+0 S4 A64)]
  112: r214:SI=r109:SI-0x60
  115: [r214:SI (0 MEM [(char * {ref-all})]+0 S4 A64)] = {r0:SI..r3:SI}
; _179 = D.33805
  117: r217:SI=r109:SI-0x60
  118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)]
  116: r216:SI=r109:SI-0x10
* 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI}
; r218 = _179
* 120: r218:V8HI=[r109:SI-0x10 (3 MEM [(short int *)_179]+0 S16 A64)]
  121: r178:V8HI=unspec[r218:V8HI,const_vector] 451

The two key instructions have been starred. Things proceed OK until sched2, at which point, when building the dependencies, we fail to create a link between i119 and i120. I've tracked this as far as ptr_deref_may_alias_decl_p (), where the call to may_be_aliased () decides that D.33805 cannot be aliased and thus there's no dependency. But it's not clear to me why we've tracked back to the copy before the load of interest, nor why, at this point, we're looking at tree addressability to decide whether or not there are memory dependencies here.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #20 from Richard Earnshaw ---
Created attachment 57928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57928&action=edit
fully preprocessed testcase
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #19 from Richard Earnshaw ---
This is another problem with (I suspect) incorrect aliasing information. If I compile with -fno-strict-aliasing, I get

  88: f4432a1f  vst1.8 {d18-d19}, [r3 :64]      // {>E} SP+96/16
  8c: f4420a1f  vst1.8 {d16-d17}, [r2 :64]      // {>A} SP+32/16
  90: e893000f  ldm r3, {r0, r1, r2, r3}        // {G} SP+128/16
  98: eddd0b20  vldr d16, [sp, #128] ; 0x80     // {B} SP+48/16
  a4: e28dc040  add ip, sp, #64 ; 0x40
  a8: e885000f  stm r5, {r0, r1, r2, r3}        // {>F} SP+112/16
  ac: f2d80570  vshl.s16 q8, q8, #8
  b0: f3f503e0  vneg.s16 q8, q8
  b4: edcd0b20  vstr d16, [sp, #128] ; 0x80     // {>G.l} SP+128/8
  b8: edcd1b22  vstr d17, [sp, #136] ; 0x88     // {>G.h} SP+136/8
  bc: e894000f  ldm r4, {r0, r1, r2, r3}        // {C} SP+64/16
  c4: e28dc050  add ip, sp, #80 ; 0x50
  c8: e88c000f  stm ip, {r0, r1, r2, r3}        // {>D} SP+80/16
  cc: e885000f  stm r5, {r0, r1, r2, r3}        // {>F} SP+112/16

I've annotated each memory access with its stack address and labeled each 16-byte slot from A to G. With -fstrict-aliasing this becomes:

  88: f4420a1f  vst1.8 {d16-d17}, [r2 :64]      // {>A} SP+32/16
  8c: eddd0b20  vldr d16, [sp, #128] ; 0x80     // {E} SP+96/16
  98: e893000f  ldm r3, {r0, r1, r2, r3}        // {B} SP+48/16
  a0: e28dc040  add ip, sp, #64 ; 0x40
  a4: f2d80570  vshl.s16 q8, q8, #8
  a8: e884000f  stm r4, {r0, r1, r2, r3}        // {>G} SP+128/16 !
  ac: e885000f  stm r5, {r0, r1, r2, r3}        // {>F} SP+112/16
  b0: f3f503e0  vneg.s16 q8, q8
  b4: edcd0b20  vstr d16, [sp, #128] ; 0x80     // {>G.l} SP+128/8
  b8: edcd1b22  vstr d17, [sp, #136] ; 0x88     // {>G.h} SP+136/8
  bc: e894000f  ldm r4, {r0, r1, r2, r3}        // {C} SP+64/16
  c4: e28dc050  add ip, sp, #80 ; 0x50
  c8: e88c000f  stm ip, {r0, r1, r2, r3}        // {>D} SP+80/16
  cc: e885000f  stm r5, {r0, r1, r2, r3}        // {>F} SP+112/16

And we see that the initial store to G has been moved after the reads from it. I'm still digging, but it may be pertinent that the reads have been split into two separate instructions; perhaps when the split was done the alias sets weren't copied correctly.
[Bug rtl-optimization/114338] (x & (-1 << y)) should be optimized to ((x >> y) << y) or vice versa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114338 --- Comment #1 from Richard Earnshaw ---
Why would that be better? On a machine that does not lack registers, there's more instruction-level parallelism in

(set (tmp) (-1))
(set (tmp) (ashift (tmp) (count)))
(and (x) (x) (tmp))

What's more, on Arm/AArch64 insns 2 and 3 can be merged into a single instruction:

(set (tmp) (-1))
(set (x) (and (ashift (tmp) (count)) (x)))

which is definitely preferable to two register-controlled shifts.
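The two forms being compared can be written out in C as a quick sanity check that they compute the same value. This is only a sketch: the function names are invented, and unsigned 32-bit operands are assumed so that the shifts are well defined for 0 <= y < 32 (the bug title's `-1 << y` on a signed int would not be).

```c
#include <stdint.h>

/* Mask form: materialise all-ones, shift it, then AND.
   The mask does not depend on x, so it can be computed in parallel. */
static uint32_t clear_low_mask(uint32_t x, unsigned y)
{
    return x & (~UINT32_C(0) << y);
}

/* Shift form: two dependent register-controlled shifts. */
static uint32_t clear_low_shifts(uint32_t x, unsigned y)
{
    return (x >> y) << y;
}
```

Both clear the low y bits of x; the difference the comment points at is purely in the dependency chain and in how many shift instructions are needed.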
[Bug target/114307] [ARM] GCC generates instruction that assembler rejects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114307 --- Comment #2 from Richard Earnshaw --- Note that it's clear from the .syntax markers that this is inline assembler that's the source of the invalid instructions.
[Bug target/114307] [ARM] GCC generates instruction that assembler rejects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114307 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2024-03-11 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Richard Earnshaw ---
From a full assembler dump:

        .syntax divided
@ 71 "/home/rearnsha/gnusrc/gcc/master/gcc/testsuite/gcc.dg/vect/tree-vect.h" 1
        vorr d6, d6, d7
@ 0 "" 2
        .arm
        .syntax unified

So this is a problem with the test; it shouldn't be enabled for this target.
[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428 Richard Earnshaw changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #8 from Richard Earnshaw --- Fixed
[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542 Richard Earnshaw changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #6 from Richard Earnshaw --- Change the test slightly to avoid the insn matching issues. This does leave open the question of how best to optimize the slightly simpler sequences, where we could do even better than we do now, but that's an enhancement and not appropriate for gcc-14.
[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428 --- Comment #6 from Richard Earnshaw --- Patch here: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647294.html
[Bug debug/100523] [11/12/13/14 Regression] armv8.1-m.main -fcompare-debug failure with -O -fmodulo-sched -mtune=cortex-a53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100523 Richard Earnshaw changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2024-03-06 Ever confirmed|0 |1 --- Comment #6 from Richard Earnshaw ---
Confirmed on trunk (14.0.1 20240215)

+++ cd.gk.c.gkd 2024-03-06 10:21:59.679317666 +
@@ -71,6 +71,7 @@
 (nil))
 (code_label # 0 0 5 4 (nil) [1 uses])
 (note # 0 0 [bb 5] NOTE_INSN_BASIC_BLOCK)
+(note # 0 0 NOTE_INSN_DELETED)
 (insn # 0 0 5 (set (reg:CC 100 cc)
 (compare:CC (reg/v:SI 3 r3 [orig:116 crc ] [116])
 (const_int 0 [0]))) "cd.c":7:11# {*arm_cmpsi_insn}
@@ -99,7 +100,6 @@
 (const_int -1 [0x])))
 ]) "cd.c":5:10 discrim 1# {thumb2_addsi3_compare0}
 (nil))
-(note # 0 0 NOTE_INSN_DELETED)
 (insn # 0 0 5 (set (reg:SI 2 r2 [orig:114 _1 ] [114])
 (ashiftrt:SI (reg/v:SI 3 r3 [orig:116 crc ] [116])
 (const_int 1 [0x1]))) "cd.c":7:11# {*arm_shiftsi3}

So it's probably harmless in this case, but still shouldn't happen.
[Bug libgcc/110775] [12/13/14 Regression] abort define causing issues in tsystem.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110775 --- Comment #3 from Richard Earnshaw --- Perhaps we could use #define abort __builtin_trap ? A quick check seems to suggest this will work ok.
[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428 --- Comment #4 from Richard Earnshaw ---
/* { dg-warning {cast to pointer from integer of different size} "" { target *-*-* } .-2 } */

I'm guessing it's this that's causing the problem because int and int* are the same size on 32-bit targets. So would changing the test to:

- int arr[20];
+ char arr[20];

be enough? AFAIK we don't have any targets with 8-bit pointers.
[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428 --- Comment #3 from Richard Earnshaw --- The referenced patch added the test that is failing. How is that a regression? Or are you suggesting that the test works without the rest of the patch applied?
[Bug target/113510] [14 Regression] [ARM Thumb] ICE in extract_constrain_insn with CPU cortex-m23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113510 Richard Earnshaw changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from Richard Earnshaw --- Fixed
[Bug target/113510] [14 Regression] [ARM Thumb] ICE in extract_constrain_insn with CPU cortex-m23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113510 Richard Earnshaw changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rearnsha at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #7 from Richard Earnshaw --- mine
[Bug testsuite/113611] [14 Regression] gcc.dg/pr110279-1.c fails on cross build since gcc-14-5779-g746344dd538
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113611 Richard Earnshaw changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #2 from Richard Earnshaw ---
I don't see how this can be a regression.

> --with-fpu=vfpv3-d16

FMA was added in vfpv4. If I change the fpu to add this then the test generates the relevant comments in the dump file. Arguably this test should check that the target has FMA instructions before running, but that's a different issue.
[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337 Richard Earnshaw changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #17 from Richard Earnshaw --- Should now be fixed.
[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337 Richard Earnshaw changed: What|Removed |Added Target Milestone|--- |14.0
[Bug middle-end/114136] wrong code for c23 fully anonymous arg lists on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136 Richard Earnshaw changed: What|Removed |Added Status|NEW |RESOLVED Target Milestone|--- |13.3 Resolution|--- |FIXED --- Comment #4 from Richard Earnshaw --- fixed
[Bug target/114143] Non-thumb arm32 code in thumb multilib for libgcc and in -mthumb build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114143 --- Comment #5 from Richard Earnshaw ---
(In reply to Richard Earnshaw from comment #4)
> You're going to need --with-multilib=aprofile,rmprofile if you want the full
> set of multilibs. But beware it builds a *lot* of them.

Sorry, I mean --with-multilib-list, not --with-multilib. To make things worse, configure will silently ignore options it does not recognize.
[Bug target/114143] Non-thumb arm32 code in thumb multilib for libgcc and in -mthumb build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114143 --- Comment #4 from Richard Earnshaw --- You're going to need --with-multilib=aprofile,rmprofile if you want the full set of multilibs. But beware it builds a *lot* of them. We don't enable this by default because it conflicts with --with-arch, --with-cpu and --with-float configure options. Describing how to pick the right multilib when there are so many to choose from is just too complex to describe when the base architecture isn't nailed down.
[Bug target/114143] Non-thumb arm32 code in thumb multilib for libgcc and in -mthumb build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114143 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2024-02-28 Status|UNCONFIRMED |WAITING Ever confirmed|0 |1 --- Comment #2 from Richard Earnshaw --- You probably haven't built the correct multilibs. See Christophe's comments
[Bug middle-end/114136] wrong code for c23 fully anonymous arg lists on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136 Richard Earnshaw changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2024-02-27
[Bug middle-end/114136] New: wrong code for c23 fully anonymous arg lists on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Bug ID: 114136
Summary: wrong code for c23 fully anonymous arg lists on arm
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: rearnsha at gcc dot gnu.org
Target Milestone: ---
Target: arm

On arm, a fully anonymous c23-style function is called incorrectly. All arguments are passed on the stack while the receiving function expects r0-r3 to be used for the initial arguments. For example,

void f (...);
void g() { f (1, 2, 3, 4); }

With gcc compiles to:

g:
        push    {lr}
        movs    r0, #1
        movs    r1, #2
        sub     sp, sp, #20
        movs    r2, #3
        movs    r3, #4
        stm     sp, {r0, r1, r2, r3}    // Arguments pushed to stack (wrong)
        bl      f
        add     sp, sp, #20
        ldr     pc, [sp], #4

When the correct code (eg, as produced by clang) is something like

g:
        mov     r0, #1
        mov     r1, #2
        mov     r2, #3
        mov     r3, #4
        b       f

compile with, eg arm-none-eabi-gcc -O2 -c23
[Bug target/108120] [11/12 Regression] ICE: in extract_insn, at recog.cc:2791 (on ARM with -mfpu=neon -freciprocal-math -O3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108120 Richard Earnshaw changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Richard Earnshaw --- Fixed on all active branches
[Bug target/108120] [11/12 Regression] ICE: in extract_insn, at recog.cc:2791 (on ARM with -mfpu=neon -freciprocal-math -O3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108120 Richard Earnshaw changed: What|Removed |Added Summary|[11/12/13 Regression] ICE: |[11/12 Regression] ICE: in |in extract_insn, at |extract_insn, at |recog.cc:2791 (on ARM with |recog.cc:2791 (on ARM with |-mfpu=neon |-mfpu=neon |-freciprocal-math -O3) |-freciprocal-math -O3) Assignee|unassigned at gcc dot gnu.org |rearnsha at gcc dot gnu.org Status|NEW |ASSIGNED
[Bug target/108120] [11/12/13 Regression] ICE: in extract_insn, at recog.cc:2791 (on ARM with -mfpu=neon -freciprocal-math -O3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108120 Richard Earnshaw changed: What|Removed |Added Summary|[11/12/13/14 Regression]|[11/12/13 Regression] ICE: |ICE: in extract_insn, at|in extract_insn, at |recog.cc:2791 (on ARM with |recog.cc:2791 (on ARM with |-mfpu=neon |-mfpu=neon |-freciprocal-math -O3) |-freciprocal-math -O3) --- Comment #4 from Richard Earnshaw --- Fixed on trunk so far.
[Bug target/107270] [11/12/13/14 Regression] return for structure is not as good as before
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107270 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2024-02-22 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #2 from Richard Earnshaw ---
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
    (ior:DI (and:DI (reg/v:DI 92 [ b ])
            (const_int 4294967295 [0xffffffff]))
        (ashift:DI (subreg:DI (reg:SI 100) 0)
            (const_int 32 [0x20]))))
rejecting combination of insns 10 and 15
original costs 4 + 4 = 8
replacement cost 12

But this is just BFI, so it's a costing issue.
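In C terms, the RTL pattern above keeps the low 32 bits of one value and inserts a 32-bit value into the high half, which is the job a single bitfield-insert instruction does. A minimal sketch follows; `insert_high_word` is a hypothetical name and is not taken from the bug's testcase.

```c
#include <stdint.h>

/* (b & 0xffffffff) | ((uint64_t)hi << 32): the ior/and/ashift shape
   that combine matched and then rejected on cost grounds. */
static uint64_t insert_high_word(uint64_t b, uint32_t hi)
{
    return (b & UINT64_C(0xffffffff)) | ((uint64_t)hi << 32);
}
```

For example, `insert_high_word(0x1111111122222222, 0xaabbccdd)` keeps `0x22222222` in the low half and places `0xaabbccdd` in the high half.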
[Bug target/113780] [ARM] Incorrect indirect tailcall generated for PAC-enabled function.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113780 Richard Earnshaw changed: What|Removed |Added CC||keithp at keithp dot com --- Comment #2 from Richard Earnshaw --- *** Bug 113795 has been marked as a duplicate of this bug. ***
[Bug target/113795] armv8.1m-m.main+pacbti -mbranch-protection=standard -O2 compile error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113795 Richard Earnshaw changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #3 from Richard Earnshaw --- Same as 113780 *** This bug has been marked as a duplicate of bug 113780 ***
[Bug target/108933] [11/12/13 Regression] Missing rev16 detection
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933 Richard Earnshaw changed: What|Removed |Added Summary|[11/12/13/14 Regression]|[11/12/13 Regression] |Missing rev16 detection |Missing rev16 detection Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |Matthieu.Longo at arm dot com --- Comment #6 from Richard Earnshaw --- Fixed on trunk so far.
[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542 Richard Earnshaw changed: What|Removed |Added Keywords||missed-optimization --- Comment #2 from Richard Earnshaw ---
The costing code is expecting

(parallel [
        (set (reg:SI 124 [ _7 ])
            (ne:SI (reg:SI 122 [ _2 ])
                (const_int 0 [0])))
        (clobber (reg:CC 100 cc))
    ])

To result in the assembler output

        SUBS    r124, R122, #1
        SBC     r124, R122, r124

so really should have a cost of 8 (two insns). But for some reason the thumb2 back-end is not generating that output in this case.

Overall, that means that for bic_si_test

        BIC     r0, r0, r1
        SUBS    r1, r0, #1
        SBC     r0, r0, r1

is neither better nor worse than

        BICS    r0, r0, r1
        IT      ne
        MOVNE   r0, #1

and certainly better than

        BICS    r0, r0, r1
        ITE     ne
        MOVNE   r2, #1
        MOVEQ   r2, #0

at least when it comes to code size. So the test is somewhat flaky, but there is a further problem with the compiler not generating the expected sequence for NE(reg, 0) in Thumb2.
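It may help to see why the SUBS/SBC pair computes NE(reg, 0). The sketch below emulates the two instructions in C, modelling the Arm carry flag explicitly: SUBS sets C when the subtraction does not borrow, and SBC subtracts an extra 1 when C is clear. The function names are hypothetical, and `bic_ne_sketch` is only a guess at the shape of bic_si_test, whose source is not quoted here.

```c
#include <stdint.h>

/* Emulates:  SUBS t, x, #1 ;  SBC r, x, t  */
static uint32_t ne_zero_subs_sbc(uint32_t x)
{
    uint32_t t = x - 1u;              /* SUBS t, x, #1              */
    uint32_t carry = (x >= 1u);       /* C = 1 when there's no borrow */
    return x - t - (1u - carry);      /* SBC r, x, t                */
}

/* Hypothetical shape of the test: (a & ~b) != 0, i.e. BIC then NE. */
static uint32_t bic_ne_sketch(uint32_t a, uint32_t b)
{
    return ne_zero_subs_sbc(a & ~b);
}
```

When x is non-zero, C is set and the result is x - (x - 1) = 1; when x is zero, t wraps to 0xffffffff, C is clear, and the result is 0 - 0xffffffff - 1 = 0 modulo 2^32.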
[Bug target/113510] [14 Regression] [ARM Thumb] ICE in extract_constrain_insn with CPU cortex-m23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113510 --- Comment #6 from Richard Earnshaw ---
(In reply to Andrew Pinski from comment #5)
> Yes the peephole2 in thumb1.md looks wrong:
> ```
> ;; Reloading and elimination of the frame pointer can
> ;; sometimes cause this optimization to be missed.
> (define_peephole2
>   [(set (match_operand:SI 0 "arm_general_register_operand" "")
>         (match_operand:SI 1 "const_int_operand" ""))
>    (set (match_dup 0)
>         (plus:SI (match_dup 0) (reg:SI SP_REGNUM)))]
>   "TARGET_THUMB1
>    && UINTVAL (operands[1]) < 1024
>    && (UINTVAL (operands[1]) & 3) == 0"
>   [(set (match_dup 0) (plus:SI (reg:SI SP_REGNUM) (match_dup 1)))]
>   ""
> )
> ```
>
> Confirmed.

Since this is a peephole and we're dealing with hard regs, we can just use "low_register_operand" as the predicate for operand 0.
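As a rough illustration of the condition guarding the peephole quoted above, the following C sketch restates the range and alignment test on the constant. The names `thumb1_sp_offset_ok` and `thumb1_sp_offset_imm8` are hypothetical and do not exist in GCC; the second helper assumes the Thumb-1 `ADD Rd, SP, #imm` form, whose 8-bit immediate is scaled by 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): mirrors the peephole's test on
   the constant, i.e. a word-aligned offset below 1024. */
bool thumb1_sp_offset_ok(uint64_t offset)
{
    return offset < 1024 && (offset & 3) == 0;
}

/* Hypothetical helper: the 8-bit field that would encode the offset,
   assuming the immediate is stored divided by 4. */
unsigned thumb1_sp_offset_imm8(uint64_t offset)
{
    assert(thumb1_sp_offset_ok(offset));
    return (unsigned)(offset >> 2);
}
```

This covers only the constant; the predicate on operand 0 is a separate matter, which is what the reply above addresses.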
[Bug rtl-optimization/113542] gcc.target/arm/bics_3.c regression after change for pr111267
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542 Richard Earnshaw changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-01-24 --- Comment #1 from Richard Earnshaw ---
Options to reproduce: -O2 -mcpu=cortex-m3 -mthumb

The problem is really a back-end issue. But the cause is that the fwprop pass is now propagating insn 9 into insn 10, replacing:

(set (reg:SI 124 [ _7 ])
    (ne:SI (reg:CC 100 cc)
        (const_int 0 [0])))

with the flag setting instruction to form

(parallel [
        (set (reg:SI 124 [ _7 ])
            (ne:SI (reg:SI 122 [ _2 ])
                (const_int 0 [0])))
        (clobber (reg:CC 100 cc))
    ])

That's OK, but it means that the combine pass is no longer able to merge the flag setter with an earlier result producer. A similar thing starts to happen in Arm state, but there the change is dropped because the costs work out the same (it has to reduce the cost). So I think it's that the cost model for thumb2 needs tweaking.
[Bug testsuite/113278] analyzer tests relying on fileno() fail on arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113278 --- Comment #1 from Richard Earnshaw --- newlib certainly implements fileno(): $ nm libc.a|grep fileno libc_a-fileno.o: T fileno U fileno libc_a-fileno_u.o: T fileno_unlocked U fileno So perhaps the issue is that the prototype is missing (or missing with the default compilation options since it's Posix and I don't think we pass options to enable that by default). Grepping the source, I suspect the former.
[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257 --- Comment #4 from Richard Earnshaw --- I'm not sure. My understanding was that -march=native started by looking up the CPU ID first and then using the internal mapping of that CPU to the architecture (which can't work if we don't recognize the CPU), but perhaps we try a bit harder when both are specified.
[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257 --- Comment #2 from Richard Earnshaw --- For -mcpu=native, the manual says: Additionally on native AArch64 GNU/Linux systems the value @samp{native} tunes performance to the host system. This option has no effect if the compiler is unable to recognize the processor of the host system. With similar wording for -march=native. Since nobody has contributed patches to recognize the Apple Silicon cores, I suspect that is the source of the problem.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #61 from Richard Earnshaw --- Then I don't understand what you're trying to say in c57.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #59 from Richard Earnshaw --- Memcpy must never write beyond the end of the specified buffer, even if reading it is safe. That wouldn't be thread safe.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #56 from Richard Earnshaw --- I've never heard of a memcpy implementation that corrupts data if called with memcpy (p, p, n). (The problems come from partial overlaps where the direction of the copy may matter). Has anybody considered asking the standards committee to bless this as a special exception? Of course, if n is large, then performing an early test is still worthwhile, but for small n, the cost of the check possibly exceeds the benefit of eliding the copy.
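A minimal sketch of the trade-off described in that comment, under stated assumptions: the wrapper name `block_copy` and the threshold `SELF_COPY_CHECK_MIN` are hypothetical (the comment gives no figure), and the small-n path relies on memcpy tolerating exact overlap, which is common in practice but not guaranteed by ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative threshold only; not taken from the bug report. */
#define SELF_COPY_CHECK_MIN 256

/* Hypothetical wrapper: for large blocks, test for exact overlap first
   and skip the copy; for small blocks, call memcpy directly because the
   test may cost more than the copy it would save. */
void *block_copy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n >= SELF_COPY_CHECK_MIN && dst == src)
        return dst;                 /* exact overlap: nothing to do */
    return memcpy(dst, src, n);
}
```

Partial overlaps are deliberately not handled here; those are the cases where the direction of the copy matters and memmove is the appropriate call.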
[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045 --- Comment #28 from Richard Earnshaw --- (In reply to David Binderman from comment #5) > No idea. I know the gcc project is over 30 years old and it is not > feasible for me to download the entire history, it is too large. > > I have the last 18 months or so history and that's a whopping > 3.8 Gig on it's own. $ cd ~/gnusrc/gcc/master/.git $ du -sh . 1.8G. So on my machine the entire git history is just 1.8G; that's because the history is very densely packed on the server and pulling the entire history does not require an unpack-repack-send sequence. But if you download a partial history, then the git server has to unpack and then repack the required history in order to send it; that makes the process much slower and results in far more data being transmitted (the on-the-fly repack is not as dense because it would take too much time).
[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045 --- Comment #27 from Richard Earnshaw --- > ==9933==by 0x151D554: search_line_fast (lex.cc:872) This is the entry code; so the issue is with the initial alignment code (unless the buffer is smaller than 16 bytes, when we might get both under reading and overreading).
[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045 --- Comment #26 from Richard Earnshaw --- I think it's more likely that this is at the start of the buffer rather than the end, and related to rounding the address down to a 16-byte alignment. But it could also occur at the end of the buffer as well if the buffer is (nearly) full.
[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045 --- Comment #24 from Richard Earnshaw --- (In reply to David Binderman from comment #22) > Is the optimization still worthwhile some 12 years later ? Almost certainly. Vector operations have become much better than they were at the time the patch went in, so it's probably even more worthwhile.
[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045 --- Comment #21 from Richard Earnshaw ---
FTR it was this patch that added this code. So 2012!

commit e75b54a2d932929a9b2e940c5aad1ef33a86c008
Author: Richard Earnshaw
Date:   Thu Mar 22 17:54:55 2012 +0000

    * lex.c (search_line_fast): Provide Neon-optimized version for ARM.

    From-SVN: r185702

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 97177e89916..133620b3b70 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,7 @@
+2012-03-22  Richard Earnshaw
+
+	* lex.c (search_line_fast): Provide Neon-optimized version for ARM.
+
[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045 Richard Earnshaw changed: What|Removed |Added CC||rearnsha at gcc dot gnu.org --- Comment #20 from Richard Earnshaw --- (In reply to Andrew Pinski from comment #9) > This is almost definitely a valgrind issue. > We start with: > /* Align the source pointer. */ > misalign = (uintptr_t)s & 15; > p = (const uint8_t *)((uintptr_t)s & -16); > data = vld1q_u8 (p); > > > Which all other targets do too. > > Basically this is how you realign the pointer and if don't depend on the > bytes that is not in the original pointer, then this is valid. > > Does it work correctly without valgrind? Yes, for the first fetch, we align down to a 16-byte boundary and fetch the full 16 bytes. We then mask off the bytes that are before the real start of the buffer so that they cannot affect the result. So the code is safe, but valgrind has no real way of knowing this. Tricks like this wouldn't work with capability pointers, but we're not concerned about that here; even MTE (on aarch64) would be ok because the alignment used matches the tag granule size. So I'm pretty sure this is a false positive. But perhaps we should just disable the vectorized scanning when valgrind checking is enabled. Note that glibc implementations of str* functions can perform a similar trick, but perhaps valgrind has special knowledge of such cases.
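The align-down-and-mask step described above can be sketched in plain C. This is a scalar stand-in, not the Neon code from lex.cc: the function name `first_newline_in_block` is hypothetical, the byte loop takes the place of a single vector compare, and the caller is assumed to guarantee that the whole aligned 16-byte block is readable (for example by keeping it inside one aligned array).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset from s of the first '\n' found in the aligned
   16-byte block containing s, ignoring bytes before s, or -1 if the
   block holds no such byte. */
int first_newline_in_block(const unsigned char *s)
{
    assert(s != NULL);

    /* Align the source pointer down to a 16-byte boundary. */
    unsigned misalign = (unsigned)((uintptr_t)s & 15);
    const unsigned char *p =
        (const unsigned char *)((uintptr_t)s & ~(uintptr_t)15);

    /* Examine all 16 bytes, including those before s. */
    uint32_t found = 0;
    for (unsigned i = 0; i < 16; i++)
        if (p[i] == '\n')
            found |= (uint32_t)1 << i;

    /* Mask off the bytes before the real start so they cannot
       affect the result. */
    found &= ~(uint32_t)0 << misalign;

    for (unsigned i = misalign; i < 16; i++)
        if (found & ((uint32_t)1 << i))
            return (int)(i - misalign);
    return -1;
}
```

A checker that tracks validity per byte sees the read of the bytes before s, but not that the mask discards them, which is consistent with the false-positive reading given above.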
[Bug target/113030] parsecpu.awk's chkarch/chkcpu commands is broken for aliases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113030 --- Comment #4 from Richard Earnshaw --- Yes, that looks sensible. Can you post it please?
[Bug target/112334] ICE in gen_untyped_return arm.md:9197 while compiling harden-cfr-bret.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112334 --- Comment #1 from Richard Earnshaw --- This might be a side issue, but: @defbuiltin{{void} __builtin_return (void *@var{result})} This built-in function returns the value described by @var{result} from the containing function. You should specify, for @var{result}, a value returned by @code{__builtin_apply}. So I'm not sure it's legal to pass to __builtin_return().
[Bug target/109166] Built-in __atomic_test_and_set does not seem to be atomic on ARMv4T
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109166 Richard Earnshaw changed: What|Removed |Added Resolution|--- |WONTFIX Status|NEW |RESOLVED --- Comment #8 from Richard Earnshaw --- I'm going to close this as WONTFIX. There are several reasons for this. There's no SWPH operation, so it's impossible to generalize atomic operations for all basic data types. It's not possible to synthesize a 16-bit atomic type with either SWP or SWPB. There's no support in Thumb state for SWP[B]. The instruction was removed in later versions of the architecture, which makes code non-portable. Finally, Armv4, which dates to around 1995, is essentially in maintenance only mode and this is really a new feature request. In fact, I don't think we'd really want to add new features for anything before Armv7 these days (even that is more than 10 years old).
[Bug target/111096] Frame pointer is not used even when -fomit-frame-pointer is specified
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111096 --- Comment #8 from Richard Earnshaw --- (In reply to Thomas Koenig from comment #7) > Would it make sense to document this somewhere? Or did I just miss it? :-) Possibly, but I've no idea where. It's too target-specific to put under the generic documentation for -fomit-frame-pointer and I don't think there's a section in the manual that really documents the target-specific behaviours of generic options.
[Bug target/97807] ICE in output_move_double, at config/arm/arm.c:19689
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97807 --- Comment #4 from Richard Earnshaw --- I can reproduce this, but only with -mfloat-abi=soft.
[Bug target/111096] Frame pointer is not used even when -fomit-frame-pointer is specified
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111096 --- Comment #6 from Richard Earnshaw --- For completeness. The AArch64 ABI lists 4 alternatives with respect to having a frame chain. When -fomit-frame-pointer is used, GCC implements this one: - It may require the frame pointer to address a valid frame record at all times, except that any subroutine may elect not to create a frame record
[Bug target/111096] Frame pointer is not used even when -fomit-frame-pointer is specified
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111096 Richard Earnshaw changed: What|Removed |Added Resolution|--- |WONTFIX Status|UNCONFIRMED |RESOLVED --- Comment #5 from Richard Earnshaw --- This was a deliberate design choice. Although the frame chain is not set up by code that omits the frame pointer, the chain of frames that are set up by other functions is still valid this way. This ensures that any code that does try to walk the frame chain will not crash. If we reused the frame pointer for other purposes, then any code trying to walk the frame chain (eg backtrace()) would encounter an invalid record and likely crash. With 31 main registers, the benefit from one additional one is not especially large.
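To illustrate why keeping the chain valid matters to a walker, here is a minimal sketch that follows frame records until the chain ends. The struct layout follows the AArch64 ABI's frame record (saved frame pointer, then saved link register); the function `count_frames` and its depth limit are hypothetical and only stand in for something like backtrace().

```c
#include <assert.h>
#include <stddef.h>

/* An AArch64 frame record: link to the caller's record, then the
   saved return address. */
struct frame_record {
    const struct frame_record *next;  /* NULL terminates the chain */
    const void *lr;
};

/* Hypothetical walker: counts the records reachable from fp, stopping
   at max_depth as a guard against a corrupt chain. */
size_t count_frames(const struct frame_record *fp, size_t max_depth)
{
    assert(max_depth > 0);
    size_t depth = 0;
    while (fp != NULL && depth < max_depth) {
        depth++;
        fp = fp->next;   /* only safe if every link is a valid record */
    }
    return depth;
}
```

A function that omits its frame record simply does not appear in the walk. A function that reused the frame pointer register for unrelated data would instead hand the walker an arbitrary value to dereference.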
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #16 from Richard Earnshaw --- (In reply to Mark Brown from comment #15) > The kernel module loader simply does not insert veneers at present, and > there were some implementation concerns IIRC. That's not a good reason to weaken the security of the generated code.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #14 from Richard Earnshaw --- (In reply to Mark Brown from comment #13) > The kernel hasn't got any problem with BTI as far as I am aware - when built > with clang we run the kernel with BTI enabled since clang does just insert a > BTI C at the start of every function, and GCC works fine so long as we don't > get any out of range jumps being generated. The issue is that we don't have > anything to insert veneers in the case where section placement puts static > functions into a distant enough part of memory to need an indirect jump but > GCC has decided to omit the landing pad. The linker has to insert the veneers.
[Bug target/110908] [aarch64] Internal compiler error when using -ffixed-x30
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110908 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2023-08-07 Status|UNCONFIRMED |NEW Severity|normal |enhancement Ever confirmed|0 |1 --- Comment #4 from Richard Earnshaw --- Why would you ever want to fix x30? Because of the way it is used by the architecture, there's no possible value in doing so. The compiler may insert instructions that must clobber this value at any point in the program (to handle libfuncs, for example), so it would be unsafe to store any useful value in it. I think it would be far more useful to make the compiler reject this option than to give the appearance that it is possible, when frankly, it isn't. Although it isn't, technically, an ICE on invalid code, it's about as close to that as you can get.
[Bug target/110901] -march does not override -mcpu (big.little on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2023-08-07 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #4 from Richard Earnshaw --- I think this is a driver bug. The MCPU_TO_MARCH_SPEC should be wrapped with %{!march=*:...} so that the CPU architecture is ignored if -march has been explicitly specified.
[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796 Richard Earnshaw changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rearnsha at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #12 from Richard Earnshaw --- Working on a patch.
[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2023-07-26 CC||rearnsha at gcc dot gnu.org Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #11 from Richard Earnshaw --- Confirmed. It only happens when generating Thumb code. For Arm code it works correctly. I think the problem is that the Thumb code generator is emitting vcmp, while the Arm code generator uses vcmpe - the latter sets the exception bits. I'm not sure why the code is different yet, still investigating.
[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796 --- Comment #9 from Richard Earnshaw ---
proc add_options_for_ieee { flags } {
    if { [istarget alpha*-*-*]
         || [istarget sh*-*-*] } {
        return "$flags -mieee"
    }
    if { [istarget rx-*-*] } {
        return "$flags -mnofpu"
    }
    return $flags
}

So it looks like this isn't expecting to add anything in most cases.
[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796 --- Comment #6 from Richard Earnshaw --- Is the exception status supposed to be in a defined state when the test runs? Shouldn't there be a call to feclearexcept (FE_ALL_EXCEPT) at the start of the test?
[Bug target/86772] [meta-bug] tracking port status for CVE-2017-5753
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86772 Bug 86772 depends on bug 86793, which changed state. Bug 86793 Summary: mips port needs updating for CVE-2017-5753 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86793 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug target/86793] mips port needs updating for CVE-2017-5753
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86793 Richard Earnshaw changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |14.0 --- Comment #3 from Richard Earnshaw --- Fixed on main development branch.
[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312 --- Comment #8 from Richard Earnshaw --- Applies to both AArch64 and Arm back-ends.
[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312 Richard Earnshaw changed: What|Removed |Added CC||Vedant.VijayYevale@infineon.com --- Comment #7 from Richard Earnshaw --- *** Bug 109415 has been marked as a duplicate of this bug. ***
[Bug target/109415] No predefined macros to differentiate between ARM Cortex-M33 and Cortex-M55
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109415 Richard Earnshaw changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #9 from Richard Earnshaw --- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312 *** This bug has been marked as a duplicate of bug 99312 ***
[Bug target/109415] No predefined macros to differentiate between ARM Cortex-M33 and Cortex-M55
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109415 --- Comment #8 from Richard Earnshaw --- The __ARM_ARCH_...__ macros turned out to be a very bad design decision. Each new architecture needs a new macro that older compilers (and software) will not know about. The ACLE approach is far more sensible and GCC has mostly adopted that now. I personally consider the existing __ARM_ARCH_...__ macros to be deprecated, though I don't think the manual actually says this yet. There is a known bug in GCC. ACLE says that __ARM_ARCH should have the value <major>*100 + <minor> for architectures after arm-v8, but we don't implement that yet (there may already be a PR about this) and report __ARM_ARCH=8 for all existing armv8.xxx variants.
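For reference, the ACLE rule mentioned above amounts to the following arithmetic. This is only a sketch: the helper name `acle_arm_arch` is hypothetical, and it assumes the major*100 + minor form applies from Armv8.1 onwards while earlier architectures (and Armv8.0) report the plain major number.

```c
#include <assert.h>

/* Hypothetical helper: the value ACLE specifies for __ARM_ARCH given an
   architecture version major.minor (assumption: the scaled form starts
   at Armv8.1). */
int acle_arm_arch(int major, int minor)
{
    assert(major > 0 && minor >= 0 && minor < 100);
    if (major > 8 || (major == 8 && minor > 0))
        return major * 100 + minor;   /* e.g. Armv8.1 gives 801 */
    return major;                     /* e.g. Armv7 gives 7, Armv8.0 gives 8 */
}
```

Under that rule, code could test `__ARM_ARCH >= 801` to detect Armv8.1 or later. With the behaviour described in the comment, where every armv8.x variant reports 8, such a test would not yet be reliable.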
[Bug target/108943] ARM Unaligned memory access with high optimizer levels
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108943 Richard Earnshaw changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from Richard Earnshaw --- There's no compiler bug here. Cortex-M7 implements the ARMv7em version of the architecture, which supports unaligned accesses. If this is faulting then it's because you're trying to use the operation on something like device memory without informing the compiler about this. You need to mark your pointers as volatile in this case. The alternative is to compile with -mno-unaligned-access, but I wouldn't recommend this as that will disable other optimizations where this might be safe and useful.
[Bug ipa/108470] Missing documentation for alternate uses of __attribute__((noinline))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108470 --- Comment #3 from Richard Earnshaw --- The manual entry for this says "This attribute is supported mainly for the purpose of testing the compiler." which suggests a lack of long-term commitment to the option. Perhaps it would be better to remove that. In some ways the analogy is with "-ffast-math" which is a short-hand for a number of other flags but not guaranteed to be only those options - although in this case 'noipa' is, I think, intended to be conservatively safe.
[Bug target/103100] [11/12/13 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100 --- Comment #19 from Richard Earnshaw --- (In reply to Andrew Pinski from comment #18) > I should say that testcase happens at `-Os -mstrict-align`, at `-O2 > -mstrict-align` it works. Because for -Os we don't forcibly align arrays - see AARCH64_EXPAND_ALIGNMENT and the macros that use it.
[Bug target/100000] non-leaf epologue/prologue used if MVE v4sf is used for load/return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100000 --- Comment #3 from Richard Earnshaw --- Given that the hard-float ABI essentially requires V4SF as a type, it might be better to consider this mode supported unconditionally in this case, and although that might make the compiler try some pointless vectorizations it would generate better code for cases like this.
[Bug target/100000] non-leaf epologue/prologue used if MVE v4sf is used for load/return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100000 --- Comment #2 from Richard Earnshaw --- If the testcase is built with -march=armv8.1-m.main+mve.fp then the useless stack adjustments go away. I think that's because V4SFmode is not a supported vector mode for integer MVE - see arm_vector_mode_supported_p() in arm.cc. When it isn't a builtin type we end up with a BLKmode object that the compiler creates a stack-slot for, even though no RTL is ever generated to use the slot in this case.
[Bug target/108515] Fails to link fixincl with unresolvable R_ARM_MOVW_ABS_NC relocation against symbol `stderr@@GLIBC_2.4'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108515 Richard Earnshaw changed: What|Removed |Added Resolution|FIXED |INVALID
[Bug target/108515] Fails to link fixincl with unresolvable R_ARM_MOVW_ABS_NC relocation against symbol `stderr@@GLIBC_2.4'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108515 Richard Earnshaw changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #13 from Richard Earnshaw --- (In reply to Richard Biener from comment #11) > So eventually linking with -Wl,-z,nocopyreloc will fail? If you want to avoid copyrelocs you'll need to compile with -fpie.
[Bug target/108515] Fails to link fixincl with unresolvable R_ARM_MOVW_ABS_NC relocation against symbol `stderr@@GLIBC_2.4'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108515 --- Comment #10 from Richard Earnshaw ---
Almost certainly this is related to the need for a copyreloc and presumably the linker has not created one for some reason. So I suspect this is most likely a binutils issue rather than a compiler one.

The code generated for the simple test is just

main:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        movw    r3, #:lower16:stderr
        movt    r3, #:upper16:stderr
        push    {r4, lr}
        movw    r0, #:lower16:.LC0
        movt    r0, #:upper16:.LC0
        ldr     r1, [r3]
        bl      printf
        mov     r0, #0
        pop     {r4, pc}

And the references to stderr will require the definition to be moved from the shared library to the static image during linking.
[Bug target/103100] [11/12/13 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100 --- Comment #14 from Richard Earnshaw --- (In reply to Richard Biener from comment #13) > (In reply to Andrew Pinski from comment #10) > > Updated patch submitted: > > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html > > I think you need to ping your patches more aggressively ... Richard Sandiford reviewed it here: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589581.html So the problem is that the review wasn't followed up by the submitter.
[Bug target/108442] arm: MVE's vld1* and vst1* do not work when __ARM_MVE_PRESERVE_USER_NAMESPACE is defined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108442 --- Comment #5 from Richard Earnshaw --- Fixed on master. While this is not a regression, we should consider a backport.