from:"rearnsha at gcc dot gnu.org via Gcc-bugs"

[Bug rtl-optimization/86901] [AArch64] Suboptimal register allocation for int/float reinterpret

2024-07-23 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86901

--- Comment #4 from Richard Earnshaw  ---
But why not:

f2:
        fmov    w1, s0
        ubfx    w1, w1, 20, 11
        cmp     w1, 1015
        bhi     .L7
        fmul    s0, s0, s0
        str     s0, [x0]
        ret
.L7:
        b       g

?

There's no need to be using X regs here, W is just fine.
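
For reference, a C function of roughly the following shape could give rise to a sequence like the one above. This is a sketch reconstructed from the assembly, not the testcase attached to the bug: the names `float_bits`, `field_20_30` and `f2_sketch`, and the counting stand-in for `g`, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the external function g() that the
   assembly tail-calls; it only counts how often it is reached.  */
static int g_calls;
static void g (void) { g_calls++; }

/* Reinterpret the bits of a float as a 32-bit integer (the fmov step).  */
static uint32_t float_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Extract the 11-bit field starting at bit 20 (the ubfx step).  */
static uint32_t field_20_30 (float x)
{
  return (float_bits (x) >> 20) & 0x7ff;
}

/* Store x*x through p unless the field exceeds 1015, in which case
   control goes to g instead (the cmp/bhi step).  */
static void f2_sketch (float *p, float x)
{
  assert (p != NULL);
  if (field_20_30 (x) > 1015)
    g ();
  else
    *p = x * x;
}
```

Every intermediate value here is 32 bits wide, which is consistent with the remark that W registers are sufficient.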

[Bug target/96373] [11 Regression] SVE miscompilation on vectorized division loop, leading to FP exception

2024-07-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96373

--- Comment #25 from Richard Earnshaw  ---
(In reply to Kewen Lin from comment #24)

> OK, thanks for the comments, I'll mark PR108977 as won't fix then.
It would be more normal to mark it as fixed, but set the fix version to the
earliest release with the fix.

[Bug target/115611] mve: vsetq_lane for 64-bits has wrong codegen when setting lane 1

2024-07-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115611

Richard Earnshaw  changed:

   What|Removed |Added

   Target Milestone|--- |11.5

[Bug target/105090] BFI instructions are not generated on arm-none-eabi-g++

2024-07-10 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105090

--- Comment #9 from Richard Earnshaw  ---
It looks like the compiler now merges b into a rather than a into b.  The
result is the same, though, and we don't need an lsr this way.  Technically it
ought to be better.

But we do end up in a dance with the registers this way at present.  I suspect
it's due to not splitting DFmode regs as aggressively as we do DImode and then
ending up trying to re-form them later on for register allocation purposes.

Anyway, I don't think the lsr is essential to the test, so let's just remove
that from the test.

[Bug target/103100] [11 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align

2024-07-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100

Richard Earnshaw  changed:

   What|Removed |Added

   Target Milestone|11.5|12.5

[Bug c/115770] Undefined arm instruction (udf #255) is generated when optimizer is on O2

2024-07-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115770

--- Comment #2 from Richard Earnshaw  ---
Correction: the option to add is -fno-delete-null-pointer-checks
Sorry for the confusion.

[Bug c/115770] Undefined arm instruction (udf #255) is generated when optimizer is on O2

2024-07-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115770

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Richard Earnshaw  ---
Address 0 is the same as the null pointer value and you haven't told the
compiler you're operating in a 'stand-alone' environment (in a hosted
environment dereferencing null is considered undefined behaviour).  If you
change the address to 4, you'll see what I mean.

You can add -ffreestanding to force the compiler to treat 0 as a valid address,
but this has other side-effects on the compilation as well.
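
To illustrate the point, here is a minimal sketch of the kind of access involved. The helper name `read_word` is an assumption and is not code from the report. When the address is the constant 0, a hosted compilation sees a null pointer dereference and may treat it as undefined behaviour; any other address, such as 4 or the address of a real object, is an ordinary load.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit word from an arbitrary address,
   as bare-metal code might do for a memory-mapped location.  The
   assert guards against address 0, which a hosted compilation treats
   as the null pointer.  */
static uint32_t read_word (uintptr_t address)
{
  assert (address != 0);
  return *(volatile uint32_t *) address;
}
```

If address 0 really is a valid location on the target, the correction posted as comment #2 names -fno-delete-null-pointer-checks as the option to add.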

[Bug target/115732] Arm32 architecture definitions for v8+ appear to have wrong FPU/SIMD defaults

2024-07-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115732

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #4 from Richard Earnshaw  ---
In Arm v8+ fp and SIMD are just one option.  You can't add FP without adding
SIMD and you can't remove FP without removing SIMD.  Since older versions of
the architecture essentially support SIMD as an extension of the FP extension,
we chose to make the options +simd and +nofp, these being the outer envelope of
the previous options.

[Bug target/115353] [14 regression] Missed thumb2 table branch instruction optimisations since r14-4946-g7006e5d2d7b5b2

2024-06-07 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353

Richard Earnshaw  changed:

   What|Removed |Added

   Target Milestone|--- |14.2

[Bug target/115360] cmse_nonsecure_call wrapper on arm missing STT_FUNCTION

2024-06-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115360

Richard Earnshaw  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-05
Summary|cmse_nonsecure_call wrapper |cmse_nonsecure_call wrapper
   |missing STT_FUNCTION|on arm missing STT_FUNCTION

--- Comment #1 from Richard Earnshaw  ---
Confirmed by observation.

[Bug target/115353] [14/15 regression] Missed thumb2 table branch instruction optimisations since r14-4946-g7006e5d2d7b5b2

2024-06-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-05
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Richard Earnshaw  ---
Confirmed.

[Bug tree-optimization/115157] incorrect TBAA for derived types involving enum types

2024-06-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115157

--- Comment #4 from Richard Earnshaw  ---
The tests in the last patch fail on arm-eabi. The tests assume that
sizeof(enum) == sizeof(int), which is not true if -fshort-enums is the default.


+ Changes for ./gcc/testsuite/gcc/gcc.sum.sent +


New tests that FAIL (6 tests):

arm-qemu/-mthumb: gcc: gcc.dg/enum-alias-1.c (test for excess errors)
arm-qemu/-mthumb: gcc: gcc.dg/enum-alias-2.c execution test
arm-qemu/-mthumb: gcc: gcc.dg/enum-alias-3.c execution test
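
The size assumption behind these failures can be made explicit with a small check. This is only a sketch: the enum and the function names are hypothetical and do not come from the enum-alias tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum whose values all fit in a single byte.  */
enum colour { RED, GREEN, BLUE };

/* Size the compiler chose for the enum; this is implementation-defined
   and can shrink to one byte when short enums are in effect.  */
static size_t enum_size (void)
{
  return sizeof (enum colour);
}

/* Nonzero when sizeof(enum) == sizeof(int) holds for this enum with
   the current target and options.  */
static int enum_is_int_sized (void)
{
  return enum_size () == sizeof (int);
}
```

A test that depends on the enum aliasing with int could check a condition like `enum_is_int_sized ()` first, or be restricted to targets where it holds.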

[Bug target/115086] bic is not used when the non-not part is a constant

2024-05-14 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115086

--- Comment #2 from Richard Earnshaw  ---
And perhaps more importantly the mov can even be hoisted outside of a loop.
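
A sketch of the loop case follows. The function name, the mask value and the array are illustrative assumptions, not taken from the bug. The expression has the "constant AND NOT x" shape from the bug title; if the constant is loaded into a register with a mov, that mov does not change between iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant standing in for the "non-not" operand.  */
#define MASK 0x00ff00ffu

/* Replace each element x with MASK & ~x.  MASK is the same on every
   iteration, so putting it in a register need only happen once.  */
static void and_not_const (uint32_t *v, size_t n)
{
  assert (v != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    v[i] = MASK & ~v[i];
}
```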

[Bug target/115083] undefined reference for aarch64-w64-mingw32 target

2024-05-14 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115083

--- Comment #5 from Richard Earnshaw  ---
Please give the port developers time to finish working on the port.  Only the
initial patches have been pushed so far and there is plenty of work left to do.

[Bug target/115058] on target arm -mcpu=cortex-a78ae does not allow use pauth and dot product

2024-05-14 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115058

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|WAITING |RESOLVED

--- Comment #7 from Richard Earnshaw  ---
This is a bug in GNU Binutils.  These system registers are incorrectly
described as being part of armv8.3-a, rather than part of the Pauth extension. 
Please can you raise a bug there: https://sourceware.org/bugzilla (select
product binutils).

[Bug target/115058] on target arm -mcpu=cortex-a78ae does not allow use pauth and dot product

2024-05-13 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115058

Richard Earnshaw  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2024-05-13

--- Comment #1 from Richard Earnshaw  ---
It looks like those messages are coming from the assembler, not the compiler,
but without a testcase it's difficult to be exactly sure what your problem is.

Please attach a small program that demonstrates your problem and state the 
/exact/ command line you used.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-16 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #34 from Richard Earnshaw  ---
To be honest, I'm more concerned that we aren't eliminating a lot of these
copies during the gimple optimization phase.  The memcpy is really a type
punning step (that's strictly ISO C compliant, rather than using the GCC union
extension), so ideally we'd recognize that and eliminate as many of the copies
as possible (perhaps using some form of view_convert or whatever gimple is
appropriate for changing the view without changing the contents).

But that's for another day...
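
As a sketch of the two punning styles contrasted above (the type and function names are hypothetical; `vec128` is only loosely modelled on the Vec128 name seen in the dumps for this bug):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit container.  */
struct vec128 { uint8_t raw[16]; };

/* memcpy form: copy the object representation of eight 16-bit lanes.  */
static struct vec128 pun_with_memcpy (const int16_t lanes[8])
{
  struct vec128 v;
  assert (lanes != NULL);
  memcpy (v.raw, lanes, sizeof v.raw);
  return v;
}

/* Union form (the one the comment calls the GCC union extension):
   write one member, then read the other.  */
static struct vec128 pun_with_union (const int16_t lanes[8])
{
  union { int16_t lanes[8]; struct vec128 v; } u;
  assert (lanes != NULL);
  for (int i = 0; i < 8; i++)
    u.lanes[i] = lanes[i];
  return u.v;
}
```

Both functions produce the same bytes; ideally neither would need a copy through memory once the optimizers have seen through the change of view.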

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-15 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #31 from Richard Earnshaw  ---
While that does seem to fix the bug, it's at the cost of 6 additional stores in
the problematic test that are redundant other than changing the alias set view.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #29 from Richard Earnshaw  ---
Sorry, I was looking at the wrong pair of insns.  The earlier store to that
location was insn 111.

111: [r212:SI (1 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI}

It appears that the problem is a disagreement between alias_set_subset_of ()
and alias_sets_conflict_p ().  The former thinks sets 1 and 2 have a permissible
subset relationship (2 is a subset of 1), so the later store is removed during
postreload.  The latter thinks there is no conflict between the two sets, and
so we fail to add a scheduling dependency before sched2.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #27 from Richard Earnshaw  ---
(In reply to Richard Earnshaw from comment #26)
> (In reply to Richard Biener from comment #25)
> > I think it's more interesting why
> > 
> > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> > {r0:SI..r3:SI}
> > 
> > isn't considered as dependence?  Why does the earlier insn even come into
> > play?  What's the breaking transform?  I guess insn 119 and 120 are
> > exchanged?
> 
> Because 119 was deleted by postreload.  Doh! I should have spotted that.

But that ought to be ok, insn 115 is a store in alias set 0, so is picked up by
later alias analysis.  It's just that the compiler then digs deeper and decides
that that isn't an addressable object (at the gimple level) so there can't
really be a dependency.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #26 from Richard Earnshaw  ---
(In reply to Richard Biener from comment #25)
> I think it's more interesting why
> 
> * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> {r0:SI..r3:SI}
> 
> isn't considered as dependence?  Why does the earlier insn even come into
> play?  What's the breaking transform?  I guess insn 119 and 120 are
> exchanged?

Because 119 was deleted by postreload.  Doh! I should have spotted that.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #23 from Richard Earnshaw  ---
#0  ptr_deref_may_alias_decl_p (ptr=0x75e0c678, decl=0x75dff000)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:295
#1  0x01768173 in indirect_ref_may_alias_decl_p (ref1=0x75e9ad98, 
base1=0x75e9ad98, offset1=..., max_size1=..., size1=..., 
ref1_alias_set=3, base1_alias_set=3, ref2=0x75deae60, 
base2=0x75dff000, offset2=..., max_size2=..., size2=..., 
ref2_alias_set=0, base2_alias_set=0, tbaa_p=false)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2102
#2  0x01769541 in refs_may_alias_p_2 (ref1=0x7fffceb0, 
ref2=0x7fffce70, tbaa_p=false)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2505
#3  0x0176968a in refs_may_alias_p_1 (ref1=0x7fffce70, 
ref2=0x7fffceb0, tbaa_p=false)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2534
#4  0x00f7bf7d in rtx_refs_may_alias_p (x=0x75ed3b40, 
mem=0x75e9c9d8, tbaa_p=true)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/alias.cc:366
#5  0x00f8243b in true_dependence_1 (mem=0x75e9c9d8, 
mem_mode=E_SImode, mem_addr=0x75e9c9c0, x=0x75ed3b40, 
x_addr=0x75ed3b28, mem_canonicalized=false)

Where (in true_dependence_1):
p mem
$96 = (const_rtx) 0x75e9c9d8
(gdb) pr
(mem/c:SI (plus:SI (reg/f:SI 14 lr [214])
(const_int 4 [0x4])) [0 MEM  [(char *
{ref-all})]+4 S4 A32])

p x
$97 = (const_rtx) 0x75ed3b40
(gdb) pr
(mem/c:V8HI (plus:SI (reg/f:SI 13 sp)
(const_int 256 [0x100])) [3 MEM  [(short int
*)_179]+0 S16 A64])

in refs_may_alias_p_1:
p *ref1
$99 = {ref = 0x75e9ad98, base = 0x75e9ad98, 
  offset = {> = {coeffs = {0}}, }, 
  size = {> = {coeffs = {128}}, }, 
  max_size = {> = {coeffs = {128}}, }, 
  ref_alias_set = 3, base_alias_set = 3, volatile_p = false}
p *ref2
$100 = {ref = 0x75deae60, base = 0x75dff000, 
  offset = {> = {coeffs = {32}}, }, 
  size = {> = {coeffs = {32}}, }, 
  max_size = {> = {coeffs = {128}}, }, 
  ref_alias_set = 0, base_alias_set = 0, volatile_p = false}

p ref1->ref
$101 = (tree) 0x75e9ad98
(gdb) pt
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x77405498 precision:16 min  max

pointer_to_this  reference_to_this
>
V8HI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x7752d7e0 nunits:8
pointer_to_this >

arg:0 
sizes-gimplified public unsigned type_6 SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type
0x7740c150
pointer_to_this  reference_to_this
>
var 
def_stmt 
version:179
ptr-info 0x75e71468>
arg:1 
constant 0>>

p ref1->base
$102 = (tree) 0x75e9ad98
(gdb) pt
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x77405498 precision:16 min  max

pointer_to_this  reference_to_this
>
V8HI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x7752d7e0 nunits:8
pointer_to_this >

arg:0 
sizes-gimplified public unsigned type_6 SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type
0x7740c150
pointer_to_this  reference_to_this
>
var 
def_stmt 
version:179
ptr-info 0x75e71468>
arg:1 
constant 0>>

p ref2->ref
$103 = (tree) 0x75deae60
(gdb) pt
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x77405348 precision:8 min  max >
BLK
size 
unit-size 
user align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x76322d20
domain 
sizes-gimplified public type_6 SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x76b33d20 precision:32 min  max >
pointer_to_this >

arg:0 
public unsigned SI size  unit-size

align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x766db5e8>

arg:0 
used ignored BLK ../hwy-pr111231-cpp.cc:4461:27 size  unit-size 
align:64 warn_if_not_align:0 context  abstract_origin 
(mem/c:BLK (plus:SI (reg/f:SI 109 virtual-stack-vars)
(const_int -96 [0xffa0])) [2 D.33805+0 S16 A64])>
../hwy-pr111231-cpp.cc:4346:16 start: ../hwy-pr111231-cpp.cc:4346:3
finish: ../hwy-pr111231-cpp.cc:4346:24>
arg:1 
constant 0>>
p ref2->base
$104 = (tree) 0x75dff000
(gdb) pt
 
unit-size 
align:16

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #22 from Richard Earnshaw  ---
(Previous analysis is based on gcc-13 branch)

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Richard Earnshaw  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #21 from Richard Earnshaw  ---
With my new testcase, compiled on an arm-none-eabi cross with 

cc1plus ../hwy-pr111231-cpp.cc -mfpu=neon-vfpv4 -mfloat-abi=hard
-mfp16-format=ieee -marm -mlibarch=armv7-a+neon-vfpv4 -march=armv7-a+neon-vfpv4
-O2 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -fmerge-all-constants
-fmath-errno -fno-exceptions

The critical sequence, at the end of gimple optimization is:

  v = b;
  MEM  [(char * {ref-all})] = MEM  [(char * {ref-all})];
  v ={v} {CLOBBER(eol)};
  v = D.33805;
  vect__239.652_700 = MEM  [(short int *)];
  vect__240.653_702 = vect__239.652_700 << 8;

This generates the following (pseudo) rtl:

; D.33805 = _179
  113: r215:SI=r109:SI-0x10
  114: {r0:SI..r3:SI} = [r215:SI (0 MEM  [(char *
{ref-all})_179]+0 S4 A64)]
  112: r214:SI=r109:SI-0x60
  115: [r214:SI (0 MEM  [(char * {ref-all})]+0 S4
A64)] = {r0:SI..r3:SI}
; _179 = D.33805
  117: r217:SI=r109:SI-0x60
  118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)]
  116: r216:SI=r109:SI-0x10
* 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
{r0:SI..r3:SI}
; r218 = _179
* 120: r218:V8HI=[r109:SI-0x10 (3 MEM  [(short int
*)_179]+0 S16 A64)]
  121: r178:V8HI=unspec[r218:V8HI,const_vector] 451

The two key instructions have been starred. 

Things proceed OK until sched2, at which point, when building the dependencies,
we fail to create a link between i119 and i120.  I've tracked this as far as
ptr_deref_may_alias_decl_p (), where the call to may_be_aliased () decides that
D.33805 cannot be aliased and thus there's no dependency.  But it's not clear
to me why we've tracked back to the copy before the load of interest, nor why,
at this point, we're looking at tree addressability to decide whether or not
there are memory dependencies here.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #20 from Richard Earnshaw  ---
Created attachment 57928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57928&action=edit
fully preprocessed testcase

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-03-22 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #19 from Richard Earnshaw  ---
This is another problem with (I suspect) incorrect aliasing information.  If I
compile with -fno-strict-aliasing, I get

  88:   f4432a1f    vst1.8  {d18-d19}, [r3 :64] // {>E}   SP+96/16
  8c:   f4420a1f    vst1.8  {d16-d17}, [r2 :64] // {>A}   SP+32/16
  90:   e893000f    ldm r3, {r0, r1, r2, r3}    // {G}   SP+128/16
  98:   eddd0b20    vldr d16, [sp, #128] ; 0x80 // {B}   SP+48/16
  a4:   e28dc040    add ip, sp, #64 ; 0x40
  a8:   e885000f    stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16
  ac:   f2d80570    vshl.s16 q8, q8, #8
  b0:   f3f503e0    vneg.s16 q8, q8
  b4:   edcd0b20    vstr d16, [sp, #128] ; 0x80 // {>G.l} SP+128/8
  b8:   edcd1b22    vstr d17, [sp, #136] ; 0x88 // {>G.h} SP+136/8
  bc:   e894000f    ldm r4, {r0, r1, r2, r3}    // {C}   SP+64/16
  c4:   e28dc050    add ip, sp, #80 ; 0x50
  c8:   e88c000f    stm ip, {r0, r1, r2, r3}    // {>D}   SP+80/16
  cc:   e885000f    stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16

I've annotated each memory access with its stack address and labeled each
16-byte slot from A to G.

With -fstrict-aliasing this becomes:

  88:   f4420a1f    vst1.8  {d16-d17}, [r2 :64] // {>A}   SP+32/16
  8c:   eddd0b20    vldr d16, [sp, #128] ; 0x80 // {E}   SP+96/16
  98:   e893000f    ldm r3, {r0, r1, r2, r3}    // {B}   SP+48/16
  a0:   e28dc040    add ip, sp, #64 ; 0x40
  a4:   f2d80570    vshl.s16 q8, q8, #8
  a8:   e884000f    stm r4, {r0, r1, r2, r3}    // {>G}   SP+128/16 !
  ac:   e885000f    stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16
  b0:   f3f503e0    vneg.s16 q8, q8
  b4:   edcd0b20    vstr d16, [sp, #128] ; 0x80 // {>G.l} SP+128/8
  b8:   edcd1b22    vstr d17, [sp, #136] ; 0x88 // {>G.h} SP+136/8
  bc:   e894000f    ldm r4, {r0, r1, r2, r3}    // {C}   SP+64/16
  c4:   e28dc050    add ip, sp, #80 ; 0x50
  c8:   e88c000f    stm ip, {r0, r1, r2, r3}    // {>D}   SP+80/16
  cc:   e885000f    stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16

And we see that the initial store to G has been moved after the reads from it. 
I'm still digging, but it may be pertinent that the reads have been split into
two separate instructions; perhaps when the split was done the alias sets
weren't copied correctly.

[Bug rtl-optimization/114338] (x & (-1 << y)) should be optimized to ((x >> y) << y) or vice versa

2024-03-14 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114338

--- Comment #1 from Richard Earnshaw  ---
Why would that be better?  On a machine that does not lack registers, there's
more instruction-level parallelism in 
  (set (tmp) (-1))
  (set (tmp) (ashift (tmp) (count)))
  (set (x) (and (x) (tmp)))

What's more, on Arm/AArch64 insns 2 and 3 can be merged into a single
instruction:

  (set (tmp) (-1))
  (set (x) (and (ashift (tmp) (count)) (x)))

which is definitely preferable to two register-controlled shifts.
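
In C terms, the two forms under discussion can be sketched as follows. The function names are hypothetical, and unsigned arithmetic is used so that the shifts are well defined (a left shift of a signed -1 would not be).

```c
#include <assert.h>
#include <limits.h>

/* Mask form: clear the low y bits of x with an AND against a shifted
   all-ones value.  */
static unsigned clear_low_mask (unsigned x, unsigned y)
{
  assert (y < sizeof (unsigned) * CHAR_BIT);
  return x & (~0u << y);
}

/* Shift form: clear the low y bits of x with a right shift followed
   by a left shift, both controlled by y.  */
static unsigned clear_low_shifts (unsigned x, unsigned y)
{
  assert (y < sizeof (unsigned) * CHAR_BIT);
  return (x >> y) << y;
}
```

Both return the same value for any in-range shift count; the question in the comment is only which sequence is cheaper to execute.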

[Bug target/114307] [ARM] GCC generates instruction that assembler rejects

2024-03-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114307

--- Comment #2 from Richard Earnshaw  ---
Note that it's clear from the .syntax markers that this is inline assembler
that's the source of the invalid instructions.

[Bug target/114307] [ARM] GCC generates instruction that assembler rejects

2024-03-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114307

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2024-03-11
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Richard Earnshaw  ---
From a full assembler dump:

.syntax divided
@ 71 "/home/rearnsha/gnusrc/gcc/master/gcc/testsuite/gcc.dg/vect/tree-vect.h" 1
vorr d6, d6, d7
@ 0 "" 2
.arm
.syntax unified

So this is a problem with the test; it shouldn't be enabled for this target.

[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d

2024-03-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Richard Earnshaw  ---
Fixed

[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267

2024-03-08 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #6 from Richard Earnshaw  ---
Change the test slightly to avoid the insn matching issues.  This does leave
open the question of how best to optimize the slightly simpler sequences, where
we could do even better than we do now, but that's an enhancement and not
appropriate for gcc-14.

[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d

2024-03-06 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428

--- Comment #6 from Richard Earnshaw  ---
Patch here: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647294.html

[Bug debug/100523] [11/12/13/14 Regression] armv8.1-m.main -fcompare-debug failure with -O -fmodulo-sched -mtune=cortex-a53

2024-03-06 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100523

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-03-06
 Ever confirmed|0   |1

--- Comment #6 from Richard Earnshaw  ---
Confirmed on trunk (14.0.1 20240215)

+++ cd.gk.c.gkd 2024-03-06 10:21:59.679317666 +
@@ -71,6 +71,7 @@
  (nil))
 (code_label # 0 0 5 4 (nil) [1 uses])
 (note # 0 0 [bb 5] NOTE_INSN_BASIC_BLOCK)
+(note # 0 0 NOTE_INSN_DELETED)
 (insn # 0 0 5 (set (reg:CC 100 cc)
 (compare:CC (reg/v:SI 3 r3 [orig:116 crc ] [116])
 (const_int 0 [0]))) "cd.c":7:11# {*arm_cmpsi_insn}
@@ -99,7 +100,6 @@
 (const_int -1 [0x])))
 ]) "cd.c":5:10 discrim 1# {thumb2_addsi3_compare0}
  (nil))
-(note # 0 0 NOTE_INSN_DELETED)
 (insn # 0 0 5 (set (reg:SI 2 r2 [orig:114 _1 ] [114])
 (ashiftrt:SI (reg/v:SI 3 r3 [orig:116 crc ] [116])
 (const_int 1 [0x1]))) "cd.c":7:11# {*arm_shiftsi3}


So it's probably harmless in this case, but still shouldn't happen.

[Bug libgcc/110775] [12/13/14 Regression] abort define causing issues in tsystem.h

2024-03-06 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110775

--- Comment #3 from Richard Earnshaw  ---
Perhaps we could use 

#define abort __builtin_trap

?

A quick check seems to suggest this will work ok.

[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d

2024-03-06 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428

--- Comment #4 from Richard Earnshaw  ---
/* { dg-warning {cast to pointer from integer of different size} "" { target
*-*-* } .-2 } */

I'm guessing it's this that's causing the problem because int and int* are the
same size on 32-bit targets.  So would changing the test to:

-  int arr[20];
+  char arr[20];

be enough?  AFAIK we don't have any targets with 8-bit pointers.
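
The size relationship the suggestion relies on can be sketched like this. The macro and function names are hypothetical; the real test depends on the compiler's own diagnostic rather than on code like this.

```c
#include <assert.h>

/* Nonzero when TYPE differs in size from a pointer, which is the
   condition the "different size" warning depends on.  */
#define SIZE_DIFFERS_FROM_POINTER(TYPE) (sizeof (TYPE) != sizeof (void *))

/* int matches the pointer size on typical 32-bit targets, so the
   result here varies from target to target.  */
static int int_differs (void)
{
  return SIZE_DIFFERS_FROM_POINTER (int);
}

/* char differs on any target whose pointers occupy more than one
   byte, so this holds wherever the file compiles.  */
static_assert (SIZE_DIFFERS_FROM_POINTER (char),
               "pointers are expected to be wider than char");
```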

[Bug testsuite/113428] [14 regression] gcc.dg/gomp/bad-array-section-c-3.c fails after r14-7158-gb5476e4c881b0d

2024-03-06 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113428

--- Comment #3 from Richard Earnshaw  ---
The referenced patch added the test that is failing.  How is that a regression?
Or are you suggesting that the test works without the rest of the patch
applied?

[Bug target/113510] [14 Regression] [ARM Thumb] ICE in extract_constrain_insn with CPU cortex-m23

2024-03-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113510

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Richard Earnshaw  ---
Fixed

[Bug target/113510] [14 Regression] [ARM Thumb] ICE in extract_constrain_insn with CPU cortex-m23

2024-03-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113510

Richard Earnshaw  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rearnsha at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from Richard Earnshaw  ---
mine

[Bug testsuite/113611] [14 Regression] gcc.dg/pr110279-1.c fails on cross build since gcc-14-5779-g746344dd538

2024-03-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113611

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Richard Earnshaw  ---
I don't see how this can be a regression.

> --with-fpu=vfpv3-d16 

FMA was added in vfpv4.  If I change the fpu to add this then the test
generates the relevant comments in the dump file.

Arguably this test should check that the target has FMA instructions before
running, but that's a different issue.

[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE

2024-03-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from Richard Earnshaw  ---
Should now be fixed.

[Bug target/112337] arm: ICE in arm_effective_regno when compiling for MVE

2024-03-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337

Richard Earnshaw  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug middle-end/114136] wrong code for c23 fully anonymous arg lists on arm

2024-03-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
   Target Milestone|--- |13.3
 Resolution|--- |FIXED

--- Comment #4 from Richard Earnshaw  ---
fixed

[Bug target/114143] Non-thumb arm32 code in thumb multilib for libgcc and in -mthumb build

2024-02-29 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114143

--- Comment #5 from Richard Earnshaw  ---
(In reply to Richard Earnshaw from comment #4)
> You're going to need --with-multilib=aprofile,rmprofile if you want the full
> set of multilibs.  But beware it builds a *lot* of them.

Sorry, I mean --with-multilib-list, not --with-multilib.  To make things worse,
configure will silently ignore options it does not recognize.

[Bug target/114143] Non-thumb arm32 code in thumb multilib for libgcc and in -mthumb build

2024-02-29 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114143

--- Comment #4 from Richard Earnshaw  ---
You're going to need --with-multilib=aprofile,rmprofile if you want the full
set of multilibs.  But beware it builds a *lot* of them.

We don't enable this by default because it conflicts with the --with-arch,
--with-cpu and --with-float configure options.  Explaining how to pick the
right multilib when there are so many to choose from is just too complex when
the base architecture isn't nailed down.

[Bug target/114143] Non-thumb arm32 code in thumb multilib for libgcc and in -mthumb build

2024-02-28 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114143

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-28
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #2 from Richard Earnshaw  ---
You probably haven't built the correct multilibs.  See Christophe's comments

[Bug middle-end/114136] wrong code for c23 fully anonymous arg lists on arm

2024-02-27 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-02-27

[Bug middle-end/114136] New: wrong code for c23 fully anonymous arg lists on arm

2024-02-27 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114136

Bug ID: 114136
   Summary: wrong code for c23 fully anonymous arg lists on arm
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---
Target: arm

On arm, a fully anonymous c23-style function is called incorrectly.  All
arguments are passed on the stack while the receiving function expects r0-r3 to
be used for the initial arguments.

For example,

void f (...);

void g()
{
  f (1, 2, 3, 4);
}

With GCC this compiles to:

g:
        push    {lr}
        movs    r0, #1
        movs    r1, #2
        sub     sp, sp, #20
        movs    r2, #3
        movs    r3, #4
        stm     sp, {r0, r1, r2, r3}  // Arguments pushed to stack (wrong)
        bl      f
        add     sp, sp, #20
        ldr     pc, [sp], #4

whereas the correct code (e.g., as produced by clang) is something like:

g:
        mov     r0, #1
        mov     r1, #2
        mov     r2, #3
        mov     r3, #4
        b       f

Compile with, eg

arm-none-eabi-gcc -O2 -std=c23

[Bug target/108120] [11/12 Regression] ICE: in extract_insn, at recog.cc:2791 (on ARM with -mfpu=neon -freciprocal-math -O3)

2024-02-23 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108120

Richard Earnshaw  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Richard Earnshaw  ---
Fixed on all active branches

[Bug target/108120] [11/12 Regression] ICE: in extract_insn, at recog.cc:2791 (on ARM with -mfpu=neon -freciprocal-math -O3)

2024-02-23 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108120

Richard Earnshaw  changed:

   What|Removed |Added

Summary|[11/12/13 Regression] ICE:  |[11/12 Regression] ICE: in
   |in extract_insn, at |extract_insn, at
   |recog.cc:2791 (on ARM with  |recog.cc:2791 (on ARM with
   |-mfpu=neon  |-mfpu=neon
   |-freciprocal-math -O3)  |-freciprocal-math -O3)
   Assignee|unassigned at gcc dot gnu.org  |rearnsha at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug target/108120] [11/12/13 Regression] ICE: in extract_insn, at recog.cc:2791 (on ARM with -mfpu=neon -freciprocal-math -O3)

2024-02-23 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108120

Richard Earnshaw  changed:

   What|Removed |Added

Summary|[11/12/13/14 Regression]|[11/12/13 Regression] ICE:
   |ICE: in extract_insn, at|in extract_insn, at
   |recog.cc:2791 (on ARM with  |recog.cc:2791 (on ARM with
   |-mfpu=neon  |-mfpu=neon
   |-freciprocal-math -O3)  |-freciprocal-math -O3)

--- Comment #4 from Richard Earnshaw  ---
Fixed on trunk so far.

[Bug target/107270] [11/12/13/14 Regression] return for structure is not as good as before

2024-02-22 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107270

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-22
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Earnshaw  ---
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
    (ior:DI (and:DI (reg/v:DI 92 [ b ])
            (const_int 4294967295 [0xffffffff]))
        (ashift:DI (subreg:DI (reg:SI 100) 0)
            (const_int 32 [0x20]))))
rejecting combination of insns 10 and 15
original costs 4 + 4 = 8
replacement cost 12

But this is just BFI, so it's a costing issue.
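
For reference, the matched RTL above corresponds to the following in C terms.  This is only a sketch; the function name insert_high_word and its parameter names are illustrative and not taken from the testcase.

```c
#include <stdint.h>

/* Keep the low 32 bits of b and place a in bits 32..63.  This is what
   the (ior (and b 0xffffffff) (ashift a 32)) pattern describes, and
   what a single bit-field insert can do on AArch64.  */
uint64_t insert_high_word(uint64_t b, uint32_t a)
{
    return (b & 0xffffffffu) | ((uint64_t)a << 32);
}
```

Seen this way, the pattern is one operation, so costing it above the two original insns looks wrong.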

[Bug target/113780] [ARM] Incorrect indirect tailcall generated for PAC-enabled function.

2024-02-08 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113780

Richard Earnshaw  changed:

   What|Removed |Added

 CC||keithp at keithp dot com

--- Comment #2 from Richard Earnshaw  ---
*** Bug 113795 has been marked as a duplicate of this bug. ***

[Bug target/113795] armv8.1m-m.main+pacbti -mbranch-protection=standard -O2 compile error

2024-02-08 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113795

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Richard Earnshaw  ---
Same as 113780

*** This bug has been marked as a duplicate of bug 113780 ***

[Bug target/108933] [11/12/13 Regression] Missing rev16 detection

2024-01-29 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

Richard Earnshaw  changed:

   What|Removed |Added

Summary|[11/12/13/14 Regression]|[11/12/13 Regression]
   |Missing rev16 detection |Missing rev16 detection
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |Matthieu.Longo at arm dot com

--- Comment #6 from Richard Earnshaw  ---
Fixed on trunk so far.

[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267

2024-01-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542

Richard Earnshaw  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #2 from Richard Earnshaw  ---
The costing code is expecting

(parallel [
        (set (reg:SI 124 [ _7 ])
            (ne:SI (reg:SI 122 [ _2 ])
                (const_int 0 [0])))
        (clobber (reg:CC 100 cc))
    ])


to result in the assembler output

SUBS r124, R122, #1
SBC  r124, R122, r124

so really should have a cost of 8 (two insns).  But for some reason the thumb2
back-end is not generating that output in this case.  Overall, that means that
for bic_si_test

BIC r0, r0, r1
SUBS r1, r0, #1
SBC r0, r0, r1

is neither better nor worse than

BICS r0, r0, r1
IT ne
MOVNE r0, #1

and certainly better than

BICS r0, r0, r1
ITE ne
MOVNE r2, #1
MOVEQ r2, #0

at least when it comes to code size.

So the test is somewhat flaky, but there is a further problem with the compiler
not generating the expected sequence for NE(reg, 0) in Thumb2.
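
As a sanity check on the SUBS/SBC idiom mentioned above, here is a small C model of it.  It is a sketch that assumes the usual Arm flag semantics (SUBS sets C when there is no borrow; SBC computes Rn - Rm - (1 - C)); the function name ne_zero_via_sbc is made up for illustration.

```c
#include <stdint.h>

/* Models  SUBS t, x, #1 ; SBC r, x, t  on 32-bit values.  */
uint32_t ne_zero_via_sbc(uint32_t x)
{
    uint32_t t = x - 1u;           /* SUBS: wraps to 0xffffffff when x == 0 */
    uint32_t carry = (x >= 1u);    /* C flag: set when no borrow occurred */
    return x - t - (1u - carry);   /* SBC: yields 1 if x != 0, else 0 */
}
```

For any non-zero x the result is x - (x - 1) = 1; for x == 0 the borrow makes the result wrap back to 0.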

[Bug target/113510] [14 Regression] [ARM Thumb] ICE in extract_constrain_insn with CPU cortex-m23

2024-01-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113510

--- Comment #6 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #5)
> Yes the peephole2 in thumb1.md looks wrong:
> ```
> ;; Reloading and elimination of the frame pointer can
> ;; sometimes cause this optimization to be missed.
> (define_peephole2
>   [(set (match_operand:SI 0 "arm_general_register_operand" "")
> (match_operand:SI 1 "const_int_operand" ""))
>(set (match_dup 0)
> (plus:SI (match_dup 0) (reg:SI SP_REGNUM)))]
>   "TARGET_THUMB1
>&& UINTVAL (operands[1]) < 1024
>&& (UINTVAL (operands[1]) & 3) == 0"
>   [(set (match_dup 0) (plus:SI (reg:SI SP_REGNUM) (match_dup 1)))]
>   ""
> )
> ```
> 
> Confirmed.

Since this is a peephole and we're dealing with hard regs, we can just use
"low_register_operand" as the predicate for operand 0.

[Bug rtl-optimization/113542] gcc.target/arm/bics_3.c regression after change for pr111267

2024-01-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542

Richard Earnshaw  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-01-24

--- Comment #1 from Richard Earnshaw  ---
Options to reproduce

-O2 -mcpu=cortex-m3 -mthumb

The problem is really a back-end issue.  But the cause is that the fwprop pass
is now merging

propagating insn 9 into insn 10, replacing:
(set (reg:SI 124 [ _7 ])
    (ne:SI (reg:CC 100 cc)
        (const_int 0 [0])))

with the flag setting instruction to form

(parallel [
        (set (reg:SI 124 [ _7 ])
            (ne:SI (reg:SI 122 [ _2 ])
                (const_int 0 [0])))
        (clobber (reg:CC 100 cc))
    ])

That's OK, but it means that the combine pass is no longer able to merge the
flag setter with an earlier result producer.

A similar thing starts to happen in Arm state, but there it is dropped because
the costs work out the same (a replacement has to reduce the cost).

So I think it's that the cost model for thumb2 needs tweaking.

[Bug testsuite/113278] analyzer tests relying on fileno() fail on arm-eabi

2024-01-08 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113278

--- Comment #1 from Richard Earnshaw  ---
newlib certainly implements fileno():

$ nm libc.a|grep fileno
libc_a-fileno.o:
 T fileno
 U fileno
libc_a-fileno_u.o:
 T fileno_unlocked
 U fileno

So perhaps the issue is that the prototype is missing (or missing with the
default compilation options since it's Posix and I don't think we pass options
to enable that by default).  Grepping the source, I suspect the former.

[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra

2024-01-08 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257

--- Comment #4 from Richard Earnshaw  ---
I'm not sure.  My understanding was that -march=native started by looking up
the CPU ID first and then using the internal mapping of that CPU to the
architecture (which can't work if we don't recognize the CPU), but perhaps we
try a bit harder when both are specified.

[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra

2024-01-08 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257

--- Comment #2 from Richard Earnshaw  ---
For -mcpu=native, the manual says:

Additionally on native AArch64 GNU/Linux systems the value
@samp{native} tunes performance to the host system.  This option has no effect
if the compiler is unable to recognize the processor of the host system.

With similar wording for -march=native

Since nobody has contributed patches to recognize the Apple Silicon cores, I
suspect that is the source of the problem.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #61 from Richard Earnshaw  ---
Then I don't understand what you're trying to say in c57.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #59 from Richard Earnshaw  ---
Memcpy must never write beyond the end of the specified buffer, even if reading
it is safe.  That wouldn't be thread safe.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #56 from Richard Earnshaw  ---
I've never heard of a memcpy implementation that corrupts data if called with
memcpy (p, p, n).  (The problems come from partial overlaps where the direction
of the copy may matter).

Has anybody considered asking the standards committee to bless this as a
special exception?

Of course, if n is large, then performing an early test is still worthwhile,
but for small n, the cost of the check possibly exceeds the benefit of eliding
the copy.
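
The early test mentioned above could look something like this.  It is a sketch only; copy_block is a hypothetical wrapper, not something GCC or any C library provides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skip the copy when source and destination are
   the same object, so memcpy is never called with exact overlap.  */
void *copy_block(void *dst, const void *src, size_t n)
{
    if (dst == src)
        return dst;              /* exact overlap: nothing to do */
    return memcpy(dst, src, n);  /* caller guarantees no partial overlap */
}
```

For small n the compare and branch may well cost more than the copy they avoid, which is the trade-off described above.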

[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1

2024-01-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045

--- Comment #28 from Richard Earnshaw  ---
(In reply to David Binderman from comment #5)
> No idea. I know the gcc project is over 30 years old and it is not
> feasible for me to download the entire history, it is too large.
> 
> I have the last 18 months or so history and that's a whopping
> 3.8 Gig on its own.

$ cd ~/gnusrc/gcc/master/.git
$ du -sh .
1.8G	.

So on my machine the entire git history is just 1.8G; that's because the
history is very densely packed on the server and pulling the entire history
does not require an unpack-repack-send sequence.  

But if you download a partial history, then the git server has to unpack and
then repack the required history in order to send it; that makes the process
much slower and results in far more data being transmitted (the on-the-fly
repack is not as dense because it would take too much time).

[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1

2024-01-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045

--- Comment #27 from Richard Earnshaw  ---
> ==9933==by 0x151D554: search_line_fast (lex.cc:872)

This is the entry code; so the issue is with the initial alignment code (unless
the buffer is smaller than 16 bytes, when we might get both under reading and
overreading).

[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1

2024-01-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045

--- Comment #26 from Richard Earnshaw  ---
I think it's more likely that this is at the start of the buffer rather than
the end, and related to rounding the address down to a 16-byte alignment.  But
it could also occur at the end of the buffer as well if the buffer is (nearly)
full.

[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1

2024-01-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045

--- Comment #24 from Richard Earnshaw  ---
(In reply to David Binderman from comment #22)
> Is the optimization still worthwhile some 12 years later ?

Almost certainly.  Vector operations have become much better than they were at
the time the patch went in, so it's probably even more worthwhile.

[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1

2024-01-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045

--- Comment #21 from Richard Earnshaw  ---
FTR it was this patch that added this code.  So 2012!

commit e75b54a2d932929a9b2e940c5aad1ef33a86c008
Author: Richard Earnshaw 
Date:   Thu Mar 22 17:54:55 2012 +

* lex.c (search_line_fast): Provide Neon-optimized version for ARM.

From-SVN: r185702

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 97177e89916..133620b3b70 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,7 @@
+2012-03-22  Richard Earnshaw  
+
+   * lex.c (search_line_fast): Provide Neon-optimized version for ARM.
+

[Bug target/113045] armv7l-unknown-linux-gnueabihf: valgrind error during build of libcc1

2024-01-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113045

Richard Earnshaw  changed:

   What|Removed |Added

 CC||rearnsha at gcc dot gnu.org

--- Comment #20 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #9)
> This is almost definitely a valgrind issue.
> We start with:
>   /* Align the source pointer.  */
>   misalign = (uintptr_t)s & 15;
>   p = (const uint8_t *)((uintptr_t)s & -16);
>   data = vld1q_u8 (p);
> 
> 
> Which all other targets do too.
> 
> Basically this is how you realign the pointer, and if you don't depend on the
> bytes that are not in the original pointer, then this is valid.
> 
> Does it work correctly without valgrind?

Yes, for the first fetch, we align down to a 16-byte boundary and fetch the
full 16 bytes.  We then mask off the bytes that are before the real start of
the buffer so that they cannot affect the result.  So the code is safe, but
valgrind has no real way of knowing this.
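
A scalar sketch of the align-down-and-mask idea, for illustration.  This is not the Neon code from lex.cc: the function name first_newline_in_block is made up, and the sketch assumes the whole 16-byte aligned block containing s is readable, as it is for the real buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan the 16-byte aligned block that contains s for '\n'.  Bytes that
   lie before s are read but then masked off, so they cannot affect the
   result.  Returns the offset from s, or -1 if none is found.  */
int first_newline_in_block(const unsigned char *s)
{
    unsigned misalign = (unsigned)((uintptr_t)s & 15);
    const unsigned char *p =
        (const unsigned char *)((uintptr_t)s & ~(uintptr_t)15);
    uint16_t found = 0;

    for (unsigned i = 0; i < 16; i++)        /* stands in for the vector compare */
        if (p[i] == '\n')
            found |= (uint16_t)(1u << i);

    found &= (uint16_t)(0xffffu << misalign); /* drop bytes before s */

    for (unsigned i = misalign; i < 16; i++)
        if (found & (1u << i))
            return (int)(i - misalign);
    return -1;
}
```

A checker that tracks validity per byte sees the loads below s but not that their results are discarded, which is consistent with the report being a false positive.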

Tricks like this wouldn't work with capability pointers, but we're not
concerned about that here; even MTE (on aarch64) would be ok because the
alignment used matches the tag granule size.

So I'm pretty sure this is a false positive.  But perhaps we should just
disable the vectorized scanning when valgrind checking is enabled.

Note that glibc implementations of str* functions can perform a similar trick,
but perhaps valgrind has special knowledge of such cases.

[Bug target/113030] parsecpu.awk's chkarch/chkcpu commands is broken for aliases

2023-12-15 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113030

--- Comment #4 from Richard Earnshaw  ---
Yes, that looks sensible.  Can you post it please?

[Bug target/112334] ICE in gen_untyped_return arm.md:9197 while compiling harden-cfr-bret.c

2023-11-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112334

--- Comment #1 from Richard Earnshaw  ---
This might be a side issue, but:

@defbuiltin{{void} __builtin_return (void *@var{result})}
This built-in function returns the value described by @var{result} from
the containing function.  You should specify, for @var{result}, a value
returned by @code{__builtin_apply}.

So I'm not sure it's legal to pass  to __builtin_return().

[Bug target/109166] Built-in __atomic_test_and_set does not seem to be atomic on ARMv4T

2023-09-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109166

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #8 from Richard Earnshaw  ---
I'm going to close this as WONTFIX.

There are several reasons for this.

There's no SWPH operation, so it's impossible to generalize atomic operations
for all basic data types.  It's not possible to synthesize a 16-bit atomic type
with either SWP or SWPB.

There's no support in Thumb state for SWP[B].

The instruction was removed in later versions of the architecture, which makes
code non-portable.

Finally, Armv4, which dates to around 1995, is essentially in maintenance-only
mode and this is really a new feature request.  In fact, I don't think we'd
really want to add new features for anything before Armv7 these days (even that
is more than 10 years old).

[Bug target/111096] Frame pointer is not used even when -fomit-frame-pointer is specified

2023-08-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111096

--- Comment #8 from Richard Earnshaw  ---
(In reply to Thomas Koenig from comment #7)
> Would it make sense to document this somewhere?  Or did I just miss it? :-)

Possibly, but I've no idea where.  It's too target-specific to put under the
generic documentation for -fomit-frame-pointer and I don't think there's a
section in the manual that really documents the target-specific behaviours of
generic options.

[Bug target/97807] ICE in output_move_double, at config/arm/arm.c:19689

2023-08-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97807

--- Comment #4 from Richard Earnshaw  ---
I can reproduce this, but only with -mfloat-abi=soft.

[Bug target/111096] Frame pointer is not used even when -fomit-frame-pointer is specified

2023-08-23 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111096

--- Comment #6 from Richard Earnshaw  ---
For completeness.

The AArch64 ABI lists 4 alternatives with respect to having a frame chain. When
-fomit-frame-pointer is used, GCC implements this one:

- It may require the frame pointer to address a valid frame record at all
times, except that any subroutine may elect not to create a frame record

[Bug target/111096] Frame pointer is not used even when -fomit-frame-pointer is specified

2023-08-23 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111096

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Richard Earnshaw  ---
This was a deliberate design choice.  Although the frame chain is not set up by
code that omits the frame pointer, the chain of frames that are set up by other
functions is still valid this way.  This ensures that any code that does try to
walk the frame chain will not crash.  If we reused the frame pointer for other
purposes, then any code trying to walk the frame chain (eg backtrace()) would
encounter an invalid record and likely crash.

With 31 main registers, the benefit from one additional one is not especially
large.
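
To illustrate why the chain stays walkable, here is a minimal model of a frame-chain walk.  It is a sketch under assumptions: the struct layout follows the AArch64 frame record (saved FP, then saved LR), the chain is assumed to end with a null link, and count_frames is a hypothetical helper rather than how backtrace() is actually implemented.

```c
#include <stddef.h>

/* One frame record: link to the caller's record, then the return address.  */
struct frame_record {
    struct frame_record *prev;
    void *lr;
};

/* Count the records reachable from fp.  A function that omits its own
   record simply does not appear; the walk skips straight to the next
   record that does exist and never sees an invalid link.  */
size_t count_frames(const struct frame_record *fp)
{
    size_t n = 0;
    while (fp != NULL) {
        n++;
        fp = fp->prev;
    }
    return n;
}
```

If x29 were reused as a general register, the first load of fp->prev could read from an arbitrary address, which is the crash scenario described above.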

[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls

2023-08-21 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #16 from Richard Earnshaw  ---
(In reply to Mark Brown from comment #15)
> The kernel module loader simply does not insert veneers at present, and
> there were some implementation concerns IIRC.

That's not a good reason to weaken the security of the generated code.

[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls

2023-08-21 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #14 from Richard Earnshaw  ---
(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated. The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

The linker has to insert the veneers.

[Bug target/110908] [aarch64] Internal compiler error when using -ffixed-x30

2023-08-07 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110908

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-07
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
 Ever confirmed|0   |1

--- Comment #4 from Richard Earnshaw  ---
Why would you ever want to fix x30?  Because of the way it is used by the
architecture, there's no possible value in doing so.  The compiler may insert
instructions that must clobber this value at any point in the program (to
handle libfuncs, for example), so it would be unsafe to store any useful value
in it.

I think it would be far more useful to make the compiler reject this option
than to give the appearance that it is possible, when frankly, it isn't.

Although it isn't, technically, an ICE on invalid code, it's about as close to
that as you can get.

[Bug target/110901] -march does not override -mcpu (big.little on aarch64

2023-08-07 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #4 from Richard Earnshaw  ---
I think this is a driver bug.  The MCPU_TO_MARCH_SPEC should be wrapped with 

%{!march=*:...} 

so that the CPU architecture is ignored if -march has been explicitly
specified.

[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf

2023-08-02 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796

Richard Earnshaw  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rearnsha at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #12 from Richard Earnshaw  ---
Working on a patch.

[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf

2023-07-26 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-26
 CC||rearnsha at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #11 from Richard Earnshaw  ---
Confirmed.  It only happens when generating Thumb code.  For Arm code it works
correctly.

I think the problem is that the Thumb code generator is emitting vcmp, while
the Arm code generator uses vcmpe - the latter sets the exception bits.

I'm not sure why the code is different yet, still investigating.

[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf

2023-07-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796

--- Comment #9 from Richard Earnshaw  ---
proc add_options_for_ieee { flags } {
if { [istarget alpha*-*-*]
 || [istarget sh*-*-*] } {
   return "$flags -mieee"
}
if { [istarget rx-*-*] } {
   return "$flags -mnofpu"
}
return $flags
}

So it looks like this isn't expecting to add anything in most cases.

[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf

2023-07-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796

--- Comment #6 from Richard Earnshaw  ---
Is the exception status supposed to be in a defined state when the test runs? 
Shouldn't there be a call to feclearexcept (FE_ALL_EXCEPT) at the start of the
test?

[Bug target/86772] [meta-bug] tracking port status for CVE-2017-5753

2023-06-16 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86772
Bug 86772 depends on bug 86793, which changed state.

Bug 86793 Summary: mips port needs updating for CVE-2017-5753
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86793

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/86793] mips port needs updating for CVE-2017-5753

2023-06-16 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86793

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |14.0

--- Comment #3 from Richard Earnshaw  ---
Fixed on main development branch.

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2023-04-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

--- Comment #8 from Richard Earnshaw  ---
Applies to both AArch64 and Arm back-ends.

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2023-04-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

Richard Earnshaw  changed:

   What|Removed |Added

 CC||Vedant.VijayYevale@infineon.com

--- Comment #7 from Richard Earnshaw  ---
*** Bug 109415 has been marked as a duplicate of this bug. ***

[Bug target/109415] No predefined macros to differentiate between ARM Cortex-M33 and Cortex-M55

2023-04-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109415

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #9 from Richard Earnshaw  ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

*** This bug has been marked as a duplicate of bug 99312 ***

[Bug target/109415] No predefined macros to differentiate between ARM Cortex-M33 and Cortex-M55

2023-04-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109415

--- Comment #8 from Richard Earnshaw  ---
The __ARM_ARCH_...__ macros turned out to be a very bad design decision. Each
new architecture needs a new macro that older compilers (and software) will not
know about.  The ACLE approach is far more sensible and GCC has mostly adopted
that now.  I personally consider the existing __ARM_ARCH_...__ macros to be
deprecated, though I don't think the manual actually says this yet.

There is a known bug in GCC.  ACLE says that __ARM_ARCH should have the value
major*100 + minor for architectures after arm-v8, but we don't
implement that yet (there may already be a PR about this) and report
__ARM_ARCH=8 for all existing armv8.xxx variants.
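
As a worked example of the rule described above (major version times 100, plus the minor version, for architectures after v8): the helper below is purely illustrative, its name is made up, and how it treats anything other than v8.x and earlier is an assumption rather than something taken from ACLE.

```c
/* Hypothetical helper: the __ARM_ARCH value expected for a given
   architecture version under the rule described above.  */
int acle_arm_arch_value(int major, int minor)
{
    if (major > 8 || (major == 8 && minor >= 1))
        return major * 100 + minor;  /* e.g. armv8.1 -> 801 */
    return major;                    /* armv8.0 and earlier: plain major */
}
```

So code that wants to detect, say, armv8.2 or later would test __ARM_ARCH >= 802, which cannot work while GCC reports 8 for every armv8.x variant.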

[Bug target/108943] ARM Unaligned memory access with high optimizer levels

2023-02-27 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108943

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Richard Earnshaw  ---
There's no compiler bug here.

Cortex-M7 implements the ARMv7em version of the architecture, which supports
unaligned accesses.  If this is faulting then it's because you're trying to use
the operation on something like device memory without informing the compiler
about this.  You need to mark your pointers as volatile in this case.

The alternative is to compile with -mno-unaligned-access, but I wouldn't
recommend this as that will disable other optimizations where this might be
safe and useful.
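
A minimal sketch of the volatile approach, assuming a device region that must be read one aligned word at a time.  The function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words out of a device region.
   Because src is volatile-qualified, the compiler is expected to
   perform each word read as written rather than merging accesses
   into wider or unaligned ones.  */
void read_device_words(uint32_t *dst, const volatile uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

This keeps unaligned-access optimizations available for ordinary memory elsewhere in the program.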

[Bug ipa/108470] Missing documentation for alternate uses of attribute((noinline))

2023-02-09 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108470

--- Comment #3 from Richard Earnshaw  ---
The manual entry for this says "This attribute is supported mainly for the
purpose of testing the compiler." which suggests a lack of long-term
commitment to the option.  Perhaps it would be better to remove that.

In some ways the analogy is with "-ffast-math" which is a short-hand for a
number of other flags but not guaranteed to be only those options - although in
this case 'noipa' is, I think, intended to be conservatively safe.

[Bug target/103100] [11/12/13 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align

2023-01-31 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100

--- Comment #19 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #18)
> I should say that testcase happens at `-Os -mstrict-align`, at `-O2
> -mstrict-align` it works.

Because for -Os we don't forcibly align arrays - see 
AARCH64_EXPAND_ALIGNMENT and the macros that use it.

[Bug target/100000] non-leaf epologue/prologue used if MVE v4sf is used for load/return

2023-01-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100000

--- Comment #3 from Richard Earnshaw  ---
Given that the hard-float ABI essentially requires V4SF as a type, it might be
better to consider this mode supported unconditionally in this case, and
although that might make the compiler try some pointless vectorizations it
would generate better code for cases like this.

[Bug target/100000] non-leaf epologue/prologue used if MVE v4sf is used for load/return

2023-01-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100000

--- Comment #2 from Richard Earnshaw  ---
If the testcase is built with -march=armv8.1-m.main+mve.fp then the useless
stack adjustments go away.  I think that's because V4SFmode is not a supported
vector mode for integer MVE - see arm_vector_mode_supported_p() in arm.cc. 
When it isn't a builtin type we end up with a BLKmode object that the compiler
creates a stack-slot for, even though no RTL is ever generated to use the slot
in this case.

[Bug target/108515] Fails to link fixincl with unresolvable R_ARM_MOVW_ABS_NC relocation against symbol `stderr@@GLIBC_2.4'

2023-01-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108515

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|FIXED   |INVALID

[Bug target/108515] Fails to link fixincl with unresolvable R_ARM_MOVW_ABS_NC relocation against symbol `stderr@@GLIBC_2.4'

2023-01-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108515

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #13 from Richard Earnshaw  ---
(In reply to Richard Biener from comment #11)

> So eventually linking with -Wl,-z,nocopyreloc will fail?

If you want to avoid copyrelocs you'll need to compile with -fpie.

[Bug target/108515] Fails to link fixincl with unresolvable R_ARM_MOVW_ABS_NC relocation against symbol `stderr@@GLIBC_2.4'

2023-01-24 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108515

--- Comment #10 from Richard Earnshaw  ---
Almost certainly this is related to the need for a copyreloc and presumably the
linker has not created one for some reason.  So I suspect this is most likely a
binutils issue rather than a compiler one.  The code generated for the simple
test is just

main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
movw r3, #:lower16:stderr
movt r3, #:upper16:stderr
push {r4, lr}
movw r0, #:lower16:.LC0
movt r0, #:upper16:.LC0
ldr r1, [r3]
bl  printf
mov r0, #0
pop {r4, pc}

And the references to stderr will require the definition to be moved from the
shared library to the static image during linking.

[Bug target/103100] [11/12/13 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align

2023-01-20 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100

--- Comment #14 from Richard Earnshaw  ---
(In reply to Richard Biener from comment #13)
> (In reply to Andrew Pinski from comment #10)
> > Updated patch submitted:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html
> 
> I think you need to ping your patches more aggressively ...

Richard Sandiford reviewed it here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589581.html
So the problem is that the review wasn't followed up by the submitter.

[Bug target/108442] arm: MVE's vld1* and vst1* do not work when __ARM_MVE_PRESERVE_USER_NAMESPACE is defined

2023-01-18 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108442

--- Comment #5 from Richard Earnshaw  ---
Fixed on master.  While this is not a regression, we should consider a
backport.
