[Bug target/114499] New: MVE: scatter base offset constraints incorrect

2024-03-27 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114499

            Bug ID: 114499
           Summary: MVE: scatter base offset constraints incorrect
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kevin.bracey at alifsemi dot com
  Target Milestone: ---

An attempt to use

uint32x4_t base;
float32x4_t value;
vstrwq_scatter_base_wb_f32(&base, -sizeof(float), value);

generates an "unsupported" error. GCC does not accept -4 as a valid offset, but
it should: it is looking for a multiple of 8 from -1016 to +1016, not a
multiple of 4 from -508 to +508.

Looking at mve.md, I see a number of scatter/gather_base operations have
incorrect constraints; they're rather random.

Offsets for VLDRW/VSTRW are always 7-bit with a sign bit, representing +/-0 to
+/-127*memory size. So the W and D base forms all take -508 to 508 multiples of
4 ("O"?) or -1016 to +1016 multiples of 8 ("Ri").
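The ranges above can be sketched as a small check. This is only an
illustration of the arithmetic, assuming the 7-bit-magnitude-plus-sign
encoding described here; mve_base_offset_ok is a hypothetical helper name and
is not part of GCC or arm_mve.h.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: checks whether `offset` fits an
   immediate made of a 7-bit magnitude plus a sign bit, scaled by the
   access size in bytes (4 for word forms, 8 for doubleword forms). */
static bool mve_base_offset_ok(int offset, int access_size)
{
    int limit = 127 * access_size;      /* 508 for words, 1016 for doublewords */

    if (offset < -limit || offset > limit)
        return false;
    return offset % access_size == 0;   /* must be a whole multiple of the size */
}
```

Under this sketch, mve_base_offset_ok(-4, 4) holds, while
mve_base_offset_ok(-4, 8) does not, which matches the rejection reported for
the word-sized scatter store.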

The "Rl" constraint was wrongly added for just
mve_vstrwq_scatter_base_wb_p_fv4sf
(https://github.com/gcc-mirror/gcc/commit/ae180f26109bfaebb4ab0f4d45035fd075cf02c8),
and it is not required. If it was really needed for a halfword instruction its
range should be -254 to +254. It seems that mve_vector_mem_operand() handles
this range correctly for non-scatter/gather.

Some corrections I think are needed are:

mve_vldrwq_gather_base_v4si i -> O
mve_vldrwq_gather_base_v2di i -> Ri
mve_vldrwq_gather_base_z_v2di i -> Ri
mve_vldrwq_gather_base_fv4sf i -> O
mve_vldrwq_gather_base_z_fv4sf i -> O
mve_vldrwq_gather_base_wb_v4si Ri -> O
mve_vldrwq_gather_base_wb_z_v4si Ri -> O
mve_vldrwq_gather_base_wb_fv4sf  Ri -> O
mve_vldrwq_gather_base_wb_z_fv4sf  Ri -> O

mve_vstrwq_scatter_base_v4si i -> O
mve_vstrwq_scatter_base_fv4sf i -> O
mve_vstrwq_scatter_base_wb_v4si Ri -> O
mve_vstrwq_scatter_base_wb_p_v4si Ri -> O
mve_vstrwq_scatter_base_wb_fv4sf Ri -> O
mve_vstrwq_scatter_base_wb_p_fv4sf Rl -> O

But I don't know that that's exhaustive.

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-29 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

--- Comment #8 from Kevin Bracey  ---
I'm only testing on the Linux trunk because it's what Godbolt has. If it has
bare-metal, I'm not seeing it.

Actual real development system is bare-metal using Arm's embedded GCC releases,
and I don't have a set-up to test a trunk GCC build on it at the moment.

Clearly Helium+Linux on Godbolt is a bit confused because it's always using
non-existent registers Q8 upwards. There may be a fundamental config error
leading to all sorts of strange results.

(Mostly reproduces my bare-metal findings though.)

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-29 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

--- Comment #6 from Kevin Bracey  ---
Retesting the Godbolt on trunk, it's now worse - every line produces multiple
not-very-informative errors:

<source>:7:9: error: '_Generic' specifies two compatible types
    7 |     x = vmulq(x, 0.5); // ok
      |         ^
<source>:7:9: note: compatible type is here
    7 |     x = vmulq(x, 0.5); // ok
      |         ^
(repeated 6 times per source line)

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-11-21 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

--- Comment #5 from Kevin Bracey  ---
I had a look at the GCC source. The vld2/vst2/vld4/vst4 instructions in mve.md
have reused the "Um" constraint used for vld/vst in Neon, which permits
both "!" and register offset.

This needs to be tightened up - can't see an existing equivalent constraint.
Perhaps "Um" can be given variant MVE/Neon behaviour, like "Uj".

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-21 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

--- Comment #4 from Kevin Bracey  ---
Yes, looking at them it seems clear those patches address what I'm seeing with
the `vmulq(x, 6)` issue.

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-11-21 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

--- Comment #4 from Kevin Bracey  ---
The assembler's rejection of the vld2 is valid - the only permitted
post-indexed form is to use "!" for increment by 32 (the amount read).

Experimenting by changing "inStep" you can see the compiler backend knows that
32 is the only valid constant offset - it generates the "!" form for that
correctly - but it apparently hasn't been told not to use register offsets.
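As a rough model of the rule described in this comment, the check a tightened
constraint would need could look like the following. The enum and function
names are hypothetical and do not exist in GCC; the sketch only encodes the
three addressing forms mentioned in this thread.

```c
#include <stdbool.h>

/* Hypothetical model of the addressing forms discussed in this thread;
   none of these names exist in GCC. */
enum addr_form {
    ADDR_PLAIN,      /* [Rn]      no writeback */
    ADDR_WRITEBACK,  /* [Rn]!     base advanced by the amount read */
    ADDR_POST_REG    /* [Rn], Rm  Neon-style register post-index */
};

/* Returns true if the form is acceptable for an MVE VLD2.
   step_bytes is only meaningful for ADDR_WRITEBACK. */
static bool mve_vld2_form_ok(enum addr_form form, int step_bytes)
{
    switch (form) {
    case ADDR_PLAIN:
        return true;
    case ADDR_WRITEBACK:
        return step_bytes == 32;   /* two Q registers, 16 bytes each */
    case ADDR_POST_REG:
        return false;              /* the form the assembler rejects */
    }
    return false;
}
```

In this model the reported `vld21.8 {q4,q5},[r3],r2` corresponds to
ADDR_POST_REG, which is the case the backend currently lets through.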

[Bug target/107515] MVE: Generic functions do not accept _Float16 scalars

2022-11-16 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

--- Comment #2 from Kevin Bracey  ---
I've just spotted another apparent generic selection problem in my reproducer
for bug 107714 - should I create a new issue for it?

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-11-16 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

Kevin Bracey changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stammark at gcc dot gnu.org

--- Comment #2 from Kevin Bracey  ---
Ah, the vmulq is falling foul of some sort of generic selection problem.
Substituting with vmulq_n_u8() gets me the actual 6.

Something in the same area as my bug 107515, perhaps - I've been making liberal
use of the generic functions.

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-11-16 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

--- Comment #1 from Kevin Bracey  ---
Looking at that assembly output from Compiler Explorer, I'm also at a loss as
to what happened to the "6" for the VMUL. Maybe something else to look at?

[Bug target/107714] New: MVE: Invalid addressing mode generated for VLD2

2022-11-16 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

            Bug ID: 107714
           Summary: MVE: Invalid addressing mode generated for VLD2
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kevin.bracey at alifsemi dot com
  Target Milestone: ---

Created attachment 53909
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53909&action=edit
Stripped-down reproducer source

While I was working on some Helium intrinsics, GCC produced some invalid code,
meaning my optimisation can only be enabled in our armclang builds. The problem
still seems to be present on GCC trunk.

Posted at https://godbolt.org/z/h3EhMvxao

Compilation options -O2 -mcpu=cortex-m55 -mfloat-abi=hard

Error: instruction does not accept this addressing mode -- `vld21.8
{q4,q5},[r3],r2'

Compiler Explorer output for trunk shows the same invalid addressing mode.

(It also shows non-existent registers q8 and up in use - I don't know why. Not
a problem in my local GCC, obtained from Arm's embedded distribution).

[Bug target/107515] New: MVE: Generic functions do not accept _Float16 scalars

2022-11-03 Thread kevin.bracey at alifsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

            Bug ID: 107515
           Summary: MVE: Generic functions do not accept _Float16 scalars
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kevin.bracey at alifsemi dot com
  Target Milestone: ---

Compiling C code, generic functions taking floating point scalars in arm_mve.h
do not accept `_Float16` values.

// Using gcc -mcpu=cortex-m55 -O2
// Uploaded at https://godbolt.org/z/7jrqWWroY

#include <arm_mve.h>

void test(void)
{
    float16x8_t x;

    x = vmulq(x, 0.5); // ok
    x = vmulq(x, 0.5f); // ok
    x = vmulq(x, (__fp16) 0.5); // ok
    x = vmulq(x, 0.15f16); // rejected
    x = vmulq(x, (_Float16) 0.15); // rejected
}

Output:

<source>:10:9: error: '_Generic' selector of type 'int (*)[4][39]' is not
compatible with any association
   10 |     x = vmulq(x, 0.15f16); // rejected
      |         ^
<source>:11:9: error: '_Generic' selector of type 'int (*)[4][39]' is not
compatible with any association
   11 |     x = vmulq(x, (_Float16) 0.15); // rejected
      |         ^
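
For readers unfamiliar with C11 generic selection, the failure mode can be
illustrated with a much-reduced stand-in. SCALAR_KIND is a hypothetical macro
and does not reproduce how arm_mve.h is actually written; it only shows that a
type with no listed association is not matched.

```c
/* Illustrative only: a tiny _Generic dispatch with a fixed set of
   associations. SCALAR_KIND is a hypothetical macro, not from arm_mve.h. */
#define SCALAR_KIND(x) _Generic((x), \
    int:     "int",                  \
    float:   "float",                \
    double:  "double",               \
    default: "no association")
```

Here SCALAR_KIND(0.5) selects "double" and SCALAR_KIND(0.5f) selects "float",
while a type that is not listed, such as long double, falls through to the
default arm. If the default arm were removed, the unlisted type would instead
be a compile-time error of the "not compatible with any association" kind
shown in the output above.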