[Bug target/110592] [SPARC] GCC should default to TSO memory model when compiling for sparc32

2023-07-12 Thread koachan+gccbugs at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110592

--- Comment #8 from Koakuma  ---
Created attachment 55529
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55529&action=edit
Proposed patch for relaxing the guards of barrier emission

Hello, sorry that I only got to reply now.
And yeah, I first noticed it when I was trying out some C++ concurrency
tutorials. A bit weird, I admit...

That being said,
> Sorry, no, NetBSD/sparc is too obscure a platform to justify changing the 
> default for the entire compiler.
Understood. So the memory model default should not change, that is okay.
(And I believe the NetBSD folks also agree with me on this?)

However...

> But you can do like Linux & Solaris and add sparc/tso.h to the tm_file list 
> of sparc-*-netbsdelf*) in config.gcc.
As Campbell has said, the thing is that this (and -mmemory-model=tso) currently
does not work when targeting v7 because all the barrier emitters are gated with
TARGET_V8 || TARGET_V9.

(By the way, `-mcpu=v7 -mmemory-model=tso` is broken on Linux too, for the same
reason.)

Attached is a patch to relax the barrier requirements so that it is possible
to emit the ldstub barriers even when targeting v7. This does not change any
defaults, but, crucially, it does allow -mmemory-model=tso to be used with a
v7 target.
There is probably a better way to do it (I am unfamiliar with GCC internals),
but so far it has been working fine for me.

What do you think? Would it be okay if it is only changed in this way?
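
To make the gating issue concrete, here is a rough model of it. This is a
simplified sketch and not GCC's actual code: every name below is hypothetical,
and only two memory models are represented. It contrasts barrier emission
gated on v8/v9 with emission decided by the memory model alone.

```cpp
#include <cassert>

// Hypothetical, simplified model. These names are NOT GCC internals.
enum class Cpu { V7, V8, V9 };
enum class MemoryModel { SC, TSO };

// Under TSO a store followed by a load needs an explicit barrier;
// under SC the hardware is assumed to need none.
constexpr bool needs_storeload_barrier(MemoryModel m) {
    return m == MemoryModel::TSO;
}

// Behaviour as described above: emission is gated on v8 or v9,
// so a v7 target never gets the barrier.
constexpr bool emits_barrier_gated(Cpu cpu, MemoryModel m) {
    return (cpu == Cpu::V8 || cpu == Cpu::V9) && needs_storeload_barrier(m);
}

// Behaviour the patch aims for: the memory model alone decides.
constexpr bool emits_barrier_relaxed(Cpu, MemoryModel m) {
    return needs_storeload_barrier(m);
}
```

In this model, emits_barrier_gated(Cpu::V7, MemoryModel::TSO) is false while
emits_barrier_relaxed(Cpu::V7, MemoryModel::TSO) is true, and neither emits
anything for MemoryModel::SC, so defaults are unaffected.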

[Bug target/110592] New: [SPARC] GCC should default to TSO memory model when compiling for sparc32

2023-07-07 Thread koachan+gccbugs at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110592

            Bug ID: 110592
           Summary: [SPARC] GCC should default to TSO memory model when
                    compiling for sparc32
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: koachan+gccbugs at protonmail dot com
  Target Milestone: ---

Created attachment 55501
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55501&action=edit
Reproducer of unwanted memory reordering under TSO processors

Currently, when targeting sparc32 processors, GCC assumes by default that the
hardware has sequentially consistent memory ordering.
This can cause problems when running the generated binaries on v8 and later
processors, which use the weaker TSO ordering.

In the attached reproducer, when compiled with the default sparc32 target, the
resulting code is missing the required barriers:

0fb0 :
...
104c:   e0 26 00 00 st  %l0, [ %i0 ]
1050:   c2 06 40 00 ld  [ %i1 ], %g1
1054:   c2 26 80 00 st  %g1, [ %i2 ]
...

108c :
...
1128:   e0 26 00 00 st  %l0, [ %i0 ]
112c:   c2 06 40 00 ld  [ %i1 ], %g1
1130:   c2 26 80 00 st  %g1, [ %i2 ]
...

Compare with the result when explicitly specifying -mcpu=v8:

0fa4 :
...
1040:   e0 26 00 00 st  %l0, [ %i0 ]
1044:   c0 6b bf ff ldstub  [ %sp + -1 ], %g0
1048:   c0 6b bf ff ldstub  [ %sp + -1 ], %g0
104c:   c2 06 40 00 ld  [ %i1 ], %g1
1050:   c2 26 80 00 st  %g1, [ %i2 ]
1054:   c0 6b bf ff ldstub  [ %sp + -1 ], %g0
...

108c :
...
1128:   e0 26 00 00 st  %l0, [ %i0 ]
112c:   c0 6b bf ff ldstub  [ %sp + -1 ], %g0
1130:   c0 6b bf ff ldstub  [ %sp + -1 ], %g0
1134:   c2 06 40 00 ld  [ %i1 ], %g1
1138:   c2 26 80 00 st  %g1, [ %i2 ]
113c:   c0 6b bf ff ldstub  [ %sp + -1 ], %g0
...

This causes the default-target code to hit the assert condition.
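
For readers without the attachment, the store/load/store pattern in the
listings above matches the classic store-buffering litmus test. The following
is a minimal sketch of that test, not the attached reproducer; the function
and variable names are made up for illustration. With seq_cst atomics the
outcome "both loads read 0" is forbidden, which is what the ldstub barriers
enforce on TSO hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names; this is not the code from the attachment.
// Runs one round of the store-buffering test and returns true if both
// threads read 0, an outcome that sequential consistency forbids.
bool both_loads_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // store
        r1 = y.load(std::memory_order_seq_cst);  // load; must not pass the store
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and returns how many forbidden outcomes were observed.
int count_forbidden_outcomes(int rounds) {
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_loads_saw_zero()) ++forbidden;
    }
    return forbidden;
}
```

A correctly compiled build should always get 0 from
count_forbidden_outcomes(). Because this sketch starts fresh threads every
round, the racy window is small; a serious reproducer would keep the threads
running and synchronise each round to make the reordering easier to observe.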

Since all code that works on TSO processors will also work on processors with
a stronger memory model (i.e. sequential consistency), it is probably better if
GCC uses TSO by default unless otherwise specified (e.g. by explicitly using
-mcpu=v7).

[Bug target/105782] [sparc64] Emission of questionable movxtod/movdtox with -mvis3

2022-06-08 Thread koachan+gccbugs at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105782

--- Comment #4 from Koakuma  ---
(In reply to Eric Botcazou from comment #3)
> I guess that, under high register pressure, the register allocator rather
> uses floating-point registers than spilling values on the stack.

I suppose so?
However, I found that when compiling the source from the previous comment with
-mvis3, it emits over 1400 movXtoY instructions, resulting in 1300-ish extra
instructions compared to the version without VIS 3, which seems quite odd to
me.
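
For context on why register pressure is plausible here: each BLAKE2b round
applies the G mixing function eight times to a 16-word working state while
also reading 16 message words, so many 64-bit values are live at once. Below
is a sketch of G as specified in RFC 7693. It is not monocypher's exact code,
and the function names are my own.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value right by n bits; n must be in the range 1..63.
inline std::uint64_t rotr64(std::uint64_t x, unsigned n) {
    return (x >> n) | (x << (64 - n));
}

// BLAKE2b G function per RFC 7693 (rotation constants 32, 24, 16, 63).
// a, b, c, d are four words of the working state; x and y are message words.
inline void blake2b_g(std::uint64_t& a, std::uint64_t& b,
                      std::uint64_t& c, std::uint64_t& d,
                      std::uint64_t x, std::uint64_t y) {
    a = a + b + x;  d = rotr64(d ^ a, 32);
    c = c + d;      b = rotr64(b ^ c, 24);
    a = a + b + y;  d = rotr64(d ^ a, 16);
    c = c + d;      b = rotr64(b ^ c, 63);
}
```

All of this is integer add/xor/rotate work, so any movxtod/movdtox traffic
would only be shuttling values between the integer and floating-point
register files and back, not doing floating-point computation.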

[Bug target/105782] [sparc64] Emission of questionable movxtod/movdtox with -mvis3

2022-06-01 Thread koachan+gccbugs at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105782

--- Comment #2 from Koakuma  ---
Created attachment 53066
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53066&action=edit
Vectorization log from -fopt-info-vec-all

(In reply to Richard Biener from comment #1)
> You can check -fopt-info-vec for vectorization.

I tried recompiling it with -fopt-info-vec-all and I got a long message that
ends with:

> blake2b-monocypher-standalone.c:75:18: note: Cost model analysis: 
> blake2b-monocypher-standalone.c:75:18: note: Cost model analysis for part in 
> loop 0:
>   Vector cost: 2282
>   Scalar cost: 181
> blake2b-monocypher-standalone.c:75:18: missed: not vectorized: vectorization 
> is not profitable.

So I don't think that GCC vectorized that function.

Also, I tried recompiling with -fno-tree-optimize and it doesn't improve
anything.
Seems like the problem isn't in the vectorizer?
(it still produces the same slow code with many `movxtod`/`movdtox`s)

[Bug target/105782] New: [sparc64] Emission of questionable movxtod/movdtox with -mvis3

2022-05-30 Thread koachan+gccbugs at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105782

            Bug ID: 105782
           Summary: [sparc64] Emission of questionable movxtod/movdtox
                    with -mvis3
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: koachan+gccbugs at protonmail dot com
  Target Milestone: ---

Created attachment 53055
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53055&action=edit
The problematic function, adapted for standalone compilation

Hello, I found out that the blake2b implementation in monocypher runs much
slower on a SPARC T4 when compiled with `-O3 -mvis3`, as opposed to plain
`-O3`:

With plain -O3:  Blake2b : 184 megabytes  per second
With -O3 -mvis3: Blake2b : 118 megabytes  per second

(Results are from monocypher's `make speed` benchmark)

Looking at the generated assembly, it seems that when the code is compiled with
-mvis3, GCC emits a lot of questionable `movxtod`/`movdtox` instructions.

I'm using sparc64-linux-gnu-gcc (GCC) 12.1.0.

[Bug c/105292] New: [sparc64] ICE in expand_expr_real_2 on sparc64 when compiling with -mcpu=niagara4

2022-04-16 Thread koachan+gccbugs at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105292

            Bug ID: 105292
           Summary: [sparc64] ICE in expand_expr_real_2 on sparc64 when
                    compiling with -mcpu=niagara4
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: koachan+gccbugs at protonmail dot com
  Target Milestone: ---

Created attachment 52820
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52820&action=edit
Preprocessed source of ctf-open.c

Hello~

I'm getting this error on sparc64-linux-gnu when compiling gdb (commit
f0072f79e12 from git), in ctf-open.c:

gcc -O3 -mcpu=niagara4 -c libctf_la-ctf-open.c 
during RTL pass: expand
../../libctf/ctf-open.c: In function ‘ctf_bufopen_internal.part.0’:
../../libctf/ctf-open.c:1117:69: internal compiler error: in
expand_expr_real_2, at expr.c:9867
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

As far as I know, compiling with -mcpu=niagara3 and -mcpu=niagara2 also fails
with the same error.
I'm using "gcc version 11.2.0 (GCC)". Attached is the preprocessed source of
the offending file.