Re: [PATCH v2 2/3] RISC-V: Add Zalrsc and Zaamo testsuite support

2024-06-12 Thread Jeff Law




On 6/11/24 12:21 PM, Patrick O'Neill wrote:



I made the whitespace cleanup patch (trailing whitespace, leading groups
of 8 spaces -> tabs) for target-supports.exp and got a diff of 584 lines.

Is this still worth doing or will it be too disruptive for
rebasing/other people's development?
I don't think it's overly disruptive.  This stuff doesn't have a lot of 
churn.  It'd be different if you were reformatting the whole tree :-)


Consider those fixes pre-approved.

jeff



Re: [PATCH 0/3] RISC-V: Amo testsuite cleanup

2024-06-12 Thread Jeff Law




On 6/11/24 12:03 PM, Patrick O'Neill wrote:

This series moves the atomic-related riscv testcases into their own folder and
fixes some minor bugs/rigidity of existing testcases.

This series is OK.
jeff



Re: [PATCH v2] Test: Move target independent test cases to gcc.dg/torture

2024-06-12 Thread Jeff Law




On 6/11/24 8:53 AM, pan2...@intel.com wrote:

From: Pan Li 

The test cases of pr115387 are target independent; at least x86
and riscv are able to reproduce them.  Thus, move these cases to
gcc.dg/torture.

The below test suites are passed.
1. The rv64gcv fully regression test.
2. The x86 fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: Move to...
* gcc.dg/torture/pr115387-1.c: ...here.
* gcc.target/riscv/pr115387-2.c: Move to...
* gcc.dg/torture/pr115387-2.c: ...here.

OK
jeff



Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-12 Thread Jeff Law




On 6/12/24 12:47 AM, Richard Biener wrote:



One of the points I wanted to make is that sched1 can make quite a
difference as to the relative distance of the store and load and
we have the instruction window the pass considers when scanning
(possibly driven by target uarch details).  So doing the rewriting
before sched1 might be not ideal (but I don't know how much cleanup
work the pass leaves behind - there's nothing between sched1 and RA).
ACK.  I guess I'm just skeptical about how much separation we can get in 
practice from scheduling.


As far as cleanup opportunity, it likely comes down to how clean the 
initial codegen is for the bitfield insertion step.






On the hardware side I always wondered whether a failed load-to-store
forward results in the load uop stalling (because the hardware actually
_did_ see the conflict with an in-flight store) or whether this gets
caught later as the hardware speculates a load from L1 (with the
wrong value) but has to roll back because of the conflict.  I would
imagine the latter is cheaper to implement but worse in case of
conflict.
I wouldn't be surprised to see both approaches being used and I suspect 
it really depends on how speculative your uarch is.  At some point 
there's enough speculation going on that you can't detect the violation 
early enough and you have to implement a replay/rollback scheme.
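To make the hazard concrete, here is a minimal C sketch (not from the patch itself; function and type names are illustrative) of the pattern the pass targets: a narrow store immediately followed by a wider, overlapping load, which most store buffers cannot forward, versus the forwarding-friendly rewrite that merges the new bits in registers.  The byte-merge in `rewritten` assumes a little-endian target.

```c
#include <stdint.h>
#include <string.h>

struct packet {
  uint64_t word;
};

/* Narrow (2-byte) store, then an 8-byte load of the same location.
   The store buffer typically cannot forward the 2-byte store into the
   wider load, so the load stalls or replays.  */
uint64_t store_then_load (struct packet *p, uint16_t v)
{
  memcpy (&p->word, &v, sizeof v);  /* writes bytes 0-1 of p->word */
  return p->word;                   /* wider load overlapping the store */
}

/* The rewrite: load the old value, insert the new bits in registers,
   store once.  Same result (on little-endian), but no narrow-store /
   wide-load overlap for the hardware to resolve.  */
uint64_t rewritten (struct packet *p, uint16_t v)
{
  uint64_t old = p->word;
  uint64_t merged = (old & ~(uint64_t) 0xffff) | v;
  p->word = merged;
  return merged;
}
```

Both functions compute the same value; the second is what the new pass (or a hand rewrite) would emit to avoid the failed-forwarding penalty.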


jeff


Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-11 Thread Jeff Law




On 6/11/24 7:52 AM, Philipp Tomsich wrote:

On Tue, 11 Jun 2024 at 15:37, Jeff Law  wrote:




On 6/11/24 1:22 AM, Richard Biener wrote:


Absolutely.   But forwarding from a smaller store to a wider load is painful
from a hardware standpoint and if we can avoid it from a codegen standpoint,
we should.


Note there's also the possibility to increase the distance between the
store and the load - in fact the time a store takes to a) retire and
b) get from the store buffers to where the load-store unit would pick it
up (L1-D) is another target specific tuning knob.  That said, if that
distance isn't too large (on x86 there might be only an upper bound
given by the OOO window size and the L1D store latency(?), possibly
also additionally by the store buffer size) attacking the issue in
sched1 or sched2 might be another possibility.  So I think pass placement
is another thing to look at - I'd definitely place it after sched1
but I guess without looking at the pass again it's way before that?

True, but I doubt there are enough instructions we could sink the load
past to make a measurable difference.  This is especially true on the
class of uarchs where this is going to be most important.

In the case where the store/load can't be interchanged and thus this new
pass rejects any transformation, we could try to do something in the
scheduler to defer the load as long as possible.  Essentially it's a
true dependency through a memory location using must-aliasing properties
and in that case we'd want to crank up the "latency" of the store so
that the load gets pushed away.

I think one of the difficulties here is we often model stores as not
having any latency (which is probably OK in most cases).  Input data
dependencies and structural hazards dominate considerations for
stores.


I don't think that TARGET_SCHED_ADJUST_COST would even be called for a
data-dependence through a memory location.
Probably correct, but we could adjust that behavior or add another 
mechanism to adjust costs based on memory dependencies.




Note that, strictly speaking, the store does not have an extended
latency; it will be the load that will have an increased latency
(almost as if we knew that the load will miss to one of the outer
points-of-coherence).  The difference being that the load would not
hang around in a scheduling queue until being dispatched, but its
execution would start immediately and take more cycles (and
potentially block an execution pipeline for longer).
Absolutely true.  I'm being imprecise in my language, increasing the 
"latency" of the store is really a proxy for "do something to encourage 
the load to move away from the store".


But overall rewriting the sequence is probably the better choice.  In my 
mind the scheduler approach would be a secondary attempt if we couldn't 
interchange the store/load.  And I'd make a small bet that its impact 
would be on the margins if we're doing a reasonable job in the new pass.


Jeff



Re: [PATCH v1] Test: Move target independent test cases to gcc.dg/torture

2024-06-11 Thread Jeff Law




On 6/11/24 12:19 AM, pan2...@intel.com wrote:

From: Pan Li 

The test cases of pr115387 are target independent; at least x86
and riscv are able to reproduce them.  Thus, move these cases to
gcc.dg/torture.

The below test suites are passed.
1. The rv64gcv fully regression test.
2. The x86 fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: Move to...
* gcc.dg/torture/pr115387-1.c: ...here.
* gcc.target/riscv/pr115387-2.c: Move to...
* gcc.dg/torture/pr115387-2.c: ...here.

OK
jeff



Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-11 Thread Jeff Law




On 6/11/24 1:22 AM, Richard Biener wrote:


Absolutely.   But forwarding from a smaller store to a wider load is painful
from a hardware standpoint and if we can avoid it from a codegen standpoint,
we should.


Note there's also the possibility to increase the distance between the
store and the load - in fact the time a store takes to a) retire and
b) get from the store buffers to where the load-store unit would pick it
up (L1-D) is another target specific tuning knob.  That said, if that
distance isn't too large (on x86 there might be only an upper bound
given by the OOO window size and the L1D store latency(?), possibly
also additionally by the store buffer size) attacking the issue in
sched1 or sched2 might be another possibility.  So I think pass placement
is another thing to look at - I'd definitely place it after sched1
but I guess without looking at the pass again it's way before that?
True, but I doubt there are enough instructions we could sink the load 
past to make a measurable difference.  This is especially true on the 
class of uarchs where this is going to be most important.


In the case where the store/load can't be interchanged and thus this new 
pass rejects any transformation, we could try to do something in the 
scheduler to defer the load as long as possible.  Essentially it's a 
true dependency through a memory location using must-aliasing properties 
and in that case we'd want to crank up the "latency" of the store so 
that the load gets pushed away.


I think one of the difficulties here is we often model stores as not 
having any latency (which is probably OK in most cases).  Input data 
dependencies and structural hazards dominate considerations for 
stores.


jeff




[committed] [RISC-V] Drop dead test

2024-06-10 Thread Jeff Law
This test is no longer useful.  It doesn't test what it was originally 
intended to test and there's really no way to recover it sanely.


We agreed in the patchwork meeting last week that if we want to test Zfa 
that we'll write a new test for that.  Similarly if we want to do deeper 
testing of the non-Zfa sequences in this space that we'd write new tests 
for those as well (execution tests in particular).


So dropping this test.

Jeff

commit 95161c6abfbd7ba9fab0b538ccc885f5980efbee
Author: Jeff Law 
Date:   Mon Jun 10 22:39:40 2024 -0600

[committed] [RISC-V] Drop dead round_32 test

This test is no longer useful.  It doesn't test what it was originally 
intended
to test and there's really no way to recover it sanely.

We agreed in the patchwork meeting last week that if we want to test Zfa 
that
we'll write a new test for that.  Similarly if we want to do deeper testing 
of
the non-Zfa sequences in this space that we'd write new tests for those as 
well
(execution tests in particular).

So dropping this test.

gcc/testsuite
* gcc.target/riscv/round_32.c: Delete.

diff --git a/gcc/testsuite/gcc.target/riscv/round_32.c 
b/gcc/testsuite/gcc.target/riscv/round_32.c
deleted file mode 100644
index 88ff77aff2e..000
--- a/gcc/testsuite/gcc.target/riscv/round_32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* { dg-do compile { target { riscv32*-*-* } } } */
-/* { dg-require-effective-target glibc } */
-/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-math-errno 
-funsafe-math-optimizations -fno-inline" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
-
-#include "round.c"
-
-/* { dg-final { scan-assembler-times {\mfcvt.w.s} 15 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.s.w} 5 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.d.w} 65 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.w.d} 15 } } */
-/* { dg-final { scan-assembler-times {,rup} 6 } } */
-/* { dg-final { scan-assembler-times {,rmm} 6 } } */
-/* { dg-final { scan-assembler-times {,rdn} 6 } } */
-/* { dg-final { scan-assembler-times {,rtz} 6 } } */
-/* { dg-final { scan-assembler-not {\mfcvt.l.d} } } */
-/* { dg-final { scan-assembler-not {\mfcvt.d.l} } } */
-/* { dg-final { scan-assembler-not "\\sceil\\s" } } */
-/* { dg-final { scan-assembler-not "\\sfloor\\s" } } */
-/* { dg-final { scan-assembler-not "\\sround\\s" } } */
-/* { dg-final { scan-assembler-not "\\snearbyint\\s" } } */
-/* { dg-final { scan-assembler-not "\\srint\\s" } } */
-/* { dg-final { scan-assembler-not "\\stail\\s" } } */


[gcc r15-1172] [committed] [RISC-V] Drop dead round_32 test

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:95161c6abfbd7ba9fab0b538ccc885f5980efbee

commit r15-1172-g95161c6abfbd7ba9fab0b538ccc885f5980efbee
Author: Jeff Law 
Date:   Mon Jun 10 22:39:40 2024 -0600

[committed] [RISC-V] Drop dead round_32 test

This test is no longer useful.  It doesn't test what it was originally 
intended
to test and there's really no way to recover it sanely.

We agreed in the patchwork meeting last week that if we want to test Zfa 
that
we'll write a new test for that.  Similarly if we want to do deeper testing 
of
the non-Zfa sequences in this space that we'd write new tests for those as 
well
(execution tests in particular).

So dropping this test.

gcc/testsuite
* gcc.target/riscv/round_32.c: Delete.

Diff:
---
 gcc/testsuite/gcc.target/riscv/round_32.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/round_32.c 
b/gcc/testsuite/gcc.target/riscv/round_32.c
deleted file mode 100644
index 88ff77aff2e..000
--- a/gcc/testsuite/gcc.target/riscv/round_32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* { dg-do compile { target { riscv32*-*-* } } } */
-/* { dg-require-effective-target glibc } */
-/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-math-errno 
-funsafe-math-optimizations -fno-inline" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
-
-#include "round.c"
-
-/* { dg-final { scan-assembler-times {\mfcvt.w.s} 15 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.s.w} 5 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.d.w} 65 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.w.d} 15 } } */
-/* { dg-final { scan-assembler-times {,rup} 6 } } */
-/* { dg-final { scan-assembler-times {,rmm} 6 } } */
-/* { dg-final { scan-assembler-times {,rdn} 6 } } */
-/* { dg-final { scan-assembler-times {,rtz} 6 } } */
-/* { dg-final { scan-assembler-not {\mfcvt.l.d} } } */
-/* { dg-final { scan-assembler-not {\mfcvt.d.l} } } */
-/* { dg-final { scan-assembler-not "\\sceil\\s" } } */
-/* { dg-final { scan-assembler-not "\\sfloor\\s" } } */
-/* { dg-final { scan-assembler-not "\\sround\\s" } } */
-/* { dg-final { scan-assembler-not "\\snearbyint\\s" } } */
-/* { dg-final { scan-assembler-not "\\srint\\s" } } */
-/* { dg-final { scan-assembler-not "\\stail\\s" } } */


Re: [PATCH v3 0/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Jeff Law




On 6/10/24 3:46 PM, Patrick O'Neill wrote:

The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

v2:
Rebased and updated some testcases that rely on the ISA string.

v3:
Regex-ify temp registers in added testcases.
Remove unintentional whitespace changes.
Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move core-v bi
extension doc into appropriate section).

Edwin Lu (1):
   RISC-V: Add basic Zaamo and Zalrsc support

Patrick O'Neill (2):
   RISC-V: Add Zalrsc and Zaamo testsuite support
   RISC-V: Add Zalrsc amo-op patterns

  gcc/common/config/riscv/riscv-common.cc   |  11 +-
  gcc/config/riscv/arch-canonicalize|   1 +
  gcc/config/riscv/riscv.opt|   6 +-
  gcc/config/riscv/sync.md  | 152 +++---
  gcc/doc/sourcebuild.texi  |  16 +-
  .../riscv/amo-table-a-6-amo-add-1.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-2.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-3.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-4.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-5.c   |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-1.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-2.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-3.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-4.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-5.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-6.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-7.c  |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-1.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-2.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-3.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-4.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-5.c   |   2 +-
  .../riscv/amo-table-ztso-amo-add-1.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-2.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-3.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-4.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-5.c  |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-1.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-2.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-3.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-4.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-5.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-6.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-7.c |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-1.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-2.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-3.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-4.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-5.c  |   2 +-
  .../riscv/amo-zaamo-preferred-over-zalrsc.c   |  17 ++
  .../gcc.target/riscv/amo-zalrsc-amo-add-1.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-2.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-3.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-4.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-5.c   |  19 +++
  gcc/testsuite/gcc.target/riscv/attribute-15.c |   2 +-
  gcc/testsuite/gcc.target/riscv/attribute-16.c |   2 +-
  gcc/testsuite/gcc.target/riscv/attribute-17.c |   2 +-
  gcc/testsuite/gcc.target/riscv/attribute-18.c |   2 +-
  gcc/testsuite/gcc.target/riscv/pr110696.c |   2 +-
  .../gcc.target/riscv/rvv/base/pr114352-1.c|   4 +-
  .../gcc.target/riscv/rvv/base/pr114352-3.c|   8 +-
  gcc/testsuite/lib/target-supports.exp |  48 +-
  53 files changed, 366 insertions(+), 70 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-4.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-5.c

This series is OK for the trunk.

jeff



Re: [PATCH v3 0/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Jeff Law




On 6/10/24 6:15 PM, Andrea Parri wrote:

On Mon, Jun 10, 2024 at 02:46:54PM -0700, Patrick O'Neill wrote:

The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

v2:
Rebased and updated some testcases that rely on the ISA string.

v3:
Regex-ify temp registers in added testcases.
Remove unintentional whitespace changes.
Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move core-v bi
extension doc into appropriate section).

Edwin Lu (1):
   RISC-V: Add basic Zaamo and Zalrsc support

Patrick O'Neill (2):
   RISC-V: Add Zalrsc and Zaamo testsuite support
   RISC-V: Add Zalrsc amo-op patterns


While providing a proper/detailed review of the series goes above my
"GCC internals" skills, I've applied the series and checked that the
generated code for some atomic operations meets expectations
(expectations which, w/ "only Zaamo", are arguably quite low as mentioned in
v2 and elsewhere):
Thanks for taking the time.  We realize you're not a GCC expert, but 
having an extra pair of eyes on the atomics is always appreciated.




Tested-by: Andrea Parri 

   Andrea


P.S. Unrelated to the changes at stake, but perhaps worth mentioning:
w/ and w/o these changes, the following

[ ... ]
I'll leave this to Patrick to decide if he wants to update.  I'm always 
hesitant to weaken this stuff as I'm sure there's somebody, somewhere 
that assumes the stronger primitives.


Jeff



Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law




On 6/10/24 7:28 PM, Li, Pan2 wrote:

Hi Sam,


This testcases ICEs for me on x86-64 too (without your patch) with just -O2.
Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
too).


Sure thing, but do you have any suggestion about where I should put these 2
cases?  There are many sub-directories under gcc/testsuite and I am not sure
which is the best location.

gcc.dg/torture would be the most natural location I think.

jeff



[to-be-committed] [RISC-V] Improve (1 << N) | C for rv64

2024-06-10 Thread Jeff Law

Another improvement for generating Zbs instructions.

In this case we're looking at stuff like (1 << N) | C where N varies and 
C is a single bit constant.


In this pattern the (1 << N) happens in SImode, but is zero extended out 
to DImode before the bit manipulation.  The fact that we're modifying a 
DImode object in the logical op is important as it means we don't have 
to worry about whether or not the resulting value is sign extended from 
SI to DI.
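As a minimal C sketch of that observation (illustrative only, not the splitter itself): because the SImode shift result is zero-extended before the DImode logical op, the upper 32 bits of the shifted operand are known-zero for any N, so the transform never has to reason about sign extension.

```c
#include <stdint.h>

/* (1 << N) | C with the shift done in 32-bit arithmetic, then
   zero-extended to 64 bits before the DImode IOR -- the shape the
   splitter matches.  The high half of `bit` is always zero.  */
uint64_t ior_bit (uint32_t n, uint64_t c)
{
  uint32_t bit = 1u << (n & 31);   /* SImode shift */
  return (uint64_t) bit | c;       /* zero-extend, then 64-bit IOR */
}
```

E.g. `ior_bit (31, 8)` yields `0x80000008`: bit 31 is set yet bits 32..63 stay clear, so a `bset` on the 64-bit register is safe with no extension instruction.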


This has run through Ventana's CI system.  I'll wait for it to roll 
through pre-commit CI before moving forward.


Jeff



gcc/
* config/riscv/bitmanip.md ((1 << N) | C): New splitter for IOR/XOR of
a single bit in a DImode object.

gcc/testsuite/

* gcc.target/riscv/zbs-zext.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 6c2736454aa..3cc244898e7 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -727,6 +727,21 @@ (define_insn "*bsetidisi"
   "bseti\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
+;; We can easily handle zero extensions
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+(any_or:DI (zero_extend:DI
+(ashift:SI (const_int 1)
+   (match_operand:QI 1 "register_operand")))
+  (match_operand:DI 2 "single_bit_mask_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBS"
+  [(set (match_dup 3)
+(match_dup 2))
+   (set (match_dup 0)
+ (any_or:DI (ashift:DI (const_int 1) (match_dup 1))
+   (match_dup 3)))])
+
 (define_insn "*bclr"
   [(set (match_operand:X 0 "register_operand" "=r")
(and:X (rotate:X (const_int -2)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-zext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-zext.c
new file mode 100644
index 000..5773b15d298
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-zext.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bset (const uint32_t i)
+{
+  uint64_t checks = 8;
+  checks |= 1U << i;
+  return checks;
+}
+
+uint64_t binv (const uint32_t i)
+{
+  uint64_t checks = 8;
+  checks ^= 1U << i;
+  return checks;
+}
+
+uint64_t bclr (const uint32_t i)
+{
+  uint64_t checks = 10;
+  checks &= ~(1U << i);
+  return checks;
+}
+
+/* { dg-final { scan-assembler-times "bset\t" 1 } } */
+/* { dg-final { scan-assembler-times "binv\t" 1 } } */
+/* { dg-final { scan-assembler-times "bclr\t" 1 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:08a6582277f62c1b5873dfa4d385a2b2e8843d8f

commit 08a6582277f62c1b5873dfa4d385a2b2e8843d8f
Author: Raphael Zinsly 
Date:   Mon Jun 10 14:16:16 2024 -0600

[to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode 
object

bext is defined as (src >> n) & 1.  With that formulation, particularly the
"&1" means the result is implicitly zero extended.  So we can safely use it 
on
SI objects for rv64 without the need to do any explicit extension.
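For illustration (not part of the patch), the quoted definition can be written directly in C; the trailing `& 1` is what makes the result implicitly zero-extended, so a 32-bit consumer of the result on rv64 needs no extra extension instruction.

```c
#include <stdint.h>

/* Reference semantics of Zbs bext: (src >> n) & 1.  The mask clears
   bits 63..1, so the result is 0 or 1 regardless of src's sign bit.  */
uint64_t bext (uint64_t src, unsigned n)
{
  return (src >> (n & 63)) & 1;
}
```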

This patch adds the obvious pattern and a few testcases.   I think one of 
the
tests is derived from coremark, the other two from spec2017.

This has churned through Ventana's CI system repeatedly since it was first
written.  Assuming pre-commit CI doesn't complain, I'll commit it on 
Raphael's
behalf later today or Monday.

gcc/
* config/riscv/bitmanip.md (*bextdisi): New pattern.

gcc/testsuite

* gcc.target/riscv/zbs-ext.c: New test.

(cherry picked from commit 9aaf29b9ba5ffe332220d002ddde85d96fd6657d)

Diff:
---
 gcc/config/riscv/bitmanip.md | 17 +
 gcc/testsuite/gcc.target/riscv/zbs-ext.c | 15 +++
 2 files changed, 32 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7e716d2d076..4ee413c143e 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -684,6 +684,23 @@
 }
 [(set_attr "type" "bitmanip")])
 
+;; An outer AND with a constant where bits 31..63 are 0 can be seen as
+;; a virtual zero extension from 31 to 64 bits.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+(and:DI (not:DI (subreg:DI
+ (ashift:SI (const_int 1)
+(match_operand:QI 1 "register_operand")) 0))
+(match_operand:DI 2 "arith_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBS
+   && clz_hwi (INTVAL (operands[2])) >= 33"
+  [(set (match_dup 3)
+(match_dup 2))
+   (set (match_dup 0)
+ (and:DI (rotate:DI (const_int -2) (match_dup 1))
+ (match_dup 3)))])
+
 (define_insn "*binv"
   [(set (match_operand:X 0 "register_operand" "=r")
(xor:X (ashift:X (const_int 1)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-ext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
new file mode 100644
index 000..65f42545b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bclr (const uint32_t i)
+{
+  uint64_t checks = 10;
+  checks &= ~(1U << i);
+  return checks;
+}
+
+/* { dg-final { scan-assembler-times "bclr\t" 1 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Just the testsuite bits from:

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1d97b9d17699ea5fdd0945b8ce8aecda79829ff4

commit 1d97b9d17699ea5fdd0945b8ce8aecda79829ff4
Author: Pan Li 
Date:   Mon Jun 10 14:13:38 2024 -0600

Just the testsuite bits from:

[PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

When the PHI handling for COND_EXPR is enabled, we need to insert the gcall
that replaces the PHI node.  Unfortunately, I made a mistake and inserted
the gcall before the last stmt of the bb.  See the gimple below: the PHI
is located at no.1 but we insert the gcall (aka no.9) at the end of
the bb.  The use of _9 in no.2 then has no def and triggers an
ICE in verify_ssa.

  1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
deleted.
  2. prephitmp_36 = (char *) _9;
  3. buf.write_base = string_13(D);
  4. buf.write_ptr = string_13(D);
  5. buf.write_end = prephitmp_36;
  6. buf.written = 0;
  7. buf.mode = 3;
  8. _7 = buf.write_end;
  9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Gcall wrongly inserted at end of the bb

This patch instead inserts the gcall before the first stmt of the bb,
ensuring that any use of the PHI result has a def.
After this patch the above gimple will be:

  0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Gcall now inserted at start of the bb
  1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
deleted.
  2. prephitmp_36 = (char *) _9;
  3. buf.write_base = string_13(D);
  4. buf.write_ptr = string_13(D);
  5. buf.write_end = prephitmp_36;
  6. buf.written = 0;
  7. buf.mode = 3;
  8. _7 = buf.write_end;

The below test suites are passed for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.

PR target/115387

gcc/ChangeLog:

* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): 
Take
the gsi of start_bb instead of last_bb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: New test.
* gcc.target/riscv/pr115387-2.c: New test.

(cherry picked from commit d03ff3fd3e2da1352a404e3c53fe61314569345c)

Diff:
---
 gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
 gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
 2 files changed, 53 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
new file mode 100644
index 000..a1c926977c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
@@ -0,0 +1,35 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#define PRINTF_CHK 0x34
+
+typedef unsigned long uintptr_t;
+
+struct __printf_buffer {
+  char *write_ptr;
+  int status;
+};
+
+extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char 
*);
+
+void
+test (char *string, unsigned long maxlen, unsigned mode_flags)
+{
+  struct __printf_buffer buf;
+
+  if ((mode_flags & PRINTF_CHK) != 0)
+{
+  string[0] = '\0';
+  uintptr_t end;
+
+  if (__builtin_add_overflow ((uintptr_t) string, maxlen, ))
+   end = -1;
+
+  __printf_buffer_init_end (, string, (char *) end);
+}
+  else
+__printf_buffer_init_end (, string, (char *) ~(uintptr_t) 0);
+
+  *buf.write_ptr = '\0';
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
new file mode 100644
index 000..7183bf18dfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
@@ -0,0 +1,18 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include 
+#include 
+
+char *
+test (char *string, size_t maxlen)
+{
+  string[0] = '\0';
+  uintptr_t end;
+
+  if (__builtin_add_overflow ((uintptr_t) string, maxlen, ))
+end = -1;
+
+  return (char *) end;
+}


[gcc r15-1168] [to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:9aaf29b9ba5ffe332220d002ddde85d96fd6657d

commit r15-1168-g9aaf29b9ba5ffe332220d002ddde85d96fd6657d
Author: Raphael Zinsly 
Date:   Mon Jun 10 14:16:16 2024 -0600

[to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode 
object

bext is defined as (src >> n) & 1.  With that formulation, particularly the
"&1" means the result is implicitly zero extended.  So we can safely use it 
on
SI objects for rv64 without the need to do any explicit extension.

This patch adds the obvious pattern and a few testcases.   I think one of 
the
tests is derived from coremark, the other two from spec2017.

This has churned through Ventana's CI system repeatedly since it was first
written.  Assuming pre-commit CI doesn't complain, I'll commit it on 
Raphael's
behalf later today or Monday.

gcc/
* config/riscv/bitmanip.md (*bextdisi): New pattern.

gcc/testsuite

* gcc.target/riscv/zbs-ext.c: New test.

Diff:
---
 gcc/config/riscv/bitmanip.md | 17 +
 gcc/testsuite/gcc.target/riscv/zbs-ext.c | 15 +++
 2 files changed, 32 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7e716d2d076..4ee413c143e 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -684,6 +684,23 @@
 }
 [(set_attr "type" "bitmanip")])
 
+;; An outer AND with a constant where bits 31..63 are 0 can be seen as
+;; a virtual zero extension from 31 to 64 bits.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+(and:DI (not:DI (subreg:DI
+ (ashift:SI (const_int 1)
+(match_operand:QI 1 "register_operand")) 0))
+(match_operand:DI 2 "arith_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBS
+   && clz_hwi (INTVAL (operands[2])) >= 33"
+  [(set (match_dup 3)
+(match_dup 2))
+   (set (match_dup 0)
+ (and:DI (rotate:DI (const_int -2) (match_dup 1))
+ (match_dup 3)))])
+
 (define_insn "*binv"
   [(set (match_operand:X 0 "register_operand" "=r")
(xor:X (ashift:X (const_int 1)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-ext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
new file mode 100644
index 000..65f42545b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bclr (const uint32_t i)
+{
+  uint64_t checks = 10;
+  checks &= ~(1U << i);
+  return checks;
+}
+
+/* { dg-final { scan-assembler-times "bclr\t" 1 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */


Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law




On 6/10/24 8:49 AM, pan2...@intel.com wrote:

When the PHI handling for COND_EXPR is enabled,  we need to insert the gcall
to replace the PHI node.  Unfortunately,  I made the mistake of inserting
the gcall before the last stmt of the bb.  See the gimple below:  the PHI
is located at no.1 but we insert the gcall (aka no.9) at the end of
the bb.  Then the use of _9 in no.2 will have no def and will trigger an
ICE in verify_ssa.

   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;
   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb by mistake

This patch instead inserts the gcall before the first stmt of the bb,
to ensure that any use of the PHI result will have a def available.
After this patch the above gimple will be:

   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall at start of bb
   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;

The below test suites are passed for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.

PR target/115387

gcc/ChangeLog:

* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
the gsi of start_bb instead of last_bb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: New test.
* gcc.target/riscv/pr115387-2.c: New test.

I did a fresh x86_64 bootstrap and regression test and pushed this.

jeff



[gcc r15-1167] [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d03ff3fd3e2da1352a404e3c53fe61314569345c

commit r15-1167-gd03ff3fd3e2da1352a404e3c53fe61314569345c
Author: Pan Li 
Date:   Mon Jun 10 14:13:38 2024 -0600

[PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

When the PHI handling for COND_EXPR is enabled,  we need to insert the gcall
to replace the PHI node.  Unfortunately,  I made the mistake of inserting
the gcall before the last stmt of the bb.  See the gimple below:  the PHI
is located at no.1 but we insert the gcall (aka no.9) at the end of
the bb.  Then the use of _9 in no.2 will have no def and will trigger an
ICE in verify_ssa.

  1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
  2. prephitmp_36 = (char *) _9;
  3. buf.write_base = string_13(D);
  4. buf.write_ptr = string_13(D);
  5. buf.write_end = prephitmp_36;
  6. buf.written = 0;
  7. buf.mode = 3;
  8. _7 = buf.write_end;
  9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb by mistake

This patch instead inserts the gcall before the first stmt of the bb,
to ensure that any use of the PHI result will have a def available.
After this patch the above gimple will be:

  0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall at start of bb
  1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
  2. prephitmp_36 = (char *) _9;
  3. buf.write_base = string_13(D);
  4. buf.write_ptr = string_13(D);
  5. buf.write_end = prephitmp_36;
  6. buf.written = 0;
  7. buf.mode = 3;
  8. _7 = buf.write_end;

The below test suites are passed for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.

PR target/115387

gcc/ChangeLog:

* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
Take the gsi of start_bb instead of last_bb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: New test.
* gcc.target/riscv/pr115387-2.c: New test.

Diff:
---
 gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
 gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
 gcc/tree-ssa-math-opts.cc   |  2 +-
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
new file mode 100644
index 000..a1c926977c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
@@ -0,0 +1,35 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#define PRINTF_CHK 0x34
+
+typedef unsigned long uintptr_t;
+
+struct __printf_buffer {
+  char *write_ptr;
+  int status;
+};
+
+extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char *);
+
+void
+test (char *string, unsigned long maxlen, unsigned mode_flags)
+{
+  struct __printf_buffer buf;
+
+  if ((mode_flags & PRINTF_CHK) != 0)
+{
+  string[0] = '\0';
+  uintptr_t end;
+
+  if (__builtin_add_overflow ((uintptr_t) string, maxlen, &end))
+   end = -1;
+
+  __printf_buffer_init_end (&buf, string, (char *) end);
+}
+  else
+__printf_buffer_init_end (&buf, string, (char *) ~(uintptr_t) 0);
+
+  *buf.write_ptr = '\0';
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
new file mode 100644
index 000..7183bf18dfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
@@ -0,0 +1,18 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include 
+#include 
+
+char *
+test (char *string, size_t maxlen)
+{
+  string[0] = '\0';
+  uintptr_t end;
+
+  if (__builtin_add_overflow ((uintptr_t) string, maxlen, &end))
+end = -1;
+
+  return (char *) end;
+}
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 173b0366f5e..fbb8e0ea306 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -6102,7 +6102,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
   for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
	 gsi_next (&psi))
 {
-  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  gimple_stmt_iterator gsi = gsi_start_bb (bb);
   match_unsigned_saturation_add (&gsi, psi.phi ());
 }


Re: [PATCH] Move array_bounds warnings into it's own pass.

2024-06-10 Thread Jeff Law




On 6/10/24 1:24 PM, Andrew MacLeod wrote:
The array bounds warning pass was originally attached to the VRP pass 
because it wanted to leverage the context sensitive ranges available there.


With ranger, we can make it a pass of its own for very little cost. This 
patch does that. It removes the array_bounds_checker from VRP and makes 
it a solo pass that runs immediately after VRP1.


The original version had VRP add any un-executable edge flags it found, 
but I could not find a case where after VRP cleans up the CFG the new 
pass needed that.  I also did not find a case where activating SCEV 
again for the warning pass made a difference after VRP had run.  So this 
patch does neither of those things.


It's simple enough to later add SCEV and loop analysis again if it turns 
out to be important.


My primary motivation for removing it was to remove the second DOM walk 
the checker performs which depends on on-demand ranges pre-cached by 
ranger.   This prevented VRP from choosing an alternative VRP solution 
when basic block counts are very high (PR 114855).  I also know Siddesh 
wants to experiment with moving the pass later in the pipeline as well, 
which will make that task much simpler; that's a secondary rationale.


I didn't want to mess with the internal code much. For a multitude of 
reasons.  I did change it so that it always uses the current range_query 
object instead of passing one in to the constructor.  And then I cleaned 
up the VRP code to no longer take a flag on whether to invoke the 
warning code or not.


The final bit is that the pass is set to only run when flag_tree_vrp is on. 
I did this primarily to preserve existing functionality, and some tests 
depended on it.  I.e., a test would turn on -Warray-bounds and disable the 
tree-vrp pass (which means the bounds checker doesn't run), which changes the 
expected warnings from the strlen pass.  I'm not going there.  There are 
also tests which run at -O1 and -Wall that do not expect the bounds 
checker to run either.  So this dependence on the vrp flag is 
documented in the code and preserves existing behavior.


Does anyone have any issues with any of this?
No, in fact, quite the opposite.  I think we very much want the warning 
out of VRP into its own little pass that we can put wherever it makes 
sense in the pipeline rather than having it be tied to VRP.


I'd probably look at the -O1 vs -Wall stuff independently so that we 
could (in theory) eventually remove the dependence on flag_vrp.


jeff




Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-10 Thread Jeff Law




On 6/10/24 12:27 PM, Philipp Tomsich wrote:



This change is what I briefly hinted as "the complete solution" that
we had on the drawing board when we briefly talked last November in
Santa Clara.
I haven't any recollection of that part of the discussion, but I was a 
bit frazzled as you probably noticed.




  We have looked at all of SPEC2017, especially for coverage (i.e.,
making sure we see a significant number of uses of the transformation)
and correctness.  The gcc_r and parest_r components triggered in a
number of "interesting" ways (e.g., motivating the case of
load-elimination).  If it helps, we could share the statistics for how
often the pass triggers on compiling each of the SPEC2017 components?
Definitely helpful.  I may be able to juggle some priorities internally 
to lend a larger hand on testing and helping move this forward.  It's an 
area we're definitely interested in.


Jeff


Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-10 Thread Jeff Law




On 6/10/24 1:55 AM, Manolis Tsamis wrote:




There was an older submission of a load-pair specific pass but this is
a complete reimplementation and indeed significantly more general.
Apart from being target independant, it addresses a number of
important restrictions and can handle multiple store forwardings per
load.
It should be noted that it cannot handle the load-pair cases as these
need special handling, but that's something we're planning to do in
the future by reusing this infrastructure.

ACK.  Thanks for the additional background.







diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e8967fd8ab..c769744d178 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12657,6 +12657,15 @@ loop unrolling.
   This option is enabled by default at optimization levels @option{-O1},
   @option{-O2}, @option{-O3}, @option{-Os}.

+@opindex favoid-store-forwarding
+@item -favoid-store-forwarding
+@itemx -fno-avoid-store-forwarding
+Many CPUs will stall for many cycles when a load partially depends on previous
+smaller stores.  This pass tries to detect such cases and avoid the penalty by
+changing the order of the load and store and then fixing up the loaded value.
+
+Disabled by default.

Is there any particular reason why this would be off by default at -O1
or higher?  It would seem to me that on modern cores that this
transformation should easily be a win.  Even on an old in-order core,
avoiding the load with the bit insert is likely profitable, just not as
much so.


I don't have a strong opinion for that but I believe Richard's
suggestion to decide this on a per-target basis also makes a lot of
sense.
Deciding whether the transformation is profitable is tightly tied to
the architecture in question (i.e. how large the stall is and what
sort of bit-insert instructions are available).
In order to make this more widely applicable, I think we'll need a
target hook that decides in which case the forwarded stores incur a
penalty and thus the transformation makes sense.
You and Richi are probably right.   I'm not a big fan of passes being 
enabled/disabled on particular targets, but it may make sense here.





AFAIK, for each CPU there may be cases where store forwarding is
handled efficiently.
Absolutely.   But forwarding from a smaller store to a wider load is 
painful from a hardware standpoint and if we can avoid it from a codegen 
standpoint, we should.


Did y'all look at spec2017 at all for this patch?  I've got our hardware 
guys to expose a signal for this case so that we can (in a month or so) 
get some hard data on how often it's happening in spec2017 and evaluate 
how this patch helps the most affected workloads.  But if y'all already 
have some data we can use it as a starting point.


jeff


Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Jeff Law




On 6/10/24 8:52 AM, Li, Pan2 wrote:

Not sure if the float eq implementation below in sail-riscv is useful or not,
but it looks like there is some special handling for NaN, as well as sNaN.

https://github.com/riscv/sail-riscv/blob/master/c_emulator/SoftFloat-3e/source/f32_eq.c

Yes, but it's symmetrical, which is what we'd want to see.

jeff



Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Jeff Law




On 6/10/24 10:16 AM, Demin Han wrote:

Hi,

I'm on vacation at the moment.
I will return in a few days and submit a new patch with the test.

No problem.  Enjoy your vacation, this can certainly wait until you return.

jeff



Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law




On 6/10/24 8:49 AM, pan2...@intel.com wrote:

From: Pan Li 

When the PHI handling for COND_EXPR is enabled,  we need to insert the gcall
to replace the PHI node.  Unfortunately,  I made the mistake of inserting
the gcall before the last stmt of the bb.  See the gimple below:  the PHI
is located at no.1 but we insert the gcall (aka no.9) at the end of
the bb.  Then the use of _9 in no.2 will have no def and will trigger an
ICE in verify_ssa.

   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;
   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb by mistake

This patch instead inserts the gcall before the first stmt of the bb,
to ensure that any use of the PHI result will have a def available.
After this patch the above gimple will be:

   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall at start of bb
   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;

The below test suites are passed for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.
So the patch looks fine.  I'm just trying to parse the testing.  If you 
did an x86 bootstrap & regression test, you wouldn't be using newlib. 
That would be a native bootstrap & regression test which would use 
whatever C library is already installed on the system.  I'm assuming 
that's what you did.


If my assumption is correct, then this is fine for the trunk.

jeff



Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Jeff Law




On 6/10/24 1:33 AM, Robin Dapp wrote:

But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?

target = (a == b) ? x : y
target = (a != b) ? y : x

Are equivalent, even for IEEE IIRC.


Yes, that should be fine.  My concern was not that we do a
canonicalization but that we might not do it for some of the
vector cases.  In particular when one of the operands is wrapped
in a vec_duplicate and we end up with it first rather than
second.

My general feeling is that the patch is good but I wasn't entirely
sure about all cases (in particular in case we transform something
after expand).  That's why I would have liked to see at least some
small test cases for it along with the patch (for the combinations
we don't test yet).

Ah, OK.

Demin, can you add some additional test coverage, guided by Robin's concerns 
above?


Thanks,
jeff



[to-be-committed][RISC-V] Generate bclr more often for rv64

2024-06-10 Thread Jeff Law
Another of Raphael's patches to improve our ability to safely generate a 
Zbs instruction, bclr in this instance.


In this case we have something like ~(1 << N) & C where N is variable, 
but C is a constant.  If C has 33 or more leading zeros, then no matter 
what bit we clear via bclr, the result will always have at least bits 
31..63 clear.  So we don't have to worry about any of the extension 
issues with SI objects in rv64.


Odds are this was seen in spec at some point by the RAU team, thus 
leading to Raphael's pattern.


Anyway, this has been through Ventana's CI system in the past.  I'll 
wait for it to work through upstream pre-commit CI before taking further 
action, but the plan is to commit after successful CI run.


Jeff



gcc/

* config/riscv/bitmanip.md ((~1 << N) & C): New splitter.

gcc/testsuite/

* gcc.target/riscv/zbs-ext.c: New test.


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 6559d4d6950..4361be1c265 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -784,6 +784,23 @@ (define_insn_and_split "*bclridisi_nottwobits"
 }
 [(set_attr "type" "bitmanip")])
 
+;; An outer AND with a constant where bits 31..63 are 0 can be seen as
+;; a virtual zero extension from 31 to 64 bits.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+(and:DI (not:DI (subreg:DI
+ (ashift:SI (const_int 1)
+(match_operand:QI 1 "register_operand")) 0))
+(match_operand:DI 2 "arith_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBS
+   && clz_hwi (INTVAL (operands[2])) >= 33"
+  [(set (match_dup 3)
+(match_dup 2))
+   (set (match_dup 0)
+ (and:DI (rotate:DI (const_int -2) (match_dup 1))
+ (match_dup 3)))])
+
 (define_insn "*binv"
   [(set (match_operand:X 0 "register_operand" "=r")
(xor:X (ashift:X (const_int 1)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-ext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
new file mode 100644
index 000..65f42545b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bclr (const uint32_t i)
+{
+  uint64_t checks = 10;
+  checks &= ~(1U << i);
+  return checks;
+}
+
+/* { dg-final { scan-assembler-times "bclr\t" 1 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:c5c054c429ac5a4d1a665d6e5e4634973dffae5a

commit c5c054c429ac5a4d1a665d6e5e4634973dffae5a
Author: Raphael Zinsly 
Date:   Mon Jun 10 07:03:00 2024 -0600

[to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

bext is defined as (src >> n) & 1.  With that formulation, particularly the
"&1" means the result is implicitly zero extended.  So we can safely use it on
SI objects for rv64 without the need to do any explicit extension.

This patch adds the obvious pattern and a few testcases.  I think one of the
tests is derived from coremark, the other two from spec2017.

This has churned through Ventana's CI system repeatedly since it was first
written.  Assuming pre-commit CI doesn't complain, I'll commit it on Raphael's
behalf later today or Monday.

gcc/
* config/riscv/bitmanip.md (*bextdisi): New pattern.

gcc/testsuite

* gcc.target/riscv/bext-ext.c: New test.

(cherry picked from commit 3472c1b500cf9184766237bfd3d102aa8451b99f)

Diff:
---
 gcc/config/riscv/bitmanip.md  | 12 
 gcc/testsuite/gcc.target/riscv/bext-ext.c | 27 +++
 2 files changed, 39 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..7e716d2d076 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -754,6 +754,18 @@
   "operands[1] = gen_lowpart (word_mode, operands[1]);"
   [(set_attr "type" "bitmanip")])
 
+;; The logical-and against 0x1 implicitly extends the result.   So we can treat
+;; an SImode bext as-if it's DImode without any explicit extension.
+(define_insn "*bextdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(and:DI (subreg:DI (lshiftrt:SI
+(match_operand:SI 1 "register_operand" "r")
+(match_operand:QI 2 "register_operand" "r")) 0)
+(const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "bext\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/bext-ext.c 
b/gcc/testsuite/gcc.target/riscv/bext-ext.c
new file mode 100644
index 000..eeef07d7013
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bext-ext.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bext1 (int dst, const uint32_t i)
+{
+  uint64_t checks = 1U;
+  checks &= dst >> i;
+  return checks;
+}
+
+int bext2 (int dst, int i_denom)
+{
+  dst = 1 & (dst >> i_denom);
+  return dst;
+}
+
+const uint32_t bext3 (uint32_t bit_count, uint32_t symbol)
+{
+  return (symbol >> bit_count) & 1;
+}
+
+/* { dg-final { scan-assembler-times "bext\t" 3 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */
+/* { dg-final { scan-assembler-not "srlw\t"} } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:7cf4c1e16755f7adab3bff983d980d6ae0b9a6f3

commit 7cf4c1e16755f7adab3bff983d980d6ae0b9a6f3
Author: Andreas Tobler 
Date:   Sun Jun 9 23:18:04 2024 +0200

FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14

As of FreeBSD version 14, FreeBSD no longer provides profiled system
libraries like libc_p and libpthread_p. Stop linking against them if
the FreeBSD major version is 14 or more.

gcc:
* config/freebsd-spec.h: Change fbsd-lib-spec for FreeBSD > 13,
do not link against profiled system libraries if -pg is invoked.
Add a define to note about this change.
* config/aarch64/aarch64-freebsd.h: Use the note to inform if
-pg is invoked on FreeBSD > 13.
* config/arm/freebsd.h: Likewise.
* config/i386/freebsd.h: Likewise.
* config/i386/freebsd64.h: Likewise.
* config/riscv/freebsd.h: Likewise.
* config/rs6000/freebsd64.h: Likewise.
* config/rs6000/sysv4.h: Likewise.
(cherry picked from commit 48abb540701447b0cd9df7542720ab65a34fc1b1)

Diff:
---
 gcc/config/aarch64/aarch64-freebsd.h |  1 +
 gcc/config/arm/freebsd.h |  1 +
 gcc/config/freebsd-spec.h| 18 ++
 gcc/config/i386/freebsd.h|  1 +
 gcc/config/i386/freebsd64.h  |  1 +
 gcc/config/riscv/freebsd.h   |  1 +
 gcc/config/rs6000/freebsd64.h|  1 +
 gcc/config/rs6000/sysv4.h|  1 +
 8 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-freebsd.h 
b/gcc/config/aarch64/aarch64-freebsd.h
index 53cc17a1caf..e26d69ce46c 100644
--- a/gcc/config/aarch64/aarch64-freebsd.h
+++ b/gcc/config/aarch64/aarch64-freebsd.h
@@ -35,6 +35,7 @@
 #undef  FBSD_TARGET_LINK_SPEC
 #define FBSD_TARGET_LINK_SPEC " \
 %{p:%nconsider using `-pg' instead of `-p' with gprof (1)}  \
+" FBSD_LINK_PG_NOTE "  \
 %{v:-V} \
 %{assert*} %{R*} %{rpath*} %{defsym*}   \
 %{shared:-Bshareable %{h*} %{soname*}}  \
diff --git a/gcc/config/arm/freebsd.h b/gcc/config/arm/freebsd.h
index 9d0a5a842ab..ee4860ae637 100644
--- a/gcc/config/arm/freebsd.h
+++ b/gcc/config/arm/freebsd.h
@@ -47,6 +47,7 @@
 #undef LINK_SPEC
 #define LINK_SPEC "\
   %{p:%nconsider using `-pg' instead of `-p' with gprof (1)}   \
+  " FBSD_LINK_PG_NOTE "
\
   %{v:-V}  \
   %{assert*} %{R*} %{rpath*} %{defsym*}
\
   %{shared:-Bshareable %{h*} %{soname*}}   \
diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
index a6d1ad1280f..f43056bf2cf 100644
--- a/gcc/config/freebsd-spec.h
+++ b/gcc/config/freebsd-spec.h
@@ -92,19 +92,29 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
libc, depending on whether we're doing profiling or need threads support.
(similar to the default, except no -lg, and no -p).  */
 
+#if FBSD_MAJOR < 14
+#define FBSD_LINK_PG_NOTHREADS "%{!pg: -lc}  %{pg: -lc_p}"
+#define FBSD_LINK_PG_THREADS   "%{!pg: %{pthread:-lpthread} -lc} " \
+   "%{pg: %{pthread:-lpthread} -lc_p}"
+#define FBSD_LINK_PG_NOTE ""
+#else
+#define FBSD_LINK_PG_NOTHREADS "%{-lc} "
+#define FBSD_LINK_PG_THREADS   "%{pthread:-lpthread} -lc "
+#define FBSD_LINK_PG_NOTE "%{pg:%nFreeBSD no longer provides profiled "\
+ "system libraries}"
+#endif
+
 #ifdef FBSD_NO_THREADS
 #define FBSD_LIB_SPEC "
\
   %{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
 is built with the --enable-threads configure-time option.} \
   %{!shared:   \
-%{!pg: -lc}
\
-%{pg:  -lc_p}  \
+" FBSD_LINK_PG_NOTHREADS " \
   }"
 #else
 #define FBSD_LIB_SPEC "
\
   %{!shared:   \
-%{!pg: %{pthread:-lpthread} -lc}   \
-%{pg:  %{pthread:-lpthread_p} -lc_p}   \
+" FBSD_LINK_PG_THREADS "   \
   }\
   %{shared:\
 %{pthread:-lpthread} -lc   \
diff --git 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [committed] [RISC-V] Fix false-positive uninitialized variable

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e0a5507e6888f85e2ff53aff76c67293890bed85

commit e0a5507e6888f85e2ff53aff76c67293890bed85
Author: Jeff Law 
Date:   Sun Jun 9 09:17:55 2024 -0600

[committed] [RISC-V] Fix false-positive uninitialized variable

Andreas noted we were getting an uninit warning after the recent constant
synthesis changes.  Essentially there's no way for the uninit analysis code to
know the first entry in the CODES array is an UNKNOWN which will set X before
its first use.

So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_move_integer): Initialize "x".

(cherry picked from commit 932c6f8dd8859afb13475c2de466bd1a159530da)

Diff:
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 95f3636f8e4..c17141d909a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2720,7 +2720,7 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT 
value,
   struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
   machine_mode mode;
   int i, num_ops;
-  rtx x;
+  rtx x = NULL_RTX;
 
   mode = GET_MODE (dest);
   /* We use the original mode for the riscv_build_integer call, because HImode


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:21b9c1625d9178475ebbb7d524923e421a93906d

commit 21b9c1625d9178475ebbb7d524923e421a93906d
Author: Roger Sayle 
Date:   Sat Jun 8 19:47:08 2024 -0600

[middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word 
shifts/rotates.

This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<<C1)|(X>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:

unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

gcc/ChangeLog
* expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
to generate PLUS instead or IOR when unioning disjoint bitfields.
* optabs.cc (expand_subword_shift): Likewise.
(expand_binop): Likewise for double-word rotate.

(cherry picked from commit 2277f987979445f4390a5c6e092d79e04814d641)

Diff:
---
 gcc/expmed.cc | 12 +++-
 gcc/optabs.cc |  8 
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 4ec035e4843..900d418ee94 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2610,10 +2610,11 @@ expand_shift_1 (enum tree_code code, machine_mode mode, 
rtx shifted,
  else if (methods == OPTAB_LIB_WIDEN)
{
  /* If we have been unable to open-code this by a rotation,
-do it as the IOR of two shifts.  I.e., to rotate A
-by N bits, compute
+do it as the IOR or PLUS of two shifts.  I.e., to rotate
+A by N bits, compute
 (A << N) | ((unsigned) A >> ((-N) & (C - 1)))
-where C is the bitsize of A.
+where C is the bitsize of A.  If N cannot be zero,
+use PLUS instead of IOR.
 
 It is theoretically possible that the target machine might
 not be able to perform either shift and hence we would
@@ -2650,8 +2651,9 @@ expand_shift_1 (enum tree_code code, machine_mode mode, 
rtx shifted,
  temp1 = expand_shift_1 (left ? RSHIFT_EXPR : LSHIFT_EXPR,
  mode, shifted, other_amount,
  subtarget, 1);
- return expand_binop (mode, ior_optab, temp, temp1, target,
-  unsignedp, methods);
+ return expand_binop (mode,
+  CONST_INT_P (op1) ? add_optab : ior_optab,
+  temp, temp1, target, unsignedp, methods);
}
 
  temp = expand_binop (mode,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..dcd3e406719 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -566,8 +566,8 @@ expand_subword_shift (scalar_int_mode op1_mode, optab 
binoptab,
   if (tmp == 0)
return false;
 
-  /* Now OR in the bits carried over from OUTOF_INPUT.  */
-  if (!force_expand_binop (word_mode, ior_optab, tmp, carries,
+  /* Now OR/PLUS in the bits carried over from OUTOF_INPUT.  */
+  if (!force_expand_binop (word_mode, add_optab, tmp, carries,
   into_target, unsignedp, methods))
return false;
 }
@@ -1937,7 +1937,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, 
rtx op1,
   

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d63ee880aa3931d37fe73570d3a41952daafd8ee

commit d63ee880aa3931d37fe73570d3a41952daafd8ee
Author: Pan Li 
Date:   Wed Jun 5 16:42:05 2024 +0800

RISC-V: Implement .SAT_SUB for unsigned scalar int

As the middle-end support for .SAT_SUB has been committed,  implement the
unsigned scalar int form of .SAT_SUB for the riscv backend.  Consider the
example code below:

#define DEF_SAT_U_SUB_FMT_1(T)     \
T __attribute__((noinline))        \
sat_u_sub_##T##_fmt_1 (T x, T y)   \
{                                  \
  return (x - y) & (-(T)(x >= y)); \
}

#define DEF_SAT_U_SUB_FMT_2(T)    \
T __attribute__((noinline))       \
sat_u_sub_##T##_fmt_2 (T x, T y)  \
{                                 \
  return (x - y) & (-(T)(x > y)); \
}

DEF_SAT_U_SUB_FMT_1(uint64_t);
DEF_SAT_U_SUB_FMT_2(uint64_t);

Before this patch:
sat_u_sub_uint64_t_fmt_1:
bltu    a0,a1,.L2
sub a0,a0,a1
ret
.L2:
li  a0,0
ret

After this patch:
sat_u_sub_uint64_t_fmt_1:
sltu    a5,a0,a1
addi    a5,a5,-1
sub a0,a0,a1
and a0,a5,a0
ret

ToDo:
Only the above 2 forms of .SAT_SUB are supported for now;  we will
support more forms of .SAT_SUB in the middle-end in the near future.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_ussub): Add new func
decl for ussub expanding.
* config/riscv/riscv.cc (riscv_expand_ussub): Ditto but for impl.
* config/riscv/riscv.md (ussub3): Add new pattern ussub
for scalar modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macros and comments.
* gcc.target/riscv/sat_u_sub-1.c: New test.
* gcc.target/riscv/sat_u_sub-2.c: New test.
* gcc.target/riscv/sat_u_sub-3.c: New test.
* gcc.target/riscv/sat_u_sub-4.c: New test.
* gcc.target/riscv/sat_u_sub-5.c: New test.
* gcc.target/riscv/sat_u_sub-6.c: New test.
* gcc.target/riscv/sat_u_sub-7.c: New test.
* gcc.target/riscv/sat_u_sub-8.c: New test.
* gcc.target/riscv/sat_u_sub-run-1.c: New test.
* gcc.target/riscv/sat_u_sub-run-2.c: New test.
* gcc.target/riscv/sat_u_sub-run-3.c: New test.
* gcc.target/riscv/sat_u_sub-run-4.c: New test.
* gcc.target/riscv/sat_u_sub-run-5.c: New test.
* gcc.target/riscv/sat_u_sub-run-6.c: New test.
* gcc.target/riscv/sat_u_sub-run-7.c: New test.
* gcc.target/riscv/sat_u_sub-run-8.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit ab50ac8180beae9001c97cc036ce0df055e25b41)

Diff:
---
 gcc/config/riscv/riscv-protos.h  |  1 +
 gcc/config/riscv/riscv.cc| 35 
 gcc/config/riscv/riscv.md| 11 
 gcc/testsuite/gcc.target/riscv/sat_arith.h   | 23 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-1.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-2.c | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-3.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-4.c | 17 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-5.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-6.c | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-7.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-8.c | 17 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-1.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-2.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-3.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-4.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-5.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-6.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-7.c | 25 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-8.c | 25 +
 20 files changed, 414 insertions(+)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0704968561b..09eb3a574e3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -134,6 +134,7 @@ extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
 extern void riscv_expand_usadd (rtx, rtx, rtx);
+extern void riscv_expand_ussub (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool 
*invert_ptr = 0);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 

[gcc r15-1164] [to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

2024-06-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:3472c1b500cf9184766237bfd3d102aa8451b99f

commit r15-1164-g3472c1b500cf9184766237bfd3d102aa8451b99f
Author: Raphael Zinsly 
Date:   Mon Jun 10 07:03:00 2024 -0600

[to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode 
object

bext is defined as (src >> n) & 1.  With that formulation, the "& 1" in
particular means the result is implicitly zero extended.  So we can safely
use it on
SI objects for rv64 without the need to do any explicit extension.

This patch adds the obvious pattern and a few testcases.   I think one of 
the
tests is derived from coremark, the other two from spec2017.

This has churned through Ventana's CI system repeatedly since it was first
written.  Assuming pre-commit CI doesn't complain, I'll commit it on 
Raphael's
behalf later today or Monday.

gcc/
* config/riscv/bitmanip.md (*bextdisi): New pattern.

gcc/testsuite

* gcc.target/riscv/bext-ext.c: New test.

Diff:
---
 gcc/config/riscv/bitmanip.md  | 12 
 gcc/testsuite/gcc.target/riscv/bext-ext.c | 27 +++
 2 files changed, 39 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..7e716d2d076 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -754,6 +754,18 @@
   "operands[1] = gen_lowpart (word_mode, operands[1]);"
   [(set_attr "type" "bitmanip")])
 
+;; The logical-and against 0x1 implicitly extends the result.   So we can treat
+;; an SImode bext as-if it's DImode without any explicit extension.
+(define_insn "*bextdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(and:DI (subreg:DI (lshiftrt:SI
+(match_operand:SI 1 "register_operand" "r")
+(match_operand:QI 2 "register_operand" "r")) 0)
+(const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "bext\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/bext-ext.c 
b/gcc/testsuite/gcc.target/riscv/bext-ext.c
new file mode 100644
index 000..eeef07d7013
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bext-ext.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bext1 (int dst, const uint32_t i)
+{
+  uint64_t checks = 1U;
+  checks &= dst >> i;
+  return checks;
+}
+
+int bext2 (int dst, int i_denom)
+{
+  dst = 1 & (dst >> i_denom);
+  return dst;
+}
+
+const uint32_t bext3 (uint32_t bit_count, uint32_t symbol)
+{
+  return (symbol >> bit_count) & 1;
+}
+
+/* { dg-final { scan-assembler-times "bext\t" 3 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */
+/* { dg-final { scan-assembler-not "srlw\t"} } */


[to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

2024-06-09 Thread Jeff Law
bext is defined as (src >> n) & 1.  With that formulation, the "& 1" in 
particular means the result is implicitly zero extended.  So we can safely 
use it on SI objects for rv64 without the need to do any explicit extension.


This patch adds the obvious pattern and a few testcases.   I think one 
of the tests is derived from coremark, the other two from spec2017.


This has churned through Ventana's CI system repeatedly since it was 
first written.  Assuming pre-commit CI doesn't complain, I'll commit it 
on Raphael's behalf later today or Monday.



Jeff

gcc/
* config/riscv/bitmanip.md (*bextdisi): New pattern.

gcc/testsuite

* gcc.target/riscv/bext-ext.c: New test.

commit e32599b6c863cffa594ab1eca8f4e11562c4bc6a
Author: Raphael Zinsly 
Date:   Fri Mar 22 16:20:21 2024 -0600

Improvement to bext discovery from Raphael.  Extracted from his "Add Zbs 
extended patterns" MR.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 00560be6161..6559d4d6950 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -854,6 +854,18 @@ (define_insn_and_split "*bextseqzdisi"
   "operands[1] = gen_lowpart (word_mode, operands[1]);"
   [(set_attr "type" "bitmanip")])
 
+;; The logical-and against 0x1 implicitly extends the result.   So we can treat
+;; an SImode bext as-if it's DImode without any explicit extension.
+(define_insn "*bextdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(and:DI (subreg:DI (lshiftrt:SI
+(match_operand:SI 1 "register_operand" "r")
+(match_operand:QI 2 "register_operand" "r")) 0)
+(const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "bext\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/bext-ext.c 
b/gcc/testsuite/gcc.target/riscv/bext-ext.c
new file mode 100644
index 000..eeef07d7013
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bext-ext.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bext1 (int dst, const uint32_t i)
+{
+  uint64_t checks = 1U;
+  checks &= dst >> i;
+  return checks;
+}
+
+int bext2 (int dst, int i_denom)
+{
+  dst = 1 & (dst >> i_denom);
+  return dst;
+}
+
+const uint32_t bext3 (uint32_t bit_count, uint32_t symbol)
+{
+  return (symbol >> bit_count) & 1;
+}
+
+/* { dg-final { scan-assembler-times "bext\t" 3 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */
+/* { dg-final { scan-assembler-not "srlw\t"} } */


[committed] [RISC-V] Fix false-positive uninitialized variable

2024-06-09 Thread Jeff Law
Andreas noted we were getting an uninit warning after the recent 
constant synthesis changes.  Essentially there's no way for the uninit 
analysis code to know the first entry in the CODES array is an UNKNOWN, 
which will set X before its first use.


So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

Jeff

commit 932c6f8dd8859afb13475c2de466bd1a159530da
Author: Jeff Law 
Date:   Sun Jun 9 09:17:55 2024 -0600

[committed] [RISC-V] Fix false-positive uninitialized variable

Andreas noted we were getting an uninit warning after the recent constant
synthesis changes.  Essentially there's no way for the uninit analysis code 
to
know the first entry in the CODES array is an UNKNOWN, which will set X before
its first use.

So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_move_integer): Initialize "x".

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 95f3636f8e4..c17141d909a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2720,7 +2720,7 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT 
value,
   struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
   machine_mode mode;
   int i, num_ops;
-  rtx x;
+  rtx x = NULL_RTX;
 
   mode = GET_MODE (dest);
   /* We use the original mode for the riscv_build_integer call, because HImode


[gcc r15-1123] [committed] [RISC-V] Fix false-positive uninitialized variable

2024-06-09 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:932c6f8dd8859afb13475c2de466bd1a159530da

commit r15-1123-g932c6f8dd8859afb13475c2de466bd1a159530da
Author: Jeff Law 
Date:   Sun Jun 9 09:17:55 2024 -0600

[committed] [RISC-V] Fix false-positive uninitialized variable

Andreas noted we were getting an uninit warning after the recent constant
synthesis changes.  Essentially there's no way for the uninit analysis code 
to
know the first entry in the CODES array is an UNKNOWN, which will set X before
its first use.

So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_move_integer): Initialize "x".

Diff:
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 95f3636f8e4..c17141d909a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2720,7 +2720,7 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT 
value,
   struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
   machine_mode mode;
   int i, num_ops;
-  rtx x;
+  rtx x = NULL_RTX;
 
   mode = GET_MODE (dest);
   /* We use the original mode for the riscv_build_integer call, because HImode


Re: [to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when profitable

2024-06-09 Thread Jeff Law




On 6/7/24 11:49 AM, Andreas Schwab wrote:

In file included from ../../gcc/rtl.h:3973,
  from ../../gcc/config/riscv/riscv.cc:31:
In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
 inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, 
rtx)' at ./genrtl.h:50:26,
 inlined from 'void riscv_move_integer(rtx, rtx, long int, machine_mode)' 
at ../../gcc/config/riscv/riscv.cc:2786:10:
./genrtl.h:37:16: error: 'x' may be used uninitialized 
[-Werror=maybe-uninitialized]
37 |   XEXP (rt, 0) = arg0;
../../gcc/config/riscv/riscv.cc: In function 'void riscv_move_integer(rtx, rtx, 
long int, machine_mode)':
../../gcc/config/riscv/riscv.cc:2723:7: note: 'x' was declared here
  2723 |   rtx x;
   |   ^
cc1plus: all warnings being treated as errors
Thanks.  I guess the change in control flow in there does hide x's state 
pretty well.  It may not even be provable as initialized without knowing 
how this routine interacts with the costing phase that fills in the codes.


I'll take care of it.

Thanks again,
jeff



Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2024-06-09 Thread Jeff Law




On 6/9/24 5:28 AM, YunQiang Su wrote:

YunQiang Su  wrote on Sun, Jun 9, 2024, at 18:25:




gcc/ChangeLog:

   * ifcvt.cc (cond_move_process_if_block):
   Consider the result of targetm.noce_conversion_profitable_p()
   when replacing the original sequence with the converted one.

Thanks.  I pushed this to the trunk.



Sorry for the delay report. With this patch the test
gcc.target/mips/movcc-3.c fails.



The problem may be caused by the difference between `seq` and `edge e`.
In `seq`, there may be a compare operation, while
`default_max_noce_ifcvt_seq_cost`
only counts the branch operation.

The rtx_cost may consider the compare operation in `seq` as quite expensive.
Overall it sounds like a target issue to me -- ie, now that we're 
testing for profitability instead of just assuming it's profitable some 
targets need adjustment.  Either in their costing model or in the 
testsuite expectations.


Jeff



Re: [PING] [contrib] validate_failures.py: fix python 3.12 escape sequence warnings

2024-06-09 Thread Jeff Law




On 6/9/24 5:45 AM, Gabi Falk wrote:

Hi,

On Sat, Jun 08, 2024 at 03:34:02PM -0600, Jeff Law wrote:

On 5/14/24 8:12 AM, Gabi Falk wrote:

Hi,

This one still needs review:

https://inbox.sourceware.org/gcc-patches/20240415233833.104460-1-gabif...@gmx.com/

I think I just ACK'd an equivalent patch from someone else this week.


Looks like it hasn't been merged yet, and I couldn't find it in the
mailing list archive.
Anyway, I hope either one gets merged soon. :)
I'm sure it will.  The variant I ACK'd is from someone with commit 
privs, so they'll push it to the tree when convenient for them.


jeff



Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-09 Thread Jeff Law



On 6/7/24 4:31 PM, Jeff Law wrote:



I've actually added it to my tester just to see if there's any fallout. 
It'll take a week to churn through the long running targets that 
bootstrap in QEMU, but the crosses should have data Monday.
The first round naturally didn't trigger anything because the option is 
off by default.  So I twiddled it to be on at -O1 and above.


epiphany-elf ICEs in gen_rtx_SUBREG with the attached .i file compiled 
with -O2:



root@577c7458c93a://home/jlaw/jenkins/workspace/epiphany-elf/epiphany-elf-obj/newlib/epiphany-elf/newlib/libm/complex#
 epiphany-elf-gcc -O2 libm_a-cacos.i
during RTL pass: avoid_store_forwarding
../../../..//newlib-cygwin/newlib/libm/complex/cacos.c: In function 'cacos':
../../../..//newlib-cygwin/newlib/libm/complex/cacos.c:99:1: internal compiler 
error: in gen_rtx_SUBREG, at emit-rtl.cc:1032
0x614538 gen_rtx_SUBREG(machine_mode, rtx_def*, poly_int<1u, unsigned long>)
../../..//gcc/gcc/emit-rtl.cc:1032
0x614538 gen_rtx_SUBREG(machine_mode, rtx_def*, poly_int<1u, unsigned long>)
../../..//gcc/gcc/emit-rtl.cc:1030
0xe82216 process_forwardings
../../..//gcc/gcc/avoid-store-forwarding.cc:273
0xe82216 avoid_store_forwarding
../../..//gcc/gcc/avoid-store-forwarding.cc:489
0xe82667 execute
../../..//gcc/gcc/avoid-store-forwarding.cc:558



ft32-elf ICE'd in bitmap_check_index at various optimization levels:


FAIL: execute/pr108498-2.c   -O1  (internal compiler error: in 
bitmap_check_index, at sbitmap.h:104)
FAIL: execute/pr108498-2.c   -O1  (test for excess errors)
FAIL: execute/pr108498-2.c   -O2  (internal compiler error: in 
bitmap_check_index, at sbitmap.h:104)
FAIL: execute/pr108498-2.c   -O2  (test for excess errors)
FAIL: execute/pr108498-2.c   -O3 -g  (internal compiler error: in 
bitmap_check_index, at sbitmap.h:104)
FAIL: execute/pr108498-2.c   -O3 -g  (test for excess errors)



avr, c6x,

lm32-elf failed to build libgcc with an ICE in leaf_function_p, I 
haven't isolated that yet.



There were other failures as well.  But you've got a few to start with 
and we can retest pretty easily as the patch evolves.


jeff

# 0 "../../../..//newlib-cygwin/newlib/libm/complex/cacos.c"
# 1 
"//home/jlaw/jenkins/workspace/epiphany-elf/epiphany-elf-obj/newlib/epiphany-elf/newlib//"
# 0 ""
# 0 ""
# 1 "../../../..//newlib-cygwin/newlib/libm/complex/cacos.c"
# 77 "../../../..//newlib-cygwin/newlib/libm/complex/cacos.c"
# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/complex.h"
 1 3 4
# 15 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/complex.h"
 3 4
# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/cdefs.h"
 1 3 4
# 45 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/cdefs.h"
 3 4
# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 1 3 4







# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/features.h"
 1 3 4
# 28 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/features.h"
 3 4
# 1 "./_newlib_version.h" 1 3 4
# 29 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/features.h"
 2 3 4
# 9 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 2 3 4
# 41 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4

# 41 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef signed char __int8_t;

typedef unsigned char __uint8_t;
# 55 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef short int __int16_t;

typedef short unsigned int __uint16_t;
# 77 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef long int __int32_t;

typedef long unsigned int __uint32_t;
# 103 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef long long int __int64_t;

typedef long long unsigned int __uint64_t;
# 134 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef signed char __int_least8_t;

typedef unsigned char __uint_least8_t;
# 160 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef short int __int_least16_t;

typedef short unsigned int __uint_least16_t;
# 182 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/inc

Re: [PATCH] [tree-prof] skip if errors were seen [PR113681]

2024-06-08 Thread Jeff Law




On 4/15/24 10:03 PM, Alexandre Oliva wrote:

On Mar 29, 2024, Alexandre Oliva  wrote:


On Mar 22, 2024, Jeff Law  wrote:

On 3/9/24 2:11 AM, Alexandre Oliva wrote:

ipa_tree_profile asserts that the symtab is in IPA_SSA state, but we
don't reach that state and ICE if e.g. ipa-strub passes report errors.
Skip this pass if errors were seen.
Regstrapped on x86_64-linux-gnu.  Ok to install?

for  gcc/ChangeLog
PR tree-optimization/113681
* tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
seen_errors.
for  gcc/testsuite/ChangeLog
PR tree-optimization/113681
* c-c++-common/strub-pr113681.c: New.

So I've really never dug into strub, but this would seem to imply that
an error from strub is non-fatal?



Yeah.  I believe that's no different from other passes.



Various other passes have seen_errors guards, but ipa-prof didn't.


Specifically, pass_build_ssa_passes in passes.cc is gated with
!seen_errors(), so we skip all the passes bundled in it, and don't
advance the symtab state to IPA_SSA.  So other passes that would require
IPA_SSA need to be gated similarly.


I suppose the insertion point for the strubm pass was one where others
passes didn't previously issue errors, so that wasn't an issue for
ipa-prof.  But now it is.


The patch needed adjustments to resolve conflicts with unrelated
changes.


[tree-prof] skip if errors were seen [PR113681]

ipa_tree_profile asserts that the symtab is in IPA_SSA state, but we
don't reach that state and ICE if e.g. ipa-strub passes report errors.
Skip this pass if errors were seen.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/113681
* tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
seen_errors.

for  gcc/testsuite/ChangeLog

PR tree-optimization/113681
* c-c++-common/strub-pr113681.c: New.

OK.
jeff



Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-06-08 Thread Jeff Law




On 1/18/24 12:54 PM, Roger Sayle wrote:


This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<<C1)|(X>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:

unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-18  Roger Sayle  

gcc/ChangeLog
 * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
 to generate PLUS instead of IOR when unioning disjoint bitfields.
 * optabs.cc (expand_subword_shift): Likewise.
 (expand_binop): Likewise for double-word rotate.
Also note that on some targets like RISC-V, there's more freedom to 
generate compressed instructions from "add" rather than "or".


Anyway, given the time elapsed since submission, I went ahead and 
retested on x86, then committed & pushed to the trunk.


Thanks!

jeff



[gcc r15-1120] [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-06-08 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:2277f987979445f4390a5c6e092d79e04814d641

commit r15-1120-g2277f987979445f4390a5c6e092d79e04814d641
Author: Roger Sayle 
Date:   Sat Jun 8 19:47:08 2024 -0600

[middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word 
shifts/rotates.

This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<<C1)|(X>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:

unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

gcc/ChangeLog
* expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
to generate PLUS instead of IOR when unioning disjoint bitfields.
* optabs.cc (expand_subword_shift): Likewise.
(expand_binop): Likewise for double-word rotate.

Diff:
---
 gcc/expmed.cc | 12 +++-
 gcc/optabs.cc |  8 
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 50d22762cae..9ba01695f53 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2616,10 +2616,11 @@ expand_shift_1 (enum tree_code code, machine_mode mode, 
rtx shifted,
  else if (methods == OPTAB_LIB_WIDEN)
{
  /* If we have been unable to open-code this by a rotation,
-do it as the IOR of two shifts.  I.e., to rotate A
-by N bits, compute
+do it as the IOR or PLUS of two shifts.  I.e., to rotate
+A by N bits, compute
 (A << N) | ((unsigned) A >> ((-N) & (C - 1)))
-where C is the bitsize of A.
+where C is the bitsize of A.  If N cannot be zero,
+use PLUS instead of IOR.
 
 It is theoretically possible that the target machine might
 not be able to perform either shift and hence we would
@@ -2656,8 +2657,9 @@ expand_shift_1 (enum tree_code code, machine_mode mode, 
rtx shifted,
  temp1 = expand_shift_1 (left ? RSHIFT_EXPR : LSHIFT_EXPR,
  mode, shifted, other_amount,
  subtarget, 1);
- return expand_binop (mode, ior_optab, temp, temp1, target,
-  unsignedp, methods);
+ return expand_binop (mode,
+  CONST_INT_P (op1) ? add_optab : ior_optab,
+  temp, temp1, target, unsignedp, methods);
}
 
  temp = expand_binop (mode,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index e7913884567..78cd9ef3448 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -566,8 +566,8 @@ expand_subword_shift (scalar_int_mode op1_mode, optab 
binoptab,
   if (tmp == 0)
return false;
 
-  /* Now OR in the bits carried over from OUTOF_INPUT.  */
-  if (!force_expand_binop (word_mode, ior_optab, tmp, carries,
+  /* Now OR/PLUS in the bits carried over from OUTOF_INPUT.  */
+  if (!force_expand_binop (word_mode, add_optab, tmp, carries,
   into_target, unsignedp, methods))
return false;
 }
@@ -1937,7 +1937,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, 
rtx op1,
 NULL_RTX, unsignedp, next_methods);
 

Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-06-08 Thread Jeff Law




On 5/29/24 5:12 AM, Mariam Arutunian wrote:



IIRC we looked at the problem of canonicalizing the loop into a form
where we didn't necessarily have conditional blocks, instead we had
branchless sequences for the conditional xor and dealing with the high
bit in the crc.  My recollection was that the coremark CRC loop would
always canonicalize, but that in general we still saw multiple CRC
implementations that did not canonicalize and thus we still needed the
more complex matching.  Correct?


The loop in CoreMark is not fully canonicalized in that form,
as there are still branches present for the conditional XOR operation.
I checked that using the -O2 and -O3 flags.
A bit of a surprise.  Though it may be the case that some of the 
canonicalization steps are happening later in the pipeline.  No worries 
as I think we'd already concluded that we'd see at least some CRC 
implementations that wouldn't canonicalize down to branchless sequences 
for the conditional xor.






 > +
 > +gimple *
 > +crc_optimization::find_shift_after_xor (tree xored_crc)
 > +{
 > +  imm_use_iterator imm_iter;
 > +  use_operand_p use_p;
 > +
 > +  if (TREE_CODE (xored_crc) != SSA_NAME)
 > +    return nullptr;
If we always expect XORED_CRC to be an SSA_NAME, we might be able to use
gcc_assert (TREE_CODE (XORED_CRC) == SSA_NAME);

I'm not sure that it always has to be an SSA_NAME.

For a logical operation like XOR it should always have the form

SSA_NAME = SSA_NAME ^ (SSA_NAME | CONSTANT)

The constant might be a vector  constant, but the basic form won't 
change.  It's one of the nicer properties of gimple.  In contrast RTL 
would allow a variety of lvalues and rvalues, including MEMs, REGs, 
SUBREGs, extensions, other binary ops, etc etc.




 > +
 > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields.
 > +   Returns false if there are more than two (as in CRC
calculation only CRC's
 > +   and data's phi may exist) or no phi statements in STMTS (at
least there must
 > +   be CRC's phi).
 > +   Otherwise, returns true.  */
 > +
 > +bool
 > +crc_optimization::set_crc_and_data_phi (auto_vec<gimple *> &stmts)
 > +{
 > +  for (auto stmt_it = stmts.begin (); stmt_it != stmts.end ();
stmt_it++)
 > +    {
 > +      if (is_a <gphi *> (*stmt_it) && bb_loop_header_p (gimple_bb
(*stmt_it)))
 > +     {
 > +       if (!m_phi_for_crc)
 > +         m_phi_for_crc = as_a <gphi *> (*stmt_it);
 > +       else if (!m_phi_for_data)
 > +         m_phi_for_data = as_a <gphi *> (*stmt_it);
 > +       else
 > +         {
 > +           if (dump_file && (dump_flags & TDF_DETAILS))
 > +             fprintf (dump_file, "Xor-ed variable depends on
more than 2 "
 > +                                 "phis.\n");
 > +           return false;
 > +         }
 > +     }
 > +    }
 > +  return m_phi_for_crc;
Hmm.  For a given PHI, how do we know if it's for the data item or the
crc item, or something else (like a loop counter) entirely?



I trace the def-use chain upwards from the XOR statement to determine 
which PHI node corresponds to CRC and data.
Since we assume the loop calculates CRC, I expect only variables 
representing data and CRC to participate in these operations.
In the implementations I support, the loop counter is used only for the 
iteration.
Any misidentification of CRC and data would occur only if the loop 
doesn't calculate CRC, in which case next checks would fail, leading the 
algorithm to identify it as not CRC.
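For context, a bitwise CRC step with the two loop PHIs under discussion (crc and data) looks roughly like this generic sketch (polynomial 0x07 chosen purely for illustration):

```c
#include <stdint.h>
#include <assert.h>

/* Generic bit-serial CRC-8 step (MSB first, polynomial 0x07, purely
   illustrative).  In the loop both crc and data are carried by PHI
   nodes, and the xor with the polynomial is the statement the pass
   traces upwards from.  */
uint8_t
crc8_byte (uint8_t crc, uint8_t data)
{
  for (int i = 0; i < 8; i++)
    {
      /* Depends on both the crc and the data PHI.  */
      uint8_t xor_bit = (uint8_t) ((crc ^ data) & 0x80);
      crc = (uint8_t) (crc << 1);
      data = (uint8_t) (data << 1);
      if (xor_bit)
        crc ^= 0x07;
    }
  return crc;
}
```

The loop counter participates only in the iteration, which matches the assumption above that only crc and data feed the xor.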


Here, the PHI nodes for CRC and data might be mixed in places.
I just assume that the first found PHI is CRC, second data.
I correctly determine them later with the *swap_crc_and_data_if_needed* 
function.

Ah, OK.  That probably deserves a comment in this code.


jeff


Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation

2024-06-08 Thread Jeff Law




On 5/27/24 7:51 AM, Mariam Arutunian wrote:



I carefully reviewed the indentation of the code using different editors 
and viewers, and everything appeared correct.
I double-checked the specific sections mentioned, and they also looked 
right.

In this reply message I see that it's not correct. I'll try to fix it.

Thanks for double-checking.  It's one of the downsides of email based flows.

Jeff


Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-06-08 Thread Jeff Law




On 6/4/24 7:41 AM, Mariam Arutunian wrote:
Mariam, your thoughts on whether or not those two phases could handle a 
loop with two CRC calculations inside, essentially creating two calls to 
our new builtins?

It is feasible, but it would likely demand considerable effort and 
additional work to implement effectively.
Thanks for the confirmation.  I suspect it likely doesn't come up often 
in practice either.






The key would be to only simulate the use-def cycle from the loop-closed PHI 
(plus the loop control of course, but miter/SCEV should be enough there) and 
just replace that LC PHI, leaving loop DCE to DCE.


Thank you, this is a good idea to just replace the PHI and leave the loop to 
DCE to remove only single CRC parts.
It does seem like replacing the PHI when we have an optimizable case 
might simplify that aspect of the implementation.





The current pass only verifies cases where a single CRC calculation is 
performed within the loop. During the verification phase,
I ensure that there are no other calculations aside from those necessary for 
the considered CRC computation.

Also, when I was investigating the bitwise CRC implementations used in 
different software, in all cases the loop was calculating just one CRC and no 
other calculations were done.
Thus, in almost all cases, the first phase will filter out non-CRCs, and during 
the second phase, only real CRCs with no other calculations will be executed.
This ensures that unnecessary statements won't be executed in most cases.
But we may have had a degree of sampling bias here.  If I remember 
correctly I used the initial filtering pass as the "trigger" to report a 
potential CRC case.  If that initial filtering pass rejected cases with 
other calculations in the loop, then we never would have seen those.




Leaving the loop to DCE will simplify the process of removing parts connected 
to a single CRC calculation.
However, since now we detect a loop that only calculates a single CRC, we can 
entirely remove it at this stage without additional checks.
Let's evaluate this option as we get to the later patches in the series. 
 What I like about Richard's suggestion is that it "just works" and it 
will continue to work, even as the overall infrastructure changes.  In 
contrast a bespoke loop removal implementation in a specific pass may 
need adjustment if other aspects of our infrastructure change.






If we really want a separate pass (or utility to work on a single 
loop) then we might consider moving some of the final value replacement 
code that doesn’t work with only SCEV there as well. There’s also 
special code in loop distribution for strlen recognition now, not 
exactly fitting in.



Note I had patches to do final value replacement on demand from CD-DCE when it 
figures a loop has no side effects besides of its reduction outputs (still want 
to pick this up at some point again).


Oh, this could provide useful insights for our implementation.
Are you thinking of reusing that on-demand analysis to reduce the set of 
loops we analyze?


Jeff



Re: [PING] [contrib] validate_failures.py: fix python 3.12 escape sequence warnings

2024-06-08 Thread Jeff Law




On 5/14/24 8:12 AM, Gabi Falk wrote:

Hi,

This one still needs review:

https://inbox.sourceware.org/gcc-patches/20240415233833.104460-1-gabif...@gmx.com/

I think I just ACK'd an equivalent patch from someone else this week.

jeff



Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-08 Thread Jeff Law




On 3/1/24 1:12 AM, Demin Han wrote:

Hi juzhe,

I also thought it’s related to commutive firstly.

Following things make me to do the removal:

1. No tests fail in regression

2. When I write if (a == 2) and if (2 == a), the results are the same

GCC canonicalizes comparisons so that constants appear second.

Jeff


Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-08 Thread Jeff Law




On 2/29/24 11:27 PM, demin.han wrote:

We can unify eqne and other comparison operations.

Tested on RV32 and RV64

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
* config/riscv/vector.md (@pred_eqne_scalar): Remove patterns
(*pred_eqne_scalar_merge_tie_mask): Ditto
(*pred_eqne_scalar): Ditto
(*pred_eqne_scalar_narrow): Ditto
So I'll tentatively ACK this for the trunk, assuming Robin doesn't 
object before Tuesday's patchwork meeting.


jeff




Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-08 Thread Jeff Law




On 5/16/24 1:21 PM, Robin Dapp wrote:

Can eqne pattern removal patches be committed firstly?


Please first make sure you test with corner cases, NaNs in
particular.  I'm pretty sure we don't have any test cases for
those.

But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?

target = (a == b) ? x : y
target = (a != b) ? y : x

Are equivalent, even for IEEE IIRC.
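A quick host-side check of that equivalence (a small sketch, not from the patch):

```c
#include <assert.h>
#include <math.h>

/* Sketch: the two selects are equivalent for every input, including
   NaN and signed zeros, because a != b is exactly the negation of
   a == b under IEEE semantics (both are quiet comparisons).  */
double
sel_eq (double a, double b, double x, double y)
{
  return (a == b) ? x : y;
}

double
sel_ne (double a, double b, double x, double y)
{
  return (a != b) ? y : x;
}
```

With a NaN operand the comparison is unordered, so == is false and != is true, and both functions pick y; with 0.0 vs -0.0 both pick x.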

jeff




Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-06-08 Thread Jeff Law




On 6/5/24 8:42 PM, Fei Gao wrote:


But let's back up and get a good explanation of what the problem is.
Based on patch 2/2 it looks like we have lost an assignment to the
return register.

To someone not familiar with this code, it sounds to me like we've made
a mistake earlier and we're now defining a hook that lets us go back and
fix that earlier mistake.   I'm probably wrong, but so far that's what
it sounds like.

Hi Jeff

You're right. Let me rephrase patch 2/2 with more details. Search /* feigao to 
locate the point I'm trying to explain.

code snippets from gcc/function.cc
void
thread_prologue_and_epilogue_insns (void)
{
...
   /*feigao:
         targetm.gen_epilogue () is called here to generate epilogue sequence.

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864
Commit above tries in targetm.gen_epilogue () to detect if
there's li a0,0 insn at the end of insn chain, if so, cm.popret
is replaced by cm.popretz and li a0,0 insn is deleted.
So that seems like the critical issue.  Generation of the 
prologue/epilogue really shouldn't be changing other instructions in the 
instruction stream.  I'm not immediately aware of another target that 
does that, an it seems like a rather risky thing to do.



It looks like the cm.popretz's RTL exposes the assignment to a0 and 
there's a DCE pass that runs after insertion of the prologue/epilogue. 
So I would suggest leaving the assignment to a0 in the RTL chain and see 
if the later DCE pass after prologue generation eliminates the redundant 
assignment.  That seems a lot cleaner.




Jeff


Re: [PATCH] haifa-sched: Avoid the fusion priority of the fused insn to affect the subsequent insn sequence.

2024-06-08 Thread Jeff Law




On 6/6/24 8:51 PM, Jin Ma wrote:


I am very sorry that I did not check the commit information carefully. The 
statement is somewhat inaccurate.


When the insn 1 and 2, 3 and 4 can be fusioned, then there is the
following sequence:

;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      3  | [sp+0x8]=s0
;;      4  | [sp+0x0]=s1



The fusion priority of the insn 2, 3, and 4 are the same. According to
the current algorithm, since abs(0x10-0x8)


;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      4  | [sp+0x8]=s1
;;  +   3  | [sp+0x0]=s0

gcc/ChangeLog:



  * haifa-sched.cc (rank_for_schedule): Likewise.


When the insn 1 and 2, 4 and 3 can be fusioned, then there is the
following sequence:

;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      3  | [sp+0x8]=s0
;;      4  | [sp+0x0]=s1

The fusion priority of the insn 2, 3, and 4 are the same. According to
the current algorithm, since abs(0x10-0x8)

I'd really love to see a testcase here, particularly since I'm still 
having trouble understanding the code you're currently getting vs the 
code you want.


Furthermore, I think I need to understand the end motivation here.  I 
always think of fusion priority has bringing insns consecutive so that 
peephole pass can then squash two more more insns into a single insn. 
THe canonical case being load/store pairs.



If you're trying to generate pairs, then that's fine.  I just want to 
make sure I understand the goal.  And if you're trying to generate pairs 
what actually can be paired?  I must admit I don't have any notable 
experience with the thead core extensions.


If you're just trying to keep the instructions consecutive in the IL, 
then I don't think fusion priorities are a significant concern.  Much 
more important for that case is the fusion pair detection (which I think 
is about to get a lot more attention in the near future).


Jeff



Re: How to target a processor with very primitive addressing modes?

2024-06-08 Thread Jeff Law via Gcc




On 6/8/24 10:45 AM, Paul Koning via Gcc wrote:




On Jun 8, 2024, at 5:32 AM, Mikael Pettersson via Gcc  wrote:

On Thu, Jun 6, 2024 at 8:59 PM Dimitar Dimitrov  wrote:

Have you tried defining TARGET_LEGITIMIZE_ADDRESS for your target? From
a quick search I see that the iq2000 and rx backends are rewriting some
PLUS expression addresses with insn sequence to calculate the address.


I have partial success.

The key was to define both TARGET_LEGITIMATE_ADDRESS_P and an
addptr3 insn.

I had tried TARGET_LEGITIMATE_ADDRESS_P before, together with various
combinations of TARGET_LEGITIMIZE_ADDRESS and
LEGITIMIZE_RELOAD_ADDRESS, but they all threw gcc into reload loops.

My add3 insn clobbers the CC register. The docs say to define
addptr3 in this case, and that eliminated the reload loops.

The issue now is that the machine cannot perform an add without
clobbering the CC register, so I'll have to hide that somehow. When
emitting the asm code, can one check if the CC register is LIVE-OUT
from the insn? If it isn't I shouldn't have to generate code to
preserve it.

/Mikael


I'm not sure why add that clobbers CC requires anything special (other than of 
course showing the CC register as clobbered in the definition).  pdp11 is 
another target that only has a CC-clobbering add.  Admittedly, it does have 
register+offset addressing modes, but still the reload machinery deals just 
fine with add operations like that.
If he's got a CC register exposed prior to LRA and LRA needs to insert 
any code, that inserted code may clobber the CC state.  This is 
discussed in the reload-to-LRA transition wiki page.


jeff



Re: Reverted recent patches to resource.cc

2024-06-08 Thread Jeff Law




On 5/29/24 8:07 PM, Jeff Law wrote:



On 5/29/24 7:28 PM, Hans-Peter Nilsson wrote:

From: Hans-Peter Nilsson 
Date: Mon, 27 May 2024 19:51:47 +0200



2: Does not depend on 1, but corrects an incidentally found wart:
find_basic_block calls fails too often.  Replace it with "modern"
insn-to-basic-block cross-referencing.

3: Just an addendum to 2: removes an "if", where the condition is now
always-true, dominated by a gcc_assert, and where the change in
indentation was too ugly.

4: Corrects another incidentally found wart: for the last 15 years the
code in resource.cc has only been called from within reorg.cc (and
reorg.c), specifically not possibly before calling init_resource_info
or after free_resource_info, so we can discard the code that tests
certain allocated arrays for NULL.  I didn't even bother with a
gcc_assert; besides some gen*-generated files, only reorg.cc includes
resource.h (not to be confused with the system sys/resource.h).
A grep says the #include resource.h can be removed from those gen*
files and presumably from RESOURCE_H(!) as well.  Some Other Time.
Also, removed a redundant "if (tinfo != NULL)" and moved the then-code
into the previous then-clause.

   resource.cc: Replace calls to find_basic_block with cfgrtl
 BLOCK_FOR_INSN
   resource.cc (mark_target_live_regs): Remove check for bb not found
   resource.cc: Remove redundant conditionals


I had to revert those last three patches due to PR
bootstrap/115284.  I hope to revisit once I have a means to
reproduce (and fix) the underlying bug.  It doesn't have to
be a bug with those changes per-se: IMHO the "improved"
lifetimes could just as well have uncovered a bug elsewhere
in reorg.  It's still on me to resolve that situation; done.
I'm just glad the cause was the incidental improvements and
not the original bug I wanted to fix.

There appears to be only a single supported SPARC machine in
cfarm: cfarm216, and I currently can't reach it due to what
appears to be issues at my end.  I guess I'll either fix
that or breathe life into sparc-elf+sim.

Or if you've got a reasonable server to use, QEMU might save you :-)



Even better option.  The sh4/sh4eb-linux-gnu ports with 
execute/ieee/fp-cmp-5.c test.  That started execution failing at -O2 
with the first patch in the series and there are very clear assembly 
differences before/after your change.  Meaning you can probably look at 
them with just a cross compile and compare the before/after.



Jeff


Re: How to target a processor with very primitive addressing modes?

2024-06-08 Thread Jeff Law via Gcc




On 6/8/24 3:32 AM, Mikael Pettersson via Gcc wrote:

On Thu, Jun 6, 2024 at 8:59 PM Dimitar Dimitrov  wrote:

Have you tried defining TARGET_LEGITIMIZE_ADDRESS for your target? From
a quick search I see that the iq2000 and rx backends are rewriting some
PLUS expression addresses with insn sequence to calculate the address.


I have partial success.

The key was to define both TARGET_LEGITIMATE_ADDRESS_P and an
addptr3 insn.

If it doesn't work without TARGET_LEGITIMIZE_ADDRESS, then it's wrong.

At the highest level that hook is meant to provide a way for the target 
to adjust addresses to optimize them better.  If you're using it for 
correctness purposes, it's ultimately going to fail in one way or another.


GCC has certainly supported targets with limited addressing modes in the 
past (ia64 being a good example).  It's painful to deal with such 
targets, but it can be made to work.


Jeff


Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-06-08 Thread Jeff Law




On 6/8/24 1:53 AM, Richard Sandiford wrote:



I realise there are many ways of writing this out there though,
so that's just a suggestion.  (And only lightly tested.)

FWIW, we could easily extend the interface to work on wide_ints if we
ever need it for N>63.
I think there's constraints elsewhere that keep us in the N<=63 range. 
If we extended things elsewhere to include TI then we could fully 
support 64bit CRCs.


I don't *think* it's that hard, but we haven't actually tried.

Jeff




Re: [PATCH v2 3/3] RISC-V: Add Zalrsc amo-op patterns

2024-06-07 Thread Jeff Law




On 6/3/24 3:53 PM, Patrick O'Neill wrote:

All amo patterns can be represented with lrsc sequences.
Add these patterns as a fallback when Zaamo is not enabled.

gcc/ChangeLog:

* config/riscv/sync.md (atomic_): New expand 
pattern.
(amo_atomic_): Rename amo pattern.
(atomic_fetch_): New lrsc sequence pattern.
(lrsc_atomic_): New expand pattern.
(amo_atomic_fetch_): Rename amo pattern.
(lrsc_atomic_fetch_): New lrsc sequence pattern.
(atomic_exchange): New expand pattern.
(amo_atomic_exchange): Rename amo pattern.
(lrsc_atomic_exchange): New lrsc sequence pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-1.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-2.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-3.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-4.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
--
rv64imfdc_zalrsc has the same testsuite results as rv64imafdc after this
patch is applied.
---
AFAIK there isn't a way to subtract an extension similar to dg-add-options.
As a result I needed to specify a -march string for
amo-zaamo-preferred-over-zalrsc.c instead of using testsuite infra.

I believe you are correct.




diff --git a/gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c 
b/gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
new file mode 100644
index 000..1c124c2b8b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c

[ ... ]
Not a big fan of the function-bodies tests.  If we're going to use them, 
we need to be especially careful about requiring specific registers so 
that we're not stuck adjusting them all the time due to changes in the 
register allocator, optimizers, etc.



diff --git a/gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
new file mode 100644
index 000..3cd6ce04830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that lrsc atomic op mappings match Table A.6's recommended mapping.  
*/
+/* { dg-options "-O3 -march=rv64id_zalrsc" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** foo:
+** 1:
+** lr.w\ta5, 0\(a0\)
+** add\ta5, a5, a1
+** sc.w\ta5, a5, 0\(a0\)
+**  bnez\ta5, 1b
+** ret
+*/
+void foo (int* bar, int* baz)
+{
+  __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
+}
This one is a good example.  We could just as easily use a variety of 
registers other than a5 for the temporary.


Obviously for registers that hold the incoming argument or an outgoing 
result, we can be more strict.


If you could take a look at the added tests and generalize the registers 
it'd be appreciated.  OK with that adjustment.
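For example, the body in amo-zalrsc-amo-add-1.c above could be loosened to accept any temporary, following the `[atx][0-9]+` style already used by the sat_u_add tests in this series (an illustrative sketch, not the committed version):

```
/*
** foo:
** 1:
** lr.w\t[atx][0-9]+, 0\(a0\)
** add\t[atx][0-9]+, [atx][0-9]+, a1
** sc.w\t[atx][0-9]+, [atx][0-9]+, 0\(a0\)
** bnez\t[atx][0-9]+, 1b
** ret
*/
```

The incoming argument register a0 stays exact, while the scratch register chosen by the allocator is left free to vary.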


jeff




Re: [PATCH v2 2/3] RISC-V: Add Zalrsc and Zaamo testsuite support

2024-06-07 Thread Jeff Law




On 6/3/24 3:53 PM, Patrick O'Neill wrote:

Convert testsuite infrastructure to use Zalrsc and Zaamo rather than A.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Use Zaamo rather than A.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Use Zalrsc rather
than A.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Use Zaamo rather
than A.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Zaamo option.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Use Zalrsc 
rather
than A.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
* lib/target-supports.exp: Add testsuite infrastructure support for
Zaamo and Zalrsc.
So there's a lot of whitespace changes going on in target-supports.exp 
that make it harder to find the real changes.


There's always a bit of a judgement call for that kind of thing.  This 
one probably goes past what I would generally recommend, meaning that the 
formatting stuff would be a separate patch.


A reasonable starting point would be if you're not changing the function 
in question, then fixing formatting in it probably should be a distinct 
patch.


You probably should update the docs in sourcebuild.texi for the new 
target-supports tests.


So OK for the trunk (including the whitespace fixes) with a suitable 
change to sourcebuild.texi.


jeff


Re: [PATCH v2 1/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-07 Thread Jeff Law




On 6/3/24 3:53 PM, Patrick O'Neill wrote:

The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zaamo and Zalrsc.
* config/riscv/arch-canonicalize: Make A imply Zaamo and Zalrsc.
* config/riscv/riscv.opt: Add Zaamo and Zalrsc
* config/riscv/sync.md: Convert TARGET_ATOMIC to TARGET_ZAAMO and
TARGET_ZALRSC.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: Adjust expected arch string.
* gcc.target/riscv/attribute-16.c: Ditto.
* gcc.target/riscv/attribute-17.c: Ditto.
* gcc.target/riscv/attribute-18.c: Ditto.
* gcc.target/riscv/pr110696.c: Ditto.
* gcc.target/riscv/rvv/base/pr114352-1.c: Ditto.
* gcc.target/riscv/rvv/base/pr114352-3.c: Ditto.

OK
jeff



Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-07 Thread Jeff Law




On 6/6/24 4:10 AM, Manolis Tsamis wrote:

This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

  strb w2, [x1, 1]
  ldr  x0, [x1]  # Expensive store forwarding to larger load.

To:

  ldr  x0, [x1]
  strb w2, [x1]
  bfi  x0, x2, 0, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

   Neoverse-N1:  +29.4%
   Intel Coffeelake: +13.1%
   AMD 5950X:+17.5%
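A union-based source pattern of the kind mentioned in the commit message might look like this hypothetical sketch; the narrow store followed by the wider load is what becomes the strb/ldr pair:

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical sketch: a one-byte store into a union member followed
   by a load of the whole word.  On AArch64 this compiles to a strb
   feeding a wider ldr -- the store-forwarding pattern the pass
   rewrites.  */
union word {
  uint64_t u64;
  uint8_t  u8[8];
};

uint64_t
set_byte1_then_read (union word *w, uint8_t b)
{
  w->u8[1] = b;    /* narrow store */
  return w->u64;   /* wider load partially depending on the store */
}
```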

gcc/ChangeLog:

* Makefile.in: Add avoid-store-forwarding.o.
* common.opt: New option -favoid-store-forwarding.
* params.opt: New param store-forwarding-max-distance.
* doc/invoke.texi: Document new pass.
* doc/passes.texi: Document new pass.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
So this is getting a lot more interesting.  I think the first time I 
looked at this it was more concerned with stores feeding something like 
a load-pair and avoiding the store forwarding penalty for that case.  Am 
I mis-remembering, or did it get significantly more general?







+
+static unsigned int stats_sf_detected = 0;
+static unsigned int stats_sf_avoided = 0;
+
+static rtx
+get_load_mem (rtx expr)
Needs a function comment.  You should probably mention that EXPR must be 
a single_set in that comment.




 +

+  rtx dest;
+  if (eliminate_load)
+dest = gen_reg_rtx (load_inner_mode);
+  else
+dest = SET_DEST (load);
+
+  int move_to_front = -1;
+  int total_cost = 0;
+
+  /* Check if we can emit bit insert instructions for all forwarded stores.  */
+  FOR_EACH_VEC_ELT (stores, i, it)
+{
+  it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
+  rtx_insn *insns = NULL;
+
+  /* If we're eliminating the load then find the store with zero offset
+and use it as the base register to avoid a bit insert.  */
+  if (eliminate_load && it->offset == 0)
So how often is this triggering?  We have various codes in the gimple 
optimizers to detect store followed by a load from the same address and 
do the forwarding.  If they're happening with any frequency that would 
be a good sign code in DOM and elsewhere isn't working well.


The way these passes detect this case is to take the store, flip the 
operands around (ie, it looks like a load) and enter that into the 
expression hash tables.  After that standard redundancy elimination 
approaches will work.




+   {
+ start_sequence ();
+
+ /* We can use a paradoxical subreg to force this to a wider mode, as
+the only use will be inserting the bits (i.e., we don't care about
+the value of the higher bits).  */
Which may be a good hint about the cases you're capturing -- if the 
modes/sizes differ that would make more sense since I don't think we're 
as likely to be capturing those cases.




diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e8967fd8ab..c769744d178 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12657,6 +12657,15 @@ loop unrolling.
  This option is enabled by default at optimization levels @option{-O1},
  @option{-O2}, @option{-O3}, @option{-Os}.
  
+@opindex favoid-store-forwarding

+@item -favoid-store-forwarding
+@itemx -fno-avoid-store-forwarding
+Many CPUs will stall for many cycles when a load partially depends on previous
+smaller stores.  This pass tries to detect such cases and avoid the penalty by
+changing the order of the load and store and then fixing up the loaded value.
+
+Disabled by default.
Is there any particular reason why this would be off by default at -O1 
or higher?  It would seem to me that on modern cores that this 
transformation should easily be a win.  Even on an old in-order core, 
avoiding the load with the bit insert is likely profitable, just not as 
much so.






diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..b8115f5c27a 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned 
stores if it is legal to do
  Common Joined UInteger Var(param_store_merging_max_size) Init(65536) 
IntegerRange(1, 65536) Param Optimization
  Maximum size of a single store merging region in bytes.
  
+-param=store-forwarding-max-distance=

+Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10) 
IntegerRange(1, 1000) Param Optimization
+Maximum number of 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for scalar unsigned SAT_ADD form 3

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4b3c0b3380d38553e76bbf01e1ac5b3f66dc3d5c

commit 4b3c0b3380d38553e76bbf01e1ac5b3f66dc3d5c
Author: Pan Li 
Date:   Mon Jun 3 10:24:47 2024 +0800

RISC-V: Add testcases for scalar unsigned SAT_ADD form 3

After the middle-end support the form 3 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 3 of unsigned .SAT_ADD.

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }
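The form above can be sanity-checked on the host; a minimal instantiation for uint8_t (illustrative only):

```c
#include <stdint.h>
#include <assert.h>

/* The commit message's form, instantiated for uint8_t: on overflow
   __builtin_add_overflow returns true and the result saturates to
   the all-ones value.  */
uint8_t
sat_add_u_3_u8 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  return __builtin_add_overflow (x, y, &ret) ? (uint8_t) -1 : ret;
}
```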

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 3.
* gcc.target/riscv/sat_u_add-13.c: New test.
* gcc.target/riscv/sat_u_add-14.c: New test.
* gcc.target/riscv/sat_u_add-15.c: New test.
* gcc.target/riscv/sat_u_add-16.c: New test.
* gcc.target/riscv/sat_u_add-run-13.c: New test.
* gcc.target/riscv/sat_u_add-run-14.c: New test.
* gcc.target/riscv/sat_u_add-run-15.c: New test.
* gcc.target/riscv/sat_u_add-run-16.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 39dde9200dd936339df7dd6c8f56e88866bcecc5)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-13.c | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-14.c | 21 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-15.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_add-16.c | 17 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-13.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-14.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-15.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-16.c | 25 +++
 9 files changed, 185 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index d44fd63fd83..adb8be5886e 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -26,6 +26,15 @@ sat_u_add_##T##_fmt_3 (T x, T y)\
   return (T)(-overflow) | ret;  \
 }
 
+#define DEF_SAT_U_ADD_FMT_4(T)   \
+T __attribute__((noinline))  \
+sat_u_add_##T##_fmt_4 (T x, T y) \
+{\
+  T ret; \
+  return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
+}
+
+
 #define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
 void __attribute__((noinline))   \
 vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -42,6 +51,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 #define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
 #define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
 #define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
+#define RUN_SAT_U_ADD_FMT_4(T, x, y) sat_u_add_##T##_fmt_4(x, y)
 
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-13.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-13.c
new file mode 100644
index 000..b2d93f29f48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-13.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_4:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_4(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-14.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-14.c
new file mode 100644
index 000..eafc578aafa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-14.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_4:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:efe00579c04e02b6132c678962ce8050c8759bee

commit efe00579c04e02b6132c678962ce8050c8759bee
Author: Pan Li 
Date:   Wed May 29 14:15:45 2024 +0800

RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

After the middle-end support the form 1 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 1 of unsigned .SAT_ADD.

Form 1:

  #define SAT_ADD_U_1(T)   \
  T sat_add_u_1_##T(T x, T y)  \
  {\
return (T)(x + y) >= x ? (x + y) : -1; \
  }
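As a quick host-side sanity check (my illustration, not part of the patch): the branchy "form 1" above and the branchless variant used by DEF_SAT_U_ADD_FMT_1 in sat_arith.h compute the same saturating result, which is easy to verify exhaustively for uint8_t:

```c
#include <assert.h>
#include <stdint.h>

/* Branchy "form 1" from the commit message, expanded for uint8_t.  */
uint8_t sat_add_u_1_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t)(x + y) >= x ? (x + y) : -1;
}

/* Branchless variant used by DEF_SAT_U_ADD_FMT_1 in sat_arith.h:
   the compare yields 0 or 1, negation turns that into an all-zeros
   or all-ones mask, and the OR saturates on overflow.  */
uint8_t sat_add_u_branchless_u8 (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t)((uint8_t)(x + y) < x));
}
```

Both forms agree on every input pair, which is what lets the middle-end canonicalize them to the same .SAT_ADD internal function.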

Passed the riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit a737c2bf5212822b8225f65efa643a968e5a7c78)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h   |  8 
 gcc/testsuite/gcc.target/riscv/sat_u_add-5.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add-6.c | 21 
 gcc/testsuite/gcc.target/riscv/sat_u_add-7.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-8.c | 17 
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c | 25 
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c | 25 
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c | 25 
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c | 25 
 9 files changed, 183 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 2ef9fd825f3..2abc83d7666 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -10,6 +10,13 @@ sat_u_add_##T##_fmt_1 (T x, T y)   \
   return (x + y) | (-(T)((T)(x + y) < x)); \
 }
 
+#define DEF_SAT_U_ADD_FMT_2(T)   \
+T __attribute__((noinline))  \
+sat_u_add_##T##_fmt_2 (T x, T y) \
+{\
+  return (T)(x + y) >= x ? (x + y) : -1; \
+}
+
 #define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
 void __attribute__((noinline))   \
 vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -24,6 +31,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 }
 
 #define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
 
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
new file mode 100644
index 000..4c73c7f8a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-6.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-6.c
new file mode 100644
index 000..0d64f5631bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-6.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Regenerate opt urls.

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4f20feccf708ff7a7af5d776ca87d4995ef46f76

commit 4f20feccf708ff7a7af5d776ca87d4995ef46f76
Author: Robin Dapp 
Date:   Thu Jun 6 09:32:28 2024 +0200

RISC-V: Regenerate opt urls.

I wasn't aware that I needed to regenerate the opt urls when
adding an option.  This patch does that.

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Regenerate.

(cherry picked from commit 037fc4d1012dc9d533862ef7e2c946249877dd71)

Diff:
---
 gcc/config/riscv/riscv.opt.urls | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index d87e9d5c9a8..622cb6e7b44 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -47,6 +47,12 @@ UrlSuffix(gcc/RISC-V-Options.html#index-mcmodel_003d-4)
 mstrict-align
 UrlSuffix(gcc/RISC-V-Options.html#index-mstrict-align-4)
 
+mscalar-strict-align
+UrlSuffix(gcc/RISC-V-Options.html#index-mscalar-strict-align)
+
+mvector-strict-align
+UrlSuffix(gcc/RISC-V-Options.html#index-mvector-strict-align)
+
 ; skipping UrlSuffix for 'mexplicit-relocs' due to finding no URLs
 
 mrelax


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for scalar unsigned SAT_ADD form 5

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1a6d2ed7fbd20bfa3079da4700eb591f2abaa395

commit 1a6d2ed7fbd20bfa3079da4700eb591f2abaa395
Author: Pan Li 
Date:   Mon Jun 3 10:43:10 2024 +0800

RISC-V: Add testcases for scalar unsigned SAT_ADD form 5

After the middle-end support the form 5 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 5 of unsigned .SAT_ADD.

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
return (T)(x + y) < x ? -1 : (x + y); \
  }

Passed the riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 5.
* gcc.target/riscv/sat_u_add-21.c: New test.
* gcc.target/riscv/sat_u_add-22.c: New test.
* gcc.target/riscv/sat_u_add-23.c: New test.
* gcc.target/riscv/sat_u_add-24.c: New test.
* gcc.target/riscv/sat_u_add-run-21.c: New test.
* gcc.target/riscv/sat_u_add-run-22.c: New test.
* gcc.target/riscv/sat_u_add-run-23.c: New test.
* gcc.target/riscv/sat_u_add-run-24.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 93f44e18cddb2b5eb3a00232d3be9a5bc8179f25)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 
 gcc/testsuite/gcc.target/riscv/sat_u_add-21.c | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-22.c | 21 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-23.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_add-24.c | 17 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-21.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-22.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-23.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-24.c | 25 +++
 9 files changed, 183 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 6ca158d57c4..976ef1c44c1 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -42,6 +42,13 @@ sat_u_add_##T##_fmt_5 (T x, T y) 
 \
  return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
 }
 
+#define DEF_SAT_U_ADD_FMT_6(T)  \
+T __attribute__((noinline)) \
+sat_u_add_##T##_fmt_6 (T x, T y)\
+{   \
+  return (T)(x + y) < x ? -1 : (x + y); \
+}
+
 #define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
 void __attribute__((noinline))   \
 vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -60,6 +67,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 #define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
 #define RUN_SAT_U_ADD_FMT_4(T, x, y) sat_u_add_##T##_fmt_4(x, y)
 #define RUN_SAT_U_ADD_FMT_5(T, x, y) sat_u_add_##T##_fmt_5(x, y)
+#define RUN_SAT_U_ADD_FMT_6(T, x, y) sat_u_add_##T##_fmt_6(x, y)
 
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-21.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-21.c
new file mode 100644
index 000..f75e35a5fa9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_6:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_6(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-22.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-22.c
new file mode 100644
index 000..ad957a061f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-22.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_6:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for scalar unsigned SAT_ADD form 2

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6f5119eed91a2ec0708e38c9f2e5d58169a3f53e

commit 6f5119eed91a2ec0708e38c9f2e5d58169a3f53e
Author: Pan Li 
Date:   Mon Jun 3 09:35:49 2024 +0800

RISC-V: Add testcases for scalar unsigned SAT_ADD form 2

After the middle-end support the form 2 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 2 of unsigned .SAT_ADD.

Form 2:

  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
  }
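Written out for a single type (my illustration, not patch code), form 2 relies on the GCC builtin `__builtin_add_overflow`, which stores the wrapped sum and returns a nonzero flag on overflow:

```c
#include <assert.h>
#include <stdint.h>

/* "Form 2" expanded for uint8_t: __builtin_add_overflow stores the
   wrapped sum in ret and returns nonzero on overflow; negating the
   flag yields 0x00 or 0xff, so the OR either leaves ret alone or
   forces it to the saturated value 0xff.  */
uint8_t sat_add_u_2_u8 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  uint8_t overflow = __builtin_add_overflow (x, y, &ret);
  return (uint8_t)(-overflow) | ret;
}
```

This is the same 0-or-all-ones masking trick as form 1, just with the overflow test done by the builtin instead of a compare.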

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 2.
* gcc.target/riscv/sat_u_add-10.c: New test.
* gcc.target/riscv/sat_u_add-11.c: New test.
* gcc.target/riscv/sat_u_add-12.c: New test.
* gcc.target/riscv/sat_u_add-9.c: New test.
* gcc.target/riscv/sat_u_add-run-10.c: New test.
* gcc.target/riscv/sat_u_add-run-11.c: New test.
* gcc.target/riscv/sat_u_add-run-12.c: New test.
* gcc.target/riscv/sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 0261ed4337f62c247b33145a81cd4fb5a69bc5a7)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-10.c | 21 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-11.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_add-12.c | 17 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-9.c  | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c  | 25 +++
 9 files changed, 185 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 2abc83d7666..d44fd63fd83 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -17,6 +17,15 @@ sat_u_add_##T##_fmt_2 (T x, T y) \
   return (T)(x + y) >= x ? (x + y) : -1; \
 }
 
+#define DEF_SAT_U_ADD_FMT_3(T)  \
+T __attribute__((noinline)) \
+sat_u_add_##T##_fmt_3 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_add_overflow (x, y, &ret); \
+  return (T)(-overflow) | ret;  \
+}
+
 #define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
 void __attribute__((noinline))   \
 vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -32,6 +41,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 
 #define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
 #define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
+#define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
 
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-10.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-10.c
new file mode 100644
index 000..3f627ef80b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-10.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c
new file mode 100644
index 000..b6dc779b212
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_3:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for scalar unsigned SAT_ADD form 4

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b93df02d58c0c448c4b524c07bdf5f3d7c305378

commit b93df02d58c0c448c4b524c07bdf5f3d7c305378
Author: Pan Li 
Date:   Mon Jun 3 10:33:15 2024 +0800

RISC-V: Add testcases for scalar unsigned SAT_ADD form 4

After the middle-end support the form 4 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 4 of unsigned .SAT_ADD.

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 4.
* gcc.target/riscv/sat_u_add-17.c: New test.
* gcc.target/riscv/sat_u_add-18.c: New test.
* gcc.target/riscv/sat_u_add-19.c: New test.
* gcc.target/riscv/sat_u_add-20.c: New test.
* gcc.target/riscv/sat_u_add-run-17.c: New test.
* gcc.target/riscv/sat_u_add-run-18.c: New test.
* gcc.target/riscv/sat_u_add-run-19.c: New test.
* gcc.target/riscv/sat_u_add-run-20.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit a171aac72408837ed0b20e3912a22c5b4891ace4)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 
 gcc/testsuite/gcc.target/riscv/sat_u_add-17.c | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-18.c | 21 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-19.c | 18 
 gcc/testsuite/gcc.target/riscv/sat_u_add-20.c | 17 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-17.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-18.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-19.c | 25 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-run-20.c | 25 +++
 9 files changed, 183 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index adb8be5886e..6ca158d57c4 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -34,6 +34,13 @@ sat_u_add_##T##_fmt_4 (T x, T y) \
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
 }
 
+#define DEF_SAT_U_ADD_FMT_5(T)\
+T __attribute__((noinline))   \
+sat_u_add_##T##_fmt_5 (T x, T y)  \
+{ \
+  T ret;  \
+  return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
+}
 
 #define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
 void __attribute__((noinline))   \
@@ -52,6 +59,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
 #define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
 #define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
 #define RUN_SAT_U_ADD_FMT_4(T, x, y) sat_u_add_##T##_fmt_4(x, y)
+#define RUN_SAT_U_ADD_FMT_5(T, x, y) sat_u_add_##T##_fmt_5(x, y)
 
 #define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-17.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-17.c
new file mode 100644
index 000..7085ac835f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-17.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_5:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_5(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-18.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-18.c
new file mode 100644
index 000..355ff8ba4ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-18.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_5:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e46fc82745c1a917ade318222d514c881c68ce1a

commit e46fc82745c1a917ade318222d514c881c68ce1a
Author: liuhongt 
Date:   Fri Apr 19 10:29:34 2024 +0800

Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

When mask is ((1 << (prec - imm)) - 1), which is used to clear the
upper bits of A, it can be simplified to LSHIFTRT.

i.e. simplify
(and:v8hi
  (ashiftrt:v8hi A 8)
  (const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)
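A scalar C analogue of the same identity (a sketch of mine, not part of the patch) makes it easy to see why the AND is redundant: the mask removes exactly the bits that the arithmetic shift copied down from the sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar analogue of the RTL identity: for a 16-bit lane shifted
   right arithmetically by 8, ANDing with 0xff == (1 << (16 - 8)) - 1
   discards exactly the sign-copied upper bits, so the combination
   behaves like a logical shift.  (GCC defines >> on negative ints
   as an arithmetic shift, which the C standard leaves
   implementation-defined.)  */
uint16_t ashiftrt_then_mask (int16_t a)
{
  return (uint16_t)(a >> 8) & 0xff;
}

uint16_t lshiftrt (uint16_t a)
{
  return a >> 8;
}
```

The two routines return the same value for every 16-bit input, matching the v8hi example above lane by lane.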

gcc/ChangeLog:

PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428-1.c: New test.

(cherry picked from commit 7876cde25cbd2f026a0ae488e5263e72f8e9bfa0)

Diff:
---
 gcc/simplify-rtx.cc| 25 +++
 gcc/testsuite/gcc.target/i386/pr114428-1.c | 39 ++
 2 files changed, 64 insertions(+)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index bb562c3af2c..216aedbe7e2 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4050,6 +4050,31 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return tem;
}
 
+  /* (and:v4si
+  (ashiftrt:v4si A 16)
+  (const_vector: 0xffff x4))
+is just (lshiftrt:v4si A 16).  */
+  if (VECTOR_MODE_P (mode) && GET_CODE (op0) == ASHIFTRT
+ && (CONST_INT_P (XEXP (op0, 1))
+ || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR
+ && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1))))
+ && GET_CODE (op1) == CONST_VECTOR
+ && CONST_VECTOR_DUPLICATE_P (op1))
+   {
+ unsigned HOST_WIDE_INT shift_count
+   = (CONST_INT_P (XEXP (op0, 1))
+  ? UINTVAL (XEXP (op0, 1))
+  : UINTVAL (XVECEXP (XEXP (op0, 1), 0, 0)));
+ unsigned HOST_WIDE_INT inner_prec
+   = GET_MODE_PRECISION (GET_MODE_INNER (mode));
+
+ /* Avoid UD shift count.  */
+ if (shift_count < inner_prec
+ && (UINTVAL (XVECEXP (op1, 0, 0))
+ == (HOST_WIDE_INT_1U << (inner_prec - shift_count)) - 1))
+   return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP 
(op0, 1));
+   }
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
diff --git a/gcc/testsuite/gcc.target/i386/pr114428-1.c 
b/gcc/testsuite/gcc.target/i386/pr114428-1.c
new file mode 100644
index 000..927476f2269
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114428-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-final { scan-assembler-times "psrlw" 1 } } */
+/* { dg-final { scan-assembler-times "psrld" 1 } } */
+/* { dg-final { scan-assembler-times "psrlq" 1 { target { ! ia32 } } } } */
+
+
+#define SHIFTC 12
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef short v8hi __attribute__((vector_size(16)));
+typedef long long v2di __attribute__((vector_size(16)));
+
+v8hi
+foo1 (v8hi a)
+{
+  return
+(a >> (16 - SHIFTC)) & (__extension__(v8hi){(1<<SHIFTC)-1,(1<<SHIFTC)-1,
+(1<<SHIFTC)-1,(1<<SHIFTC)-1,(1<<SHIFTC)-1,(1<<SHIFTC)-1,(1<<SHIFTC)-1,(1<<SHIFTC)-1});
+}
+
+v4si
+foo2 (v4si a)
+{
+  return
+(a >> (32 - SHIFTC)) & (__extension__(v4si){(1<<SHIFTC)-1,(1<<SHIFTC)-1,(1<<SHIFTC)-1,(1<<SHIFTC)-1});
+}
+
+v2di
+foo3 (v2di a)
+{
+  return
+(a >> (long long)(64 - SHIFTC)) & (__extension__(v2di){(1ULL<<SHIFTC)-1,(1ULL<<SHIFTC)-1});
+}

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Introduce -mvector-strict-align.

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0e0b666a30f53364292432903b68febd85a3e114

commit 0e0b666a30f53364292432903b68febd85a3e114
Author: Robin Dapp 
Date:   Tue May 28 21:19:26 2024 +0200

RISC-V: Introduce -mvector-strict-align.

This patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign.  For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

Changes from v3:
 - Adressed Kito's comments.
 - Made -mscalar-strict-align a real alias.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and map to riscv_vector_unaligned_access_p.
* config/riscv/riscv.opt: Add -mvector-strict-align.
* config/riscv/riscv.cc (struct riscv_tune_param): Add
vector_unaligned_access.
(riscv_override_options_internal): Set
riscv_vector_unaligned_access_p.
* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mno-vector-strict-align.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.

(cherry picked from commit 68b0742a49de7122d5023f0bf46460ff2fb3e3dd)

Diff:
---
 gcc/config/riscv/riscv-opts.h  |  3 --
 gcc/config/riscv/riscv.cc  | 19 
 gcc/config/riscv/riscv.h   |  5 
 gcc/config/riscv/riscv.opt |  8 +
 gcc/doc/invoke.texi| 16 ++
 .../vect/costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c   |  2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c |  2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/misalign-1.c  |  2 +-
 gcc/testsuite/lib/target-supports.exp  | 35 --
 13 files changed, 88 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0   
\
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
 
-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL
\
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c5c4c777349..9704ff9c6a0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -288,6 +288,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool vector_unaligned_access;
   bool use_divmod_expansion;
   bool overlap_op_by_pieces;
   unsigned int fusible_ops;
@@ -300,6 +301,10 @@ struct riscv_tune_param
 /* Whether unaligned accesses execute very slowly.  */
 bool riscv_slow_unaligned_access_p;
 
+/* Whether misaligned vector accesses are supported (i.e. do not
+   throw an exception).  */
+bool riscv_vector_unaligned_access_p;
+
 /* Whether user explicitly passed -mstrict-align.  */
 bool riscv_user_wants_strict_align;
 
@@ -442,6 +447,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
+  false,   /* vector_unaligned_access */
   false,   /* use_divmod_expansion */
   false,   

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add Zfbfmin extension

2024-06-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f11cbf2edfbd9615cf0d8519bd7a570a2ae00397

commit f11cbf2edfbd9615cf0d8519bd7a570a2ae00397
Author: Xiao Zeng 
Date:   Wed May 15 13:56:42 2024 +0800

RISC-V: Add Zfbfmin extension

1 In the previous patch, the libcall for BF16 was implemented:



2 Riscv provides Zfbfmin extension, which completes the "Scalar BF16 
Converts":



3 Implemented replacing libcall with Zfbfmin extension instruction.

4 Reused previous testcases in:


gcc/ChangeLog:

* config/riscv/iterators.md: Add mode_iterator between
floating-point modes and BFmode.
* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (trunc<mode>bf2): New pattern for BFmode.
(extendbfsf2): Ditto.
(*movhf_hardfloat): Add BFmode.
(*mov<mode>_hardfloat): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.

(cherry picked from commit 4638e508aa814d4aa2e204c3ab041c6a56aad2bd)

Diff:
---
 gcc/config/riscv/iterators.md  |  6 +-
 gcc/config/riscv/riscv.cc  |  4 +-
 gcc/config/riscv/riscv.md  | 49 +---
 .../gcc.target/riscv/zfbfmin-bf16_arithmetic.c | 35 
 .../gcc.target/riscv/zfbfmin-bf16_comparison.c | 33 +++
 .../riscv/zfbfmin-bf16_float_libcall_convert.c | 45 +++
 .../riscv/zfbfmin-bf16_integer_libcall_convert.c   | 66 ++
 7 files changed, 228 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 3c139bc2e30..1e37e843023 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -78,9 +78,13 @@
 ;; Iterator for floating-point modes that can be loaded into X registers.
 (define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF "TARGET_ZFHMIN")])
 
-;; Iterator for floating-point modes of BF16
+;; Iterator for floating-point modes of BF16.
 (define_mode_iterator HFBF [HF BF])
 
+;; Conversion between floating-point modes and BF16.
+;; SF to BF16 have hardware instructions.
+(define_mode_iterator FBF [HF DF TF])
+
 ;; ---
 ;; Mode attributes
 ;; ---
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 10af38a5a81..c5c4c777349 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4310,7 +4310,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.x.h\t%0,%1";
/* Using fmv.x.s + sign-extend to emulate fmv.x.h.  */
return "fmv.x.s\t%0,%1;slli\t%0,%0,16;srai\t%0,%0,16";
@@ -4366,7 +4366,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.h.x\t%0,%z1";
/* High 16 bits should be all-1, otherwise HW will treated
   as a n-bit canonical NaN, but isn't matter for softfloat.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 25d341ec987..e57bfcf616a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1763,6 +1763,31 @@
   [(set_attr "type" "fcvt")
(set_attr "mode" "HF")])
 
+(define_insn "truncsfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:SF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  "fcvt.bf16.s\t%0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+;; The conversion of HF/DF/TF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __trunc[h|d|t]fbf2.
+(define_expand "truncbf2"
+  [(set (match_operand:BF  0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:FBF   1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, <MODE>mode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
 ;;
 ;;  
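For readers unfamiliar with the BF16 format itself: bfloat16 is simply the top 16 bits of an IEEE binary32 encoding (sign, full 8-bit exponent, top 7 mantissa bits), which is what `fcvt.bf16.s` produces in hardware. A round-to-nearest-even SF-to-BF16 conversion can be sketched in portable C as follows (an illustration of mine, not the libgcc `__truncsfbf2` implementation; NaN handling is omitted):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of an SF->BF16 conversion: keep the top 16 bits of the
   binary32 encoding, rounding to nearest with ties to even.
   NaN inputs are deliberately not handled here.  */
uint16_t sf_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);   /* type-pun without UB */
  uint32_t lsb = (bits >> 16) & 1;   /* current low bit of the result */
  bits += 0x7fff + lsb;              /* round to nearest, ties to even */
  return (uint16_t)(bits >> 16);
}
```

This also shows why the HF/DF/TF cases in the expand pattern above go through SFmode first: once a value is in SF form, the BF16 result is a rounded truncation of its top half.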

Re: [PATCH] expmed: TRUNCATE value1 if needed in store_bit_field_using_insv

2024-06-05 Thread Jeff Law




On 6/5/24 8:57 AM, YunQiang Su wrote:

Richard Sandiford  wrote on Wed, Jun 5, 2024 at 22:14:


YunQiang Su  writes:

PR target/113179.

In `store_bit_field_using_insv`, we just use SUBREG if value_mode
>= op_mode, while in some ports, a sign_extend will be needed,

such as MIPS64:
   If either GPR rs or GPR rt does not contain sign-extended 32-bit
   values (bits 63..31 equal), then the result of the operation is
   UNPREDICTABLE.

The problem happens for the code like:
   struct xx {
 int a:4;
 int b:24;
 int c:3;
 int d:1;
   };

   void xx (struct xx *a, long long b) {
 a->d = b;
   }

In the above code, the hard register containing `b` may not be well
sign-extended.
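The testcase boils down to the following host-runnable check (my illustration): on any conforming compiler the stored 1-bit field may depend only on the low bit of `b`, whatever the upper bits of the register happen to hold, which is exactly what an unextended MIPS64 insv operand can violate.

```c
#include <assert.h>

struct xx {
  int a:4;
  int b:24;
  int c:3;
  int d:1;
};

/* The store from the PR: only the low bit of b may influence d.
   On MIPS64 the insv pattern operates on a 32-bit view of the
   register, which is why the 64-bit value must be truncated /
   properly extended first.  */
void set_d (struct xx *p, long long b)
{
  p->d = b;
}
```

Before the fix, a value like 0x100000001 (upper bits set, not a sign-extended 32-bit value) could feed the insv pattern with UNPREDICTABLE results.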

gcc/
   PR target/113179
   * expmed.c(store_bit_field_using_insv): TRUNCATE value1 if
   needed.

gcc/testsuite
   PR target/113179
   * gcc.target/mips/pr113179.c: New tests.
---
  gcc/expmed.cc| 12 +---
  gcc/testsuite/gcc.target/mips/pr113179.c | 18 ++
  2 files changed, 27 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/mips/pr113179.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 4ec035e4843..6a582593da8 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -704,9 +704,15 @@ store_bit_field_using_insv (const extraction_insn *insv, 
rtx op0,
   }
 else
   {
-   tmp = gen_lowpart_if_possible (op_mode, value1);
-   if (! tmp)
- tmp = gen_lowpart (op_mode, force_reg (value_mode, value1));
+   if (targetm.mode_rep_extended (op_mode, value_mode))
+ tmp = simplify_gen_unary (TRUNCATE, op_mode,
+   value1, value_mode);
+   else
+ {
+   tmp = gen_lowpart_if_possible (op_mode, value1);
+   if (! tmp)
+ tmp = gen_lowpart (op_mode, force_reg (value_mode, value1));
+ }
   }
 value1 = tmp;
   }


I notice this patch is already applied.  Was it approved?  I didn't
see an approval in my feed or in the archives.



Sorry. I assumed that it only affects MIPS targets since only MIPS defines
   targetm.mode_rep_extended
Just a note for the future, even if something is guarded by a target 
hook that's only defined by a single target we'd want to see an ACK if 
the code is in a target independent file.


Jeff


Re: [V2 PATCH] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-05 Thread Jeff Law




On 6/4/24 10:22 PM, liuhongt wrote:

Can you add a testcase for this?  I don't mind if it's x86 specific and
does a bit of asm scanning.

Also note that the context for this patch has changed, so it won't
automatically apply.  So be extra careful when updating so that it goes
into the right place (all the more reason to have a testcase validating
that the optimization works correctly).


I think the patch itself is fine.  So further review is just for the
testcase and should be easy.

rebased and add a testcase.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?


When the mask is ((1 << (prec - imm)) - 1), which is used to clear the upper
bits of A, then it can be simplified to LSHIFTRT.

i.e. Simplify
(and:v8hi
   (ashiftrt:v8hi A 8)
   (const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)

gcc/ChangeLog:

PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428-1.c: New test.

OK.

Being x264 related, I took a quick glance at RISC-V before/after and 
seems to be slightly better as well.


Jeff


Re: [PATCH] Record edge true/false value for gcov

2024-06-05 Thread Jeff Law




On 6/4/24 6:26 AM, Jørgen Kvalsvik wrote:

Make gcov aware which edges are the true/false to more accurately
reconstruct the CFG.  There are plenty of bits left in arc_info and it
opens up for richer reporting.

gcc/ChangeLog:

* gcov-io.h (GCOV_ARC_TRUE): New.
(GCOV_ARC_FALSE): New.
* gcov.cc (struct arc_info): Add true_value, false_value.
(read_graph_file): Read true_value, false_value.
Going to trust you'll find this useful in the near future :-)  So OK for 
the trunk.


jeff



Re: [PATCH] expmed: TRUNCATE value1 if needed in store_bit_field_using_insv

2024-06-05 Thread Jeff Law




On 6/5/24 8:14 AM, Richard Sandiford wrote:

YunQiang Su  writes:

PR target/113179.

In `store_bit_field_using_insv`, we just use SUBREG if value_mode
>= op_mode, while in some ports, a sign_extend will be needed,

such as MIPS64:
   If either GPR rs or GPR rt does not contain sign-extended 32-bit
   values (bits 63..31 equal), then the result of the operation is
   UNPREDICTABLE.

The problem happens for the code like:
   struct xx {
 int a:4;
 int b:24;
 int c:3;
 int d:1;
   };

   void xx (struct xx *a, long long b) {
 a->d = b;
   }

In the above code, the hard register containing `b` may not be well
sign-extended.

gcc/
PR target/113179
* expmed.cc (store_bit_field_using_insv): TRUNCATE value1 if
needed.

gcc/testsuite
PR target/113179
* gcc.target/mips/pr113179.c: New tests.
---
  gcc/expmed.cc| 12 +---
  gcc/testsuite/gcc.target/mips/pr113179.c | 18 ++
  2 files changed, 27 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/mips/pr113179.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 4ec035e4843..6a582593da8 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -704,9 +704,15 @@ store_bit_field_using_insv (const extraction_insn *insv, 
rtx op0,
}
  else
{
- tmp = gen_lowpart_if_possible (op_mode, value1);
- if (! tmp)
-   tmp = gen_lowpart (op_mode, force_reg (value_mode, value1));
+ if (targetm.mode_rep_extended (op_mode, value_mode))
+   tmp = simplify_gen_unary (TRUNCATE, op_mode,
+ value1, value_mode);
+ else
+   {
+ tmp = gen_lowpart_if_possible (op_mode, value1);
+ if (! tmp)
+   tmp = gen_lowpart (op_mode, force_reg (value_mode, value1));
+   }
}
  value1 = tmp;
}


I notice this patch is already applied.  Was it approved?  I didn't
see an approval in my feed or in the archives.
I don't see an approval and this patch was in my pending-patches box as 
unresolved.


Jeff


Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-06-05 Thread Jeff Law




On 6/5/24 1:47 AM, Fei Gao wrote:


On 2024-06-05 14:36  Kito Cheng  wrote:


Thanks for fixing this issue, and I am wondering whether it is possible to
fix that without introducing a target hook? I ask because GCC 14
also has this bug, but I am not sure it's OK to introduce a new target
hook on a release branch. Or would you suggest we just revert the patch to
fix that on GCC 14?


If hook is unacceptable in GCC14, I suggest to revert on GCC 14 the following 
commit.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864

I started fixing this issue by adding changes in the mach pass but abandoned it
for the following reasons:
1. More code is needed to detect the location of the epilogue in the whole
     insn list.
2. Due to the impact of the scheduling pass, the clear-a0 and use-a0 insns get
     reordered, resulting in more code.
3. Data flow analysis is needed, but insns don't have bb info any more, so
     rescan actually does nothing; I guess there's some hidden issue in
     riscv_remove_unneeded_save_restore_calls using dfa.

So I came up with this hook-based patch in the prologue and epilogue pass to
make the optimization happen as early as possible. It ends up with simplicity
and clear logic.
But let's back up and get a good explanation of what the problem is. 
Based on patch 2/2 it looks like we have lost an assignment to the 
return register.


To someone not familiar with this code, it sounds to me like we've made 
a mistake earlier and we're now defining a hook that lets us go back and 
fix that earlier mistake.   I'm probably wrong, but so far that's what 
it sounds like.


So let's get a good explanation of the problem and perhaps we'll find a 
better way to solve it.


jeff




Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-06-05 Thread Jeff Law




On 6/5/24 12:36 AM, Kito Cheng wrote:

Thanks for fixing this issue, and I am wondering whether it is possible to
fix that without introducing a target hook? I ask because GCC 14
also has this bug, but I am not sure it's OK to introduce a new target
hook on a release branch. Or would you suggest we just revert the patch to
fix that on GCC 14?
The question I would ask is why is the target hook needed.  ie, what 
problem needs to be solved and how does a new target hook help.




Jeff


Re: Which GCC version start to support RISC-V RVV1.0

2024-06-05 Thread Jeff Law via Gcc




On 6/4/24 8:51 PM, Erick Kuo-Chen Huang(黃國鎭) via Gcc-help wrote:

Hi,

We would like to know which GCC version started to support RISC-V RVV 1.0?
We appreciate for your help.

gcc-14.

Jeff



Re: [PATCH-1v2] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-05 Thread Jeff Law




On 6/5/24 3:08 AM, Richard Sandiford wrote:

HAO CHEN GUI  writes:

Hi,
   This patch replaces rtx_cost with insn_cost in forward propagation.
In the PR, one constant vector should be propagated and replace a
pseudo in a store insn if we know it's a duplicated constant vector.
It reduces the insn cost but not rtx cost. In this case, the cost is
determined by destination operand (memory or pseudo). Unfortunately,
rtx cost can't help.

   The test case is added in the second target specific patch.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html

   Compared to previous version, the main change is not to do
substitution if either new or old insn cost is zero. The zero means
the cost is unknown.

  Previous version
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643994.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

ChangeLog
fwprop: Replace set_src_cost with insn_cost in try_fwprop_subst_pattern

gcc/
* fwprop.cc (try_fwprop_subst_pattern): Replace set_src_cost with
insn_cost.


Thanks for doing this.  It's definitely the right direction, but:


patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index cb6fd6700ca..184a22678b7 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -470,21 +470,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, 
insn_change &use_change,
redo_changes (0);
  }

-  /* ??? In theory, it should be better to use insn costs rather than
- set_src_costs here.  That would involve replacing this code with
- change_is_worthwhile.  */


...as hinted at in the comment, rtl-ssa already has a routine for
insn_cost-based calculations.  It has two (supposed) advantages:
it caches the old costs, and it takes execution frequency into
account when optimising for speed.

The comment is out of date though.  The name of the routine is
changes_are_worthwhile rather than change_is_worthwhile.  Could you
try using that instead?
Funny, I went wandering around looking for that function to see how it 
was implemented and how it might compare to what was being proposed.


Of course I never found it, even after rewinding to various old git 
hashes that looked promising.


So, yea, definitely would prefer re-using changes_are_worthwhile if it 
works reasonably well for the issue at hand.


jeff



Re: [PATCH v4] RISC-V: Introduce -mvector-strict-align.

2024-06-04 Thread Jeff Law




On 5/28/24 1:19 PM, Robin Dapp wrote:

Hi,

this patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign.  For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

Changes from v3:
  - Adressed Kito's comments.
  - Made -mscalar-strict-align a real alias.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and map to riscv_vector_unaligned_access_p.
* config/riscv/riscv.opt: Add -mvector-strict-align.
* config/riscv/riscv.cc (struct riscv_tune_param): Add
vector_unaligned_access.
(riscv_override_options_internal): Set
riscv_vector_unaligned_access_p.
* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mno-vector-strict-align.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
So per the patchwork discussion this morning let's go ahead with this, 
knowing we may need to revisit for:


1. Coordination with LLVM on option naming/behavior.  LLVM will have a 
release before gcc-15, so if at all possible we should follow their lead 
on option naming.


2. Adjusting defaults once kernel unaligned trap handlers are in place.

Palmer is going to reach out to David on his team to try and push 
things towards using generic-ooo tuning for Fedora on RISC-V.  I'll do 
the same with Ventana's contacts at Canonical (Heinrich & Gordon).


I expect we're better aligned with Fedora on this topic -- Fedora feeds 
RHEL which isn't likely to care about SBCs, so cores that Fedora is 
going to be the most interested in over time are much more likely to 
handle unaligned vector loads/stores in hardware.  So the path we want 
lines up with Fedora quite well, IMHO.


Canonical seems to be more interested in supporting these SBCs, so they 
may have a harder time with a default to ooo-generic since it'll either 
result in binaries that don't work (today) or have poor performance 
(in the future, with kernel unaligned trap handlers updated).


Jeff


Re: [PATCH] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-06-04 Thread Jeff Law




On 5/26/24 7:08 PM, liuhongt wrote:

Update in V2:
Guard constant folding for overflow value in
fold_convert_const_int_from_real with flag_trapping_math.
Add -fno-trapping-math to related testcases which warn for overflow
in conversion from floating point to integer.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

According to IEEE standard, for conversions from floating point to
integer. When a NaN or infinite operand cannot be represented in the
destination format and this cannot otherwise be indicated, the invalid
operation exception shall be signaled. When a numeric operand would
convert to an integer outside the range of the destination format, the
invalid operation exception shall be signaled if this situation cannot
otherwise be indicated.

The patch prevents simplification of the conversion from floating point
to integer for NaN/Inf/out-of-range constants when flag_trapping_math is set.

gcc/ChangeLog:

PR rtl-optimization/100927
PR rtl-optimization/115161
PR rtl-optimization/115115
* simplify-rtx.cc (simplify_const_unary_operation): Prevent
simplification of FIX/UNSIGNED_FIX for NaN/Inf/out-of-range
constants when flag_trapping_math.
* fold-const.cc (fold_convert_const_int_from_real): Don't fold
for overflow values when flag_trapping_math.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100927.c: New test.
* c-c++-common/Wconversion-1.c: Add -fno-trapping-math.
* c-c++-common/dfp/convert-int-saturate.c: Ditto.
* g++.dg/ubsan/pr63956.C: Ditto.
* g++.dg/warn/Wconversion-real-integer.C: Ditto.
* gcc.c-torture/execute/20031003-1.c: Ditto.
* gcc.dg/Wconversion-complex-c99.c: Ditto.
* gcc.dg/Wconversion-real-integer.c: Ditto.
* gcc.dg/c90-const-expr-11.c: Ditto.
* gcc.dg/overflow-warn-8.c: Ditto.

OK.  Thanks.

jeff




Re: [PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-04 Thread Jeff Law




On 1/25/24 6:16 PM, HAO CHEN GUI wrote:

Hi,
   This patch replaces rtx_cost with insn_cost in forward propagation.
In the PR, one constant vector should be propagated and replace a
pseudo in a store insn if we know it's a duplicated constant vector.
It reduces the insn cost but not rtx cost. In this case, the kind of
destination operand (memory or pseudo) decides the cost and rtx cost
can't reflect it.

   The test case is added in the second target specific patch.

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern

gcc/
PR target/113325
* fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with
insn_cost.

Testcase?  I don't care if it's ppc specific.

I think we generally want to move from rtx_cost to insn_cost, so I think 
the change itself is fine.  We just want to make sure a test covers the 
change in some manner.


Also note this a change to generic code and could likely trigger 
failures on various targets that have assembler scanning tests.  So once 
you've got a testcase and the full patch is ack'd we'll need to watch 
closely for regressions reported on other targets.



So ACK'd once you add a testcase.

Jeff


Re: [PATCH 1/2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-04 Thread Jeff Law




On 5/23/24 8:25 PM, Hongtao Liu wrote:

CC for review.

On Tue, May 21, 2024 at 1:12 PM liuhongt  wrote:


When the mask is ((1 << (prec - imm)) - 1), which is used to clear the upper
bits of A, then it can be simplified to LSHIFTRT.

i.e. Simplify
(and:v8hi
   (ashiftrt:v8hi A 8)
   (const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

 PR target/114428
 * simplify-rtx.cc
 (simplify_context::simplify_binary_operation_1):
 Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
 specific mask.


Can you add a testcase for this?  I don't mind if it's x86 specific and 
does a bit of asm scanning.


Also note that the context for this patch has changed, so it won't 
automatically apply.  So be extra careful when updating so that it goes 
into the right place (all the more reason to have a testcase validating 
that the optimization works correctly).



I think the patch itself is fine.  So further review is just for the 
testcase and should be easy.


jeff

ps.  It seems to help RISC-V as well :-)




Re: [PATCH] Add config file so b4 uses inbox.sourceware.org automatically

2024-06-03 Thread Jeff Law




On 5/23/24 9:49 AM, Jonathan Wakely wrote:

It looks like my patch[1] to make b4 figure this out automagically won't
be accepted, so this makes it work for GCC. A similar commit could be
done for each project hosted on sourceware.org if desired.

[1] 
https://lore.kernel.org/tools/20240523143752.385810-1-jwak...@redhat.com/T/#u

OK for trunk?

-- >8 --

This makes b4 use inbox.sourceware.org instead of the default host
lore.kernel.org, so that every b4 user doesn't have to configure this
themselves.

ChangeLog:

* .b4-config: New file.
Given the impact on anyone not using b4 is nil and it makes life easier 
for those using b4, this seems trivially OK for the trunk.


jeff



Re: [PATCH v2] RISC-V: Add Zfbfmin extension

2024-06-03 Thread Jeff Law




On 6/1/24 1:45 AM, Xiao Zeng wrote:

1 In the previous patch, the libcall for BF16 was implemented:


2 Riscv provides Zfbfmin extension, which completes the "Scalar BF16 Converts":


3 Implemented replacing libcall with Zfbfmin extension instruction.

4 Reused previous testcases in:

gcc/ChangeLog:

* config/riscv/iterators.md: Add mode_iterator between
floating-point modes and BFmode.
* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (trunc<mode>bf2): New pattern for BFmode.
(extendbfsf2): Ditto.
(*movhf_hardfloat): Add BFmode.
(*mov<mode>_hardfloat): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.

OK for the trunk.  Thanks!

jeff



Re: [PATCH] RISC-V: Add min/max patterns for ifcvt.

2024-06-03 Thread Jeff Law




On 6/3/24 11:03 AM, Palmer Dabbelt wrote:



+;; Provide a minmax pattern for ifcvt to match.
+(define_insn "*_cmp_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+    (if_then_else:X
+    (bitmanip_minmax_cmp_op
+    (match_operand:X 1 "register_operand" "r")
+    (match_operand:X 2 "register_operand" "r"))
+    (match_dup 1)
+    (match_dup 2)))]
+  "TARGET_ZBB"
+  "\t%0,%1,%z2"
+  [(set_attr "type" "")])


This is a bit different than how we're handling the other min/max type 
attributes


    (define_insn "*3"
  [(set (match_operand:X 0 "register_operand" "=r")
    (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "reg_or_0_operand" "rJ")))]
  "TARGET_ZBB"
  "\t%0,%1,%z2"
  [(set_attr "type" "")])

but it looks like it ends up with the same types after all the iterators 
(there's some "max vs smax" and "smax vs maxs" juggling, but IIUC it 
ends up in the same place).  I think it'd be clunkier to try and use all 
the same iterators, though, so


Reviewed-by: Palmer Dabbelt 

[I was wondering if we need the reversed case; Jeff on the call says we 
don't.  I couldn't figure out how to write a test for it.]
Right.  I had managed to convince myself that we didn't need the 
reversed case.  I'm less sure now than I was earlier, but I'm also 
confident that if the need arises we can trivially handle it.  At some 
point there's canonicalization of the condition and that's almost 
certainly what's making it hard to synthesize a testcase for the 
reversed pattern.


The other thing I pondered was whether or not we should support SImode 
min/max on rv64.  It was critical for simplifying that abs2 routine in 
x264, but I couldn't convince myself it was needed here.  So I just set 
it aside and didn't mention it.


jeff


Re: [PATCH] check_GNU_style: Use raw strings.

2024-06-03 Thread Jeff Law




On 5/31/24 1:38 PM, Robin Dapp wrote:

Hi,

this silences some warnings when using check_GNU_style.

I didn't expect this to have any bootstrap or regtest impact
but I still ran it on x86 - no change.

Regards
  Robin

contrib/ChangeLog:

* check_GNU_style_lib.py: Use raw strings for regexps.

OK
jeff



Re: Epiphany target

2024-06-03 Thread Jeff Law via Gcc




On 6/3/24 8:12 AM, Andreas Olofsson via Gcc wrote:

Hi,

Letting the community know that we are working on getting the Epiphany port
back to a proper operational state. There will be a GCC maintainer assigned
soon. New Epiphany silicon is in development and old Epiphany devices are
still in circulation.
Good to hear.  You may be better off first switching to LRA (all non-LRA 
ports are to be deprecated this release IIRC).


I suspect that will be a better use of time than first fixing epiphany 
to work with the deprecated reload infrastructure, then switching it to LRA.


In addition to the LRA conversion, getting a degree of stability in the 
testsuite would be a huge step forward.


jeff



[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6eb2b8506e4123b00c32b8a23bafdee4c8c8b7f8

commit 6eb2b8506e4123b00c32b8a23bafdee4c8c8b7f8
Author: Jiawei 
Date:   Mon May 27 15:40:51 2024 +0800

tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at 
tree-ssa-pre.c:2780): Return NULL_TREE when dealing with special cases.

Return NULL_TREE when genop3 is an EXACT_DIV_EXPR.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html

version log v3: remove additional POLY_INT_CST check.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652795.html

gcc/ChangeLog:

* tree-ssa-pre.cc (create_component_ref_by_pieces_1): New 
conditions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.

(cherry picked from commit c9842f99042454bef99fe82506c6dd50f34e283e)

Diff:
---
 .../gcc.target/riscv/rvv/vsetvl/pr115214.c | 52 ++
 gcc/tree-ssa-pre.cc| 10 +++--
 2 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
new file mode 100644
index 000..fce2e9da766
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" 
} */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+
+#include <riscv_vector.h>
+
+static inline __attribute__(()) int vaddq_f32();
+static inline __attribute__(()) int vload_tillz_f32(int nlane) {
+  vint32m1_t __trans_tmp_9;
+  {
+int __trans_tmp_0 = nlane;
+{
+  vint64m1_t __trans_tmp_1;
+  vint64m1_t __trans_tmp_2;
+  vint64m1_t __trans_tmp_3;
+  vint64m1_t __trans_tmp_4;
+  if (__trans_tmp_0 == 1) {
+{
+  __trans_tmp_3 =
+  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
+}
+__trans_tmp_4 = __trans_tmp_2;
+  }
+  __trans_tmp_4 = __trans_tmp_3;
+  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
+}
+  }
+  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot 
be passed to an unprototyped function} } */
+}
+
+char CFLOAT_add_args[3];
+const int *CFLOAT_add_steps;
+const int CFLOAT_steps;
+
+__attribute__(()) void CFLOAT_add() {
+  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
+   *b_dst = &CFLOAT_add_args[2];
+  const float *src1 = (float *)b_src1;
+  float *dst = (float *)b_dst;
+  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
+  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
+  const int hstep = 4 / 2;
+  vfloat32m1x2_t a;
+  int len = 255;
+  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
+int b = vload_tillz_f32(len);
+int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type 
'__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
+  }
+  for (; len > 0; --len, b_src0 += CFLOAT_steps,
+  b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
+;
+}
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 75217f5cde1..5cf1968bc26 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2685,11 +2685,15 @@ create_component_ref_by_pieces_1 (basic_block block, 
vn_reference_t ref,
   here as the element alignment may be not visible.  See
   PR43783.  Simply drop the element size for constant
   sizes.  */
-   if (TREE_CODE (genop3) == INTEGER_CST
+   if ((TREE_CODE (genop3) == INTEGER_CST
&& TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
&& wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
-(wi::to_offset (genop3)
- * vn_ref_op_align_unit (currop
+(wi::to_offset (genop3) * vn_ref_op_align_unit 
(currop
+ || (TREE_CODE (genop3) == EXACT_DIV_EXPR
+   && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
+   && operand_equal_p (TREE_OPERAND (genop3, 0), TYPE_SIZE_UNIT 
(elmt_type))
+   && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
+vn_ref_op_align_unit (currop
  genop3 = NULL_TREE;
else
  {


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Just the riscv bits from:

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d9181d77435b2037dfbdf7e1b54de8d3e2748beb

commit d9181d77435b2037dfbdf7e1b54de8d3e2748beb
Author: Jeff Law 
Date:   Sun Jun 2 13:19:16 2024 -0600

Just the riscv bits from:

commit a0d60660f2aae2d79685f73d568facb2397582d8
Author: Andrew Pinski 
Date:   Wed May 29 20:40:31 2024 -0700

Fix some opindex for some options [PR115022]

While looking at the index I noticed that some options had
`-` in the front for the index which is wrong. And then
I noticed there was no index for `mcmodel=` for targets or had
used `-mcmodel` incorrectly.

This fixes both of those and regenerates the urls files; see that the
`-mcmodel=` option now has a url associated with it.

gcc/ChangeLog:

PR target/115022
* doc/invoke.texi (fstrub=disable): Fix opindex.
(minline-memops-threshold): Fix opindex.
(mcmodel=): Add opindex and fix them.
* common.opt.urls: Regenerate.
* config/aarch64/aarch64.opt.urls: Regenerate.
* config/bpf/bpf.opt.urls: Regenerate.
* config/i386/i386.opt.urls: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/nds32/nds32-elf.opt.urls: Regenerate.
* config/nds32/nds32-linux.opt.urls: Regenerate.
* config/or1k/or1k.opt.urls: Regenerate.
* config/riscv/riscv.opt.urls: Regenerate.
* config/rs6000/aix64.opt.urls: Regenerate.
* config/rs6000/linux64.opt.urls: Regenerate.
* config/sparc/sparc.opt.urls: Regenerate.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/config/riscv/riscv.opt.urls | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index e02ef3ee3dd..d87e9d5c9a8 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -41,7 +41,8 @@ UrlSuffix(gcc/RISC-V-Options.html#index-msave-restore)
 mshorten-memrefs
 UrlSuffix(gcc/RISC-V-Options.html#index-mshorten-memrefs)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/RISC-V-Options.html#index-mcmodel_003d-4)
 
 mstrict-align
 UrlSuffix(gcc/RISC-V-Options.html#index-mstrict-align-4)


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when profitable

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e26b14182c2c18deab641b4b81fc53c456573818

commit e26b14182c2c18deab641b4b81fc53c456573818
Author: Jeff Law 
Date:   Fri May 31 21:45:01 2024 -0600

[to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when 
profitable

Basically this adds the ability to generate two independent constants during
synthesis, then bring them together with a pack instruction. Thus we never 
need
to go out to the constant pool when zbkb is enabled. The worst sequence we 
ever
generate is

lui+addi+lui+addi+pack

Obviously if either half can be synthesized with just a lui or just an addi,
then we'll DTRT automagically.   So for example:

unsigned long foo_0xf857f2def857f2de(void) {
return 0x14252800;
}

The high and low halves are just a lui.  So the final synthesis is:

> li   a5,671088640  # 15  [c=4 l=4]  *movdi_64bit/1
> li   a0,337969152  # 16  [c=4 l=4]  *movdi_64bit/1
> pack a0,a5,a0      # 17  [c=12 l=4]  riscv_xpack_di_si_2

On the implementation side, I think the bits I've put in here likely can be
used to handle the repeating constant case for !zbkb.  I think it likely 
could
be used to help capture cases where the upper half can be derived from the
lower half (say by turning a bit on or off, shifting or something similar).
The key in both of these cases is we need a temporary register holding an
intermediate value.

Ventana's internal tester enables zbkb, but I don't think any of the other
testers currently exercise zbkb.  We'll probably want to change that at some
point, but I don't think it's super-critical yet.

While I can envision a few more cases where we could improve constant
synthesis,   No immediate plans to work in this space, but if someone is
interested, some thoughts are recorded here:

> 
https://wiki.riseproject.dev/display/HOME/CT_00_031+--+Additional+Constant+Synthesis+Improvements

gcc/
* config/riscv/riscv.cc (riscv_integer_op): Add new field.
(riscv_build_integer_1): Initialize the new field.
(riscv_build_integer): Recognize more cases where Zbkb's
pack instruction is profitable.
(riscv_move_integer): Loop over all the codes.  If requested,
save the current constant into a temporary.  Generate pack
for more cases using the saved constant.

gcc/testsuite

* gcc.target/riscv/synthesis-10.c: New test.

(cherry picked from commit c0ded050cd29cc73f78cb4ab23674c7bc024969e)

Diff:
---
 gcc/config/riscv/riscv.cc | 108 ++
 gcc/testsuite/gcc.target/riscv/synthesis-10.c |  18 +
 2 files changed, 110 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 91fefacee80..10af38a5a81 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -250,6 +250,7 @@ struct riscv_arg_info {
and each VALUE[i] is a constant integer.  CODE[0] is undefined.  */
 struct riscv_integer_op {
   bool use_uw;
+  bool save_temporary;
   enum rtx_code code;
   unsigned HOST_WIDE_INT value;
 };
@@ -759,6 +760,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   codes[0].code = UNKNOWN;
   codes[0].value = value;
   codes[0].use_uw = false;
+  codes[0].save_temporary = false;
   return 1;
 }
   if (TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (value))
@@ -767,6 +769,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   codes[0].code = UNKNOWN;
   codes[0].value = value;
   codes[0].use_uw = false;
+  codes[0].save_temporary = false;
 
   /* RISC-V sign-extends all 32bit values that live in a 32bit
 register.  To avoid paradoxes, we thus need to use the
@@ -796,6 +799,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  alt_codes[alt_cost-1].code = PLUS;
  alt_codes[alt_cost-1].value = low_part;
  alt_codes[alt_cost-1].use_uw = false;
+ alt_codes[alt_cost-1].save_temporary = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -810,6 +814,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  alt_codes[alt_cost-1].code = XOR;
  alt_codes[alt_cost-1].value = low_part;
  alt_codes[alt_cost-1].use_uw = false;
+ alt_codes[alt_cost-1].save_temporary = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -852,6 +857,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  alt_codes[alt_cost-1].code = ASHIFT;
  alt_codes[alt_cost-1].value = shift;
  

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Remove dead perm series code and document.

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:43c16b2ab69f25e4705b9582ed0ac921e1ec620e

commit 43c16b2ab69f25e4705b9582ed0ac921e1ec620e
Author: Robin Dapp 
Date:   Fri May 17 12:48:52 2024 +0200

RISC-V: Remove dead perm series code and document.

With the introduction of shuffle_series_patterns the explicit handler
code for a perm series is dead.  This patch removes it and also adds
a function-level comment to shuffle_series_patterns.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Document.
(shuffle_extract_and_slide1up_patterns): Remove.

(cherry picked from commit 30cfdd6ff56972d9d1b9dbdd43a8333c85618775)

Diff:
---
 gcc/config/riscv/riscv-v.cc | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9428beca268..948aaf7d8dd 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1485,28 +1485,6 @@ expand_const_vector (rtx target, rtx src)
  emit_vlmax_insn (code_for_pred_merge (mode), MERGE_OP, ops);
}
}
-  else if (npatterns == 1 && nelts_per_pattern == 3)
-   {
- /* Generate the following CONST_VECTOR:
-{ base0, base1, base1 + step, base1 + step * 2, ... }  */
- rtx base0 = builder.elt (0);
- rtx base1 = builder.elt (1);
- rtx base2 = builder.elt (2);
-
- rtx step = simplify_binary_operation (MINUS, builder.inner_mode (),
-   base2, base1);
-
- /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
- rtx tmp = gen_reg_rtx (mode);
- expand_vec_series (tmp, base1, step);
- /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
- if (!rtx_equal_p (base0, const0_rtx))
-   base0 = force_reg (builder.inner_mode (), base0);
-
- insn_code icode = optab_handler (vec_shl_insert_optab, mode);
- gcc_assert (icode != CODE_FOR_nothing);
- emit_insn (GEN_FCN (icode) (target, tmp, base0));
-   }
   else
/* TODO: We will enable more variable-length vector in the future.  */
gcc_unreachable ();
@@ -3580,6 +3558,10 @@ shuffle_extract_and_slide1up_patterns (struct 
expand_vec_perm_d *d)
   return true;
 }
 
+/* This looks for a series pattern in the provided vector permute structure D.
+   If successful it emits a series insn as well as a gather to implement it.
+   Return true if successful, false otherwise.  */
+
 static bool
 shuffle_series_patterns (struct expand_vec_perm_d *d)
 {


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add vector popcount, clz, ctz.

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:36260f7a2be90ed27498a28c4d0490414db1491f

commit 36260f7a2be90ed27498a28c4d0490414db1491f
Author: Robin Dapp 
Date:   Wed May 15 17:41:07 2024 +0200

RISC-V: Add vector popcount, clz, ctz.

This patch adds the zvbb vcpop, vclz and vctz to the autovec machinery
as well as tests for them.

gcc/ChangeLog:

* config/riscv/autovec.md (ctz<mode>2): New expander.
(clz<mode>2): Ditto.
* config/riscv/generic-vector-ooo.md: Add bitmanip ops to insn
reservation.
* config/riscv/vector-crypto.md: Add VLS modes to insns.
* config/riscv/vector.md: Add bitmanip ops to mode_idx and other
attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: Adjust check
for zvbb.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-template.h: New test.

(cherry picked from commit 6fa4b0135439d64c0ea1816594d7dc830e836376)

Diff:
---
 gcc/config/riscv/autovec.md|  30 -
 gcc/config/riscv/generic-vector-ooo.md |   2 +-
 gcc/config/riscv/vector-crypto.md  | 137 +++--
 gcc/config/riscv/vector.md |  14 +--
 .../gcc.target/riscv/rvv/autovec/unop/clz-1.c  |   8 ++
 .../gcc.target/riscv/rvv/autovec/unop/clz-run.c|  36 ++
 .../riscv/rvv/autovec/unop/clz-template.h  |  21 
 .../gcc.target/riscv/rvv/autovec/unop/ctz-1.c  |   8 ++
 .../gcc.target/riscv/rvv/autovec/unop/ctz-run.c|  36 ++
 .../riscv/rvv/autovec/unop/ctz-template.h  |  21 
 .../gcc.target/riscv/rvv/autovec/unop/popcount-1.c |   4 +-
 .../gcc.target/riscv/rvv/autovec/unop/popcount-2.c |   4 +-
 .../gcc.target/riscv/rvv/autovec/unop/popcount-3.c |   8 ++
 .../riscv/rvv/autovec/unop/popcount-run-1.c|   3 +-
 .../riscv/rvv/autovec/unop/popcount-template.h |  21 
 15 files changed, 272 insertions(+), 81 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 87d4171bc89..15db26d52c6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1566,7 +1566,7 @@
 })
 
 ;; 
---
-;; - [INT] POPCOUNT.
+;; - [INT] POPCOUNT, CTZ and CLZ.
 ;; 
---
 
 (define_expand "popcount<mode>2"
@@ -1574,10 +1574,36 @@
 	(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_popcount (operands);
+  if (!TARGET_ZVBB)
+    riscv_vector::expand_popcount (operands);
+  else
+    {
+      riscv_vector::emit_vlmax_insn (code_for_pred_v (POPCOUNT, <mode>mode),
+				     riscv_vector::CPOP_OP, operands);
+    }
   DONE;
 })
 
+(define_expand "ctz<mode>2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+    riscv_vector::emit_vlmax_insn (code_for_pred_v (CTZ, <mode>mode),
+				   riscv_vector::CPOP_OP, operands);
+    DONE;
+})
+
+(define_expand "clz<mode>2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+    riscv_vector::emit_vlmax_insn (code_for_pred_v (CLZ, <mode>mode),
+				   riscv_vector::CPOP_OP, operands);
+    DONE;
+})
+
 
 ;; -
 ;;  [INT] Highpart multiplication
diff --git a/gcc/config/riscv/generic-vector-ooo.md 
b/gcc/config/riscv/generic-vector-ooo.md
index 96cb1a0be29..5e933c83841 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -74,7 +74,7 @@
 
 ;; Vector crypto, assumed to be a generic operation for now.
 (define_insn_reservation "vec_crypto" 4
-  (eq_attr "type" "crypto")
+  (eq_attr "type" "crypto,vclz,vctz,vcpop")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector crypto, AES
diff --git a/gcc/config/riscv/vector-crypto.md 
b/gcc/config/riscv/vector-crypto.md
index 0ddc2f3f3c6..17432b15815 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -99,42 +99,43 @@
 ;; vror.vv vror.vx vror.vi
 ;; vwsll.vv vwsll.vx vwsll.vi
 (define_insn "@pred_vandn"
- 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add vandn combine helper.

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:2ec5e6c7a87ebd75ab937ea5f8d926fc212631e2

commit 2ec5e6c7a87ebd75ab937ea5f8d926fc212631e2
Author: Robin Dapp 
Date:   Wed May 15 15:01:35 2024 +0200

RISC-V: Add vandn combine helper.

This patch adds a combine pattern for vandn as well as tests for it.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vandn_<mode>): New pattern.
* config/riscv/vector.md: Add vandn to mode_idx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vandn-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-template.h: New test.

(cherry picked from commit f48448276f29a3823827292c72b7fc8e9cd39e1e)

Diff:
---
 gcc/config/riscv/autovec-opt.md| 18 
 gcc/config/riscv/vector.md |  2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vandn-1.c   |  8 
 .../gcc.target/riscv/rvv/autovec/binop/vandn-run.c | 54 ++
 .../riscv/rvv/autovec/binop/vandn-template.h   | 38 +++
 5 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index bc6af042bcf..6a2eabbd854 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1591,3 +1591,21 @@
 DONE;
   }
   [(set_attr "type" "vwsll")])
+
+;; vnot + vand = vandn.
+(define_insn_and_split "*vandn_<mode>"
+ [(set (match_operand:V_VLSI 0 "register_operand" "=vr")
+   (and:V_VLSI
+    (not:V_VLSI
+      (match_operand:V_VLSI 2 "register_operand" "vr"))
+    (match_operand:V_VLSI   1 "register_operand" "vr")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_vandn (<mode>mode);
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+    DONE;
+  }
+  [(set_attr "type" "vandn")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 69423be6917..c15af17ec62 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -743,7 +743,7 @@
vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\

vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\

vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
-   vgather,vcompress,vmov,vnclip,vnshift")
+   vgather,vcompress,vmov,vnclip,vnshift,vandn")
   (const_int 0)
 
   (eq_attr "type" "vimovvx,vfmovvf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
new file mode 100644
index 000..3bb5bf8dd5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+/* { dg-final { scan-assembler-times {\tvandn\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
new file mode 100644
index 000..243c5975068
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-require-effective-target "riscv_zvbb_ok" } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+#include 
+
+#define SZ 512
+
+#define RUN(TYPE, VAL) 
\
+  TYPE a##TYPE[SZ];
\
+  TYPE b##TYPE[SZ];
\
+  for (int i = 0; i < SZ; i++) 
\
+{  
\
+  a##TYPE[i] = 123;
\
+  b##TYPE[i] = VAL;
\
+}  
\
+  vandn_##TYPE (a##TYPE, a##TYPE, b##TYPE, SZ);
\
+  for (int i = 0; i < SZ; i++) 
\
+assert (a##TYPE[i] == (TYPE) (123 & ~VAL));
+
+#define RUN2(TYPE, VAL)
\
+  TYPE as##TYPE[SZ];   
\
+  for (int i = 0; i < SZ; i++) 
\
+as##TYPE[i] = 123;  

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Use widening shift for scatter/gather if applicable.

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:5eade133c823d4ef2e226991c6ab5cfb63f2b338

commit 5eade133c823d4ef2e226991c6ab5cfb63f2b338
Author: Robin Dapp 
Date:   Fri May 10 13:37:03 2024 +0200

RISC-V: Use widening shift for scatter/gather if applicable.

With the zvbb extension we can emit a widening shift for scatter/gather
index preparation in case we need to multiply by 2 and zero extend.
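[Not part of the patch: a rough model, in Python, of why a single widening
shift can replace the zero-extend-then-shift pair for index scaling by 2.
The function names are made up for illustration; `vwsll.vi idx, idx, 1`
performs the widen and the shift in one instruction.]

```python
# Model u8 gather/scatter indices that must be widened and scaled by 2.

def scale_index_two_step(idx8):
    """vzext then vsll: widen each element, then shift left by 1."""
    widened = [x & 0xFF for x in idx8]   # zero-extend u8 to a wider type
    return [x << 1 for x in widened]

def scale_index_vwsll(idx8):
    """vwsll.vi idx, idx, 1: widening shift does both steps at once."""
    return [(x & 0xFF) << 1 for x in idx8]

idx = [0, 1, 127, 255]
assert scale_index_two_step(idx) == scale_index_vwsll(idx) == [0, 2, 254, 510]
```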

The patch also adds vwsll to the mode_idx attribute and removes the
mode from shift-count operand of the insn pattern.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_gather_scatter): Use vwsll if
applicable.
* config/riscv/vector-crypto.md: Remove mode from vwsll shift
count operator.
* config/riscv/vector.md: Add vwsll to mode iterator.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zvbb.
* 
gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c: New test.

(cherry picked from commit 309ee005aa871286c8daccbce7586f82be347440)

Diff:
---
 gcc/config/riscv/riscv-v.cc|  42 +---
 gcc/config/riscv/vector-crypto.md  |   4 +-
 gcc/config/riscv/vector.md |   4 +-
 .../gather-scatter/gather_load_64-12-zvbb.c| 113 +
 gcc/testsuite/lib/target-supports.exp  |  48 -
 5 files changed, 193 insertions(+), 18 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f105f470495..9428beca268 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4016,7 +4016,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
 {
   rtx ptr, vec_offset, vec_reg;
   bool zero_extend_p;
-  int scale_log2;
+  int shift;
   rtx mask = ops[5];
   rtx len = ops[6];
   if (is_load)
@@ -4025,7 +4025,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[1];
   vec_offset = ops[2];
   zero_extend_p = INTVAL (ops[3]);
-  scale_log2 = exact_log2 (INTVAL (ops[4]));
+  shift = exact_log2 (INTVAL (ops[4]));
 }
   else
 {
@@ -4033,7 +4033,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[0];
   vec_offset = ops[1];
   zero_extend_p = INTVAL (ops[2]);
-  scale_log2 = exact_log2 (INTVAL (ops[3]));
+  shift = exact_log2 (INTVAL (ops[3]));
 }
 
   machine_mode vec_mode = GET_MODE (vec_reg);
@@ -4043,9 +4043,12 @@ expand_gather_scatter (rtx *ops, bool is_load)
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   bool is_vlmax = is_vlmax_len_p (vec_mode, len);
 
+  bool use_widening_shift = false;
+
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
 {
+  use_widening_shift = TARGET_ZVBB && zero_extend_p && shift == 1;
   /* 7.2. Vector Load/Store Addressing Modes.
 If the vector offset elements are narrower than XLEN, they are
 zero-extended to XLEN before adding to the ptr effective address. If
@@ -4054,8 +4057,8 @@ expand_gather_scatter (rtx *ops, bool is_load)
 raise an illegal instruction exception if the EEW is not supported for
 offset elements.
 
-RVV spec only refers to the scale_log == 0 case.  */
-  if (!zero_extend_p || scale_log2 != 0)
+RVV spec only refers to the shift == 0 case.  */
+  if (!zero_extend_p || shift)
{
  if (zero_extend_p)
inner_idx_mode
@@ -4064,19 +4067,32 @@ expand_gather_scatter (rtx *ops, bool is_load)
inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
  machine_mode new_idx_mode
= get_vector_mode (inner_idx_mode, nunits).require ();
- rtx tmp = gen_reg_rtx (new_idx_mode);
- emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
- zero_extend_p ? true : false));
- vec_offset = tmp;
+ if (!use_widening_shift)
+   {
+ rtx tmp = gen_reg_rtx (new_idx_mode);
+ emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, 
idx_mode,
+ zero_extend_p ? true : false));
+ vec_offset = tmp;
+   }
  idx_mode = new_idx_mode;
}
 }
 
-  if (scale_log2 != 0)
+  if (shift)
 {
-  rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
- gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
- OPTAB_DIRECT);
+  rtx tmp;
+  if (!use_widening_shift)
+   tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+   gen_int_mode (shift, Pmode), NULL_RTX, 0,
+   OPTAB_DIRECT);
+  else
+   {
+ tmp = gen_reg_rtx (idx_mode);
+ insn_code icode = code_for_pred_vwsll_scalar (idx_mode);
+ rtx ops[] = {tmp, vec_offset, const1_rtx};
+ emit_vlmax_insn (icode, 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add vwsll combine helpers.

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:9d209e560d83b535ed0e916a5e964381c6111750

commit 9d209e560d83b535ed0e916a5e964381c6111750
Author: Robin Dapp 
Date:   Mon May 13 22:09:35 2024 +0200

RISC-V: Add vwsll combine helpers.

This patch enables the usage of vwsll in autovec context by adding the
necessary combine patterns and tests.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vwsll_zext1_<mode>): New
pattern.
(*vwsll_zext2_<mode>): Ditto.
(*vwsll_zext1_scalar_<mode>): Ditto.
(*vwsll_zext1_trunc_<mode>): Ditto.
(*vwsll_zext2_trunc_<mode>): Ditto.
(*vwsll_zext1_trunc_scalar_<mode>): Ditto.
* config/riscv/vector-crypto.md: Make pattern similar to other
narrowing/widening patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vwsll-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-template.h: New test.

(cherry picked from commit af4bf422a699de0e7af5a26e02997d313e7301a6)

Diff:
---
 gcc/config/riscv/autovec-opt.md| 126 -
 gcc/config/riscv/vector-crypto.md  |   2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vwsll-1.c   |  10 ++
 .../gcc.target/riscv/rvv/autovec/binop/vwsll-run.c |  67 +++
 .../riscv/rvv/autovec/binop/vwsll-template.h   |  49 
 5 files changed, 251 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 04f85d8e455..bc6af042bcf 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1467,5 +1467,127 @@
operands, operands[4]);
 DONE;
   }
-  [(set_attr "type" "vector")]
-)
+  [(set_attr "type" "vector")])
+
+;; vzext.vf2 + vsll = vwsll.
+(define_insn_and_split "*vwsll_zext1_"
+  [(set (match_operand:VWEXTI 0"register_operand" "=vr 
")
+  (ashift:VWEXTI
+   (zero_extend:VWEXTI
+ (match_operand: 1 "register_operand" " vr "))
+ (match_operand: 2 "vector_shift_operand" "vrvk")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_"
+  [(set (match_operand:VWEXTI 0"register_operand" "=vr 
")
+  (ashift:VWEXTI
+   (zero_extend:VWEXTI
+ (match_operand: 1 "register_operand" " vr "))
+   (zero_extend:VWEXTI
+ (match_operand: 2 "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+
+(define_insn_and_split "*vwsll_zext1_scalar_"
+  [(set (match_operand:VWEXTI 0"register_operand"  
  "=vr")
+  (ashift:VWEXTI
+   (zero_extend:VWEXTI
+ (match_operand: 1 "register_operand"" 
vr"))
+ (match_operand:2 "vector_scalar_shift_operand" " 
rK")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+if (GET_CODE (operands[2]) == SUBREG)
+  operands[2] = SUBREG_REG (operands[2]);
+insn_code icode = code_for_pred_vwsll_scalar (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+;; For
+;;   uint16_t dst;
+;;   uint8_t a, b;
+;;   dst = vwsll (a, b)
+;; we seem to create
+;;   aa = (int) a;
+;;   bb = (int) b;
+;;   dst = (short) vwsll (aa, bb);
+;; The following patterns help to combine this idiom into one vwsll.
+
+(define_insn_and_split "*vwsll_zext1_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+   (zero_extend:VQEXTI
+ (match_operand: 1   "register_operand" " vr "))
+   (match_operand:VQEXTI   2   "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+   (zero_extend:VQEXTI
+ (match_operand: 1   "register_operand" " vr "))
+   (zero_extend:VQEXTI
+ (match_operand: 2   "vector_shift_operand" "vrvk")]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a0fef33b5183de07ca0b0cf248e917d4849a6f2f

commit a0fef33b5183de07ca0b0cf248e917d4849a6f2f
Author: Robin Dapp 
Date:   Thu May 16 12:43:43 2024 +0200

RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.

It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.
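[Aside, not from the patch: a minimal Python sketch of the vwadd.wx
semantics the extended-scalar patterns match.  `vwadd_wx` and `sext` are
illustrative names only; the wide vector operand is added to a scalar
extended from the narrow element width, so a separate vector extension of
the scalar becomes redundant once the pattern is recognized.]

```python
def sext(x, bits):
    """Sign-extend the low `bits` of x (two's complement)."""
    m = 1 << (bits - 1)
    x &= (1 << bits) - 1
    return (x ^ m) - m

def vwadd_wx(vs2_wide, rs1, narrow_bits=8):
    """dst[i] = vs2[i] + sign_extend(rs1): wide vector plus narrow scalar."""
    return [v + sext(rs1, narrow_bits) for v in vs2_wide]

# 0x80 sign-extends from 8 bits to -128, with no explicit vsext/vadd pair.
assert vwadd_wx([1000, 2000], 0x80) == [872, 1872]
```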

gcc/ChangeLog:

* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.

(cherry picked from commit 9781885a624f3e29634d95c14cd10940cefb1a5a)

Diff:
---
 gcc/config/riscv/vector.md | 62 ++
 .../gcc.target/riscv/rvv/base/pr115068-run.c   | 24 +
 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c | 26 +
 .../gcc.target/riscv/rvv/base/vwaddsub-1.c | 48 +
 4 files changed, 128 insertions(+), 32 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 92bbb8ce6ae..dccf76f0003 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3877,27 +3877,71 @@
(set_attr "mode" "")])
 
 (define_insn 
"@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
(if_then_else:VWEXTI
  (unspec:
-   [(match_operand: 1 "vector_mask_operand"   
"vmWc1,vmWc1")
-(match_operand 5 "vector_length_operand"  "   rK,   
rK")
-(match_operand 6 "const_int_operand"  "i,
i")
-(match_operand 7 "const_int_operand"  "i,
i")
-(match_operand 8 "const_int_operand"  "i,
i")
+   [(match_operand: 1 "vector_mask_operand"   " 
vm,vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand"  " rK,rK, rK, 
rK")
+(match_operand 6 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 7 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 8 "const_int_operand"  "  i, i,  i, 
 i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
-   (match_operand:VWEXTI 3 "register_operand" "   vr,   
vr")
+   (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, 
vr")
(any_extend:VWEXTI
  (vec_duplicate:
-   (match_operand: 4 "reg_or_0_operand"   "   rJ,   
rJ"
- (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,
0")))]
+   (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, 
rJ"
+ (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu, 
 0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
 
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
+   (if_then_else:VWEXTI
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   " 
vm,vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand"  " rK,rK, rK, 
rK")
+(match_operand 6 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 7 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 8 "const_int_operand"  "  i, i,  i, 
 i")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+ (plus:VWEXTI
+   (vec_duplicate:VWEXTI
+ (any_extend:
+   (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, 
rJ")))
+   (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, 
vr"))
+ (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu, 
 0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
+   (if_then_else:VWEXTI
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   " 
vm,vm,Wc1,Wc1")
+(match_operand 5 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:901c9e95366a04a0ccf9af86654ddeb9ae18ee59

commit 901c9e95366a04a0ccf9af86654ddeb9ae18ee59
Author: Robin Dapp 
Date:   Mon May 13 13:49:57 2024 +0200

RISC-V: Do not allow v0 as dest when merging [PR115068].

This patch splits the vfw...wf pattern so we do not emit e.g. vfwadd.wf
v0,v8,fa5,v0.t anymore.

gcc/ChangeLog:

PR target/115068

* config/riscv/vector.md:  Split vfw.wf pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115068-run.c: New test.
* gcc.target/riscv/rvv/base/pr115068.c: New test.

(cherry picked from commit a2fd0812a54cf51520f15e900df4cfb5874b75ed)

Diff:
---
 gcc/config/riscv/vector.md | 20 +++
 .../gcc.target/riscv/rvv/base/pr115068-run.c   | 28 +
 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c | 29 ++
 3 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c8c9667eaa2..92bbb8ce6ae 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7178,24 +7178,24 @@
(symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])
 
 (define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTF 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTF 0 "register_operand""=vd, vd, 
vr, vr")
(if_then_else:VWEXTF
  (unspec:
-   [(match_operand: 1 "vector_mask_operand"   
"vmWc1,vmWc1")
-(match_operand 5 "vector_length_operand"  "   rK,   
rK")
-(match_operand 6 "const_int_operand"  "i,
i")
-(match_operand 7 "const_int_operand"  "i,
i")
-(match_operand 8 "const_int_operand"  "i,
i")
-(match_operand 9 "const_int_operand"  "i,
i")
+   [(match_operand: 1 "vector_mask_operand"  " vm, 
vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand" " rK, rK, rK, 
rK")
+(match_operand 6 "const_int_operand" "  i,  i,  i, 
 i")
+(match_operand 7 "const_int_operand" "  i,  i,  i, 
 i")
+(match_operand 8 "const_int_operand" "  i,  i,  i, 
 i")
+(match_operand 9 "const_int_operand" "  i,  i,  i, 
 i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTF
-   (match_operand:VWEXTF 3 "register_operand" "   vr,   
vr")
+   (match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, 
vr")
(float_extend:VWEXTF
  (vec_duplicate:
-   (match_operand: 4 "register_operand"   "f,
f"
- (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,
0")))]
+   (match_operand: 4 "register_operand"  "  f,  f,  f, 
 f"
+ (match_operand:VWEXTF 2 "vector_merge_operand"  " vu,  0, vu, 
 0")))]
   "TARGET_VECTOR"
   "vfw.wf\t%0,%3,%4%p1"
   [(set_attr "type" "vf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
new file mode 100644
index 000..95ec8e06021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+__riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+__riscv_vundefined_f64m8 (), 1.0, __riscv_vsetvlmax_e64m8 ());
+  asm volatile ("" ::"vr"(vfwadd_wf_f64m8_m_vd) : "memory");
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
new file mode 100644
index 000..6d680037aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed] [RISC-V] Use pack to handle repeating constants

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1ca03816836d050e56f8d3c0e1ce0943e39c0444

commit 1ca03816836d050e56f8d3c0e1ce0943e39c0444
Author: Jeff Law 
Date:   Wed May 29 07:41:55 2024 -0600

[to-be-committed] [RISC-V] Use pack to handle repeating constants

This patch utilizes zbkb to improve the code we generate for 64bit constants
when the high half is a duplicate of the low half.

Basically we generate the low half and use a pack instruction with that same
register repeated.  ie

pack dest,src,src

That gives us a maximum sequence of 3 instructions and sometimes it will be
just 2 instructions (say if the low 32bits can be constructed with a single
addi or lui).
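[Illustration only, not GCC code: the trick can be modeled in a few lines
of Python.  `pack64` and `sext32` are hypothetical helpers mirroring the
rv64 pack semantics (rd = rs2[31:0] << 32 | rs1[31:0]) and the sign
extension of the low half.]

```python
MASK32 = 0xFFFFFFFF

def pack64(rs1, rs2):
    """rv64 pack: concatenate low 32 bits of rs2 (high) and rs1 (low)."""
    return ((rs2 & MASK32) << 32) | (rs1 & MASK32)

def sext32(x):
    """Sign-extend bits 0..31 to a 64-bit value, as lui/addi materialize."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

# A repeating constant: the high half duplicates the low half.
value = 0xDEADBEEFDEADBEEF
low = value & MASK32
assert (value >> 32) == low

# Synthesize the sign-extended low part, then "pack dest, src, src".
t = sext32(low)
assert pack64(t, t) == value   # pack only reads bits 0..31 of each input
```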

As with shadd, I'm abusing an RTL opcode.  This time it's CONCAT.  It's
reasonably close to what we're doing.  Obviously it's just how we identify 
the
desire to generate a pack in the array of opcodes.  We don't actually emit a
CONCAT.

Note that we don't care about the potential sign extension from bit 31. pack
will only look at bits 0..31 of each input (for rv64).  So we go ahead and 
sign
extend before synthesizing the low part as that allows us to handle more 
cases
trivially.

I had my testsuite generator chew on random cases of a repeating constant
without any surprises.  I don't see much point in including all those in the
testcase (after all there's 2**32 of them).  I've got a set of 10 I'm
including.  Nothing particularly interesting in them.

An enterprising developer that needs this improved without zbkb could 
probably
do so with a bit of work.  First increase the cost by 1 unit. Second avoid
cases where bit 31 is set and restrict it to cases when we can still create
pseudos.   On the codegen side, when encountering the CONCAT, generate the
appropriate shift of "X" into a temporary register, then IOR the temporary 
with
"X" into the new destination.

Anyway, I've tested this in my tester (though it doesn't turn on zbkb, yet).
I'll let the CI system chew on it overnight, but like mine, I don't think it
lights up zbkb.  So it's unlikely to spit out anything interesting.

gcc/
* config/riscv/crypto.md (riscv_xpack___2): Remove 
'*'
allow it to be used via the gen_* interface.
* config/riscv/riscv.cc (riscv_build_integer): Identify when Zbkb
can be used to profitably synthesize repeating constants.
(riscv_move_integer): Codegen changes to generate those Zbkb 
sequences.

gcc/testsuite/

* gcc.target/riscv/synthesis-9.c: New test.

(cherry picked from commit 3ae02dcb108df426838bbbcc73d7d01855bc1196)

Diff:
---
 gcc/config/riscv/crypto.md   |  2 +-
 gcc/config/riscv/riscv.cc| 23 +++
 gcc/testsuite/gcc.target/riscv/synthesis-9.c | 28 
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index b632312ade2..b9cac78fce1 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -107,7 +107,7 @@
 ;; This is slightly more complex than the other pack patterns
 ;; that fully expose the RTL as it needs to self-adjust to
 ;; rv32 and rv64.  But it's not that hard.
-(define_insn "*riscv_xpack__2"
+(define_insn "riscv_xpack___2"
   [(set (match_operand:X 0 "register_operand" "=r")
(ior:X (ashift:X (match_operand:X 1 "register_operand" "r")
 (match_operand 2 "immediate_operand" "n"))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a99211d56b1..91fefacee80 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1123,6 +1123,22 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
}
 }
 
+  /* With pack we can generate a 64 bit constant with the same high
+     and low 32 bits trivially.  */
+  if (cost > 3 && TARGET_64BIT && TARGET_ZBKB)
+{
+  unsigned HOST_WIDE_INT loval = value & 0xffffffff;
+  unsigned HOST_WIDE_INT hival = value & ~loval;
+  if (hival >> 32 == loval)
+   {
+ cost = 1 + riscv_build_integer_1 (codes, sext_hwi (loval, 32), mode);
+ codes[cost - 1].code = CONCAT;
+ codes[cost - 1].value = 0;
+ codes[cost - 1].use_uw = false;
+   }
+
+}
+
   return cost;
 }
 
@@ -2679,6 +2695,13 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
  rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
  x = riscv_emit_set (t, x);
}
+ else if (codes[i].code == CONCAT)
+   {
+ rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+ rtx t2 = gen_lowpart (SImo

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed] [RISC-V] Some basic patterns for zbkb code generation

2024-06-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:95439d053f53a014bed59465ceb563baf44e9a6f

commit 95439d053f53a014bed59465ceb563baf44e9a6f
Author: Lyut Nersisyan 
Date:   Tue May 28 09:17:50 2024 -0600

[to-be-committed] [RISC-V] Some basic patterns for zbkb code generation

And here's Lyut's basic Zbkb support.  Essentially it's four new patterns
for packh, packw, pack plus a bridge pattern needed for packh.

packw is a bit ugly as we need to match a sign extension in an inconvenient
location.  We pull it out so that the extension is exposed in a convenient
place for subsequent sign extension elimination.

We need a bridge pattern to get packh.  Thankfully the bridge pattern is a
degenerate packh where one operand is x0, so it works as-is without
splitting and provides the bridge to the more general form of packh.
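For reference, the semantics these patterns match can be modeled directly from the Zbkb extension's definitions of pack, packh, and packw.  This is an illustrative stand-alone sketch for RV64 (XLEN = 64), not GCC code:

```c
#include <assert.h>
#include <stdint.h>

/* pack: concatenate the low XLEN/2-bit halves of rs1 (low) and rs2
   (high).  The shift by 32 naturally discards rs2's upper bits.  */
static uint64_t pack (uint64_t rs1, uint64_t rs2)
{
  return (rs1 & 0xffffffffULL) | (rs2 << 32);
}

/* packh: concatenate the low bytes of rs1 and rs2, zero-extended.  */
static uint64_t packh (uint64_t rs1, uint64_t rs2)
{
  return (rs1 & 0xff) | ((rs2 & 0xff) << 8);
}

/* packw (RV64 only): concatenate the low 16 bits of rs1 and rs2 into
   a 32-bit word, then sign-extend it to 64 bits.  */
static uint64_t packw (uint64_t rs1, uint64_t rs2)
{
  uint64_t w = (rs1 & 0xffff) | ((rs2 & 0xffff) << 16);
  /* Sign-extend bit 31 without implementation-defined casts.  */
  return (w ^ 0x80000000ULL) - 0x80000000ULL;
}
```

The bridge case corresponds to `packh rd, x0, rs`, i.e. `packh (0, rs)` above, which reduces to `(rs & 0xff) << 8` and therefore matches the `(and (ashift ...) 65280)` RTL form.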

This patch also refines the condition for the constant reassociation patch
to avoid a few more cases that can be handled efficiently with other
preexisting patterns, and includes one bugfix to avoid losing bits,
particularly in the xor/ior case.

Lyut did the core work here.  I think I did some minor cleanups and the
bridge pattern to work with gcc-15 and beyond.

This is a prerequisite for using zbkb in constant synthesis.  It also
stands on its own.  I know we've seen it trigger in spec without the
constant synthesis bits.

It's been through our internal CI and my tester.  I'll obviously wait for
the upstream CI to finish before taking further action.

gcc/
* config/riscv/crypto.md: Add new combiner patterns to generate
pack, packh, packw instructions.
* config/riscv/iterators.md (HX): New iterator for half X mode.
* config/riscv/riscv.md (_shift_reverse): Tighten cases to
avoid.  Do not lose bits for XOR/IOR.

gcc/testsuite

* gcc.target/riscv/pack32.c: New test.
* gcc.target/riscv/pack64.c: New test.
* gcc.target/riscv/packh32.c: New test.
* gcc.target/riscv/packh64.c: New test.
* gcc.target/riscv/packw.c: New test.

Co-authored-by: Jeffrey A Law 

(cherry picked from commit 236116068151bbc72aaaf53d0f223fe06f7e3bac)

Diff:
---
 gcc/config/riscv/crypto.md   | 63 
 gcc/config/riscv/iterators.md|  3 ++
 gcc/config/riscv/riscv.md|  9 +++--
 gcc/testsuite/gcc.target/riscv/pack32.c  | 18 +
 gcc/testsuite/gcc.target/riscv/pack64.c  | 17 +
 gcc/testsuite/gcc.target/riscv/packh32.c | 13 +++
 gcc/testsuite/gcc.target/riscv/packh64.c |  6 +++
 gcc/testsuite/gcc.target/riscv/packw.c   | 13 +++
 8 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index dd2bc94ee88..b632312ade2 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -104,6 +104,19 @@
   "pack\t%0,%1,%2"
   [(set_attr "type" "crypto")])
 
+;; This is slightly more complex than the other pack patterns
+;; that fully expose the RTL as it needs to self-adjust to
+;; rv32 and rv64.  But it's not that hard.
+(define_insn "*riscv_xpack__2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (ior:X (ashift:X (match_operand:X 1 "register_operand" "r")
+(match_operand 2 "immediate_operand" "n"))
+  (zero_extend:X
+(match_operand:HX 3 "register_operand" "r"))))]
+  "TARGET_ZBKB && INTVAL (operands[2]) == BITS_PER_WORD / 2"
+  "pack\t%0,%3,%1"
+  [(set_attr "type" "crypto")])
+
 (define_insn "riscv_packh_"
   [(set (match_operand:X 0 "register_operand" "=r")
 (unspec:X [(match_operand:QI 1 "register_operand" "r")
@@ -113,6 +126,29 @@
   "packh\t%0,%1,%2"
   [(set_attr "type" "crypto")])
 
+;; So this is both a useful pattern unto itself and a bridge to the
+;; general packh pattern below.
+(define_insn "*riscv_packh__2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X (ashift:X (match_operand:X 1 "register_operand" "r")
+(const_int 8))
+  (const_int 65280)))]
+ "TARGET_ZBKB"
+ "packh\t%0,x0,%1"
+ [(set_attr "type" "crypto")])
+
+;; While the two operands of the IOR could be swapped, this appears
+;; to be the canonical form.  The other form doesn't seem to trigger.
+(define_insn "*riscv_packh__3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (ior:X (and:X (ashift:X (match_operand:X 1 "register_operand" "r")
+   (const_int 8))
+ (const_int 65280))
+  (zero_extend:X (match_operand:QI 2 "register_operand" "r"))))]
+ "TARGET_ZBKB"
+ "packh\t%0,%2,%1"
+ [(set_attr "type" "crypto")])
+
 (define_insn "riscv_packw"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (unspec:DI [(match_operand:HI 1 "register_operand" "r")
@@ -122,6 

Re: RISC-V: Fix round_32.c test on RV32

2024-05-31 Thread Jeff Law




On 5/27/24 4:17 PM, Jivan Hakobyan wrote:

Ya, makes sense -- I guess the current values aren't that exciting for
execution, but we could just add some more interesting ones...


During the development of the patch, I had an issue with large
numbers (2e34, -2e34).  They are used in the
gfortran.fortran-torture/execute/intrinsic_aint_anint.f90 test.
Besides that, a benchmark from SPEC 2017 also failed (I can't
remember which one).  Now we don't have an issue with them.  Of
course, I can add additional tests with large numbers, but it will
be a double-check (first of Fortran's test).

So I think the question is what do we want to do in the immediate term.

We can remove the test to get cleaner test results on rv32.  I'm not a
big fan of removing tests, but this test just doesn't make sense on rv32
as-is.



We could leave things alone for now on the assumption the test will be
rewritten to check for calls to the proper routines and possibly
extended to include runtime verification.


I tend to lean towards the first.  That obviously wouldn't close the
door on re-adding the test later with runtime verification and such.


Palmer, do you have a strong opinion either way?

jeff


Re: [RFC/RFA] [PATCH 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs

2024-05-31 Thread Jeff Law




On 5/28/24 12:44 AM, Richard Biener wrote:

On Mon, May 27, 2024 at 5:16 PM Jeff Law  wrote:




On 5/27/24 12:38 AM, Richard Biener wrote:

On Fri, May 24, 2024 at 10:44 AM Mariam Arutunian
 wrote:


This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.  These builtins aim to provide
efficient CRC calculation capabilities.  When the target architecture
supports CRC operations (as indicated by the presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.  In the
absence of hardware support, the builtins default to generating code for
a table-based CRC calculation.


I wonder whether for embedded target use we should arrange for the
table-based CRC calculation to be out-of-line and implemented in a
way so uses across TUs can be merged?  I guess a generic
implementation inside libgcc is difficult?

I think the difficulty is the table is dependent upon the polynomial.
So we'd have to arrange to generate, then pass in the table.

In theory we could have the linker fold away duplicate tables as those
should be in read only sections without relocations to internal members.
So much like we do for constant pool entries.  Though this hasn't been
tested.

The CRC implementation itself could be subject to ICF if it's off in its
own function.  If it's inlined (and that's a real possibility), then
there's little hope of ICF helping on the codesize.


I was wondering about doing some "standard" mangling in the implementation
namespace and using comdat groups for both code and data?

But I'm not sure how that really solves anything given the dependencies
on the polynomial.  I.e., the contents of the table vary based on that
polynomial and the polynomial can (and will) differ across CRC
implementations.
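To see why the table is welded to the polynomial, here is a minimal table-driven CRC sketch in C.  It is illustrative only (the function names are hypothetical, and this is not the code GCC emits): every distinct polynomial produces a distinct 256-entry table, so a single shared copy in libgcc could not serve all callers.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 256-entry lookup table for a reflected (LSB-first) CRC-32
   with the given reflected polynomial.  0xedb88320 is the reflected
   form of the ubiquitous CRC-32/ISO-HDLC polynomial 0x04c11db7.  */
static void crc32_make_table (uint32_t poly, uint32_t table[256])
{
  for (uint32_t i = 0; i < 256; i++)
    {
      uint32_t c = i;
      for (int k = 0; k < 8; k++)
        c = (c & 1) ? poly ^ (c >> 1) : c >> 1;
      table[i] = c;
    }
}

/* One table lookup per input byte; initial CRC of 0 with the ~crc
   pre/post conditioning gives the standard CRC-32 check value.  */
static uint32_t crc32_update (const uint32_t table[256], uint32_t crc,
                              const unsigned char *buf, unsigned len)
{
  crc = ~crc;
  while (len--)
    crc = table[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
  return ~crc;
}
```

Change `poly` and every entry of `table` changes with it, which is exactly the dependency being discussed: merging copies across TUs only works when both the code and the polynomial-derived data match.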


Or we could just not do any of this for -Os/-Oz if the target doesn't
have a carryless multiply or crc with the appropriate polynomial.  Given
the CRC table is probably larger than all the code in a bitwise
implementation, disabling for -Os/-Oz seems like a very reasonable choice.


I was mainly thinking about the case where the user uses the new builtins,
but yes, when optimizing for size we can disable the recognition of
open-coded variants.

Turns out Mariam's patch already disables this for -Os.  :-)

For someone directly using the builtin, they're going to have to pass
the polynomial as a constant to the builtin, with the possible exception
of when the target has a crc instruction where the polynomial is defined
by the hardware.


Jeff


Re: [PATCH 5/5][v3] RISC-V: Avoid inserting after a GIMPLE_COND with SLP and early break

2024-05-31 Thread Jeff Law




On 5/31/24 7:44 AM, Richard Biener wrote:

When vectorizing an early break loop with LENs (do we miss some
check here to disallow this?) we can end up deciding to insert
stmts after a GIMPLE_COND when doing SLP scheduling and trying
to be conservative with placing of stmts only dependent on
the implicit loop mask/len.  The following avoids this, I guess
it's not perfect but it does the job fixing some observed
RISC-V regression.

* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
loops make sure to not advance the insertion iterator
	beyond a GIMPLE_COND.

Note this patch may depend on others in the series.  I don't think the
pre-commit CI tester is particularly good at handling that, particularly
if the other patches in the series don't have the tagging for the
pre-commit CI.


What most likely happened is this patch and only this patch was applied
against the baseline for testing.


There are (manual) ways to get things re-tested.  I'm hoping Patrick and
Edwin automate that procedure relatively soon.  Until that happens you
have to email patchworks...@rivosinc.com with a URL for the patch in
patchwork that you want retested.




Jeff


