[Bug target/116103] [15 Regression] GCN vs. "Internal-fn: Only allow modes describe types for internal fn[PR115961]"

2024-07-29 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103

--- Comment #10 from Li Pan  ---
(In reply to Thomas Schwinge from comment #9)
> (In reply to Li Pan from comment #7)
> > confirm with you all related failures are covered.
> 
> Yes, the testing state is restored to what it was before, thanks!
> 
> 
> Before 'git push', please note Richard Sandiford's comment:
> .

Thanks for the confirmation and the reminder. I will send a v2 that addresses
Richard's comment.

[Bug target/116103] [15 Regression] GCN vs. "Internal-fn: Only allow modes describe types for internal fn[PR115961]"

2024-07-29 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103

--- Comment #7 from Li Pan  ---
Hi Thomas,

Could you please help confirm that the patch below fixes these asm check
failures?

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658519.html

I tested the cases below for target=amdgcn-amdhsa, but would like to confirm
with you that all related failures are covered.

gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "

Pan

[Bug target/116103] [15 Regression] GCN vs. "Internal-fn: Only allow modes describe types for internal fn[PR115961]"

2024-07-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103

--- Comment #6 from Li Pan  ---
(In reply to Thomas Schwinge from comment #5)
> (In reply to Li Pan from comment #3)
> > best practice of cross
> > compile gfx908 in x86 linux?
> 
> If you only need the 'cc1' (and no assembler, linker, libc), the following
> should do:
> 
> $ [...]/configure --target=amdgcn-amdhsa --enable-languages=c
> $ make -j12 all-gcc
> $ gcc/cc1 [...]

Thanks a lot. I can reproduce this now and will send a patch following
Richard's suggestions once the test suites show no surprises.

[Bug target/116103] [15 Regression] GCN vs. "Internal-fn: Only allow modes describe types for internal fn[PR115961]"

2024-07-26 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103

--- Comment #3 from Li Pan  ---
Thanks Richard for the suggestion.

Hi Thomas, could you please share the best practice for cross-compiling for
gfx908 on x86 Linux?
Then I can give Richard's suggestion a try.

[Bug middle-end/115961] [15 Regression] wrong code on llvm-18.1.8 since r15-1936-g80e446e829d818 with bitfields less than the type mode precision

2024-07-16 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115961

--- Comment #5 from Li Pan  ---
Thanks Andrew Pinski.

That makes a lot of sense to me, and I can now reproduce this on upstream. Let
me prepare a patch for it.

[Bug middle-end/115863] [15 Regression] zlib-1.3.1 miscompilation since r15-1936-g80e446e829d818

2024-07-16 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #14 from Li Pan  ---
Hi Uroš,

> Please note two new instructions in the second asm dump. These are expanded
> from .SAT_TRUNC and are not present in the first asm dump.

> The problem here is that the presence of ustrunc{m}{n}2 optab in i386.md 
> prevents some optimization involving .MIN_EXPR that would result in better 
> code.

I would like to confirm with you whether the vector modes of ustrunc{m}{n}2
have a similar issue. If they do not, adding single_use to match.pd may also
affect the vector case.

For example, in RVV we may have a similar insn layout.

 [local count: 536870912]:
_18 = MIN_EXPR ;  // vminu
iftmp.0_11 = (unsigned int) _18;  // vcvt
stream.avail_out = iftmp.0_11;// vmv
left_37 = left_8 - _18;   // vsub

while .SAT_TRUNC somehow interferes with this optimization to produce:

 [local count: 536870912]:
_45 = MIN_EXPR ;  // vminu
iftmp.0_11 = .SAT_TRUNC (left_8); // vnclipu
stream.avail_out = iftmp.0_11;// vmv
left_37 = left_8 - _45;   // vsub

[Bug middle-end/115961] [15 Regression] wrong code on llvm-18.1.8 since r15-1936-g80e446e829d818 with bitfields less than the type mode precision

2024-07-16 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115961

--- Comment #3 from Li Pan  ---
Only x86 implements .SAT_TRUNC for scalar modes, so I bet this is almost the
same issue as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863 ?

[Bug middle-end/115863] [15 Regression] zlib-1.3.1 miscompilation since r15-1936-g80e446e829d818

2024-07-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #13 from Li Pan  ---
Thanks Richard and Bizjak.

I get the point now. Let me try to make the improvement.

[Bug middle-end/115863] [15 Regression] zlib-1.3.1 miscompilation since r15-1936-g80e446e829d818

2024-07-12 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #8 from Li Pan  ---
(In reply to Richard Biener from comment #7)
> (In reply to Uroš Bizjak from comment #6)
> > Please note that w/o .SAT_TRUNC the compiler is able to optimize hot loop in
> > compress2 to:
> > 
> >[local count: 536870912]:
> >   _18 = MIN_EXPR ;
> >   iftmp.0_11 = (unsigned int) _18;
> >   stream.avail_out = iftmp.0_11;
> >   left_37 = left_8 - _18;
> > 
> > while .SAT_TRUNC somehow interferes with this optimization to produce:
> > 
> >[local count: 536870912]:
> >   _45 = MIN_EXPR ;
> >   iftmp.0_11 = .SAT_TRUNC (left_8);
> >   stream.avail_out = iftmp.0_11;
> >   left_37 = left_8 - _45;
> 
> it looks like whatever recognizes .SAT_TRUNC doesn't pay attention that
> there are other uses of the MIN_EXPR and thus the MIN_EXPR stays live.
> 
> IIRC :s on (match (...) is ignored (and that's good IMO) at the moment so
> the user of the match predicate has to check.

Thanks Richard.
Yes, the .SAT_TRUNC recognition doesn't pay any attention to other possible
uses of the MIN_EXPR.

Following your suggestion, we may need one additional check here (like
gimple_unsigned_sat_trunc() && no_other_MIN_EXPR_use_after_sat_trunc_p ())
before we build the SAT_TRUNC call.
Sorry, I didn't quite get why we need to do this. Could you please explain a
bit more? Is it wrong code, or something else, in the sample code above?
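The two GIMPLE shapes quoted above can be mirrored in plain C to make the extra use of the MIN result visible. This is only an illustrative sketch: the function names are hypothetical, `sat_trunc_u64_to_u32` merely stands in for the internal function .SAT_TRUNC, and the 64-bit to 32-bit widths are an assumption based on the uLong/uInt statement from zlib.

```c
#include <stdint.h>

/* Hypothetical stand-in for .SAT_TRUNC: clamp a 64-bit value to the
   32-bit unsigned range.  */
static uint32_t sat_trunc_u64_to_u32 (uint64_t x)
{
  return x > UINT32_MAX ? UINT32_MAX : (uint32_t) x;
}

/* Shape without .SAT_TRUNC: a single MIN feeds both the narrowing
   conversion and the subtraction.  */
static uint64_t step_min_shared (uint64_t left, uint32_t *avail_out)
{
  uint64_t m = left < UINT32_MAX ? left : UINT32_MAX; /* MIN_EXPR */
  *avail_out = (uint32_t) m;                          /* convert */
  return left - m;                                    /* second use of MIN */
}

/* Shape with .SAT_TRUNC: the conversion no longer uses the MIN, but the
   subtraction still does, so both computations stay live.  */
static uint64_t step_sat_trunc (uint64_t left, uint32_t *avail_out)
{
  uint64_t m = left < UINT32_MAX ? left : UINT32_MAX; /* MIN_EXPR stays */
  *avail_out = sat_trunc_u64_to_u32 (left);           /* .SAT_TRUNC */
  return left - m;
}
```

Both functions compute the same values; the second simply does the clamping twice, which is the code-quality concern raised in the quoted comments rather than a wrong-code issue.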

[Bug middle-end/115863] [15 Regression] zlib-1.3.1 miscompilation since r15-1936-g80e446e829d818

2024-07-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #4 from Li Pan  ---
(In reply to H.J. Lu from comment #3)
> This may be fixed by r15-1954.

Thanks HJ, this makes a lot of sense to me.

[Bug middle-end/115863] [15 Regression] zlib-1.3.1 miscompilation since r15-1936-g80e446e829d818

2024-07-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #1 from Li Pan  ---
Thanks for reporting this.

It should be this statement, "stream.avail_out = left > (uLong)max ? max : (uInt)left;",
that hits .SAT_TRUNC, i.e. the pattern below.

+/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
+   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
+(match (unsigned_integer_sat_trunc @0)
+ (convert (min @0 INTEGER_CST@1))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0)))
+ (with
+  {
+   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
+   unsigned otype_precision = TYPE_PRECISION (type);
+   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
+   wide_int int_cst = wi::to_wide (@1, itype_precision);
+  }
+  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))

I took a quick look but did not find anything abnormal; I will look into it
further.
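As a rough illustration of what the pattern above matches, the sketch below uses the 8-bit case from the pattern's own comment (MIN_EXPR (X, 255)). The function names are hypothetical and the 16-bit to 8-bit widths are chosen only to keep the example small; `constant_is_trunc_max` mimics the wi::mask comparison in the guard under the assumption that the precision is below 64.

```c
#include <stdint.h>

/* Source-level form: a ternary clamp, like the zlib statement.  */
static uint8_t clamp_ternary (uint16_t x)
{
  const uint8_t max = UINT8_MAX;
  return x > (uint16_t) max ? max : (uint8_t) x;
}

/* Form the pattern matches: (NT) MIN_EXPR (X, 255).  */
static uint8_t clamp_min_convert (uint16_t x)
{
  uint16_t m = x < 255 ? x : 255;
  return (uint8_t) m;
}

/* Mirrors the guard: the constant must be the narrow type's maximum,
   i.e. a mask of the otype_precision low bits.
   Assumes 0 < otype_precision < 64.  */
static int constant_is_trunc_max (unsigned otype_precision, uint64_t cst)
{
  return cst == ((UINT64_C (1) << otype_precision) - 1);
}
```

The two clamp functions agree for every 16-bit input, which is why the ternary in the source ends up as a convert of a MIN_EXPR and is then recognized as a saturating truncation.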

[Bug target/115763] RISC-V: Use wrong SEW for vfmv.v.f when -march only has zvfhmin

2024-07-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115763

--- Comment #9 from Li Pan  ---
(In reply to Monk Chiang from comment #8)
> Li Pan, I tested it without any errors, I think this issue has been fixed

Thank you! It looks like this bug is still in UNCONFIRMED status; I will close
it once it is confirmed.

[Bug target/115763] RISC-V: Use wrong SEW for vfmv.v.f when -march only has zvfhmin

2024-07-04 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115763

--- Comment #6 from Li Pan  ---
It seems there are no surprises.

Monk Chiang, could you please double-check whether upstream has fixed this
issue? Thanks.

[Bug target/115763] RISC-V: Use wrong SEW for vfmv.v.f when -march only has zvfhmin

2024-07-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115763

--- Comment #5 from Li Pan  ---
The second test may still have a problem; I will double-check it.

[Bug target/115763] RISC-V: Use wrong SEW for vfmv.v.f when -march only has zvfhmin

2024-07-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115763

--- Comment #3 from Li Pan  ---
Thanks. I have a quick fix, but it appears to stop zvfh from generating the
vfmv insn. Let me try again.

[Bug target/115763] RISC-V: Use wrong SEW for vfmv.v.f when -march only has zvfhmin

2024-07-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115763

--- Comment #1 from Li Pan  ---
Thanks for reporting this. I can reproduce it on today's upstream with the
command below:

../__RV64/bin/riscv64-unknown-elf-gcc pr115763.c -mabi=lp64d
-march=rv64gcv_zfh_zvfhmin -O3 -ftree-vectorize -fno-vect-cost-model -c -S -o -

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725

--- Comment #6 from Li Pan  ---
(In reply to Robin Dapp from comment #5)
> > zvl128b => GOOD.
> > vec_set_vnx8hi_0:
> > vl1re16.v   v1,0(a1)
> > vsetivlizero,1,e16,m1,ta,ma
> > vmv.s.x v1,a2
> > vs1r.v  v1,0(a0)  // Only store 1 element as source code.
> > ret
> > 
> > 
> > zvl512b => BAD.
> > vec_set_vnx8hi_0:
> > vsetivlizero,1,e16,mf4,ta,ma
> > vle16.v v1,0(a1)
> > vmv.s.x v1,a2
> > vsetivlizero,8,e16,mf4,ta,ma
> > vse16.v v1,0(a0) // Store 8 elements
> > ret
> 
> Isn't this similar in that both write a "full" 8-element vector?

Oh, yes, you are right. I missed that vs1r is a whole-register store, which
doesn't honor vl.
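The element counts behind this exchange can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and it assumes VLEN is exactly the zvl minimum (128 or 512 bits), which real hardware may exceed.

```c
/* Elements written by a whole-register store such as vs1r.v.  It ignores
   vl and writes nregs * VLEN bits.  */
static unsigned whole_reg_store_elems (unsigned vlen_bits, unsigned sew_bits,
                                       unsigned nregs)
{
  return nregs * vlen_bits / sew_bits;
}

/* VLMAX for a fractional LMUL of 1/lmul_den, e.g. mf4 -> lmul_den = 4.  */
static unsigned vlmax_fractional (unsigned vlen_bits, unsigned sew_bits,
                                  unsigned lmul_den)
{
  return vlen_bits / sew_bits / lmul_den;
}
```

Under these assumptions, vs1r.v with VLEN=128 and e16 writes 128 / 16 = 8 elements, and vse16.v with vl=8 under e16,mf4 at VLEN=512 also writes 8 elements (VLMAX = 512 / 16 / 4 = 8), so both sequences store a full 8-element vector.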

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725

--- Comment #3 from Li Pan  ---
Neither qemu nor spike can reproduce the failure as described, "checking
(res[1] != 1) will get abort()".

But I bet you mean that we have an additional and unnecessary store here,
right?

zvl128b => GOOD.
vec_set_vnx8hi_0:
        vl1re16.v       v1,0(a1)
        vsetivli        zero,1,e16,m1,ta,ma
        vmv.s.x v1,a2
        vs1r.v  v1,0(a0)  // Only store 1 element as source code.
        ret


zvl512b => BAD.
vec_set_vnx8hi_0:
        vsetivli        zero,1,e16,mf4,ta,ma
        vle16.v v1,0(a1)
        vmv.s.x v1,a2
        vsetivli        zero,8,e16,mf4,ta,ma
        vse16.v v1,0(a0) // Store 8 elements
        ret

[Bug target/115458] [15 regression] [RISC-V] ICE in lra_split_hard_reg_for, at lra-assigns.cc:1868 unable to find a register to spill since r15-518-g99b1daae18c095

2024-06-18 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

--- Comment #5 from Li Pan  ---
(In reply to Richard Biener from comment #4)
> The bisected rev only exposes this.

Thanks Richard for the hint; I will look into it.

[Bug target/115458] [15 regression] [RISC-V] ICE in lra_split_hard_reg_for, at lra-assigns.cc:1868 unable to find a register to spill

2024-06-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

--- Comment #3 from Li Pan  ---
Located the first bad commit: 99b1daae18c095d6c94d32efb77442838e11cbfb.

tree-optimization/114589 - remove profile based sink heuristics

[Bug target/115458] [15 regression] [RISC-V] ICE in lra_split_hard_reg_for, at lra-assigns.cc:1868 unable to find a register to spill

2024-06-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

--- Comment #2 from Li Pan  ---
3700bd68d1b01f0fe6d15f8a40b7fdca0904d5aa, from around May 15, is OK. Let me run
a bisect to find the first bad commit.

[Bug target/115458] [15 regression] [RISC-V] ICE in lra_split_hard_reg_for, at lra-assigns.cc:1868 unable to find a register to spill

2024-06-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Ack, reproduced from today's upstream.

[Bug target/115456] RISC-V: ICE: unrecognizable insn with march=rv64gcv_zvfhmin

2024-06-12 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115456

--- Comment #2 from Li Pan  ---
According to the ISA, Zvfhmin only contains 2 insns, as quoted below:

"
The Zvfhmin extension provides minimal support for vectors of IEEE 754-2008
binary16 values, adding conversions to and from binary32. When the Zvfhmin
extension is implemented, the vfwcvt.f.f.v and vfncvt.f.f.w instructions become
defined when SEW=16. The EEW=16 floating-point operands of these instructions
use the binary16 format.
"

Thus, in this case vfmv.f.s should be invalid for V4HF mode, yet it is still
expanded.

We should not generate such an insn when only zvfhmin is given.

[Bug target/115456] RISC-V: ICE: unrecognizable insn with march=rv64gcv_zvfhmin

2024-06-12 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115456

--- Comment #1 from Li Pan  ---
Ack, will take care of it.

[Bug tree-optimization/115387] [15 regression] RISC-V: ICE in iovsprintf.c since r15-1081-ge14afbe2d1c

2024-06-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115387

--- Comment #7 from Li Pan  ---
Thanks a lot. I am testing a fix and will send it out once testing shows no
surprises.

[Bug tree-optimization/115387] [15 regression] RISC-V: ICE in iovsprintf.c since r15-1081-ge14afbe2d1c

2024-06-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115387

--- Comment #5 from Li Pan  ---
Thanks all. I can reproduce this now.
Sorry, I only ran the tests with newlib and not with glibc. I will take care of
it ASAP.

[Bug tree-optimization/115387] [15 regression] RISC-V: ICE in iovsprintf.c since r15-1081-ge14afbe2d1c

2024-06-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115387

--- Comment #2 from Li Pan  ---
(In reply to Edwin Lu from comment #1)
> Bisected to r15-1081-ge14afbe2d1c being the first bad commit

Ack, thanks Edwin, will try to reproduce this.

[Bug rtl-optimization/115013] [15 Regression] LRA: PR114810 fix result in ICE in the RISC-V Vector

2024-05-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115013

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #8 from Li Pan  ---
Fixed.

[Bug c/115013] New: LRA: PR114810 fix result in ICE in the RISC-V Vector

2024-05-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115013

            Bug ID: 115013
           Summary: LRA: PR114810 fix result in ICE in the RISC-V Vector
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

The patch:
[PR114810][LRA]: Recognize alternatives with lack of available registers for
insn and demote them.

results in some ICEs in rvv.exp of the RISC-V backend.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
    rv64gcv/  lp64d/ medlow | 1061 /    69 |    0 /     0 |      - |
make: *** [Makefile:1096: report-gcc-newlib] Error 1

Taking one of them, imm_loop_invariant-10.c, as an example:

.../gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1: error: unrecognizable insn:
(insn 265 0 0 (parallel [
            (set (reg:RVVMF8QI 309 [239])
                (unspec:RVVMF8QI [
                        (reg:SI 0 zero)
                    ] UNSPEC_VUNDEF))
            (clobber (scratch:SI))
        ]) -1
     (nil))
during RTL pass: reload
…. gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1: internal compiler error: in extract_insn, at recog.cc:2812
0xa9d309 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../.././gcc/gcc/rtl-error.cc:108
0xa9d32b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../.././gcc/gcc/rtl-error.cc:116
0xa9bc07 extract_insn(rtx_insn*)
        ../.././gcc/gcc/recog.cc:2812
0x10e5ad2 ira_remove_insn_scratches(rtx_insn*, bool, _IO_FILE*, rtx_def* (*)(rtx_def*))
        ../.././gcc/gcc/ira.cc:5381
0x112868f remove_insn_scratches
        ../.././gcc/gcc/lra.cc:2154
0x112868f lra_emit_move(rtx_def*, rtx_def*)
        ../.././gcc/gcc/lra.cc:513
0x1136883 match_reload
        ../.././gcc/gcc/lra-constraints.cc:1184
0x1142ae4 curr_insn_transform
        ../.././gcc/gcc/lra-constraints.cc:4778
0x11443cb lra_constraints(bool)
        ../.././gcc/gcc/lra-constraints.cc:5481
0x112b192 lra(_IO_FILE*, int)
        ../.././gcc/gcc/lra.cc:2442
0x10e0e7f do_reload
        ../.././gcc/gcc/ira.cc:5973
0x10e0e7f execute
        ../.././gcc/gcc/ira.cc:6161

Reproduced with the command below:
riscv64-unknown-elf-gcc -c -S
gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c
-march=rv32gcv -mabi=ilp32 -o -

[Bug c/114885] New: RISC-V: ICE of unrecog insn when graphite for both the c/c++ and fortran

2024-04-29 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114885

            Bug ID: 114885
           Summary: RISC-V: ICE of unrecog insn when graphite for both the
                    c/c++ and fortran
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

When running some graphite tests (which require GCC to be built with isl),
there are several ICEs where an insn cannot be recognized.

FAIL: gcc.dg/graphite/pr111878.c (internal compiler error: in extract_insn, at recog.cc:2812)

FAIL: gfortran.dg/graphite/id-27.f90   -O  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/id-27.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -g  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/graphite/vect-pr40979.f90   -O  (internal compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/vect-pr40979.f90   -O  (test for excess errors)

Reproduce step(s):

1. download isl-0.24, let isl -> /some-where/riscv-gnu-toolchain/gcc/isl-0.24

2. mkdir __BUILD__ && cd __BUILD__ && ../configure \
  --target=riscv64-unknown-elf \
  --prefix=${INSTALL_DIR} \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++,fortran \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --src=`pwd`/../ \
  --with-abi=lp64d \
  --with-arch=rv64gcv \
  --with-tune=rocket \
  --with-isa-spec=20191213 \
  CFLAGS_FOR_BUILD="-O0 -g" \
  CXXFLAGS_FOR_BUILD="-O0 -g" \
  CFLAGS_FOR_TARGET="-O0  -g" \
  CXXFLAGS_FOR_TARGET="-O0 -g" \
  BOOT_CFLAGS="-O0 -g" \
  CFLAGS="-O0 -g" \
  CXXFLAGS="-O0 -g" \
  GM2FLAGS_FOR_TARGET="-O0 -g" \
  GOCFLAGS_FOR_TARGET="-O0 -g" \
  GDCFLAGS_FOR_TARGET="-O0 -g"
make -j $(nproc) all-gcc && make install-gcc

3. ../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc
gcc/testsuite/gcc.dg/graphite/pr111878.c -O3 -fgraphite-identity
-fsave-optimization-record -march=rv64gcv -mabi=lp64d -c -S -o -

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #19 from Li Pan  ---
Thanks Juzhe.  Here is another example

-
#include <riscv_vector.h>

extern size_t get_new_vl ();

size_t
__attribute__((noinline))
get_vl (size_t *c)
{
  size_t vl = c[0] + c[1];

  return vl;
}

vbool64_t
test_fail_2 (vuint64m1_t a, unsigned long b, size_t *c)
{
  return __riscv_vmsne_vx_u64m1_b64 (a, b, get_vl (c));
}
---

test_fail_2:
        addi    sp,sp,-16
        sd      ra,8(sp)
        sd      s0,0(sp)
        csrr    t0,vlenb
        sub     sp,sp,t0
        vs1r.v  v1,0(sp)
        sub     sp,sp,t0
        vs1r.v  v2,0(sp)
        sub     sp,sp,t0
        vs1r.v  v3,0(sp)
        sub     sp,sp,t0
        vs1r.v  v4,0(sp)
        sub     sp,sp,t0
        vs1r.v  v5,0(sp)
        sub     sp,sp,t0
        vs1r.v  v6,0(sp)
        sub     sp,sp,t0
        vs1r.v  v7,0(sp)
        sub     sp,sp,t0
        vs1r.v  v24,0(sp)
        sub     sp,sp,t0
        vs1r.v  v25,0(sp)
        sub     sp,sp,t0
        vs1r.v  v26,0(sp)
        sub     sp,sp,t0
        vs1r.v  v27,0(sp)
        sub     sp,sp,t0
        vs1r.v  v28,0(sp)
        sub     sp,sp,t0
        vs1r.v  v29,0(sp)
        sub     sp,sp,t0
        vs1r.v  v30,0(sp)
        sub     sp,sp,t0
        vs1r.v  v31,0(sp)
        csrr    t0,vlenb
        sub     sp,sp,t0
        vs1r.v  v8,0(sp)
        mv      s0,a0
        mv      a0,a1
        call    get_vl
        vl1re64.v       v8,0(sp)
        vsetvli zero,a0,e64,m1,ta,ma
        vmsne.vx        v0,v8,s0
        csrr    t0,vlenb
        add     sp,sp,t0
        csrr    t0,vlenb
        vl1re64.v       v31,0(sp)
        add     sp,sp,t0
        vl1re64.v       v30,0(sp)
        add     sp,sp,t0
        vl1re64.v       v29,0(sp)
        add     sp,sp,t0
        vl1re64.v       v28,0(sp)
        ...

As I understand it, these callee-saved vector registers do not need to be saved
if the function body doesn't clobber them.  Only the clobbered registers need
to be saved to and restored from the stack.

However, this is more of an optimization; we can consider improving it in
GCC-15 if my understanding is correct.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #17 from Li Pan  ---
According to the V ABI, it looks like the asm code saves and restores the
callee-saved registers when there is a call in the function body.

| Name    | ABI Mnemonic | Meaning                  | Preserved across calls?
=============================================================================
| v0      |              | Argument register        | No
| v1-v7   |              | Callee-saved registers   | Yes
| v8-v23  |              | Argument registers       | No
| v24-v31 |              | Callee-saved registers   | Yes
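The table above can be read as a simple predicate over vector register numbers. The sketch below is an illustration only; the function names are hypothetical and are not taken from the GCC sources.

```c
#include <stdbool.h>

/* Callee-saved vector registers per the table above: v1-v7 and v24-v31.  */
static bool vreg_callee_saved_p (unsigned vregno)
{
  return (vregno >= 1 && vregno <= 7) || (vregno >= 24 && vregno <= 31);
}

/* Number of registers among v0..v31 that would be saved and restored if
   every callee-saved vector register were preserved unconditionally.  */
static unsigned count_callee_saved_vregs (void)
{
  unsigned n = 0;
  for (unsigned v = 0; v < 32; v++)
    if (vreg_callee_saved_p (v))
      n++;
  return n;
}
```

The count comes to 15, which matches the v1-v7 and v24-v31 save sequence visible in the test_fail_2 assembly from comment #19.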

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

--- Comment #4 from Li Pan  ---
(In reply to Kito Cheng from comment #3)
> Reduced case, not the final result, but it already run 8+ hours...
> ```
> typedef int a;
> typedef short b;
> typedef unsigned c;
> template < typename > using e = unsigned;
> template < typename > void ab();
> #pragma riscv intrinsic "vector"
> template < typename f, int, int ac > struct g {
>   using i = f;
>   template < typename m > using j = g< m, 0, ac >;
>   using k = g< i, 1, ac - 1 >;
>   using ad = g< i, 1, ac + 1 >;
> };
> namespace ae {
> struct af {
>   using h = g< short, 6, 0 < 3 >;
> };
> struct ag {
>   using h = af::h;
> };
> } template < typename, int > using ah = ae::ag::h;
> template < class ai > using aj = typename ai::i;
> template < class i, class ai > using j = typename ai::j< i >;
> template < class ai > using ak = j< e< ai >, ai >;
> template < class ai > using k = typename ai::k;
> template < class ai > using ad = typename ai::ad;
> template < a ap > vuint16m1_t ar(g< b, ap, 0 >, b);
> template < a ap > vuint16m2_t ar(g< b, ap, 1 >, b);
> template < a ap > vuint32m2_t ar(g< c, ap, 1 >, c);
> template < a ap > vuint32m4_t ar(g< c, ap, 2 >, c);
> template < class ai > using as = decltype(ar(ai(), aj< ai >()));
> template < class ai > as< ai > at(ai);
> namespace ae {
> template < int ap > vuint32m4_t au(g< c, ap, 1 + 1 >, vuint32m2_t l) {
>   return __riscv_vlmul_ext_v_u32m2_u32m4(l);
> }
> } template < int ap > vuint32m2_t aw(g< c, ap, 1 >, vuint16m1_t l) {
>   return __riscv_vzext_vf2_u32m2(l, 0);
> }
> namespace ae {
> vuint32m4_t ax(vuint32m4_t, vuint32m4_t, a);
> }
> template < class ay, class an > as< ay > az(ay ba, an bc) {
>   an bb;
>   return ae::ax(ae::au(ba, bc), ae::au(ba, bb), 2);
> }
> template < class bd > as< bd > be(bd, as< ad< bd > >);
> namespace ae {
> template < class bh, class bi > void bj(bh bk, bi bl) {
>   ad< decltype(bk) > bn;
>   az(bn, bl);
> }
> } template < int ap, int ac, class bp, class bq >
> void br(g< c, ap, ac > bk, bp, bq bl) {
>   ae::bj(bk, bl);
> }
> template < class ai > using bs = decltype(at(ai()));
> struct bt;
> template < int ac = 1 > class bu {
> public:
>   template < typename i > void operator()(i) {
> ah< i, ac > d;
> bt()(i(), d);
>   }
> };
> struct bt {
>   template < typename bv, class bf > void operator()(bv, bf bw) {
> using bx = bv;
> ak< bf > by;
> k< bf > bz;
> using bq = bs< decltype(by) >;
> using bp = bs< decltype(bw) >;
> bp cb;
> ab< bx >();
> for (;;) {
>   bp cc;
>   bq bl = aw(by, be(bz, cc));
>   br(by, cb, bl);
> }
>   }
> };
> void d() { bu()(b()); }
> 
> ```

Thanks Kito, this really saved my day!

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

--- Comment #2 from Li Pan  ---
vzext.vf2 has an earlyclobber destination operand, so the destination cannot be
allocated to the same register as the source operand, as in vzext.vf2 v0, v0.
Thus the constraint check in check_rtl fails.

(define_insn "@pred__vf2"
  [(set (match_operand:VWEXTI 0 "register_operand" "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
        (if_then_else:VWEXTI
          (unspec:
            [(match_operand: 1 "vector_mask_operand" " vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1")
             (match_operand 4 "vector_length_operand" " rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK,   rK,   rK")
             (match_operand 5 "const_int_operand" "  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,    i,    i")
             (match_operand 6 "const_int_operand" "  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,    i,    i")
             (match_operand 7 "const_int_operand" "  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,    i,    i")
             (reg:SI VL_REGNUM)
             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
          (any_extend:VWEXTI
            (match_operand: 3 "register_operand" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,   vr,   vr"))
          (match_operand:VWEXTI 2 "vector_merge_operand" " vu, vu,  0,  0, vu, vu,  0,  0, vu, vu,  0,  0,   vu,    0")))]
  "TARGET_VECTOR"
  "vext.vf2\t%0,%3%p1"
  [(set_attr "type" "vext")
   (set_attr "mode" "")
   (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])



insn 1205 1214 5405 70 (set (reg:RVVM1SI 97 v1 [orig:687 _1177 ] [687])
(if_then_else:RVVM1SI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 25 s9 [orig:539 _889 ] [539])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(zero_extend:RVVM1SI (reg:RVVMF2HI 97 v1 [orig:654 _1100 ] [654]))
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) "../hwy/ops/rvv-inl.h":1964:386 discrim 1
8452 {pred_zero_extendrvvm1si_vf2}
 (nil))
during RTL pass: reload
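The failing insn above has both the destination and the source in hard register 97 (v1). A deliberately simplified model of the earlyclobber requirement is an interval overlap check, sketched below. The function names are hypothetical, and the rule is an assumption made for illustration: it rejects any overlap, whereas the real RVV rules (hinted at by the group_overlap attribute) permit some partial overlaps.

```c
#include <stdbool.h>

/* Half-open hard register ranges [first, first + count).  */
static bool reg_ranges_overlap (unsigned first_a, unsigned count_a,
                                unsigned first_b, unsigned count_b)
{
  return first_a < first_b + count_b && first_b < first_a + count_a;
}

/* Simplified earlyclobber rule: the destination register group must not
   overlap the source register group at all.  */
static bool earlyclobber_ok (unsigned dest, unsigned dest_nregs,
                             unsigned src, unsigned src_nregs)
{
  return !reg_ranges_overlap (dest, dest_nregs, src, src_nregs);
}
```

Under this model, a destination of v1 with a source of v1 is rejected, while moving the destination to the next register would be accepted.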

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Confirmed from riscv64-unknown-elf-g++ (GCC) 14.0.1 20240415 (experimental).

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #13 from Li Pan  ---
Overriding the TARGET_CLASS_LIKELY_SPILLED_P hook may not be a fix, as it
generates all sorts of spills for the sample code below.

vbool2_t test_vmfge_vf_f16m8_b2(vfloat16m8_t op1, float16_t op2, size_t vl) {
  return __riscv_vmfge_vf_f16m8_b2(op1, op2, vl);
}

We need to rethink this from the mode-switching side.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #12 from Li Pan  ---
#include <riscv_vector.h>

extern unsigned long get_vl ();

#if 0

#else

vint32m1_t test (vint32m1_t a)
{
  unsigned b;
  return __riscv_vadd_vx_i32m1 (a, b, get_vl ()); // No ICE
}

vbool16_t test (vuint64m4_t a)
{
  unsigned long b;
  return __riscv_vmsne_vx_u64m4_b16 (a, b, get_vl ()); // ICE
}

#endif

This comes from the part below:

!(targetm.class_likely_spilled_p (REGNO_REG_CLASS (ret_start)));

For RVV, the reg_class_size values are listed below. Because the vector mask
class (VM) has only one register, it is considered likely spilled: the default
TARGET_CLASS_LIKELY_SPILLED_P hook returns true if reg_class_size[class] == 1.

I am not sure whether overriding the TARGET_CLASS_LIKELY_SPILLED_P hook for
riscv is a reasonable fix; I am still trying to understand
TARGET_CLASS_LIKELY_SPILLED_P...


panli-reg_class_size[0]=0
panli-reg_class_size[1]=14
panli-reg_class_size[2]=26
panli-reg_class_size[3]=32
panli-reg_class_size[4]=32
panli-reg_class_size[5]=2
panli-reg_class_size[6]=1  <= VM
panli-reg_class_size[7]=31 <= VD
panli-reg_class_size[8]=32 <= V
panli-reg_class_size[9]=98
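The effect of the default hook on the dump above can be modelled in a few lines. This is a sketch only: the array copies the ten values printed above, the index-to-class labels (6 = VM, 7 = VD, 8 = V) follow the annotations in the dump, and the function name is hypothetical rather than GCC's own.

```c
/* reg_class_size values copied from the debug dump above.  */
static const unsigned reg_class_size_dump[] = {
  0, 14, 26, 32, 32, 2, 1, 31, 32, 98
};

/* Model of the default behaviour described in the comment: a class is
   treated as likely spilled when it contains exactly one register.  */
static int class_likely_spilled_model (unsigned rclass)
{
  return reg_class_size_dump[rclass] == 1;
}
```

With these numbers only index 6 (VM) is classified as likely spilled, which is why a mask-typed return value takes this path while VD and V do not.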

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #11 from Li Pan  ---
(In reply to Li Pan from comment #10)
> The #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)
> of the riscv backend doesn't honor vector mode.  Then the below part
> 
>  370 if (!targetm.calls.function_value_regno_p
> (copy_start))   
> 
>  371   copy_num = 0;
> 
>  372 else
>  373   copy_num = hard_regno_nregs (copy_start,
>  374    GET_MODE (copy_reg));
> 
> will have copy_num == 0 and then went to a different code path.
> 
> Let me run fully riscv regression test for this fix first.

I may have misunderstood this; I need to double-check the vector ABI for return
values.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #10 from Li Pan  ---
The #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN) of
the riscv backend doesn't honor vector mode.  Then the part below

  if (!targetm.calls.function_value_regno_p (copy_start))
    copy_num = 0;
  else
    copy_num = hard_regno_nregs (copy_start, GET_MODE (copy_reg));

will have copy_num == 0 and then go down a different code path.

Let me run the full riscv regression test for this fix first.
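
As a rough illustration of the branch quoted above, the sketch below models the macro and the copy_num computation outside of GCC. The register numbers, V_RETURN_ASSUMED and the *_model helpers are assumptions made up for this example; only the shape of the macro and of the if/else comes from the comment.

```c
/* Hypothetical register numbers, chosen only for this sketch; the real
   values come from the riscv backend headers.  */
enum { GP_RETURN = 10, FP_RETURN = 42, V_RETURN_ASSUMED = 104 };

/* Model of the macro quoted above: only the GP and FP return registers
   are accepted.  */
#define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)

/* Stand-in for hard_regno_nregs; the real function depends on the mode.  */
static int
nregs_model (int regno)
{
  (void) regno;
  return 1;
}

/* Mirrors the quoted branch: copy_num is 0 when the register is not
   recognised as a function value register.  */
static int
copy_num_model (int copy_start)
{
  if (!FUNCTION_VALUE_REGNO_P (copy_start))
    return 0;
  return nregs_model (copy_start);
}
```

Under these assumptions a value returned in a vector register yields copy_num == 0, while a0-style and fa0-style returns do not, which is the difference in code path described above.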

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #8 from Li Pan  ---
Found an even simpler reproducer.

#include 

extern unsigned long get_vl ();

vbool16_t test (vuint64m4_t a)
{
  unsigned long b;
  return __riscv_vmsne_vx_u64m4_b16 (a, b, get_vl ());
}

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-g++ -O3 -march=rv64gcv -c
ref.c -S -o -

Commit acc22d56e140220e7dc6c138918cb6754b6d1c0b enabled the vector ABI by
default and triggers this assert in create_pre_exit. Replacing get_vl () with a
local variable bypasses the issue. I will continue to investigate.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #7 from Li Pan  ---
Bisecting points to commit acc22d56e140220e7dc6c138918cb6754b6d1c0b; I will
take a look at it.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #5 from Li Pan  ---
(In reply to Kito Cheng from comment #4)
> Reduced case:
> ```c
> typedef long c;
> #pragma riscv intrinsic "vector"
> template  struct d {};
> struct e {
>   using f = d<0>;
> };
> struct g {
>   using f = e::f;
> };
> template  using h = g::f;
> template  long k(d);
> vbool16_t j(vuint64m4_t a) {
>   c b;
>   return __riscv_vmsne_vx_u64m4_b16(a, b, k(h()));
> }
> 
> ```

Thanks Kito, I reproduced it on the reduced case with "riscv64-unknown-elf-g++
-O2 -march=rv64gcv". I will take a look at it.


during RTL pass: mode_sw
test.c: In function ‘vbool16_t j(vuint64m4_t)’:
test.c:15:1: internal compiler error: in create_pre_exit, at mode-switching.cc:451
   15 | }
      | ^
0x3978f12 create_pre_exit
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/mode-switching.cc:451
0x3979e9e optimize_mode_switching
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/mode-switching.cc:849
0x397b9bc execute
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/mode-switching.cc:1324
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451

2024-04-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #3 from Li Pan  ---
Reproduced on my side too.

[Bug target/114352] RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114352

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Li Pan  ---
Fixed.

[Bug c/114352] RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114352

--- Comment #1 from Li Pan  ---
Test GCC version:

riscv64-unknown-elf-gcc (GCC) 14.0.1 20240315 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug c/114352] New: RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114352

Bug ID: 114352
   Summary: RISC-V: ICE when __attribute__((target("arch=+v")) and
build with rv64gc -O3
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Assume we have a sample code as below

void
__attribute__((target("arch=+v")))
add (int *a, int *b, int *out, unsigned count)
{
  unsigned i;

  for (i = 0; i < count; i++)
    out[i] = a[i] + b[i];
}

When built with -march=rv64gc -O3, there is an ICE as below:
test.c: In function ‘add’:
test.c:4:1: internal compiler error: Floating point exception
    4 | {
      | ^
0x1a5891b crash_signal
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:319
0x7f0a7884251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x1f51ba4 riscv_hard_regno_nregs
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
0x1967bb9 init_reg_modes_target()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/reginfo.cc:471
0x13fc029 init_emit_regs()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
0x1a5b83d target_reinit()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:1936
0x35e374d save_target_globals()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/target-globals.cc:92
0x35e381f save_target_globals_default_opts()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/target-globals.cc:122
0x1f544cc riscv_save_restore_target_globals(tree_node*)
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
0x1f55c36 riscv_set_current_function
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9477
0x1505be7 invoke_set_current_function_hook
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/function.cc:4690
0x1505f60 allocate_struct_function(tree_node*, bool)
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/function.cc:4813
0x1044e33 store_parm_decls()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-decl.cc:11084
0x10b8a54 c_parser_declaration_or_fndef
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:2975
0x10b62b7 c_parser_external_declaration
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:2046
0x10b5d2a c_parser_translation_unit
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:1900
0x110d5f4 c_parse_file()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:26889
0x11bd3f3 c_common_parse_file()

I prepared a script covering most of the vector arch combinations we will have;
the results are:
arch=+v Fail
arch=+zve32x Fail
arch=+zve32f Fail
arch=+zve64x Fail
arch=+zve64f Fail
arch=+zve64d Fail
arch=+zvl64b Pass
arch=+zvl128b Pass
arch=+zvl256b Pass
arch=+zvl4096b Pass
arch=+zve32x_zvl64b Fail
arch=+zve32x_zvl128b Fail
arch=+zve32x_zvl256b Fail
arch=+zve32x_zvl4096b Fail
arch=+zve32f_zvl64b Fail
arch=+zve32f_zvl128b Fail
arch=+zve32f_zvl256b Fail
arch=+zve32f_zvl4096b Fail
arch=+zve64x_zvl64b Fail
arch=+zve64x_zvl128b Fail
arch=+zve64x_zvl256b Fail
arch=+zve64x_zvl4096b Fail
arch=+zve64f_zvl64b Fail
arch=+zve64f_zvl128b Fail
arch=+zve64f_zvl256b Fail
arch=+zve64f_zvl4096b Fail
arch=+zve64d_zvl64b Fail
arch=+zve64d_zvl128b Fail
arch=+zve64d_zvl256b Fail
arch=+zve64d_zvl4096b Fail

The arch values that pass still do not vectorize the loop, whereas
-march=armv8-a -O3 with __attribute__((target("+sve2"))) does vectorize it.

I will try to fix this ICE soon.
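
The script itself is not part of this report. As a hedged sketch only, the arch strings in the list above could be enumerated like this; the array names, N_COMBOS and build_arch_list are hypothetical, and the extension names are taken from the list.

```c
#include <stdio.h>
#include <string.h>

/* Extension names as they appear in the result list above.  */
static const char *const zve_exts[] = {
  "zve32x", "zve32f", "zve64x", "zve64f", "zve64d"
};
static const char *const zvl_exts[] = {
  "zvl64b", "zvl128b", "zvl256b", "zvl4096b"
};

#define N_ZVE (sizeof zve_exts / sizeof zve_exts[0])
#define N_ZVL (sizeof zvl_exts / sizeof zvl_exts[0])
/* "v" alone, each zve, each zvl, and every zve_zvl pair.  */
#define N_COMBOS (1 + N_ZVE + N_ZVL + N_ZVE * N_ZVL)
#define ARCH_LEN 32

/* Fill OUT with the arch strings in the same order as the list above and
   return how many were written.  */
static size_t
build_arch_list (char out[][ARCH_LEN])
{
  size_t n = 0;

  snprintf (out[n++], ARCH_LEN, "arch=+v");
  for (size_t i = 0; i < N_ZVE; i++)
    snprintf (out[n++], ARCH_LEN, "arch=+%s", zve_exts[i]);
  for (size_t j = 0; j < N_ZVL; j++)
    snprintf (out[n++], ARCH_LEN, "arch=+%s", zvl_exts[j]);
  for (size_t i = 0; i < N_ZVE; i++)
    for (size_t j = 0; j < N_ZVL; j++)
      snprintf (out[n++], ARCH_LEN, "arch=+%s_%s", zve_exts[i], zvl_exts[j]);

  return n;
}
```

Each generated string would then be placed into the target attribute of the sample function and compiled with -march=rv64gc -O3 to obtain a Pass/Fail line.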

[Bug c/114351] New: RISC-V: ICE when __attribute__((target("arch=+v")) and build with rv64gc -O3

2024-03-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114351

Bug ID: 114351
   Summary: RISC-V: ICE when __attribute__((target("arch=+v")) and
build with rv64gc -O3
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Assume we have a sample code as below

void
__attribute__((target("arch=+v")))
add (int *a, int *b, int *out, unsigned count)
{
  unsigned i;

  for (i = 0; i < count; i++)
    out[i] = a[i] + b[i];
}

When built with -march=rv64gc -O3, there is an ICE as below:
test.c: In function ‘add’:
test.c:4:1: internal compiler error: Floating point exception
    4 | {
      | ^
0x1a5891b crash_signal
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:319
0x7f0a7884251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x1f51ba4 riscv_hard_regno_nregs
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
0x1967bb9 init_reg_modes_target()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/reginfo.cc:471
0x13fc029 init_emit_regs()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
0x1a5b83d target_reinit()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/toplev.cc:1936
0x35e374d save_target_globals()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/target-globals.cc:92
0x35e381f save_target_globals_default_opts()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/target-globals.cc:122
0x1f544cc riscv_save_restore_target_globals(tree_node*)
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
0x1f55c36 riscv_set_current_function
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9477
0x1505be7 invoke_set_current_function_hook
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/function.cc:4690
0x1505f60 allocate_struct_function(tree_node*, bool)
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/function.cc:4813
0x1044e33 store_parm_decls()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-decl.cc:11084
0x10b8a54 c_parser_declaration_or_fndef
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:2975
0x10b62b7 c_parser_external_declaration
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:2046
0x10b5d2a c_parser_translation_unit
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:1900
0x110d5f4 c_parse_file()
        /home/pli/gcc/333/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/c/c-parser.cc:26889
0x11bd3f3 c_common_parse_file()

I prepared a script covering most of the vector arch combinations we will have;
the results are:
arch=+v Fail
arch=+zve32x Fail
arch=+zve32f Fail
arch=+zve64x Fail
arch=+zve64f Fail
arch=+zve64d Fail
arch=+zvl64b Pass
arch=+zvl128b Pass
arch=+zvl256b Pass
arch=+zvl4096b Pass
arch=+zve32x_zvl64b Fail
arch=+zve32x_zvl128b Fail
arch=+zve32x_zvl256b Fail
arch=+zve32x_zvl4096b Fail
arch=+zve32f_zvl64b Fail
arch=+zve32f_zvl128b Fail
arch=+zve32f_zvl256b Fail
arch=+zve32f_zvl4096b Fail
arch=+zve64x_zvl64b Fail
arch=+zve64x_zvl128b Fail
arch=+zve64x_zvl256b Fail
arch=+zve64x_zvl4096b Fail
arch=+zve64f_zvl64b Fail
arch=+zve64f_zvl128b Fail
arch=+zve64f_zvl256b Fail
arch=+zve64f_zvl4096b Fail
arch=+zve64d_zvl64b Fail
arch=+zve64d_zvl128b Fail
arch=+zve64d_zvl256b Fail
arch=+zve64d_zvl4096b Fail

The arch values that pass still do not vectorize the loop, whereas
-march=armv8-a -O3 with __attribute__((target("+sve2"))) does vectorize it.

I will try to fix this ICE soon.

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #4 from Li Pan  ---
Hi Patrick,

Could you please double-check whether upstream has this problem, and likewise
for PR114198?

Thanks.

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #3 from Li Pan  ---
Testing a fix for possible regression.

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #2 from Li Pan  ---
This triggers the assert below in vectorizable_store; for this loop_vinfo,
use_partial, fully_masked and fully_lens are all true.

/* Shouldn't go with length-based approach if fully masked.  */
gcc_assert (!loop_lens || !loop_masks);

Introduced by this commit
https://github.com/gcc-mirror/gcc/commit/9fb832ce382d649b7687426e6bc4e5d3715cb78a#diff-97f675a4f401d6ec84d031e0d7259a0b6ba3b50eccc3fe483e9376becc9d9cf9
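
For readers unfamiliar with the condition, the assert above only requires that the length-based and mask-based controls are not both in use. A tiny standalone model of that condition follows; the function name and the use of plain pointers are assumptions for illustration, not the vectorizer's types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the combination satisfies the condition in the quoted
   assert: at most one of the two partial-vector controls is non-null.  */
static bool
partial_vector_invariant_holds (const void *loop_lens, const void *loop_masks)
{
  return !loop_lens || !loop_masks;
}
```

The ICE corresponds to the case where both arguments are non-null, that is, the loop is treated as fully masked and fully length-controlled at the same time.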

[Bug middle-end/114195] [14] RISC-V vector ICE: in vectorizable_store, at tree-vect-stmts.cc:8690

2024-03-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114195

--- Comment #1 from Li Pan  ---
Confirmed with
  1. build option '-march=rv64gcv -O3'.
  2. riscv64-unknown-elf-gcc (GCC) 14.0.1 20240306 (experimental).

If no one works on this ICE already, will take a look into it.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #4 from Li Pan  ---
I just tried a hack in the riscv backend: replace the expansion code of
reduc_smax_scal_ with that of reduc_xor_scal_.

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..58424baabd7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2107,10 +2107,8 @@ (define_expand "reduc_smax_scal_"
(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  int prec = GET_MODE_PRECISION (mode);
-  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), mode);
-  riscv_vector::expand_reduction (UNSPEC_REDUC_MAX, riscv_vector::REDUCE_OP,
-  operands, min);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_XOR, riscv_vector::REDUCE_OP,
+  operands, CONST0_RTX (mode));
   DONE;
 })

The idea is to check whether the last standard name should be .REDUC_XOR.

With this hack the tests (both the narrowed one and the original one) pass. That
may indicate that we pick .REDUC_MAX by mistake somewhere. Let me try to figure
it out.
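
To show why swapping the reduction changes the result, here is a small scalar model of the two reductions with the same start values the diff uses (the minimum signed value for max, zero for xor). The function names are made up for this sketch and it says nothing about how the backend expands them.

```c
#include <limits.h>
#include <stddef.h>

/* Scalar model of a signed max reduction, starting from INT_MIN as in the
   removed lines of the diff.  */
static int
reduce_smax (const int *v, size_t n)
{
  int acc = INT_MIN;
  for (size_t i = 0; i < n; i++)
    if (v[i] > acc)
      acc = v[i];
  return acc;
}

/* Scalar model of an xor reduction, starting from 0 as in the added lines.  */
static int
reduce_xor (const int *v, size_t n)
{
  int acc = 0;
  for (size_t i = 0; i < n; i++)
    acc ^= v[i];
  return acc;
}
```

For lanes holding { -3, 0, 0, 0 } the max reduction gives 0 while the xor reduction gives -3, so choosing the wrong reduction is enough to flip a check against -3.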

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #3 from Li Pan  ---
Narrowed down a little compared to the original test case.

---
int b[10][7] = {{}, // 0
{}, // 1
{}, // 2
{}, // 3
{}, // 4
{}, // 5
{0, 0, 0, 0, 0, 1}, // 6
{2, 3, 4, 5, 6, 7}, // 7
{8, 8, 8, 8, 8, 8}};// 8
   //0  1  2  3  4  5
int c;

int main() {
  int d = 0, a = 0;
  c = 0x;

  for (a = 0; a < 5; a++) {
    for (d = 0; d < 6; d++) {
      c ^= -3L;

      if (b[a + 3][d])
        continue;

      c = 0;
    }
  }

  if (c == -3) {
    return 0;
  } else {
    return 1;
  }
}
---

Semantically, the loop acts on a 5 * 6 matrix. Upstream currently vectorizes
the first 4 * 6 part and then goes scalar for the last 6 elements. The
vectorized part looks like below.

  vect_array.16 = .MASK_LEN_LOAD_LANES (  [(void *) + 84B],
32B, { -1, ... }, POLY_INT_CST [4, 4], 0);
  vect__28.17_94 = vect_array.16[0];
  vect__28.18_95 = vect_array.16[1];
  vect__28.19_96 = vect_array.16[2];
  vect__28.20_97 = vect_array.16[3];
  vect__28.21_98 = vect_array.16[4];
  vect__28.22_99 = vect_array.16[5];
  vect_array.16 ={v} {CLOBBER};
  mask__70.24_102 = vect__28.17_94 != { 0, ... };
  vect_prephitmp_76.25_104 = .VCOND_MASK (mask__70.24_102, { -1, ... }, { -3, ... });
  mask__80.26_106 = vect__28.18_95 != { 0, ... };
  vect_c_lsm.27_108 = .VCOND_MASK (mask__80.26_106, vect_prephitmp_76.25_104, { 0, ... });
  mask__51.28_110 = vect__28.19_96 != { 0, ... };
  vect_prephitmp_66.29_112 = .VCOND_MASK (mask__51.28_110, vect_c_lsm.27_108, { -3, ... });
  mask__16.30_114 = vect__28.20_97 != { 0, ... };
  vect_c_lsm.31_116 = .VCOND_MASK (mask__16.30_114, vect_prephitmp_66.29_112, { 0, ... });
  mask__79.32_118 = vect__28.21_98 != { 0, ... };
  vect_prephitmp_56.33_120 = .VCOND_MASK (mask__79.32_118, vect_c_lsm.31_116, { -3, ... });
  mask__25.34_122 = vect__28.22_99 != { 0, ... };
  vect_c_lsm.35_124 = .VCOND_MASK (mask__25.34_122, vect_prephitmp_56.33_120, { 0, ... });
  _126 = .REDUC_MAX (vect_c_lsm.35_124);

The last .REDUC_MAX looks like a surprise here. It is hard to derive REDUC_MAX
semantics from the source code, because c actually depends on the previous
iteration.

For example, if the b element is 0, c is reset to 0 on every iteration. If the
b element is non-zero, c follows a sequence like [-3, 0, -3, 0...].

I am not sure my understanding is correct; I will take a look at tree-vect.
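
The dependence on the previous iteration can be traced with a small scalar model of one inner-loop step. This is only a sketch of the loop body from the narrowed test case; step and run_row are hypothetical helper names, and the start value of c is passed in explicitly rather than taken from the test.

```c
#include <stddef.h>

/* One inner-loop step from the narrowed test case: toggle c with -3, then
   reset it to 0 when the matrix element is zero.  */
static int
step (int c, int b_elem)
{
  c ^= -3;
  if (b_elem)
    return c;
  return 0;
}

/* Apply the step across one row of N elements and return the final c.  */
static int
run_row (int c, const int *row, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c = step (c, row[i]);
  return c;
}
```

Starting from c == 0, a row of non-zero elements alternates between -3 and 0, while a row of zeros keeps c at 0. The final value therefore depends on the order of the elements, which a max over independent lanes does not capture.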

[Bug c/113696] RISC-V: ineffective vsetvl behavior

2024-02-19 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113696

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-06 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #18 from Li Pan  ---
Thanks for the confirmation.

Yes, it was before expand. I will prepare a patch for this; I expect it should
target gcc-15.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #16 from Li Pan  ---
I gave it a try as below and finally got the standard name "SAT_ADD". Could you
please double-check whether my understanding is correct?

Given the example code below:

typedef unsigned int uint32_t;

uint32_t
sat_add (uint32_t x, uint32_t y)
{
  return (x + y) | - ((x + y) < x);
}

Then add one simplify pattern to match.pd and define a new
DEF_INTERNAL_OPTAB_FN for it. Then we have the SAT_ADD representation after
expand.

uint32_t sat_add (uint32_t x, uint32_t y)
{
  uint32_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT

}

If everything goes well, I will prepare the patch for it later. Thanks.
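
For reference, the branchless form in the example can be checked against a straightforward saturating add. The sketch below is plain C and independent of the match.pd change; both function names are made up for illustration.

```c
#include <stdint.h>

/* Branchless form from the example above: when x + y wraps around, the
   comparison is 1, its negation is all-ones, and the OR saturates.  */
static uint32_t
sat_add_branchless (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}

/* Reference version with an explicit overflow check.  */
static uint32_t
sat_add_reference (uint32_t x, uint32_t y)
{
  return (y > UINT32_MAX - x) ? UINT32_MAX : x + y;
}
```

The two functions agree for inputs that do not overflow and both return UINT32_MAX when the sum would exceed 32 bits.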

[Bug target/113766] ICE: in generate_insn, at config/riscv/riscv-vector-builtins.cc:4186 with (invalid?) __riscv_vfredosum_tu()

2024-02-05 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113766

--- Comment #1 from Li Pan  ---
Thanks, I will take care of it.

[Bug target/112896] RISC-V: gcc.dg/pr30957-1.c run failure when rv64gcv_zvl1024b_zvfh_zfh

2024-02-04 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112896

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
This testcase is not well designed and has been removed from upstream; closing
this bug.

[Bug target/113697] RISC-V: Redundant vsetvl insn in function

2024-02-03 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113697

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-02 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #15 from Li Pan  ---
(In reply to Tamar Christina from comment #14)
> Awesome! Feel free to reach out if you need any help.
> 
> It’s likely easier to start with add and sub and get things pipe cleaned and
> expand incrementally than to try and do it all at once.

Cool, thanks in advance.

I will first try to map SAT_ADD to a direct optab as a POC, following your RFC
and suggestion. It looks like at least match.pd and internal-fn.def will be
touched. I am learning how match.pd works right now.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #13 from Li Pan  ---
I'll try to understand it and make it happen soon.

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-02-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #7 from Li Pan  ---
Reproducer for the RISC-V backend, built with "-march=rv64gcv_zba_zbb_zbc_zbs
--param=riscv-autovec-preference=fixed-vlmax -Ofast -ffast-math"

typedef unsigned short uint16_t;

void AAA (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
{
  unsigned m = 0, n = count;
  register uint16_t *p;

  p = x;

  do {
    m = *--p;
    *p = (uint16_t)(m >= wsize ? m-wsize : 0);
  } while (--n);

  n = wsize;
  p = y;

  do {
    m = *--p;
    *p = (uint16_t)(m >= wsize ? m-wsize : 0);
  } while (--n);
}
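
The expression in both loops is an unsigned saturating subtraction. As a sketch, it can be isolated into a helper and applied over an array in forward order; sat_sub_u16 and sat_sub_array are hypothetical names, and the forward loop is a simplification of the backward pointer walk in the reproducer.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating subtraction as written in the reproducer: clamp at 0 instead
   of wrapping when wsize is larger than m.  */
static uint16_t
sat_sub_u16 (unsigned m, unsigned wsize)
{
  return (uint16_t) (m >= wsize ? m - wsize : 0);
}

/* Apply the saturating subtraction in place to N elements.  */
static void
sat_sub_array (uint16_t *p, size_t n, unsigned wsize)
{
  for (size_t i = 0; i < n; i++)
    p[i] = sat_sub_u16 (p[i], wsize);
}
```

This is the pattern the bug title refers to: a vectorizer that recognizes it could use a saturating-subtract instruction instead of a compare and select.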

[Bug c/113697] New: RISC-V: Redundant vsetvl insn in function

2024-01-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113697

Bug ID: 113697
   Summary: RISC-V: Redundant vsetvl insn in function
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Given the sample code below, built with -march=rv64gcv -O3 -g0:

int foo (int * __restrict a, int n)
{
  int result = 0;
  for (int i = 0; i < n; i++)
    result += a[i];
  return result;
}

The asm code looks like below; there is one redundant vsetvl insn here.

foo:
.LFB0:
        .cfi_startproc
        ble     a1,zero,.L4
        vsetvli a5,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a1,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vadd.vv v1,v2,v1
        bne     a1,zero,.L3
        li      a5,0
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.s.x v2,a5
        vsetvli a5,zero,e32,m1,ta,ma  <== redundant vsetvl
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret

[Bug c/113696] New: RISC-V: ineffective vsetvl behavior

2024-01-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113696

Bug ID: 113696
   Summary: RISC-V: ineffective vsetvl behavior
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

Given we have a sample code, build with '-march=rv64gcv -O3 -g0'.


#include "riscv_vector.h"

void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond,
        size_t cond2)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i == cond) {
        vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
        *(vint8mf8_t*)(out + i + 100) = v;
      } else if (i == cond2) {
        vfloat32mf2_t v = *(vfloat32mf2_t*)(in + i + 200);
        *(vfloat32mf2_t*)(out + i + 200) = v;
      } else if (i == (cond2 - 1)) {
        vuint16mf2_t v = *(vuint16mf2_t*)(in + i + 300);
        *(vuint16mf2_t*)(out + i + 300) = v;
      } else {
        vint8mf4_t v = *(vint8mf4_t*)(in + i + 400);
        *(vint8mf4_t*)(out + i + 400) = v;
      }
    }
}

The generated asm code is shown below; the vsetvl insns are somewhat
inefficient and could be improved to some extent.

f:
.LFB0:
        .cfi_startproc
        beq     a2,zero,.L12
        addi    a7,a0,400
        addi    a6,a1,400
        addi    a0,a0,1600
        addi    a1,a1,1600
        li      a5,0
        addi    t6,a4,-1
        vsetvli t3,zero,e8,mf8,ta,ma
.L7:
        beq     a3,a5,.L15
        beq     a4,a5,.L16
        beq     t6,a5,.L17
        vsetvli t1,zero,e8,mf4,ta,ma
        vle8.v  v1,0(a0)
        vse8.v  v1,0(a1)
        vsetvli t3,zero,e8,mf8,ta,ma
.L4:
        addi    a5,a5,1
        addi    a7,a7,4
        addi    a6,a6,4
        addi    a0,a0,4
        addi    a1,a1,4
        bne     a2,a5,.L7
.L12:
        ret
.L15:
        vle8.v  v1,0(a7)
        vse8.v  v1,0(a6)
        j       .L4
.L17:
        vsetvli t1,zero,e8,mf4,ta,ma
        addi    t5,a0,-400
        addi    t4,a1,-400
        vle16.v v1,0(t5)
        vse16.v v1,0(t4)
        vsetvli t3,zero,e8,mf8,ta,ma
        j       .L4
.L16:
        addi    t5,a0,-800
        addi    t4,a1,-800
        vle32.v v1,0(t5)
        vse32.v v1,0(t4)
        j       .L4

[Bug target/113469] RISC-V: Illegal Insn for test case 920501-8.c when make linux for rv32

2024-01-25 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113469

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug c/113469] New: RISC-V: Illegal Insn for test case 920501-8.c when make linux for rv32

2024-01-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113469

Bug ID: 113469
   Summary: RISC-V: Illegal Insn for test case 920501-8.c when
make linux for rv32
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

The test case hits an illegal instruction with a `make linux` build of the
riscv-gnu-toolchain repo for rv32.

1. Build.
../__RISC-V_INSTALL___RV32/bin/riscv32-unknown-linux-gnu-gcc
gcc/testsuite/gcc.c-torture/execute/920501-8.c -march=rv32gcv -mabi=ilp32d
-mtune=rocket -mcmodel=medlow -fdiagnostics-plain-output -O2 -w -lm -o
./920501-8.elf -static

2. Run with qemu
../build-qemu/qemu-riscv32 -cpu rv32,vlen=512,v=true,vext_spec=v1.0
920501-8.elf
Illegal instruction (core dumped)

3. On entering the function __printf_buffer (which comes from libc.a), the
first insn looks like below
  __printf_buffer:
auipc a5,0x5f  => jumps directly to the vmv insn, where the illegal insn is hit.
...
vmv.v.i v1,0

4. After some investigation, the function __printf_buffer should be the
   function Xprintf_buffer in glibc/stdio-common/vfprintf-internal.c. You can
   use the below command to compile it.

   cd glibc/stdio-common/
../../__RISC-V_INSTALL___RV32/bin/riscv32-unknown-linux-gnu-gcc
vfprintf-internal.c  \
 -c -std=gnu11 -fgnu89-inline  -mcmodel=medlow -O2 -Wall -Wwrite-strings
-Wundef \
 -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common
-Wstrict-prototypes -Wold-style-definition  \
 -fmath-errno -fPIE   -ftls-model=initial-exec -I../include \

-I/home/pli/gcc/444/riscv-gnu-toolchain/build-glibc-linux-rv32gcv-ilp32d/stdio-common
 \
 -I/home/pli/gcc/444/riscv-gnu-toolchain/build-glibc-linux-rv32gcv-ilp32d 
-I../sysdeps/unix/sysv/linux/riscv/rv32 \
 -I../sysdeps/unix/sysv/linux/riscv  -I../sysdeps/riscv/nptl 
-I../sysdeps/unix/sysv/linux/generic/wordsize-32   \
 -I../sysdeps/unix/sysv/linux/generic  -I../sysdeps/unix/sysv/linux/include
-I../sysdeps/unix/sysv/linux  \
 -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu 
-I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  \
 -I../sysdeps/unix  -I../sysdeps/posix  \
 -I../sysdeps/riscv/rv32/rvd  -I../sysdeps/riscv/rv32/rvf  
-I../sysdeps/riscv/rvf \
 -I../sysdeps/riscv/rvd  -I../sysdeps/riscv/rv32  -I../sysdeps/riscv  \
 -I../sysdeps/ieee754/ldbl-128  -I../sysdeps/ieee754/dbl-64 
-I../sysdeps/ieee754/flt-32  \
 -I../sysdeps/wordsize-32   -I../sysdeps/ieee754  -I../sysdeps/generic \
 -I.. -I../libio -I. -nostdinc -isystem
/home/pli/gcc/444/riscv-gnu-toolchain/__RISC-V_INSTALL___RV32/lib/gcc/riscv32-unknown-linux-gnu/14.0.1/include
\
 -isystem
/home/pli/gcc/444/riscv-gnu-toolchain/__RISC-V_INSTALL___RV32/lib/gcc/riscv32-unknown-linux-gnu/14.0.1/include-fixed
  \
 -isystem /home/pli/gcc/444/riscv-gnu-toolchain/linux-headers/include \
 -D_LIBC_REENTRANT -include
/home/pli/gcc/444/riscv-gnu-toolchain/build-glibc-linux-rv32gcv-ilp32d/libc-modules.h
\
 -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC  \
 -DTOP_NAMESPACE=glibc -D_IO_MTSAFE_IO -o test.o

[Bug target/110265] RISC-V: ICE when build RVV intrinsic integer reduction with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.

2024-01-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110265

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Li Pan  ---
Fixed.

[Bug target/109615] Redundant VSETVL after optimized code of RVV

2024-01-17 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109615

Li Pan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Li Pan  ---
Fixed.

[Bug target/113393] RISC-V: Full coverage test bugs for upstream 20240112

2024-01-15 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113393

Li Pan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Li Pan  ---
Closed as fixed in upstream and validated.

[Bug c/113393] New: RISC-V: Full coverage test bugs for upstream 20240112

2024-01-14 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113393

Bug ID: 113393
   Summary: RISC-V:  Full coverage test bugs for upstream 20240112
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

For the RV64 parts

Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/pr68532.c   -O0  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test

Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

Running target
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

Running target
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247

--- Comment #10 from Li Pan  ---
(In reply to Robin Dapp from comment #9)
> I also noticed this (likely unwanted) vector snippet and wondered where it
> is being created.  First I thought it's a vec_extract but doesn't look like
> it.  I'm going to check why we create this.
> 
> Pan, the test was on real hardware I suppose?  

Yes.

> So regardless of the fact
> that we likely want to get rid of the snippet above, would you mind checking
> whether generic-ooo has any effect on performance?  Maybe you could try
> -march=rv64gc -mtune=generic-ooo.  Thanks.

Sure thing. Actually I have some performance data that is under review to make
sure it aligns with company policy before I share it with the community. I will
add a new column for generic-ooo.

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #8 from Li Pan  ---
The performance ratio of sha-test compared to scalar is about -2.2% when built
with -mtune=generic-ooo.

In other words, the option -mtune=generic-ooo improves the ratio (compared to
scalar) from -70% to -2.2%. I suppose the remaining negative ratio may be caused
by the part mentioned by Juzhe.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #29 from Li Pan  ---
(In reply to Patrick O'Neill from comment #27)
> Linking the discussion/plan here since more interested people are CCd here.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113206#c9
> Using 4a0a8dc1b88408222b88e10278017189f6144602, the spec run failed on:
> zvl128b (All runtime fails):
> 527.cam4 (Runtime)
> 531.deepsjeng (Runtime)
> 521.wrf (Runtime)
> 523.xalancbmk (Runtime)
> 
> zvl256b:
> 507.cactuBSSN (Runtime)
> 521.wrf (Build)
> 527.cam4 (Runtime)
> 531.deepsjeng (Runtime)
> 549.fotonik3d (Runtime)
> 
> With that info I think the next steps are:
> 1. Triage the zvl256b 521.wrf build failure
> 2. Bisect the newly-failing testcases
> 3. Finish triaging the remaining testcases the fuzzer found
> 4. Attempt to manually reduce cam4 for zvl128b (since it seems to have the
> fastest build+runtime)
> 5. Attempt to manually reduce other fails.

Hi Patrick,

Thanks a lot for the summary. Could you please share some more information
about the spec2017 setup for the above data, such as the data set (test, train,
or ref), the environment (qemu, spike, or hardware), and the spec config file?
I would just like to make sure we are on the same page for the failures and
that others can reproduce them.

Thanks again. Pan

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-12 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

--- Comment #19 from Li Pan  ---
(In reply to Robin Dapp from comment #7)
> Here
> 
> 0x105c6   vse8.v  v8,(a5)
> 
> is where we overwrite m.  The vl is 128 but the preceding vsetvl gets a4 =
> 46912504507016 as AVL which seems already broken.

I can reproduce this up to a point.

0x10282   vsetvli zero,a4,e8,m8,ta,ma

(gdb) p $a4
$2 = 110736

Looks like 110736 is not the correct vl here, will continue to investigate.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-11 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

--- Comment #18 from Li Pan  ---
I see, thanks all, will have a try with variadic function call.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

--- Comment #12 from Li Pan  ---
(In reply to Patrick O'Neill from comment #0)
> Testcase:
> int printf(char *, ...);
> int a, b, l, i, p, q, t, n, o;
> int *volatile c;
> static int j;
> static struct pack_1_struct d;
> long e;
> char m = 5;
> short s;
> #pragma pack(1)
> struct pack_1_struct {
>   long c;
>   int d;
>   int e;
>   int f;
>   int g;
>   int h;
>   int i;
> } h, r = {1}, *f = , *volatile g;
> int main() {
>   int u;
>   j = 0;
>   for (; j < 9; ++j) {
>     u = ++t ? a : 0;
>     if (u) {
>       int *v = 
>       *v = g || e;
>       *c = 0;
>       *f = h;
>     }
>     s = l && c;
>     o = i;
>     d.f || (p = 0);
>     q |= n;
>   }
>   r = *f;
>   printf("b: %d\n", b);
>   printf("m: %d\n", m);
> }
> 
> Commands:
> rv64gc:
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/riscv64-unknown-linux-gnu-gcc
> >  -march=rv64gc -mabi=lp64d -O3 red.c -o rv64gc.out
> > QEMU_CPU=rv64,vlen=128,v=true,vext_spec=v1.0 
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/qemu-riscv64 rv64gc.out
> b: 0
> m: 5
> 
> rv64gcv:
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/riscv64-unknown-linux-gnu-gcc
> >  -march=rv64gcv -mabi=lp64d -O3 red.c -o rv64gcv.out
> > QEMU_CPU=rv64,vlen=128,v=true,vext_spec=v1.0 
> > /scratch/tc-testing/tc-dec-8-trunk/build-rv64gcv/bin/qemu-riscv64 
> > rv64gcv.out
> b: 0
> m: 0
> 
> Nothing touches the m variable so at the end it should equal 5.
> 
> Commenting out the preceding printf("b: %d\n", b); statement causes the
> testcase to pass successfully (and doesn't cause much change to the
> assembly):
> https://godbolt.org/z/Erzzqxo8q

Could you please share the GCC commit id used for the above test? I would like
to double-check whether upstream still has this issue.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-10 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #11 from Li Pan  ---
(In reply to JuzheZhong from comment #8)
> Li Pan will investigate it. He will notify me if there is a bug in the vsetvl pass.

Interestingly, I cannot fully reproduce this with build 20231210.

PASS >> ../build-qemu/qemu-riscv64 -cpu rv64,vlen=128,v=true,vext_spec=v1.0
test.rv64gc.elf

FAIL ../build-qemu/qemu-riscv64 -cpu rv64,vlen=128,v=true,vext_spec=v1.0
test.gcv.elf
Segmentation fault (core dumped)

The test passes when built with rv64gc but hits a segmentation fault in printf
when built with rv64gcv. I therefore adjusted the test case to bypass the
segmentation fault, after which the rv64gcv build passes. The modified test
case is shown below.

qemu-riscv64 version 8.1.92 (v8.2.0-rc2-48-gd451e32ce8).
newlib for gcc build.

Modified test case:

int a, b, l, i, p, q, t, n, o;
int *volatile c;
static int j;
static struct pack_1_struct d;
long e;
char m = 5;
short s;

#pragma pack(1)
struct pack_1_struct {
  long c;
  int d;
  int e;
  int f;
  int g;
  int h;
  int i;
} h, r = {1}, *f = , *volatile g;

int main() {
  int u;
  j = 0;

  for (; j < 9; ++j) {
    u = ++t ? a : 0;
    if (u) {
      int *v = 
      *v = g || e;
      *c = 0;
      *f = h;
    }
    s = l && c;
    o = i;
    d.f || (p = 0);
    q |= n;
  }

  r = *f;

  if (m == 5) // Reference m like print
    return 0;

  return 1234;
}

[Bug c/112896] New: RISC-V: gcc.dg/pr30957-1.c run failure when rv64gcv_zvl1024b_zvfh_zfh

2023-12-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112896

Bug ID: 112896
   Summary: RISC-V: gcc.dg/pr30957-1.c run failure when
rv64gcv_zvl1024b_zvfh_zfh
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

The gcc.dg/pr30957-1.c test case fails in the RISC-V backend when built with
the options below.

-march=rv64gcv_zvl1024b_zvfh_zfh -mabi=lp64d  -O2 -mcmodel=medlow
--param=riscv-autovec-preference=fixed-vlmax -funroll-loops -fassociative-math
-fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller
-fdump-rtl-expand-details -lm gcc/testsuite/gcc.dg/pr30957-1.c -o test.elf

The test gcc/testsuite/gcc.dg/pr30957-1.c looks similar to the following.

float __attribute__((noinline))
foo (float d, int n)
{
  unsigned i;
  float accum = d;

  for (i = 0; i < n; i++)
    accum += d;

  return accum;
}

int
main ()
{
  /* When compiling standard compliant we expect foo to return -0.0.  But the
     variable expansion during unrolling optimization (for this testcase enabled
     by non-compliant -fassociative-math) instantiates copy(s) of the
     accumulator which it initializes with +0.0.  Hence we expect that foo
     returns +0.0.  */
  if (__builtin_copysignf (1.0, foo (0.0 / -5.0, 10)) != 1.0)
    abort ();
  exit (0);
}

An initial investigation shows that the RISC-V backend always gets LPT_NONE in
unroll_loops: the loop step becomes dynamic after vectorization, so the simple
loop flag is false, and the unroll_loops pass then does nothing for a
non-simple loop.

We may need further investigation for this case.

[Bug target/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-12-02 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

--- Comment #6 from Li Pan  ---
Double confirmed the riscv-gnu-toolchain can be built successfully with the
latest newlib.

[Bug target/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-11-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

--- Comment #4 from Li Pan  ---
There may be another ICE for zve32f; I will double-check the details.

[Bug c/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-11-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Thanks Juzhe, will take a look at this issue and keep you posted.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #17 from Li Pan  ---
(In reply to Robin Dapp from comment #15)
> Does the =m fix your issue?  Or is the code gen different then and we're
> just lucky?  For my problem it doesn't help because we still don't recognize
> an alias between load and store and the load is moved.
> 
> Richi's RFC patch from a while ago helps, though.  I cc'd you.

No, it only happens to work. I double-checked the asm layout; the alias between
the scalar load and the vector store is still reported as false.

   1016e:   158000ef   jal      102c6 
   10172:   ffc50793   add      a5,a0,-4
   10176:   4689       li       a3,2
   10178:   0d047057   vsetvli  zero,s0,e32,m1,ta,ma
   1017c:   40d8       lw       a4,4(s1)              <= LOAD
   1017e:   5e00b0d7   vmv.v.i  v1,1
   10182:   74d1a423   sw       a3,1864(gp) # 13398 
   10186:   0207e0a7   vse32.v  v1,(a5)               <= STORE
   1018a:   03271163   bne      a4,s2,101ac 

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #14 from Li Pan  ---
The diff below, similar to the x86 workaround, does not seem to work unless we
also change the `+m` to `=m`. I have not fully tested the impact of this change
beyond the case itself.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 935eeb7fd8e..882fc8fe5ec 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -85,6 +85,9 @@ (define_c_enum "unspec" [

   ;; String unspecs
   UNSPEC_STRLEN
+
+  ;; test
+  UNSPEC_MASKSTORE
 ])

 (define_c_enum "unspecv" [
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ba9c9e5a9b6..2f74cec51d1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1738,16 +1738,17 @@ (define_insn_and_split "*pred_mov"
 ;; Dedicated pattern for vse.v instruction since we can't reuse pred_mov pattern to include
 ;; memory operand as input which will produce inferior codegen.
 (define_insn "@pred_store"
-  [(set (match_operand:V 0 "memory_operand" "+m")
-   (if_then_else:V
- (unspec:
-   [(match_operand: 1 "vector_mask_operand" "vmWc1")
-(match_operand 3 "vector_length_operand""   rK")
-(match_operand 4 "const_int_operand""i")
-(reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operand:V 2 "register_operand" "vr")
- (match_dup 0)))]
+  [(set (match_operand:V 0 "memory_operand" "=m")
+   (unspec:V
+ [(if_then_else:V
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand" "vmWc1")
+  (match_operand 3 "vector_length_operand""   rK")
+  (match_operand 4 "const_int_operand""i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (match_operand:V 2 "register_operand" "vr")
+   (match_dup 0))] UNSPEC_MASKSTORE))]
   "TARGET_VECTOR"
   "vse.v\t%2,%0%p1"
   [(set_attr "type" "vste")

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #12 from Li Pan  ---
Hi Robin,

Do you have any ideas about the possible fix for this issue? The x86 backend
has one workaround for this issue as below.

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=dbf8ab449417aa24669f6ccf50be8c17f8c1278e

Unfortunately, after a quick try, it does not look suitable for riscv because
of the define_insn below:
 (define_insn "@pred_store"
   [(set (match_operand:V 0 "memory_operand" "+m") // "=m" here for x86 SSE.

Given the current stage of GCC, I am not sure whether we should fix it (or
bypass it) in the backend or fix it in the middle end.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-26 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #10 from Li Pan  ---
Link to one similar issue as below.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-23 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #9 from Li Pan  ---
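Before tracer
-------------

                 ENTRY
                   |
               +-------+
               |  B2   |
               +-------+
                /     \
            a < 2     a >= 2
              /         \
   +-----------+      +------------+
   | vec store |----->| _3 = b[1]  |
   +-----------+      +------------+
                        /        \
                   _3 != 1     _3 == 1
                      /            \
              +--------+        +----------+
              | abort  |        | return 0 |
              +--------+        +----------+

After tracer
------------

                          ENTRY
                            |
                        +-------+
                        |  B2   |
                        +-------+
                         /     \
                     a < 2     a >= 2
                       /         \
             +-------------+   +------------+
             | vec store   |   | _3 = b[1]  |
             |             |   +------------+
after tracer |             |     /        \
             |             | _3 != 1    _3 == 1
             | _31 = b[1]  |   /            \
             +-------------+ +--------+  +----------+
                  |--------->| abort  |  | return 0 |<---|
                  |          +--------+  +----------+    |
                  |                                      |
                  |--------------------------------------|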

After tracer, the vector store and the scalar load end up in the same basic
block and, unfortunately, reference the same memory address. The sched1 pass
then moves the scalar load before the vector store, which breaks the required
memory access sequence.

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-22 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #8 from Li Pan  ---
For gcc.dg/torture/pr58955-2.c, we can reproduce it with the following options:

Pass when: -O3
Pass when: -O3 -ftracer -fno-schedule-insns -fno-schedule-insns2
Fail when: -O3 -ftracer -fno-schedule-insns2

   10154:   4409   li   s0,2
   10156:   9c1d   subw s0,s0,a5
   10158:   1402   sll  s0,s0,0x20
   1015a:   9001   srl  s0,s0,0x20
   1015c:   97ca   add  a5,a5,s2
   1015e:   078a   sll  a5,a5,0x2
   10160:   7b018493   add  s1,gp,1968 # 13400 
   10164:   97a6   add  a5,a5,s1
   10166:   00241613   sll  a2,s0,0x2
   1016a:   853e   mv   a0,a5
   1016c:   4581   li   a1,0
   1016e:   158000ef   jal  102c6 
   10172:   ffc50793   add  a5,a0,-4
   10176:   4689   li   a3,2
   10178:   0d047057   vsetvli  zero,s0,e32,m1,ta,ma
   1017c:   40d8   lw   a4,4(s1)<== Load
   1017e:   5e00b0d7   vmv.v.i  v1,1
   10182:   74d1a423   sw   a3,1864(gp) # 13398 
   10186:   0207e0a7   vse32.v  v1,(a5) <== Store
   1018a:   03271163   bne  a4,s2,101ac 

It looks like the combination of tracer and sched1 causes the failure; AFAIK it
is a typical load-before-store issue. Semantically, the lw load must come after
the vse32 store, but sched1 moves it before the store, so the value of a4 is
not the expected one.

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-11-22 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #31 from Li Pan  ---
We still have some unnecessary stack-related code here; I will take care of it
in another patch.

After this patch:
test:
  lui     a5,%hi(.LANCHOR0)
  addi    a5,a5,%lo(.LANCHOR0)
  li      a4,32
  addi    sp,sp,-32   <== unnecessary insn
  vsetvli zero,a4,e8,m1,ta,ma
  vle8.v  v1,0(a5)
  vs1r.v  v1,0(a0)
  addi    sp,sp,32    <== unnecessary insn
  jr      ra

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-11-20 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #21 from Li Pan  ---
(In reply to Robin Dapp from comment #18)
> I did a quick testsuite run on rv32 and can confirm that this fixes the
> issue for me.

Confirmed that this fixes the issue on RV64 too.

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-09 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

Li Pan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Li Pan  ---
The FLOATN support patch has already been merged to trunk; the builtins below
have FLOATN support now.

1. lrint
2. lround
3. llrint
4. llround

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

--- Comment #5 from Li Pan  ---
(In reply to Li Pan from comment #4)
> (In reply to Richard Biener from comment #3)
> > Ah, yes, for lrint we have the builtins - I just looked for lceil here.  So
> > yeah, where there are DEF_EXT_LIB_FLOATN_NX_BUILTINS we should have
> > DEF_INTERNAL_FLT_FLOATN_FN.
> 
> Thanks Richard, I will have a try for this change.

After double-checking, the related definitions are listed below.

          glibc  GCC-FLOATN_NX_BUILTINS
iceil     N      N
ifloor    N      N
irint     N      N
iround    N      N

lceil     N      N
lfloor    N      N
lrint     Y      Y
lround    Y      Y

llceil    N      N
llfloor   N      N
llrint    Y      Y
llround   Y      Y

We only need to support lrint/lround/llrint/llround for FLOATN for now.

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-08 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

--- Comment #4 from Li Pan  ---
(In reply to Richard Biener from comment #3)
> Ah, yes, for lrint we have the builtins - I just looked for lceil here.  So
> yeah, where there are DEF_EXT_LIB_FLOATN_NX_BUILTINS we should have
> DEF_INTERNAL_FLT_FLOATN_FN.

Thanks Richard, I will have a try for this change.

[Bug c/112432] Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

--- Comment #2 from Li Pan  ---
(In reply to Richard Biener from comment #1)
> Is there a corresponding C API?  We don't have "generic" versions in
> builtins.def either (with _VAR).
> 
> That said, what's the testcase here?

I found some FLOATN-like APIs in the glibc documentation, for example with N = 16.

long int lrintfN (_FloatN x);
long int lroundfN (_FloatN x);

https://www.gnu.org/software/libc/manual/2.38/html_mono/libc.html

The context is the auto-vectorization of lrintf and lrintf16, for example as
below.

void
test_lrintf16 (long *out, _Float16 *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf16 (in[i]);
}

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

We get GIMPLE similar to the following when compiling with
"-march=rv64gcv_zvfh_zfh -mabi=lp64d -O3 -ftree-vectorize -ffast-math".
void
test_lrintf16 (long *out, _Float16 *in, unsigned count)
{
  # ivtmp.8_28 = PHI  
  # ivtmp.9_25 = PHI 
  _22 = (void *) ivtmp.8_28;
  _4 = MEM[(_Float16 *)_22];
  _7 = __builtin_lrintf16 (_4);
  _21 = (void *) ivtmp.9_25;
  MEM[(long int *)_21] = _7;
  ivtmp.8_27 = ivtmp.8_28 + 2;
  ivtmp.9_24 = ivtmp.9_25 + 8;
}

void
test_lrintf (long *out, float *in, unsigned count)
{
  # ivtmp.37_32 = PHI 
  # ivtmp.40_26 = PHI 
  _23 = (void *) ivtmp.37_32;
  vect__4.21_40 = MEM  [(float *)_23];
  vect__7.22_41 = .LRINT (vect__4.21_40); // Expand lrint
  _22 = (void *) ivtmp.40_26;
  MEM  [(long int *)_22] = vect__7.22_41;
  ivtmp.37_48 = ivtmp.37_32 + 64;
  ivtmp.40_25 = ivtmp.40_26 + 128;
}

[Bug c/112432] New: Internal-fn: The [i|l|ll]rint family don't support FLOATN

2023-11-07 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112432

Bug ID: 112432
   Summary: Internal-fn: The [i|l|ll]rint family don't support
FLOATN
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pan2.li at intel dot com
  Target Milestone: ---

The [i|l|ll]rint family is defined with DEF_INTERNAL_FLT_FN instead of
DEF_INTERNAL_FLT_FLOATN_FN in internal-fn.def. Thus, a standard name like
lrint cannot be expanded when the _Float16 type is given.

Is there any reason or background for [i|l|ll]rint not honoring FLOATN? All
related fn definitions are listed below.

DEF_INTERNAL_FLT_FN (ICEIL, ECF_CONST, lceil, unary_convert)
DEF_INTERNAL_FLT_FN (IFLOOR, ECF_CONST, lfloor, unary_convert)
DEF_INTERNAL_FLT_FN (IRINT, ECF_CONST, lrint, unary_convert)
DEF_INTERNAL_FLT_FN (IROUND, ECF_CONST, lround, unary_convert)
DEF_INTERNAL_FLT_FN (LCEIL, ECF_CONST, lceil, unary_convert)
DEF_INTERNAL_FLT_FN (LFLOOR, ECF_CONST, lfloor, unary_convert)
DEF_INTERNAL_FLT_FN (LRINT, ECF_CONST, lrint, unary_convert)
DEF_INTERNAL_FLT_FN (LROUND, ECF_CONST, lround, unary_convert)
DEF_INTERNAL_FLT_FN (LLCEIL, ECF_CONST, lceil, unary_convert)
DEF_INTERNAL_FLT_FN (LLFLOOR, ECF_CONST, lfloor, unary_convert)
DEF_INTERNAL_FLT_FN (LLRINT, ECF_CONST, lrint, unary_convert)
DEF_INTERNAL_FLT_FN (LLROUND, ECF_CONST, lround, unary_convert)

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-11-01 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #27 from Li Pan  ---
Hi Richard and Juzhe.

I investigated this issue recently and noticed that it may be related to the
array size of the constant memory. Assume we have 2 functions as below.

vuint8m1_t fn_0 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m1(arr, 32);
}

vuint8m2_t fn_1 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m2(arr, 32);
}

The vuint8m1 version ends up with stack variables but the vuint8m2 version does
not, so I guessed that some optimization has a limitation here. I eventually
tracked it down to extract_low_bits, called from get_stored_val in dse. It
looks like it can only handle scalar modes when the nunits are not equal.

rtx extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
{
  ...
  if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
      || !int_mode_for_mode (mode).exists (&int_mode))
    return NULL_RTX;
  ...
}

I tried allowing vector modes for gen_lowpart here if and only if the size of
mode is not greater than that of src_mode. Up to a point, this eliminates the
stack variables for the above functions, as expected.

I ran the RVV regression tests and they look good for now, but I would like to
confirm with you that this approach is reasonable before we do more testing. ;)

Thanks.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #8 from Li Pan  ---
Still fails upstream.

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc -march=rv64imafdcv -mabi=lp64d \
  -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax \
  --param riscv-autovec-lmul=dynamic --param vect-epilogues-nomask=0 \
  -ffast-math -lm \
  gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c \
  -o test.elf

../build-qemu/qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v1.0
test.elf
assertion "dest_int32_t_int8_t[i * 2] == (src_int32_t_int8_t
[index_int32_t_int8_t[i * 2]] + 1)"
  failed: file
"gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c",
line 45, function: main

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 14.0.0 20231021 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-31 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #7 from Li Pan  ---
No luck with --param vect-epilogues-nomask=0. I will retry with the newest
upstream to see whether the issue is still there, and keep you posted.

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc -march=rv64imafdcv -mabi=lp64d \
  -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax \
  --param riscv-autovec-lmul=dynamic --param vect-epilogues-nomask=0 \
  -ffast-math -lm \
  gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c \
  -o test.elf

../build-qemu/qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v1.0
test.elf
assertion "dest_int32_t_int8_t[i * 2] == (src_int32_t_int8_t
[index_int32_t_int8_t[i * 2]] + 1)" failed: \
  file
"gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c",
line 45, function: main

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 14.0.0 20231019 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-27 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #5 from Li Pan  ---
Thank you. If there is anything I can help with, please feel free to let me know.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-25 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #3 from Li Pan  ---
Double confirmed the trunk of GCC still has this issue.

[Bug tree-optimization/111970] [14 regression] SLP for non-IFN gathers result in RISC-V test failure on gather since r14-4745-gbeab5b95c58145

2023-10-24 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #2 from Li Pan  ---
Add more information about how to build and run the test cases.

Build:

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc -march=rv64imafdcv
-mabi=lp64d -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax
--param riscv-autovec-lmul=dynamic -ffast-math -lm
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
-o test.elf

Run:

qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v1.0  test.elf
assertion "dest_float_uint8_t[i * 2] == (src_float_uint8_t
[index_float_uint8_t[i * 2]] + 1)" failed: file
"gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c",
line 106, function: main
