[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-11 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #17 from chenglulu  ---
(In reply to Xi Ruoyao from comment #16)
> Should we do a backport to 13 and 14?

Yes, I'm testing the backport now.

[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-11 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

chenglulu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #15 from chenglulu  ---
Resolved.

[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-03 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #13 from chenglulu  ---
(In reply to Xi Ruoyao from comment #12)
> (In reply to chenglulu from comment #11)
> > (In reply to chenglulu from comment #7)
> > > (In reply to Xi Ruoyao from comment #4)
> > > > Reduced more:
> > > > 
> > > > long double
> > > > test (long double xx)
> > > > {
> > > >   __asm ("" :: "f"(xx));
> > > >   return xx + 1;
> > > > }
> > > > 
> > > > and this one fails at -O2 & -O3 too.
> > > 
> > > I'm not sure if this should be an error or not...
> > 
> > So for such an inline assembly, we prefer a compiler output type mismatch
> > error, similar to riscv?
> 
> I agree.  But I'm slightly concerned if the ICE will pop out again in some
> cases using a pair of GPR for TFmode or TImode (I cannot see a difference...)
I have already removed TImode, so just letting
TARGET_HARD_REGNO_MODE_OK (TFmode, fpreg) return false here should be no
problem.
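
As a rough illustration of the decision above, the following is a minimal,
self-contained sketch and not the actual LoongArch backend code. All names
(sketch_mode, sketch_hard_regno_mode_ok, the SKETCH_* macros) and the register
numbering are assumptions made for the example; the real hook applies more
checks than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC's machine modes.  */
enum sketch_mode { SKETCH_SFmode, SKETCH_DFmode, SKETCH_TFmode };

/* Assumed register numbering for this sketch only.  */
#define SKETCH_FP_REG_FIRST 32
#define SKETCH_FP_REG_LAST  63

static bool
sketch_fp_reg_p (unsigned int regno)
{
  return regno >= SKETCH_FP_REG_FIRST && regno <= SKETCH_FP_REG_LAST;
}

/* Model of the idea discussed in the thread: a TFmode value is never
   allowed in a floating-point register, so it can only end up in
   general-purpose registers.  */
bool
sketch_hard_regno_mode_ok (unsigned int regno, enum sketch_mode mode)
{
  assert (regno <= SKETCH_FP_REG_LAST);

  if (sketch_fp_reg_p (regno))
    return mode == SKETCH_SFmode || mode == SKETCH_DFmode;

  /* Simplification: every mode is accepted in a general-purpose
     register in this sketch.  */
  return true;
}
```

With such a predicate, sketch_hard_regno_mode_ok (32, SKETCH_TFmode) is false,
while the same mode in a general-purpose register number such as 4 is accepted.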

[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-03 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #11 from chenglulu  ---
(In reply to chenglulu from comment #7)
> (In reply to Xi Ruoyao from comment #4)
> > Reduced more:
> > 
> > long double
> > test (long double xx)
> > {
> >   __asm ("" :: "f"(xx));
> >   return xx + 1;
> > }
> > 
> > and this one fails at -O2 & -O3 too.
> 
> I'm not sure if this should be an error or not...

So for such an inline assembly, we prefer a compiler output type mismatch
error, similar to riscv?

[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-03 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #2)
> 
> > On LA, if mode is TFmode and regno is the number of the floating-point
> > register, can this hook return true, or must it return false?
> 
> To me it can return true, but there's a different question "do we want it
> return true?".  I cannot see a benefit to store TFmode into a pair of FPRs
> anyway...
> 
> Maybe we need to exclude the last FPR (??) but even if I exclude the last
> FPR the ICE still triggers.

I think the same way. TFmode calculations are emulated by soft float, so
putting the value in a pair of general-purpose registers is probably what we
want.

[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-03 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #7 from chenglulu  ---
(In reply to Xi Ruoyao from comment #4)
> Reduced more:
> 
> long double
> test (long double xx)
> {
>   __asm ("" :: "f"(xx));
>   return xx + 1;
> }
> 
> and this one fails at -O2 & -O3 too.

I'm not sure if this should be an error or not...

[Bug target/115752] [13/14/15 Regression] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-03 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #6 from chenglulu  ---
(In reply to Xi Ruoyao from comment #5)
> Interestingly, not happening with 12.

The error first appears with r15-1765, but I think the underlying problem
already existed before that; it just was not exposed.

[Bug c/115752] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-02 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #2 from chenglulu  ---
There is one thing I don't understand. The description of
TARGET_HARD_REGNO_MODE_OK is as follows:

This hook returns true if it is permissible to store a value of mode mode in
hard register number regno (or in several registers starting with that one).
The default definition returns true unconditionally.

On LA, if mode is TFmode and regno is a floating-point register number, can
this hook return true, or must it return false?

[Bug c/115752] [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-02 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

--- Comment #1 from chenglulu  ---
test.c

extern long double test1 (long double);
extern long double test2 (long double);

long double
__ieee754_y1l (long double x, long double xx,
               long double z, long double p)
{
  if (xx <= 2)
    {
      __asm ("" : "+f"(x));
      p = test1 (x) * test2 (x) + p;
      return p;
    }
  return z;
}

$ gcc/cc1 test.c -o - -O1
.file   "test.c"
 __ieee754_y1l
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 908k}  {heap 908k} 
{heap 908k}  {heap 1180k}  {heap 1784k}
 {heap 1784k}  {heap 1784k}Streaming LTO
  {heap 1784k}  {heap 1784k}  {heap
1784k}  {heap 1784k}  {heap 1784k}  {heap 1784k}
 {heap 1784k}  {heap 1784k}  {heap
1784k}  {heap 1784k}   .text
Assembling functions:
 __ieee754_y1lduring RTL pass: reload

test.c: In function '__ieee754_y1l':
test.c:16:1: internal compiler error: maximum number of generated reload insns per insn achieved (90)
   16 | }
      | ^
0x11d5abd lra_constraints(bool)
        /home/chenglulu/work/loongisa-toolchain/new_toolchain/src/gcc-upstream-test/gcc/lra-constraints.cc:5402
0x11bf43f lra(_IO_FILE*, int)
        /home/chenglulu/work/loongisa-toolchain/new_toolchain/src/gcc-upstream-test/gcc/lra.cc:2442
0x116c862 do_reload
        /home/chenglulu/work/loongisa-toolchain/new_toolchain/src/gcc-upstream-test/gcc/ira.cc:5973
0x116cd16 execute
        /home/chenglulu/work/loongisa-toolchain/new_toolchain/src/gcc-upstream-test/gcc/ira.cc:6161
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug c/115752] New: [loongarch -O1] ICE: maximum number of generated reload insns per insn achieved (90)

2024-07-02 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115752

Bug ID: 115752
   Summary: [loongarch -O1] ICE: maximum number of generated
reload insns per insn achieved (90)
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
  Target Milestone: ---

[loongarch -O1] ICE: maximum number of generated reload insns per insn achieved
(90)

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-21 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #22 from chenglulu  ---
(In reply to Xi Ruoyao from comment #21)
> (In reply to chenglulu from comment #19)
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index e7835ae34ae..6a808cb0a5c 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machine_mode mode,
> > bool might_split_p)
> > return factor;
> >  
> >case ADDRESS_REG_REG:
> > -   return factor;
> > +   return factor * 3;
> >  
> >case ADDRESS_CONST_INT:
> > return lsx_p ? 0 : factor;
> > 
> > With this patch, -march=la464 has a score of 11.9.
> > However, the specific revision plan has not yet been decided.
> 
> Hmm are ldx and stx really so slow?

I think it is more likely because LDX/STX use an extra register.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-21 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #20 from chenglulu  ---
(In reply to chenglulu from comment #19)
> diff --git a/gcc/config/loongarch/loongarch.cc
> b/gcc/config/loongarch/loongarch.cc
> index e7835ae34ae..6a808cb0a5c 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machine_mode mode,
> bool might_split_p)
> return factor;
>  
>case ADDRESS_REG_REG:
> -   return factor;
> +   return factor * 3;
>  
>case ADDRESS_CONST_INT:
> return lsx_p ? 0 : factor;
> 
> With this patch, -march=la464 has a score of 11.9.
> However, the specific revision plan has not yet been decided.

This is the score on r14-9540.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-21 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #19 from chenglulu  ---
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..6a808cb0a5c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
        return factor;
 
       case ADDRESS_REG_REG:
-       return factor;
+       return factor * 3;
 
       case ADDRESS_CONST_INT:
        return lsx_p ? 0 : factor;

With this patch, -march=la464 has a score of 11.9.
However, the specific revision plan has not yet been decided.
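
To make the effect of the experiment concrete, here is a minimal sketch of the
changed case, assuming a reduced model of the function touched by the diff. The
names sketch_address_type and sketch_address_insns are hypothetical, and the
ADDRESS_REG case is simplified; this is not the real loongarch_address_insns.

```c
#include <assert.h>

/* Hypothetical subset of address kinds, named after the cases in the
   diff.  */
enum sketch_address_type
{
  SKETCH_ADDRESS_REG,
  SKETCH_ADDRESS_REG_REG,
  SKETCH_ADDRESS_CONST_INT
};

/* Return the count the cost model charges for an address of the given
   kind.  FACTOR is the base count for one access.  */
int
sketch_address_insns (enum sketch_address_type type, int factor, int lsx_p)
{
  assert (factor >= 1);

  switch (type)
    {
    case SKETCH_ADDRESS_REG:
      /* Simplified: the real function performs more checks here.  */
      return factor;

    case SKETCH_ADDRESS_REG_REG:
      /* The experiment: make reg+reg addressing three times as
         expensive so the optimizers prefer other address forms.  */
      return factor * 3;

    case SKETCH_ADDRESS_CONST_INT:
      return lsx_p ? 0 : factor;
    }

  return 0;
}
```

In this model a reg+reg address with factor 1 is charged 3 instead of 1, which
only changes how attractive the addressing form looks to the cost model; it
does not change the generated instructions for a given address.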

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-14 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #18 from chenglulu  ---
(In reply to Xi Ruoyao from comment #17)
> Strangely PR114074 is a wrong-code (instead of missed-optimization) and
> reverting its fix seems improving performance for other targets...

This is very strange. I tried turning off reg_reg addressing on top of
r14-9540, and the performance was not much different from r14-9539.
Unfortunately, I still don't know why.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-14 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #16 from chenglulu  ---
The performance degradation on LoongArch is caused by one commit:

commit e0e9499aeffdaca88f0f29334384aa5f710a81a4 (HEAD -> trunk)
Author: Richard Biener 
Date:   Tue Mar 19 12:24:08 2024 +0100

tree-optimization/114151 - revert PR114074 fix

The following reverts the chrec_fold_multiply fix and only keeps
handling of constant overflow which keeps the original testcase
fixed.  A better solution might involve ranger improvements or
tracking of assumptions during SCEV analysis similar to what niter
analysis does.

PR tree-optimization/114151
PR tree-optimization/114269
PR tree-optimization/114322
PR tree-optimization/114074
* tree-chrec.cc (chrec_fold_multiply): Restrict the use of
unsigned arithmetic when actual overflow on constant operands
is observed.

* gcc.dg/pr68317.c: Revert last change.

The scores before and after this patch are:
(-g -Ofast -march=la464)
r14-9539: 12.3
r14-9540: 9.26
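
For reference, the size of the drop between the two revisions can be computed
as a relative difference. The helper below is a small sketch; the function name
sketch_percent_drop is made up for the example and only the two scores come
from the measurements above.

```c
#include <assert.h>

/* Relative drop, in percent, from score BEFORE to score AFTER.  */
double
sketch_percent_drop (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before * 100.0;
}
```

Calling sketch_percent_drop (12.3, 9.26) gives roughly 24.7, i.e. the score on
r14-9540 is about a quarter lower than on r14-9539.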

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-09 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #15 from chenglulu  ---
(In reply to Chen Chen from comment #14)
> (In reply to Xi Ruoyao from comment #13)
> > (In reply to Chen Chen from comment #12)
> > 
> > > No. I used system default gcc.
> > 
> > AOSC backports *many* changes not in upstream GCC 13.2 to their "13.2":
> > https://github.com/AOSC-Dev/aosc-os-abbs/tree/stable/core-devel/gcc/01-
> > runtime/patches
> > 
> > So the default GCC is simply not GCC 13.2.
> 
> You are correct. The above 13.2 results should be "AOSC system default gcc
> 13.2" results. Under AOSC system I recompiled official gcc 13.2 source with
> the same parameters except for "--with-tune=la664" (changed to
> "--with-tune=la464" since gcc 13.2 does not support "LA664" architecture).
> The test results from official gcc 13.2 are following:
> 
> -g -Ofast -march=native  : 6.54 (400s)
> -g -Ofast -march=native -flto: 6.57 (399s)
> -g -Ofast -march=la464   : 6.46 (405s)
> -g -Ofast -march=la464 -flto : 6.57 (399s)

The numbers I measured with GCC 13.2 are similar to these. I am currently
testing GCC with the AOSC patches applied.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-09 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #11 from chenglulu  ---
(In reply to Chen Chen from comment #0)
> We tested Loongarch64 CPU Loongson 3A6000 with "LA664" architecture in Linux
> operating system AOSC OS 11.4.0 (default gcc version is 13.2.0). And we
> found the 548.exchange2_r benchmark from SPEC 2017 INTrate suite suffered
> significant regressions from 14% to 28% with various compiling options.
> 
> The rate-1 results are following:
> 
> after snapshot 20240317 score 14.3-19.3% lower with parameters "-g -Ofast
> -march=native":
> 13.2.0:11.7 (223s) [gcc 13.2.0, system default]
Hi,

I can't reproduce the GCC 13.2 score. Have you made any modifications to that
compiler?

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> 
> > diff --git a/gcc/config/loongarch/loongarch-def.cc
> > b/gcc/config/loongarch/loongarch-def.cc
> > index e8c129ce643..f27284cb20a 100644
> > --- a/gcc/config/loongarch/loongarch-def.cc
> > +++ b/gcc/config/loongarch/loongarch-def.cc
> > @@ -111,11 +111,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
> >   tune targets (i.e. -mtune=native while PRID does not correspond to
> >   any known "-mtune" type).  */
> >  array_tune loongarch_cpu_rtx_cost_data =
> > -  array_tune ()
> > -.set (CPU_LA664,
> > - loongarch_rtx_cost_data ()
> > -   .movcf2gr_ (COSTS_N_INSNS (1))
> > -   .movgr2cf_ (COSTS_N_INSNS (1)));
> > +  array_tune ();
> 
> But why?  Isn't movcf2gr and movgr2cf one-cycle on LA664?

I think this is weird too. I'm still testing other situations, and I'll find
out the reason after the testing is completed.

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #8 from chenglulu  ---
(In reply to Chen Chen from comment #0)
> We tested Loongarch64 CPU Loongson 3A6000 with "LA664" architecture in Linux
> operating system AOSC OS 11.4.0 (default gcc version is 13.2.0). And we
> found the 548.exchange2_r benchmark from SPEC 2017 INTrate suite suffered
> significant regressions from 14% to 28% with various compiling options.
> 
> The rate-1 results are following:
> 
/* snip */
> 
> after snapshot 20240317 score 18-23.1% lower with parameters "-g -Ofast
> -march=la664":   
> 13.2.0:"-march=la664" flag is not supported
> 20240317:  11.5 (227s)
> 20240324:  8.84 (296s)
> 20240430:  9.43 (278s)
> 14.1.0:9.42 (278s)
> 
/* snip */
> 
> 
> after snapshot 20240317 score 26.3-26.6% lower with parameters "-g -Ofast
> -march=la464":   
> 13.2.0:8.76 (299s)
> 20240317:  12.8 (205s)
> 20240324:  9.39 (279s)
> 20240430:  9.43 (278s)
> 14.1.0:9.43 (278s)
> 
> 

> 20240317:  11.5 (227s) -march=la664
> 20240317:  12.8 (205s) -march=la464
I looked for the reason for the gap between the above two results. The
performance regression is caused by r14-6814. If the following modification
is made, the scores of -march=la664 and -march=la464 will be the same.

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..f27284cb20a 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -111,11 +111,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
    tune targets (i.e. -mtune=native while PRID does not correspond to
    any known "-mtune" type).  */
 array_tune loongarch_cpu_rtx_cost_data =
-  array_tune ()
-    .set (CPU_LA664,
-         loongarch_rtx_cost_data ()
-           .movcf2gr_ (COSTS_N_INSNS (1))
-           .movgr2cf_ (COSTS_N_INSNS (1)));
+  array_tune ();

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2024-05-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978

--- Comment #5 from chenglulu  ---
I will verify it on multiple machines to see if the problem can be reproduced.

[Bug target/114848] loongarch: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-04-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114848

--- Comment #5 from chenglulu  ---
(In reply to Xi Ruoyao from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Xi Ruoyao from comment #1)
> > > Hmm, AFAIK this should be already fixed with r14-6440?
> > > 
> > > I cannot reproduce it with r14-9823 but maybe it has regressed again in 
> > > the
> > > recent weeks.
> > 
> > Oh I only tested gcc 13.2.0. If it is fixed you can close it.
> 
> Hmm it looks like we need a backport to releases/gcc-13 (and 12?)

I have backported r14-6440 to gcc-13 and gcc-12 and am testing it.

> 
> I thought the bug was introduced by my shrink-wrap change (r14-545) so I
> didn't proposed a backport.  But it seems I was wrong and the bug exists
> even before r14-545.

[Bug libfortran/114304] [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304

chenglulu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chenglulu at loongson dot cn

--- Comment #23 from chenglulu  ---
(In reply to GCC Commits from comment #22)
> The master branch has been updated by Jerry DeLisle :
> 
> https://gcc.gnu.org/g:93adf88cc6744aa2c732b765e1e3b96e66cb3300
> 
> commit r14-9822-g93adf88cc6744aa2c732b765e1e3b96e66cb3300
> Author: Jerry DeLisle 
> Date:   Fri Apr 5 19:25:13 2024 -0700
> 
> libfortran: Fix handling of formatted separators.
> 
> PR libfortran/114304
> PR libfortran/105473
> 
> libgfortran/ChangeLog:
> 
> * io/list_read.c (eat_separator): Add logic to handle spaces
> preceding a comma or semicolon such that that a 'null' read
> occurs without error at the end of comma or semicolon
> terminated input lines. Add check and error message for ';'.
> (list_formatted_read_scalar): Treat comma as a decimal point
> when specified by the decimal mode on the first item.
> 
> gcc/testsuite/ChangeLog:
> 
> * gfortran.dg/pr105473.f90: Modify to verify new error message.
> * gfortran.dg/pr114304.f90: New test.

Hi,
This patch causes the SPEC CPU 2017 tests 527 and 627 to fail.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-04-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #21 from chenglulu  ---
(In reply to Xi Ruoyao from comment #20)
> (In reply to chenglulu from comment #19)
> > (In reply to Xi Ruoyao from comment #18)
> > > (In reply to chenglulu from comment #17)
> > > 
> > > > The results of spec2006 on LA464 are:
> > > > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
> > > 
> > > Would you send a patch for them or prefer I to do it?
> > 
> > I'll send a patch tomorrow.
> 
> Ping.
> 
> I'd like to do another system rebuild after this patch lands for verifying
> GCC 14.

Oh sorry, I was waiting for Yujie's patch, which was just merged today. I'll
send this alignment patch tomorrow.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #19 from chenglulu  ---
(In reply to Xi Ruoyao from comment #18)
> (In reply to chenglulu from comment #17)
> 
> > The results of spec2006 on LA464 are:
> > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
> 
> Would you send a patch for them or prefer I to do it?

I'll send a patch tomorrow.

[Bug tree-optimization/114027] [11/12 Regression] miscompile at `-O3 -fno-vect-cost-model -msse4.2`

2024-03-26 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

chenglulu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chenglulu at loongson dot cn

--- Comment #17 from chenglulu  ---
(In reply to Richard Biener from comment #14)
> int __attribute__((noipa))
> foo (int *f, int n)
> {
>   int res = 0;
>   for (int i = 0; i < n; ++i)
> {
>   if (f[2*i])
> res = 2;
>   if (f[2*i+1])
> res = -2;

Sorry, I have a question. The array f has 16 elements and the value of n is
16, so when the value of i is greater than 7, aren't the accesses to f[2*i]
and f[2*i+1] out of bounds?
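
The index arithmetic behind this question can be checked with a small sketch.
The helper names below are hypothetical; the array length of 16 and n = 16 are
the values stated in the comment, not taken from the full test case.

```c
#include <assert.h>

/* Largest index read by a loop body that accesses f[2*i] and f[2*i+1]
   for i in [0, n).  */
int
sketch_max_index (int n)
{
  assert (n >= 1);
  return 2 * (n - 1) + 1;
}

/* Nonzero if every access stays inside an array of LEN elements.  */
int
sketch_in_bounds (int n, int len)
{
  return sketch_max_index (n) < len;
}
```

For n = 16 the largest index is 31, which is outside a 16-element array; the
accesses stay in bounds only for n up to 8, where the largest index is 15.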

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-25 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #17 from chenglulu  ---
(In reply to Xi Ruoyao from comment #15)
> > Hi,Ruoyao:
> > 
> >  The results of spec2006 on 3A6000 were obtained, I removed the more 
> > volatile
> > test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> > -falign-labels=4' this set of parameters got the highest score. This is the
> > same combination of parameters as the coremark tested by Xu Chenghua.
> > 
> > The test of the 3A5000 will also be completed around the 15th of this month,
> > so I want to change the code after the test results of the 3a5000 are out.
> > What do you think?
> 
> Ok to me.
> 
> I'm getting some different results on LA664:
> 
> 22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8
> -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt
> 
> vs the "best" one:
> 
> 22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32
> -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt
> 
> maybe such a 0.1% difference is some random fluctuation, or hardware or
> kernel configuration difference anyway.

The results of spec2006 on LA464 are:
-falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #16 from chenglulu  ---
(In reply to Xi Ruoyao from comment #15)
> > Hi,Ruoyao:
> > 
> >  The results of spec2006 on 3A6000 were obtained, I removed the more 
> > volatile
> > test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> > -falign-labels=4' this set of parameters got the highest score. This is the
> > same combination of parameters as the coremark tested by Xu Chenghua.
> > 
> > The test of the 3A5000 will also be completed around the 15th of this month,
> > so I want to change the code after the test results of the 3a5000 are out.
> > What do you think?
> 
> Ok to me.
> 
> I'm getting some different results on LA664:
> 
> 22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8
> -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt
> 
> vs the "best" one:
> 
> 22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32
> -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt
> 
> maybe such a 0.1% difference is some random fluctuation, or hardware or
> kernel configuration difference anyway.

That is possible too. I'll find a few more machines to test the coremark
score.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #14 from chenglulu  ---
(In reply to chenglulu from comment #13)
> (In reply to Xi Ruoyao from comment #9)
> > (In reply to chenglulu from comment #8)
> > > (In reply to Xi Ruoyao from comment #7)
> > > > Any update? :)
> > > 
> > > Well, I haven't run it yet. Since this does not have a big impact on the
> > > spec score, I am currently testing it on a single-channel machine, so the
> > > test time will be longer.
> > > I will reply here as soon as the results are available.
> > 
> > Can we determine on LA664 if the current default alignment is better than
> > not aligning at all?  Coremarks results suggest the current default is even
> > worse than not aligning, but arguably Coremarks is far different from real
> > workloads. However if the current default is not better than not aligning
> > (or the difference is only marginal and is likely covered up by some random
> > fluctuation) we can disable the aligning for LA664.
> > 
> > (Maybe we and the HW engineers have done some repetitive work or even some
> > work cancelling each other out :(. )
> 
> The results of spec2006 on 3A6000 were obtained, I removed the more volatile
> test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> -falign-labels=4' this set of parameters got the highest score. This is the
> same combination of parameters as the coremark tested by Xu Chenghua.

Hi Ruoyao,

The spec2006 results on the 3A6000 are in. After I removed the more volatile
test items, the parameter set '-falign-loops=8 -falign-functions=8
-falign-jumps=32 -falign-labels=4' got the highest score. This is the same
combination of parameters as in the coremark test done by Xu Chenghua.

The test on the 3A5000 will also be completed around the 15th of this month,
so I want to change the code after the 3A5000 results are out.
What do you think?

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #13 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> > 
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
> 
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all?  Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
> 
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )

The spec2006 results on the 3A6000 are in. After I removed the more volatile
test items, the parameter set '-falign-loops=8 -falign-functions=8
-falign-jumps=32 -falign-labels=4' got the highest score. This is the same
combination of parameters as in the coremark test done by Xu Chenghua.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #12 from chenglulu  ---
(In reply to Xi Ruoyao from comment #11)
> (In reply to chenglulu from comment #10)
> > (In reply to Xi Ruoyao from comment #9)
> > > (In reply to chenglulu from comment #8)
> > > > (In reply to Xi Ruoyao from comment #7)
> > > > > Any update? :)
> > > > 
> > > > Well, I haven't run it yet. Since this does not have a big impact on the
> > > > spec score, I am currently testing it on a single-channel machine, so 
> > > > the
> > > > test time will be longer.
> > > > I will reply here as soon as the results are available.
> > > 
> > > Can we determine on LA664 if the current default alignment is better than
> > > not aligning at all?  Coremarks results suggest the current default is 
> > > even
> > > worse than not aligning, but arguably Coremarks is far different from real
> > > workloads. However if the current default is not better than not aligning
> > > (or the difference is only marginal and is likely covered up by some 
> > > random
> > > fluctuation) we can disable the aligning for LA664.
> > > 
> > > (Maybe we and the HW engineers have done some repetitive work or even some
> > > work cancelling each other out :(. )
> > On March 8th I should be able to get the test results on the 3A6000 machine,
> > I need to judge the fluctuation of the spec and then let's see if the
> > default alignment is set?
> 
> I just mean if we cannot get a decisive result before GCC 14 we may just
> turn off alignment.  But if we can get a decisive result as expected in Mar
> we can just use the best we'll find.

Well, the results should be available before GCC 14 is released. It also seems
that the settings for the 3A5000 need to be changed, because the value of
'-falign-labels' was affected by the macro ASM_OUTPUT_ALIGN_WITH_NOP in the
previous test.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-03-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> > 
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
> 
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all?  Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
> 
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )
On March 8th I should be able to get the test results on the 3A6000 machine.
I need to judge the fluctuation of the SPEC scores first, and then we can
decide whether a default alignment should be set.
In addition, I am also testing on the 3A5000 again, and the results will be
available around March 15th.
The conclusion on CoreMark from our team leader Xu Chenghua is that
'-falign-labels' has a consistent effect on CoreMark performance: when the
value of '-falign-labels' is greater than 4 bytes, the performance decreases
significantly.

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-02-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #8 from chenglulu  ---
(In reply to Xi Ruoyao from comment #7)
> Any update? :)

Well, I haven't run it yet. Since this does not have a big impact on the spec
score, I am currently testing it on a single-channel machine, so the test time
will be longer.
I will reply here as soon as the results are available.

[Bug c/113626] New: The r14-8450 commit causes the loongarch [x]vfcmp-{d/f}.c test case to fail

2024-01-26 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113626

            Bug ID: 113626
           Summary: The r14-8450 commit causes the loongarch
                    [x]vfcmp-{d/f}.c test case to fail
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chenglulu at loongson dot cn
  Target Milestone: ---

The r14-8450 commit causes the loongarch [x]vfcmp-{d/f}.c test case to fail

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2024-01-15 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #6 from chenglulu  ---
Hi, Ruoyao:
I am testing the SPEC2006 scores with the parameters 'align-loops',
'align-jumps', 'align-functions', and 'align-labels' set to '1', '8', '16', and
'32' respectively.
However, the test was suspended last week because of power maintenance at the
company, and it will take some time to retest.

[Bug middle-end/112985] LOGICAL_OP_NON_SHORT_CIRCUIT unconditionally execute comparison even if it's very expensive

2023-12-12 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112985

--- Comment #1 from chenglulu  ---
(In reply to Xi Ruoyao from comment #0)
> /* { dg-do compile } */
> /* { dg-options "-O2 -ffast-math -fdump-tree-gimple" } */
> 
> int
> short_circuit (float *a)
> {
>   float t1x = a[0];
>   float t2x = a[1];
>   float t1y = a[2];
>   float t2y = a[3];
>   float t1z = a[4];
>   float t2z = a[5];
> 
>   if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z
>       || t2y < t1z)
>     return 0;
> 
>   return 1;
> }
> 
> on LoongArch it produces something like:
> 
>   _1 = t1x > t2y;
>   _2 = t2x < t1y;
>   _3 = _1 | _2; 
>   if (_3 != 0) goto ; else goto ;
>   :
>   _4 = t1x > t2z;
>   _5 = t2x < t1z;
>   _6 = _4 | _5; 
>   if (_6 != 0) goto ; else goto ;
>   :
>   _7 = t1y > t2z;
>   _8 = t2y < t1z;
>   _9 = _7 | _8; 
>   if (_9 != 0) goto ; else goto ;
>   :
>   D.2209 = 0;
> 
> but it's better to produce 6 if (per
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640313.html it will
> produce a 1.8% improvement in SPECCPU 2017 fprate).
> 
> One obvious issue is LoongArch cost model for FP comparison is incorrect
> (PR112936) but even if I set the cost of floating-point comparison to 5000
> the gimple still produces 3 if with non-shorted comparisons.

I agree that the generated code should use logic similar to the fixed-point
case:

  slt         $r17,$r15,$r14
  slt         $r13,$r16,$r12
  or          $r13,$r13,$r17
  bstrpick.w  $r13,$r13,7,0
  bnez        $r13,.L3
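
For readers less familiar with the two evaluation strategies discussed in this
comment, here is a minimal C sketch of the difference. The function names are
made up for illustration and do not come from GCC or from the test case; the
sketch only shows that both forms compute the same truth value, while the
second one evaluates every comparison and tests the combined result once,
which is the shape of the slt/slt/or/bnez sequence.

```c
#include <stdbool.h>

/* Short-circuit form: the second comparison is only evaluated when the
   first one is false, so a compiler may emit one branch per comparison.  */
bool outside_short_circuit(float a, float b, float c, float d)
{
    return a > b || c < d;
}

/* Non-short-circuit form: both comparisons are always evaluated, their
   0/1 results are combined with a bitwise OR, and a single test follows.  */
bool outside_non_short_circuit(float a, float b, float c, float d)
{
    int first = a > b;
    int second = c < d;
    return (first | second) != 0;
}
```

Which form is faster depends on how expensive each comparison is relative to a
branch, which is exactly the cost question raised in this bug.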

[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance

2023-12-11 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #5 from chenglulu  ---
(In reply to Xi Ruoyao from comment #4)

> Lulu: can you help to run some other benchmarks like SPEC (I don't have an
> access to it) and update these values for LA464 and LA664?
No problem, this is what I should do. However, there are many parameter
combinations, so it may take quite some time.

[Bug target/112578] LoongArch: Wrong code -with -mlsx -fno-fp-int-builtin-inexact

2023-11-17 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112578

--- Comment #2 from chenglulu  ---
(In reply to Xi Ruoyao from comment #1)
> I've made a patch and it's under testing.
> 
> I've seen some "random" gcc.dg/torture/builtin-fp-int-inexact.c failures
> recently but maybe it's not related, we don't enable LSX/LASX compiling it
> anyway...  But who knows.

These random failures have always existed. This is a known problem.

[Bug target/112330] [14 Regression] LoongArch: Bootstrap failure with GAS 2.41

2023-11-02 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112330

--- Comment #14 from chenglulu  ---
(In reply to Xi Ruoyao from comment #13)
> (In reply to chenglulu from comment #12)
> > (In reply to Xi Ruoyao from comment #11)
> > > I cherry-picked f87cf663af71e5d78c8d647fa48562102f3b0615 for Binutils 2.41
> > > and get some better error message:
> > > 
> > > t.s:98064: Error: Reloc overflow
> > > t.s:101127: Error: Reloc overflow
> > > t.s:101453: Error: Reloc overflow
> > > t.s:101555: Error: Reloc overflow
> > > 
> > > t.s is the assembly file in attachments.
> > 
> > 1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c is also required.
> 
> I know, I'm just trying to understand the issue better.
> 
> I don't really understand why this was not a problem before r14-4674.  And
> this issue also did not show up immediately after r14-4674 (it even did not
> show up when I was developing r14-4851).


The macro ASM_OUTPUT_ALIGN_WITH_NOP was deleted in r14-4674. Before this macro
was deleted, the generated assembly file looked like this:

...
   .align 16,54525952,4
.L1:
...
   blt  $r12,$r13,.L1
...

after:

   .align 16
.L1:
...
   blt $r12,$r13,.L1
...

First of all, if you add -mrelax during the assembly process, ".align
16,54525952,4" is resolved at assembly time: depending on the situation, the
assembler chooses whether or not to insert nops. The plain ".align 16",
however, inserts 3 nops unconditionally. When calculating the jump range of
the conditional branch, GCC calculates the space required for .align according
to the actual alignment, not according to the 3 nops that are always inserted.
So after r14-4674 there will be an error.
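
The size mismatch described in this comment can be illustrated with a small C
sketch. It assumes that ".align 16" in the text denotes a 16-byte boundary and
relies on LoongArch instructions being 4 bytes each; the function names are
hypothetical and are not part of GCC or binutils.

```c
#include <stddef.h>

/* Every LoongArch instruction, including nop, is 4 bytes long.  */
enum { INSN_BYTES = 4 };

/* Padding actually needed to reach the next `align`-byte boundary from
   `addr`; `align` must be a power of two.  */
size_t actual_padding(size_t addr, size_t align)
{
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Padding emitted when the maximum number of nops is inserted no matter
   where the directive happens to land.  */
size_t unconditional_padding(size_t align)
{
    return align - INSN_BYTES;
}
```

For a 16-byte boundary the unconditional case is 12 bytes, that is 3 nops,
while the actual padding can be anything from 0 to 12 bytes. A branch whose
distance was estimated from the actual padding can therefore end up further
from its target than expected.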

[Bug target/112330] [14 Regression] LoongArch: Bootstrap failure with GAS 2.41

2023-11-02 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112330

--- Comment #12 from chenglulu  ---
(In reply to Xi Ruoyao from comment #11)
> I cherry-picked f87cf663af71e5d78c8d647fa48562102f3b0615 for Binutils 2.41
> and get some better error message:
> 
> t.s:98064: Error: Reloc overflow
> t.s:101127: Error: Reloc overflow
> t.s:101453: Error: Reloc overflow
> t.s:101555: Error: Reloc overflow
> 
> t.s is the assembly file in attachments.

1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c is also required.

[Bug target/112330] [14 Regression] LoongArch: Bootstrap failure with GAS 2.41

2023-11-02 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112330

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> Xuerui informed me that non-LTO bootstrapping is broken too.

Well, this has nothing to do with whether LTO is enabled or not; it is caused
by binutils inserting "nop" instructions for relaxation.

[Bug target/112330] [14 Regression] LoongArch: LTO bootstrap failure with GAS 2.41

2023-11-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112330

--- Comment #7 from chenglulu  ---
(In reply to Xi Ruoyao from comment #6)
> (In reply to chenglulu from comment #5)
> > (In reply to Xi Ruoyao from comment #4)
> > > (In reply to chenglulu from comment #3)
> > > > (In reply to Xi Ruoyao from comment #1)
> > > > > (In reply to Xi Ruoyao from comment #0)
> > > > > 
> > > > > > > I guess the easiest solution is raising the minimal GAS
> > > > > > > requirement of bootstrapping GCC 14 on LoongArch to 2.42.
> > > > > > 
> > > > > > Another solution might be default to -mno-relax if GAS is 2.41,
> > > > > > but I'm not sure if it's enough.
> > > > > 
> > > > > This issue really doesn't happen after adding -mno-relax, but is it
> > > > > really necessary to judge the version of binutils because of this?
> > > > 
> > > > I'm not sure if -mno-relax is the proper fix.  For now I've reduced the
> > > > test case to:
> > > 
> > > a:
> > > .rept 10
> > > nop
> > > .endr
> > > beq $r12, $r13, a
> > > 
> > > but this still does not work with GAS 2.41 even if -mno-relax.
> > 
> > If this is the assembly code compiled by gcc, then I think it's a gcc bug,
> > although AS shouldn't be an internal error.
> 
> The internal error issue is fixed by Binutils commit
> f87cf663af71e5d78c8d647fa48562102f3b0615.
> 
> I think I've over-simplified the test case.  GCC does not generate something
> like this.

Uh, I did think about this GCC and binutils compatibility issue when I
submitted r14-4674, but I did not work out whether it needed to be solved.
How should we fix it?

[Bug target/112330] [14 Regression] LoongArch: LTO bootstrap failure with GAS 2.41

2023-11-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112330

--- Comment #5 from chenglulu  ---
(In reply to Xi Ruoyao from comment #4)
> (In reply to chenglulu from comment #3)
> > (In reply to Xi Ruoyao from comment #1)
> > > (In reply to Xi Ruoyao from comment #0)
> > > 
> > > > I guess the easiest solution is raising the minimal GAS requirement of
> > > > bootstrapping GCC 14 on LoongArch to 2.42.
> > > 
> > > > Another solution might be default to -mno-relax if GAS is 2.41, but I'm
> > > > not sure if it's enough.
> > 
> > This issue really doesn't happen after adding -mno-relax, but is it really
> > necessary to judge the version of binutils because of this?
> 
> I'm not sure if -mno-relax is the proper fix.  For now I've reduced the test
> case to:
> 
> a:
> .rept 10
> nop
> .endr
> beq $r12, $r13, a
> 
> but this still does not work with GAS 2.41 even if -mno-relax.

If this is assembly code generated by GCC, then I think it's a GCC bug,
although the assembler shouldn't fail with an internal error either.

[Bug target/112330] [14 Regression] LoongArch: LTO bootstrap failure with GAS 2.41

2023-11-01 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112330

--- Comment #3 from chenglulu  ---
(In reply to Xi Ruoyao from comment #1)
> (In reply to Xi Ruoyao from comment #0)
> 
> > I guess the easiest solution is raising the minimal GAS requirement of
> > bootstrapping GCC 14 on LoongArch to 2.42.
> 
> Another solution might be default to -mno-relax if GAS is 2.41, but I'm not
> sure if it's enough.

The issue indeed does not occur after adding -mno-relax, but is it really
necessary to check the binutils version because of this?

[Bug go/112286] Go does not support LoongArch

2023-10-30 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112286

--- Comment #6 from chenglulu  ---
(In reply to Xi Ruoyao from comment #5)
> (In reply to chenglulu from comment #4)
> > (In reply to Xi Ruoyao from comment #2)
> > > (In reply to chenglulu from comment #1)
> > > > (In reply to Robin Lee from comment #0)
> > > > > Follow-up from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108682#c2
> > > > > 
> > > > > libgo runtime needs an update.
> > > > > gccgo build scripts need to support loongarch64.
> > > > 
> > > > > Do I have to use gccgo? Since the version of go in gcc is relatively
> > > > > low, this transplant requires some work.
> > > 
> > > GCC will eventually bump the Go version but it requires some non-trivial
> > > work: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108682#c6.  So I'd like
> > > to defer the transplant after Ian completes all these.
> > 
> > Do you know when the version of gccgo will be upgraded?
> 
> No but anyway we should be struggling to make it match the latest Google Go
> compiler.  If we cannot keep up one day I guess gccgo will be simply removed.

Ok, I see. Thanks!

[Bug go/112286] Go does not support LoongArch

2023-10-30 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112286

--- Comment #4 from chenglulu  ---
(In reply to Xi Ruoyao from comment #2)
> (In reply to chenglulu from comment #1)
> > (In reply to Robin Lee from comment #0)
> > > Follow-up from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108682#c2
> > > 
> > > libgo runtime needs an update.
> > > gccgo build scripts need to support loongarch64.
> > 
> > Do I have to use gccgo? Since the version of go in gcc is relatively low,
> > this transplant requires some work.
> 
> GCC will eventually bump the Go version but it requires some non-trivial
> work: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108682#c6.  So I'd like
> to defer the transplant after Ian completes all these.

Do you know when the version of gccgo will be upgraded?

[Bug go/112286] Go does not support LoongArch

2023-10-30 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112286

chenglulu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chenglulu at loongson dot cn

--- Comment #1 from chenglulu  ---
(In reply to Robin Lee from comment #0)
> Follow-up from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108682#c2
> 
> libgo runtime needs an update.
> gccgo build scripts need to support loongarch64.

Is it necessary to use gccgo? Since the version of Go in GCC is relatively
old, this port requires some work. We have discussed this with our colleagues
who work on the Go compiler: if gccgo is not required, we will not do the
port.

Thanks!

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-13 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

chenglulu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #21 from chenglulu  ---
fixed

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-09 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #18 from chenglulu  ---
(In reply to Xi Ruoyao from comment #17)
> I think the proper description should be:
> 
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 75f641b38ee..000d17b0ba6 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -64,6 +64,8 @@ (define_c_enum "unspec" [
>UNSPEC_CRC
>UNSPEC_CRCC
>  
> +  UNSPEC_DIV_W_OPERAND
> +
>UNSPEC_LOAD_FROM_GOT
>UNSPEC_PCALAU12I
>UNSPEC_ORI_L_LO12
> @@ -892,7 +894,7 @@ (define_expand "3"
>  emit_insn (gen_rtx_SET (reg1, operands[1]));
>  emit_insn (gen_rtx_SET (reg2, operands[2]));
>  
> -emit_insn (gen_di3_fake (rd, reg1, reg2));
> +emit_insn (gen_si3_extended (rd, reg1, reg2));
>  emit_insn (gen_rtx_SET (operands[0],
>   simplify_gen_subreg (SImode, rd, DImode, 0)));
>  DONE;
> @@ -915,11 +917,14 @@ (define_insn "*3"
>   (const_string "yes")
>   (const_string "no")))])
>  
> -(define_insn "di3_fake"
> +(define_insn "si3_extended"
>[(set (match_operand:DI 0 "register_operand" "=r,,")
>   (sign_extend:DI
> -   (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> -   (match_operand:DI 2 "register_operand" "r,r,r"]
> +   (any_div:SI
> + (unspec:SI [(match_operand:DI 1 "register_operand" "r,r,0")]
> +UNSPEC_DIV_W_OPERAND)
> + (unspec:SI [(match_operand:DI 2 "register_operand" "r,r,r")]
> +UNSPEC_DIV_W_OPERAND]
>""
>  {
>return loongarch_output_division (".w\t%0,%1,%2", operands);
> 
> i. e. we define "UNSPEC_DIV_W_OPERAND" as a "machine-specific operation": if
> the input is a sign-extended 32-bit value, the operation extracts the low
> 32-bit; otherwise, it produces random junks.
> 
> Note that the behavior actually depends on the values of operand[1] and
> operands[2], not the result of operand[1] / operand[2].  So we should put
> unspec inside any_div, not outside.
> 
> (I've not included the TARGET_64BIT change here, it should be done anyway.)
> 
> BTW is LA664 improved to handle non-properly-extended inputs with div.w?

This problem has been fixed on LA664.
I don't quite understand why this operation would still be needed when
!TARGET_64BIT.

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-09 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #16 from chenglulu  ---
(In reply to Xi Ruoyao from comment #15)
> (In reply to chenglulu from comment #13)
> > (In reply to Xi Ruoyao from comment #12)
> > > (In reply to chenglulu from comment #11)
> > > > (In reply to Xi Ruoyao from comment #10)
> > > > > (In reply to Xi Ruoyao from comment #9)
> > > > > 
> > > > > >  (define_insn "di3_fake"
> > > > > >[(set (match_operand:DI 0 "register_operand" "=r,,")
> > > > > > -   (sign_extend:DI
> > > > > > - (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> > > > > > - (match_operand:DI 2 "register_operand" 
> > > > > > "r,r,r"]
> > > > > > -  ""
> > > > > > +   (if_then_else
> > > > > > + (and (eq (match_operand:DI 1 "register_operand" "r,r,0")
> > > > > > +  (sign_extend:DI (subreg:SI (match_dup 1) 0)))
> > > > > > +  (eq (match_operand:DI 2 "register_operand" "r,r,r")
> > > > > > +  (sign_extend:DI (subreg:SI (match_dup 2) 0
> > > > > > + (sign_extend:DI
> > > > > > +   (any_div:SI (subreg:SI (match_dup 1) 0)
> > > > > > +   (subreg:SI (match_dup 2) 0)))
> > > > > > + (unspec:DI [(const_int 0)] UNSPEC_BAD_DIVW)))]
> > > > > 
> > > > > With this the compiler will still believe all bad {div,mod}.w{,u}
> > > > 
> > > > > I think this is already defined as UNSPEC. Isn’t the simpler the logic,
> > > > > the better?
> > > 
> > > Yes, I think we should just use 4 different UNSPEC_ values and the simple
> > > version.  But I've not find a way to use 4 different UNSPEC_ values in the
> > > RTL template except duplicating everything 4 times...
> > 
> > I still have a question that I don't quite understand, that is, why that the
> > four generated strings are equivalent when using an UNSPEC name? My template
> > names are different, and they will not be automatically matched during
> > optimization.???
> 
> Oh I get it, you mean
> 
>  (define_insn "di3_fake"
>[(set (match_operand:DI 0 "register_operand" "=r,,")
>   (sign_extend:DI
> -   (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> -   (match_operand:DI 2 "register_operand" "r,r,r"]
> +   (unspec:DI [(any_div:DI
> + (match_operand:DI 1 "register_operand" "r,r,0")
> + (match_operand:DI 2 "register_operand" "r,r,r"))]
> +  UNSPEC_ANY_DIV)))]
>""
>  {
>return loongarch_output_division (".w\t%0,%1,%2", operands);
> 
> Good idea! I think it's better than my stupid hacks :).
> 
> I'd been thinking about:
> 
>  (define_insn "di3_fake"
>[(set (match_operand:DI 0 "register_operand" "=r,,")
>   (sign_extend:DI
> -   (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> -   (match_operand:DI 2 "register_operand" "r,r,r"]
> +   (unspec:DI [(match_operand:DI 1 "register_operand" "r,r,0")
> +   (match_operand:DI 2 "register_operand" "r,r,r")]
> +  UNSPEC_ANY_DIV)))]
>""
>  {
>return loongarch_output_division (".w\t%0,%1,%2", operands);
> 
> and this is just wrong.

Would it be better to modify it this way?

--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -60,6 +60,7 @@ (define_c_enum "unspec" [
   ;; Stack tie
   UNSPEC_TIE

+  UNSPEC_ANY_DIV
   ;; CRC
   UNSPEC_CRC
   UNSPEC_CRCC
@@ -900,7 +901,7 @@ (define_expand "3"
 (match_operand:GPR 2 "register_operand")))]
   ""
 {
- if (GET_MODE (operands[0]) == SImode)
+ if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
   {
 rtx reg1 = gen_reg_rtx (DImode);
 rtx reg2 = gen_reg_rtx (DImode);
@@ -938,9 +939,12 @@ (define_insn "*3"
 (define_insn "di3_fake"
   [(set (match_operand:DI 0 "register_operand" "=r,,")
(sign_extend:DI
- (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
- (match_operand:DI 2 "register_operand" "r,r,r"))))]
-  ""
+ (unspec:SI
+  [(subreg:SI
+(any_div:DI (match_operand:DI 1 "register_operand" "r,r,0")
+(match_operand:DI 2 "register_operand" "r,r,r")) 0)]
+ UNSPEC_ANY_DIV)))]
+  "TARGET_64BIT"
 {
   return loongarch_output_division (".w\t%0,%1,%2", operands);

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #13 from chenglulu  ---
(In reply to Xi Ruoyao from comment #12)
> (In reply to chenglulu from comment #11)
> > (In reply to Xi Ruoyao from comment #10)
> > > (In reply to Xi Ruoyao from comment #9)
> > > 
> > > >  (define_insn "di3_fake"
> > > >[(set (match_operand:DI 0 "register_operand" "=r,,")
> > > > -   (sign_extend:DI
> > > > - (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> > > > - (match_operand:DI 2 "register_operand" 
> > > > "r,r,r"]
> > > > -  ""
> > > > +   (if_then_else
> > > > + (and (eq (match_operand:DI 1 "register_operand" "r,r,0")
> > > > +  (sign_extend:DI (subreg:SI (match_dup 1) 0)))
> > > > +  (eq (match_operand:DI 2 "register_operand" "r,r,r")
> > > > +  (sign_extend:DI (subreg:SI (match_dup 2) 0
> > > > + (sign_extend:DI
> > > > +   (any_div:SI (subreg:SI (match_dup 1) 0)
> > > > +   (subreg:SI (match_dup 2) 0)))
> > > > + (unspec:DI [(const_int 0)] UNSPEC_BAD_DIVW)))]
> > > 
> > > With this the compiler will still believe all bad {div,mod}.w{,u}
> > 
> > I think this is already defined as UNSPEC. Isn’t the simpler the logic, the
> > better?
> 
> Yes, I think we should just use 4 different UNSPEC_ values and the simple
> version.  But I've not find a way to use 4 different UNSPEC_ values in the
> RTL template except duplicating everything 4 times...

There is still one thing I don't quite understand: why would the four
generated patterns be equivalent when they use the same UNSPEC name? My
template names are different, so they should not be matched against each other
automatically during optimization, should they?

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #11 from chenglulu  ---
(In reply to Xi Ruoyao from comment #10)
> (In reply to Xi Ruoyao from comment #9)
> 
> >  (define_insn "di3_fake"
> >[(set (match_operand:DI 0 "register_operand" "=r,,")
> > -   (sign_extend:DI
> > - (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> > - (match_operand:DI 2 "register_operand" "r,r,r"]
> > -  ""
> > +   (if_then_else
> > + (and (eq (match_operand:DI 1 "register_operand" "r,r,0")
> > +  (sign_extend:DI (subreg:SI (match_dup 1) 0)))
> > +  (eq (match_operand:DI 2 "register_operand" "r,r,r")
> > +  (sign_extend:DI (subreg:SI (match_dup 2) 0
> > + (sign_extend:DI
> > +   (any_div:SI (subreg:SI (match_dup 1) 0)
> > +   (subreg:SI (match_dup 2) 0)))
> > + (unspec:DI [(const_int 0)] UNSPEC_BAD_DIVW)))]
> 
> With this the compiler will still believe all bad {div,mod}.w{,u}

I think this is already defined as an UNSPEC. Isn't simpler logic better?

> instructions generate the exactly same unspecified value.  But I don't think
> this is really relevant: if a program depends on the unspecified value (no
> matter one value or multiple values) it's already wrong.
> 
> If we are really "paranoid" about this we can make 4 UNSPEC_BAD_* constants
> and use [(match_dup 1) (match_dup 2)] instead of [(const_int 0)].
> 
> > +  "TARGET_64BIT"
> >  {
> >return loongarch_output_division (".w\t%0,%1,%2", operands);
> >  }

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #8 from chenglulu  ---
(In reply to Andrew Pinski from comment #4)
> (In reply to chenglulu from comment #3)
> > This involves the template di3_fake:
> > (define_insn "di3_fake"
> >   [(set (match_operand:DI 0 "register_operand" "=r,,")
> > (sign_extend:DI
> >   (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> >   (match_operand:DI 2 "register_operand" "r,r,r"]
> 
> That pattern definitely looks broken.
> Divide's operands' mode must match the mode of the divide IIRC.

OK, thanks! So the compilation failure is caused by an error in this template,
right? (Sorry, I don't understand how the combine pass performs this
optimization.)

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #7 from chenglulu  ---
(In reply to Xi Ruoyao from comment #6)
> (In reply to Xi Ruoyao from comment #5)
> > (In reply to chenglulu from comment #3)
> > > This involves the template di3_fake:
> > > (define_insn "di3_fake"
> > >   [(set (match_operand:DI 0 "register_operand" "=r,,")
> > > (sign_extend:DI
> > >   (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
> > >   (match_operand:DI 2 "register_operand" "r,r,r"]
> > >   ""
> > > {
> > >   return loongarch_output_division (".w\t%0,%1,%2", operands);
> > > }
> > >   [(set_attr "type" "idiv")
> > >(set_attr "mode" "SI")
> > >(set (attr "enabled")
> > >   (if_then_else
> > > (match_test "!!which_alternative == loongarch_check_zero_div_p()")
> > > (const_string "yes")
> > > (const_string "no")))])
> > > 
> > > 
> > > > I think there is a problem with the implementation of this template.
> > > > First, the instructions generated in the template are [u]div.w[u], etc.
> > > > The description of such instructions in the instruction manual is that if
> > > > the upper 32 bits are not extended by the 31st bit sign then the result is
> > > > uncertain.
> > 
> > I think this reason alone makes the pattern looks very wrong.
> > 
> > I'll take a look...
> 
> Hmm, I guess we should just make di3_fake an UNSPEC because there is no way
> to use div.w and its friends out of 3.

I agree with your idea, so I tried changing it to something like this. Do you
think this change is okay?


(define_insn "di3_fake"
  [(set (match_operand:DI 0 "register_operand" "=r,,")
        (sign_extend:DI
          (unspec:SI [(any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
                                  (match_operand:DI 2 "register_operand" "r,r,r"))]
                     UNSPEC_ANY_DIV)))]
  ""
{
  return loongarch_output_division (".w\t%0,%1,%2", operands);
}
  [(set_attr "type" "idiv")
   (set_attr "mode" "SI")
   (set (attr "enabled")
      (if_then_else
        (match_test "!!which_alternative == loongarch_check_zero_div_p()")
        (const_string "yes")
        (const_string "no")))])

[Bug c/111334] ICE is reported during the combine pass optimization

2023-09-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #3 from chenglulu  ---
This involves the template di3_fake:
(define_insn "di3_fake"
  [(set (match_operand:DI 0 "register_operand" "=r,,")
        (sign_extend:DI
          (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
                      (match_operand:DI 2 "register_operand" "r,r,r"))))]
  ""
{
  return loongarch_output_division (".w\t%0,%1,%2", operands);
}
  [(set_attr "type" "idiv")
   (set_attr "mode" "SI")
   (set (attr "enabled")
      (if_then_else
        (match_test "!!which_alternative == loongarch_check_zero_div_p()")
        (const_string "yes")
        (const_string "no")))])


I think there is a problem with the implementation of this template.
First, the instructions generated by the template are {div,mod}.w{,u}.
According to the instruction manual, if the upper 32 bits of an operand are
not the sign extension of bit 31, the result is unspecified.
Second, I don't know whether the following is correct:
  (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
              (match_operand:DI 2 "register_operand" "r,r,r"))
I tried modifying the template as follows:
(define_insn "di3_fake"
  [(set (match_operand:DI 0 "register_operand" "=r,,")
        (sign_extend:DI
          (unspec:SI [(any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
                                  (match_operand:DI 2 "register_operand" "r,r,r"))]
                     UNSPEC_ANY_DIV)))]
  ""
{
  return loongarch_output_division (".w\t%0,%1,%2", operands);
}
  [(set_attr "type" "idiv")
   (set_attr "mode" "SI")
   (set (attr "enabled")
      (if_then_else
        (match_test "!!which_alternative == loongarch_check_zero_div_p()")
        (const_string "yes")
        (const_string "no")))])

With this change the problem goes away, but I don't know whether what I'm
doing is correct...
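
The operand requirement mentioned in this comment (the upper 32 bits must be
the sign extension of bit 31) can be written down as a small C model. This is
only a sketch for illustration: the function names are hypothetical, it models
the signed divide case only, and it reports "no defined result" instead of
reproducing whatever the hardware would return.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 64-bit register value x is the sign extension of its low
   32 bits, i.e. bits 63..32 are all copies of bit 31.  */
bool is_sign_extended_32(int64_t x)
{
    return x >= INT32_MIN && x <= INT32_MAX;
}

/* Model of a 32-bit signed divide on 64-bit registers.  Returns false when
   the inputs are outside what this sketch models.  */
bool div_w_model(int64_t a, int64_t b, int64_t *result)
{
    if (!is_sign_extended_32(a) || !is_sign_extended_32(b))
        return false;   /* operands not sign-extended: result unspecified */
    if (b == 0 || (a == INT32_MIN && b == -1))
        return false;   /* division by zero and overflow are not modelled */
    *result = (int64_t)((int32_t)a / (int32_t)b);  /* sign-extended quotient */
    return true;
}
```

An RTL pattern that describes the instruction as a plain SImode division of
DImode operands does not express the first check, which is why the thread
moves toward wrapping the operation in an UNSPEC.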

[Bug c/111334] ICE is reported during the combine pass optimization

2023-09-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #2 from chenglulu  ---
This problem appeared after the r14-3511 optimization was added.

However, debugging shows that it is triggered when the combine pass tries to
generate the following RTX:

(set (reg:DI 124)
     (zero_extend:DI
       (subreg:QI
         (umod:SI (reg:DI 122 [ reg ])
                  (ior:DI (if_then_else:DI (eq:DI (reg:DI 114)
                                                  (const_int 0 [0]))
                                           (reg:DI 112)
                                           (const_int 0 [0]))
                          (reg:DI 118)))
         0)))

During this optimization, the function simplify_context::simplify_subreg
performs the following sanity checks:

rtx
simplify_context::simplify_subreg (machine_mode outermode, rtx op,
   machine_mode innermode, poly_uint64 byte)
{
  /* Little bit of sanity checking.  */
  gcc_assert (innermode != VOIDmode);
  gcc_assert (outermode != VOIDmode);
  gcc_assert (innermode != BLKmode);
  gcc_assert (outermode != BLKmode);

  gcc_assert (GET_MODE (op) == innermode
  || GET_MODE (op) == VOIDmode);
...

Here op is (reg:DI 122 [ reg ]) but innermode is SImode, so the assertion
fails.
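
The mode check that trips here can be shown with a toy C model. Everything in
it (the toy_ names, the enum, the struct) is invented for illustration and is
not GCC's real data structure; it only mirrors the shape of the last quoted
gcc_assert.

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes; not GCC's real definitions.  */
enum toy_mode { TOY_VOIDmode, TOY_QImode, TOY_SImode, TOY_DImode };

/* Toy expression node carrying only the mode of the value it produces.  */
struct toy_rtx { enum toy_mode mode; };

/* Mirrors the quoted check: the operand's own mode must equal the inner
   mode claimed by the caller, unless the operand has no mode at all.  */
bool toy_subreg_operand_ok(const struct toy_rtx *op, enum toy_mode innermode)
{
    return op->mode == innermode || op->mode == TOY_VOIDmode;
}
```

In this model a DImode register passed with an SImode inner mode is rejected,
which corresponds to the situation reported in this comment.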

[Bug c/111334] ICE is reported during the combine pass optimization

2023-09-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #1 from chenglulu  ---
$ gcc test.c -o - -S -O1

test.c: In function ‘add_startpgm’:
test.c:33:1: internal compiler error: in simplify_subreg, at simplify-rtx.cc:7538
   33 | }
  | ^
0x13506f4 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<1u, unsigned long>)
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/simplify-rtx.cc:7537
0x1351ea3 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<1u, unsigned long>)
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/simplify-rtx.cc:7787
0x135264c simplify_context::simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<1u, unsigned long>)
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/simplify-rtx.cc:7858
0xd1684f simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<1u, unsigned long>)
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/rtl.h:3549
0x1e4aa06 if_then_else_cond
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:9400
0x1e4a21e if_then_else_cond
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:9265
0x1e3ddb8 combine_simplify_rtx
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:5748
0x1e3d79a subst
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:5609
0x1e3df1f combine_simplify_rtx
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:5769
0x1e3d79a subst
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:5609
0x1e3d50d subst
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:5536
0x1e35f40 try_combine
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:3339
0x1e305a0 combine_instructions
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:1285
0x1e5addb rest_of_handle_combine
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:15063
0x1e5ae98 execute
        /home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/combine.cc:15107

[Bug c/111334] New: ICE is reported during the combine pass optimization

2023-09-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

Bug ID: 111334
   Summary: ICE is reported during the combine pass optimization
   Product: gcc
   Version: rust/master
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
  Target Milestone: ---

Created attachment 55852
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55852&action=edit
test

[Bug target/110484] Spec2017 541 after adding the '-flto-fomit-frame-pointer' optimization, after optimizing the rnreg, directly replaced other registers with the $r22 register, so that the value of t

2023-08-31 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110484

chenglulu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from chenglulu  ---
resolved

[Bug c++/110484] Spec2017 541 after adding the '-flto-fomit-frame-pointer' optimization, after optimizing the rnreg, directly replaced other registers with the $r22 register, so that the value of the

2023-06-29 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110484

--- Comment #1 from chenglulu  ---

The following code change solves the problem:

--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1112,7 +1112,9 @@ loongarch_first_stack_step (struct loongarch_frame_info *frame)
 static void
 loongarch_emit_stack_tie (void)
 {
-  emit_insn (gen_stack_tie (Pmode, stack_pointer_rtx, hard_frame_pointer_rtx));
+  emit_insn (gen_stack_tie (Pmode, stack_pointer_rtx,
+   frame_pointer_needed ? hard_frame_pointer_rtx
+   : stack_pointer_rtx));
 }

[Bug c++/110484] New: Spec2017 541 after adding the '-flto-fomit-frame-pointer' optimization, after optimizing the rnreg, directly replaced other registers with the $r22 register, so that the value of

2023-06-29 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110484

Bug ID: 110484
   Summary: Spec2017 541 after adding the
'-flto-fomit-frame-pointer' optimization, after
optimizing the rnreg, directly replaced other
registers with the $r22 register, so that the value of
the $r22 register was destroyed without being saved.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
  Target Milestone: ---
Target: loongarch64-*-linux

When SPEC2017 541 is built with '-flto -fomit-frame-pointer', the regrename
(rnreg) pass directly replaces other registers with $r22, so the value of $r22
is clobbered without being saved.

Debugging shows that, when compiling SGFTree.cpp, the pro_and_epilogue pass
generates the following insn for the load_from_file function:
(insn 782 781 783 61 (set (mem:BLK (scratch) [0  A8])
(unspec:BLK [
(reg/f:DI 3 $r3)
(reg/f:DI 22 $r22)
] UNSPEC_TIE)) "SGFTree.cpp":115:1 -1
 (nil))
As a result, $r22 is recorded as ever live in load_from_file from then on.
This is undesirable when the function does not otherwise use $r22.

[Bug target/110136] After optimization, the $r1 register will be broken when jumping to the jump table, resulting in a significant increase in the false prediction rate of branch prediction.

2023-06-15 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136

chenglulu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from chenglulu  ---
This issue is resolved

[Bug target/110136] After optimization, the $r1 register will be broken when jumping to the jump table, resulting in a significant increase in the false prediction rate of branch prediction.

2023-06-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136

--- Comment #4 from chenglulu  ---
(In reply to Andrew Pinski from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > >In the regrename passover optimization
> > 
> > I am trying to understand the issue.
> > 
> > 5912 ldx.d   $r20,$r16,$r19
> >  5913 add.d   $r1,$r16,$r20
> >  5914 jr  $r1
> > 
> > Is the issue is jr does not like r1 register or some other kind of
> > performance issue?
> 
> If it is just r1 that is the issue, you could change the pattern in
> loongarch.md to discourage r1 by changing the constraints there.
> Because right now it assumes all registers are similar in cost:
> 
> (define_insn "@indirect_jump"
>   [(set (pc) (match_operand:P 0 "register_operand" "r"))]
>   ""
>   "jr\t%0"
>   [(set_attr "type" "jump")
>(set_attr "mode" "none")])

Thank you very much, I will modify the pattern and give it a try.

[Bug target/110136] After optimization, the $r1 register will be broken when jumping to the jump table, resulting in a significant increase in the false prediction rate of branch prediction.

2023-06-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136

--- Comment #3 from chenglulu  ---
(In reply to Andrew Pinski from comment #1)
> >In the regrename passover optimization
> 
> I am trying to understand the issue.
> 
> 5912 ldx.d   $r20,$r16,$r19
>  5913 add.d   $r1,$r16,$r20
>  5914 jr  $r1
> 
> Is the issue is jr does not like r1 register or some other kind of
> performance issue?

The issue is that clobbering $r1 when jumping through the jump table hurts the
hardware's branch prediction accuracy.

[Bug lto/110136] New: After optimization, the $r1 register will be broken when jumping to the jump table, resulting in a significant increase in the false prediction rate of branch prediction.

2023-06-06 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136

Bug ID: 110136
   Summary: After optimization, the $r1 register will be broken
when jumping to the jump table, resulting in a
significant increase in the false prediction rate of
branch prediction.
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
CC: marxin at gcc dot gnu.org
  Target Milestone: ---
Target: loongarch64-*-linux

Created attachment 55267
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55267&action=edit
perlbench.ltrans15.ltrans.args.0

tag: releases/gcc-12.2.0
The code here replicates the problem.

$ ./libexec/gcc/loongarch64-linux-gnu/12.2.0/lto1 -quiet -dumpbase
./perlbench.ltrans15.ltrans -mabi=lp64d -march=loongarch64 -mfpu=64
-mcmodel=normal -mtune=loongarch64 -g -g -Ofast -Ofast -version -fno-openmp
-fno-openacc -fcf-protection=none -fno-omit-frame-pointer -funroll-all-loops 
-fltrans @./perlbench.ltrans15.ltrans.args.0 -fdump-rtl-all -o
./perlbench.ltrans15.ltrans.s -fpie

Perl_sv_upgrade:
...
 5908 addi.w  $r18,$r0,15 # 0xf
 5909 bgtu$r23,$r18,.L502
 5910 la.local$r16,.L504
 5911 slli.d  $r19,$r23,3
 5912 ldx.d   $r20,$r16,$r19
 5913 add.d   $r1,$r16,$r20
 5914 jr  $r1
...

In the regrename pass, the registers on lines 5913 and 5914 are replaced with
$r1.

Debugging showed that the problem goes away if the hook HARD_REGNO_RENAME_OK is
defined, but this is only a coincidence: it does not guarantee that the
register will never be replaced with $r1 when jumping through the jump table.

The patch that defines the HARD_REGNO_RENAME_OK is as follows:
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 5c9a33c14f7..0df0ae15c3e 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5782,6 +5782,19 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }

+/* Return nonzero if register FROM_REGNO can be renamed to register
+   TO_REGNO.  */
+
+bool
+loongarch_hard_regno_rename_ok (unsigned from_regno ATTRIBUTE_UNUSED,
+   unsigned to_regno)
+{
+  return df_regs_ever_live_p (to_regno);
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f9de9a6e4fb..b22b439eaac 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -563,6 +563,8 @@ enum reg_class
 #define IMM_BITS 12
 #define IMM_REACH (1LL << IMM_BITS)

+#define HARD_REGNO_RENAME_OK(FROM, TO) loongarch_hard_regno_rename_ok (FROM, TO)
+
 /* True if VALUE is an unsigned 6-bit number.  */


Is there a way to make sure that the $r1 register is not corrupted when jumping
to the table?
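
For discussion only, here is a toy model of a stricter rename predicate that also refuses $r1 as a rename target. Everything in it is hypothetical: the function name, the macro, and the ever_live array (a stand-in for df_regs_ever_live_p) are not part of the patch above, and whether such a restriction is appropriate for the real port is not established here.

```c
#include <stdbool.h>

/* $r1 is the return-address register on LoongArch.  */
#define TOY_RETURN_ADDR_REGNUM 1u

/* Toy predicate in the spirit of HARD_REGNO_RENAME_OK.  The ever_live
   array is a hypothetical stand-in for df_regs_ever_live_p.  */
bool
toy_hard_regno_rename_ok (unsigned from_regno, unsigned to_regno,
                          const bool *ever_live)
{
  (void) from_regno;            /* Unused, as in the patch above.  */
  if (to_regno == TOY_RETURN_ADDR_REGNUM)
    return false;               /* Never rename into $r1.  */
  return ever_live[to_regno];   /* Same test as the patch above.  */
}
```

This only constrains regrename; other passes could still place the jump target in $r1, so it does not answer the question in general.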

[Bug tree-optimization/108357] [13 Regression] Dead Code Elimination Regression at -O2 since r13-4607-g2dc5d6b1e7ec88

2023-04-14 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108357

--- Comment #19 from chenglulu  ---
(In reply to Xi Ruoyao from comment #18)
> (In reply to Richard Biener from comment #17)
> > Isn't this the same issue as seen in another bug, most targets defining
> > TARGET_PROMOTE_PROTOTYPES to hook_bool_const_tree_true but loongarch not?
> > That will cause those conversions to be missed.
> 
> Looks like we should define it, as our psABI says:
> 
> In most cases, the unsigned integer data types are zero-extended when stored
> in general-purpose register, and the signed integer data types are
> sign-extended. However, in the LP64D ABI, unsigned 32-bit types, such as
> unsigned int, are stored in general-purpose registers as proper sign
> extensions of their 32-bit values.
> 
> IIUC it matches the semantics of TARGET_PROMOTE_PROTOTYPE

I also think this should be considered
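
The psABI rule quoted above can be illustrated with a minimal sketch. The function name is made up for illustration, and the code only models the register image described in the quoted text; it is not taken from GCC.

```c
#include <stdint.h>

/* Model of the 64-bit register image of an unsigned 32-bit value under
   the quoted LP64D rule: the 32-bit pattern is sign-extended.  */
uint64_t
lp64d_gpr_image_u32 (uint32_t value)
{
  /* Converting uint32_t to int32_t is implementation-defined before C23
     for values above INT32_MAX; GCC wraps modulo 2^32, assumed here.  */
  return (uint64_t) (int64_t) (int32_t) value;
}
```

For example, 0x80000000 is held as 0xFFFFFFFF80000000 rather than 0x0000000080000000, while small values such as 1 are unchanged.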

[Bug tree-optimization/108357] [13 Regression] Dead Code Elimination Regression at -O2 since r13-4607-g2dc5d6b1e7ec88

2023-04-14 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108357

--- Comment #16 from chenglulu  ---
(In reply to rguent...@suse.de from comment #15)
> On Thu, 13 Apr 2023, xry111 at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108357
> > 
> > --- Comment #14 from Xi Ruoyao  ---
> > (In reply to rguent...@suse.de from comment #13)
> > > On Thu, 13 Apr 2023, chenglulu at loongson dot cn wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108357
> > > > 
> > > > --- Comment #10 from chenglulu  ---
> > > > (In reply to Xi Ruoyao from comment #5)
> > > > > The test fails on loongarch64-linux-gnu.  foo is kept in 
> > > > > 114t.threadfull1,
> > > > > but removed in 135t.forwprop3.
> > > > > 
> > > > > Does this mean something is wrong for LoongArch, or we should simply 
> > > > > check
> > > > > the tree dump in a later pass (for e.g. 254t.optimized)?
> > > > 
> > > > If the definition of the macro DEFAULT_SIGNED_CHAR is changed to 0, the 
> > > > test
> > > > case can pass the test. I guess it is because the definition of
> > > > DEFAULT_SIGNED_CHAR affects the optimization of the ccp pass, resulting 
> > > > in some
> > > > blocks that cannot be removed, resulting in the failure of this test 
> > > > case.
> > > 
> > > Can you check if making b unsigned fixes the test for you?  If so
> > > that's what we should do.
> > 
> > It works?
> > 
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c
> > index 44c457b7a97..79cf371ef28 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c
> > @@ -1,7 +1,7 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -fdump-tree-threadfull1" } */
> > 
> > -static char b;
> > +static unsigned char b;
> >  static unsigned c;
> >  void foo();
> >  short(a)(short d, short e) { return d * e; }
> > 
> > But I'm still wondering why this is not an issue for x86_64.
> 
> Yes, that's interesting to see.  It does change how b is extended
> in b ^ 9854 (but for the value zero it doesn't matter).

I think the problem is here.
In the adjust_alignment dump, the intermediate output for LoongArch and x86 is
as follows:

LoongArch:
  ...
  b.2_1 = bD.2176;
  # RANGE [irange] short int [-128, 127]
  _2 = (short intD.12) b.2_1;
  # RANGE [irange] short int [-16384, -1][1, 16383]
  _3 = _2 ^ 9854;
  # RANGE [irange] unsigned short [1, 16383][49152, +INF]
  e.1_6 = (unsigned short) _3;
  _7 = e.1_6 * 5;
  _8 = (short intD.12) _7;
  # .MEM_15 = VDEF <.MEM_4(D)>
  bD.2176 = 0;
  if (_8 != 0)
goto ; [67.00%]
  else
goto ; [33.00%]
  ...
c.4_9 = 0;
  _10 = c.4_9 == 0;
  # RANGE [irange] int [0, 1] NONZERO 0x1
  _11 = (intD.1) _10;
  # RANGE [irange] int [-32768, -1][1, 32767]
  _12 = (intD.1) _8;
 ...

X86:
  ...
  b.2_1 = bD.2738;
  # RANGE [irange] short int [-128, 127]
  _2 = (short intD.17) b.2_1;
  # RANGE [irange] short int [-16384, -1][1, 16383]
  _3 = _2 ^ 9854;
  # RANGE [irange] unsigned short [1, 16383][49152, +INF]
  e.1_7 = (unsigned short) _3;
  _8 = e.1_7 * 5;
  _9 = (short intD.17) _8;
  # RANGE [irange] int [-32768, 32767]
  _4 = (intD.6) _9;
  d_10 = (short intD.17) _4;
  # .MEM_17 = VDEF <.MEM_5(D)>
  bD.2738 = 0;
  if (d_10 != 0)
goto ; [67.00%]
  else
goto ; [33.00%]
  ...


x86 has an additional intermediate variable, _9, that LoongArch does not have.
LoongArch uses _8 instead, but _8 is used twice, so
  if (_8 != 0)
goto ; [67.00%]
  else
goto ; [33.00%]
is not deleted by the ccp2 pass.
That is why the test case fails. I think the test will pass if LoongArch
generates an intermediate variable the way x86 does.
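
The remark in the quoted exchange, that making b unsigned changes how it is extended in b ^ 9854 but not for the value zero, can be checked with a small sketch. The function names are made up for illustration; the constant 9854 and the widening to short come from the dumps above.

```c
/* b held as a signed char, widened to short before the XOR.  */
short
xor_with_signed_b (signed char b)
{
  return (short) ((short) b ^ 9854);
}

/* b held as an unsigned char, widened to short before the XOR.  */
short
xor_with_unsigned_b (unsigned char b)
{
  return (short) ((short) b ^ 9854);
}
```

Both functions return 9854 for b == 0. They differ only when the top bit of b is set: for the bit pattern 0x80 the signed version yields -9730 and the unsigned version yields 9982.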

[Bug tree-optimization/108357] [13 Regression] Dead Code Elimination Regression at -O2 since r13-4607-g2dc5d6b1e7ec88

2023-04-13 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108357

--- Comment #12 from chenglulu  ---
(In reply to Xi Ruoyao from comment #11)
> (In reply to chenglulu from comment #10)
> > (In reply to Xi Ruoyao from comment #5)
> > > The test fails on loongarch64-linux-gnu.  foo is kept in 114t.threadfull1,
> > > but removed in 135t.forwprop3.
> > > 
> > > Does this mean something is wrong for LoongArch, or we should simply check
> > > the tree dump in a later pass (for e.g. 254t.optimized)?
> > 
> > If the definition of the macro DEFAULT_SIGNED_CHAR is changed to 0, the test
> > case can pass the test. I guess it is because the definition of
> > DEFAULT_SIGNED_CHAR affects the optimization of the ccp pass, resulting in
> > some blocks that cannot be removed, resulting in the failure of this test
> > case.
> 
> Hmm, but we cannot change DEFAULT_SIGNED_CHAR or we'll break ABI and API
> everywhere.  And x86_64-linux-gnu also uses DEFAULT_SIGNED_CHAR=1.

Uh, I didn't notice this, I'll keep looking.

[Bug tree-optimization/108357] [13 Regression] Dead Code Elimination Regression at -O2 since r13-4607-g2dc5d6b1e7ec88

2023-04-13 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108357

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #5)
> The test fails on loongarch64-linux-gnu.  foo is kept in 114t.threadfull1,
> but removed in 135t.forwprop3.
> 
> Does this mean something is wrong for LoongArch, or we should simply check
> the tree dump in a later pass (for e.g. 254t.optimized)?

If the macro DEFAULT_SIGNED_CHAR is defined as 0, the test case passes. I guess
the definition of DEFAULT_SIGNED_CHAR affects the ccp pass, leaving some blocks
that cannot be removed, which makes this test case fail.

[Bug tree-optimization/109192] [13 Regression] timeout with -O3 since r13-5579

2023-03-29 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109192

--- Comment #15 from chenglulu  ---
(In reply to Andrew Macleod from comment #14)
> The upcoming patch for 109274 should resolve this.

The problem has been solved. Thanks!

[Bug tree-optimization/109192] [13 Regression] timeout with -O3 since r13-5579

2023-03-28 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109192

--- Comment #13 from chenglulu  ---
(In reply to CVS Commits from comment #9)
> The master branch has been updated by Andrew Macleod :
> 
> https://gcc.gnu.org/g:0963cb5fde158cce986523a90fa9edc51c881f31
> 
> commit r13-6787-g0963cb5fde158cce986523a90fa9edc51c881f31
> Author: Andrew MacLeod 
> Date:   Mon Mar 20 16:11:12 2023 -0400
> 
> Terminate GORI calculations if a relation is not relevant.
> 
> We currently allow VARYING lhs GORI calculations to continue if there is
> a relation present in the hope it will eventually better refine a result.
> This adds a check that the relation is relevant to the outgoing range
> calculation first.  If it is not relevant, stop calculating.
> 
> PR tree-optimization/109192
> * gimple-range-gori.cc (gori_compute::compute_operand_range):
> Terminate gori calculations if a relation is not relevant.
> * value-relation.h (value_relation::set_relation): Allow
> equality between op1 and op2 if they are the same.

After applying this patch, compiling the attached zheev.fppized.f for LoongArch
or riscv64 triggers an ICE. The error message is as follows:

$ riscv64-linux-gnu-gfortran -c -o zheev.fppized.o -O3
-fno-aggressive-loop-optimizations -std=legacy zheev.fppized.f

during GIMPLE pass: dom
zheev.fppized.f:4968:23:

 4968 |   SUBROUTINE DLAMC2( BETA, T, RND, EPS, EMIN, RMIN, EMAX, RMAX )
  |   ^
internal compiler error: in in_chain_p, at gimple-range-gori.cc:119
0x25683f3 range_def_chain::in_chain_p(tree_node*, tree_node*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:119
0x2569e66 gori_compute::compute_operand_range(vrange&, gimple*, vrange const&,
tree_node*, fur_source&, value_relation*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:667
0x256bfd4 gori_compute::compute_operand1_range(vrange&,
gimple_range_op_handler&, vrange const&, tree_node*, fur_source&,
value_relation*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:1174
0x256a408 gori_compute::compute_operand_range(vrange&, gimple*, vrange const&,
tree_node*, fur_source&, value_relation*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:726
0x256c648 gori_compute::compute_operand2_range(vrange&,
gimple_range_op_handler&, vrange const&, tree_node*, fur_source&,
value_relation*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:1254
0x256c744 gori_compute::compute_operand1_and_operand2_range(vrange&,
gimple_range_op_handler&, vrange const&, tree_node*, fur_source&,
value_relation*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:1274
0x256a3b7 gori_compute::compute_operand_range(vrange&, gimple*, vrange const&,
tree_node*, fur_source&, value_relation*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:723
0x256ccfc gori_compute::outgoing_edge_range_p(vrange&, edge_def*, tree_node*,
range_query&)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:1384
0x255e2dc ranger_cache::edge_range(vrange&, edge_def*, tree_node*,
ranger_cache::rfd_mode)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-cache.cc:964
0x255e465 ranger_cache::range_on_edge(vrange&, edge_def*, tree_node*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-cache.cc:1001
0x2563ba9 fur_edge::get_operand(vrange&, tree_node*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-fold.cc:131
0x25652a5 fold_using_range::range_of_range_op(vrange&,
gimple_range_op_handler&, fur_source&)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-fold.cc:558
0x2564cf8 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&,
tree_node*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-fold.cc:489
0x256420c fold_range(vrange&, gimple*, edge_def*, range_query*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-fold.cc:326
0x256cf5f gori_compute::outgoing_edge_range_p(vrange&, edge_def*, tree_node*,
range_query&)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-gori.cc:1411
0x2560449 ranger_cache::range_from_dom(vrange&, tree_node*, basic_block_def*,
ranger_cache::rfd_mode)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-cache.cc:1524
0x255f069 ranger_cache::fill_block_cache(tree_node*, basic_block_def*,
basic_block_def*)
   
/home/chenglulu/work/loongisa-toolchain/gcc-upstream/gcc/gimple-range-cache.cc:1212
0x255e5c7 ranger_cache::block_range(vrange&, basic_block_def*, tree_node*,
bool)
   

[Bug tree-optimization/109192] [13 Regression] timeout with -O3 since r13-5579

2023-03-28 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109192

chenglulu  changed:

   What|Removed |Added

 CC||chenglulu at loongson dot cn

--- Comment #12 from chenglulu  ---
Created attachment 54776
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54776&action=edit
zheev.fppized.f

[Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch

2023-03-10 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #6 from chenglulu  ---
I tried changing the code,
diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 4220639..efaea6922b5 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -914,6 +914,11 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
   /* First see if the source is of the form (plus (...) CST).  */
   if (plus_src && poly_int_rtx_p (XEXP (plus_src, 1), &offset))
 plus_cst_src = plus_src;
+  else if (plus_src && ira_reg_equiv[REGNO (XEXP (plus_src, 1))].constant)
+   {
+ poly_int_rtx_p (ira_reg_equiv[REGNO (XEXP (plus_src, 1))].constant, &offset);
+ plus_cst_src = gen_rtx_PLUS (GET_MODE (XEXP (plus_src, 0)), XEXP (plus_src, 0), ira_reg_equiv[REGNO (XEXP (plus_src, 1))].constant);
+   }
   /* Check that the first operand of the PLUS is a hard reg or
 the lowpart subreg of one.  */
   if (plus_cst_src)

This eliminates the redundant instructions, but I don't know whether the code
can be modified like this.

[Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch

2023-03-10 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #5 from chenglulu  ---
On AARCH64:
$cat t.c
int test(int x)
{
char buf[128 << 10];
return buf[x];
}
$cat t-1.c
int test(int x)
{
char buf[0xfff];
return buf[x];
}

The generated assemblies are as follows:

        t.s                            | t-1.s
test:                                  |test:
.LFB0:                                 |.LFB0:
        .cfi_startproc                 |        .cfi_startproc
        sub     sp, sp, #131072        |        mov     x12, 16
        .cfi_def_cfa_offset 131072     |        mov     x2, 16
        ldrb    w0, [sp, w0, sxtw]     |        movk    x12, 0x1000, lsl 16
        add     sp, sp, 131072         |        sub     sp, sp, x12
        .cfi_def_cfa_offset 0          |        .cfi_def_cfa_offset 268435472
        ret                            |        movk    x2, 0x1000, lsl 16
        .cfi_endproc                   |        mov     x1, -268435456
.LFE0:                                 |        add     x1, x2, x1
        .size   test, .-test           |        add     x1, sp, x1
                                       |        str     x1, [sp, 8]
                                       |        ldrb    w0, [x1, w0, sxtw]
                                       |        add     sp, sp, x12
                                       |        .cfi_def_cfa_offset 0
                                       |        ret

In my opinion, "str x1, [sp, 8]" is not the only redundant instruction in
t-1.s. The following instructions are all redundant:
"movk    x2, 0x1000, lsl 16
 mov     x1, -268435456
 add     x1, x2, x1
 add     x1, sp, x1
 str     x1, [sp, 8]"

Comparing the reload pass dumps of the two test cases, t.c and t-1.c, I found
the reason for these redundant instructions.
In t.c,
(insn 7 15 12 2 (set (reg/f:DI 96)
(plus:DI (reg/f:DI 64 sfp)
(const_int -131072 [0xfffe]))) "t-1.c":4:12 153
{*adddi3_aarch64}
 (expr_list:REG_EQUIV (plus:DI (reg/f:DI 64 sfp)
(const_int -131072 [0xfffe]))
(nil)))
It will be deleted after reload.

In t-1.c, the behavior of insn 7 in t.c is realized by two instructions
(insn 7 16 8 2 (set (reg:DI 97)
(const_int -268435456 [0xf000])) "t.c":4:12 65
{*movdi_aarch64}
 (expr_list:REG_EQUIV (const_int -268435456 [0xf000])
(nil)))
(insn 8 7 13 2 (set (reg:DI 96)
(plus:DI (reg/f:DI 64 sfp)
(reg:DI 97))) "t.c":4:12 153 {*adddi3_aarch64}
 (expr_list:REG_DEAD (reg:DI 97)
(expr_list:REG_EQUIV (plus:DI (reg/f:DI 64 sfp)
(const_int -268435456 [0xf000]))
(nil
Because of a limitation in the reload pass, these two instructions are not
deleted, which produces the redundant instructions.
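
One likely reason the two cases are expanded differently (my reading, not something stated in this thread) is the AArch64 ADD/SUB immediate form, which encodes a 12-bit unsigned value optionally shifted left by 12. A minimal sketch of that check follows; the function name is made up for illustration and is not GCC's.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if MAGNITUDE fits the AArch64 ADD/SUB immediate form:
   a 12-bit unsigned value, optionally shifted left by 12 bits.  */
bool
fits_add_sub_imm (uint64_t magnitude)
{
  return (magnitude & ~UINT64_C (0xfff)) == 0
         || (magnitude & ~(UINT64_C (0xfff) << 12)) == 0;
}
```

The offset 131072 (32 << 12) fits, so a single add can carry it as in insn 7 of the first dump. The offset 268435456 does not fit, so the constant has to be loaded into a register first, which matches the two-instruction sequence in the second dump.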

[Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch

2023-03-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #4 from chenglulu  ---
(In reply to Xi Ruoyao from comment #3)

> I don't really understand why we should prefer the memory if there is a
> REG_EQUIV note, nor why this does not happen with -fPIE.

I don't understand this optimization either, but I found that setting
FRAME_GROWS_DOWNWARD to 0 avoids the problem.

[Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch

2023-03-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

chenglulu  changed:

   What|Removed |Added

 CC||chenglulu at loongson dot cn

--- Comment #2 from chenglulu  ---
I think this is most likely caused by the common (target-independent) code. If
the immediate value is large enough, such as 0xfff, aarch64 has the same
problem.

$ cat t.c
int test(int x)
{
char buf[0xfff];
return buf[x];
}

$ ./cc1 t.c -o - -O2  2>/dev/null | grep test -A20
test:
.LFB0:
        .cfi_startproc
        mov     x12, 16
        mov     x2, 16
        movk    x12, 0x1000, lsl 16
        sub     sp, sp, x12
        .cfi_def_cfa_offset 268435472
        movk    x2, 0x1000, lsl 16
        mov     x1, -268435456
        add     x1, x2, x1
        add     x1, sp, x1
        str     x1, [sp, 8]
        ldrb    w0, [x1, w0, sxtw]
        add     sp, sp, x12
        .cfi_def_cfa_offset 0
        ret

The "str x1, [sp, 8]" instruction here is the same kind of meaningless store.

[Bug target/107731] loongarch Operand Modifiers are not documented

2023-01-23 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107731

chenglulu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #9 from chenglulu  ---
Fixed

[Bug target/107713] Wrong implementation atomic_exchange on LoongArch

2022-11-19 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107713

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> Fixed for gcc-12 too.

Thanks! ^v^

[Bug target/107713] Wrong implementation atomic_exchange on LoongArch

2022-11-18 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107713

--- Comment #7 from chenglulu  ---
(In reply to Xi Ruoyao from comment #6)
> Fixed for trunk.  Should we backport it to gcc-12 branch too?

I don't know what the problem is, but my backport attempts always fail.
If it is convenient for you, could you help me with the backport?

[Bug target/107731] loongarch Operand Modifiers are not documented

2022-11-16 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107731

--- Comment #7 from chenglulu  ---
(In reply to Andrew Pinski from comment #3)
> MIPS nor RISCV does not define a %c either.

These two architectures can also fail under the following conditions:

void
test(void)
{
  asm (".long %c0"
   ::"i"(0x12345678));
}

[Bug target/107731] loongarch Operand Modifiers are not documented

2022-11-16 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107731

--- Comment #6 from chenglulu  ---
(In reply to Andrew Pinski from comment #1)
> %c does not mean anything in loongarch.
> 
> The codes are not documented in the documentation for loonarch though but
> they currently only documented in loongarch.cc:
>'A'  Print a _DB suffix if the memory model requires a release.
>'b'  Print the address of a memory operand, without offset.
>'C'  Print the integer branch condition for comparison OP.
>'d'  Print CONST_INT OP in decimal.
>'F'  Print the FPU branch condition for comparison OP.
>'G'  Print a DBAR insn if the memory model requires a release.
>'H'  Print address 52-61bit relocation associated with OP.
>'h'  Print the high-part relocation associated with OP.
>'i'  Print i if the operand is not a register.
>'L'  Print the low-part relocation associated with OP.
>'m'  Print one less than CONST_INT OP in decimal.
>'N'  Print the inverse of the integer branch condition for comparison OP.
>'r'  Print address 12-31bit relocation associated with OP.
>'R'  Print address 32-51bit relocation associated with OP.
>'T'  Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
>   'z' for (eq:?I ...), 'n' for (ne:?I ...).
>'t'  Like 'T', but with the EQ/NE cases reversed
>'V'  Print exact log2 of CONST_INT OP element 0 of a replicated
>   CONST_VECTOR in decimal.
>'W'  Print the inverse of the FPU branch condition for comparison OP.
>'X'  Print CONST_INT OP in hexadecimal format.
>'x'  Print the low 16 bits of CONST_INT OP in hexadecimal format.
>'Y'  Print loongarch_fp_conditions[INTVAL (OP)]
>'y'  Print exact log2 of CONST_INT OP in decimal.
>'Z'  Print OP and a comma for 8CC, otherwise print nothing.
>'z'  Print $0 if OP is zero, otherwise print OP normally.  */

Sorry, I'll document the ones that will be used.

(In reply to Xi Ruoyao from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > %c does not mean anything in loongarch.
> > 
> > The codes are not documented in the documentation for loonarch though but
> > they currently only documented in loongarch.cc:
> >'A'  Print a _DB suffix if the memory model requires a release.
> >'b'  Print the address of a memory operand, without offset.
> >'C'  Print the integer branch condition for comparison OP.
> >'d'  Print CONST_INT OP in decimal.
> >'F'  Print the FPU branch condition for comparison OP.
> >'G'  Print a DBAR insn if the memory model requires a release.
> >'H'  Print address 52-61bit relocation associated with OP.
> >'h'  Print the high-part relocation associated with OP.
> >'i'  Print i if the operand is not a register.
> >'L'  Print the low-part relocation associated with OP.
> >'m'  Print one less than CONST_INT OP in decimal.
> >'N'  Print the inverse of the integer branch condition for comparison OP.
> >'r'  Print address 12-31bit relocation associated with OP.
> >'R'  Print address 32-51bit relocation associated with OP.
> >'T'  Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
> >   'z' for (eq:?I ...), 'n' for (ne:?I ...).
> >'t'  Like 'T', but with the EQ/NE cases reversed
> >'V'  Print exact log2 of CONST_INT OP element 0 of a replicated
> >   CONST_VECTOR in decimal.
> >'W'  Print the inverse of the FPU branch condition for comparison OP.
> >'X'  Print CONST_INT OP in hexadecimal format.
> >'x'  Print the low 16 bits of CONST_INT OP in hexadecimal format.
> >'Y'  Print loongarch_fp_conditions[INTVAL (OP)]
> >'y'  Print exact log2 of CONST_INT OP in decimal.
> >'Z'  Print OP and a comma for 8CC, otherwise print nothing.
> >'z'  Print $0 if OP is zero, otherwise print OP normally.  */
> 
> Interestingly it "worked" with GCC 12.2...

(In reply to Xi Ruoyao from comment #5)
> (In reply to Andrew Pinski from comment #4)
> > (In reply to Xi Ruoyao from comment #2)
> > > Interestingly it "worked" with GCC 12.2...
> 
> No it does not work.  I guess I typed the test command in a wrong SSH
> session...

I have found the cause of the error; I will push a patch later.

[Bug target/107731] New: error: invalid 'asm': invalid use of '%c'

2022-11-16 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107731

Bug ID: 107731
   Summary: error: invalid 'asm': invalid use of '%c'
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
CC: paulhua at gcc dot gnu.org, xry111 at gcc dot gnu.org
  Target Milestone: ---
Target: loongarch64-linux-gnu

void
test(void)
{
  asm (".long %c0"
   ::"i"(10));
}
Compiling this test case for loongarch64 fails with the following
error:

test1.c: In function 'test':
test1.c:3:3: error: invalid 'asm': invalid use of '%c'
3 |   asm (".long %c0"
  |   ^~~

[Bug target/107453] [13 Regression] New stdarg tests in r13-3549-g4fe34cdcc80ac2 fail

2022-11-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107453

chenglulu  changed:

   What|Removed |Added

 CC||chenglulu at loongson dot cn

--- Comment #4 from chenglulu  ---

On LoongArch:
Since type_arg_types is null, the value of n_named_args is set to num_actuals.

  function_arg_info arg (type, argpos < n_named_args);
  if (pass_by_reference (args_so_far_pnt, arg))

so arg.named is true for every argument, which causes the test failures.
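
The effect can be modelled outside the compiler. The sketch below is a
simplified assumption about the logic quoted above, not GCC code:
count_named_args and arg_is_named are hypothetical stand-ins, and the real
computation in the middle end has more cases. It only illustrates that once
n_named_args falls back to num_actuals, every argument position compares as
named.

```c
#include <stdbool.h>

/* Hypothetical, simplified model (not a GCC function): when there is no
   parameter type list, n_named_args falls back to the number of actual
   arguments.  */
static int
count_named_args (bool have_type_arg_types, int n_prototyped, int num_actuals)
{
  return have_type_arg_types ? n_prototyped : num_actuals;
}

/* Mirrors the "argpos < n_named_args" test in the quoted snippet.  */
static bool
arg_is_named (int argpos, int n_named_args)
{
  return argpos < n_named_args;
}
```

With three actual arguments and no type list, position 2 is reported as
named; with one prototyped parameter it is not.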

[Bug target/106096] [13 regression] ICE building stage 2 libgcc on loongarch64-linux-gnu because stage 2 gcc is miscompiled

2022-06-28 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106096

--- Comment #10 from chenglulu  ---
(In reply to Xi Ruoyao from comment #9)
> Created attachment 53214 [details]
> patch removing r13 from SIBCALL_REGS
> 
> I'm testing this patch now.
> 
> I suggest to apply this for trunk and gcc-12 branch first (as gcc-12 also
> miscompiles the test case).
> 
> Then if the reordering of RA preference can improve performance, you may
> apply it later (and also adjust the changes in this patch again).

OK!

[Bug target/106097] undefined behaviors regarding integer shifts in loongarch_build_integer

2022-06-28 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106097

--- Comment #11 from chenglulu  ---

> Otherwise LGTM.  As the port maintainer you can push it directly into
> master.  Normally we should bootstrap and regtest before applying a patch,
> but currently the bootstrap is blocked by PR106096 :(.
Ok, I will submit patches in order.

[Bug target/106096] [13 regression] ICE building stage 2 libgcc on loongarch64-linux-gnu because stage 2 gcc is miscompiled

2022-06-28 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106096

--- Comment #8 from chenglulu  ---
(In reply to Xi Ruoyao from comment #6)
> (In reply to chenglulu from comment #5)
> > Created attachment 53213 [details]
> > Modify the allocation order of caller saved registers.
> 
> I think we need to completely prevent LARCH_PROLOGUE_TEMP from being used
> for sibcall:
> 
> diff --git a/gcc/config/loongarch/loongarch.h
> b/gcc/config/loongarch/loongarch.h
> index 4d107a42209..f9de9a6e4fb 100644
> --- a/gcc/config/loongarch/loongarch.h
> +++ b/gcc/config/loongarch/loongarch.h
> @@ -511,7 +511,7 @@ enum reg_class
>  #define REG_CLASS_CONTENTS \
>  {  \
>{ 0x, 0x, 0x },  /* NO_REGS  */  \
> -  { 0x001ff000, 0x, 0x },  /* SIBCALL_REGS  */ \
> +  { 0x001fd000, 0x, 0x },  /* SIBCALL_REGS  */ \
>{ 0xff90, 0x, 0x },  /* JIRL_REGS  */\
>{ 0xfffc, 0x, 0x },  /* CSR_REGS  */ \
>{ 0x, 0x, 0x },  /* GR_REGS  */  \
> 
> Or even if LARCH_PROLOGUE_TEMP is less preferred, the register allocator may
> still use it for sibcall and blow something up again.
> 
This solved my doubt. ^v^
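
For reference, the change from 0x001ff000 to 0x001fd000 in the quoted diff
clears bit 13 of the first mask word, which matches the description of
attachment 53214 (removing r13 from SIBCALL_REGS). A minimal sketch of that
arithmetic; clear_reg_bit is a hypothetical helper and not part of GCC:

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC: drop hard register REGNO (0..31)
   from one 32-bit word of a register-class mask.  */
static uint32_t
clear_reg_bit (uint32_t mask, unsigned int regno)
{
  return mask & ~(UINT32_C (1) << regno);
}
```

Calling clear_reg_bit (0x001ff000, 13) gives 0x001fd000, the new
SIBCALL_REGS value in the diff.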

[Bug target/106096] [13 regression] ICE building stage 2 libgcc on loongarch64-linux-gnu because stage 2 gcc is miscompiled

2022-06-28 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106096

--- Comment #5 from chenglulu  ---
Created attachment 53213
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53213&action=edit
Modify the allocation order of caller saved registers.

[Bug target/106097] undefined behaviors regarding integer shifts in loongarch_build_integer

2022-06-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106097

--- Comment #9 from chenglulu  ---
Created attachment 53206
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53206&action=edit
use LU52I_B and LU32I_B instead of hard coding those long

[Bug target/106097] undefined behaviors regarding integer shifts in loongarch_build_integer

2022-06-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106097

--- Comment #8 from chenglulu  ---
> You can reuse LU32I_B and LU52I_B instead of hard coding those long
> constants :).

I have fixed it, thanks!:)
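
A minimal sketch of the idea behind that suggestion. The two mask
definitions below are an assumption about their shape (bits 32..51 and bits
52..63), written for illustration rather than copied from loongarch.h, and
lu32i_field and lu52i_field are hypothetical helpers. Building the masks
from unsigned operands keeps every shift well defined, whereas left-shifting
a signed value into or past the sign bit is undefined behavior in C.

```c
#include <stdint.h>

/* Assumed mask shapes, for illustration only.  */
#define LU32I_B (UINT64_C (0xfffff) << 32)   /* bits 32..51 */
#define LU52I_B (UINT64_C (0xfff) << 52)     /* bits 52..63 */

/* Hypothetical helpers: extract each field from a 64-bit value.  The
   conversion to uint64_t happens before any masking or shifting, so no
   signed shift is ever performed.  */
static uint64_t
lu32i_field (int64_t value)
{
  return ((uint64_t) value & LU32I_B) >> 32;
}

static uint64_t
lu52i_field (int64_t value)
{
  return ((uint64_t) value & LU52I_B) >> 52;
}
```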

[Bug target/106097] undefined behaviors regarding integer shifts in loongarch_build_integer

2022-06-27 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106097

--- Comment #6 from chenglulu  ---
Created attachment 53205
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53205&action=edit
0001-Fix-bug-for-PR16097.patch

[Bug target/106097] undefined behaviors regarding integer shifts in loongarch_build_integer

2022-06-26 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106097

--- Comment #1 from chenglulu  ---

How can I reproduce the problem?
Thanks!
Lulu Cheng

[Bug target/105514] rv64 qsort gets wrong result when '-O2 -DDEBUG'.

2022-05-11 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105514

chenglulu  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #4 from chenglulu  ---
(In reply to Richard Biener from comment #3)
> indeed strict aliasing violation

OK, thanks!

[Bug target/105514] rv64 qsort gets wrong result when '-O2 -DDEBUG'.

2022-05-08 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105514

--- Comment #2 from chenglulu  ---
(In reply to Andrew Pinski from comment #1)
> Just looking at the code, there seems to be some aliasing violations going
> on which is causing the problem.
> 
> Sometimes accessing via unsigned long and others by double.
> 
> Does -fno-strict-aliasing fixes the issue?

After adding the compile option '-fno-strict-aliasing', the program runs
without problems.

But I don't understand why adding a single print statement changes the result
so much.

Thanks!
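
The kind of violation described in comment #1 (the same object accessed
sometimes as unsigned long and sometimes as double) can be avoided by
copying the bytes instead of casting pointers. A minimal sketch, not taken
from the attached qsort.c; double_to_bits and bits_to_double are hypothetical
names, and the code assumes double is 64 bits wide:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a double as a 64-bit integer.  memcpy is
   allowed to copy between objects of any type, so this does not break
   the strict aliasing rule.  */
static uint64_t
double_to_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* The inverse conversion, again through memcpy.  */
static double
bits_to_double (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

Compilers typically turn such a fixed-size memcpy into a plain register
move, so this form usually costs nothing compared with the pointer cast.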

[Bug c/105514] New: rv64 qsort gets wrong result when '-O2 -DDEBUG'.

2022-05-07 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105514

Bug ID: 105514
   Summary: rv64 qsort gets wrong result when '-O2 -DDEBUG'.
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chenglulu at loongson dot cn
  Target Milestone: ---

Created attachment 52936
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52936&action=edit
riscv64-linux-gnu

%riscv64-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=riscv64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/.../libexec/gcc/riscv64-linux-gnu/13.0.0/lto-wrapper
Target: riscv64-linux-gnu
Configured with: /.../configure --target=riscv64-linux-gnu --with-arch=rv64g
--with-abi=lp64d --enable-shared --disable-emultls --disable-bootstrap
--enable-languages=c,c++,fortran --disable-multilib 
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220507 (experimental) (GCC) 

The program produces the correct result when compiled as follows:
% riscv64-linux-gnu-gcc qsort.c -o qsort -O2 --static -march=rv64g -mabi=lp64d
% ./qsort
./qsort 

But when I add a line to the code to print, the program's result is wrong.
% riscv64-linux-gnu-gcc qsort.c -o qsort -O2 --static -march=rv64g -mabi=lp64d
-DDEBUG
./qsort 





test simple_qsort array with function pointer failed.

When the options '-DDEBUG -fno-reorder-blocks' are added, the program's result
is correct again.