RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

xiezhiheng Thu, 20 Aug 2020 05:17:16 -0700

> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, August 20, 2020 4:55 PM
> To: xiezhiheng <xiezhih...@huawei.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng <xiezhih...@huawei.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> >> Sent: Wednesday, August 19, 2020 6:06 PM
> >> To: xiezhiheng <xiezhih...@huawei.com>
> >> Cc: Richard Biener <richard.guent...@gmail.com>;
> gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> >> emitted at -O3
> >>
> >> xiezhiheng <xiezhih...@huawei.com> writes:
> >> > I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first 
> >> > for a
> try,
> >> > including all the add/sub arithmetic intrinsics.
> >> >
> >> > Something like faddp intrinsic which only handles floating-point
> operations,
> >> > both FP and NONE flags are suitable for it because FLAG_FP will be
> added
> >> > later if the intrinsic handles floating-point operations.  And I prefer 
> >> > FP
> >> since
> >> > it would be more clear.
> >>
> >> Sounds good to me.
> >>
> >> > But for qadd intrinsics, they would modify FPSR register which is a
> scenario
> >> > I missed before.  And I consider to add an additional flag
> >> FLAG_WRITE_FPSR
> >> > to represent it.
> >>
> >> I don't think we make any attempt to guarantee that the Q flag is
> >> meaningful after saturating intrinsics.  To do that, we'd need to model
> >> the modification of the flag in the .md patterns too.
> >>
> >> So my preference would be to leave this out and just use NONE for the
> >> saturating forms too.
> >
> > The problem is that the test case in the attachment has different results
> under -O0 and -O2.
> 
> Right.  But my point was that I don't think that use case is supported.
> If you want to use saturating instructions and read the Q flag afterwards,
> the saturating instructions need to be inline asm too.
> 
> > In gimple phase statement:
> >   _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
> > would be treated as dead code if we set NONE flag for saturating intrinsics.
> > Adding FLAG_WRITE_FPSR would help fix this problem.
> >
> > Even when we set FLAG_WRITE_FPSR, the uqadd insn:
> >   (insn 11 10 12 2 (set (reg:V2SI 97)
> >         (us_plus:V2SI (reg:V2SI 98)
> >             (reg:V2SI 99))) {aarch64_uqaddv2si}
> >      (nil))
> > could also be eliminated in RTL phase because this insn will be treated as
> dead insn.
> > So I think we might also need to modify saturating instruction patterns
> adding the side effect of set the FPSR register.
> 
> The problem is that FPSR is global state and we don't in general
> know who might read it.  So if we modelled the modification of the FPSR,
> we'd never be able to fold away saturating arithmetic that does actually
> saturate at compile time, because we'd never know whether the program
> wanted the effect on the Q flag result to be visible (perhaps to another
> function that the compiler can't see).  We'd also be unable to remove
> results that really are dead.
> 
> So I think this is one of those situations in which we can't keep all
> constituents happy.  Catering for people who want to read the Q flag
> would make things worse for those who want saturating arithmetic to be
> optimised as aggressively as possible.  And the same holds in reverse.


I agree.  The test case is extracted from 
gcc.target/aarch64/advsimd-intrinsics/vqadd.c
If we set NONE flag for saturating intrinsics, it would fail in regression 
because some qadd
intrinsics would be treated as dead code and be eliminated.
  Running target unix
  Running ./gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp ...
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  (test for excess 
errors)
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O0  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  (test for excess 
errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O1  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  (test for excess 
errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  (test for 
excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O3 -g  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  (test for excess 
errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Os  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  (test for 
excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -Og -g  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
  PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
  FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test

So maybe this test case should only be tested at -O0 level?

Thanks,
Xie Zhiheng

RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Reply via email to