> -----Original Message----- > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > Sent: Thursday, August 20, 2020 4:55 PM > To: xiezhiheng <xiezhih...@huawei.com> > Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > emitted at -O3 > > xiezhiheng <xiezhih...@huawei.com> writes: > >> -----Original Message----- > >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] > >> Sent: Wednesday, August 19, 2020 6:06 PM > >> To: xiezhiheng <xiezhih...@huawei.com> > >> Cc: Richard Biener <richard.guent...@gmail.com>; > gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > >> emitted at -O3 > >> > >> xiezhiheng <xiezhih...@huawei.com> writes: > >> > I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first > >> > for a > try, > >> > including all the add/sub arithmetic intrinsics. > >> > > >> > Something like faddp intrinsic which only handles floating-point > operations, > >> > both FP and NONE flags are suitable for it because FLAG_FP will be > added > >> > later if the intrinsic handles floating-point operations. And I prefer > >> > FP > >> since > >> > it would be more clear. > >> > >> Sounds good to me. > >> > >> > But for qadd intrinsics, they would modify FPSR register which is a > scenario > >> > I missed before. And I consider to add an additional flag > >> FLAG_WRITE_FPSR > >> > to represent it. > >> > >> I don't think we make any attempt to guarantee that the Q flag is > >> meaningful after saturating intrinsics. To do that, we'd need to model > >> the modification of the flag in the .md patterns too. > >> > >> So my preference would be to leave this out and just use NONE for the > >> saturating forms too. > > > > The problem is that the test case in the attachment has different results > under -O0 and -O2. > > Right. But my point was that I don't think that use case is supported. > If you want to use saturating instructions and read the Q flag afterwards, > the saturating instructions need to be inline asm too. > > > In gimple phase statement: > > _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6); > > would be treated as dead code if we set NONE flag for saturating intrinsics. > > Adding FLAG_WRITE_FPSR would help fix this problem. > > > > Even when we set FLAG_WRITE_FPSR, the uqadd insn: > > (insn 11 10 12 2 (set (reg:V2SI 97) > > (us_plus:V2SI (reg:V2SI 98) > > (reg:V2SI 99))) {aarch64_uqaddv2si} > > (nil)) > > could also be eliminated in RTL phase because this insn will be treated as > dead insn. > > So I think we might also need to modify saturating instruction patterns > adding the side effect of set the FPSR register. > > The problem is that FPSR is global state and we don't in general > know who might read it. So if we modelled the modification of the FPSR, > we'd never be able to fold away saturating arithmetic that does actually > saturate at compile time, because we'd never know whether the program > wanted the effect on the Q flag result to be visible (perhaps to another > function that the compiler can't see). We'd also be unable to remove > results that really are dead. > > So I think this is one of those situations in which we can't keep all > constituents happy. Catering for people who want to read the Q flag > would make things worse for those who want saturating arithmetic to be > optimised as aggressively as possible. And the same holds in reverse.
I agree. The test case is extracted from gcc.target/aarch64/advsimd-intrinsics/vqadd.c If we set NONE flag for saturating intrinsics, it would fail in regression because some qadd intrinsics would be treated as dead code and be eliminated. Running target unix Running ./gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp ... PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O0 (test for excess errors) PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O0 execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O1 (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O1 execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O3 -g (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O3 -g execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Os (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Os execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Og -g (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -Og -g execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test PASS: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: gcc.target/aarch64/advsimd-intrinsics/vqadd.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test So maybe this test case should only be tested at -O0 level? Thanks, Xie Zhiheng