> -----Original Message----- > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > Sent: Tuesday, August 25, 2020 7:08 PM > To: xiezhiheng <xiezhih...@huawei.com> > Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > emitted at -O3 > > xiezhiheng <xiezhih...@huawei.com> writes: > >> -----Original Message----- > >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] > >> Sent: Friday, August 21, 2020 5:02 PM > >> To: xiezhiheng <xiezhih...@huawei.com> > >> Cc: Richard Biener <richard.guent...@gmail.com>; > gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > >> emitted at -O3 > > > > Cut... > > > >> Looks like the saturating intrinsics might need a bit more thought. > >> Would you mind submitting the patch with just the other parts? > >> Those were uncontroversial and it would be a shame to hold them > >> up over this. > > > > Okay, I reorganized the existing patch and finished the first half of the > intrinsics > > except saturating intrinsics and load intrinsics. > > > > Bootstrapped and tested on aarch64 Linux platform. > > I know this'll be frustrating, sorry, but could you post the > 2020-08-17 patch without the saturation changes? It's going to be > easier to track and review if each patch deals with similar intrinsics. > The non-saturating part of the 2020-08-17 patch was good because it was > dealing purely with arithmetic operations. Loads should really be a > separate change. > > BTW, for something like this, it's OK to test and submit several patches > at once, so separating the patches doesn't need to mean longer test cycles. > It's just that for review purposes, it's easier if one patch does one thing. >
That's true. And I finished the patch to add FLAG for add/sub arithmetic intrinsics except saturating intrinsics. Later I will try to separate the rest into several subsets to fix. Bootstrapped and tested on aarch64 Linux platform. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 7a71b4367d4..a93712ae0a5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2020-08-26 Zhiheng Xie <xiezhih...@huawei.com> + + * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG + for add/sub arithmetic intrinsics. + > > For load intrinsics, I have one problem when I set FLAG_READ_MEMORY > for them, > > some test cases like > > gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c > > #include <arm_neon.h> > > > > /* { dg-do compile } */ > > /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ > > > > poly8x8x2_t > > f_vld2_lane_p8 (poly8_t * p, poly8x8x2_t v) > > { > > poly8x8x2_t res; > > /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ > > res = vld2_lane_p8 (p, v, 8); > > /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */ > > res = vld2_lane_p8 (p, v, -1); > > return res; > > } > > would fail in regression. Because the first statement > > res = vld2_lane_p8 (p, v, 8); > > would be eliminated as dead code in gimple phase but the error message is > > generated in expand pass. So I am going to replace the second statement > > res = vld2_lane_p8 (p, v, -1); > > with > > res = vld2_lane_p8 (p, res, -1); > > or do you have any other suggestions? > > The test is valid as-is, so it would be better not to change it. > > I guess this means that we should leave the _lane loads and stores until > we implement the range checks in a different way. This is somewhat > related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969 , > although your example shows that the “dummy const function” approach > might not work. > > So to start with, could you just patch the non-lane loads? Okay. > > > And for test case gcc.target/aarch64/arg-type-diagnostics-1.c, I return the > result > > to prevent the statement > > result = vrsra_n_s32 (arg1, arg2, a); > > from being eliminated by treated as dead code. > > Hmm. Here too I think the test is valid as-is. I think we need > to ensure that the range check still happens even if the call is > dead code (similar to PR95969 above). I agree. That would be more reasonable. > > So I guess here too, it might be better to leave the _n forms to > a separate patch. > > That doesn't mean we shouldn't fix the _lane and _n cases (or the > previous saturating cases). It's just that each time we find a group > of functions that's awkward for some reason, it'd be better to deal > with those functions separately. > > Thanks, > Richard
pr94442-v2.patch
Description: pr94442-v2.patch