> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Tuesday, August 25, 2020 7:08 PM
> To: xiezhiheng <xiezhih...@huawei.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng <xiezhih...@huawei.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> >> Sent: Friday, August 21, 2020 5:02 PM
> >> To: xiezhiheng <xiezhih...@huawei.com>
> >> Cc: Richard Biener <richard.guent...@gmail.com>;
> gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> >> emitted at -O3
> >
> > Cut...
> >
> >> Looks like the saturating intrinsics might need a bit more thought.
> >> Would you mind submitting the patch with just the other parts?
> >> Those were uncontroversial and it would be a shame to hold them
> >> up over this.
> >
> > Okay, I reorganized the existing patch and finished the first half of the
> intrinsics
> > except saturating intrinsics and load intrinsics.
> >
> > Bootstrapped and tested on aarch64 Linux platform.
> 
> I know this'll be frustrating, sorry, but could you post the
> 2020-08-17 patch without the saturation changes?  It's going to be
> easier to track and review if each patch deals with similar intrinsics.
> The non-saturating part of the 2020-08-17 patch was good because it was
> dealing purely with arithmetic operations.  Loads should really be a
> separate change.
> 
> BTW, for something like this, it's OK to test and submit several patches
> at once, so separating the patches doesn't need to mean longer test cycles.
> It's just that for review purposes, it's easier if one patch does one thing.
> 

That's true.  And I finished the patch to add FLAG for add/sub arithmetic
intrinsics except saturating intrinsics.  Later I will try to separate the rest
into several subsets to fix.

Bootstrapped and tested on aarch64 Linux platform.


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7a71b4367d4..a93712ae0a5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-26  Zhiheng Xie  <xiezhih...@huawei.com>
+
+       * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
+       for add/sub arithmetic intrinsics.
+

> > For load intrinsics, I have one problem when I set FLAG_READ_MEMORY
> for them,
> > some test cases like
> > gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c
> >   #include <arm_neon.h>
> >
> >   /* { dg-do compile } */
> >   /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
> >
> >   poly8x8x2_t
> >   f_vld2_lane_p8 (poly8_t * p, poly8x8x2_t v)
> >   {
> >     poly8x8x2_t res;
> >     /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
> >     res = vld2_lane_p8 (p, v, 8);
> >     /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
> >     res = vld2_lane_p8 (p, v, -1);
> >     return res;
> >   }
> > would fail in regression.  Because the first statement
> >   res = vld2_lane_p8 (p, v, 8);
> > would be eliminated as dead code in gimple phase but the error message is
> > generated in expand pass.  So I am going to replace the second statement
> >   res = vld2_lane_p8 (p, v, -1);
> > with
> >   res = vld2_lane_p8 (p, res, -1);
> > or do you have any other suggestions?
> 
> The test is valid as-is, so it would be better not to change it.
> 
> I guess this means that we should leave the _lane loads and stores until
> we implement the range checks in a different way.  This is somewhat
> related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969 ,
> although your example shows that the “dummy const function” approach
> might not work.
> 
> So to start with, could you just patch the non-lane loads?

Okay.

> 
> > And for test case gcc.target/aarch64/arg-type-diagnostics-1.c, I return the
> result
> > to prevent the statement
> >   result = vrsra_n_s32 (arg1, arg2, a);
> > from being eliminated by treated as dead code.
> 
> Hmm.  Here too I think the test is valid as-is.  I think we need
> to ensure that the range check still happens even if the call is
> dead code (similar to PR95969 above).

I agree.  That would be more reasonable.

> 
> So I guess here too, it might be better to leave the _n forms to
> a separate patch.
> 
> That doesn't mean we shouldn't fix the _lane and _n cases (or the
> previous saturating cases).  It's just that each time we find a group
> of functions that's awkward for some reason, it'd be better to deal
> with those functions separately.
> 
> Thanks,
> Richard

Attachment: pr94442-v2.patch
Description: pr94442-v2.patch

Reply via email to