> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Thursday, February 1, 2024 2:24 PM
> To: Andrew Pinski <pins...@gmail.com>
> Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>; Marcus
> Shawcroft <marcus.shawcr...@arm.com>; Kyrylo Tkachov
> <kyrylo.tkac...@arm.com>
> Subject: Re: [PATCH]AArch64: update vget_set_lane_1.c test output
> 
> Andrew Pinski <pins...@gmail.com> writes:
> > On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina <tamar.christ...@arm.com>
> wrote:
> >>
> >> Hi All,
> >>
> >> In the vget_set_lane_1.c test the following entries now generate a zip1 
> >> instead
> of an INS
> >>
> >> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> >> BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
> >> BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
> >>
> >> This is because the non-Q variant for indices 0 and 1 are just shuffling 
> >> values.
> >> There is no perf difference between INS SIMD to SIMD and ZIP, as such just
> update the
> >> test file.
> > Hmm, is this true on all cores? I suspect there is a core out there
> > where INS is implemented with a much lower latency than ZIP.
> > If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
> > while ZIP is 6 cycles (3/7 for q versions).
> > Now I don't have any invested interest in that core any more but I
> > just wanted to point out that is not exactly true for all cores.
> 
> Thanks for the pointer.  In that case, perhaps we should prefer
> aarch64_evpc_ins over aarch64_evpc_zip in aarch64_expand_vec_perm_const_1?
> That's enough to fix this failure, but it'll probably require other
> tests to be adjusted...

I think given that Thundex-X is a 10 year old micro-architecture that is 
several cases where
often used instructions have very high latencies that generic codegen should 
not be blocked
from progressing because of it.

we use zips in many things and if thunderx codegen is really of that much 
importance then I
think the old codegen should be gated behind -mcpu=thunderx rather than 
preventing generic
changes.

Regards,
Tamar.

> 
> Richard

Reply via email to