On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote:
> > I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
> > xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
> > permute instructions (3 cycle latency) and should use 'vecperm' instead.
> 
> They are all executed on the PM pipe currently, yup.  If this changes
> later we'll have to fix it, but that is for then :-)
> 
> > While I was at it, I changed the UNSPEC name for xxspltidp to be
> > UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
> 
> In the future please do separate things as separate patches.
> 
> >     * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
> >     UNSPEC_XXSPLTID.
> 
>       * config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to...
>       (UNSPEC_XXSPLTIDP): ... this.
> 
> >     (xxspltidp_v2df): Use vecperm type attribute.  Use
> >     UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
> 
> Typo ("UU").
> 
> Okay for trunk with those trivial fixes.  Also okay for backport to 11,
> it is trivial enough.  Thanks!

Thanks.

> Out of interest, did you notice any scheduling differences with this?

I don't use the built-ins so I wouldn't notice a difference.  I noticed this as
part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW in
a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.
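
As a rough illustration of which DFmode constants qualify, here is a minimal C
sketch.  It is not the GCC implementation.  It assumes the instruction takes a
32-bit single-precision immediate and widens it to double precision, and that
single-precision denormals have to be avoided.  The helper name
dp_fits_sp_immediate is hypothetical, and NaNs are rejected only to keep the
sketch simple.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): return true if VALUE can be
   produced exactly by widening a 32-bit single-precision immediate to
   double precision.  On success, store the immediate's bits in *IMM32.  */
static bool
dp_fits_sp_immediate (double value, uint32_t *imm32)
{
  /* Assumption for this sketch: skip NaNs rather than model payloads.  */
  if (isnan (value))
    return false;

  /* Finite values beyond the single-precision range cannot be encoded.  */
  if (!isinf (value) && fabs (value) > FLT_MAX)
    return false;

  float narrowed = (float) value;

  /* The narrowing must be exact, otherwise widening gives another value.  */
  if ((double) narrowed != value)
    return false;

  /* Assumption: single-precision denormals are not usable as immediates.  */
  if (narrowed != 0.0f && fabsf (narrowed) < FLT_MIN)
    return false;

  /* Copy out the IEEE single-precision bit pattern.  */
  memcpy (imm32, &narrowed, sizeof narrowed);
  return true;
}
```

Under these assumptions, 1.0 qualifies with the immediate 0x3F800000, while 0.1
does not, because it is not exactly representable in single precision.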

I'm trying to iron out the slow-downs, and I wanted the scheduler to know that,
where it can, it should place other insns between the XXSPLTIDP that loads the
constant and the use of that constant.  Right now, I'm seeing a slight boost in
blender_r with XXSPLTIDP (over doing a load).  However, I suspect that if you
aren't running SPEC on an otherwise idle machine, XXSPLTIDP will be more of a
win, because it eliminates the loads.

While XXSPLTIDP by itself is positive, unfortunately, there is a regression in
cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
track down.

If I add both instructions, several of the benchmarks improve (including
xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and
fotonik3d_r also has a new 3% regression.

Given that many more programs use floating point constants than vector
constants (66,000 XXSPLTIDPs created vs. 5,000 XXSPLTIWs), I plan to push the
XXSPLTIDP patch now, and try to figure out the differences before submitting
the XXSPLTIW patch.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
