Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally

Bill Schmidt Fri, 18 Oct 2013 06:52:08 -0700

Just a quick note that overnight testing of the posted patch was clean.

Recap: There are three options on the table:
 - the posted patch
 - that patch with the (1 - start) change
 - replace nunits - 1 with nunits as the loop upper bound


I'm happy to implement any of them, as you prefer.  I lean towards the
last as the most easily understood.

Thanks,
Bill

On Fri, 2013-10-18 at 00:09 -0500, Bill Schmidt wrote:
> 
> On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote:
> > On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt
> > <wschm...@linux.vnet.ibm.com> wrote:
> > > Hi,
> > >
> > > In little endian mode, we managed to convert a load of the V4SI vector
> > > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the
> > > number 7.  It turns out we only looked at the first N-1 elements of an
> > > N-element vector in little endian mode, and verified the zeroth element
> > > twice.  Adjusting the loop boundaries fixes the problem.
> > >
> > > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu.  Ok to
> > > commit to trunk if no regressions?
> > 
> > This patch does not make sense. It biases the loop bounds based on
> > BYTES_BIG_ENDIAN, but the body of the loop biases the index variable
> > based on BYTES_BIG_ENDIAN in one of the two uses.  The changes seem to
> > compute the same value for the first use of index "i" for both
> > BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way.  It looks
> > like something should be able to be simplified.
> > 
> > Thanks, David
> > 
> 
> It seems like it, but I haven't been able to figure out anything nicer.
> The thing that makes this weird is that "i" is referring to different
> elements of the array depending on Big or Little Endian numbering.
> 
> Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with
> step = 2.  Then for BE, the 15's are in odd-numbered elements (starting
> with zero from the left), but for LE they are in even-numbered elements
> (starting with zero from the right).  For both cases, we found the
> candidate value in the rightmost element (nunits - 1 for BE, 0 for LE).
> 
> The first two iterations for BE process the two leftmost elements.  For
> i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val =
> 0.  For i = 1, we find (i+1)&1 to be zero, so the desired value is val =
> 15, and so on.
> 
> The first two iterations for LE process the second and third elements
> from the right.  For i = 1, we find i&1 to be nonzero, so the desired
> value is 0.  For i = 2, we find i&2 to be zero, so the desired value is
> 15.
> 
> So even though the first iteration calculates the same value for both
> endian modes, "i" means something different in each case.
> 
> I guess we could make a "simplification" as follows:
> 
>   /* Check if VAL is present in every STEP-th element, and the                
>   
>      other elements are filled with its most significant bit.  */
>   start = BYTES_BIG_ENDIAN ? 0 : 1;
>   end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits;
> 
>   for (i = start; i < end; ++i)
>     {
>       HOST_WIDE_INT desired_val;
>       if (((i + (1 - start)) & (step - 1)) == 0)
>         desired_val = val;
>       else
>         desired_val = msb_val;
> 
> but that may be as hard to understand as what's there now...
> 
> An alternative is to just change the loop bounds to
> 
>   for (i = 0; i < nunits; ++i)
> 
> which will reprocess the candidate element for both endian modes, and
> leave everything else alone.  That would probably actually be faster
> than all the convoluted nonsense to avoid the reprocessing.  Want me to
> go in that direction instead?
> 
> Thanks,
> Bill
>

Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally

Reply via email to