Just a quick note that overnight testing of the posted patch was clean. Recap: There are three options on the table: - the posted patch - that patch with the (1 - start) change - replace nunits - 1 with nunits as the loop upper bound
I'm happy to implement any of them, as you prefer. I lean towards the last as the most easily understood. Thanks, Bill On Fri, 2013-10-18 at 00:09 -0500, Bill Schmidt wrote: > > On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote: > > On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt > > <wschm...@linux.vnet.ibm.com> wrote: > > > Hi, > > > > > > In little endian mode, we managed to convert a load of the V4SI vector > > > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the > > > number 7. It turns out we only looked at the first N-1 elements of an > > > N-element vector in little endian mode, and verified the zeroth element > > > twice. Adjusting the loop boundaries fixes the problem. > > > > > > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu. Ok to > > > commit to trunk if no regressions? > > > > This patch does not make sense. It biases the loop bounds based on > > BYTES_BIG_ENDIAN, but the body of the loop biases the index variable > > based on BYTES_BIG_ENDIAN in one of the two uses. The changes seem to > > compute the same value for the first use of index "i" for both > > BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way. It looks > > like something should be able to be simplified. > > > > Thanks, David > > > > It seems like it, but I haven't been able to figure out anything nicer. > The thing that makes this weird is that "i" is referring to different > elements of the array depending on Big or Little Endian numbering. > > Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with > step = 2. Then for BE, the 15's are in odd-numbered elements (starting > with zero from the left), but for LE they are in even-numbered elements > (starting with zero from the right). For both cases, we found the > candidate value in the rightmost element (nunits - 1 for BE, 0 for LE). > > The first two iterations for BE process the two leftmost elements. For > i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val = > 0. For i = 1, we find (i+1)&1 to be zero, so the desired value is val = > 15, and so on. > > The first two iterations for LE process the second and third elements > from the right. For i = 1, we find i&1 to be nonzero, so the desired > value is 0. For i = 2, we find i&2 to be zero, so the desired value is > 15. > > So even though the first iteration calculates the same value for both > endian modes, "i" means something different in each case. > > I guess we could make a "simplification" as follows: > > /* Check if VAL is present in every STEP-th element, and the > > other elements are filled with its most significant bit. */ > start = BYTES_BIG_ENDIAN ? 0 : 1; > end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits; > > for (i = start; i < end; ++i) > { > HOST_WIDE_INT desired_val; > if (((i + (1 - start)) & (step - 1)) == 0) > desired_val = val; > else > desired_val = msb_val; > > but that may be as hard to understand as what's there now... > > An alternative is to just change the loop bounds to > > for (i = 0; i < nunits; ++i) > > which will reprocess the candidate element for both endian modes, and > leave everything else alone. That would probably actually be faster > than all the convoluted nonsense to avoid the reprocessing. Want me to > go in that direction instead? > > Thanks, > Bill >