On Tue, Feb 4, 2014 at 10:15 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> One final patch in the series, this one for vec_sum2s.  This builtin
> requires some additional code generation for the case of little endian
> without -maltivec=be.  Here's an example:
>
> va = {-10,1,2,3};        0x 00000003 00000002 00000001 fffffff6
> vb = {100,101,102,-103}; 0x ffffff99 00000066 00000065 00000064
> vc = vec_sum2s (va, vb); 0x ffffff9e 00000000 0000005c 00000000
>    = {0,92,0,-98};
>
> We need to add -10 + 1 + 101 = 92 and place it in vc[1], and add 2 + 3 +
> -103 and place the result in vc[3], with zeroes in the other two
> elements.  To do this, we first use "vsldoi vs,vb,vb,12" to rotate 101
> and -103 into big-endian elements 1 and 3, as required by the vsum2sws
> instruction:
>
> 0x ffffff99 00000066 00000065 00000064 ffffff99 00000066 00000065 00000064
>                               ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
>                          vs = 00000064 ffffff99 00000066 00000065
>
> Executing "vsum2sws vs,va,vs" then gives
>
> vs = 0x 00000000 ffffff9e 00000000 0000005c
>
> which then must be shifted into position with "vsldoi vc,vs,vs,4"
>
> 0x 00000000 ffffff9e 00000000 0000005c 00000000 ffffff9e 00000000 0000005c
>             ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
>        vc = ffffff9e 00000000 0000005c 00000000
>
> which is the desired result.
>
> In addition to this change, I noticed a redundant test from one of my
> previous patches and simplified it.  (BYTES_BIG_ENDIAN implies
> VECTOR_ELT_ORDER_BIG, so we don't need to test BYTES_BIG_ENDIAN.)
>
> As usual, new test cases are added to cover the possible cases.  These
> are simpler this time since only vector signed integer is a legal type
> for vec_sum2s.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2014-02-04  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (altivec_vsum2sws): Adjust code
>         generation for -maltivec=be.
>         (altivec_vsumsws): Simplify redundant test.
>
> gcc/testsuite:
>
> 2014-02-04  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * gcc.dg/vmx/sum2s.c: New.
>         * gcc.dg/vmx/sum2s-be-order.c: New.

Okay.

The multi-instruction sequences really should be emitted as separate
instructions and the scratch only allocated for the LE case.

Thanks, David