On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner <matts...@gmail.com> wrote:
> Uses the pmadd technique described in
> http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
> +static force_inline __m64
> +pack_4xpacked565 (__m64 a, __m64 b)
> +{
> +    __m64 rb0 = _mm_and_si64 (a, MC (packed_565_rb));
> +    __m64 rb1 = _mm_and_si64 (b, MC (packed_565_rb));
> +
> +    __m64 t0 = _mm_madd_pi16 (rb0, MC (565_pack_multiplier));
> +    __m64 t1 = _mm_madd_pi16 (rb1, MC (565_pack_multiplier));
> +
> +    __m64 g0 = _mm_and_si64 (a, MC (packed_565_g));
> +    __m64 g1 = _mm_and_si64 (b, MC (packed_565_g));
> +
> +    t0 = _mm_or_si64 (t0, g0);
> +    t1 = _mm_or_si64 (t1, g1);
> +
> +    t0 = shift(t0, -5);
> +    t1 = shift(t1, -5 + 16);
> +
> +    return _mm_shuffle_pi16 (_mm_or_si64 (t0, t1), _MM_SHUFFLE (3, 1, 2, 0));
> +}
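
For anyone following along, here is a rough scalar sketch of what the mask / pmadd / or / shift sequence above computes for a single pixel. It assumes x8r8g8b8 input (0x00RRGGBB). The constant values are assumptions chosen to play the roles of packed_565_rb, packed_565_g and 565_pack_multiplier, since the quoted hunk does not show them, and the helper names madd_lane and pack_565_scalar are hypothetical. It models one 32-bit lane only, not the four-pixel MMX version.

```c
#include <stdint.h>

/* Assumed values, standing in for the MC () constants used in the patch. */
#define MASK_RB 0x00f800f8u   /* top 5 bits of red and top 5 bits of blue */
#define MASK_G  0x0000fc00u   /* top 6 bits of green */
#define MUL_B   0x0004        /* multiplier for the low 16-bit word (blue) */
#define MUL_R   0x2000        /* multiplier for the high 16-bit word (red) */

/* Scalar model of one 32-bit lane of pmaddwd (_mm_madd_pi16): multiply each
 * signed 16-bit half by its multiplier and add the two products. */
static uint32_t
madd_lane (uint32_t v, int16_t mul_lo, int16_t mul_hi)
{
    int32_t lo = (int16_t) (v & 0xffff);
    int32_t hi = (int16_t) (v >> 16);

    return (uint32_t) (lo * mul_lo + hi * mul_hi);
}

/* Hypothetical helper: pack one x8r8g8b8 pixel to r5g6b5. */
static uint16_t
pack_565_scalar (uint32_t p)
{
    /* Blue moves from bits 3..7 to bits 5..9, red ends up in bits 16..20. */
    uint32_t t = madd_lane (p & MASK_RB, MUL_B, MUL_R);

    /* Green already sits in bits 10..15, between blue and red. */
    t |= p & MASK_G;

    /* Drop the five spare low bits so the result occupies bits 0..15. */
    return (uint16_t) (t >> 5);
}
```

With these assumed constants, pack_565_scalar (0x00ff0000) gives 0xf800, and 0x00123456 gives 0x11aa. The single multiply-add replaces two separate shifts for red and blue, which is the point of the technique.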

I think the return statement can be simplified with a _mm_packs_pi32, but I couldn't get it to work. If someone has a chance to take a look, I'd be very appreciative.

Thanks,
Matt
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman