On 8/23/07, Tim Prince <[EMAIL PROTECTED]> wrote:
> The primary icc/icl use of SSE/SSE2 masking operations, of course, is in
> the auto-vectorization of fabs[f] and conditional operations:
>
>       sum = 0.f;
>       i__2 = *n;
>       for (i__ = 1; i__ <= i__2; ++i__)
>           if (a[i__] > 0.f)
>               sum += a[i__];
> .... (Windows/intel asm syntax)
>        pxor      xmm2, xmm2
>        cmpltps   xmm2, xmm3
>        andps     xmm3, xmm2
>        addps     xmm0, xmm3
> ...
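Note that icc 9 is strongly biased toward the Pentium 4, which has no
stall penalty for mistyped FP vectors; on Intel hardware that penalty
first appeared with the Pentium M line. That is why icc 9 still emits
an integer pxor to zero an FP register even when generating code for
the Core 2 (-xT), whereas icc 10 uses xorps: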
# cat autoicc.cc
float foo(const float *a, int n) {
        float sum = 0.f;
        for (int i = 0; i <n; ++i)
                if (a[i] > 0.f)
                        sum += a[i];
        return sum;
}
int main() { return 0; }
# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
  4007a9:       pxor   %xmm4,%xmm4
  4007ad:       cmpltps %xmm3,%xmm4
  4007b1:       andps  %xmm3,%xmm4
# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
  400b50:       xorps  %xmm3,%xmm3
  400b53:       cmpltps %xmm4,%xmm3
  400b57:       andps  %xmm3,%xmm4
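
For anyone who wants to see the idiom spelled out, here is a rough
hand-written equivalent of the compare/mask/add sequence using SSE
intrinsics. It is only a sketch, not what icc does internally: the name
foo_sse is mine, I assume an x86 target with SSE, I use unaligned loads
for simplicity, and the scalar tail and horizontal sum are just one
plausible way to finish the reduction.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical hand-vectorized version of foo(); the name is illustrative.
float foo_sse(const float *a, int n) {
    assert(n >= 0);
    const __m128 zero = _mm_setzero_ps();  // FP-typed zero (xorps, not pxor)
    __m128 vsum = _mm_setzero_ps();        // four partial sums
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(a + i);       // load a[i..i+3], unaligned
        __m128 mask = _mm_cmplt_ps(zero, v);  // cmpltps: all-ones where 0 < a[i]
        v = _mm_and_ps(v, mask);              // andps: zero out the other lanes
        vsum = _mm_add_ps(vsum, v);           // addps: accumulate
    }
    // Horizontal sum of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, vsum);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Scalar tail for the remaining n % 4 elements.
    for (; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
```

Because the additions are reordered into four partial sums, the result
can differ in the last bits from the scalar loop. Lanes holding NaN
compare false and are masked to zero, which matches the behaviour of
the scalar "a[i] > 0.f" test.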
