On 8/23/07, Tim Prince <[EMAIL PROTECTED]> wrote:
> The primary icc/icl use of SSE/SSE2 masking operations, of course, is in
> the auto-vectorization of fabs[f] and conditional operations:
>
> sum = 0.f;
> i__2 = *n;
> for (i__ = 1; i__ <= i__2; ++i__)
>     if (a[i__] > 0.f)
>         sum += a[i__];
> .... (Windows/intel asm syntax)
> pxor xmm2, xmm2
> cmpltps xmm2, xmm3
> andps xmm3, xmm2
> addps xmm0, xmm3
> ...
Note that icc 9 has a strong bias toward the Pentium 4, which had no stall
penalty for mistyped FP vectors (on Intel, that penalty first appeared with
the Pentium M line), so you still see a pxor even when generating code for
the Core 2. icc 10 emits xorps instead:
# cat autoicc.cc
float foo(const float *a, int n) {
  float sum = 0.f;
  for (int i = 0; i < n; ++i)
    if (a[i] > 0.f)
      sum += a[i];
  return sum;
}
int main() { return 0; }
# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
4007a9: pxor %xmm4,%xmm4
4007ad: cmpltps %xmm3,%xmm4
4007b1: andps %xmm3,%xmm4
# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
400b50: xorps %xmm3,%xmm3
400b53: cmpltps %xmm4,%xmm3
400b57: andps %xmm3,%xmm4
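
For anyone who wants to see what the compare/and/add sequence computes without reading the asm, here is a rough scalar sketch of the same masking idea. It is not what either compiler emits, only a portable emulation under my own assumptions: masked_positive_sum, lane_cmplt and lane_and are made-up names, and the four-element lanes array stands in for one 4-float SSE accumulator register.

```cpp
#include <cstdint>
#include <cstring>

// Scalar stand-in for one lane of cmpltps:
// all-ones if lhs < rhs, all-zeros otherwise (including NaN operands).
static std::uint32_t lane_cmplt(float lhs, float rhs) {
    return lhs < rhs ? 0xFFFFFFFFu : 0u;
}

// Scalar stand-in for one lane of andps:
// bitwise AND applied to the float's bit pattern.
static float lane_and(float value, std::uint32_t mask) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits &= mask;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical helper (not from the original post): branch-free variant
// of foo() that processes four "lanes" per iteration.
float masked_positive_sum(const float *a, int n) {
    float lanes[4] = {0.f, 0.f, 0.f, 0.f};  // plays the role of the addps accumulator
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) {
            // zero register (pxor/xorps) compared against a[]: mask = (0 < a)
            std::uint32_t mask = lane_cmplt(0.f, a[i + k]);
            // andps keeps a[] where the mask is set, +0.0f elsewhere; addps accumulates
            lanes[k] += lane_and(a[i + k], mask);
        }
    }
    // horizontal reduction of the four partial sums
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // scalar remainder for the last n % 4 elements
    for (; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
```

Because the partial sums are accumulated per lane, the additions happen in a different order than in the plain loop in foo(), so the two results can differ in the last bits for general input.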