Hello Richard,

On 28 Oct 08:20, Richard Henderson wrote:
> Why is a masked *scalar* operation useful?

The reason the instructions exist is so that you can do fully
fault-correct predicated scalar algorithms.

Example I.

In fact, with some hacky tricks, you can fully predicate normal C code
in the SIMD registers. One might want such a region to be as fast as
possible while staying scalar (all vars are integers):

    if ( a[i] )
      b[i] += c[i];

Definitely, for maximum performance we want the region fully predicated.
This code cannot be predicated correctly on IA before AVX-512. With
AVX-512 it can be written as:

    vmovd    a[i], %xmm0
    vptestmd %zmm0, %zmm0, %k1
    // hack because we don't have masking for VMOVD/Q
    vmovss   b[i], %xmm0 {%k1}{z}
    // no scalar int add, hack, 128-bit works fine
    // because mask is sawed off in the right places
    vpaddd   c[i], %zmm0, %zmm0 {%k1}{z}
    // VMOVD/Q hack again
    vmovss   %xmm0, b[i] {%k1}

So having such masked scalar insns allows us to have non-branching
scalar code.

Example II.

Perhaps one interesting case of scalar and mask, though not to do with
predication (and really narrow), is as an idiom to generate a write mask
value of 0x1:

    vcmpss k1, xmm0, xmm0, {sae}, 0xf

Currently we do:

    mov   $1, %eax
    kmovw %eax, %k1

But we sometimes have to spill a GPR in order to introduce this
sequence. In that case the vcmpss idiom seems like a better choice
(though you might worry about the additional false dependency on xmm0).

And finally, it is kind of strange not to have complete ISA support.
What if someone wants equivalent code for the vector version and the
scalar remainder version?

--
Thanks, K
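
P.S. To make the semantics of Example I concrete, here is a rough C model of what the masked sequence computes for one element. It is only a sketch: the function name masked_scalar_step is made up for illustration, and the C conditionals merely stand in for the hardware mask (the real sequence has no branches). The point it tries to show is that a masked-off element never reads b[i] or c[i] and never writes b[i], so it cannot fault.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not generated code) of one step of
   "if (a[i]) b[i] += c[i];" as the masked sequence executes it.
   The 1-bit mask k1 gates the load, the add and the store. */
static void masked_scalar_step(const int32_t *a, int32_t *b, const int32_t *c)
{
    /* vmovd + mask test: bit 0 of k1 is set iff a[i] != 0 */
    unsigned k1 = (*a != 0);

    /* masked load with zero-masking {z}: b[i] is read only if k1 is set */
    int32_t x = k1 ? *b : 0;

    /* masked add with zero-masking {z}: c[i] is read only if k1 is set;
       the add wraps modulo 2^32 like vpaddd */
    x = k1 ? (int32_t)((uint32_t)x + (uint32_t)*c) : 0;

    /* masked store: nothing is written when k1 is clear */
    if (k1)
        *b = x;
}
```

With a = 1, b = 10, c = 1 the call leaves b == 11; with a = 0 the pointers for b and c are never dereferenced, which is the property the masked scalar insns give us without a branch.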