Hello Richard,
On 28 Oct 08:20, Richard Henderson wrote:
> Why is a masked *scalar* operation useful?

These instructions exist so that you can write fully predicated
scalar algorithms that stay fault-correct: a masked-off element
never raises a fault.

Example I.

In fact, with some hacky tricks, you can fully predicate
normal C code in the SIMD registers. 

One might want a region like this to run as fast as possible
while staying scalar (all variables are integers):
  if ( a[i] )
    b[i] += c[i];

Definitely, for maximum performance we want the region
fully predicated. This code cannot be predicated correctly
on IA before AVX-512. With AVX-512 masking it can be written as:

  vmovd a[i], %xmm0
  vptestmd %zmm0, %zmm0, %k1
  // hack because we didn’t have masking for VPMOVD/Q
  vmovss b[i], %xmm0 {%k1}{z}
  // no scalar int add, hack, 128-bit works fine
  // because mask is sawed off in the right places
  vpaddd c[i], %zmm0, %zmm0 {%k1}{z}
  // VPMOVD/Q hack again
  vmovss %xmm0, b[i] {%k1}

So, having such masked scalar insns allows us to have
non-branching scalar code.
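
For reference, here is a rough C sketch of the same region using
intrinsics. It assumes 32-bit integers and a compiler targeting
AVX512F plus AVX512VL; predicated_update is a hypothetical helper name,
not something from the patch. Without those ISA extensions it falls
back to the plain branch, and the compiler is of course free to pick
different instructions than the hand-written sequence above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: performs `if (a[i]) b[i] += c[i];` on 32-bit ints. */
static void predicated_update(const int32_t *a, int32_t *b,
                              const int32_t *c, size_t i)
{
    assert(a && b && c);
#if defined(__AVX512F__) && defined(__AVX512VL__)
    /* Put a[i] in lane 0; the remaining lanes are zero. */
    __m128i va = _mm_cvtsi32_si128(a[i]);
    /* Mask bit 0 is set iff a[i] != 0; bits 1..3 are always clear. */
    __mmask8 k = _mm_test_epi32_mask(va, va);
    /* Masked-off lanes are not accessed, so they cannot fault. */
    __m128i vb = _mm_maskz_loadu_epi32(k, &b[i]);
    __m128i vc = _mm_maskz_loadu_epi32(k, &c[i]);
    __m128i sum = _mm_maskz_add_epi32(k, vb, vc);
    /* Only lane 0 is written, and only when the mask bit is set. */
    _mm_mask_storeu_epi32(&b[i], k, sum);
#else
    /* Portable fallback: an ordinary branch. */
    if (a[i])
        b[i] += c[i];
#endif
}
```

Calling it for every i in a loop gives the same result as the
original if/+= region; only the elements with a nonzero a[i] are
read from c and written back to b.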

Example II.

Perhaps one interesting case of scalar and mask, though not
to do with predication (and really narrow), is as an idiom to
generate a write mask value of 0x1:

  vcmpss k1, xmm0, xmm0, {sae}, 0xf

Currently we do:
  mov   $1, %eax
  kmovw %eax, %k1

But we sometimes have to spill a GPR in order to introduce
this sequence. In that case the vcmpss idiom seems like a
better choice (though you might worry about the additional
false dependency on xmm0).
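
To illustrate the idiom at the C level, here is a small sketch.
It assumes AVX512F; mask_one is a hypothetical helper name. Predicate
0xf is _CMP_TRUE_UQ, which is true for any operands, so bit 0 of the
result mask is always set. A compiler may well fold this to a constant,
so take it as a statement of the semantics rather than of the generated
code.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: returns a write-mask value with only bit 0 set. */
static uint8_t mask_one(void)
{
    uint8_t k;
#if defined(__AVX512F__)
    /* Any register contents would do; zero is used only for definiteness. */
    __m128 x = _mm_setzero_ps();
    /* Scalar compare with the always-true predicate 0xf sets mask bit 0. */
    k = (uint8_t)_mm_cmp_ss_mask(x, x, _CMP_TRUE_UQ);
#else
    /* Fallback without AVX-512: the value the idiom would produce. */
    k = 1;
#endif
    assert(k == 1);
    return k;
}
```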

And finally:
it would be rather strange not to have complete ISA support.
What if someone wants equivalent code for
the vector version and the scalar remainder version?

--
Thanks, K
