Hello Richard,

On 28 Oct 14:45, Richard Henderson wrote:
> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
> > Hello Richard,
> > On 28 Oct 08:20, Richard Henderson wrote:
> >> Why is a masked *scalar* operation useful?
> > 
> > The reason the instructions exist is so that
> > you can do fully fault correct predicated scalar algorithms.
> 
> Using VEC_MERGE isn't the proper representation for that.
> 
> If that's your real goal, then COND_EXEC is the only way to let
> rtl know that faults are suppressed in the false condition.

I believe the cond_exec approach is supposed to look like this:
  (define_subst "mask_scalar"
    [(set (match_operand:SUBST_V 0)
          (vec_merge:SUBST_V
            (match_operand:SUBST_V 1)
            (match_operand:SUBST_V 2)
            (const_int 1)))]
    "TARGET_AVX512F"
    [(cond_exec (eq:CC
                  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
                  (const_int 1))
                (set (match_dup 0)
                     (vec_merge:SUBST_V
                       (match_dup 1)
                       (match_dup 2)
                       (const_int 1))))])

But this describes only merge-masking, and even that incorrectly.
We would need to add a clobber to signal that, even when the
condition is false, the upper part of the register is zeroed.
Zero-masking, which is preferable, would be indistinguishable from
merge-masking, so we would have to choose which masking mode to
enable. That looks like a bad direction.
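
To make the difference between the two masking modes concrete, here is
a minimal C model of what I understand a masked scalar add to do. It is
only a sketch: the type vec4f, the function masked_scalar_add and the
four-lane layout are made up for illustration and are not GCC or
intrinsic code. It does not model fault suppression.

```c
#include <stdbool.h>

#define LANES 4  /* one 128-bit register viewed as 4 floats (assumption) */

typedef struct { float lane[LANES]; } vec4f;

/* Hypothetical model of a masked scalar add.
   Lane 0 is computed only when mask bit 0 is set.
   Lanes 1..3 are always taken from src1, so the destination is
   written even when the mask bit is clear.  Bits above 128 are
   not modelled here.  */
static vec4f
masked_scalar_add (vec4f dst, vec4f src1, vec4f src2,
                   unsigned mask, bool zero_masking)
{
  vec4f res = src1;                 /* upper lanes come from src1 */

  if (mask & 1)
    res.lane[0] = src1.lane[0] + src2.lane[0];
  else if (zero_masking)
    res.lane[0] = 0.0f;             /* zero-masking: lane 0 cleared */
  else
    res.lane[0] = dst.lane[0];      /* merge-masking: lane 0 kept */

  return res;
}
```

In this model the result differs from the old destination even when the
mask bit is clear, and that is the part a plain cond_exec does not
express.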

IMHO, we have 3 options to implement scalar masked insns:
  1. `vec_merge' over vec_merge (current approach).
    Pro.
      1. Precise semantic description
      2. Unified approach with vector patterns
      3. Freedom for the simplifier to reduce EVEX to VEX for
      certain constant masks
    Cons.
      1. The semantic description is too precise and, as a
      consequence, the code in the md file is complicated

  2. `cond_exec' approach
    Pro.
      1. Looks useful for the compiler when trying to generate
      predicated code
    Cons.
      1. Not precise. Extra clobbers (?) needed to signal
      that we change the register even when the cond_exec
      condition is false
      2. Unable to describe zero-masking nicely
      3. Code is still as complicated as for option #1
      4. Simplifier won't work (a clobber is always a clobber)

  3. Make all masked scalar insns unspecs
    Pro.
      1. Straightforward and lightweight. Enough for
      intrinsics to work
    Cons.
      1. Since every unspec needs a code, substs can't be
      applied directly: huge volume of similar code
      2. Simplifier won't work
      3. Generation of predicated code becomes hard

Am I missing some options, or is that all we have?
If so, what option would you prefer?

Thanks, K
