Hello Richard,

On 28 Oct 14:45, Richard Henderson wrote:
> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
> > Hello Richard,
> > On 28 Oct 08:20, Richard Henderson wrote:
> >> Why is a masked *scalar* operation useful?
> >
> > The reason the instructions exist is so that
> > you can do fully fault correct predicated scalar algorithms.
> > Using VEC_MERGE isn't the proper representation for that.
>
> If that's your real goal, then COND_EXEC is the only way to let
> rtl know that faults are suppressed in the false condition.

I believe the cond_exec approach is supposed to look like this:

(define_subst "mask_scalar"
  [(set (match_operand:SUBST_V 0)
        (vec_merge:SUBST_V
          (match_operand:SUBST_V 1)
          (match_operand:SUBST_V 2)
          (const_int 1)))]
  "TARGET_AVX512F"
  [(cond_exec (eq:CC
                (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
                (const_int 1))
              (set (match_dup 0)
                   (vec_merge:SUBST_V
                     (match_dup 1)
                     (match_dup 2)
                     (const_int 1))))])

But this describes only merge-masking, and does so incorrectly.
We would need to add a clobber to signal that even for a false
condition we zero the higher part of the register.
The preferable zero-masking would be indistinguishable from
merge-masking, and we would need to choose which mask mode to enable.
Bad turn.

IMHO, we have 3 options to implement scalar masked insns:
  1. `vec_merge' over vec_merge (current approach).
     Pros.
       1. Precise semantic description
       2. Unified approach with vector patterns
       3. Freedom for the simplifier to reduce EVEX to VEX for
          certain const masks
     Cons.
       1. Too precise a semantic description and, as a consequence,
          complicated code in the md-file

  2. `cond_exec' approach
     Pros.
       1. Looks useful for the compiler when trying to generate
          predicated code
     Cons.
       1. Not precise. Extra clobbers (?) needed to signal that
          we're changing the register even for a false
          condition in cond_exec
       2. Unable to describe zero-masking nicely
       3. Code is still as complicated as for option #1
       4. Simplifier won't work (a clobber is always a clobber)

  3. Make all masked scalar insns unspecs
     Pros.
       1. Straightforward, not overweighted. Enough for
          intrinsics to work
     Cons.
       1. Since every unspec needs a code, substs won't be
          applied directly: huge volume of similar code
       2. Simplifier won't work
       3. Generation of predicated code becomes hard

Am I missing some options, or is that all we have?
If so, which option would you prefer?

Thanks, K
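
P.S. To make the merge-masking vs. zero-masking distinction above
concrete, here is a minimal C model of a masked scalar add on a
4-element register. It is only a sketch of the semantics being
discussed, not GCC code: the names masked_scalar_add, MERGE_MASKING
and ZERO_MASKING are made up for illustration, and the model assumes
that the upper elements are always taken from the first source.

```c
/* Hypothetical names, for illustration only; not part of GCC.  */
enum mask_mode { MERGE_MASKING, ZERO_MASKING };

/* Model of a masked scalar single-precision add.
   DST holds the old destination value on entry and the result on exit.
   Only bit 0 of MASK matters for a scalar operation.  */
static void
masked_scalar_add (float dst[4], const float a[4], const float b[4],
                   unsigned mask, enum mask_mode mode)
{
  float low;

  if (mask & 1)
    low = a[0] + b[0];          /* Condition true: do the operation.  */
  else if (mode == MERGE_MASKING)
    low = dst[0];               /* Keep the old low element.  */
  else
    low = 0.0f;                 /* Zero the low element.  */

  /* The add above is not evaluated at all when the mask bit is clear,
     which is how this model mirrors fault suppression.  */
  dst[0] = low;

  /* The destination is written even when the mask bit is clear:
     the upper elements come from the first source (assumption
     stated in the text above).  */
  for (int i = 1; i < 4; i++)
    dst[i] = a[i];
}
```

With dst = {9, 9, 9, 9}, a = {1, 2, 3, 4} and b = {10, 20, 30, 40},
a clear mask bit gives {9, 2, 3, 4} under merge-masking and
{0, 2, 3, 4} under zero-masking; a set mask bit gives {11, 2, 3, 4}
in both modes. The register changes in every case, which is what a
plain cond_exec fails to express.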