On Wed, Aug 14, 2019 at 6:49 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
> >
> > Hi,
> > The attached patch tries to fix PR86753.
> >
> > For the following test:
> >
> > void
> > f1 (int *restrict x, int *restrict y, int *restrict z)
> > {
> >   for (int i = 0; i < 100; ++i)
> >     x[i] = y[i] ? z[i] : 10;
> > }
> >
> > the vect dump shows:
> >
> >   vect_cst__42 = { 0, ... };
> >   vect_cst__48 = { 0, ... };
> >
> >   vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40);
> >   _4 = *_3;
> >   _5 = z_12(D) + _2;
> >   mask__35.8_43 = vect__4.7_41 != vect_cst__42;
> >   _35 = _4 != 0;
> >   vec_mask_and_46 = mask__35.8_43 & loop_mask_40;
> >   vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46);
> >   iftmp.0_13 = 0;
> >   vect_iftmp.12_50 = VEC_COND_EXPR <vect__4.7_41 != vect_cst__48,
> >                                     vect_iftmp.11_47, vect_cst__49>;
> >
> > and the following code-gen:
> >
> > L2:
> >         ld1w    z0.s, p2/z, [x1, x3, lsl 2]
> >         cmpne   p1.s, p3/z, z0.s, #0
> >         cmpne   p0.s, p2/z, z0.s, #0
> >         ld1w    z0.s, p0/z, [x2, x3, lsl 2]
> >         sel     z0.s, p1, z0.s, z1.s
> >
> > We could reuse vec_mask_and_46 in the VEC_COND_EXPR, since the conditions
> > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42
> > are equivalent, and vect_iftmp.11_47 depends on
> > vect__4.7_41 != vect_cst__48.
> >
> > I suppose in general, for VEC_COND_EXPR <C, T, E>, if T comes from a
> > masked load that is conditional on C, then we could reuse the mask
> > used in the load in the VEC_COND_EXPR?
> >
> > The patch maintains a hash_map cond_to_vec_mask
> > from <cond, loop_mask> -> vec_mask (with the loop predicate applied).
> > In prepare_load_store_mask, we record
> > <cond, loop_mask> -> vec_mask & loop_mask,
> > and in vectorizable_condition we check whether <cond, loop_mask>
> > exists in cond_to_vec_mask; if found, the corresponding vec_mask is
> > used as the first operand of the VEC_COND_EXPR.
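[Editorial sketch: the caching scheme described above, reduced to a minimal standalone model. The real patch keys the hash_map on GCC tree nodes via cond_vmask_key; here an enum and strings stand in, and record_mask/lookup_mask are hypothetical names, not the patch's functions.]

```cpp
#include <map>
#include <string>
#include <tuple>
#include <cassert>

// Simplified stand-in for the condition code of a gimple comparison.
enum cmp_code { NE_EXPR, EQ_EXPR };

// Stand-in for cond_vmask_key: <comparison, op0, op1, loop_mask>.
struct cond_key {
  cmp_code code;
  std::string op0, op1, loop_mask;
  bool operator< (const cond_key &o) const {
    return std::tie (code, op0, op1, loop_mask)
           < std::tie (o.code, o.op0, o.op1, o.loop_mask);
  }
};

// Stand-in for the patch's cond_to_vec_mask hash_map.
static std::map<cond_key, std::string> cond_to_vec_mask;

// Modeled on prepare_load_store_mask: remember the mask (already ANDed
// with the loop mask) that guarded a masked load under this condition.
void record_mask (const cond_key &k, const std::string &vec_mask)
{
  cond_to_vec_mask[k] = vec_mask;
}

// Modeled on the vectorizable_condition check: reuse a cached mask for
// an equivalent <cond, loop_mask> pair, or nullptr on a miss.
const std::string *lookup_mask (const cond_key &k)
{
  auto it = cond_to_vec_mask.find (k);
  return it == cond_to_vec_mask.end () ? nullptr : &it->second;
}
```

With the f1 example above, the .MASK_LOAD would record <vect__4.7_41 != 0, loop_mask_40> -> vec_mask_and_46, and the VEC_COND_EXPR lookup would hit that entry instead of emitting the second cmpne.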
> >
> > <cond, loop_mask> is represented with cond_vmask_key, and the patch
> > adds tree_cond_ops to represent the condition operator and operands,
> > coming either from a COND_EXPR or a gimple comparison stmt. If the
> > stmt is not a comparison, it returns <ne_expr, lhs, 0> and inserts
> > that into cond_to_vec_mask.
> >
> > With the patch, the redundant p1 is eliminated and sel uses p0 for
> > the above test.
> >
> > For the following test:
> >
> > void
> > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback)
> > {
> >   for (int i = 0; i < 100; ++i)
> >     x[i] = y[i] ? z[i] : fallback;
> > }
> >
> > the input to the vectorizer has the operands swapped in the COND_EXPR:
> >
> >   _36 = _4 != 0;
> >   iftmp.0_14 = .MASK_LOAD (_5, 32B, _36);
> >   iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14;
> >
> > So we need to check for the inverted condition in cond_to_vec_mask,
> > and swap the operands.
> > Does the patch look OK so far?
> >
> > One major issue remaining with the patch is value numbering.
> > Currently, it does value numbering for the entire function using
> > sccvn at the start of the vect pass, which is too expensive since we
> > only need block-based VN. I am looking into that.
>
> Why do you need it at all?  We run VN on the if-converted loop bodies btw.
Also, I can't trivially see the equality of the masks, and probably neither can VN. Is it that we just don't bother to apply loop_mask to the VEC_COND, but there's no harm if we do?

Richard.

> Richard.
>
> > Thanks,
> > Prathamesh
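[Editorial sketch: the inverted-condition case from f2 above, modeled on scalar values. If the COND_EXPR condition is the inverse of the cached mask's comparison (_4 == 0 versus the recorded _4 != 0), the cached mask can still be reused after swapping the then/else arms. The function names are illustrative, not the patch's.]

```cpp
#include <utility>
#include <cassert>

// Simplified stand-in for the condition code of a gimple comparison.
enum cmp_code { NE_EXPR, EQ_EXPR };

// Invert a comparison code (only the two codes modeled here).
cmp_code invert_cmp (cmp_code c)
{
  return c == NE_EXPR ? EQ_EXPR : NE_EXPR;
}

// Scalar model of x = (a CODE 0) ? t : e, rewritten so the selection
// is always expressed via the NE_EXPR form that the mask cache holds:
// on an EQ_EXPR condition, invert the code and swap the arms.
int cond_via_ne_mask (cmp_code code, int a, int t, int e)
{
  if (code == EQ_EXPR)
    {
      std::swap (t, e);
      code = invert_cmp (code);
    }
  // Here code == NE_EXPR, matching the cached <a != 0, loop_mask> key.
  return (a != 0) ? t : e;
}
```

In the f2 dump, iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14 becomes, after the swap, the NE_EXPR form _4 != 0 ? iftmp.0_14 : fallback_12(D), whose condition matches the mask _36 recorded for the .MASK_LOAD.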