Hi, Xionhu Should "altivec_vsel<mode>2" .. 3 .. 4 be "*altivec_vsel<mode>2", etc. because they are combiner patterns and never referenced by name? Only the first, named pattern is referenced by the builtin code.
Other than that question / suggestion, this patch is okay. Please coordinate with Bill and his builtin patches. Thanks, David On Wed, Sep 15, 2021 at 3:50 AM Xionghu Luo <luo...@linux.ibm.com> wrote: > > Ping^3, thanks. > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html > > > On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote: > > Ping^2, thanks. > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html > > > > On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote: > >> Gentle ping, thanks. > >> > >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html > >> > >> > >> On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote: > >>> Hi, > >>> > >>> On 2021/5/13 18:49, Segher Boessenkool wrote: > >>>> Hi! > >>>> > >>>> On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote: > >>>>> The vsel instruction is a bit-wise select instruction. Using an > >>>>> IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code > >>>>> being generated in the combine pass. Per element selection is a > >>>>> subset of per bit-wise selection,with the patch the pattern is > >>>>> written using bit operations. But there are 8 different patterns > >>>>> to define "op0 := (op1 & ~op3) | (op2 & op3)": > >>>>> > >>>>> (~op3&op1) | (op3&op2), > >>>>> (~op3&op1) | (op2&op3), > >>>>> (op3&op2) | (~op3&op1), > >>>>> (op2&op3) | (~op3&op1), > >>>>> (op1&~op3) | (op3&op2), > >>>>> (op1&~op3) | (op2&op3), > >>>>> (op3&op2) | (op1&~op3), > >>>>> (op2&op3) | (op1&~op3), > >>>>> > >>>>> Combine pass will swap (op1&~op3) to (~op3&op1) due to commutative > >>>>> canonical, which could reduce it to the FIRST 4 patterns, but it won't > >>>>> swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this patch > >>>>> handles it with two patterns with different NOT op3 position and check > >>>>> equality inside it. > >>>> > >>>> Yup, that latter case does not have canonicalisation rules. Btw, not > >>>> only combine does this canonicalisation: everything should, > >>>> non-canonical RTL is invalid RTL (in the instruction stream, you can do > >>>> everything in temporary code of course, as long as the RTL isn't > >>>> malformed). > >>>> > >>>>> -(define_insn "*altivec_vsel<mode>" > >>>>> +(define_insn "altivec_vsel<mode>" > >>>>> [(set (match_operand:VM 0 "altivec_register_operand" "=v") > >>>>> - (if_then_else:VM > >>>>> - (ne:CC (match_operand:VM 1 "altivec_register_operand" "v") > >>>>> - (match_operand:VM 4 "zero_constant" "")) > >>>>> - (match_operand:VM 2 "altivec_register_operand" "v") > >>>>> - (match_operand:VM 3 "altivec_register_operand" "v")))] > >>>>> - "VECTOR_MEM_ALTIVEC_P (<MODE>mode)" > >>>>> - "vsel %0,%3,%2,%1" > >>>>> + (ior:VM > >>>>> + (and:VM > >>>>> + (not:VM (match_operand:VM 3 "altivec_register_operand" "v")) > >>>>> + (match_operand:VM 1 "altivec_register_operand" "v")) > >>>>> + (and:VM > >>>>> + (match_operand:VM 2 "altivec_register_operand" "v") > >>>>> + (match_operand:VM 4 "altivec_register_operand" "v"))))] > >>>>> + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) > >>>>> + && (rtx_equal_p (operands[2], operands[3]) > >>>>> + || rtx_equal_p (operands[4], operands[3]))" > >>>>> + { > >>>>> + if (rtx_equal_p (operands[2], operands[3])) > >>>>> + return "vsel %0,%1,%4,%3"; > >>>>> + else > >>>>> + return "vsel %0,%1,%2,%3"; > >>>>> + } > >>>>> [(set_attr "type" "vecmove")]) > >>>> > >>>> That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I > >>>> think. So please write this as two patterns (and keep the expand if > >>>> that helps). > >>> > >>> I was a bit concerned that there would be a lot of duplicate code if we > >>> write two patterns for each vsel, totally 4 similar patterns in > >>> altivec.md and another 4 in vsx.md make it difficult to maintain, > >>> however > >>> I updated it since you prefer this way, as you pointed out the xxsel in > >>> vsx.md could be folded by later patch. > >>> > >>>> > >>>>> +(define_insn "altivec_vsel<mode>2" > >>>> > >>>> (same here of course). > >>>> > >>>>> ;; Fused multiply add. > >>>>> diff --git a/gcc/config/rs6000/rs6000-call.c > >>>>> b/gcc/config/rs6000/rs6000-call.c > >>>>> index f5676255387..d65bdc01055 100644 > >>>>> --- a/gcc/config/rs6000/rs6000-call.c > >>>>> +++ b/gcc/config/rs6000/rs6000-call.c > >>>>> @@ -3362,11 +3362,11 @@ const struct altivec_builtin_types > >>>>> altivec_overloaded_builtins[] = { > >>>>> RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, > >>>>> RS6000_BTI_unsigned_V2DI }, > >>>>> { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI, > >>>>> RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, > >>>>> RS6000_BTI_V2DI }, > >>>>> - { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI, > >>>>> + { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS, > >>>> > >>>> Are the _uns things still used for anything? But, let's not change > >>>> this until Bill's stuff is in :-) > >>>> > >>>> Why do you want to change this here, btw? I don't understand. > >>> > >>> OK, they are actually "unsigned type" overload builtin functions, change > >>> it or not so far won't cause functionality issue, I will revert this > >>> change > >>> in the updated patch. > >>> > >>>> > >>>>> + if (target == 0 > >>>>> + || GET_MODE (target) != tmode > >>>>> + || ! (*insn_data[icode].operand[0].predicate) (target, tmode)) > >>>> > >>>> No space after ! and other unary operators (except for casts and other > >>>> operators you write with alphanumerics, like "sizeof"). I know you > >>>> copied this code, but :-) > >>> > >>> OK, thanks. > >>> > >>>> > >>>>> @@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx > >>>>> op_true, rtx op_false, > >>>>> case GEU: > >>>>> case LTU: > >>>>> case LEU: > >>>>> - /* Mark unsigned tests with CCUNSmode. */ > >>>>> - cc_mode = CCUNSmode; > >>>>> /* Invert condition to avoid compound test if necessary. */ > >>>>> if (rcode == GEU || rcode == LEU) > >>>> > >>>> So this is related to the _uns thing. Could you split off that change? > >>>> Probably as an earlier patch (but either works for me). > >>> > >>> Not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things, previously > >>> cc_mode > >>> is a parameter to generate the condition for IF_THEN_ELSE > >>> instruction, now > >>> we don't need it again as we use IOR (AND... AND...) style, remove it > >>> to avoid > >>> build error. > >>> > >>> > >>> - cond2 = gen_rtx_fmt_ee (NE, cc_mode, gen_lowpart (dest_mode, mask), > >>> - CONST0_RTX (dest_mode)); > >>> - emit_insn (gen_rtx_SET (dest, > >>> - gen_rtx_IF_THEN_ELSE (dest_mode, > >>> - cond2, > >>> - op_true, > >>> - op_false))); > >>> + rtx tmp = gen_rtx_IOR (dest_mode, > >>> + gen_rtx_AND (dest_mode, gen_rtx_NOT > >>> (dest_mode, mask), > >>> + op_false), > >>> + gen_rtx_AND (dest_mode, mask, op_true)); > >>> + emit_insn (gen_rtx_SET (dest, tmp)); > >>> > >>> > >>>> > >>>>> @@ -15629,6 +15625,9 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx > >>>>> op_true, rtx op_false, > >>>>> if (!mask) > >>>>> return 0; > >>>>> + if (mask_mode != dest_mode) > >>>>> + mask = simplify_gen_subreg (dest_mode, mask, mask_mode, 0); > >>>> > >>>> Indent just two characters please: line continuations (usually) align, > >>>> but indents do not.> > >>>> Can you fold vsel and xxsel together completely? They have exactly the > >>>> same semantics! This does not have to be in this patch of course. > >>> > >>> I noticed that vperm/xxperm are folded together, do you mean fold > >>> vsel/xxsel > >>> like them? It's attached as: > >>> 0002-rs6000-Fold-xxsel-to-vsel-since-they-have-same-seman.patch > >>> > >>> > >>> Thanks, > >>> Xionghu > >> > > > > -- > Thanks, > Xionghu