Re: [PATCH rs6000]Fix PR92132

2019-11-10 Thread Kewen.Lin
Hi Segher,

on 2019/11/9 上午1:36, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Nov 08, 2019 at 10:38:13AM +0800, Kewen.Lin wrote:
 +  [(set (match_operand: 0 "vint_operand")
 +   (match_operator 1 "comparison_operator"
>>>
>>> If you make an iterator for this instead, it is simpler code (you can then
>>> use  to do all these cases in one statement).
>>
>> If my understanding is correct and based on some tries before, I think we
>> have to leave these **CASEs** there (at least at the 1st level define_expand
>> for vec_cmp*), since vec_cmp* doesn't have  field in the pattern name.
>> The code can be only extracted from operator 1.  I tried to add one dummy
>> operand to hold  but it's impractical.
>>
>> Sorry, I may miss something here, I'm happy to make a subsequent patch to
>> uniform these cases if there is a good way to run a code iterator on them.
> 
> Instead of
> 
>   [(set (match_operand:VEC_I 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> [(match_operand:VEC_I 2 "vint_operand")
>  (match_operand:VEC_I 3 "vint_operand")]))]
> 
> you can do
> 
>   [(set (match_operand:VEC_I 0 "vint_operand")
>   (some_iter:VEC_I (match_operand:VEC_I 1 "vint_operand")
>(match_operand:VEC_I 2 "vint_operand")))]
> 

Thanks for your example.  But I'm afraid that it doesn't work for these 
patterns.

I tried it with simple code below:

; For testing
(define_code_iterator some_iter [eq gt])

(define_expand "vec_cmp"
  [(set (match_operand:VEC_I 0 "vint_operand")
(some_iter:VEC_I
  (match_operand:VEC_I 2 "vint_operand")
  (match_operand:VEC_I 3 "vint_operand")))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
{
  emit_insn (gen_vector_ (operands[0], operands[2], operands[3]));
  DONE;
})

Error messages were emitted:

/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv16qiv16qi'
/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv8hiv8hi'
/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv4siv4si'
/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv2div2di'

It's expected, since the pattern here is vec_cmp rather than
vec_cmp, your example would work perfectly for the later.
Btw, in that pattern, the comparison operator is passed in operand 1.


BR,
Kewen

> with some_iter some code_iterator, (note you need to renumber), and in the
> body you can then just use  (or , or some other code_attribute).
> 
> code_iterator is more flexible than match_operator, in most ways.
> 
> 
> Segher
> 



Re: [PATCH rs6000]Fix PR92132

2019-11-08 Thread Segher Boessenkool
Hi!

On Fri, Nov 08, 2019 at 10:38:13AM +0800, Kewen.Lin wrote:
> >> +  [(set (match_operand: 0 "vint_operand")
> >> +   (match_operator 1 "comparison_operator"
> > 
> > If you make an iterator for this instead, it is simpler code (you can then
> > use  to do all these cases in one statement).
> 
> If my understanding is correct and based on some tries before, I think we
> have to leave these **CASEs** there (at least at the 1st level define_expand
> for vec_cmp*), since vec_cmp* doesn't have  field in the pattern name.
> The code can be only extracted from operator 1.  I tried to add one dummy
> operand to hold  but it's impractical.
> 
> Sorry, I may miss something here, I'm happy to make a subsequent patch to
> uniform these cases if there is a good way to run a code iterator on them.

Instead of

  [(set (match_operand:VEC_I 0 "vint_operand")
(match_operator 1 "signed_or_equality_comparison_operator"
  [(match_operand:VEC_I 2 "vint_operand")
   (match_operand:VEC_I 3 "vint_operand")]))]

you can do

  [(set (match_operand:VEC_I 0 "vint_operand")
(some_iter:VEC_I (match_operand:VEC_I 1 "vint_operand")
 (match_operand:VEC_I 2 "vint_operand")))]

with some_iter some code_iterator, (note you need to renumber), and in the
body you can then just use  (or , or some other code_attribute).

code_iterator is more flexible than match_operator, in most ways.


Segher


Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin
Hi Segher,

on 2019/11/8 上午8:07, Segher Boessenkool wrote:
> Hi!
> 
>>> Half are pretty simple:
>>>
>>> lt(a,b) = gt(b,a)
>>> gt(a,b) = gt(a,b)
>>> eq(a,b) = eq(a,b)
>>> le(a,b) = ge(b,a)
>>> ge(a,b) = ge(a,b)
>>>
>>> ltgt(a,b) = ge(a,b) ^ ge(b,a)
>>> ord(a,b)  = ge(a,b) | ge(b,a)
>>>
>>> The other half are the negations of those:
>>>
>>> unge(a,b) = ~gt(b,a)
>>> unle(a,b) = ~gt(a,b)
>>> ne(a,b)   = ~eq(a,b)
>>> ungt(a,b) = ~ge(b,a)
>>> unlt(a,b) = ~ge(a,b)
>>>
>>> uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
>>> un(a,b) = ~(ge(a,b) | ge(b,a))
>>
>> Awesome!  Do you suggest refactoring on them?  :)
> 
> I'd do the first five in one pattern (which then swaps two ops and the
> condition in the lt and le case), and the other five in another pattern.
> And the rest in two or four patterns?  Just try it out, see what works
> well.  It helps to do a bunch together in one pattern, but if that then
> turns into special cases for everything, more might be lost than gained.> 

Got it, I'll make a refactoring patch for this part later.

> 
>>> 8 codes, ordered:never lt   gt   ltgt eq   le   ge   ordered
>>> 8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
>>> 8 codes, fast-math:  never lt   gt   ne   eq   le   ge   always
>>> 8 codes, non-fp: never lt   gt   ne   eq   le   ge   always
>>
>> Sorry, I don't quite follow this table.  What's the column heads?
> 
> The first row is the eight possible fp conditions that are not always
> true if unordered is set; the second row is those that *are* always true
> if it is set.  The other two rows (which are the same) is just the eight
> conditions that do not test unordered at all.
> 
> The tricky one is "ne": for FP *with* NaNs, "ne" means "less than, or
> greater than, or unordered", while without NaNs (i.e. -ffast-math) it
> means "less than, or greater than".
> 
> You could write the column heads as
> --/--/--  lt/--/--  --/gt/--  lt/gt/--  --/--/eq  lt/--/eq  --/gt/eq  lt/gt/eq
> if that helps?  Just the eight combinations of the first free flags.
> 

Thanks a lot for the explanation.  It's helpful!
 

>> +;; For signed integer vectors comparison.
>> +(define_expand "vec_cmp"
>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>> +(match_operator 1 "signed_or_equality_comparison_operator"
>> +  [(match_operand:VEC_I 2 "vint_operand")
>> +   (match_operand:VEC_I 3 "vint_operand")]))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>> +{
>> +  enum rtx_code code = GET_CODE (operands[1]);
>> +  rtx tmp = gen_reg_rtx (mode);
>> +  switch (code)
>> +{
>> +case NE:
>> +  emit_insn (gen_vector_eq (operands[0], operands[2], 
>> operands[3]));
>> +  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
>> +  break;
>> +case EQ:
>> +  emit_insn (gen_vector_eq (operands[0], operands[2], 
>> operands[3]));
>> +  break;
>> +case GE:
>> +  emit_insn (gen_vector_nlt (operands[0],operands[2], operands[3],
>> +   tmp));
>> +  break;
>> +case GT:
>> +  emit_insn (gen_vector_gt (operands[0], operands[2], 
>> operands[3]));
>> +  break;
>> +case LE:
>> +  emit_insn (gen_vector_ngt (operands[0], operands[2], 
>> operands[3],
>> +   tmp));
>> +  break;
>> +case LT:
>> +  emit_insn (gen_vector_gt (operands[0], operands[3], 
>> operands[2]));
>> +  break;
>> +default:
>> +  gcc_unreachable ();
>> +  break;
>> +}
>> +  DONE;
>> +})
> 
> I would think this can be done easier, but it is alright for now, it can
> be touched up later if we want.
> 
>> +;; For float point vectors comparison.
>> +(define_expand "vec_cmp"
> 
> This, too.
> 
>> +  [(set (match_operand: 0 "vint_operand")
>> + (match_operator 1 "comparison_operator"
> 
> If you make an iterator for this instead, it is simpler code (you can then
> use  to do all these cases in one statement).

If my understanding is correct and based on some tries before, I think we
have to leave these **CASEs** there (at least at the 1st level define_expand
for vec_cmp*), since vec_cmp* doesn't have  field in the pattern name.
The code can be only extracted from operator 1.  I tried to add one dummy
operand to hold  but it's impractical.

Sorry, I may miss something here, I'm happy to make a subsequent patch to
uniform these cases if there is a good way to run a code iterator on them.

> 
> But that can be done later.  Okay for trunk.  Thanks!
> 

Many thanks for your time!


BR,
Kewen



Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Segher Boessenkool
Hi!

On Thu, Nov 07, 2019 at 06:17:53PM +0800, Kewen.Lin wrote:
> on 2019/11/7 上午7:49, Segher Boessenkool wrote:
> > The expander named "one_cmpl3":
> > 
> > Erm.  2, not 3 :-)

> Ah, sorry I didn't notice we have one cmpl**3** but actually for one
> cmpl**2** expand, a bit surprised.  Done.  Thanks for pointing that.

Yeah, I suddenly couldn't find it myself either.  Real head-scratcher :-)

> > etc., so you can just delete the expand and rename the insn to the proper
> > name (one_cmpl2).  It sometimes is useful to have an expand like
> > this if there are multiple insns that could implement this, but that is
> > not the case here.
> 
> OK, example like vector_select?  :)

Sure, like that.  There are many examples where you are required to have
just one define_expand, it is called by name after all, but you want to
have different define_insns (for different cpus, say).

> > So we have only gt/ge/eq.
> > 
> > I think the following are ooptimal (not tested!):
> > 
> > lt(a,b) = gt(b,a)
> yes, this is what I used for that operator.
> 
> > gt(a,b) = gt(a,b)
> > eq(a,b) = eq(a,b)
> > un(a,b) = ~(ge(a,b) | ge(b,a))
> > 
> 
> existing code uses (~ge(a,b) & ~ge(b,a))
> but should be the same.

Yup, it's just ge/ge/nor, whatever way you write it :-)  (RTL requires
you write the expression in your form, with all the NOTs "pushed in").

> > ltgt(a,b) = ge(a,b) ^ ge(b,a)
> 
> existing code uses gt(a,b) | gt(b,a)
> but should be the same.

Yup, computes exactly the same, and exactly the same execution speeds.

Your form might be slightly easier to optimise with (it has no XOR).

> > Half are pretty simple:
> > 
> > lt(a,b) = gt(b,a)
> > gt(a,b) = gt(a,b)
> > eq(a,b) = eq(a,b)
> > le(a,b) = ge(b,a)
> > ge(a,b) = ge(a,b)
> > 
> > ltgt(a,b) = ge(a,b) ^ ge(b,a)
> > ord(a,b)  = ge(a,b) | ge(b,a)
> > 
> > The other half are the negations of those:
> > 
> > unge(a,b) = ~gt(b,a)
> > unle(a,b) = ~gt(a,b)
> > ne(a,b)   = ~eq(a,b)
> > ungt(a,b) = ~ge(b,a)
> > unlt(a,b) = ~ge(a,b)
> > 
> > uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
> > un(a,b) = ~(ge(a,b) | ge(b,a))
> 
> Awesome!  Do you suggest refactoring on them?  :)

I'd do the first five in one pattern (which then swaps two ops and the
condition in the lt and le case), and the other five in another pattern.
And the rest in two or four patterns?  Just try it out, see what works
well.  It helps to do a bunch together in one pattern, but if that then
turns into special cases for everything, more might be lost than gained.

> > And please remember to test everythin with -ffast-math :-)  That is, when
> > flag_finite_math_only is set.  You cannot get unordered results, then,
> > making the optimal sequences different in some cases (and changing what
> > "ne" means!)
> 
> Thanks for the remind!  On RTL pattern, I think we won't get any un*
> related operators with -ffast-math, so that part on un* expansion
> would be fine?

Yeah, but look what you should do for "ne" :-)

> > 8 codes, ordered:never lt   gt   ltgt eq   le   ge   ordered
> > 8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
> > 8 codes, fast-math:  never lt   gt   ne   eq   le   ge   always
> > 8 codes, non-fp: never lt   gt   ne   eq   le   ge   always
> 
> Sorry, I don't quite follow this table.  What's the column heads?

The first row is the eight possible fp conditions that are not always
true if unordered is set; the second row is those that *are* always true
if it is set.  The other two rows (which are the same) is just the eight
conditions that do not test unordered at all.

The tricky one is "ne": for FP *with* NaNs, "ne" means "less than, or
greater than, or unordered", while without NaNs (i.e. -ffast-math) it
means "less than, or greater than".

You could write the column heads as
--/--/--  lt/--/--  --/gt/--  lt/gt/--  --/--/eq  lt/--/eq  --/gt/eq  lt/gt/eq
if that helps?  Just the eight combinations of the first free flags.

> > Yes, it is redundant, the comparison code already says if it is an
> > unsigned comparison.  So this a question about the generic patterns, not
> > your implementation of them :-)
> > 
> > And if it is *one* pattern then handling LTU etc. makes perfect sense.
> 
> Fully agree, but it separates for now.  :)

Sure :-)

> Thanks again!  I've updated a new version as some comments, you can review 
> this
> one to save your time.  :)


> +;; Return 1 if OP is a signed comparison or an equality operator.
> +(define_predicate "signed_or_equality_comparison_operator"
> +  (ior (match_operand 0 "equality_operator")
> +   (match_operand 0 "signed_comparison_operator")))
> +
> +;; Return 1 if OP is an unsigned comparison or an equality operator.
> +(define_predicate "unsigned_or_equality_comparison_operator"
> +  (ior (match_operand 0 "equality_operator")
> +   (match_operand 0 "unsigned_comparison_operator")))

Hrm.  Unpleasant.

> +(define_expand "vcond_mask_"
> +  [(match_operand:VEC_I 0 "vint_operand")
> +   (match_operand:VEC_I 1 "vint_operand")
> + 

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin
Hi Segher,

on 2019/11/7 上午7:49, Segher Boessenkool wrote:
> 
> The expander named "one_cmpl3":
> 
> Erm.  2, not 3 :-)
> 
> (define_expand "one_cmpl2"
>   [(set (match_operand:BOOL_128 0 "vlogical_operand")
> (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))]
>   ""
>   "")
> 
> while the define_insn is
> 
> (define_insn_and_split "*one_cmpl3_internal"
>   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
> (not:BOOL_128
>   (match_operand:BOOL_128 1 "vlogical_operand" "")))]
>   ""
> {
> 

Ah, sorry I didn't notice we have one cmpl**3** but actually for one
cmpl**2** expand, a bit surprised.  Done.  Thanks for pointing that.

> etc., so you can just delete the expand and rename the insn to the proper
> name (one_cmpl2).  It sometimes is useful to have an expand like
> this if there are multiple insns that could implement this, but that is
> not the case here.
> 

OK, example like vector_select?  :)

 +(define_code_iterator fpcmpun [ungt unge unlt unle])
>>>
>>> Why these four?  Should there be more?  Should this be added to some
>>> existing iterator?
>>
>> For floating point comparison operator and vector type, currently rs6000
>> supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
>> We can leverage gt, ge, eq for lt, le, ne, then these four left.
> 
> There are four conditions for FP: lt/gt/eq/un.  For every comparison,
> exactly one of the four is true.  If not HONOR_NANS for this mode you
> never have un, so it is one of lt/gt/eq then, just like with integers.
> 
> If we have HONOR_NANS(mode) (or !flag_finite_math_only), there are 14
> possible combinations to test for (testing for any of the four or none
> of the four is easy ;-) )
> 
> Four test just if lt, gt, eq, or un is set.  Another four test if one of
> the flags is *not* set, or said differently, if one of three flags is set:
> ordered, ne, unle, unge.  The remaining six test two flags each: ltgt, le,
> unlt, ge, ungt, uneq.

Yes, for these 14, rs6000 current support status:

  ge: vector_ge -> define_expand -> match vsx/altivec insn
  gt: vector_gt -> define_expand -> match vsx/altivec insn
  eq: vector_eq -> define_expand -> match vsx/altivec insn
  
  ltgt: *vector_ltgt -> define_insn_and_split
  ord: *vector_ordered -> define_insn_and_split
  unord: *vector_unordered -> define_insn_and_split
  uneq: *vector_uneq -> define_insn_and_split

  ne: no RTL pattern.
  lt: Likewise.
  le: Likewise.
  unge: Likewise.
  ungt: Likewise.
  unle: Likewise.
  unlt: Likewise.

Since I thought the un{ge,gt,le,lt} is a bit complicated than ne/lt/le (wrong
thought actually), I added the specific define_expand for them.  As your
simpler example below, I've added the RTL patterns with define_expand for the
missing ne, lt, le, unge, ungt, unle, unlt.

I didn't use iterator any more, since without further refactoring, just
several ones (2 each pair) can be shared with iterators, and need to check
 to decide swap or not.  Maybe the subsequent uniform refactoring patch
is required to make it?  

> 
>> I originally wanted to merge them into the existing unordered or uneq, but
>> I found it's hard to share their existing patterns.  For example, the uneq
>> looks like:
>>
>>   [(set (match_dup 3)
>>  (gt:VEC_F (match_dup 1)
>>(match_dup 2)))
>>(set (match_dup 4)
>>  (gt:VEC_F (match_dup 2)
>>(match_dup 1)))
>>(set (match_dup 0)
>>  (and:VEC_F (not:VEC_F (match_dup 3))
>> (not:VEC_F (match_dup 4]
> 
> Or ge/ge/eqv, etc. -- there are multiple options.
> 
>> While ungt looks like:
>>
>>   [(set (match_dup 3)
>>  (ge:VEC_F (match_dup 1)
>>(match_dup 2)))
>>(set (match_dup 4)
>>  (ge:VEC_F (match_dup 2)
>>(match_dup 1)))
>>(set (match_dup 3)
>>  (ior:VEC_F (not:VEC_F (match_dup 3))
>> (not:VEC_F (match_dup 4
>>(set (match_dup 4)
>>  (gt:VEC_F (match_dup 1)
>>(match_dup 2)))
>>(set (match_dup 3)
>>  (ior:VEC_F (match_dup 3)
>> (match_dup 4)))]
> 
> (set (match_dup 3)
>  (ge:VEC_F (match_dup 2)
>(match_dup 1)))
> (set (match_dup 0)
>  (not:VEC_F (match_dup 3)))
> 
> should be enough?
> 

Nice!  I was trapped to get unordered first.  :(

> 
> So we have only gt/ge/eq.
> 
> I think the following are ooptimal (not tested!):
> 
> lt(a,b) = gt(b,a)
yes, this is what I used for that operator.

> gt(a,b) = gt(a,b)
> eq(a,b) = eq(a,b)
> un(a,b) = ~(ge(a,b) | ge(b,a))
> 

existing code uses (~ge(a,b) & ~ge(b,a))
but should be the same.

> ltgt(a,b) = ge(a,b) ^ ge(b,a)

existing code uses gt(a,b) | gt(b,a)
but should be the same.

> le(a,b)   = ge(b,a)
> unlt(a,b) = ~ge(a,b)
> ge(a,b)   = ge(a,b)
> ungt(a,b) = ~ge(b,a)
> uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
> 

existing code uses ~gt(a,b) & ~gt(b,a)
but should be the same.

> ord(a,b)  = ge(a,b) | ge(b,a)
> ne(a,b)   = ~eq(a,b)
> unle(a,b) = ~gt(a,b)
> ung

Re: [PATCH rs6000]Fix PR92132

2019-11-06 Thread Segher Boessenkool
Hi Ke Wen,

On Tue, Nov 05, 2019 at 04:35:05PM +0800, Kewen.Lin wrote:
> >>  ;; 128-bit one's complement
> >> -(define_insn_and_split "*one_cmpl3_internal"
> >> +(define_insn_and_split "one_cmpl3_internal"
> > 
> > Instead, rename it to "one_cmpl3" and delete the define_expand that
> > serves no function?
> 
> Renamed.  Sorry, what's the "define_expand" specified here.  I thought it's
> for existing one_cmpl3 but I didn't find it. 

The expander named "one_cmpl3":

Erm.  2, not 3 :-)

(define_expand "one_cmpl2"
  [(set (match_operand:BOOL_128 0 "vlogical_operand")
(not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))]
  ""
  "")

while the define_insn is

(define_insn_and_split "*one_cmpl3_internal"
  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
(not:BOOL_128
  (match_operand:BOOL_128 1 "vlogical_operand" "")))]
  ""
{

etc., so you can just delete the expand and rename the insn to the proper
name (one_cmpl2).  It sometimes is useful to have an expand like
this if there are multiple insns that could implement this, but that is
not the case here.

> >> +(define_code_iterator fpcmpun [ungt unge unlt unle])
> > 
> > Why these four?  Should there be more?  Should this be added to some
> > existing iterator?
> 
> For floating point comparison operator and vector type, currently rs6000
> supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
> We can leverage gt, ge, eq for lt, le, ne, then these four left.

There are four conditions for FP: lt/gt/eq/un.  For every comparison,
exactly one of the four is true.  If not HONOR_NANS for this mode you
never have un, so it is one of lt/gt/eq then, just like with integers.

If we have HONOR_NANS(mode) (or !flag_finite_math_only), there are 14
possible combinations to test for (testing for any of the four or none
of the four is easy ;-) )

Four test just if lt, gt, eq, or un is set.  Another four test if one of
the flags is *not* set, or said differently, if one of three flags is set:
ordered, ne, unle, unge.  The remaining six test two flags each: ltgt, le,
unlt, ge, ungt, uneq.

> I originally wanted to merge them into the existing unordered or uneq, but
> I found it's hard to share their existing patterns.  For example, the uneq
> looks like:
> 
>   [(set (match_dup 3)
>   (gt:VEC_F (match_dup 1)
> (match_dup 2)))
>(set (match_dup 4)
>   (gt:VEC_F (match_dup 2)
> (match_dup 1)))
>(set (match_dup 0)
>   (and:VEC_F (not:VEC_F (match_dup 3))
>  (not:VEC_F (match_dup 4]

Or ge/ge/eqv, etc. -- there are multiple options.

> While ungt looks like:
> 
>   [(set (match_dup 3)
>   (ge:VEC_F (match_dup 1)
> (match_dup 2)))
>(set (match_dup 4)
>   (ge:VEC_F (match_dup 2)
> (match_dup 1)))
>(set (match_dup 3)
>   (ior:VEC_F (not:VEC_F (match_dup 3))
>  (not:VEC_F (match_dup 4
>(set (match_dup 4)
>   (gt:VEC_F (match_dup 1)
> (match_dup 2)))
>(set (match_dup 3)
>   (ior:VEC_F (match_dup 3)
>  (match_dup 4)))]

(set (match_dup 3)
 (ge:VEC_F (match_dup 2)
   (match_dup 1)))
(set (match_dup 0)
 (not:VEC_F (match_dup 3)))

should be enough?


So we have only gt/ge/eq.

I think the following are ooptimal (not tested!):

lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
un(a,b) = ~(ge(a,b) | ge(b,a))

ltgt(a,b) = ge(a,b) ^ ge(b,a)
le(a,b)   = ge(b,a)
unlt(a,b) = ~ge(a,b)
ge(a,b)   = ge(a,b)
ungt(a,b) = ~ge(b,a)
uneq(a,b) = ~(ge(a,b) ^ ge(b,a))

ord(a,b)  = ge(a,b) | ge(b,a)
ne(a,b)   = ~eq(a,b)
unle(a,b) = ~gt(a,b)
unge(a,b) = ~gt(b,a)

This is quite regular :-)  5 are done with one cmp; 5 are done with a cmp
and an inversion; 4 are done with two compares and one xor/eqv/or/nor.


Half are pretty simple:

lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
le(a,b) = ge(b,a)
ge(a,b) = ge(a,b)

ltgt(a,b) = ge(a,b) ^ ge(b,a)
ord(a,b)  = ge(a,b) | ge(b,a)

The other half are the negations of those:

unge(a,b) = ~gt(b,a)
unle(a,b) = ~gt(a,b)
ne(a,b)   = ~eq(a,b)
ungt(a,b) = ~ge(b,a)
unlt(a,b) = ~ge(a,b)

uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
un(a,b) = ~(ge(a,b) | ge(b,a))


And please remember to test everythin with -ffast-math :-)  That is, when
flag_finite_math_only is set.  You cannot get unordered results, then,
making the optimal sequences different in some cases (and changing what
"ne" means!)

8 codes, ordered:never lt   gt   ltgt eq   le   ge   ordered
8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
8 codes, fast-math:  never lt   gt   ne   eq   le   ge   always
8 codes, non-fp: never lt   gt   ne   eq   le   ge   always


> >> +;; Same mode for condition true/false values and predicate operand.
> >> +(define_expand "vcond_mask_"
> >> +  [(match_operand:VEC_I 0 "vint_operand")
> >> +   (match_operand:VEC_I 1 "vint_operand")
> >> +   (match_operand:VEC_I 2 "vint_operand")
> >>

Re: [PATCH rs6000]Fix PR92132

2019-11-05 Thread Kewen.Lin
Hi Segher,

Thanks for the comments!

on 2019/11/2 上午7:17, Segher Boessenkool wrote:
> On Tue, Oct 29, 2019 at 01:16:53PM +0800, Kewen.Lin wrote:
>>  (vcond_mask_): New expand.
> 
> Say for which mode please?  Like
>   (vcond_mask_ for VEC_I and VEC_I): New expand.
> 

Fixed as below.

>>  (vcond_mask_): Likewise.
> 
> "for VEC_I and VEC_F", here, but the actual names in the pattern are for
> vector modes of same-size integer elements.  Maybe it is clear enough like
> this, dunno.

Changed to for VEC_F, New expand for float vector modes and same-size 
integer vector modes.

> 
>>  (vector_{ungt,unge,unlt,unle}): Likewise.
> 
> Never use wildcards (or shell expansions) in the "what changed" part of a
> changelog, because people try to search for that.

Thanks for the explanation, fixed. 

> 
>>  ;; 128-bit one's complement
>> -(define_insn_and_split "*one_cmpl3_internal"
>> +(define_insn_and_split "one_cmpl3_internal"
> 
> Instead, rename it to "one_cmpl3" and delete the define_expand that
> serves no function?

Renamed.  Sorry, what's the "define_expand" specified here.  I thought it's
for existing one_cmpl3 but I didn't find it. 

> 
>> +(define_code_iterator fpcmpun [ungt unge unlt unle])
> 
> Why these four?  Should there be more?  Should this be added to some
> existing iterator?

For floating point comparison operator and vector type, currently rs6000
supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
We can leverage gt, ge, eq for lt, le, ne, then these four left.

I originally wanted to merge them into the existing unordered or uneq, but
I found it's hard to share their existing patterns.  For example, the uneq
looks like:

  [(set (match_dup 3)
(gt:VEC_F (match_dup 1)
  (match_dup 2)))
   (set (match_dup 4)
(gt:VEC_F (match_dup 2)
  (match_dup 1)))
   (set (match_dup 0)
(and:VEC_F (not:VEC_F (match_dup 3))
   (not:VEC_F (match_dup 4]

While ungt looks like:

  [(set (match_dup 3)
(ge:VEC_F (match_dup 1)
  (match_dup 2)))
   (set (match_dup 4)
(ge:VEC_F (match_dup 2)
  (match_dup 1)))
   (set (match_dup 3)
(ior:VEC_F (not:VEC_F (match_dup 3))
   (not:VEC_F (match_dup 4
   (set (match_dup 4)
(gt:VEC_F (match_dup 1)
  (match_dup 2)))
   (set (match_dup 3)
(ior:VEC_F (match_dup 3)
   (match_dup 4)))]
  
> 
> It's not all comparisons including unordered, there are uneq, unordered
> itself, and ne as well.

Yes, they are not, just a list holding missing support comparison operator.

> 
>> +;; Same mode for condition true/false values and predicate operand.
>> +(define_expand "vcond_mask_"
>> +  [(match_operand:VEC_I 0 "vint_operand")
>> +   (match_operand:VEC_I 1 "vint_operand")
>> +   (match_operand:VEC_I 2 "vint_operand")
>> +   (match_operand:VEC_I 3 "vint_operand")]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>> +{
>> +  emit_insn (gen_vector_select_ (operands[0], operands[2], 
>> operands[1],
>> +  operands[3]));
>> +  DONE;
>> +})
> 
> So is this exactly the same as vsel/xxsel?

Yes, expanded into if_then_else and ne against zero, can match their patterns.

> 
>> +;; For signed integer vectors comparison.
>> +(define_expand "vec_cmp"
> 
>> +case GEU:
>> +  emit_insn (
>> +gen_vector_nltu (operands[0], operands[2], operands[3], tmp));
>> +  break;
>> +case GTU:
>> +  emit_insn (gen_vector_gtu (operands[0], operands[2], 
>> operands[3]));
>> +  break;
>> +case LEU:
>> +  emit_insn (
>> +gen_vector_ngtu (operands[0], operands[2], operands[3], tmp));
>> +  break;
>> +case LTU:
>> +  emit_insn (gen_vector_gtu (operands[0], operands[3], 
>> operands[2]));
>> +  break;
> 
> You shouldn't allow those for signed comparisons, that will only hide
> problems.

OK, moved into vec_cmpu*.

> 
> You can do all the rest with some iterator / code attribute?  Or two cases,
> one for the codes that need ops 2 and 3 swapped, one for the rest?
> 

Sorry, I tried to use code attributes here but failed.  I think the reason is 
the
pattern name doesn't have .  I can only get the code from operand 1, then
have to use "switch case"?  I can change it with one more define_expand, but
is that what we wanted?  It looks we still need "case"s.

define_expand "vec_cmp"
...
{...
enum rtx_code code = GET_CODE (operands[1]);
switch (code)
  case GT:
  ... gen_vec_cmpgt
  ...
}

define_expand "vec_cmp"
  ... gen_vector_


>> +;; For unsigned integer vectors comparison.
>> +(define_expand "vec_cmpu"
>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>> +(match_operator 1 "comparison_operator"
>> +  [(match_operand:VEC_I 2 "vint_operand")
>> +   (match_operand:VEC_I 3 "vint_operand")]))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>> +{
>> +  emit_insn (gen_vec_cmp (operands[0], operands[1],
>> +  

Re: [PATCH rs6000]Fix PR92132

2019-11-01 Thread Segher Boessenkool
On Tue, Oct 29, 2019 at 01:16:53PM +0800, Kewen.Lin wrote:
>   (vcond_mask_): New expand.

Say for which mode please?  Like
(vcond_mask_ for VEC_I and VEC_I): New expand.

>   (vcond_mask_): Likewise.

"for VEC_I and VEC_F", here, but the actual names in the pattern are for
vector modes of same-size integer elements.  Maybe it is clear enough like
this, dunno.

>   (vector_{ungt,unge,unlt,unle}): Likewise.

Never use wildcards (or shell expansions) in the "what changed" part of a
changelog, because people try to search for that.

>  ;; 128-bit one's complement
> -(define_insn_and_split "*one_cmpl3_internal"
> +(define_insn_and_split "one_cmpl3_internal"

Instead, rename it to "one_cmpl3" and delete the define_expand that
serves no function?

> +(define_code_iterator fpcmpun [ungt unge unlt unle])

Why these four?  Should there be more?  Should this be added to some
existing iterator?

It's not all comparisons including unordered, there are uneq, unordered
itself, and ne as well.

> +;; Same mode for condition true/false values and predicate operand.
> +(define_expand "vcond_mask_"
> +  [(match_operand:VEC_I 0 "vint_operand")
> +   (match_operand:VEC_I 1 "vint_operand")
> +   (match_operand:VEC_I 2 "vint_operand")
> +   (match_operand:VEC_I 3 "vint_operand")]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> +{
> +  emit_insn (gen_vector_select_ (operands[0], operands[2], operands[1],
> +   operands[3]));
> +  DONE;
> +})

So is this exactly the same as vsel/xxsel?

> +;; For signed integer vectors comparison.
> +(define_expand "vec_cmp"

> +case GEU:
> +  emit_insn (
> + gen_vector_nltu (operands[0], operands[2], operands[3], tmp));
> +  break;
> +case GTU:
> +  emit_insn (gen_vector_gtu (operands[0], operands[2], 
> operands[3]));
> +  break;
> +case LEU:
> +  emit_insn (
> + gen_vector_ngtu (operands[0], operands[2], operands[3], tmp));
> +  break;
> +case LTU:
> +  emit_insn (gen_vector_gtu (operands[0], operands[3], 
> operands[2]));
> +  break;

You shouldn't allow those for signed comparisons, that will only hide
problems.

You can do all the rest with some iterator / code attribute?  Or two cases,
one for the codes that need ops 2 and 3 swapped, one for the rest?

> +;; For unsigned integer vectors comparison.
> +(define_expand "vec_cmpu"
> +  [(set (match_operand:VEC_I 0 "vint_operand")
> + (match_operator 1 "comparison_operator"
> +   [(match_operand:VEC_I 2 "vint_operand")
> +(match_operand:VEC_I 3 "vint_operand")]))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> +{
> +  emit_insn (gen_vec_cmp (operands[0], operands[1],
> +   operands[2], operands[3]));
> +  DONE;
> +})

unsigned_comparison_operator?

Why *are* there separate vec_cmp and vec_cmpu patterns, in the first place?


Segher


Re: [PATCH rs6000]Fix PR92132

2019-10-28 Thread Kewen.Lin
Fixed one place without consistent mode. 

Bootstrapped and regress testing passed on powerpc64le-linux.


Thanks!
Kewen

---

gcc/ChangeLog

2019-10-25  Kewen Lin  

PR target/92132
* config/rs6000/rs6000.md (one_cmpl3_internal): Expose name.
* config/rs6000/vector.md (fpcmpun): New code_iterator.
(vcond_mask_): New expand.
(vcond_mask_): Likewise.
(vec_cmp): Likewise.
(vec_cmpu): Likewise.
(vec_cmp): Likewise.
(vector_{ungt,unge,unlt,unle}): Likewise.
(vector_uneq): Expose name.
(vector_ltgt): Likewise.
(vector_unordered): Likewise.
(vector_ordered): Likewise.

gcc/testsuite/ChangeLog

2019-10-25  Kewen Lin  

PR target/92132
* gcc.target/powerpc/pr92132-fp-1.c: New test.
* gcc.target/powerpc/pr92132-fp-2.c: New test.
* gcc.target/powerpc/pr92132-int-1.c: New test.
* gcc.target/powerpc/pr92132-int-2.c: New test.



diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d0cca1e..2a68548 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6800,7 +6800,7 @@
 (const_string "16")))])
 
 ;; 128-bit one's complement
-(define_insn_and_split "*one_cmpl3_internal"
+(define_insn_and_split "one_cmpl3_internal"
   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
(not:BOOL_128
  (match_operand:BOOL_128 1 "vlogical_operand" "")))]
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 886cbad..0ef64eb 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -107,6 +107,8 @@
 (smin "smin")
 (smax "smax")])
 
+(define_code_iterator fpcmpun [ungt unge unlt unle])
+
 
 ;; Vector move instructions.  Little-endian VSX loads and stores require
 ;; special handling to circumvent "element endianness."
@@ -493,6 +495,241 @@
 FAIL;
 })
 
+;; To support vector condition vectorization, define vcond_mask and vec_cmp.
+
+;; Same mode for condition true/false values and predicate operand.
+(define_expand "vcond_mask_"
+  [(match_operand:VEC_I 0 "vint_operand")
+   (match_operand:VEC_I 1 "vint_operand")
+   (match_operand:VEC_I 2 "vint_operand")
+   (match_operand:VEC_I 3 "vint_operand")]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  emit_insn (gen_vector_select_ (operands[0], operands[2], operands[1],
+ operands[3]));
+  DONE;
+})
+
+;; Condition true/false values are float but predicate operand is of
+;; type integer vector with same element size.
+(define_expand "vcond_mask_"
+  [(match_operand:VEC_F 0 "vfloat_operand")
+   (match_operand:VEC_F 1 "vfloat_operand")
+   (match_operand:VEC_F 2 "vfloat_operand")
+   (match_operand: 3 "vint_operand")]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  emit_insn (gen_vector_select_ (operands[0], operands[2], operands[1],
+ gen_lowpart (mode, operands[3])));
+  DONE;
+})
+
+;; For signed integer vectors comparison.
+(define_expand "vec_cmp"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (match_operator 1 "comparison_operator"
+ [(match_operand:VEC_I 2 "vint_operand")
+  (match_operand:VEC_I 3 "vint_operand")]))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (mode);
+  switch (code)
+{
+case NE:
+  emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
+  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  break;
+case EQ:
+  emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
+  break;
+case GE:
+  emit_insn (
+   gen_vector_nlt (operands[0], operands[2], operands[3], tmp));
+  break;
+case GT:
+  emit_insn (gen_vector_gt (operands[0], operands[2], operands[3]));
+  break;
+case LE:
+  emit_insn (
+   gen_vector_ngt (operands[0], operands[2], operands[3], tmp));
+  break;
+case LT:
+  emit_insn (gen_vector_gt (operands[0], operands[3], operands[2]));
+  break;
+case GEU:
+  emit_insn (
+   gen_vector_nltu (operands[0], operands[2], operands[3], tmp));
+  break;
+case GTU:
+  emit_insn (gen_vector_gtu (operands[0], operands[2], operands[3]));
+  break;
+case LEU:
+  emit_insn (
+   gen_vector_ngtu (operands[0], operands[2], operands[3], tmp));
+  break;
+case LTU:
+  emit_insn (gen_vector_gtu (operands[0], operands[3], operands[2]));
+  break;
+default:
+  gcc_unreachable ();
+  break;
+}
+  DONE;
+})
+
+;; For unsigned integer vectors comparison.
+(define_expand "vec_cmpu"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (match_operator 1 "comparison_operator"
+ [(match_operand:VEC_I 2 "vint_operand")
+  (match_operand:VEC_I 3 "vint_operand")]))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  em