On Fri, Jul 28, 2017 at 3:15 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> "Bin.Cheng" <amker.ch...@gmail.com> writes:
>> On Fri, Jul 28, 2017 at 12:55 PM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> Bin Cheng <bin.ch...@arm.com> writes:
>>>> Hi,
>>>> This simple patch fixes the ICE by adding LTGT in
>>>> vec_cmp<mode><v_cmp_result> pattern.
>>>> I also modified the original test case into a compilation one since
>>>> -fno-wrapping-math
>>>> should not be used in general.
>>>> Bootstrap and test on AArch64, test result check for x86_64.  Is it OK?
>>>> I would also need to
>>>> backport it to gcc-7-branch.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-07-27  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>       PR target/81228
>>>>       * config/aarch64/aarch64-simd.md (vec_cmp<mode><v_cmp_result>): Add
>>>>       LTGT.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2017-07-27  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>       PR target/81228
>>>>       * gcc.dg/pr81228.c: New.
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>>>> b/gcc/config/aarch64/aarch64-simd.md
>>>> index 011fcec0..9cd67a2 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -2524,6 +2524,7 @@
>>>>      case EQ:
>>>>        comparison = gen_aarch64_cmeq<mode>;
>>>>        break;
>>>> +    case LTGT:
>>>>      case UNEQ:
>>>>      case ORDERED:
>>>>      case UNORDERED:
>>>> @@ -2571,6 +2572,7 @@
>>>>        emit_insn (comparison (operands[0], operands[2], operands[3]));
>>>>        break;
>>>>
>>>> +    case LTGT:
>>>>      case UNEQ:
>>>>        /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>>>        this result will then give us (a == b || a UNORDERED b).  */
>>>> @@ -2578,7 +2580,8 @@
>>>>                                        operands[2], operands[3]));
>>>>        emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>>>        emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>>> -      emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>> +      if (code == UNEQ)
>>>> +     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>>>        break;
>>>
>>> AFAIK this is still a grey area, but I think (ltgt x y) is supposed to
>>> be a trapping operation, i.e. it's closer to (ior (lt x y) (gt x y))
>>> than (not (uneq x y)).  See e.g. the handling in may_trap_p_1, where
>>> LTGT is handled like LT and GT rather than like UNEQ.
>>>
>>> See also: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html
>> Thanks for pointing me to this, I don't know anything about floating point 
>> here.
>> As for the change, the code now looks like:
>>
>>     case LTGT:
>>     case UNEQ:
>>       /* We first check (a > b ||  b > a) which is !UNEQ, inverting
>>      this result will then give us (a == b || a UNORDERED b).  */
>>       emit_insn (gen_aarch64_cmgt<mode> (operands[0],
>>                      operands[2], operands[3]));
>>       emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
>>       emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
>>       if (code == UNEQ)
>>     emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>>       break;
>>
>> So (a > b || b > a) is generated for LTGT which you suggested?
>
> Ah, yeah, I was just going off LTGT being treated as !UNEQ, but...
>
>> Here we invert the result for UNEQ though.
>
> ...it looks like it might be the UNEQ code that's wrong.  E.g. this
> test fails at -O3 and passes at -O for me:
>
> #define _GNU_SOURCE
> #include <fenv.h>
>
> double x[16], y[16];
> int res[16];
>
> int
> main (void)
> {
>   for (int i = 0; i < 16; ++i)
>     {
>       x[i] = __builtin_nan ("");
>       y[i] = i;
>     }
>   asm volatile ("" ::: "memory");
>   feclearexcept (FE_ALL_EXCEPT);
>   for (int i = 0; i < 16; ++i)
>     res[i] = __builtin_islessgreater (x[i], y[i]);
>   asm volatile ("" ::: "memory");
>   return fetestexcept (FE_ALL_EXCEPT) != 0;
> }
>
> (asm volatiles just added for paranoia, in case stuff gets optimised
> away otherwise.)
Thanks for the test, I file PR81647 for tracking.  And this is
actually inconsistent LTGT behavior issue.  It's translated
differently w/o vectorization I think.

Thanks,
bin
>
> But I suppose that's no reason to hold up your patch. :-)  Maybe it'd
> be worth having a comment though?
>
> Thanks,
> Richard

Reply via email to