[Bug libgcc/71559] ICE in ix86_fp_cmp_code_to_pcmp_immediate, at config/i386/i386.c:23042 (KNL/AVX512)

2016-06-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #12 from Uroš Bizjak  ---
(In reply to jos...@codesourcery.com from comment #11)

> There are definitely bugs on some architectures involving ordinary ordered 
> comparisons such as < and >= wrongly using quiet instructions.  See bug 
> 52451 for i386 (x87 floating point) and bug 58684 for powerpc, for 
> example.  A consequence of this is that if you add tests of comparisons 
> doing the right thing, some of those tests would immediately fail on some 
> architectures.

PR 52451 involves x87 compares as well as SSE compares. The invalid (?) compare
mode for the x86 target comes from config/i386/i386.c:

--cut here--
/* Figure out whether to use ordered or unordered fp comparisons.
   Return the appropriate mode to use.  */

machine_mode
ix86_fp_compare_mode (enum rtx_code)
{
  /* ??? In order to make all comparisons reversible, we do all comparisons
     non-trapping when compiling for IEEE.  Once gcc is able to distinguish
     all forms trapping and nontrapping comparisons, we can make inequality
     comparisons trapping again, since it results in better code when using
     FCOM based compares.  */
  return TARGET_IEEE_FP ? CCFPUmode : CCFPmode;
}
--cut here--

The "Once gcc is able ..." part of the comment implies that simply returning a
different mode based on the incoming rtx code argument won't work. OTOH, this
is an ancient comment, so things *could* work now.

Regarding Jakub's observation in Comment #9: as mentioned by HJ, the wrong
mode is due to the above function (PR 37158).
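To make the idea concrete, here is a minimal standalone sketch of a per-code variant of ix86_fp_compare_mode. The enums, names and policy are hypothetical (they are not GCC's rtx_code, machine_mode or TARGET_IEEE_FP), and, as the ??? comment warns, GCC may not cope with mixing the two modes; the sketch only shows which codes would get the trapping CCFPmode and which the non-trapping CCFPUmode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's rtx_code values and the two CC modes;
   these are NOT GCC's real declarations.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE,
                CMP_UNEQ, CMP_LTGT, CMP_UNLT, CMP_UNLE, CMP_UNGT,
                CMP_UNGE, CMP_ORDERED, CMP_UNORDERED };
enum cc_mode { MODE_CCFP,    /* trapping compare (comiss / fcom)        */
               MODE_CCFPU }; /* non-trapping compare (ucomiss / fucom)  */

/* Sketch: pick the mode from the comparison code instead of returning one
   mode for everything.  Under IEEE semantics only the ordered inequalities
   must raise FE_INVALID on a quiet NaN, so only they get the trapping mode.
   LTGT is left quiet here, although its intended semantics are ambiguous
   (see comment #11).  */
enum cc_mode
fp_compare_mode_sketch (enum cmp_code code, bool ieee_fp)
{
  assert (code <= CMP_UNORDERED);
  if (!ieee_fp)
    return MODE_CCFP;
  switch (code)
    {
    case CMP_LT: case CMP_LE: case CMP_GT: case CMP_GE:
      return MODE_CCFP;
    default:
      return MODE_CCFPU;
    }
}
```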

> (These sorts of local bugs with particular operations or optimizations 
> being incorrect regarding exceptions are certainly easier to fix than the 
> issues with optimizations not being aware of exceptions and rounding modes 
> as extra inputs / outputs to floating-point operations.  The ones with 
> individual operations could I expect largely be found through thorough 
> test coverage; those with optimizations might be harder to find.)
> 
> Note that there is some ambiguity about whether LTGT RTL (and 
> corresponding GENERIC / GIMPLE) should be a quiet operation corresponding 
> to islessgreater, or ((x < y) || (x > y)) raising exceptions for quiet 
> NaNs.  See the discussion at 
> .  Fixing 
> the ambiguity in either direction would probably involve changes to the 
> part of the compiler expecting the other semantics.

2016-06-17 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #11 from joseph at codesourcery dot com  ---
On Fri, 17 Jun 2016, jakub at gcc dot gnu.org wrote:

> The patch is completely untested though (and I wonder if we have testcases for
> not raising exceptions when isgreater etc. arguments are qNaNs).

We probably don't have such tests - tests for exception raising or its 
absence are fairly limited.  Those functions are covered in the glibc 
testsuite, but of course that only covers whichever insn patterns get used 
for __builtin_is* for that particular build of the glibc tests.

There are definitely bugs on some architectures involving ordinary ordered 
comparisons such as < and >= wrongly using quiet instructions.  See bug 
52451 for i386 (x87 floating point) and bug 58684 for powerpc, for 
example.  A consequence of this is that if you add tests of comparisons 
doing the right thing, some of those tests would immediately fail on some 
architectures.

(These sorts of local bugs with particular operations or optimizations 
being incorrect regarding exceptions are certainly easier to fix than the 
issues with optimizations not being aware of exceptions and rounding modes 
as extra inputs / outputs to floating-point operations.  The ones with 
individual operations could I expect largely be found through thorough 
test coverage; those with optimizations might be harder to find.)

Note that there is some ambiguity about whether LTGT RTL (and 
corresponding GENERIC / GIMPLE) should be a quiet operation corresponding 
to islessgreater, or ((x < y) || (x > y)) raising exceptions for quiet 
NaNs.  See the discussion at 
.  Fixing 
the ambiguity in either direction would probably involve changes to the 
part of the compiler expecting the other semantics.
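The two readings of LTGT can be written side by side in C. This is a minimal sketch and the function names are hypothetical. Both functions return the same value for every input, NaNs included, so a test that only checks results cannot tell them apart; only the FE_INVALID flag can, and whether the second one actually raises it depends on the target emitting a signalling compare, which is exactly what bugs 52451 and 58684 get wrong.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Reading 1: LTGT as the C99 islessgreater macro, which must not raise
   FE_INVALID for quiet NaN operands.  */
bool
ltgt_quiet (double x, double y)
{
  return islessgreater (x, y);
}

/* Reading 2: LTGT as two ordered comparisons, each of which should raise
   FE_INVALID when an operand is a quiet NaN.  */
bool
ltgt_signaling (double x, double y)
{
  return (x < y) || (x > y);
}
```

Both assume a build without -ffast-math or -ffinite-math-only.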

2016-06-17 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #10 from H.J. Lu  ---
Is this related to PR 37158?

2016-06-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #9 from Jakub Jelinek  ---
If I compile the above testcase with -O2 -mavx -ftrapping-math, then it
generates vucomiss in each case, which seems wrong to me: for gt/ge/lt/le it
should raise exceptions, so IMHO it should use vcomiss in those cases.
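The expectation above amounts to a simple mapping from C operator to scalar compare instruction. This is a sketch only: the enum and function name are hypothetical, and the mapping restates the argument (vcomiss signals FE_INVALID for any NaN operand, vucomiss only for signalling NaNs) rather than describing what GCC currently emits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the C comparison operators.  */
enum c_cmp { C_EQ, C_NE, C_LT, C_LE, C_GT, C_GE };

/* Scalar AVX compare each operator should use with -ftrapping-math:
   vcomiss signals FE_INVALID for any NaN operand, vucomiss only for
   signalling NaNs.  */
const char *
expected_scalar_insn (enum c_cmp op)
{
  switch (op)
    {
    case C_LT: case C_LE: case C_GT: case C_GE:
      return "vcomiss";
    case C_EQ: case C_NE:
      return "vucomiss";
    }
  assert (0 && "unknown comparison");
  return NULL;
}
```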

2016-06-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

Jakub Jelinek  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com,
   ||jsm28 at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek  ---
#define N 1024
float a[N], b[N];
int c[N];

void eq () { int i; for (i = 0; i < N; i++) c[i] = a[i] == b[i]; }
void ne () { int i; for (i = 0; i < N; i++) c[i] = a[i] != b[i]; }
void gt () { int i; for (i = 0; i < N; i++) c[i] = a[i] > b[i]; }
void ge () { int i; for (i = 0; i < N; i++) c[i] = a[i] >= b[i]; }
void lt () { int i; for (i = 0; i < N; i++) c[i] = a[i] < b[i]; }
void le () { int i; for (i = 0; i < N; i++) c[i] = a[i] <= b[i]; }
void unle () { int i; for (i = 0; i < N; i++) c[i] = !__builtin_isgreater (a[i], b[i]); }
void unlt () { int i; for (i = 0; i < N; i++) c[i] = !__builtin_isgreaterequal (a[i], b[i]); }
void unge () { int i; for (i = 0; i < N; i++) c[i] = !__builtin_isless (a[i], b[i]); }
void ungt () { int i; for (i = 0; i < N; i++) c[i] = !__builtin_islessequal (a[i], b[i]); }
void uneq () { int i; for (i = 0; i < N; i++) c[i] = !__builtin_islessgreater (a[i], b[i]); }
void ordered () { int i; for (i = 0; i < N; i++) c[i] = !__builtin_isunordered (a[i], b[i]); }
void unordered () { int i; for (i = 0; i < N; i++) c[i] = __builtin_isunordered (a[i], b[i]); }

shows the various codes in vcond.
From C99 and other sources, all of
isgreater/isgreaterequal/isless/islessequal/islessgreater return false if any
argument is NaN and don't raise exceptions (except for sNaN). isunordered
returns true if and only if at least one argument is NaN, and doesn't raise
exceptions either.
The matching of the above to RTX codes has been confirmed by compiling the
above testcase.
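The value semantics quoted above can be checked in plain C. This is a minimal sketch with a hypothetical helper name; it checks results only, not exception flags, and assumes the code is not built with -ffast-math or -ffinite-math-only (both let the compiler assume NaNs never occur).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Results of the C99 comparison macros for one pair of operands.  */
struct macro_results { bool gt, ge, lt, le, lg, unord; };

/* Evaluate each macro once: isgreater, isgreaterequal, isless,
   islessequal, islessgreater, isunordered.  */
struct macro_results
c99_macros (double a, double b)
{
  struct macro_results r = {
    isgreater (a, b) != 0,     isgreaterequal (a, b) != 0,
    isless (a, b) != 0,        islessequal (a, b) != 0,
    islessgreater (a, b) != 0, isunordered (a, b) != 0
  };
  return r;
}
```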
Thus, IMNSHO the right values are:
           A > B  A < B  A = B  UNORD  SIGNAL  IMM
EQ         F      F      T      F      N       0
NE         T      T      F      T      N       4
GT         T      F      F      F      Y       0xe
GE         T      F      T      F      Y       0xd
LT         F      T      F      F      Y       1
LE         F      T      T      F      Y       2
UNLE       F      T      T      T      N       0x1a
UNLT       F      T      F      T      N       0x19
UNGE       T      F      T      T      N       0x15
UNGT       T      F      F      T      N       0x16
UNEQ       F      F      T      T      N       8
LTGT       T      T      F      F      N       0xc
ORDERED    T      T      T      F      N       7
UNORDERED  F      F      F      T      N       3

This is in sync with the 'D' stuff except for UN{LE,LT,GE,GT,EQ}, where the AVX
implementation uses the signalling instructions instead of the non-signalling
ones. Unless there is some bug in the generic code, I'd say that if one gets
UNLE for inverted isgreater, then in the above table one needs to replace all
Ts with Fs and vice versa, but keep Y and N as is (because whether the insn
raises an exception depends only on the arguments (and not even on their
order), not on whether the result is inverted (nor on whether the arguments
are swapped)).
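The table and the inversion rule can be cross-checked mechanically. This sketch assumes the VCMPPS/VCMPPD predicate encoding from the Intel SDM (bits 0-2 select the base predicate, bit 3 flips the result for unordered operands, bit 4 flips signalling vs. quiet); the struct and function names are hypothetical. Decoding each IMM reproduces the T/F columns and the SIGNAL flag of its row, and inverting the quiet GT_OQ predicate (0x1e, which matches isgreater) gives exactly the quiet UNLE entry 0x1a.

```c
#include <assert.h>
#include <stdbool.h>

/* Truth of a predicate for the four mutually exclusive outcomes of
   comparing A with B, plus whether it signals on quiet NaNs
   (same column order as the table: A>B, A<B, A=B, UNORD, SIGNAL).  */
struct pred { bool gt, lt, eq, unord, signal; };

/* Decode a VCMPPS/VCMPPD immediate (0..0x1f), assuming the SDM encoding.  */
struct pred
decode_vcmp_imm (unsigned imm)
{
  /* Base predicates EQ, LT, LE, UNORD, NEQ, NLT, NLE, ORD as
     {gt, lt, eq, unord}.  */
  static const bool base[8][4] = {
    {0, 0, 1, 0}, {0, 1, 0, 0}, {0, 1, 1, 0}, {0, 0, 0, 1},
    {1, 1, 0, 1}, {1, 0, 1, 1}, {1, 0, 0, 1}, {1, 1, 1, 0}
  };
  /* LT, LE, NLT and NLE signal; EQ, UNORD, NEQ and ORD are quiet.  */
  static const bool base_signal[8] = {0, 1, 1, 0, 0, 1, 1, 0};

  assert (imm < 32);
  unsigned rel = imm & 7;
  struct pred p = { base[rel][0], base[rel][1], base[rel][2], base[rel][3],
                    base_signal[rel] };
  if (imm & 0x8)
    p.unord = !p.unord;     /* bit 3: flip the unordered result */
  if (imm & 0x10)
    p.signal = !p.signal;   /* bit 4: flip signalling vs. quiet */
  return p;
}

/* Inverting a comparison result flips every truth value but keeps the
   signalling behaviour, which depends only on the operands.  */
struct pred
invert_pred (struct pred p)
{
  struct pred q = { !p.gt, !p.lt, !p.eq, !p.unord, p.signal };
  return q;
}

bool
pred_equal (struct pred a, struct pred b)
{
  return a.gt == b.gt && a.lt == b.lt && a.eq == b.eq
         && a.unord == b.unord && a.signal == b.signal;
}
```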

So I'd expect something like:
--- gcc/config/i386/i386.c.jj   2016-06-16 21:00:08.0 +0200
+++ gcc/config/i386/i386.c  2016-06-17 19:35:52.237836780 +0200
@@ -17628,7 +17628,7 @@ ix86_print_operand (FILE *file, rtx x, i
case UNEQ:
  if (TARGET_AVX)
{
- fputs ("eq_us", file);
+ fputs ("eq_uq", file);
  break;
}
case EQ:
@@ -17637,7 +17637,7 @@ ix86_print_operand (FILE *file, rtx x, i
case UNLT:
  if (TARGET_AVX)
{
- fputs ("nge", file);
+ fputs ("nge_uq", file);
  break;
}
case LT:
@@ -17646,7 +17646,7 @@ ix86_print_operand (FILE *file, rtx x, i
case UNLE:
  if (TARGET_AVX)
{
- fputs ("ngt", file);
+ fputs ("ngt_uq", file);
  break;
}
case LE:
@@ -17671,7 +17671,10 @@ ix86_print_operand (FILE *file, rtx x, i
  break;
}
case UNGE:
- fputs ("nlt", file);
+ if (TARGET_AVX)
+   fputs ("nlt_uq", file);
+ else
+   fputs ("nlt", file);
  break;
case GT:
  if (TARGET_AVX)
@@ -17680,7 +17683,10 @@ ix86_print_operand (FILE *file, rtx x, i
  break;
}
case UNGT:
- fputs ("nle", file);
+ if (TARGET_AVX)
+   fputs ("nle_uq", 

2016-06-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #7 from Jakub Jelinek  ---
I've created the table just by walking through the 'D' handling.
That said, looking e.g. at
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
for the C comparisons (EQ, NE, LT, LE, GT, GE), EQ and NE should not raise
FE_INVALID for quiet NaN operands, while LT, LE, GT and GE should.
And if one or both operands are NaNs, then NE must be true, while EQ, LT, LE,
GT and GE must be false.
So, at least for these, the 'D' numbers in the table are right, i.e.
EQ 0, NE 4, GT 0xe, GE 0xd, LT 1, LE 2.  Let me investigate the other
comparison codes.
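The value half of these rules is easy to check in C; the exception half needs <fenv.h> and depends on the very code generation under discussion, so it is left out. This is a minimal sketch: the helper name is hypothetical, and the code assumes a build without -ffast-math or -ffinite-math-only.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Results of the six C comparison operators for one pair of operands.  */
struct op_results { bool eq, ne, lt, le, gt, ge; };

struct op_results
c_operators (double a, double b)
{
  struct op_results r = { a == b, a != b, a < b, a <= b, a > b, a >= b };
  return r;
}
```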

2016-06-17 Thread tripiana at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #6 from Carlos Tripiana Montes  ---
Hope you guys can do that and fix this problem soon. It would be much appreciated,
as we (at Barcelona Supercomputing Center, Spain) are doing research on KNL and
need the fastest and most robust code we can obtain.

Thanks in advance.

(In reply to Ilya Enkovich from comment #5)
> (In reply to Jakub Jelinek from comment #3)
> > I'd say it is a bug in ix86_fp_cmp_code_to_pcmp_immediate that it handles
> > only a small portion of the FP comparison codes, while VCMPP[SD]
> > instructions should be able to handle everything needed.
> > But, I'm also surprised where the values in that function come from.
> > Looking at the D modifier expansion in i386.c that is used for AVX vcmp, I
> > see that:
> > code       %D3 emits  corresponding imm  ix86_fp_cmp_code_to_pcmp_immediate
> > UNEQ       eq_us      0x18               ICE
> > EQ         eq         0                  8
> > UNLT       nge        9                  ICE
> > LT         lt         1                  0x19
> > UNLE       ngt        0xa                ICE
> > LE         le         2                  0x1a
> > UNORDERED  unord      3                  ICE
> > LTGT       neq_oq     0xc                ICE
> > NE         neq        4                  4
> > GE         ge         0xd                0x15
> > UNGE       nlt        5                  ICE
> > GT         gt         0xe                0x16
> > UNGT       nle        6                  ICE
> > ORDERED    ord        7                  ICE
> > So, there is agreement only on NE and nothing else.
> 
> I took values for ix86_fp_cmp_code_to_pcmp_immediate from some table Kirill
> gave me, and it seems all values I used were unordered non-signaling variants. 
> I don't fully understand the logic of imms in this table.  Some of them are
> signalling and some of them are not.  Also why is NE unordered?  But I
> suspect there is good reason for such choice and suppose AVX512 compares
> should be put in a consistent state with AVX ones by fixing
> ix86_fp_cmp_code_to_pcmp_immediate appropriately.

2016-06-17 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

Ilya Enkovich  changed:

   What|Removed |Added

 CC||ienkovich at gcc dot gnu.org

--- Comment #5 from Ilya Enkovich  ---
(In reply to Jakub Jelinek from comment #3)
> I'd say it is a bug in ix86_fp_cmp_code_to_pcmp_immediate that it handles
> only a small portion of the FP comparison codes, while VCMPP[SD]
> instructions should be able to handle everything needed.
> But, I'm also surprised where the values in that function come from.
> Looking at the D modifier expansion in i386.c that is used for AVX vcmp, I
> see that:
> code       %D3 emits  corresponding imm  ix86_fp_cmp_code_to_pcmp_immediate
> UNEQ       eq_us      0x18               ICE
> EQ         eq         0                  8
> UNLT       nge        9                  ICE
> LT         lt         1                  0x19
> UNLE       ngt        0xa                ICE
> LE         le         2                  0x1a
> UNORDERED  unord      3                  ICE
> LTGT       neq_oq     0xc                ICE
> NE         neq        4                  4
> GE         ge         0xd                0x15
> UNGE       nlt        5                  ICE
> GT         gt         0xe                0x16
> UNGT       nle        6                  ICE
> ORDERED    ord        7                  ICE
> So, there is agreement only on NE and nothing else.

I took values for ix86_fp_cmp_code_to_pcmp_immediate from some table Kirill
gave me, and it seems all values I used were unordered non-signaling variants.  I
don't fully understand the logic of imms in this table.  Some of them are
signalling and some of them are not.  Also why is NE unordered?  But I suspect
there is good reason for such choice and suppose AVX512 compares should be put
in a consistent state with AVX ones by fixing
ix86_fp_cmp_code_to_pcmp_immediate appropriately.
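One way to see why NE is the "unordered" one is to ask what each immediate returns when an operand is NaN. This is a minimal sketch assuming the Intel SDM predicate encoding; the helper name is hypothetical. Every immediate that ix86_fp_cmp_code_to_pcmp_immediate uses (8, 0x19, 0x1a, 4, 0x15, 0x16) is true for unordered operands. That is required for NE, since x != NaN is true, but wrong for EQ, LT, LE, GE and GT, whose %D3 immediates (0, 1, 2, 0xd, 0xe) are all false in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* What a VCMPPS/VCMPPD immediate returns when an operand is NaN.
   Assumes the Intel SDM encoding: of the base predicates in bits 0-2
   (EQ, LT, LE, UNORD, NEQ, NLT, NLE, ORD), UNORD, NEQ, NLT and NLE are
   true for unordered operands, and bit 3 flips that result.  */
bool
vcmp_true_if_unordered (unsigned imm)
{
  assert (imm < 32);
  unsigned base = imm & 7;
  bool result = base >= 3 && base <= 6;
  return (imm & 0x8) ? !result : result;
}
```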

2016-06-17 Thread tripiana at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #4 from Carlos Tripiana Montes  ---
I wasn't sure but that was my intuition. But I'm afraid I don't have enough
knowledge to propose a fix.

(In reply to Jakub Jelinek from comment #3)
> I'd say it is a bug in ix86_fp_cmp_code_to_pcmp_immediate that it handles
> only a small portion of the FP comparison codes, while VCMPP[SD]
> instructions should be able to handle everything needed.
> But, I'm also surprised where the values in that function come from.
> Looking at the D modifier expansion in i386.c that is used for AVX vcmp, I
> see that:
> code       %D3 emits  corresponding imm  ix86_fp_cmp_code_to_pcmp_immediate
> UNEQ       eq_us      0x18               ICE
> EQ         eq         0                  8
> UNLT       nge        9                  ICE
> LT         lt         1                  0x19
> UNLE       ngt        0xa                ICE
> LE         le         2                  0x1a
> UNORDERED  unord      3                  ICE
> LTGT       neq_oq     0xc                ICE
> NE         neq        4                  4
> GE         ge         0xd                0x15
> UNGE       nlt        5                  ICE
> GT         gt         0xe                0x16
> UNGT       nle        6                  ICE
> ORDERED    ord        7                  ICE
> So, there is agreement only on NE and nothing else.

2016-06-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-06-17
 CC||jakub at gcc dot gnu.org,
   ||uros at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Jakub Jelinek  ---
I'd say it is a bug in ix86_fp_cmp_code_to_pcmp_immediate that it handles only
a small portion of the FP comparison codes, while VCMPP[SD] instructions should
be able to handle everything needed.
But, I'm also surprised where the values in that function come from.
Looking at the D modifier expansion in i386.c that is used for AVX vcmp, I see
that:
code       %D3 emits  corresponding imm  ix86_fp_cmp_code_to_pcmp_immediate
UNEQ       eq_us      0x18               ICE
EQ         eq         0                  8
UNLT       nge        9                  ICE
LT         lt         1                  0x19
UNLE       ngt        0xa                ICE
LE         le         2                  0x1a
UNORDERED  unord      3                  ICE
LTGT       neq_oq     0xc                ICE
NE         neq        4                  4
GE         ge         0xd                0x15
UNGE       nlt        5                  ICE
GT         gt         0xe                0x16
UNGT       nle        6                  ICE
ORDERED    ord        7                  ICE
So, there is agreement only on NE and nothing else.
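A complete mapping, so that no FP comparison code ends in an ICE, could look roughly like the switch below. It is a sketch, not a patch: the enum is a hypothetical stand-in for GCC's rtx_code, and the immediates are the ones Jakub lists in the table in comment #8, with the Intel SDM predicate names as comments.

```c
#include <assert.h>

/* Hypothetical stand-in for the FP comparison rtx codes.  */
enum fp_cmp { FP_EQ, FP_NE, FP_GT, FP_GE, FP_LT, FP_LE, FP_UNLE, FP_UNLT,
              FP_UNGE, FP_UNGT, FP_UNEQ, FP_LTGT, FP_ORDERED, FP_UNORDERED };

/* Map every FP comparison code to a VCMPPS/VCMPPD immediate instead of
   handling only a subset.  */
int
fp_cmp_code_to_vcmp_imm (enum fp_cmp code)
{
  switch (code)
    {
    case FP_EQ:        return 0x00; /* EQ_OQ   */
    case FP_NE:        return 0x04; /* NEQ_UQ  */
    case FP_GT:        return 0x0e; /* GT_OS   */
    case FP_GE:        return 0x0d; /* GE_OS   */
    case FP_LT:        return 0x01; /* LT_OS   */
    case FP_LE:        return 0x02; /* LE_OS   */
    case FP_UNLE:      return 0x1a; /* NGT_UQ  */
    case FP_UNLT:      return 0x19; /* NGE_UQ  */
    case FP_UNGE:      return 0x15; /* NLT_UQ  */
    case FP_UNGT:      return 0x16; /* NLE_UQ  */
    case FP_UNEQ:      return 0x08; /* EQ_UQ   */
    case FP_LTGT:      return 0x0c; /* NEQ_OQ  */
    case FP_ORDERED:   return 0x07; /* ORD_Q   */
    case FP_UNORDERED: return 0x03; /* UNORD_Q */
    }
  assert (0 && "unhandled comparison code");
  return -1;
}
```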

2016-06-17 Thread izamyatin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

Igor Zamyatin  changed:

   What|Removed |Added

 CC||izamyatin at gmail dot com

--- Comment #2 from Igor Zamyatin  ---
Created attachment 38715
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38715&action=edit
Raw fortran source file

The failure can be reproduced on trunk with:
gfortran -march=knl -Ofast -fno-finite-math-only prja.f

Adding Ilya to look at it.

2016-06-16 Thread tripiana at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71559

--- Comment #1 from Carlos Tripiana Montes  ---
Does anyone have a clue about this?

I've tried to follow the stack trace, and it seems like it's generating an
invalid code tag for this particular operation.

Any help would be much appreciated!