https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79487

Bug ID: 79487
Summary: Invalid _Decimal32 comparison on s390x
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: vogt at linux dot vnet.ibm.com
CC: jakub at gcc dot gnu.org, krebbel at gcc dot gnu.org
Target Milestone: ---
Host: s390x
Target: s390x

This is a finding from an Asan test case failure reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341

It may be a bug in the middle end or the backend. The failing program is:

--
extern _Decimal32 bar(_Decimal32 x);

void foo(void)
{
  int i;
#if 0 /*!!!*/
  volatile
#endif
  _Decimal32 min = (-0x7fffffffffffffffL - 1L);
  volatile _Decimal32 tem = min;

  for (i = 0; i < 999999; i++)
    {
      tem -= (_Decimal32)1.0;
      if (min != tem)
        {
          bar(tem);
          break;
        }
    }
}
--

When compiled with -O3 -march=zEC12 and with the "volatile" disabled, the
comparison "min != tem" is true in the very first pass of the loop, but bar()
is then called with a value that is identical to "(_Decimal32)min".

One cause of this is that although s390x supports 32-bit decimal floating
point in hardware, there are no instructions to do arithmetic or comparisons
on such values. The 32-bit values need to be converted to 64-bit format for
comparisons.

GCC pre-calculates the constant (-0x7fffffffffffffffL - 1L) and puts it into
the literal pool as a 64-bit quantity. At the same time, "tem" is kept in
memory as a 32-bit quantity, loaded into a register, extended to 64 bits and
then compared to the value from the literal pool. Since the latter was not
rounded to _Decimal32 precision, the comparison is always true. Making "min"
volatile circumvents the problem.
This is a (slightly shortened) diff of the assembly code with the "volatile"
enabled (-, good) and disabled (+, bad):

--
 foo:
 	ldgr	%f4,%r15
 	larl	%r5,.L8
 	lay	%r15,-168(%r15)
 	le	%f0,.L9-.L8(%r5)
-	ste	%f0,160(%r15)
+	ste	%f0,164(%r15)
 	iilf	%r1,999999
-	l	%r2,160(%r15)
-	st	%r2,164(%r15)
 .L3:
 	le	%f0,164(%r15)
 	ldetr	%f0,%f0,0		<-- extend tem to 64 bits
 	ld	%f2,.L10-.L8(%r5)
 	sdtr	%f0,%f0,%f2		<-- subtract 1 from tem
+	ld	%f2,.L11-.L8(%r5)	<-- min: 64-bit from literal pool
 	ledtr	%f0,0,%f0,0		<-- round tem to 32 bits
 	ste	%f0,164(%r15)		<-- store a copy to stack
-	le	%f2,160(%r15)		<-- min: 32-bit from stack
 	le	%f0,164(%r15)		<-- load tem from stack (32 bits)
-	ldetr	%f2,%f2,0		<-- min: extend to 64 bits
 	ldetr	%f0,%f0,0		<-- tem: extend to 64 bits
-	cdtr	%f2,%f0		<-- compare min and tem (64 bits)
+	cdtr	%f0,%f2
 	jne	.L7
 	brct	%r1,.L3
 	lgdr	%r15,%f4
 	br	%r14
 .L7:
 	le	%f0,164(%r15)
 	lgdr	%r15,%f4
 	jg	bar
 ...
 .L8:
+.L11:
+	.long	-297458820	<--- (-0x7fffffffffffffffL - 1L) as 64-bit value
+	.long	-2090241034	<-/
 .L10:
 	.long	573833216
 	.long	16
 .L9:
 	.long	-283865614	<--- (-0x7fffffffffffffffL - 1L) as 32-bit value
--

Somewhere, GCC is using 64-bit precision for _Decimal32 where it should not.
Note that this does not happen on Power, which has similar DFP instructions:
there, GCC does not store the constant with 64-bit precision in the literal
pool.