Hi,

Here is a simple test-case to reproduce the 176.gcc failure (I ran it on a Haswell machine).
Using the 20160819 compiler build we get:

gcc -O3 -m32 -mavx2 test.c -o test.ref.exe
/users/ysrumyan/isse_6866$ ./test.ref.exe
Aborted (core dumped)

If I apply the patch proposed by Patrick, the test runs properly.
Instead of running the test, we can check the number of jump threads.

2016-08-19 12:25 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Fri, Aug 19, 2016 at 1:06 AM, Patrick Palka <patr...@parcs.ath.cx> wrote:
>> On Thu, 18 Aug 2016, Richard Biener wrote:
>>
>>> On August 18, 2016 8:25:18 PM GMT+02:00, Patrick Palka
>>> <patr...@parcs.ath.cx> wrote:
>>> >In comment #5 Yuri reports that r235653 introduces a runtime failure
>>> >for
>>> >176.gcc which I guess is caused by the combining step in
>>> >simplify_control_stmt_condition_1() not behaving properly on operands
>>> >of
>>> >type VECTOR_TYPE. I'm a bit stumped as to why it mishandles
>>> >VECTOR_TYPEs because the logic should be generic enough to support them
>>> >as well. But it was confirmed that restricting the combining step to
>>> >operands of scalar type fixes the runtime failure so here is a patch
>>> >that does this. Does this look OK to commit after bootstrap +
>>> >regtesting on x86_64-pc-linux-gnu?
>>>
>>> Hum, I'd rather understand what is going wrong. Can you at least isolate a
>>> testcase?
>>>
>>> Richard.
>>
>> I don't have access to the SPEC benchmarks unfortunately. Maybe Yuri
>> can isolate a test case?
>>
>> But I think I found a theoretical bug which may or may not coincide with
>> the bug that Yuri is observing. The part of the combining step that may
>> provide wrong results for VECTOR_TYPEs is the one that simplifies the
>> conditional (A & B) != 0 to true when given that A != 0 and B != 0 and
>> given that their TYPE_PRECISION is 1.
>>
>> The TYPE_PRECISION test was intended to succeed only on scalars, but
>> IIUC it accidentally succeeds on one-dimensional vectors too. So we may
>> be wrongly simplifying X & Y != <0> to true given that e.g. X == <8>
>> and Y == <2>. So this simplification should probably be restricted to
>> integral types like so:
>>
>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> index 170e456..b8c8b70 100644
>> --- a/gcc/tree-ssa-threadedge.c
>> +++ b/gcc/tree-ssa-threadedge.c
>> @@ -648,14 +648,17 @@ simplify_control_stmt_condition_1 (edge e,
>>        if (res1 != NULL_TREE && res2 != NULL_TREE)
>>          {
>>            if (rhs_code == BIT_AND_EXPR
>> +              && INTEGRAL_TYPE_P (TREE_TYPE (op0))
>>                && TYPE_PRECISION (TREE_TYPE (op0)) == 1
>
> you can use element_precision (op0) == 1 instead.
>
> Richard.
>
>>                && integer_nonzerop (res1)
>>                && integer_nonzerop (res2))
>> --
>> 2.9.3.650.g20ba99f
>>
>> Hope this makes sense.
>>
>>>
>>> >gcc/ChangeLog:
>>> >
>>> > PR tree-optimization/71077
>>> > * tree-ssa-threadedge.c (simplify_control_stmt_condition_1):
>>> > Perform the combining step only if the operands have an integral
>>> > or a pointer type.
>>> >---
>>> > gcc/tree-ssa-threadedge.c | 3 +++
>>> > 1 file changed, 3 insertions(+)
>>> >
>>> >diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>>> >index 170e456..a97c00c 100644
>>> >--- a/gcc/tree-ssa-threadedge.c
>>> >+++ b/gcc/tree-ssa-threadedge.c
>>> >@@ -577,6 +577,9 @@ simplify_control_stmt_condition_1 (edge e,
>>> >   if (handle_dominating_asserts
>>> >       && (cond_code == EQ_EXPR || cond_code == NE_EXPR)
>>> >       && TREE_CODE (op0) == SSA_NAME
>>> >+      /* ??? Vector types are mishandled here. */
>>> >+      && (INTEGRAL_TYPE_P (TREE_TYPE (op0))
>>> >+          || POINTER_TYPE_P (TREE_TYPE (op0)))
>>> >       && integer_zerop (op1))
>>> >     {
>>> >       gimple *def_stmt = SSA_NAME_DEF_STMT (op0);

typedef unsigned int ui;
ui x[32*32];
ui y[32];
ui z[32];

void __attribute__ ((noinline, noclone)) foo (ui n, ui z)
{
  ui i, b;
  ui v;
  for (i = 0; i < n; i++)
    {
      v = y[i];
      if (v)
        {
          for (b = 0; b < 32; b++)
            if ((v >> b) & 1)
              x[i*32 + b] = z;
          y[i] = 0;
        }
    }
}

int main()
{
  int i;
  unsigned int val;
  for (i = 0; i < 32; i++)
    {
      val = 1 << i;
      y[i] = (i & 1) ? 0 : val;
      z[i] = i;
    }
  foo (32, 10);
  for (i = 0; i < 1024; i += 66)
    if (x[i] != 10)
      __builtin_abort ();
  return 0;
}