Hi Tom,

> On 03/19/2018 10:11 AM, Richard Biener wrote:
>> On Fri, 16 Mar 2018, Tom de Vries wrote:
>>
>>> On 03/16/2018 12:55 PM, Richard Biener wrote:
>>>> On Fri, 16 Mar 2018, Tom de Vries wrote:
>>>>
>>>>> On 02/27/2018 01:42 PM, Richard Biener wrote:
>>>>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (nonexistent)
>>>>>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (working copy)
>>>>>> @@ -0,0 +1,15 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O3 -fdump-tree-optimized" } */
>>>>>> +
>>>>>> +int foo()
>>>>>> +{
>>>>>> +  int a[10];
>>>>>> +  for(int i = 0; i < 10; ++i)
>>>>>> +    a[i] = i*i;
>>>>>> +  int res = 0;
>>>>>> +  for(int i = 0; i < 10; ++i)
>>>>>> +    res += a[i];
>>>>>> +  return res;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
>>>>>
>>>>> This fails for nvptx, because it doesn't have the required vector
>>>>> operations.
>>>>> To fix the fail, I've added requiring effective target vect_int_mult.
>>>>
>>>> On targets that do not vectorize you should see the scalar loops unrolled
>>>> instead.  Or do you have only one loop vectorized?
>>>
>>> Sort of. Loop vectorization has no effect, and the scalar loops are
>>> completely
>>> unrolled. But then slp vectorization vectorizes the stores.
>>>
>>> So at optimized we have:
>>> ...
>>>   MEM[(int *)&a] = { 0, 1 };
>>>   MEM[(int *)&a + 8B] = { 4, 9 };
>>>   MEM[(int *)&a + 16B] = { 16, 25 };
>>>   MEM[(int *)&a + 24B] = { 36, 49 };
>>>   MEM[(int *)&a + 32B] = { 64, 81 };
>>>   _6 = a[0];
>>>   _28 = a[1];
>>>   res_29 = _6 + _28;
>>>   _35 = a[2];
>>>   res_36 = res_29 + _35;
>>>   _42 = a[3];
>>>   res_43 = res_36 + _42;
>>>   _49 = a[4];
>>>   res_50 = res_43 + _49;
>>>   _56 = a[5];
>>>   res_57 = res_50 + _56;
>>>   _63 = a[6];
>>>   res_64 = res_57 + _63;
>>>   _70 = a[7];
>>>   res_71 = res_64 + _70;
>>>   _77 = a[8];
>>>   res_78 = res_71 + _77;
>>>   _2 = a[9];
>>>   res_11 = _2 + res_78;
>>>   a ={v} {CLOBBER};
>>>   return res_11;
>>> ...
>>>
>>> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
>>> we have:
>>> ...
>>> .visible .func (.param.u32 %value_out) foo
>>> {
>>>         .reg.u32 %value;
>>>         .local .align 16 .b8 %frame_ar[48];
>>>         .reg.u64 %frame;
>>>         cvta.local.u64 %frame, %frame_ar;
>>>         mov.u32 %value, 285;
>>>         st.param.u32 [%value_out], %value;
>>>         ret;
>>> }
>>> ...
>>>
>>>> That's precisely
>>>> what the PR was about... which means it isn't fixed for nvptx :/
>>>
>>> Indeed the assembly is not optimal, and would be optimal if we'd have
>>> optimal
>>> code at optimized.
>>>
>>> FWIW, using this patch we generate optimal code at optimized:
>>> ...
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 3ebcfc30349..6b64f600c4a 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>>>        NEXT_PASS (pass_tracer);
>>>        NEXT_PASS (pass_thread_jumps);
>>>        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
>>> +      NEXT_PASS (pass_fre);
>>>        NEXT_PASS (pass_strlen);
>>>        NEXT_PASS (pass_thread_jumps);
>>>        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>>> ...
>>>
>>> and we get:
>>> ...
>>> .visible .func (.param.u32 %value_out) foo
>>> {
>>>         .reg.u32 %value;
>>>         mov.u32 %value, 285;
>>>         st.param.u32 [%value_out], %value;
>>>         ret;
>>> }
>>> ...
>>>
>>> I could file a missing optimization PR for nvptx, but I'm not sure where
>>> this
>>> should be fixed.
>>
>> Ah, yeah...  the usual issue then.
>>
>> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
>>
>
> Done.
>
> Committed at attached.

this caused the test to FAIL on 64-bit (only) sparc-sun-solaris2.11:

FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"

where it was UNSUPPORTED before.  The dump has

;; Function foo (foo, funcdef_no=0, decl_uid=1557, cgraph_uid=0, symbol_order=0)

foo ()
{
  int res;
  int a[10];
  int _2;
  int _6;
  int _28;
  int _35;
  int _42;
  int _49;
  int _56;
  int _63;
  int _70;
  int _77;

  <bb 2> [local count: 97603132]:
  MEM[(int *)&a] = { 0, 1 };
  MEM[(int *)&a + 8B] = { 4, 9 };
  MEM[(int *)&a + 16B] = { 16, 25 };
  MEM[(int *)&a + 24B] = { 36, 49 };
  MEM[(int *)&a + 32B] = { 64, 81 };
  _6 = a[0];
  _28 = a[1];
  res_29 = _6 + _28;
  _35 = a[2];
  res_36 = res_29 + _35;
  _42 = a[3];
  res_43 = res_36 + _42;
  _49 = a[4];
  res_50 = res_43 + _49;
  _56 = a[5];
  res_57 = res_50 + _56;
  _63 = a[6];
  res_64 = res_57 + _63;
  _70 = a[7];
  res_71 = res_64 + _70;
  _77 = a[8];
  res_78 = res_71 + _77;
  _2 = a[9];
  res_11 = _2 + res_78;
  a ={v} {CLOBBER};
  return res_11;

}

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University