On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 10/04/2011 03:03 PM, Richard Guenther wrote: >> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <tom_devr...@mentor.com> wrote: >>> On 10/01/2011 05:46 PM, Tom de Vries wrote: >>>> On 09/30/2011 03:29 PM, Richard Guenther wrote: >>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <tom_devr...@mentor.com> >>>>> wrote: >>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote: >>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <tom_devr...@mentor.com> >>>>>>> wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> I got a patch for PR50527. >>>>>>>> >>>>>>>> The patch prevents the alignment of vla-related allocas to be set to >>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after >>>>>>>> folding >>>>>>>> the alloca. >>>>>>>> >>>>>>>> Bootstrapped and regtested on x86_64. >>>>>>>> >>>>>>>> OK for trunk? >>>>>>> >>>>>>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that >>>>>>> the vectorizer then will no longer see that the arrays are properly >>>>>>> aligned. >>>>>>> >>>>>>> I'm not sure what the best thing to do is here, other than trying to >>>>>>> record >>>>>>> the alignment requirement of the VLA somewhere. >>>>>>> >>>>>>> Forcing the alignment of the alloca replacement decl to >>>>>>> BIGGEST_ALIGNMENT >>>>>>> has the issue that it will force stack-realignment which isn't free >>>>>>> (and the >>>>>>> point was to make the decl cheaper than the alloca). But that might >>>>>>> possibly be the better choice. >>>>>>> >>>>>>> Any other thoughts? >>>>>> >>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of >>>>>> the vla >>>>>> for the new array prevents stack realignment for folded vla-allocas, >>>>>> also for >>>>>> large vlas. >>>>>> >>>>>> This will not help in vectorizing large folded vla-allocas, but I think >>>>>> it's not >>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that >>>>>> has >>>>>> been the case up until we started to fold). If you want to trigger >>>>>> vectorization >>>>>> for a vla, you can still use the aligned attribute on the declaration. >>>>>> >>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also >>>>>> without using >>>>>> an attribute on the decl. This patch exploits this by setting it at the >>>>>> end of >>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in >>>>>> propagation though, because although the ptr_info of the lhs is >>>>>> propagated via >>>>>> copy_prop afterwards, it's not propagated anymore via ccp. >>>>>> >>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of >>>>>> ccp2 and >>>>>> not fold during ccp3. >>>>> >>>>> Ugh, somehow I like this the least ;) >>>>> >>>>> How about lowering VLAs to >>>>> >>>>> p = __builtin_alloca (...); >>>>> p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); >>>>> >>>>> and not assume anything for alloca itself if it feeds a >>>>> __builtin_assume_aligned? >>>>> >>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do >>>>> >>>>> p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); >>>>> >>>>> that's less awkward to use? >>>>> >>>>> Sorry for not having a clear plan here ;) >>>>> >>>> >>>> Using assume_aligned is a more orthogonal way to represent this in gimple, >>>> but >>>> indeed harder to use. >>>> >>>> Another possibility is to add a 'tree vla_decl' field to struct >>>> gimple_statement_call, which is probably the easiest to implement. >>>> >>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I >>>> decided to try this one. Attached patch implements my first stab at this >>>> (now >>>> testing on x86_64). >>>> >>>> Is this an acceptable approach? >>>> >>> >>> bootstrapped and reg-tested (including ada) on x86_64. >>> >>> Ok for trunk? >> >> The idea is ok I think. But >> >> case BUILT_IN_ALLOCA: >> + case BUILT_IN_ALLOCA_WITH_ALIGN: >> /* If the allocation stems from the declaration of a variable-sized >> object, it cannot accumulate. */ >> target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); >> if (target) >> return target; >> + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) >> + == BUILT_IN_ALLOCA_WITH_ALIGN) >> + { >> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), >> + >> built_in_decls[BUILT_IN_ALLOCA], >> + 1, CALL_EXPR_ARG (exp, 0)); >> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); >> + exp = new_call; >> + } >> >> Ick. Why can't the rest of the compiler not handle >> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? >> (thus, arrange things so the assembler name of alloca-with-align is alloca?) >> > > We can set the assembler name in the local_define_builtin call. But that will > still create a call alloca (12, 4). How do we deal with the second argument? > >> I don't see why you still need the special late CCP pass. >> > > For alloca_with_align, the align will minimally be the 2nd argument. This is > independent of folding, and we can propagate this information in every ccp. > > If the alloca_with_align is not folded and will not be folded anymore > (something > we know at the earliest after the propagation phase of the last ccp), > the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a > normal allloca. We try to exploit this by improving the ptr_info->align.
I'd just assume the (lower) constant alignment of the argument always. > Well, that was the idea. But now I wonder, isn't it better to do this in > expand_builtin_alloca: > ... > /* Compute the argument. */ > op0 = expand_normal (CALL_EXPR_ARG (exp, 0)); > > + align = > + (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) == > + BUILT_IN_ALLOCA_WITH_ALIGN) > + ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1)) > + : BIGGEST_ALIGNMENT; > + > + > /* Allocate the desired space. */ > - result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT, > - cannot_accumulate); > + result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate); > result = convert_memory_address (ptr_mode, result); Yes, that's probably a good idea anyway - it will reduce excess stack re-alignment for VLAs. > return result; > > ... > > This way, we lower the alignment requirement for vlas in general, not only > when > folding (unless we expand to an actual alloca call). > > If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be > applicable > until expand, so we can't exploit that in the middle-end, so we lose the need > for ccp_last. Good ;) Thanks, Richard. >> -static tree >> -fold_builtin_alloca_for_var (gimple stmt) >> +static bool >> +builtin_alloca_with_align_p (gimple stmt) >> { >> - unsigned HOST_WIDE_INT size, threshold, n_elem; >> - tree lhs, arg, block, var, elem_type, array_type; >> - unsigned int align; >> + tree fndecl; >> + if (!is_gimple_call (stmt)) >> + return false; >> + >> + fndecl = gimple_call_fndecl (stmt); >> + >> + return (fndecl != NULL_TREE >> + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN); >> +} >> >> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN). >> >> + DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); >> >> Now, users might call __builtin_alloca_with_align (8, align), thus use >> a non-constant alignment argument. You either need to reject this >> in c-family/c-common.c:check_builtin_function_arguments with an >> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation >> I suppose). >> >> +DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align") >> >> Oh, I see you are not exposing this to users, so ignore the comments >> about non-constant align. Can you move this >> DEF down, near the other DEF_BUILTIN_STUBs and add a comment >> that this is used for VLAs? >> >> + build_int_cstu (size_type_node, >> + DECL_ALIGN (parm))); >> >> size_int (DECL_ALIGN (parm)), similar in other places. >> >> Let's see if there are comments from others on this - which I like. >> > > Thanks, > - Tom > >> Thanks, >> Richard. >> >>> Thanks, >>> - Tom >>> > > >>> 2011-10-03 Tom de Vries <t...@codesourcery.com> >>> >>> PR middle-end/50527 >>> * tree.c (build_common_builtin_nodes): Add local_define_builtin for >>> BUILT_IN_ALLOCA_WITH_ALIGN. >>> * builtins.c (expand_builtin_alloca): Handle >>> BUILT_IN_ALLOCA_WITH_ALIGN >>> arglist. >>> (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. Expand >>> BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA. >>> (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. >>> * tree-pass.h (pass_ccp_last): Declare. >>> * passes.c (init_optimization_passes): Replace 3rd pass_ccp with >>> pass_ccp_last. >>> * gimplify.c (gimplify_vla_decl): Lower vla to >>> BUILT_IN_ALLOCA_WITH_ALIGN. >>> * tree-ssa-ccp.c (cpp_last): Define. >>> (ccp_finalize): Improve align ptr_info for >>> BUILT_IN_ALLOCA_WITH_ALIGN. >>> (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN. >>> (builtin_alloca_with_align_p): New function. >>> (fold_builtin_alloca_with_align_p): New function, factored out of ... >>> (fold_builtin_alloca_for_var): Renamed to ... >>> (fold_builtin_alloca_with_align): Use >>> fold_builtin_alloca_with_align_p. >>> Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument. >>> (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if >>> builtin_alloca_with_align_p. >>> (do_ssa_ccp_last): New function. >>> (pass_ccp_last): Define. >>> (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN. >>> * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using >>> DEF_BUILTIN_STUB. >>> * ipa-pure-const.c (special_builtin_state): Handle >>> BUILT_IN_ALLOCA_WITH_ALIGN. >>> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1 >>> (call_may_clobber_ref_p_1): Same. >>> function.c (gimplify_parameters): Lower vla to >>> BUILT_IN_ALLOCA_WITH_ALIGN. >>> cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN. >>> tree-mudflap.c (mf_xform_statements): Same. >>> tree-ssa-dce.c (mark_stmt_if_obviously_necessary) >>> (mark_all_reaching_defs_necessary_1, propagate_necessity): Same. >>> varasm.c (incorporeal_function_p): Same. >>> tree-object-size.c (alloc_object_size): Same. >>> gimple.c (gimple_build_call_from_tree): Same. >>> >>> * gcc.dg/pr50527.c: New test. >>> >>> >>> > >