ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
Thanks,
Prathamesh

On 23 June 2016 at 22:51, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
> On 17 June 2016 at 19:52, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>> On 14 June 2016 at 18:31, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>>> On 13 June 2016 at 16:13, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>>>> index ecafe63..41ac408 100644
>>>>> --- a/gcc/cgraph.h
>>>>> +++ b/gcc/cgraph.h
>>>>> @@ -1874,6 +1874,9 @@ public:
>>>>>         if we did not do any inter-procedural code movement.  */
>>>>>    unsigned used_by_single_function : 1;
>>>>>
>>>>> +  /* Set if -fsection-anchors is set.  */
>>>>> +  unsigned section_anchor : 1;
>>>>> +
>>>>>  private:
>>>>>    /* Assemble thunks and aliases associated to varpool node.  */
>>>>>    void assemble_aliases (void);
>>>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>>>>> index 4bfcad7..e75d5c0 100644
>>>>> --- a/gcc/cgraphunit.c
>>>>> +++ b/gcc/cgraphunit.c
>>>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
>>>>>      it is available to notice_global_symbol.  */
>>>>>   node->definition = true;
>>>>>   notice_global_symbol (decl);
>>>>> +
>>>>> +  node->section_anchor = flag_section_anchors;
>>>>> +
>>>>>   if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
>>>>>       /* Traditionally we do not eliminate static variables when not
>>>>>          optimizing and when not doing toplevel reoder.  */
>>>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>>>> index f0d7196..e497795 100644
>>>>> --- a/gcc/common.opt
>>>>> +++ b/gcc/common.opt
>>>>> @@ -1590,6 +1590,10 @@ fira-algorithm=
>>>>>  Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization
>>>>>  -fira-algorithm=[CB|priority] Set the used IRA algorithm.
>>>>>
>>>>> +fipa-increase_alignment
>>>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
>>>>> +Option to gate increase_alignment ipa pass.
>>>>> +
>>>>>  Enum
>>>>>  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA algorithm %qs)
>>>>>
>>>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) Optimization
>>>>>  Enable the dependent count heuristic in the scheduler.
>>>>>
>>>>>  fsection-anchors
>>>>> -Common Report Var(flag_section_anchors) Optimization
>>>>> +Common Report Var(flag_section_anchors)
>>>>>  Access data in the same section from shared anchor points.
>>>>>
>>>>>  fsee
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index a0db3a4..1482566 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void)
>>>>>
>>>>>   aarch64_register_fma_steering ();
>>>>>
>>>>> +  /* Enable increase_alignment pass.  */
>>>>> +  flag_ipa_increase_alignment = 1;
>>>>
>>>> I would rather enable it always on targets that do support anchors.
>>> AFAIK aarch64 supports section anchors.
>>>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
>>>>> index ce9e146..7f09f3a 100644
>>>>> --- a/gcc/lto/lto-symtab.c
>>>>> +++ b/gcc/lto/lto-symtab.c
>>>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry)
>>>>>      The type compatibility checks or the completing of types has properly
>>>>>      dealt with most issues.  */
>>>>>
>>>>> +  /* ??? is this assert necessary ?  */
>>>>> +  varpool_node *v_prevailing = dyn_cast<varpool_node *> (prevailing);
>>>>> +  varpool_node *v_entry = dyn_cast<varpool_node *> (entry);
>>>>> +  gcc_assert (v_prevailing && v_entry);
>>>>> +  /* section_anchor of prevailing_decl wins.  */
>>>>> +  v_entry->section_anchor = v_prevailing->section_anchor;
>>>>> +
>>>> Other flags are merged in lto_varpool_replace_node so please move this
>>>> there.
>>> Ah indeed, thanks for the pointers.
>>> I wonder though if we need to set
>>> prevailing_node->section_anchor = vnode->section_anchor ?
>>> IIUC, the function merges flags from vnode into prevailing_node
>>> and removes vnode. However we want prevailing_node->section_anchor
>>> to always take precedence.
>>>>> +/* Return true if alignment should be increased for this vnode.
>>>>> +   This is done if every function that references/referring to vnode
>>>>> +   has flag_tree_loop_vectorize set.  */
>>>>> +
>>>>> +static bool
>>>>> +increase_alignment_p (varpool_node *vnode)
>>>>> +{
>>>>> +  ipa_ref *ref;
>>>>> +
>>>>> +  for (int i = 0; vnode->iterate_reference (i, ref); i++)
>>>>> +    if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referred))
>>>>> +      {
>>>>> +        struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>>>> +        if (!opts->x_flag_tree_loop_vectorize)
>>>>> +          return false;
>>>>> +      }
>>>>
>>>> If you take the address of a function that has the vectorizer enabled, that
>>>> probably doesn't imply a need to increase alignment of that var. So please
>>>> drop the loop.
>>>>
>>>> You only want functions that read/write or take the address of the symbol.
>>>> But on the other hand, you need to walk all aliases of the symbol by
>>>> call_for_symbol_and_aliases.
>>>>> +
>>>>> +  for (int i = 0; vnode->iterate_referring (i, ref); i++)
>>>>> +    if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referring))
>>>>> +      {
>>>>> +        struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>>>> +        if (!opts->x_flag_tree_loop_vectorize)
>>>>> +          return false;
>>>>> +      }
>>>>> +
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> /* Entry point to increase_alignment pass.
>>>>>    */
>>>>> static unsigned int
>>>>> increase_alignment (void)
>>>>> @@ -914,9 +942,12 @@ increase_alignment (void)
>>>>>       tree decl = vnode->decl;
>>>>>       unsigned int alignment;
>>>>>
>>>>> -      if ((decl_in_symtab_p (decl)
>>>>> -           && !symtab_node::get (decl)->can_increase_alignment_p ())
>>>>> -          || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl))
>>>>> +      if (!vnode->section_anchor
>>>>> +          || (decl_in_symtab_p (decl)
>>>>> +              && !symtab_node::get (decl)->can_increase_alignment_p ())
>>>>> +          || DECL_USER_ALIGN (decl)
>>>>> +          || DECL_ARTIFICIAL (decl)
>>>>> +          || !increase_alignment_p (vnode))
>>>>
>>>> Incrementally we probably should do more testing whether the variable
>>>> looks like something that can be vectorized, i.e. it contains an array,
>>>> has its address taken, or the accesses are array accesses within a loop.
>>>> This can be done by the analysis phase of the IPA pass inspecting the
>>>> function bodies.
>>> Thanks, I will try to check that array accesses are within a loop in a
>>> followup patch.
>>> I was wondering if we could treat a homogeneous global struct
>>> (having members of one type)
>>> as a global array of that type and increase its alignment if required ?
>>>>
>>>> I think it is an important waste to bump up everything including error
>>>> messages etc.
>>>> At least on i386 the effect on the firefox data segment of various
>>>> alignment settings is very visible.
>>> Um, for a start, would it be OK to check if all functions referencing the
>>> variable have attribute noreturn, and in that case skip increasing the
>>> alignment ?
>>> I suppose that error functions would have attribute noreturn set ?
>>>>
>>>> Looks OK to me otherwise. Please send updated patch.
>>> I have done the changes in the attached patch (stage-1 built).
>>> I am not sure what to return from the callback function and
>>> arbitrarily chose to return true.
>> Hi,
>> In this version (stage-1 built), I added read/write summaries which
>> stream those variables which we want to increase alignment for.
>>
>> I have a few questions:
>>
>> a) To check if a global var is used inside a loop, I obtained the
>> corresponding ipa_ref and checked loop_containing_stmt (ref->stmt),
>> however it returned non-NULL even when ref->stmt was not inside a loop.
>> For instance:
>> int a[32];
>> int f() { int x = a[0]; return x; }
>> Um, how to check if a stmt is inside a loop ?
>>
>> b) Is it necessary during WPA to check if the function has
>> flag_tree_vectorize set, since during the analysis phase in
>> increase_alignment_generate_summary() I check if cnode has
>> flag_tree_loop_vectorize set ?
>>
>> c) In LTO_increase_alignment_section, the following is streamed:
>> i) a "count" of type uhwi, to represent the number of variables
>> ii) decls corresponding to the variables
>> The variables are then obtained during read_summary using
>> symtab_node::get_create().
>> I suppose since decls for varpool_nodes would already be streamed in
>> LTO_section_decls, I was wondering if I could somehow refer to the decls
>> in that section to avoid duplicate streaming of decls ?
> Hi,
> In this version, the variable is streamed if it occurs within a loop
> or its address is taken.
> To check if a stmt is inside a loop I am using:
>
> struct loop *l = loop_containing_stmt (ref->stmt);
> if (l != DECL_STRUCT_FUNCTION (cnode->decl)->x_current_loops->tree_root)
>   vars->add (vnode);
>
> Is this correct ?
>
> I came across an unexpected issue in my previous patch with -ffat-lto-objects.
> I am allocating vars = new hash_set<varpool_node *> () in
> generate_summary() and freeing it in write_summary().
> However with -ffat-lto-objects, it appears execute() gets called after
> write_summary() during LGEN, and we hit a segfault in execute() at:
>
> for (hash_set<varpool_node *>::iterator it = vars.begin ();
>      it != vars.end (); it++)
>   { ...
>   }
>
> because write_summary() has freed vars.
> To work around the issue, I gated the call to free vars in write_summary()
> on flag_fat_lto_objects; I am not sure if that's a good solution.
>
> Cross tested on arm*-*-*, aarch64*-*-*.
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Prathamesh
>>>>
>>>> Honza