Thanks. I committed this patch. -Sri.
svn commit Sending gcc/ChangeLog.google-main Sending gcc/Makefile.in Sending gcc/builtin-types.def Sending gcc/builtins.def Sending gcc/c-family/c-common.c Sending gcc/common.opt Sending gcc/doc/invoke.texi Adding gcc/mversn-dispatch.c Sending gcc/params.def Sending gcc/passes.c Adding gcc/testsuite/g++.dg/mversn10.C Adding gcc/testsuite/g++.dg/mversn10a.C Adding gcc/testsuite/g++.dg/mversn12.C Adding gcc/testsuite/g++.dg/mversn14.C Adding gcc/testsuite/g++.dg/mversn14a.C Adding gcc/testsuite/g++.dg/mversn16.C Adding gcc/testsuite/g++.dg/mversn8.C Adding gcc/testsuite/g++.dg/mversn9.C Adding gcc/testsuite/g++.dg/torture/mversn11.C Adding gcc/testsuite/g++.dg/torture/mversn5.C Adding gcc/testsuite/g++.dg/torture/mversn5.h Adding gcc/testsuite/g++.dg/torture/mversn5a.C Adding gcc/testsuite/g++.dg/tree-prof/mversn13.C Adding gcc/testsuite/g++.dg/tree-prof/mversn15.C Adding gcc/testsuite/g++.dg/tree-prof/mversn15a.C Adding gcc/testsuite/gcc.dg/mversn2.c Adding gcc/testsuite/gcc.dg/mversn3.c Adding gcc/testsuite/gcc.dg/mversn4.c Adding gcc/testsuite/gcc.dg/mversn4.h Adding gcc/testsuite/gcc.dg/mversn4a.c Adding gcc/testsuite/gcc.dg/mversn6.c Adding gcc/testsuite/gcc.dg/mversn7.c Adding gcc/testsuite/gcc.dg/torture/mversn1.c Sending gcc/timevar.def Sending gcc/tree-pass.h Transmitting file data ................................... Committed revision 173271. On Mon, May 2, 2011 at 12:32 PM, Xinliang David Li <davi...@google.com> wrote: > Ok. There may be more correctness related comments -- but those can be > addressed when available. For trunk, you need to address issues such > as multi-way dispatch. > > Thanks, > > David > > On Mon, May 2, 2011 at 12:11 PM, Sriraman Tallam <tmsri...@google.com> wrote: >> Hi, >> >> I want to submit this patch to google/main to make sure it is >> available for our internal use at Google in order to materialize some >> optimization opportunities. Let us continue this dicussion as I make >> changes and submit this for review for trunk. >> >> Thanks, >> -Sri. >> >> >> On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li <davi...@google.com> wrote: >>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther >>> <richard.guent...@gmail.com> wrote: >>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li <davi...@google.com> >>>> wrote: >>>>> Here is the background for this feature: >>>>> >>>>> 1) People relies on function multi-version to explore hw features and >>>>> squeeze performance, but there is no standard ways of doing so, either >>>>> a) using indirect function calls with function pointers set at program >>>>> initialization; b) using manual dispatch at each callsite; b) using >>>>> features like IFUNC. The dispatch mechanism needs to be promoted to >>>>> the language level and becomes the first class citizen; >>>> >>>> You are not doing that, you are inventing a new (crude) GCC extension. >>> >>> To capture the high level semantics and prevent user from lowering the >>> dispatch calls into forms compiler can not recognize, language >>> extension is the way to go. >>> >>>> >>> >>>>> See above. Multi-way dispatch can be added similarly. >>>> >>>> Not with the specified syntax. So you'd end up with _two_ >>>> language extensions. That's bad (and unacceptable, IMHO). >>> >>> This part can be improved. >>> >>>> >>>>> >>>>>> >>>>>> That's a nice idea, but why not simply process all functions with >>>>>> a const attribute and no arguments this way? IMHO >>>>>> >>>>>> int testsse (void) __attribute__((const)); >>>>>> >>>>>> is as good and avoids the new attribute completely. The pass would >>>>>> also benefit other uses of this idiom (giving a way to have global >>>>>> dynamic initializers in C). The functions may not be strictly 'const' >>>>>> in a way the compiler autodetects this attribute but it presents >>>>>> exactly the promises to the compiler that allow this optimization. >>>>>> >>>>>> Thus, please split this piece out into a separate patch and use >>>>>> the const attribute. >>>>> >>>>> >>>>> Sounds good. >>>>> >>> >>>>> >>>>> 1) the function selection may happen in a different function; >>>> >>>> Well, sure. I propose to have the above as lowered form. If people >>>> deliberately obfuscate code it's their fault. Your scheme simply >>>> makes that obfuscation impossible (or less likely possible). So >>>> 1) is not a valid argument. >>> >>> Lowering too early may defeat the purpose because >>> >>> 1) the desired optimization may not happen subject to many compiler >>> heuristic changes; >>> 2) it has other side effects such as wrong estimation of function size >>> which may impact inlining >>> 3) it limits the lowering into one form which may not be ideal -- >>> with builtin_dispatch, after hoisting optimization, the lowering can >>> use more efficient IFUNC scheme, for instance. >>> >>>> >>>> >>>> My point is that such optimization is completely independent of >>>> that dispatch thing. The above could as well be a selection to >>>> use different input arrays, one causing alias analysis issues >>>> and one not. >>>> >>>> Thus, a __builtin_dispatch centric optimization pass is the wrong >>>> way to go. >>> >>> I agree that many things can implemented in different ways, but a high >>> level standard builtin_dispatch mechanism doing hoisting >>> interprocedcurally is cleaner and simpler and targeted for function >>> multi-versioning -- it does not depend on/rely on later phase's >>> heuristic tunings to make the right things to happen. Function MV >>> deserves this level of treatment as it will become more and more >>> important for some users (e.g., Google). >>> >>>> >>>> Now, with FDO I'd expect the foo is inlined into bar (because foo >>>> is deemed hot), >>> >>> That is a myth -- the truth is that there are other heuristics which >>> can prevent this from happening. >>> >>>> then you only need to deal with loop unswitching, >>>> which should be easy to drive from FDO. >>> >>> Same here -- the loop body may not be well formed/recognized. The loop >>> nests may not be perfectly nested, or other unswitching heuristics may >>> block it from happening. This is the common problem form many other >>> things that get lowered too early. It is cleaner to make the high >>> level transformation first in IPA, and let unswitching dealing with >>> intra-procedural optimization. >>> >>> Thanks, >>> >>> David >>> >>>> >>>> Richard. >>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> David >>>>> >>>>> >>>>> >>>>>> >>>>>> Note that we already have special FDO support for indirect to direct >>>>>> call promotion, so that might work already. >>>>>> >>>>>> Thus, with all this, __builtin_dispatch would be more like syntactic >>>>>> sugar to avoid writing above switch statement yourself. If you restrict >>>>>> that sugar to a constant number of candidates it can be replaced by >>>>>> a macro (or several ones, specialized for the number of candidates). >>>>>> >>>>>> Richard. >>>>>> >>>>>>> For the call graph that looks like this before cloning : >>>>>>> >>>>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> >>>>>>> popcnt >>>>>>> | >>>>>>> -----> no_popcnt >>>>>>> >>>>>>> where popcnt and no_popcnt are the multi-versioned functions, then >>>>>>> after cloning : >>>>>>> >>>>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> >>>>>>> findOnes_clone0 ----> popcnt >>>>>>> | >>>>>>> ----> func_hot_clone1 ----> >>>>>>> findOnes_clone1 ----> no_popcnt >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Flags : -fclone-hot-version-paths does function unswitching via >>>>>>> cloning. >>>>>>> --param=num-mversn-clones=<num> allows to specify the number of >>>>>>> functions that should be cloned. >>>>>>> --param=mversn-clone-depth=<num> allows to specify the length >>>>>>> of >>>>>>> the call graph path that should be cloned. num = 0 implies >>>>>>> only >>>>>>> leaf node that contains the __builtin_dispatch statement must >>>>>>> be >>>>>>> cloned. >>>>>>> ============================================================================================ >>>>>>> >>>>>>> Google ref 45947, 45986 >>>>>>> >>>>>>> 2011-04-28 Sriraman Tallam <tmsri...@google.com> >>>>>>> >>>>>>> * c-family/c-common.c (revision 173122) >>>>>>> (handle_version_selector_attribute): New function. >>>>>>> (c_common_attribute_table): New attribute "version_selector". >>>>>>> * tree-pass.h (revision 173122) >>>>>>> (pass_tree_convert_builtin_dispatch): New pass. >>>>>>> (pass_ipa_multiversion_dispatch): New pass. >>>>>>> * testsuite/gcc.dg/mversn7.c (revision 0): New test case. >>>>>>> * testsuite/gcc.dg/mversn4.c (revision 0): New test case. >>>>>>> * testsuite/gcc.dg/mversn4.h (revision 0): New test case. >>>>>>> * testsuite/gcc.dg/mversn4a.c (revision 0): New test case. >>>>>>> * testsuite/gcc.dg/torture/mversn1.c (revision 0): New test >>>>>>> case. >>>>>>> * testsuite/gcc.dg/mversn2.c (revision 0): New test case. >>>>>>> * testsuite/gcc.dg/mversn6.c (revision 0): New test case. >>>>>>> * testsuite/gcc.dg/mversn3.c (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn8.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn10a.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn14a.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test >>>>>>> case. >>>>>>> * testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test >>>>>>> case. >>>>>>> * testsuite/g++.dg/tree-prof/mversn15a.C (revision 0): >>>>>>> New test case. >>>>>>> * testsuite/g++.dg/mversn9.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn10.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn12.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn14.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/mversn16.C (revision 0): New test case. >>>>>>> * testsuite/g++.dg/torture/mversn11.C (revision 0): New test >>>>>>> case. >>>>>>> * testsuite/g++.dg/torture/mversn5.C (revision 0): New test >>>>>>> case. >>>>>>> * testsuite/g++.dg/torture/mversn5.h (revision 0): New test >>>>>>> case. >>>>>>> * testsuite/g++.dg/torture/mversn5a.C (revision 0): New test >>>>>>> case. >>>>>>> * builtin-types.def (revision 173122) (BT_PTR_FN_INT): New >>>>>>> pointer type. >>>>>>> (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for >>>>>>> __builtin_dispatch. >>>>>>> * builtins.def (revision 173122) (BUILT_IN_DISPATCH): New >>>>>>> builtin to >>>>>>> support multi-version calls. >>>>>>> * mversn-dispatch.c (revision 0): New file. >>>>>>> * timevar.def (revision 173122) (TV_MVERSN_DISPATCH): New time >>>>>>> var. >>>>>>> * common.opt (revision 173122) (fclone-hot-version-paths): >>>>>>> New flag. >>>>>>> * Makefile.in (revision 173122) (mversn-dispatch.o): New rule. >>>>>>> * passes.c (revision 173122) (init_optimization_passes): >>>>>>> Add the new >>>>>>> multi-version and dispatch passes to the pass list. >>>>>>> * params.def (revision 173122) >>>>>>> (PARAM_NUMBER_OF_MVERSN_CLONES): Define. >>>>>>> (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. >>>>>>> * doc/invoke.texi (revision 173122) (mversn-clone-depth): >>>>>>> Document. >>>>>>> (num-mversn-clones): Document. >>>>>>> (fclone-hot-version-paths): Document. >>>>>> >>>>> >>>> >>> >> >