Here is the background for this feature: 1) People relies on function multi-version to explore hw features and squeeze performance, but there is no standard ways of doing so, either a) using indirect function calls with function pointers set at program initialization; b) using manual dispatch at each callsite; b) using features like IFUNC. The dispatch mechanism needs to be promoted to the language level and becomes the first class citizen; 2) Most importantly, the above non standard approaches block interprocedural optimizations such as inlining. With the introduction of buitlin_dispatch and the clear semantics, compiler can do more aggressive optimization.
Multi-way dispatch will be good, but most of the use cases we have seen is 2-way -- a fast path + a default path. Who are the targeted consumer of the feature? 1) For developers who like to MV function manually; 2) For user directed function cloning e.g, int foo (...) __attribute__((clone ("sse")): 3) Automatic function cloning determined by compiler. > > I am not sure that combining the function choice and its invocation > to a single builtin is good. First, you use variadic arguments for > the actual function arguments which will cause vararg promotions > to apply and thus it will be impossible to dispatch to functions > taking single precision float values (and dependent on ABI details > many more similar issues). Second, you restrict yourself to > only two versions of a function - that looks quite limited and this > restriction is not easy to lift with your proposed interface. See above. Multi-way dispatch can be added similarly. > > That's a nice idea, but why not simply process all functions with > a const attribute and no arguments this way? IMHO > > int testsse (void) __attribute__((const)); > > is as good and avoids the new attribute completely. The pass would > also benefit other uses of this idiom (giving a way to have global > dynamic initializers in C). The functions may not be strictly 'const' > in a way the compiler autodetects this attribute but it presents > exactly the promises to the compiler that allow this optimization. > > Thus, please split this piece out into a separate patch and use > the const attribute. Sounds good. > >> What happens with cloning, -fclone-hot-version-paths ? >> ----------------------------------------------------- > > Now, here you lost me somewhat, because I didn't look into the > patch details and I am missing an example on how the lowered > IL looks before that cloning. So for now I suppose this > -fclone-hot-version-paths > is to expose direct calls to the IL. If you would lower __builtin_dispatch > to a style like > > int sel = selector (); > switch (sel) > { > case 0: > fn = popcnt; > break; > case 1: > fn = popcnt_sse4; > break; > } > res = (*fn) (25); > > then rather than a new pass specialized for __builtin_dispatch existing > optimization passes that are able to do tracing like VRP and DOM > via jump-threading or the tracer pass should be able to do this > optimization for you. If they do not use FDO in a good way it is better > to improve them for this case instead of writing a new pass. What you describe may not work 1) the function selection may happen in a different function; 2) Compiler will need to hoist the selection into the program initializer to avoid overhead As an example of why dispatch hoisting and call chain cloning is needed: void foo(); void bar(); void doit_v1(); void doit_v2(); bool check_v () __attribute__((const)); int test(); void bar() { .... for (.....) { foo(); .... } } void foo() { ... for (...) { __builtin_dispatch(check_v, doit_v1, doit_v2,...); ... } } int test () { .. bar (); } The feature testing and dispatch is embedded in a 2-deep loop nest crossing function boundaries. The call paths test --> bar --> foo needs to be cloned. This is done by hoisting dispatch up the call chain -- it ends up : void bar_v1() { .... for (..) { foo_v1 (); } .. } void bar_v2 () { ... for (..) { foo_v2(); } .. } void foo_v1 () { .. for () { doit_v1() } } void foo_v2 () { .. for () { doit_v2() } } int test () { __builtin_dispatch(check_v, bar_v1, bar_v2); .. } Thanks, David > > Note that we already have special FDO support for indirect to direct > call promotion, so that might work already. > > Thus, with all this, __builtin_dispatch would be more like syntactic > sugar to avoid writing above switch statement yourself. If you restrict > that sugar to a constant number of candidates it can be replaced by > a macro (or several ones, specialized for the number of candidates). > > Richard. > >> For the call graph that looks like this before cloning : >> >> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> popcnt >> | >> -----> no_popcnt >> >> where popcnt and no_popcnt are the multi-versioned functions, then after >> cloning : >> >> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOnes_clone0 >> ----> popcnt >> | >> ----> func_hot_clone1 ----> findOnes_clone1 >> ----> no_popcnt >> >> >> >> >> Flags : -fclone-hot-version-paths does function unswitching via cloning. >> --param=num-mversn-clones=<num> allows to specify the number of >> functions that should be cloned. >> --param=mversn-clone-depth=<num> allows to specify the length of >> the call graph path that should be cloned. num = 0 implies only >> leaf node that contains the __builtin_dispatch statement must be >> cloned. >> ============================================================================================ >> >> Google ref 45947, 45986 >> >> 2011-04-28 Sriraman Tallam <tmsri...@google.com> >> >> * c-family/c-common.c (revision 173122) >> (handle_version_selector_attribute): New function. >> (c_common_attribute_table): New attribute "version_selector". >> * tree-pass.h (revision 173122) >> (pass_tree_convert_builtin_dispatch): New pass. >> (pass_ipa_multiversion_dispatch): New pass. >> * testsuite/gcc.dg/mversn7.c (revision 0): New test case. >> * testsuite/gcc.dg/mversn4.c (revision 0): New test case. >> * testsuite/gcc.dg/mversn4.h (revision 0): New test case. >> * testsuite/gcc.dg/mversn4a.c (revision 0): New test case. >> * testsuite/gcc.dg/torture/mversn1.c (revision 0): New test case. >> * testsuite/gcc.dg/mversn2.c (revision 0): New test case. >> * testsuite/gcc.dg/mversn6.c (revision 0): New test case. >> * testsuite/gcc.dg/mversn3.c (revision 0): New test case. >> * testsuite/g++.dg/mversn8.C (revision 0): New test case. >> * testsuite/g++.dg/mversn10a.C (revision 0): New test case. >> * testsuite/g++.dg/mversn14a.C (revision 0): New test case. >> * testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test case. >> * testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test case. >> * testsuite/g++.dg/tree-prof/mversn15a.C (revision 0): New >> test case. >> * testsuite/g++.dg/mversn9.C (revision 0): New test case. >> * testsuite/g++.dg/mversn10.C (revision 0): New test case. >> * testsuite/g++.dg/mversn12.C (revision 0): New test case. >> * testsuite/g++.dg/mversn14.C (revision 0): New test case. >> * testsuite/g++.dg/mversn16.C (revision 0): New test case. >> * testsuite/g++.dg/torture/mversn11.C (revision 0): New test case. >> * testsuite/g++.dg/torture/mversn5.C (revision 0): New test case. >> * testsuite/g++.dg/torture/mversn5.h (revision 0): New test case. >> * testsuite/g++.dg/torture/mversn5a.C (revision 0): New test case. >> * builtin-types.def (revision 173122) (BT_PTR_FN_INT): New >> pointer type. >> (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for >> __builtin_dispatch. >> * builtins.def (revision 173122) (BUILT_IN_DISPATCH): New builtin to >> support multi-version calls. >> * mversn-dispatch.c (revision 0): New file. >> * timevar.def (revision 173122) (TV_MVERSN_DISPATCH): New time var. >> * common.opt (revision 173122) (fclone-hot-version-paths): New >> flag. >> * Makefile.in (revision 173122) (mversn-dispatch.o): New rule. >> * passes.c (revision 173122) (init_optimization_passes): Add the >> new >> multi-version and dispatch passes to the pass list. >> * params.def (revision 173122) (PARAM_NUMBER_OF_MVERSN_CLONES): >> Define. >> (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. >> * doc/invoke.texi (revision 173122) (mversn-clone-depth): >> Document. >> (num-mversn-clones): Document. >> (fclone-hot-version-paths): Document. >