On Wed, 23 Aug 2017, Richard Biener wrote: > On Tue, 22 Aug 2017, Richard Biener wrote: > > > On August 22, 2017 6:38:55 PM GMT+02:00, "Richard Earnshaw (lists)" > > <richard.earns...@arm.com> wrote: > > >On 22/08/17 13:55, Kyrill Tkachov wrote: > > >> Hi Richard, > > >> [roping in more aarch64 maintainers] > > >> > > >> On 22/08/17 13:27, Richard Biener wrote: > > >>> On Tue, 22 Aug 2017, Uros Bizjak wrote: > > >>> > > >>>> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener > > ><rguent...@suse.de> > > >>>> wrote: > > >>>>> The following patch fixes PR81921 (and LTO build of libgo) which I > > >ran > > >>>>> into when trying to enable free-lang-data for non-LTO compiles. > > >>>>> > > >>>>> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all > > >functions > > >>>>> so we have them ending up with target_option_default_node > > >eventually > > >>>>> which is something ix86_can_inline_p doesn't expect (I tried > > >forcing > > >>>>> a compare of the actual options but that fails as well as we get > > >>>>> spurious differences in use-fpmath, the default node with -m32 > > >>>>> -march=x86_64 doesn't have it while non-default nodes have it). > > >>>>> > > >>>>> The patch is what I consider safe for branches, we might want to > > >work > > >>>>> on sth better (actually comparing always and fixing the fpmath > > >issue) > > >>>>> on trunk as followup. > > >>>>> > > >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for > > >trunk > > >>>>> and active branches? > > >>>>> > > >>>>> Note the change to the ref = false conditional isn't strictly > > >necessary > > >>>>> but it makes -flto and non-flto behave consistently.
... > > >> The aarch64 changes looks ok to me FWIW (since I wrote that function) > > >> > > > > > >I *think* this is probably OK for AArch64, but I don't think it's safe > > >in general. > > > > > >Consider, for example, the arm port where we can have functions built > > >for ARM and functions built for Thumb. You cannot, in general, inline > > >either into the other, (think inline asm, or isa-specific intrinsic > > >operations). > > > > > >I think this approach only works at all because it's generally > > >meaningless to have some ISA that is the default and then construct > > >other functions later in the compilation that deliberately target a > > >smaller subset of the ISA. But such an assumption doesn't apply when > > >you construct functions in completely separate ISAs which are > > >call-compatible but otherwise distinct. > > > > Note the patch doesn't change anything for NON-LTO, where you never see > > target_option_default_node. Just LTO forces those in place of NULL to > > make cross-TU behavior more like you suggest. > > So thinking about this some more we lack the ability to distinguish > between no explicit target attribute and an explicit target attribute > that matches the global defaults with LTO. Without LTO it is the > target that decides whether to put target_option_default_node into > DECL_FUNCTION_SPECIFIC_TARGET but LTO treats the global options > as explicit target attribute for all functions of a TU. > > That said, the > > /* If callee has no option attributes, then it is ok to inline */ > if (!callee_opts) > ret = true; > > tries to allow inlining of functions with no explicit attribute > into any other function which IMHO is reasonable? > > That argues for sth like DECL_FUNCTION_SPECIFIC_TARGET_USER_P and > DECL_FUNCTION_SPECIFIC_OPTIMIZATION_USER_P. > > Does that make any sense? The question is whether, in LTO context, > differing global options are USER_P or not ... > > OTOH as you say usually function specific opts are a superset of > global opts which means if the target properly checks two option > sets against each other the effect should be similar. Which means we can also do the following (only the default and the x86 hook changed), always compare and properly treat a not present DECL_FUNCTION_SPECIFIC_TARGET as target_option_default_node. To fix PR81921 we need to make sure to process target("sse2") when it might change fpmath use (for -m32). Not sure if that captures all cases so I wonder whether skipping the processing is not premature optimization -- we should do this at most once per function? Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk for the x86 part? Is the ix86_valid_target_attribute_tree change ok for branches? Note that as there is the chance this now rejects cases that it didn't before even w/o -flto such change isn't appropriate for the branches (the ix86_valid_target_attribute_tree change is). Note I arrived here because enabling free-lang-data unconditionally broke bootstrap, so the changes to the other archs where a similar thing may happen -- you can check with Index: gcc/tree.c =================================================================== --- gcc/tree.c (revision 251274) +++ gcc/tree.c (working copy) @@ -5660,8 +5660,7 @@ free_lang_data (void) unsigned i; /* If we are the LTO frontend we have freed lang-specific data already. */ - if (in_lto_p - || (!flag_generate_lto && !flag_generate_offload)) + if (in_lto_p) return 0; /* Provide a dummy TRANSLATION_UNIT_DECL if the FE failed to provide one. */ Richard, can you add a testcase where it is invalid to inline -mthumb/arm code into the other? I suppose that requires some inline asm to break. Or is there already one? Thanks, Richard. 2017-08-23 Richard Biener <rguent...@suse.de> PR target/81921 * targhooks.c (default_target_can_inline_p): Properly use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare. * config/i386/i386.c (ix86_valid_target_attribute_tree): Make sure to process the attribute when fpmath may change. (ix86_can_inline_p): Properly use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare. Index: gcc/targhooks.c =================================================================== --- gcc/targhooks.c (revision 251275) +++ gcc/targhooks.c (working copy) @@ -1442,27 +1442,18 @@ default_target_option_pragma_parse (tree bool default_target_can_inline_p (tree caller, tree callee) { - bool ret = false; tree callee_opts = DECL_FUNCTION_SPECIFIC_TARGET (callee); tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller); - - /* If callee has no option attributes, then it is ok to inline */ - if (!callee_opts) - ret = true; - - /* If caller has no option attributes, but callee does then it is not ok to - inline */ - else if (!caller_opts) - ret = false; + if (! callee_opts) + callee_opts = target_option_default_node; + if (! caller_opts) + caller_opts = target_option_default_node; /* If both caller and callee have attributes, assume that if the pointer is different, the two functions have different target options since build_target_option_node uses a hash table for the options. */ - else - ret = (callee_opts == caller_opts); - - return ret; + return callee_opts == caller_opts; } /* If the machine does not have a case insn that compares the bounds, Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 251275) +++ gcc/config/i386/i386.c (working copy) @@ -7369,7 +7369,8 @@ ix86_valid_target_attribute_tree (tree a || opts->x_target_flags != def->x_target_flags || option_strings[IX86_FUNCTION_SPECIFIC_ARCH] || option_strings[IX86_FUNCTION_SPECIFIC_TUNE] - || enum_opts_set.x_ix86_fpmath) + || enum_opts_set.x_ix86_fpmath + || !TARGET_64BIT_P (opts->x_ix86_isa_flags)) { /* If we are using the default tune= or arch=, undo the string assigned, and use the default. */ @@ -7502,53 +7503,47 @@ ix86_valid_target_attribute_p (tree fnde static bool ix86_can_inline_p (tree caller, tree callee) { - bool ret = false; tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); - - /* If callee has no option attributes, then it is ok to inline. */ if (!callee_tree) - ret = true; + callee_tree = target_option_default_node; + if (!caller_tree) + caller_tree = target_option_default_node; + if (callee_tree == caller_tree) + return true; - /* If caller has no option attributes, but callee does then it is not ok to - inline. */ - else if (!caller_tree) + struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); + struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); + bool ret = false; + + /* Callee's isa options should be a subset of the caller's, i.e. a SSE4 + function can inline a SSE2 function but a SSE2 function can't inline + a SSE4 function. */ + if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags) + != callee_opts->x_ix86_isa_flags) + || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2) + != callee_opts->x_ix86_isa_flags2)) ret = false; - else - { - struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); - struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); + /* See if we have the same non-isa options. */ + else if (caller_opts->x_target_flags != callee_opts->x_target_flags) + ret = false; - /* Callee's isa options should be a subset of the caller's, i.e. a SSE4 - function can inline a SSE2 function but a SSE2 function can't inline - a SSE4 function. */ - if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags) - != callee_opts->x_ix86_isa_flags) - || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2) - != callee_opts->x_ix86_isa_flags2)) - ret = false; - - /* See if we have the same non-isa options. */ - else if (caller_opts->x_target_flags != callee_opts->x_target_flags) - ret = false; - - /* See if arch, tune, etc. are the same. */ - else if (caller_opts->arch != callee_opts->arch) - ret = false; - - else if (caller_opts->tune != callee_opts->tune) - ret = false; + /* See if arch, tune, etc. are the same. */ + else if (caller_opts->arch != callee_opts->arch) + ret = false; - else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) - ret = false; + else if (caller_opts->tune != callee_opts->tune) + ret = false; - else if (caller_opts->branch_cost != callee_opts->branch_cost) - ret = false; + else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) + ret = false; - else - ret = true; - } + else if (caller_opts->branch_cost != callee_opts->branch_cost) + ret = false; + + else + ret = true; return ret; } Index: gcc/testsuite/gcc.target/i386/pr81921.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lto } */ +/* { dg-options "-flto -march=x86-64" } */ + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__, target("sse2"))) +_mm_loadu_si128 (int const *__P) +{ + return *__P; +} + +void __attribute__((target("ssse3"))) foo (void *p) +{ + volatile int x = _mm_loadu_si128 (p); +}