On Wed, Aug 23, 2017 at 10:04 AM, Richard Biener <rguent...@suse.de> wrote: > On Wed, 23 Aug 2017, Richard Biener wrote: > >> On Tue, 22 Aug 2017, Richard Biener wrote: >> >> > On August 22, 2017 6:38:55 PM GMT+02:00, "Richard Earnshaw (lists)" >> > <richard.earns...@arm.com> wrote: >> > >On 22/08/17 13:55, Kyrill Tkachov wrote: >> > >> Hi Richard, >> > >> [roping in more aarch64 maintainers] >> > >> >> > >> On 22/08/17 13:27, Richard Biener wrote: >> > >>> On Tue, 22 Aug 2017, Uros Bizjak wrote: >> > >>> >> > >>>> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener >> > ><rguent...@suse.de> >> > >>>> wrote: >> > >>>>> The following patch fixes PR81921 (and LTO build of libgo) which I >> > >ran >> > >>>>> into when trying to enable free-lang-data for non-LTO compiles. >> > >>>>> >> > >>>>> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all >> > >functions >> > >>>>> so we have them ending up with target_option_default_node >> > >eventually >> > >>>>> which is something ix86_can_inline_p doesn't expect (I tried >> > >forcing >> > >>>>> a compare of the actual options but that fails as well as we get >> > >>>>> spurious differences in use-fpmath, the default node with -m32 >> > >>>>> -march=x86_64 doesn't have it while non-default nodes have it). >> > >>>>> >> > >>>>> The patch is what I consider safe for branches, we might want to >> > >work >> > >>>>> on sth better (actually comparing always and fixing the fpmath >> > >issue) >> > >>>>> on trunk as followup. >> > >>>>> >> > >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for >> > >trunk >> > >>>>> and active branches? >> > >>>>> >> > >>>>> Note the change to the ref = false conditional isn't strictly >> > >necessary >> > >>>>> but it makes -flto and non-flto behave consistently. > > ... > >> > >> The aarch64 changes looks ok to me FWIW (since I wrote that function) >> > >> >> > > >> > >I *think* this is probably OK for AArch64, but I don't think it's safe >> > >in general. >> > > >> > >Consider, for example, the arm port where we can have functions built >> > >for ARM and functions built for Thumb. You cannot, in general, inline >> > >either into the other, (think inline asm, or isa-specific intrinsic >> > >operations). >> > > >> > >I think this approach only works at all because it's generally >> > >meaningless to have some ISA that is the default and then construct >> > >other functions later in the compilation that deliberately target a >> > >smaller subset of the ISA. But such an assumption doesn't apply when >> > >you construct functions in completely separate ISAs which are >> > >call-compatible but otherwise distinct. >> > >> > Note the patch doesn't change anything for NON-LTO, where you never see >> > target_option_default_node. Just LTO forces those in place of NULL to >> > make cross-TU behavior more like you suggest. >> >> So thinking about this some more we lack the ability to distinguish >> between no explicit target attribute and an explicit target attribute >> that matches the global defaults with LTO. Without LTO it is the >> target that decides whether to put target_option_default_node into >> DECL_FUNCTION_SPECIFIC_TARGET but LTO treats the global options >> as explicit target attribute for all functions of a TU. >> >> That said, the >> >> /* If callee has no option attributes, then it is ok to inline */ >> if (!callee_opts) >> ret = true; >> >> tries to allow inlining of functions with no explicit attribute >> into any other function which IMHO is reasonable? >> >> That argues for sth like DECL_FUNCTION_SPECIFIC_TARGET_USER_P and >> DECL_FUNCTION_SPECIFIC_OPTIMIZATION_USER_P. >> >> Does that make any sense? The question is whether, in LTO context, >> differing global options are USER_P or not ... >> >> OTOH as you say usually function specific opts are a superset of >> global opts which means if the target properly checks two option >> sets against each other the effect should be similar. > > Which means we can also do the following (only the default and the x86 > hook changed), always compare and properly treat a not present > DECL_FUNCTION_SPECIFIC_TARGET as target_option_default_node. > > To fix PR81921 we need to make sure to process target("sse2") when > it might change fpmath use (for -m32). Not sure if that captures > all cases so I wonder whether skipping the processing is not > premature optimization -- we should do this at most once per function? > > Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk > for the x86 part? Is the ix86_valid_target_attribute_tree change ok > for branches? > > Note that as there is the chance this now rejects cases that it didn't > before even w/o -flto such change isn't appropriate for the branches > (the ix86_valid_target_attribute_tree change is). Note I arrived > here because enabling free-lang-data unconditionally broke bootstrap, > so the changes to the other archs where a similar thing may happen -- > you can check with > > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c (revision 251274) > +++ gcc/tree.c (working copy) > @@ -5660,8 +5660,7 @@ free_lang_data (void) > unsigned i; > > /* If we are the LTO frontend we have freed lang-specific data already. > */ > - if (in_lto_p > - || (!flag_generate_lto && !flag_generate_offload)) > + if (in_lto_p) > return 0; > > /* Provide a dummy TRANSLATION_UNIT_DECL if the FE failed to provide > one. */ > > Richard, can you add a testcase where it is invalid to inline > -mthumb/arm code into the other? I suppose that requires some > inline asm to break. Or is there already one? > > Thanks, > Richard. > > 2017-08-23 Richard Biener <rguent...@suse.de> > > PR target/81921 > * targhooks.c (default_target_can_inline_p): Properly > use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET > is present and always compare. > * config/i386/i386.c (ix86_valid_target_attribute_tree): Make > sure to process the attribute when fpmath may change. > (ix86_can_inline_p): Properly use target_option_default_node when > no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare. > > Index: gcc/targhooks.c > =================================================================== > --- gcc/targhooks.c (revision 251275) > +++ gcc/targhooks.c (working copy) > @@ -1442,27 +1442,18 @@ default_target_option_pragma_parse (tree > bool > default_target_can_inline_p (tree caller, tree callee) > { > - bool ret = false; > tree callee_opts = DECL_FUNCTION_SPECIFIC_TARGET (callee); > tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller); > - > - /* If callee has no option attributes, then it is ok to inline */ > - if (!callee_opts) > - ret = true; > - > - /* If caller has no option attributes, but callee does then it is not ok to > - inline */ > - else if (!caller_opts) > - ret = false; > + if (! callee_opts) > + callee_opts = target_option_default_node; > + if (! caller_opts) > + caller_opts = target_option_default_node; > > /* If both caller and callee have attributes, assume that if the > pointer is different, the two functions have different target > options since build_target_option_node uses a hash table for the > options. */ > - else > - ret = (callee_opts == caller_opts); > - > - return ret; > + return callee_opts == caller_opts; > } > > /* If the machine does not have a case insn that compares the bounds, > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c (revision 251275) > +++ gcc/config/i386/i386.c (working copy) > @@ -7369,7 +7369,8 @@ ix86_valid_target_attribute_tree (tree a > || opts->x_target_flags != def->x_target_flags > || option_strings[IX86_FUNCTION_SPECIFIC_ARCH] > || option_strings[IX86_FUNCTION_SPECIFIC_TUNE] > - || enum_opts_set.x_ix86_fpmath) > + || enum_opts_set.x_ix86_fpmath > + || !TARGET_64BIT_P (opts->x_ix86_isa_flags))
You should use (!TARGET_64BIT_P ... && TARGET_SSE_P ...) to match the condition in the special fpmath processing part. (BTW: The comment in that part is wrong, we need SSE, not sse2 on 32-bit targets to use SSE math.) Other x86 changes LGTM. Uros. > { > /* If we are using the default tune= or arch=, undo the string > assigned, > and use the default. */ > @@ -7502,53 +7503,47 @@ ix86_valid_target_attribute_p (tree fnde > static bool > ix86_can_inline_p (tree caller, tree callee) > { > - bool ret = false; > tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); > tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); > - > - /* If callee has no option attributes, then it is ok to inline. */ > if (!callee_tree) > - ret = true; > + callee_tree = target_option_default_node; > + if (!caller_tree) > + caller_tree = target_option_default_node; > + if (callee_tree == caller_tree) > + return true; > > - /* If caller has no option attributes, but callee does then it is not ok to > - inline. */ > - else if (!caller_tree) > + struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); > + struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); > + bool ret = false; > + > + /* Callee's isa options should be a subset of the caller's, i.e. a SSE4 > + function can inline a SSE2 function but a SSE2 function can't inline > + a SSE4 function. */ > + if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags) > + != callee_opts->x_ix86_isa_flags) > + || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2) > + != callee_opts->x_ix86_isa_flags2)) > ret = false; > > - else > - { > - struct cl_target_option *caller_opts = TREE_TARGET_OPTION > (caller_tree); > - struct cl_target_option *callee_opts = TREE_TARGET_OPTION > (callee_tree); > + /* See if we have the same non-isa options. */ > + else if (caller_opts->x_target_flags != callee_opts->x_target_flags) > + ret = false; > > - /* Callee's isa options should be a subset of the caller's, i.e. a SSE4 > - function can inline a SSE2 function but a SSE2 function can't inline > - a SSE4 function. */ > - if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags) > - != callee_opts->x_ix86_isa_flags) > - || ((caller_opts->x_ix86_isa_flags2 & > callee_opts->x_ix86_isa_flags2) > - != callee_opts->x_ix86_isa_flags2)) > - ret = false; > - > - /* See if we have the same non-isa options. */ > - else if (caller_opts->x_target_flags != callee_opts->x_target_flags) > - ret = false; > - > - /* See if arch, tune, etc. are the same. */ > - else if (caller_opts->arch != callee_opts->arch) > - ret = false; > - > - else if (caller_opts->tune != callee_opts->tune) > - ret = false; > + /* See if arch, tune, etc. are the same. */ > + else if (caller_opts->arch != callee_opts->arch) > + ret = false; > > - else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) > - ret = false; > + else if (caller_opts->tune != callee_opts->tune) > + ret = false; > > - else if (caller_opts->branch_cost != callee_opts->branch_cost) > - ret = false; > + else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) > + ret = false; > > - else > - ret = true; > - } > + else if (caller_opts->branch_cost != callee_opts->branch_cost) > + ret = false; > + > + else > + ret = true; > > return ret; > } > Index: gcc/testsuite/gcc.target/i386/pr81921.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent) > +++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy) > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target lto } */ > +/* { dg-options "-flto -march=x86-64" } */ > + > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, > __artificial__, target("sse2"))) > +_mm_loadu_si128 (int const *__P) > +{ > + return *__P; > +} > + > +void __attribute__((target("ssse3"))) foo (void *p) > +{ > + volatile int x = _mm_loadu_si128 (p); > +}