On Wed, 23 Aug 2017, Richard Biener wrote:

> On Tue, 22 Aug 2017, Richard Biener wrote:
> 
> > On August 22, 2017 6:38:55 PM GMT+02:00, "Richard Earnshaw (lists)" 
> > <richard.earns...@arm.com> wrote:
> > >On 22/08/17 13:55, Kyrill Tkachov wrote:
> > >> Hi Richard,
> > >> [roping in more aarch64 maintainers]
> > >> 
> > >> On 22/08/17 13:27, Richard Biener wrote:
> > >>> On Tue, 22 Aug 2017, Uros Bizjak wrote:
> > >>>
> > >>>> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener
> > ><rguent...@suse.de>
> > >>>> wrote:
> > >>>>> The following patch fixes PR81921 (and LTO build of libgo) which I
> > >ran
> > >>>>> into when trying to enable free-lang-data for non-LTO compiles.
> > >>>>>
> > >>>>> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all
> > >functions
> > >>>>> so we have them ending up with target_option_default_node
> > >eventually
> > >>>>> which is something ix86_can_inline_p doesn't expect (I tried
> > >forcing
> > >>>>> a compare of the actual options but that fails as well as we get
> > >>>>> spurious differences in use-fpmath, the default node with -m32
> > >>>>> -march=x86_64 doesn't have it while non-default nodes have it).
> > >>>>>
> > >>>>> The patch is what I consider safe for branches, we might want to
> > >work
> > >>>>> on sth better (actually comparing always and fixing the fpmath
> > >issue)
> > >>>>> on trunk as followup.
> > >>>>>
> > >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for
> > >trunk
> > >>>>> and active branches?
> > >>>>>
> > >>>>> Note the change to the ref = false conditional isn't strictly
> > >necessary
> > >>>>> but it makes -flto and non-flto behave consistently.

...

> > >> The aarch64 changes looks ok to me FWIW (since I wrote that function)
> > >> 
> > >
> > >I *think* this is probably OK for AArch64, but I don't think it's safe
> > >in general.
> > >
> > >Consider, for example, the arm port where we can have functions built
> > >for ARM and functions built for Thumb.  You cannot, in general, inline
> > >either into the other, (think inline asm, or isa-specific intrinsic
> > >operations).
> > >
> > >I think this approach only works at all because it's generally
> > >meaningless to have some ISA that is the default and then construct
> > >other functions later in the compilation that deliberately target a
> > >smaller subset of the ISA.  But such an assumption doesn't apply when
> > >you construct functions in completely separate ISAs which are
> > >call-compatible but otherwise distinct.
> > 
> > Note the patch doesn't change anything for NON-LTO, where you never see 
> > target_option_default_node. Just LTO forces those in place of NULL to 
> > make cross-TU behavior more like you suggest.
> 
> So thinking about this some more we lack the ability to distinguish
> between no explicit target attribute and an explicit target attribute
> that matches the global defaults with LTO.  Without LTO it is the
> target that decides whether to put target_option_default_node into
> DECL_FUNCTION_SPECIFIC_TARGET but LTO treats the global options
> as explicit target attribute for all functions of a TU.
> 
> That said, the
> 
>   /* If callee has no option attributes, then it is ok to inline */
>   if (!callee_opts)
>     ret = true;
> 
> tries to allow inlining of functions with no explicit attribute
> into any other function which IMHO is reasonable?
> 
> That argues for sth like DECL_FUNCTION_SPECIFIC_TARGET_USER_P and
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION_USER_P.
> 
> Does that make any sense?  The question is whether, in LTO context,
> differing global options are USER_P or not ...
> 
> OTOH as you say usually function specific opts are a superset of
> global opts which means if the target properly checks two option
> sets against each other the effect should be similar.

Which means we can also do the following (only the default and the x86
hook changed), always compare and properly treat a not present
DECL_FUNCTION_SPECIFIC_TARGET as target_option_default_node.

To fix PR81921 we need to make sure to process target("sse2") when
it might change fpmath use (for -m32).  Not sure if that captures
all cases so I wonder whether skipping the processing is not
premature optimization -- we should do this at most once per function?

Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
for the x86 part?  Is the ix86_valid_target_attribute_tree change ok
for branches?

Note that as there is the chance this now rejects cases that it didn't
before even w/o -flto such change isn't appropriate for the branches
(the ix86_valid_target_attribute_tree change is).  Note I arrived
here because enabling free-lang-data unconditionally broke bootstrap,
so the changes to the other archs where a similar thing may happen --
you can check with

Index: gcc/tree.c
===================================================================
--- gcc/tree.c  (revision 251274)
+++ gcc/tree.c  (working copy)
@@ -5660,8 +5660,7 @@ free_lang_data (void)
   unsigned i;
 
   /* If we are the LTO frontend we have freed lang-specific data already.  
*/
-  if (in_lto_p
-      || (!flag_generate_lto && !flag_generate_offload))
+  if (in_lto_p)
     return 0;
 
   /* Provide a dummy TRANSLATION_UNIT_DECL if the FE failed to provide 
one.  */

Richard, can you add a testcase where it is invalid to inline
-mthumb/arm code into the other?  I suppose that requires some
inline asm to break.  Or is there already one?

Thanks,
Richard.

2017-08-23  Richard Biener  <rguent...@suse.de>

        PR target/81921
        * targhooks.c (default_target_can_inline_p): Properly
        use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET
        is present and always compare.
        * config/i386/i386.c (ix86_valid_target_attribute_tree): Make
        sure to process the attribute when fpmath may change.
        (ix86_can_inline_p): Properly use target_option_default_node when
        no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare.

Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c     (revision 251275)
+++ gcc/targhooks.c     (working copy)
@@ -1442,27 +1442,18 @@ default_target_option_pragma_parse (tree
 bool
 default_target_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree callee_opts = DECL_FUNCTION_SPECIFIC_TARGET (callee);
   tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller);
-
-  /* If callee has no option attributes, then it is ok to inline */
-  if (!callee_opts)
-    ret = true;
-
-  /* If caller has no option attributes, but callee does then it is not ok to
-     inline */
-  else if (!caller_opts)
-    ret = false;
+  if (! callee_opts)
+    callee_opts = target_option_default_node;
+  if (! caller_opts)
+    caller_opts = target_option_default_node;
 
   /* If both caller and callee have attributes, assume that if the
      pointer is different, the two functions have different target
      options since build_target_option_node uses a hash table for the
      options.  */
-  else
-    ret = (callee_opts == caller_opts);
-
-  return ret;
+  return callee_opts == caller_opts;
 }
 
 /* If the machine does not have a case insn that compares the bounds,
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 251275)
+++ gcc/config/i386/i386.c      (working copy)
@@ -7369,7 +7369,8 @@ ix86_valid_target_attribute_tree (tree a
       || opts->x_target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
-      || enum_opts_set.x_ix86_fpmath)
+      || enum_opts_set.x_ix86_fpmath
+      || !TARGET_64BIT_P (opts->x_ix86_isa_flags))
     {
       /* If we are using the default tune= or arch=, undo the string assigned,
         and use the default.  */
@@ -7502,53 +7503,47 @@ ix86_valid_target_attribute_p (tree fnde
 static bool
 ix86_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
-
-  /* If callee has no option attributes, then it is ok to inline.  */
   if (!callee_tree)
-    ret = true;
+    callee_tree = target_option_default_node;
+  if (!caller_tree)
+    caller_tree = target_option_default_node;
+  if (callee_tree == caller_tree)
+    return true;
 
-  /* If caller has no option attributes, but callee does then it is not ok to
-     inline.  */
-  else if (!caller_tree)
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  bool ret = false;
+
+  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
+     function can inline a SSE2 function but a SSE2 function can't inline
+     a SSE4 function.  */
+  if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
+       != callee_opts->x_ix86_isa_flags)
+      || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+         != callee_opts->x_ix86_isa_flags2))
     ret = false;
 
-  else
-    {
-      struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
-      struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  /* See if we have the same non-isa options.  */
+  else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
+    ret = false;
 
-      /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
-        function can inline a SSE2 function but a SSE2 function can't inline
-        a SSE4 function.  */
-      if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-          != callee_opts->x_ix86_isa_flags)
-         || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
-             != callee_opts->x_ix86_isa_flags2))
-       ret = false;
-
-      /* See if we have the same non-isa options.  */
-      else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
-       ret = false;
-
-      /* See if arch, tune, etc. are the same.  */
-      else if (caller_opts->arch != callee_opts->arch)
-       ret = false;
-
-      else if (caller_opts->tune != callee_opts->tune)
-       ret = false;
+  /* See if arch, tune, etc. are the same.  */
+  else if (caller_opts->arch != callee_opts->arch)
+    ret = false;
 
-      else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
-       ret = false;
+  else if (caller_opts->tune != callee_opts->tune)
+    ret = false;
 
-      else if (caller_opts->branch_cost != callee_opts->branch_cost)
-       ret = false;
+  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
+    ret = false;
 
-      else
-       ret = true;
-    }
+  else if (caller_opts->branch_cost != callee_opts->branch_cost)
+    ret = false;
+
+  else
+    ret = true;
 
   return ret;
 }
Index: gcc/testsuite/gcc.target/i386/pr81921.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr81921.c     (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr81921.c     (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -march=x86-64" } */
+
+extern __inline int  __attribute__((__gnu_inline__, __always_inline__, 
__artificial__, target("sse2")))
+_mm_loadu_si128 (int const *__P)
+{
+    return *__P;
+}
+
+void __attribute__((target("ssse3"))) foo (void *p)
+{
+  volatile int x = _mm_loadu_si128 (p);
+}

Reply via email to