On 3/15/22 13:09, Patrick Palka wrote:
On Tue, 15 Mar 2022, Jason Merrill wrote:

On 3/15/22 10:03, Patrick Palka wrote:
On Mon, 14 Mar 2022, Jason Merrill wrote:

On 3/14/22 13:13, Patrick Palka wrote:
On Fri, 11 Mar 2022, Jason Merrill wrote:

On 3/10/22 11:27, Patrick Palka wrote:
On Wed, 9 Mar 2022, Jason Merrill wrote:

On 3/1/22 18:08, Patrick Palka wrote:
A well-formed call to std::move/forward is equivalent to a cast,
but
the
former being a function call means it comes with bloated debug
info,
which
persists even after the call has been inlined away, for an
operation
that
is never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by
folding
calls to std::move/forward into casts as part of the frontend's
general
expression folding routine.  After this patch with -O2 and a
non-checking
compiler, debug info size for some testcases decreases by about
~10%
and
overall compile time and memory usage decreases by ~2%.

Impressive.  Which testcases?

I saw the largest percent reductions in debug file object size in
various tests from cmcstl2 and range-v3, e.g.
test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
(which are among their biggest tests).

Significant reductions in debug object file size can be observed in
some libstdc++ testcases too, such as a 5.5% reduction in
std/ranges/adaptor/join.cc


Do you also want to handle addressof and as_const in this patch,
as
Jonathan
suggested?

Yes, good idea.  Since each of their argument and return types are
indirect types, I think we can use the same NOP_EXPR-based folding
for
them.


I think we can do this now, and think about generalizing more in
stage
1.

Bootstrapped and regtested on x86_64-pc-linux-gnu, is this
something
we
want to consider for GCC 12?

        PR c++/96780

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
        fold calls to std::move/forward into simple casts.
        * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
        * typeck.cc (is_std_move_p, is_std_forward_p): Export.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
---
      gcc/cp/cp-gimplify.cc              | 18 ++++++++++++++++++
      gcc/cp/cp-tree.h                   |  2 ++
      gcc/cp/typeck.cc                   |  6 ++----
      gcc/testsuite/g++.dg/opt/pr96780.C | 24
++++++++++++++++++++++++
      4 files changed, 46 insertions(+), 4 deletions(-)
      create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..0b009b631c7 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,6 +2756,24 @@ cp_fold (tree x)
            case CALL_EXPR:
            {
+       if (optimize

I think this should check flag_no_inline rather than optimize.

Sounds good.

Here's a patch that extends the folding to as_const and addressof
(as
well as __addressof, which I'm kind of unsure about since it's
non-standard).  I suppose it also doesn't hurt to verify that the
return
and argument type of the function are sane before we commit to
folding.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but
the
former being a function call means the compiler generates debug info
for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by
folding
calls to std::move/forward and other cast-like functions into simple
casts as part of the frontend's general expression folding routine.
After this patch with -O2 and a non-checking compiler, debug info
size
for some testcases decreases by about ~10% and overall compile time
and
memory usage decreases by ~2%.

        PR c++/96780

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
        fold calls to std::move/forward and other cast-like functions
        into simple casts.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
---
     gcc/cp/cp-gimplify.cc              | 36
+++++++++++++++++++++++++++-
     gcc/testsuite/g++.dg/opt/pr96780.C | 38
++++++++++++++++++++++++++++++
     2 files changed, 73 insertions(+), 1 deletion(-)
     create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..efc4c8f0eb9 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,9 +2756,43 @@ cp_fold (tree x)
           case CALL_EXPR:
           {
-       int sv = optimize, nw = sv;
        tree callee = get_callee_fndecl (x);
     +  /* "Inline" calls to std::move/forward and other cast-like
functions
+          by simply folding them into the corresponding cast
determined by
+          their return type.  This is cheaper than relying on the
middle-end
+          to do so, and also means we avoid generating useless debug
info for
+          them at all.
+
+          At this point the argument has already been converted into
a
+          reference, so it suffices to use a NOP_EXPR to express the
+          cast.  */
+       if (!flag_no_inline

In our conversation yesterday it occurred to me that we might make
this a
separate flag that defaults to the value of flag_no_inline; I was
thinking
of
-ffold-simple-inlines.  Then Vittorio et al can specify that
explicitly at
-O0
if they'd like.

Makes sense, like so?  Bootstrapped and regtested on
x86_64-pc-linux-gnu.

The patch defaults -ffold-simple-inlines according to the value of
flag_no_inline at startup.  IIUC this means that if the flag has been
defaulted to set, then e.g. an optimize("O0") function attribute won't
disable -ffold-simple-inlines for that function, since we only compute
its default value once.

I wonder if we therefore instead want to handle defaulting the flag
when it's used, e.g. check

     (flag_fold_simple_inlines == -1
      ? flag_no_inline
      : flag_fold_simple_inlines)

instead of

     flag_fold_simple_inlines

in cp_fold?

I guess that makes sense, we can't add front-end options to the
default_options_table.  But I think let's use OPTION_SET_P instead of
checking
for -1.

Done.


-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the
frontend's
general expression folding routine.  This behavior is controlled by a
new flag -ffold-simple-inlines which defaults to the value of
-fno-inline.

After this patch with -O2 and a non-checking compiler, debug info size
for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
~10% and overall compile time and memory usage decreases by ~2%.

Did you compare the reduction after handling more functions?

The numbers are roughly the same, which I guess is not too surprising
since calls to std::move/forward outnumber the other functions by about
10:1 in libstdc++, range-v3 and cmcstl2.

The biggest reduction in debug object file size (measured by du) I've
observed is 14% with range-v3's test/algorithm/stable_partition.cpp.
The biggest reduction in peak memory usage is (measured by /usr/bin/time -v)
is 5% with cmcstl's test/algorithm/set_symmetric_difference4.cpp.  The
biggest reduction in compile time (measured by perf stat) is about 3%,
also from that testcase.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine.  This behavior is controlled by a
new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
users can enable such folding even with -O0 (which implies -fno-inline).

After this patch with -O2 and a non-checking compiler, debug info size
for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
~10% and overall compile time and memory usage decreases by ~2%.

        PR c++/96780

gcc/c-family/ChangeLog:

        * c-opts.cc (c_common_post_options): Handle defaulting of
        flag_fold_simple_inlines.
        * c.opt: Add -ffold-simple-inlines.

Looks like you still need a doc/invoke.texi change for the new flag. The rest
of the patch looks good.

Like his perhaps?  I opted to document the current scope of the flag
(which only cares about a fixed set of functions) as opposed to its
future scope (folding all sufficiently simple inline functions).

OK.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine.  This behavior is controlled by a
new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
users can enable this folding with -O0 (which implies -fno-inline).

After this patch with -O2 and a non-checking compiler, debug info size
for some testcases from range-v3 and cmcstl2 decreases by as much as ~10%
and overall compile time and memory usage decreases by ~2%.

        PR c++/96780

gcc/c-family/ChangeLog:

        * c.opt: Add -ffold-simple-inlines.

gcc/ChangeLog:

        * doc/invoke.texi (C++ Dialect Options): Document
        -ffold-simple-inlines.

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
        std::move/forward and other cast-like functions into simple
        casts.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
---
  gcc/c-family/c.opt                 |  4 ++++
  gcc/cp/cp-gimplify.cc              | 38 +++++++++++++++++++++++++++++-
  gcc/doc/invoke.texi                | 10 ++++++++
  gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
  4 files changed, 89 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 9cfd2a6bc4e..9a4828ebe37 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables 
in a different translat
  fexternal-templates
  C++ ObjC++ WarnRemoved
+ffold-simple-inlines
+C++ ObjC++ Optimization Var(flag_fold_simple_inlines)
+Fold calls to simple inline functions.
+
  ffor-scope
  C++ ObjC++ WarnRemoved
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..e4c2644af15 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "file-prefix-map.h"
  #include "cgraph.h"
  #include "omp-general.h"
+#include "opts.h"
/* Forward declarations. */ @@ -2756,9 +2757,44 @@ cp_fold (tree x) case CALL_EXPR:
        {
-       int sv = optimize, nw = sv;
        tree callee = get_callee_fndecl (x);
+ /* "Inline" calls to std::move/forward and other cast-like functions
+          by simply folding them into a corresponding cast to their return
+          type.  This is cheaper than relying on the middle end to do so, and
+          also means we avoid generating useless debug info for them at all.
+
+          At this point the argument has already been converted into a
+          reference, so it suffices to use a NOP_EXPR to express the
+          cast.  */
+       if ((OPTION_SET_P (flag_fold_simple_inlines)
+            ? flag_fold_simple_inlines
+            : !flag_no_inline)
+           && call_expr_nargs (x) == 1
+           && decl_in_std_namespace_p (callee)
+           && DECL_NAME (callee) != NULL_TREE
+           && (id_equal (DECL_NAME (callee), "move")
+               || id_equal (DECL_NAME (callee), "forward")
+               || id_equal (DECL_NAME (callee), "addressof")
+               /* This addressof equivalent is used heavily in libstdc++.  */
+               || id_equal (DECL_NAME (callee), "__addressof")
+               || id_equal (DECL_NAME (callee), "as_const")))
+         {
+           r = CALL_EXPR_ARG (x, 0);
+           /* Check that the return and argument types are sane before
+              folding.  */
+           if (INDIRECT_TYPE_P (TREE_TYPE (x))
+               && INDIRECT_TYPE_P (TREE_TYPE (r)))
+             {
+               if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
+                 r = build_nop (TREE_TYPE (x), r);
+               x = cp_fold (r);
+               break;
+             }
+         }
+
+       int sv = optimize, nw = sv;
+
        /* Some built-in function calls will be evaluated at compile-time in
           fold ().  Set optimize to 1 when folding __builtin_constant_p inside
           a constexpr function so that fold_builtin_1 doesn't fold it to 0.  */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2a14e1a9472..d65979bba3f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3124,6 +3124,16 @@ On targets that support symbol aliases, the default is
  @option{-fextern-tls-init}.  On targets that do not support symbol
  aliases, the default is @option{-fno-extern-tls-init}.
+@item -ffold-simple-inlines
+@itemx -fno-fold-simple-inlines
+@opindex ffold-simple-inlines
+@opindex fno-fold-simple-inlines
+Permit the C++ frontend to fold calls to @code{std::move}, @code{std::forward},
+@code{std::addressof} and @code{std::as_const}.  In contrast to inlining, this
+means no debug information will be generated for such calls.  Since these
+functions are rarely interesting to debug, this flag is enabled by default
+unless @option{-fno-inline} is active.
+
  @item -fno-gnu-keywords
  @opindex fno-gnu-keywords
  @opindex fgnu-keywords
diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C 
b/gcc/testsuite/g++.dg/opt/pr96780.C
new file mode 100644
index 00000000000..61e11855eeb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr96780.C
@@ -0,0 +1,38 @@
+// PR c++/96780
+// Verify calls to std::move/forward are folded away by the frontend.
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" }
+
+#include <utility>
+
+struct A;
+
+extern A& a;
+extern const A& ca;
+
+void f() {
+  auto&& x1 = std::move(a);
+  auto&& x2 = std::forward<A>(a);
+  auto&& x3 = std::forward<A&>(a);
+
+  auto&& x4 = std::move(ca);
+  auto&& x5 = std::forward<const A>(ca);
+  auto&& x6 = std::forward<const A&>(ca);
+
+  auto x7 = std::addressof(a);
+  auto x8 = std::addressof(ca);
+#if __GLIBCXX__
+  auto x9 = std::__addressof(a);
+  auto x10 = std::__addressof(ca);
+#endif
+#if __cpp_lib_as_const
+  auto&& x11 = std::as_const(a);
+  auto&& x12 = std::as_const(ca);
+#endif
+}
+
+// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }

Reply via email to