Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Jul 8, 2019 at 4:41 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Sun, Jul 7, 2019 at 9:07 PM Jeff Law <l...@redhat.com> wrote:
>> >>
>> >> On 7/7/19 3:45 AM, Richard Sandiford wrote:
>> >> > DCE tries to delete dead stores to local data and also tries to insert
>> >> > debug binds for simple cases:
>> >> >
>> >> >   /* If this is a store into a variable that is being optimized away,
>> >> >      add a debug bind stmt if possible.  */
>> >> >   if (MAY_HAVE_DEBUG_BIND_STMTS
>> >> >       && gimple_assign_single_p (stmt)
>> >> >       && is_gimple_val (gimple_assign_rhs1 (stmt)))
>> >> >     {
>> >> >       tree lhs = gimple_assign_lhs (stmt);
>> >> >       if ((VAR_P (lhs) || TREE_CODE (lhs) == PARM_DECL)
>> >> >           && !DECL_IGNORED_P (lhs)
>> >> >           && is_gimple_reg_type (TREE_TYPE (lhs))
>> >> >           && !is_global_var (lhs)
>> >> >           && !DECL_HAS_VALUE_EXPR_P (lhs))
>> >> >         {
>> >> >           tree rhs = gimple_assign_rhs1 (stmt);
>> >> >           gdebug *note
>> >> >             = gimple_build_debug_bind (lhs, unshare_expr (rhs), stmt);
>> >> >           gsi_insert_after (i, note, GSI_SAME_STMT);
>> >> >         }
>> >> >     }
>> >> >
>> >> > But this doesn't help for things like "print *ptr" when ptr points
>> >> > to the local variable (tests Og-dce-1.c and Og-dce-2.c).  It also tends
>> >> > to make the *live* -- and thus useful -- values optimised out, because
>> >> > we can't yet switch back to tracking the memory location as it evolves
>> >> > over time (test Og-dce-3.c).
>> >> >
>> >> > So for -Og I think it'd be better not to delete any stmts with
>> >> > vdefs for now.  This also means that we can avoid the potentially
>> >> > expensive vop walks (which already have a cut-off, but still).
>> >> >
>> >> > The patch also fixes the Og failures in gcc.dg/guality/pr54970.c
>> >> > (PR 86638).
>> >> >
>> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> >> >
>> >> > Richard
>> >> >
>> >> >
>> >> > 2019-07-07  Richard Sandiford  <richard.sandif...@arm.com>
>> >> >
>> >> > gcc/
>> >> >       PR debug/86638
>> >> >       * tree-ssa-dce.c (keep_all_vdefs_p): New function.
>> >> >       (mark_stmt_if_obviously_necessary): Mark all stmts with vdefs as
>> >> >       necessary if keep_all_vdefs_p is true.
>> >> >       (mark_aliased_reaching_defs_necessary): Add a gcc_checking_assert
>> >> >       that keep_all_vdefs_p is false.
>> >> >       (mark_all_reaching_defs_necessary): Likewise.
>> >> >       (propagate_necessity): Skip the vuse scan if keep_all_vdefs_p is
>> >> >       true.
>> >> >
>> >> > gcc/testsuite/
>> >> >       * c-c++-common/guality/Og-dce-1.c: New test.
>> >> >       * c-c++-common/guality/Og-dce-2.c: Likewise.
>> >> >       * c-c++-common/guality/Og-dce-3.c: Likewise.
>> >> OK
>> >
>> > I wonder how code size (and compile-time) is affected by the DSE/DCE patch?
>> > Say just look at -Og built cc1?
>>
>> Overall I see a ~2.5% slowdown and a 4.7% increase in load size.
>> That comes almost entirely from the (RTL) DSE side; this patch
>> and gimple DSE part don't seem to make much difference.
>>
>> If I keep the gimple passes as-is and just disable RTL DSE, the slowdown
>> is still ~2.5% and there's a 4.4% increase in load size.
>>
>> These are all measuring cc1plus (built from post-patch sources)
>> and using -O2 -g tree-into-ssa.ii for the speed checks.
>>
>> > Can you restrict the keep-all-vdefs to user variables (and measure the
>> > difference this makes)?
>>
>> In order to avoid wrong debug for pointer dereferences, I think it would
>> have to be keep-all-vdefs for writes to either user variables or unknown
>> locations.  But as above, I can't measure a significant difference with
>> the patch.
>>
>> > Again I wonder if this makes C++ with -Og impractical runtime-wise.
>>
>> Got a particular test in mind?
>
> Nothing specific - there are a few C/C++ benchmarks in SPEC and there's
> also tramp3d-v4.  I guess SRA is much more important for the abstraction
> penalty than DSE - FRE should be able to remove the abstraction, just the
> dead stores will remain (but they'd probably nicely execute out-of-order).
>
> Anyway, the biggest runtime penalty from -Og is probably not running
> any loop optimization (invariant motion mostly).

Finally tried it on tramp3d-v4, and I see a slowdown of ~1.6%.

Thanks,
Richard
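
For reference, the "print *ptr" case mentioned in the quoted patch description can be sketched roughly as below. This is only a minimal illustration, not the contents of Og-dce-1.c or any other test in the patch; the names consume and dead_store_demo are hypothetical.

```c
/* Hypothetical illustration of a dead store to local data made
   through a pointer.  Not taken from the GCC testsuite.  */

static int
consume (int x)
{
  return x * 2;
}

int
dead_store_demo (int seed)
{
  int local = seed;            /* Store that is read below via ptr.  */
  int *ptr = &local;
  int result = consume (*ptr);

  *ptr = seed + 1;             /* Never read again: a dead store with a vdef.  */

  /* With a breakpoint on the return, a user might try "print *ptr"
     and expect to see seed + 1.  A debug bind on "local" alone does
     not describe what ptr points to.  */
  return result;
}
```

The idea is that building something like this with -Og -g and stopping at the return shows whether the final store is still visible through the pointer. What a given GCC version actually prints there is not something this sketch claims to know.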