https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564
--- Comment #17 from Patrick Palka <ppalka at gcc dot gnu.org> ---
The following patch by itself closes the gap between the C++ and C FEs, to make
compilation with the C++ FE at least as good as with the C FE:

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index b6e1d27..8b1d020 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -237,8 +237,8 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body,
       location_t cloc = EXPR_LOC_OR_LOC (cond, start_locus);
       exit = build1_loc (cloc, GOTO_EXPR, void_type_node,
                          get_bc_label (bc_break));
-      exit = fold_build3_loc (cloc, COND_EXPR, void_type_node, cond,
-                              build_empty_stmt (cloc), exit);
+      exit = build3_loc (cloc, COND_EXPR, void_type_node, cond,
+                         build_empty_stmt (cloc), exit);
     }
 
   if (exit && cond_is_first)

Here, when building the exit condition of a for-loop, fold_build3 decides to
swap the operands of the COND_EXPR (and invert its condition), which I suppose
causes subsequent optimization passes to generate worse code for these loops.
Avoid having this done by using build3 instead. This means that we won't fold
degenerate COND_EXPRs but I doubt that's a big deal in practice.

(An alternative fix would be to adjust tree_swap_operands_p to avoid returning
true in this case -- maybe return false if the TREE_TYPEs are VOID_TYPEs?)

I haven't checked -flto or an optimization setting besides -Ofast.
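
For illustration only, the two loop-exit shapes discussed above can be written out by hand in C. This is a source-level analogy, not GENERIC, and not code from the patch: the function names (sum_exit_in_else, sum_exit_in_then, check_shapes_agree) are hypothetical. The first function mirrors the COND_EXPR as build3 leaves it (empty then-arm, exit jump in the else-arm); the second mirrors the form after the operands are swapped and the condition inverted. Both compute the same result, so any difference would show up only in the generated code.

```c
#include <assert.h>

/* Shape as built without folding: if (cond) <empty> else goto break. */
int sum_exit_in_else(const int *a, int n)
{
    int i = 0, sum = 0;
    for (;;) {
        if (i < n) {
            /* empty then-arm, as with build_empty_stmt */
        } else {
            goto brk;           /* exit jump sits in the else-arm */
        }
        sum += a[i];
        i++;
    }
brk:
    return sum;
}

/* Shape after swapping the arms and inverting the condition:
   if (!cond) goto break else <empty>. */
int sum_exit_in_then(const int *a, int n)
{
    int i = 0, sum = 0;
    for (;;) {
        if (i >= n) {
            goto brk;           /* exit jump moved to the then-arm */
        } else {
            /* empty else-arm */
        }
        sum += a[i];
        i++;
    }
brk:
    return sum;
}

/* Sanity check that the two shapes are semantically equivalent. */
void check_shapes_agree(void)
{
    int data[] = { 3, 1, 4, 1, 5 };
    for (int n = 0; n <= 5; n++)
        assert(sum_exit_in_else(data, n) == sum_exit_in_then(data, n));
}
```

Compiling both functions at the optimization level of interest and comparing the assembly (for example with -S) is one way to see whether the arm order alone changes the output; whether it does for a given compiler version is not something this sketch asserts.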