[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #20 from Richard Biener --- Fixed.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #19 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:f99d7d669eaa2830eb5878df4da67e77ec791522 commit r13-5106-gf99d7d669eaa2830eb5878df4da67e77ec791522 Author: Richard Biener Date: Mon Jan 9 09:42:22 2023 +0100 tree-optimization/107767 - not profitable switch conversion When the CFG has not merged equal PHI defs in a switch stmt the cost model from switch conversion gets off and we prefer a jump table over branches. The following fixes that by recording cases that will be merged later and more appropriately counting unique values. PR tree-optimization/107767 * tree-cfgcleanup.cc (phi_alternatives_equal): Export. * tree-cfgcleanup.h (phi_alternatives_equal): Declare. * tree-switch-conversion.cc (switch_conversion::collect): Count unique non-default targets accounting for later merging opportunities. * gcc.dg/tree-ssa/pr107767.c: New testcase.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #18 from Richard Biener --- Mine then.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #17 from Martin Liška --- Oh, I didn't notice it's P1 :) Then sure, send it for this release.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P1 --- Comment #16 from Jakub Jelinek --- Why next stage1? We can still fix P1 regressions, don't we?
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #15 from Martin Liška --- @Richi: Please send the patch for switch conversion in the next stage 1.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 Richard Biener changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #14 from Richard Biener --- diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 250782b1140..41f48c30bb9 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -2713,7 +2713,9 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p) bool old_in_switch_expr = gimplify_ctxp->in_switch_expr; gimplify_ctxp->in_switch_expr = true; + gimple_push_condition (); gimplify_stmt (_BODY (switch_expr), _body_seq); + gimple_pop_condition (pre_p); gimplify_ctxp->in_switch_expr = old_in_switch_expr; maybe_warn_switch_unreachable_and_auto_init (switch_body_seq); "properly" adds early return predictors to switch returns and will result in the same pessimization. Cutting off early return predictor generation will make firewall2 produce via if-to-switch : dst_port_5 = MEM[(const uint16_t *)data_3(D) + 64B]; switch (dst_port_5) [INV], case 1: [INV], case 2: [INV], case 3: [INV], case 15: [INV], case 23: [INV], case 42: [INV], case 45: [INV], case 47: [INV]> : : : # _2 = PHI <1(3), 0(2)> : return _2; and retaining a bit test. Note that after stripping predict hints it takes tail-merging to unify the forwarders, this is not something done by CFG cleanup. That's because in this case all forwarders have '1' as the PHI argument but the single non-forwarder has '0'. CFG cleanup doesn't redirect forwarders to duplicates. The default label doesn't have an early return prediction (the return doesn't happen in conditional context as far as gimplification is concerned). If it had a forwarder as well which forwarder CFG cleanup would remove would be "random". Note this all would still happen too late for the early switch conversion pass. It might be possible to alter the switch conversion heuristics in ::collect to handle empty_block_p forwarders specially, counting the number of forwarders with distinct m_final_bb PHI argument sets. Like with the following. diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc index b4869aee78d..38e40eca164 100644 --- a/gcc/tree-cfgcleanup.cc +++ b/gcc/tree-cfgcleanup.cc @@ -450,7 +450,7 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted) those alternatives are equal in each of the PHI nodes, then return true, else return false. */ -static bool +bool phi_alternatives_equal (basic_block dest, edge e1, edge e2) { int n1 = e1->dest_idx; diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc index 1d75d7c7fc7..6d2889f6c5a 100644 --- a/gcc/tree-switch-conversion.cc +++ b/gcc/tree-switch-conversion.cc @@ -69,6 +69,9 @@ switch_conversion::switch_conversion (): m_final_bb (NULL), { } +bool +phi_alternatives_equal (basic_block dest, edge e1, edge e2); + /* Collection information about SWTCH statement. */ void @@ -132,6 +135,8 @@ switch_conversion::collect (gswitch *swtch) /* Require that all switch destinations are either that common FINAL_BB or a forwarder to it, except for the default case if contiguous range. */ + auto_vec fw_edges; + m_uniq = 0; if (m_final_bb) FOR_EACH_EDGE (e, ei, m_switch_bb->succs) { @@ -141,7 +146,26 @@ switch_conversion::collect (gswitch *swtch) if (single_pred_p (e->dest) && single_succ_p (e->dest) && single_succ (e->dest) == m_final_bb) - continue; + { + if (empty_block_p (e->dest)) + { + /* For empty blocks consider forwarders with equal + PHI arguments in m_final_bb as unique. */ + for (unsigned i = 0; i < fw_edges.length (); ++i) + if (phi_alternatives_equal (m_final_bb, fw_edges[i], e)) + break; + if (i == fw_edges.length ()) + { + /* But limit the above possibly quadratic search. */ + if (fw_edges.length () < 10) + fw_edges.quick_push (e); + m_uniq++; + } + } + else + m_uniq++; + continue; + } if (e == e_default && m_contiguous_range) { @@ -168,11 +192,6 @@ switch_conversion::collect (gswitch *swtch) && ! tree_int_cst_equal (CASE_LOW (elt), CASE_HIGH (elt))) m_count++; } - - /* Get the number of unique non-default targets out of the GIMPLE_SWITCH - block. Assume a CFG cleanup would have already removed degenerate - switch statements, this allows us to just use EDGE_COUNT. */ - m_uniq = EDGE_COUNT (gimple_bb (swtch)->succs) - 1; } /* Checks whether the range given by individual case statements of the switch
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #13 from Martin Liška --- > As a hack I've removed them manually: > FOR_EACH_BB_FN (bb, cfun) > { > gimple_stmt_iterator gsi = gsi_after_labels (bb); > if (!gsi_end_p (gsi) && gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT) > gsi_remove (, true); > } > in pass_if_to_switch::execute before return TODO_cleanup_cfg;, but that > didn't help. @Richi: Can you please take a look why TODO_cleanup_cfg doesn't help us here?
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #12 from Jakub Jelinek --- (In reply to Richard Biener from comment #10) > Looks like a general CFG optimization if we have N forwarders to a PHI with > the same value then we can merge them to one (it's enough to have a single > forwarder which we can use as the remaining one even). Note that's kind-of > a reverse > mergephi for same values. > > PHI > > -> > > forwarder > | > PHI > > CFG cleanup will then eventually elide forwarders into the added forwarder. Are those actually forwarder blocks though? Doesn't the GIMPLE_PREDICT statement at the start of each one of them prevent tree_forwarder_block_p from returning true? As a hack I've removed them manually: FOR_EACH_BB_FN (bb, cfun) { gimple_stmt_iterator gsi = gsi_after_labels (bb); if (!gsi_end_p (gsi) && gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT) gsi_remove (, true); } in pass_if_to_switch::execute before return TODO_cleanup_cfg;, but that didn't help.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #11 from Jakub Jelinek --- (In reply to Martin Liška from comment #7) > However, the CFG is not collapsed and thus it fails due to: > > bool > bit_test_cluster::is_beneficial (unsigned count, unsigned uniq) > { > return (((uniq == 1 && count >= 3) > || (uniq == 2 && count >= 5) > || (uniq == 3 && count >= 6))); > } > > as count == 7. and so tree-switch-conversion happens. So one can mitigate > that with: > 1) use switch statement instead of if series > 2) reduce -param=switch-conversion-max-branch-ratio= that will not create so > big CSWTCH array > 3) disable tree-switch-conversion pass But that just means that either iftoswitch should happen earlier or switchconv later so that some other pass would remove the redundancies, or iftoswitch should perform them when it optimizes something, or switchconv should not rely on the collapsing being done already and do it itself. In the #c9 testcase, it is cleanup_tree_cfg (TODO_update_ssa) during ccp1 pass that optimizes it. But cleanup_tree_cfg (0) during iftoswitch for firewall2 doesn't do that for some reason.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #10 from Richard Biener --- Looks like a general CFG optimization if we have N forwarders to a PHI with the same value then we can merge them to one (it's enough to have a single forwarder which we can use as the remaining one even). Note that's kind-of a reverse mergephi for same values. PHI -> forwarder | PHI CFG cleanup will then eventually elide forwarders into the added forwarder.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #9 from Jakub Jelinek --- Indeed, comparing that to int firewall3(const uint8_t *restrict data) { const uint16_t dst_port = *((const uint16_t *)data + 32); switch (dst_port) { case 15: return 1; case 23: return 1; case 47: return 1; case 45: return 1; case 42: return 1; case 1: return 1; case 2: return 1; case 3: return 1; default: return 0; } } this one is optimized correctly. In iftoswitch dump this one is much simpler though: dst_port_6 = MEM[(const uint16_t *)data_4(D) + 64B]; switch (dst_port_6) [INV], case 1 ... 3: [INV], case 15: [INV], case 23: [INV], case 42: [INV], case 45: [INV], case 47: [INV]> : : : # _3 = PHI <1(2), 0(3)> : return _3; vs. dst_port_5 = MEM[(const uint16_t *)data_3(D) + 64B]; switch (dst_port_5) [INV], case 1: [INV], case 2: [INV], case 3: [INV], case 15: [INV], case 23: [INV], case 42: [INV], case 45: [INV], case 47: [INV]> : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. : # _2 = PHI <1(4), 1(5), 1(3), 1(10), 1(9), 1(8), 1(7), 1(6), 0(2)> : return _2; but I guess switchconv should be able to handle both the same.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #8 from Коренберг Марк --- Okay, but why switch-case is not handled using fast implementation using masks (when difference between smallest and biggest integer <=64 ? See the first function in my first message where it works as expected. Seems, the problem is not in converting to switch-case, but missing optimisation for switch-case case.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 Martin Liška changed: What|Removed |Added Assignee|marxin at gcc dot gnu.org |unassigned at gcc dot gnu.org Status|ASSIGNED|NEW --- Comment #7 from Martin Liška --- So what happens with the current master: we first convert the if-else-if series to switch in iftoswitch pass: dst_port_5 = MEM[(const uint16_t *)data_3(D) + 64B]; switch (dst_port_5) [INV], case 1: [INV], case 2: [INV], case 3: [INV], case 15: [INV], case 23: [INV], case 42: [INV], case 45: [INV], case 47: [INV]> : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. goto ; [INV] : : // predicted unlikely by early return (on trees) predictor. : # _2 = PHI <1(4), 1(5), 1(3), 1(10), 1(9), 1(8), 1(7), 1(6), 0(2)> then convert tree-switch-conversion which prefers bit test if possible. However, the CFG is not collapsed and thus it fails due to: bool bit_test_cluster::is_beneficial (unsigned count, unsigned uniq) { return (((uniq == 1 && count >= 3) || (uniq == 2 && count >= 5) || (uniq == 3 && count >= 6))); } as count == 7. and so tree-switch-conversion happens. So one can mitigate that with: 1) use switch statement instead of if series 2) reduce -param=switch-conversion-max-branch-ratio= that will not create so big CSWTCH array 3) disable tree-switch-conversion pass
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #6 from Martin Liška --- > this code should not be treated as if-else-if-else-if. Why? It's an equivalent representation as all ifs have a return statement. One another equivalent representation is: switch (dst_port) { case 15: case 23: case 47: case 45: case 42: case 1: case 2: case 3: return 1; default: return 0; }
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #5 from Коренберг Марк --- Not only -s problem. I think -O3 in gcc 12.2 will run faster than -O3 in gcc 13 (for this case). this code should not be treated as if-else-if-else-if. gcc 12 does its job right.
[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767 --- Comment #4 from Martin Liška --- So the optimization is enabled after r13-1184-g57424087e82db140 where we newly convert if-else-if series to switch, that's later converted by switch conversion pass to the array lookup. Now the question is why the CSWITCH happens with -Os.