[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #20 from Richard Biener  ---
Fixed.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:f99d7d669eaa2830eb5878df4da67e77ec791522

commit r13-5106-gf99d7d669eaa2830eb5878df4da67e77ec791522
Author: Richard Biener 
Date:   Mon Jan 9 09:42:22 2023 +0100

tree-optimization/107767 - not profitable switch conversion

When the CFG has not merged equal PHI defs in a switch stmt the
cost model from switch conversion gets off and we prefer a
jump table over branches.  The following fixes that by recording
cases that will be merged later and more appropriately counting
unique values.

PR tree-optimization/107767
* tree-cfgcleanup.cc (phi_alternatives_equal): Export.
* tree-cfgcleanup.h (phi_alternatives_equal): Declare.
* tree-switch-conversion.cc (switch_conversion::collect):
Count unique non-default targets accounting for later
merging opportunities.

* gcc.dg/tree-ssa/pr107767.c: New testcase.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #18 from Richard Biener  ---
Mine then.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-06 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #17 from Martin Liška  ---
Oh, I didn't notice it's P1 :) Then sure, send it for this release.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1

--- Comment #16 from Jakub Jelinek  ---
Why next stage1?  We can still fix P1 regressions, don't we?

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-23 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #15 from Martin Liška  ---
@Richi: Please send the patch for switch conversion in the next stage 1.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #14 from Richard Biener  ---
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 250782b1140..41f48c30bb9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -2713,7 +2713,9 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
   bool old_in_switch_expr = gimplify_ctxp->in_switch_expr;
   gimplify_ctxp->in_switch_expr = true;

+  gimple_push_condition ();
   gimplify_stmt (_BODY (switch_expr), _body_seq);
+  gimple_pop_condition (pre_p);

   gimplify_ctxp->in_switch_expr = old_in_switch_expr;
   maybe_warn_switch_unreachable_and_auto_init (switch_body_seq);

"properly" adds early return predictors to switch returns and will result in
the same pessimization.  Cutting off early return predictor generation will
make firewall2 produce via if-to-switch

   :
  dst_port_5 = MEM[(const uint16_t *)data_3(D) + 64B];
  switch (dst_port_5)  [INV], case 1:  [INV], case 2: 
[INV], case 3:  [INV], case 15:  [INV], case 23:  [INV], case
42:  [INV], case 45:  [INV], case 47:  [INV]>

   :
:

   :
  # _2 = PHI <1(3), 0(2)>
:
  return _2;

and retaining a bit test.

Note that after stripping predict hints it takes tail-merging to unify
the forwarders, this is not something done by CFG cleanup.  That's
because in this case all forwarders have '1' as the PHI argument but
the single non-forwarder has '0'.  CFG cleanup doesn't redirect
forwarders to duplicates.  The default label doesn't have an
early return prediction (the return doesn't happen in conditional
context as far as gimplification is concerned).  If it had a forwarder
as well which forwarder CFG cleanup would remove would be "random".

Note this all would still happen too late for the early switch conversion
pass.

It might be possible to alter the switch conversion heuristics in ::collect
to handle empty_block_p forwarders specially, counting the number of
forwarders with distinct m_final_bb PHI argument sets.  Like with the
following.

diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
index b4869aee78d..38e40eca164 100644
--- a/gcc/tree-cfgcleanup.cc
+++ b/gcc/tree-cfgcleanup.cc
@@ -450,7 +450,7 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
those alternatives are equal in each of the PHI nodes, then return
true, else return false.  */

-static bool
+bool
 phi_alternatives_equal (basic_block dest, edge e1, edge e2)
 {
   int n1 = e1->dest_idx;
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 1d75d7c7fc7..6d2889f6c5a 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -69,6 +69,9 @@ switch_conversion::switch_conversion (): m_final_bb (NULL),
 {
 }

+bool
+phi_alternatives_equal (basic_block dest, edge e1, edge e2);
+
 /* Collection information about SWTCH statement.  */

 void
@@ -132,6 +135,8 @@ switch_conversion::collect (gswitch *swtch)
   /* Require that all switch destinations are either that common
  FINAL_BB or a forwarder to it, except for the default
  case if contiguous range.  */
+  auto_vec fw_edges;
+  m_uniq = 0;
   if (m_final_bb)
 FOR_EACH_EDGE (e, ei, m_switch_bb->succs)
   {
@@ -141,7 +146,26 @@ switch_conversion::collect (gswitch *swtch)
if (single_pred_p (e->dest)
&& single_succ_p (e->dest)
&& single_succ (e->dest) == m_final_bb)
- continue;
+ {
+   if (empty_block_p (e->dest))
+ {
+   /* For empty blocks consider forwarders with equal
+  PHI arguments in m_final_bb as unique.  */
+   for (unsigned i = 0; i < fw_edges.length (); ++i)
+ if (phi_alternatives_equal (m_final_bb, fw_edges[i], e))
+   break;
+   if (i == fw_edges.length ())
+ {
+   /* But limit the above possibly quadratic search.  */
+   if (fw_edges.length () < 10)
+ fw_edges.quick_push (e);
+   m_uniq++;
+ }
+ }
+   else
+ m_uniq++;
+   continue;
+ }

if (e == e_default && m_contiguous_range)
  {
@@ -168,11 +192,6 @@ switch_conversion::collect (gswitch *swtch)
  && ! tree_int_cst_equal (CASE_LOW (elt), CASE_HIGH (elt)))
m_count++;
 }
-
-  /* Get the number of unique non-default targets out of the GIMPLE_SWITCH
- block.  Assume a CFG cleanup would have already removed degenerate
- switch statements, this allows us to just use EDGE_COUNT.  */
-  m_uniq = EDGE_COUNT (gimple_bb (swtch)->succs) - 1;
 }

 /* Checks whether the range given by individual case statements of the switch

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #13 from Martin Liška  ---
> As a hack I've removed them manually:
> FOR_EACH_BB_FN (bb, cfun)
> {
> gimple_stmt_iterator gsi = gsi_after_labels (bb);
> if (!gsi_end_p (gsi) && gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT)
> gsi_remove (, true);
> }
> in pass_if_to_switch::execute before return TODO_cleanup_cfg;, but that
> didn't help.

@Richi: Can you please take a look why TODO_cleanup_cfg doesn't help us here?

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #12 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #10)
> Looks like a general CFG optimization if we have N forwarders to a PHI with
> the same value then we can merge them to one (it's enough to have a single
> forwarder which we can use as the remaining one even).  Note that's kind-of
> a reverse
> mergephi for same values.
> 
>  PHI 
> 
> ->
> 
>  forwarder
> |
>  PHI 
> 
> CFG cleanup will then eventually elide forwarders into the added forwarder.

Are those actually forwarder blocks though?
Doesn't the GIMPLE_PREDICT statement at the start of each one of them prevent
tree_forwarder_block_p from returning true?
As a hack I've removed them manually:
FOR_EACH_BB_FN (bb, cfun)
{
gimple_stmt_iterator gsi = gsi_after_labels (bb);
if (!gsi_end_p (gsi) && gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT)
gsi_remove (, true);
}
in pass_if_to_switch::execute before return TODO_cleanup_cfg;, but that didn't
help.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #11 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #7)
> However, the CFG is not collapsed and thus it fails due to:
> 
> bool
> bit_test_cluster::is_beneficial (unsigned count, unsigned uniq)
> {
>   return (((uniq == 1 && count >= 3)
>  || (uniq == 2 && count >= 5)
>  || (uniq == 3 && count >= 6)));
> }
> 
> as count == 7. and so tree-switch-conversion happens. So one can mitigate
> that with:
> 1) use switch statement instead of if series
> 2) reduce -param=switch-conversion-max-branch-ratio= that will not create so
> big CSWTCH array
> 3) disable tree-switch-conversion pass

But that just means that either iftoswitch should happen earlier or switchconv
later
so that some other pass would remove the redundancies, or iftoswitch should
perform them when it optimizes something, or switchconv should not rely on the
collapsing being done already and do it itself.
In the #c9 testcase, it is cleanup_tree_cfg (TODO_update_ssa) during ccp1 pass
that optimizes it.  But cleanup_tree_cfg (0) during iftoswitch for firewall2
doesn't do that for some reason.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #10 from Richard Biener  ---
Looks like a general CFG optimization if we have N forwarders to a PHI with the
same value then we can merge them to one (it's enough to have a single
forwarder which we can use as the remaining one even).  Note that's kind-of a
reverse
mergephi for same values.

 PHI 

->

 forwarder
|
 PHI 

CFG cleanup will then eventually elide forwarders into the added forwarder.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #9 from Jakub Jelinek  ---
Indeed, comparing that to
int firewall3(const uint8_t *restrict data) {
const uint16_t dst_port = *((const uint16_t *)data + 32);

switch (dst_port) {
case 15: return 1;
case 23: return 1;
case 47: return 1;
case 45: return 1;
case 42: return 1;
case 1: return 1;
case 2: return 1;
case 3: return 1;
default: return 0;
}
}
this one is optimized correctly.
In iftoswitch dump this one is much simpler though:
  dst_port_6 = MEM[(const uint16_t *)data_4(D) + 64B];
  switch (dst_port_6)  [INV], case 1 ... 3:  [INV], case 15:
 [INV], case 23:  [INV], case 42:  [INV], case 45:  [INV],
case 47:  [INV]>

   :
:

   :
  # _3 = PHI <1(2), 0(3)>
:
  return _3;
vs.
  dst_port_5 = MEM[(const uint16_t *)data_3(D) + 64B];
  switch (dst_port_5)  [INV], case 1:  [INV], case 2: 
[INV], case 3:  [INV], case 15:  [INV], case 23:  [INV], case
42:  [INV], case 45:
  [INV], case 47:  [INV]>

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.

   :
  # _2 = PHI <1(4), 1(5), 1(3), 1(10), 1(9), 1(8), 1(7), 1(6), 0(2)>
:
  return _2;

but I guess switchconv should be able to handle both the same.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread socketpair at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #8 from Коренберг Марк  ---
Okay, but why switch-case is not handled using fast implementation using masks
(when difference between smallest and biggest integer <=64 ?

See the first function in my first message where it works as expected.

Seems, the problem is not in converting to switch-case, but missing
optimisation for switch-case case.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Martin Liška  changed:

   What|Removed |Added

   Assignee|marxin at gcc dot gnu.org  |unassigned at gcc dot 
gnu.org
 Status|ASSIGNED|NEW

--- Comment #7 from Martin Liška  ---
So what happens with the current master: we first convert the if-else-if series
to switch in iftoswitch pass:

  dst_port_5 = MEM[(const uint16_t *)data_3(D) + 64B];
  switch (dst_port_5)  [INV], case 1:  [INV], case 2: 
[INV], case 3:  [INV], case 15:  [INV], case 23:  [INV], case
42:  [INV], case 45:  [INV], case 47:  [INV]>

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

   :
:
  // predicted unlikely by early return (on trees) predictor.

   :
  # _2 = PHI <1(4), 1(5), 1(3), 1(10), 1(9), 1(8), 1(7), 1(6), 0(2)>

then convert tree-switch-conversion which prefers bit test if possible.
However, the CFG is not collapsed and thus it fails due to:

bool
bit_test_cluster::is_beneficial (unsigned count, unsigned uniq)
{
  return (((uniq == 1 && count >= 3)
   || (uniq == 2 && count >= 5)
   || (uniq == 3 && count >= 6)));
}

as count == 7. and so tree-switch-conversion happens. So one can mitigate that
with:
1) use switch statement instead of if series
2) reduce -param=switch-conversion-max-branch-ratio= that will not create so
big CSWTCH array
3) disable tree-switch-conversion pass

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #6 from Martin Liška  ---
> this code should not be treated as if-else-if-else-if.

Why? It's an equivalent representation as all ifs have a return statement. One
another equivalent representation is:

switch (dst_port)
{
  case 15:
  case 23:
  case 47:
  case 45:
  case 42:
  case 1:
  case 2:
  case 3:
return 1;
  default:
return 0;
}

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread socketpair at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #5 from Коренберг Марк  ---
Not only -s problem. I think -O3 in gcc 12.2 will run faster than -O3 in gcc 13
(for this case). this code should not be treated as if-else-if-else-if. gcc 12
does its job right.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2022-12-02 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #4 from Martin Liška  ---
So the optimization is enabled after r13-1184-g57424087e82db140 where we newly
convert if-else-if series to switch, that's later converted by switch
conversion pass to the array lookup.

Now the question is why the CSWITCH happens with -Os.