Re: [PATCH] Loop unswitching: support gswitch statements.

Martin Liška Thu, 06 Jan 2022 08:31:07 -0800

On 1/5/22 13:34, Richard Biener wrote:

On Thu, Dec 9, 2021 at 2:02 PM Martin Liška <[email protected]> wrote:


On 11/30/21 12:17, Richard Biener wrote:

I'd like to see the gswitch support - that's what was posted before stage3
close, this patch on its own doesn't seem worth pushing for.  That said,
I have some comments below (and the already raised ones about how
things might need to change with gswitch support).  Is it so difficult to
develop gswitch support as a separate change ontop of this?


Hello.

Took me some time, but I have a working version of the patch that makes both
refactoring of the costing model and adds support for gswitch. For quite some
time, I maintained 2 patches, but as commonly happens, I was forced doing quite
some rework. If we really want a separation for bisection purpose, I suggest 
simple
disabling of gswitch support?


It was really meant to ease review.


Sure, but keeping a separation if the final shape is changing is difficult :P

I'm now looking at the combined
patch, comments will
follow.

+static void
+clean_up_after_unswitching (const auto_edge_flag &ignored_edge_flag)
+{
+  basic_block bb;
+
...
+  /* Clean up the ignored_edge_flag from edges.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      edge e;

you can probably clean up outgoing edge flags in the loop above?


Done.


(I'm randomly jumping)

+             /* Build compound expression for all cases leading
+                to this edge.  */
+             tree expr = NULL_TREE;
+             for (unsigned i = 1; i < gimple_switch_num_labels (stmt); ++i)
+               {
+                 tree lab = gimple_switch_label (stmt, i);
+                 basic_block dest = label_to_block (cfun, CASE_LABEL (lab));
+                 edge e2 = find_edge (gimple_bb (stmt), dest);
+                 if (e == e2)

just as a style note I prefer if (e != e2) continue; so the following code
needs less indentation


Likewise.


+                         tree cmp1 = build2 (GE_EXPR, boolean_type_node, idx,
+                                             CASE_LOW (lab));

is there a reason you are not using fold_build2?


Changed.

Do we want to somehow account
for the built expression size or maybe have a separate limit for the number of
cases we want to combine this way?


Huh, I would ignore it for now, but yes, can be accounted as well.


+                 unswitch_predicate *predicate
+                   = new unswitch_predicate (expr, idx, edge_index);
+                 ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+                                                        idx,
*get_global_range_query ());
+                 /* Huge switches are not supported by Ranger.  */
+                 if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?


... it's been discussed in a different sub-thread.

But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?


Yes, but I think the situation (huge switches) will be rarely an interesting
optimization opportunity. Plus we would have to find a hot case in such switch.
That's currently ignored by the pass.


+                   {
+                     delete predicate;
+                     return;

In this context, when we do

+      if (operand_equal_p (predicate->lhs, last_predicate->lhs, 0))
+       {
+         irange &other = (true_edge ? predicate->merged_true_range
+                          : predicate->merged_false_range);
+         last_predicate->merged_true_range.intersect (other);
+         last_predicate->merged_false_range.intersect (other);
+         return;

ranger may be conservative when intersecting (and hopefully not end up
with undefined_p - heh).  I also am confused about intersecting both
merged_true_range and merged_false_range with 'other'?


other is a predicate for a SSA_NAME _x_ that we already switched on. And thus
we can improve both predicates for true/false edge that are also related to
the same SSA_NAME _x_.

I would
have expected to merge true edge info with true edge info and thus
only "swap" things somehow?  OTOH "path" suggests we're dealing
with more than one edge and associated ranges?


Yep, path means the we already did a series of unswitching like:

- if (a > 10) TRUE
- if (b == 10) FALSE
- if (c > 0) TRUE

Maybe expanding
the comment on the predicate_vector would be useful.  AFAIR we


Sure, will do that.

there store the sequence of unswitchings done with pairs of
the predicate unswitched on and a bool indicating whether we're
dealing with the copy that had the predicate true or the one that
had it false.


Exactly.


+  unswitch_predicate *predicate = NULL;
+  basic_block bb = NULL;
+  if (num > param_max_unswitch_level)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                        "Not unswitching anymore, hit max level\n");
+      goto exit;
+    }

I'll notice that given we have the overall size budget limiting the number
of unswitchings itself is probably unnecessary (as noted above we might
need to account for the size of the unswitch condition).


Removed.


+  for (unsigned i = 0; i != loop->num_nodes; i++)
+    {
+      vec<unswitch_predicate *> &predicates = get_predicates_for_bb (bbs[i]);
+      if (!predicates.is_empty ())
+       {

same comment about indenting


Done.


I wonder whether evaluate_control_stmt_using_entry_checks could set
ignored_edge_flag itself instead of communicating via a hash_set?


Note the function is used in 2 contexts (costing and simplification) and for
the costing, we modify edges as we want to evaluate all opportunities in 
parallel.


It's not exactly clear what we use pred->handled for, do we re-discover
and re-try predicates when unswitching another level otherwise?


We discover them only once and reuse them via gimple_uid in loop versions.
What's tricky is that switch predicates can be unswitched multiple times
and that's why we can't remove them.

 But
we _do_ want to unswitch the same switch stmt again, no?


Yes, we do.

And since
the BB predicate is keyed on the switch stmt that wouldn't work?
I wonder whether on recursive invocations of tree_unswitch_single_loop
we'd want to skip over unreachable BBs in the loop when looking
for candidates (when we compute BB predicates we skip over them
but that's done before all cases).


As mentioned, they are discivered only once before we start unswitching.

Ah, the BB predicates are a vector,
I wonder why we not simply remove the predicate from the BB predicate
vector when we decide to unswitch on it and thus append it to a predicate
path?


As explained above, it's because switches can have >2 edges.


Can you please amend the toplevel file comment as to the new flow of
things?


Sure, will do that and that should really help.


For the switch unswitching testcases I wonder if we can add dump scanning
to verify we do not end up with duplicate cases across the loop versions?


Yep, will do that.


Overall it looks reasonable but I hope for some simplification still.  I also
think it's something for stage1 now :/


Fully agree, that's stage1 material.


After addressing comments above can you put the patch on a branch
so I can play & fixup things a bit when I run into them?  Often it's easier
to just do a change instead of writing up a review comment and expecting
that to be point-on ;)


I really welcome that, I've pushed devel/loop-unswitch-support-switches
branch with first changes you pointed out. Feel free playing with the branch.

Thanks,
Martin


Thanks,
Richard.


Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
SPEC 2006 and 2017 survive running -O3 -march=native. There are 2090 
optimization notes
for SPEC 2006 (compared to 1945 before the patch).

Thoughts?
Thanks,
Martin

Re: [PATCH] Loop unswitching: support gswitch statements.

Reply via email to