On 11/01/2016 10:27 AM, Segher Boessenkool wrote:
For code like the testcase in PR71785, GCC factors all the indirect branches
into a single dispatcher that everything then jumps to.  This is because
having many indirect branches, each with many jump targets, does not scale
in large parts of the compiler.  Very late in the pass pipeline (right
before peephole2) the indirect branches are unfactored again by
the duplicate_computed_gotos pass.
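
For readers unfamiliar with the kind of source involved, here is a minimal
sketch of a computed-goto interpreter.  It is not the testcase from the PR;
the function name toy_run and the four opcodes are made up for illustration.
It relies on the GNU C labels-as-values extension, so it needs GCC or a
compatible compiler.  Each handler ends in its own `goto *`, and these are
the indirect branches that get factored into one dispatcher and later
unfactored.

```c
/* Toy bytecode interpreter using GNU C computed gotos (labels as values).
   Opcodes (hypothetical): 0 = halt, 1 = increment, 2 = decrement, 3 = double.
   The caller must pass a program that contains only these opcodes and
   ends with a halt; anything else indexes past the table.  */
static int toy_run(const unsigned char *code)
{
    static void *const dispatch[] = { &&op_halt, &&op_inc, &&op_dec, &&op_dbl };
    int acc = 0;

    /* Initial dispatch.  */
    goto *dispatch[*code++];

op_inc:
    acc++;
    goto *dispatch[*code++];   /* one indirect jump per handler */
op_dec:
    acc--;
    goto *dispatch[*code++];
op_dbl:
    acc *= 2;
    goto *dispatch[*code++];
op_halt:
    return acc;
}
```

With the program { 1, 1, 3, 2, 0 } (inc, inc, double, dec, halt) the
accumulator goes 1, 2, 4, 3 and toy_run returns 3.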

This pass works by replacing each branch to such a common dispatcher with a
copy of the dispatcher.  For code like this testcase that does not work
so well: most cases, but not all, execute a single addition instruction
right before the dispatcher, and we end up with only two indirect jumps: the
one without the addition, and the one with the addition in its own basic
block, and now everything else jumps _there_.

This patch solves this problem by simply running the core of the
duplicate_computed_gotos pass again, as long as it does any work.  The
patch looks much bigger than it is, because I factored out two routines
to simplify the control flow.
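
The effect of rerunning can be illustrated with a toy model.  This is only a
sketch under simplifying assumptions, not the code in bb-reorder.c: the
struct toy_block type, the function names, and the single max_size clamp are
all invented here.  Each round first snapshots which blocks are small and end
in an indirect jump, then copies such a block into every block that jumps
straight to it.  A block that only becomes a candidate during a round is
picked up by the next round, which is why one round is not enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64

/* Hypothetical, heavily simplified basic block.  */
struct toy_block {
    int jump_to;         /* index of the unconditional jump target, or -1 */
    bool ends_indirect;  /* true if the block ends in an indirect jump */
    int size;            /* number of instructions in the block */
};

/* One round: find candidates, then duplicate them into their predecessors.
   Returns true if anything changed.  */
static bool toy_duplicate_once(struct toy_block *bbs, size_t n, int max_size)
{
    bool candidate[MAX_BLOCKS] = { false };
    bool changed = false;

    assert(n <= MAX_BLOCKS);

    /* Snapshot the candidates before modifying anything.  */
    for (size_t i = 0; i < n; i++)
        candidate[i] = bbs[i].ends_indirect && bbs[i].size <= max_size;

    for (size_t i = 0; i < n; i++) {
        int t = bbs[i].jump_to;
        if (t < 0 || (size_t) t >= n || !candidate[t])
            continue;
        /* Replace the jump to T with a copy of T.  */
        bbs[i].size += bbs[t].size;
        bbs[i].ends_indirect = true;
        bbs[i].jump_to = -1;
        changed = true;
    }
    return changed;
}

/* Rerun the round as long as it does any work; returns the number of
   rounds that changed something.  Every such round removes at least one
   direct jump, so the loop terminates.  */
static int toy_duplicate_until_stable(struct toy_block *bbs, size_t n,
                                      int max_size)
{
    int rounds = 0;
    while (toy_duplicate_once(bbs, n, max_size))
        rounds++;
    return rounds;
}
```

In a five-block example where block 0 is the dispatcher, block 1 is an
addition followed by a jump to block 0, blocks 2 and 3 jump to block 1, and
block 4 jumps directly to block 0, the first round gives only blocks 1 and 4
their own indirect jump.  Blocks 2 and 3 get theirs in the second round.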

Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?


Segher


2016-10-30  Segher Boessenkool  <seg...@kernel.crashing.org>

        PR rtl-optimization/71785
        * bb-reorder.c (duplicate_computed_gotos_find_candidates): New
        function, factored out from pass_duplicate_computed_gotos::execute.
        (duplicate_computed_gotos_do_duplicate): Ditto.  Don't use BB_VISITED.
        (pass_duplicate_computed_gotos::execute): Rewrite.  Rerun the pass as
        long as it makes changes.

OK. I'm just going to note for the record here that while we iterate until nothing changes, the statement and block clamps should in practice ensure we hit a point where nothing changes.

Ideally I'd like to see testcases with this kind of change. It should be standard operating procedure at this point.

Jeff

