The constant pool placement that sh_reorg does has somewhat hapazard results. It does not take execution frequencies into account, so if you are unlucky, you can end up with a constant table wedged into some hoit spot of the code, which not only adds an extra jump into the critical path, but can also cause conditional branches
to go out of range, which are then converted into branches around jumps or
branches to jumps.
A related problem is that branch splitting does not take into account the
branch probabilities, and the placement of jumps - and if necessary, constants for
these jumps - is also rather ad-hoc.
Moreover, when you think about optimal placement of individual jumps, it is
only a little step further to think about small basic blocks ending with a jump
which are branched over by a likely taken branch.  Or if the branched-around
code is very seldom executed, we can also consider it even if it falls through at the end to merge with the destination of the branch. On architectures where a
not-taken branch is cheaper than a taken branch, it makes sense to move this
code out of the way - not far away like the large-scale hot/cold partitioning does,
but rather within the range of conditional branches.
For the SH, this means within -252..+258 bytes. That fits nicely with the range of
of 16-bit constants, which are in a range of 4..514 bytes.
32-bit constants can be a bit farther, in the range 4..1024 bytes.

In order to do the constant and jump / cold code placement sensibly, I need to
be able to determine which branches go out of range beause of a particular.
With the current separation of branch shortening and constant table placement,
there are no useful estimates - before constant placements, we estimate that
no single conditional branch is in range. The problem here is that we have to
assume that a maximum sized constant pool of 1020 bytes might be inserted
anywhere, even though in practice most constant pools are much smaller.
The length of the branches themselves is estimated at 6 when it should be 2.

I'e realized now that I can take the code of shorten_branches, add two lookup arrays and a bit of code (which won't change the time complexity of shorten_branches), I can calculate much better estimates for instruction sizes for indeterminate
constant pool placement - simply by keeping track of the maximum amount of
constants that could be inserted into any one place. The informationj calculated
by such a modifed shorten_branches pass would also allow to determine
when a short branch is possible in the absence of a cosntant pool table, and
at what table size inserted it will go out of range.

I wonder now if I should keep this as SH-specific code, or does it make sense to write this a bit more generic - i.e. a variable number of constant ranges, configurable size of small cold blocks, and the range of branches selectable -
and provide this as a new piece of gcc infrastructure.

Do other port maintainers see similar issues with their ports?

Reply via email to