RFD: Integrate shorten_branches, machine-dependent constant pool placement and small-scale hot/cold partitioning

Joern RENNECKE Sun, 14 May 2006 09:09:49 -0700

The constant pool placement that sh_reorg does has somewhat hapazardresults.It does not take execution frequencies into account, so if you areunlucky, youcan end up with a constant table wedged into some hoit spot of the code,which notonly adds an extra jump into the critical path, but can also causeconditional branches

to go out of range, which are then converted into branches around jumps or
branches to jumps.
A related problem is that branch splitting does not take into account the

branch probabilities, and the placement of jumps - and if necessary,constants for

these jumps - is also rather ad-hoc.
Moreover, when you think about optimal placement of individual jumps, it is

only a little step further to think about small basic blocks ending witha jump

which are branched over by a likely taken branch.  Or if the branched-around

code is very seldom executed, we can also consider it even if it fallsthroughat the end to merge with the destination of the branch. On architectureswhere a

not-taken branch is cheaper than a taken branch, it makes sense to move this

code out of the way - not far away like the large-scale hot/coldpartitioning does,

but rather within the range of conditional branches.

For the SH, this means within -252..+258 bytes. That fits nicely withthe range of

of 16-bit constants, which are in a range of 4..514 bytes.
32-bit constants can be a bit farther, in the range 4..1024 bytes.

In order to do the constant and jump / cold code placement sensibly, Ineed to

be able to determine which branches go out of range beause of a particular.

With the current separation of branch shortening and constant tableplacement,

there are no useful estimates - before constant placements, we estimate that

no single conditional branch is in range. The problem here is that wehave to

assume that a maximum sized constant pool of 1020 bytes might be inserted
anywhere, even though in practice most constant pools are much smaller.
The length of the branches themselves is estimated at 6 when it should be 2.

I'e realized now that I can take the code of shorten_branches, add twolookup arraysand a bit of code (which won't change the time complexity ofshorten_branches),I can calculate much better estimates for instruction sizes forindeterminate

constant pool placement - simply by keeping track of the maximum amount of

constants that could be inserted into any one place. The informationjcalculated

by such a modifed shorten_branches pass would also allow to determine
when a short branch is possible in the absence of a cosntant pool table, and
at what table size inserted it will go out of range.

I wonder now if I should keep this as SH-specific code, or does it makesenseto write this a bit more generic - i.e. a variable number of constantranges,configurable size of small cold blocks, and the range of branchesselectable -

and provide this as a new piece of gcc infrastructure.

Do other port maintainers see similar issues with their ports?

RFD: Integrate shorten_branches, machine-dependent constant pool placement and small-scale hot/cold partitioning

Reply via email to