On Wed, May 14, 2014 at 5:56 AM, Alan Modra <amo...@gmail.com> wrote:

>> I seem to remember problems in the past with late creation of TOC
>> entries for constants, so it was easier to fall back to
>> materializing all integer constants inline. I don't remember the
>> PRs, but I think there were issues with creating a TOC if the late
>> constant were the only TOC reference, or maybe the issue was buying
>> a stack frame to materialize the TOC/GOT for a late constant.  And a
>> maximum 5-instruction sequence is not really bad relative to a load
>> from the TOC (even with medium model / data in TOC). There are a lot
>> of trade-offs with respect to I$ expansion versus the load hitting
>> in the L1 D$.
>
> Sure, but Steve will tell you that the 5 instruction sequence is both
> slower due to all the dependent ops, and results in larger code+data
> size.  We definitely want to avoid it if possible, and pr67836 shows a
> case taken from glibc math library code where there should be no
> problem in using the TOC.

I don't necessarily believe this is a win overall.  If the constant
is reliably in the L1 D$ (or maybe the L2 D$) and accessed with a
direct load (data in TOC or medium model), then yes. If it's farther
away in the memory hierarchy, then it's not a win. I agree about the
code expansion concern, which has its own secondary effects.
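
To make the trade-off concrete, here is a schematic example. The
constant and the instruction sequences are illustrative only; the
exact code depends on the constant's bit pattern, the code model,
and the ABI.

/* Hypothetical 64-bit constant, chosen only for illustration.  */
unsigned long long
get_const (void)
{
  return 0x123456789abcdef0ULL;
}

/* Inline materialization, roughly five dependent instructions:

     lis   r3,0x1234
     ori   r3,r3,0x5678
     sldi  r3,r3,32
     oris  r3,r3,0x9abc
     ori   r3,r3,0xdef0

   versus a medium-model TOC load, fewer instructions but only a win
   if the load hits high in the cache hierarchy:

     addis r3,r2,.LC0@toc@ha
     ld    r3,.LC0@toc@l(r3)   */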

If this is a constant in a tight loop, okay. But a unique constant
may not occur elsewhere in the code to be shared, and may not be
placed in the same cache line as other, recently accessed constants.
That would push the load to L3 or farther.

Also, remember that this same heuristic is used by AIX, which still
defaults to the small TOC model. So either the constant goes in the
TOC anchor constant pool, which hopefully will pre-load the anchor,
or it becomes a separate entry in the TOC, putting more pressure on
TOC size and possibly causing overflow.

I am certain that there are anecdotal examples where it is a win for
PPC64 Linux, but I would want more evidence that it's a general win.

>> alpha_emit_set_long_const() will always materialize the constant and
>> does not check for a maximum number of instructions. This is why its
>> comment says "fall back to straight forward decomposition".

> No, that is wrong.  alpha_emit_set_const does *not* always try to
> materialize the constant inline.  It does so for constants that need
> more than three instructions only when TARGET_BUILD_CONSTANTS.

I said that alpha_emit_set_long_const() always materializes the
constant, but, as you say, it is not always called.
alpha_emit_set_const() may fail if the constant requires too many
instructions or the recursive search goes too deep. You seem to be
referring to some of the logic in alpha_split_const_mov() as well.
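
For reference, my recollection of the shape of that logic. This is a
paraphrase from memory of gcc/config/alpha/alpha.c, not the exact
code; the real signatures differ in detail:

static bool
split_const_mov_sketch (enum machine_mode mode, rtx *operands)
{
  HOST_WIDE_INT i0, i1;
  rtx temp = NULL_RTX;

  /* Split the integer constant into two halves.  */
  alpha_extract_integer (operands[1], &i0, &i1);

  /* Search for a short sequence; this returns NULL if more than
     three insns would be needed or the recursion gets too deep.  */
  temp = alpha_emit_set_const (operands[0], mode, i0, 3, false);

  /* Only on failure, and only with TARGET_BUILD_CONSTANTS, fall
     back to alpha_emit_set_long_const, which never gives up.  */
  if (!temp && TARGET_BUILD_CONSTANTS)
    temp = alpha_emit_set_long_const (operands[0], i0, i1);

  return temp != NULL_RTX;
}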

Again, this definitely is worth exploring, and I am confident that
there are cases where loading the constant from memory is a win. I
just don't have a good instinct for whether it is a win most of the
time across a broad range of real-world applications. One
optimization opportunity in GLIBC does not make a general heuristic.
I don't think we know enough about the context in which the constant
is used to apply a finer-grained policy.

I think the original code tried to put the constant in memory if it
appeared before reload, when everything could be calculated correctly
for the prologue and for materializing the TOC, but tried to
materialize any constants that appeared during reload using
splitters (see the sketch after the list below). That avoids some of
the problem corner cases. The code needs to handle

PPC32
PPC64
eABI
AIX
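
Schematically, that pre-reload/during-reload split looks something
like the sketch below. The function name is hypothetical and this is
not the actual rs6000 code; only the generic helpers
(can_create_pseudo_p, force_const_mem, validize_mem, emit_move_insn)
are real.

static rtx
emit_set_const_sketch (rtx target, rtx source, enum machine_mode mode)
{
  if (can_create_pseudo_p ())
    {
      /* Before reload: the TOC/GOT reference and any stack frame it
         needs can still be accounted for, so spill the constant to
         the pool and load it.  */
      rtx mem = validize_mem (force_const_mem (mode, source));
      emit_move_insn (target, mem);
      return target;
    }

  /* During/after reload: creating a new TOC entry this late risks
     the corner cases above (e.g. the only TOC reference), so return
     NULL and let a define_split materialize the constant inline.  */
  return NULL_RTX;
}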

Thanks, David
