On 8/20/20 6:33 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
>> In order to do this, the pass that converts the load address and load/store
>> must occur late in the compilation cycle.
>
> That does not follow afaics.

Let me see if I can help explain this.

I think the issue is that this optimization creates a dependency that isn't directly represented in RTL.  We either have to figure out how to represent it, or we have to do this very late to avoid problems.

Suppose we are at a point where hard registers have been assigned, and the RTL looks like:

    addi  r5,r3,4
    sldi  r6,r5,2
    pld  r10,symbol@got@pcrel
    lwz  r5,0(r10)

Everything is fine for the optimization to take place, since the pld and the lwz are adjacent and therefore we can't have any problems with r10 being redefined in between, or r5 being used.  So we attach the relocation that tells the linker, if the symbol is resolved at static link time, to change this to:

    addi  r5,r3,4
    sldi  r6,r5,2
    plwz  r5,symbol@pcrel
    nop

Now, suppose that after we insert the relocation, the instructions get reordered like this:

    addi  r5,r3,4
    pld  r10,symbol@got@pcrel
    sldi  r6,r5,2
    lwz  r5,0(r10)

When the linker performs the replacement, we will now end up with

    addi  r5,r3,4
    plwz  r5,symbol@pcrel
    sldi  r6,r5,2
    nop

which has altered the semantics of the program: the sldi now reads the r5 written by the plwz rather than the r5 computed by the addi.
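
To make the breakage concrete, here is a small Python sketch that models the four instructions before and after the linker's rewrite. It is only a toy interpreter: the value in r3, the address and contents of "symbol", and the helper names `run` and `linker_relax` are all made up for illustration, and r10 is assumed to be dead after the load.

```python
# Toy model of the example above. Everything here (register values, the
# memory layout, the helper names) is hypothetical and only illustrates the
# hazard; it is not how GCC or the linker is implemented.

SYMBOL_ADDR = 0x1000          # assumed address of "symbol"
MEMORY = {SYMBOL_ADDR: 99}    # assumed value stored at "symbol"


def run(insns, r3=10):
    """Execute a list of (opcode, dest, src, imm) tuples; return the registers."""
    regs = {"r3": r3, "r5": 0, "r6": 0, "r10": 0}
    for op, dest, src, imm in insns:
        if op == "addi":
            regs[dest] = regs[src] + imm
        elif op == "sldi":
            regs[dest] = regs[src] << imm
        elif op == "pld":      # load the symbol's address (via the GOT)
            regs[dest] = SYMBOL_ADDR
        elif op == "lwz":      # load from imm(src)
            regs[dest] = MEMORY[regs[src] + imm]
        elif op == "plwz":     # load the symbol's value directly
            regs[dest] = MEMORY[SYMBOL_ADDR]
        elif op == "nop":
            pass
    return regs


def linker_relax(insns):
    """Rewrite pld into plwz and the paired lwz into nop, as the relocation asks."""
    load_dest = next(d for op, d, _, _ in insns if op == "lwz")
    out = []
    for op, dest, src, imm in insns:
        if op == "pld":
            out.append(("plwz", load_dest, None, 0))
        elif op == "lwz":
            out.append(("nop", None, None, 0))
        else:
            out.append((op, dest, src, imm))
    return out


adjacent = [
    ("addi", "r5", "r3", 4),
    ("sldi", "r6", "r5", 2),
    ("pld", "r10", None, 0),
    ("lwz", "r5", "r10", 0),
]
# Same instructions after the scheduler moves pld above sldi.
reordered = [adjacent[0], adjacent[2], adjacent[1], adjacent[3]]

r6_adjacent_before = run(adjacent)["r6"]
r6_adjacent_after = run(linker_relax(adjacent))["r6"]
r6_reordered_before = run(reordered)["r6"]
r6_reordered_after = run(linker_relax(reordered))["r6"]
```

With these assumed inputs, r6 is 56 in the first three cases and 396 in the last one, because only there does the sldi shift the loaded value instead of r3+4.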

What is necessary in order to allow this optimization to occur earlier is to make this hidden dependency explicit.  When the relocation is inserted, we have to change the "pld" instruction to have a specific clobber of (in this case) r5, which represents what will happen if the linker makes the substitution.
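
As a rough illustration of why the clobber helps, the following Python sketch models the kind of dependency test a scheduler applies before swapping two adjacent instructions. The `Insn` class and `can_swap` function are invented for this example and are not GCC data structures; the point is only that a clobber of r5 on the pld conflicts with the sldi's use of r5.

```python
# Hypothetical model of a scheduler's dependency test; the Insn class and
# can_swap are invented for illustration and are not GCC data structures.
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    defs: frozenset = frozenset()      # registers written
    uses: frozenset = frozenset()      # registers read
    clobbers: frozenset = frozenset()  # registers whose value is destroyed


def can_swap(first, second):
    """True if two adjacent instructions may trade places."""
    first_writes = first.defs | first.clobbers
    second_writes = second.defs | second.clobbers
    return not (
        first_writes & second.uses       # read after write
        or first.uses & second_writes    # write after read
        or first_writes & second_writes  # write after write
    )


sldi = Insn("sldi r6,r5,2", defs=frozenset({"r6"}), uses=frozenset({"r5"}))
pld_plain = Insn("pld r10,symbol@got@pcrel", defs=frozenset({"r10"}))
pld_clobber = Insn(
    "pld r10,symbol@got@pcrel",
    defs=frozenset({"r10"}),
    clobbers=frozenset({"r5"}),  # what the linker's plwz would overwrite
)

plain_may_move = can_swap(sldi, pld_plain)      # no visible conflict
clobber_may_move = can_swap(sldi, pld_clobber)  # r5 use vs. r5 clobber
```

Without the clobber the two instructions look independent and may be reordered; with it, the model refuses to hoist the pld above the sldi.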

I agree that it's too fragile to force this to be the last pass, so I think if Mike can look into introducing a clobber of the hard register when performing the optimization, that would at least allow us to move this anywhere after reload.

I don't immediately see a solution that works prior to register allocation because we basically are representing two potential starting points of a live range, only one of which will survive in the final code.  That is too ugly a problem to hand to the register allocator.

Thanks,
Bill
