This is really Jim's code, but it's been sitting around in Bugzilla for a while
so I've picked it up.  All I really did here is add a target hook and mangle
some comments, but I think I understand enough about what's going on to try and
get things moving forward.  So I'm writing up a pretty big cover letter to try
and summarize what I think is going on here, as it's definitely not something I
fully understand yet.

We've got a quirk in the RISC-V ABI where DF arguments on rv32 can get split
between an X register and a 32-bit aligned stack slot.  The middle-end prologue
code just stores out the X register next to the stack half and treats the
argument as if it were passed entirely on the stack.  The resulting 64-bit slot
is only guaranteed to be 32-bit aligned, so this can result in a misaligned
load, and those are still slow on a bunch of RISC-V systems.

This patch set adds a target hook that essentially biases the middle-end the
other way: load the stack part of the argument and then merge it with the
register part via subword moves.  That essentially handles split arguments
with register-register operations, though in the specific case that trips the
misaligned access on RISC-V the generated code ends up with more memory ops.

More specifically, the included test case is essentially

    double foo(..., double split) { return split; }

with the arguments set up so "split" has 32 bits in a7 (an integer register
used for arguments) and 32 bits on the stack.  The return goes into a
floating-point register, as they're 64 bits on rv32ifd (even when integer
registers are only 32 bits).
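
For readers who want something concrete to compile, here is one way the
elided arguments could be filled in.  This is only a sketch, not the signature
from the included test: the name return_split and the argument list are made
up for illustration.  It assumes the ilp32d calling convention, where the
eight leading doubles would take fa0-fa7 and the seven ints would take a0-a6,
leaving only a7 and the stack for "split".

```c
#include <assert.h>

/* Hypothetical signature (an assumption, not taken from the patch).
   Under ilp32d the eight doubles are expected to use up fa0-fa7 and the
   seven ints a0-a6, so "split" should end up half in a7, half on the stack. */
double return_split(double f0, double f1, double f2, double f3,
                    double f4, double f5, double f6, double f7,
                    int i0, int i1, int i2, int i3,
                    int i4, int i5, int i6,
                    double split)
{
    return split;
}

/* Usage: the value must come back unchanged however it was passed. */
void check_return_split(void)
{
    assert(return_split(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
                        0, 1, 2, 3, 4, 5, 6, 2.5) == 2.5);
}
```

On a host with a different ABI this still compiles and runs, it just won't
exercise the split-argument path.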

Without this patch (and with this patch on targets with fast misaligned
accesses) that generates

        sw      a7,12(sp)
        fld     fa0,12(sp)

and with this patch (on a subtarget with slow misaligned access) ends up as

        lw      a5,16(sp)
        sw      a7,8(sp)
        sw      a5,12(sp)
        fld     fa0,8(sp)

That looks a little odd, but I think it's actually good code -- the only way to
get a double from X registers into an F register on rv32 is to go through
memory, so without misaligned loads we're sort of just stuck there.
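
As a rough model of what the four-instruction sequence above does, the sketch
below rebuilds a double from its two 32-bit words through a naturally aligned
slot.  It is plain C standing in for the generated code, not anything from the
patch: merge_split_double and word_of are hypothetical names, and the words
are taken in host memory order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: reg_word plays the role of a7, stack_word the value
   that "lw a5,16(sp)" would fetch from the incoming stack slot. */
double merge_split_double(uint32_t reg_word, uint32_t stack_word)
{
    double slot;  /* naturally aligned, stands in for 8(sp) */

    assert(sizeof slot == 2 * sizeof(uint32_t));
    /* sw a7,8(sp): register half goes into the first word of the slot */
    memcpy((unsigned char *)&slot, &reg_word, sizeof reg_word);
    /* sw a5,12(sp): stack half goes into the second word of the slot */
    memcpy((unsigned char *)&slot + 4, &stack_word, sizeof stack_word);
    /* fld fa0,8(sp): one aligned 64-bit load of the merged value */
    return slot;
}

/* Helper for trying the model: word `index` (0 or 1) of d in memory order. */
uint32_t word_of(double d, int index)
{
    uint32_t words[2];

    memcpy(words, &d, sizeof words);
    return words[index];
}
```

Splitting a value with word_of and feeding both words back through
merge_split_double should return the original double.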

While playing around writing this cover letter I came up with another case
that's essentially

    long long foo(..., long long split) { return split; }

that used to generate

        sw      a7,12(sp)
        lw      a0,12(sp)
        lw      a1,16(sp)

and now generates

        lw      a1,0(sp)
        mv      a0,a7

so I do think we've at least got some room for new optimizations here, maybe
even on other targets.
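
The long long case is the one where the merge really is register-only, since
the result just lives in the a1:a0 pair.  A minimal C model of that, with the
hypothetical name merge_split_ll and assuming a 64-bit long long:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: reg_lo plays the role of a7 (low half), stack_hi the
   word that "lw a1,0(sp)" would fetch (high half). */
long long merge_split_ll(uint32_t reg_lo, uint32_t stack_hi)
{
    assert(sizeof(long long) == 8);
    /* "mv a0,a7" keeps the low half in a register; no store is needed. */
    uint64_t bits = ((uint64_t)stack_hi << 32) | reg_lo;
    return (long long)bits;
}
```

Unlike the double case there is no trip through memory for the register half,
which matches the shorter two-instruction sequence.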

The target hook will need some adjustment, but ultimately I'm not even sure if
a target hook is the way to go here.  It was just an easy way to flip the
behavior so I could play around with some of Jim's code.  It kind of feels like
the load/subword merge version would result in better code in general, but I'm
not sure on that one.

That said, I figured I'd just send it out so others could see this.  It's very
much out of my wheelhouse, so I'd be shocked if this doesn't cause any
failures...

