On Thu, Sep 10, 2020 at 12:08:44PM +0200, Richard Biener wrote:
> On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
> > There often are problems over function calls (where the compiler cannot
> > usually *see* how something is used).
> 
> Yep.  The best way would be to use small loads and larger stores
> which is what CPUs usually tend to handle fine (with alignment
> constraints, etc.).  Of course that's not what either of the "solutions"
> can do.

Yes, and yes.

> That said, since you seem to be "first" in having an instruction
> to insert into a vector at a variable position the idea that we'd
> have to spill anyway for this to be expanded and thus we expand
> the vector to a stack location in the first place falls down.  And
> that's where I'd first try to improve things.
> 
> So what can the CPU actually do?

Both immediate and variable inserts, of 1, 2, 4, or 8 bytes.  The
inserted part is not allowed to cross the 16B boundary (aligned
inserts never have that problem).  Variable inserts look at only the
low bits of the GPR that says where to insert (4 bits for bytes, 3 bits
for halfwords, etc.)
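
To make the "low bits only" behaviour concrete, here is a rough C model
of a variable insert.  It is only a sketch: the names vec128 and
insert_var are made up for illustration, and it assumes the GPR holds
an element index and that elements are laid out little-endian.  Neither
assumption is stated above, and the real instructions may well differ.

```c
#include <stdint.h>

/* Hypothetical model of a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec128;

/* Insert the low `size` bytes of `value` into `v`.
   `size` must be 1, 2, 4, or 8.  Only the low bits of `index` are
   used: 4 bits for bytes, 3 for halfwords, 2 for words, 1 for
   doublewords.  Assumption: `index` counts elements, not bytes. */
static vec128 insert_var(vec128 v, uint64_t value, uint64_t index,
                         unsigned size)
{
    unsigned nelts = 16 / size;                      /* 16, 8, 4, or 2 */
    unsigned elt = (unsigned)(index & (nelts - 1));  /* drop high bits */

    /* Assumption: little-endian byte order within the element. */
    for (unsigned i = 0; i < size; i++)
        v.b[elt * size + i] = (uint8_t)(value >> (8 * i));

    return v;
}
```

Because the index is masked and the element size divides 16, an insert
modelled this way can never cross the 16B boundary, whatever is in the
GPR.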


Segher
