On Tue, Mar 05, 2024 at 09:27:22AM +0100, Richard Biener wrote:
> On Tue, 5 Mar 2024, Jakub Jelinek wrote:
> > The following patch adds support for BIT_FIELD_REF lowering with
> > large/huge _BitInt lhs.  BIT_FIELD_REF requires its first operand
> > to have a mode, so that operand can't itself be a huge _BitInt.
> > If we only access limbs inside the BIT_FIELD_REF using constant
> > indexes, we can just create a new BIT_FIELD_REF to extract each limb,
> > but if we need to use a variable index in a loop, I'm afraid we need
> > to spill the operand into memory, which is what the following patch does.
> 
> :/
> 
> If it's only ever "small" _BitInt and we want to optimize, we could
> fully unroll the loop at code generation time and thus avoid the
> variable indices?  You could also lower the BIT_FIELD_REF to
> variable shifts & masking, I suppose.

I'm not really sure whether one can have some of the SVE/RISC-V modes in
there; those couldn't be small anymore.  But otherwise yes, right now it
is likely at most 64-byte vectors, i.e. 512 bits.  Now, if it is say an
extraction of _BitInt(448) out of such a vector (so that it isn't just a
VCE (VIEW_CONVERT_EXPR) instead), that would still mean e.g. on ia32,
with its 32-bit limbs, unrolling a loop of 7 iterations handling 2 limbs
each.  14 straight-line limb operations is already huge I'm afraid,
especially when the extraction can be hidden somewhere in the middle of
a large expression which is all mergeable.
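
Spelling out the arithmetic (a self-contained sketch; the 32-bit limb
width is just ia32's):

  /* _BitInt(448) on a 32-bit-limb target.  */
  enum { BITS = 448, LIMB_BITS = 32,
         NLIMBS = BITS / LIMB_BITS,  /* 14 limbs */
         NITERS = NLIMBS / 2 };      /* 7 loop iterations, 2 limbs each */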
But more importantly, currently there are simple rules: large _BitInt
implies straight-line code, huge _BitInt implies a loop, and the loop
handles just 2 limbs per iteration (for other operations just 1 limb).
Changing that depending on what trees are used where would be a
nightmare.  The idea was that if the loop is worth unrolling, the
unroller can unroll it later, and at that point I'd expect e.g. FRE to
optimize away the temporary memory.
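
In C terms the spill approach is roughly the following (purely
illustrative, not the GIMPLE the pass emits; the names, the 32-bit limb
width and the bit position 0 of the extraction are all assumptions):

  #include <string.h>

  typedef unsigned int limb_t;

  void
  extract_448 (limb_t res[14], const void *op)  /* op: 512-bit operand */
  {
    limb_t tmp[16];
    memcpy (tmp, op, sizeof tmp);  /* spill the operand to memory */
    for (int i = 0; i < 7; i++)    /* huge _BitInt loop, 2 limbs/iter */
      {
        res[2 * i] = tmp[2 * i];
        res[2 * i + 1] = tmp[2 * i + 1];
      }
  }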

For variable shifts/masking I'd need some type in which I can perform
them.  Sure, perhaps if the inner operand is a vector I could use
non-constant permutations or similar.
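
On a type that does exist it would look like the following (purely
illustrative; unsigned __int128 is only available on 64-bit targets,
and the truncation to unsigned int provides the masking):

  static inline unsigned int
  limb_via_shift (unsigned __int128 v, unsigned int idx)  /* idx in [0,3] */
  {
    return (unsigned int) (v >> (idx * 32));  /* variable shift selects limb */
  }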

        Jakub
