On Tue, Apr 30, 2024 at 8:54 PM Jeff Law <j...@ventanamicro.com> wrote:
>
>
> In doing some preparation work for using zbkb's pack instructions for
> constant synthesis I figured it would be wise to get a sense of how well
> our constant synthesis is actually working and address any clear issues.
>
> So the first glaring inefficiency is in our handling of constants with a
> small number of bits set.  Let's start with just two bits set.   There
> are 2016 distinct constants in that space (rv64).  With Zbs enabled the
> absolute worst we should ever do is two instructions (bseti+bseti).  Yet
> we have 503 cases where we're generating 3+ instructions when there's
> just two bits set in the constant.  A constant like 0x8000000000001000
> generates 4 instructions!
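For reference, the 2016 figure is just 64 choose 2. The bseti+bseti bound follows from emitting one bseti per set bit, starting from the zero register. Below is a small Python sketch; the helper names are made up for illustration and are not anything in GCC:

```python
# Every rv64 constant with exactly two bits set: choose 2 of 64 positions.
two_bit = [(1 << i) | (1 << j) for i in range(64) for j in range(i + 1, 64)]

def set_bits(value):
    """Positions of the set bits in a 64-bit value, lowest first."""
    return [b for b in range(64) if (value >> b) & 1]

def bseti_sequence(value, rd="a0"):
    """Build value from the zero register with one bseti per set bit."""
    seq, src = [], "zero"
    for b in set_bits(value):
        seq.append(f"bseti {rd},{src},{b}")
        src = rd
    return seq

print(len(two_bit))                        # 2016
print(bseti_sequence(0x8000000000001000))  # ['bseti a0,zero,12', 'bseti a0,a0,63']
```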
>
> This patch adds bseti (and indirectly binvi if we need it) as a
> first-class citizen for constant synthesis.  There are two components
> to this change.
>
> First, we can't generate an IOR with a constant like (1 << 45) as an
> operand.  The IOR/XOR define_insn is in riscv.md.  The constant argument
> for those patterns must match an arith_operand, which means it's not
> really usable for generating bseti directly in the cases we care about
> (at least one of the bits will be in the 32..63 range and thus won't
> match arith_operand).
>
> We have a few things we could do.  One would be to extend the existing
> pattern to incorporate bseti cases.  But I suspect folks like the
> separation of the base architecture (riscv.md) from the Zb* extensions
> (bitmanip.md).  We could also try to generate the RTL for bseti
> directly, bypassing gen_fmt_ee (which forces undesirable constants into
> registers based on the predicate of the appropriate define_insn).
> Neither of these seemed particularly appealing to me.
>
> So what I've done instead is to make ior/xor a define_expand and have
> the expander allow a wider set of constant operands when Zbs is enabled.
> That allows us to keep the bulk of Zb* support inside bitmanip.md and
> continue to use gen_fmt_ee in the constant synthesis paths.
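Here is a rough Python model of the operand problem and the fix. It assumes arith_operand amounts to "register or signed 12-bit immediate" and that the Zbs case is "exactly one bit set" (bseti for IOR, binvi for XOR). The function names are hypothetical. They only approximate what the real predicates and expander in riscv.md/bitmanip.md check:

```python
def to_signed64(value):
    """Interpret a 64-bit pattern the way a DImode CONST_INT holds it."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value

def small_operand(value):
    """Model of a signed 12-bit immediate (what ori/xori accept)."""
    return -2048 <= to_signed64(value) <= 2047

def single_bit(value):
    """Model of a constant with exactly one bit set (a bseti/binvi target)."""
    value &= (1 << 64) - 1
    return value != 0 and value & (value - 1) == 0

def ior_operand_ok(value, have_zbs):
    """Hypothetical expander check: the base ISA only takes 12-bit
    immediates; with Zbs a single-bit mask is accepted as well rather
    than being forced into a register."""
    return small_operand(value) or (have_zbs and single_bit(value))

print(ior_operand_ok(1 << 45, have_zbs=False))  # False: would need a register
print(ior_operand_ok(1 << 45, have_zbs=True))   # True: bseti can handle it
print(ior_operand_ok(0x7FF, have_zbs=False))    # True: plain ori
```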

Seems like a clean solution to me.

>
> Note that the code generation in this case is designed to first set as
> many bits as we can with lui, then with addi, since those can both set
> multiple bits at a time.  If there are any residual bits left to set, we
> can emit bseti instructions up to the current cost ceiling.
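The ordering can be illustrated with a deliberately simplified Python model. It rests on three assumptions. lui only contributes bits 12..30, since keeping bit 31 clear avoids sign extension on rv64. addi only contributes bits 0..10 via a positive immediate. Each remaining bit costs one bseti. This is a hypothetical sketch, not GCC's riscv_build_integer, and it ignores the other strategies the real code tries:

```python
MASK64 = (1 << 64) - 1

def synthesize(value):
    """Hypothetical lui -> addi -> bseti model; returns the insn strings."""
    value &= MASK64
    hi = value & 0x7FFFF000    # bits 12..30: one lui (bit 31 clear, no sign extension)
    lo = value & 0x7FF         # bits 0..10: one positive addi immediate
    rest = value & ~(hi | lo)  # everything else: one bseti per bit
    seq, acc, src = [], 0, "zero"
    if hi:
        seq.append(f"lui   a0,{hi >> 12:#x}")
        acc, src = hi, "a0"
    if lo or value == 0:
        seq.append(f"addi  a0,{src},{lo}")
        acc, src = acc + lo, "a0"  # hi has no bits below 12, so add == or
    for b in range(64):
        if (rest >> b) & 1:
            seq.append(f"bseti a0,{src},{b}")
            acc, src = acc | (1 << b), "a0"
    assert acc == value, "sequence must rebuild the constant"
    return seq

for insn in synthesize(0x8000000000001000):
    print(insn)
# lui   a0,0x1
# bseti a0,a0,63
```

A real implementation would also stop adding bseti instructions once the sequence costs more than the best alternative, which this model omits.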
>
> This fixes all 503 of the two-bit cases where we emitted too many
> instructions.  It also significantly helps other scenarios with more
> bits set.
>
> The testcase I'm including verifies the number of instructions we
> generate for the full set of 2016 possible cases.  Obviously this won't
> be possible as we increase the number of bits (there are 41,664 cases
> with just 3 bits set).
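The number of rv64 constants with exactly k bits set is C(64, k). Two lines of Python show how quickly an exhaustive testcase stops being practical:

```python
from math import comb

# Number of distinct rv64 constants with exactly k bits set is C(64, k).
for k in (2, 3, 4):
    print(k, comb(64, k))
# 2 2016
# 3 41664
# 4 635376
```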
>
> Build and regression tested on rv64gc.  OK for the trunk?
>
>
> Thanks,
> Jeff
>
