On Tue, Apr 30, 2024 at 8:54 PM Jeff Law <j...@ventanamicro.com> wrote:
>
>
> In doing some preparation work for using zbkb's pack instructions for
> constant synthesis I figured it would be wise to get a sense of how well
> our constant synthesis is actually working and address any clear issues.
>
> So the first glaring inefficiency is in our handling of constants with a
> small number of bits set.  Let's start with just two bits set.  There
> are 2016 distinct constants in that space (rv64).  With Zbs enabled the
> absolute worst we should ever do is two instructions (bseti+bseti).  Yet
> we have 503 cases where we're generating 3+ instructions when there's
> just two bits set in the constant.  A constant like 0x8000000000001000
> generates 4 instructions!
>
> This patch adds bseti (and indirectly binvi if we needed it) as a first
> class citizen for constant synthesis.  There are two components to this
> change.
>
> First, we can't generate an IOR with a constant like (1 << 45) as an
> operand.  The IOR/XOR define_insn is in riscv.md.  The constant argument
> for those patterns must match an arith_operand which means it's not
> really usable for generating bseti directly in the cases we care about
> (at least one of the bits will be in the 32..63 range and thus won't
> match arith_operand).
>
> We have a few things we could do.  One would be to extend the existing
> pattern to incorporate bseti cases.  But I suspect folks like the
> separation of the base architecture (riscv.md) from the Zb* extensions
> (bitmanip.md).  We could also try to generate the RTL for bseti
> directly, bypassing gen_fmt_ee (which forces undesirable constants into
> registers based on the predicate of the appropriate define_insn).
> Neither of these seemed particularly appealing to me.
>
> So what I've done instead is to make ior/xor a define_expand and have
> the expander allow a wider set of constant operands when Zbs is enabled.
> That allows us to keep the bulk of Zb* support inside bitmanip.md and
> continue to use gen_fmt_ee in the constant synthesis paths.

Seems like a clean solution to me.

>
> Note the code generation in this case is designed to first set as many
> bits as we can with lui, then with addi since those can both set
> multiple bits at a time.  If there are any residual bits left to set we
> can emit bseti instructions up to the current cost ceiling.
>
> This results in fixing all of the 503 2-bit set cases where we emitted
> too many instructions.  It also significantly helps other scenarios with
> more bits set.
>
> The testcase I'm including verifies the number of instructions we
> generate for the full set of 2016 possible cases.  Obviously this won't
> be possible as we increase the number of bits (there are something like
> 48k cases with just 3 bits set).
>
> Build and regression tested on rv64gc.  OK for the trunk?
>
>
> Thanks,
> Jeff
>
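
For anyone following along, here is a rough Python sketch of the ordering described in the quoted text: lui first, then addi, then bseti for whatever bits are left. It is a toy model, not the code in the patch. The names synthesize, evaluate, LUI_BITS and ADDI_BITS are made up for illustration. It assumes lui usefully covers bits 12..30 (bit 31 would sign-extend on rv64) and that a positive addi immediate covers bits 0..10, and it ignores the cost ceiling, binvi, and every other synthesis strategy the compiler has.

```python
from itertools import combinations
from math import comb

MASK64 = (1 << 64) - 1
# Assumption: bits a single lui can set without sign-extending on rv64.
LUI_BITS = sum(1 << b for b in range(12, 31))
# Assumption: bits a single positive 12-bit addi immediate can set.
ADDI_BITS = (1 << 11) - 1


def synthesize(value):
    """Return (mnemonic, operand) steps for a toy lui/addi/bseti model."""
    steps = []
    hi = value & LUI_BITS
    lo = value & ADDI_BITS
    if hi:
        steps.append(("lui", hi >> 12))
    if lo:
        steps.append(("addi", lo))
    # Any bit not covered by lui or addi gets its own bseti.
    residual = value & ~(hi | lo) & MASK64
    bit = 0
    while residual:
        if residual & 1:
            steps.append(("bseti", bit))
        residual >>= 1
        bit += 1
    return steps


def evaluate(steps):
    """Replay the steps on a register that starts at zero (x0 as source)."""
    reg = 0
    for op, arg in steps:
        if op == "lui":
            reg = arg << 12
        elif op == "addi":
            reg = (reg + arg) & MASK64
        else:  # bseti
            reg |= 1 << arg
    return reg


# Every rv64 constant with exactly two bits set.
two_bit = [(1 << a) | (1 << b) for a, b in combinations(range(64), 2)]
worst = max(len(synthesize(v)) for v in two_bit)
# Size of the three-bit space, counted rather than enumerated.
three_bit_count = comb(64, 3)
```

Under those assumptions, len(two_bit) matches the 2016 figure, worst stays at two instructions, and 0x8000000000001000 becomes a lui followed by bseti 63. Replaying each sequence with evaluate gives back the original constant, which is a cheap sanity check on the model itself.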