Hi all!

I want to adapt the current register allocation scheme so that it is
usable for targets with virtual registers. I believe that, if done
correctly, this may even result in some benefit to physical register
targets, too.

** Some background **
I am working on a wasm port for GCC. wasm is a virtual register target,
meaning that a compiler is supposed to declare an arbitrary set of
registers that it is willing to use for a function, and use this set of
registers without ever needing to spill values onto the stack.

Currently, the only other virtual register target is nvptx. To have
virtual registers, it skips over IRA and LRA entirely. This works
relatively well for them, since due to specifics of the target, there is
a one-to-one correspondence between register's mode and would-be
register class, so instructions can be selected based purely on the
instruction's RTL shape, without need for constraints. nvptx also
doesn't need to worry about size, so it can declare as many registers
as it likes, without trying to compress the set of declared registers.

wasm unfortunately isn't like that: being a target primarily aimed at
use on websites, size is usually very important for page loading speed.
Proper register allocation would still be useful here to compress the
set of declared registers for a function. This compression would also be
useful because wasm addresses registers via leb128 indices, so popular
registers can also have lower indices, which reduces size of register
access instructions. (benefit form register allocation here is that
as there are less registers in total, a bigger proportion of them can
get short indices)

Unlike nvptx, wasm lacks virtual registers for 8- and 16-bit values. As
constraints are meaningless without register allocation, such values
still have to be stored as 32-bit values, and pretty complicated
workarounds are required to ensure that a register of incorrect mode
does not end up in an operation that does not work on physical registers
of such size (e.g. via a subreg expr).

There are also optimization opportunities that are missed if only mode
determines which register value gets placed in: if a register of small
mode is primarily used via subregs of large mode, it would be beneficial
to place such value in a large register in the first place, instead of
trying to actually cast the value to a large mode first every time.
Similar situation can occur in reverse: A large value that is used as a
small one several times in a row, would benefit from being temporarily
reloaded to a small register class, to coalesce the casts.

** wasm-specific considerations **
- wasm also has special types that can only exist in registers, and for
which no instructions to load, store or cast to/from an integer is
available. This would mean that for these types the register allocator
cannot ever pick NO_REGS as a fallback (which it shouldn't ever do on a
virtual register target regardless, but it's one more thing to keep in
mind).
- wasm's to-be register classes are also disjoint, meaning that a value
that resides in a 64-bit register cannot be accessed as-if it resides in
a 32-bit register. There is also no register class that can hold any
value, so GENERAL_REGS is also not meaningful, and ALL_REGS is not a
register class any target virtual register can actually be assigned.

** General considerations **
- instructions for virtual register targets may actually accept an
arbitrary amount of arguments (e.g. wasm's call family of insns, or asm
blocks), each of which would need a constraint, so register allocation
infrastructure needs to be adapted to support that. It probably would be
useful even for targets with a fixed amount of registers, as that will
potentially remove the current 30 operand limit on asm_operands.
- register asm decls would no longer be able to specify a specific
register name to store the value in, they would need to specify a class
instead. This may be of use for fixed register targets too, e.g.
to place a value in any register as long as it has the required class.

** The request **
I would like to adapt the current setup of IRA+LRA to work under these
constraints. Unfortunately, after attempting to do that by myself my
changes kept behaving unpredictably, and I realized that I lack the
understanding necessary, and would like some guidance.
- Which parts should I touch first, to get something testable? Every
time I tried to start this, I felt like there were too many moving
parts, between machine constraints, and various parts of ira and lra.
- What would a good design for this look like? E.g. my first attempt
used backend hooks to mark out additional operands based on
circumstantial evidence from the RTL shape itself. Would it be better
to create another rtx like "use with constraint" instead?
- Is adapting the current infrastructure even the right call, or would
it be simpler to write a separate register allocator? It may not even
be possible to make the current scheme never reload to memory, but
that's a hard requirement for wasm.

Thanks,
feedable

Reply via email to