Hi all! I want to adapt the current register allocation scheme so that it is usable for targets with virtual registers. I believe that, if done correctly, this may even result in some benefit to physical register targets, too.
** Some background ** I am working on a wasm port for GCC. wasm is a virtual register target, meaning that a compiler is supposed to declare an arbitrary set of registers that it is willing to use for a function, and use this set of registers without ever needing to spill values onto the stack. Currently, the only other virtual register target is nvptx. To have virtual registers, it skips over IRA and LRA entirely. This works relatively well for them, since due to specifics of the target, there is a one-to-one correspondence between register's mode and would-be register class, so instructions can be selected based purely on the instruction's RTL shape, without need for constraints. nvptx also doesn't need to worry about size, so it can declare as many registers as it likes, without trying to compress the set of declared registers. wasm unfortunately isn't like that: being a target primarily aimed at use on websites, size is usually very important for page loading speed. Proper register allocation would still be useful here to compress the set of declared registers for a function. This compression would also be useful because wasm addresses registers via leb128 indices, so popular registers can also have lower indices, which reduces size of register access instructions. (benefit form register allocation here is that as there are less registers in total, a bigger proportion of them can get short indices) Unlike nvptx, wasm lacks virtual registers for 8- and 16-bit values. As constraints are meaningless without register allocation, such values still have to be stored as 32-bit values, and pretty complicated workarounds are required to ensure that a register of incorrect mode does not end up in an operation that does not work on physical registers of such size (e.g. via a subreg expr). There are also optimization opportunities that are missed if only mode determines which register value gets placed in: if a register of small mode is primarily used via subregs of large mode, it would be beneficial to place such value in a large register in the first place, instead of trying to actually cast the value to a large mode first every time. Similar situation can occur in reverse: A large value that is used as a small one several times in a row, would benefit from being temporarily reloaded to a small register class, to coalesce the casts. ** wasm-specific considerations ** - wasm also has special types that can only exist in registers, and for which no instructions to load, store or cast to/from an integer is available. This would mean that for these types the register allocator cannot ever pick NO_REGS as a fallback (which it shouldn't ever do on a virtual register target regardless, but it's one more thing to keep in mind). - wasm's to-be register classes are also disjoint, meaning that a value that resides in a 64-bit register cannot be accessed as-if it resides in a 32-bit register. There is also no register class that can hold any value, so GENERAL_REGS is also not meaningful, and ALL_REGS is not a register class any target virtual register can actually be assigned. ** General considerations ** - instructions for virtual register targets may actually accept an arbitrary amount of arguments (e.g. wasm's call family of insns, or asm blocks), each of which would need a constraint, so register allocation infrastructure needs to be adapted to support that. It probably would be useful even for targets with a fixed amount of registers, as that will potentially remove the current 30 operand limit on asm_operands. - register asm decls would no longer be able to specify a specific register name to store the value in, they would need to specify a class instead. This may be of use for fixed register targets too, e.g. to place a value in any register as long as it has the required class. ** The request ** I would like to adapt the current setup of IRA+LRA to work under these constraints. Unfortunately, after attempting to do that by myself my changes kept behaving unpredictably, and I realized that I lack the understanding necessary, and would like some guidance. - Which parts should I touch first, to get something testable? Every time I tried to start this, I felt like there were too many moving parts, between machine constraints, and various parts of ira and lra. - What would a good design for this look like? E.g. my first attempt used backend hooks to mark out additional operands based on circumstantial evidence from the RTL shape itself. Would it be better to create another rtx like "use with constraint" instead? - Is adapting the current infrastructure even the right call, or would it be simpler to write a separate register allocator? It may not even be possible to make the current scheme never reload to memory, but that's a hard requirement for wasm. Thanks, feedable
