On 05/24/2011 04:31 AM, Kirill Batuzov wrote:
> On Mon, 23 May 2011, Aurelien Jarno wrote:
>
>> Thanks for this patch series. Your approach to solve this issue is
>> really different than mine. Instead I added more state to the dead/live
>> states, and use them to mark some input deads even for global, and mark
>> some output arguments to be synced. This informations are then used
>> directly in the tcg_reg_alloc_* functions to make better usage of the
>> available registers. On the other hand my patch series only tries to
>> really lower the number of spills and doesn't try to make better spill
>> choices.
>>
>> I guess it would be a good idea that I continue with this approach (I
>> basically just have to fix a few cases were some regs are wrongly copied
>> back to memory), so that we can more easily compare the two approaches.
>> Your last patch is anyway interesting, having some statistics is always
>> something interesting.
>>
>> In any case I really think we need a better register allocator before we
>> can do any serious optimization passes like constant or copy propagation,
>> otherwise we end up with a lot of register in use for no real reason.
>>
> When I started working on this patch series I first wanted to write a
> better register allocator, something linear scan based. But TBs
> currently have quite specific and very simple structure. They have globals
> which are alive everywhere and temps, packed in a count of nests. Each nest
> is a result of translation of one guest instruction. Live ranges of temps in
> one nest always intersect, while live ranges of temps from different
> nests never intersect. As a result more sophisticated algorithm being
> applied to this test case works very similar to a simple greedy algorithm we
> have right now.

Something that would be helpful for the RISC hosts would be to add some
mechanism to add constants -- or constant fragments, if you like -- into
the register allocation mix.

If you have access to a Sparc or PPC host (perhaps emulated under qemu),
have a look at the code generated for an i386, or even arm executable.
You'll see lots of similar constants being created, all in a 2-3 insn
sequence. Have a look at the code generated for a 64-bit target like
Alpha and it'll be a 4-6 insn sequence.

Ideally we'd be able to register-allocate these partial constant loads,
and so collapse similar sequences. We have tons of registers that are
not being used on these hosts, which seems a shame.


r~
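
To make the constant-fragment idea a bit more concrete, here is a toy
sketch in C. It is not TCG code. It assumes a Sparc-style host where a
32-bit constant is built with a sethi for the upper 22 bits followed by
an or for the low 10 bits, and it only counts how many host insns would
be emitted. FragCache, load_const and NUM_FRAG_REGS are names made up
for the example, and the round-robin eviction is just a placeholder
policy.

```c
#include <stdint.h>

#define NUM_FRAG_REGS 4      /* hypothetical: spare host regs kept for fragments */
#define LO_MASK 0x3ffu       /* Sparc-style split: sethi covers the upper 22 bits */

typedef struct {
    uint32_t hi[NUM_FRAG_REGS];   /* upper-bits fragment held by each reg */
    int valid[NUM_FRAG_REGS];     /* nonzero if the slot holds a fragment */
    int next;                     /* round-robin eviction cursor */
} FragCache;

/*
 * Return the number of host insns needed to materialize 'value' in a
 * destination register, reusing a cached upper-bits fragment if any.
 */
static int load_const(FragCache *fc, uint32_t value)
{
    uint32_t hi = value & ~LO_MASK;
    int insns = 0;
    int i;

    /* Look for a register that already holds the same upper bits. */
    for (i = 0; i < NUM_FRAG_REGS; i++) {
        if (fc->valid[i] && fc->hi[i] == hi) {
            break;
        }
    }

    if (i == NUM_FRAG_REGS) {
        /* Miss: this stands for "sethi %hi(value), frag_reg". */
        i = fc->next;
        fc->next = (fc->next + 1) % NUM_FRAG_REGS;
        fc->hi[i] = hi;
        fc->valid[i] = 1;
        insns++;
    }

    /* Always needed: "or frag_reg, %lo(value), dest". */
    insns++;
    return insns;
}
```

Starting from a zero-initialized FragCache, load_const(&fc, 0x12345678)
costs two insns, and a following load_const(&fc, 0x1234567c) costs one,
since both constants share the upper bits 0x12345400. A real version
would have to live inside the register allocator so that fragment
registers get spilled and reused like any other temp.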