On 05/24/2011 04:31 AM, Kirill Batuzov wrote:
> 
> 
> On Mon, 23 May 2011, Aurelien Jarno wrote:
> 
>>
>> Thanks for this patch series. Your approach to solving this issue is
>> really different from mine. Instead, I added more states to the dead/live
>> states, and use them to mark some inputs as dead even for globals, and to
>> mark some output arguments to be synced. This information is then used
>> directly in the tcg_reg_alloc_* functions to make better use of the
>> available registers. On the other hand, my patch series only tries to
>> lower the number of spills and doesn't try to make better spill
>> choices.
>>
>> I guess it would be a good idea for me to continue with this approach (I
>> basically just have to fix a few cases where some regs are wrongly copied
>> back to memory), so that we can more easily compare the two approaches.
>> Your last patch is interesting in any case; having some statistics is
>> always useful.
>>
>> In any case I really think we need a better register allocator before we
>> can do any serious optimization passes like constant or copy propagation,
>> otherwise we end up with a lot of registers in use for no real reason.
>>
> When I started working on this patch series I first wanted to write a
> better register allocator, something linear scan based.  But TBs
> currently have a quite specific and very simple structure.  They have
> globals, which are alive everywhere, and temps, packed into a number of
> nests.  Each nest is the result of translating one guest instruction.
> Live ranges of temps in one nest always intersect, while live ranges of
> temps from different nests never intersect.  As a result, a more
> sophisticated algorithm applied to this case behaves very much like the
> simple greedy algorithm we have right now.

Something that would be helpful for the RISC hosts would be to add some
mechanism to add constants -- or constant fragments, if you like -- into
the register allocation mix.

If you have access to a Sparc or PPC host (perhaps emulated under qemu),
have a look at the code generated for an i386, or even an arm, executable.
You'll see lots of similar constants being created, all in a 2-3 insn
sequence.  Have a look at the code generated for a 64-bit target like
Alpha and it'll be a 4-6 insn sequence.

Ideally we'd be able to register-allocate these partial constant loads,
and so collapse similar sequences.  We have tons of registers that are
not being used on these hosts, which seems a shame.
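
As a rough sketch of the idea, here is a toy model in C. It is not TCG
code: the names HI22, CACHE_REGS, count_naive and count_cached are made up
for illustration, and it assumes a SPARC-like host where a 32-bit constant
is built with "sethi" (upper 22 bits) followed by "or" (lower 10 bits). It
ignores the single-insn form for small immediates and simply counts host
insns with and without keeping the high parts in spare registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper 22 bits of a constant, i.e. what "sethi %hi(c), r" would load. */
#define HI22(c) ((uint32_t)(c) & ~UINT32_C(0x3ff))

/* Assumed number of spare host registers reserved for constant parts. */
#define CACHE_REGS 4

/* Host insns needed when every constant is rebuilt from scratch. */
static unsigned count_naive(const uint32_t *consts, size_t n)
{
    unsigned insns = 0;

    assert(consts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        insns += 1;                     /* sethi %hi(c), r */
        if (consts[i] & 0x3ff) {
            insns += 1;                 /* or r, %lo(c), r */
        }
    }
    return insns;
}

/* Host insns needed when high parts are kept in spare registers and
   reused; eviction is plain round-robin once all registers are taken. */
static unsigned count_cached(const uint32_t *consts, size_t n)
{
    uint32_t cache[CACHE_REGS];
    size_t used = 0, next = 0;
    unsigned insns = 0;

    assert(consts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t hi = HI22(consts[i]);
        size_t j;

        for (j = 0; j < used; j++) {
            if (cache[j] == hi) {
                break;                  /* high part already in a register */
            }
        }
        if (j == used) {                /* miss: load the high part once */
            insns += 1;                 /* sethi %hi(c), cache_reg */
            if (used < CACHE_REGS) {
                cache[used++] = hi;
            } else {
                cache[next] = hi;
                next = (next + 1) % CACHE_REGS;
            }
        }
        if (consts[i] & 0x3ff) {
            insns += 1;                 /* or cache_reg, %lo(c), dest */
        }
    }
    return insns;
}
```

For the four constants 0x12345678, 0x1234567c, 0x12345400 and 0xdeadbeef,
the first three share a high part, so this model counts 7 insns from
scratch versus 5 with the high parts cached. A real implementation would
of course have to let the register allocator decide when such a register
is worth keeping.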


r~
