------- Comment #4 from vmakarov at redhat dot com 2009-04-26 17:38 ------- The first test case is just an example that RA is a heuristic solution. Even heuristic algorithm which works worse in average sometimes can generate a better solution than ones working better in average.
Here is IRA log of pseudo assignments. Different assignments made by the old RA are in parenthesis: Popping a0(r133,l0) -- assign reg 4 Popping a4(r138,l0) -- assign reg 5 Popping a3(r145,l0) -- assign reg 6 Popping a7(r137,l0) -- assign reg 7 (4) <-- key point Popping a2(r146,l0) -- spill (7) Popping a9(r139,l0) -- assign reg 2 Popping a10(r147,l0) -- assign reg 6 (1) Popping a1(r134,l0) -- assign reg 0 Popping a5(r143,l0) -- assign reg 7 (4) Popping a6(r135,l0) -- assign reg 0 Popping a8(r140,l0) -- assign reg 5 (2) If IRA assigned hard reg 4 instead of 7 to pseudo 137, all pseudos would get registers including r139. The old RA assigns 4 to r137 because it prefers the first hard reg in register allocation order if all other conditions are equal. IRA assigns 7 because it has smaller cost than 4. The cost is smaller because IRA hopes that r135 could get 4 later (which is not happened). That is because there is a copy connected the two allocnos. cp4:a0(r133)<->a6(r135)@125:shuffle This copy (called shuffle) is originated from case as in the following situation r2 = op (..., r1) and r1 is becoming dead. Assigning the hard register of r1 to r2 in this situation usually results in better RA (please read a literature about RA). The following patch ignoring the shuffle copies results in the same RA for this test: Index: gcc/ira-color.c =================================================================== --- gcc/ira-color.c (revision 146788) +++ gcc/ira-color.c (working copy) @@ -298,16 +298,18 @@ update_copy_costs (ira_allocno_t allocno (&ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno), cover_class, ALLOCNO_UPDATED_COVER_CLASS_COST (another_allocno), ALLOCNO_HARD_REG_COSTS (another_allocno)); - ira_allocate_and_set_or_copy_costs - (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno), - cover_class, 0, - ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno)); i = ira_class_hard_reg_index[cover_class][hard_regno]; ira_assert (i >= 0); ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno)[i] += update_cost; - ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i] - += update_cost; - + if (cp->insn != NULL_RTX || cp->constraint_p) + { + ira_allocate_and_set_or_copy_costs + (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno), + cover_class, 0, + ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno)); + ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i] + += update_cost; + } queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR); } } But this patch does not improve RA in average (today I checked code size and performance of SPEC2000 for Core i7 in 32-bit mode). So I have no sense to commit the patch. As for the second test I did not find a difference to worry about (I checked the mainline and gcc-4.3.1 using -march=armv5te -mthumb-interwork -Os -mthumb): --- a0.s 2009-04-24 17:40:54.000000000 -0400 +++ a1.s 2009-04-24 17:40:36.000000000 -0400 @@ -1,8 +1,5 @@ .code 16 .file "a.i" - .section .rodata -.LC0: - .ascii "abc\000" .text .align 2 .global test @@ -11,31 +8,35 @@ .type test, %function test: push {r4, r5, r6, r7, lr} - ldr r5, [r0] - mov r7, r0 - ldrb r3, [r5] + ldr r4, [r0] + mov r5, r0 + ldrb r3, [r4] cmp r3, #91 bne .L2 - mov r0, r5 - ldrb r4, [r5, #1] + mov r0, r4 + ldrb r7, [r4, #1] bl func - add r6, r5, #1 - strb r4, [r5, #1] + add r6, r4, #1 + strb r7, [r4, #1] cmp r0, #0 bne .L3 - mov r5, r6 + mov r4, r6 .L2: ldr r0, .L5 mov r1, #0 bl func2 - mov r6, r5 + mov r6, r4 .L3: - str r6, [r7] + str r6, [r5] @ sp needed for prologue pop {r4, r5, r6, r7, pc} .L6: .align 2 .L5: - .word .LC0 + .word .LANCHOR0 .size test, .-test - .ident "GCC: (GNU) 4.3.4 20090424 (prerelease)" + .section .rodata + .set .LANCHOR0,. + 0 +.LC0: + .ascii "abc\000" + .ident "GCC: (GNU) 4.5.0 20090423 (experimental)" -------------------------------------------------------------------- Once again RA is a heuristic algorithm (optimal solution for some RA models, e.g. based on ILP or algorithms for quadratic assignment problems solutions, are not useful in practice). It is possible to find a lot of tests where the old RA works better than IRA. Analysis of the tests takes a lot of time (although I did a lot of them last 2 years working on IRA). Alexander, I'd really appreciate if you did more analysis and proposed the solutions of the found problem instead of just posting the tests. Although posting the tests is useful too but doing just this will make such PRs less and less priority for me. I checked several times that IRA generates smaller code on bigger tests for ARM, So there is already a progress in RA in comparison with the old RA. But of course, there is no limit for perfection. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836