Uros,

Let me try to explain why I used such code duplication:
Here we have the common case of an LEA with 3 different registers: r0 (target), r1 (base), r2 (index), and a possible offset. To get better scheduling, we first determine which register, r1 or r2, is preferable for the initial setting, using find_nearest_reg_def. We then generate the following sequence of instructions:

  r0 = r_best
  r0 += $const
  r0 += r_worse

This can save 2 cycles on Atom, since the first 2 instructions can be hoisted up. I could not find a better way to code it.

Below is the modified ChangeLog.

2012-08-14  Yuri Rumyantsev  <ysrum...@gmail.com>

	* config/i386/i386-protos.h (ix86_split_lea_for_addr): Add
	additional argument.
	* config/i386/i386.md (ix86_split_lea_for_addr): Add
	additional argument curr_insn.
	* config/i386/i386.c (ix86_split_lea_for_addr): Do instructions
	reordering to get opportunities for better scheduling.
	(ix86_lea_outperforms): Prefer LEA only if split cost exceeds
	AGU stall.
	(find_nearest_reg_def): New function.  Find nearest register
	definition used in address.

2012/8/14 Uros Bizjak <ubiz...@gmail.com>:
> On Tue, Aug 14, 2012 at 2:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>
>> Thanks a lot for your comments.
>>
>> I prepared new patch and ChangeLog. Testing of x32 is in progress.
>>
>> Is it OK for trunk?
>>
>> 2012-08-14  Yuri Rumyantsev  <ysrum...@gmail.com>
>>
>>	* config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add
>>	additional argument.
>>	* config/i386/i386.md (ix86_split_lea_for_addr) : Add
>>	additional argument curr_insn.
>>	* config/i386/i386.c (ix86_split_lea_for_addr): Do instructions
>>	reordering to get opportunities for better scheduling.
>>	(ix86_lea_outperforms): Do more aggressive lea splitting.
>
> You are not doing splitting in ix86_lea_outperforms.
>
>>	(find_nearest_reg-def): New function. Find nearest register
>>	definition used in address.
>
> Just say:
>
>	(find_nearest_reg_def): New function.
>
> +      emit_insn (gen_rtx_SET (VOIDmode, target, tmp));
> +      if (parts.disp && parts.disp != const0_rtx)
> +        ix86_emit_binop (PLUS, mode, target, parts.disp);
> +      ix86_emit_binop (PLUS, mode, target, tmp1);
> +      return;
>
> Can you explain, why you have to duplicate this code? Here you
> generate the same sequence as in the code below. Use tmp and tmp1 in
> the way that it will fit existing code.
>
> Uros.
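
For illustration only, the ordering idea from the top of this message can be sketched as a small standalone C model. This is not the patch code and does not use GCC's RTL: the array of "defined registers", the helper names nearest_reg and split_lea_sketch, and the textual output format are all hypothetical. It assumes that the register whose definition is nearest to the LEA is the one to add last (r_worse), so that the first two instructions depend only on the other register.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical toy model: defs[i] is the register number defined by the
   i-th earlier instruction.  Scan backwards from the instruction at index
   cur and report which of r1/r2 is defined nearest to it:
   1 for r1, 2 for r2, 0 if neither is defined before cur.  */
static int
nearest_reg (const int *defs, int cur, int r1, int r2)
{
  for (int i = cur - 1; i >= 0; i--)
    {
      if (defs[i] == r1)
        return 1;
      if (defs[i] == r2)
        return 2;
    }
  return 0;
}

/* Write the split sequence for "lea disp(r1,r2), r0" into out.
   The register defined nearest to cur is treated as r_worse and is
   added last; the other one (r_best) initializes r0.  */
static void
split_lea_sketch (const int *defs, int cur, int r0, int r1, int r2,
                  long disp, char *out, size_t n)
{
  int best = r1, worse = r2;

  if (nearest_reg (defs, cur, r1, r2) == 1)
    {
      best = r2;
      worse = r1;
    }

  int len = snprintf (out, n, "r%d = r%d; ", r0, best);
  /* The constant is added only when it is non-zero.  */
  if (disp != 0)
    len += snprintf (out + len, n - len, "r%d += %ld; ", r0, disp);
  snprintf (out + len, n - len, "r%d += r%d", r0, worse);
}
```

With r2 defined immediately before the LEA, the sketch copies r1 first and adds r2 last, so the copy and the constant add do not wait on the late definition.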