On Thu, Feb 14, 2013 at 12:36:46AM +0100, Michael Eager wrote:
> On 02/13/2013 02:38 PM, Vladimir Makarov wrote:
> > On 13-02-13 1:36 AM, Michael Eager wrote:
> >> Hi --
> >>
> >> I'm seeing register allocation problems and code size increases
> >> with gcc-4.6.2 (and gcc-head) compared with older (gcc-4.1.2).
> >> Both are compiled using -O3.
> >>
> >> One test case that I have has a long series of nested if's
> >> each with the same comparison and similar computation.
> >>
> >> if (n<max_no){
> >>   n+=*(cp-*p++);
> >>   if (n<max_no){
> >>     n+=*(cp-*p);
> >>     if (n<max_no){
> >>       . . . ~20 levels of nesting
> >>       <more computations with 'cp' and 'p'>
> >>     . . . }}}
> >>
> >> Gcc-4.6.2 generates many blocks like the following:
> >>     lwi r28,r1,68 -- load into dead reg
> >>     lwi r31,r1,140 -- load p from stack
> >>     lbui r28,r31,0
> >>     rsubk r31,r28,r19
> >>     lbui r31,r31,0
> >>     addk r29,r29,r31
> >>     swi r31,r1,308
> >>     lwi r31,r1,428 -- load of max_no from stack
> >>     cmp r28,r31,r29 -- n in r29
> >>     bgeid r28,$L46
> >>
> >> gcc-4.1.2 generates the following:
> >>     lbui r3,r26,3
> >>     rsubk r3,r3,r19
> >>     lbui r3,r3,0
> >>     addk r30,r30,r3
> >>     swi r3,r1,80
> >>     cmp r18,r9,r30 -- max_no in r9, n in r30
> >>     bgei r18,$L6
> >>
> >> gcc-4.6.2 (and gcc-head) load max_no from the stack in each block.
> >> There also are extra loads into r28 (which is not used) and r31 at
> >> the start of each block. Only r28, r29, and r31 are used.
> >>
> >> I'm having a hard time telling what is happening or why. The
> >> IRA dump has this line:
> >>     Ignoring reg 772, has equiv memory
> >> where pseudo 772 is loaded with max_no early in the function.
> >>
> >> The reload dump has
> >>     Reloads for insn # 254
> >>     Reload 0: reload_in (SI) = (reg/v:SI 722 [ max_no ])
> >>     GR_REGS, RELOAD_FOR_INPUT (opnum = 1)
> >>     reload_in_reg: (reg/v:SI 722 [ max_no ])
> >>     reload_reg_rtx: (reg:SI 31 r31)
> >> and similar for each of the other insns using 722.
> >>
> >> This is followed by
> >>     Spilling for insn 254.
> >>     Using reg 31 for reload 0
> >> for each insn using pseudo 722.
> >>
> >> Any idea what is going on?
> >>
> > So many changes happened since then (7 years ago), that it is very hard to
> > me to say something
> > definitely. I also have no gcc-4.1 microblaze (as I see microblaze was
> > added to public gcc for 4.6
> > version) and it makes me even more difficult to say something useful.
> >
> > First of all, the new RA was introduced in gcc4.4 (IRA) which uses
> > different heuristics
> > (Chaitin-Briggs graph coloring vs Chow's priority RA).
> >
> > We could blame IRA when we have the same started conditions for it RA
> > gcc4.1 and gcc4.6-gcc-4.8.
> > But I am sure it is not the same. More aggressive optimizations creates
> > higher register pressure. I
> > compared peak reg pressure in the test for gcc4.6 and gcc4.8. It became
> > higher (from 102 to 106).
> > I guess the increase was even bigger since gcc4.1.
>
> I thought about register pressure causing this, but I think that should cause
> spilling of one of the registers which were not used in this long sequence,
> rather than causing a large number of additional loads.
>
> Perhaps the cost analysis has a problem.
>
> > RA focused on generation of faster code. Looking at the fragment you
> > provided it, it is hard to say
> > something about it. I tried -Os for gcc4.8 and it generates desirable code
> > for the fragment in
> > question (by the way the peak register pressure decreased to 66 in this
> > case).
>
> It's both larger and slower, since the additional loads take much longer.
> I'll take a
> look at -Os.
>
> It looks like the values of p++ are being pre-calculated and stored on the
> stack. This results in
> a load, rather than an increment of a register.

Hi,

I remember having a similar issue about a year ago. IIRC, I found that the ivopts pass was transforming things badly for microblaze. Disabling it helped a lot.

I can't tell if you are seeing the same thing, but it might be worth trying -fno-ivopts in case you haven't already.

Cheers,
Edgar
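
P.S. In case it helps with the comparison, here is a minimal sketch of the nested-if pattern described above, cut down to three levels. The function name, the parameter list, and the surrounding loop are assumptions made for illustration (ivopts works on loops, so the sketch puts the ifs inside one); it is not the original test case and may well not reproduce the extra loads.

```c
/* Hypothetical reduction of the reported pattern; names, the outer
   loop, and the nesting depth (3 instead of ~20) are illustrative.

   To compare, compile twice and diff the assembly, for example:
     <your-microblaze-gcc> -O3 -S nested.c -o with-ivopts.s
     <your-microblaze-gcc> -O3 -fno-ivopts -S nested.c -o no-ivopts.s
   (<your-microblaze-gcc> is a placeholder for your cross compiler.) */
int nested_sum(const unsigned char *cp, const unsigned char *p,
               int rounds, int max_no)
{
    int n = 0;
    int i;

    for (i = 0; i < rounds; i++) {      /* assumed enclosing loop */
        if (n < max_no) {
            n += *(cp - *p++);          /* byte at cp minus offset *p */
            if (n < max_no) {
                n += *(cp - *p++);
                if (n < max_no) {
                    n += *(cp - *p++);
                }
            }
        }
    }
    return n;
}
```

If the loads of max_no and p from the stack disappear in the -fno-ivopts output, that would point at ivopts rather than at the register allocator itself.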