On Thu, Feb 14, 2013 at 12:36:46AM +0100, Michael Eager wrote:
> On 02/13/2013 02:38 PM, Vladimir Makarov wrote:
> > On 13-02-13 1:36 AM, Michael Eager wrote:
> >> Hi --
> >>
> >> I'm seeing register allocation problems and code size increases
> >> with gcc-4.6.2 (and gcc-head) compared with older (gcc-4.1.2).
> >> Both are compiled using -O3.
> >>
> >> One test case that I have has a long series of nested if's
> >> each with the same comparison and similar computation.
> >>
> >>         if (n<max_no){
> >>           n+=*(cp-*p++);
> >>           if (n<max_no){
> >>             n+=*(cp-*p);
> >>               if (n<max_no){
> >>         . . .          ~20 levels of nesting
> >>                <more computations with 'cp' and 'p'>
> >>                 . . . }}}
> >>
> >> Gcc-4.6.2 generates many blocks like the following:
> >>     lwi    r28,r1,68    -- load into dead reg
> >>     lwi    r31,r1,140    -- load p from stack
> >>     lbui    r28,r31,0
> >>     rsubk    r31,r28,r19
> >>     lbui    r31,r31,0
> >>     addk    r29,r29,r31
> >>     swi    r31,r1,308
> >>     lwi    r31,r1,428    -- load of max_no from stack
> >>     cmp    r28,r31,r29    -- n in r29
> >>     bgeid    r28,$L46
> >>
> >> gcc-4.1.2 generates the following:
> >>     lbui    r3,r26,3
> >>     rsubk    r3,r3,r19
> >>     lbui    r3,r3,0
> >>     addk    r30,r30,r3
> >>     swi    r3,r1,80
> >>     cmp    r18,r9,r30    -- max_no in r9, n in r30
> >>     bgei    r18,$L6
> >>
> >> gcc-4.6.2 (and gcc-head) load max_no from the stack in each block.
> >> There also are extra loads into r28 (which is not used) and r31 at
> >> the start of each block.  Only r28, r29, and r31 are used.
> >>
> >> I'm having a hard time telling what is happening or why.  The
> >> IRA dump has this line:
> >>    Ignoring reg 772, has equiv memory
> >> where pseudo 772 is loaded with max_no early in the function.
> >>
> >> The reload dump has
> >> Reloads for insn # 254
> >> Reload 0: reload_in (SI) = (reg/v:SI 722 [ max_no ])
> >>     GR_REGS, RELOAD_FOR_INPUT (opnum = 1)
> >>     reload_in_reg: (reg/v:SI 722 [ max_no ])
> >>     reload_reg_rtx: (reg:SI 31 r31)
> >> and similar for each of the other insns using 722.
> >>
> >> This is followed by
> >>   Spilling for insn 254.
> >>   Using reg 31 for reload 0
> >> for each insn using pseudo 722.
> >>
> >> Any idea what is going on?
> >>
> > > So many changes have happened since then (7 years ago) that it is very
> > > hard for me to say anything definite.  I also have no gcc-4.1 for
> > > microblaze (as I see, microblaze was added to public gcc in version
> > > 4.6), which makes it even more difficult for me to say something useful.
> >
> > > First of all, a new RA (IRA) was introduced in gcc4.4, and it uses
> > > different heuristics (Chaitin-Briggs graph coloring vs Chow's
> > > priority RA).
> >
> > > We could blame IRA if the RA had the same starting conditions in
> > > gcc4.1 and gcc4.6-gcc4.8.  But I am sure they are not the same.  More
> > > aggressive optimizations create higher register pressure.  I compared
> > > peak reg pressure in the test for gcc4.6 and gcc4.8.  It became higher
> > > (from 102 to 106).  I guess the increase was even bigger since gcc4.1.
> 
> I thought about register pressure causing this, but I think that should cause
> spilling of one of the registers which were not used in this long sequence,
> rather than causing a large number of additional loads.
> 
> Perhaps the cost analysis has a problem.
> 
> > RA is focused on generating faster code.  Looking at the fragment you
> > provided, it is hard to say anything about that.  I tried -Os with gcc4.8
> > and it generates the desired code for the fragment in question (by the
> > way, the peak register pressure decreased to 66 in this case).
> 
> It's both larger and slower, since the additional loads take much longer.
> I'll take a look at -Os.
> 
> It looks like the values of p++ are being pre-calculated and stored on the
> stack.  This results in a load, rather than an increment of a register.

Hi,

I remember having a similar issue about a year ago. IIRC, I found that
the ivopts pass was transforming things badly for microblaze. Disabling
it helped a lot.

I can't tell if you are seeing the same thing, but it might be worth
trying -fno-ivopts in case you haven't already.
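
For a quick comparison, something like the following reduced test case
might do. It is only a sketch guessed from the fragment quoted above: it
has four levels of nesting instead of ~20, and the function name, the
check helper, and the compiler name in the comment (mb-gcc) are all
hypothetical, so substitute whatever your toolchain uses.

```c
#include <assert.h>

/*
 * Hypothetical reduction of the nested-if pattern from the report.
 * To compare the generated code, compile it twice, for example:
 *     mb-gcc -O3 -S test.c
 *     mb-gcc -O3 -fno-ivopts -S test.c
 * (mb-gcc is an assumed name for the microblaze cross compiler.)
 */
int accumulate(const unsigned char *cp, const unsigned char *p, int max_no)
{
    int n = 0;

    if (n < max_no) {
        n += *(cp - *p++);
        if (n < max_no) {
            n += *(cp - *p++);
            if (n < max_no) {
                n += *(cp - *p++);
                if (n < max_no) {
                    n += *(cp - *p++);
                }
            }
        }
    }
    return n;
}

/* Host-side sanity check so both builds can be verified to agree. */
void check_accumulate(void)
{
    static const unsigned char buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const unsigned char idx[4] = {0, 1, 2, 3};
    const unsigned char *cp = buf + 7;   /* points at the last element */

    assert(accumulate(cp, idx, 100) == 26);  /* 8 + 7 + 6 + 5 */
    assert(accumulate(cp, idx, 10) == 15);   /* stops once n >= max_no */
    assert(accumulate(cp, idx, 0) == 0);     /* first test already fails */
}
```

If the per-block reloads of max_no and p disappear from the second
assembly file, that would point at ivopts rather than at the allocator
itself.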

Cheers,
Edgar
