On Mon, 1 Feb 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> I have adjusted the SRA phase to split calls to .DEFERRED_INIT per your suggestion.
> 
> The routine “bump_map” in 511.povray now looks like the following:
> ...
> 
>   # DEBUG BEGIN_STMT
>   xcoor = 0.0;
>   ycoor = 0.0;
>   # DEBUG BEGIN_STMT
>   index = .DEFERRED_INIT (index, 2);
>   index2 = .DEFERRED_INIT (index2, 2);
>   index3 = .DEFERRED_INIT (index3, 2);
>   # DEBUG BEGIN_STMT
>   colour1 = .DEFERRED_INIT (colour1, 2);
>   colour2 = .DEFERRED_INIT (colour2, 2);
>   colour3 = .DEFERRED_INIT (colour3, 2);
>   # DEBUG BEGIN_STMT
>   p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>   # DEBUG p1$0 => p1$0_181
>   p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>   # DEBUG p1$1 => p1$1_184
>   p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>   # DEBUG p1$2 => p1$2_172
>   p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>   # DEBUG p2$0 => p2$0_177
>   p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>   # DEBUG p2$1 => p2$1_135
>   p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>   # DEBUG p2$2 => p2$2_137
>   p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>   # DEBUG p3$0 => p3$0_377
>   p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>   # DEBUG p3$1 => p3$1_379
>   p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>   # DEBUG p3$2 => p3$2_381
> 
> 
> In the above, p1, p2, and p3 are all split into calls to .DEFERRED_INIT on
> their individual components.
> 
> With this change, the stack usage numbers with -fstack-usage for approach A, 
> old approach D and new D with the splitting in SRA are:
> 
>   Approach A    Approach D-old    Approach D-new
> 
>       272             624               368
> 
> From the above, we can see that splitting the call to DEFERRED_INIT in SRA 
> can reduce the stack usage increase dramatically. 
> 
> However, the stack size for D still looks bigger than for A.
> 
> I checked the IR again, and found that the alias analysis might be 
> responsible for this (by comparing the image.cpp.026t.ealias for both A and D):
> 
> (Due to the call to:
> 
>   colour1 = .DEFERRED_INIT (colour1, 2);
> )
> 
> ******Approach A:
> 
> Points_to analysis:
> 
> Constraints:
> …
> colour1 = &NULL
> …
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> ...
> callarg(53) = &colour1
> ...
> _53 = colour1
> 
> Points_to sets:
> …
> colour1 = { NULL ESCAPED NONLOCAL } same as _53
> ...
> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as 
> CALLUSED(48)
> ...
> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
> 
> ******Approach D:
> 
> Points_to analysis:
> 
> Constraints:
> …
> callarg(19) = colour1
> callarg(19) = &NONLOCAL
> colour1 = callarg(19) + UNKNOWN
> colour1 = &NONLOCAL
> …
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> …
> callarg(74) = &colour1
> callarg(74) = callarg(74) + UNKNOWN
> callarg(74) = *callarg(74) + UNKNOWN
> …
> _53 = colour1
> _54 = _53
> _55 = _54 + UNKNOWN
> _55 = &NONLOCAL
> _56 = colour1
> _57 = _56
> _58 = _57 + UNKNOWN
> _58 = &NONLOCAL
> _59 = _55 + UNKNOWN
> _59 = _58 + UNKNOWN
> _60 = colour1
> _61 = _60
> _62 = _61 + UNKNOWN
> _62 = &NONLOCAL
> _63 = _59 + UNKNOWN
> _63 = _62 + UNKNOWN
> _64 = _63 + UNKNOWN
> ..
> Points_to set:
> …
> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
> …
> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
> callarg(71) = { ESCAPED NONLOCAL }
> callarg(72) = { ESCAPED NONLOCAL }
> callarg(73) = { ESCAPED NONLOCAL }
> callarg(74) = { ESCAPED NONLOCAL colour1 }
> 
> My question:
> 
> Is it possible to adjust alias analysis to resolve this issue?

You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c
find_func_aliases_for_call (it's not a builtin but you can look in
the respective subroutine for examples).  Specifically you want to
avoid making anything escaped or clobbered.
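
As a rough illustration of that special case, here is a toy model in C. It
is not GCC code: toy_state, toy_handle_call, the call_kind values and the
LOC_* bitmask locations are all made-up names, and the sets are reduced to
bitmasks instead of the constraint vectors that tree-ssa-structalias.c
actually builds. It only shows the intended effect, assuming the argument
is modeled as the set of locations reachable from it: an ordinary call of
unknown behavior widens the result, escaped and clobbered sets, while the
special-cased call adds nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy points-to universe: one bit per abstract location.
   All names here are hypothetical, not GCC identifiers.  */
enum {
  LOC_NONLOCAL = 1u << 0,
  LOC_COLOUR1  = 1u << 2
};

/* Only the distinction between the two kinds matters here.  */
enum call_kind { CALL_OPAQUE, CALL_DEFERRED_INIT };

struct toy_state {
  uint32_t lhs_pts;    /* locations the call result may point to */
  uint32_t escaped;    /* locations made visible to unknown code */
  uint32_t clobbered;  /* locations the call may overwrite */
};

/* Record the effects of "lhs = call (arg)", where ARG_PTS is the set of
   locations reachable from the argument.  */
void
toy_handle_call (struct toy_state *s, enum call_kind kind, uint32_t arg_pts)
{
  /* The special case: the call only marks LHS as uninitialized, so it
     contributes nothing to any set.  */
  if (kind == CALL_DEFERRED_INIT)
    return;

  /* Conservative default for a call with unknown behavior.  */
  s->lhs_pts |= LOC_NONLOCAL | arg_pts;
  s->escaped |= arg_pts;
  s->clobbered |= arg_pts;
}

/* Compare the two treatments on a call shaped like
   "colour1 = .DEFERRED_INIT (colour1, 2)".  */
void
toy_self_check (void)
{
  struct toy_state generic = { 0, 0, 0 };
  struct toy_state special = { 0, 0, 0 };

  toy_handle_call (&generic, CALL_OPAQUE, LOC_COLOUR1);
  toy_handle_call (&special, CALL_DEFERRED_INIT, LOC_COLOUR1);

  assert (generic.escaped == LOC_COLOUR1);
  assert (special.lhs_pts == 0 && special.escaped == 0
          && special.clobbered == 0);
}
```

In the real pass the equivalent would presumably be an early check in
find_func_aliases_for_call before the generic call handling runs.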

> thanks.
> 
> Qing
> 
> > On Jan 18, 2021, at 10:12 AM, Qing Zhao via Gcc-patches 
> > <gcc-patches@gcc.gnu.org> wrote:
> > 
> >>>>> I checked the routine “pov::bump_map” in 511.povray_r, since it
> >>>>> shows a large stack increase
> >>>>> due to implementation D, by examining the IR immediately before the RTL
> >>>>> expansion phase
> >>>>> (image.cpp.244t.optimized). I found that we have the following
> >>>>> additional statements for the array elements:
> >>>>> 
> >>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
> >>>>> * normal)
> >>>>> {
> >>>>> …
> >>>>> double p3[3];
> >>>>> double p2[3];
> >>>>> double p1[3];
> >>>>> float colour3[5];
> >>>>> float colour2[5];
> >>>>> float colour1[5];
> >>>>> …
> >>>>> # DEBUG BEGIN_STMT
> >>>>> colour1 = .DEFERRED_INIT (colour1, 2);
> >>>>> colour2 = .DEFERRED_INIT (colour2, 2);
> >>>>> colour3 = .DEFERRED_INIT (colour3, 2);
> >>>>> # DEBUG BEGIN_STMT
> >>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
> >>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
> >>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
> >>>>> p1 = .DEFERRED_INIT (p1, 2);
> >>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
> >>>>> # DEBUG p1$0 => D#12
> >>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
> >>>>> # DEBUG p1$1 => D#11
> >>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
> >>>>> # DEBUG p1$2 => D#10
> >>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
> >>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
> >>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
> >>>>> p2 = .DEFERRED_INIT (p2, 2);
> >>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
> >>>>> # DEBUG p2$0 => D#9
> >>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
> >>>>> # DEBUG p2$1 => D#8
> >>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
> >>>>> # DEBUG p2$2 => D#7
> >>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
> >>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
> >>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
> >>>>> p3 = .DEFERRED_INIT (p3, 2);
> >>>>> ….
> >>>>> }
> >>>>> 
> >>>>> I guess that the above “MEM <double>….. = …” statements are the ones that
> >>>>> make the difference. Which phase introduced them?
> >>>> 
> >>>> Looks like SRA. But you can just dump all and grep for the first 
> >>>> occurrence. 
> >>> 
> >>> Yes, it looks like SRA is the one:
> >>> 
> >>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
> >>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
> >>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
> >> 
> >> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> >> be extended to handle .DEFERRED_INIT if that's the main source of
> >> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
> >> be split into .DEFERRED_INITs of individual components.
> > 
> > Thanks a lot for the suggestion,
> > I will study the code of SRA to see how to do this and then see whether 
> > this can resolve the issue.
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
