On Mon, 1 Feb 2021, Qing Zhao wrote:

> Hi, Richard,
>
> I have adjusted the SRA phase to split calls to .DEFERRED_INIT per your
> suggestion.
>
> Now the routine "bump_map" in 511.povray looks like the following:
>
> ...
>
> # DEBUG BEGIN_STMT
> xcoor = 0.0;
> ycoor = 0.0;
> # DEBUG BEGIN_STMT
> index = .DEFERRED_INIT (index, 2);
> index2 = .DEFERRED_INIT (index2, 2);
> index3 = .DEFERRED_INIT (index3, 2);
> # DEBUG BEGIN_STMT
> colour1 = .DEFERRED_INIT (colour1, 2);
> colour2 = .DEFERRED_INIT (colour2, 2);
> colour3 = .DEFERRED_INIT (colour3, 2);
> # DEBUG BEGIN_STMT
> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
> # DEBUG p1$0 => p1$0_181
> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
> # DEBUG p1$1 => p1$1_184
> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
> # DEBUG p1$2 => p1$2_172
> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
> # DEBUG p2$0 => p2$0_177
> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
> # DEBUG p2$1 => p2$1_135
> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
> # DEBUG p2$2 => p2$2_137
> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
> # DEBUG p3$0 => p3$0_377
> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
> # DEBUG p3$1 => p3$1_379
> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
> # DEBUG p3$2 => p3$2_381
>
> In the above, p1, p2, and p3 have all been split into calls to
> .DEFERRED_INIT on their components.
>
> With this change, the stack usage numbers from -fstack-usage for approach A,
> the old approach D, and the new approach D with the splitting in SRA are:
>
>   Approach A    Approach D-old    Approach D-new
>      272             624               368
>
> From the above, we can see that splitting the call to .DEFERRED_INIT in SRA
> reduces the stack usage increase dramatically.
>
> However, the stack size for D is still bigger than that for A.
>
> I checked the IR again and found that alias analysis might be responsible
> for this (by comparing image.cpp.026t.ealias for A and D), due to the call:
>
>   colour1 = .DEFERRED_INIT (colour1, 2);
>
> ****** Approach A:
>
> Points_to analysis:
>
> Constraints:
> ...
> colour1 = &NULL
> ...
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> ...
> callarg(53) = &colour1
> ...
> _53 = colour1
>
> Points_to sets:
> ...
> colour1 = { NULL ESCAPED NONLOCAL } same as _53
> ...
> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
> ...
> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
>
> ****** Approach D:
>
> Points_to analysis:
>
> Constraints:
> ...
> callarg(19) = colour1
> callarg(19) = &NONLOCAL
> colour1 = callarg(19) + UNKNOWN
> colour1 = &NONLOCAL
> ...
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> ...
> callarg(74) = &colour1
> callarg(74) = callarg(74) + UNKNOWN
> callarg(74) = *callarg(74) + UNKNOWN
> ...
> _53 = colour1
> _54 = _53
> _55 = _54 + UNKNOWN
> _55 = &NONLOCAL
> _56 = colour1
> _57 = _56
> _58 = _57 + UNKNOWN
> _58 = &NONLOCAL
> _59 = _55 + UNKNOWN
> _59 = _58 + UNKNOWN
> _60 = colour1
> _61 = _60
> _62 = _61 + UNKNOWN
> _62 = &NONLOCAL
> _63 = _59 + UNKNOWN
> _63 = _62 + UNKNOWN
> _64 = _63 + UNKNOWN
> ...
>
> Points_to sets:
> ...
> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
> ...
> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
> callarg(71) = { ESCAPED NONLOCAL }
> callarg(72) = { ESCAPED NONLOCAL }
> callarg(73) = { ESCAPED NONLOCAL }
> callarg(74) = { ESCAPED NONLOCAL colour1 }
>
> My question:
>
> Is it possible to adjust alias analysis to resolve this issue?
You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c's
find_func_aliases_for_call (it's not a builtin, but you can look in the
respective subroutine for examples).  Specifically, you want to avoid
making anything escaped or clobbered.

> Thanks.
>
> Qing
>
> > On Jan 18, 2021, at 10:12 AM, Qing Zhao via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >
> >>>>> I checked the routine "pov::bump_map" in 511.povray_r, since it has
> >>>>> a large stack increase with implementation D.  Examining the IR
> >>>>> immediately before the RTL expansion phase
> >>>>> (image.cpp.244t.optimized), I found the following additional
> >>>>> statements for the array elements:
> >>>>>
> >>>>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double * normal)
> >>>>> {
> >>>>> ...
> >>>>> double p3[3];
> >>>>> double p2[3];
> >>>>> double p1[3];
> >>>>> float colour3[5];
> >>>>> float colour2[5];
> >>>>> float colour1[5];
> >>>>> ...
> >>>>> # DEBUG BEGIN_STMT
> >>>>> colour1 = .DEFERRED_INIT (colour1, 2);
> >>>>> colour2 = .DEFERRED_INIT (colour2, 2);
> >>>>> colour3 = .DEFERRED_INIT (colour3, 2);
> >>>>> # DEBUG BEGIN_STMT
> >>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
> >>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
> >>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
> >>>>> p1 = .DEFERRED_INIT (p1, 2);
> >>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
> >>>>> # DEBUG p1$0 => D#12
> >>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
> >>>>> # DEBUG p1$1 => D#11
> >>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
> >>>>> # DEBUG p1$2 => D#10
> >>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
> >>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
> >>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
> >>>>> p2 = .DEFERRED_INIT (p2, 2);
> >>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
> >>>>> # DEBUG p2$0 => D#9
> >>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
> >>>>> # DEBUG p2$1 => D#8
> >>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
> >>>>> # DEBUG p2$2 => D#7
> >>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
> >>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
> >>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
> >>>>> p3 = .DEFERRED_INIT (p3, 2);
> >>>>> ...
> >>>>> }
> >>>>>
> >>>>> I guess the "MEM <double> ... = ..." statements above are the ones
> >>>>> that make the difference.  Which phase introduced them?
> >>>>
> >>>> Looks like SRA.  But you can just dump all and grep for the first
> >>>> occurrence.
> >>>
> >>> Yes, it looks like SRA is the one:
> >>>
> >>> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1] = p1$0_195(D);
> >>> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
> >>> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
> >>
> >> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> >> be extended to handle .DEFERRED_INIT if that's the main source of
> >> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
> >> be split into .DEFERRED_INITs of individual components.
> >
> > Thanks a lot for the suggestion.
> > I will study the SRA code to see how to do this, and then see whether
> > it resolves the issue.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
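The suggestion to special-case .DEFERRED_INIT in find_func_aliases_for_call might look roughly like the following. This is a hypothetical sketch, not a committed patch: the surrounding function body is elided, and it assumes the internal-function predicate and an IFN_DEFERRED_INIT code along the lines of GCC's internal APIs.

```c
/* Hypothetical sketch for tree-ssa-structalias.c (not committed code):
   bail out for .DEFERRED_INIT before the generic unknown-call handling
   that adds the *callarg and ESCAPED/CALLCLOBBERED constraints seen in
   the approach-D dump above.  */

static void
find_func_aliases_for_call (struct function *fn, gcall *t)
{
  /* ... existing handling of builtins etc. ... */

  /* .DEFERRED_INIT only defines its LHS; it neither reads through nor
     retains its argument, so skip the machinery that would make the
     argument escaped and clobbered.  */
  if (gimple_call_internal_p (t, IFN_DEFERRED_INIT))
    return;

  /* ... generic call handling follows ... */
}
```

With the call no longer treated as an opaque callee, colour1 should stay out of the ESCAPED and CALLCLOBBERED sets, matching the approach-A points-to results.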