On Wed, 13 Jan 2021, Qing Zhao wrote:

> 
> 
> > On Jan 13, 2021, at 9:10 AM, Richard Biener <rguent...@suse.de> wrote:
> > 
> > On Wed, 13 Jan 2021, Qing Zhao wrote:
> > 
> >> 
> >> 
> >>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguent...@suse.de> wrote:
> >>> 
> >>> On Tue, 12 Jan 2021, Qing Zhao wrote:
> >>> 
> >>>> Hi, 
> >>>> 
> >>>> Just check in to see whether you have any comments and suggestions on 
> >>>> this:
> >>>> 
> >>>> FYI, I have been continuing with the Approach D implementation since last week:
> >>>> 
> >>>> D. Adding calls to .DEFERRED_INIT during gimplification, then expanding 
> >>>> the .DEFERRED_INIT calls into real initialization during expand. 
> >>>> Adjusting the uninitialized pass to handle the new refs to 
> >>>> ".DEFERRED_INIT".
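> >>>> 
> >>>> For reference, a rough sketch of the IL shape I have in mind (the 
> >>>> argument list of .DEFERRED_INIT below is only a placeholder, not a 
> >>>> final signature):
> >>>> 
> >>>>   /* source:  int x;  ...  use of x  */
> >>>>   x = .DEFERRED_INIT (x, INIT_KIND);  /* emitted at gimplification; the
> >>>>                                          uninitialized pass keeps treating
> >>>>                                          x as having no real init */
> >>>>   ...
> >>>>   x = 0;                              /* what expand lowers the call to
> >>>>                                          for -ftrivial-auto-var-init=zero
> >>>>                                          (a pattern store for =pattern) */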
> >>>> 
> >>>> For the remaining work of Approach D:
> >>>> 
> >>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>>> ** complete the uninitialized warnings maintenance work for D. 
> >>>> 
> >>>> I have completed the uninitialized warnings maintenance work for D,
> >>>> and finished part of the -ftrivial-auto-var-init=pattern 
> >>>> implementation. 
> >>>> 
> >>>> The following is the remaining work for Approach D:
> >>>> 
> >>>>  ** -ftrivial-auto-var-init=pattern for VLAs;
> >>>>  ** add a new attribute for variables:
> >>>> __attribute__((uninitialized))
> >>>> to mark a variable as intentionally uninitialized for performance 
> >>>> reasons (a usage sketch follows below);
> >>>>  ** add complete testing cases;
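> >>>> 
> >>>> For illustration, a rough usage sketch of what I have in mind for the 
> >>>> attribute (the exact spelling and semantics are still to be settled in 
> >>>> the patch; foo and use are just made-up names):
> >>>> 
> >>>>   extern void use (char *);
> >>>> 
> >>>>   void foo (void)
> >>>>   {
> >>>>     /* Opt this buffer out of the automatic initialization because
> >>>>        every element is written below before it is read anyway.  */
> >>>>     char buf[64] __attribute__ ((uninitialized));
> >>>> 
> >>>>     for (int i = 0; i < 64; i++)
> >>>>       buf[i] = (char) i;
> >>>>     use (buf);
> >>>>   }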
> >>>> 
> >>>> 
> >>>> Please let me know if you have any objections to my current decision to 
> >>>> implement approach D. 
> >>> 
> >>> Did you do any analysis on how stack usage and code size are changed 
> >>> with approach D?
> >> 
> >> I did the code size comparison (I will provide the data in another 
> >> email), and with this data D works better than A in general. (This is 
> >> actually a surprise to me.)
> >> 
> >> But not the stack usage.  I am not sure how to collect the stack usage 
> >> data; do you have any suggestions on this?
> > 
> > There is -fstack-usage you could use, then of course watching
> > the stack segment at runtime.
> 
> I can do this for CPU2017 to collect the stack usage data and report back.
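> 
> (I assume this means compiling with -fstack-usage, which makes the compiler 
> emit a .su file per translation unit listing per-function stack usage, and 
> then summing or diffing those numbers between the A and D builds.)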
> 
> >  I'm mostly concerned about
> > stack-limited "processes" such as the linux kernel, which I think
> > is a primary target of your work.
> 
> I don’t have any experience building the linux kernel. 
> Do we have to collect data for the linux kernel at this time? Is the CPU2017 
> data not enough?

Well, it depends on the desired target.  The linux kernel has an
8kb hard stack limit for kernel threads on x86_64 (IIRC).  You
don't have to do anything, it was just a suggestion.  For normal
programs, stack usage is probably the least important problem.

Richard.

> Qing
> > 
> > Richard.
> > 
> >> 
> >>> How does compile-time behave (we could gobble up
> >>> lots of .DEFERRED_INIT calls I guess)?
> >> I can collect this data too and report it later.
> >> 
> >> Thanks.
> >> 
> >> Qing
> >>> 
> >>> Richard.
> >>> 
> >>>> Thanks a lot for your help.
> >>>> 
> >>>> Qing
> >>>> 
> >>>> 
> >>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches 
> >>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> This is an update on our previous discussion. 
> >>>>> 
> >>>>> 1. I implemented the following two different approaches in the 
> >>>>> latest upstream gcc:
> >>>>> 
> >>>>> A. Adding real initialization during gimplification, without maintaining 
> >>>>> the uninitialized warnings.
> >>>>> 
> >>>>> D. Adding calls to .DEFERRED_INIT during gimplification, then expanding 
> >>>>> the .DEFERRED_INIT calls into real initialization during expand. 
> >>>>> Adjusting the uninitialized pass to handle the new refs to 
> >>>>> ".DEFERRED_INIT".
> >>>>> 
> >>>>> Note, in this initial implementation:
> >>>>>         ** I have ONLY implemented -ftrivial-auto-var-init=zero; the 
> >>>>>            implementation of -ftrivial-auto-var-init=pattern is not done 
> >>>>>            yet.  Therefore, the performance data is only for 
> >>>>>            -ftrivial-auto-var-init=zero. 
> >>>>> 
> >>>>>         ** I added a temporary option -fauto-var-init-approach=A|B|C|D 
> >>>>>            to choose implementation A or D for the runtime performance 
> >>>>>            study.
> >>>>>         ** I didn’t finish the uninitialized warnings maintenance work 
> >>>>>            for D. (That might take more time than I expected.) 
> >>>>> 
> >>>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new 
> >>>>> gcc for the following 3 cases:
> >>>>> 
> >>>>> no: default (-g -O2 -march=native)
> >>>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
> >>>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
> >>>>> 
> >>>>> And then computed the slowdown data for both A and D as follows:
> >>>>> 
> >>>>> benchmark            A / no    D / no
> >>>>> 
> >>>>> 500.perlbench_r       1.25%     1.25%
> >>>>> 502.gcc_r             0.68%     1.80%
> >>>>> 505.mcf_r             0.68%     0.14%
> >>>>> 520.omnetpp_r         4.83%     4.68%
> >>>>> 523.xalancbmk_r       0.18%     1.96%
> >>>>> 525.x264_r            1.55%     2.07%
> >>>>> 531.deepsjeng_r      11.57%    11.85%
> >>>>> 541.leela_r           0.64%     0.80%
> >>>>> 557.xz_r             -0.41%    -0.41%
> >>>>> 
> >>>>> 507.cactuBSSN_r       0.44%     0.44%
> >>>>> 508.namd_r            0.34%     0.34%
> >>>>> 510.parest_r          0.17%     0.25%
> >>>>> 511.povray_r         56.57%    57.27%
> >>>>> 519.lbm_r             0.00%     0.00%
> >>>>> 521.wrf_r            -0.28%    -0.37%
> >>>>> 526.blender_r        16.96%    17.71%
> >>>>> 527.cam4_r            0.70%     0.53%
> >>>>> 538.imagick_r         2.40%     2.40%
> >>>>> 544.nab_r             0.00%    -0.65%
> >>>>> 
> >>>>> avg                   5.17%     5.37%
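> >>>>> 
> >>>>> (The slowdown here is the run-time increase relative to the default 
> >>>>> build, i.e. runtime_with_option / runtime_default - 1, in percent; the 
> >>>>> small negative entries just mean that run happened to be marginally 
> >>>>> faster, likely within run-to-run noise.)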
> >>>>> 
> >>>>> From the above data, we can see that in general, the runtime 
> >>>>> performance slowdown for 
> >>>>> implementations A and D is similar for individual benchmarks.
> >>>>> 
> >>>>> There are several benchmarks that show a significant slowdown with the 
> >>>>> newly added initialization for both
> >>>>> A and D, for example 511.povray_r, 526.blender_r, and 531.deepsjeng_r. 
> >>>>> I will try to study a bit
> >>>>> more which of the new initializations introduced such slowdowns. 
> >>>>> 
> >>>>> From the current study so far, I think that approach D should be good 
> >>>>> enough for our final implementation. 
> >>>>> So, I will try to finish approach D with the following remaining work:
> >>>>> 
> >>>>>    ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>>>>    ** complete the uninitialized warnings maintenance work for D. 
> >>>>> 
> >>>>> 
> >>>>> Let me know if you have any comments and suggestions on my current and 
> >>>>> future work.
> >>>>> 
> >>>>> Thanks a lot for your help.
> >>>>> 
> >>>>> Qing
> >>>>> 
> >>>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches 
> >>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>> 
> >>>>>> The following are the approaches I will implement and compare:
> >>>>>> 
> >>>>>> Our final goal is to keep the uninitialized warnings and minimize the 
> >>>>>> run-time performance cost.
> >>>>>> 
> >>>>>> A. Adding real initialization during gimplification, without maintaining 
> >>>>>> the uninitialized warnings.
> >>>>>> B. Adding real initialization during gimplification, marking it with 
> >>>>>> “artificial_init”. 
> >>>>>>  Adjusting the uninitialized pass to maintain the annotation, making 
> >>>>>> sure the real init is not deleted due to the “artificial_init” marking. 
> >>>>>> C. Marking the DECL for an uninitialized auto variable as 
> >>>>>> “no_explicit_init” during gimplification,
> >>>>>>   maintaining this “no_explicit_init” bit till after 
> >>>>>> pass_late_warn_uninitialized, or till pass_expand, then
> >>>>>>   adding real initialization for all DECLs that are marked with 
> >>>>>> “no_explicit_init”.
> >>>>>> D. Adding calls to .DEFERRED_INIT during gimplification, then expanding 
> >>>>>> the .DEFERRED_INIT calls into real initialization during expand. 
> >>>>>>  Adjusting the uninitialized pass to handle the new refs to 
> >>>>>> “.DEFERRED_INIT”.
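> >>>>>> 
> >>>>>> To make the difference between A and D concrete, a rough sketch (the 
> >>>>>> exact form of the .DEFERRED_INIT call below is only a placeholder):
> >>>>>> 
> >>>>>>   /* source:                   */  int x;  if (flag) x = 1;  return x;
> >>>>>>   /* A, after gimplification:  */  x = 0;  if (flag) x = 1;  return x;
> >>>>>>   /* D, after gimplification:  */  x = .DEFERRED_INIT (...);
> >>>>>>                                    if (flag) x = 1;  return x;
> >>>>>> 
> >>>>>> With A the uninitialized pass sees a plain “x = 0” and can no longer 
> >>>>>> warn about the maybe-uninitialized use of x; with D the pass can keep 
> >>>>>> treating the .DEFERRED_INIT def as “no real initialization” and still 
> >>>>>> warn, while expand later turns the call into the real zero/pattern 
> >>>>>> store.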
> >>>>>> 
> >>>>>> 
> >>>>>> In the above, approach A will be the one that has the minimum 
> >>>>>> run-time cost and will be the baseline for the performance
> >>>>>> comparison. 
> >>>>>> 
> >>>>>> I will implement approach D next; it is expected to have the 
> >>>>>> most run-time overhead among the above list, but its
> >>>>>> implementation should be the cleanest among B, C, and D. Let’s see how 
> >>>>>> much more performance overhead this approach
> >>>>>> will incur. If the data is good, maybe we can avoid the effort of 
> >>>>>> implementing B and C. 
> >>>>>> 
> >>>>>> If the performance of D is not good, I will implement B or C at that 
> >>>>>> time.
> >>>>>> 
> >>>>>> Let me know if you have any comment or suggestions.
> >>>>>> 
> >>>>>> Thanks.
> >>>>>> 
> >>>>>> Qing
> >>>>> 
> >>>> 
> >>>> 
> >>> 
> >>> -- 
> >>> Richard Biener <rguent...@suse.de>
> >>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> >>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >> 
> >> 
> > 
> > -- 
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
