On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote:
> 
> 
> > -----Original Message-----
> > From: David Malcolm <[email protected]>
> > Sent: Friday, March 20, 2026 10:10
> > To: Robert Dubner <[email protected]>; [email protected]
> > Subject: Re: COBOL: Hoping for insight with middle-end computation time.
> > 
> > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote:
> > > It happens that COBOL has the COMPUTE statement.  It takes the form
> > > of, for example, COMPUTE DDD = AAA + BBB.
> > > 
> > > We implement that by creating a temporary variable, using that as
> > > the target of an addition of AAA and BBB, and then doing an
> > > assignment to DDD.  (Recall that COBOL variables can be quite
> > > complex, so we are a long way from being able to do this with an
> > > ADD_EXPR.)
> > > 
> > > We have determined that the way I've been producing GENERIC for
> > > that results in N-squared computation time somewhere in the middle
> > > end.
> > > 
> > > I have been tearing my hair out trying to figure out what's
> > > causing that N-squared behavior.  I commented away the assignment,
> > > and I got rid of the arithmetic.  All that's left is the creation
> > > of the temporary, and some IF statements that are generated to test
> > > for errors along the way of the computation.  (COBOL has a very
> > > rich error-detection and exception-generating facility.)
> > > 
> > > The remaining GIMPLE for a single iteration (as shown by
> > > -fdump-tree-gimple) is shown below.  The "phase opt and generate"
> > > times for repetitions of that GIMPLE are shown here:
> > > 
> > >         phase opt    Factor
> > > Repeats & generate
> > >   1,000       0.17
> > >   2,000       0.49      2.9
> > >   4,000       1.55      3.2
> > >   8,000       7.56      4.9
> > >  16,000      49.31      6.5
> > >  32,000     281.29      5.7
> > > 
> > > I have been struggling with this for days.  Is there an
> > > explanation for why the following GIMPLE is resulting in that
> > > N-squared behavior?
> > 
> > Hi Bob.  You cite the overall "phase opt and generate" times, but no
> > data on how this is spread across the various optimization passes.
> > 
> > What's the output of -ftime-report on your workload?
> > 
> > In particular, are there any specific passes that are responsible
> > for the growth in time (and thus where we can pinpoint a bug), or is
> > the time evenly distributed across all of them?
> > 
> > Sorry if this is a silly question.
> > Dave
> 
> The only silly thing is that in an egregious display of monumental
> arrogance and ignorance, some years back I decided to take on the
> problem of code generation for a new front end without having had any
> prior experience with GCC internals or compiler theory, and without
> access to anybody who actually knew anything.
> 
> I hope you've seen my response to Richard.  For a compilation with
> 
> phase opt and generate             :  14.95 ( 92%)   259M ( 88%)
> 
> the next big component is
> 
> thread pro- & epilogue             :  10.45 ( 64%)  4104  (  0%)
> 
> What that means is yet one more mystery to me.

64% of the wall-clock time is being accounted to this timing item.
Looking in timevar.def, I see that this timing item is:

DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")

Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in
function.cc that account their time to this timevar:
pass_thread_prologue_and_epilogue and
pass_late_thread_prologue_and_epilogue.

Both of these passes ultimately call the function
rest_of_handle_thread_prologue_and_epilogue.

So the slowdown is presumably somewhere inside there.
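
As an aside, a report this long is easier to scan once the top-level
rows are sorted by wall time.  The following is a minimal Python sketch,
not anything shipped with GCC: the name top_timevars and the regular
expression are assumptions based only on the row layout of the report
quoted in this message, and other GCC versions may lay the rows out
differently.  It skips the "phase ..." summary rows and the nested
"`- " rows so that only individual timevars are ranked.

```python
import re

# Assumed row layout: "<name> : <wall seconds> (<pct>%) <GGC> (<pct>%)".
# Nested rows start with "`- "; the TOTAL row has no percentage and
# therefore does not match.
ROW_RE = re.compile(
    r"^\s*(?P<child>`- )?(?P<name>.+?)\s*:\s+"
    r"(?P<wall>\d+\.\d+)\s+\(\s*(?P<pct>\d+)%\)"
)


def top_timevars(report, limit=3):
    """Return the `limit` top-level timevars with the largest wall time."""
    rows = []
    for line in report.splitlines():
        line = line.lstrip("> ")  # drop mail quoting and indentation
        match = ROW_RE.match(line)
        if match is None or match["child"]:
            continue  # not a data row, or a nested sub-item
        if match["name"].startswith("phase "):
            continue  # phase rows are summaries of the rows below them
        rows.append((float(match["wall"]), match["name"]))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [(name, wall) for wall, name in rows[:limit]]


# A few rows copied from the report in this thread.
SAMPLE = """\
 phase opt and generate             :  14.95 ( 92%)   259M ( 88%)
 callgraph ipa passes               :   1.14 (  7%)    35M ( 12%)
 integrated RA                      :   0.49 (  3%)    12M (  4%)
 thread pro- & epilogue             :  10.45 ( 64%)  4104  (  0%)
 `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
 TOTAL                              :  16.32          296M
"""

for name, wall in top_timevars(SAMPLE, limit=2):
    print(f"{wall:6.2f}  {name}")
```

Running the same function over reports taken at several repeat counts
would show whether "thread pro- & epilogue" is the only row whose time
grows faster than the input.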

Dave




> 
> Here's the entire output of -ftime-report-details:
> 
> Time variable                                  wall           GGC
>  phase setup                        :   0.01 (  0%)   150k (  0%)
>  phase parsing                      :   1.33 (  8%)    36M ( 12%)
>  phase opt and generate             :  14.95 ( 92%)   259M ( 88%)
>  phase last asm                     :   0.02 (  0%)   377k (  0%)
>  phase finalize                     :   0.01 (  0%)     0  (  0%)
>  garbage collection                 :   0.09 (  1%)     0  (  0%)
>  callgraph construction             :   0.13 (  1%)    12M (  4%)
>  `- CFG verifier                    :   0.02 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.16 (  1%)     0  (  0%)
>  `- symout                          :   0.01 (  0%)    10M (  4%)
>  `- tree SSA verifier               :   0.04 (  0%)     0  (  0%)
>  `- garbage collection              :   0.01 (  0%)     0  (  0%)
>  callgraph optimization             :   0.04 (  0%)     0  (  0%)
>  `- dominance computation           :   0.01 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.21 (  1%)     0  (  0%)
>  `- tree SSA verifier               :   0.06 (  0%)     0  (  0%)
>  `- garbage collection              :   0.03 (  0%)     0  (  0%)
>  callgraph ipa passes               :   1.14 (  7%)    35M ( 12%)
>  ipa function summary               :   0.00 (  0%)  1832  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  ipa inlining heuristics            :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  ipa comdats                        :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  ipa free lang data                 :   0.00 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  ipa free inline summary            :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.04 (  0%)     0  (  0%)
>  ipa modref                         :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  cfg construction                   :   0.01 (  0%)  1232  (  0%)
>  `- rebuild jump labels             :   0.01 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.03 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
>  cfg cleanup                        :   0.01 (  0%)   208  (  0%)
>  `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
>  CFG verifier                       :   0.40 (  2%)     0  (  0%)
>  trivially dead code                :   0.02 (  0%)     0  (  0%)
>  df scan insns                      :   0.07 (  0%)    96  (  0%)
>  `- verify RTL sharing              :   0.01 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  df live regs                       :   0.03 (  0%)     0  (  0%)
>  df reg dead/unused notes           :   0.03 (  0%)  2628k (  1%)
>  register information               :   0.01 (  0%)     0  (  0%)
>  alias analysis                     :   0.01 (  0%)  1024k (  0%)
>  rebuild jump labels                :   0.01 (  0%)   168  (  0%)
>  parser (global)                    :   1.33 (  8%)    36M ( 12%)
>  early inlining heuristics          :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  inline parameters                  :   0.03 (  0%)   672  (  0%)
>  `- tree SSA verifier               :   0.02 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.04 (  0%)     0  (  0%)
>  tree gimplify                      :   0.09 (  1%)    50M ( 17%)
>  tree eh                            :   0.01 (  0%)   584  (  0%)
>  tree CFG construction              :   0.02 (  0%)    13M (  5%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  tree CFG cleanup                   :   0.02 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  `- dominance computation           :   0.01 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  tree SSA other                     :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  tree SSA rewrite                   :   0.03 (  0%)    20M (  7%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  `- tree operand scan               :   0.01 (  0%)  8470k (  3%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  tree operand scan                  :   0.01 (  0%)  8470k (  3%)
>  tree SSA verifier                  :   0.36 (  2%)     0  (  0%)
>  `- dominance computation           :   0.06 (  0%)     0  (  0%)
>  tree STMT verifier                 :   0.88 (  5%)     0  (  0%)
>  tree switch lowering               :   0.00 (  0%)     0  (  0%)
>  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  callgraph verifier                 :   0.06 (  0%)     0  (  0%)
>  `- callgraph verifier              :   0.06 (  0%)     0  (  0%)
>  dominance computation              :   0.12 (  1%)     0  (  0%)
>  out of ssa                         :   0.03 (  0%)   952  (  0%)
>  expand vars                        :   0.01 (  0%)  7040k (  2%)
>  expand                             :   0.16 (  1%)    86M ( 29%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.01 (  0%)     0  (  0%)
>  `- out of ssa                      :   0.03 (  0%)   952  (  0%)
>  `- expand vars                     :   0.01 (  0%)  7040k (  2%)
>  `- post expand cleanups            :   0.01 (  0%)    96  (  0%)
>  post expand cleanups               :   0.01 (  0%)  3712  (  0%)
>  `- rebuild jump labels             :   0.01 (  0%)   168  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  jump                               :   0.00 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.03 (  0%)     0  (  0%)
>  `- trivially dead code             :   0.01 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.02 (  0%)     0  (  0%)
>  loop init                          :   0.01 (  0%)  3040  (  0%)
>  mode switching                     :   0.00 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.01 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  integrated RA                      :   0.49 (  3%)    12M (  4%)
>  `- register information            :   0.01 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  `- df reg dead/unused notes        :   0.03 (  0%)  2628k (  1%)
>  `- alias analysis                  :   0.01 (  0%)  1024k (  0%)
>  `- trivially dead code             :   0.01 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  `- df live regs                    :   0.01 (  0%)     0  (  0%)
>  LRA non-specific                   :   0.22 (  1%)   352  (  0%)
>  `- LRA hard reg assignment         :   0.01 (  0%)     0  (  0%)
>  `- LRA virtuals elimination        :   0.17 (  1%)    23M (  8%)
>  `- LRA create live ranges          :   0.01 (  0%)     0  (  0%)
>  LRA virtuals elimination           :   0.17 (  1%)    23M (  8%)
>  LRA create live ranges             :   0.01 (  0%)     0  (  0%)
>  LRA hard reg assignment            :   0.01 (  0%)     0  (  0%)
>  reload                             :   0.00 (  0%)    48  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  `- integrated RA                   :   0.05 (  0%)    96  (  0%)
>  `- LRA non-specific                :   0.06 (  0%)    24  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  thread pro- & epilogue             :  10.45 ( 64%)  4104  (  0%)
>  `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  `- df live regs                    :   0.01 (  0%)     0  (  0%)
>  machine dep reorg                  :   0.00 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  shorten branches                   :   0.05 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  reg stack                          :   0.00 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.04 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.02 (  0%)     0  (  0%)
>  final                              :   0.10 (  1%)  3887k (  1%)
>  `- verify RTL sharing              :   0.04 (  0%)     0  (  0%)
>  `- symout                          :   0.01 (  0%)  7192k (  2%)
>  symout                             :   0.04 (  0%)    18M (  6%)
>  access analysis                    :   0.06 (  0%)    24  (  0%)
>  `- tree SSA verifier               :   0.02 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.04 (  0%)     0  (  0%)
>  `- dominance computation           :   0.01 (  0%)     0  (  0%)
>  early local passes                 :   0.00 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
>  rest of compilation                :   0.10 (  1%)  1754k (  1%)
>  `- dominance computation           :   0.01 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.24 (  1%)     0  (  0%)
>  `- garbage collection              :   0.04 (  0%)     0  (  0%)
>  `- tree STMT verifier              :   0.13 (  1%)     0  (  0%)
>  `- tree SSA verifier               :   0.06 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.13 (  1%)     0  (  0%)
>  unaccounted post reload            :   0.00 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  unaccounted late compilation       :   0.00 (  0%)     0  (  0%)
>  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
>  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
>  verify RTL sharing                 :   0.54 (  3%)     0  (  0%)
>  TOTAL                              :  16.32          296M
> 
> 
> > 
> > 
> > > 
> > > I agree that the conditionals are a bit convoluted, and Jim and I
> > > are working to do COMPUTE in a much improved way.  But I still
> > > feel a need to understand what's going on!
> > > 
> > > Thanks so much for any insight into what might be happening here.
> > > 
> > >       _intermediate__stack501_503.0.0.data = &_stack501_data_517.0;
> > >       _intermediate__stack501_503.0.0.capacity = 16;
> > >       _intermediate__stack501_503.0.0.allocated = 16;
> > >       _intermediate__stack501_503.0.0.offset = 0;
> > >       _intermediate__stack501_503.0.0.name = &"_stack501"[0];
> > >       _intermediate__stack501_503.0.0.picture = &""[0];
> > >       _intermediate__stack501_503.0.0.initial = 0B;
> > >       _intermediate__stack501_503.0.0.parent = 0B;
> > >       _intermediate__stack501_503.0.0.occurs_lower = 0;
> > >       _intermediate__stack501_503.0.0.occurs_upper = 0;
> > >       _intermediate__stack501_503.0.0.attr = 4160;
> > >       _intermediate__stack501_503.0.0.type = 6;
> > >       _intermediate__stack501_503.0.0.level = 0;
> > >       _intermediate__stack501_503.0.0.digits = 37;
> > >       _intermediate__stack501_503.0.0.rdigits = 0;
> > >       _intermediate__stack501_503.0.0.encoding = 1;
> > >       _intermediate__stack501_503.0.0.alphabet = 0;
> > >       D.2812 = 0;
> > >       D.2813 = 0;
> > >       _1013 = D.2812 & 18;
> > >       if (_1013 != 0) goto <D.9353>; else goto <D.9354>;
> > >       <D.9353>:
> > >       goto <D.9355>;
> > >       <D.9354>:
> > >       D.2814 = 0;
> > >       ..pa_erf.1519_1014 = ..pa_erf;
> > >       D.2813 = D.2813 | ..pa_erf.1519_1014;
> > >       if (D.2813 != 0) goto <D.9357>; else goto <D.9358>;
> > >       <D.9357>:
> > >       goto <D.9359>;
> > >       <D.9358>:
> > >       <D.9359>:
> > >       <D.9355>:
> > > 
> > > 
> 
