On Fri, Mar 20, 2026 at 8:58 PM Richard Biener
<[email protected]> wrote:
>
> On Fri, Mar 20, 2026 at 8:09 PM David Malcolm via Gcc <[email protected]> wrote:
> >
> > On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: David Malcolm <[email protected]>
> > > > Sent: Friday, March 20, 2026 10:10
> > > > To: Robert Dubner <[email protected]>; [email protected]
> > > > Subject: Re: COBOL: Hoping for insight with middle-end computation
> > > > time.
> > > >
> > > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote:
> > > > > It happens that COBOL has the COMPUTE statement.  It takes the
> > > > > form
> > > > > of,
> > > > > for example, COMPUTE DDD = AAA + BBB.
> > > > >
> > > > > We implement that by creating a temporary variable, using that as
> > > > > the
> > > > > target of an addition of AAA and BBB, and then doing an
> > > > > assignment to
> > > > > DDD.
> > > > > (Recall that COBOL variables can be quite complex, so we are a
> > > > > long
> > > > > way
> > > > > from being able to do this with an ADD_EXPR.)
> > > > >
> > > > > We have determined that the way I've been producing GENERIC for
> > > > > that
> > > > > results in N-squared computation time somewhere in the middle
> > > > > end.
> > > > >
> > > > > I have been tearing my hair out trying to figure out what's
> > > > > causing
> > > > > that
> > > > > N-squared behavior.  I commented away the assignment, and I got
> > > > > rid
> > > > > of the
> > > > > arithmetic.  All that's left is the creation of the temporary,
> > > > > and
> > > > > some IF
> > > > > statements that are generated to test for errors along the way of
> > > > > the
> > > > > > computation.  (COBOL has a very rich error-detection and
> > > > > > exception-generating facility.)
> > > > >
> > > > > The remaining GIMPLE for a single iteration (as shown by
> > > > > -fdump-tree-gimple) is shown below.  The "phase opt and generate"
> > > > > times
> > > > > for repetitions of that GIMPLE are shown here:
> > > > >
> > > > > >         phase opt
> > > > > > Repeats & generate   Factor
> > > > >   1,000       0.17
> > > > >   2,000       0.49      2.9
> > > > >   4,000       1.55      3.2
> > > > >   8,000       7.56      4.9
> > > > >  16,000      49.31      6.5
> > > > >  32,000     281.29      5.7
> > > > >
> > > > > I have been struggling with this for days.  Is there an
> > > > > explanation
> > > > > for
> > > > > why the following GIMPLE is resulting in that N-squared behavior?
> > > >
> > > > Hi Bob.  You cite the overall "phase opt and generate" times, but
> > > > no
> > > > data on how this is spread across the various optimization passes.
> > > >
> > > > What's the output of -ftime-report on your workload?
> > > >
> > > > In particular, are there any specific passes that are responsible
> > > > for
> > > > the growth in time (and thus where we can pinpoint a bug), or is
> > > > the
> > > > time evenly distributed across all of them?
> > > >
> > > > > Sorry if this is a silly question.
> > > > Dave
> > >
> > > The only silly thing is that in an egregious display of monumental
> > > arrogance
> > > and ignorance, some years back I decided to take on the problem of
> > > code
> > > generation for a new front end without having had any prior
> > > experience with
> > > GCC internals or compiler theory, and without access to anybody who
> > > actually
> > > knew anything.
> > >
> > > I hope you've seen my response to Richard.  For a compilation with
> > >
> > > phase opt and generate             :  14.95 ( 92%)   259M ( 88%)
> > >
> > > the next big component is
> > >
> > > thread pro- & epilogue             :  10.45 ( 64%)  4104  (  0%)
> > >
> > > What that means is yet one more mystery to me.
> >
> > 64% of the wallclock time is being accounted to this timing item.
> > Looking in timevar.def, I see that this timing item is:
> >
> > DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
> >
> > Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in
> > function.cc that account their time to this timevar:
> > pass_thread_prologue_and_epilogue and
> > pass_late_thread_prologue_and_epilogue.
> >
> > Both of these passes ultimately call the function
> > rest_of_handle_thread_prologue_and_epilogue.
> >
> > So the slowdown is presumably somewhere inside there.
>
> Possibly shrink-wrapping, try -fno-shrink-wrap

Ah no, this happens only without optimization.  With -O1 it "improves" to

 thread pro- & epilogue             :   0.20 (  2%)  2112  (  0%)

but overall it's slower.  It might no longer expose the quadratic behavior,
though: doubling the number of lines only doubles compile time for me with -O1,
and at ~1000 repetitions of the COMPUTE, -O0 takes double the time of -O1.
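
For comparison, the growth rate in the quoted -O0 table can be estimated
from each doubling.  A minimal sketch in Python, using only the numbers
from that table (the function name local_exponents is made up for
illustration):

```python
import math

# (repeats, "phase opt and generate" time) exactly as given in the quoted table
timings = [
    (1_000, 0.17),
    (2_000, 0.49),
    (4_000, 1.55),
    (8_000, 7.56),
    (16_000, 49.31),
    (32_000, 281.29),
]


def local_exponents(samples):
    """Return (n, factor, exponent) for each consecutive pair of samples.

    If time grows like n**k, then k = log(t2 / t1) / log(n2 / n1).
    """
    out = []
    for (n1, t1), (n2, t2) in zip(samples, samples[1:]):
        factor = t2 / t1
        out.append((n2, factor, math.log(factor) / math.log(n2 / n1)))
    return out


if __name__ == "__main__":
    for n, factor, k in local_exponents(timings):
        print(f"{n:>6}  factor {factor:4.1f}  exponent {k:4.2f}")
```

An exponent near 1 would be linear and near 2 quadratic.  For the table
above, the last three doublings come out above 2, so the -O0 behavior
looks at least quadratic at the larger sizes.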

> Other than that - clearly a bug.  Care to file a bugzilla for this
> compile-time hog?

Still interesting to figure out why it's so slow at -O0.
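
For a bugzilla report, a generated test case makes the scaling easy to
reproduce.  Here is a rough sketch of a generator in Python.  Bob's actual
test program isn't shown in this thread, so the program name and the PIC
clauses below are my assumptions; only the COMPUTE statement itself is
taken from his mail.

```python
import sys


def make_cobol(repeats):
    """Return fixed-format COBOL source containing `repeats` COMPUTE statements."""
    header = [
        "       IDENTIFICATION DIVISION.",
        "       PROGRAM-ID. repro.",
        "       DATA DIVISION.",
        "       WORKING-STORAGE SECTION.",
        # Assumed declarations: the real data items are not shown in the thread.
        "       01 AAA PIC 9(9) VALUE 1.",
        "       01 BBB PIC 9(9) VALUE 2.",
        "       01 DDD PIC 9(9).",
        "       PROCEDURE DIVISION.",
    ]
    # The statement from the original report, repeated N times.
    body = ["           COMPUTE DDD = AAA + BBB."] * repeats
    footer = ["           GOBACK."]
    return "\n".join(header + body + footer) + "\n"


if __name__ == "__main__":
    repeats = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    sys.stdout.write(make_cobol(repeats))
```

Running it with 1000, 2000, 4000, ... and compiling each output at -O0
and -O1 with -ftime-report should show whether "thread pro- & epilogue"
is what grows.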

Richard.

> Richard.
>
> > Dave
> >
> >
> >
> >
> > >
> > > Here's the entire output of -ftime-report-details:
> > >
> > > Time variable                                  wall           GGC
> > >  phase setup                        :   0.01 (  0%)   150k (  0%)
> > >  phase parsing                      :   1.33 (  8%)    36M ( 12%)
> > >  phase opt and generate             :  14.95 ( 92%)   259M ( 88%)
> > >  phase last asm                     :   0.02 (  0%)   377k (  0%)
> > >  phase finalize                     :   0.01 (  0%)     0  (  0%)
> > >  garbage collection                 :   0.09 (  1%)     0  (  0%)
> > >  callgraph construction             :   0.13 (  1%)    12M (  4%)
> > >  `- CFG verifier                    :   0.02 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.16 (  1%)     0  (  0%)
> > >  `- symout                          :   0.01 (  0%)    10M (  4%)
> > >  `- tree SSA verifier               :   0.04 (  0%)     0  (  0%)
> > >  `- garbage collection              :   0.01 (  0%)     0  (  0%)
> > >  callgraph optimization             :   0.04 (  0%)     0  (  0%)
> > >  `- dominance computation           :   0.01 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.21 (  1%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.06 (  0%)     0  (  0%)
> > >  `- garbage collection              :   0.03 (  0%)     0  (  0%)
> > >  callgraph ipa passes               :   1.14 (  7%)    35M ( 12%)
> > >  ipa function summary               :   0.00 (  0%)  1832  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  ipa inlining heuristics            :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  ipa comdats                        :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  ipa free lang data                 :   0.00 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  ipa free inline summary            :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.04 (  0%)     0  (  0%)
> > >  ipa modref                         :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  cfg construction                   :   0.01 (  0%)  1232  (  0%)
> > >  `- rebuild jump labels             :   0.01 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.03 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
> > >  cfg cleanup                        :   0.01 (  0%)   208  (  0%)
> > >  `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
> > >  CFG verifier                       :   0.40 (  2%)     0  (  0%)
> > >  trivially dead code                :   0.02 (  0%)     0  (  0%)
> > >  df scan insns                      :   0.07 (  0%)    96  (  0%)
> > >  `- verify RTL sharing              :   0.01 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  df live regs                       :   0.03 (  0%)     0  (  0%)
> > >  df reg dead/unused notes           :   0.03 (  0%)  2628k (  1%)
> > >  register information               :   0.01 (  0%)     0  (  0%)
> > >  alias analysis                     :   0.01 (  0%)  1024k (  0%)
> > >  rebuild jump labels                :   0.01 (  0%)   168  (  0%)
> > >  parser (global)                    :   1.33 (  8%)    36M ( 12%)
> > >  early inlining heuristics          :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  inline parameters                  :   0.03 (  0%)   672  (  0%)
> > >  `- tree SSA verifier               :   0.02 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.04 (  0%)     0  (  0%)
> > >  tree gimplify                      :   0.09 (  1%)    50M ( 17%)
> > >  tree eh                            :   0.01 (  0%)   584  (  0%)
> > >  tree CFG construction              :   0.02 (  0%)    13M (  5%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  tree CFG cleanup                   :   0.02 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  `- dominance computation           :   0.01 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  tree SSA other                     :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  tree SSA rewrite                   :   0.03 (  0%)    20M (  7%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  `- tree operand scan               :   0.01 (  0%)  8470k (  3%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  tree operand scan                  :   0.01 (  0%)  8470k (  3%)
> > >  tree SSA verifier                  :   0.36 (  2%)     0  (  0%)
> > >  `- dominance computation           :   0.06 (  0%)     0  (  0%)
> > >  tree STMT verifier                 :   0.88 (  5%)     0  (  0%)
> > >  tree switch lowering               :   0.00 (  0%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.01 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  callgraph verifier                 :   0.06 (  0%)     0  (  0%)
> > >  `- callgraph verifier              :   0.06 (  0%)     0  (  0%)
> > >  dominance computation              :   0.12 (  1%)     0  (  0%)
> > >  out of ssa                         :   0.03 (  0%)   952  (  0%)
> > >  expand vars                        :   0.01 (  0%)  7040k (  2%)
> > >  expand                             :   0.16 (  1%)    86M ( 29%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.01 (  0%)     0  (  0%)
> > >  `- out of ssa                      :   0.03 (  0%)   952  (  0%)
> > >  `- expand vars                     :   0.01 (  0%)  7040k (  2%)
> > >  `- post expand cleanups            :   0.01 (  0%)    96  (  0%)
> > >  post expand cleanups               :   0.01 (  0%)  3712  (  0%)
> > >  `- rebuild jump labels             :   0.01 (  0%)   168  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  jump                               :   0.00 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.03 (  0%)     0  (  0%)
> > >  `- trivially dead code             :   0.01 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.02 (  0%)     0  (  0%)
> > >  loop init                          :   0.01 (  0%)  3040  (  0%)
> > >  mode switching                     :   0.00 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.01 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  integrated RA                      :   0.49 (  3%)    12M (  4%)
> > >  `- register information            :   0.01 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  `- df reg dead/unused notes        :   0.03 (  0%)  2628k (  1%)
> > >  `- alias analysis                  :   0.01 (  0%)  1024k (  0%)
> > >  `- trivially dead code             :   0.01 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  `- df live regs                    :   0.01 (  0%)     0  (  0%)
> > >  LRA non-specific                   :   0.22 (  1%)   352  (  0%)
> > >  `- LRA hard reg assignment         :   0.01 (  0%)     0  (  0%)
> > >  `- LRA virtuals elimination        :   0.17 (  1%)    23M (  8%)
> > >  `- LRA create live ranges          :   0.01 (  0%)     0  (  0%)
> > >  LRA virtuals elimination           :   0.17 (  1%)    23M (  8%)
> > >  LRA create live ranges             :   0.01 (  0%)     0  (  0%)
> > >  LRA hard reg assignment            :   0.01 (  0%)     0  (  0%)
> > >  reload                             :   0.00 (  0%)    48  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  `- integrated RA                   :   0.05 (  0%)    96  (  0%)
> > >  `- LRA non-specific                :   0.06 (  0%)    24  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  thread pro- & epilogue             :  10.45 ( 64%)  4104  (  0%)
> > >  `- CFG verifier                    :   0.03 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  `- df live regs                    :   0.01 (  0%)     0  (  0%)
> > >  machine dep reorg                  :   0.00 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  shorten branches                   :   0.05 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  reg stack                          :   0.00 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.04 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.02 (  0%)     0  (  0%)
> > >  final                              :   0.10 (  1%)  3887k (  1%)
> > >  `- verify RTL sharing              :   0.04 (  0%)     0  (  0%)
> > >  `- symout                          :   0.01 (  0%)  7192k (  2%)
> > >  symout                             :   0.04 (  0%)    18M (  6%)
> > >  access analysis                    :   0.06 (  0%)    24  (  0%)
> > >  `- tree SSA verifier               :   0.02 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.04 (  0%)     0  (  0%)
> > >  `- dominance computation           :   0.01 (  0%)     0  (  0%)
> > >  early local passes                 :   0.00 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.02 (  0%)     0  (  0%)
> > >  rest of compilation                :   0.10 (  1%)  1754k (  1%)
> > >  `- dominance computation           :   0.01 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.24 (  1%)     0  (  0%)
> > >  `- garbage collection              :   0.04 (  0%)     0  (  0%)
> > >  `- tree STMT verifier              :   0.13 (  1%)     0  (  0%)
> > >  `- tree SSA verifier               :   0.06 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.13 (  1%)     0  (  0%)
> > >  unaccounted post reload            :   0.00 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  unaccounted late compilation       :   0.00 (  0%)     0  (  0%)
> > >  `- verify RTL sharing              :   0.02 (  0%)     0  (  0%)
> > >  `- CFG verifier                    :   0.01 (  0%)     0  (  0%)
> > >  verify RTL sharing                 :   0.54 (  3%)     0  (  0%)
> > >  TOTAL                              :  16.32          296M
> > >
> > >
> > > >
> > > >
> > > > >
> > > > > I agree that the conditionals are a bit convoluted, and Jim and I
> > > > > are
> > > > > working to do COMPUTE in a much improved way.  But I still feel a
> > > > > need to
> > > > > understand what's going on!
> > > > >
> > > > > Thanks so much for any insight into what might be happening here.
> > > > >
> > > > >       _intermediate__stack501_503.0.0.data =
> > > > > &_stack501_data_517.0;
> > > > >       _intermediate__stack501_503.0.0.capacity = 16;
> > > > >       _intermediate__stack501_503.0.0.allocated = 16;
> > > > >       _intermediate__stack501_503.0.0.offset = 0;
> > > > >       _intermediate__stack501_503.0.0.name = &"_stack501"[0];
> > > > >       _intermediate__stack501_503.0.0.picture = &""[0];
> > > > >       _intermediate__stack501_503.0.0.initial = 0B;
> > > > >       _intermediate__stack501_503.0.0.parent = 0B;
> > > > >       _intermediate__stack501_503.0.0.occurs_lower = 0;
> > > > >       _intermediate__stack501_503.0.0.occurs_upper = 0;
> > > > >       _intermediate__stack501_503.0.0.attr = 4160;
> > > > >       _intermediate__stack501_503.0.0.type = 6;
> > > > >       _intermediate__stack501_503.0.0.level = 0;
> > > > >       _intermediate__stack501_503.0.0.digits = 37;
> > > > >       _intermediate__stack501_503.0.0.rdigits = 0;
> > > > >       _intermediate__stack501_503.0.0.encoding = 1;
> > > > >       _intermediate__stack501_503.0.0.alphabet = 0;
> > > > >       D.2812 = 0;
> > > > >       D.2813 = 0;
> > > > >       _1013 = D.2812 & 18;
> > > > >       if (_1013 != 0) goto <D.9353>; else goto <D.9354>;
> > > > >       <D.9353>:
> > > > >       goto <D.9355>;
> > > > >       <D.9354>:
> > > > >       D.2814 = 0;
> > > > >       ..pa_erf.1519_1014 = ..pa_erf;
> > > > >       D.2813 = D.2813 | ..pa_erf.1519_1014;
> > > > >       if (D.2813 != 0) goto <D.9357>; else goto <D.9358>;
> > > > >       <D.9357>:
> > > > >       goto <D.9359>;
> > > > >       <D.9358>:
> > > > >       <D.9359>:
> > > > >       <D.9355>:
> > > > >
> > > > >
> > >
> >
