On Fri, 2026-03-20 at 10:46 -0500, Robert Dubner wrote:
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Friday, March 20, 2026 03:24
> > To: Robert Dubner <[email protected]>
> > Cc: [email protected]
> > Subject: Re: COBOL: Hoping for insight with middle-end computation
> > time.
> > 
> > On Fri, Mar 20, 2026 at 12:23 AM Robert Dubner <[email protected]>
> > wrote:
> > > 
> > > It happens that COBOL has the COMPUTE statement.  It takes the
> > > form of,
> > > for example, COMPUTE DDD = AAA + BBB.
> > > 
> > > We implement that by creating a temporary variable, using that as
> > > the
> > > target of an addition of AAA and BBB, and then doing an
> > > assignment to 
> > > DDD.
> > > (Recall that COBOL variables can be quite complex, so we are a
> > > long way
> > > from being able to do this with an ADD_EXPR.)
> > > 
> > > We have determined that the way I've been producing GENERIC for
> > > that
> > > results in N-squared computation time somewhere in the middle
> > > end.
> > > 
> > > I have been tearing my hair out trying to figure out what's
> > > causing that
> > > N-squared behavior.  I commented away the assignment, and I got
> > > rid of 
> > > the
> > > arithmetic.  All that's left is the creation of the temporary,
> > > and some 
> > > IF
> > > statements that are generated to test for errors along the way of
> > > the
> > > computation.  (COBOL has a very rich error-detection and
> > > exception-generating facility.
> > > 
> > > The remaining GIMPLE for a single iteration (as shown by
> > > -fdump-tree-gimple) is shown below.  The "phase opt and generate"
> > > times
> > > for repetitions of that GIMPLE are shown here:
> > > 
> > >         phase opt          Factor
> > > Repeats & generate
> > >   1,000       0.17
> > >   2,000       0.49      2.9
> > >   4,000       1.55      3.2
> > >   8,000       7.56      4.9
> > >  16,000      49.31      6.5
> > >  32,000     281.29      5.7
> > > 
> > > I have been struggling with this for days.  Is there an
> > > explanation for
> > > why the following GIMPLE is resulting in that N-squared behavior?
> > > 
> > > I agree that the conditionals are a bit convoluted, and Jim and I
> > > are
> > > working to do COMPUTE in a much improved way.  But I still feel a
> > > need 
> > > to
> > > understand what's going on!
> > 
> > Can you share a COBOL testcase that shows what you observe above?
> > Maybe two of them, so I can see how one does the "repeats"?
> > 
> > It's always a bug in the middle-end unless the size of the initial
> > GIMPLE
> > code also grows squared.
> 
> The program is simple enough:
> ~~~~~~~~~~~~~~~~~
>         program-id.         prog.
>         data                division.
>         working-storage     section.
>         01 aaa pic 999.
>         01 bbb pic 999.
>         01 ddd pic 999.
>         procedure           division.
>             compute ddd = aaa + bbb
>             *> Repeat as necessary
>             compute ddd = aaa + bbb
>             goback.
> ~~~~~~~~~~~~~~~~~
> 
> 
> 
> With 500 repeats of "compute ddd = aaa + bbb", -fdump-tree-gimple
> produced a 
> file of 2.4Mbytes bytes.  1,000 repeats gives a file of 4.9Mbytes
> 
> The respective -ftime-reports start off with
> 
> Time variable                                  wall           GGC
>  phase parsing                      :   0.05 (  3%)  4838k (  8%)
>  phase opt and generate             :   1.70 ( 97%)    53M ( 92%)
> ...
>  thread pro- & epilogue             :   0.91 ( 52%)  4120  (  0%)
> 
> and
> 
> Time variable                                  wall           GGC
>  phase parsing                      :   0.07 (  1%)  9523k (  8%)
>  phase opt and generate             :   6.25 ( 99%)   107M ( 92%)
> ...
>  thread pro- & epilogue             :   4.38 ( 71%)  4120  (  0%)
> 
> Doubling it again to 2,000 COMPUTE DDD = AAA + BBB produces a
> 9.8Mbyte 
> gimple file, with these timings
> 
> Time variable                                  wall           GGC
>  phase parsing                      :   0.14 (  0%)    18M (  8%)
>  phase opt and generate             :  31.55 ( 99%)   213M ( 92%)
> ...
>  thread pro- & epilogue             :  27.55 ( 87%)  4120  (  0%)
> 
> 
> That's the full case that got me started investigating.

FWIW I tried this (with about 900 repeats of the "compute" line):

time perf record ./gcobol -B. -S test.cob -ftime-report

Time variable                                  wall           GGC
 phase setup                        :   0.01 (  0%)   141k (  0%)
 phase parsing                      :   0.40 (  1%)  8268k (  8%)
 phase opt and generate             :  59.42 ( 99%)    91M ( 92%)
 garbage collection                 :   0.30 (  1%)     0  (  0%)
 callgraph construction             :   0.24 (  0%)  4854k (  5%)
 callgraph optimization             :   0.11 (  0%)     0  (  0%)
 callgraph ipa passes               :   8.02 ( 13%)    12M ( 13%)
[..snip...]
 tree STMT verifier                 :   6.01 ( 10%)     0  (  0%)
[..snip...]
 thread pro- & epilogue             :  32.71 ( 55%)  4120  (  0%)
[..snip...]
 TOTAL                              :  59.83           99M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
[ perf record: Woken up 33 times to write data ]
[ perf record: Captured and wrote 9.123 MB perf.data (239039 samples) ]

real    1m0.256s
user    0m59.018s
sys     0m0.502s

and the top 10 places reported by "perf report" are:

  30.49%  cobol1   cobol1               [.] ix86_access_stack_p
   9.91%  cobol1   cobol1               [.] TEST_HARD_REG_BIT
   6.89%  cobol1   cobol1               [.] requires_stack_frame_p
   3.32%  cobol1   cobol1               [.] dominated_by_p
   1.55%  cobol1   cobol1               [.] mark_used_flags
   1.33%  cobol1   cobol1               [.] gimple_code
   0.98%  cobol1   cobol1               [.] 
is_a_helper<rtx_insn*>::test<rtx_def>
   0.93%  cobol1   cobol1               [.] contains_struct_check
   0.79%  cobol1   cobol1               [.] verify_rtx_sharing
   0.70%  cobol1   cobol1               [.] BLOCK_FOR_INSN

i.e. I saw 30% of the time spent in an x86-specific function, albeit
with a debug build of the compiler.

Dave


> 
> I think you'll find that the gimple from the full compilation is more
> complicated than necessary.
> 
> If you apply this patch, you'll find it carves away a lot of generic
> that is 
> irrelevant to this discussion:
> ~~~~~~~~~~~~~~~~~~~~
> diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
> index 4f71f9b1152..5f43e759b9f 100644
> --- a/gcc/cobol/genapi.cc
> +++ b/gcc/cobol/genapi.cc
> @@ -5867,13 +5867,13 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
>          TREEPLET tsource;
>          treeplet_fill_source(tsource, sourceref);
>          static bool check_for_error = true;
> -        move_helper(erf,
> -                    destref,
> -                    sourceref,
> -                    tsource,
> -                    rounded,
> -                    check_for_error,
> -                    true);
> +////        move_helper(erf,
> +////                    destref,
> +////                    sourceref,
> +////                    tsource,
> +////                    rounded,
> +////                    check_for_error,
> +////                    true);
> 
>          gg_assign(error_flag, gg_bitwise_or(error_flag, erf));
>          IF(error_flag, ne_op, integer_zero_node)
> @@ -5885,10 +5885,10 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
>              }
>            // There was an error during the move.  Set the exception
> status
>            // information:
> -          gg_call(  VOID,
> -                    "__gg__process_compute_error",
> -                    build_int_cst_type(INT, compute_error_truncate),
> -                    NULL_TREE);
> +////          gg_call(  VOID,
> +////                    "__gg__process_compute_error",
> +////                    build_int_cst_type(INT,
> compute_error_truncate),
> +////                    NULL_TREE);
>            // But because there is an ON ERROR clause, suppress
> DECLARATIVE
>            // processing
>            gg_assign(var_decl_exception_code, integer_zero_node);
> @@ -5935,13 +5935,13 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
>          TREEPLET tsource;
>          treeplet_fill_source(tsource, sourceref);
>          static bool check_for_error = true;
> -        move_helper(erf,
> -                    destref,
> -                    sourceref,
> -                    tsource,
> -                    rounded,
> -                    check_for_error,
> -                    false);
> +////        move_helper(erf,
> +////                    destref,
> +////                    sourceref,
> +////                    tsource,
> +////                    rounded,
> +////                    check_for_error,
> +////                    false);
>          gg_assign(error_flag, gg_bitwise_or(error_flag, erf));
>          IF(error_flag, ne_op, integer_zero_node)
>            {
> @@ -5952,10 +5952,10 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
>              TRACE1_INDENT
>              TRACE1_TEXT("Error during the move; calling 
> __gg__process_compute_error")
>              }
> -          gg_call(  VOID,
> -                    "__gg__process_compute_error",
> -                    build_int_cst_type(INT, compute_error_truncate),
> -                    NULL_TREE);
> +////          gg_call(  VOID,
> +////                    "__gg__process_compute_error",
> +////                    build_int_cst_type(INT,
> compute_error_truncate),
> +////                    NULL_TREE);
>            }
>          ELSE
>            {
> diff --git a/gcc/cobol/genmath.cc b/gcc/cobol/genmath.cc
> index 6eb87544ac0..8bfe4f9e944 100644
> --- a/gcc/cobol/genmath.cc
> +++ b/gcc/cobol/genmath.cc
> @@ -1481,6 +1481,7 @@ parser_op( struct cbl_refer_t cref,
>             struct cbl_refer_t bref,
>             struct cbl_label_t *compute_error_label)
>    {
> +return;
>    Analyze();
>    set_up_compute_error_label(compute_error_label);
> ~~~~~~~~~~~~~~~~~
> 
> you'll get gimple which is much simpler, but still exhibits the N-
> squared 
> middle-end behavior.  After the patch
> 
> 4,000 COMPUTE statements yields a 7.1M gimple file and takes  3.55
> seconds 
> to compile.
> 8,000 COMPUTE statements yields a 15M  gimple file and takes 14.95
> seconds.
> 
> 
> > 
> > > 
> > > Thanks so much for any insight into what might be happening here.
> > > 
> > >       _intermediate__stack501_503.0.0.data =
> > > &_stack501_data_517.0;
> > >       _intermediate__stack501_503.0.0.capacity = 16;
> > >       _intermediate__stack501_503.0.0.allocated = 16;
> > >       _intermediate__stack501_503.0.0.offset = 0;
> > >       _intermediate__stack501_503.0.0.name = &"_stack501"[0];
> > >       _intermediate__stack501_503.0.0.picture = &""[0];
> > >       _intermediate__stack501_503.0.0.initial = 0B;
> > >       _intermediate__stack501_503.0.0.parent = 0B;
> > >       _intermediate__stack501_503.0.0.occurs_lower = 0;
> > >       _intermediate__stack501_503.0.0.occurs_upper = 0;
> > >       _intermediate__stack501_503.0.0.attr = 4160;
> > >       _intermediate__stack501_503.0.0.type = 6;
> > >       _intermediate__stack501_503.0.0.level = 0;
> > >       _intermediate__stack501_503.0.0.digits = 37;
> > >       _intermediate__stack501_503.0.0.rdigits = 0;
> > >       _intermediate__stack501_503.0.0.encoding = 1;
> > >       _intermediate__stack501_503.0.0.alphabet = 0;
> > >       D.2812 = 0;
> > >       D.2813 = 0;
> > >       _1013 = D.2812 & 18;
> > >       if (_1013 != 0) goto <D.9353>; else goto <D.9354>;
> > >       <D.9353>:
> > >       goto <D.9355>;
> > >       <D.9354>:
> > >       D.2814 = 0;
> > >       ..pa_erf.1519_1014 = ..pa_erf;
> > >       D.2813 = D.2813 | ..pa_erf.1519_1014;
> > >       if (D.2813 != 0) goto <D.9357>; else goto <D.9358>;
> > >       <D.9357>:
> > >       goto <D.9359>;
> > >       <D.9358>:
> > >       <D.9359>:
> > >       <D.9355>:
> > > 
> > > 
> 

Reply via email to