On Fri, 2026-03-20 at 10:46 -0500, Robert Dubner wrote:
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Friday, March 20, 2026 03:24
> > To: Robert Dubner <[email protected]>
> > Cc: [email protected]
> > Subject: Re: COBOL: Hoping for insight with middle-end computation
> > time.
> >
> > On Fri, Mar 20, 2026 at 12:23 AM Robert Dubner <[email protected]>
> > wrote:
> > >
> > > It happens that COBOL has the COMPUTE statement. It takes the
> > > form of,
> > > for example, COMPUTE DDD = AAA + BBB.
> > >
> > > We implement that by creating a temporary variable, using that as
> > > the
> > > target of an addition of AAA and BBB, and then doing an
> > > assignment to
> > > DDD.
> > > (Recall that COBOL variables can be quite complex, so we are a
> > > long way
> > > from being able to do this with an ADD_EXPR.)
> > >
> > > We have determined that the way I've been producing GENERIC for
> > > that
> > > results in N-squared computation time somewhere in the middle
> > > end.
> > >
> > > I have been tearing my hair out trying to figure out what's
> > > causing that
> > > N-squared behavior. I commented away the assignment, and I got
> > > rid of
> > > the
> > > arithmetic. All that's left is the creation of the temporary,
> > > and some
> > > IF
> > > statements that are generated to test for errors along the way of
> > > the
> > > computation. (COBOL has a very rich error-detection and
> > > exception-generating facility.
> > >
> > > The remaining GIMPLE for a single iteration (as shown by
> > > -fdump-tree-gimple) is shown below. The "phase opt and generate"
> > > times
> > > for repetitions of that GIMPLE are shown here:
> > >
> > > phase opt Factor
> > > Repeats & generate
> > > 1,000 0.17
> > > 2,000 0.49 2.9
> > > 4,000 1.55 3.2
> > > 8,000 7.56 4.9
> > > 16,000 49.31 6.5
> > > 32,000 281.29 5.7
> > >
> > > I have been struggling with this for days. Is there an
> > > explanation for
> > > why the following GIMPLE is resulting in that N-squared behavior?
> > >
> > > I agree that the conditionals are a bit convoluted, and Jim and I
> > > are
> > > working to do COMPUTE in a much improved way. But I still feel a
> > > need
> > > to
> > > understand what's going on!
> >
> > Can you share a COBOL testcase that shows what you observe above?
> > Maybe two of them, so I can see how one does the "repeats"?
> >
> > It's always a bug in the middle-end unless the size of the initial
> > GIMPLE
> > code also grows squared.
>
> The program is simple enough:
> ~~~~~~~~~~~~~~~~~
> program-id. prog.
> data division.
> working-storage section.
> 01 aaa pic 999.
> 01 bbb pic 999.
> 01 ddd pic 999.
> procedure division.
> compute ddd = aaa + bbb
> *> Repeat as necessary
> compute ddd = aaa + bbb
> goback.
> ~~~~~~~~~~~~~~~~~
>
>
>
> With 500 repeats of "compute ddd = aaa + bbb", -fdump-tree-gimple
> produced a
> file of 2.4Mbytes bytes. 1,000 repeats gives a file of 4.9Mbytes
>
> The respective -ftime-reports start off with
>
> Time variable wall GGC
> phase parsing : 0.05 ( 3%) 4838k ( 8%)
> phase opt and generate : 1.70 ( 97%) 53M ( 92%)
> ...
> thread pro- & epilogue : 0.91 ( 52%) 4120 ( 0%)
>
> and
>
> Time variable wall GGC
> phase parsing : 0.07 ( 1%) 9523k ( 8%)
> phase opt and generate : 6.25 ( 99%) 107M ( 92%)
> ...
> thread pro- & epilogue : 4.38 ( 71%) 4120 ( 0%)
>
> Doubling it again to 2,000 COMPUTE DDD = AAA + BBB produces a
> 9.8Mbyte
> gimple file, with these timings
>
> Time variable wall GGC
> phase parsing : 0.14 ( 0%) 18M ( 8%)
> phase opt and generate : 31.55 ( 99%) 213M ( 92%)
> ...
> thread pro- & epilogue : 27.55 ( 87%) 4120 ( 0%)
>
>
> That's the full case that got me started investigating.
FWIW I tried this (with about 900 repeats of the "compute" line):
time perf record ./gcobol -B. -S test.cob -ftime-report
Time variable wall GGC
phase setup : 0.01 ( 0%) 141k ( 0%)
phase parsing : 0.40 ( 1%) 8268k ( 8%)
phase opt and generate : 59.42 ( 99%) 91M ( 92%)
garbage collection : 0.30 ( 1%) 0 ( 0%)
callgraph construction : 0.24 ( 0%) 4854k ( 5%)
callgraph optimization : 0.11 ( 0%) 0 ( 0%)
callgraph ipa passes : 8.02 ( 13%) 12M ( 13%)
[..snip...]
tree STMT verifier : 6.01 ( 10%) 0 ( 0%)
[..snip...]
thread pro- & epilogue : 32.71 ( 55%) 4120 ( 0%)
[..snip...]
TOTAL : 59.83 99M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
[ perf record: Woken up 33 times to write data ]
[ perf record: Captured and wrote 9.123 MB perf.data (239039 samples) ]
real 1m0.256s
user 0m59.018s
sys 0m0.502s
and the top 10 places reported by "perf report" are:
30.49% cobol1 cobol1 [.] ix86_access_stack_p
9.91% cobol1 cobol1 [.] TEST_HARD_REG_BIT
6.89% cobol1 cobol1 [.] requires_stack_frame_p
3.32% cobol1 cobol1 [.] dominated_by_p
1.55% cobol1 cobol1 [.] mark_used_flags
1.33% cobol1 cobol1 [.] gimple_code
0.98% cobol1 cobol1 [.]
is_a_helper<rtx_insn*>::test<rtx_def>
0.93% cobol1 cobol1 [.] contains_struct_check
0.79% cobol1 cobol1 [.] verify_rtx_sharing
0.70% cobol1 cobol1 [.] BLOCK_FOR_INSN
i.e. I saw 30% of the time spent in an x86-specific function, albeit
with a debug build of the compiler.
Dave
>
> I think you'll find that the gimple from the full compilation is more
> complicated than necessary.
>
> If you apply this patch, you'll find it carves away a lot of generic
> that is
> irrelevant to this discussion:
> ~~~~~~~~~~~~~~~~~~~~
> diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
> index 4f71f9b1152..5f43e759b9f 100644
> --- a/gcc/cobol/genapi.cc
> +++ b/gcc/cobol/genapi.cc
> @@ -5867,13 +5867,13 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
> TREEPLET tsource;
> treeplet_fill_source(tsource, sourceref);
> static bool check_for_error = true;
> - move_helper(erf,
> - destref,
> - sourceref,
> - tsource,
> - rounded,
> - check_for_error,
> - true);
> +//// move_helper(erf,
> +//// destref,
> +//// sourceref,
> +//// tsource,
> +//// rounded,
> +//// check_for_error,
> +//// true);
>
> gg_assign(error_flag, gg_bitwise_or(error_flag, erf));
> IF(error_flag, ne_op, integer_zero_node)
> @@ -5885,10 +5885,10 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
> }
> // There was an error during the move. Set the exception
> status
> // information:
> - gg_call( VOID,
> - "__gg__process_compute_error",
> - build_int_cst_type(INT, compute_error_truncate),
> - NULL_TREE);
> +//// gg_call( VOID,
> +//// "__gg__process_compute_error",
> +//// build_int_cst_type(INT,
> compute_error_truncate),
> +//// NULL_TREE);
> // But because there is an ON ERROR clause, suppress
> DECLARATIVE
> // processing
> gg_assign(var_decl_exception_code, integer_zero_node);
> @@ -5935,13 +5935,13 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
> TREEPLET tsource;
> treeplet_fill_source(tsource, sourceref);
> static bool check_for_error = true;
> - move_helper(erf,
> - destref,
> - sourceref,
> - tsource,
> - rounded,
> - check_for_error,
> - false);
> +//// move_helper(erf,
> +//// destref,
> +//// sourceref,
> +//// tsource,
> +//// rounded,
> +//// check_for_error,
> +//// false);
> gg_assign(error_flag, gg_bitwise_or(error_flag, erf));
> IF(error_flag, ne_op, integer_zero_node)
> {
> @@ -5952,10 +5952,10 @@ parser_assign( size_t nC, cbl_num_result_t
> *C,
> TRACE1_INDENT
> TRACE1_TEXT("Error during the move; calling
> __gg__process_compute_error")
> }
> - gg_call( VOID,
> - "__gg__process_compute_error",
> - build_int_cst_type(INT, compute_error_truncate),
> - NULL_TREE);
> +//// gg_call( VOID,
> +//// "__gg__process_compute_error",
> +//// build_int_cst_type(INT,
> compute_error_truncate),
> +//// NULL_TREE);
> }
> ELSE
> {
> diff --git a/gcc/cobol/genmath.cc b/gcc/cobol/genmath.cc
> index 6eb87544ac0..8bfe4f9e944 100644
> --- a/gcc/cobol/genmath.cc
> +++ b/gcc/cobol/genmath.cc
> @@ -1481,6 +1481,7 @@ parser_op( struct cbl_refer_t cref,
> struct cbl_refer_t bref,
> struct cbl_label_t *compute_error_label)
> {
> +return;
> Analyze();
> set_up_compute_error_label(compute_error_label);
> ~~~~~~~~~~~~~~~~~
>
> you'll get gimple which is much simpler, but still exhibits the N-
> squared
> middle-end behavior. After the patch
>
> 4,000 COMPUTE statements yields a 7.1M gimple file and takes 3.55
> seconds
> to compile.
> 8,000 COMPUTE statements yields a 15M gimple file and takes 14.95
> seconds.
>
>
> >
> > >
> > > Thanks so much for any insight into what might be happening here.
> > >
> > > _intermediate__stack501_503.0.0.data =
> > > &_stack501_data_517.0;
> > > _intermediate__stack501_503.0.0.capacity = 16;
> > > _intermediate__stack501_503.0.0.allocated = 16;
> > > _intermediate__stack501_503.0.0.offset = 0;
> > > _intermediate__stack501_503.0.0.name = &"_stack501"[0];
> > > _intermediate__stack501_503.0.0.picture = &""[0];
> > > _intermediate__stack501_503.0.0.initial = 0B;
> > > _intermediate__stack501_503.0.0.parent = 0B;
> > > _intermediate__stack501_503.0.0.occurs_lower = 0;
> > > _intermediate__stack501_503.0.0.occurs_upper = 0;
> > > _intermediate__stack501_503.0.0.attr = 4160;
> > > _intermediate__stack501_503.0.0.type = 6;
> > > _intermediate__stack501_503.0.0.level = 0;
> > > _intermediate__stack501_503.0.0.digits = 37;
> > > _intermediate__stack501_503.0.0.rdigits = 0;
> > > _intermediate__stack501_503.0.0.encoding = 1;
> > > _intermediate__stack501_503.0.0.alphabet = 0;
> > > D.2812 = 0;
> > > D.2813 = 0;
> > > _1013 = D.2812 & 18;
> > > if (_1013 != 0) goto <D.9353>; else goto <D.9354>;
> > > <D.9353>:
> > > goto <D.9355>;
> > > <D.9354>:
> > > D.2814 = 0;
> > > ..pa_erf.1519_1014 = ..pa_erf;
> > > D.2813 = D.2813 | ..pa_erf.1519_1014;
> > > if (D.2813 != 0) goto <D.9357>; else goto <D.9358>;
> > > <D.9357>:
> > > goto <D.9359>;
> > > <D.9358>:
> > > <D.9359>:
> > > <D.9355>:
> > >
> > >
>