RE: COBOL: Hoping for insight with middle-end computation time.

Robert Dubner Sat, 21 Mar 2026 12:16:15 -0700

> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Saturday, March 21, 2026 06:42
> To: Jakub Jelinek <[email protected]>
> Cc: Robert Dubner <[email protected]>; David Malcolm 
> <[email protected]>; H.
> J. Lu <[email protected]>; [email protected]; James K. Lowden
> <[email protected]>
> Subject: Re: COBOL: Hoping for insight with middle-end computation time.
>
> On Sat, Mar 21, 2026 at 10:25 AM Jakub Jelinek <[email protected]> wrote:
> >
> > On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote:
> > > 10,000 repeats of that code in the C++ program compiles in 1.36 seconds.
> > > 20,000 repeats                                             3.18 seconds.
> > > 40,000 repeats                                             7.92 seconds.
> > >
> > > 10,000 repeats in the COBOL program                       16.76 seconds.
> > > 20,000 repeats in the COBOL program                       97.40 seconds.
> > > 40,000 repeats in the COBOL program                      551.56 seconds.
> >
> > Perhaps also look at -fdump-tree-ssa-vops dump differences too, that will
> > make it clearer if there aren't differences in what is TREE_ADDRESSABLE and
> > what is not, or what is a global var and what could have been rewritten into
> > SSA form.
>
> Looking at -gimple there's a single scope block with try/finally and all
> clobbers in the finally block.  I suppose given that cobol emits a single
> function only we could elide the end-of-function clobbers for it.
>
> The D.nnn are registers, not memory AFAICS.
>
> What's a bit odd is that there seems to be global variables
> called __gg__treeplet_{1,2,3}{f,o,s} where we store addresses
> of aaa, etc, into:
>
>       __gg__treeplet_1f.58_103 = __gg__treeplet_1f;
>       _104 = 0;
>       _105 = __gg__treeplet_1f.58_103 + _104;
>       *_105 = &aaa.73.0;
>
> and the actual computation happens in
>
>       __gg__add_fixed_phase1 (2, 2, 0, 0, __gg__arithmetic_rounds.64_121, 0, &D.321);
>
> I assume, which seems to indicate that arguments get passed via global
> variables rather than formal arguments.  That might in practice help code
> generation at -O0, though.
>
> I suspect it's simply very many live variables (the D.nnnn) that need stack
> space and make the x86 stack var analysis code slow.
>
> I do wonder about
>
>       _intermediate__stack327_329.0.0.data = &_stack327_data_330.0;
>       _intermediate__stack327_329.0.0.capacity = 16;
>       _intermediate__stack327_329.0.0.allocated = 16;
>       _intermediate__stack327_329.0.0.offset = 0;
>       _intermediate__stack327_329.0.0.name = &"_stack327"[0];
>       _intermediate__stack327_329.0.0.picture = &""[0];
>       _intermediate__stack327_329.0.0.initial = 0B;
>       _intermediate__stack327_329.0.0.parent = 0B;
>       _intermediate__stack327_329.0.0.occurs_lower = 0;
>       _intermediate__stack327_329.0.0.occurs_upper = 0;
>       _intermediate__stack327_329.0.0.attr = 4160;
>       _intermediate__stack327_329.0.0.type = 6;
>       _intermediate__stack327_329.0.0.level = 0;
>       _intermediate__stack327_329.0.0.digits = 37;
>       _intermediate__stack327_329.0.0.rdigits = 0;
>       _intermediate__stack327_329.0.0.encoding = 1;
>       _intermediate__stack327_329.0.0.alphabet = 0;
>       __gg__initialize_variable_clean (&_intermediate__stack327_329.0.0, 32768);
>
> so why do we have inline initialization of the variable but also a call to
> apparently do something similar?  That seems to be an abstraction that is at
> least oddly designed.
>
> With all of the above it would help if those temporary variables like
> _intermediate__stack327_329.0.0 or the D.nnn were wrapped
> in their own scope block so as to limit their lifetime, given that at least
> the stack ones are address-taken.  So in C terms, have
>
>   {
>     compute ddd = aaa + bbb
>   }
>   {
>     compute ddd = aaa + bbb
>   }
> ...
>
> so the frontend emits those temporaries into their own scope wrapping
> each compute to limit their lifetime.  Scopes in GENERIC are BLOCKs
> and in the IL you'd have BIND_EXPRs.
>
> Richard.
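
For anyone following along, the per-statement scoping suggested above can be
sketched in plain C.  This is only an illustration under assumptions:
add_fixed is a hypothetical stand-in for a library routine, not the real
libgcobol interface, and the actual change would be made in the front end's
GENERIC (BLOCKs and BIND_EXPRs), not in C source.

```c
#include <assert.h>

/* Hypothetical stand-in for a library add routine that takes its
   operands and its result by address (assumption, not the real API). */
static void add_fixed(const long *a, const long *b, long *result)
{
    assert(a && b && result);
    *result = *a + *b;
}

/* Two consecutive "computes".  Each address-taken temporary lives in
   its own block, so its lifetime ends at the closing brace. */
long two_computes(long aaa, long bbb)
{
    long ddd = 0;
    {
        long tmp;                      /* temporary for the first compute */
        add_fixed(&aaa, &bbb, &tmp);
        ddd = tmp;
    }
    {
        long tmp;                      /* distinct variable, disjoint lifetime */
        add_fixed(&ddd, &bbb, &tmp);
        ddd = tmp;
    }
    return ddd;
}
```

Because the two temporaries have disjoint lifetimes, a compiler is free to
give them the same stack slot, which is the property the scope blocks are
meant to expose.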

The reason for __gg__initialize_variable_clean is twofold.  1) COBOL 
variables can have a hierarchical structure, where the storage for multiple 
variables exists in a single memory block.  The initialize function 
establishes the data area for each variable by adding its offset to the 
base.  (This could be done in GENERIC, and maybe I will.)  But 2) COBOL has 
the INITIALIZE statement, which can set the data area to a number of 
different possibilities, including the original value.  I have chosen to do 
that in the initialize routine.  The initialize routine is not needed for 
this kind of intermediate variable, and that call is due for elimination.
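
As a rough illustration of point 1), here is a minimal C sketch of 
establishing a variable's data area from a shared block.  The struct and 
function names are hypothetical and heavily simplified; the real descriptor 
has many more fields, as the dump above shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified variable descriptor (assumption). */
struct field {
    unsigned char *data;   /* resolved when the variable is initialized */
    size_t offset;         /* position of this variable within the block */
    size_t capacity;       /* number of bytes the variable occupies */
};

/* Point the descriptor at its slice of the shared memory block. */
void establish_data(struct field *f, unsigned char *base)
{
    assert(f && base);
    f->data = base + f->offset;
}
```

With a 16-byte block and a descriptor whose offset is 4, establish_data 
leaves data pointing at the fifth byte of the block.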

As for the treeplets... Yes, those are global variables.  COBOL can 
be...annoying.  For example, this is a valid statement

    ADD A B C D TO E GIVING X Y Z

So, I need a way of passing all of those variables to the library routine 
that actually adds A through E and assigns the result to X through Z.  When 
you keep in mind that any or all of those variables can be elements from 
subscripted arrays, and that various COBOL statements allow for unbounded 
numbers of parameters, I ended up streamlining my problem by creating 
globals for passing variables.
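
To make the two calling styles concrete, here is a small C sketch.  It is 
not the libgcobol code: g_operands, stage_operand, sum_staged, and sum_args 
are hypothetical names, the fixed MAX_OPERANDS bound is an assumption made 
to keep the example short, and plain long values stand in for COBOL fields.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPERANDS 8   /* illustrative bound only (assumption) */

/* Style 1: the caller stages operand addresses in a file-scope array,
   then calls a routine that takes only a count. */
static long *g_operands[MAX_OPERANDS];

void stage_operand(size_t i, long *p)
{
    assert(i < MAX_OPERANDS && p);
    g_operands[i] = p;
}

long sum_staged(size_t n)
{
    long total = 0;
    assert(n <= MAX_OPERANDS);
    for (size_t i = 0; i < n; i++)
        total += *g_operands[i];
    return total;
}

/* Style 2: the same operand list passed as formal arguments. */
long sum_args(long *const *ops, size_t n)
{
    long total = 0;
    assert(ops);
    for (size_t i = 0; i < n; i++)
        total += *ops[i];
    return total;
}
```

Both routines compute the same sum.  The difference is only in how the 
operand addresses reach the callee: through stores to a global before the 
call, or through a pointer passed with the call.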

I am beginning to sense that's somehow not a good idea?

>
> >
> >         Jakub
> >