On Sat, Mar 21, 2026 at 8:15 PM Robert Dubner <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Saturday, March 21, 2026 06:42
> > To: Jakub Jelinek <[email protected]>
> > Cc: Robert Dubner <[email protected]>; David Malcolm <[email protected]>;
> > H. J. Lu <[email protected]>; [email protected]; James K. Lowden
> > <[email protected]>
> > Subject: Re: COBOL: Hoping for insight with middle-end computation time.
> >
> > On Sat, Mar 21, 2026 at 10:25 AM Jakub Jelinek <[email protected]> wrote:
> > >
> > > On Fri, Mar 20, 2026 at 10:17:50PM -0500, Robert Dubner wrote:
> > > > 10,000 repeats of that code in the C++ program compiles in 1.36 seconds.
> > > > 20,000 repeats 3.18 seconds.
> > > > 40,000 repeats 7.92 seconds.
> > > >
> > > > 10,000 repeats in the COBOL program 16.76 seconds.
> > > > 20,000 repeats in the COBOL program 97.40 seconds.
> > > > 10,000 repeats in the COBOL program 551.56 seconds.
> > >
> > > Perhaps also look at -fdump-tree-ssa-vops dump differences too, that will
> > > make it clearer if there aren't differences in what is TREE_ADDRESSABLE and
> > > what is not, or what is a global var and what could have been rewritten into
> > > SSA form.
> >
> > Looking at -gimple there's a single scope block with try/finally and all
> > clobbers in the finally block.  I suppose given that cobol emits a single
> > function only we could elide the end-of-function clobbers for it.
> >
> > The D.nnn are registers, not memory AFAICS.
> >
> > What's a bit odd is that there seem to be global variables
> > called __gg__treeplet_{1,2,3}{f,o,s} where we store addresses
> > of aaa, etc, into:
> >
> >   __gg__treeplet_1f.58_103 = __gg__treeplet_1f;
> >   _104 = 0;
> >   _105 = __gg__treeplet_1f.58_103 + _104;
> >   *_105 = &aaa.73.0;
> >
> > and the actual computation happens in
> >
> >   __gg__add_fixed_phase1 (2, 2, 0, 0,
> >                           __gg__arithmetic_rounds.64_121, 0, &D.321);
> >
> > which, I assume, indicates that arguments get passed via global variables
> > rather than formal arguments.  That might in practice help code
> > generation at -O0 though.
> >
> > I suspect it's simply very many live variables (the D.nnnn) that need
> > stack space and make the x86 stack var analysis code slow.
> >
> > I do wonder about
> >
> >   _intermediate__stack327_329.0.0.data = &_stack327_data_330.0;
> >   _intermediate__stack327_329.0.0.capacity = 16;
> >   _intermediate__stack327_329.0.0.allocated = 16;
> >   _intermediate__stack327_329.0.0.offset = 0;
> >   _intermediate__stack327_329.0.0.name = &"_stack327"[0];
> >   _intermediate__stack327_329.0.0.picture = &""[0];
> >   _intermediate__stack327_329.0.0.initial = 0B;
> >   _intermediate__stack327_329.0.0.parent = 0B;
> >   _intermediate__stack327_329.0.0.occurs_lower = 0;
> >   _intermediate__stack327_329.0.0.occurs_upper = 0;
> >   _intermediate__stack327_329.0.0.attr = 4160;
> >   _intermediate__stack327_329.0.0.type = 6;
> >   _intermediate__stack327_329.0.0.level = 0;
> >   _intermediate__stack327_329.0.0.digits = 37;
> >   _intermediate__stack327_329.0.0.rdigits = 0;
> >   _intermediate__stack327_329.0.0.encoding = 1;
> >   _intermediate__stack327_329.0.0.alphabet = 0;
> >   __gg__initialize_variable_clean (&_intermediate__stack327_329.0.0,
> >                                    32768);
> >
> > so why do we have inline initialization of the variable but also a call to
> > apparently do something similar?  That seems to be an abstraction that is
> > at least oddly designed.
> >
> > With all of the above it would help if those temporary variables like
> > _intermediate__stack327_329.0.0 or the D.nnn would be wrapped
> > in their own scope block so as to limit their lifetime, given at least
> > the stack ones are address-taken.  So in C terms, have
> >
> >   {
> >     compute ddd = aaa + bbb
> >   }
> >   {
> >     compute ddd = aaa + bbb
> >   }
> >   ...
> >
> > so the frontend emits those temporaries into their own scope wrapping
> > the computes to limit their lifetime.  Scopes in GENERIC are BLOCKs
> > and in the IL you'd have BIND_EXPRs.
> >
> > Richard.
>
> The reason for __gg__initialize_variable_clean is two-fold.  1) COBOL
> variables can have a hierarchical structure, where the storage for multiple
> variables exists in a single memory block.  The initialize function
> establishes the data area for each variable by adding the offset to the
> base.  (This could be done in GENERIC, and maybe I will.)  But 2) COBOL has
> the INITIALIZE statement, which can set the data area to a number of
> different possibilities, including the original value.  I have chosen to do
> that in the initialize routine.  The initialize routine is not needed for
> this kind of intermediate variable, and that call is due for elimination.

I see.

> As for the treeplets...  Yes, those are global variables.  COBOL can
> be...annoying.  For example, this is a valid statement
>
>     ADD A B C D TO E GIVING X Y Z
>
> So, I need a way of passing all of those variables to the library routine
> that actually adds A through E and assigns the result to X through Z.  When
> you keep in mind that any or all of those variables can be elements from
> subscripted arrays, and that various different COBOL statements allow for
> unbounded numbers of parameters, I ended up streamlining my problem by
> creating globals for passing variables.
>
> I am beginning to sense that's somehow not a good idea?

It looks weird at least (and it's going to be a pain when COBOL supports
threads).  But I can see why you did it.  The "obvious" choice might have
been to use variadic arguments, passing pointers to data which would then
eventually be on the stack.  It's not clear to me that this would be
beneficial though.  Another variant would be to "lower" such a statement
to a simpler

  ADD A TO E GIVING TEMPORARY
  ADD B TO TEMPORARY GIVING TEMPORARY
  ...
  ASSIGN TEMPORARY TO X
  ...

(just assuming that "ASSIGN" works ;)) that is, try to make the library
interface handle a subset of COBOL, a subset that is (maybe) easy to
lower to.

That said, I don't think what you do is wrong, it's a bit non-obvious
if you only read the GENERIC and have no idea how that interfacing is
designed, thus my questions ;)

The issue about local variable scopes remains and improving that will
at least improve stack usage.  Maybe it will also help the -O0 compile-time,
but I know there are algorithmic issues in the x86 backend and those have
to be fixed eventually.

Richard.

> > >
> > > Jakub
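
For illustration only, here is a minimal C sketch of the two alternatives
mentioned above for ADD A B C D TO E GIVING X Y Z: passing everything as
variadic arguments, and lowering to a chain of two-operand adds whose
temporary lives in its own scope block.  All names (field_t, add_giving,
add2_giving, add_giving_lowered) are hypothetical, and a plain long stands
in for a COBOL numeric field.  This is not the actual libgcobol interface,
and a real front end would emit the equivalent in GENERIC rather than C.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for a COBOL numeric field (assumption: the real
   runtime uses a much richer descriptor). */
typedef long field_t;

/* Variadic variant: n_src source pointers followed by n_dst destination
   pointers, all passed as ordinary arguments instead of via globals. */
static void add_giving(size_t n_src, size_t n_dst, ...)
{
    va_list ap;
    field_t sum = 0;

    assert(n_src >= 1 && n_dst >= 1);
    va_start(ap, n_dst);
    for (size_t i = 0; i < n_src; i++)
        sum += *va_arg(ap, field_t *);   /* read each source */
    for (size_t i = 0; i < n_dst; i++)
        *va_arg(ap, field_t *) = sum;    /* store into each destination */
    va_end(ap);
}

/* Two-operand primitive, the "subset" a library would need to handle. */
static void add2_giving(const field_t *a, const field_t *b, field_t *out)
{
    *out = *a + *b;
}

/* Lowered variant: a chain of two-operand adds through one temporary. */
static void add_giving_lowered(field_t *src[], size_t n_src,
                               field_t *dst[], size_t n_dst)
{
    assert(n_src >= 2 && n_dst >= 1);
    {
        /* The temporary is address-taken, so it is declared in its own
           block to keep its lifetime (and stack slot) short. */
        field_t tmp;

        add2_giving(src[0], src[1], &tmp);
        for (size_t i = 2; i < n_src; i++)
            add2_giving(src[i], &tmp, &tmp);
        for (size_t i = 0; i < n_dst; i++)
            *dst[i] = tmp;
    }
}
```

With five sources and three destinations, the variadic form would be
called as add_giving(5, 3, &a, &b, &c, &d, &e, &x, &y, &z), while the
lowered form takes arrays of pointers to the same fields.  Both should
store the same sum into every destination.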
