Hi Honza,
> On 12 Dec 2025, at 3:01 am, Jan Hubicka <[email protected]> wrote: > > External email: Use caution opening links or attachments > > >> From 50bee4d9aa1bf14c9d6420af763fc309d3d99d8c Mon Sep 17 00:00:00 2001 >> From: Kugan Vivekanandarajah <[email protected]> >> Date: Sun, 7 Dec 2025 13:45:39 -0800 >> Subject: [PATCH 4/4] [Autofdo][V2] Add hierarchical discriminator for loop >> unrolling >> >> Add hierarchical discriminator support for loop unrolling. >> Assigns multiplicity and copyid discriminators to distinguish unrolled >> iterations. >> >> gcc/ChangeLog: >> >> * cfgloopmanip.cc (duplicate_loop_body_to_header_edge): Assign >> hierarchical discriminators for loop unrolling. >> * cfgloopmanip.h (DLTHE_RECORD_HIERARCHICAL_DISCRIMINATOR): New flag. >> * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Pass flag >> to enable hierarchical discriminator assignment. >> (try_peel_loop): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/hierarchical-discriminator-unroll.c: New test. >> >> + /* Assign hierarchical discriminators to distinguish loop iterations. >> */ >> + if (flags & DLTHE_RECORD_HIERARCHICAL_DISCRIMINATOR) >> + { >> + /* Only handle GIMPLE mode for now. */ >> + if (current_ir_type () == IR_GIMPLE) >> + { >> + /* For loop unrolling, each unrolled iteration is a distinct copy >> + of the code and should get a unique copyid: >> + - copyid = DISCRIMINATOR_LOOP_UNROLL_BASE + iteration_number >> + - multiplicity = preserved from vectorization (if present) >> + >> + The iteration number is 'j' (0 to ndupl-1). */ >> + >> + /* Calculate copyid for this iteration. */ >> + unsigned int copyid = DISCRIMINATOR_LOOP_UNROLL_BASE + j; >> + if (copyid > DISCR_COPYID_MAX) >> + copyid = DISCR_COPYID_MAX; > > This will make 1 unrolling to work well. But if there are two nested > loops and we first unroll the inner loop and then unroll the outer loop, > we will end up overwriting the copy_ids of statements that are > duplicated in both transformations. > > Since copy_id range is limited, I do not see how to get it right except > for having persistent hash which matches locations to highest used > copy_id which would live in struct_function or which would be populated > before passes doing code duplication. > > Can you think of alternatives? I think, for simplicity, we can go with > your solution first, since it is cheaper and hopefuly catches most of > code duplication issues, but this seems important enough so we may want > to solve it in long run. How about going with something like: copyid_base = allocate_copyid_base (loc, base_discriminator); This should have global context and allocate new copyid_base for each call without any of the defines we have now. This would still have issue for LTO where any copyid_base generated in LTRANS will have conflict unless we generate this in WPA and stream. What do you think? Thanks, Kugan > > Also we need to update copy_id in loop header copying, jump threading > and RTL loop unrolling. > Especially jump threading will likely have tendency to cascade.... > > Honza
