> From 50bee4d9aa1bf14c9d6420af763fc309d3d99d8c Mon Sep 17 00:00:00 2001
> From: Kugan Vivekanandarajah <[email protected]>
> Date: Sun, 7 Dec 2025 13:45:39 -0800
> Subject: [PATCH 4/4] [Autofdo][V2] Add hierarchical discriminator for loop
>  unrolling
> 
> Add hierarchical discriminator support for loop unrolling.
> Assigns multiplicity and copyid discriminators to distinguish unrolled
> iterations.
> 
> gcc/ChangeLog:
> 
>       * cfgloopmanip.cc (duplicate_loop_body_to_header_edge): Assign
>       hierarchical discriminators for loop unrolling.
>       * cfgloopmanip.h (DLTHE_RECORD_HIERARCHICAL_DISCRIMINATOR): New flag.
>       * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Pass flag
>       to enable hierarchical discriminator assignment.
>       (try_peel_loop): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/hierarchical-discriminator-unroll.c: New test.
> 
> +      /* Assign hierarchical discriminators to distinguish loop iterations.  
> */
> +      if (flags & DLTHE_RECORD_HIERARCHICAL_DISCRIMINATOR)
> +     {
> +       /* Only handle GIMPLE mode for now.  */
> +       if (current_ir_type () == IR_GIMPLE)
> +         {
> +           /* For loop unrolling, each unrolled iteration is a distinct copy
> +              of the code and should get a unique copyid:
> +              - copyid = DISCRIMINATOR_LOOP_UNROLL_BASE + iteration_number
> +              - multiplicity = preserved from vectorization (if present)
> +              
> +              The iteration number is 'j' (0 to ndupl-1).  */
> +
> +           /* Calculate copyid for this iteration.  */
> +           unsigned int copyid = DISCRIMINATOR_LOOP_UNROLL_BASE + j;
> +           if (copyid > DISCR_COPYID_MAX)
> +             copyid = DISCR_COPYID_MAX;

This will make 1 unrolling to work well.  But if there are two nested
loops and we first unroll the inner loop and then unroll the outer loop,
we will end up overwriting the copy_ids of statements that are
duplicated in both transformations.

Since copy_id range is limited, I do not see how to get it right except
for having persistent hash which matches locations to highest used
copy_id which would live in struct_function or which would be populated
before passes doing code duplication.

Can you think of alternatives?  I think, for simplicity, we can go with
your solution first, since it is cheaper and hopefuly catches most of
code duplication issues, but this seems important enough so we may want
to solve it in long run.

Also we need to update copy_id in loop header copying, jump threading
and RTL loop unrolling.
Especially jump threading will likely have tendency to cascade....

Honza

Reply via email to