Hi Honza,

> On 12 Dec 2025, at 3:01 am, Jan Hubicka <[email protected]> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
>> From 50bee4d9aa1bf14c9d6420af763fc309d3d99d8c Mon Sep 17 00:00:00 2001
>> From: Kugan Vivekanandarajah <[email protected]>
>> Date: Sun, 7 Dec 2025 13:45:39 -0800
>> Subject: [PATCH 4/4] [Autofdo][V2] Add hierarchical discriminator for loop
>> unrolling
>> 
>> Add hierarchical discriminator support for loop unrolling.
>> Assigns multiplicity and copyid discriminators to distinguish unrolled
>> iterations.
>> 
>> gcc/ChangeLog:
>> 
>>      * cfgloopmanip.cc (duplicate_loop_body_to_header_edge): Assign
>>      hierarchical discriminators for loop unrolling.
>>      * cfgloopmanip.h (DLTHE_RECORD_HIERARCHICAL_DISCRIMINATOR): New flag.
>>      * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Pass flag
>>      to enable hierarchical discriminator assignment.
>>      (try_peel_loop): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      * gcc.dg/hierarchical-discriminator-unroll.c: New test.
>> 
>> +      /* Assign hierarchical discriminators to distinguish loop iterations. 
>>  */
>> +      if (flags & DLTHE_RECORD_HIERARCHICAL_DISCRIMINATOR)
>> +     {
>> +       /* Only handle GIMPLE mode for now.  */
>> +       if (current_ir_type () == IR_GIMPLE)
>> +         {
>> +           /* For loop unrolling, each unrolled iteration is a distinct copy
>> +              of the code and should get a unique copyid:
>> +              - copyid = DISCRIMINATOR_LOOP_UNROLL_BASE + iteration_number
>> +              - multiplicity = preserved from vectorization (if present)
>> +
>> +              The iteration number is 'j' (0 to ndupl-1).  */
>> +
>> +           /* Calculate copyid for this iteration.  */
>> +           unsigned int copyid = DISCRIMINATOR_LOOP_UNROLL_BASE + j;
>> +           if (copyid > DISCR_COPYID_MAX)
>> +             copyid = DISCR_COPYID_MAX;
> 
> This will make 1 unrolling to work well.  But if there are two nested
> loops and we first unroll the inner loop and then unroll the outer loop,
> we will end up overwriting the copy_ids of statements that are
> duplicated in both transformations.
> 
> Since copy_id range is limited, I do not see how to get it right except
> for having persistent hash which matches locations to highest used
> copy_id which would live in struct_function or which would be populated
> before passes doing code duplication.
> 
> Can you think of alternatives?  I think, for simplicity, we can go with
> your solution first, since it is cheaper and hopefuly catches most of
> code duplication issues, but this seems important enough so we may want
> to solve it in long run.
How about going with  something like:
copyid_base = allocate_copyid_base (loc, base_discriminator);
This should have global context and allocate new copyid_base for each call 
without any of the defines we have now.
This would still have issue for LTO where any copyid_base generated in LTRANS 
will have conflict unless we generate this
in WPA and stream. What do you think?

Thanks,
Kugan

> 
> Also we need to update copy_id in loop header copying, jump threading
> and RTL loop unrolling.
> Especially jump threading will likely have tendency to cascade....
> 
> Honza

Reply via email to