https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

Vineet Gupta <vineetg at rivosinc dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vineetg at rivosinc dot com

--- Comment #1 from Vineet Gupta <vineetg at rivosinc dot com> ---
I'm not familiar with the actual loop distribution algorithm, but I debugged
and found the point of divergence.

loop_distribution::execute() iterates over loops_list (cfun, LI_ONLY_INNERMOST).
The copy loop 7 is processed in both builds, but prepare_perfect_loop_nest()
returns a different value in each.

For the src with a single copy loop, it deduces "perfect nesting" and returns
outer loop 3. This essentially skips any further distribution of loop 7.

For the multi-loop src build, prepare_perfect_loop_nest() exits early because
the outer->inner == loop check fails (outer loop 3 has inner pointing to
scaling loop 10, the last loop inside it, not loop 7, which is the first). This
causes the later logic to eventually distribute loop 7 into 0 loops and a
memcpy.

I'm not sure if this is a bug or intended, hence this report.
