https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
Vineet Gupta <vineetg at rivosinc dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vineetg at rivosinc dot com

--- Comment #1 from Vineet Gupta <vineetg at rivosinc dot com> ---
I'm not familiar with the actual loop distribution algorithm, but I debugged
both builds and found the point of divergence.

loop_distribution::execute() iterates over loops_list (cfun, LI_ONLY_INNERMOST).
The copy loop 7 is processed in both builds, but prepare_perfect_loop_nest()
returns different values.

For the build with a single copy loop in the source, it deduces "perfect
nesting" and returns the outer loop 3. This essentially skips any further
distribution of loop 7.

For the multi-loop source build, prepare_perfect_loop_nest() exits early
because the outer->inner == loop check fails (outer loop 3 has inner pointing
to scaling loop 10, the last loop inside it, not to 7, which is the first).
The subsequent logic then eventually distributes loop 7 into 0 loops and a
memcpy.

I'm not sure if this is a bug or intended, hence this report.
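To make the divergence easier to follow, here is a minimal toy model of the check described above. It is a sketch under stated assumptions, not GCC's code: struct toy_loop and every toy_* function are hypothetical, the child list is assumed to be built by pushing each new loop at the front (which is what would leave inner pointing at the last loop added), and the extra next == NULL test is the toy's own way of expressing "only child". Only the loop numbers 3, 7 and 10 and the outer->inner == loop comparison come from the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy loop-tree node. The field names follow the ones mentioned in the
   comment, but this is NOT GCC's actual loop class. */
struct toy_loop {
    int num;
    struct toy_loop *outer;  /* enclosing loop, NULL at the top */
    struct toy_loop *inner;  /* head of the list of nested loops */
    struct toy_loop *next;   /* next sibling in the parent's list */
};

/* Assumption: a new child is pushed at the front of the list, so
   parent->inner ends up pointing at the child that was added last. */
static void toy_add_child(struct toy_loop *parent, struct toy_loop *child)
{
    assert(parent != NULL && child != NULL);
    child->outer = parent;
    child->next = parent->inner;
    parent->inner = child;
}

/* Simplified stand-in for the nesting test: the loop must be the head of
   its parent's child list and have no sibling after it. */
static bool toy_is_sole_child(const struct toy_loop *loop)
{
    const struct toy_loop *outer = loop->outer;
    return outer != NULL && outer->inner == loop && loop->next == NULL;
}

/* Returns the number of the loop the toy would treat as the nest root:
   the outer loop when the nesting test passes, otherwise the loop itself. */
static int toy_nest_root(const struct toy_loop *loop)
{
    return toy_is_sole_child(loop) ? loop->outer->num : loop->num;
}

/* Single-copy build: loop 3 contains only the copy loop 7. */
static int toy_single_copy_case(void)
{
    struct toy_loop l3 = {3, NULL, NULL, NULL};
    struct toy_loop l7 = {7, NULL, NULL, NULL};
    toy_add_child(&l3, &l7);
    return toy_nest_root(&l7);
}

/* Multi-loop build: loop 3 contains copy loop 7, then scaling loop 10. */
static int toy_multi_loop_case(void)
{
    struct toy_loop l3 = {3, NULL, NULL, NULL};
    struct toy_loop l7 = {7, NULL, NULL, NULL};
    struct toy_loop l10 = {10, NULL, NULL, NULL};
    toy_add_child(&l3, &l7);
    toy_add_child(&l3, &l10);  /* l3.inner now points at loop 10 */
    return toy_nest_root(&l7);
}
```

In this model toy_single_copy_case() yields 3 (the outer loop is taken as the nest root, so loop 7 is not considered on its own), while toy_multi_loop_case() yields 7 (the test fails and loop 7 stays a candidate by itself), which mirrors the two behaviours seen in the debugger.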