[Bug c++/94485] Inter-dependency between { tree-sra, ABI, inlining, loop-unrolling } leads to mis-optimization

dimitri.gorokhovik at free dot fr via Gcc-bugs Thu, 24 Sep 2020 13:26:08 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94485


--- Comment #8 from Dimitri Gorokhovik <dimitri.gorokhovik at free dot fr> ---
I was able to reduce the same code (see the attached file bug-6.cpp).

-- when compiled correctly, running it produces the following (expected)
output:

cube: ({ 0, 0, 0 }, { 1, 1, 1 }) 
cube: ({ 0, 0, 1 }, { 1, 1, 2 }) 
cube: ({ 0, 0, 2 }, { 1, 1, 3 }) 
cube: ({ 0, 1, 0 }, { 1, 2, 1 }) 
cube: ({ 0, 1, 1 }, { 1, 2, 2 }) 
cube: ({ 0, 1, 2 }, { 1, 2, 3 }) 
cube: ({ 0, 2, 0 }, { 1, 3, 1 }) 
cube: ({ 0, 2, 1 }, { 1, 3, 2 }) 
cube: ({ 0, 2, 2 }, { 1, 3, 3 }) 
cube: ({ 1, 0, 0 }, { 2, 1, 1 }) 
cube: ({ 1, 0, 1 }, { 2, 1, 2 }) 
cube: ({ 1, 0, 2 }, { 2, 1, 3 }) 
cube: ({ 1, 1, 0 }, { 2, 2, 1 }) 
cube: ({ 1, 1, 1 }, { 2, 2, 2 }) 
cube: ({ 1, 1, 2 }, { 2, 2, 3 }) 
cube: ({ 1, 2, 0 }, { 2, 3, 1 }) 
cube: ({ 1, 2, 1 }, { 2, 3, 2 }) 
cube: ({ 1, 2, 2 }, { 2, 3, 3 }) 
cube: ({ 2, 0, 0 }, { 3, 1, 1 }) 
cube: ({ 2, 0, 1 }, { 3, 1, 2 }) 
cube: ({ 2, 0, 2 }, { 3, 1, 3 }) 
cube: ({ 2, 1, 0 }, { 3, 2, 1 }) 
cube: ({ 2, 1, 1 }, { 3, 2, 2 }) 
cube: ({ 2, 1, 2 }, { 3, 2, 3 }) 
cube: ({ 2, 2, 0 }, { 3, 3, 1 }) 
cube: ({ 2, 2, 1 }, { 3, 3, 2 }) 
cube: ({ 2, 2, 2 }, { 3, 3, 3 }) 
count = 27

-- when compiled incorrectly, it prints out:

count = 0

Tested with build g++ (GCC) 11.0.0 20200924 (experimental).


In order to compile and run:

g++ -std=c++17 -O3 -o bug-6 bug-6.cpp && ./bug-6

This builds for implicit '-m64' (x86_64) and produces invalid output. 

To get valid output, compile with either of the following:
-m32
-O0 (instead of -O3)
-fno-tree-sra
one of: -DFIX_0, -DFIX_1, -DFIX_2, -DFIX_3, -DFIX_4 


From my limited understanding of the tree dumps, here is roughly what happens:

-- the routine 'begin()', line 183, returns 'struct iterator' by value. The
latter is 14 bytes in size, so it is returned "in registers". Forcing it to be
returned via memory ==> issue goes away. (Methods to force: make it bigger than
16 bytes, make some fields volatile, use -m32). Note also that, when the routine
is evaluated as constexpr (in static_assert), the issue is not reproduced.
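
To illustrate the size threshold mentioned above, here is a minimal sketch.
The two structs are hypothetical stand-ins (the real 'struct iterator' lives in
the attached bug-6.cpp and is not reproduced here); the field names and types
are assumptions chosen only to hit the sizes discussed. Under the x86-64
System V ABI, a trivially copyable aggregate of at most 16 bytes can be
returned in registers, while a larger one is returned through memory.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical layout, NOT the struct from bug-6.cpp: seven 16-bit fields.
struct small_iterator {
    std::uint16_t idx_[3];   // current position (assumed)
    std::uint16_t end_[3];   // exclusive upper bounds (assumed)
    std::uint16_t extra_;    // filler to reach 14 bytes
};

// Same fields plus padding, so the object no longer fits in 16 bytes.
struct large_iterator {
    std::uint16_t idx_[3];
    std::uint16_t end_[3];
    std::uint16_t extra_;
    std::uint16_t pad_[2];   // pushes the size to 18 bytes
};

static_assert(sizeof(small_iterator) == 14,
              "small enough to be a candidate for return in registers");
static_assert(sizeof(large_iterator) > 16,
              "too large for register return; goes via memory");
static_assert(std::is_trivially_copyable<small_iterator>::value,
              "register return also requires a trivially copyable type");
```

Growing the struct past 16 bytes in this way corresponds to the first of the
"methods to force" listed above.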

-- pretty much all called routines are inlined into a single caller,
'count_them ()'. Prevent the inlining of the routine 'can_be_incremented ()'
==> issue goes away. (Methods to prevent: define FIX_1.)
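
As a sketch of how a macro such as FIX_1 could suppress inlining: the macro
name MAYBE_NOINLINE, the function name and its signature below are all
assumptions for illustration. How FIX_1 is actually wired up in bug-6.cpp is
not shown in this comment. The GNU 'noinline' attribute itself is a real GCC
feature.

```cpp
// Hypothetical helper macro; bug-6.cpp may implement FIX_1 differently.
#if defined(FIX_1) && defined(__GNUC__)
#  define MAYBE_NOINLINE __attribute__((noinline))
#else
#  define MAYBE_NOINLINE
#endif

// Stand-in for 'can_be_incremented ()': true while at least one of the
// three indices can still advance below 'limit'.
MAYBE_NOINLINE constexpr bool
can_be_incremented_sketch(int i, int j, int k, int limit)
{
    const int idx[3] = {i, j, k};
    for (int d = 0; d < 3; ++d)
        if (idx[d] + 1 < limit)
            return true;
    return false;
}
```

Building with -DFIX_1 would then keep this routine as an out-of-line call
instead of folding it into its caller.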

-- SRA replaces several fields of the 'struct iterator' (line 150), most
importantly 'idx_' (line 153). Disable SRA ==> issue goes away (-fno-tree-sra
or  -O0). 

This replacement by tree-SRA somehow doesn't propagate the writes to the
replacement vars of idx_ to the original parts of the structure living "in the
return registers".  When the return value lives in memory, the writes are
propagated correctly.

The compiler then eliminates the loop in 'can_be_incremented' and evaluates the
call to that routine to 'false' (line 163). Forcibly keeping the loop (-DFIX_2)
or replacing it with non-loop code (-DFIX_0) ==> issue goes away.
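
For readers without the attachment, the shape of the code discussed above can
be sketched roughly as follows. This is a hypothetical reconstruction, not
bug-6.cpp: all names carrying the _sketch suffix, the field types and the
3x3x3 bounds are assumptions inferred from the expected output, and it is not
claimed to trigger the mis-optimization. Everything is constexpr so that the
expected count can be checked at compile time, the evaluation mode in which
the issue was not reproduced.

```cpp
// Hypothetical stand-in for 'struct iterator'; the layout is assumed.
struct iterator_sketch {
    unsigned short idx_[3];   // current cube origin
    unsigned short end_[3];   // exclusive upper bound per dimension

    // Loop-based check, analogous in spirit to 'can_be_incremented ()'.
    constexpr bool can_be_incremented() const {
        for (int d = 0; d < 3; ++d)
            if (idx_[d] + 1 < end_[d])
                return true;
        return false;
    }

    // Odometer-style step: the last index varies fastest, matching the
    // order of the expected output.
    constexpr void increment() {
        for (int d = 2; d >= 0; --d) {
            if (idx_[d] + 1 < end_[d]) { ++idx_[d]; return; }
            idx_[d] = 0;
        }
    }
};

// Returns the iterator by value, as 'begin()' does in the report.
constexpr iterator_sketch begin_sketch() {
    return iterator_sketch{{0, 0, 0}, {3, 3, 3}};
}

// Visits every cube origin once and returns how many were seen.
constexpr int count_them_sketch() {
    iterator_sketch it = begin_sketch();
    int count = 1;                      // the starting cube
    while (it.can_be_incremented()) {
        it.increment();
        ++count;
    }
    return count;
}

static_assert(count_them_sketch() == 27, "3 * 3 * 3 cubes expected");
```

If the loop in can_be_incremented() were wrongly folded to 'false', the
traversal would stop before visiting the remaining cubes, which is consistent
with the truncated output reported above.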
