Hi Aliaksei,
      to me it looks like a bug of GCC optimization. Basically, it is
assuming that the x variable is used but never read or its value is
never used. Also it assumes the same of the i variable, as we are only
accessing indirectly to the memory where it locates (the code is even
assuming that the variable exists, but it can be optimize out as in
this scenario). Even though, the original C code is valid C code, we
are not helping the compiler writing code like that. So I have
rewritten the code in a way that does not use indirect memory access
to the stack space.

One thing more that makes me think is a bug, if you use an int
constant as the index and not a parameter, the error does not occur
(the code is not badly optimized) and there is a warning about the
not-so-great access to the stack.

On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <alex.sy...@gmail.com> wrote:
>
> Hi Pablo,
>
> Wow! Thank you for the detective story :)
>
> Do I understand correctly that the original code causes undefined behavior 
> and therefore can be changed (or even removed) by the compiler?
> (because it returns something that is referencing memory on the stack)
>
> Please keep posting similar things in future! It is very educative :)
>
> Cheers,
> Alex
>
>
> On Wed, 11 Dec 2019 at 17:35, teso...@gmail.com <teso...@gmail.com> wrote:
>>
>> Hi,
>>     this mail is related to Pharo because it is knowledge I found
>> debugging the build of the VM, but the rest is to document it and
>> perhaps someone will found it interesting (also I couldn't find it
>> easily using Google). Sorry for the long mail!
>>
>> The problem
>> ==========
>>
>> The following code does not produce good code in 8.3 when using 
>> optimizations:
>>
>> long __attribute__ ((noinline)) myFunc(long i, int index){
>>    long v;
>>    long x = i >> 3;
>>
>>    v = x;
>>    return ((int*)(&v))[index];
>> }
>>
>> #include <stdio.h>
>>
>> int main(){
>>
>>     long i;
>>     int x;
>>
>>     scanf("%ld", &i);
>>     scanf("%d", &x);
>>
>>     printf("%ld",myFunc(i,x));
>> }
>>
>> Basically, with -02, it generates the following code:
>>
>> myFunc:
>>      movslq %esi, %rsi
>>      movslq -8(%rsp,%rsi,4), %rax
>>      ret
>>
>> And with -01 it generates the following code:
>>
>> myFunc:
>>      sarq $3, %rdi
>>      movq %rdi, -8(%rsp)
>>      movslq %esi, %rsi
>>      movslq -8(%rsp,%rsi,4), %rax
>>      ret
>>
>> As you can see, the more optimized version is losing the bit shift (or
>> any other operation).
>> To detect the problem it was not easy, basically, I have used the
>> Pharo Tests to detect the failing function and then to understand the
>> generation of code in GCC, we need to debug its generation.
>>
>> The first snippet is generated using:
>>
>> gcc -O2 prb.c -S -o prb.s -Wall
>>
>> and the second using:
>>
>> gcc -O1 prb.c -S -o prb.s -Wall
>>
>> The GCC pipeline has different steps, I have centered my self in the
>> tree optimizations.
>> GCC dumps each of the intermediate states of the compilation, working
>> with C like trees. For example to get the optimized version of the
>> program, before generating the Assembler we have to do:
>>
>> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout
>>
>> This will generate:
>>
>> __attribute__((noinline))
>> myFunc (long int i, int index)
>> {
>>   long int v;
>>   long unsigned int _1;
>>   long unsigned int _2;
>>   int * _3;
>>   int _4;
>>   long int _8;
>>
>>   <bb 2> [local count: 1073741825]:
>>   _1 = (long unsigned int) index_7(D);
>>   _2 = _1 * 4;
>>   _3 = &v + _2;
>>   _4 = *_3;
>>   _8 = (long int) _4;
>>   v ={v} {CLOBBER};
>>   return _8;
>>
>> }
>>
>> This is the optimized SSA (static single assign) version of the
>> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
>> As you can see in this version the code is already optimized, and our
>> operations not correctly generated. We expect to see something like:
>>
>> __attribute__((noinline))
>> myFunc (long int i, int index)
>> {
>>   long int x;
>>   long int v;
>>   long unsigned int _1;
>>   long unsigned int _2;
>>   int * _3;
>>   int _4;
>>   long int _10;
>>
>>   <bb 2> [local count: 1073741825]:
>>   x_6 = i_5(D) >> 3;
>>   v = x_6;
>>   _1 = (long unsigned int) index_9(D);
>>   _2 = _1 * 4;
>>   _3 = &v + _2;
>>   _4 = *_3;
>>   _10 = (long int) _4;
>>   v ={v} {CLOBBER};
>>   return _10;
>>
>> }
>>
>> In the first SSA version, we are lacking the shift operation:
>>
>>   x_6 = i_5(D) >> 3;
>>   v = x_6;
>>
>> So, we need to see in which of the optimizations and transformations
>> our code is lost:
>> To see all the steps we should execute:
>>
>> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all
>>
>> This will generate a lot, really a lot, of files in the same directory.
>> They have the name: prb.c.xxx.yyyy
>> Where xxx is the number of the step, and yyyy is the name of the step.
>> Each of the files contains the result of applying the changes, so what
>> I have done is looking in binary search where my code was lost.
>>
>> I have found that the problem was in the first dead code elimination
>> step (prb.c.041t.cddce1)
>> GCC does not like the crap code that we are writing, as we are copying
>> a stack variable and then accessing directly to the memory:
>>
>> v = x;
>> return ((int*)(&v))[index];
>>
>> So, basically, it was assuming that we were only using v and not x, so
>> all the code modifying x is described (this optimization is called
>> "dead store code deletion"). Basically tries to remove all the code
>> that is affecting stack (temporary) variables that are never used.
>>
>> I am not sure if this is a bug in GCC, so I will report it. But I have
>> fixed the problem writing the code in a proper way.
>>
>> Sorry again for the long mail, I wanted to store the knowledge and
>> propagate it before I eventually will flush it. I like these things,
>> but I am not debugging the GCC optimization pipeline all the days.
>>
>> --
>> Pablo Tesone.
>> teso...@gmail.com
>>


-- 
Pablo Tesone.
teso...@gmail.com

Reply via email to