https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110

            Bug ID: 66110
           Summary: uint8_t memory access not optimized
           Product: gcc
           Version: 5.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kevin at koconnor dot net
  Target Milestone: ---

It appears that gcc does not do a good job of optimizing memory accesses to
'uint8_t' variables.  In particular, "strict aliasing" does not seem to be
applied to uint8_t variables, even when the uint8_t is a member of a struct.

============ GCC version:

$ ~/src/install-5.1.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/kevin/src/install-5.1.0/bin/gcc
COLLECT_LTO_WRAPPER=/home/kevin/src/install-5.1.0/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-5.1.0/configure --prefix=/home/kevin/src/install-5.1.0
--enable-languages=c
Thread model: posix
gcc version 5.1.0 (GCC) 

=========== Compile command line:

~/src/install-5.1.0/bin/gcc -v -save-temps -O2 -Wall u8alias.c -c

=========== Contents of u8alias.c:

typedef __UINT8_TYPE__ uint8_t;
typedef __UINT16_TYPE__ uint16_t;

struct s1 {
    uint16_t f1;
    uint8_t f2;
};

struct s2 {
    struct s1 *p1;
};

void f1(struct s2 *p)
{
    p->p1->f2 = 9;
    p->p1->f2 = 10;
}

=========== Contents of u8alias.i:

# 1 "u8alias.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "u8alias.c"
typedef unsigned char uint8_t;
typedef short unsigned int uint16_t;

struct s1 {
    uint16_t f1;
    uint8_t f2;
};

struct s2 {
    struct s1 *p1;
};

void f1(struct s2 *p)
{
    p->p1->f2 = 9;
    p->p1->f2 = 10;
}

=========== Results of compilation:

$ objdump -ldr u8alias.o

0000000000000000 <f1>:
f1():
   0:   48 8b 07                mov    (%rdi),%rax
   3:   c6 40 02 09             movb   $0x9,0x2(%rax)
   7:   48 8b 07                mov    (%rdi),%rax
   a:   c6 40 02 0a             movb   $0xa,0x2(%rax)
   e:   c3                      retq   

=========== Expected results:

I expected the redundant second load of p->p1, and with it the dead first store
to f2, to be eliminated.  For example, clang produces this assembler (after
replacing the GCC-specific uint8_t typedefs with an include of <stdint.h>):

$ clang -Wall -O2 u8alias.c -c
$ objdump -ldr u8alias.o

0000000000000000 <f1>:
f1():
   0:   48 8b 07                mov    (%rdi),%rax
   3:   c6 40 02 0a             movb   $0xa,0x2(%rax)
   7:   c3                      retq   
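At the source level, clang's output corresponds roughly to the sketch below, where p->p1 is read once into a local. The name f1_manual and the check function are hypothetical, added only for illustration. Note that this is not strictly equivalent to f1 under GCC's reading of the aliasing rules: it assumes the first (uint8_t) store cannot modify the pointer stored in p->p1. Caching the pointer in a local whose address is never taken should also let GCC drop the dead first store, which may serve as a source-level workaround.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as u8alias.c, using <stdint.h> instead of the
   __UINT8_TYPE__ / __UINT16_TYPE__ typedefs. */
struct s1 {
    uint16_t f1;
    uint8_t f2;
};

struct s2 {
    struct s1 *p1;
};

/* Hypothetical hand-optimized variant: p->p1 is loaded once.  The local q
   is never address-taken, so no store can change it, and the first store
   to q->f2 is overwritten before any read (a dead store). */
void f1_manual(struct s2 *p)
{
    struct s1 *q = p->p1;
    q->f2 = 9;
    q->f2 = 10;
}

/* Hypothetical self-check: f2 ends up as 10 and f1 is left untouched. */
int check_f1_manual(void)
{
    struct s1 obj = { 0x1234, 0 };
    struct s2 holder = { &obj };
    f1_manual(&holder);
    assert(obj.f2 == 10);
    assert(obj.f1 == 0x1234);
    return obj.f2;
}
```

Calling check_f1_manual() from a main function exercises the sketch; comparing `gcc -O2 -S` output for f1 and f1_manual shows whether the reload disappears.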

=========== Other notes:

If the code is changed so that there are two redundant writes to ->f1, then gcc
does properly optimize away the first store.  Also, if the above definition of
f2 is changed to an 8-bit bitfield (i.e., "uint8_t f2 : 8;"), then gcc does
properly optimize away the first store.
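The two variants described above can be written out as follows, as a sketch for reproducing the comparison. The names f1_u16, f1_bitfield, struct s1_bf, struct s2_bf and run_variants are hypothetical. A uint8_t bitfield is an implementation-defined bitfield type that GCC accepts (with -pedantic it warns that this is a GCC extension).

```c
#include <assert.h>
#include <stdint.h>

struct s1 { uint16_t f1; uint8_t f2; };
struct s2 { struct s1 *p1; };

/* Hypothetical bitfield variant of struct s1. */
struct s1_bf {
    uint16_t f1;
    uint8_t f2 : 8;   /* 8-bit bitfield instead of a plain uint8_t member */
};
struct s2_bf { struct s1_bf *p1; };

/* Variant 1: two stores to the uint16_t member f1. */
void f1_u16(struct s2 *p)
{
    p->p1->f1 = 9;
    p->p1->f1 = 10;
}

/* Variant 2: two stores to the 8-bit bitfield f2. */
void f1_bitfield(struct s2_bf *p)
{
    p->p1->f2 = 9;
    p->p1->f2 = 10;
}

/* Hypothetical self-check: both variants leave only the second value. */
int run_variants(void)
{
    struct s1 a = { 0, 0 };
    struct s2 pa = { &a };
    struct s1_bf b = { 0, 0 };
    struct s2_bf pb = { &b };

    f1_u16(&pa);
    f1_bitfield(&pb);
    assert(a.f1 == 10 && a.f2 == 0);
    assert(b.f2 == 10 && b.f1 == 0);
    return 0;
}
```

Compiling this file with `-O2 -S` and inspecting f1_u16 and f1_bitfield is one way to check the behaviour reported above.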

This occurs on other platforms besides x86_64.  (In particular, it happens on
avr-gcc, where 8-bit integers are very common.)  I reproduced the above on gcc
5.1.0, but I've also seen it on variants of gcc 4.8 and gcc 4.9.

My guess is that the above is the result of an interpretation of the C99
specification - in particular section 6.5:

  An object shall have its stored value accessed only by an lvalue expression
that has one of the following types:
[...]
      -- a character type.
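A minimal sketch of what the quoted rule permits, assuming uint8_t is unsigned char as in u8alias.i above: a store through a uint8_t lvalue may legally modify any object, so after such a store the compiler must reload anything it cannot prove is unaffected. The names store_then_reload and demo_char_alias are hypothetical. Presumably the same reasoning is what makes GCC reload p->p1 in f1, since it treats the store to p->p1->f2 as a character-type store that could have changed the pointer.

```c
#include <assert.h>
#include <stdint.h>

/* With uint8_t == unsigned char, *byte may alias *obj, so the compiler
   cannot fold the return value to 0; it has to reload *obj. */
uint16_t store_then_reload(uint16_t *obj, uint8_t *byte)
{
    *obj = 0;
    *byte = 0xff;    /* may modify one byte of *obj */
    return *obj;     /* must be re-read after the byte store */
}

int demo_char_alias(void)
{
    uint16_t value;
    /* Access the first byte of value through a character type, which
       C99 6.5 allows for any object. */
    uint16_t result = store_then_reload(&value, (uint8_t *)&value);
    /* 0x00ff on little-endian targets, 0xff00 on big-endian ones. */
    assert(result == 0x00ff || result == 0xff00);
    return result != 0;
}
```

The endianness check keeps the sketch portable: which byte is overwritten depends on the target, but the result is never 0.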

However, I do not think that rule should apply to the above test case, for
either of the following two reasons:

1 - the memory access was not to a character type but to an integer type
that happens to be 1 byte in size (i.e., a uint8_t type)
2 - the memory access was not to a character type but to a member of
'struct s1'.

As noted above, clang (e.g., 3.4.2) does perform the expected optimization.
