https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100410
            Bug ID: 100410
           Summary: [10 regression] optimization bug with -O3 -fno-strict-aliasing
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chantry.xavier at gmail dot com
  Target Milestone: ---

Created attachment 50746
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50746&action=edit
testcase

We have a bug in our lzo decompressing code when switching to Debian 11 (still not released) which has gcc version 10.2.1 20210110

We use -O2 and several optim flags from -O3.

The code worked well with all previous versions of gcc, including gcc version 9.3.0. It also works with several versions of clang.

I am attaching a testcase that seems to reproduce the problem.

Compile with gcc -O3 -fno-strict-aliasing -Wall testcase.c

% gcc -O2 -fno-strict-aliasing -Wall testcase.c
% ./a.out 15 40
`aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`
% gcc -O3 -fno-strict-aliasing -Wall testcase.c
% ./a.out 15 40
`aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`
% valgrind ./a.out 15 40
==321763== Memcheck, a memory error detector
==321763== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==321763== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==321763== Command: ./a.out 15 40
==321763==
==321763== Conditional jump or move depends on uninitialised value(s)
==321763==    at 0x483BC98: strlen (vg_replace_strmem.c:459)
==321763==    by 0x48CDF75: __vfprintf_internal (vfprintf-internal.c:1688)
==321763==    by 0x48B8D9A: printf (printf.c:33)
==321763==    by 0x1091E3: main (in /home/xavier/dev/master/platform/lib-common/tests/a.out)
==321763==  Uninitialised value was created by a stack allocation
==321763==    at 0x109086: main (in /home/xavier/dev/master/platform/lib-common/tests/a.out)
==321763==
`aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`

Using -fno-strict-aliasing does not seem to have an impact, but since this code violates the strict aliasing rule, maybe it has something to do with it ?

I do not fully understand the generated assembly but I see a difference here. This might be the condition to enable the copy by block of 16 bytes with movdqa / movups ?

   0x00000000000010c3 <+83>:    mov    %rax,%r8
   0x00000000000010c6 <+86>:    lea    (%rax,%rbp,1),%rdx
   0x00000000000010ca <+90>:    cmp    $0x7,%r12d
   0x00000000000010ce <+94>:    jbe    0x11a5 <main+309>
   0x00000000000010d4 <+100>:   lea    -0x8(%r12),%ecx
   0x00000000000010d9 <+105>:   cmp    $0x1f,%ecx
   0x00000000000010dc <+108>:   jbe    0x118c <main+284>
   0x00000000000010e2 <+114>:   cmp    $0x8,%rbp
   0x00000000000010e6 <+118>:   je     0x118c <main+284>
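
For reference, here is a minimal sketch of how a block copy of this kind can be written without violating the strict aliasing rule. It is not taken from the attached testcase: the function name `copy_match` and its parameters are hypothetical, and it assumes the loop in question copies an LZ-style back-reference (`len` bytes read from `dist` bytes behind the write position, where source and destination may overlap). The 8-byte blocks are moved through `memcpy` into a local `uint64_t` rather than through a pointer cast, and the block path is only taken when `dist >= 8`, so each block read only touches bytes that have already been written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT the code from the attached testcase.
 * Copies len bytes from (out - dist) to out, where the two ranges
 * may overlap, as in an LZ-style back-reference. */
static void copy_match(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *src = out - dist;

    /* An 8-byte block read is only valid when the source block lies
     * entirely behind the current write position, i.e. dist >= 8. */
    if (dist >= 8) {
        while (len >= 8) {
            uint64_t block;
            /* memcpy through a local avoids the aliasing violation
             * of casting unsigned char * to uint64_t *. */
            memcpy(&block, src, sizeof(block));
            memcpy(out, &block, sizeof(block));
            src += 8;
            out += 8;
            len -= 8;
        }
    }

    /* Tail, and the whole copy when dist < 8: byte by byte, so that
     * bytes written earlier in this call are re-read as a repeating run. */
    while (len > 0) {
        *out++ = *src++;
        len--;
    }
}
```

If a variant like this behaves the same at -O2 and -O3, that would suggest the aliasing cast is involved; if it still misbehaves, the problem more likely lies in how the overlapping copy is vectorized.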