https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100410
Bug ID: 100410
Summary: [10 regression] optimization bug with -O3 -fno-strict-aliasing
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: chantry.xavier at gmail dot com
Target Milestone: ---
Created attachment 50746
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50746&action=edit
testcase
We hit a bug in our LZO decompression code when switching to Debian 11 (not yet
released), which ships gcc version 10.2.1 20210110.
We build with -O2 plus several optimization flags from -O3. The code worked
correctly with all previous versions of gcc, including 9.3.0, and it also works
with several versions of clang.
I am attaching a testcase that seems to reproduce the problem. The output is
correct with -O2 and wrong with -O3:
% gcc -O2 -fno-strict-aliasing -Wall testcase.c
% ./a.out 15 40
`aaa`
% gcc -O3 -fno-strict-aliasing -Wall testcase.c
% ./a.out 15 40
`aa`
% valgrind ./a.out 15 40
==321763== Memcheck, a memory error detector
==321763== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==321763== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==321763== Command: ./a.out 15 40
==321763==
==321763== Conditional jump or move depends on uninitialised value(s)
==321763==    at 0x483BC98: strlen (vg_replace_strmem.c:459)
==321763==    by 0x48CDF75: __vfprintf_internal (vfprintf-internal.c:1688)
==321763==    by 0x48B8D9A: printf (printf.c:33)
==321763==    by 0x1091E3: main (in /home/xavier/dev/master/platform/lib-common/tests/a.out)
==321763==  Uninitialised value was created by a stack allocation
==321763==    at 0x109086: main (in /home/xavier/dev/master/platform/lib-common/tests/a.out)
==321763==
`aa`
Adding -fno-strict-aliasing does not seem to change the result, but since this
code violates the strict aliasing rule, maybe the bug is related to it?
I do not fully understand the generated assembly, but I see a difference here.
Could this be the condition that enables the 16-byte block copy with movdqa /
movups?
-- 0x10c3 <+83>:   mov    %rax,%r8
-- 0x10c6 <+86>:   lea    (%rax,%rbp,1),%rdx
-- 0x10ca <+90>:   cmp    $0x7,%r12d
-- 0x10ce <+94>:   jbe    0x11a5
-- 0x10d4 <+100>:  lea    -0x8(%r12),%ecx
-- 0x10d9 <+105>:  cmp    $0x1f,%ecx
-- 0x10dc <+108>:  jbe    0x118c
-- 0x10e2 <+114>:  cmp    $0x8,%rbp
-- 0x10e6 <+118>:  je     0x118c