[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--- Comment #4 from vda dot linux at googlemail dot com 2007-07-22 00:02 ---
With t.c being the timing program from comment #3 and serpent.c from the attachment, I built test programs for 3.4.3, 3.4.6 and 4.2.1, at -Os and -O3, like this:

ver=NNN
gcc -Os -o serpent-${ver}-Os serpent.c t.c
gcc -Os -o serpent-${ver}-Os.o -c serpent.c
gcc -O3 -o serpent-${ver}-O3 serpent.c t.c
gcc -O3 -o serpent-${ver}-O3.o -c serpent.c

Performance regression at -O3 (4.2.1 runs at about 2/3 the speed of 3.4.x). Did four runs of each:

343-O3
ops/second=712888
ops/second=722059
ops/second=718909
ops/second=713506
346-O3
ops/second=643833
ops/second=712619
ops/second=721724
ops/second=719445
421-O3
ops/second=495349
ops/second=496887
ops/second=490650
ops/second=494522

Size: improved relative to 3.4.x:

# size *-Os.o
   text    data     bss     dec     hex filename
   4302       0       0    4302    10ce serpent-343-Os.o
   4335       0       0    4335    10ef serpent-346-Os.o
   3877       0       0    3877     f25 serpent-421-Os.o

...but 3.4.x was even smaller at -O3 than 4.2.1 is at -Os:

# size *-O3.o
   text    data     bss     dec     hex filename
   3292       0       0    3292     cdc serpent-343-O3.o
   3292       0       0    3292     cdc serpent-346-O3.o
   3877       0       0    3877     f25 serpent-421-O3.o

Actually, 4.2.1 seems to generate the same code for -Os/-O2/-O3:

# size *421*.o
   text    data     bss     dec     hex filename
   3877       0       0    3877     f25 serpent-421-O2.o
   3877       0       0    3877     f25 serpent-421-O3.o
   3877       0       0    3877     f25 serpent-421-Os.o

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481
[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--- Comment #5 from vda dot linux at googlemail dot com 2007-07-22 00:10 ---
Basically, the reason for the regression is that 4.2.1 doesn't figure out how to use i386 registers efficiently. 3.4.3 was able to do it. Difference in assembly:

# grep 'mov.*(' serpent-343-O3.s | wc -l
21

serpent_encrypt:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        pushl   %edx
        movl    8(%ebp), %edi
        movl    16(%ebp), %ecx
        movl    12(%edi), %eax

# grep 'mov.*(' serpent-421-O3.s | wc -l
115     (many more moves to memory, to the stack actually)

serpent_encrypt:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $120, %esp      # allocated storage for spills
        movl    16(%ebp), %eax
        movl    8(%ebp), %edx
        movl    %edx, -128(%ebp)
        .

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481
[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--
mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
   Target Milestone|4.1.2                       |4.1.3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481
[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--
mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
           Priority|P3                          |P2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481
[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--
jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
   Target Milestone|---                         |4.1.2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481
[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--- Comment #2 from rguenth at gcc dot gnu dot org 2006-07-25 15:47 ---
I get

[EMAIL PROTECTED]:/tmp /space/rguenther/install/gcc-3.4.6/bin/gcc -O3 -c serpent.c
[EMAIL PROTECTED]:/tmp size serpent.o
   text    data     bss     dec     hex filename
   3562       0       0    3562     dea serpent.o
[EMAIL PROTECTED]:/tmp /space/rguenther/install/gcc-4.1.1/bin/gcc -O3 -c serpent.c
[EMAIL PROTECTED]:/tmp size serpent.o
   text    data     bss     dec     hex filename
   4137       0       0    4137    1029 serpent.o
[EMAIL PROTECTED]:/tmp /space/rguenther/install/gcc-4.1.1/bin/gcc -O3 -c serpent.c -fomit-frame-pointer
[EMAIL PROTECTED]:/tmp size serpent.o
   text    data     bss     dec     hex filename
   3695       0       0    3695     e6f serpent.o
[EMAIL PROTECTED]:/tmp /space/rguenther/install/gcc-3.4.6/bin/gcc -O3 -c serpent.c -fomit-frame-pointer
[EMAIL PROTECTED]:/tmp size serpent.o
   text    data     bss     dec     hex filename
   3526       0       0    3526     dc6 serpent.o

so, confirmed for -O3, but -O3 is about speed - how's that comparing?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481
[Bug target/28481] regression from 3.x: 4.1.1 uses memory where it can use registers
--- Comment #3 from vda dot linux at googlemail dot com 2006-07-25 17:18 ---
With this test program:

#include <sys/time.h>
#include <time.h>       /* time() */
#include <stdio.h>

typedef unsigned u32;
struct serpent_ctx {
        u32 expkey[132];
};
void serpent_encrypt(void *ctx, u32 *dst, const u32 *src);

u32 v[4],u[4];
struct serpent_ctx ctx;

int main() {
        time_t t;
        int count;

        /* wait for the start of a fresh second */
        t = time(NULL);
        while(t == time(NULL)) /*wait*/;
        /* count loop iterations (4 calls each) during one full second */
        t = time(NULL);
        count = 0;
        while(t == time(NULL)) {
                serpent_encrypt(&ctx, u, v);
                serpent_encrypt(&ctx, u, v);
                serpent_encrypt(&ctx, u, v);
                serpent_encrypt(&ctx, u, v);
                count++;
        }
        printf("ops/second=%d\n", count);
        return 0;
}

I see that bigger code = slower code:

# size serpent343-O3 serpent411-O3 serpent343-Os
   text    data     bss     dec     hex filename
   4285     260     592    5137    1411 serpent343-O3
   4461     260     592    5313    14c1 serpent411-O3
   5101     260     592    5953    1741 serpent343-Os

343-O3 is just a tiny bit smaller, and it is also a tiny bit faster:

# ./serpent343-O3;./serpent343-O3;./serpent343-O3;
ops/second=168637
ops/second=166610
ops/second=169509
# ./serpent411-O3;./serpent411-O3;./serpent411-O3;
ops/second=164809
ops/second=163172
ops/second=161431

I tried longer runs. It is definitely not just test variability.

The biggest is the slowest:

# ./serpent343-Os;./serpent343-Os;./serpent343-Os;
ops/second=158495
ops/second=151342
ops/second=154777

So, yes, this is a smallish speed regression too.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28481