Re: A question about redudant load elimination
On Mon, Nov 14, 2011 at 9:01 AM, Jiangning Liu jiangning@arm.com wrote: Hi, For this test case, int x; extern void f(void); void g(int *a) { a[x] = 1; if (x == 100) f(); a[x] = 2; } For trunk, the x86 assembly code is like below, movl x, %eax movl 16(%esp), %ebx movl $1, (%ebx,%eax,4) movl x, %eax // Is this a redundant one? cmpl $100, %eax je .L4 movl $2, (%ebx,%eax,4) addl $8, %esp .cfi_remember_state .cfi_def_cfa_offset 8 popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 4 ret .p2align 4,,7 .p2align 3 .L4: .cfi_restore_state call f movl x, %eax movl $2, (%ebx,%eax,4) addl $8, %esp .cfi_def_cfa_offset 8 popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 4 Ret Is the 2nd movl x, %eax is a redundant one for single thread programming model? If yes, can this be optimized away? f() may change the value of x, so you cannot optimize away the load on that execution path. Thanks, -Jiangning
Re: A question about redudant load elimination
f() may change the value of x, so you cannot optimize away the load on that execution path. This looks more like an aliasing issue with a, doesn't it? -- Eric Botcazou
Re: A question about redudant load elimination
Hi, On Wed, 16 Nov 2011, Eric Botcazou wrote: f() may change the value of x, so you cannot optimize away the load on that execution path. This looks more like an aliasing issue with a, doesn't it? Correct. There's no call to f() between a[x] and x==100, but the store to a[x] might change x if x==0a==x. Ciao, Michael.
Re: A question about redudant load elimination
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 11/16/11 04:00, Richard Guenther wrote: On Mon, Nov 14, 2011 at 9:01 AM, Jiangning Liu jiangning@arm.com wrote: Hi, For this test case, int x; extern void f(void); void g(int *a) { a[x] = 1; if (x == 100) f(); a[x] = 2; } For trunk, the x86 assembly code is like below, movlx, %eax movl16(%esp), %ebx movl$1, (%ebx,%eax,4) movlx, %eax // Is this a redundant one? cmpl$100, %eax je .L4 movl$2, (%ebx,%eax,4) addl$8, %esp .cfi_remember_state .cfi_def_cfa_offset 8 popl%ebx .cfi_restore 3 .cfi_def_cfa_offset 4 ret .p2align 4,,7 .p2align 3 .L4: .cfi_restore_state callf movlx, %eax movl$2, (%ebx,%eax,4) addl$8, %esp .cfi_def_cfa_offset 8 popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 4 Ret Is the 2nd movl x, %eax is a redundant one for single thread programming model? If yes, can this be optimized away? f() may change the value of x, so you cannot optimize away the load on that execution path. Right. In theory, path isolation would make this optimizable. Make a copy of the block containing a[x] = 2 and make it the target when x != 100. At the source level it'd look something like this: int x; extern void f(void); void g(int *a) { a[x] = 1; if (x == 100) { f(); a[x] = 2; } else { a[x] = 2; } } The problem then becomes identification of the load from x as redundant on the else path, which we're currently not capable of doing. jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJOw+0dAAoJEBRtltQi2kC7MS0H/Ri6MqbuhEWbL8pLanaNpBwZ NmAcecLVIbB4YVhfqtnWOftUYFAIZZ7WhYHGNsVFIZiIK0wfGmUCk6Jmun6kpUkp TeE2Tlf6keyfGy8XImayhn6Ngifwa9salcxxL4eSND/pADZ2j+URTXVJQvlWlnIv 4aDXrQdlkTtO9EZYwSGoxbAIfHaooSQotLXT68esiTYeEIK9jD2pZVDDuvorps4E ov84Z8DurSBa/N+CA2qk0n//mq1ChwiEn+nCRXOF92v6j6av9+jKf54rJPQu7NNf b7Kj4vRYf+GAHquloqPqe18cK1qbFaZ4O5GGlL8TraXDgIDtOpyzTAtjumOLFZ8= =raxr -END PGP SIGNATURE-
Re: A question about redudant load elimination
Hi, On Wed, 16 Nov 2011, Jeff Law wrote: Right. In theory, path isolation would make this optimizable. Make a copy of the block containing a[x] = 2 and make it the target when x != 100. At the source level it'd look something like this: int x; extern void f(void); void g(int *a) { a[x] = 1; if (x == 100) { f(); a[x] = 2; } else { a[x] = 2; } } The problem then becomes identification of the load from x as redundant on the else path, which we're currently not capable of doing. Also in this variant the first store to a[x] may-clobbers x itself. The function call doesn't enter the picture. Ciao, Michael.
A question about redudant load elimination
Hi, For this test case, int x; extern void f(void); void g(int *a) { a[x] = 1; if (x == 100) f(); a[x] = 2; } For trunk, the x86 assembly code is like below, movlx, %eax movl16(%esp), %ebx movl$1, (%ebx,%eax,4) movlx, %eax // Is this a redundant one? cmpl$100, %eax je .L4 movl$2, (%ebx,%eax,4) addl$8, %esp .cfi_remember_state .cfi_def_cfa_offset 8 popl%ebx .cfi_restore 3 .cfi_def_cfa_offset 4 ret .p2align 4,,7 .p2align 3 .L4: .cfi_restore_state callf movlx, %eax movl$2, (%ebx,%eax,4) addl$8, %esp .cfi_def_cfa_offset 8 popl%ebx .cfi_restore 3 .cfi_def_cfa_offset 4 Ret Is the 2nd movl x, %eax is a redundant one for single thread programming model? If yes, can this be optimized away? Thanks, -Jiangning
Re: A question about redudant load elimination
From tree dump we can see that there are two assignments from x, one to unsigned and one to signed. I guess that's the reason. Apparently there is room to improve though. int prephitmp.8; int * D.2027; unsigned int D.2026; unsigned int x.1; int x.0; # BLOCK 2 freq:1 # PRED: ENTRY [100.0%] (fallthru,exec) x.0_1 = x; x.1_2 = (unsigned int) x.0_1; // unsigned move D.2026_3 = x.1_2 * 4; D.2027_5 = a_4(D) + D.2026_3; *D.2027_5 = 1; prephitmp.8_6 = x; // signed move On Mon, Nov 14, 2011 at 4:01 PM, Jiangning Liu jiangning@arm.com wrote: Hi, For this test case, int x; extern void f(void); void g(int *a) { a[x] = 1; if (x == 100) f(); a[x] = 2; } For trunk, the x86 assembly code is like below, movl x, %eax movl 16(%esp), %ebx movl $1, (%ebx,%eax,4) movl x, %eax // Is this a redundant one? cmpl $100, %eax je .L4 movl $2, (%ebx,%eax,4) addl $8, %esp .cfi_remember_state .cfi_def_cfa_offset 8 popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 4 ret .p2align 4,,7 .p2align 3 .L4: .cfi_restore_state call f movl x, %eax movl $2, (%ebx,%eax,4) addl $8, %esp .cfi_def_cfa_offset 8 popl %ebx .cfi_restore 3 .cfi_def_cfa_offset 4 Ret Is the 2nd movl x, %eax is a redundant one for single thread programming model? If yes, can this be optimized away? Thanks, -Jiangning