Ian Lance Taylor <[email protected]> writes:
> I mentioned on IRC that I had a simple patch to let the RTL level
> aliasing analysis see the underlying decl, the one with the restrict
> qualifier. My original patch was for the 4.0 branch. This is a
> version updated for the 4.1 branch.
I forgot to describe the effects of the patch. For this test case:
void
copy (int * __restrict p, const int * __restrict q, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    {
      p[0] = q[0];
      p[1] = q[1];
      p[2] = q[2];
      p[3] = q[3];
      p += 4;
      q += 4;
    }
}
compiled with -O2 -fschedule-insns on i686-pc-linux-gnu, the unpatched
compiler generates this code in the loop:
        movl    (%edx), %eax
        incl    %ebx
        movl    %eax, (%ecx)
        movl    4(%edx), %eax
        movl    %eax, 4(%ecx)
        movl    8(%edx), %eax
        movl    %eax, 8(%ecx)
        movl    12(%edx), %eax
        addl    $16, %edx
        movl    %eax, 12(%ecx)
        addl    $16, %ecx
The patched compiler generates this code:
        movl    4(%esi), %eax
        movl    (%esi), %ebx
        movl    8(%esi), %edx
        movl    12(%esi), %ecx
        addl    $16, %esi
        incl    -16(%ebp)
        movl    %eax, 4(%edi)
        movl    %ebx, (%edi)
        movl    -16(%ebp), %eax
        movl    %edx, 8(%edi)
        movl    %ecx, 12(%edi)
        addl    $16, %edi
In the unpatched compiler, the RTL level does not see that p and q
cannot alias each other, so it performs the assignments in exactly the
order they appear in the program. In the patched compiler, the RTL
level sees that there is no aliasing, and all the loads are done before
all the stores. The latter code will normally minimize load delays. Of
course this will have a more dramatic effect on processors which do
in-order execution.
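
To show why the scheduler needs the no-alias guarantee before it can
hoist the loads, here is a rough sketch in plain C. It is not part of
the patch; the helper names (copy_in_order, copy_loads_first,
orders_agree) are made up for illustration. It runs one unrolled
iteration in both orders, without restrict, and checks whether the
results match:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: one unrolled iteration with each store issued
   right after its load, i.e. source order.  */
static void
copy_in_order (int *p, const int *q)
{
  p[0] = q[0];
  p[1] = q[1];
  p[2] = q[2];
  p[3] = q[3];
}

/* Hypothetical helper: the same iteration with all four loads done
   before any store, as in the patched compiler's schedule.  */
static void
copy_loads_first (int *p, const int *q)
{
  int a = q[0], b = q[1], c = q[2], d = q[3];
  p[0] = a;
  p[1] = b;
  p[2] = c;
  p[3] = d;
}

/* Return 1 if both orders leave the buffer in the same state.
   overlap == 0: source and destination are disjoint.
   overlap == 1: destination starts one element past the source.  */
int
orders_agree (int overlap)
{
  int x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int y[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int off;

  assert (overlap == 0 || overlap == 1);
  off = overlap ? 1 : 4;

  copy_in_order (x + off, x);
  copy_loads_first (y + off, y);
  return memcmp (x, y, sizeof x) == 0;
}
```

With disjoint buffers the two orders agree. With the one-element
overlap they do not: the in-order version propagates the first element
through the destination, while the loads-first version copies the
original values. Without the restrict information the scheduler has to
assume the overlapping case is possible.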
Ian