https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119702
Avinash Jayakar <avinashd at linux dot ibm.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |avinashd at linux dot ibm.com
--- Comment #2 from Avinash Jayakar <avinashd at linux dot ibm.com> ---
I am looking into this issue.
As Peter mentioned, the following issue is no longer present on trunk:
2) vextsb2d is not required as PowerPC has a modulo shift. It does not matter
if additional bytes are set with shift amount.
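To illustrate why extra bits in the shift amount are harmless under a modulo shift, here is a simplified scalar model of a single 64-bit lane. It assumes only the low 6 bits of the shift operand are used; shl_mod64 is a hypothetical helper name for this sketch, not a GCC or ISA identifier.

```c
#include <stdint.h>

/* Simplified model of one 64-bit lane of a modulo shift (an assumption
   for illustration): only the low 6 bits of AMOUNT take part.  */
static uint64_t
shl_mod64 (uint64_t value, uint64_t amount)
{
  return value << (amount & 63);
}
```

Under this model, a shift amount of 1 and a shift amount with the byte 0x01 replicated into every byte position give the same result, so no sign extension of the shift amount is needed first.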
I just wanted to understand the optimization opportunity a little better.
This change is mainly to optimize code size rather than execution time,
right?
I ask because I think a splat and shift performs similarly to an add. I ran a
small benchmark with 2 variants and saw very little difference in the actual
execution time. Here is the synthetic benchmark:
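#include <stdio.h>
/* One of the 2 variants under test; its definition is not shown here.  */
void lshift1 (unsigned long long *a);
int main (void)
{
  unsigned long long a[2];
  a[0] = 1;
  a[1] = 2;
  for (long long i = 0; i < 10000000000LL; i++)
    lshift1 (a);
  printf ("%llu\n", a[1]); /* don't optimize away the loop */
}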
Should the same code also be generated for the following?
1. a[0] *= 2; a[1] *= 2;
2. a[0] += a[0]; a[1] += a[1];
All of these emit the same left-shift-by-1 instruction with current GCC
trunk.
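For reference, the three source forms can be put side by side as below. This is only a sketch: the function names are hypothetical, and the shift form is assumed to be what lshift1 does. All three are equivalent for unsigned 64-bit values because unsigned arithmetic wraps modulo 2^64.

```c
/* Hypothetical helpers; each doubles both elements of a 2-element array.  */

void
double_by_shift (unsigned long long *a)
{
  a[0] <<= 1;
  a[1] <<= 1;
}

void
double_by_mul (unsigned long long *a)
{
  a[0] *= 2;
  a[1] *= 2;
}

void
double_by_add (unsigned long long *a)
{
  a[0] += a[0];
  a[1] += a[1];
}
```

Compiling a file like this with -O2 -S and comparing the assembly of the three functions is one way to check whether they produce the same instruction sequence.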