Simon Jenkins wrote:
I can definitely get
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
to exhibit the optimisation problem (the one I couldn't get your original line to show) and then fix it again by removing the [0].
[snip]
ok, here is a distilled test of how i allocate and use the instructions:
#include <stdio.h>

int main (int argc, char ** argv)
{
  char scratch [128 + 15];
  float f = 2.3;

  /* round scratch up to the next 16-byte boundary (32-bit pointers assumed) */
  int s = (int) scratch;
  s &= 0xF;
  if (s)
    s = 16 - s;
  float * d = (float *) (((char *) scratch) + s);
  fprintf (stderr, "%p\n", d);

  asm ("movss %0, %%xmm0" : : "m" (f));
  asm ("shufps $0, %xmm0, %xmm0");
  asm ("movaps %%xmm0, %0" : "=m" (d[0]));

  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
}
you'll agree that the program should print "2.30 2.30 2.30 2.30". it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
here's what the assembly block compiles to with "=m" (d):
#APP
    movss -148(%ebp), %xmm0
    shufps $0, %xmm0, %xmm0
    movaps %xmm0, -156(%ebp)
#NO_APP
and here's with "=m" (d[0]):
#APP
    movss -148(%ebp), %xmm0
    shufps $0, %xmm0, %xmm0
#NO_APP
    movl -156(%ebp),%eax
#APP
    movaps %xmm0, (%eax)
#NO_APP
so saying "=m" (d) causes xmm0 to be written to &d, not d, as intended. if &d isn't 128-bit aligned, it will segfault now. even if it is, that's not where we wanted the numbers from xmm0 to go ...
The discrepancy here is because you originally said you were trying to get the data into a named array of floats:
float t[4];
but it turns out you're actually trying to get them into some memory to which you have a named pointer:
float *d;
Now, there are a great many circumstances in which you could treat such names interchangeably, but this isn't one of them.
The following code demonstrates
asm ("movaps %%xmm0, %0" : "=m" (d));
working correctly if d is an aligned array of floats. Also, if you change the d to d[0], it exhibits the optimization problem.
/* start */
#include <stdio.h>

float d[4] __attribute__ ((aligned(16))) = { 1.1f, 1.1f, 1.1f, 1.1f };

int main (int argc, char ** argv)
{
  float z = 1.1f;
  float f = 2.3f;

  z += 3.3f;

  asm ("movss %0, %%xmm0" : : "m" (f));
  asm ("shufps $0, %xmm0, %xmm0");
  asm ("movaps %%xmm0, %0" : "=m" (d));   /* the whole array is the output */

  z += d[1];

  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
  printf ("z is %.2f\n", z );
  return 0;
}
/* end */
We're expecting (and we get):
2.30 2.30 2.30 2.30
z is 6.70
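but using d[0] instead of d we end up getting:
2.30 2.30 2.30 2.30
z is 5.50
That is consistent with the compiler having been told that only d[0] is written: it is then free to assume d[1] still holds its initial 1.1 when it computes z (1.1 + 3.3 + 1.1 = 5.50), even though the memory really contains 2.3 by then.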
Simon Jenkins (Bristol, UK)