Tim Goetze wrote:

Simon Jenkins wrote:

I can definitely get

asm ("movaps %%xmm1, %0" : "=m" (t[0]));

to exhibit the optimisation problem (the one I couldn't get your
original line to show) and then fix it again by removing the [0].

[snip]


ok, here is a distilled test of how i allocate and use the instructions:

#include <stdio.h>

int main (int argc, char ** argv)
{
 char scratch [128 + 15];
 float f = 2.3;

 int s = (int) scratch;
 s &= 0xF;
 if (s)
   s = 16 - s;
 float * d = (float *) (((char *) scratch) + s);
 fprintf (stderr, "%p\n", d);

 asm ("movss %0, %%xmm0" : : "m" (f));
 asm ("shufps $0, %xmm0, %xmm0");
 asm ("movaps %%xmm0, %0" : "=m" (d[0]));

 printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
}

you'll agree that the program should print "2.30 2.30 2.30 2.30".
it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
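
The alignment step in the test above casts the buffer address to int, which only holds on a 32-bit target. A minimal sketch of the same rounding done through uintptr_t follows; align_up is a hypothetical helper name, not something from the original test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): round p up to the
   next multiple of align. align must be a power of two. */
static void *align_up (void *p, size_t align)
{
    assert (align != 0 && (align & (align - 1)) == 0);

    uintptr_t a = (uintptr_t) p;
    a = (a + (align - 1)) & ~(uintptr_t) (align - 1);
    return (void *) a;
}
```

With char scratch [128 + 15], calling align_up (scratch, 16) gives a pointer at most 15 bytes past the start of scratch, so 128 usable bytes remain behind it.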

here's what the assembly block compiles to with "=m" (d):

#APP
 movss -148(%ebp), %xmm0
 shufps $0, %xmm0, %xmm0
 movaps %xmm0, -156(%ebp)
#NO_APP

and here's with "=m" (d[0]):

#APP
 movss -148(%ebp), %xmm0
 shufps $0, %xmm0, %xmm0
#NO_APP
 movl -156(%ebp),%eax
#APP
 movaps %xmm0, (%eax)
#NO_APP

so saying "=m" (d) causes xmm0 to be written to &d, not d, as
intended. if &d isn't 128-bit aligned, it will segfault now.
even if it is, that's not where we wanted the numbers from xmm0
to go ...

The discrepancy here is because you originally said you were trying to
get the data into a named array of floats:

float t[4];

but it turns out you're actually trying to get them into some memory
to which you have a named pointer:

float *d;

Now, there are a great many circumstances in which you could treat
such names interchangeably, but this isn't one of them.
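
To make the difference concrete: as far as I understand it, an "m" operand refers to the object that the expression in parentheses names, so its address is the address of that expression and its extent is the size of that expression. The sketch below checks this in plain C, with no asm involved. OPERAND_ADDR, OPERAND_SIZE and check_operands are illustrative names of my own.

```c
#include <assert.h>

/* Illustrative macros (hypothetical names): the object an "m" operand
   written as (x) designates is the lvalue x itself. */
#define OPERAND_ADDR(x) ((void *) &(x))
#define OPERAND_SIZE(x) (sizeof (x))

static void check_operands (void)
{
    float t[4] = { 0 };   /* named array */
    float *d = t;         /* named pointer to the same memory */

    /* "=m" (t): the whole array, all four floats starting at t. */
    assert (OPERAND_ADDR (t) == (void *) t);
    assert (OPERAND_SIZE (t) == 4 * sizeof (float));

    /* "=m" (d): the pointer variable itself, which lives elsewhere. */
    assert (OPERAND_ADDR (d) != (void *) t);
    assert (OPERAND_SIZE (d) == sizeof (float *));

    /* "=m" (d[0]): the right address, but only one float wide. */
    assert (OPERAND_ADDR (d[0]) == (void *) t);
    assert (OPERAND_SIZE (d[0]) == sizeof (float));
}
```

So the array name gives both the right address and the right size, the pointer name gives neither, and d[0] gives the right address with too small a size.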

The following code demonstrates

asm ("movaps %%xmm0, %0" : "=m" (d));

working correctly if d is an aligned array of floats. Also, if
you change the d to d[0], it exhibits the optimization problem.


/* start */

#include <stdio.h>

float d[4] __attribute__ ((aligned(16))) = { 1.1f, 1.1f, 1.1f, 1.1f };

int main (int argc, char ** argv)
{
 float z = 1.1f;
 float f = 2.3f;

 z += 3.3f;

 asm ("movss %0, %%xmm0" : : "m" (f));
 asm ("shufps $0, %xmm0, %xmm0");
 asm ("movaps %%xmm0, %0" : "=m" (d));

 z += d[1];

 printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
 printf ("z is %.2f\n", z );
}

/* end */



We're expecting (and we get):


2.30 2.30 2.30 2.30
z is 6.70

but using d[0] instead of d we end up getting:

2.30 2.30 2.30 2.30
z is 5.50
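
The 5.50 is presumably 4.40 + 1.10: with "=m" (d[0]) the compiler is told that only d[0] is written, so it is free to keep using the old value of d[1]. When all you have is a pointer, one way to describe the full write is to cast it to a pointer to an array of four floats and dereference that. The following is a sketch under assumptions, not a tested fix for your code: splat4 is a hypothetical name, the three instructions are merged into one asm statement with xmm0 declared as clobbered, and a plain C loop stands in when SSE is not available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copy f into the four floats at dst.
   dst must be 16-byte aligned, because movaps requires it. */
static void splat4 (float *dst, float f)
{
    assert (((uintptr_t) dst & 0xF) == 0);

#if defined(__GNUC__) && defined(__SSE__)
    /* The output operand is the whole 4-float object behind dst,
       not the pointer variable and not just dst[0]. */
    __asm__ ("movss %1, %%xmm0\n\t"
             "shufps $0, %%xmm0, %%xmm0\n\t"
             "movaps %%xmm0, %0"
             : "=m" (*(float (*)[4]) dst)
             : "m" (f)
             : "xmm0");
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 4; i++)
        dst[i] = f;
#endif
}
```

Called as splat4 (d, f) with the aligned pointer d from your test, this should leave f in d[0] through d[3] and let the compiler know that all four have changed.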

Simon Jenkins
(Bristol, UK)



