I have found several ways to "fix" the latest issue, but they all boil
down to never passing an __m128d value on the call stack. For instance,
change
static __m128d
__attribute__((noinline, unused))
test (__m128d s1, __m128d s2)
to
static __m128d test (__m128d s1, __m128d s2)
and the program works. Similarly, change the function to
static __m128d __attribute__((noinline)) test (__m128d *s1, __m128d *s2)
{
return _mm_add_pd (*s1, *s2);
}
and it also works.
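For reference, a minimal sketch of the pointer-passing variant as a complete translation unit. The function test() is the one shown above; add_pairs() is a hypothetical helper that is not part of the original program, added only so the result can be inspected as plain doubles. It assumes an x86 target where <emmintrin.h> is usable.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */

/* Pointer-passing variant from the message: the __m128d operands stay in
   the caller's memory, so no vector argument is passed by value. */
static __m128d __attribute__((noinline))
test (__m128d *s1, __m128d *s2)
{
  return _mm_add_pd (*s1, *s2);
}

/* Hypothetical helper (not in the original program): adds {a0,a1} and
   {b0,b1} through test() and writes the two sums to out[0] and out[1]. */
void
add_pairs (double a0, double a1, double b0, double b1, double out[2])
{
  assert (out != NULL);
  __m128d x = _mm_set_pd (a1, a0);  /* _mm_set_pd takes the high element first */
  __m128d y = _mm_set_pd (b1, b0);
  __m128d r = test (&x, &y);
  _mm_storeu_pd (out, r);           /* unaligned store, valid for any double[2] */
}
```

Calling add_pairs (1.0, 2.0, 10.0, 20.0, out) should leave 11.0 in out[0] and 22.0 in out[1].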
Things I tried to force a 16-byte stack alignment that didn't work:
1 -mstackrealign
2 -mpreferred-stack-boundary=4
3 -mincoming-stack-boundary=4
4 options 2 and 3 together
5 options 1, 2, and 3 together
I guess the bigger question is: why can an __m128d be passed on the call
stack reliably when -msse2 is given, but not otherwise? If the
compiler cannot do this reliably, shouldn't it issue an error or warning?
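Until the compiler diagnoses this itself, one possible user-side guard is to refuse to build the by-value version unless SSE2 code generation is enabled. The sketch below relies on the __SSE2__ macro, which GCC predefines when SSE2 is on (for example with -msse2, or by default on x86-64). The helper sum_low() is hypothetical and only there to exercise the call.

```c
/* Fail the build early if SSE2 code generation is not enabled. */
#ifndef __SSE2__
#error "SSE2 not enabled: build with -msse2 before passing __m128d by value"
#endif

#include <assert.h>
#include <emmintrin.h>

/* By-value version from the message, kept noinline so a real call is made. */
static __m128d __attribute__((noinline))
test (__m128d s1, __m128d s2)
{
  return _mm_add_pd (s1, s2);
}

/* Hypothetical helper (not in the original program): adds two doubles in
   the low lanes via test() and returns the low lane of the result. */
double
sum_low (double a, double b)
{
  assert (sizeof (__m128d) == 16);  /* sanity check: two packed doubles */
  __m128d r = test (_mm_set_sd (a), _mm_set_sd (b));
  return _mm_cvtsd_f64 (r);
}
```

This only turns the silent failure into a build error. It does not explain why the by-value call misbehaves without -msse2.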
Thanks,
David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech