so it checks for __i386__, which is not defined on amd64 (nor x32). There are many defines which would be enough; you can for example check for:

    #elif defined(__i386__) || defined(__x86_64__)

Also, it could be nice to keep the optimized code on armhf as well, which would make the ARM line:

    #elif defined(__ARM_FEATURE_SIMD32) || defined(__ARM_NEON)

Currently, the code _does_ compile but is slow on any arch other than arm64 and i386. And not many people run scientific code on either.
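For illustration, here is a minimal sketch of how the two suggested conditions could sit together in one preprocessor chain. The function name simd_path and the strings it returns are hypothetical and not taken from the package; only the macro tests come from the message above. The real code would select the optimized routines in each branch rather than return a label.

```c
/* Hypothetical helper: reports which branch the preprocessor selected.
 * Only the macro conditions are from the discussion; the rest is assumed. */
static const char *simd_path(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* __x86_64__ is defined on amd64 and also on x32. */
    return "x86";
#elif defined(__ARM_FEATURE_SIMD32) || defined(__ARM_NEON)
    /* Keeps the optimized path on armhf as well as on arm64. */
    return "arm";
#else
    /* Portable fallback: compiles everywhere, but is the slow path. */
    return "generic";
#endif
}
```

Printing the result of simd_path() in a build on each architecture is a quick way to confirm that the intended branch is taken.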
OK, I've updated the patch as you suggested. I hope it is correct. I uploaded it to mentors again with the same version number.

Håvard