>> When compiling OpenSSL optimized on ARM using the Microsoft compiler, >> the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c). >> The compiler seems to think that val and temp represent the same item >> when they are clearly one index apart. I've coded a fix that simply >> avoid using temporary variables and uses the indices into the t_d >> array directly. The code is simply a refactor of the existing code >> and does generate very effective neon instructions for the loop. >> >> ectest test which was always failing before in ARM on Windows is now >> succeeding (as well as all the other tests). > > I recall looking at code generated at x86 and not liking the result with > code similar to what you suggest. Which is why those temporary values > were added. I wonder if you could test following loop. > > for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++) > { > t_d[i] = (val>>BN_NIST_521_RSHIFT | > (val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2; > } > t_d[i] = val>>BN_NIST_521_RSHIFT;
Other compilers issue warnings, so try this instead for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++) { t_d[i] = (val>>BN_NIST_521_RSHIFT | (tmp=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2; val=tmp; } t_d[i] = val>>BN_NIST_521_RSHIFT; If it doesn't work, then we'll go for removing temporary values. ______________________________________________________________________ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org