>> When compiling OpenSSL with optimization for ARM using the Microsoft compiler,
>> the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c).
>> The compiler seems to think that val and temp represent the same item
>> when they are clearly one index apart. I've coded a fix that simply
>> avoids using temporary variables and uses the indices into the t_d
>> array directly. The code is simply a refactor of the existing code
>> and does generate very effective NEON instructions for the loop.
>>
>> The ectest test, which always failed before on ARM on Windows, now
>> succeeds (as do all the other tests).
>
> I recall looking at the code generated for x86 and not liking the result with
> code similar to what you suggest, which is why those temporary values
> were added. I wonder if you could test the following loop.
>
>     for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
>         {
>         t_d[i] = (val>>BN_NIST_521_RSHIFT |
>                   (val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
>         }
>     t_d[i] = val>>BN_NIST_521_RSHIFT;
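Other compilers issue warnings for the loop above (val is both read and
assigned within a single expression, with no sequencing between the two),
so try this instead:
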
    for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
        {
        t_d[i] = (val>>BN_NIST_521_RSHIFT |
                  (tmp=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
        val=tmp;
        }
    t_d[i] = val>>BN_NIST_521_RSHIFT;

If it doesn't work, then we'll go for removing temporary values.
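
For anyone who wants to try this outside the OpenSSL tree, here is a rough
standalone sketch that runs the val/tmp loop and the direct-index loop on the
same input and compares the results. TOP, RSHIFT and LSHIFT are my own
stand-ins for the bn_nist.c macros and assume 64-bit limbs; the function names
are made up for illustration. BN_MASK2 is left out because, under that
assumption, it would be all ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for the bn_nist.c macros, valid for 64-bit limbs only. */
#define TOP    9                /* limbs needed to hold 521 bits */
#define RSHIFT (521 % 64)       /* 9 */
#define LSHIFT (64 - RSHIFT)    /* 55 */

/* Right-shift the limb array by RSHIFT bits, carrying the next limb in tmp. */
static void shift_with_temps(uint64_t t_d[TOP])
{
    uint64_t val, tmp;
    int i;

    for (val = t_d[0], i = 0; i < TOP - 1; i++) {
        tmp = t_d[i + 1];
        t_d[i] = (val >> RSHIFT) | (tmp << LSHIFT);
        val = tmp;
    }
    t_d[i] = val >> RSHIFT;
}

/* Same shift without temporaries, indexing t_d directly. */
static void shift_indexed(uint64_t t_d[TOP])
{
    int i;

    for (i = 0; i < TOP - 1; i++)
        t_d[i] = (t_d[i] >> RSHIFT) | (t_d[i + 1] << LSHIFT);
    t_d[i] >>= RSHIFT;
}

/* Feed both variants identical data and require identical output. */
static void check_shifts(void)
{
    uint64_t a[TOP], b[TOP];
    int i;

    for (i = 0; i < TOP; i++)
        a[i] = b[i] = UINT64_C(0x9e3779b97f4a7c15) * (uint64_t)(i + 1);

    shift_with_temps(a);
    shift_indexed(b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Calling check_shifts() from a small main() built with the optimizer on should
show whether a given compiler handles the two forms consistently. It only
checks that they agree with each other, not that either matches bn_nist.c.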
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]