>> When compiling OpenSSL optimized on ARM using the Microsoft compiler,
>> the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c).
>> The compiler seems to think that val and temp represent the same item
>> when they are clearly one index apart. I've coded a fix that simply
>> avoids using temporary variables and uses the indices into the t_d
>> array directly. The code is simply a refactor of the existing code
>> and does generate very effective NEON instructions for the loop.
>>
>> The ectest test, which always failed before on ARM on Windows, now
>> succeeds (as do all the other tests).
> 
> I recall looking at the code generated for x86 and not liking the result
> with code similar to what you suggest, which is why those temporary
> values were added. I wonder if you could test the following loop:
> 
>         for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
>                 {
>                 t_d[i] = (val>>BN_NIST_521_RSHIFT |
>                           (val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
>                 }
>         t_d[i] = val>>BN_NIST_521_RSHIFT;

Other compilers issue warnings for that loop, because val is both read
and assigned within a single expression, so try this instead:

        for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
                {
                t_d[i] = (val>>BN_NIST_521_RSHIFT |
                          (tmp=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
                val=tmp;
                }
        t_d[i] = val>>BN_NIST_521_RSHIFT;

If that doesn't work either, then we'll go for removing the temporary values.
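
For anyone who wants to compare the variants outside the tree, here is a
minimal standalone sketch. It is not the bn_nist.c code: the SKETCH_*
constants are my assumed values for a 32-bit BN_ULONG build (17 words,
right shift of 9), uint32_t stands in for BN_ULONG, and the function
names are made up for illustration. It runs the loop with temporaries
and a loop that indexes t_d directly on the same input and checks that
the results match.

```c
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for the bn_nist.c macros on a 32-bit BN_ULONG build. */
#define SKETCH_TOP    17                    /* (521 + 31) / 32 */
#define SKETCH_RSHIFT 9                     /* 521 % 32 */
#define SKETCH_LSHIFT (32 - SKETCH_RSHIFT)
#define SKETCH_MASK2  0xffffffffU

/* Variant with temporaries, mirroring the loop suggested above. */
static void shift_with_tmp(uint32_t *t_d)
{
    uint32_t val, tmp;
    int i;

    for (val = t_d[0], i = 0; i < SKETCH_TOP - 1; i++) {
        t_d[i] = (val >> SKETCH_RSHIFT |
                  (tmp = t_d[i + 1]) << SKETCH_LSHIFT) & SKETCH_MASK2;
        val = tmp;
    }
    t_d[i] = val >> SKETCH_RSHIFT;
}

/* Variant without temporaries: t_d[i] is read before it is overwritten,
 * and t_d[i + 1] has not been modified yet when it is read. */
static void shift_direct(uint32_t *t_d)
{
    int i;

    for (i = 0; i < SKETCH_TOP - 1; i++)
        t_d[i] = (t_d[i] >> SKETCH_RSHIFT |
                  t_d[i + 1] << SKETCH_LSHIFT) & SKETCH_MASK2;
    t_d[i] = t_d[i] >> SKETCH_RSHIFT;
}

/* Returns 1 if both variants produce identical words on a fixed pattern. */
static int shift_variants_agree(void)
{
    uint32_t a[SKETCH_TOP], b[SKETCH_TOP];
    int i;

    for (i = 0; i < SKETCH_TOP; i++)
        a[i] = 0x9e3779b9U * (uint32_t)(i + 1);   /* arbitrary bit pattern */
    memcpy(b, a, sizeof(a));

    shift_with_tmp(a);
    shift_direct(b);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling shift_variants_agree() from a small main() built with the
optimization settings in question should show whether the compiler
miscompiles one of the two loops in isolation. A pass here does not
prove the in-tree code is compiled correctly, since inlining and the
surrounding code may differ.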


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
