Thanks for the submission! It seems that the BN_MONT_CTX-related code (used in crypto/ecdsa for constant-time signing) is entirely independent of the remainder of the patch, and should be considered separately.
Regarding your reference 'S.Gueron and V.Krasnov, "Fast Prime Field Elliptic Curve Cryptography with 256 Bit Primes"' for you NIST P-256 code, is that document available? (Web search only pointed me back to your patch.) I've noticed that for secret-independent constant-time memory access, your code relies on the scattering approach. However http://cryptojedi.org/peter/data/chesrump-20130822.pdf points out that apparently this doesn't actually work as intended. (Dan Bernstein's earlier references: Sections 14, 15 in http://cr.yp.to/papers.html#cachetiming; http://cr.yp.to/mac/athlon.html.) Note that in your code, OPENSSL_ia32cap_P-dependent initialization of global variables is not done in a thread-safe way. How about entirely avoiding this global state, and passing pointers down to the implementations? Your ec_p256_points_mul implementation is much worse than necessary when then input comprises many points (more precisely, more than one point other than the group generator), because you call ec_p256_windowed_mul multiple times separately and add the results. I'd suggest instead to implement this modeled on ec_GFp_nistp256_points_mul instead to benefit from interleaved left-to-right point multiplication. (This avoids the additional point-double operations from the separate point multiplication algorithm executions going through each additional scalar.) Your approach for precomputation also is different (using fewer point operations based on a larger precomputed table than the one we currently use in ec_GFp_nistp256_points_mul) -- that table size still seems appropriate, so keeping that probably makes sense. ______________________________________________________________________ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org