Hello Curves,

I’ve made a rough cut of an API and simple library for groups of order p with 
cofactor removal, called Decaf.  Please tell me if this looks like a useful 
API, what changes you would make to it, and whether you think the project is 
generally worthwhile.  Also, I’d be curious how you think it compares to other 
EC APIs.

The header is at 
http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/include/decaf.h 
<http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/include/decaf.h>
The implementation is at 
http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/src/decaf.c 
<http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/src/decaf.c>

The goal is a compromise between something like TweetNaCl and a full 
implementation.  The code should be simple enough to be easily audited.  But it 
makes a few compromises in favor of speed over simplicity, such as w=2 instead 
of w=1 on scalarmul.  Furthermore, the API is designed so that you can make it 
fast.

The API and library are not quite done yet.  They need more testing, 32-bit 
cleanliness, and most importantly your feedback.  The Decaf library implements 
only scalar and group operations.  Another section is coming later which will 
include keygen, signing, verification, etc.  I plan to use SHAKE as the hash 
function for these.

The curve in the library is currently Ed448-Goldilocks.  I’m thinking that I 
should modify the prefixes (decaf_448_* instead of decaf_*) to allow other 
curves, such E-521, a hypothetical 379- or 389-bit curve, and/or the twist of 
Curve25519.

Overall, large operations (scalarmul) are about 1/3 to 1/2 as fast as with the 
Goldilocks codebase on Haswell.  This is entirely due to optimizations in the 
Goldilocks codebase which are not in Decaf.  Decaf is written entirely in C 
with no intrinsics or vector operations, and isn’t using a dedicated field 
square, dedicated point double, lazy reduction or (currently) even Karatsuba.  
It doesn’t use a readd form (extensible+pniels) in its scalarmul, instead using 
extended points everywhere.  Its scalarmul uses a window of 2 instead of 5 for 
simplicity.  However, the API is designed to allow fast implementations.

Ironically, the Decaf Montgomery scalar code is much simpler than Goldilocks’ 
half-assed Barrett scalar code, and faster too, so I’m planning to replace the 
Goldilocks implementation with the Decaf implementation.


Possible missing functions:

decaf_scalar_decode_var:
Decodes a scalar which may be longer (or shorter) than the default length, for 
RNGs or hashes.

decaf_scalar_invert: returns (0,failure) on 0.

decaf_point_to_hash:
Undo Elligator; needs extra arg to choose which possible output; may fail.

decaf_point_precompute // decaf_point_precomputed_scalarmul:
Combs.  Maybe the API is allowed to not precompute if it doesn’t support it?

Other things?

Cheers,
— Mike Hamburg
_______________________________________________
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves

Reply via email to