Hello Curves,
I’ve made a rough cut of an API and simple library for groups of order p with
cofactor removal, called Decaf. Please tell me if this looks like a useful
API, what changes you would make to it, and whether you think the project is
generally worthwhile. Also, I’d be curious how you think it compares to other
EC APIs.
The header is at
http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/include/decaf.h
<http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/include/decaf.h>
The implementation is at
http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/src/decaf.c
<http://sourceforge.net/p/ed448goldilocks/code/ci/decaf/tree/src/decaf.c>
The goal is a compromise between something like TweetNaCl and a full
implementation. The code should be simple enough to be easily audited. But it
makes a few compromises in favor of speed over simplicity, such as w=2 instead
of w=1 on scalarmul. Furthermore, the API is designed so that you can make it
fast.
The API and library are not quite done yet. They need more testing, 32-bit
cleanliness, and most importantly your feedback. The Decaf library implements
only scalar and group operations. Another section is coming later which will
include keygen, signing, verification, etc. I plan to use SHAKE as the hash
function for these.
The curve in the library is currently Ed448-Goldilocks. I’m thinking that I
should modify the prefixes (decaf_448_* instead of decaf_*) to allow other
curves, such E-521, a hypothetical 379- or 389-bit curve, and/or the twist of
Curve25519.
Overall, large operations (scalarmul) are about 1/3 to 1/2 as fast as with the
Goldilocks codebase on Haswell. This is entirely due to optimizations in the
Goldilocks codebase which are not in Decaf. Decaf is written entirely in C
with no intrinsics or vector operations, and isn’t using a dedicated field
square, dedicated point double, lazy reduction or (currently) even Karatsuba.
It doesn’t use a readd form (extensible+pniels) in its scalarmul, instead using
extended points everywhere. Its scalarmul uses a window of 2 instead of 5 for
simplicity. However, the API is designed to allow fast implementations.
Ironically, the Decaf Montgomery scalar code is much simpler than Goldilocks’
half-assed Barrett scalar code, and faster too, so I’m planning to replace the
Goldilocks implementation with the Decaf implementation.
Possible missing functions:
decaf_scalar_decode_var:
Decodes a scalar which may be longer (or shorter) than the default length, for
RNGs or hashes.
decaf_scalar_invert: returns (0,failure) on 0.
decaf_point_to_hash:
Undo Elligator; needs extra arg to choose which possible output; may fail.
decaf_point_precompute // decaf_point_precomputed_scalarmul:
Combs. Maybe the API is allowed to not precompute if it doesn’t support it?
Other things?
Cheers,
— Mike Hamburg
_______________________________________________
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves