Hi, I have some questions about implementing Curve448, having implemented it for mbedTLS in a pull request here: https://github.com/ARMmbed/mbedtls/pull/348
I've been following the discussion around the IRTF standard, but I'm still not quite sure what the recommended behaviour is for validating public points. In the latest standard, regarding public u-values: "When receiving such an array, implementations of X25519 (but not X448) MUST mask the most-significant bit in the final byte." (https://tools.ietf.org/html/draft-irtf-cfrg-curves-11#section-4.2 This suggests, but doesn't state, that implementations shouldn't do any masking for Curve448, but should instead just reduce the public value mod P448 (or issue an error if it's not in canonical form, probably my preferred implementation choice). Is that correct? Secondly, I have a question about the implementation of the arithmetic itself. I've had a hunt for Mike's various papers and presentations on Ed448-Goldilocks, and I think I understand the rationale for the choice of prime. What I can't find though is a simple do-this-do-that guide for implementers, like NIST publishes for their primes. In the end, I've come up with my own modular reduction formula after playing around with the equations, and it seems to be reasonable (three 448-sized additions), but I wondered if I'm missing any tricks that should make it even quicker. I'm aware that for 32-bit machines, you can split the numbers up into 32-bit limbs and write q=2^32, p = q^14 - q^7 - 1 and come up with a big complex formula to explictly do your reduction in terms of the limb values. I'm more interested in optimising for 64-bit though, so I've chosen to split the input into four "limbs" of 2^224 and do a few bignum additions on those limbs, which will use the bignum library's 64-bit CPU operations, and may come out quicker. The code is certainly simpler, and I don't think anyone deploying Curve448 really cares about performance as long as it's "good enough". Let N = A0 + 2^448 A1, let A1 = B0 + 2^224 B1. Then N (or N+P) mod P is A0+A1+B1+(B0+B1)*2^224. This works on paper, and produces the expected output on the test vectors from the IRTF spec. So my question is whether this is the expected reduction formula, or whether there's some method which is simpler still. Thanks for your help, Nick ---- Nicholas Wilson: nicho...@nicholaswilson.me.uk _______________________________________________ Curves mailing list Curves@moderncrypto.org https://moderncrypto.org/mailman/listinfo/curves