Hi all, There are different generalizations of Ed25519, e.g. "EdDSA" from the Ed25519 authors, "EdDSA" from RFC 8032, XEdDSA/VXEdDSA, and SchnorrQ.
Despite all this I think there's room for improvement. So I'm thinking of factoring the XEdDSA spec into a more generalized EdDSA framework, with a few ideas: (A) "Synthetic nonces" based on hashing the message, private key, and some randomness, to combine the security benefits of randomization with "deterministic"-style nonces. (B) A signing function that takes a user-specified private scalar (instead of Ed25519-style key derivation) to support extensions like XEdDSA where signing uses an existing X25519 private key; or Bitcoin's Hierarchical Deterministic key derivation. (C) Features to help extension to other Schnorr signature variants (like VRFs). (D) A naming scheme and simple rules for combining curves and hashes, e.g - EdDSA_25519_SHA512 = compatible with Ed25519 - VEdDSA_25519_SHA512 = VRF extension - EdDSA_448_BLAKE2b - VEdDSA_P384_SHA3/512, etc (E) XEdDSA would be a thin wrapper around this generalized EdDSA, e.g. "XEdDSA_25519_SHA512" would just be "EdDSA_25519_SHA512" verified with an X25519 public key, and created with an X25519 private key. I'm looking for feedback, and wondering if anyone else would use this. Below I'll quickly sketch the algorithms, then give more rationale. Algorithm sketch ================= Recall Ed25519: # (K,k) : Signer's (public key, private scalar) # X : Signer's master secret key, byte sequence of length 32 # M : Message, byte sequence of arbitrary length # N : Nonce derivation key, byte sequence of length 32 # (R,r) : Nonce (R = public value aka "commitment", r = private value) # B : Elliptic curve base point of order q # q : Subgroup order # h : Schnorr challenge # hash() : SHA-512 ed25519_sign(K, X, M): k || N = hash(X) r = hash(N || M) (mod q) R = r*B h = hash(R || K || M) (mod q) s = r + (k*h) (mod q) return R || s As a first step we'll turn this into what I'll call a "synthetic nonce" EdDSA signing algorithm, for goals (A) and (B): # hash() : Any cryptographic hash with HASHLEN at least 64 bits larger than bitlen(q). # Z : Output from an RNG, byte sequence of length 32 # empty_labelset : byte sequence of length 3 = (0x02, 0x00, 0x00), explained later # pad1, pad2: padding filled with zero bytes to align the next hash input with a hash block (128-byte alignment for SHA-512). synthetic_eddsa_sign((K,k), Z, M): r = hash(B || empty_labelset || Z || pad1 || k || pad2 || empty_labelset || K || M) (mod q) R = r*B h = hash(R || K || M) (mod q) s = r + (k*h) (mod q) return R || s So this is the same as Ed25519 except it takes a private scalar k as argument, and performs a different nonce derivation. To roughly explain the nonce derivation (more later): * The secret nonce r must be uniformly random to an attacker, and must take a different value whenever h does. For security redundancy, this is done in two ways: first, by hashing random data (Z); second, by hashing the same inputs that are hashed into h (K and M) plus the secret key k. We'll discuss the value of this redundancy later. * B is hashed at the beginning to provide an indepent random oracle from the challenge hash h=hash(R...). Also explained more later. * The labelset is used to provide independent random oracles in more complex protocols. It can contain identifiers for the protocol, a user-specified customization label, and identifiers for the hash instance. For collision-resilience we hash the labelset a second time after the secret values (k and Z) have been mixed in. * k is hashed in its own hash block (due to padding) to provide a "cascade" PRF as in HMAC or AMAC: If Z is a random secret value, then r is the output of a PRF keyed by Z. If Z is public (e.g. a bad RNG) then r is the output of a PRF keyed by k. To make this more extensible, we'll rewrite it and expand the "labelset" idea: generalized_eddsa_sign((K,k), Z, M, customization_label): labelset = new_labelset("", customization_label) R, r = commit(labelset, "", (K,k), Z, M) h = challenge(labelset, "", R, K, M) s = prove(r, k, h) return R || s commit(labelset, extra, (K,k), Z, M): r = hash(B || labelset || Z || pad1 || k || pad2 || labelset || K || extra || M) (mod q) R = r*B return (R, r) challenge(labelset, extra, R, K, M): if is_labelset_empty(labelset): return hash(R || K || M) (mod q) else: return hash(B || labelset || R || labelset || K || extra || M) (mod q) prove(r, k, h): return r + (k*h) (mod q) new_labelset(protocol_name, customization_label): labelset[0] = 2 labelset = labelset || len(protocol_name) || protocol_name labelset = labelset || len(customization_label) || customization_label is_labelset_empty(labelset): return len(labelset) == 3 A labelset contains a sequence of byte-strings, with the "empty" labelset containing two zero-length byte strings for the protocol name and a customization label. Generalized EdDSA would use an empty protocol name and customization label by default, thus would be compatible with Ed25519, since the challenge would be h=hash(R || K || M). A non-empty labelset modifies the challenge hash: The labelset is hashed following B for domain separation, and is repeated following R for collision resilience. Thus even if an attacker finds 2 colliding labelsets where hash(B || labelset1) == hash(B || labelset2), a signature under labelset1 won't verify under labelset2. Non-empty labelsets also indicate a more complex protocol that might bind "extra" data into the Schnorr challenge. For example, below is a "VRF" extension ("VEdDSA") which provides a signature with the same security properties as EdDSA, but where every signature is associated with a VRF output which is an unpredictable but verifiably-unique function of the message and keypair. # c = cofactor (i.e. 8 for Ed25519) # v = VRF output, byte sequence of length 32 # Elligator2() maps a 256-bit value to an Ed25519 point generalized_veddsa_sign((K,k), Z, M, customization_label): labelset = new_labelset("VEdDSA_25519_SHA512_Elligator2", customization_label) labelset1 = add_label(labelset, "1") labelset2 = add_label(labelset, "2") labelset3 = add_label(labelset, "3") labelset4 = add_label(labelset, "4") Bv = c*Elligator2(hash(B || labelset1 || K || M)[...32]) Kv = k * Bv R, r = commit(labelset2, (Bv || Kv), (K,k), Z, M) Rv = r * Bv h = challenge(labelset3, (Bv || Kv || Rv), R, K, M) s = prove(r, k, h) v = hash(B || labelset4 || c*Kv)[...32] return (Kv || h || s), v add_label(labelset, new_label): labelset[0]++ return labelset || len(new_label) || new_label Note that the added hashes for Bv and v use the form hash(B || labelset || ...), which allows them to be analyzed as independent random oracles. Note also that we're careful to bind the new values Bv, Kv, and Rv as "extra" data in the Schnorr challenge. We also include Bv and Kv as "extra" data in nonce derivation, following the "deterministic" strategy of making the nonce depend on the same inputs as the challenge hash. We could use extra_data for additional extensions, e.g. adding a trapdoor commitment to add a "designated verifier" property. To extend all this another way: XEdDSA and "XVEdDSA" signatures could be seen as EdDSA and VEdDSA signatures that are created and verified with Montgomery-form keys (X25519 or X448): generalized_xeddsa_sign(k_mont, Z, M, customization_label): K, k = calculate_key_pair(k_mont) return generalized_eddsa_sign((K,k), Z, M, customization_label) generalized_xveddsa_sign(k_mont, Z, M, customization_label): K, k = calculate_key_pair(k_mont) return generalized_veddsa_sign((K,k), Z, M, customization_label) I've skipped over a lot of things, so more rationales below: Rationales =========== Nonce randomization -------------------- Traditionally discrete-log signatures choose r from an RNG. (By "discrete-log signatures" meaning both Schnorr signatures like EdDSA, and DSA/ECDSA which calculate s differently). "Deterministic" signatures have been popularized by Ed25519 for Schnorr, and RFC 6979 for DSA/ECDSA. In a deterministic signature the nonce is a hash or PRF of the message and some secret key which is unique for each signer (e.g. "N" in Ed25519, or the signer's private scalar (our "k") in RFC 6979). Deterministic signatures are intended to be more robust, since they're not vulnerable to RNG failures. Even a small RNG failure could be catastrophic: (a) If a nonce r becomes known to an attacker, the signer's private scalar is solvable as k = (s-r)/h. (b) If r repeats with different h, the private scalar is solvable as k = (s1-s2)/(h1-h2). (c) If the attacker has a few bits of information about many nonces, the private scalar might be solved with more advanced techniques [NONCE]. RNG failures happen, so making the nonce a function of the message and a secret key is a good idea. However randomization *also* provides benefits: (1) Faulty computations could be caused by attackers glitching clock or power lines, or other tampering. Without randomization, any fault in the calculation R=r*B, or h=hash(...), while signing the same message repeatedly, would give multiple signatures with the same r but different h. This would leak the private scalar as in (b), provided the attacker can calculate or guess the faulty h [FAULT]. With randomization, I believe fault attacks are less effective, for example: - Flipping one bit in the private scalar to leak the private scalar in hundreds of faulty signatures [FAULT1] - Randomizing one byte of the private scalar to leak the private scalar in thousands of faulty signatures [FAULT2] - Skipping scalarmult instructions to leak the private scalar in tens of faulty ECDSA signatures [FAULT3] (2) Without randomization, the algorithm is fragile to modifications and misuse. In particular, modifying it to add an extra input to h=... without also adding the input to r=... would leak the private scalar if the same message is signed with a different extra input. So would signing a message twice, once passing in an incorrect public key K (though the synthetic-nonce algorithm fixes this separately by hashing K into r). (3) Without randomization, an attacker can trigger repeated calculations using the same inputs. Consider a sidechannel attacker recording traces of the signer's power consumption or EM radiation. This attacker might request repeated signing of the same message in an attempt to average out noise and get information about secret inputs in the calculations of r, R, or s. Of course, there are other ways besides randomization to mitigate fault and sidechannel risk. Verifying a new signature before returning it, or performing the signature twice and comparing, would help against faults, but makes the signing operation a few times slower. The R = r*B calculation could be "blinded" in different ways to reduce sidechannel risk, but this probably involves either a larger, slower scalar, e.g. R = (r+qj)B for random j; or calculating new blinding / unblinding factors frequently. So random nonces appear to be a simple and efficient way to increase protection. Given this, I'll argue that randomized vs. deterministic is a false dichotomy here. For private-key protection it's not important that nonces be deterministic based on the message; nonces just have to differ for different "h" values. Mixing (key, message) into the nonce can provide this when an RNG fails, and an RNG can provide this when mixing (key, message) fails. Combining these approaches yields what we could call a "synthetic nonce" (by analogy with "synthetic IV" authenticated encryption). Combining these approaches isn't novel. Adam Langley switched OpenSSL's DSA/ECDSA to combined hashing of RNG output and private scalar in 2013, and this remains in OpenSSL plus the BoringSSL and LibreSSL forks [OPENSSL]. Let's consider two counter-arguments: * DJB has pointed out that a malicious RNG might inspect memory and choose RNG outputs to bias the nonce and leak the private key as in (c) above [ENTROPY]. I'd argue that's a far-fetched threat which is outweighed by the above risks. * It could be argued that determinism is valuable apart from private-key protection, e.g. to help testing. The above signing functions specify exactly how the random input Z is used, so are deterministic and testable. Hashing private scalar into nonce ---------------------------------- Having decided to mix a function of (secret key, message) into the nonce, what secret key to use? * The signer's private scalar k as in OpenSSL, XEdDSA, RFC 6979, and libsecp256k1? * A separate key N, as in Ed25519? Using N has a clearer security analysis: We can consider nonce derivation as a PRF with key N, so there's not the "circularity" of deriving r and h from k, then using r to mask s=r+kh. However dealing with N complicates the API, since the signer has to either store and pass N to the signing function, or pass in a single "master key" which derives both k and N. Ed25519 uses such a master key. However, that means Ed25519 can't be used with an external private scalar, such as an X25519 scalar for XEdDSA, or a derived scalar as in Bitcoin's BIP32 Hierarchical Deterministic key derivation. For a generalized EdDSA I think a simple and flexible API which supports external scalars is best, thus the signing function mixes in k. In practice, I think this has security equivalent to using N: * The main Schnorr security proof is the Pointcheval-Stern proof in the Random Oracle Model. In the ROM it doesn't matter whether we query the Random Oracle with k or N. * Hashing the scalar has seen wide deployment for DSA/ECDSA (e.g. OpenSSL, and Bitcoin's libsecp256k1). While DSA/ECDSA isn't identical to Schnorr, the main relevant difference seems to be lack of a ROM proof for DSA/ECDSA. If people are comfortable with this construct in DSA/ECDSA (without a proof), I think they should be comfortable in the Schnorr case (with a proof). * With synthetic nonces, nonce derivation is a PRF keyed by the random input Z. Thus mixing k (or N) can be viewed as a backup strategy, which is only relevant if the RNG fails. * I'm finding it hard to come up with a plausible hash weakness that is likely to be insecure with k but secure with N. (Anyone?) If someone prefers using a nonce derivation key N, that's still easy with a pre-processing step: Z' = hash(N || pad || Z)[...32] generalized_eddsa_sign(..., Z', ...) For example, here's a synthetic-nonce Ed25519 which uses an Ed25519 "master key" X: generalized_ed25519_sign(K, X, Z, M): k || N = hash(X) Z' = hash(N || pad || Z)[...32] return generalized_eddsa_sign((K,k), Z', M, "") Choice of hash --------------- Ed25519 and the RFC 8032 variants tie each curve to a single hash function (e.g. SHA-512 for Ed25519, and SHAKE256 for Ed448). They also specify hash outputs of double the public key size, requiring an unnecessary and unusual 912-bit hash for Ed448 signatures. I think it would be better to allow flexible combination of hashes and EC groups (and even non-EC groups, such as DSA domain parameters). The hash would be required to have output length at least 64 bits larger than the subgroup order q. The extra 64 bits would ensure that any bias in r and h, after the hash output is reduced by q, is very small. This would also match the nonce-derivation requirement in FIPS 186-4 section B.5.1. Finally, this would allow common 512-bit hash functions with Curve448 and smaller curves. Naming ------- The "EdDSA" name should probably be kept as part of a generalized naming scheme. This name has become associated with Schnorr plus hashing the message into nonce. Specifying the group and hash in the name makes the combinations explicit. Consider the below group and hash names, and examples. "25519" = Edwards encoding of Curve25519 from RFC 7448 "448" = Edwards encoding of Curve448 from RFC 7448 "Ed448" = Edwards encoding of Edwards448 from RFC 7448 "Ed448decaf" = Decaf encoding of Ed448 "FourQ" = Compressed encoding of MSR's FourQ "secp256k1" = Compressed encoding of Bitcoin's secp256k1 "P256" = Compressed encoding of NIST P-256 "P384" = Compressed encoding of NIST P-384 "P521" = Compressed encoding of NIST P-521 "SHA512" = SHA-512 "BLAKE2b" = BLAKE2b with output 512 bits "SHA3/512" = SHA3-512 "SHAKE256/N" = SHAKE256 with output N bits Examples: EdDSA_25519_SHA512 (compatible with Ed25519) EdDSA_FourQ_BLAKE2b XEdDSA_448_SHA512 VEdDSA_Ed448decaf_SHAKE256/512 XVEdDSA_25519_BLAKE2b EdDSA_P384_SHA512 VEdDSA_P521_SHAKE256/640 Labelsets ---------- Labelsets have a few uses: * A user might want to create EdDSA, VEdDSA, and other types of signatures from the same key pair. These different variants should use hashes that can be modelled as independent random oracles, so that signatures of one type can't be interpreted as, or affect the security of, other types. * A protocol might involve several hash calls which should be independent random oracles (as in the VRF case). * A user might want to customize signatures so they are bound to a specific application context. The "customization labels" use (3rd bullet) has debatable value. RFC 8032 supports them ("contexts"), but I'm not aware of protocols planning to use them. Because we'd have a label mechanism it would be easy to experiment with customization labels, and leave them empty if they don't prove useful. Labelsets aren't compatible with RFC 8032, but I don't think the RFC 8032 "context" mechanism is that good: * RFC 8032 contexts are inconsistent between Ed25519 and Ed448: Ed25519 doesn't support contexts; Ed448 does, and is analogous to Ed25519ctx (not Ed25519), except with Ed25519ctx one "SHOULD NOT" use zero-length contexts. Also, while Ed25519ctx and Ed448 both use special prefixes for domain separation in the hash function, the details are different. * The RFC 8032 mechanism doesn't provide a space for protocol names or additional labels. * RFC 8032 contexts have weaker security than message contents because they are hashed before R, and thus a hash collision between a pair of contexts (C1, C2) would allow a message signed under context C1 to be verified under context C2. Schnorr is normally more resilient to hash collisions due to R randomizing the hash calculation. Using hash(B || labelset || ...) with a unique labelset provides an independent random oracle, because: * Uniqueness of labelsets provides domain-separation between hashes of the form hash(B || labelset || ...). * Assuming hash(B || labelset || ...) is used for all new hashing in these signatures, the only non-domain-separated case is the EdDSA challenge hash(R || K || M), and hash(B || labelset || ...) if B=R. These ROM hash queries can be answered by the latter (non-empty labelset) oracles without affecting the EdDSA ROM security proof for hash(R || K || M), because: - A simulator for the EdDSA signing oracle won't need to program the hash oracle for the case R=B, since the simulator can choose R at random. - An extractor for the EdDSA private key will also never need to program this case, since a forgery with R=B allows the private key k to be extracted directly via: R = sB - hK (verification equation) B = sB - hK (R = B) 1 = s - hk k = (s-1)/h Conclusion =========== I'd love any feedback on why people would or wouldn't use this, thoughts, ideas for improvement, etc. Thanks to feedback from this list, in particular Brian Smith, for ideas and motivation about refactoring. Thanks also to David Wu, Henry Corrigan-Gibbs, and Samuel Neves, for feedback and discussion. References =========== [NONCE] https://eprint.iacr.org/2014/434.pdf https://eprint.iacr.org/2013/346.pdf [FAULT] Observation due to Benedikt Schmdit, 2016, private conversation. Andrey Jivsov had earlier made a similar, though not identical, observation: https://www.ietf.org/mail-archive/web/cfrg/current/msg06759.html Another paper in 2016 made a similar observation, but mistakenly thought Ed25519 was resistant since the nonce key N wouldn't leak: https://link.springer.com/chapter/10.1007%2F978-3-319-44524-3_11 [FAULT1] https://www.researchgate.net/publication/221291537_Breaking_Public_Key_Cryptosystems_on_Tamper_Resistant_Devices_in_the_Presence_of_Transient_Faults [FAULT2] "Fault attacks on Signature Schemes" ftp://nozdr.ru/biblio/kolxo3/Cs/CsLn/Information%20Security%20and%20Privacy,%209%20conf.,%20ACISP%202004(LNCS3108,%20Springer,%202004)(ISBN%203540223797)(504s).pdf#page=489 [FAULT3] "A Fault Attack on ECDSA" https://dl.acm.org/citation.cfm?id=1731211 [OPENSSL] https://github.com/openssl/openssl/commit/8a99cb29d1f0013243a532bccc1dc70ed678eebe https://github.com/openssl/openssl/commit/190c615d4398cc6c8b61eb7881d7409314529a75 [ENTROPY] https://blog.cr.yp.to/20140205-entropy.html _______________________________________________ Curves mailing list Curves@moderncrypto.org https://moderncrypto.org/mailman/listinfo/curves