Hi Pieter, I wanted to say that I thought this write-up was excellent! And efficiently hashing the UTXO set in this rolling fashion is a very exciting idea!!
Peter R > On May 15, 2017, at 1:01 PM, Pieter Wuille via bitcoin-dev > <bitcoin-dev@lists.linuxfoundation.org> wrote: > > Hello all, > > I would like to discuss a way of computing a UTXO set hash that is > very efficient to update, but does not support any compact proofs of > existence or non-existence. > > Much has been written on the topic of various data structures and > derived hashes for the UTXO/TXO set before (including Alan Reiner's > trust-free lite nodes [1], Peter Todd's TXO MMR commitments [2] [3], > or Bram Cohen's TXO bitfield [4]). They all provide interesting extra > functionality or tradeoffs, but require invasive changes to the P2P > protocol or how wallets work, or force nodes to maintain their > database in a normative fashion. Instead, here I focus on an efficient > hash that supports nothing but comparing two UTXO sets. However, it is > not incompatible with any of those other approaches, so we can gain > some of the advantages of a UTXO hash without adopting something that > may be incompatible with future protocol enhancements. > > 1. Incremental hashing > > Computing a hash of the UTXO set is easy when it does not need > efficient updates, and when we can assume a fixed serialization with a > normative ordering for the data in it - just serialize the whole thing > and hash it. As different software or releases may use different > database models for the UTXO set, a solution that is order-independent > would seem preferable. > > This brings us to the problem of computing a hash of unordered data. > Several approaches that accomplish this through incremental hashing > were suggested in [5], including XHASH, AdHash, and MuHash. XHASH > consists of first hashing all the set elements independently, and > XORing all those hashes together. This is insecure, as Gaussian > elimination can easily find a subset of random hashes that XOR to a > given value. AdHash/MuHash are similar, except addition/multiplication > modulo a large prime are used instead of XOR. Wagner [6] showed that > attacking XHASH or AdHash is an instance of a generalized birthday > problem (called the k-sum problem in his paper, with unrestricted k), > and gives a O(2^(2*sqrt(n)-1)) algorithm to attack it (for n-bit > hashes). As a result, AdHash with 256-bit hashes only has 31 bits of > security. > > Thankfully, [6] also shows that the k-sum problem cannot be > efficiently solved in groups in which the discrete logarithm problem > is hard, as an efficient k-sum solver can be used to compute discrete > logarithms. As a result, MuHash modulo a sufficiently large safe prime > is provably secure under the DL assumption. Common guidelines on > security parameters [7] say that 3072-bit DL has about 128 bits of > security. A final 256-bit hash can be applied to the 3072-bit result > without loss of security to reduce the final size. > > An alternative to multiplication modulo a prime is using an elliptic > curve group. Due to the ECDLP assumption, which the security of > Bitcoin signatures already relies on, this also results in security > against k-sum solving. This approach is used in the Elliptic Curve > Multiset Hash (ECMH) in [8]. For this to work, we must "hash onto a > curve point" in a way that results in points without known discrete > logarithm. The paper suggests using (controversial) binary elliptic > curves to make that operation efficient. If we only consider > secp256k1, one approach is just reading potential X coordinates from a > PRNG until one is found that has a corresponding Y coordinate > according to the curve equation. On average, 2 iterations are needed. > A constant time algorithm to hash onto the curve exists as well [9], > but it is only slightly faster and is much more complicated to > implement. > > AdHash-like constructions with a sufficiently large intermediate hash > can be made secure against Wagner's algorithm, as suggested in [10]. > 4160-bit hashes would be needed for 128 bits of security. When > repetition is allowed, [8] gives a stronger attack against AdHash, > suggesting that as much as 400000 bits are needed. While repetition is > not directly an issue for our use case, it would be nice if > verification software would not be required to check for duplicated > entries. > > 2. Efficient addition and deletion > > Interestingly, both ECMH and MuHash not only support adding set > elements in any order but also deleting in any order. As a result, we > can simply maintain a running sum for the UTXO set as a whole, and > add/subtract when creating/spending an output in it. In the case of > MuHash it is slightly more complicated, as computing an inverse is > relatively expensive. This can be solved by representing the running > value as a fraction, and multiplying created elements into the > numerator and spent elements into the denominator. Only when the final > hash is desired, a single modular inverse and multiplication is needed > to combine the two. > > As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can > in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that > all of this is perfectly parallellizable: each thread can process an > arbitrary subset of the update operations, allowing them to be > efficiently combined later. > > 3. Comparison of approaches > > Numbers below are based on preliminary benchmarks on a single thread > of a i7-6820HQ CPU running at 3.4GHz. > > (1) (MuHash) Multiplying 3072-bit hashes mod 2^3072 - 1103717 (the > largest 3072-bit safe prime). > * Needs a fast modular multiplication/inverse implementation. > * Using SHA512 + ChaCha20 for generating the hashes takes 1.2us per > element. > * Modular multiplication using GMP takes 1.5us per element (2.5us > with a 60-line C+asm implementation). > * 768 bytes for maintaining a running sum (384 for numerator, 384 > for denominator) > * Very common security assumption. Even if the DL assumption would > be broken (but no k-sum algorithm faster than Wagner's is found), this > still maintains 110 bits of security. > > (2) (ECMH) Adding secp256k1 EC points > * Much more complicated than the previous approaches when > implementing from scratch, but almost no extra complexity when ECDSA > secp256k1 signature validation is already implemented. > * Using SHA512 + libsecp256k1's point decompression for generating > the points takes 11us per element on average. > * Addition/subtracting of N points takes 5.25us + 0.25us*N. > * 64 bytes for a running sum. > * Identical security assumption as Bitcoin's signatures. > > Using the numbers above, we find that: > * Computing the hash from just the UTXO set takes (1) 2m15s (2) 9m20s > * Processing all creations and spends in an average block takes (1) > 24ms (2) 100ms > * Processing precomputed per-transaction aggregates in an average > block takes (1) 3ms (2) 0.5ms > > Note that while (2) has higher CPU usage than (1) in general, it has > lower latency when using precomputed per-transaction aggregates. Using > such aggregates is also more feasible as they're only 64 bytes rather > than 768. Because of simplicity, (1) has my preference. > > Overall, these numbers are sufficiently low (note that they can be > parallellized) that it would be reasonable for full nodes and/or other > software to always maintain one of them, and effectively have a > rolling cryptographical checksum of the UTXO set at all times. > > 4. Use cases > > * Replacement for Bitcoin Core's gettxoutsetinfo RPC's hash > computation. This currently requires minutes of I/O and CPU, as it > serializes and hashes the entire UTXO set. A rolling set hash would > make this instant, making the whole RPC much more usable for sanity > checking. > * Assisting in implementation of fast sync methods with known good > blocks/UTXO sets. > * Database consistency checking: by remembering the UTXO set hash of > the past few blocks (computed on the fly), a consistency check can be > done that recomputes it based on the database. > > > [1] https://bitcointalk.org/index.php?topic=88208.0 > [2] > https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.html > [3] > https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/013591.html > [4] > https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013928.html > [5] https://cseweb.ucsd.edu/~mihir/papers/inchash.pdf > [6] https://people.eecs.berkeley.edu/~daw/papers/genbday.html > [7] https://www.keylength.com/ > [8] https://arxiv.org/pdf/1601.06502.pdf > [9] https://www.di.ens.fr/~fouque/pub/latincrypt12.pdf > [10] > http://csrc.nist.gov/groups/ST/hash/sha-3/Aug2014/documents/gligoroski_paper_sha3_2014_workshop.pdf > > Cheers, > > -- > Pieter > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev _______________________________________________ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev