Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-06-10 Thread Jim Posen via bitcoin-dev
I generally like the direction of this proposal in terms of allowing full
nodes to run with a different storage/bandwidth tradeoff. Cory, were this
implemented, would you expect Core to support both operating modes (full
UTXO set and UHS) depending on user configuration, or would UHS be
mandatory?

Also, given that Bram Cohen's TXO bitfield proposal was an inspiration for
this, could you comment on why the UHS is preferable to that approach? An
alternative that goes even further in the direction of more bandwidth, less
storage, would be for nodes to simply maintain a Merkle Mountain Range over
all TXOs in order of creation and a spentness bitfield. Blocks could be
requested with the prev outputs and a Merkle proof linking them into the
MMR root. Since the Merkle proof is deterministic, it could be computed by
archive nodes and miners and saved alongside the block data for relay.
Another benefit of this is the TXO MMR root may be independently useful if
committed into the coinbase transaction.

On Thu, Jun 7, 2018 at 7:02 AM Sjors Provoost via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> eMMC storage, which low end devices often use, come in 2x increments.
> Running a pruned full node on 8 GB is difficult if not impossible (the UTXO
> set peaked at 3.5 GB in January, but a full node stores additional stuff).
>
> However, 16 GB is only €10 more expensive and presumably standard by the
> time this would be rolled out.
>
> On AWS every GB of SSD storage avoided saves $1 per year, not end of the
> world stuff, but not negligible either. Outbound traffic costs $0.10 / GB
> (ignoring free allowance), so when uploading 200 GB per year, the 5% would
> offset $1 of storage cost savings.
>
> The above seems marginal, probably not worth it unless there’s really no
> downside.
>
> What I find attractive about this proposal is the ability to squeeze more
> out of limited RAM (typically only 1 or 2 GB on these low end devices). I’d
> have to test Cory’s branch to see if that actually matters in practice.
>
> It’s also useful to distinguish benefits during initial sync from ongoing
> operation. The former I’ve almost given up on for  low end devices (can
> take weeks), in favor of doing it on a faster computer and copying the
> result. The latter needs far less RAM, so perhaps this proposal doesn’t
> help much there, but that would be useful to measure.
>
> Did you try the recent SHA256 optimizations on your branch?
>
> Sjors
>
> > Op 17 mei 2018, om 18:56 heeft Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> het volgende geschreven:
> >
> > On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev
> >  wrote:
> >> Tl;dr: Rather than storing all unspent outputs, store their hashes.
> >
> > My initial thoughts are it's not _completely_ obvious to me that a 5%
> > ongoing bandwidth increase is actually a win to get something like a
> > 40% reduction in the size of a pruned node (and less than a 1%
> > reduction in an archive node) primarily because I've not seen size of
> > a pruned node cited as a usage limiting factor basically anywhere. I
> > would assume it is a win but wouldn't be shocked to see a careful
> > analysis that concluded it wasn't.
> >
> > But perhaps more interestingly, I think the overhead is not really 5%,
> > but it's 5% measured in the context of the phenomenally inefficient tx
> > mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ).
> > Napkin math on the size of a txn alone tells me it's more like a 25%
> > increase if you just consider size of tx vs size of
> > tx+scriptpubkeys,amounts.  If I'm not missing something there, I think
> > that would get in into a very clear not-win range.
> >
> > On the positive side is that it doesn't change the blockchain
> > datastructure, so it's something implementations could do without
> > marrying the network to it forever.
> >
>
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-06-07 Thread Sjors Provoost via bitcoin-dev
eMMC storage, which low end devices often use, come in 2x increments. Running a 
pruned full node on 8 GB is difficult if not impossible (the UTXO set peaked at 
3.5 GB in January, but a full node stores additional stuff).

However, 16 GB is only €10 more expensive and presumably standard by the time 
this would be rolled out.

On AWS every GB of SSD storage avoided saves $1 per year, not end of the world 
stuff, but not negligible either. Outbound traffic costs $0.10 / GB (ignoring 
free allowance), so when uploading 200 GB per year, the 5% would offset $1 of 
storage cost savings.

The above seems marginal, probably not worth it unless there’s really no 
downside.

What I find attractive about this proposal is the ability to squeeze more out 
of limited RAM (typically only 1 or 2 GB on these low end devices). I’d have to 
test Cory’s branch to see if that actually matters in practice.

It’s also useful to distinguish benefits during initial sync from ongoing 
operation. The former I’ve almost given up on for  low end devices (can take 
weeks), in favor of doing it on a faster computer and copying the result. The 
latter needs far less RAM, so perhaps this proposal doesn’t help much there, 
but that would be useful to measure.

Did you try the recent SHA256 optimizations on your branch?

Sjors

> Op 17 mei 2018, om 18:56 heeft Gregory Maxwell via bitcoin-dev 
>  het volgende geschreven:
> 
> On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev
>  wrote:
>> Tl;dr: Rather than storing all unspent outputs, store their hashes.
> 
> My initial thoughts are it's not _completely_ obvious to me that a 5%
> ongoing bandwidth increase is actually a win to get something like a
> 40% reduction in the size of a pruned node (and less than a 1%
> reduction in an archive node) primarily because I've not seen size of
> a pruned node cited as a usage limiting factor basically anywhere. I
> would assume it is a win but wouldn't be shocked to see a careful
> analysis that concluded it wasn't.
> 
> But perhaps more interestingly, I think the overhead is not really 5%,
> but it's 5% measured in the context of the phenomenally inefficient tx
> mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ).
> Napkin math on the size of a txn alone tells me it's more like a 25%
> increase if you just consider size of tx vs size of
> tx+scriptpubkeys,amounts.  If I'm not missing something there, I think
> that would get in into a very clear not-win range.
> 
> On the positive side is that it doesn't change the blockchain
> datastructure, so it's something implementations could do without
> marrying the network to it forever.
> 

___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-18 Thread Alex Mizrahi via bitcoin-dev
You should read this:
https://bitcointalk.org/index.php?topic=153662.10

On Wed, May 16, 2018 at 7:36 PM, Cory Fields via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Tl;dr: Rather than storing all unspent outputs, store their hashes.
> Untrusted
> peers can supply the full outputs when needed, with very little overhead.
> Any attempt to spoof those outputs would be apparent, as their hashes
> would not
> be present in the hash set. There are many advantages to this, most
> apparently
> in disk and memory savings, as well as a validation speedup. The primary
> disadvantage is a small increase in network traffic. I believe that the
> advantages outweigh the disadvantages.
>
> --
>
> Bitcoin’s unspent transaction output set (usually referred to as “The UTXO
> set”) has two primary roles: providing proof that previous outputs exist
> to be
> spent, and providing the actual previous output data for verification when
> new
> transactions attempts to spend them. These roles are not usually discussed
> independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
> compelling reasons to consider them this way.
>
> To see why, consider running a node with the following changes:
>
> - For each new output, gather all extra data that will be needed for
>   verification when spending it later as an input: the amount,
> scriptPubKey,
>   creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh,
> etc.).
>   Call this the Dereferenced Prevout data.
> - Create a hash from the concatenation of the new outpoint and the
> dereferenced
>   prevout data. Call this a Unspent Transaction Output Hash.
> - Rather than storing the full dereferenced prevout entries in a UTXO set
> as is
>   currently done, instead store their hashes to an Unspent Transaction
> Output
>   Hash Set, or UHS.
> - When relaying a transaction, append the dereferenced prevout for each
> input.
>
> Now when a transaction is received, it contains everything needed for
> verification, including the input amount, height, and coinbaseness, which
> would
> have otherwise required a lookup the UTXO set.
>
> To verify an input's unspentness, again create a hash from the
> concatenation of
> the referenced outpoint and the provided dereferenced prevout, and check
> for
> its presence in the UHS. The hash will only be present if a hash of the
> exact
> same data was previously added to (and not since removed from) the UHS. As
> such, we are protected from a peer attempting to lie about the dereferenced
> prevout data.
>
> ### Some benefits of the UHS model
>
> - Requires no consensus changes, purely a p2p/implementation change.
>
> - UHS is substantially smaller than a full UTXO set (just over half for the
>   main chain, see below). In-memory caching can be much more effective as a
>   result.
>
> - A block’s transactions can be fully verified before doing a potentially
>   expensive database lookup for the previous output data. The UHS can be
>   queried afterwards (or in parallel) to verify previous output inclusion.
>
> - Entire blocks could potentially be verified out-of-order because all
> input
>   data is provided; only the inclusion checks have to be in-order.
> Admittedly
>   this is likely too complicated to be realistic.
>
> - pay-to-pubkey outputs are less burdensome on full nodes, since they use
> no
>   more space on-disk than pay-to-pubkey-hash or pay-to-script-hash.
> Taproot and
>   Graftroot outputs may share the same benefits.
>
> - The burden of holding UTXO data is technically shifted from the
> verifiers to
>   the spender. In reality, full nodes will likely always have a copy as
> well,
>   but conceptually it's a slight improvement to the incentive model.
>
> - Block data from peers can also be used to roll backwards during a reorg.
> This
>   potentially enables an even more aggressive pruning mode.
>
> - UTXO storage size grows exactly linearly with UTXO count, as opposed to
>   growing linearly with UTXO data size. This may be relevant when
> considering
>   new larger output types which would otherwise cause the UTXO Set size to
>   increase more quickly.
>
> - The UHS is a simple set, no need for a key-value database. LevelDB could
>   potentially be dropped as a dependency in some distant future.
>
> - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set
> hashes
>   [1]. Unspent Transaction Output Hashes would simply be mapped to points
> on a
>   curve before adding them to the set.
>
> - With the help of inclusion proofs and rolling hashes, libbitcoinconsensus
>   could potentially safely verify entire blocks. The size of the required
>   proofs would be largely irrelevant as they would be consumed locally.
>
> - Others?
>
> ### TxIn De-duplication
>
> Setting aside the potential benefits, the obvious drawback of using a UHS
> is a
> significant network traffic increase. Fortunately, some properties of
> transactions can be exploited to offset most of the difference.
>
> For q

Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-17 Thread Cory Fields via bitcoin-dev
Matt: That's a good point. I'll do up a chart comparing utxo size at each
block, as well as comparing over-the-wire size for each block. I think the
period of coalescing earlier this year should be a good example of what
you're describing.

Greg: heh, I was wondering how long it would take for someone to point out
that I'm cheating. I avoided using the word "compression", mostly to
side-step having the discussion turning into reworking the wire
serialization. But you're absolutely right, the de-duping benefits are
independent of the UHS use-case.

Cory

On Thu, May 17, 2018, 12:56 PM Gregory Maxwell  wrote:

> On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev
>  wrote:
> > Tl;dr: Rather than storing all unspent outputs, store their hashes.
>
> My initial thoughts are it's not _completely_ obvious to me that a 5%
> ongoing bandwidth increase is actually a win to get something like a
> 40% reduction in the size of a pruned node (and less than a 1%
> reduction in an archive node) primarily because I've not seen size of
> a pruned node cited as a usage limiting factor basically anywhere. I
> would assume it is a win but wouldn't be shocked to see a careful
> analysis that concluded it wasn't.
>
> But perhaps more interestingly, I think the overhead is not really 5%,
> but it's 5% measured in the context of the phenomenally inefficient tx
> mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ).
> Napkin math on the size of a txn alone tells me it's more like a 25%
> increase if you just consider size of tx vs size of
> tx+scriptpubkeys,amounts.  If I'm not missing something there, I think
> that would get in into a very clear not-win range.
>
> On the positive side is that it doesn't change the blockchain
> datastructure, so it's something implementations could do without
> marrying the network to it forever.
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-17 Thread Gregory Maxwell via bitcoin-dev
On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev
 wrote:
> Tl;dr: Rather than storing all unspent outputs, store their hashes.

My initial thoughts are it's not _completely_ obvious to me that a 5%
ongoing bandwidth increase is actually a win to get something like a
40% reduction in the size of a pruned node (and less than a 1%
reduction in an archive node) primarily because I've not seen size of
a pruned node cited as a usage limiting factor basically anywhere. I
would assume it is a win but wouldn't be shocked to see a careful
analysis that concluded it wasn't.

But perhaps more interestingly, I think the overhead is not really 5%,
but it's 5% measured in the context of the phenomenally inefficient tx
mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ).
Napkin math on the size of a txn alone tells me it's more like a 25%
increase if you just consider size of tx vs size of
tx+scriptpubkeys,amounts.  If I'm not missing something there, I think
that would get in into a very clear not-win range.

On the positive side is that it doesn't change the blockchain
datastructure, so it's something implementations could do without
marrying the network to it forever.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-17 Thread Matt Corallo via bitcoin-dev
Hey Cory,

I'm generally a fan of having an option to "prove a block is valid when
relaying it" instead of "just relay it", but I am concerned that this
proposal is overfitting the current UTXO set. Specifically, because UTXO
entries are (roughly) 32 bytes per output plus 32 bytes per transaction
on disk today, a material increase in batching and many-output
transactions may significantly reduce the UTXO-set-size gain in this
proposal while adding complexity to block relay as well as increase the
size of block data relayed, which can have adverse effects on
propagation. I'd love to see your tests re-run on simulated transaction
data with more batching of sends.

Matt

On 05/16/18 12:36, Cory Fields via bitcoin-dev wrote:
> Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrusted
> peers can supply the full outputs when needed, with very little overhead.
> Any attempt to spoof those outputs would be apparent, as their hashes would 
> not
> be present in the hash set. There are many advantages to this, most apparently
> in disk and memory savings, as well as a validation speedup. The primary
> disadvantage is a small increase in network traffic. I believe that the
> advantages outweigh the disadvantages.
> 
> --
> 
> Bitcoin’s unspent transaction output set (usually referred to as “The UTXO
> set”) has two primary roles: providing proof that previous outputs exist to be
> spent, and providing the actual previous output data for verification when new
> transactions attempts to spend them. These roles are not usually discussed
> independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
> compelling reasons to consider them this way.
> 
> To see why, consider running a node with the following changes:
> 
> - For each new output, gather all extra data that will be needed for
>   verification when spending it later as an input: the amount, scriptPubKey,
>   creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.).
>   Call this the Dereferenced Prevout data.
> - Create a hash from the concatenation of the new outpoint and the 
> dereferenced
>   prevout data. Call this a Unspent Transaction Output Hash.
> - Rather than storing the full dereferenced prevout entries in a UTXO set as 
> is
>   currently done, instead store their hashes to an Unspent Transaction Output
>   Hash Set, or UHS.
> - When relaying a transaction, append the dereferenced prevout for each input.
> 
> Now when a transaction is received, it contains everything needed for
> verification, including the input amount, height, and coinbaseness, which 
> would
> have otherwise required a lookup the UTXO set.
> 
> To verify an input's unspentness, again create a hash from the concatenation 
> of
> the referenced outpoint and the provided dereferenced prevout, and check for
> its presence in the UHS. The hash will only be present if a hash of the exact
> same data was previously added to (and not since removed from) the UHS. As
> such, we are protected from a peer attempting to lie about the dereferenced
> prevout data.
> 
> ### Some benefits of the UHS model
> 
> - Requires no consensus changes, purely a p2p/implementation change.
> 
> - UHS is substantially smaller than a full UTXO set (just over half for the
>   main chain, see below). In-memory caching can be much more effective as a
>   result.
> 
> - A block’s transactions can be fully verified before doing a potentially
>   expensive database lookup for the previous output data. The UHS can be
>   queried afterwards (or in parallel) to verify previous output inclusion.
> 
> - Entire blocks could potentially be verified out-of-order because all input
>   data is provided; only the inclusion checks have to be in-order. Admittedly
>   this is likely too complicated to be realistic.
> 
> - pay-to-pubkey outputs are less burdensome on full nodes, since they use no
>   more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Taproot 
> and
>   Graftroot outputs may share the same benefits.
> 
> - The burden of holding UTXO data is technically shifted from the verifiers to
>   the spender. In reality, full nodes will likely always have a copy as well,
>   but conceptually it's a slight improvement to the incentive model.
> 
> - Block data from peers can also be used to roll backwards during a reorg. 
> This
>   potentially enables an even more aggressive pruning mode.
> 
> - UTXO storage size grows exactly linearly with UTXO count, as opposed to
>   growing linearly with UTXO data size. This may be relevant when considering
>   new larger output types which would otherwise cause the UTXO Set size to
>   increase more quickly.
> 
> - The UHS is a simple set, no need for a key-value database. LevelDB could
>   potentially be dropped as a dependency in some distant future.
> 
> - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashes
>   [1]. Unspent Transaction Output Hashes would simply be mapped to points on a
>   curve

[bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-16 Thread Cory Fields via bitcoin-dev
Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrusted
peers can supply the full outputs when needed, with very little overhead.
Any attempt to spoof those outputs would be apparent, as their hashes would not
be present in the hash set. There are many advantages to this, most apparently
in disk and memory savings, as well as a validation speedup. The primary
disadvantage is a small increase in network traffic. I believe that the
advantages outweigh the disadvantages.

--

Bitcoin’s unspent transaction output set (usually referred to as “The UTXO
set”) has two primary roles: providing proof that previous outputs exist to be
spent, and providing the actual previous output data for verification when new
transactions attempts to spend them. These roles are not usually discussed
independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
compelling reasons to consider them this way.

To see why, consider running a node with the following changes:

- For each new output, gather all extra data that will be needed for
  verification when spending it later as an input: the amount, scriptPubKey,
  creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.).
  Call this the Dereferenced Prevout data.
- Create a hash from the concatenation of the new outpoint and the dereferenced
  prevout data. Call this a Unspent Transaction Output Hash.
- Rather than storing the full dereferenced prevout entries in a UTXO set as is
  currently done, instead store their hashes to an Unspent Transaction Output
  Hash Set, or UHS.
- When relaying a transaction, append the dereferenced prevout for each input.

Now when a transaction is received, it contains everything needed for
verification, including the input amount, height, and coinbaseness, which would
have otherwise required a lookup the UTXO set.

To verify an input's unspentness, again create a hash from the concatenation of
the referenced outpoint and the provided dereferenced prevout, and check for
its presence in the UHS. The hash will only be present if a hash of the exact
same data was previously added to (and not since removed from) the UHS. As
such, we are protected from a peer attempting to lie about the dereferenced
prevout data.

### Some benefits of the UHS model

- Requires no consensus changes, purely a p2p/implementation change.

- UHS is substantially smaller than a full UTXO set (just over half for the
  main chain, see below). In-memory caching can be much more effective as a
  result.

- A block’s transactions can be fully verified before doing a potentially
  expensive database lookup for the previous output data. The UHS can be
  queried afterwards (or in parallel) to verify previous output inclusion.

- Entire blocks could potentially be verified out-of-order because all input
  data is provided; only the inclusion checks have to be in-order. Admittedly
  this is likely too complicated to be realistic.

- pay-to-pubkey outputs are less burdensome on full nodes, since they use no
  more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Taproot and
  Graftroot outputs may share the same benefits.

- The burden of holding UTXO data is technically shifted from the verifiers to
  the spender. In reality, full nodes will likely always have a copy as well,
  but conceptually it's a slight improvement to the incentive model.

- Block data from peers can also be used to roll backwards during a reorg. This
  potentially enables an even more aggressive pruning mode.

- UTXO storage size grows exactly linearly with UTXO count, as opposed to
  growing linearly with UTXO data size. This may be relevant when considering
  new larger output types which would otherwise cause the UTXO Set size to
  increase more quickly.

- The UHS is a simple set, no need for a key-value database. LevelDB could
  potentially be dropped as a dependency in some distant future.

- Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashes
  [1]. Unspent Transaction Output Hashes would simply be mapped to points on a
  curve before adding them to the set.

- With the help of inclusion proofs and rolling hashes, libbitcoinconsensus
  could potentially safely verify entire blocks. The size of the required
  proofs would be largely irrelevant as they would be consumed locally.

- Others?

### TxIn De-duplication

Setting aside the potential benefits, the obvious drawback of using a UHS is a
significant network traffic increase. Fortunately, some properties of
transactions can be exploited to offset most of the difference.

For quick reference:

p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG
p2pkh scriptSig:[signature] [pubkey]

p2sh scriptPubKey:  HASH160 [script hash] EQUAL
p2sh scriptSig: [signature(s)] [script]

Notice that if a peer is sending a scriptPubKey and a scriptSig together, as
they would when using a UHS, there would likely be some redundancy. Using a
p2sh output for example, th