Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-03-01 Thread Keagan McClelland via bitcoin-dev
> Personally I consider this counterproductive. Apart from the complexity,
it’s not healthy. And the chain grows linearly with storage cost falling
exponentially, leading to a straightforward conclusion.

The motivation for this change is not to encourage full archival nodes to
prune, but to make it possible for pruned nodes to beef up what kind of
archive they retain. Personally I think using the falling storage costs as
a means of providing access to more users is more important than using it
to justify requiring higher node requirements.

> Something to consider adding to this proposal is to keep the idea of
pruning - i.e. retain a sequentially uninterrupted number of the most
recent blocks.
>
> Many users do not run a node for entirely altruistic reasons - they do
so, at least in part, because it allows them to use their wallets
privately. Without this ability, I think the number of users willing to run
their node in this configuration might be reduced.
>
> Another related thought is to have a decreasing density over blocks over
time as you go backwards towards genesis, in order for the data density of
the storage to match the actual usage of the network, in which (I would
imagine) more recent blocks are more heavily requested than early ones.

Per my above comments, this change is actually capitalizing primarily upon
those who wish to do it for more altruistic reasons. Furthermore, doing
linear block scans when you need to query blocks that you don't keep does
not leak privacy details in the same way that bloom filters do. You are not
signaling to the peer that there is something specific in that block that
you care about, because you don't actually know. You are signalling only
that you do not have that block right now, which from the other parts of
the design you are already leaking. In light of this, I don't think that it
is necessary for the blocks to be in sequential sets at all. If there is no
requirement on them being sequential, uniform randomness will take care of
the density problem automatically.

Keagan


On Mon, Mar 1, 2021 at 4:20 AM Eric Voskuil via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Sun, Feb 28, 2021 at 10:18 AM Leo Wandersleb via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>
>
> > Only headers need to be downloaded sequentially so downloading relevant
> blocks from one node is totally possible with gaps in between.
>
>
>
> In fact this is exactly how libbitcoin v4 works. We download and store
> blocks in parallel. In the case of a restart block gaps are repopulated.
> Given that headers are validated, we go after the most responsive nodes.
> Based on standard deviation, we drop the slowest peers and rebalance load
> to new/empty channels. We make ordered but not necessarily sequential
> requests. There is no distinction between “initial” block download, a
> restart, or a single or few blocks at the top. So it’s referred to as
> continuous parallel block download.
>
>
>
> But we don’t prune. Personally I consider this counterproductive. Apart
> from the complexity, it’s not healthy. And the chain grows linearly with
> storage cost falling exponentially, leading to a straightforward conclusion.
>
>
>
> e
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-03-01 Thread Eric Voskuil via bitcoin-dev
On Sun, Feb 28, 2021 at 10:18 AM Leo Wandersleb via bitcoin-dev 
mailto:bitcoin-dev@lists.linuxfoundation.org> > wrote:

 

> Only headers need to be downloaded sequentially so downloading relevant 
> blocks from one node is totally possible with gaps in between.

 

In fact this is exactly how libbitcoin v4 works. We download and store blocks 
in parallel. In the case of a restart block gaps are repopulated. Given that 
headers are validated, we go after the most responsive nodes. Based on standard 
deviation, we drop the slowest peers and rebalance load to new/empty channels. 
We make ordered but not necessarily sequential requests. There is no 
distinction between “initial” block download, a restart, or a single or few 
blocks at the top. So it’s referred to as continuous parallel block download.

 

But we don’t prune. Personally I consider this counterproductive. Apart from 
the complexity, it’s not healthy. And the chain grows linearly with storage 
cost falling exponentially, leading to a straightforward conclusion.

 

e

___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-03-01 Thread Craig Raw via bitcoin-dev
Something to consider adding to this proposal is to keep the idea of
pruning - i.e. retain a sequentially uninterrupted number of the most
recent blocks.

Many users do not run a node for entirely altruistic reasons - they do so,
at least in part, because it allows them to use their wallets privately.
Without this ability, I think the number of users willing to run their node
in this configuration might be reduced.

Another related thought is to have a decreasing density over blocks over
time as you go backwards towards genesis, in order for the data density of
the storage to match the actual usage of the network, in which (I would
imagine) more recent blocks are more heavily requested than early ones.

Craig

On Sun, Feb 28, 2021 at 10:18 AM Leo Wandersleb via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Only headers need to be downloaded sequentially so downloading relevant
> blocks
> from one node is totally possible with gaps in between.
>
> On 2/27/21 4:10 AM, Igor Cota via bitcoin-dev wrote:
> > Hi Keagan,
> >
> > I had a very similar idea. The only difference being for the node to
> decide on
> > a range of blocks to keep beforehand, rather than making the decision
> > block-by-block like you suggest.
> >
> > I felt the other nodes would be better served by ranges due to the
> sequential
> > nature of IBD. Perhaps this would be computationally lighter as well.
> >
> > I also encourage you to read Ryosuke Abe's paper [1] that proposes a DHT
> > scheme to solve this same problem.
> >
> > Cheers,
> > Igor
> >
> > [1] https://arxiv.org/abs/1902.02174
> >
> > On Fri, 26 Feb 2021 at 21:57, Keagan McClelland via bitcoin-dev
> >  > > wrote:
> >
> > Hi all,
> >
> > I've been thinking for quite some time about the problem of pruned
> nodes
> > and ongoing storage costs for full nodes. One of the things that
> strikes
> > me as odd is that we only really have two settings.
> >
> > A. Prune everything except the most recent blocks, down to the cache
> size
> > B. Keep everything since genesis
> >
> > From my observations and conversations with various folks in the
> > community, they would like to be able to run a "partially" pruned
> node to
> > help bear the load of bootstrapping other nodes and helping with data
> > redundancy in the network, but would prefer to not dedicate hundreds
> of
> > Gigabytes of storage space to the cause.
> >
> > This led me to the idea that a node could randomly prune some of the
> > blocks from history if it passed some predicate. A rough sketch of
> this
> > would look as follows.
> >
> > 1. At node startup, it would generate a random seed, this would be
> unique
> > to the node but not necessary that it be cryptographically secure.
> > 2. In the node configuration it would also carry a "threshold"
> expressed
> > as some percentage of blocks it wanted to keep.
> > 3. As IBD occurs, based off of the threshold, the block hash, and the
> > node's unique seed, the node would either decide to prune the data
> or keep
> > it. The uniqueness of the node's hash should ensure that no block is
> > systematically overrepresented in the set of nodes choosing this
> storage
> > scheme.
> > 4. Once the node's IBD is complete it would advertise this as a peer
> > service, advertising its seed and threshold, so that nodes could
> > deterministically deduce which of its peers had which blocks.
> >
> > The goals are to increase data redundancy in a way that more
> uniformly
> > shares the load across nodes, alleviating some of the pressure of
> full
> > archive nodes on the IBD problem. I am working on a draft BIP for
> this
> > proposal but figured I would submit it as a high level idea in case
> anyone
> > had any feedback on the initial design before I go into specification
> > levels of detail.
> >
> > If you have thoughts on
> >
> > A. The protocol design itself
> > B. The barriers to put this kind of functionality into Core
> >
> > I would love to hear from you,
> >
> > Cheers,
> > Keagan
> > ___
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > 
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> >
> >
> >
> > --
> > *Igor Cota*
> > Codex Apertus d.o.o.
> >
> > ___
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfounda

Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-28 Thread Leo Wandersleb via bitcoin-dev
Only headers need to be downloaded sequentially so downloading relevant blocks
from one node is totally possible with gaps in between.

On 2/27/21 4:10 AM, Igor Cota via bitcoin-dev wrote:
> Hi Keagan,
>
> I had a very similar idea. The only difference being for the node to decide on
> a range of blocks to keep beforehand, rather than making the decision
> block-by-block like you suggest.
>
> I felt the other nodes would be better served by ranges due to the sequential
> nature of IBD. Perhaps this would be computationally lighter as well.
>
> I also encourage you to read Ryosuke Abe's paper [1] that proposes a DHT
> scheme to solve this same problem.
>
> Cheers,
> Igor
>
> [1] https://arxiv.org/abs/1902.02174
>
> On Fri, 26 Feb 2021 at 21:57, Keagan McClelland via bitcoin-dev
>  > wrote:
>
> Hi all,
>
> I've been thinking for quite some time about the problem of pruned nodes
> and ongoing storage costs for full nodes. One of the things that strikes
> me as odd is that we only really have two settings.
>
> A. Prune everything except the most recent blocks, down to the cache size
> B. Keep everything since genesis
>
> From my observations and conversations with various folks in the
> community, they would like to be able to run a "partially" pruned node to
> help bear the load of bootstrapping other nodes and helping with data
> redundancy in the network, but would prefer to not dedicate hundreds of
> Gigabytes of storage space to the cause.
>
> This led me to the idea that a node could randomly prune some of the
> blocks from history if it passed some predicate. A rough sketch of this
> would look as follows.
>
> 1. At node startup, it would generate a random seed, this would be unique
> to the node but not necessary that it be cryptographically secure.
> 2. In the node configuration it would also carry a "threshold" expressed
> as some percentage of blocks it wanted to keep.
> 3. As IBD occurs, based off of the threshold, the block hash, and the
> node's unique seed, the node would either decide to prune the data or keep
> it. The uniqueness of the node's hash should ensure that no block is
> systematically overrepresented in the set of nodes choosing this storage
> scheme.
> 4. Once the node's IBD is complete it would advertise this as a peer
> service, advertising its seed and threshold, so that nodes could
> deterministically deduce which of its peers had which blocks.
>
> The goals are to increase data redundancy in a way that more uniformly
> shares the load across nodes, alleviating some of the pressure of full
> archive nodes on the IBD problem. I am working on a draft BIP for this
> proposal but figured I would submit it as a high level idea in case anyone
> had any feedback on the initial design before I go into specification
> levels of detail.
>
> If you have thoughts on
>
> A. The protocol design itself
> B. The barriers to put this kind of functionality into Core
>
> I would love to hear from you,
>
> Cheers,
> Keagan
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> 
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>
>
> -- 
> *Igor Cota*
> Codex Apertus d.o.o.
>
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-27 Thread David A. Harding via bitcoin-dev
On Sat, Feb 27, 2021 at 09:19:34AM -1000, David A. Harding via bitcoin-dev 
wrote:
> - Discussion thread 1: 
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014186.html

Two particularly useful emails from that thread are:

- https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014199.html
  which links to discussions about the topic prior to 2017, including
  discussion about DoS risks that are more important than the
  fingerprinting risk I mentioned in my previous reply.

- https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014227.html
  which describes a potential way to distribute data with fewer DoS
  risks and less severe fingerprinting than each node storing a
  different set of blocks.

-Dave


signature.asc
Description: PGP signature
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-27 Thread Yuval Kogman via bitcoin-dev
On Fri, 26 Feb 2021 at 20:57, Keagan McClelland via bitcoin-dev
 wrote:

> The goals are to increase data redundancy in a way that more uniformly shares 
> the load across nodes, alleviating some of the pressure of full archive nodes 
> on the IBD problem. I am working on a draft BIP for this proposal but figured 
> I would submit it as a high level idea in case anyone had any feedback on the 
> initial design before I go into specification levels of detail.

You might be interested in an approach (henceforth "SeF") by Swanand
Kadhe, Jichan Chung and Kannan Ramchandran which employs fountain
codes, presented at Scaling Bitcoin 2019:
https://arxiv.org/abs/1906.12140

>From a cursory search it appears there is quite a bit of
related/followup work as well.

The simplest fountain code, the Luby Transform (applied in this work)
the encoder divides a message into smaller chunks, and then constructs
an infinite stream of codewords which are XORs of d randomly selected
chunks where d is sampled from the robust soliton distribution. The
simplest decoder simply XORs new k=1 codewords from the relevant k>1
codewords, and the full message can be recovered with overwhelming
probability and in linear time with a sufficiently large random sample
of codewords from the encoded stream. Note that the decoder must know
which chunks went into a codeword, but this is usually addressed using
pseudorandomness, which has other motivations in an adversarial
setting.

In SeF, the general idea is that "droplet nodes" are pruning nodes
which retain some number (equivalent to your threshold parameter) of
codewords from the encoding concatenated blocks (to obtain a fixed
message size), and serve these to compatible clients. This is more
robust than retaining a random sample of blocks, and also performs
well according to their simulations.

Even if this paper is not directly applicable to your efforts, the
theory of fountain codes should be of interest (many variants exist),
and there is work on fountain codes. There is also some work on
fountain codes in an adversarial setting (Falcon codes) but I'm only
vaguely familiar with it, and if i'm not mistaken most of the
considerations are either already implicitly addressed by the Bitcoin
protocol or explicitly addressed in the SeF paper, whose results also
take into account malicious droplet nodes.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-27 Thread Yuval Kogman via bitcoin-dev
On Sat, 27 Feb 2021 at 22:09, Yuval Kogman  wrote:
> and there is work on fountain codes. There is also some work on

apologies, the first half of this line should have been deleted as it
was worked into the next sentence.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-27 Thread David A. Harding via bitcoin-dev
On Fri, Feb 26, 2021 at 11:40:35AM -0700, Keagan McClelland via bitcoin-dev 
wrote:
> Hi all,

Hi Keagan,

> 4. Once the node's IBD is complete it would advertise this as a peer
> service, advertising its seed and threshold, so that nodes could
> deterministically deduce which of its peers had which blocks.

Although some of the details differed, I believe this general idea of
sharded block storage was previously discussed in the context of BIP159,
which warns:

"Peers may have different prune depths (depending on the peers
configuration, disk space, etc.) which can result in a
fingerprinting weakness (finding the prune depth through getdata
requests). NODE_NETWORK_LIMITED supporting peers SHOULD avoid
leaking the prune depth and therefore not serve blocks deeper than
the signaled NODE_NETWORK_LIMITED threshold (288 blocks)."

- BIP: 
https://github.com/bitcoin/bips/blob/master/bip-0159.mediawiki#counter-measures-for-peer-fingerprinting
- Discussion thread 1: 
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014186.html
- Discussion thread 2: 
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014314.html
- Discussion thread 2, continued: 
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014186.html
- BIP159-related PR, review comments: 
https://github.com/bitcoin/bitcoin/pull/10387

> If you have thoughts on
> 
> A. The protocol design itself
> B. The barriers to put this kind of functionality into Core
> 
> I would love to hear from you,

I think it would be unlikely for any popular node software to adopt a
technique that could make specific nodes easily fingerprintable on an
ongoing basis unless it solved some other urgent problem.  Luke Dashjr's
rough data collection currently shows 5,629 archival listening nodes,[1]
which is a substantial fraction of the roughly 10,000 listening nodes
reported by Addy Yeow,[2] so I don't think we're near the point of
needing to worry about the unavailability of historic blocks.

[1] https://luke.dashjr.org/programs/bitcoin/files/charts/services.html
[2] https://bitnodes.io/dashboard/

However, if there's a reasonable solution to the fingerprinting problem,
I do think node developers would find that very interesting.

-Dave


signature.asc
Description: PGP signature
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-27 Thread Igor Cota via bitcoin-dev
Hi Keagan,

I had a very similar idea. The only difference being for the node to decide
on a range of blocks to keep beforehand, rather than making the decision
block-by-block like you suggest.

I felt the other nodes would be better served by ranges due to the
sequential nature of IBD. Perhaps this would be computationally lighter as
well.

I also encourage you to read Ryosuke Abe's paper [1] that proposes a DHT
scheme to solve this same problem.

Cheers,
Igor

[1] https://arxiv.org/abs/1902.02174

On Fri, 26 Feb 2021 at 21:57, Keagan McClelland via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Hi all,
>
> I've been thinking for quite some time about the problem of pruned nodes
> and ongoing storage costs for full nodes. One of the things that strikes me
> as odd is that we only really have two settings.
>
> A. Prune everything except the most recent blocks, down to the cache size
> B. Keep everything since genesis
>
> From my observations and conversations with various folks in the
> community, they would like to be able to run a "partially" pruned node to
> help bear the load of bootstrapping other nodes and helping with data
> redundancy in the network, but would prefer to not dedicate hundreds of
> Gigabytes of storage space to the cause.
>
> This led me to the idea that a node could randomly prune some of the
> blocks from history if it passed some predicate. A rough sketch of this
> would look as follows.
>
> 1. At node startup, it would generate a random seed, this would be unique
> to the node but not necessary that it be cryptographically secure.
> 2. In the node configuration it would also carry a "threshold" expressed
> as some percentage of blocks it wanted to keep.
> 3. As IBD occurs, based off of the threshold, the block hash, and the
> node's unique seed, the node would either decide to prune the data or keep
> it. The uniqueness of the node's hash should ensure that no block is
> systematically overrepresented in the set of nodes choosing this storage
> scheme.
> 4. Once the node's IBD is complete it would advertise this as a peer
> service, advertising its seed and threshold, so that nodes could
> deterministically deduce which of its peers had which blocks.
>
> The goals are to increase data redundancy in a way that more uniformly
> shares the load across nodes, alleviating some of the pressure of full
> archive nodes on the IBD problem. I am working on a draft BIP for this
> proposal but figured I would submit it as a high level idea in case anyone
> had any feedback on the initial design before I go into specification
> levels of detail.
>
> If you have thoughts on
>
> A. The protocol design itself
> B. The barriers to put this kind of functionality into Core
>
> I would love to hear from you,
>
> Cheers,
> Keagan
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>


-- 
*Igor Cota*
Codex Apertus d.o.o.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


[bitcoin-dev] A design for Probabilistic Partial Pruning

2021-02-26 Thread Keagan McClelland via bitcoin-dev
Hi all,

I've been thinking for quite some time about the problem of pruned nodes
and ongoing storage costs for full nodes. One of the things that strikes me
as odd is that we only really have two settings.

A. Prune everything except the most recent blocks, down to the cache size
B. Keep everything since genesis

>From my observations and conversations with various folks in the community,
they would like to be able to run a "partially" pruned node to help bear
the load of bootstrapping other nodes and helping with data redundancy in
the network, but would prefer to not dedicate hundreds of Gigabytes of
storage space to the cause.

This led me to the idea that a node could randomly prune some of the blocks
from history if it passed some predicate. A rough sketch of this would look
as follows.

1. At node startup, it would generate a random seed, this would be unique
to the node but not necessary that it be cryptographically secure.
2. In the node configuration it would also carry a "threshold" expressed as
some percentage of blocks it wanted to keep.
3. As IBD occurs, based off of the threshold, the block hash, and the
node's unique seed, the node would either decide to prune the data or keep
it. The uniqueness of the node's hash should ensure that no block is
systematically overrepresented in the set of nodes choosing this storage
scheme.
4. Once the node's IBD is complete it would advertise this as a peer
service, advertising its seed and threshold, so that nodes could
deterministically deduce which of its peers had which blocks.

The goals are to increase data redundancy in a way that more uniformly
shares the load across nodes, alleviating some of the pressure of full
archive nodes on the IBD problem. I am working on a draft BIP for this
proposal but figured I would submit it as a high level idea in case anyone
had any feedback on the initial design before I go into specification
levels of detail.

If you have thoughts on

A. The protocol design itself
B. The barriers to put this kind of functionality into Core

I would love to hear from you,

Cheers,
Keagan
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev