Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Robert McKay
Basically the problem with that is that someone could setup a single 
full node that has the blockchain and can answer those challenges and 
then a bunch of other non-full nodes that just proxy any such challenges 
to the single full node.

Rob

On 2015-03-26 23:04, Matt Whitlock wrote:
 Maybe I'm overlooking something, but I've been watching this thread
 with increasing skepticism at the complexity of the offered solution.
 I don't understand why it needs to be so complex. I'd like to offer 
 an
 alternative for your consideration...

 Challenge:
 Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected
 bytes from the block chain)).

 Choose N such that it would be infeasible for the responding node to
 fetch all of the needed blocks in a short amount of time. In other
 words, assume that a node can seek to a given byte in a block stored
 on local disk much faster than it can download the entire block from 
 a
 remote peer. This is almost certainly a safe assumption.

 For example, choose N = 1024. Then the proving node needs to perform
 1024 random reads from local disk. On spinning media, this is likely
 to take somewhere on the order of 15 seconds. Assuming blocks are
 averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
 data. Can 500 MiB be downloaded in 15 seconds? This data transfer 
 rate
 is 280 Mbps. Almost certainly not possible. And if it is, just
 increase N. The challenge also becomes more difficult as average 
 block
 size increases.

 This challenge-response protocol relies on the lack of a partial
 getdata command in the Bitcoin protocol: a node cannot ask for only
 part of a block; it must ask for an entire block. Furthermore, nodes
 could ban other nodes for making too many random requests for blocks.


 On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:

  If I understand correctly, transforming raw blocks to keyed blocks
  takes 512x longer than transforming keyed blocks back to raw. The 
 key
  is public, like the IP, or some other value which perhaps changes 
 less
  frequently.
 
 Yes. I was thinking that the IP could be part of a first layer of
 encryption done to the blockchain data prior to the asymetric 
 operation.
 That way the asymmetric operation can be the same for all users (no
 different primers for different IPs, and then the verifiers does not
 have to verify that a particular p is actually a pseudo-prime 
 suitable
 for P.H. ) and the public exponent can be just 3.

 
  Two protocols can be performed to prove local possession:
  1. (prover and verifier pay a small cost) The verifier sends a 
 seed to
  derive some n random indexes, and the prover must respond with 
 the hash
  of the decrypted blocks within a certain time bound. Suppose that
  decryption of n blocks take 100 msec (+-100 msec of network 
 jitter).
  Then an attacker must have a computer 50 faster to be able to
  consistently cheat. The last 50 blocks should not be part of the 
 list to
  allow nodes to catch-up and encrypt the blocks in background.
 
 
  Can you clarify, the prover is hashing random blocks of 
 *decrypted*,
  as-in raw, blockchain data? What does this prove other than, 
 perhaps,
  fast random IO of the blockchain? (which is useful in its own 
 right,
  e.g. as a way to ensure only full-node IO-bound mining if baked 
 into
  the PoW)
 
  How is the verifier validating the response without possession of 
 the
  full blockchain?

 You're right, It is incorrect. Not the decrypted blocks must be 
 sent,
 but the encrypted blocks. There correct protocol is this:

 1. (prover and verifier pay a small cost) The verifier sends a seed 
 to
 derive some n random indexes, and the prover must respond with the 
 the
 encrypted blocks within a certain time bound. The verifier decrypts
 those blocks to check if they are part of the block-chain.

 But then there is this improvement which allows the verifier do 
 detect
 non full-nodes with much less computation:

 3. (prover pays a small cost, verifier smaller cost) The verifier 
 asks
 the prover to send a Merkle tree root of hashes of encrypted blocks 
 with
 N indexes selected by a psudo-random function seeded by a challenge
 value, where each encrypted-block is previously prefixed with the 
 seed
 before being hashed (e.g. N=100). The verifier receives the Markle 
 Root
 and performs a statistical test on the received information. From 
 the N
 hashes blocks, it chooses M  N (e.g. M = 20), and asks the proved 
 for
 the blocks at these indexes. The prover sends the blocks, the 
 verifier
 validates the blocks by decrypting them and also verifies that the
 Merkle tree was well constructed for those block nodes. This proves 
 with
 high probability that the Merkle tree was built on-the-fly and
 specifically for this challenge-response protocol.

  I also wonder about the effect of spinning disk versus SSD. Seek 
 time
  for 1,000 random reads is either nearly zero or dominating 
 depending
  on the two modes. I 

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Matt Whitlock
I agree that someone could do this, but why is that a problem? Isn't the goal 
of this exercise to ensure more full nodes on the network? In order to be able 
to answer the challenges, an entity would need to be running a full node 
somewhere. Thus, they have contributed at least one additional full node to the 
network. I could certainly see a case for a company to host hundreds of 
lightweight (e.g., EC2) servers all backed by a single copy of the block chain. 
Why force every single machine to have its own copy? All you really need to 
require is that each agency/participant have its own copy.


On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:
 Basically the problem with that is that someone could setup a single 
 full node that has the blockchain and can answer those challenges and 
 then a bunch of other non-full nodes that just proxy any such challenges 
 to the single full node.
 
 Rob
 
 On 2015-03-26 23:04, Matt Whitlock wrote:
  Maybe I'm overlooking something, but I've been watching this thread
  with increasing skepticism at the complexity of the offered solution.
  I don't understand why it needs to be so complex. I'd like to offer 
  an
  alternative for your consideration...
 
  Challenge:
  Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected
  bytes from the block chain)).
 
  Choose N such that it would be infeasible for the responding node to
  fetch all of the needed blocks in a short amount of time. In other
  words, assume that a node can seek to a given byte in a block stored
  on local disk much faster than it can download the entire block from 
  a
  remote peer. This is almost certainly a safe assumption.
 
  For example, choose N = 1024. Then the proving node needs to perform
  1024 random reads from local disk. On spinning media, this is likely
  to take somewhere on the order of 15 seconds. Assuming blocks are
  averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
  data. Can 500 MiB be downloaded in 15 seconds? This data transfer 
  rate
  is 280 Mbps. Almost certainly not possible. And if it is, just
  increase N. The challenge also becomes more difficult as average 
  block
  size increases.
 
  This challenge-response protocol relies on the lack of a partial
  getdata command in the Bitcoin protocol: a node cannot ask for only
  part of a block; it must ask for an entire block. Furthermore, nodes
  could ban other nodes for making too many random requests for blocks.
 
 
  On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
   If I understand correctly, transforming raw blocks to keyed blocks
   takes 512x longer than transforming keyed blocks back to raw. The 
  key
   is public, like the IP, or some other value which perhaps changes 
  less
   frequently.
  
  Yes. I was thinking that the IP could be part of a first layer of
  encryption done to the blockchain data prior to the asymetric 
  operation.
  That way the asymmetric operation can be the same for all users (no
  different primers for different IPs, and then the verifiers does not
  have to verify that a particular p is actually a pseudo-prime 
  suitable
  for P.H. ) and the public exponent can be just 3.
 
  
   Two protocols can be performed to prove local possession:
   1. (prover and verifier pay a small cost) The verifier sends a 
  seed to
   derive some n random indexes, and the prover must respond with 
  the hash
   of the decrypted blocks within a certain time bound. Suppose that
   decryption of n blocks take 100 msec (+-100 msec of network 
  jitter).
   Then an attacker must have a computer 50 faster to be able to
   consistently cheat. The last 50 blocks should not be part of the 
  list to
   allow nodes to catch-up and encrypt the blocks in background.
  
  
   Can you clarify, the prover is hashing random blocks of 
  *decrypted*,
   as-in raw, blockchain data? What does this prove other than, 
  perhaps,
   fast random IO of the blockchain? (which is useful in its own 
  right,
   e.g. as a way to ensure only full-node IO-bound mining if baked 
  into
   the PoW)
  
   How is the verifier validating the response without possession of 
  the
   full blockchain?
 
  You're right, It is incorrect. Not the decrypted blocks must be 
  sent,
  but the encrypted blocks. There correct protocol is this:
 
  1. (prover and verifier pay a small cost) The verifier sends a seed 
  to
  derive some n random indexes, and the prover must respond with the 
  the
  encrypted blocks within a certain time bound. The verifier decrypts
  those blocks to check if they are part of the block-chain.
 
  But then there is this improvement which allows the verifier do 
  detect
  non full-nodes with much less computation:
 
  3. (prover pays a small cost, verifier smaller cost) The verifier 
  asks
  the prover to send a Merkle tree root of hashes of encrypted blocks 
  with
  N indexes selected by a psudo-random function seeded by a challenge
  value, where each 

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Robert McKay
The main motivation is to try and stop a single entity running lots of 
nodes in order to harvest transaction origin IPs. That's what's behind 
this.

Probably the efforts are a waste of time.. if someone has to keep a few 
hundred copies of the blockchain around in order to keep IP specific 
precomputed data around for all the IPs they listen on then they'll just 
buy a handful of 5TB HDs and call it a day.. still some of the ideas 
proposed are quite interesting and might not have much downside.

Rob


On 2015-03-27 15:16, Matt Whitlock wrote:
 I agree that someone could do this, but why is that a problem? Isn't
 the goal of this exercise to ensure more full nodes on the network? 
 In
 order to be able to answer the challenges, an entity would need to be
 running a full node somewhere. Thus, they have contributed at least
 one additional full node to the network. I could certainly see a case
 for a company to host hundreds of lightweight (e.g., EC2) servers all
 backed by a single copy of the block chain. Why force every single
 machine to have its own copy? All you really need to require is that
 each agency/participant have its own copy.


 On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:
 Basically the problem with that is that someone could setup a single
 full node that has the blockchain and can answer those challenges 
 and
 then a bunch of other non-full nodes that just proxy any such 
 challenges
 to the single full node.

 Rob

 On 2015-03-26 23:04, Matt Whitlock wrote:
  Maybe I'm overlooking something, but I've been watching this 
 thread
  with increasing skepticism at the complexity of the offered 
 solution.
  I don't understand why it needs to be so complex. I'd like to 
 offer
  an
  alternative for your consideration...
 
  Challenge:
  Send me: SHA256(SHA256(concatenation of N pseudo-randomly 
 selected
  bytes from the block chain)).
 
  Choose N such that it would be infeasible for the responding node 
 to
  fetch all of the needed blocks in a short amount of time. In other
  words, assume that a node can seek to a given byte in a block 
 stored
  on local disk much faster than it can download the entire block 
 from
  a
  remote peer. This is almost certainly a safe assumption.
 
  For example, choose N = 1024. Then the proving node needs to 
 perform
  1024 random reads from local disk. On spinning media, this is 
 likely
  to take somewhere on the order of 15 seconds. Assuming blocks are
  averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
  data. Can 500 MiB be downloaded in 15 seconds? This data transfer
  rate
  is 280 Mbps. Almost certainly not possible. And if it is, just
  increase N. The challenge also becomes more difficult as average
  block
  size increases.
 
  This challenge-response protocol relies on the lack of a partial
  getdata command in the Bitcoin protocol: a node cannot ask for 
 only
  part of a block; it must ask for an entire block. Furthermore, 
 nodes
  could ban other nodes for making too many random requests for 
 blocks.
 
 
  On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
   If I understand correctly, transforming raw blocks to keyed 
 blocks
   takes 512x longer than transforming keyed blocks back to raw. 
 The
  key
   is public, like the IP, or some other value which perhaps 
 changes
  less
   frequently.
  
  Yes. I was thinking that the IP could be part of a first layer of
  encryption done to the blockchain data prior to the asymetric
  operation.
  That way the asymmetric operation can be the same for all users 
 (no
  different primers for different IPs, and then the verifiers does 
 not
  have to verify that a particular p is actually a pseudo-prime
  suitable
  for P.H. ) and the public exponent can be just 3.
 
  
   Two protocols can be performed to prove local possession:
   1. (prover and verifier pay a small cost) The verifier sends a
  seed to
   derive some n random indexes, and the prover must respond with
  the hash
   of the decrypted blocks within a certain time bound. Suppose 
 that
   decryption of n blocks take 100 msec (+-100 msec of network
  jitter).
   Then an attacker must have a computer 50 faster to be able to
   consistently cheat. The last 50 blocks should not be part of 
 the
  list to
   allow nodes to catch-up and encrypt the blocks in background.
  
  
   Can you clarify, the prover is hashing random blocks of
  *decrypted*,
   as-in raw, blockchain data? What does this prove other than,
  perhaps,
   fast random IO of the blockchain? (which is useful in its own
  right,
   e.g. as a way to ensure only full-node IO-bound mining if baked
  into
   the PoW)
  
   How is the verifier validating the response without possession 
 of
  the
   full blockchain?
 
  You're right, It is incorrect. Not the decrypted blocks must be
  sent,
  but the encrypted blocks. There correct protocol is this:
 
  1. (prover and verifier pay a small cost) The verifier sends a 
 seed
  to
  

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Matt Whitlock
On Friday, 27 March 2015, at 4:57 pm, Wladimir J. van der Laan wrote:
 On Fri, Mar 27, 2015 at 11:16:43AM -0400, Matt Whitlock wrote:
  I agree that someone could do this, but why is that a problem? Isn't the 
  goal of this exercise to ensure more full nodes on the network? In order to 
  be able to answer the challenges, an entity would need to be running a full 
  node somewhere. Thus, they have contributed at least one additional full 
  node to the network. I could certainly see a case for a company to host 
  hundreds of lightweight (e.g., EC2) servers all backed by a single copy of 
  the block chain. Why force every single machine to have its own copy? All 
  you really need to require is that each agency/participant have its own 
  copy.
 
 They would not even have to run one. It could just pass the query to a random 
 other node, and forward its result :)

D'oh. Of course. Thanks. :/

The suggestion about encrypting blocks with a key tied to IP address seems like 
a bad idea, though. Lots of nodes are on dynamic IP addresses. It wouldn't 
really be practical to re-encrypt the entire block chain every time a node's IP 
address changes.

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Matt Whitlock
On Friday, 27 March 2015, at 4:57 pm, Wladimir J. van der Laan wrote:
 On Fri, Mar 27, 2015 at 11:16:43AM -0400, Matt Whitlock wrote:
  I agree that someone could do this, but why is that a problem? Isn't the 
  goal of this exercise to ensure more full nodes on the network? In order to 
  be able to answer the challenges, an entity would need to be running a full 
  node somewhere. Thus, they have contributed at least one additional full 
  node to the network. I could certainly see a case for a company to host 
  hundreds of lightweight (e.g., EC2) servers all backed by a single copy of 
  the block chain. Why force every single machine to have its own copy? All 
  you really need to require is that each agency/participant have its own 
  copy.
 
 They would not even have to run one. It could just pass the query to a random 
 other node, and forward its result :)

Ah, easy way to fix that. In fact, in my first draft of my suggestion, I had 
the answer, but I removed it because I thought it was superfluous.

Challenge:
Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected bytes from 
the block chain | prover's nonce | verifier's nonce)).

The nonces are from the version messages exchanged at connection startup. A 
node can't pass the buck because it can't control the nonce that a random other 
node chooses.

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Jeremy Spilman

 On Mar 27, 2015, at 8:16 AM, Matt Whitlock b...@mattwhitlock.name wrote:
 
 Isn't the goal of this exercise to ensure more full nodes on the network?

Basically we're talking about a form of Sybil defense and better quantifying 
true blockchain resiliency by proof of storage.

In this case the goal is to see if we can prove the number of distinct digital 
copies of the blockchain. This is actually a tricky problem because it will 
(always?) devolve to inferences from response timing, and we are running over a 
heterogenous network with heterogeneous machines.

It would be extremely impressive to achieve a reliable mechanism for discerning 
a local copy exists under these constraints, particularly without false 
positives and false negatives, and without imposing very substantial one-time 
encoding costs, e.g. on par with doubling the verification cost. 

I think while its a difficult cost-benefit analysis, even code complexity 
aside, it's interesting to discuss all the same!

Simply having many unique IP addresses possibly accessing the same unique copy 
provides a different (if any) benefit. E.g. Tor uses IPs as a cost factor, but 
(until recently?) didn't even factor in things like them all being the same 
Class C. 

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Thy Shizzle
If the IP discovery is your main motivation, why don't you introduce some onion 
routing into transactions? That would solve this problem easily, of course 
there is an overhead which will slightly slow down the relay of transactions 
but not significantly, also make it an option not enforced, for those worried 
about IP association.

From: Robert McKaymailto:rob...@mckay.com
Sent: ‎28/‎03/‎2015 2:33 AM
To: Matt Whitlockmailto:b...@mattwhitlock.name
Cc: 
bitcoin-development@lists.sourceforge.netmailto:bitcoin-development@lists.sourceforge.net
Subject: Re: [Bitcoin-development] network disruption as a service and proof 
of local storage

The main motivation is to try and stop a single entity running lots of
nodes in order to harvest transaction origin IPs. That's what's behind
this.

Probably the efforts are a waste of time.. if someone has to keep a few
hundred copies of the blockchain around in order to keep IP specific
precomputed data around for all the IPs they listen on then they'll just
buy a handful of 5TB HDs and call it a day.. still some of the ideas
proposed are quite interesting and might not have much downside.

Rob


On 2015-03-27 15:16, Matt Whitlock wrote:
 I agree that someone could do this, but why is that a problem? Isn't
 the goal of this exercise to ensure more full nodes on the network?
 In
 order to be able to answer the challenges, an entity would need to be
 running a full node somewhere. Thus, they have contributed at least
 one additional full node to the network. I could certainly see a case
 for a company to host hundreds of lightweight (e.g., EC2) servers all
 backed by a single copy of the block chain. Why force every single
 machine to have its own copy? All you really need to require is that
 each agency/participant have its own copy.


 On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:
 Basically the problem with that is that someone could setup a single
 full node that has the blockchain and can answer those challenges
 and
 then a bunch of other non-full nodes that just proxy any such
 challenges
 to the single full node.

 Rob

 On 2015-03-26 23:04, Matt Whitlock wrote:
  Maybe I'm overlooking something, but I've been watching this
 thread
  with increasing skepticism at the complexity of the offered
 solution.
  I don't understand why it needs to be so complex. I'd like to
 offer
  an
  alternative for your consideration...
 
  Challenge:
  Send me: SHA256(SHA256(concatenation of N pseudo-randomly
 selected
  bytes from the block chain)).
 
  Choose N such that it would be infeasible for the responding node
 to
  fetch all of the needed blocks in a short amount of time. In other
  words, assume that a node can seek to a given byte in a block
 stored
  on local disk much faster than it can download the entire block
 from
  a
  remote peer. This is almost certainly a safe assumption.
 
  For example, choose N = 1024. Then the proving node needs to
 perform
  1024 random reads from local disk. On spinning media, this is
 likely
  to take somewhere on the order of 15 seconds. Assuming blocks are
  averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
  data. Can 500 MiB be downloaded in 15 seconds? This data transfer
  rate
  is 280 Mbps. Almost certainly not possible. And if it is, just
  increase N. The challenge also becomes more difficult as average
  block
  size increases.
 
  This challenge-response protocol relies on the lack of a partial
  getdata command in the Bitcoin protocol: a node cannot ask for
 only
  part of a block; it must ask for an entire block. Furthermore,
 nodes
  could ban other nodes for making too many random requests for
 blocks.
 
 
  On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
   If I understand correctly, transforming raw blocks to keyed
 blocks
   takes 512x longer than transforming keyed blocks back to raw.
 The
  key
   is public, like the IP, or some other value which perhaps
 changes
  less
   frequently.
  
  Yes. I was thinking that the IP could be part of a first layer of
  encryption done to the blockchain data prior to the asymetric
  operation.
  That way the asymmetric operation can be the same for all users
 (no
  different primers for different IPs, and then the verifiers does
 not
  have to verify that a particular p is actually a pseudo-prime
  suitable
  for P.H. ) and the public exponent can be just 3.
 
  
   Two protocols can be performed to prove local possession:
   1. (prover and verifier pay a small cost) The verifier sends a
  seed to
   derive some n random indexes, and the prover must respond with
  the hash
   of the decrypted blocks within a certain time bound. Suppose
 that
   decryption of n blocks take 100 msec (+-100 msec of network
  jitter).
   Then an attacker must have a computer 50 faster to be able to
   consistently cheat. The last 50 blocks should not be part of
 the
  list to
   allow nodes to catch-up and encrypt the blocks in