Re: [Bitcoin-development] network disruption as a service and proof of local storage
Basically the problem with that is that someone could set up a single full node that has the blockchain and can answer those challenges, and then a bunch of other non-full nodes that just proxy any such challenges to the single full node.

Rob

On 2015-03-26 23:04, Matt Whitlock wrote:

Maybe I'm overlooking something, but I've been watching this thread with increasing skepticism at the complexity of the offered solution. I don't understand why it needs to be so complex. I'd like to offer an alternative for your consideration...

Challenge: Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected bytes from the block chain)).

Choose N such that it would be infeasible for the responding node to fetch all of the needed blocks in a short amount of time. In other words, assume that a node can seek to a given byte in a block stored on local disk much faster than it can download the entire block from a remote peer. This is almost certainly a safe assumption.

For example, choose N = 1024. Then the proving node needs to perform 1024 random reads from local disk. On spinning media, this is likely to take somewhere on the order of 15 seconds. Assuming blocks are averaging 500 KiB each, 1024 blocks would comprise 500 MiB of data. Can 500 MiB be downloaded in 15 seconds? That would be a transfer rate of 280 Mbps, which is almost certainly not possible. And if it is, just increase N. The challenge also becomes more difficult as the average block size increases.

This challenge-response protocol relies on the lack of a partial getdata command in the Bitcoin protocol: a node cannot ask for only part of a block; it must ask for an entire block. Furthermore, nodes could ban other nodes for making too many random requests for blocks.

On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:

If I understand correctly, transforming raw blocks to keyed blocks takes 512x longer than transforming keyed blocks back to raw. The key is public, like the IP, or some other value which perhaps changes less frequently.

Yes. I was thinking that the IP could be part of a first layer of encryption done to the blockchain data prior to the asymmetric operation. That way the asymmetric operation can be the same for all users (no different primes for different IPs, and then the verifier does not have to verify that a particular p is actually a pseudo-prime suitable for P.H.) and the public exponent can be just 3.
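As a minimal sketch of the time asymmetry Sergio describes here, assuming the asymmetric step is a Pohlig-Hellman-style exponentiation modulo a single public prime (one reading of the "P.H." above): encoding a chunk costs a full-size modular exponentiation, while undoing it is a single cube. The prime, the IP-keyed layer, the chunking, and all names below are illustrative assumptions, not parameters from the thread.

    import hashlib

    # Public prime shared by every node. It must satisfy gcd(3, p - 1) == 1 so
    # that cubing mod p is invertible; the Curve448 field prime has that
    # property and is used here purely as a stand-in.
    P = 2**448 - 2**224 - 1
    D = pow(3, -1, P - 1)   # full-size encoding exponent (Python 3.8+ pow)
    # Anyone can compute D from the public p, so the asymmetry is in time, not secrecy.

    def ip_layer(chunk: int, ip: str) -> int:
        # Cheap first layer keyed to the node's IP, as suggested above.
        mask = int.from_bytes(hashlib.sha256(ip.encode()).digest(), "big")
        return chunk ^ mask

    def encode_chunk(raw_chunk: int, ip: str) -> int:
        # Raw -> keyed copy: one ~448-bit modular exponentiation (the slow side).
        return pow(ip_layer(raw_chunk, ip), D, P)

    def decode_chunk(keyed_chunk: int, ip: str) -> int:
        # Keyed -> raw: a single cube mod p (the fast side the verifier pays).
        return ip_layer(pow(keyed_chunk, 3, P), ip)

    # Blocks would be split into chunks comfortably smaller than p; one short
    # chunk is shown here.
    raw = int.from_bytes(b"block data goes here", "big")
    keyed = encode_chunk(raw, "198.51.100.7")
    assert decode_chunk(keyed, "198.51.100.7") == raw

With a modulus of this size, encoding costs a few hundred modular multiplications against two for decoding, the same order as the 512x figure quoted above; the exact ratio depends on the modulus size and the exponentiation algorithm.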
Two protocols can be performed to prove local possession:

1. (prover and verifier pay a small cost) The verifier sends a seed to derive some n random indexes, and the prover must respond with the hash of the decrypted blocks within a certain time bound. Suppose that decryption of n blocks takes 100 msec (+-100 msec of network jitter). Then an attacker must have a computer 50 times faster to be able to consistently cheat. The last 50 blocks should not be part of the list, to allow nodes to catch up and encrypt the blocks in the background.

Can you clarify: the prover is hashing random blocks of *decrypted*, as in raw, blockchain data? What does this prove other than, perhaps, fast random I/O of the blockchain? (Which is useful in its own right, e.g. as a way to ensure only full-node, I/O-bound mining if baked into the PoW.) How is the verifier validating the response without possession of the full blockchain?

You're right, it is incorrect. It is not the decrypted blocks that must be sent, but the encrypted blocks. The correct protocol is this:

1. (prover and verifier pay a small cost) The verifier sends a seed to derive some n random indexes, and the prover must respond with the encrypted blocks within a certain time bound. The verifier decrypts those blocks to check that they are part of the block chain.

But then there is this improvement, which allows the verifier to detect non-full-nodes with much less computation:

3. (prover pays a small cost, verifier a smaller cost) The verifier asks the prover to send a Merkle tree root of hashes of encrypted blocks at N indexes selected by a pseudo-random function seeded by a challenge value, where each encrypted block is prefixed with the seed before being hashed (e.g. N = 100). The verifier receives the Merkle root and performs a statistical test on the received information. From the N hashed blocks, it chooses M < N (e.g. M = 20) and asks the prover for the blocks at these indexes. The prover sends the blocks; the verifier validates the blocks by decrypting them and also verifies that the Merkle tree was well constructed for those block nodes. This proves with high probability that the Merkle tree was built on the fly and specifically for this challenge-response protocol.

I also wonder about the effect of spinning disk versus SSD. Seek time for 1,000 random reads is either nearly zero or dominating, depending on which of the two is in use.
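Below is a hedged sketch of that Merkle spot-check (the shape of "protocol 3"), reusing the fast decode direction from the sketch above. The index derivation, the tree layout, N, M, and every function name are illustrative assumptions; the thread fixes only the general shape of the exchange.

    import hashlib, random
    from typing import List

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def derive_indexes(seed: bytes, n: int, chain_len: int) -> List[int]:
        # Stand-in PRF; a real protocol would use keyed hashing of the seed.
        rng = random.Random(seed)
        return [rng.randrange(chain_len) for _ in range(n)]

    def build_tree(leaves: List[bytes]) -> List[List[bytes]]:
        levels = [leaves]
        while len(levels[-1]) > 1:
            cur = list(levels[-1])
            if len(cur) % 2:
                cur.append(cur[-1])              # duplicate the last node if odd
            levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
        return levels

    def auth_path(levels: List[List[bytes]], pos: int) -> List[bytes]:
        path = []
        for level in levels[:-1]:
            padded = list(level) + ([level[-1]] if len(level) % 2 else [])
            path.append(padded[pos ^ 1])         # sibling at this level
            pos //= 2
        return path

    def verify_path(leaf: bytes, pos: int, path: List[bytes], root: bytes) -> bool:
        node = leaf
        for sibling in path:
            node = h(node + sibling) if pos % 2 == 0 else h(sibling + node)
            pos //= 2
        return node == root

    # Prover: commit to seed-prefixed hashes of the keyed blocks at N indexes
    # (e.g. N = 100); only the root is sent to the verifier.
    def prover_commit(seed: bytes, keyed_blocks: List[bytes], n: int):
        idxs = derive_indexes(seed, n, len(keyed_blocks))
        leaves = [h(seed + keyed_blocks[i]) for i in idxs]
        levels = build_tree(leaves)
        return idxs, levels, levels[-1][0]

    # Prover: open the M positions the verifier picked (e.g. M = 20 of N = 100).
    def prover_open(idxs, levels, keyed_blocks, positions):
        return {p: (keyed_blocks[idxs[p]], auth_path(levels, p)) for p in positions}

    # Verifier: re-derive the indexes, decrypt each returned block (one cheap
    # cube), compare it against its own copy of the chain, and check the
    # Merkle path back to the committed root.
    def verifier_check(seed, root, n, openings, raw_blocks, decode) -> bool:
        idxs = derive_indexes(seed, n, len(raw_blocks))
        for pos, (keyed, path) in openings.items():
            if not verify_path(h(seed + keyed), pos, path, root):
                return False
            if decode(keyed) != raw_blocks[idxs[pos]]:
                return False
        return True

The earlier timed variant ("protocol 1") is the same exchange without the tree: the verifier requests the n keyed blocks directly, enforces a response deadline, and pays one cheap decryption per block to check them against its own copy.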
Re: [Bitcoin-development] network disruption as a service and proof of local storage
I agree that someone could do this, but why is that a problem? Isn't the goal of this exercise to ensure more full nodes on the network? In order to be able to answer the challenges, an entity would need to be running a full node somewhere. Thus, they have contributed at least one additional full node to the network.

I could certainly see a case for a company to host hundreds of lightweight (e.g., EC2) servers all backed by a single copy of the block chain. Why force every single machine to have its own copy? All you really need to require is that each agency/participant have its own copy.

On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:

Basically the problem with that is that someone could set up a single full node that has the blockchain and can answer those challenges and then a bunch of other non-full nodes that just proxy any such challenges to the single full node.

Rob

[...]
Re: [Bitcoin-development] network disruption as a service and proof of local storage
The main motivation is to try and stop a single entity running lots of nodes in order to harvest transaction origin IPs. That's what's behind this. Probably the efforts are a waste of time.. if someone has to keep a few hundred copies of the blockchain around in order to keep IP-specific precomputed data around for all the IPs they listen on, then they'll just buy a handful of 5TB HDs and call it a day. Still, some of the ideas proposed are quite interesting and might not have much downside.

Rob

On 2015-03-27 15:16, Matt Whitlock wrote:

I agree that someone could do this, but why is that a problem? Isn't the goal of this exercise to ensure more full nodes on the network? In order to be able to answer the challenges, an entity would need to be running a full node somewhere. Thus, they have contributed at least one additional full node to the network. I could certainly see a case for a company to host hundreds of lightweight (e.g., EC2) servers all backed by a single copy of the block chain. Why force every single machine to have its own copy? All you really need to require is that each agency/participant have its own copy.

[...]
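For scale, a quick back-of-envelope check of the storage cost Rob mentions above; the chain size is an approximate early-2015 figure and the copy count is illustrative, neither comes from the thread.

    # "A few hundred" per-IP pre-encoded copies of an ~30 GB chain:
    chain_gb = 30
    copies = 300
    total_tb = chain_gb * copies / 1000
    print(f"{total_tb:.0f} TB, roughly {total_tb / 5:.0f} x 5 TB drives")
    # -> 9 TB, roughly 2 x 5 TB drives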
Re: [Bitcoin-development] network disruption as a service and proof of local storage
On Friday, 27 March 2015, at 4:57 pm, Wladimir J. van der Laan wrote:

On Fri, Mar 27, 2015 at 11:16:43AM -0400, Matt Whitlock wrote:

I agree that someone could do this, but why is that a problem? Isn't the goal of this exercise to ensure more full nodes on the network? In order to be able to answer the challenges, an entity would need to be running a full node somewhere. Thus, they have contributed at least one additional full node to the network. I could certainly see a case for a company to host hundreds of lightweight (e.g., EC2) servers all backed by a single copy of the block chain. Why force every single machine to have its own copy? All you really need to require is that each agency/participant have its own copy.

They would not even have to run one. It could just pass the query to a random other node, and forward its result :)

D'oh. Of course. Thanks. :/

The suggestion about encrypting blocks with a key tied to IP address seems like a bad idea, though. Lots of nodes are on dynamic IP addresses. It wouldn't really be practical to re-encrypt the entire block chain every time a node's IP address changes.
Re: [Bitcoin-development] network disruption as a service and proof of local storage
On Friday, 27 March 2015, at 4:57 pm, Wladimir J. van der Laan wrote:

On Fri, Mar 27, 2015 at 11:16:43AM -0400, Matt Whitlock wrote:

[...]

They would not even have to run one. It could just pass the query to a random other node, and forward its result :)

Ah, easy way to fix that. In fact, in my first draft of my suggestion, I had the answer, but I removed it because I thought it was superfluous.

Challenge: Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected bytes from the block chain | prover's nonce | verifier's nonce)).

The nonces are from the version messages exchanged at connection startup. A node can't pass the buck because it can't control the nonce that a random other node chooses.
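A minimal sketch of this nonce-bound variant, assuming local random access to the stored blocks; the chain-reader interface, the byte-selection rule, N, and the nonce encoding are illustrative assumptions rather than anything specified in the thread.

    import hashlib, random

    def sample_bytes(chain, seed: bytes, n: int) -> bytes:
        # `chain` is a hypothetical local-storage interface with .height() and
        # .read_block(height); each pick is one cheap local disk seek for a
        # node that actually stores the chain.
        rng = random.Random(seed)
        out = bytearray()
        for _ in range(n):
            block = chain.read_block(rng.randrange(chain.height()))
            out.append(block[rng.randrange(len(block))])
        return bytes(out)

    def challenge_response(chain, seed: bytes, prover_nonce: int,
                           verifier_nonce: int, n: int = 1024) -> bytes:
        data = sample_bytes(chain, seed, n)
        # Folding in both connection nonces ties the answer to this particular
        # peer connection, which is what (per the argument above) stops a node
        # from simply relaying the challenge to some other full node.
        payload = (data
                   + prover_nonce.to_bytes(8, "little")
                   + verifier_nonce.to_bytes(8, "little"))
        return hashlib.sha256(hashlib.sha256(payload).digest()).digest()

The verifier runs the same computation against its own copy of the chain and compares digests.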
Re: [Bitcoin-development] network disruption as a service and proof of local storage
On Mar 27, 2015, at 8:16 AM, Matt Whitlock <b...@mattwhitlock.name> wrote:

Isn't the goal of this exercise to ensure more full nodes on the network?

Basically we're talking about a form of Sybil defense and better quantifying true blockchain resiliency by proof of storage. In this case the goal is to see if we can prove the number of distinct digital copies of the blockchain. This is actually a tricky problem because it will (always?) devolve to inferences from response timing, and we are running over a heterogeneous network with heterogeneous machines. It would be extremely impressive to achieve a reliable mechanism for discerning that a local copy exists under these constraints, particularly without false positives and false negatives, and without imposing very substantial one-time encoding costs, e.g. on par with doubling the verification cost. I think that, while it's a difficult cost-benefit analysis, even code complexity aside, it's interesting to discuss all the same!

Simply having many unique IP addresses possibly accessing the same unique copy provides a different (if any) benefit. E.g. Tor uses IPs as a cost factor, but (until recently?) didn't even factor in things like them all being on the same Class C.
Re: [Bitcoin-development] network disruption as a service and proof of local storage
If IP discovery is your main motivation, why don't you introduce some onion routing into transactions? That would solve this problem easily. Of course there is an overhead which will slightly slow down the relay of transactions, but not significantly. Also, make it an option, not enforced, for those worried about IP association.

From: Robert McKay <rob...@mckay.com>
Sent: 28/03/2015 2:33 AM
To: Matt Whitlock <b...@mattwhitlock.name>
Cc: bitcoin-development@lists.sourceforge.net
Subject: Re: [Bitcoin-development] network disruption as a service and proof of local storage

The main motivation is to try and stop a single entity running lots of nodes in order to harvest transaction origin IPs. That's what's behind this. Probably the efforts are a waste of time.. if someone has to keep a few hundred copies of the blockchain around in order to keep IP-specific precomputed data around for all the IPs they listen on, then they'll just buy a handful of 5TB HDs and call it a day. Still, some of the ideas proposed are quite interesting and might not have much downside.

Rob

On 2015-03-27 15:16, Matt Whitlock wrote:

[...]