Re: [Bitcoin-development] Privacy and blockchain data
On Tue, Jan 07, 2014 at 10:34:46PM -0800, Jeremy Spilman wrote:
> > 2) Common prefixes: Generate addresses such that for a given wallet they
> > all share a fixed prefix. The length of that prefix determines the
> > anonymity set and associated privacy/bandwidth tradeoff, which
> > remains a fixed ratio of all transactions for the life of the
> > wallet.
>
> Interesting thought to make the privacy/bandwidth trade-off using
> vanitygen and prefix filters.
>
> But doesn't this effectively expand the universe of potential spies
> from 'the global attacker' who is watching your SPV queries, to
> simply 'the globe' -- anyone with a copy of the blockchain?

It's a trade-off. Most people are going to use public peers for their
SPV nodes - they're not going to run full nodes. They also are going to
want to limit how much bandwidth they use to sync their wallets; if they
don't care they can use a very short, or no, prefix and the problem goes
away.

If you make that bandwidth/privacy trade-off by using very specific
filters and non-specific addresses then you have a situation where those
public peers are learning a lot of potentially valuable data. It's easy
to imagine, say, the IRS being willing to pay for data on how many
Bitcoins people have in their wallets to try to catch tax cheats for
instance, and that can easily fund a lot of fast and high-quality peers
that don't advertise the fact that they're selling data on the contents
of your wallet.

On the other hand if you use non-specific filters, and prefixed
addresses for incoming payments, then you're not leaking high-quality
information to anyone. I think this makes for a more robust Bitcoin
system, especially as we need things like CoinJoin for privacy that make
*everyone's* privacy matter to you; CoinJoin could easily be defeated by
acquiring lots of good info on the contents of wallets through SPV
queries.
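To make the prefix idea concrete, here is a minimal sketch of how a
wallet might grind keys until their hash carries a fixed n-bit prefix,
and how a peer would test a txout against that prefix. This is an
illustration only, not code from this thread: the function names are
hypothetical, and SHA-256 of random bytes stands in for the real
key-to-scriptPubKey derivation that vanitygen works against.

```python
import hashlib
import os


def prefix_bits(data: bytes, nbits: int) -> int:
    """Return the leading nbits of data as an integer."""
    if not 0 <= nbits <= len(data) * 8:
        raise ValueError("nbits out of range for data")
    return int.from_bytes(data, "big") >> (len(data) * 8 - nbits)


def matches_prefix(script_hash: bytes, wallet_prefix: int, nbits: int) -> bool:
    """What a prefix filter checks for each txout it sees."""
    return prefix_bits(script_hash, nbits) == wallet_prefix


def grind_key(wallet_prefix: int, nbits: int):
    """Vanitygen-style brute force (stand-in): try random candidate keys
    until the hash starts with the wallet's prefix.

    ASSUMPTION: sha256(random bytes) replaces the real derivation of a
    scriptPubKey from a key pair; only the search loop is the point here.
    """
    attempts = 0
    while True:
        attempts += 1
        candidate = os.urandom(32)
        digest = hashlib.sha256(candidate).digest()
        if matches_prefix(digest, wallet_prefix, nbits):
            return candidate, digest, attempts


def expected_match_fraction(nbits: int) -> float:
    """Fraction of all txouts a prefix of this length is expected to match,
    assuming hashes are uniformly distributed."""
    return 1 / 2 ** nbits
```

Each extra prefix bit doubles the expected grinding work and halves both
the bandwidth and the anonymity set, which is the fixed ratio described
in the quoted proposal.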
> Some stats on UTXO set size: (slightly stale -- as of block 270733)
>
> 7.4m unspent outputs
> 2.2m transactions with unspent outputs
> 2.1m unique unspent scriptPubKeys
>
> Side note: the top 1,000 scriptPubKeys have 10% of all unspent outputs.
>
> Let's say you use an 8-bit prefix (1/256) that would be ~10,000
> transactions in the UTXO you would be monitoring. But if I knew a
> few different days / time-periods you transacted, I could figure out
> your prefix.

Actually UTXO isn't the right way to look at this; prefix filters would
almost certainly be matched against all txouts in blocks. Or put another
way, UTXO isn't the right way to look at it because the attacker will
have some rough idea of the time period, and wants to know about
transactions made.

> Of course, anyone you transact with would know your prefix outright.

Well what good, in your example, is it for the attacker to go from "I
know my target gets a paycheck every two weeks for $x" to "His wallet
prefix is abcd with y% probability"? Even once you learn the prefix of
your target's wallet, what funds they actually own is still embedded in
a much larger anonymity set of hundreds to thousands of transactions
that had nothing to do with them.

> Wouldn't this also allow obvious identification of spend versus
> change addresses in a transaction?

No, I specifically said that you don't want to use prefixes with change
txouts for that reason. Fortunately while the set of all scriptPubKeys
ever used for change txouts will grow over time, as long as you are not
watching for new payments on any key in that set you only need to query
for the ones that still have funds on them, and that's only because you
want to be able to detect unauthorized spends of them.

--
'peter'[:-1]@petertodd.org
00028a5c9edabc9697fe96405f667be1d6d558d1db21d49b8857
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] Privacy and blockchain data
> 2) Common prefixes: Generate addresses such that for a given wallet they
> all share a fixed prefix. The length of that prefix determines the
> anonymity set and associated privacy/bandwidth tradeoff, which
> remains a fixed ratio of all transactions for the life of the
> wallet.

Interesting thought to make the privacy/bandwidth trade-off using
vanitygen and prefix filters.

But doesn't this effectively expand the universe of potential spies
from 'the global attacker' who is watching your SPV queries, to
simply 'the globe' -- anyone with a copy of the blockchain?

Some stats on UTXO set size: (slightly stale -- as of block 270733)

7.4m unspent outputs
2.2m transactions with unspent outputs
2.1m unique unspent scriptPubKeys

Side note: the top 1,000 scriptPubKeys have 10% of all unspent outputs.

Let's say you use an 8-bit prefix (1/256) that would be ~10,000
transactions in the UTXO you would be monitoring. But if I knew a
few different days / time-periods you transacted, I could figure out
your prefix.

Of course, anyone you transact with would know your prefix outright.

Wouldn't this also allow obvious identification of spend versus
change addresses in a transaction?
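As a rough check on the 8-bit figure, the arithmetic can be worked
through directly from the stats quoted in this message. The sketch below
assumes prefixes are uniformly distributed across outputs, which is only
an approximation given how concentrated the top scriptPubKeys are; the
function and constant names are hypothetical.

```python
# Stats quoted in this message (as of block 270733).
UNSPENT_OUTPUTS = 7_400_000
TXS_WITH_UNSPENT_OUTPUTS = 2_200_000
UNIQUE_UNSPENT_SCRIPTPUBKEYS = 2_100_000


def expected_matches(population: int, nbits: int) -> float:
    """Expected number of items sharing one n-bit prefix.

    ASSUMPTION: prefixes are uniformly distributed over the population.
    """
    return population / 2 ** nbits


if __name__ == "__main__":
    for nbits in (4, 8, 12, 16):
        print(
            f"{nbits:2d}-bit prefix: "
            f"{expected_matches(UNSPENT_OUTPUTS, nbits):>10.1f} outputs, "
            f"{expected_matches(TXS_WITH_UNSPENT_OUTPUTS, nbits):>10.1f} txs, "
            f"{expected_matches(UNIQUE_UNSPENT_SCRIPTPUBKEYS, nbits):>10.1f} scriptPubKeys"
        )
```

Under that assumption an 8-bit prefix matches roughly 8,600 of the 2.2m
transactions with unspent outputs, in line with the ~10,000 estimate
above, and roughly 28,900 of the 7.4m unspent outputs.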
[Bitcoin-development] Privacy and blockchain data
* Summary

CoinJoin, CoinSwap and similar technologies improve your privacy by
making sure information about what coins you own doesn't make it into
the blockchain, but syncing your wallet is a privacy risk in itself and
can easily leak that same info. Here's an overview of that risk, how to
quantify it, and how to reduce it efficiently.

* Background

In the most general sense a Bitcoin wallet is a collection of one or
more scriptPubKeys, often known as addresses.(*) The basic purpose of
the wallet is to maintain the set of all transaction outputs (txouts)
matching the scriptPubKeys in the wallet. Secondary to that purpose is
to maintain the set of all transactions associated with scriptPubKeys in
the wallet; almost all (all?) wallet software maintains transaction
information rather than only txout data. Usually, but not always, the
wallet will have some mechanism to spend transaction outputs, creating
new transactions. (If the wallet doesn't, it is referred to as a
watch-only wallet.)

Given a full set of blockchain data the task of keeping the set of all
relevant transactions and txouts up-to-date is simple: scan the
blockchain for the relevant data. The challenge is to devise systems
where wallets can be kept up to date without this requirement in a way
that is secure, efficient, scalable, and meets the user's privacy
requirements.

*) Alternatively addresses can be thought of as instructions to the
payor as to how to generate a scriptPubKey that the payee can spend, a
subtly different concept.

* Threat Model and Goals

Currently the Bitcoin network consists of a large (low thousands) number
of allegedly independent nodes. There is no mechanism to prevent an
attacker from sybil attacking the network other than the availability of
IP addresses. This protection is made even weaker by the difficulty of
being sure you have a non-sybilled list of nodes to connect to; IP
addresses are passed gossip-style with no authentication.
From a privacy perspective we are conservative and assume an active,
internal, and global attacker - using the terminology of Diaz et
al.(1) - that controls up to 100% of the nodes you are connected to.
With regard to retrieval of blockchain data we can use Sweeney's notion
of k-anonymity(2) where the privacy-sensitive data for an individual is
obscured by its inclusion in the data of a large set of individuals, the
anonymity set.

* Basic Functionality

With regard to blockchain data we have the following basic functions:

** Spending funds

The user creates a transaction and gets it to miners by some method,
usually the P2P network although also possibly by direct submission.
Either way privacy can be achieved through a mix network such as Tor
and/or relaying other users' transactions so as to embed yours within a
larger anonymity set. In some cases payment protocols can shift the
problem to the recipient of the funds. Using CoinJoin also helps
increase the anonymity set.

Usually the sender will want to determine when the transaction confirms;
once the transaction has confirmed, modulo a reorganization, the
confirmation count can only increase. Transaction mutability and
double-spends by malicious CoinJoin participants complicate the task of
detecting confirmation: ideally we could simply query for the presence
of a given txid in each new block, however the transaction could be
mutated, changing the txid. The simplest way to detect confirmation is
then to query for spends of the txouts spent by the transaction.

** Receiving new funds

While in the future payment protocols may give recipients transaction
information directly it is most likely that wallets will continue to
have to query peers for new transactions paying scriptPubKeys under the
user's control for the foreseeable future.

** Detection of unauthorized spends

Users want early detection of private key compromise, accomplished by
querying blockchain data for spends from txouts in their wallets.
This has implications for how change must be handled, discussed below.

* Scalability/Efficiency

The total work done by the system as a whole for all queries given some
number of transactions n is the scalability of the scheme. In addition,
scalability, and privacy in some cases, is improved if work can be
easily spread out across multiple nodes at both a per-block and
within-block level.

* Reliability/Robustness

Deterministic wallets using BIP32 or similar, where all private keys are
derived from a fixed seed, have proven to be extremely popular with
users for their simple backup model. While losing transaction metadata
after a data-loss event is unfortunate, losing access to all funds is a
disaster. Any address generation scheme must take this into account and
make it possible for all funds to be recovered quickly and efficiently
from blockchain data. Preserving privacy during this recovery is a
consideration, but 100% recovery of funds should not be sacrificed for
that goal.

* Query schemes

** Bloom fil