Re: [bitcoin-dev] Non-equal value CoinJoins. Opinions.

2019-12-28 Thread Yuval Kogman via bitcoin-dev
Hi,

On Sat, 28 Dec 2019 at 01:29, nopara73 via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

I haven't read the whole thing in detail (and fwiw, I don't think I will by
this point), but I do want to respond to section about the combinatorics as
well as the proof, since both the premises and the implications don't seem
very solid to me, especially in light of the other replies in this thread.

It appears to be a step up from the Knapsack paper in terms of the
specificity of a concrete mixing protocol (which again, I did not
scrutinize, but see below), but a regression in terms of privacy (see other
replies), which even in the Knapsack paper's approach raises some concerns:

Now, there are 100!/(10!)^10 ~= 10^92 ways to partition the inputs into a
> list of 10 sets of 10 inputs, but only a tiny fraction of these partitions
> will produce the precise output list.
>
In the equal amount case, the search space of possible interpretations with
n = # inputs + # indistinguishable outputs is proportional to the nth Bell
number, i.e. it's exponential in the size of the transaction, which is an
inviting intuition. But this is an *upper* bound on the difficulty of
deanonymization, given no additional information.

This quantitative framing is potentially misleading because:

1. attributing inputs/outputs (sub-transactions in the Knapsack paper's
terminology) is arguably not a search problem, but an optimization problem,
since approximate results are still partly useful to the adversary
2. there are many computational strategies, heuristics, etc that in
practice can make this more efficient than brute force[1], so framing it
that as a security parameter doesn't sit right with me
3. as individual sub-transactions are identified (for example using out of
band information), the computational challenge also *drops* exponentially
fast

Additionally (though is a broader criticism of CoinJoin based privacy and
not specific to unequal amounts, and in particular refers to ZmnSCPxj's
assertion of 0 linkability) I am very worried that perspectives that focus
on linkability information revealed by a single coinjoin transaction in
isolation. This problem was alluded in the document, to but I don't see
that it was addressed. Naively the post/pre mix transaction graph would
seem to present a computationally much harder problem when looking at the
combinatorics through the same lens, but reality it can also be used to
place many constraints on valid partitions/sub-transaction assignments for
a single transaction with equal amounts. The trivial example is post mix
linking of outputs, but there are many other ways to draw inferences or
eliminate possible interpretations of a single transaction based on its
wider context, which in turn may be used to attack other transactions.


> Based on the example above, we can see that not only are there a huge
> number of partitions, but that even with a fast algorithm that could find
> matching partitions, it would produce around 10^20 possible valid
> configurations. With 10^20 possibilities, there is essentially no linkage.
>
This is a better framing, but still doesn't address my third bullet, since
"Attacks always get better; they never get worse." In other words
"essentially no linkage" due to multiple possible interpretation is still
strictly more meaningful if you can add constraints out of band.

To be fair in equal amount CoinJoins this is also the case, but it's a much
simpler model to consider in the context of other privacy leak vectors
(e.g. transaction graph connectivity beyond a single coinjoin, wallet
fingerprinting, temporal patterns, network privacy leaks, etc etc), since
analyzing your level of exposure is *also* complicated by unequal amounts,
in other words higher chance of privacy leaks due to misuse, or ignorance
of some of the implications under intended use. Thinking through these
implications is much easier when the information content in the amounts is
minimized.

The Cash Fusion scheme actually extends this obfuscation even further. Not
> only can players bring many inputs, they can also have multiple outputs
>
And, quoting another section:

Unfortunately, the production of equal-amount coins is impractical for
> various reasons. Foremost, it has a "toxic waste"
>

I'm still cautiously optimistic about the potential of multiple
inputs/outputs per user (c.f. 3-phase chaumian CoinJoin ideas we've
previously discussed in the context of Wasabi, though I don't recall any
public discussion I can link to, sorry list), but with the additional
assumption of amounts with small popcounts/Hamming weights (e.g. only
amounts that are 2^n sat in size, or based on 1-2-5 series, and for a
rationale see Ethan's reply).

Unfortunately this trades off that "toxic waste" problem for a very large
on chain footprint (e.g. if the popcount of the amount of a wallet is
limited to 1, the number of inputs and change outputs required in the worst
case is proportional to log of the pay

Re: [bitcoin-dev] Non-equal value CoinJoins. Opinions.

2019-12-28 Thread ZmnSCPxj via bitcoin-dev
Good morning Adam,

> The CashFusion research came out of the Bitcoin Cash camp, thus this probably 
> went under the radar of many of you. I would like to ask your opinions on the 
> research's claim that, if non-equal value coinjoins can be really relied on 
> for privacy or not.
>
> (Btw, there were also similar ideas in the Knapsack paper in 2017: 
> https://www.comsys.rwth-aachen.de/fileadmin/papers/2017/2017-maurer-trustcom-coinjoin.pdf
>  ) 
>
> https://github.com/cashshuffle/spec/blob/master/CASHFUSION.md#avoiding-amount-linkages-through-combinatorics
>   
>
> I copy the most relevant paragraphs here:
>
>   -BEGIN QUOTE - 
>  
>
> Consider a transaction where 10 people have each brought 10 inputs of 
> arbitary amounts in the neighborhood of ~0.1 BCH. One input might be 
> 0.03771049 BCH; the next might be 0.24881232 BCH, etc. All parties have 
> chosen to consolidate their coins, so the transaction has 10 outputs of 
> around 1 BCH. So the transaction has 100 inputs, and 10 outputs. The first 
> output might be 0.91128495, the next could be 1.79783710, etc.
>
> Now, there are 100!/(10!)^10 ~= 10^92 ways to partition the inputs into a 
> list of 10 sets of 10 inputs, but only a tiny fraction of these partitions 
> will produce the precise output list. So, how many ways produce this exact 
> output list? We can estimate with some napkin math. First, recognize that for 
> each partitioning, each output will typically land in a range of ~10^8 
> discrete possibilities (around 1 BCH wide, with a 0.0001 BCH resolution). 
> The first 9 outputs all have this range of possibilities, and the last will 
> be constrained by the others. So, the 10^92 possibilies will land somewhere 
> within a 9-dimensional grid that cointains (10^8)^9=10^72 possible distinct 
> sites, one site which is our actual output list. Since we are stuffing 10^92 
> possibilties into a grid that contains only 10^72 sites, then this means on 
> average, each site will have 10^20 possibilities.
>
> Based on the example above, we can see that not only are there a huge number 
> of partitions, but that even with a fast algorithm that could find matching 
> partitions, it would produce around 10^20 possible valid configurations. With 
> 10^20 possibilities, there is essentially no linkage. The Cash Fusion scheme 
> actually extends this obfuscation even further. Not only can players bring 
> many inputs, they can also have multiple outputs.
>
> -END QUOTE -
> --


It seems to me that most users will not have nearly the same output of "around 
1 BTC" anyway if you deploy this on a real live mainnet, and if your math 
requires that you have "around 1 BTC" outputs per user. you might as well just 
use equal-valued CoinJoins, where the equal-valued outputs at least are 
completely unlinked from the inputs.

Indeed, the change outputs of an equal-valued CoinJoin would have similar 
analyses to CashFusion, since the same analysis "around 1 BTC" can be performed 
with the CoinJoin change outputs "around 0 BTC".

* You can always transform a CashFusion transaction whose outputs are "around 1 
BTC" to a CoinJoin transaction with equal-valued outputs and some change 
outputs, with the equal-valued outputs having equal value to the smallest 
CashFusion output.
 * e.g. if you have a CashFusion transaction with outputs 1.0, 1.1, 0.99, you 
could transform that to a CoinJoin with 0.99, 0.99, 0.99, 0.01, 0.11 outputs.
* Conversely, you can transform an equal-valued CoinJoin transaction to a 
CashFusion transaction using the same technique.
* That implies that the change outputs of an equal-valued CoinJoin have the 
same linkability as the outputs of the equivalent CashFusion transaction.
* At least with equal-valued CoinJoin, the equal-valued outputs have 0 
linkability with inputs (at least with only that transaction in isolation).
  The same cannot be said of CashFusion, because the value involved is just in 
a single UTXO.

Regards,
ZmnSCPxj
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Non-equal value CoinJoins. Opinions.

2019-12-28 Thread Ethan Heilman via bitcoin-dev
I'm only going to talk about cashfusion and not the knapsack paper.

The language they use to describe the cashfusion protocol is very
broad and could describe many things. Because it is hard so vague I
don't want to dismiss the cashfusion approach out of hand. For
instance they say: "inputs of arbitary amounts in the neighborhood of
~0.1 BCH" what exactly does this mean?

Attack 1:
If we assume arbitrary means any precision then a trivial attack is
possible. Consider the case where one of the inputs has more precision
than any other input. This allows an attacker to trivially break the
privacy of that input:

Lets look at a toy example that takes 12 inputs and creates 3 outputs
inputs:
0.1525
0.1225
0.1145
0.1443
0.1144111
0.1001
0.1124
0.1093
0.1113
0.1134
0.1029
0.1206

Outputs:
0.4648111
0.5185
0.4349

Clearly output output 0.4648111 contains input 0.1144111.

Attack 2:
Let's say you attempt to address this problem this by limiting the
precision of inputs to two decimal places i.e. 0.1X where 0<=X<=9.
Consider the case of 10 users where each user is always joining sets
of 10 inputs to create 1 output. Thus in total you would have 100
inputs and 10 outputs in the coinjoin. If one of those outputs is 2
then you know its inputs must all be 0.2. Using this method you can
start eliminate input output pairs far faster brute force. How much
faster is hard to say without adding additional assumptions for
instance are these inputs amounts drawn from a uniform distribution?

I want to be clear. I'm not saying cashfusion is broken or that this
more inputs than outputs technique is a dead end. However the
description given is vague and could be interpreted to describe a
broken protocol. Is this actively being used?

On Fri, Dec 27, 2019 at 8:29 PM nopara73 via bitcoin-dev
 wrote:
>
> The CashFusion research came out of the Bitcoin Cash camp, thus this probably 
> went under the radar of many of you. I would like to ask your opinions on the 
> research's claim that, if non-equal value coinjoins can be really relied on 
> for privacy or not.
>
> (Btw, there were also similar ideas in the Knapsack paper in 2017: 
> https://www.comsys.rwth-aachen.de/fileadmin/papers/2017/2017-maurer-trustcom-coinjoin.pdf
>  )
>
> https://github.com/cashshuffle/spec/blob/master/CASHFUSION.md#avoiding-amount-linkages-through-combinatorics
>
> I copy the most relevant paragraphs here:
>
>   -BEGIN QUOTE -
>
>
> Consider a transaction where 10 people have each brought 10 inputs of 
> arbitary amounts in the neighborhood of ~0.1 BCH. One input might be 
> 0.03771049 BCH; the next might be 0.24881232 BCH, etc. All parties have 
> chosen to consolidate their coins, so the transaction has 10 outputs of 
> around 1 BCH. So the transaction has 100 inputs, and 10 outputs. The first 
> output might be 0.91128495, the next could be 1.79783710, etc.
>
> Now, there are 100!/(10!)^10 ~= 10^92 ways to partition the inputs into a 
> list of 10 sets of 10 inputs, but only a tiny fraction of these partitions 
> will produce the precise output list. So, how many ways produce this exact 
> output list? We can estimate with some napkin math. First, recognize that for 
> each partitioning, each output will typically land in a range of ~10^8 
> discrete possibilities (around 1 BCH wide, with a 0.0001 BCH resolution). 
> The first 9 outputs all have this range of possibilities, and the last will 
> be constrained by the others. So, the 10^92 possibilies will land somewhere 
> within a 9-dimensional grid that cointains (10^8)^9=10^72 possible distinct 
> sites, one site which is our actual output list. Since we are stuffing 10^92 
> possibilties into a grid that contains only 10^72 sites, then this means on 
> average, each site will have 10^20 possibilities.
>
> Based on the example above, we can see that not only are there a huge number 
> of partitions, but that even with a fast algorithm that could find matching 
> partitions, it would produce around 10^20 possible valid configurations. With 
> 10^20 possibilities, there is essentially no linkage. The Cash Fusion scheme 
> actually extends this obfuscation even further. Not only can players bring 
> many inputs, they can also have multiple outputs.
>
> -END QUOTE -
> --
> Best,
> Ádám
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev