Good morning list,
Few people have responded to this topic, but when has that ever stopped me from
spamming the mailinglist?
Analysis of Path Extension for Privacy
======================================
As mentioned before, increasing the path length deliberately, by any means,
intuitively makes it seem that privacy is improved.
I will now show how this is not a panacea and that its cost in terms of
increased fees, increased risk of stuckness, and increased worst-case stuckness
might not justify its benefits, which are weaker (at least on the current
network) than they might seem at first glance.
Let us first start with an example network:
A -- B -- S1-- C -- D -- G -- H
| / / \ | / |
| / / \ | / |
I -- J E -- S2-- F
Let us suppose that `A` wishes to make a payment to `F`.
Further, let us assume for simplicity that all nodes charge the same amount for
forwarding, and that they all charge 0.01 units base and 0 proportional.
Finally, let us also assume that `S1` and `S2` are two nodes run by a single
surveillor, and that all the other nodes are otherwise not interested in
destroying Lightning privacy.
Now, obviously `A` and `F` do not know that `S1` and `S2` are run run by a
surveillor, thus cannot identify S1 and S2 as belonging to a single actor that
wants to snoop their payment.
Now let us suppose that A takes the shortest path `A -> B -> S1 -> C -> E -> S2
-> F`.
Would it be improved to artifically increase the path length?
We can observe that, with the current network, due to the same hash being used
in the entire route, `S1` and `S2` can easily notice when they are on the same
route.
Thus, suppose we increased the path length by taking this route instead: `A ->
B -> S1 -> I -> J -> C -> D -> G -> E -> S2 -> F`.
Then our surveillor gets exactly the same information as in the case `A -> B ->
S1 -> C -> E -> S2 -> F`.
* An incoming payment went into S1 via B, so the payer must be A or B (taking
the shortest-path heuristic pre-S1).
* An outgoing payment went out of S2 via F, so the payer must be F (taking the
shortest-path heuristic post-S2).
Thus, in many cases, it is immaterial if we actually inserted greater length
onto the path.
We can generally expect that surveillors can easily just buy a bunch of BTC and
insert many nodes into the network to act as surveillance nodes, especially
since forwarding pays fees and thus the requirement to lock funds in channels
is not in fact an opportunity cost.
Once a path passes through more than one surveillor nodes, any increase in
length between the two endmost cooperating surveillor nodes does not improve
privacy.
Thus `A` would end up paying the costs of increased path length (higher fees,
higher risk of stuckness, highest worst-case stuckness) *without* any benefit
to its privacy.
Indeed, we might point out that if `A` took the path `A -> B -> S1 -> C -> D ->
G -> H -> F`, then it would have avoided `S2` and the single surveillor would
have a much larger set of possible destinations to analyze.
But if it instead took the longer path `A -> B -> S1 -> I -> J -> C -> D -> G
-> E -> S2 -> F`, then `S2` would have helped the surveillor pin down the
destination that either A or B is paying to.
Which brings the next point: longer path also means increased chances of going
through *two* surveillance nodes, and thus having the payment endpoints
(ultimate sender, ultimate receiver) be identified by surveillor nodes.
The increased path length also means less reliable forwarding (more nodes can
fail), meaning it is now more likely that `A` will have to find another route.
This increases the chances that `S1` and `S2` will be on *some* route.
And because the same hash will *also* be used on the alternate routing
attempts, then if on one attempt we went `A -> B -> S1 -> C -> D -> G -> H ->
F` and on the next attempt we went `A -> I -> J -> C -> E -> S2 -> F`, then
both `S1` and `S2` can still triangulate the ultimate payer and ultimate payee
as with getting the shortest path in the first place!
The intuition that "sub-optimal paths means better privacy" is countered by the
intuition that "the more people you tell, the less private it is".
At least on the current network, the fact that we can identify a single payment
(across multiple nodes within an attempt, and across multiple attempts trying
to forward the same payment) means increased path length does not buy a lot of
privacy, and the significant losses in useability might not be commensurate to
the mild increase in privacy it *theoretically* could get.
Fortunately, the PTLC-based path decorrelation fixes most of these, and with
that, at least it is hard for `S1` and `S2` to identify if forwards going
through them are the same payment, at least from the payment hash.
Digression: `permuteroute` Privacy
----------------------------------
Previously, [I discussed the `permuteroute`
algorithm](https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-August/002095.html),
which is basically the Path Splicing technique applied to Lightning payments.
TLDR: When a payment fails at a particular node or channel along a route, we
copy the path prefix before the break, the path postfix after the break, and
make a route around the break, getting better time behavior since it is likely
that the area around the break has alternate routes that are still relatively
short (Dijkstra algorithm in particular has a massive increase in runtime for
every additional node on the shortest path, so if the shortest path is very
short, the runtime for the Dijkstra run to bridge the break would be much
faster than if we redid it from the source to the destination).
An advantage of `permuteroute` is also the fact that *you do not tell more
people about your payment*.
The prefix of path is reused rather than looking for a completely new path, and
the nodes along that prefix *already* knows about the payment and its hash and
its amount and its CLTV-delta, from the previous failing payment attempt.
Telling them approximately the same details *again* in the next payment attempt
is not an *additional* privacy leak at least.
(in general a `permuteroute` will either increase or give the same path length
as the previous attempted path, so the first attempt will give, via the
unremovable data CLTV-delta and amount, the shortest distance from a surveillor
node to the destination, and subsequent `permuteroute` attempts will tend to
give a *longer* distance to the destination, which does not help the surveillor
to *narrow down* the destination, thus does not increase privacy leaks to the
nodes on the reused prefix.)
Thus, I expect that `permuteroute` will give, not only a good improvement in
practical pathfinding speed, it would also give a mild practical improvement in
privacy compared to using high-randomization pathfinding techniques like the
Rusty Russell Random Route Ralgorithm for *every* payment attempt.
Privacy Non-Improvement Due To Practical Limits
-----------------------------------------------
There is a practical limit to the number of nodes you can stuff onto an
artificially lengthened path.
Unfortunately, it means as well that privacy improvement at the destination end
translates to privacy reduction at the source end.
For instance, it is likely that wallets would want to bound the outgoing
CLTV-delta at the ultimate sender.
If this CLTV-delta were, for instance, to be a year away from the current
blockheight, then at the worst-case, in case of a stuck payment, the funds used
in a stuck payment would be locked for a year in an HTLC.
Thus, obviously wallets will be designed to impose some maximum CLTV-delta at
the ultimate sender.
For example, C-Lightning limits this, by default, to 576 blocks.
Knowing this limit, if an arbitrary forwarding node were, for instance, to note
that the incoming HTLC had a CLTV that was 548 blocks in the future, then it
can guess that the ultimate sender is a node within 28 CTLV-delta blocks away
from it.
Of course, such a large CLTV means that the outgoing HTLC probably has a
similarly large CLTV relative to now, meaning that the destination is harder to
find.
But in effect, any improvement in making the destination harder to find comes
with a decay in the privacy of the source.
As well, the amount can give a lesser degree of this, if the forwarding node
can guess some standard amount is the final payment at the destination;
C-Lightning imposes a default `maxfeepercent` limit of 0.5%.
This works with only analyzing CLTV-delta (and maybe amount, but that is much
less reliable), from a single forwarding node, thus path decorrelation does
*not* help with this (path decorrelation helps when multiple, but not all,
forwarding nodes are controlled by a single surveillor).
CLTV-delta and amount are unremoveable data and thus this analysis is always
possible, so any improvement we can use to improve destination privacy is
likely to come at the cost of the source privacy.
This may still be a good tradeoff: the onion protocol used makes identifying
the source very difficult, whereas the destination is tightly constrained by
the CLTV-delta that must be provided to every forwarding node, so sacrificing a
little source privacy to get a little destination privacy may still be worth
it, but it should as well be noted that payment information is much more
valuable if information about *both* the payer *and* the payee is available.
Random Walk Loops
-----------------
Increasing the path length using a naive form of Rusty Russell Random Route
Ralgorithm may lead to loops being formed in the proposed path, e.g. `A - > B
-> S1 -> C -> E -> G -> D -> E -> S2 -> F`.
For the node `E`, which appears twice in that route, this is not much different
from the case that it is running two separate nodes `S1` and `S2`: a node it
controls appears more than once on the same route, thus leading to the ability
to eliminate intermediate hops from consideration.
That is, the "detour" through `G` and `D` becomes immaterial to the analysis
that `E` can perform on the route.
The end result is that we pay more for including `G` and `D`, but they do not
increase our privacy against the node that happens to be the pivot on a loop.
Thus, we should use path length increasing algorithms that specifically avoid
loops being formed.
Of course, this can now be used as a basis for a heuristic by which a
forwarding node can eliminate some nodes from its analysis by simply assuming
that loops do not occur.
With path decorrelation, identifying loops becomes difficult and they will have
much lower privacy loss.
Analysis of Multipath and JIT Routing for Privacy
=================================================
Multipath gives an improvement to useability, allowing payments to be
distributed across multiple paths rather than requiring that the entire payment
be successfully routable as a single large payment.
I would point out as well that Just-In-Time Rebalancing, more commonly known as
JIT Routing, gives you most of what multipath gives, but with much simpler
software design.
(Nothing prevents Just-In-Time rebalancing from being done by the ultimate
payer, so as long as the payer has *some* channel that can hold the entire
payment, it would be perfectly fine to initially have the ultimate payer with
its funds distributed across multiple channels, which is another thing that
people have been thinking would require multipath to solve: Just-in-time
rebalance works with this as well.)
I will now demonstrate that JIT Routing gives better privacy than multipath, at
least with the current network.
With multipath, every part of the payment has the same hash.
Thus, suppose `A` splits a single payment into two parts, each taking paths `A
-> B -> S1 -> C -> D -> G -> H -> F` and `A -> I -> J -> C -> E -> S2 -> F`.
Again, this gives `S1` and `S2` the same tight information, that it is likely
the ultimate sender is either `A` or `B` and the ultimate destination is `F`.
`S1` and `S2` need only compare their CLTV-deltas (data that cannot be removed
from the payment) to determine which one is likely to be nearer the source and
which is likely to be nearer the destination, and thereby get better
triangulation of both the ultimate sender and ultimate receiver.
In fact, it is worse than that.
Multiple cooperating surveillors that find themselves on different parts of the
same payment can use the unremovable CLTV-delta information to create a Venn
diagram of the possible destinations, then consider that the destination is the
intersection of those nodes that are within the outgoing CLTV-delta of each of
them.
(Similar information as well can be extracted from multiple attempts of a
single payment.)
On the other hand, an important property of JIT Routing is that a rebalance
attempt triggered by a payment has a different hash from the payment.
Thus, the other nodes that would be involved in the JIT Routing are never given
the important detail of the hash of the actual payment, thus preventing them
from easily being correlated.
Because there is only a single path for the "real" payment, we follow the
principle "telling more people makes it less private": only the nodes specified
by the ultimate payer in the payment onion are told of the *actual* payment
details, all the other "supporting" nodes that facilitate the payment by
allowing rebalances are not told about the payment details, they only get told
about rebalances, which are much less interesting (it is literally just
shuffling your own money around).
In particular, they are *not* told the CLTV-delta of the "main" payment path,
which is the largest single-node privacy leak in Lightning (for multi-node
surveillors, the hash is an even bigger privacy leak).
Of course, path decorrelation helps reduce the issues that multipath has.
CLTV-deltas are still an issue under multipath though, and the fact that
rebalances triggered by a payment do not leak the actual CLTV-deltas of the
original payment is still an important advantage that JIT Routing has over even
decorrelated multipath.
Analysis of Path Decerrelation for Privacy
==========================================
As noted above, path decorrelation mitigates privacy issues with both
artificially-lengthened paths, and multipath.
Of note is that these only close privacy leaks regarding hashes.
CLTV-delta in particular is still not fixed by path decorrelation.
However, path decorrelation does enable some important possible techniques.
Extended Shadow Routing
-----------------------
As noted before, artificially increasing the path length makes it harder for
forwarding nodes to use the shortest-path heuristic to determine te ultimate
payer and ultimate payee of a payment attempt.
Increasing the path length tends to increase the number of hop nodes that are
traversed, increasing costs, risk of stuck payments, and lock time of stuck
payments.
We can observe in particular that when considering stuckness, every additional
hop node affects our useability twice:
* Every additional hop node is another hop node that could fail between
forwarding and claiming.
* Every additional hop node demands additional CLTV-delta for that hop,
increasing the total amount of time we will offer on the first outgoing HTLC.
Thus, very long routes tend to greatly reduce useability.
A way to mitigate this would be to use "extended shadow routing", where instead
of *actually* taking a longer path, we *pretend* to take a longer path and
overpay some forwarding nodes with more fees and CLTV-delta than they demand.
For example, as noted above, the shortest path from `A` to `F` is `A -> B -> S1
-> C -> E -> S2 -> F`.
Or more precisely, if `A` has to pay 1.0 unit to `F`, and all hop nodes have a
0.01 base fee and 0 proportional fee, the route is: `A ->1.05-> B ->1.04-> S1
->1.03-> C ->1.02-> E ->1.01-> S2 ->1.00-> F`.
(I have elided the CLTV-delta requirements, but I am sure you can imagine the
CLTV-delta requirements from this; indeed, CLTV-delta is a much better way to
break privacy than the amount, so consider the example as a proxy for
CLTV-delta being used to analyze the path.)
With path decorrelation, `S1` and `S2` can no longer rely on a single hash
along the entire route.
Even so, they could still determine the shortest path between them, and thereby
know the expected difference in amount and CLTV-delta from the outgoing of `S1`
to the incoming of `S2`.
>From this, they can determine with high probability that forwards of similar
>value and sidereal timing are actually belonging to the same single payment.
Now suppose instead `A` took the sub-optimal path `A ->1.07-> B ->1.06-> S1
->1.05-> C ->1.04-> D ->1.03-> G ->1.02-> E ->1.01-> S2 ->1.00-> F`.
Now `S1` and `S2` can no longer use the shortest path between them to determine
that this is a single payment --- the outgoing of `S1` minus the incoming of
`S2` is no longer the cost of the direct path between them, so they cannot
assign high probability that the two forwards they see are hops of the same
payment.
But for `A`, this comes at the cost of higher fees, higher CLTV-delta, and
higher risk of payment attempt failure --- now any failures at `D` and `G` will
affect the payment experience.
To mitigate this, instead `A` can take this sub-optimal path that has fewer
nodes but overpays the fees and CLTV-delta of `C`: `A ->1.07-> B ->1.06-> S1
->1.05-> C ->1.02-> E ->1.01-> S2 ->1.00-> F`.
This is extended shadow routing, where not only do we overpay at the
destination, we also randomly overpay intermediate hops.
We still use the shortest path in terms of hops, but the random overpaying we
do at intermediate hops makes it harder for other intermediate hops to
determine if the forwards belong to the same payment or not.
(Again, just to be clear: we also adjust the CLTV-delta correspondingly to
overpay the CLTV-delta demanded by `C`.)
Standardized Multipart Splitting Amounts
----------------------------------------
Of course, if the only payment going around with an amount of 1.0 units is the
payment between `A` and `F`, and all the other payments do not, it is still
possible, even with path decorrelation, for `S1` and `S2` to correlate the
forwards by looking at the payment amounts.
Fortunately, if we are able to fix the triangulation issues with multipath by
implementing path decorrelation, we can subsequently use multipath to fix the
amount-correlation issue.
(Nothing to be done about the CLTV-delta, though...)
This is done by specifying a set of standardized payment amounts, and forcing
our algorithms to split with those amounts always, possibly overpaying the
destination a little just to hide all the payment amounts of a multipath to
standardized payment amounts.
For example, suppose we define standardized payment amounts 16, 8, 4, 2, 1,
0.5, 0.25, 0.125, 0.0625.
If we want to pay 1.3 units, we could split our payment into 1, 0.25, and
0.0625, leading to a total of 1.3125 units, for a 0.96% overpayment.
The advantage of this is that it strengthens increasing the path length
(whether for real, or by "extended shadow routing" which just artificially
increases the path length by overpaying intermediate hops).
As noted before, if the only payment going around with 1.0 units is that
between `A` and `F`, then `S1` and `S2` can increase the probability they
assign that the two forwards they see belong to the same payment, even if we
increased the path length between them so they can no longer use the
shortest-path heuristic.
But if we use standardized payment amounts, and 1.0 is one of the standard
amounts, then `S1` and `S2` cannot be certain that the two forwards they
individually see are the same payment --- they could be two unrelated
standard-amount payments instead.
This reduces amount-based correlation.
In particular, with path decorrelation, multiple attempts of a part are now
also harder to correlate --- it could be that those are unrelated attempts from
a different payment.
Of course, this topic is likely to lead to a lot of debate.
For instance, powers of two might not be the best (maybe a Fibonacci sequence
would be better to minimize overpayment, in much the same way they reduce
unused space in `malloc` implementations), multipart receivers might not want
to receive a multipart payment composed of payments below the dust threshold
(because if they have to drop onchain, they may end up getting multiple
incoming parts worth zilch each), more standard payment amounts just split the
anonymity set (and what number of standard payment amounts is optimal?) and so
on.
Since the benefit of standardized multipart splitting amounts lies in there
*being* a standard, I suggest we instead just vote on *whether or not* to
define such a standardized set of amounts, and if we do decide to go with this,
just hold a lottery on who exactly will define the standardized amounts.
(instead replacing the long debates on what standardized payment amounts to use
with long debates on how to do a trustless online-only lottery)
No matter what set of standardized amounts we define, it seems to me that there
will be payments so small that splitting them up would be unnecessarily
uneconomical, or the payment amount might be smaller than the smallest
standardized amount.
We might accept the loss of privacy in those if we consider them to be small
and not of much interest.
Amount Decorrelation By Self-Payment
------------------------------------
We already know that we can perform donations by using a self-payment, hiding
the donation in the fee of the node we wish to pay.
With PTLCs, which are a requirement of path decorrelation anyway, we can
increase the security of such circular payments (they can no longer be stolen
by wormhole attacks), and also have a proof-of-payment arrive from the payee
(by using the sum of the payee secret plus a secret known by the payer at the
payee hop).
This also implements a form of stuckless payments.
If we use two standardized sets of payments with a small difference between
them, it becomes possible as well to pay smaller amounts by paying using the
*difference* between two standard payment amounts.
As well, the return path ends up misleading successful analysis, as it
*inverts* the direction of who is the *actual* payer and who is the *actual*
payee.
This could be used to mislead analysis by using a standard amount in the
forward section of the path and then use a non-standard amount in the return
path, allowing payment of tiny amounts that are impractical given the standard
set, but misleading the *actual* amount and who is the *actual* payer and
*actual* payee.
As a concrete example, suppose `A` wants to pay `I` a tiny 0.1 units, and does
not care that the fees it ends up paying may be a significant fraction of that
micropayment.
It could do: `A ->8.01-> B ->8.0-> I ->7.9-> S1 ->7.89-> B ->7.88-> A`.
`A` ends up paying 8.01 - 7.88 = 0.13, so it paid 0.1 to `I` and 0.03 in fees.
`S1` might end up being able to analyze the non-standard amount of `7.9-> S1
->7.89` and determine that `I` pays `A` about 7.88 units, when in fact the
actual payment is `A` paying `I` 0.1 units.
Against this technique, we should also remind of the principle "telling more
people makes it less private", however.
As well, this increases the number of hop nodes that could fail the payment,
meaning more payment attempts (and thus more people getting told of the
payment).
More analysis on the use of circular payments when paying may be in order.
In any case, this would require explicit support in the BOLT spec as well, as
we would need invoices to be payable with a direct payment, and with an
indirect circular payment as well.
Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev