[Lightning-dev] On Path Privacy

ZmnSCPxj via Lightning-dev Wed, 25 Dec 2019 19:04:09 -0800


Privacy Loss Models
===================

Shortest Path Heuristic
-----------------------

First, let us presuppose a network connected as below:

B --- C
/ \
/ \
A D
\ /
\ /
F --- E

Now, let us suppose that some evil surveillor insinuates itself into the
network:

B --- C
/ \ / \
/ \ / \
A --- S --- D
\ / \ /
\ / \ /
F --- E

The strategy of the surveillor is thus the below:

* It makes itself more central to the network by creating large numbers of
channels, especially to distant nodes.
* It lowers its channel fees relative to the other nodes, making its channels
attractive to pathfinders that consider channel fees.
* It uses the "shortest path heuristic".
That is:
* Suppose in the given example, S receives a forward via the B-S channel, to
the S-D channel.
* Then it knows the payer is B and the payee is D.

More precisely, the "shortest path heuristic" simply means performing the below
precomputation:

* Take the entire routemap.
* Create a mapping of the tuple of an incoming channel and an outgoing channel
(of the surveillor node), to a set of pairs of payers and payees.
* For each possible payer in all nodes:
* For each each possible payee in all nodes:
* Take the shortest path from payer to payee.
* If the surveillor node is in that path, determine the incoming channel
and outgoing channel, then add the payer and payee to the corresponding entry
in the mapping.

Then, whenever the surveillor node has to perform a forward, it only needs to
look up the incoming channel and outgoing channel in the mapping.
This results in a set of possible payer/payee pairs.

This analysis becomes more powerful when the surveillor has acquired forwarding
logs from multiple nodes on the network.
If a payment is determined to have passed through multiple nodes, whose
forwarding logs the surveillor has acquired, then the surveillor can simply
perform the above precomputation for each node, then take the intersection of
the sets in each mapping entry.

Much of the privacy loss is effectively due to forwarding nodes being able to
create "reverse" routing tables.
That is, instead of a routing table that inputs a source and destination and
outputs a (set of) likely direction(s) to send packets to, a reverse routing
table inputs a direction from sent and to sent and outputs a (set of) likely
source(s)/destination(s).

Unremovable Forwarding Data
---------------------------

As we know, switching Lightning from HTLCs to PTLCs (i.e. from payment hash +
preimage to payment point + scalar) enables many new features, including the
so-called "path decorrelation".
The theory is that by changing the payment point at each hop, multiple
coordinating surveillor nodes will be unable to determine if the same payment
passed through them.

However, this is not sufficient, as the data below cannot be removed or hidden:

* The sidereal time at which `update_add_htlc` messages are sent
* This can be used to determine if two forwards are possibly on the same
payment path.
* Worse, the sidereal time between the outgoing `update_add_htlc` and the
`update_fail_htlc` / `update_fulfill_htlc` response can be measured to help
characterize the distance, in nodes, to the payee.
* We cannot change or mask this time, unless we are willing to add random,
potentially-large latencies to forwards, which we do not want to do since the
entire point of Lightning Network is fast payments.
* The value being transmitted cannot be hidden from the forwarding node.
* While we know how to send homomorphically encrypted values that can be
summed up, the forwarding node has to know the value of its own channels, and
would know how much a forward changes its own channel values in order to
determine if it has enough capacity on a channel to actually offer the outgoing
HTLC.
* The timelock of the payment cannot be hidden from the forwarding node.
* The forwarding node is liable for the timelock of any HTLCs it offers, and
it has to know about the exact timelock in order to force a channel onchain
when any HTLCs pending on it have a timelock that is expiring very soon.

The above data helps two cooperating surveilling nodes to determine if two
forwards on their nodes belong to the same logical payment, even if path
decorrelation masks the payment point along the entire path:

* The two nodes can determine:
* How long a shortest-path payment takes to reach from one to the other.
* How much fees the shortest path between them costs.
* How much CLTV-delta the shortest path between them costs.

By deriving the above data, two nodes can have a very good guess of which
forwards passing through them are likely to belong to the same payment, which
helps them triangulate better the payer and payee, especially given the
shortest path heuristic.
Thus, even *with* path decorrelation, two cooperating surveilling nodes can
still correlate forwards across them (they just will not be able to execute a
wormhole attack).

Now, of note is that the the timelock is particularly leaky in terms of privacy.
The timelock gives an upper bound on the distance of the forwarding node to the
ultimate payee.
In combination with another privacy loss, this can allow a forwarding node to
precisely identify the ultimate payee of a payment.

Both value and timelock can be randomized somewhat by use of "shadow routes",
as implemented in C-Lightning.
However, the current C-Lightning implementation only inserts shadow routes at
the end of the route; it should also insert shadow routes at random points in
the route, and mildly overpay fees along the way (not just give extra
value+cltv-delta to the last node), to reduce the ability of multiple
cooperating surveillors to determine if their forwards are the same payment.

REDACTED SECTION
----------------

REDACTED TEXT.

Non-Solutions to Privacy Loss
=============================

Unpublished Channels Delenda Est
--------------------------------

Unpublished channels are becoming popular for some reason, and even section 4.1
of [this paper](https://arxiv.org/pdf/1909.02717) gives it as a solution for
improving privacy without sacrificing utility.

Unfortunately for that paper, this is wrong, and unpublished channels must be
destroyed.

Unpublished channels do not improve privacy, but instead move the privacy risks
from shared to concentrated on specific nodes, which become targets for
surveillors.
Further, because it moves risk to fewer high-value targets (a symptom of
centralization), it violates [Risk Sharing
Principle](https://github.com/libbitcoin/libbitcoin-system/wiki/Risk-Sharing-Principle).

As I noted
[here](https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-October/002241.html),
the Axiom of Terminus means that I can have the heuristic:

* If I receive a forwarding request:
* If the incoming channel is unpublished, the node on the other end is the
payer.
* If the outgoing channel is unpublished, the node on the other end is the
payee.

This gives very precise information about who the payer and payee is, and is
not fixable by e.g. shadow routing or [REDACTED TEXT].

Now, of course the above heuristic is only doable by primarily-forwarding nodes
with a mix of unpublished and published channels, and only to nodes that use
the unpublished channels.
Otherwise, independent third parties cannot even know about nodes that *only*
use unpublished channels, so that should be good, right?

Unfortunately:

* Payers have pretty good privacy otherwise: the (unremovable) timelock is a
hint to the distance to the payee, not payer, and the payer could always claim
that it received the payment via other published channels.
* Thus, ***the fact that with unpublished channels some nodes can now
identify the payer precisely is a major privacy loss for payers***.
* Those nodes with a mix of unpublished channels and published channels become
major targets for surveillance.
* They may be tempted by the dark side to defray their operating costs by
actively selling the payment information they have acquired.
* Their backups and persistent storage can be hacked and copies of forwarding
logs acquired, even if they are strongly committed to privacy for their clients.
* Competitors who *do* sell their payment information can outcompete them,
undercutting their fees (which also has the effect that payments are more
likely to be routed to them due to the lower fees, increasing their ability to
surveill further).
* The existence of unpublished channels is still attested on the blockchain,
and onchain analysis can give hints on which unpublished users ended up making
channels with specific published nodes.
* Lightning is practically the only use of 2-of-2 P2WSH, since most HODL
usages will be either 1-of-1 or 2-of-3 ("never take two chronometers to sea:
take one or three").
* Since current Bitcoin requires revealing a 2-of-2 script in P2WSH, even
mutual closes of unpublished channels are visible publicly on the blockchain
anyway, and subsequent use of that money can be traceable to specific servers.
* Taproot fixes this.
* CoinJoin fixes this.

If all channels were published, then this spreads out the information that
forwarding nodes can keep.
Surveillors capable of stealing forwarding logs of nodes will need to gather
the forwarding nodes of large numbers of nodes.
This is risk sharing: privacy-leaking information is spread out and no single
node is a particularly juicy target for acquisition of control.

Note that nodes can publish themselves without publishing their IP addresses;
thus they can at least hide their locations at the IP layer while revealing
their locations at the Lightning layer.
Nodes can also use Tor, at the cost of increased latency with the rest of the
network, which behooves us to design our protocol to minimize bandwidth and
turnaroud.

Refusal to Log
--------------

Now, a forwarding node operator may not necessarily be an evil surveillor, and
may even have a desire to be (seen as) good.
Such a node operator might, as protection against hacks, and protection against
evil surveillors who might hack their node and snoop their data, refuse to
maintain a log of forwards.

Let us presuppose that there exists some magic spell, that ensures that a node
operator is in fact performing (or not performing) some promised act (or
disavowed act).
I will now show that, even if such a spell could exist, a node operator cannot
promise to not keep a log of forwards, at least with the current HTLCs being
sent via Poon-Dryja channels.

First:

* Poon-Dryja requires transaction revocation.
* Transaction revocation requires that nodes be able to recognize old state and
revoke them.
* Revoking a transaction means spending it using a special revocation branch of
a special modified script.
* Special scripts in Bitcoin must be encoded in P2SH/P2WSH.
* To spend a P2SH/P2WSH, we need to show the whole encoded script, even those
branches that are not taken.
* Thus, to spend a revocable HTLC output, a node has to store the hash and
timelock, even if it is taking the revocation path.
* While HTLCs currently used have yet another transaction (HTLC-timeout or
HTLC-success) to actually claim the funds, both the HTLC and the claim
transactions have to be revocable, else an attacker might publish an old state
that has all channel funds in HTLCs.
If the victim is unable to publish the HTLC script in order to spend (and
revoke) the plain HTLC output without an HTLC-timeout or HTLC-success
transaction being published, then the attacker can simply refuse to publish the
HTLC-timeout/HTLC-success and the victim still loses access to the funds
(though the attacker will still not be able to claim it, the attacker might
just want to make the victim lose funds).
Thus, every node must be able to publish the HTLC script in order to revoke
it (even without the attacker publishing HTLC-success/HTCL-timeout), which
means it has to keep every old state hash and timelock.

Then, secondly:

* The hashes used in forwarding have to be the same in both the incoming HTLC
and the outgoing HTLC.

Thus:

* Nodes have to keep all hashes and timelocks for all HTLCs of all older
channel states of all channels, otherwise they are insecure.
* Third parties acquiring the above data from nodes can correlate forwards by
checking for HTLCs that exist in two channels.
* Indeed, because HTLC scripts are different based on whether it is the local
or the remote side that offered them, third parties can determine as well which
HTLC is incoming and which it outgoing.
* Further, the stored timelock gives an upper bound on the time an HTLC was
created, giving hints on timing of a payment as well.
* Therefore all current secure nodes are required to store data that is enough
to derive a log of forwards.

Of course, this is not so dire.
In the (close?) future, improvements to base layer will allow nodes to elide
enough information from its stored data, so that a log of forwards can no
longer be derived:

* Schnorr means we can use pointlocked timelocked contracts with payment
decorrelation, thus allowing breaking of the link between incoming and outgoing
PTLCs even in the data stored by nodes.
* Once an outgoing PTLC has been claimed and we have determined the incoming
PTLC, we can delete the delta between the two contracts from persistent
storage, thus preventing from linking incoming from outgoing PTLCs.
* Taproot means we can put the revocation branch of revocable HTLCs in a
separate tapleaf.
* Thus, nodes can store only the Merkle path to the revocation script, and
thus no longer store the hash and timelock for all HTLCs on all older states.
* `SIGHASH_NOINPUT` means we can implement Decker-Russell-Osuntokun.
* Thus, honest nodes no longer need to store data from previous states of the
channel.

Against the above, however, I must remind:

* We still need to create a magic spell that allows third parties to actually
believe, without tr\*st, the promises of node operators that they will not
store a plain log of forwards of their node, and that they will not modify the
available open-source software to do so and run their locally-modified versions.
* Even if the above magic spell were created, at ***some*** point in the past
the node knew an incoming HTLC/PTLC and its link to an outgoing HTLC/PTLC.
* Lower layers beneath the node software may betray the above magic spell
nevertheless:
* "Deleted" data might be implemented by a database layer by simply marking
some record as unused, and then reuse of that record might take a long time,
meaning that old data may still exist in "deleted" records.
* The filesystem layer might implement recovery logs or other techniques
where "overwritten" data is kept, as a simple function of reliability, and that
data may remain validly readable outside the filesystem (i.e. deleted data
might not be overwritten).
* The spinning rust / floating gate lower layers may, even with data
overwritten, retain transient inductive / capacitive charge that can be read by
specialized equipment, or may not overwrite data from sectors / blocks marked
bad and then remapped elsewhere.
* In short, we cannot "refuse to log": we have to write the link between an
outgoing HTLC/PTLC to an incoming HTLC/PTLC somewhere persistent *first* (in
case our node gets randomly shut down while a forward is not yet claimed), wait
for the outgoing HTLC/PTLC to be claimed, and *then* delete it later.
And "delete" is not necessarily an operation that is possible in the
presence of attackers that want to retain your data.

Possible Solutions
==================

Avoiding Shortest Path
----------------------

The shortest path heuristic is key in analyzing payments on the Lightning
network.
Thus, breaking this heuristic is important.

* One way is to `permuteroute` the shortest-path.
* For reference, `permuteroute` is described here:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-August/002095.html
* Briefly: It is an algorithm where you input an existing route, plus a
"breakage" (a channel or node that is failing) on that route.
The algorithm then makes a short-range pathfinding to heal the break, or
fails if it takes too long for it to find a way to heal the break.
* Start with a shortest path via normal shortest-path pathfinding algorithms,
then randomly select one intermediate node on that path and treat it as a
"breakage" with `permuteroute`.
* If that succeeds, try it a few more times until it fails or you are
satisfied with the additional path length, then just use the latest randomized
path.
* Add random tweaks to your channel traversal costs.
* This is done currently by the C-Lightning route randomization feature, but
note that it is currently set to up to a +/-5% tweak.
* [This paper](https://arxiv.org/pdf/1909.06890) evaluates the C-Lightning
route randomization as well.
It suggest adding random tweaks to the costs of entire routes instead,
though it may not be easy for normal pathfinding algorithms to implement this.
* Rusty Russell Random Route Ralgorithm.
* Alluded to in [SLP134](https://stephanlivera.com/episode/134/).
* In a private email, Rusty recounted the random routing routine; roughly
reworded:
* Start with a Dijkstra starting from the payer node and completing over
the entire routemap.
* Store the distance from each node to the payer.
* This step can be precomputed, rather than performed each time we need a
path.
* Then the algorithm proper:
* Start at the destination. payee node.
* Toss a biased coin (bias controlled by how much you want to overpay
fees):
* If it is heads, pick a channel at random and extend the route to it.
* If it is tails, find which channel goes to the node with
lowest-distance-to-payer and extend the route to it.
* Keep doing the above until we reach the payer node.
* Then apply shadow routing afterwards.
* Note that the above algorithm is roughly similar to the Greedy Best First
Search with precomputed distance heuristic (a.k.a. `getroutequick`, which I
presented in LNConf 2019).
* The only difference is that there is a random coin toss between selecting
a random channel and going towards the source node, where `getroutequick` just
goes towards the source node.
* We can approximate the Rusty Russell Random Routing Ralgorithm by
applying a random tweak to the distance heuristic in a Greedy Best First Search!
* The algorithm as described has a chance of creating small loops, which
should be avoided.
* Small loops that curl (i.e. pass multiple times) through a surveillor
node are equivalent to a payment passing through multiple surveillor-controlled
nodes, and can be identified even post-payment-decorrelation as being the same
payment.
* Thus you end up paying for routing through the loop, but do not gain
much privacy from that extra payment because now you are giving more accurate
data to a specific node (where the payment curled through).
* A proper Greedy Best First Search avoids loops due to marking nodes as
they are visited, so I think basing the algorithm around Greedy Best First
Search plus random tweak on the heuristic is better.
* Usage discounting.
* When a base pathfinder returns a path, increment a "usedness" counter on
each node on that path.
* On future pathfinding calls, consider those nodes to have higher than
normal cost, according to their usedness.
* Of note is that pathfinding algorithms do NOT actually need a single
numeric cost.
* Instead, they need a mathematical group and group operation, with an
additional "comparison" operation to determine which of two group elements is
smaller.
* Integer costs are a group with a comparison operation.
* Instead of integer cost, we can have a pair of `(usedness, fee)`.
* Add each tuple element separately, e.g. `(usedness_a, fee_a) +
(usedness_b, fee_b) = (usedness_a + usedness_b, fee_a + fee_b)`.
* To compare, e.g. `(usedness_a, fee_a) < (usedness_b, fee_b)`, we
first compare by `usedness`, then if they are equal, compare `fee`.
* This gives good diversity over multiple calls of the pathfinder, which is
useful for:
* Multipath, since we want each candidate path to have different nodes as
much as possible, without completely discounting those nodes in case of
bottlenecks in the network.
* Ensures your payment data is distributed to multiple nodes rather than
reusing the same nodes again and again, improving Risk Sharing since your data
is spread out over more nodes.
* The `usedness` needs to persist across node restarts.

Direct Path Exemption
---------------------

As mentioned above, avoiding the shortest path breaks the shortest path
heuristic, adding uncertainty to analyses and thus improving privacy.

However, we should note an important exception to avoiding the shortest path:

* If the shortest path is a single hop, directly from the payer to the payee
without intervening nodes, then we should prefer that path rather than avoid it.

The reason why we have this exception is that ***not telling any other nodes
gives us the best privacy***.
This is achievable only by having direct channels with a payee, thus it ***does
not scale***, since it would not be practical to have direct channels between
every payer and payee.
However, if we find that the payment is to a direct peer, taking this exception
should be a massive, low-hanging privacy win.

This brings up the following observation:

* In order to avoid the shortest path, we may need to add more nodes to the
path we use.
* By adding more nodes to the path we use, we increase the number of nodes
that know our payment, increasing our chances of informing some surveillor.

Extended Shadow Routing
-----------------------

Another apropos observation is that having more nodes on a path reduces the
reliability of payments along the path.
Every node on that path is a possible failure point in forwarding.

Worse, paths that are longer in terms of hops are more likely to have the below
failure mode:

* A forwarding node is able to accept and irrevocably forward an HTLC.
* Before the outgoing HTLC can be claimed offchain, the forwarding node shuts
down.
* The payment is now "stuck" until the incoming HTLC of the node reaches
timeout.

Basically, if a forwarding node fails between the time it forwards the HTLC and
the time the next node is able to claim it, payments get stuck for durations
that can be counted in entire blocks.
As path length increases, this failure mode becomes more likely:

* There are more possible nodes to fail.
* The time between when a node is able to forward and the next node is able to
claim becomes longer, making it more likely that an individual node fails
during this time.

Of course, taking the shortest path in terms of node hops is precisely what
reverse routing tables can break.
This is the core of the observation that privacy and usability are apparently
at odds with one another.

Reverse routing tables cannot precisely identify the payer and payee.
However, the timelock, as well as the time it takes for an HTLC to be forwarded
and then claimed, can give hints as to the distance between the forwarding node
and the payee.
We cannot do anything about the sidereal time at which events occur, but we
*can* do something about timelocks, i.e. shadow routing as already implemented
in C-Lightning.

A secondary concern is that, after path decorrelation is implemented, the
*delta* in value, timelock, and sidereal time between two cooperating
surveilling nodes can be used to determine if two forwards on those nodes
belong to the same overall payment.
Shadow routing does not help this, since it only adds timelock and value tweaks
to the *end* of the path, and any *delta* between two surveilling nodes remains
the same.

Avoiding the shortest path has a chance that the sub-path between the two
surveilling nodes is not the delta of the shortest path between them, which
helps remove payment correlation in a post-decorrelation world.
However, it does have the drawback that increasing the actual path length tends
to increase the number of nodes along the path, reducing reliability.

By inserting shadow routes in the middle of the path instead of just the payee
end, we can gain some of the above advantage of removing payment correlation in
a post-decorrelation world, without having to incur the reduced reliability of
actually increasing the number of hops along a path.
This is only useful in a post-decorrelation world, as use of HTLCs means that
two cooperating surveilling nodes can determine if two forwards between them
belong to the same overall payment.

Inserting such shadow routes basically amounts to overpaying fees and giving
more timelock slack to forwarding nodes.

Multipath Value Obfuscation
---------------------------

An observation we can have is that, the difference between a fixed known value
and a random value will be indistinguishable from a random value.
The same observation underlies adaptor signatures and payment decorrelation.

Thus, one way to obfuscate the unremovable data of the payment value is to
always start with payments split into two subpayments.

* Take a random value from 1 to the total payment minus 1 and use it on one
subpayment.
* Take the difference between the total payment amount and the above random
value and use it as the other subpayment.

Both numbers will look like random values, and only together can they be used
to identify the actual value.

The value obfuscation only works to obfuscate the final payee value.
Without "avoiding shortest path" or "extended shadow route", the value of the
payment can still be used by two cooperating surveilling nodes to identify if
two forwards belong to the same (sub)payment, even in a post-decorrelation
world.

With HTLCs, two cooperating surveilling nodes that happen to be on separate
subpayments can still identify the total payment and thereby have a guess on
what is being bought.

_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

[Lightning-dev] On Path Privacy

Reply via email to