> If Alice's wallet uses
> Electrum servers, then whoever operates the server will be able to infer
> whether Alice has been restoring from seed or not. Not only Electrum
> uses Electrum servers, but a whole range of mobile wallets do,
> including Phoenix.

> That's also a good point. But I'm willing to consider that another
> k-anonymity set situation: how many randomly chosen blockchain data
> sources does the attacker control? Provided the probability of that
> happening is low, it's not a significant concern.


I think it only makes sense to talk about probability here if peers are
fully anonymous, which to my understanding is not currently the case. An
attack can begin long before there is even an open channel with the
victim. Only once an attacker has identified a target would they try to
open a channel with the victim, fully intending to exploit them. The
victim is already vulnerable in some way but doesn't yet know it. From
this perspective, the event of Alice attempting to restore from seed at a
particular moment is likely one that the attacker planned all along, not
some random occurrence. It's the last step in the attack, not the first.
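
For reference, here is a minimal Python sketch of the ratio Peter works out
in the quoted message below, P_d = r_l / (r_l + r_d). The function name and
the once-per-year restore rate in the second call are my own assumptions for
illustration; only the 10/year and 1%/year figures come from the thread. Note
that it models restores as random events, which is exactly the assumption I
am questioning above.

```python
def loss_probability(delayed_per_year: float, loss_per_year: float) -> float:
    """Return P_d = r_l / (r_l + r_d).

    Both rates are per client per year, so the client count N cancels out.
    """
    return loss_per_year / (loss_per_year + delayed_per_year)


# Figures from the quoted message: 10 delayed connections per year and a
# 1% chance of data loss per year.
p_random = loss_probability(delayed_per_year=10, loss_per_year=0.01)
print(f"random loss only:   {p_random:.4%}")

# ASSUMPTION (not from the thread): a user who restores from seed about
# once per year, e.g. out of habit rather than because of real data loss.
p_habitual = loss_probability(delayed_per_year=10, loss_per_year=1.0)
print(f"habitual restoring: {p_habitual:.4%}")
```

With the thread's figures this gives roughly 0.1%; the second call only shows
how sensitive the ratio is to the restore rate. Neither case covers an
attacker who picks the target and the moment in advance.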



On Fri, Aug 18, 2023 at 3:06 AM Peter Todd <p...@petertodd.org> wrote:

> On Thu, Aug 17, 2023 at 11:36:58AM +0200, Thomas Voegtlin wrote:
> > Hello Peter,
> >
> > I have to disagree with both your reasoning and the numerical values
> > you are proposing. Let us first look at the equations:
> >
> > > Suppose that Alice goes a few days without connecting to Bob 10
> > > times per year, and this particular connection attempt is an example
> > > of that 10/year event.  Suppose that Alice has a 1% chance of data
> > > loss per year that **requires her to use Bob's channel state backup**,
> > > and suppose that with 100% certainty, in the event of data loss Alice
> > > would take a few days to attempt recovery. That means that this
> > > particular connection event represents a 1% / 10 = 0.1% probability
> > > event, and P_d = 0.1%
> >
> > In this paragraph, you are expressing the joint probability that Alice
> > has lost her data AND that she is displaying a certain behaviour
> > visible to Bob (staying disconnected for a few days).
> >
> > First, let me clarify the "per year" issue. Since you are trying to
> > compute the probability of data loss at a particular connection event,
> > the terms used in your equation must be the probability P(loss) of
> > data loss *at that connection*, and the probability P(behaviour) of
> > having been offline for several days *at that connection*. It may be
> > possible to express P(loss) and P(behaviour) from the probabilities
> > *per year*, but that would require assumptions about the frequency of
> > connections, and the duration of the process. For example, once data
> > is lost, it remains lost forever. Since units of time do not show up
> > in your equations, I will assume that you simply meant probabilities
> > at connection: P(loss) = 0.01 and P(behaviour) = 0.1
> >
> > To compute the joint probability that Alice has lost data and that she
> > is displaying that behaviour, you used a product, as if those were
> > independent events:
> >
> >   P(loss AND behaviour) = P(loss) * P(behaviour)
> >
> > Unfortunately, those events are not independent. If the behaviour of
> > Alice is caused by the loss, then P(loss AND behaviour) is certainly
> > not equal to P(loss) * P(behaviour), but considerably closer to
> > P(loss).
>
> While I agree that my math was a bit sloppy, I think you are thinking about
> this problem in the wrong way.
>
> The "normal" connections are the k-anonymity set: they are the non-loss
> events that could potentially be loss events. In addition to that set of
> events, there are also loss events that are indistinguishable from the
> non-loss events.
>
> If each client averages 10 delayed connections per year, a node with N
> clients will see:
>
>                 10
>     r_d =  N * ----
>                year
>
> ...total delayed connections per year. In addition to those delayed
> connections, that same node with N clients will see:
>
>                1%
>     r_l = N * ----
>               year
>
> ...total connections with data loss per year. That means the probability
> of a given connection attempt representing data loss is:
>
>              r_l
>     P_d = ---------
>           r_l + r_d
>
> Notice how if you work that out, the units completely cancel out, leaving
> you the dimensionless ratio 0.0999% ~= 0.1%, approximately the same number
> as my earlier less rigorous approach.
>
> > Anyway, the joint probability P(loss AND behaviour) is not the
> > relevant quantity here.  Indeed, when you wrote:
> >
> > > Bob can profit if V_f * P_d > V_h,
> >
> > you were implying that P_d is the same as the one you computed above,
> > using that multiplication. This is wrong. If we want to compute
> > whether Bob can profit, we need to look at the probability of
> > data loss given the information available to Bob. In other words, P_d
> > should be the conditional probability that Alice has lost her
> > state, given the information available to Bob: P(loss|behaviour):
>
> I already took that into account by defining the k-anonymity set as the
> set of all delayed non-loss connections. Conditional probability isn't
> relevant here.
>
> > Second, regarding the numerical values:
> >
> > I do not wish to argue over the numerical value of the probability of
> > data loss per year. However, I want to point out that the probability
> > of users connecting without having their data becomes considerably
> > higher if restoring your channels from seed becomes a feature. If
> > users are told that they can restore their state from seed, then they
> > are going to use that feature. For example, some Electrum users decide
> > to uninstall and reinstall their wallet app from their device whenever
> > they cross a border. So, the relevant question is, how frequently do
> > users restore their wallet from seed, and not how frequently they have
> > actually lost data.
>
> That's a good point. But I'm willing to say that's an example of the user
> doing something stupid and reckless. And it's an example that can be
> mitigated by making the process of an emergency restore from seed
> inconvenient, e.g. by adding artificial delays.
>
> > Finally, note that the "behaviour" encompasses all the information
> > available to Bob, not only the fact that Alice has been offline for a
> > few days. There might be other channels that can be exploited by an
> > attacker to gain information about Alice.
>
> In general, this is a class of engineering problem that we already have in
> many other circumstances. I don't believe it is a deal breaker to the idea.
>
> > If Alice's wallet uses
> > Electrum servers, then whoever operates the server will be able to infer
> > whether Alice has been restoring from seed or not. Not only Electrum
> > uses Electrum servers, but a whole range of mobile wallets do,
> > including Phoenix.
>
> That's also a good point. But I'm willing to consider that another
> k-anonymity set situation: how many randomly chosen blockchain data
> sources does the attacker control? Provided the probability of that
> happening is low, it's not a significant concern.
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org
> _______________________________________________
> Lightning-dev mailing list
> Lightning-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
>
