>
> I haven't looked at the `announcement_signatures` case yet, but at least
> for the `commit_sig` case this should never be an issue.
>
Yup, the issue is the same for `announcement_signatures`. Storing the
last_short_channel_id is key here so you know which announcement
messages you can safely ignore and which should be treated as errors. This
behavior should stay in effect until sending OR receiving `revoke_and_ack`.
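
For illustration, here's a minimal sketch of that classification in
Python (all the names here, like in_splice_lock_window, are my own
illustrative assumptions, not from any spec or implementation):

    from dataclasses import dataclass

    @dataclass
    class ChannelState:
        short_channel_id: str        # scid of the post-splice funding
        last_short_channel_id: str   # scid of the pre-splice funding
        in_splice_lock_window: bool  # mutual splice_locked -> first revoke_and_ack

    def classify_announcement_sigs(chan: ChannelState, msg_scid: str) -> str:
        if msg_scid == chan.short_channel_id:
            return "process"  # signatures for the current funding tx
        if chan.in_splice_lock_window and msg_scid == chan.last_short_channel_id:
            return "ignore"   # stale but expected during the race window
        return "error"        # anything else is a real protocol violation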


> It only means
> that sometimes, after sending `splice_locked`, you will receive some
> `commit_sig` messages that are for commitments that you don't care about
> anymore. You should be able to safely ignore those `commit_sig`. I have
> provided more details in the gist linked.
>
Yup, you should ignore invalid commit_sigs for this brief window of time,
but then go back to rejecting invalid commit_sigs after sending OR
receiving `revoke_and_ack`.
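
As a hedged sketch of that branch (Python; the function name, parameters,
and string results are illustrative only):

    def handle_commit_sig(sig_valid: bool, in_tolerance_window: bool) -> str:
        """Decide what to do with an incoming commit_sig."""
        if sig_valid:
            return "apply"
        if in_tolerance_window:
            # Stale signature for the pre-splice commitment: expected
            # during the race, safe to drop without warning or error.
            return "ignore"
        # Once a revoke_and_ack has been sent or received, invalid
        # signatures are hard failures again.
        return "fail_channel"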


> Let me know if I'm missing something, but I believe this is simply an
> edge case that implementations need to correctly handle, not a protocol
> issue? Or maybe I'm not understanding the scenario correctly?
>
It is a critical detail to implement correctly. Implementations without
good test coverage can easily ship releases that will break / force close
randomly during the splice lock process. What's worse is that this affects
the larger, more valuable/busy routing nodes more often and is easy to
miss on a simple regtest node 😳.

> By the way, I find your notation a bit hard to follow... I think that we
> really need to detail the exact message flow (like I did in the linked
> gist) to be able to explain protocol issues, otherwise there's always
> a risk that people think about a slightly different message flow, which
> means we'll just talk past each other...
>
The original gist from back in January is here:
https://gist.github.com/ddustin/7ee222eb31c3eac5b141c991c0937fae

Happy to add more details, but the example matches exactly the
messages sent during this window. I confirmed it with actual tests as well.

Is there some message it looks like I missed here? If it helps I can
give you an actual log message flow from the real tests, but honestly,
it just makes things more confusing to understand 🤷.

It might be simpler to follow by working backwards from this solution:
https://gist.github.com/ddustin/017aeadfbf34d2fcd950f1238614afe2

My original proposal from late January ^ 100% solves the issue but
adds a lock on the connection via an extra STFU step.
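
Roughly, that proposal serializes the lock via quiescence, something
like this (my paraphrase of the flow; see the gist above for the
authoritative version):

    Node A                          Node B
      |------------ stfu ------------>|
      |<----------- stfu -------------|
      |-------- splice_locked ------->|
      |<------- splice_locked --------|
      (connection quiescent: no commit_sig or announcement_signatures
       can race against the lock)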

My current proposal is essentially "let's allow the race condition and
observe which things become invalid during it." I settled on two
invalid events that occur:
* A stale `commitment_signed` message (bundle) with extra signatures
* A stale `announcement_signatures` message for the pre-splice channel

An "easy" solution an implementation could do is allow any invalid
`commitments_signed` or `announcement_sigs` to come through
and ignore them without warning or error. This "works" but leaves the
node in a messy state where actual errors are being squelched
and opens up a potential DoS issue.

So the ideal solution is to allow a certain class of 'stale' messages
for a window of time, that window running from:
A) mutual_splice_locked (the moment we have both sent & received
`splice_locked`)
until:
B) a successful `commitment_signed` <-> `revoke_and_ack` round trip (in
either direction)

The definition of A is pretty straightforward, and we need to detect
mutual_splice_locked anyway, so that's easy.

The definition of B could be a couple of things, but the simplest, I
believe, is the sending OR receiving of `revoke_and_ack`.
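
Concretely, here's a minimal sketch of that window as a little state
machine (Python; every name here is mine and purely illustrative, not
taken from any implementation):

    from dataclasses import dataclass

    @dataclass
    class SpliceLockWindow:
        sent_splice_locked: bool = False
        rcvd_splice_locked: bool = False
        window_open: bool = False

        def on_splice_locked(self, sent: bool) -> None:
            if sent:
                self.sent_splice_locked = True
            else:
                self.rcvd_splice_locked = True
            # (A) window opens at mutual_splice_locked
            if self.sent_splice_locked and self.rcvd_splice_locked:
                self.window_open = True

        def on_revoke_and_ack(self) -> None:
            # (B) the first revoke_and_ack, sent OR received, closes it
            self.window_open = False

        def tolerate_stale(self) -> bool:
            return self.window_open

While the window is open, stale `commitment_signed` /
`announcement_signatures` messages are ignored; once it closes, they
are rejected as errors again.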

Is it a little more work to implement it this way? Yeah. But I think the
effort is worth it to make the protocol & our nodes more robust.

Cheers,
Dusty
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
