> I haven't looked at the `announcement_signatures` case yet, but at least
> for the `commit_sig` case this should never be an issue.

Yup, the issue is the same for `announcement_signatures`. Storing the
last_short_channel_id is key here, so you know which announcement messages
you can safely ignore and which should be treated as errors. This behavior
should stay in effect until sending OR receiving `revoke_and_ack`.
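A rough sketch of the idea, in Python for illustration (all names here are hypothetical, not from any real implementation): keep the pre-splice short_channel_id around, and use it to tell a stale-but-expected `announcement_signatures` apart from a genuinely invalid one while the post-splice-lock tolerance window is open.

```python
# Hypothetical sketch: classify incoming announcement_signatures during the
# window between mutual splice_locked and the next revoke_and_ack.

class Channel:
    def __init__(self, short_channel_id):
        self.short_channel_id = short_channel_id   # current scid
        self.last_short_channel_id = None          # pre-splice scid, if any
        self.in_splice_lock_window = False         # tolerance window flag

    def on_splice_locked_both_sides(self, new_scid):
        # Both peers have exchanged splice_locked: remember the old scid
        # and open the tolerance window.
        self.last_short_channel_id = self.short_channel_id
        self.short_channel_id = new_scid
        self.in_splice_lock_window = True

    def on_revoke_and_ack(self):
        # Sending OR receiving revoke_and_ack closes the window.
        self.in_splice_lock_window = False

    def on_announcement_signatures(self, scid):
        if scid == self.short_channel_id:
            return "process"
        if self.in_splice_lock_window and scid == self.last_short_channel_id:
            return "ignore"   # stale but expected during the race window
        return "error"        # genuinely invalid
```

The key point is that "ignore" is only returned while the window is open; afterwards the same message is an error again, so real misbehavior is not squelched indefinitely.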
> It only means that sometimes, after sending `splice_locked`, you will
> receive some `commit_sig` messages that are for commitments that you
> don't care about anymore. You should be able to safely ignore those
> `commit_sig`. I have provided more details in the gist linked.

Yup, you should ignore invalid commit_sigs for this brief period of time,
but then go back to rejecting invalid commit_sigs after sending OR
receiving `revoke_and_ack`.

> Let me know if I'm missing something, but I believe this is simply an
> edge case that implementations need to correctly handle, not a protocol
> issue? Or maybe I'm not understanding the scenario correctly?

It is a critical detail to implement correctly. Implementations without
good test coverage can easily ship releases that will break / force close
randomly during the splice lock process. What's worse is that this affects
the larger, more valuable/busy routing nodes more often, and it is easy to
miss on a simple regtest node 😳.

> By the way, I find your notation a bit hard to follow... I think that we
> really need to detail the exact message flow (like I did in the linked
> gist) to be able to explain protocol issues, otherwise there's always
> a risk that people think about a slightly different message flow, which
> means we'll just talk past each other...

The original gist from back in January is here:
https://gist.github.com/ddustin/7ee222eb31c3eac5b141c991c0937fae

Happy to add more details, but the example is accurate to exactly the
messages sent during this time. I confirmed it with actual tests as well.
Is there some message that it looks like I missed here? If it helps I can
give you an actual log message flow from the real tests, but honestly,
they just make it more confusing to understand 🤷.
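The commit_sig rule described above can be sketched the same way (again a hypothetical illustration, with made-up function and parameter names): a batch carrying signatures for commitments we no longer track is silently dropped while the window is open, and is a protocol error otherwise.

```python
# Hypothetical sketch: decide what to do with an incoming commit_sig batch
# during and after the post-splice-lock tolerance window.

def handle_commit_sig(num_sigs_received, num_active_commitments, in_window):
    """Return 'process', 'ignore' or 'error' for a commit_sig batch."""
    if num_sigs_received == num_active_commitments:
        return "process"
    if in_window and num_sigs_received > num_active_commitments:
        # The peer signed for pre-splice commitments it had not yet
        # discarded when our splice_locked crossed its commit_sig in flight.
        return "ignore"
    return "error"
```

Outside the window (`in_window=False`), any mismatch in signature count goes back to being rejected, which avoids the DoS concern of ignoring invalid messages forever.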
It might be simpler to follow by working backwards from this solution:
https://gist.github.com/ddustin/017aeadfbf34d2fcd950f1238614afe2

My original proposal from late January (^) 100% solves the issue but adds
a lock on the connection via an extra STFU step. My current proposal is
essentially "let's allow the race condition and observe which things
become invalid during it."

I settled on two invalid events that occur:

* A stale `commitment_signed` message (bundle) with extra signatures
* A stale `announcement_signatures` message for the pre-splice channel

An "easy" solution an implementation could adopt is to let any invalid
`commitment_signed` or `announcement_signatures` come through and ignore
them without warning or error. This "works" but leaves the node in a messy
state where actual errors are being squelched, and it opens up a potential
DoS issue.

So the ideal solution is to allow a certain class of 'stale' messages for
a period of time, that time running from:

A) mutual_splice_locked (the moment we have both sent & received
   `splice_locked`), to
B) a successful `commitment_signed` <-> `revoke_and_ack` round trip
   (in either direction).

The definition of A is pretty straightforward, and we need to detect
mutual_splice_locked anyway, so that's easy. The definition of B could be
a couple of things, but the simplest, I believe, is the sending OR
receiving of `revoke_and_ack`.

Is it a little more work to implement it this way? Yeah. But I think the
effort is worth it to make the protocol & our nodes more robust.

Cheers,
Dusty
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev