Hey,

In testing the `splice_locked` workflow I discovered a race condition that is critical for us to solve correctly. The core problem occurs if any channel activity happens after `splice_locked` is sent and before `splice_locked` is received.
`splice_locked` is defined as being locked once it is both sent and received. It is fairly trivial to build a test case for this: have a node continually spam payments while `splice_locked` is being exchanged, and the race condition will trigger relatively often.

The race condition affects two messages in particular: `commitment_signed` and `announcement_signatures`. Below is an example of how it occurs with `commitment_signed`; the flow is essentially the same for `announcement_signatures`.

Legend:
  Item -> means sent
  Item <- means received
  Chan X (implies a channel at block height X)
  (Since these happen at different times)

Splice locked race condition example

Node A                                          Node B
* Channel starts at block height 100
splice_locked ->
                                                <- splice_locked
                                                <- commitment_signed (Chan 100)
                                                -> splice_locked
                                                Node B now considers splice locked (Chan 106)
                                                <- commitment_signed (Chan 106)
splice_locked <-
Node A now considers splice locked (Chan 106)
commitment_signed <- (Chan 100)
commitment_signed <- (Chan 106)

Node A considers the `commitment_signed` for Chan 100 invalid. The `commitment_signed` for Chan 106 is, however, valid. This example uses `commitment_signed`, but the problem remains for any message that depends on channel state.

The solution requires temporarily storing two items:

* [scid] last_short_channel_id (the pre-splice short channel id)
* [bool] splice_await_commitment_succcess

After sending and receiving `splice_locked` (the so-called 'mutual splice lock'), `last_short_channel_id` should be set to the pre-splice short channel id and `splice_await_commitment_succcess` should be set to true.

If an `announcement_signatures` is received with an scid matching `last_short_channel_id`, the message should be ignored and the channel connection should not be aborted (as it normally would be).

If a `commitment_signed` message is received with the tlv splice_info->splice_channel_id set to something other than the successfully confirmed splice channel_id, the message should be ignored.
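
As a rough illustration, the two stored items and the two ignore rules above could be sketched in Python as follows. This is only a sketch under stated assumptions, not code from any implementation: the class `SpliceRaceGuard`, its method names, the extra `confirmed_splice_channel_id` field, and the use of plain strings for ids are all hypothetical. The field name `splice_await_commitment_succcess` is spelled as in the text. The sketch also includes the reset on `revoke_and_ack` described in the next paragraph so that it is complete, and it does not cover a `commitment_signed` that lacks the splice tlv.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpliceRaceGuard:
    """Temporary state kept after mutual splice lock (hypothetical helper)."""

    # Pre-splice short channel id; None while the guard is not armed.
    last_short_channel_id: Optional[str] = None
    # Flag name spelled as in the proposal text.
    splice_await_commitment_succcess: bool = False
    # Assumption: the confirmed splice channel id is remembered here
    # so that incoming commitment_signed messages can be compared to it.
    confirmed_splice_channel_id: Optional[str] = None

    def on_mutual_splice_lock(self, pre_splice_scid: str,
                              splice_channel_id: str) -> None:
        """Arm the guard once splice_locked was both sent and received."""
        self.last_short_channel_id = pre_splice_scid
        self.splice_await_commitment_succcess = True
        self.confirmed_splice_channel_id = splice_channel_id

    def should_ignore_announcement_signatures(self, scid: str) -> bool:
        """True if the message refers to the pre-splice scid (stale)."""
        return (self.last_short_channel_id is not None
                and scid == self.last_short_channel_id)

    def should_ignore_commitment_signed(self, splice_channel_id: str) -> bool:
        """True if the tlv names a channel other than the confirmed splice."""
        return (self.splice_await_commitment_succcess
                and splice_channel_id != self.confirmed_splice_channel_id)

    def on_revoke_and_ack(self) -> None:
        """Disarm the guard; normal validation resumes afterwards."""
        self.last_short_channel_id = None
        self.splice_await_commitment_succcess = False
        self.confirmed_splice_channel_id = None
```

In the example flow, Node A would call `on_mutual_splice_lock` when it considers the splice locked, silently drop the late `commitment_signed` for Chan 100 because `should_ignore_commitment_signed` returns true for it, and process the one for Chan 106 normally. A caller is expected to fall through to its usual strict validation whenever either check returns false.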

Once a `revoke_and_ack` is successfully sent OR received, `last_short_channel_id` and `splice_await_commitment_succcess` should be reset and normal validation of `announcement_signatures` and `commitment_signed` should be resumed.

This solves the race condition while preserving as strict a validation of messages as possible, and it removes the need to add new fields to these messages.

Cheers,
Dusty
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev