Hey,

In testing the `splice_locked` workflow I discovered a race condition that
it is critical we solve correctly. The core problem occurs if any channel
activity happens in the window after `splice_locked` is sent and before the
peer's `splice_locked` is received.

A splice is defined as locked once `splice_locked` has been both sent and
received. It is fairly trivial to build a test case for this: have a node
continually spam payments while `splice_lock`ing is occurring and the
race condition will trigger relatively often.

The race condition affects two messages in particular: `commitment_signed`
and `announcement_signatures`. Below is an example of how it occurs with
`commitment_signed`, but the flow is essentially the same for
`announcement_signatures`:

Legend:
  Item -> means sent
  Item <- means received
  Chan X implies a channel at block height X
  (since these happen at different times)

Splice locked race condition example

Node A.                            Node B.
  * Channel starts at block height 100
splice_locked ->
                                   <- splice_locked
                                   <- commitment_signed (Chan 100)
                                   -> splice_locked
                                   Node B now considers splice locked (Chan 106)
                                   <- commitment_signed (Chan 106)
splice_locked <-
Node A now considers splice locked (Chan 106)
commitment_signed <- (Chan 100)
commitment_signed <- (Chan 106)

Node A considers the commitment_signed for Chan 100 invalid.
The commitment_signed for Chan 106 is, however, valid.

This example uses commitment_signed, but the same problem arises for any
message that depends on channel state.
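
To make the ordering concrete, here is a toy simulation of the example above.
It is only a sketch under assumptions: the `Node` class, its fields and
`run_race` are hypothetical names, "validation" is reduced to comparing the
funding height a message was built for against the receiver's active one
(real validation checks signatures), and the exact interleaving is one
reading of the diagram.

```python
from collections import deque
from dataclasses import dataclass, field

OLD_HEIGHT, NEW_HEIGHT = 100, 106  # "Chan 100" and "Chan 106" from the example


@dataclass
class Node:
    name: str
    active: int = OLD_HEIGHT  # funding this node currently validates against
    sent_lock: bool = False
    recv_lock: bool = False
    inbox: deque = field(default_factory=deque)
    log: list = field(default_factory=list)

    def send(self, peer, kind):
        if kind == "splice_locked":
            self.sent_lock = True
            self._maybe_lock()
            peer.inbox.append((kind, None))
        else:
            # commitment_signed is built against the sender's active funding
            peer.inbox.append((kind, self.active))

    def receive(self):
        kind, chan = self.inbox.popleft()
        if kind == "splice_locked":
            self.recv_lock = True
            self._maybe_lock()
        else:
            verdict = "valid" if chan == self.active else "invalid"
            self.log.append((chan, verdict))

    def _maybe_lock(self):
        # Locked only once splice_locked has been both sent and received.
        if self.sent_lock and self.recv_lock:
            self.active = NEW_HEIGHT


def run_race():
    a, b = Node("A"), Node("B")
    a.send(b, "splice_locked")
    b.send(a, "splice_locked")
    b.send(a, "commitment_signed")  # B not locked yet: built for Chan 100
    b.receive()                     # B gets A's splice_locked, now on Chan 106
    b.send(a, "commitment_signed")  # built for Chan 106
    a.receive()                     # A gets splice_locked, now on Chan 106
    a.receive()                     # stale commitment_signed (Chan 100)
    a.receive()                     # commitment_signed (Chan 106)
    return a.log


print(run_race())  # [(100, 'invalid'), (106, 'valid')]
```

The stale message is rejected only because it was in flight while Node A
completed its side of the lock; neither node did anything wrong on its own.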

The solution requires temporarily storing two items:
* [scid] last_short_channel_id (the pre-splice short channel id)
* [bool] splice_await_commitment_succcess

After sending & receiving `splice_locked` (so-called 'mutual splice lock'),
the last_short_channel_id should be set to the pre-splice short channel id
and splice_await_commitment_succcess should be set to true.

If an `announcement_signatures` is received with an scid matching
`last_short_channel_id` the message should be ignored and the channel
connection should not be aborted (as it normally would be).

If a `commitment_signed` message is received with the
tlv splice_info->splice_channel_id set to something other than the
successfully confirmed splice channel_id, the message should be ignored.

Once a revoke_and_ack is successfully sent OR received,
`last_short_channel_id` and `splice_await_commitment_succcess` should be
reset and normal validation of `announcement_signatures` and
`commitment_signed` should be resumed.
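
The rules above could be sketched roughly as follows. This is not taken from
any implementation: the `SpliceGrace` class, its method names, the string
return values and the placeholder ids in the usage lines are all hypothetical.
It also assumes that a `commitment_signed` without the splice_info tlv goes
through normal validation, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpliceGrace:
    """Hypothetical holder for the two temporary items described above."""
    confirmed_channel_id: str                       # channel_id of the confirmed splice
    last_short_channel_id: Optional[str] = None     # pre-splice scid
    splice_await_commitment_succcess: bool = False  # spelled as in the proposal

    def on_mutual_splice_lock(self, pre_splice_scid: str) -> None:
        # Called once splice_locked has been both sent and received.
        self.last_short_channel_id = pre_splice_scid
        self.splice_await_commitment_succcess = True

    def on_announcement_signatures(self, scid: str) -> str:
        # A stale announcement for the pre-splice scid is dropped, not fatal.
        if self.last_short_channel_id is not None and scid == self.last_short_channel_id:
            return "ignore"
        return "validate"

    def on_commitment_signed(self, splice_channel_id: Optional[str]) -> str:
        # Assumption: a missing tlv (None) falls through to normal validation.
        if (self.splice_await_commitment_succcess
                and splice_channel_id is not None
                and splice_channel_id != self.confirmed_channel_id):
            return "ignore"
        return "validate"

    def on_revoke_and_ack(self) -> None:
        # Sent OR received: reset both items and resume strict validation.
        self.last_short_channel_id = None
        self.splice_await_commitment_succcess = False


state = SpliceGrace(confirmed_channel_id="chan-106")   # placeholder ids
state.on_mutual_splice_lock("100x1x0")
print(state.on_announcement_signatures("100x1x0"))     # ignore
print(state.on_commitment_signed("chan-100"))          # ignore
print(state.on_commitment_signed("chan-106"))          # validate
state.on_revoke_and_ack()
print(state.on_announcement_signatures("100x1x0"))     # validate
```

Returning "validate" just hands the message to the normal checks, so the
relaxed handling only applies between mutual splice lock and the first
revoke_and_ack.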

This solves the race condition while preserving as strict a validation of
messages as possible and removes the need to add new fields to these
messages.

Cheers,
Dusty
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
