I tried to work out a few scenarios with this, where the apply worker
will wait until its local clock hits 'remote_commit_tts - max_skew
permitted'. Please have a look.
Let's say, we have a GUC to configure max_clock_skew permitted.
Resolver is last_update_wins in both cases.
1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew.
Remote Update with commit_timestamp = 10.20AM.
Local clock (which is say 5 min behind) shows = 10.15AM.
When remote update arrives at local node, we see that skew is greater
than max_clock_skew and thus apply worker waits till local clock hits
'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the
local clock hits 10.20 AM, the worker applies the remote change with
commit_tts of 10.20AM. In the meantime (during the apply worker's wait
period), if some local update on the same row has happened at, say,
10.18am, that will be applied first and will later be overwritten by the
above remote change of 10.20AM, as the remote change's timestamp appears
later, even though it actually happened earlier than the local change.
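
To make the wait rule in Case 1 concrete, here is a minimal sketch. The helper name apply_wait_until and the dates are made up for illustration; only the rule "wait until the local clock hits remote commit_ts - max_clock_skew" comes from the discussion above.

```python
from datetime import datetime, timedelta

def apply_wait_until(remote_commit_ts, local_now, max_clock_skew):
    """Return the local-clock time at which the apply worker may proceed.

    Hypothetical helper: the worker waits only if the remote commit
    timestamp is ahead of the local clock by more than max_clock_skew.
    """
    if remote_commit_ts - local_now > max_clock_skew:
        return remote_commit_ts - max_clock_skew
    return local_now  # skew is within tolerance, apply immediately

remote_ts = datetime(2024, 1, 1, 10, 20)  # remote commit timestamp
local_now = datetime(2024, 1, 1, 10, 15)  # local clock, 5 min behind

# Case 1: max_clock_skew = 0, so the worker waits until 10:20 local time.
wait_until = apply_wait_until(remote_ts, local_now, timedelta(0))
print(wait_until.strftime("%H:%M"))
```

With max_clock_skew = 0 the wait target equals the remote commit timestamp, so any local update committed during the wait necessarily carries an earlier commit_ts than the remote change.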
For the sake of simplicity let's call the change that happened at
10:20 AM change-1 and the change that happened at 10:15 as change-2
and assume we are talking about the synchronous commit only.

Do you mean "the change that happened at 10:18 as change-2"

I think now from an application perspective the change-1 wouldn't have
caused the change-2 because we delayed applying change-2 on the local
node

Do you mean "we delayed applying change-1 on the local node."

which would have delayed the confirmation of the change-1 to the
application that means we have got the change-2 on the local node
without the confirmation of change-1 hence change-2 has no causal
dependency on the change-1.  So it's fine that we perform change-1
before change-2

Do you mean "So it's fine that we perform change-2 before change-1"

and the timestamp will also show the same at any other
node if they receive these 2 changes.
The goal is to ensure that if we define the order where change-2
happens before change-1, this same order should be visible on all
other nodes. This will hold true because the commit timestamp of
change-2 is earlier than that of change-1.

Considering the above corrections as base, I agree with this.

2)  Case 2: max_clock_skew is set to 2min.
Remote Update with commit_timestamp=10.20AM
Local clock (which is say 5 min behind) = 10.15AM.
Now apply worker will notice skew greater than 2min and thus will wait
till local clock hits 'remote's commit_tts - max_clock_skew' i.e.
10.18 and will apply the change with commit_tts of 10.20 (as we
always save the origin's commit timestamp into local commit_tts, see
RecordTransactionCommit->TransactionTreeSetCommitTsData). Now let's say
another local update is triggered at 10.19am; it will be applied
locally but it will be ignored on the remote node. On the remote node,
the existing change with a timestamp of 10.20 am will win, resulting in
data divergence.
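
The divergence in Case 2 can be replayed with a small sketch. This is only a model of the scenario, not PostgreSQL code: rows are (value, commit_ts) tuples, last_update_wins is a hypothetical stand-in for the resolver, and it assumes a local write simply overwrites the row without going through conflict resolution.

```python
from datetime import datetime

def last_update_wins(current, incoming):
    """Keep whichever (value, commit_ts) pair has the later commit_ts."""
    return incoming if incoming[1] > current[1] else current

def t(hh, mm):
    return datetime(2024, 1, 1, hh, mm)

change_1 = ("remote value", t(10, 20))  # committed on the remote node
change_2 = ("local value", t(10, 19))   # committed locally after change-1 is applied

# Local node: change-1 is applied at 10:18 local time but keeps the
# origin's commit_ts of 10:20; the later local update then overwrites
# the row (assumption: local writes bypass conflict resolution).
local_row = change_1
local_row = change_2

# Remote node: change-2 arrives and loses to the existing 10:20 row.
remote_row = last_update_wins(change_1, change_2)

print(local_row[0], "|", remote_row[0])
```

The two nodes end up with different values for the same row, which is the divergence described above.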
Let's call the 10:20 AM change change-1 and the change that
happened at 10:19 change-2.
IIUC, although we apply change-1 at 10:18 AM, the commit_ts of that
change is 10:20, and the same will be visible to all other nodes. So in
conflict resolution change-1 still happened after change-2, because
change-2's commit_ts is 10:19 AM. Now there could be a problem with the
causal order: because we applied change-1 at 10:18 AM, the application
might have gotten confirmation at 10:18 AM, and change-2 on the local
node may have been triggered as a result of that confirmation. That
means change-2 now has a causal dependency on change-1, but commit_ts
shows change-2 happened before change-1 on all the nodes.
So, is this acceptable? I think yes because the user has configured a
maximum clock skew of 2 minutes, which means the detected order might
not always align with the causal order for transactions occurring
within that time frame.

Agree. I had the same thoughts, and wanted to confirm my understanding.

Generally, the ideal configuration for
max_clock_skew should be a multiple of the network round trip time.
Assuming this configuration, we wouldn't encounter this problem
because for change-2 to be caused by change-1, the client would need
to get confirmation of change-1 and then trigger change-2, which would
take at least 2-3 network round trips.
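
One way to read the round-trip argument, as a rough sketch: change-1 is applied locally at (remote commit_ts - max_clock_skew), and a causally dependent change-2 cannot commit until some number of round trips after that. The function name, the default of 2 round trips, and the sample RTT are assumptions for illustration.

```python
from datetime import timedelta

def causal_inversion_possible(max_clock_skew, rtt, round_trips=2):
    """True if a causally dependent local commit could get an earlier
    commit_ts than the remote change that caused it.

    The remote change is applied at (remote_commit_ts - max_clock_skew)
    on the local clock; a dependent change needs at least
    round_trips * rtt after that before it can commit locally.
    """
    return round_trips * rtt < max_clock_skew

rtt = timedelta(milliseconds=50)  # hypothetical network round trip time

# max_clock_skew of 2 round trips: a dependent change cannot commit early enough.
print(causal_inversion_possible(timedelta(milliseconds=100), rtt))
# max_clock_skew of 2 minutes: the window is far larger than 2 round trips.
print(causal_inversion_possible(timedelta(minutes=2), rtt))
```

Under this reading, the causal order is preserved as long as max_clock_skew does not exceed the minimum time a client needs to observe change-1 and trigger change-2.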


