On Wed, Sep 4, 2024 at 12:23 PM shveta malik <shveta.ma...@gmail.com> wrote:
>
> Hello hackers,
> (Cc people involved in the earlier discussion)
>
> I would like to discuss the $Subject.
>
> While discussing Logical Replication's Conflict Detection and
> Resolution (CDR) design in [1], it came to our notice that the
> commit LSN and timestamp may not correlate perfectly i.e. commits may
> happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
> because, during the commit process, the timestamp (xactStopTimestamp)
> is captured slightly earlier than when space is reserved in the WAL.
>
> ~~
>
> Reproducibility of conflict-resolution problem due to the timestamp inversion
> ------------------------------------------------
> It was suggested that timestamp inversion *may* impact the time-based
> resolutions such as last_update_wins (targeted to be implemented in
> [1]) as we may end up making wrong decisions if timestamps and LSNs
> are not correctly ordered. And thus we tried some tests but failed to
> find any practical scenario where it could be a problem.
>
> Basically, the proposed conflict resolution is a row-level resolution,
> and to cause the row value to be inconsistent, we need to modify the
> same row in concurrent transactions and commit the changes
> concurrently. But this doesn't seem possible because concurrent
> updates on the same row are disallowed (e.g., the later update will be
> blocked due to the row lock). See [2] for the details.
>
> We tried to give some thoughts on multi table cases as well e.g.,
> update table A with foreign key and update the table B that table A
> refers to. But update on table A will block the update on table B as
> well, so we could not reproduce data-divergence due to the
> LSN/timestamp mismatch issue there.
>
> ~~
>
> Idea proposed to fix the timestamp inversion issue
> ------------------------------------------------
> There was a suggestion in [3] to acquire the timestamp while reserving
> the space (because that happens in LSN order). The clock would need to
> be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
> main problem why it's being done outside the critical section, because
> gettimeofday() may be quite expensive. There's a concept of hybrid
> clock, combining "time" and logical counter, which might be useful
> independently of CDR.
>
> On further analyzing this idea, we found that CLOCK_MONOTONIC can be
> accepted only by clock_gettime() which has more precision than
> gettimeofday() and thus is equally or more expensive theoretically (we
> plan to test it and post the results). It does not look like a good
> idea to call any of these when holding spinlock to reserve the wal
> position. As for the suggested solution "hybrid clock", it might not
> help here because the logical counter is only used to order the
> transactions with the same timestamp. The problem here is how to get
> the timestamp along with wal position
> reservation(ReserveXLogInsertLocation).
>

Here are the tests done to compare clock_gettime() and gettimeofday()
performance.

Machine details:
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
CPU(s): 120; 800GB RAM

Three functions were tested across three different call volumes
(1 million, 100 million, and 1 billion):
1) clock_gettime() with CLOCK_REALTIME
2) clock_gettime() with CLOCK_MONOTONIC
3) gettimeofday()

--> clock_gettime() with CLOCK_MONOTONIC sometimes shows slightly
better performance, but not consistently. The difference in time taken
by all three functions is minimal, with averages varying by no more
than ~2.5%. Overall, the performance of CLOCK_MONOTONIC and
gettimeofday() is essentially the same.

Below are the test results (each test was run twice for consistency).

1) For 1 million calls:
 1a) clock_gettime() with CLOCK_REALTIME:
    - Run 1: 0.01770 seconds, Run 2: 0.01772 seconds, Average: 0.01771 seconds.
 1b) clock_gettime() with CLOCK_MONOTONIC:
    - Run 1: 0.01753 seconds, Run 2: 0.01748 seconds, Average: 0.01750 seconds.
 1c) gettimeofday():
    - Run 1: 0.01742 seconds, Run 2: 0.01777 seconds, Average: 0.01760 seconds.

2) For 100 million calls:
 2a) clock_gettime() with CLOCK_REALTIME:
    - Run 1: 1.76649 seconds, Run 2: 1.76602 seconds, Average: 1.76625 seconds.
 2b) clock_gettime() with CLOCK_MONOTONIC:
    - Run 1: 1.72768 seconds, Run 2: 1.72988 seconds, Average: 1.72878 seconds.
 2c) gettimeofday():
    - Run 1: 1.72436 seconds, Run 2: 1.72174 seconds, Average: 1.72305 seconds.

3) For 1 billion calls:
 3a) clock_gettime() with CLOCK_REALTIME:
    - Run 1: 17.63859 seconds, Run 2: 17.65529 seconds, Average: 17.64694 seconds.
 3b) clock_gettime() with CLOCK_MONOTONIC:
    - Run 1: 17.15109 seconds, Run 2: 17.27406 seconds, Average: 17.21257 seconds.
 3c) gettimeofday():
    - Run 1: 17.21368 seconds, Run 2: 17.22983 seconds, Average: 17.22175 seconds.

~~~~

The scripts used for the tests are attached.

--
Thanks,
Nisha
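
For readers without the attachment, the kind of measurement described
above can be approximated by a simple timing loop. The following is only
a minimal sketch and is not the attached script: the names bench_kind,
bench_calls() and elapsed_seconds() are hypothetical, and it assumes a
POSIX system where clock_gettime() and gettimeofday() are available.
It defines helper functions only; a caller supplies main().

```c
#define _XOPEN_SOURCE 700		/* expose clock_gettime() and gettimeofday() */
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical names for this sketch; not taken from the attached scripts. */
typedef enum
{
	BENCH_REALTIME,				/* clock_gettime(CLOCK_REALTIME) */
	BENCH_MONOTONIC,			/* clock_gettime(CLOCK_MONOTONIC) */
	BENCH_GETTIMEOFDAY			/* gettimeofday() */
} bench_kind;

/* Difference between two timespec readings, in seconds. */
static double
elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
	return (double) (end->tv_sec - start->tv_sec) +
		(double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Call the selected time source ncalls times and return the total elapsed
 * time in seconds, measured with CLOCK_MONOTONIC around the loop.
 */
static double
bench_calls(bench_kind kind, long ncalls)
{
	struct timespec start, end, ts;
	struct timeval tv;
	volatile long sink = 0;		/* consume results so they are not dropped */
	int			rc;

	rc = clock_gettime(CLOCK_MONOTONIC, &start);
	assert(rc == 0);

	for (long i = 0; i < ncalls; i++)
	{
		switch (kind)
		{
			case BENCH_REALTIME:
				clock_gettime(CLOCK_REALTIME, &ts);
				sink += ts.tv_nsec;
				break;
			case BENCH_MONOTONIC:
				clock_gettime(CLOCK_MONOTONIC, &ts);
				sink += ts.tv_nsec;
				break;
			case BENCH_GETTIMEOFDAY:
				gettimeofday(&tv, NULL);
				sink += tv.tv_usec;
				break;
		}
	}

	rc = clock_gettime(CLOCK_MONOTONIC, &end);
	assert(rc == 0);
	(void) rc;
	(void) sink;

	return elapsed_seconds(&start, &end);
}
```

Calling bench_calls(BENCH_MONOTONIC, 1000000) from main() and printing
the result gives one "run" in the sense used above. The switch inside
the loop adds a small, roughly constant overhead to all three cases, so
the sketch is suited to comparing the functions with each other rather
than to measuring the absolute cost of a single call.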
<<attachment: clock_gettime_test.zip>>