On 12/22/21 05:56, Fujii Masao wrote:


On 2021/12/22 10:57, Tomas Vondra wrote:


On 12/19/21 04:03, Amit Kapila wrote:
On Sat, Dec 18, 2021 at 7:24 AM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:

while working on logical decoding of sequences, I ran into an issue with
nextval() in a transaction that rolls back, described in [1]. But after
thinking about it a bit more (and chatting with Petr Jelinek), I think
this issue affects physical sync replication too.

Imagine you have a primary <-> sync_replica cluster, and you do this:

    CREATE SEQUENCE s;

    -- shutdown the sync replica

    BEGIN;
    SELECT nextval('s') FROM generate_series(1,50);
    ROLLBACK;

    BEGIN;
    SELECT nextval('s');
    COMMIT;

The natural expectation would be the COMMIT gets stuck, waiting for the
sync replica (which is not running), right? But it does not.


How about if we always WAL log the first sequence change in a transaction?


I've been thinking about doing something like this, but I think it would not have any significant advantages compared to using "SEQ_LOG_VALS 0". It would still have the same performance hit for plain nextval() calls, and there's no measurable impact on simple workloads that already write WAL in transactions even with SEQ_LOG_VALS 0.

Just idea; if wal_level > minimal, how about making nextval_internal() (1) check whether WAL is replicated to sync standbys, up to the page lsn of the sequence, and (2) forcibly emit a WAL record if not replicated yet? The similar check is performed at the beginning of SyncRepWaitForLSN(), so probably we can reuse that code.


Interesting idea, but I think it has a couple of issues :-(

1) We'd need to know the LSN of the last WAL record for any given sequence, and we'd need to communicate that between backends somehow. Which seems rather tricky to do without affecting performance.

2) SyncRepWaitForLSN() is used only in commit-like situations, and it's a simple wait, not a decision to write more WAL. Environments without sync replicas are affected by this too - yes, the data loss issue is not there, but the amount of WAL is still increased.

IIRC sync_standby_names can change while a transaction is running, even just right before commit, at which point we can't just go back in time and generate WAL for sequences accessed earlier. But we still need to ensure the sequence is properly replicated.

3) I don't think it'd actually reduce the amount of WAL records in environments with many sessions (incrementing the same sequence). In those cases the WAL (generated by in-progress xact from another session) is likely to not be flushed, so we'd generate the extra WAL record. (And if the other backends would need flush LSN of this new WAL record, which would make it more likely they have to generate WAL too.)


So I don't think this would actually help much.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply via email to