(Resending this because somehow my previous post did not appear in the
mail archives)
---------- Forwarded message ---------
From: Peter Smith
Date: Fri, Dec 2, 2022 at 7:59 PM
Subject: Re: Perform streaming logical transactions by background
workers and parallel apply
To: houzj.f...@fujitsu.com
Cc: Amit Kapila, Masahiko Sawada, wangw.f...@fujitsu.com,
Dilip Kumar, shiy.f...@fujitsu.com, PostgreSQL Hackers
Here are my review comments for patch v54-0001.
==
FILE: .../replication/logical/applyparallelworker.c
1. File header comment
1a.
+ * This file contains the code to launch, set up, and teardown parallel apply
+ * worker which receives the changes from the leader worker and invokes
+ * routines to apply those on the subscriber database.
"parallel apply worker" -> "a parallel apply worker"
~
1b.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelApplyWorkerInfo which is required to communicate
+ * among leader and parallel apply workers.
"that are intended to support" -> "for"
"required to communicate among leader and parallel apply workers." ->
"required so the leader worker and parallel apply workers can
communicate with each other."
~
1c.
+ *
+ * The parallel apply workers are assigned (if available) as soon as xact's
+ * first stream is received for subscriptions that have set their 'streaming'
+ * option as parallel. The leader apply worker will send changes to this new
+ * worker via shared memory. We keep this worker assigned till the transaction
+ * commit is received and also wait for the worker to finish at commit. This
+ * preserves commit ordering and avoid file I/O in most cases, although we
+ * still need to spill to a file if there is no worker available. See comments
+ * atop logical/worker to know more about streamed xacts whose changes are
+ * spilled to disk. It is important to maintain commit order to avoid failures
+ * due to (a) transaction dependencies, say if we insert a row in the first
+ * transaction and update it in the second transaction on publisher then
+ * allowing the subscriber to apply both in parallel can lead to failure in the
+ * update. (b) deadlocks, allowing transactions that update the same set of
+ * rows/tables in the opposite order to be applied in parallel can lead to
+ * deadlocks.
"due to (a)" -> "due to: "
"(a) transaction dependencies, " -> "(a) transaction dependencies - "
". (b) deadlocks, " => "; (b) deadlocks - "
~
1d.
+ *
+ * We maintain a worker pool to avoid restarting workers for each streaming
+ * transaction. We maintain each worker's information in the
+ * ParallelApplyWorkersList. After successfully launching a new worker, its
+ * information is added to the ParallelApplyWorkersList. Once the worker
+ * finishes applying the transaction, we mark it available for re-use. Now,
+ * before starting a new worker to apply the streaming transaction, we check
+ * the list for any available worker. Note that we maintain a maximum of half
+ * the max_parallel_apply_workers_per_subscription workers in the pool and
+ * after that, we simply exit the worker after applying the transaction.
+ *
"We maintain a worker pool" -> "A worker pool is used"
"We maintain each worker's information" -> "We maintain each worker's
information (ParallelApplyWorkerInfo)"
"we mark it available for re-use" -> "it is marked as available for re-use"
"Note that we maintain a maximum of half" -> "Note that we retain a
maximum of half"
~
1e.
+ * XXX This worker pool threshold is a bit arbitrary and we can provide a GUC
+ * variable for this in the future if required.
"a bit arbitrary" -> "arbitrary"
~
1f.
+ *
+ * The leader apply worker will create a separate dynamic shared memory segment
+ * when each parallel apply worker starts. The reason for this design is that
+ * we cannot count how many workers will be started. It may be possible to
+ * allocate enough shared memory in one segment based on the maximum number of
+ * parallel apply workers (max_parallel_apply_workers_per_subscription), but
+ * this would waste memory if no process is actually started.
+ *
"we cannot count how many workers will be started." -> "we cannot
predict how many workers will be needed."
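
To illustrate the trade-off this paragraph describes: each launch creates
its own small segment, so memory is only consumed for workers that actually
start. A sketch (the magic number and queue sizes below are made up, not
from the patch):

/* Sketch: one DSM segment per launched worker. Constants are illustrative. */
#define PA_SHM_MAGIC            0x50410001  /* assumed magic, not the patch's */
#define PA_QUEUE_SIZE           (16 * 1024)
#define PA_ERROR_QUEUE_SIZE     (16 * 1024)

static dsm_segment *
pa_setup_dsm(shm_toc **tocp)
{
    shm_toc_estimator e;
    Size        segsize;
    dsm_segment *seg;

    /* Estimate space for the three pieces listed in the next paragraph. */
    shm_toc_initialize_estimator(&e);
    shm_toc_estimate_chunk(&e, sizeof(ParallelApplyWorkerShared));
    shm_toc_estimate_chunk(&e, PA_QUEUE_SIZE);
    shm_toc_estimate_chunk(&e, PA_ERROR_QUEUE_SIZE);
    shm_toc_estimate_keys(&e, 3);
    segsize = shm_toc_estimate(&e);

    /*
     * Created at worker-launch time, so we never reserve memory for the
     * max_parallel_apply_workers_per_subscription workers that might not
     * be started at all.
     */
    seg = dsm_create(segsize, 0);
    *tocp = shm_toc_create(PA_SHM_MAGIC, dsm_segment_address(seg), segsize);

    return seg;
}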
~
1g.
+ * The dynamic shared memory segment will contain (a) a shm_mq that is used to
+ * send changes in the transaction from leader apply worker to parallel apply
+ * worker (b) another shm_mq that is used to send errors (and other messages
+ * reported via elog/ereport) from the parallel apply worker to leader apply
+ * worker (c) necessary information to be shared among parallel apply workers
+ * and leader apply worker (i.e. members of ParallelApplyWorkerShared).
"will contain (a)" => "contains: (a)"
"worker (b)" -> "worker; (b)
"worker (c)" -> "worker; (c)"
"and leader apply worker" -> "and the leader apply worker"
~
1h.
+ *
+ * Locking Considerations
+ * ----------------------
+ * Since the