On 12-06-21 04:37 AM, Andres Freund wrote:
> Hi Steve,
> Thanks!


>> Attached is a detailed review of the patch.

> Very good analysis, thanks!
> Another reason why we cannot easily do 1) is that subtransactions aren't
> discernible from top-level transactions before the top-level commit happens;
> we can only properly merge in the right order (by "sorting" via lsn) once we
> have seen the commit record, which includes a list of all committed
> subtransactions.


Based on that and your comments further down in your reply (and that no one spoke up and disagreed with you), it sounds like doing (1) isn't going to be practical.

> I also don't think 1) would be particularly welcome by people trying to
> replicate into foreign systems.


They could still sort the changes into transaction groups before applying them to the foreign system.


> I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has
> been reached for one transaction all existing changes are spilled to disk. New
> changes again can be kept in memory till it's reached again.

Do you want max_changes_in_memory_per_txn, or do you want to put a limit on the total amount of memory that the cache is able to use? How are you going to tell a DBA to tune max_changes_in_memory_per_txn? They know how much memory their system has and how much of it they can devote to the apply cache versus other things; asking them to estimate how many open transactions they might have at a point in time, and how many WAL change records each transaction generates, seems like a step backwards from the progress we've been making in getting PostgreSQL to be easier to tune. The maximum number of transactions that can be open at a time is governed by max_connections on the master at the time the WAL was generated, so I don't even see how the machine processing the WAL records could autotune/guess that.
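
To make the kind of knob I have in mind concrete, here is a rough sketch of a cache-wide memory cap that spills the largest transaction once the cap is exceeded. None of the names below (apply_cache_max_memory, total_mem, size_in_memory, ApplyCacheTXN, ApplyCacheLargestTXN, ApplyCacheSpillToDisk) are from your patch; they're only there to illustrate the tuning model, assuming the usual backend headers:

/*
 * Sketch only: spill based on a single, cache-wide memory limit (say a
 * GUC apply_cache_max_memory, in bytes) instead of a per-transaction
 * change count.  A DBA can then reason about "how much memory may the
 * apply cache use" without estimating open transactions or change rates.
 */
static void
ApplyCacheCheckMemoryLimit(ApplyCache *cache)
{
    if (cache->total_mem < apply_cache_max_memory)
        return;

    /*
     * Over the limit: spill the largest in-progress transaction first,
     * since that frees the most memory per spill file, and repeat until
     * we are back under the limit.
     */
    while (cache->total_mem >= apply_cache_max_memory)
    {
        ApplyCacheTXN *txn = ApplyCacheLargestTXN(cache);   /* hypothetical */

        ApplyCacheSpillToDisk(cache, txn);                  /* hypothetical */
        cache->total_mem -= txn->size_in_memory;
        txn->size_in_memory = 0;
    }
}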



> We need to support serializing the cache for crash recovery + shutdown of the
> receiving side as well. Depending on how we do the wal decoding we will need
> it more frequently...


Have you described your thoughts on crash recovery on another thread?

I am thinking that this module would have to serialize some state every time it calls cache->commit() to ensure that consumers don't get invoked twice for the same transaction.
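
To make that concrete, the sort of thing I imagine is the cache (or whatever drives it) durably recording the commit LSN it last handed to consumers, flushed to disk before the upstream is told the data has been consumed. Everything below is hypothetical (the file name, the function, using a flat file at all; a real version would also want an atomic temp-file-plus-rename or a CRC); it's only meant to show the ordering requirement:

/* Sketch only; assumes the usual backend headers plus <fcntl.h>,
 * <unistd.h> and <sys/stat.h>. */
static void
ApplyCacheRememberCommit(XLogRecPtr commit_lsn)
{
    int     fd;
    char    path[MAXPGPATH];

    snprintf(path, sizeof(path), "pg_applycache/progress");

    fd = open(path, O_WRONLY | O_CREAT | PG_BINARY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        ereport(ERROR, (errmsg("could not open \"%s\": %m", path)));

    /*
     * The LSN must hit disk *before* we report the transaction as consumed
     * upstream; otherwise a crash between the two steps can hand consumers
     * the same transaction twice with no way for them to notice.
     */
    if (write(fd, &commit_lsn, sizeof(commit_lsn)) != sizeof(commit_lsn) ||
        pg_fsync(fd) != 0)
        ereport(ERROR, (errmsg("could not write \"%s\": %m", path)));

    close(fd);
}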

If the apply module is making changes to the same backend that the apply cache serializes to, then both the apply cache's state and the changes made by committed transactions will be persisted (or not persisted) together. But what if I am replicating from x86 to x86_64 via an apply module that does textout conversions?

x86            Proxy                          x86_64
 ----WAL---->  apply
               cache
                 |   (proxy catalog)
                 |
               apply module
                 textout  ------------------->
                            SQL statements


How do we ensure that the catalog on the proxy instance used for decoding WAL, the destination database, and the state + spill files of the apply cache all stay consistent (commits visible on all of them, or on none) in the event of a crash of either the proxy or the target? I don't think you can (unless we consider two-phase commit, and I'd rather we didn't). Can we come up with a way of avoiding the need for them to be consistent with each other?

I think apply modules will need to be able to be passed the same transaction twice (once before the crash and again after) and have a way of deciding whether that transaction has a) been applied to the translation/proxy catalog and b) been applied on the replica instance. How is the walreceiver going to decide which WAL segments it needs to re-process after a crash? I would want to see more of these details worked out before we finalize the interface between the apply cache and the apply modules, and how the serialization works.
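
For example (a sketch only; every name below is made up, not from the patch): if the replica records the origin commit LSN it last applied, updated in the same local transaction as the applied changes, then the apply module can detect a redelivered transaction after a crash and skip it, and the walreceiver can restart decoding from that recorded LSN:

/* Sketch of an idempotent commit callback; nothing here is from the patch. */
static XLogRecPtr last_applied_origin_lsn;  /* loaded from durable storage
                                             * at startup */

static void
apply_commit_txn(ApplyCache *cache, ApplyCacheTXN *txn, XLogRecPtr commit_lsn)
{
    if (commit_lsn <= last_applied_origin_lsn)
    {
        /* Redelivered after a crash: already applied, so just skip it. */
        return;
    }

    StartTransactionCommand();
    ApplyChangesForTXN(txn);            /* hypothetical: replay the changes */
    RecordAppliedOriginLSN(commit_lsn); /* hypothetical: update the progress
                                         * record in the *same* local txn */
    CommitTransactionCommand();

    last_applied_origin_lsn = commit_lsn;
}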


Code Review
===========

applycache.h
-----------------------
+typedef struct ApplyCacheTupleBuf
+{
+    /* position in preallocated list */
+    ilist_s_node node;
+
+    HeapTupleData tuple;
+    HeapTupleHeaderData header;
+    char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big the data in the transaction is? Wouldn't workloads that insert lots of small rows in a transaction eat up lots of memory that is allocated but mostly unused? The only alternative I can think of is dynamically allocating these, and I don't know what the cost/benefit of that overhead will be versus spilling to disk sooner.
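
To show what I mean by dynamic allocation (a sketch, not a concrete proposal; the struct and function below don't exist in the patch, and it assumes c.h's FLEXIBLE_ARRAY_MEMBER), the buffer could be sized to the tuple actually being stored:

/* Sketch: size each buffer to its tuple instead of MaxHeapTupleSize. */
typedef struct ApplyCacheTupleBufVar
{
    /* list membership, as in the original struct */
    ilist_s_node node;

    HeapTupleData tuple;
    /* the tuple header and data follow, allocated to the actual length */
    char        data[FLEXIBLE_ARRAY_MEMBER];
} ApplyCacheTupleBufVar;

static ApplyCacheTupleBufVar *
ApplyCacheCopyTuple(HeapTuple src)
{
    ApplyCacheTupleBufVar *buf;

    buf = (ApplyCacheTupleBufVar *)
        palloc(offsetof(ApplyCacheTupleBufVar, data) + src->t_len);

    /* copy the descriptor, then point it at our own copy of the data */
    buf->tuple = *src;
    memcpy(buf->data, src->t_data, src->t_len);
    buf->tuple.t_data = (HeapTupleHeader) buf->data;

    return buf;
}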

+* FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:

ApplyCacheReserveChangeStruct(...)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)

as ideas?

+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);

ApplyCacheReleaseChange(...)? I keep thinking of 'Return' as us returning the data somewhere, not the memory.
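
Partly that's because, reading calling code, I'd expect the pairing to look roughly like this (sketch only; decode_failed is made up):

ApplyCacheChange *change = ApplyCacheGetChange(cache);

/* ... decode the WAL record into *change ... */

if (decode_failed)
{
    /*
     * Nothing got queued, so hand the struct back to the cache's free
     * list -- "Release" (or "Free") says that more clearly than "Return".
     */
    ApplyCacheReturnChange(cache, change);
}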


applycache.c:
-------------------

I've taken a quick look through this file and I don't see any issues other than the many FIXMEs and other issues you've identified already, which I don't expect you to address in this CF.

> Andres



