Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

Steve Singer Wed, 20 Jun 2012 17:17:44 -0700

On 12-06-13 07:28 AM, Andres Freund wrote:

From: Andres Freund<and...@anarazel.de>


The individual changes need to be identified by an xid. The xid can be a
subtransaction or a toplevel one; at commit those can be reintegrated by doing
a k-way mergesort between the individual transactions.
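
As a rough illustration of the k-way merge mentioned above, here is a minimal Python sketch (not the patch's C code). It assumes each toplevel transaction or subtransaction keeps its own change list already ordered by WAL position; the `Change` record and its `lsn` field are hypothetical names used only for this example.

```python
import heapq
from collections import namedtuple

# Hypothetical change record; lsn stands for the position in the WAL stream.
Change = namedtuple("Change", ["lsn", "xid", "action"])

def merge_by_lsn(*change_lists):
    """K-way merge of per-(sub)transaction lists, each already sorted by lsn."""
    return list(heapq.merge(*change_lists, key=lambda c: c.lsn))

# One toplevel transaction (xid 100) and two subtransactions (101, 102).
top = [Change(10, 100, "INSERT a"), Change(40, 100, "UPDATE a")]
sub1 = [Change(20, 101, "INSERT b")]
sub2 = [Change(30, 102, "DELETE c"), Change(50, 102, "INSERT d")]

merged = merge_by_lsn(top, sub1, sub2)
for change in merged:
    print(change.lsn, change.xid, change.action)
```

The merge only interleaves lists that are each sorted already, so its cost grows with the total number of changes times the log of the number of lists.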

Callbacks for apply_begin, apply_change and apply_commit are provided to
retrieve complete transactions.

Missing:
- spill-to-disk
- correct subtransaction merge, current behaviour is simple/wrong
- DDL handling (?)
- resource usage controls

Here is an initial review of the ApplyCache patch.

This patch provides a module that takes actions from the WAL stream, groups them by transaction, and then passes the resulting change records to a set of plugin functions.

For each transaction it encounters it keeps a list of the actions in that transaction. The ilist included in an earlier patch is used; changes resulting from that patch review would affect the code here, but not in a way that changes the design. When the module sees a commit for a transaction it calls the apply_change callback for each change.
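
To make that data flow concrete, here is a minimal Python sketch of the buffering behaviour as I understand it. It is not the patch's C implementation: the class name, method names and callback signatures are assumptions for illustration, and only the callback names apply_begin, apply_change and apply_commit come from the patch description.

```python
from collections import defaultdict

class ApplyCacheSketch:
    """Buffers changes per xid and replays them when the commit is seen."""

    def __init__(self, apply_begin, apply_change, apply_commit):
        # Callback signatures here are assumed, not taken from the patch.
        self.apply_begin = apply_begin
        self.apply_change = apply_change
        self.apply_commit = apply_commit
        self.pending = defaultdict(list)  # xid -> changes in arrival order

    def add_change(self, xid, change):
        self.pending[xid].append(change)

    def commit(self, xid):
        # Only now are the callbacks invoked, one complete transaction at a time.
        changes = self.pending.pop(xid, [])
        self.apply_begin(xid)
        for change in changes:
            self.apply_change(xid, change)
        self.apply_commit(xid)

    def abort(self, xid):
        # Changes of a rolled back transaction never reach the callbacks.
        self.pending.pop(xid, None)

log = []
cache = ApplyCacheSketch(
    apply_begin=lambda xid: log.append(("begin", xid)),
    apply_change=lambda xid, ch: log.append(("change", xid, ch)),
    apply_commit=lambda xid: log.append(("commit", xid)),
)
cache.add_change(1, "INSERT a")
cache.add_change(2, "INSERT b")
cache.add_change(1, "UPDATE a")
cache.commit(2)   # xid 2 committed first, so it is replayed first
cache.commit(1)
```

In this sketch the interleaved input comes out grouped by transaction and in commit order, which is the property the plugin functions rely on.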

I can think of three ways that a replication system like this could try to apply transactions.

1) Each time it sees a new transaction it could open a new transaction on the replica and make that change. It leaves the transaction open and goes on applying the next change (which might be for the current transaction or might be for another one). When it comes across a commit record it would then commit the transaction. If 100 concurrent transactions were open on the origin then 100 concurrent transactions will be open on the replica.

2) Determine the commit order of the transactions, group all the changes for a particular transaction together, and apply them in that order: apply the changes for the transaction that committed first, commit that transaction, and then move on to the transaction that committed second.

3) Group the transactions in a way that moves the replica from one consistent snapshot to another. This is what Slony and Londiste do because they don't have the commit order or commit timestamps. Built-in replication can do better.

This patch implements option (2). If we had a way of implementing option (1) efficiently, would we be better off?

Option (2) requires us to put unparsed WAL data (HeapTuples) in the apply cache. You can't translate this to an independent LCR until you call the apply_change callback (which happens once the commit is encountered). The reason is that some of the changes might be DDL (or things generated by a DDL trigger) that will change the translation catalog, so you can't translate the HeapData to LCRs until you're at a stage where you can update the translation catalog. In both cases you might need to see later WAL records before you can convert an earlier one into an LCR (e.g. TOAST).


Some of my concerns with the apply cache are

Big transactions (bulk loads, mass updates) will be cached in the apply cache until the commit comes along. One issue Slony has with bulk operations is that the replicas can't start processing the bulk INSERT until after it has committed. If it takes 10 hours to load the data on the master it will take another 10 hours (at best) to load the data into the replica (20 hours after you start the process). With binary streaming replication your replica is done processing the bulk update shortly after the master is.

Long running transactions can sit in the cache for a long time. When you spill to disk we would want the long running but inactive ones spilled to disk first. This is solvable but adds to the complexity of this module. How were you planning on managing which items of the list get spilled to disk?
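
As one possible shape for such a policy (purely a sketch, not something the patch does or proposes), the cache could track the last WAL position at which each transaction was active and spill the least recently active ones until it is back under a memory budget. The function name, the `memory_limit` parameter and the per-transaction tuple layout are all hypothetical.

```python
def pick_spill_victims(txns, memory_limit):
    """Choose which transactions to spill to disk.

    txns maps xid -> (bytes_in_memory, last_activity_lsn). Transactions
    that have been idle the longest are chosen first, until the bytes
    remaining in memory fit within memory_limit.
    """
    total = sum(size for size, _ in txns.values())
    victims = []
    # Sort by last activity, oldest first: long running but idle go first.
    for xid, (size, _last) in sorted(txns.items(), key=lambda kv: kv[1][1]):
        if total <= memory_limit:
            break
        victims.append(xid)
        total -= size
    return victims

txns = {
    7: (400, 1000),  # long running, idle since LSN 1000
    8: (300, 5000),  # recently active
    9: (500, 2000),
}
victims = pick_spill_victims(txns, memory_limit=500)
print(victims)
```

Even this simple version needs per-transaction size and activity bookkeeping, which is the extra complexity I am worried about.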

The idea that we can safely reorder the commands into transactional groupings works (as far as I know) today because DDL commands get big heavy locks that are held until the end of the transaction. I think Robert mentioned earlier in the parent thread that maybe some of that will be changed one day.


The downsides of (1) that I see are:

We would want a single backend to keep open multiple transactions at once. How hard would that be to implement? Would subtransactions be good enough here?

Applying (or even translating WAL to LCRs) the changes in parallel across transactions might complicate the catalog structure because each concurrent transaction might need its own version of the catalog (or can you depend on the locking at the master for this? I think you can today).

With approach (1), changes that are part of a rolled-back transaction would have more overhead because you would call apply_change on them.

With approach (1), a later component could still group the LCRs by transaction before applying by running the LCRs through a data structure very similar to the ApplyCache.
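
To put rough numbers on the concurrency and rollback costs of approach (1), here is a small Python simulation. It is a sketch under stated assumptions: the stream format of (xid, kind, payload) tuples and the function name are invented for this example and do not correspond to anything in the patch.

```python
def simulate_streaming_apply(stream):
    """Approach (1): apply each change as soon as it is seen.

    stream is a list of (xid, kind, payload) tuples, with kind one of
    "change", "commit" or "abort". Returns (max_open, wasted): the peak
    number of replica transactions open at once, and the number of
    changes applied for transactions that later aborted.
    """
    open_txns = {}  # xid -> number of changes applied so far
    max_open = 0
    wasted = 0
    for xid, kind, _payload in stream:
        if kind == "change":
            open_txns[xid] = open_txns.get(xid, 0) + 1
            max_open = max(max_open, len(open_txns))
        elif kind == "commit":
            open_txns.pop(xid, None)
        elif kind == "abort":
            # Everything applied for this xid was thrown away.
            wasted += open_txns.pop(xid, 0)
    return max_open, wasted

stream = [
    (1, "change", "INSERT a"),
    (2, "change", "INSERT b"),
    (3, "change", "INSERT c"),
    (2, "change", "UPDATE b"),
    (2, "abort", None),
    (1, "commit", None),
    (3, "commit", None),
]
max_open, wasted = simulate_streaming_apply(stream)
print(max_open, wasted)
```

On this stream the replica holds three transactions open at once and applies two changes that are later rolled back. Under approach (2) those two changes would simply be dropped from the cache without calling apply_change.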

I think I need more convincing that approach (2), which is what this patch implements, is the best way of doing things compared to (1). I will hold off on a more detailed review of the code until I get a better sense of whether the design will change.


Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
