Hi!

I am rather pleased to announce the next version of the changeset extraction patchset. Thanks to help from a large number of people, I think we are slowly reaching the point where it is committable.

Since the last submitted version (20121115002746.ga7...@awork2.anarazel.de), a large number of fixes and the results of a good amount of review have been added to the tree. All bugs known to me have been fixed.

Fixes include:
* synchronous replication support
* don't peg the xmin for user tables, do it only for catalog ones
* arbitrarily large transaction support by spilling large transactions to disk
* spill snapshots to disk, so we can restart without waiting for a new snapshot to be built
* don't read all WAL from the establishment of a logical slot
* tests via SQL interface to changeset extraction

The todo list includes:
* morph the "logical slot" interface into being "replication slots" that can also be used by streaming replication
* move some more code from snapbuild.c to decode.c to remove a largely duplicated switch
* do some more header/comment cleanup & clarification
* move pg_receivellog into its own directory in src/bin or contrib/
* user/developer level documentation

The patch series currently has two interfaces to logical decoding. One - which is primarily useful for pg_regress style tests and playing around - is SQL based, the other one uses a walsender replication connection.

A quick demonstration of the SQL interface (server needs to be started with wal_level = logical and max_logical_slots > 0):

=# CREATE EXTENSION test_logical_decoding;
=# SELECT * FROM init_logical_replication('regression_slot', 'test_decoding');
    slotname     | xlog_position
-----------------+---------------
 regression_slot | 0/17D5908
(1 row)

=# CREATE TABLE foo(id serial primary key, data text);
=# INSERT INTO foo(data) VALUES(1);
=# UPDATE foo SET id = -id, data = ':'||data;
=# DELETE FROM foo;
=# DROP TABLE foo;
=# SELECT * FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '0');
 location  | xid |                                      data
-----------+-----+--------------------------------------------------------------------------------
 0/17D59B8 | 695 | BEGIN
 0/17D59B8 | 695 | COMMIT
 0/17E8B58 | 696 | BEGIN
 0/17E8B58 | 696 | table "foo": INSERT: id[int4]:1 data[text]:1
 0/17E8B58 | 696 | COMMIT
 0/17E8CA8 | 697 | BEGIN
 0/17E8CA8 | 697 | table "foo": UPDATE: old-pkey: id[int4]:1 new-tuple: id[int4]:-1 data[text]::1
 0/17E8CA8 | 697 | COMMIT
 0/17E8E50 | 698 | BEGIN
 0/17E8E50 | 698 | table "foo": DELETE: id[int4]:-1
 0/17E8E50 | 698 | COMMIT
 0/17E9058 | 699 | BEGIN
 0/17E9058 | 699 | COMMIT
(13 rows)

=# SELECT * FROM pg_stat_logical_decoding ;
    slot_name    |    plugin     | database | active | xmin | restart_decoding_lsn
-----------------+---------------+----------+--------+------+----------------------
 regression_slot | test_decoding |    12042 | f      |  695 | 0/17D58D0
(1 row)

=# SELECT * FROM stop_logical_replication('regression_slot');
 stop_logical_replication
--------------------------
                        0

The walsender interface has the same calls:
INIT_LOGICAL_REPLICATION 'slot' 'plugin';
START_LOGICAL_REPLICATION 'slot' restart_lsn [(option value)*];
STOP_LOGICAL_REPLICATION 'slot';

The only difference is that START_LOGICAL_REPLICATION can stream changes and it can support synchronous replication.

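For anyone who wants to post-process the 'data' column from the SQL demonstration above in a script, here is a minimal sketch of a parser for those lines. It is not part of the patchset. The names `Column`, `Change`, `parse_columns` and `parse_change` are made up for illustration, the text format is inferred only from the sample rows shown above, and the sketch assumes that column values contain no whitespace, which will not hold for real data.

```python
import re
from typing import List, NamedTuple, Optional


class Column(NamedTuple):
    name: str
    typename: str
    value: str


class Change(NamedTuple):
    kind: str              # "BEGIN", "COMMIT", "INSERT", "UPDATE" or "DELETE"
    table: Optional[str]   # None for BEGIN/COMMIT
    old_key: List[Column]  # key columns of the old row (UPDATE, DELETE)
    new_tuple: List[Column]  # columns of the new row (INSERT, UPDATE)


# name[type]:value -- ASSUMPTION: a value never contains whitespace.
_COLUMN = re.compile(r'(\w+)\[(\w+)\]:(\S*)')
# table "<name>": <ACTION>: <columns...>
_CHANGE = re.compile(r'table "([^"]+)": (INSERT|UPDATE|DELETE): (.*)')


def parse_columns(text: str) -> List[Column]:
    """Extract every name[type]:value triple found in text."""
    return [Column(*m.groups()) for m in _COLUMN.finditer(text)]


def parse_change(data: str) -> Change:
    """Parse one 'data' value as printed in the demo output above."""
    data = data.strip()
    if data in ("BEGIN", "COMMIT"):
        return Change(data, None, [], [])
    m = _CHANGE.fullmatch(data)
    if m is None:
        raise ValueError(f"unrecognized line: {data!r}")
    table, kind, rest = m.groups()
    if kind == "UPDATE":
        # "old-pkey: <cols> new-tuple: <cols>"
        old_part, _, new_part = rest.partition("new-tuple:")
        return Change(kind, table, parse_columns(old_part), parse_columns(new_part))
    if kind == "DELETE":
        return Change(kind, table, parse_columns(rest), [])
    return Change(kind, table, [], parse_columns(rest))
```

Feeding it the UPDATE row from the demo yields the old key `id = 1` and the new tuple `id = -1, data = :1`. A real consumer would more likely write its own output plugin that emits a format designed for machine parsing, rather than parse the demo plugin's text.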
The output seen in the 'data' column is produced by a so-called 'output plugin', which users of the facility can write to suit their needs. A plugin is written by implementing 5 functions in the shared object that's passed to init_logical_replication() above:
* pg_decode_init (optional)
* pg_decode_begin_txn
* pg_decode_change
* pg_decode_commit_txn
* pg_decode_cleanup (optional)

The most interesting function, pg_decode_change, gets passed a structure containing old/new versions of the row, the 'struct Relation' belonging to it, and metainformation about the transaction.

The output plugin can rely on syscache lookups et al. to decode the changed tuple in whatever fashion it wants.

I'd like to invite reviewers to first look at:
* the output plugin interface
* the walsender/SRF interface
* patch 12 which contains most of the code

When reading the code, the information flow during decoding might be interesting:

---------------
          +---------------+
          | XLogReader    |
          +---------------+
                  |
          XLOG Records
                  |
                  v
          +---------------+
          | decode.c      |
          +---------------+
           |             |
           |             |
           v             |
  +---------------+      |
  | snapbuild.c   |  HeapTupleData
  +---------------+      |
         |               |
  catalog snapshots      |
         |               |
         v               v
      +---------------+
      |reorderbuffer.c|
      +---------------+
             |
    HeapTuple & Metadata
             |
             v
      +---------------+
      | Output Plugin |
      +---------------+
             |
     Whatever you want
             |
             v
      +---------------+
      | Output Handler|
      |               |
      |WalSnd or SRF  |
      +---------------+
---------------

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid; required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId; required, simple
0007: Adjust Satisfies* interface; required, mechanical
0008: Allow walsender to attach to a database; required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter; required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key; required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program; not required but useful
0016: Add test_logical_decoding extension; not required, but contains the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services