On Mon, Feb 9, 2026 at 2:42 AM Vitaly Davydov <[email protected]> wrote:
>
> Dear Masahiko, Ashutosh,
>
> Thank you for the feedback!
>
> On 2/4/26 10:39, Masahiko Sawada wrote:
>  > While it sounds challenging, it sounds promising.
>
> I consider decoding of catalog changes from the WAL promising as well. It
> hides all the complexity of DDL queries behind simple catalog changes. DDL
> is quite diverse, and the original queries are too hard to handle: CTAS
> with temp tables, local objects (such as function variables and arguments)
> in DDL, and SELECT somefunc() queries where the function executes DDL
> commands. I see decoding the WAL changes to the system catalogs as the
> only suitable way.
>
> The idea of using a stable representation of DDL is reasonable, but
> creating a new representation would add more maintenance burden than
> reusing existing structures. One simple possibility is to represent
> catalog changes as a list of inserted/deleted system catalog tuples,
> logically separated into a number of simple "atomic" DDL commands. Using
> the xid and cid may help to split the system catalog changes into such
> atomic DDL commands.
>
> In the case of system table changes, we may add some handling of different
> versions. Changes could be applied directly on a PostgreSQL replica by
> inserting the tuples into the system tables as is.

We cannot apply the changes to system catalogs made on the publisher
to the subscriber as is, because we don't support system catalog
compatibility across major versions.
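That said, the xid/cid grouping proposed above could be sketched roughly as
follows (a Python sketch for illustration only; the CatalogChange record
shape is an assumption, not an existing PostgreSQL structure):

```python
# Hypothetical sketch: bucket decoded system-catalog tuple changes by
# (xid, cid) so that each bucket approximates one atomic DDL command.
from collections import OrderedDict
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogChange:
    xid: int      # transaction id that produced the change
    cid: int      # command id within the transaction
    catalog: str  # e.g. "pg_class", "pg_attribute"
    action: str   # "insert" or "delete"

def group_into_commands(changes):
    """Group changes by (xid, cid); insertion order is preserved, so the
    buckets come out in command order within each transaction."""
    commands = OrderedDict()
    for ch in changes:
        commands.setdefault((ch.xid, ch.cid), []).append(ch)
    return commands
```

For example, a CREATE TABLE followed by an implicit index creation in the
same transaction would land in two buckets, (xid, 0) and (xid, 1).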

>
> On 2/4/26 14:44, Ashutosh Bapat wrote:
>  > Consider an example of ALTER TABLE tab ADD PRIMARY KEY (id) where id
>  > is an existing column in table tab. This will add a constraint on the
>  > table and also create an index. Thus effectively two DDLs will be
>  > executed. If we decode catalog changes we need to make sure that only
>  > the original DDL is replicated; else the apply worker downstream will
>  > cause an ERROR, stalling the replication. Similar is the case with
>  > CREATE TABLE ... AS ... - we should only replicate the CREATE TABLE
>  > and let the regular replication handle data replication. I think we
>  > need to somehow annotate the WAL containing catalog changes to
>  > indicate whether those represent the original DDL or derived DDL and
>  > decode only the WAL corresponding to the original DDL.
>
> We may gather all the changed (inserted) system catalog tuples belonging
> to a command and then decide which command those changes represent. For
> ALTER TABLE ... ADD PRIMARY KEY, we gather the inserted tuples in
> pg_constraint, pg_index, and pg_attribute, and then conclude that these
> tuples are related to each other and represent a primary key.

I think we need to check not only which system catalogs were changed
but also how the system catalog tuples were changed in order to
determine what the original DDL query was. For example, both ALTER
TABLE ... SET STORAGE EXTERNAL and ALTER TABLE ... SET COMPRESSION
pglz change the corresponding pg_attribute tuple, but we need to
generate different DDL queries from them.
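To illustrate the point (a Python sketch with catalog tuples simplified to
dicts; the generator function is hypothetical, though attstorage and
attcompression are real pg_attribute columns):

```python
# Illustrative sketch: two different ALTER TABLE forms touch the same
# pg_attribute row, so the changed fields must be inspected to recover
# the original DDL.
STORAGE = {"p": "PLAIN", "e": "EXTERNAL", "m": "MAIN", "x": "EXTENDED"}
COMPRESSION = {"p": "pglz", "l": "lz4"}

def ddl_from_pg_attribute_update(table, column, old, new):
    """Regenerate the DDL from an old/new pair of pg_attribute fields."""
    if old["attstorage"] != new["attstorage"]:
        return (f"ALTER TABLE {table} ALTER COLUMN {column} "
                f"SET STORAGE {STORAGE[new['attstorage']]}")
    if old["attcompression"] != new["attcompression"]:
        return (f"ALTER TABLE {table} ALTER COLUMN {column} "
                f"SET COMPRESSION {COMPRESSION[new['attcompression']]}")
    return None  # no supported difference found
```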

Also, we need to consider whether we need to replicate information
that is not present in the WAL records of system catalog changes,
e.g., the CASCADE or RESTRICT option of DROP TABLE. The subscriber
might have a view referring to the table being dropped on the
publisher. If we send only a pg_class delete change to drop the table,
applying that change on the subscriber might fail due to the dependent
objects.

Similarly, with the idea of sending DDL queries to subscribers, we
would need to modify the executed DDL query to reflect the publication
definition. For example, suppose the publisher has tables T1, T2, and
T3, the subscriber has only T1, and T1 is being replicated from the
publisher to the subscriber. If a user executes DROP TABLE T1, T2, T3,
logical replication should send DROP TABLE T1 (or equivalent
information) instead of the original DDL query as is. A similar
consideration would be required for publications with
publish_via_partition_root set to true.
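That rewrite could look roughly like this (a Python sketch; the function is
hypothetical and publication membership is simplified to a set of table
names):

```python
# Hypothetical sketch: keep only the dropped tables that the publication
# covers, since those are the only ones the subscriber is expected to have.
def rewrite_drop_for_publication(dropped, published):
    replicate = [t for t in dropped if t in published]
    if not replicate:
        return None  # nothing to send downstream
    return "DROP TABLE " + ", ".join(replicate)
```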

> Decoding the final changes makes it much easier to handle CTAS, because
> such statements may be arbitrarily complex, like CREATE TABLE AS SELECT
> somefunc(). Only the final changes in the WAL let us decode the table
> columns and their types. Attempting to construct a derived DDL SQL
> statement seems very hard to implement.

There might be cases where we want to send DDLs instead of system
catalog changes. Taking an example from your recent mail, the ALTER
TABLE t ALTER COLUMN x TYPE double precision; command writes
pg_attribute changes along with inserting the rewritten tuples. If we
improve many ALTER TABLE subcommands so that we can execute them with
a weaker lock level or without table rewrites, it would be better to
delegate how to execute the column type change to the subscriber,
which might be a newer version.

One idea I'm experimenting with is to define an abstract data type
that can represent a DDL (like CollectedCommand) and write it to a new
WAL record so that logical decoding can process it. For CREATE DDLs,
we can use the pg_get_xxx_def() functions with a historical snapshot
to get the DDL text. We would need to implement code to generate DROP
and ALTER DDLs from the data. I believe DROP DDLs would not be hard.
For ALTER DDLs, we would incur an initial implementation cost, but we
would not need to change that code often.
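As a very loose sketch of what such an abstract record and its deparse step
could look like (Python for brevity; the record layout is an assumption and
only DROP TABLE is shown, whereas the real thing would build on
CollectedCommand and the pg_get_xxx_def() machinery):

```python
# Hypothetical sketch: a version-stable, abstract DDL record written to a
# new WAL record type, from which the decoder regenerates the DDL text.
from dataclasses import dataclass, field

@dataclass
class AbstractDDL:
    command_tag: str   # e.g. "DROP TABLE"
    objects: list      # qualified object names
    options: dict = field(default_factory=dict)  # e.g. {"behavior": "CASCADE"}

def deparse(rec):
    """Regenerate a DDL query from the abstract record (DROP TABLE only)."""
    if rec.command_tag == "DROP TABLE":
        behavior = rec.options.get("behavior")
        suffix = f" {behavior}" if behavior else ""
        return f"DROP TABLE {', '.join(rec.objects)}{suffix}"
    raise NotImplementedError(rec.command_tag)
```

One advantage of deparsing on the decoding side is that options such as
CASCADE/RESTRICT, which leave no trace in the catalog tuples themselves, can
be carried explicitly in the record.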

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

