On Tue, Apr 21, 2026 at 4:45 AM Masahiko Sawada <[email protected]> wrote: > > On Mon, Feb 23, 2026 at 5:21 PM Masahiko Sawada <[email protected]> wrote: > > > > > > One idea I'm experimenting with is that we define an abstract data > > type that can represent a DDL (like CollectedCommand) and write it to > > a new WAL record so that logical decoding processes it. For CREATE > > DDLs, we can use pg_get_xxx_def() function while using a historical > > snapshot to get the DDLs. We would need to implement the codes to > > generate DROP and ALTER DDLs from the data. I believe DROP DDLs would > > not be hard. For ALTER DDLs, we would incur the initial implementation > > costs, but we would not change these codes often. > > > > DDL support for logical replication is one of the biggest missing > pieces in logical replication. I'd like to resume this work for PG20. > > We made a lot of effort on this feature through 2022 and 2023, but the > development is currently inactive. The last patch was submitted on Jul > 18, 2023. I've reviewed the previous patches and discussions, and I > would like to summarize how DDL replication was implemented, the main > reasons it stalled, and propose an alternative design to address those > problems. > > The overall idea of the previous patch set was to implement DDL > deparsing and utilize it for DDL replication. It converted a parse > tree into a JSON string. For instance, if a user executes "DROP TABLE > t1", the deparser generates from its parse tree: > > {DROPSTMT :objects (("t1")) :removeType 41 :behavior 0 :missing_ok > false :concurrent false} > > to: > > {"fmt": "DROP TABLE %{objidentity}s", "objidentity": "public.t1"} > > This JSON string is self-documenting, meaning someone who gets it can > easily reconstruct the original DDL with schema-qualified object > names. In a dedicated event trigger for logical replication, we > deparsed the parse tree of a DDL, wrote it into a WAL record, and then > the logical decoding processed it similarly to DML changes. > > While there are several benefits to the JSON data approach mentioned > in the wiki [1] -- most notably the flexibility to easily remap > schemas (e.g., mapping "schema A" on the publisher to "schema B" on > the subscriber) -- there was a major concern: the huge maintenance > burden. >
Yes, there will be a maintenance cost of JSON-based deparsing approach. But note that multiple senior people (Alvaro Herrera, Robert Haas) [1] seems to favor that approach. So, I am not sure we can conclude to abandon that approach without those people or some other senior people agreeing to abandon it. To be clear, I am not against considering a new/different approach for DDL replication but just that it is not clear that old/existing approach can be ruled out without more discussion on it, We would need to maintain the JSON serialization code whenever > creating or modifying parse nodes, regardless of whether the changes > were related to DDL replication. IIUC, this was the primary reason the > feature didn't cross the finish line. > > Additionally, I think there is another design issue: it is not > output-plugin agnostic. Since the deparsed DDL was written by a > logical-replication-specific event trigger, third-party logical > decoding plugins cannot easily detect DDL events. > Why RmgrId like RM_LOGICALDDLMSG_ID and XLOG_LOGICAL_DDL_MESSAGE wal info is not sufficient for this? Decoder will add a message like REORDER_BUFFER_CHANGE_DDL which can be used to detect DDL message, no? > Ideally, we should > write DDL information into a WAL record natively when > wal_level='logical' (or additionally when a GUC enables DDL events > WAL-logging) so that all decoding plugins can detect them. This also > allows us to test DDL logical decoding with test_decoding without > setting up a full logical replication subscription. > > To address these two points, I'd like to propose an alternative > approach: we introduce a new data type, say DDLCommand, that is > self-contained to represent a DDL (like CollectedCommand), and don't > rely on event triggers. It would have the command type (and subtype if > required), the OIDs of the target object and its namespace, and the > OID of the user who executed the DDL. We write it to a new WAL record > at appropriate places during DDL execution, and the logical decoding > layer passes the data to output plugins. That way, any logical > decoding plugin can detect DDL changes, and it's up to the plugins how > to decode the DDL information. > > In pgoutput, for CREATE DDLs, we can use the pg_get_xxx_ddl() > functions while using a historical snapshot to get the DDLs, saving > maintenance costs. We would still need to implement the code to > generate DROP and ALTER DDLs from the data. I believe DROP DDLs would > not be hard. For ALTER DDLs, we would incur an initial implementation > cost, but we would not need to change this code often. We can > implement the DDL generation code in a way that improves ddlutils.c. > > Also, because DDLCommand is separated from parse nodes, we only need > to change the DDL deparse/replication code when it is actually needed. > Additionally, this approach would eliminate the code around the > two-step process (using DCT_TableDropStart and DCT_TableDropEnd) for > DROP TABLE. While it would miss the flexibility benefits that the JSON > deparsing approach has, I guess it would not be very hard to implement > the mapping in the deparse layer even without the JSON data. > Possible but the point was flexibility and ease with which users can implement mapping with JSON approach. > > FYI I've experimented with auto-generation approaches too. For > instance, gen_node_support.pl generates C code that converts parse > nodes to the corresponding text representations. Or > gen_node_support.pl generates C code that makes all objects in the > given SQL query text fully-schema qualified. While these ideas are > promising they didn't help reduce the maintenance burden much as the > parse node definitions are already complex and vary on nodes much. > Yeah, this is my recollection of a previous attempt for auto_generating the deparsing code. [1]: https://www.postgresql.org/message-id/CA%2BTgmoauXRQ3yDZNGTzXv_m1kdUnH1Ww%2BhwKmKUSjtyBh0Em2Q%40mail.gmail.com -- With Regards, Amit Kapila.
