Hi,

On Thu, Jun 18, 2026 at 3:24 PM Masahiko Sawada <[email protected]> wrote:
>
> Hi all,
> (CC'ing people who worked on DDL deparse)

Thanks for working on this feature!

Please note that I haven't spent enough time reading all the older
threads in this area, so I'm still building my understanding. Kindly
bear with me

> DDL Deparse has been developed[1][2] for quite a long time and was
> originally proposed as a building block of DDL replication. Reading
> the related discussion threads, there is an agreement on implementing
> DDL replication on top of DDL deparse, and some hackers prefer the
> format of its output.

Could you please summarize the other approaches for DDL replication,
their pros and cons, and why the deparsing approach is preferred? I
have a vague understanding, but summarizing it here would help
reviewers a lot. Thank you!

> I've reviewed the last proposed approach and
> researched DDL deparse/DDL replication, and there are some points that
> are unclear to me, and things have changed since it was actively
> developed. So I've started this thread separately from the DDL
> replication thread in order to discuss DDL deparse itself while
> working toward DDL replication development.

+1 to splitting things for better discussion.

> Quick summary of the last developed DDL deparse feature[3]: the basic
> functionality is that it takes a parse tree as an input and constructs
> DDLs by retrieving the information from system catalogs based on the
> parse tree.

The parse tree is what gets generated when the DDL command is being
executed, right? I mean, we can't generate the DDL statement as an
after-the-fact operation? I haven't studied the pg_get_table_ddl patch
yet.

> The output format is self-documenting JSON, enabling us to
> easily do table name mapping or schema name mapping while
> reconstructing a DDL command.

+1 to starting with JSON. It can later be extended with more optimized
formats (binary) for sending over the network.

> For instance, deparsing "create table
> test (a int)" produces:
>
> {
>   "fmt": "CREATE TABLE %{identity}D (%{table_elements:, }s)",
>   "identity": {
>     "objname": "test",
>     "schemaname": "public"
>   },
>   "table_elements": [
>     {
>       "fmt": "%{name}I %{coltype}T STORAGE %{colstorage}s",
>       "name": "a",
>       "type": "column",
>       "coltype": {
>         "typmod": "",
>         "typarray": false,
>         "typename": "int4",
>         "schemaname": "pg_catalog"
>         },
>       "colstorage": "PLAIN"
>     }
>   ]
> }

Do we need an LSN+timeline, creation time, or some sort of ordering ID
field for establishing the order of operations (e.g., DDL command1
executed before command2)? I understand that when used for DDL
replication it does have an LSN field, but for other DDL deparsing
use-cases it would also be helpful.

> The main point that I want to discuss is what output we expect from
> DDL deparse, especially CREATE DDLs. I originally thought that DDL
> deparse converts the parse tree back into the DDL command originally
> executed. However, what DDL deparse for CREATE TABLE actually does is
> to generate possibly multiple DDLs to achieve the exact same catalog
> state. That is, the deparsed command doesn't necessarily preserve the
> user intent in the original DDL command, and possibly generates
> multiple commands for the one command. For instance:
>
> I haven't seen any discussion on whether it's architecturally correct
> for a DDL deparser not to preserve the user intent.

I briefly played with MySQL binlog replication and noticed that it
doesn't decompose DDLs. I haven't read their documentation, but I
believe there's a concern with decomposing a single DDL into multiple
DDLs - how would the consumer make it atomic and crash-safe alongside
other concurrent DDLs? Say the consumer is another PostgreSQL database
and it wants to create a table along with indexes, FKs, constraints,
etc. If we decompose the single DDL into multiple DDLs, how can the
consumer replay them together atomically and be crash-safe?

> If it's okay for DDL deparse to expand one DDL command into multiple
> ones, it's very similar to what pg_get_table_ddl()[4] does. The only
> difference between the two is the output format. I guess we could add
> an option to the SQL function to output DDLs as a JSON blob. I don't
> think we want to maintain two features if they provide very similar
> functionality. Also, I'm not if it could be useful other than DDL
> replication use cases.
>
> I personally think that DDL deparse should preserve (and possibly
> normalize) the user intent, producing the following query, for
> example:

My initial thought is that it should preserve user intent, but I could be wrong.

> Feedback is very welcome.

I did a quick pass over the patch. The code looks a bit lengthy - is
it possible to split it up? For example, JSON-related helpers first,
then basic CREATE TABLE, then sequences, then constraints, then ALTER
TABLE, etc.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com


Reply via email to