> On 23 Jun 2026, at 03:17, Kirk Wolak <[email protected]> wrote:
> 
> Comments welcome. I am happy to adjust naming, documentation, or scope based 
> on feedback.

Hi Kirk,

+1 on the problem. Excluding a few huge tables to keep the dump fast,
but still wanting a loadable slot so a recent slice can be dropped in
and restored in the normal order (data before indexes/FKs, no manual
reindex) is a real, recurring need. I'd be glad to see it solved.

For the archive: the broader "filter table data on dump" idea has come
up many times, and it may help to frame this patch against that history:

  - 2008, Simon Riggs: pg_dump -w for sampling [0]. Redirected -
    "why not COPY (SELECT ...) TO STDOUT".
  - 2018, Carter Thaxton: --include-table-data-where, then --where [1].
    Euler Taveira's concerns: a filtered subset "won't restore" (FKs),
    and whether pg_dump is the right place for an ETL-ish parameter;
    plus table:clause quoting/colon parsing. Returned with feedback.
  - 2020, Surafel Temesgen: --where revival [2]. Daniel Gustafsson
    flagged arbitrary user SQL as a search_path / CVE-2018-1058-class
    hazard; no tests. Returned with feedback.
  - 2022, Nikita Starovoitov: partial data dumps [3]. Stalled on
    referential integrity - by the author's own account he could not
    solve it within the dump.
  - 2025, your own --filter-data PoC [4]. Still open; Tom suggested
    partitioning, Greg suggested dumping a view (-t foo=view:fooslice).

Two objections recur across those: "just use COPY (SELECT ...) TO",
and "a raw row filter produces a dump that won't restore (FKs)". The
placeholder approach sits entirely clear of both. There is no user
WHERE clause, so no ETL surface, no search_path hazard, and no
consistency promise: as Carter noted in 2018, --where=table:false
already behaves exactly like --exclude-table-data, and the placeholder
is precisely that corner - identical restore-risk, it just keeps a
loadable empty TABLE DATA entry instead of dropping it. And the reason
this belongs in pg_dump rather than a hand-rolled \copy is the point
David G. Johnston made in 2018: it lets you lean on pg_dump's
dependency resolution - which is exactly what the placeholder buys, the
(externally produced) data lands on the normal restore path, ahead of
indexes and FKs, no post-restore reindex. So I'd pitch it as "reserve a
loadable slot", not "lightweight --where".

On the interface, a few options to discuss:

  1. As posted: --create-table-data-placeholders as a global modifier
     to --exclude-table-data. Works, but "create" already means CREATE
     DATABASE in pg_dump, it's all-or-nothing across excluded tables,
     and it couples two options.
  2. A positive pattern option, e.g. --empty-table-data=PATTERN (plus
     --empty-table-data-and-children), mirroring the --exclude-table-data
     family. It folds the two current flags into one, reads as "include
     the table, empty", and gives per-table control. Slight preference
     here.
  3. A matching action for the table_data object type in the --filter
     file, for people (like you) who keep the big-table list in a file.
     Heavier - it's a grammar change to a shared file format - so I'd
     see it as an addition to 2, not a replacement.

WDYT?

Thank you!

Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/[email protected]
[1] 
https://www.postgresql.org/message-id/flat/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=pxktetcxxg4kcut...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/calay4q8o00sg5nemi2auqnulvmglar6br+yrvm41zuspjm2...@mail.gmail.com
[3] 
https://www.postgresql.org/message-id/flat/cac5einmthnpakugdj0d8-kfn21hpqnouvdamoha4tdmzgyo...@mail.gmail.com
[4] 
https://www.postgresql.org/message-id/flat/CACLU5mS07WGPpq6=m8ac5tumusnpl8fhbpnyem86isawegc...@mail.gmail.com



Reply via email to