Attaching files now...
On Mon, 16 Jun 2025, Dimitrios Apostolou wrote:
Attaching v3 of the patch, together with a new test file that tests
previously untested flags of pg_restore.
Added to July's commitfest:
https://commitfest.postgresql.org/patch/5821/
On Thu, 12 Jun 2025, Dimitrios Apostolou wrote:
I wonder about the following in pg_restore.c.
Right now my implementation covers only parallel restore.
In the case of non-parallel restore, I want to make the behaviour similar,
i.e. each worker to issue a TRUNCATE before COPY starts.
But then the StartTransaction() doesn't make sense, as everything might
already be in a transaction because of --single-transaction.
Should I completely skip StartTransaction() if !is_parallel?
In the end, I followed the same code path for both parallel and non-parallel
pg_restore. It works even with --single-transaction, where apparently there
are several nested transactions.
The tests pass, so I assume it's OK to start a subtransaction for each table
restoration inside the global single-transaction.
Dimitris
From 90341ba3742ebf5679766a48799c7fb3975c6e3a Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <ji...@qt.io>
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v3 1/2] pg_restore --clean --data-only
Issues a TRUNCATE before COPYing the data into the tables, within a
transaction.
As a result it avoids logging to the WAL, when combined with
wal_level=minimal.
---
doc/src/sgml/ref/pg_restore.sgml | 15 +++++++++++++++
src/bin/pg_dump/pg_backup.h | 1 +
src/bin/pg_dump/pg_backup_archiver.c | 12 ++++++++++--
src/bin/pg_dump/pg_restore.c | 15 +++++++++++----
src/bin/pg_dump/t/001_basic.pl | 6 ------
5 files changed, 37 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 8c88b07dcc8..86aacab588d 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -121,18 +121,33 @@ PostgreSQL documentation
<listitem>
<para>
Before restoring database objects, issue commands
to <command>DROP</command> all the objects that will be restored.
This option is useful for overwriting an existing database.
If any of the objects do not exist in the destination database,
ignorable error messages will be reported,
unless <option>--if-exists</option> is also specified.
</para>
+ <para>
+ If the schema is not restored but the data is restored (for example
+ using <option>--data-only</option>
+ or <option>--section=data</option>), then instead of DROP, a TRUNCATE
+ will be attempted before COPYing the data. So if you want to
+ overwrite an existing database without re-defining the schema, then
+ issue <option>--data-only --clean</option>.
+ </para>
+ <para>
+ Together with <option>--parallel</option> it is a high performance
+ way to load the tables, as it avoids logging to the WAL (if the
+ server is configured with <option>wal_level=minimal</option>).
+ Warning: foreign key constraints might cause table truncation to
+ fail.
+ </para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-C</option></term>
<term><option>--create</option></term>
<listitem>
<para>
Create the database before restoring into it.
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index af0007fb6d2..9428b362c92 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -99,18 +99,19 @@ typedef struct _restoreOptions
int noOwner; /* Don't try to match original object owner */
int noTableAm; /* Don't issue table-AM-related commands */
int noTablespace; /* Don't issue tablespace-related commands */
int disable_triggers; /* disable triggers during data-only
* restore */
int use_setsessauth; /* Use SET SESSION AUTHORIZATION commands
* instead of OWNER TO */
char *superuser; /* Username to use as superuser */
char *use_role; /* Issue SET ROLE to this */
+ int clean;
int dropSchema;
int disable_dollar_quoting;
int dump_inserts; /* 0 = COPY, otherwise rows per INSERT */
int column_inserts;
int if_exists;
int no_comments; /* Skip comments */
int no_policies; /* Skip row security policies */
int no_publications; /* Skip publication entries */
int no_security_labels; /* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 197c1295d93..157ce5b82fc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -978,34 +978,42 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
pg_log_info("processing data for table \"%s.%s\"",
te->namespace, te->tag);
/*
* In parallel restore, if we created the table earlier in
* this run (so that we know it is empty) and we are not
* restoring a load-via-partition-root data item then we
* wrap the COPY in a transaction and precede it with a
- * TRUNCATE. If wal_level is set to minimal this prevents
+ * TRUNCATE.
+ *
+ * Likewise if the table was pre-existing and the data is
+ * being restored with --clean.
+ *
+ * If wal_level is set to minimal this prevents
* WAL-logging the COPY. This obtains a speedup similar
* to that from using single_txn mode in non-parallel
* restores.
*
* We mustn't do this for load-via-partition-root cases
* because some data might get moved across partition
* boundaries, risking deadlock and/or loss of previously
* loaded data. (We assume that all partitions of a
* partitioned table will be treated the same way.)
*/
- use_truncate = is_parallel && te->created &&
+ use_truncate = ((is_parallel && te->created) || ropt->clean) &&
!is_load_via_partition_root(te);
if (use_truncate)
{
+ pg_log_debug("BEGIN transaction and TRUNCATE table \"%s.%s\"",
+ te->namespace, te->tag);
+
/*
* Parallel restore is always talking directly to a
* server, so no need to see if we should issue BEGIN.
*/
StartTransaction(&AH->public);
/*
* Issue TRUNCATE with ONLY so that child tables are
* not wiped.
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index c4b6214d618..002e62d13e0 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -206,19 +206,19 @@ main(int argc, char **argv)
while ((c = getopt_long(argc, argv, "acCd:ef:F:gh:I:j:lL:n:N:Op:P:RsS:t:T:U:vwWx1",
cmdopts, NULL)) != -1)
{
switch (c)
{
case 'a': /* Dump data only */
data_only = true;
break;
case 'c': /* clean (i.e., drop) schema prior to create */
- opts->dropSchema = 1;
+ opts->clean = 1;
break;
case 'C':
opts->createDB = 1;
break;
case 'd':
opts->cparams.dbname = pg_strdup(optarg);
break;
case 'e':
opts->exit_on_error = true;
@@ -419,21 +419,18 @@ main(int argc, char **argv)
/* reject conflicting "with-" and "no-" options */
if (with_data && no_data)
pg_fatal("options --with-data and --no-data cannot be used together");
if (with_schema && no_schema)
pg_fatal("options --with-schema and --no-schema cannot be used together");
if (with_statistics && no_statistics)
pg_fatal("options --with-statistics and --no-statistics cannot be used together");
- if (data_only && opts->dropSchema)
- pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
-
if (opts->single_txn && opts->txn_size > 0)
pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
/*
* -C is not compatible with -1, because we can't create a database inside
* a transaction block.
*/
if (opts->createDB && opts->single_txn)
pg_fatal("options -C/--create and -1/--single-transaction cannot be used together");
@@ -450,18 +447,28 @@ main(int argc, char **argv)
* caused an error in one of the checks above.
*/
opts->dumpData = ((opts->dumpData && !schema_only && !statistics_only) ||
(data_only || with_data)) && !no_data;
opts->dumpSchema = ((opts->dumpSchema && !data_only && !statistics_only) ||
(schema_only || with_schema)) && !no_schema;
opts->dumpStatistics = ((opts->dumpStatistics && !schema_only && !data_only) ||
(statistics_only || with_statistics)) && !no_statistics;
+ /*
+ * If --clean has been issued and the SQL schema is being restored, then
+ * clear the schema first and recreate it. No need to clear data.
+ */
+ if (opts->clean && opts->dumpSchema)
+ {
+ opts->clean = 0;
+ opts->dropSchema = 1;
+ }
+
opts->disable_triggers = disable_triggers;
opts->enable_row_security = enable_row_security;
opts->noDataForFailedTables = no_data_for_failed_tables;
opts->noTableAm = outputNoTableAm;
opts->noTablespace = outputNoTablespaces;
opts->use_setsessauth = use_setsessauth;
opts->no_comments = no_comments;
opts->no_policies = no_policies;
opts->no_publications = no_publications;
diff --git a/src/bin/pg_dump/t/001_basic.pl b/src/bin/pg_dump/t/001_basic.pl
index 84ca25e17d6..d4e6748be87 100644
--- a/src/bin/pg_dump/t/001_basic.pl
+++ b/src/bin/pg_dump/t/001_basic.pl
@@ -95,24 +95,18 @@ command_fails_like(
[ 'pg_restore', '-d', 'xxx', '-f', 'xxx' ],
qr/\Qpg_restore: error: options -d\/--dbname and -f\/--file cannot be used together\E/,
'pg_restore: options -d/--dbname and -f/--file cannot be used together');
command_fails_like(
[ 'pg_dump', '-c', '-a' ],
qr/\Qpg_dump: error: options -c\/--clean and -a\/--data-only cannot be used together\E/,
'pg_dump: options -c/--clean and -a/--data-only cannot be used together');
-command_fails_like(
- [ 'pg_restore', '-c', '-a', '-f -' ],
- qr/\Qpg_restore: error: options -c\/--clean and -a\/--data-only cannot be used together\E/,
- 'pg_restore: options -c/--clean and -a/--data-only cannot be used together'
-);
-
command_fails_like(
[ 'pg_dump', '--if-exists' ],
qr/\Qpg_dump: error: option --if-exists requires option -c\/--clean\E/,
'pg_dump: option --if-exists requires option -c/--clean');
command_fails_like(
[ 'pg_dump', '-j3' ],
qr/\Qpg_dump: error: parallel backup only supported by the directory format\E/,
'pg_dump: parallel backup only supported by the directory format');
--
2.49.0
From 159cce30910425914648fde430deeda5be7a41e3 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <ji...@qt.io>
Date: Mon, 16 Jun 2025 02:36:21 +0200
Subject: [PATCH v3 2/2] Add new test file with pg_restore test cases
Test restoring from custom format dumps, with and without offsets in the
table of contents - the latter happens when pg_dump writes to stdout.
Compare restored database contents to the original database.
Test parallel restore -j2 as well as sequential.
Test --single-transaction.
Test --data-only and --clean --data-only.
---
src/bin/pg_dump/t/007_pg_restore.pl | 150 ++++++++++++++++++++++++++++
1 file changed, 150 insertions(+)
create mode 100644 src/bin/pg_dump/t/007_pg_restore.pl
diff --git a/src/bin/pg_dump/t/007_pg_restore.pl b/src/bin/pg_dump/t/007_pg_restore.pl
new file mode 100644
index 00000000000..402d97fad81
--- /dev/null
+++ b/src/bin/pg_dump/t/007_pg_restore.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+my $port = $node->port;
+
+
+# TODO test that --section=post --clean, does not clean any tables.
+
+# Create database
+$node->safe_psql('postgres','
+ CREATE DATABASE db1;
+ \c db1
+ CREATE TABLE t1 (
+ i integer
+ );
+ INSERT INTO t1 VALUES (1), (2), (3), (4);
+ CREATE TABLE t2 (
+ t text
+ );
+ INSERT INTO t2 VALUES (\'a\'), (\'bb\'), (\'ccc\');');
+
+# Function to compare two databases of the above kind.
+sub compare_db_contents
+{
+ my ($db1, $db2, $should_match) = @_;
+ $should_match = 1
+ unless defined $should_match;
+
+ my $query = "
+ SELECT * FROM t1 ORDER BY i;
+ SELECT * FROM t2 ORDER BY t;
+ ";
+ my $result1 = $node->safe_psql($db1, $query);
+ my $result2 = $node->safe_psql($db2, $query);
+
+ if ($should_match)
+ {
+ is($result2, $result1, "The database contents should match");
+ } else {
+ isnt($result2, $result1, "The database contents should NOT match");
+ }
+}
+
+sub test_pg_restore
+{
+ my $dump_file = shift;
+ my $file_basename = File::Basename::basename($dump_file);
+
+ my @cmd = ( 'pg_restore', '--dbname', 'db1_restored' );
+ my $cmd_s = "pg_restore";
+
+ # Optionally this function takes a hash as last parameter.
+ if ($_[0])
+ {
+ my (%hash_args) = @_;
+ shift;
+ my @extra_args = $hash_args{'extra_args'};
+ if (@extra_args)
+ {
+ @cmd = (@cmd, @extra_args);
+ $cmd_s = "$cmd_s @extra_args";
+ }
+ }
+
+ $node->safe_psql(
+ 'postgres',
+ 'DROP DATABASE IF EXISTS db1_restored;');
+ ok(1, "clean up");
+
+ # Restore into a new database
+ $node->safe_psql('postgres',
+ 'CREATE DATABASE db1_restored;');
+ $node->command_ok([@cmd,
+ $dump_file],
+ "$cmd_s $file_basename");
+
+ # Verify restored db matches the dumped one
+ compare_db_contents('db1', 'db1_restored');
+
+ # Restore again with --data-only.
+ # Now the rows should be duplicate, the databases shouldn't match.
+ $node->command_ok([@cmd, '--data-only',
+ $dump_file],
+ "$cmd_s --data-only $file_basename");
+ compare_db_contents('db1', 'db1_restored', 0);
+
+ # Restore again with --data-only --clean.
+ # The database contents should match.
+ $node->command_ok([@cmd, '--clean', '--data-only',
+ $dump_file],
+ "$cmd_s --clean --data-only $file_basename");
+ compare_db_contents('db1', 'db1_restored');
+
+ # Restore from stdin.
+ my $stderr;
+ my $result = $node->run_log([@cmd, '--clean', '--data-only'],
+ ('<' => $dump_file,
+ '2>' => \$stderr));
+ if (grep {/^-j/} @cmd)
+ {
+ ok(!$result, "should fail: $cmd_s --clean --data-only < $file_basename");
+ chomp($stderr);
+ like($stderr,
+ '/parallel restore from standard input is not supported$/',
+ "stderr: parallel restore from standard input is not supported");
+ }
+ else
+ {
+ ok($result, "$cmd_s --clean --data-only < $file_basename");
+ compare_db_contents('db1', 'db1_restored');
+ }
+}
+
+
+# Basic dump
+my $d1 = "$tempdir/dump_file";
+$node->command_ok(['pg_dump', '--format=custom',
+ '--file', $d1, 'db1'],
+ 'pg_dump --format=custom --file dump_file');
+# Dump also to stdout, as the TOC doesn't contain offsets.
+my $d2 = "$tempdir/dump_file_stdout";
+my $result = $node->run_log(['pg_dump', '--format=custom', 'db1'],
+ ('>' => $d2));
+ok($result, "pg_dump --format=custom > dump_file_stdout");
+
+
+# Run all pg_restore testcases against each archive.
+test_pg_restore($d1);
+test_pg_restore($d2);
+test_pg_restore($d1, ('extra_args' => ('-j2')));
+test_pg_restore($d2, ('extra_args' => ('-j2')));
+test_pg_restore($d1, ('extra_args' => ('--single-transaction')));
+test_pg_restore($d2, ('extra_args' => ('--single-transaction')));
+
+
+$node->stop('fast');
+
+done_testing();
--
2.49.0