Re: [Proposal] vacuumdb --schema only

Gilles Darold Wed, 06 Apr 2022 10:44:02 -0700

Le 30/03/2022 à 23:22, Nathan Bossart a écrit :
> I took a look at the v4 patch.
>
> 'git-apply' complains about whitespace errors:


Fixed.


> +   <para>
> +    To clean all tables in the <literal>Foo</literal> and 
> <literal>bar</literal> schemas
> +    only in a database named <literal>xyzzy</literal>:
> +<screen>
> +<prompt>$ </prompt><userinput>vacuumdb --schema='"Foo"' --schema='bar' 
> xyzzy</userinput>
> +</screen></para>
>
> nitpicks: I think the phrasing should be "To only clean tables in the...".
> Also, is there any reason to use a schema name with a capital letter as an
> example?  IMO that just adds unnecessary complexity to the example.

I have though that an example of a schema with case sensitivity was
missing in the documentation but I agree with your comment, this is
probably not he best place to do that. Fixed.


> +$node->issues_sql_like(
> +    [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
> +    qr/VACUUM "Foo".*/,
> +    'vacuumdb --schema schema only');
>
> IIUC there should only be one table in the schema.  Can we avoid matching
> "*" and check for the exact command instead?

Fixed.


> I think there should be a few more test cases.  For example, we should test
> using -n and -N at the same time, and we should test what happens when
> those options are used for missing schemas.

Fixed


> +    /*
> +     * When filtering on schema name, filter by table is not allowed.
> +     * The schema name can already be set to a fqdn table name.
> +     */
> +    if (tbl_count && (schemas.head != NULL))
> +    {
> +        pg_log_error("cannot vacuum all tables in schema(s) and specific 
> table(s) at the same time");
> +        exit(1);
> +    }
>
> I think there might be some useful refactoring we can do that would
> simplify adding similar options in the future.  Specifically, can we have a
> global variable that stores the type of vacuumdb command (e.g., all,
> tables, or schemas)?  If so, perhaps the tables list could be renamed and
> reused for schemas (and any other objects that need listing in the future).

I don't think there will be much more options like this one that will be
added to this command but anyway I have changed the patch that way.


> +        if (schemas != NULL && schemas->head != NULL)
> +        {
> +            appendPQExpBufferStr(&catalog_query,
> +                                 " AND c.relnamespace");
> +            if (schema_is_exclude == TRI_YES)
> +                appendPQExpBufferStr(&catalog_query,
> +                                    " OPERATOR(pg_catalog.!=) ALL (ARRAY[");
> +            else if (schema_is_exclude == TRI_NO)
> +                appendPQExpBufferStr(&catalog_query,
> +                                    " OPERATOR(pg_catalog.=) ANY (ARRAY[");
> +
> +            for (cell = schemas->head; cell != NULL; cell = cell->next)
> +            {
> +                appendStringLiteralConn(&catalog_query, cell->val, conn);
> +
> +                if (cell->next != NULL)
> +                    appendPQExpBufferStr(&catalog_query, ", ");
> +            }
> +
> +            /* Finish formatting schema filter */
> +            appendPQExpBufferStr(&catalog_query, 
> "]::pg_catalog.regnamespace[])\n");
> +        }
>
> IMO we should use a CTE for specified schemas like we do for the specified
> tables.  I wonder if we could even have a mostly-shared CTE code path for
> all vacuumdb commands with a list of names.

Fixed


> -    /*
> -     * If no tables were listed, filter for the relevant relation types.  If
> -     * tables were given via --table, don't bother filtering by relation 
> type.
> -     * Instead, let the server decide whether a given relation can be
> -     * processed in which case the user will know about it.
> -     */
> -    if (!tables_listed)
> +    else
>      {
> +        /*
> +         * If no tables were listed, filter for the relevant relation types. 
>  If
> +         * tables were given via --table, don't bother filtering by relation 
> type.
> +         * Instead, let the server decide whether a given relation can be
> +         * processed in which case the user will know about it.
> +         */
> nitpick: This change seems unnecessary.

Fixed


Thanks for the review, all these changes are available in new version v6
of the patch and attached here.


Best regards,

-- 
Gilles Darold

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
    <arg choice="opt"><replaceable>dbname</replaceable></arg>
   </cmdsynopsis>
 
+  <cmdsynopsis>
+   <command>vacuumdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+       <arg choice="plain">
+        <arg choice="opt">
+         <group choice="plain">
+          <arg choice="plain"><option>-n</option></arg>
+          <arg choice="plain"><option>--schema</option></arg>
+         </group>
+         <replaceable>schema</replaceable>
+        </arg>
+       </arg>
+
+       <arg choice="plain">
+        <arg choice="opt">
+         <group choice="plain">
+          <arg choice="plain"><option>-N</option></arg>
+          <arg choice="plain"><option>--exclude-schema</option></arg>
+         </group>
+         <replaceable>schema</replaceable>
+        </arg>
+       </arg>
+     </group>
+    </arg>
+   </arg>
+
+   <arg choice="opt"><replaceable>dbname</replaceable></arg>
+  </cmdsynopsis>
+
   <cmdsynopsis>
    <command>vacuumdb</command>
    <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
@@ -244,6 +278,28 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Clean or analyze all tables in <replaceable class="parameter">schema</replaceable> only.
+        Multiple schemas can be vacuumed by writing multiple <option>-n</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Clean or analyze all tables NOT in <replaceable class="parameter">schema</replaceable>.
+        Multiple schemas can be excluded from the vacuum by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--no-index-cleanup</option></term>
       <listitem>
@@ -619,6 +675,14 @@ PostgreSQL documentation
 <prompt>$ </prompt><userinput>vacuumdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
 </screen></para>
 
+   <para>
+    To clean all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    only in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>vacuumdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
  </refsect1>
 
  <refsect1>
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..7bbfb97246 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,15 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t', 'pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-N',  '"Foo"', 'postgres' ],
+	'cannot use option -n | --schema and -N | --exclude-schema at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..87261ebd2b 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,11 +46,18 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
+enum trivalue schema_is_exclude = TRI_DEFAULT;
+
+/*
+ * The kind of object filter to use. '0': none, 'n': schema, 't': table
+ * these values correspond to the -n | -N and -t command line options.
+ */
+char objfilter = '0';
 
 static void vacuum_one_database(ConnParams *cparams,
 								vacuumingOptions *vacopts,
 								int stage,
-								SimpleStringList *tables,
+								SimpleStringList *schemas,
 								int concurrentCons,
 								const char *progname, bool echo, bool quiet);
 
@@ -94,6 +101,8 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"analyze-in-stages", no_argument, NULL, 3},
 		{"disable-page-skipping", no_argument, NULL, 4},
@@ -122,7 +131,7 @@ main(int argc, char *argv[])
 	vacuumingOptions vacopts;
 	bool		analyze_in_stages = false;
 	bool		alldb = false;
-	SimpleStringList tables = {NULL, NULL};
+	SimpleStringList objects = {NULL, NULL};
 	int			concurrentCons = 1;
 	int			tbl_count = 0;
 
@@ -140,7 +149,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "vacuumdb", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:P:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:P:n:N:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -181,11 +190,19 @@ main(int argc, char *argv[])
 				alldb = true;
 				break;
 			case 't':
+			{
+				/* When filtering on schema name, filter by table is not allowed. */
+				if (schema_is_exclude != TRI_DEFAULT)
 				{
-					simple_string_list_append(&tables, optarg);
-					tbl_count++;
-					break;
+					pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+					exit(1);
 				}
+
+				simple_string_list_append(&objects, optarg);
+				objfilter = 't';
+				tbl_count++;
+				break;
+			}
 			case 'f':
 				vacopts.full = true;
 				break;
@@ -202,6 +219,41 @@ main(int argc, char *argv[])
 									  &vacopts.parallel_workers))
 					exit(1);
 				break;
+			case 'n':			/* include schema(s) */
+				/* When filtering on schema name, filter by table is not allowed. */
+				if (tbl_count)
+				{
+					pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+					exit(1);
+				}
+
+				if (schema_is_exclude == TRI_YES)
+				{
+					pg_log_error("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same time");
+					exit(1);
+				}
+
+				simple_string_list_append(&objects, optarg);
+				objfilter = 'n';
+				schema_is_exclude = TRI_NO;
+				break;
+			case 'N':			/* exclude schema(s) */
+				/* When filtering on schema name, filter by table is not allowed. */
+				if (tbl_count)
+				{
+					pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+					exit(1);
+				}
+				if (schema_is_exclude == TRI_NO)
+				{
+					pg_log_error("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same time");
+					exit(1);
+				}
+
+				simple_string_list_append(&objects, optarg);
+				objfilter = 'n';
+				schema_is_exclude = TRI_YES;
+				break;
 			case 2:
 				maintenance_db = pg_strdup(optarg);
 				break;
@@ -341,6 +393,16 @@ main(int argc, char *argv[])
 
 	setup_cancel_handler(NULL);
 
+	/*
+	 * When filtering on schema name, filter by table is not allowed.
+	 * The schema name can already be set to a fqdn table name.
+	 */
+	if (tbl_count && objfilter == 'n' && objects.head != NULL)
+	{
+		pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+		exit(1);
+	}
+
 	/* Avoid opening extra connections. */
 	if (tbl_count && (concurrentCons > tbl_count))
 		concurrentCons = tbl_count;
@@ -352,7 +414,17 @@ main(int argc, char *argv[])
 			pg_log_error("cannot vacuum all databases and a specific one at the same time");
 			exit(1);
 		}
-		if (tables.head != NULL)
+
+		if (objfilter == 'n' && objects.head != NULL)
+		{
+			if (schema_is_exclude == TRI_YES)
+				pg_log_error("cannot exclude from vacuum specific schema(s) in all databases");
+			else if (schema_is_exclude == TRI_NO)
+				pg_log_error("cannot vacuum specific schema(s) in all databases");
+			exit(1);
+		}
+
+		if (objfilter == 't' && objects.head != NULL)
 		{
 			pg_log_error("cannot vacuum specific table(s) in all databases");
 			exit(1);
@@ -387,7 +459,7 @@ main(int argc, char *argv[])
 			{
 				vacuum_one_database(&cparams, &vacopts,
 									stage,
-									&tables,
+									&objects,
 									concurrentCons,
 									progname, echo, quiet);
 			}
@@ -395,7 +467,7 @@ main(int argc, char *argv[])
 		else
 			vacuum_one_database(&cparams, &vacopts,
 								ANALYZE_NO_STAGE,
-								&tables,
+								&objects,
 								concurrentCons,
 								progname, echo, quiet);
 	}
@@ -420,7 +492,7 @@ static void
 vacuum_one_database(ConnParams *cparams,
 					vacuumingOptions *vacopts,
 					int stage,
-					SimpleStringList *tables,
+					SimpleStringList *objects,
 					int concurrentCons,
 					const char *progname, bool echo, bool quiet)
 {
@@ -435,7 +507,7 @@ vacuum_one_database(ConnParams *cparams,
 	int			i;
 	int			ntups;
 	bool		failed = false;
-	bool		tables_listed = false;
+	bool		objects_listed = false;
 	bool		has_where = false;
 	const char *initcmd;
 	const char *stage_commands[] = {
@@ -549,31 +621,43 @@ vacuum_one_database(ConnParams *cparams,
 	 * catalog query will fail.
 	 */
 	initPQExpBuffer(&catalog_query);
-	for (cell = tables ? tables->head : NULL; cell; cell = cell->next)
+	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
 	{
 		char	   *just_table;
 		const char *just_columns;
 
-		/*
-		 * Split relation and column names given by the user, this is used to
-		 * feed the CTE with values on which are performed pre-run validity
-		 * checks as well.  For now these happen only on the relation name.
-		 */
-		splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
-							  &just_table, &just_columns);
-
-		if (!tables_listed)
+		if (!objects_listed)
 		{
 			appendPQExpBufferStr(&catalog_query,
-								 "WITH listed_tables (table_oid, column_list) "
+								 "WITH listed_objects (object_oid, column_list) "
 								 "AS (\n  VALUES (");
-			tables_listed = true;
+			objects_listed = true;
 		}
 		else
 			appendPQExpBufferStr(&catalog_query, ",\n  (");
 
-		appendStringLiteralConn(&catalog_query, just_table, conn);
-		appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+
+		switch (objfilter)
+		{
+			case 'n':
+				appendStringLiteralConn(&catalog_query, cell->val, conn);
+				appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace::pg_catalog.oid, ");
+				break;
+			case 't':
+				/*
+				 * Split relation and column names given by the user, this is used to
+				 * feed the CTE with values on which are performed pre-run validity
+				 * checks as well.  For now these happen only on the relation name.
+				 */
+				splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+									  &just_table, &just_columns);
+
+				appendStringLiteralConn(&catalog_query, just_table, conn);
+				appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+				break;
+			default:
+				break;
+		}
 
 		if (just_columns && just_columns[0] != '\0')
 			appendStringLiteralConn(&catalog_query, just_columns, conn);
@@ -586,13 +670,13 @@ vacuum_one_database(ConnParams *cparams,
 	}
 
 	/* Finish formatting the CTE */
-	if (tables_listed)
+	if (objects_listed)
 		appendPQExpBufferStr(&catalog_query, "\n)\n");
 
 	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
 
-	if (tables_listed)
-		appendPQExpBufferStr(&catalog_query, ", listed_tables.column_list");
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
 
 	appendPQExpBufferStr(&catalog_query,
 						 " FROM pg_catalog.pg_class c\n"
@@ -601,10 +685,21 @@ vacuum_one_database(ConnParams *cparams,
 						 " LEFT JOIN pg_catalog.pg_class t"
 						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
 
-	/* Used to match the tables listed by the user */
-	if (tables_listed)
-		appendPQExpBufferStr(&catalog_query, " JOIN listed_tables"
-							 " ON listed_tables.table_oid OPERATOR(pg_catalog.=) c.oid\n");
+	/* Used to match the tables or schemas listed by the user */
+	if (objects_listed)
+	{
+		appendPQExpBufferStr(&catalog_query, " JOIN listed_objects"
+				 " ON listed_objects.object_oid OPERATOR(pg_catalog.=) ");
+		switch (objfilter)
+		{
+			case 't':
+				appendPQExpBufferStr(&catalog_query, "c.oid\n");
+				break;
+			case 'n':
+				appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+				break;
+		}
+	}
 
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
@@ -612,7 +707,7 @@ vacuum_one_database(ConnParams *cparams,
 	 * Instead, let the server decide whether a given relation can be
 	 * processed in which case the user will know about it.
 	 */
-	if (!tables_listed)
+	if (!objects_listed || objfilter == 'n')
 	{
 		appendPQExpBufferStr(&catalog_query, " WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array["
 							 CppAsString2(RELKIND_RELATION) ", "
@@ -683,7 +778,7 @@ vacuum_one_database(ConnParams *cparams,
 							 fmtQualifiedId(PQgetvalue(res, i, 1),
 											PQgetvalue(res, i, 0)));
 
-		if (tables_listed && !PQgetisnull(res, i, 2))
+		if (objects_listed && !PQgetisnull(res, i, 2))
 			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
 
 		simple_string_list_append(&dbtables, buf.data);
@@ -1027,6 +1122,8 @@ help(const char *progname)
 	printf(_("      --no-index-cleanup          don't remove index entries that point to dead tuples\n"));
 	printf(_("      --no-process-toast          skip the TOAST table associated with the table to vacuum\n"));
 	printf(_("      --no-truncate               don't truncate empty pages at the end of the table\n"));
+	printf(_("  -n, --schema=PATTERN            vacuum tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=PATTERN    do NOT vacuum tables in the specified schema(s)\n"));
 	printf(_("  -P, --parallel=PARALLEL_WORKERS use this many background workers for vacuum, if available\n"));
 	printf(_("  -q, --quiet                     don't write any messages\n"));
 	printf(_("      --skip-locked               skip relations that cannot be immediately locked\n"));

Re: [Proposal] vacuumdb --schema only

Reply via email to