> On Oct 13, 2021, at 1:43 PM, Mark Dilger <[email protected]> wrote: > > The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider > what happens with: > > pg_dump -t production.critical.secrets > secrets.dump > dropdb production > > In v13, if your default database is "testing", and database "testing" has the > same schemas and tables (but not data) as production, you are unhappy. You > just dumped a copy of your test data and blew away the production data. > > You could end up unhappy in v14, if database "testing" has a schema named > "production" and a table that matches the pattern /^critical.secrets$/, but > otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching > tables were found". Neither behavior seems correct.
With the attached patch, this scenario results in a "cross-database references
are not implemented" error.
> The function where the processing occurs is processSQLNamePattern, which is
> called by pg_dump, pg_dumpall, and psql. All three callers expect
> processSQLNamePattern to append where-clauses to a buffer, not to execute any
> sql of its own. I propose that processSQLNamePattern return an error code if
> the pattern contains more than three parts, but otherwise insert the database
> portion into the buffer as a "pg_catalog.current_database()
> OPERATOR(pg_catalog.=) <database>", where <database> is a properly escaped
> representation of the database portion. Maybe someday we can change that to
> OPERATOR(pg_catalog.~), but for now we lack the sufficient logic for handling
> multiple matching database names. (The situation is different for
> pg_dumpall, as it's using the normal logic for matching a relation name, not
> for matching a database, and we'd still be fine matching that against a
> pattern.)
I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as
rejecting the database name as part of the query complicates the calling
convention for no apparent benefit. I had been concerned about database names
that were collation-wise equal but byte-wise unequal, but it seems we already
treat those as distinct database names, so my concern was unnecessary. We
already use strcmp on database names from frontend clients
(fe_utils/parallel_slots.c, psql/prompt.c, pg_amcheck.c, pg_dump.c,
pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend
(commands/dbcommands.c, init/postinit.c).
I tried testing how this plays out by handing `createdb` the name é (U+00E9
"LATIN SMALL LETTER E WITH ACCUTE") and then again the name é (U+0065 "LATIN
SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".) That results in
two distinct databases, not an error about a duplicate database name:
# select oid, datname, datdba, encoding, datcollate, datctype from
pg_catalog.pg_database where datname IN ('é', 'é');
oid | datname | datdba | encoding | datcollate | datctype
-------+---------+--------+----------+-------------+-------------
37852 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8
37855 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8
(2 rows)
But that doesn't seem to prove much, as other tools in my locale don't treat
those as equal either. (Testing with perl's "eq" operator, they compare as
distinct.) I expected to find regression tests providing better coverage for
this somewhere, but did not. Anybody know more about it?
> For psql and pg_dump, I'm tempted to restrict the database portion (if not
> quoted) to neither contain shell glob characters nor POSIX regex characters,
> and return an error code if any are found, so that the clients can raise an
> appropriate error to the user.
With the patch, using pattern characters in an unquoted database portion
results in a "database name must be literal" error. Using them in a quoted
database name is allowed, but unless you are connected to a database that
literally equals that name, you will get a "cross-database references are not
implemented" error.
> In psql, this proposal would result in no tables matching \d
> wrongdb.schema.table, which would differ from v13's behavior. You wouldn't
> get an error about having specified the wrong database. You'd just get no
> matching relations. \d ??db??.schema.table would complain about the db
> portion being a pattern. \d "??db??".schema.table would work, assuming
> you're connected to a database literally named ??db??
With the patch, psql will treat \d wrongdb.schema.table as a "cross-database
references are not implemented" error.
> In pg_dumpall, --exclude-database=more.than.one.part would give an error
> about too many dotted parts rather than simply trying to exclude the last
> "part" and silently ignoring the prefix, which I think is what v13's
> pg_dumpall would do. --exclude-database=db?? would work to exclude four
> character database names beginning in "db".
The patch implements this.
> In pg_dump, the -t wrongdb.schema.table would match nothing and give the
> familiar error "pg_dump: error: no matching tables were found".
With the patch, pg_dump instead gives a "cross-database references are not
implemented" error.
> pg_dump -t too.many.dotted.names would give a different error about too
> many parts.
With the patch, pg_dump instead gives a "improper qualified name (too many
dotted names)" error.
> pg_dump -t db??.foo.bar would give an error about the database needing to be
> a literal name rather than a pattern.
With the patch, pg_dump gives a "database name must be literal" error. This is
the only new error message in the patch, which puts a burden on translators,
but I didn't see any existing message that would serve. Suggestions welcome.
> I don't like your proposal to use a strcmp() rather than a pg_catalog.=
> match, because it diverges from how the rest of the pattern is treated,
> including in how encoding settings might interact with the name, needing to
> be executed on the client side rather than in the server where the rest of
> the name resolution is happening.
Recanted, as discussed above.
The patch only changes the behavior of pg_amcheck in that it now rejects
patterns with too many parts. Using database patterns was and remains legal
for this tool.
The patch changes nothing about reindexdb. That's a debatable design choice,
but reindexdb doesn't use string_utils's processSQLNamePattern() function as
the other tools do, nor does its documentation reference psql's
#APP-PSQL-PATTERNS documentation. It's --schema option only takes literal
names.
v1-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch
Description: Binary data
— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
