On 2/17/23 4:15 AM, Amit Kapila wrote:
On Fri, Feb 17, 2023 at 1:13 AM Jonathan S. Katz <jk...@postgresql.org> wrote:

On 2/16/23 2:38 PM, Alvaro Herrera wrote:
On 2023-Feb-16, Jonathan S. Katz wrote:

On 2/16/23 12:53 PM, Alvaro Herrera wrote:

I don't think this is the fault of logical replication.  Consider that
for the backend server, the function source code is just an opaque
string that is given to the plpgsql engine to interpret.  So there's no
way for the logical DDL replication engine to turn this into runnable
code if the table name is not qualified.

Sure, that's fair. That said, the example above would fall under a "typical
use case", i.e. I'm replicating functions that call tables without schema
qualification. This is pretty common, and as logical replication becomes
used for more types of workloads (e.g. high availability), we'll definitely
see this.

Hmm, I think you're saying that replay should turn check_function_bodies
off, and I think I agree with that.

Yes, exactly. +1


But will that be sufficient? I guess such functions can give errors at
a later stage when invoked at DML or another DDL time. Consider the
following example:

Pub:
CREATE PUBLICATION pub FOR ALL TABLES with (ddl = 'all');

Sub:
(Set check_function_bodies = off in postgresql.conf)
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub;

Pub:
CREATE FUNCTION t1(a int) RETURNS int AS $$
select a+1;
$$ LANGUAGE sql;
CREATE FUNCTION t(a int) RETURNS int AS $$
select t1(a);
$$ LANGUAGE sql;
CREATE TABLE tbl1 (a int primary key, b text);
create index idx on tbl1(t(a));

insert into tbl1 values (1,1); -- This insert on publisher causes an
error on the subscriber. Check subscriber Logs (ERROR:  function
t1(integer) does not exist at character 9.)

I did reproduce this as is. I also reproduced this when I rewrote the function in PL/pgSQL.

I also did an experiment using PL/v8[1] where I rewrote the functions above and did two tests: one via SPI, the other via PL/v8's ability to find and call a function[2]. In both cases, the INSERT statement failed citing the inability to find the function. The calls did work when I schema-qualified.

However, when I converted the SQL-only functions to use the SQL standard syntax (BEGIN ATOMIC), I did not get this error and was able to successfully use this index with the table on both publisher and subscriber. I believe this is due to the generated function body having all of the schema qualifications in it.

This happens because of the function used in the index expression.
Now, this is not the only thing, the replication can even fail during
DDL replication when the function like above is IMMUTABLE and used as
follows: ALTER TABLE tbl ADD COLUMN d int DEFAULT t(1);

Normally, it is recommended that users can fix such errors by
schema-qualifying affected names. See commits 11da97024a and
582edc369c.

I'm very familiar with those CVEs, but even though these are our recommended best practices, there is still a lot of code that does not schema-qualify the names of functions (including many of our own examples ;)

If we're going to say "You can use logical replication to replicate functions, but you have to ensure you've schema-qualified any function calls within them," I think that will prevent people from being able to use this feature, particularly on existing applications.

I guess there's a connection I'm missing here. For the failing examples above, I look at the pg_proc entries on both the publisher and the subscriber and they're identical. I'm not understanding why creating and executing the functions works on the publisher, but it does not on the subscriber. What additional info would the subscriber need to be able to successfully run these functions? Would we need to pass in some additional context, e.g. what the search_path was at the time the publisher created the function?

Thanks,

Jonathan

[1] https://plv8.github.io/
[2] https://plv8.github.io/#-code-plv8-find_function-code-

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to