Re: Identify huge pages accessibility using madvise

2024-09-25 Thread Gabriele Bartolini
Hi Dmitry,

I've been attempting to replicate this issue directly in Kubernetes, but I
haven't been successful so far. I've been using EKS nodes, and it seems
that they all run cgroup v2 now. Do you have anything that could help me
get started on this more quickly?

Thanks,
Gabriele

On Sat, 13 Apr 2024 at 18:24, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> Hi,
>
> I would like to propose a small patch to address an annoying issue with
> the way how PostgreSQL does fallback in case if "huge_pages = try" is
> set. Here is how the problem looks like:
>
> * PostgreSQL is starting on a machine with some huge pages available
>
> * It tries to identify that fact and does mmap with MAP_HUGETLB, which
>   succeeds
>
> * But it has a pleasure to run inside a cgroup with a hugetlb
>   controller and limits set to 0 (or anything less than PostgreSQL
>   needs)
>
> * Under this circumstances PostgreSQL will proceed allocating huge
>   pages, but the first page fault will trigger SIGBUS
>
> I've sketched out how to reproduce it with cgroup v1 and v2 in the
> attached scripts.
>
> This sounds like quite a rare combination of factors, but apparently
> it's fairly easy to face this on K8s/OpenShift. There was a bug reported
> some time ago [1] about this behaviour, and back then I was under the
> impression it's a solved matter with nothing to do. Yet I still observe
> this type of issues, the latest one not longer than a week ago.
>
> After some research I found what looks to me like a relatively simple
> way to address the problem. In Linux kernel 5.14 a new flag to madvise
> was introduced that might be just what we need here. It's called
> MADV_POPULATE_READ [2] and it tells kernel to populate page tables by
> triggering read faults if required. One by-design feature of this flag
> is to fail the madvise call in the situations like one above, giving an
> opportunity to avoid SIGBUS.
>
> I've outlined a patch to implement this approach and tested it on a
> newish Linux kernel I've got lying around (6.9.0-rc1) -- no SIGBUS,
> PostgreSQL does fallback to not use huge pages. The resulting change
> seems to be small enough to justify addressing this small but annoying
> issue. Any thoughts or commentaries about the proposal?
>
> [1]:
> https://www.postgresql.org/message-id/flat/HE1PR0701MB256920EEAA3B2A9C06249F339E110%40HE1PR0701MB2569.eurprd07.prod.outlook.com
> [2]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4ca9b3859dac14bbef0c27d00667bb5b10917adb
>


-- 
Gabriele Bartolini
VP, Chief Architect, Kubernetes
enterprisedb.com


Re: RFC: Additional Directory for Extensions

2024-08-27 Thread Gabriele Bartolini
Hi David,

Thanks for your email.

On Mon, 26 Aug 2024 at 16:07, David E. Wheeler 
wrote:

> I would assume BINDIR, DOCDIR, HTMLDIR, PKGLIBDIR, MANDIR, SHAREDIR, and
> perhaps LOCALEDIR.
>
> But even if it’s just one or two, the only proper way an extension
> directory would work, IME, is to define a directory-based structure for
> extensions, where every file for an extension is in a directory named for
> the extension, and subdirectories are defined for each of the above
> requisite file types. Something like:
>
> extension_name
> ├── control.ini
> ├── bin
> ├── doc
> ├── html
> ├── lib
> ├── local
> ├── man
> └── share
>

I'm really glad you proposed this publicly. I reached the same conclusion
the other day when digging deeper into the problem with a few folks from
CloudNativePG. Setting aside multi-arch images for now, if we could
reorganize the content of a single image (identified by OS distro,
PostgreSQL major version, and extension version) with a top-level directory
structure as you described, we could easily mount each image as a separate
volume.

The extension image could follow a naming convention like this (order can
be adjusted): `---(-)`. For example, `pgvector-16-0.7.4-bookworm-1`
would represent the first image built in a repository for pgvector 0.7.4
for PostgreSQL 16 on Debian Bookworm. If multi-arch images aren't desired,
we could incorporate the architecture somewhere in the naming convention.

This would allow multiple paths to work and keep all the files for an
> extension bundled together. It could also potentially allow for multiple
> versions of an extension to be installed at once, if we required the
> version to be part of the directory name.
>

If we wanted to install multiple versions of an extension, we could mount
them in different directories, with the version included in the folder
name—for example, `pgvector-0.7.4` instead of just `pgvector`. However, I'm
a bit rusty with the extensions framework, so I'll need to check if this
approach is feasible and makes sense.

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: RFC: Additional Directory for Extensions

2024-08-22 Thread Gabriele Bartolini
Hi Jelte,

On Thu, 22 Aug 2024 at 09:32, Jelte Fennema-Nio  wrote:

> It looks like you want one directory per extension, so that list would
> get pretty long if you have multiple extensions. Maybe (as a follow up
> change), we should start to support a * as a wildcard in both of these
> GUCs. So you could say:
>
> SET extension_search_path = /mnt/extensions/pg16/*
>
> To mean effectively the same as you're describing above.
>

That'd be great. +1.

-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: RFC: Additional Directory for Extensions

2024-08-22 Thread Gabriele Bartolini
Hi Craig,

On Thu, 22 Aug 2024 at 01:07, Craig Ringer 
wrote:

> It's also very relevant for local development and testing.
>

Yep, which is the original goal of Christoph IIRC.


> It may be possible to weaken this restriction somewhat thanks to the
> upcoming
> https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/
> feature that permits additional OCI images to be mounted as read-only
> volumes on a workload. This would still only permit mounting at
> Pod-creation time, not runtime mounting and unmonuting, but means the
> base postgres image could be supplemented by mounting additional
> images for extensions.
>

I'm really excited about that feature, but it's still in the alpha stage.
However, I don't anticipate any issues for the future general availability
(GA) release. Regardless, we may need to consider a temporary solution that
is compatible with existing Kubernetes and possibly Postgres versions (but
that's beyond the purpose of this thread).

For example, one might mount image "postgis-vX.Y.Z" image onto base
> image "postgresql-16" if support for PostGIS is desired, without then
> having to bake every possible extension anyone might ever want into
> the base image. This solves all sorts of messy issues with upgrades
> and new version releases.
>

Yep.


> But for it to work, it must be possible to tell postgres to look in
> _multiple places_ for extension .sql scripts and control files. This
> is presently possible for modules (dynamic libraries, .so / .dylib /
> .dll) but without a way to also configure the path for extensions it's
> of very limited utility.
>

Agree.


> So IMO this should be a _path_ to search for extension control files
> and SQL scripts.
>

I like this. I also prefer the name `extension_search_path`.

For safety, it might make sense to impose the restriction that if an
> extension control file is found in a given directory, SQL scripts will
> also only be looked for in that same directory. That way there's no
> chance of accidentally mixing and matching SQL scripts from different
> versions of an extension if it appears twice on the extension search
> path in different places. The rule for loading SQL scripts would be:
>
> * locate first directory on path contianing matching extension control file
> * use this directory as the extension directory for all subsequent SQL
> script loading and running actions
>

It could work, but it requires some prototyping and exploration. I'm
willing to participate and use CloudNativePG as a test bed.

Cheers,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: RFC: Additional Directory for Extensions

2024-08-21 Thread Gabriele Bartolini
Hi everyone,

Apologies for only starting to look into this now. Thanks, David, for
pushing this forward.

I want to emphasize the importance of this patch for the broader adoption
of extensions in immutable container environments, such as those used by
the CloudNativePG operator in Kubernetes.

To provide some context, one of the key principles of CloudNativePG is that
containers, once started, cannot be modified—this includes the installation
of Postgres extensions and their libraries. This restriction prevents us
from adding extensions on the fly, requiring them to be included in the
main PostgreSQL operand image. As a result, users who need specific
extensions must build custom images through automated pipelines (see:
https://cloudnative-pg.io/blog/creating-container-images/).

We’ve been considering ways to improve this process for some time. The
direction we're exploring involves mounting an ephemeral volume that
contains the necessary extensions (namely $sharedir and $pkglibdir from
pg_config). These volumes would be created and populated with the required
extensions when the container starts and destroyed when it shuts down. To
make this work, each extension must be independently packaged as a
container image containing the appropriate files for a specific extension
version, tailored to the architecture, distribution, OS version, and
Postgres version.

I’m committed to thoroughly reviewing this patch, testing it with
CloudNativePG and a few extensions, and providing feedback as soon as
possible.

Best,
Gabriele

On Mon, 8 Jul 2024 at 18:02, David E. Wheeler  wrote:

> On Jun 25, 2024, at 18:31, David E. Wheeler  wrote:
>
> > For those who prefer a GitHub patch review experience, see this PR:
> >
> >  https://github.com/theory/postgres/pull/3/files
>
> Rebased and restored PGC_SUSET in the attached v5 patch, plus noted the
> required privileges in the docs.
>
> Best,
>
> David
>
>
>
>

-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2024-02-07 Thread Gabriele Bartolini
Hi Joel,

On Wed, 7 Feb 2024 at 10:00, Joel Jacobson  wrote:

> On Fri, Sep 8, 2023, at 16:17, Gabriele Bartolini wrote:
> > ```
> > postgres=# ALTER SYSTEM SET wal_level TO minimal;
> > ERROR:  could not open file "postgresql.auto.conf": Permission denied
> > ```
>
> +1 to simply mark postgresql.auto.conf file as not being writeable.
>
> To improve the UX experience, how about first checking if the file is not
> writeable, or catch EACCESS, and add a user-friendly hint?
>
> ```
> postgres=# ALTER SYSTEM SET wal_level TO minimal;
> ERROR:  could not open file "postgresql.auto.conf": Permission denied
> HINT: The ALTER SYSTEM command is effectively disabled as the
> configuration file is set to read-only.
> ```
>

This would do - provided we fix the issue with pg_rewind not handling
read-only files in PGDATA.

Cheers,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2024-02-07 Thread Gabriele Bartolini
Hi Jelte,

On Tue, 6 Feb 2024 at 16:22, Jelte Fennema-Nio  wrote:

> I'm not convinced we need a new file to disable ALTER SYSTEM. I feel
> like an "enable_alter_system" GUC that defaults to ON would work fine
> for this. If we make that GUC be PGC_POSTMASTER then an operator can
> disable ALTER SYSTEM either with a command line argument or by
> changing the main config file. Since this feature is mostly useful
> when the config file is managed by an external system, it seems rather
> simple for that system to configure one extra GUC in the config file.
>

This is mostly the approach I have taken in the patch, except allowing to
change the value in the configuration file. The patch at the moment was
enforcing just the setting at startup (which is more than enough for a
Kubernetes operator given that Postgres runs in the container). I had done
some experiments enabling the change in the configuration file, but wasn't
sure in which `config_group` to place the 'enable_alter_system` GUC, based
on the src/include/utils/guc_tables.h. Any thoughts/hints?

Cheers,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2024-01-31 Thread Gabriele Bartolini
Hi there,

I very much like the idea of a file in the data directory that also
controls the copy operations.

Just wanted to highlight though that in our operator we have already
applied the read-only postgresql.auto.conf trick to disable the system (see
https://cloudnative-pg.io/documentation/current/postgresql_conf/#enabling-alter-system).
However, having that file read-only triggered an issue when using pg_rewind
to resync a former primary, as pg_rewind immediately bails out when a
read-only file is encountered in the PGDATA (see
https://github.com/cloudnative-pg/cloudnative-pg/issues/3698).

We might keep this in mind if we go down the path of the separate file.

Thanks,
Gabriele

On Wed, 31 Jan 2024 at 08:43, Peter Eisentraut  wrote:

> On 31.01.24 06:28, Tom Lane wrote:
> >> The idea of adding a file to the data directory appeals to me.
> >>
> >> optional_runtime_features.conf
> >> alter_system=enabled
> >> copy_from_program=enabled
> >> copy_to_program=disabled
> > ... so, exactly what keeps an uncooperative superuser from
> > overwriting that file?
>
> The point of this feature would be to keep the honest people honest.
>
> The first thing I did when ALTER SYSTEM came out however many years ago
> was to install Nagios checks to warn when postgresql.auto.conf exists.
> Because the thing is an attractive nuisance, especially when you want to
> do centralized configuration control.  Of course you can bypass it using
> COPY PROGRAM etc., but then you *know* that you are *bypassing*
> something.  If you just see ALTER SYSTEM, you'll think, "that is
> obviously the appropriate tool", and there is no generally accepted way
> to communicate that, in particular environment, it might not be.
>
>

-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2024-01-30 Thread Gabriele Bartolini
Hi,

I am sending an updated patch, and submitting this to the next commit fest,
as I still believe this could be very useful.

Thanks,
Gabriele

On Thu, 7 Sept 2023 at 21:51, Gabriele Bartolini <
gabriele.bartol...@enterprisedb.com> wrote:

> Hi everyone,
>
> I would like to propose a patch that allows administrators to disable
> `ALTER SYSTEM` via either a runt-time option to pass to the Postgres server
> process at startup (e.g. `--disable-alter-system=true`, false by default)
> or a new GUC (or even both), without changing the current default method of
> the server.
>
> The main reason is that this would help improve the “security by default”
> posture of Postgres in a Kubernetes/Cloud Native environment - and, in
> general, in any environment on VMs/bare metal behind a configuration
> management system in which changes should only be made in a declarative way
> and versioned like Ansible Tower, to cite one.
>
> Below you find some background information and the longer story behind
> this proposal.
>
> Sticking to the Kubernetes use case, I am primarily speaking on behalf of
> the CloudNativePG open source operator (cloudnative-pg.io, of which I am
> one of the maintainers). However, I am sure that this option could benefit
> any operator for Postgres - an operator is the most common and recommended
> way to run a complex application like a PostgreSQL database management
> system inside Kubernetes.
>
> In this case, the state of a PostgreSQL cluster (for example its number of
> replicas, configuration, storage, etc.) is defined in a Custom Resource
> Definition in the form of configuration, typically YAML, and the operator
> works with Kubernetes to ensure that, at any moment, the requested Postgres
> cluster matches the observed one. This is a very basic example in
> CloudNativePG:
> https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml
>
> As I was mentioning above, in a Cloud Native environment it is expected
> that workloads are secure by default. Without going into much detail, many
> decisions have been made in that direction by operators for Postgres,
> including CloudNativePG. The goal of this proposal is to provide a way to
> ensure that changes to the PostgreSQL configuration in a Kubernetes
> controlled Postgres cluster are allowed only through the Kubernetes API.
>
> Basically, if you want to change an option for PostgreSQL, you need to
> change the desired state in the Kubernetes resource, then Kubernetes will
> converge (through the operator). In simple words, it’s like empowering the
> operator to impersonate the PostgreSQL superuser.
>
> However, given that we cannot force this use case, there could be roles
> with the login+superuser privileges connecting to the PostgreSQL instance
> and potentially “interfering” with the requested state defined in the
> configuration by imperatively running “ALTER SYSTEM” commands.
>
> For example: CloudNativePG has a fixed value for some GUCs in order to
> manage a full HA cluster, including SSL, log, some WAL and replication
> settings. While the operator eventually reconciles those settings, even the
> temporary change of those settings in a cluster might be harmful. Think for
> example of a user that, through `ALTER SYSTEM`, tries to change WAL level
> to minimal, or change the setting of the log (we require CSV), potentially
> creating issues to the underlying instance and cluster (potentially leaving
> it in an unrecoverable state in the case of other more invasive GUCS).
>
> At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked
> by making the postgresql.auto.conf read only, but the returned message is
> misleading and that’s certainly bad user experience (which is very
> important in a cloud native environment):
>
> ```
> postgres=# ALTER SYSTEM SET wal_level TO minimal;
> ERROR:  could not open file "postgresql.auto.conf": Permission denied
> ```
>
> For this reason, I would like to propose the option to be given to the
> postgres process at startup, in order to be as less invasive as possible
> (the operator could then start Postgres requesting `ALTER SYSTEM` to be
> disabled). That’d be my preference at the moment, if possible.
>
> Alternatively, or in addition, the introduction of a GUC to disable `ALTER
> SYSTEM` altogether. This enables tuning this setting through configuration
> at the Kubernetes level, only if the operators require it - without
> damaging the rest of the users.
>
> Before I start writing any lines of code and propose a patch, I would like
> first to understand if there’s room for it.
>
> Thanks for your attention and … looking forward to your feedback!
>
> Ciao,
> Gabriele
> --
> Gabriele Bartolini
> Vice President, Cloud Native at EDB
> enterprisedb.com
>


-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


0001-Add-enable_alter_system-GUC.patch
Description: Binary data


Re: Extend pgbench partitioning to pgbench_history

2024-01-30 Thread Gabriele Bartolini
Hi Abhijit,

Thanks for your input. Please accept my updated patch.

Ciao,
Gabriele

On Tue, 16 Jan 2024 at 12:53, Abhijit Menon-Sen  wrote:

> At 2023-11-30 11:29:15 +0100, gabriele.bartol...@enterprisedb.com wrote:
> >
> > With the attached patch, I extend the partitioning capability to the
> > pgbench_history table too.
> >
> > I have been thinking of adding an option to control this, but I preferred
> > to ask in this list whether it really makes sense or not (I struggle
> indeed
> > to see use cases where accounts is partitioned and history is not).
>
> I don't have a strong opinion about this, but I also can't think of a
> reason to want to create partitions for pgbench_accounts but leave out
> pgbench_history.
>
> > From ba8f507b126a9c5bd22dd40bb8ce0c1f0c43ac59 Mon Sep 17 00:00:00 2001
> > From: Gabriele Bartolini 
> > Date: Thu, 30 Nov 2023 11:02:39 +0100
> > Subject: [PATCH] Include pgbench_history in partitioning method for
> pgbench
> >
> > In case partitioning, make sure that pgbench_history is also partitioned
> with
> > the same criteria.
>
> I think "If partitioning" or "If we're creating partitions" would read
> better here. Also, same criteria as what? Maybe you could just add "as
> pgbench_accounts" to the end.
>
> > diff --git a/doc/src/sgml/ref/pgbench.sgml
> b/doc/src/sgml/ref/pgbench.sgml
> > index 05d3f81619..4c02d2a61d 100644
> > --- a/doc/src/sgml/ref/pgbench.sgml
> > +++ b/doc/src/sgml/ref/pgbench.sgml
> > […]
> > @@ -378,9 +378,9 @@ pgbench 
> options  d
> >
> --partitions=NUM
> >
> > 
> > -Create a partitioned pgbench_accounts table
> with
> > -NUM partitions of nearly equal size
> for
> > -the scaled number of accounts.
> > +Create partitioned pgbench_accounts and
> pgbench_history
> > +tables with NUM partitions of nearly
> equal size for
> > +the scaled number of accounts - and future history records.
> >  Default is 0, meaning no partitioning.
> > 
>
> I would just leave out the "-" and write "number of accounts and future
> history records".
>
> > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
> > index 2e1650d0ad..87adaf4d8f 100644
> > --- a/src/bin/pgbench/pgbench.c
> > +++ b/src/bin/pgbench/pgbench.c
> > […]
> > @@ -889,8 +889,10 @@ usage(void)
> >  "  --index-tablespace=TABLESPACE\n"
> >  "   create indexes in the
> specified tablespace\n"
> >  "  --partition-method=(range|hash)\n"
> > -"   partition pgbench_accounts
> with this method (default: range)\n"
> > -"  --partitions=NUM partition pgbench_accounts
> into NUM parts (default: 0)\n"
> > +"   partition pgbench_accounts
> and pgbench_history with this method"
> > +"   (default: range)."
> > +"  --partitions=NUM partition pgbench_accounts
> and pgbench_history into NUM parts"
> > +    "   (default: 0)\n"
> >  "  --tablespace=TABLESPACE  create tables in the
> specified tablespace\n"
> >  "  --unlogged-tablescreate tables as unlogged
> tables\n"
> >  "\nOptions to select what to run:\n"
>
> There's a missing newline after "(default: range).".
>
> I read through the rest of the patch closely. It looks fine to me. It
> applies, builds, and does create the partitions as intended.
>
> -- Abhijit
>


-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


v2-0001-Include-pgbench_history-in-partitioning-method-fo.patch
Description: Binary data


Re: Extend pgbench partitioning to pgbench_history

2023-11-30 Thread Gabriele Bartolini
Please discard the previous patch and use this one (it had a leftover
comment from an initial attempt to limit this to hash case).

Thanks,
Gabriele

On Thu, 30 Nov 2023 at 11:29, Gabriele Bartolini <
gabriele.bartol...@enterprisedb.com> wrote:

> Hi there,
>
> While benchmarking a new feature involving tablespace support in
> CloudNativePG (Kubernetes operator), I wanted to try out the partitioning
> feature of pgbench. I saw it supporting both range and hash partitioning,
> but limited to pgbench_accounts.
>
> With the attached patch, I extend the partitioning capability to the
> pgbench_history table too.
>
> I have been thinking of adding an option to control this, but I preferred
> to ask in this list whether it really makes sense or not (I struggle indeed
> to see use cases where accounts is partitioned and history is not).
>
> Please let me know what you think.
>
> Thanks,
> Gabriele
> --
> Gabriele Bartolini
> Vice President, Cloud Native at EDB
> enterprisedb.com
>


-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


0001-Include-pgbench_history-in-partitioning-method-for-p.patch
Description: Binary data


Extend pgbench partitioning to pgbench_history

2023-11-30 Thread Gabriele Bartolini
Hi there,

While benchmarking a new feature involving tablespace support in
CloudNativePG (Kubernetes operator), I wanted to try out the partitioning
feature of pgbench. I saw it supporting both range and hash partitioning,
but limited to pgbench_accounts.

With the attached patch, I extend the partitioning capability to the
pgbench_history table too.

I have been thinking of adding an option to control this, but I preferred
to ask in this list whether it really makes sense or not (I struggle indeed
to see use cases where accounts is partitioned and history is not).

Please let me know what you think.

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


0001-Include-pgbench_history-in-partitioning-method-for-p.patch
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2023-09-15 Thread Gabriele Bartolini
Hi Greg,

On Wed, 13 Sept 2023 at 19:10, Greg Sabino Mullane 
wrote:

> Seems to be some resistance to getting this in core, so why not just use
> an extension? I was able to create a quick POC to do just that. Hook into
> PG and look for AlterSystemStmt, throw a "Sorry, ALTER SYSTEM is not
> currently allowed" error. Put into shared_preload_libraries and you're
> done. As a bonus, works on all supported versions, so no need to wait for
> Postgres 17 - or Postgres 18/19 given the feature drift this thread is
> experiencing :)
>

As much as I would like to see your extension, I would still like to
understand why Postgres itself shouldn't solve this basic requirement
coming from the configuration management driven/Kubernetes space. It
shouldn't be a big deal to have such an option, either as a startup one or
a GUC, should it?

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2023-09-11 Thread Gabriele Bartolini
Hi Stephen,

On Mon, 11 Sept 2023 at 17:12, Stephen Frost  wrote:

> A lot of the objections seem to be on the grounds of returning a
> 'permission denied' kind of error and I generally agree with that being
> the wrong approach.
>
> As an alternative idea- what if we had something in postgresql.conf
> along the lines of:
>
> include_alter_system = true/false
>
> and use that to determine if the postgresql.auto.conf is included or
> not..?
>

That sounds like a very good idea. I had thought about that when writing
the PoC, as a SIGHUP controlled GUC. I had trouble finding an adequate GUC
category for that (ideas?), and thought it was a more intrusive patch
to trigger the conversation. But I am willing to explore that too (which
won't change by any inch the goal of the patch).

With the above, we could throw a WARNING or maybe just NOTICE when ALTER
> SYSTEM is run that 'include_alter_system is false and therefore these
> changes won't be included in the running configuration' or similar.
>

That's also an option I'd be willing to explore with folks here.


> What this does cause problems with is that pg_basebackup and other tools
> (eg: pgbackrest) write into postgresql.auto.conf currently and we'd want
> those to still work.  That's an opportunity, imv, though, since I don't
> really think where ALTER SYSTEM writes to and where backup/restore
> tools are writing to should really be the same place anyway.  Therefore,
> perhaps we add a 'postgresql.system.conf' or similar and maybe a
> corresponding option in postgresql.conf to include it or not.
>

Totally. We are for example in the same position with the CloudNativePG
operator, and it is something we are intending to fix (
https://github.com/cloudnative-pg/cloudnative-pg/issues/2727). I agree with
you that it is the wrong place to do it.

This is an issue if you're looking at it as a security thing.  This
> isn't an issue if don't view it that way.  Still, I do see some merit in
> making it so that you can't actually change the config that's loaded at
> system start from inside the data directory or as the PG superuser,
> which my proposal above would support- just configure in postgresql.conf
> to not include any of the alter-system or generated config.  The actual
> postgresql.conf could be owned by root then too.
>

+1.

Thank you,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2023-09-11 Thread Gabriele Bartolini
Hi Magnus,

On Mon, 11 Sept 2023 at 16:04, Magnus Hagander  wrote:

> But to do that, there would need to be a very in-your-face warning in
> the documentation about it like "note that this only disables the
> ALTER SYSTEM command. It does not prevent a superuser from changing
> the configuration remotely using other means".
>

Although I did not include any docs in the PoC patch, that's exactly the
plan. So +1 from me.


> Yes, this is marginally harder than saying ALTER SYSTEM SET
> work_mem='1TB', but only very very marginally so. And from a security
> perspective, there is zero difference.
>

Agree, but the primary goal is not security. Indeed, security requires a
more holistic approach (and in my initial thread I deliberately did not
mention all the knobs that the operator provides, with stricter and
stricter default values, as I thought it was not relevant from a Postgres'
PoV). However, as I explained in the patch PoC thread, the change is
intended primarily to warn legitimate administrators in a configuration
managed controlled environment that ALTER SYSTEM has been disabled for that
system. These are environments where network access for a superuser is
prohibited, but still possible for local SREs to log in via the container
in the pod for incident resolution - very often this happens in high stress
conditions and I believe that this gate will help them remind that if they
want to change the settings they need to do it through the Kubernetes
resources. So primarily: usability.

Another idea to solve the problem could be to instead introduce a
> specific configuration file (either hardcoded or an
> include_final_parameter= parameter, in which case patroni or the
> k8s operator could set that parameter on the command line and that
> parameter only) that is parsed *after* postgresql.auto.conf and
> thereby would override the manual settings. This file would explicilty
> be documented as intended for this type of tooling, and when you have
> a tool - whether patroni or another declarative operator - it owns
> this file and can overwrite it with whatever it wants. And this would
> also retain the ability to use ALTER SYSTEM SET for *other*
> parameters, if they want to.
>

But that is exactly the whole point of this request: disable the last
operation altogether. This option will easily give any operator (or
deployment in a configuration management scenario) the possibility to ship
a system that, out-of-the box, facilitates this one direction update of the
configuration, allowing the observed state to easily reconcile with the
desired one. Without breaking any existing deployment.


> Stopping ALTER SYSTEM SET without actually preventing the superuser
> from doing the same thing as they were doing before would to me be at
> least as much of a hack as what patroni does today is.
>

Agree, but as I said above, that'd be at that point the role of an
operator. An operator, at that point, will have the possibility to
configure this knob in conjunction with others. A possibility that Postgres
is not currently giving.

Postgres itself should be able to give this possibility, as these
environments demand Postgres to address their emerging needs.

Thank you,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2023-09-08 Thread Gabriele Bartolini
Hi Magnus,

On Fri, 8 Sept 2023 at 23:43, Magnus Hagander  wrote:

> +1. And to make that happen, the appropriate thing is to identify
> *why* they are using superuser today, and focus efforts on finding
> ways for them to do that without being superuser.
>

As I am explaining in the other post (containing a very basic proof of
concept patch), it is not about restricting superuser. It is primarily a
usability and configuration matter of a running instance, which helps
control an entire cluster like in the case of Kubernetes (where, in order
to provide self-healing and high availability we are forced to go beyond
the single instance and think in terms of primary with one or more standbys
or at least continuous backup in place).

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2023-09-08 Thread Gabriele Bartolini
Hi Tom and Alvaro,

On Fri, 8 Sept 2023 at 17:31, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > I don't understand Tom's resistance to this request.
>
> It's false security.  If you think you are going to prevent a superuser
> from messing with the system's configuration, you are going to need a
> lot more restrictions than this, and we'll be forever getting security
> reports that "hey, I found another way for a superuser to get filesystem
> access".  I think the correct answer to this class of problems is "don't
> give superuser privileges to clients running inside the container".
>

Ok, this is clearer. That makes sense now, and this probably helps me
explain better the goal here. I also omitted in the initial email all the
security precautions that a Kubernetes should take. This could be another
step towards that direction but, you are right, it won't fix it entirely
(in case of malicious superusers).

In my opinion, the biggest benefit of this possibility is on the usability
side, providing a clear and configurable way to disable ALTER SYSTEM in
those environments where declarative configuration is a requirement. For
example, this should at least "warn" human beings that have the permissions
to connect to a Postgres database (think of SREs managing a DBaaS solution
or a DBA) and try to change a setting in an instance. Moreover, for those
who are managing through declarative configuration not only one instance,
but a Postgres cluster that controls standby instances too, the benefit of
impeding these modifications could be even higher (think of the hot standby
sensitive parameters like max_connections that require coordination
depending whether you increase or decrease them).

I hope this is clearer. For what it's worth, I have done a basic PoC patch
(roughly 20 lines of code), which I have attached here just to provide some
basis for further analysis and comments. The general idea is to disable
ALTER SYSTEM at startup, like this:

pg_ctl start -o "-c enable_alter_system=off"


The setting can be verified with:

psql -c 'SHOW enable_alter_system'
 enable_alter_system
-
 off
(1 row)


And then:

psql -c 'ALTER SYSTEM SET max_connections TO 10'
ERROR:  permission denied to run ALTER SYSTEM


Thanks for your attention and looking forward to getting feedback and
advice.

Cheers,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


enable_alter_system_guc.patch
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2023-09-08 Thread Gabriele Bartolini
Hi Isaac,

On Fri, 8 Sept 2023 at 16:11, Isaac Morland  wrote:

> Alternate idea, not sure how good this is: Use existing OS security
> features (regular permissions, or more modern features such as the
> immutable attribute) to mark the postgresql.auto.conf file as not being
> writeable. Then any attempt to ALTER SYSTEM should result in an error.
>

That is the point I highlighted in the initial post in the thread. We could
make it readonly, but the returned error is misleading and definitely poor
UX:

```
postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  could not open file "postgresql.auto.conf": Permission denied
```

IMO we should clearly state that `ALTER SYSTEM` is deliberately disabled in
a system, rather than indirectly hinting it through an inaccessible file.
Not sure if I am clearly highlighting the fine difference here.

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2023-09-08 Thread Gabriele Bartolini
Hi Tom,

On Thu, 7 Sept 2023 at 22:27, Tom Lane  wrote:

> Gabriele Bartolini  writes:
> > I would like to propose a patch that allows administrators to disable
> > `ALTER SYSTEM` via either a runt-time option to pass to the Postgres
> server
> > process at startup (e.g. `--disable-alter-system=true`, false by default)
> > or a new GUC (or even both), without changing the current default method
> of
> > the server.
>
> ALTER SYSTEM is already heavily restricted.


Could you please help me better understand what you mean here?


> I don't think we need random kluges added to the permissions system.


If you allow me, why do you think disabling ALTER SYSTEM altogether is a
random kluge? Again, I'd like to better understand this position. I've
personally been in many conversations on the security side of things for
Postgres in Kubernetes environments, and this is a frequent concern by
users who request that changes to the Postgres system (not a database)
should only be done declaratively and prevented from within the system.

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Possibility to disable `ALTER SYSTEM`

2023-09-07 Thread Gabriele Bartolini
Hi Joe,

On Thu, 7 Sept 2023 at 21:57, Joe Conway  wrote:

> Without trying to debate the particulars, a huge +1 for the concept of
> allowing ALTER SYSTEM to be disabled. FWIW I would vote the same for
> control over COPY PROGRAM.
>

Oh ... another one of my favourite security friendly features! :)

That sounds like a good idea to me.

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Possibility to disable `ALTER SYSTEM`

2023-09-07 Thread Gabriele Bartolini
Hi everyone,

I would like to propose a patch that allows administrators to disable
`ALTER SYSTEM` via either a runt-time option to pass to the Postgres server
process at startup (e.g. `--disable-alter-system=true`, false by default)
or a new GUC (or even both), without changing the current default method of
the server.

The main reason is that this would help improve the “security by default”
posture of Postgres in a Kubernetes/Cloud Native environment - and, in
general, in any environment on VMs/bare metal behind a configuration
management system in which changes should only be made in a declarative way
and versioned like Ansible Tower, to cite one.

Below you find some background information and the longer story behind this
proposal.

Sticking to the Kubernetes use case, I am primarily speaking on behalf of
the CloudNativePG open source operator (cloudnative-pg.io, of which I am
one of the maintainers). However, I am sure that this option could benefit
any operator for Postgres - an operator is the most common and recommended
way to run a complex application like a PostgreSQL database management
system inside Kubernetes.

In this case, the state of a PostgreSQL cluster (for example its number of
replicas, configuration, storage, etc.) is defined in a Custom Resource
Definition in the form of configuration, typically YAML, and the operator
works with Kubernetes to ensure that, at any moment, the requested Postgres
cluster matches the observed one. This is a very basic example in
CloudNativePG:
https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml

As I was mentioning above, in a Cloud Native environment it is expected
that workloads are secure by default. Without going into much detail, many
decisions have been made in that direction by operators for Postgres,
including CloudNativePG. The goal of this proposal is to provide a way to
ensure that changes to the PostgreSQL configuration in a Kubernetes
controlled Postgres cluster are allowed only through the Kubernetes API.

Basically, if you want to change an option for PostgreSQL, you need to
change the desired state in the Kubernetes resource, then Kubernetes will
converge (through the operator). In simple words, it’s like empowering the
operator to impersonate the PostgreSQL superuser.

However, given that we cannot force this use case, there could be roles
with the login+superuser privileges connecting to the PostgreSQL instance
and potentially “interfering” with the requested state defined in the
configuration by imperatively running “ALTER SYSTEM” commands.

For example: CloudNativePG has a fixed value for some GUCs in order to
manage a full HA cluster, including SSL, log, some WAL and replication
settings. While the operator eventually reconciles those settings, even the
temporary change of those settings in a cluster might be harmful. Think for
example of a user that, through `ALTER SYSTEM`, tries to change WAL level
to minimal, or change the setting of the log (we require CSV), potentially
creating issues to the underlying instance and cluster (potentially leaving
it in an unrecoverable state in the case of other more invasive GUCS).

At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked
by making the postgresql.auto.conf read only, but the returned message is
misleading and that’s certainly bad user experience (which is very
important in a cloud native environment):

```
postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  could not open file "postgresql.auto.conf": Permission denied
```

For this reason, I would like to propose the option to be given to the
postgres process at startup, in order to be as less invasive as possible
(the operator could then start Postgres requesting `ALTER SYSTEM` to be
disabled). That’d be my preference at the moment, if possible.

Alternatively, or in addition, the introduction of a GUC to disable `ALTER
SYSTEM` altogether. This enables tuning this setting through configuration
at the Kubernetes level, only if the operators require it - without
damaging the rest of the users.

Before I start writing any lines of code and propose a patch, I would like
first to understand if there’s room for it.

Thanks for your attention and … looking forward to your feedback!

Ciao,
Gabriele
--
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com