Re: Identify huge pages accessibility using madvise
Hi Dmitry, I've been attempting to replicate this issue directly in Kubernetes, but I haven't been successful so far. I've been using EKS nodes, and it seems that they all run cgroup v2 now. Do you have anything that could help me get started on this more quickly? Thanks, Gabriele On Sat, 13 Apr 2024 at 18:24, Dmitry Dolgov <9erthali...@gmail.com> wrote: > Hi, > > I would like to propose a small patch to address an annoying issue with > the way how PostgreSQL does fallback in case if "huge_pages = try" is > set. Here is how the problem looks like: > > * PostgreSQL is starting on a machine with some huge pages available > > * It tries to identify that fact and does mmap with MAP_HUGETLB, which > succeeds > > * But it has a pleasure to run inside a cgroup with a hugetlb > controller and limits set to 0 (or anything less than PostgreSQL > needs) > > * Under this circumstances PostgreSQL will proceed allocating huge > pages, but the first page fault will trigger SIGBUS > > I've sketched out how to reproduce it with cgroup v1 and v2 in the > attached scripts. > > This sounds like quite a rare combination of factors, but apparently > it's fairly easy to face this on K8s/OpenShift. There was a bug reported > some time ago [1] about this behaviour, and back then I was under the > impression it's a solved matter with nothing to do. Yet I still observe > this type of issues, the latest one not longer than a week ago. > > After some research I found what looks to me like a relatively simple > way to address the problem. In Linux kernel 5.14 a new flag to madvise > was introduced that might be just what we need here. It's called > MADV_POPULATE_READ [2] and it tells kernel to populate page tables by > triggering read faults if required. One by-design feature of this flag > is to fail the madvise call in the situations like one above, giving an > opportunity to avoid SIGBUS. > > I've outlined a patch to implement this approach and tested it on a > newish Linux kernel I've got lying around (6.9.0-rc1) -- no SIGBUS, > PostgreSQL does fallback to not use huge pages. The resulting change > seems to be small enough to justify addressing this small but annoying > issue. Any thoughts or commentaries about the proposal? > > [1]: > https://www.postgresql.org/message-id/flat/HE1PR0701MB256920EEAA3B2A9C06249F339E110%40HE1PR0701MB2569.eurprd07.prod.outlook.com > [2]: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4ca9b3859dac14bbef0c27d00667bb5b10917adb > -- Gabriele Bartolini VP, Chief Architect, Kubernetes enterprisedb.com
Re: RFC: Additional Directory for Extensions
Hi David, Thanks for your email. On Mon, 26 Aug 2024 at 16:07, David E. Wheeler wrote: > I would assume BINDIR, DOCDIR, HTMLDIR, PKGLIBDIR, MANDIR, SHAREDIR, and > perhaps LOCALEDIR. > > But even if it’s just one or two, the only proper way an extension > directory would work, IME, is to define a directory-based structure for > extensions, where every file for an extension is in a directory named for > the extension, and subdirectories are defined for each of the above > requisite file types. Something like: > > extension_name > ├── control.ini > ├── bin > ├── doc > ├── html > ├── lib > ├── local > ├── man > └── share > I'm really glad you proposed this publicly. I reached the same conclusion the other day when digging deeper into the problem with a few folks from CloudNativePG. Setting aside multi-arch images for now, if we could reorganize the content of a single image (identified by OS distro, PostgreSQL major version, and extension version) with a top-level directory structure as you described, we could easily mount each image as a separate volume. The extension image could follow a naming convention like this (order can be adjusted): `---(-)`. For example, `pgvector-16-0.7.4-bookworm-1` would represent the first image built in a repository for pgvector 0.7.4 for PostgreSQL 16 on Debian Bookworm. If multi-arch images aren't desired, we could incorporate the architecture somewhere in the naming convention. This would allow multiple paths to work and keep all the files for an > extension bundled together. It could also potentially allow for multiple > versions of an extension to be installed at once, if we required the > version to be part of the directory name. > If we wanted to install multiple versions of an extension, we could mount them in different directories, with the version included in the folder name—for example, `pgvector-0.7.4` instead of just `pgvector`. However, I'm a bit rusty with the extensions framework, so I'll need to check if this approach is feasible and makes sense. Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: RFC: Additional Directory for Extensions
Hi Jelte, On Thu, 22 Aug 2024 at 09:32, Jelte Fennema-Nio wrote: > It looks like you want one directory per extension, so that list would > get pretty long if you have multiple extensions. Maybe (as a follow up > change), we should start to support a * as a wildcard in both of these > GUCs. So you could say: > > SET extension_search_path = /mnt/extensions/pg16/* > > To mean effectively the same as you're describing above. > That'd be great. +1. -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: RFC: Additional Directory for Extensions
Hi Craig, On Thu, 22 Aug 2024 at 01:07, Craig Ringer wrote: > It's also very relevant for local development and testing. > Yep, which is the original goal of Christoph IIRC. > It may be possible to weaken this restriction somewhat thanks to the > upcoming > https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/ > feature that permits additional OCI images to be mounted as read-only > volumes on a workload. This would still only permit mounting at > Pod-creation time, not runtime mounting and unmonuting, but means the > base postgres image could be supplemented by mounting additional > images for extensions. > I'm really excited about that feature, but it's still in the alpha stage. However, I don't anticipate any issues for the future general availability (GA) release. Regardless, we may need to consider a temporary solution that is compatible with existing Kubernetes and possibly Postgres versions (but that's beyond the purpose of this thread). For example, one might mount image "postgis-vX.Y.Z" image onto base > image "postgresql-16" if support for PostGIS is desired, without then > having to bake every possible extension anyone might ever want into > the base image. This solves all sorts of messy issues with upgrades > and new version releases. > Yep. > But for it to work, it must be possible to tell postgres to look in > _multiple places_ for extension .sql scripts and control files. This > is presently possible for modules (dynamic libraries, .so / .dylib / > .dll) but without a way to also configure the path for extensions it's > of very limited utility. > Agree. > So IMO this should be a _path_ to search for extension control files > and SQL scripts. > I like this. I also prefer the name `extension_search_path`. For safety, it might make sense to impose the restriction that if an > extension control file is found in a given directory, SQL scripts will > also only be looked for in that same directory. That way there's no > chance of accidentally mixing and matching SQL scripts from different > versions of an extension if it appears twice on the extension search > path in different places. The rule for loading SQL scripts would be: > > * locate first directory on path contianing matching extension control file > * use this directory as the extension directory for all subsequent SQL > script loading and running actions > It could work, but it requires some prototyping and exploration. I'm willing to participate and use CloudNativePG as a test bed. Cheers, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: RFC: Additional Directory for Extensions
Hi everyone, Apologies for only starting to look into this now. Thanks, David, for pushing this forward. I want to emphasize the importance of this patch for the broader adoption of extensions in immutable container environments, such as those used by the CloudNativePG operator in Kubernetes. To provide some context, one of the key principles of CloudNativePG is that containers, once started, cannot be modified—this includes the installation of Postgres extensions and their libraries. This restriction prevents us from adding extensions on the fly, requiring them to be included in the main PostgreSQL operand image. As a result, users who need specific extensions must build custom images through automated pipelines (see: https://cloudnative-pg.io/blog/creating-container-images/). We’ve been considering ways to improve this process for some time. The direction we're exploring involves mounting an ephemeral volume that contains the necessary extensions (namely $sharedir and $pkglibdir from pg_config). These volumes would be created and populated with the required extensions when the container starts and destroyed when it shuts down. To make this work, each extension must be independently packaged as a container image containing the appropriate files for a specific extension version, tailored to the architecture, distribution, OS version, and Postgres version. I’m committed to thoroughly reviewing this patch, testing it with CloudNativePG and a few extensions, and providing feedback as soon as possible. Best, Gabriele On Mon, 8 Jul 2024 at 18:02, David E. Wheeler wrote: > On Jun 25, 2024, at 18:31, David E. Wheeler wrote: > > > For those who prefer a GitHub patch review experience, see this PR: > > > > https://github.com/theory/postgres/pull/3/files > > Rebased and restored PGC_SUSET in the attached v5 patch, plus noted the > required privileges in the docs. > > Best, > > David > > > > -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Joel, On Wed, 7 Feb 2024 at 10:00, Joel Jacobson wrote: > On Fri, Sep 8, 2023, at 16:17, Gabriele Bartolini wrote: > > ``` > > postgres=# ALTER SYSTEM SET wal_level TO minimal; > > ERROR: could not open file "postgresql.auto.conf": Permission denied > > ``` > > +1 to simply mark postgresql.auto.conf file as not being writeable. > > To improve the UX experience, how about first checking if the file is not > writeable, or catch EACCESS, and add a user-friendly hint? > > ``` > postgres=# ALTER SYSTEM SET wal_level TO minimal; > ERROR: could not open file "postgresql.auto.conf": Permission denied > HINT: The ALTER SYSTEM command is effectively disabled as the > configuration file is set to read-only. > ``` > This would do - provided we fix the issue with pg_rewind not handling read-only files in PGDATA. Cheers, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Jelte, On Tue, 6 Feb 2024 at 16:22, Jelte Fennema-Nio wrote: > I'm not convinced we need a new file to disable ALTER SYSTEM. I feel > like an "enable_alter_system" GUC that defaults to ON would work fine > for this. If we make that GUC be PGC_POSTMASTER then an operator can > disable ALTER SYSTEM either with a command line argument or by > changing the main config file. Since this feature is mostly useful > when the config file is managed by an external system, it seems rather > simple for that system to configure one extra GUC in the config file. > This is mostly the approach I have taken in the patch, except allowing to change the value in the configuration file. The patch at the moment was enforcing just the setting at startup (which is more than enough for a Kubernetes operator given that Postgres runs in the container). I had done some experiments enabling the change in the configuration file, but wasn't sure in which `config_group` to place the 'enable_alter_system` GUC, based on the src/include/utils/guc_tables.h. Any thoughts/hints? Cheers, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi there, I very much like the idea of a file in the data directory that also controls the copy operations. Just wanted to highlight though that in our operator we have already applied the read-only postgresql.auto.conf trick to disable the system (see https://cloudnative-pg.io/documentation/current/postgresql_conf/#enabling-alter-system). However, having that file read-only triggered an issue when using pg_rewind to resync a former primary, as pg_rewind immediately bails out when a read-only file is encountered in the PGDATA (see https://github.com/cloudnative-pg/cloudnative-pg/issues/3698). We might keep this in mind if we go down the path of the separate file. Thanks, Gabriele On Wed, 31 Jan 2024 at 08:43, Peter Eisentraut wrote: > On 31.01.24 06:28, Tom Lane wrote: > >> The idea of adding a file to the data directory appeals to me. > >> > >> optional_runtime_features.conf > >> alter_system=enabled > >> copy_from_program=enabled > >> copy_to_program=disabled > > ... so, exactly what keeps an uncooperative superuser from > > overwriting that file? > > The point of this feature would be to keep the honest people honest. > > The first thing I did when ALTER SYSTEM came out however many years ago > was to install Nagios checks to warn when postgresql.auto.conf exists. > Because the thing is an attractive nuisance, especially when you want to > do centralized configuration control. Of course you can bypass it using > COPY PROGRAM etc., but then you *know* that you are *bypassing* > something. If you just see ALTER SYSTEM, you'll think, "that is > obviously the appropriate tool", and there is no generally accepted way > to communicate that, in particular environment, it might not be. > > -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi, I am sending an updated patch, and submitting this to the next commit fest, as I still believe this could be very useful. Thanks, Gabriele On Thu, 7 Sept 2023 at 21:51, Gabriele Bartolini < gabriele.bartol...@enterprisedb.com> wrote: > Hi everyone, > > I would like to propose a patch that allows administrators to disable > `ALTER SYSTEM` via either a runt-time option to pass to the Postgres server > process at startup (e.g. `--disable-alter-system=true`, false by default) > or a new GUC (or even both), without changing the current default method of > the server. > > The main reason is that this would help improve the “security by default” > posture of Postgres in a Kubernetes/Cloud Native environment - and, in > general, in any environment on VMs/bare metal behind a configuration > management system in which changes should only be made in a declarative way > and versioned like Ansible Tower, to cite one. > > Below you find some background information and the longer story behind > this proposal. > > Sticking to the Kubernetes use case, I am primarily speaking on behalf of > the CloudNativePG open source operator (cloudnative-pg.io, of which I am > one of the maintainers). However, I am sure that this option could benefit > any operator for Postgres - an operator is the most common and recommended > way to run a complex application like a PostgreSQL database management > system inside Kubernetes. > > In this case, the state of a PostgreSQL cluster (for example its number of > replicas, configuration, storage, etc.) is defined in a Custom Resource > Definition in the form of configuration, typically YAML, and the operator > works with Kubernetes to ensure that, at any moment, the requested Postgres > cluster matches the observed one. This is a very basic example in > CloudNativePG: > https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml > > As I was mentioning above, in a Cloud Native environment it is expected > that workloads are secure by default. Without going into much detail, many > decisions have been made in that direction by operators for Postgres, > including CloudNativePG. The goal of this proposal is to provide a way to > ensure that changes to the PostgreSQL configuration in a Kubernetes > controlled Postgres cluster are allowed only through the Kubernetes API. > > Basically, if you want to change an option for PostgreSQL, you need to > change the desired state in the Kubernetes resource, then Kubernetes will > converge (through the operator). In simple words, it’s like empowering the > operator to impersonate the PostgreSQL superuser. > > However, given that we cannot force this use case, there could be roles > with the login+superuser privileges connecting to the PostgreSQL instance > and potentially “interfering” with the requested state defined in the > configuration by imperatively running “ALTER SYSTEM” commands. > > For example: CloudNativePG has a fixed value for some GUCs in order to > manage a full HA cluster, including SSL, log, some WAL and replication > settings. While the operator eventually reconciles those settings, even the > temporary change of those settings in a cluster might be harmful. Think for > example of a user that, through `ALTER SYSTEM`, tries to change WAL level > to minimal, or change the setting of the log (we require CSV), potentially > creating issues to the underlying instance and cluster (potentially leaving > it in an unrecoverable state in the case of other more invasive GUCS). > > At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked > by making the postgresql.auto.conf read only, but the returned message is > misleading and that’s certainly bad user experience (which is very > important in a cloud native environment): > > ``` > postgres=# ALTER SYSTEM SET wal_level TO minimal; > ERROR: could not open file "postgresql.auto.conf": Permission denied > ``` > > For this reason, I would like to propose the option to be given to the > postgres process at startup, in order to be as less invasive as possible > (the operator could then start Postgres requesting `ALTER SYSTEM` to be > disabled). That’d be my preference at the moment, if possible. > > Alternatively, or in addition, the introduction of a GUC to disable `ALTER > SYSTEM` altogether. This enables tuning this setting through configuration > at the Kubernetes level, only if the operators require it - without > damaging the rest of the users. > > Before I start writing any lines of code and propose a patch, I would like > first to understand if there’s room for it. > > Thanks for your attention and … looking forward to your feedback! > > Ciao, > Gabriele > -- > Gabriele Bartolini > Vice President, Cloud Native at EDB > enterprisedb.com > -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com 0001-Add-enable_alter_system-GUC.patch Description: Binary data
Re: Extend pgbench partitioning to pgbench_history
Hi Abhijit, Thanks for your input. Please accept my updated patch. Ciao, Gabriele On Tue, 16 Jan 2024 at 12:53, Abhijit Menon-Sen wrote: > At 2023-11-30 11:29:15 +0100, gabriele.bartol...@enterprisedb.com wrote: > > > > With the attached patch, I extend the partitioning capability to the > > pgbench_history table too. > > > > I have been thinking of adding an option to control this, but I preferred > > to ask in this list whether it really makes sense or not (I struggle > indeed > > to see use cases where accounts is partitioned and history is not). > > I don't have a strong opinion about this, but I also can't think of a > reason to want to create partitions for pgbench_accounts but leave out > pgbench_history. > > > From ba8f507b126a9c5bd22dd40bb8ce0c1f0c43ac59 Mon Sep 17 00:00:00 2001 > > From: Gabriele Bartolini > > Date: Thu, 30 Nov 2023 11:02:39 +0100 > > Subject: [PATCH] Include pgbench_history in partitioning method for > pgbench > > > > In case partitioning, make sure that pgbench_history is also partitioned > with > > the same criteria. > > I think "If partitioning" or "If we're creating partitions" would read > better here. Also, same criteria as what? Maybe you could just add "as > pgbench_accounts" to the end. > > > diff --git a/doc/src/sgml/ref/pgbench.sgml > b/doc/src/sgml/ref/pgbench.sgml > > index 05d3f81619..4c02d2a61d 100644 > > --- a/doc/src/sgml/ref/pgbench.sgml > > +++ b/doc/src/sgml/ref/pgbench.sgml > > […] > > @@ -378,9 +378,9 @@ pgbench > options d > > > --partitions=NUM > > > > > > -Create a partitioned pgbench_accounts table > with > > -NUM partitions of nearly equal size > for > > -the scaled number of accounts. > > +Create partitioned pgbench_accounts and > pgbench_history > > +tables with NUM partitions of nearly > equal size for > > +the scaled number of accounts - and future history records. > > Default is 0, meaning no partitioning. > > > > I would just leave out the "-" and write "number of accounts and future > history records". > > > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c > > index 2e1650d0ad..87adaf4d8f 100644 > > --- a/src/bin/pgbench/pgbench.c > > +++ b/src/bin/pgbench/pgbench.c > > […] > > @@ -889,8 +889,10 @@ usage(void) > > " --index-tablespace=TABLESPACE\n" > > " create indexes in the > specified tablespace\n" > > " --partition-method=(range|hash)\n" > > -" partition pgbench_accounts > with this method (default: range)\n" > > -" --partitions=NUM partition pgbench_accounts > into NUM parts (default: 0)\n" > > +" partition pgbench_accounts > and pgbench_history with this method" > > +" (default: range)." > > +" --partitions=NUM partition pgbench_accounts > and pgbench_history into NUM parts" > > + " (default: 0)\n" > > " --tablespace=TABLESPACE create tables in the > specified tablespace\n" > > " --unlogged-tablescreate tables as unlogged > tables\n" > > "\nOptions to select what to run:\n" > > There's a missing newline after "(default: range).". > > I read through the rest of the patch closely. It looks fine to me. It > applies, builds, and does create the partitions as intended. > > -- Abhijit > -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com v2-0001-Include-pgbench_history-in-partitioning-method-fo.patch Description: Binary data
Re: Extend pgbench partitioning to pgbench_history
Please discard the previous patch and use this one (it had a leftover comment from an initial attempt to limit this to hash case). Thanks, Gabriele On Thu, 30 Nov 2023 at 11:29, Gabriele Bartolini < gabriele.bartol...@enterprisedb.com> wrote: > Hi there, > > While benchmarking a new feature involving tablespace support in > CloudNativePG (Kubernetes operator), I wanted to try out the partitioning > feature of pgbench. I saw it supporting both range and hash partitioning, > but limited to pgbench_accounts. > > With the attached patch, I extend the partitioning capability to the > pgbench_history table too. > > I have been thinking of adding an option to control this, but I preferred > to ask in this list whether it really makes sense or not (I struggle indeed > to see use cases where accounts is partitioned and history is not). > > Please let me know what you think. > > Thanks, > Gabriele > -- > Gabriele Bartolini > Vice President, Cloud Native at EDB > enterprisedb.com > -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com 0001-Include-pgbench_history-in-partitioning-method-for-p.patch Description: Binary data
Extend pgbench partitioning to pgbench_history
Hi there, While benchmarking a new feature involving tablespace support in CloudNativePG (Kubernetes operator), I wanted to try out the partitioning feature of pgbench. I saw it supporting both range and hash partitioning, but limited to pgbench_accounts. With the attached patch, I extend the partitioning capability to the pgbench_history table too. I have been thinking of adding an option to control this, but I preferred to ask in this list whether it really makes sense or not (I struggle indeed to see use cases where accounts is partitioned and history is not). Please let me know what you think. Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com 0001-Include-pgbench_history-in-partitioning-method-for-p.patch Description: Binary data
Re: Possibility to disable `ALTER SYSTEM`
Hi Greg, On Wed, 13 Sept 2023 at 19:10, Greg Sabino Mullane wrote: > Seems to be some resistance to getting this in core, so why not just use > an extension? I was able to create a quick POC to do just that. Hook into > PG and look for AlterSystemStmt, throw a "Sorry, ALTER SYSTEM is not > currently allowed" error. Put into shared_preload_libraries and you're > done. As a bonus, works on all supported versions, so no need to wait for > Postgres 17 - or Postgres 18/19 given the feature drift this thread is > experiencing :) > As much as I would like to see your extension, I would still like to understand why Postgres itself shouldn't solve this basic requirement coming from the configuration management driven/Kubernetes space. It shouldn't be a big deal to have such an option, either as a startup one or a GUC, should it? Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Stephen, On Mon, 11 Sept 2023 at 17:12, Stephen Frost wrote: > A lot of the objections seem to be on the grounds of returning a > 'permission denied' kind of error and I generally agree with that being > the wrong approach. > > As an alternative idea- what if we had something in postgresql.conf > along the lines of: > > include_alter_system = true/false > > and use that to determine if the postgresql.auto.conf is included or > not..? > That sounds like a very good idea. I had thought about that when writing the PoC, as a SIGHUP controlled GUC. I had trouble finding an adequate GUC category for that (ideas?), and thought it was a more intrusive patch to trigger the conversation. But I am willing to explore that too (which won't change by any inch the goal of the patch). With the above, we could throw a WARNING or maybe just NOTICE when ALTER > SYSTEM is run that 'include_alter_system is false and therefore these > changes won't be included in the running configuration' or similar. > That's also an option I'd be willing to explore with folks here. > What this does cause problems with is that pg_basebackup and other tools > (eg: pgbackrest) write into postgresql.auto.conf currently and we'd want > those to still work. That's an opportunity, imv, though, since I don't > really think where ALTER SYSTEM writes to and where backup/restore > tools are writing to should really be the same place anyway. Therefore, > perhaps we add a 'postgresql.system.conf' or similar and maybe a > corresponding option in postgresql.conf to include it or not. > Totally. We are for example in the same position with the CloudNativePG operator, and it is something we are intending to fix ( https://github.com/cloudnative-pg/cloudnative-pg/issues/2727). I agree with you that it is the wrong place to do it. This is an issue if you're looking at it as a security thing. This > isn't an issue if don't view it that way. Still, I do see some merit in > making it so that you can't actually change the config that's loaded at > system start from inside the data directory or as the PG superuser, > which my proposal above would support- just configure in postgresql.conf > to not include any of the alter-system or generated config. The actual > postgresql.conf could be owned by root then too. > +1. Thank you, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Magnus, On Mon, 11 Sept 2023 at 16:04, Magnus Hagander wrote: > But to do that, there would need to be a very in-your-face warning in > the documentation about it like "note that this only disables the > ALTER SYSTEM command. It does not prevent a superuser from changing > the configuration remotely using other means". > Although I did not include any docs in the PoC patch, that's exactly the plan. So +1 from me. > Yes, this is marginally harder than saying ALTER SYSTEM SET > work_mem='1TB', but only very very marginally so. And from a security > perspective, there is zero difference. > Agree, but the primary goal is not security. Indeed, security requires a more holistic approach (and in my initial thread I deliberately did not mention all the knobs that the operator provides, with stricter and stricter default values, as I thought it was not relevant from a Postgres' PoV). However, as I explained in the patch PoC thread, the change is intended primarily to warn legitimate administrators in a configuration managed controlled environment that ALTER SYSTEM has been disabled for that system. These are environments where network access for a superuser is prohibited, but still possible for local SREs to log in via the container in the pod for incident resolution - very often this happens in high stress conditions and I believe that this gate will help them remind that if they want to change the settings they need to do it through the Kubernetes resources. So primarily: usability. Another idea to solve the problem could be to instead introduce a > specific configuration file (either hardcoded or an > include_final_parameter= parameter, in which case patroni or the > k8s operator could set that parameter on the command line and that > parameter only) that is parsed *after* postgresql.auto.conf and > thereby would override the manual settings. This file would explicilty > be documented as intended for this type of tooling, and when you have > a tool - whether patroni or another declarative operator - it owns > this file and can overwrite it with whatever it wants. And this would > also retain the ability to use ALTER SYSTEM SET for *other* > parameters, if they want to. > But that is exactly the whole point of this request: disable the last operation altogether. This option will easily give any operator (or deployment in a configuration management scenario) the possibility to ship a system that, out-of-the box, facilitates this one direction update of the configuration, allowing the observed state to easily reconcile with the desired one. Without breaking any existing deployment. > Stopping ALTER SYSTEM SET without actually preventing the superuser > from doing the same thing as they were doing before would to me be at > least as much of a hack as what patroni does today is. > Agree, but as I said above, that'd be at that point the role of an operator. An operator, at that point, will have the possibility to configure this knob in conjunction with others. A possibility that Postgres is not currently giving. Postgres itself should be able to give this possibility, as these environments demand Postgres to address their emerging needs. Thank you, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Magnus, On Fri, 8 Sept 2023 at 23:43, Magnus Hagander wrote: > +1. And to make that happen, the appropriate thing is to identify > *why* they are using superuser today, and focus efforts on finding > ways for them to do that without being superuser. > As I am explaining in the other post (containing a very basic proof of concept patch), it is not about restricting superuser. It is primarily a usability and configuration matter of a running instance, which helps control an entire cluster like in the case of Kubernetes (where, in order to provide self-healing and high availability we are forced to go beyond the single instance and think in terms of primary with one or more standbys or at least continuous backup in place). Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Tom and Alvaro, On Fri, 8 Sept 2023 at 17:31, Tom Lane wrote: > Alvaro Herrera writes: > > I don't understand Tom's resistance to this request. > > It's false security. If you think you are going to prevent a superuser > from messing with the system's configuration, you are going to need a > lot more restrictions than this, and we'll be forever getting security > reports that "hey, I found another way for a superuser to get filesystem > access". I think the correct answer to this class of problems is "don't > give superuser privileges to clients running inside the container". > Ok, this is clearer. That makes sense now, and this probably helps me explain better the goal here. I also omitted in the initial email all the security precautions that a Kubernetes should take. This could be another step towards that direction but, you are right, it won't fix it entirely (in case of malicious superusers). In my opinion, the biggest benefit of this possibility is on the usability side, providing a clear and configurable way to disable ALTER SYSTEM in those environments where declarative configuration is a requirement. For example, this should at least "warn" human beings that have the permissions to connect to a Postgres database (think of SREs managing a DBaaS solution or a DBA) and try to change a setting in an instance. Moreover, for those who are managing through declarative configuration not only one instance, but a Postgres cluster that controls standby instances too, the benefit of impeding these modifications could be even higher (think of the hot standby sensitive parameters like max_connections that require coordination depending whether you increase or decrease them). I hope this is clearer. For what it's worth, I have done a basic PoC patch (roughly 20 lines of code), which I have attached here just to provide some basis for further analysis and comments. The general idea is to disable ALTER SYSTEM at startup, like this: pg_ctl start -o "-c enable_alter_system=off" The setting can be verified with: psql -c 'SHOW enable_alter_system' enable_alter_system - off (1 row) And then: psql -c 'ALTER SYSTEM SET max_connections TO 10' ERROR: permission denied to run ALTER SYSTEM Thanks for your attention and looking forward to getting feedback and advice. Cheers, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com enable_alter_system_guc.patch Description: Binary data
Re: Possibility to disable `ALTER SYSTEM`
Hi Isaac, On Fri, 8 Sept 2023 at 16:11, Isaac Morland wrote: > Alternate idea, not sure how good this is: Use existing OS security > features (regular permissions, or more modern features such as the > immutable attribute) to mark the postgresql.auto.conf file as not being > writeable. Then any attempt to ALTER SYSTEM should result in an error. > That is the point I highlighted in the initial post in the thread. We could make it readonly, but the returned error is misleading and definitely poor UX: ``` postgres=# ALTER SYSTEM SET wal_level TO minimal; ERROR: could not open file "postgresql.auto.conf": Permission denied ``` IMO we should clearly state that `ALTER SYSTEM` is deliberately disabled in a system, rather than indirectly hinting it through an inaccessible file. Not sure if I am clearly highlighting the fine difference here. Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Tom, On Thu, 7 Sept 2023 at 22:27, Tom Lane wrote: > Gabriele Bartolini writes: > > I would like to propose a patch that allows administrators to disable > > `ALTER SYSTEM` via either a runt-time option to pass to the Postgres > server > > process at startup (e.g. `--disable-alter-system=true`, false by default) > > or a new GUC (or even both), without changing the current default method > of > > the server. > > ALTER SYSTEM is already heavily restricted. Could you please help me better understand what you mean here? > I don't think we need random kluges added to the permissions system. If you allow me, why do you think disabling ALTER SYSTEM altogether is a random kluge? Again, I'd like to better understand this position. I've personally been in many conversations on the security side of things for Postgres in Kubernetes environments, and this is a frequent concern by users who request that changes to the Postgres system (not a database) should only be done declaratively and prevented from within the system. Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Re: Possibility to disable `ALTER SYSTEM`
Hi Joe, On Thu, 7 Sept 2023 at 21:57, Joe Conway wrote: > Without trying to debate the particulars, a huge +1 for the concept of > allowing ALTER SYSTEM to be disabled. FWIW I would vote the same for > control over COPY PROGRAM. > Oh ... another one of my favourite security friendly features! :) That sounds like a good idea to me. Thanks, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com
Possibility to disable `ALTER SYSTEM`
Hi everyone, I would like to propose a patch that allows administrators to disable `ALTER SYSTEM` via either a runt-time option to pass to the Postgres server process at startup (e.g. `--disable-alter-system=true`, false by default) or a new GUC (or even both), without changing the current default method of the server. The main reason is that this would help improve the “security by default” posture of Postgres in a Kubernetes/Cloud Native environment - and, in general, in any environment on VMs/bare metal behind a configuration management system in which changes should only be made in a declarative way and versioned like Ansible Tower, to cite one. Below you find some background information and the longer story behind this proposal. Sticking to the Kubernetes use case, I am primarily speaking on behalf of the CloudNativePG open source operator (cloudnative-pg.io, of which I am one of the maintainers). However, I am sure that this option could benefit any operator for Postgres - an operator is the most common and recommended way to run a complex application like a PostgreSQL database management system inside Kubernetes. In this case, the state of a PostgreSQL cluster (for example its number of replicas, configuration, storage, etc.) is defined in a Custom Resource Definition in the form of configuration, typically YAML, and the operator works with Kubernetes to ensure that, at any moment, the requested Postgres cluster matches the observed one. This is a very basic example in CloudNativePG: https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml As I was mentioning above, in a Cloud Native environment it is expected that workloads are secure by default. Without going into much detail, many decisions have been made in that direction by operators for Postgres, including CloudNativePG. The goal of this proposal is to provide a way to ensure that changes to the PostgreSQL configuration in a Kubernetes controlled Postgres cluster are allowed only through the Kubernetes API. Basically, if you want to change an option for PostgreSQL, you need to change the desired state in the Kubernetes resource, then Kubernetes will converge (through the operator). In simple words, it’s like empowering the operator to impersonate the PostgreSQL superuser. However, given that we cannot force this use case, there could be roles with the login+superuser privileges connecting to the PostgreSQL instance and potentially “interfering” with the requested state defined in the configuration by imperatively running “ALTER SYSTEM” commands. For example: CloudNativePG has a fixed value for some GUCs in order to manage a full HA cluster, including SSL, log, some WAL and replication settings. While the operator eventually reconciles those settings, even the temporary change of those settings in a cluster might be harmful. Think for example of a user that, through `ALTER SYSTEM`, tries to change WAL level to minimal, or change the setting of the log (we require CSV), potentially creating issues to the underlying instance and cluster (potentially leaving it in an unrecoverable state in the case of other more invasive GUCS). At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked by making the postgresql.auto.conf read only, but the returned message is misleading and that’s certainly bad user experience (which is very important in a cloud native environment): ``` postgres=# ALTER SYSTEM SET wal_level TO minimal; ERROR: could not open file "postgresql.auto.conf": Permission denied ``` For this reason, I would like to propose the option to be given to the postgres process at startup, in order to be as less invasive as possible (the operator could then start Postgres requesting `ALTER SYSTEM` to be disabled). That’d be my preference at the moment, if possible. Alternatively, or in addition, the introduction of a GUC to disable `ALTER SYSTEM` altogether. This enables tuning this setting through configuration at the Kubernetes level, only if the operators require it - without damaging the rest of the users. Before I start writing any lines of code and propose a patch, I would like first to understand if there’s room for it. Thanks for your attention and … looking forward to your feedback! Ciao, Gabriele -- Gabriele Bartolini Vice President, Cloud Native at EDB enterprisedb.com