Re: Dependency isn't created between extension and schema
On Mon, Dec 21, 2020 at 04:02:29PM +0900, Masahiko Sawada wrote:
> Is it a bug? Since the created schema obviously depends on the
> extension when we created the schema specified in the schema option, I
> think we might want to create the dependency so that DROP EXTENSION
> drops the schema as well. I've attached the draft patch so that CREATE
> EXTENSION creates the dependency if it newly creates the schema.

FWIW, I recall that the "soft" behavior that exists now is intentional, as it is more flexible for DROP EXTENSION: what you are suggesting here has the disadvantage of making DROP EXTENSION fail if any non-extension object has been created in this schema, so this could be disruptive in some upgrade scenarios.

    schema_name
        The name of the schema in which to install the extension's objects,
        given that the extension allows its contents to be relocated. The
        named schema must already exist.

While on it... the docs are incorrect here. As you say, CreateExtensionInternal() may internally create a schema.

--
Michael
Re: New Table Access Methods for Multi and Single Inserts
On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap? Is
> > > heap_insert_begin() really specific to heap? It's allocating and
> > > populating a structure based on its arguments, but those same arguments
> > > would be passed to every other AM's insert_begin routine, too. Do you
> > > need a more flexible data structure, something that would also
> > > accommodate extensions? I'm thinking of reloptions as a loose analogy.
> >
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
>
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific. Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.
>
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's? With the current API, they'd (have to) be the same, which means
> either that it should apply to all AMs (or have a "default" implementation
> that can be overridden by an AM), or that this API assumes that other AMs
> will want to do exactly what heap does, and fails to allow other AMs to
> implement optimizations for bulk inserts as claimed.
>
> I don't think using a "union" solves the problem, since it can only
> accommodate core AMs, and not extensions, so I suggested something like
> reloptions, which have a "namespace" prefix (and core has toast.*, like
> ALTER TABLE t SET toast.autovacuum_enabled).

I think you'd want to handle things like:

- a compressed AM wants to specify a threshold for a tuple's *compressed*
  size (maybe in addition to the uncompressed size);
- a "columnar" AM wants to specify a threshold size for a column, rather
  than for each tuple.

I'm not proposing to handle those specific parameters, but rather pointing out that your implementation doesn't allow handling AM-specific considerations, which I think was the goal. The TableInsertState structure would need to store those, and then the AM's multi_insert_v2 routine would need to make use of them.

It feels a bit like we'd introduce the idea of an "AM option", except that it wouldn't be user-facing (or maybe some of them would be?). Maybe I've misunderstood, though, so other opinions are welcome.

--
Justin
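To make the reloptions analogy concrete, here is a minimal sketch (plain Python, not PostgreSQL source) of the namespace-prefixed option idea Justin describes: a flat list of "namespace.option" keys lets core options and AM/extension-specific options coexist, the way reloptions allow "toast.autovacuum_enabled" alongside plain options. The AM names and option names below are hypothetical.

```python
# Sketch of namespaced AM options, loosely modeled on reloptions' "toast.*"
# prefix. Keys without a prefix fall into the "core" namespace; an AM or
# extension claims its own prefix and reads only its own options.

def parse_am_options(pairs):
    """Split 'namespace.option' keys into per-namespace dicts."""
    options = {}
    for key, value in pairs:
        namespace, _, name = key.rpartition(".")
        namespace = namespace or "core"  # unprefixed keys belong to core
        options.setdefault(namespace, {})[name] = value
    return options

opts = parse_am_options([
    ("mi_max_slots", 1000),                        # core multi-insert option
    ("compressed_am.max_compressed_size", 8192),   # hypothetical AM option
    ("columnar_am.column_threshold", 65536),       # hypothetical AM option
])
```

A table AM implementation would then only consult its own namespace, so extensions can add options without changes to any shared data structure.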
Re: New Table Access Methods for Multi and Single Inserts
On Fri, Dec 18, 2020 at 11:24 PM Justin Pryzby wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap? Is
> > > heap_insert_begin() really specific to heap? It's allocating and
> > > populating a structure based on its arguments, but those same arguments
> > > would be passed to every other AM's insert_begin routine, too. Do you
> > > need a more flexible data structure, something that would also
> > > accommodate extensions? I'm thinking of reloptions as a loose analogy.
> >
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
>
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific. Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.
>
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's? With the current API, they'd (have to) be the same, which means
> either that it should apply to all AMs (or have a "default" implementation
> that can be overridden by an AM), or that this API assumes that other AMs
> will want to do exactly what heap does, and fails to allow other AMs to
> implement optimizations for bulk inserts as claimed.
>
> I don't think using a "union" solves the problem, since it can only
> accommodate core AMs, and not extensions, so I suggested something like
> reloptions, which have a "namespace" prefix (and core has toast.*, like
> ALTER TABLE t SET toast.autovacuum_enabled).

IIUC, your suggestion is to make the heap options such as alloc_bistate (whether a bulk insert state is required), mi_max_slots (the maximum number of buffered slots/tuples) and mi_max_size (the maximum tuple size of the buffered slots) reloptions with some default values in reloptions.c under the RELOPT_KIND_HEAP category, so that they can be modified by users on a per-table basis. Likewise, other tableam options can be added by tableam developers. This way, the APIs will become more generic: tableam developers add reloptions of their choice and use them in the new API implementations. Let me know if I am missing anything from what you have in mind.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
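The buffering behavior behind mi_max_slots and mi_max_size can be sketched as follows (illustrative Python, not the proposed C API): tuples accumulate until either the slot-count limit or the accumulated-size limit is reached, then the batch is flushed as one multi-insert.

```python
# Sketch of multi-insert buffering controlled by two thresholds:
# mi_max_slots (max buffered tuples) and mi_max_size (max buffered bytes).
# self.flushes stands in for calls to the AM's multi-insert routine.

class MultiInsertState:
    def __init__(self, mi_max_slots=1000, mi_max_size=64 * 1024):
        self.mi_max_slots = mi_max_slots
        self.mi_max_size = mi_max_size
        self.buffer = []
        self.buffered_size = 0
        self.flushes = []

    def insert(self, tup):
        self.buffer.append(tup)
        self.buffered_size += len(tup)
        # Flush when either threshold is reached.
        if (len(self.buffer) >= self.mi_max_slots
                or self.buffered_size >= self.mi_max_size):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushes.append(list(self.buffer))
            self.buffer.clear()
            self.buffered_size = 0

state = MultiInsertState(mi_max_slots=3, mi_max_size=100)
for t in [b"aa", b"bb", b"cc", b"dd"]:
    state.insert(t)
state.flush()  # final flush, as at the end of a COPY/CTAS
```

Making the two thresholds reloptions, as discussed above, would simply mean the constructor arguments come from per-table storage options rather than compile-time defaults.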
RE: pg_preadv() and pg_pwritev()
> > I'm drawing a blank on trivial candidate uses for preadv(), without
> > infrastructure from later patches.
>
> Can't immediately think of something either.

This might not be that trivial, but maybe acquire_sample_rows() from analyze.c? Please note, however, that there is a patch for adding fadvise there: https://www.postgresql.org/message-id/20201109180644.GJ16415%40tamriel.snowman.net ( https://commitfest.postgresql.org/30/2799/ ). But maybe those two could even be combined, so you would be doing e.g. 16x fadvise() and then grabbing 8 pages in one preadv() call? I find it unlikely, however, that preadv() would give any additional performance benefit there after having fadvise, but it is clearly a potential place to test.

-J.
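The fadvise-then-preadv pattern being discussed can be sketched in a few lines. This is a POSIX-only illustration (Python's os.preadv and os.posix_fadvise wrap the same syscalls; neither exists on Windows), not the analyze.c code itself: hint the kernel about upcoming blocks, then gather several of them with a single read call.

```python
# POSIX-only sketch: advise the kernel ahead of time, then read several
# blocks in one preadv() call instead of one pread() per block.
import os
import tempfile

BLOCK = 8192

# Build a 16-block test file with distinguishable block contents.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(16):
        f.write(bytes([i]) * BLOCK)
    path = f.name

fd = os.open(path, os.O_RDONLY)

# Hint that blocks 0..15 will be needed soon (the fadvise patch's role).
os.posix_fadvise(fd, 0, 16 * BLOCK, os.POSIX_FADV_WILLNEED)

# Later, gather 8 contiguous blocks with a single system call.
bufs = [bytearray(BLOCK) for _ in range(8)]
nread = os.preadv(fd, bufs, 0)

os.close(fd)
os.unlink(path)
```

Whether this beats plain sequential reads after fadvise is, as the email says, something that would need measurement; the sketch only shows the call pattern.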
Dependency isn't created between extension and schema
Hi all,

While analyzing the issue James reported to us, I realized that if the schema option in the control file is specified and the schema doesn't exist, we create the schema on CREATE EXTENSION, but the created schema doesn't refer to the extension. Due to this behavior, the schema remains even on DROP EXTENSION. You can see this behavior by using the test_ext6 extension in src/test/modules/test_extensions. In the control file, it has the schema option:

$ cat src/test/modules/test_extensions/test_ext6.control
comment = 'test_ext6'
default_version = '1.0'
relocatable = false
superuser = true
schema = 'test_ext6'

On CREATE EXTENSION, the schema test_ext6 is created if it does not exist:

postgres(1:692)=# create extension test_ext6 ;
CREATE EXTENSION
postgres(1:692)=# \dn
  List of schemas
   Name    |  Owner
-----------+----------
 public    | masahiko
 test_ext6 | masahiko
(2 rows)

But it isn't dropped on DROP EXTENSION:

postgres(1:692)=# drop extension test_ext6 ;
DROP EXTENSION
postgres(1:692)=# \dn
  List of schemas
   Name    |  Owner
-----------+----------
 public    | masahiko
 test_ext6 | masahiko
(2 rows)

Is it a bug? Since the created schema obviously depends on the extension when we created the schema specified in the schema option, I think we might want to create the dependency so that DROP EXTENSION drops the schema as well. I've attached a draft patch so that CREATE EXTENSION creates the dependency if it newly creates the schema.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachment: create_dependency_schema_and_extension.patch
Re: [PATCH] Logical decoding of TRUNCATE
On Sun, Dec 20, 2020 at 9:43 AM Noah Misch wrote:
>
> Since commit 039eb6e added logical replication support for TRUNCATE, logical
> apply of the TRUNCATE fails if it chooses a parallel index build:

I think the TRUNCATE operation should not use parallelism, either via the apply worker or otherwise, because there is nothing to scan in the heap. Additionally, we can have an Assert or elog in InitializeParallelDSM to ensure that it is never invoked by a parallel worker.

--
With Regards,
Amit Kapila.
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy wrote:
> On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier wrote:
> > On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote:
> > > The behavior of the ctas/cmv, in case the relation already exists, is as
> > > shown in [1]. The things that have been changed with the patch are:
> > > 1) In any case, we do not rewrite or plan the select part if the relation
> > > already exists. 2) For explain ctas/cmv (without analyze), the relation
> > > existence is now checked early and the error is thrown as highlighted
> > > in [1].
> > >
> > > With the patch, there is no behavioral change (from that of the master
> > > branch) in explain analyze ctas/cmv with if-not-exists, i.e. an error
> > > is thrown, not a notice.
> > >
> > > Thoughts?
> >
> > HEAD is already a mixed bag of behaviors, and the set of results you
> > are presenting here gives a similar impression. It brings in some
> > sanity by just ignoring the effects of the IF NOT EXISTS clause all
> > the time, but that's still not consistent with the queries not using
> > EXPLAIN.
>
> I tried to make it consistent by issuing a NOTICE (not an error) even
> for the EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If we issue a notice
> and exit from ExplainOneUtility, we could output an empty plan to the
> user because, by then, ExplainResultDesc would have been called at the
> start of the explain via PortalStart(). I didn't find a clean way of
> coding it if we are not okay with showing a notice and an empty plan to
> the user.
>
> Any suggestions on achieving the above?
>
> > Hmm. Looking for similar behaviors, I can see one case in
> > select_into.sql where we just never execute the plan when using WITH
> > NO DATA but still show the plan, meaning that the query gets planned
> > but it just gets marked as "(never executed)" if attempting to use
> > ANALYZE.
>
> Yes, with no data we would see "(never executed)" for explain analyze
> if the relation does not already exist. If the relation does exist,
> then the error/notice.
>
> > There may be use cases for that, as the user asked directly for an
> > EXPLAIN.
>
> IMHO, in any case, checking for the existence of the relations
> specified in a query is a must before we output something to the user.
> For instance, the query "explain select * from non_existent_tbl;",
> where non_existent_tbl doesn't exist, throws an error. Similarly,
> "explain create table already_existing_tbl as select * from
> another_tbl;", where the table CTAS/SELECT INTO is trying to create
> already exists, should also throw an error. But that's not happening
> currently on master, which seems to be a problem to me. So, with the
> patch proposed here, we error out in this case.
>
> If the user really wants to see the explain plan, then he/she should
> use the correct query.
>
> > Note: the patch needs tests for all the patterns you would like to
> > stress. This way it is easier to follow the patterns that are
> > changing with your patch and compare them with the HEAD behavior (like
> > looking at the diffs with the tests of the patch, but without the
> > diffs in src/backend/).
>
> Sure, I will add test cases and post a v3 patch.

Attaching the v3 patch, which also contains test cases. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment: v3-0001-Fail-Fast-In-CTAS-CMV-If-Relation-Already-Exists.patch
Re: Single transaction in the tablesync worker?
On Sat, Dec 19, 2020 at 12:10 PM Amit Kapila wrote:
>
> On Fri, Dec 18, 2020 at 6:41 PM Peter Smith wrote:
> >
>
> I understand why you are trying to create this patch atop the logical
> decoding of 2PC patch, but I think it is better to create this as an
> independent patch and then use it to test the 2PC problem. Also, please
> explain what kind of testing you did to ensure that it works properly
> after the table sync worker restarts after a crash.
>

A few other comments:
====================

1.
+ * FIXME 3 - Crashed tablesync workers may also have remaining slots
+ * because I don't think such workers are even iterated by this loop,
+ * and nobody else is removing them.
+ */
+ if (slotname)
+ {

The above FIXME is not clear to me. Actually, the crashed workers should restart, finish their work, and drop the slots. So I am not sure what exactly this FIXME refers to.

2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}

With the above call, it will form a connection with the publisher and drop the required slots. I think we need to save the connection info so that we don't need to connect/disconnect for each slot to be dropped. Later in this function, we again connect and drop the apply worker slot. I think we should connect just once and drop the apply and table sync slots, if any.

3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();

Why are we suppressing the error instead of handling it the same way as we do while dropping the apply worker slot in DropSubscription?

4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
 get_rel_name(MyLogicalRepWorker->relid;
 CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);

This is not how we export functions at other places.

--
With Regards,
Amit Kapila.
Re: \gsetenv
"David G. Johnston" writes:
> On Sun, Dec 20, 2020 at 11:07 AM Tom Lane wrote:
>> If we could draw a line between "safe" and "unsafe" environment
>> variables, I'd be willing to consider a patch that allows directly
>> setting only the former. But I don't see how to draw that line.

> Because if you are writing
>   SELECT col1, col2, col3  or  SELECT expression AS col1 \gset
> The query author has explicitly stated which variable names they exactly
> want to change/create and nothing the server can do will be able to alter
> those names.

> Or *is* that the problem - the server might decide to send back a column
> named "breakme1" in the first column position even though the user aliased
> the column name as "col1"?

Yeah, exactly. Just because the SQL output *should* have column names x, y, z doesn't mean it *will*, if the server is malicious. psql isn't bright enough to understand what column names the query ought to produce, so it just believes the column names that come back in the query result.

> Would a "\gsetenv (col1, col2, col3, skip, col4)" be acceptable that leaves
> the name locally defined while relying on column position to match?

Hmm, maybe. The key point here is local vs. remote control of which variables get assigned to, and offhand that seems like it'd fix the problem.

> How much do we want to handicap a useful feature because someone can use it
> alongside "SELECT *"?

Whether it's "SELECT *" or "SELECT 1 AS X" doesn't really matter here. The concern is that somebody has hacked the server to send back something that is *not* what you asked for. For that matter, since the actual update isn't visible to the user, the attacker could easily make the server send back all the columns the user expected ... plus some he didn't. The attackee might not even realize till later that something fishy happened.

regards, tom lane
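The "local vs. remote control of names" point can be illustrated with a small sketch (plain Python, not psql source; the "skip" placeholder and the function name are hypothetical): with a syntax like "\gsetenv (col1, col2, col3, skip, col4)", variable names are fixed on the client and matched to result columns strictly by position, so server-supplied column names are never used as variable names.

```python
# Sketch of positional \gsetenv-style assignment: the client decides the
# variable names; server-sent column names are ignored entirely, so a
# hacked server cannot choose which variables get set.

def gset_by_position(local_names, server_columns, row):
    """Assign row values to locally chosen names, matched by position.

    'skip' is a hypothetical placeholder meaning "ignore this column".
    """
    if len(local_names) != len(server_columns):
        raise ValueError("result has unexpected number of columns")
    env = {}
    for name, value in zip(local_names, row):
        if name != "skip":
            env[name] = value
    return env

# Even if a malicious server renames result columns to dangerous targets,
# only the locally specified names can be set:
env = gset_by_position(
    ["col1", "skip", "col2"],
    ["PSQL_HISTORY", "breakme1", "LD_PRELOAD"],  # server-controlled names
    ["a", "b", "c"],
)
```

Note this only closes the "unwanted variable name" hole; as the thread observes, a compromised server can still send bogus *data* for the wanted variables.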
RE: Parallel INSERT (INTO ... SELECT ...)
Hi,

> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in the parallel-safety checking code
>   (Tsunakawa-san feedback). It turns out that for the problem test case, no
>   parallel-safety checking should be occurring that locks relations, because
>   those inserts are specifying VALUES, not an underlying SELECT, so the
>   parallel-safety checking code was updated to bail out early if no
>   underlying SELECT is specified for the INSERT. No change to the test code
>   was required.
> - Added a comment to better explain the reason for treating "INSERT ...
>   ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added an assertion to heap_prepare_insert() (Amit)
> - Updated error handling for the NULL index_expr_item case (Vignesh)

+
+ index_oid_list = RelationGetIndexList(rel);
...

As mentioned in the comments of RelationGetIndexList:

 * we return a copy of the list palloc'd in the caller's context. The caller
 * may list_free() the returned list after scanning it.

Shall we list_free(index_oid_list) at the end of the function? Just to avoid a potential memory leak.

Best regards,
houzj
RE: Parallel Inserts in CREATE TABLE AS
Hi,

The cfbot seems to complain about the test case:

Command exited with code 1
perl dumpregr.pl
=== $path ===
diff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out C:/projects/postgresql/src/test/regress/results/write_parallel.out
--- C:/projects/postgresql/src/test/regress/expected/write_parallel.out	2020-12-21 01:41:17.745091500 +
+++ C:/projects/postgresql/src/test/regress/results/write_parallel.out	2020-12-21 01:47:20.375514800 +
@@ -1204,7 +1204,7 @@
    ->  Gather (actual rows=2 loops=1)
          Workers Planned: 3
          Workers Launched: 3
-         ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+         ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
               Filter: (col2 < 3)
               Rows Removed by Filter: 1
 (14 rows)
@@ -1233,7 +1233,7 @@
    ->  Gather (actual rows=2 loops=1)
          Workers Planned: 3
          Workers Launched: 3
-         ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+         ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
               Filter: (col2 < 3)
               Rows Removed by Filter: 1
 (14 rows)

Best regards,
houzj
Re: Proposed patch for key managment
Greetings,

* Alastair Turner (min...@decodable.me) wrote:
> On Mon, 21 Dec 2020 at 00:33, Stephen Frost wrote:
> > * Alastair Turner (min...@decodable.me) wrote:
> > > What I'd like specifically is to have the option of an external
> > > keyring as a first class key store, where the keys stored in the
> > > external keyring are used directly to encrypt the database contents.
> >
> > Right- understood, but that's not what the patch does today and there
> > isn't anyone who has proposed such a patch, nor explained why we
> > couldn't add that capability later.
>
> I'm keen to propose a patch along those lines, even if just as a
> sample. Minimising the amount of code which would have to be unpicked
> in that effort would be great. Particularly avoiding any changes to
> data structures, since that may effectively block adding new
> capabilities.

Ok, sure, but are there such changes that would need to be made for this case...? Seems to be only speculation at this point.

> > > Regarding the on-disk format, separate storage of the key HMACs and
> > > the wrapped keys sounds like a requirement for implementing a fully
> > > external keyring or doing key wrapping externally via an OpenSSL or
> > > nss tls pluggable engine. I'm still looking through the code.
> >
> > This seems a bit confusing, as the question at hand is the on-disk
> > format of the internal keyring, not anything to do with an external
> > keyring (which we wouldn't have any storage of, and so I don't
> > understand how it makes sense to even discuss the idea of storage for
> > the external keyring..?).
>
> A requirement for validating the keys, no matter which keyring they
> came from, was mentioned along the way. Since there would be no point
> in validating keys from the internal ring twice, storing the
> validation data (HMACs in the current design) separately from the
> internal keyring's keys seems like it would avoid changing the data
> structures for the internal keyring when adding an external option.

This doesn't strike me as very clear- specifically, which HMACs are you referring to that would need to be stored separately from the internal keyring to make it possible for an external keyring to be used?

> > Again, we will need the ability to perform the encryption using a simple
> > passphrase provided by the user, while using multiple randomly generated
> > data encryption keys, and that's what the latest set of patches are
> > geared towards. I'm generally in support of adding additional
> > capabilities in this area in the future, but I don't think we need to
> > bog down the current effort by demanding that be implemented today.
>
> I'm really not looking to bog down the current effort.

Great, glad to hear that.

> I'll have a go at adding another keyring and/or abstracting the key
> wrap and see how well a true peer to the passphrase approach fits in.

Having patches to review and consider definitely helps, imv.

Thanks,

Stephen
Re: Proposed patch for key managment
Thanks Stephen,

On Mon, 21 Dec 2020 at 00:33, Stephen Frost wrote:
>
> Greetings,
>
> * Alastair Turner (min...@decodable.me) wrote:
...
> >
> > What I'd like specifically is to have the option of an external
> > keyring as a first class key store, where the keys stored in the
> > external keyring are used directly to encrypt the database contents.
>
> Right- understood, but that's not what the patch does today and there
> isn't anyone who has proposed such a patch, nor explained why we
> couldn't add that capability later.
>

I'm keen to propose a patch along those lines, even if just as a sample. Minimising the amount of code which would have to be unpicked in that effort would be great. Particularly avoiding any changes to data structures, since that may effectively block adding new capabilities.

> > ...Your description above of changes to pass keys out
> > of the command sound like they may have addressed this.
>
> The updates are intended to make it so that the KEK which wraps the
> internal keyring will be obtained directly from the cluster key command,
> pushing the transformation of a passphrase (should one be provided) into
> a proper key to the script, but otherwise allowing the result of things
> like 'openssl rand -hex 32' to be used directly as the KEK.
>

Sounds good.

> > Regarding the on-disk format, separate storage of the key HMACs and
> > the wrapped keys sounds like a requirement for implementing a fully
> > external keyring or doing key wrapping externally via an OpenSSL or
> > nss tls pluggable engine. I'm still looking through the code.
>
> This seems a bit confusing as the question at hand is the on-disk format
> of the internal keyring, not anything to do with an external keyring
> (which we wouldn't have any storage of and so I don't understand how it
> makes sense to even discuss the idea of storage for the external
> keyring..?).
>

A requirement for validating the keys, no matter which keyring they came from, was mentioned along the way. Since there would be no point in validating keys from the internal ring twice, storing the validation data (HMACs in the current design) separately from the internal keyring's keys seems like it would avoid changing the data structures for the internal keyring when adding an external option.

> Again, we will need the ability to perform the encryption using a simple
> passphrase provided by the user, while using multiple randomly generated
> data encryption keys, and that's what the latest set of patches are
> geared towards. I'm generally in support of adding additional
> capabilities in this area in the future, but I don't think we need to
> bog down the current effort by demanding that be implemented today.
>

I'm really not looking to bog down the current effort. I'll have a go at adding another keyring and/or abstracting the key wrap and see how well a true peer to the passphrase approach fits in.

Regards
Alastair
Re: Refactor routine to check for ASCII-only case
On Fri, Dec 18, 2020 at 11:30:16AM -0500, Stephen Frost wrote:
> * Heikki Linnakangas (hlinn...@iki.fi) wrote:
>> +1
>
> Yeah, in a quick look, this looks like a good improvement.

Thanks. This has been applied as of 93e8ff8.

--
Michael
Re: Proposed patch for key managment
Greetings,

* Alastair Turner (min...@decodable.me) wrote:
> On Sun, 20 Dec 2020 at 22:59, Stephen Frost wrote:
> > Yes, it's true that after things are implemented it can be more
> > difficult to change them- but if you're concerned about the specific
> > on-disk format of the keyring then please help us understand what your
> > concern is there. The concern you've raised so far is just that you'd
> > like an option to have an external keyring, and that's fine, but we are
> > also going to need to have an internal one, and which comes first doesn't
> > seem particularly material to me. I don't think having one or the other
> > in place first will really detract or make more difficult the adding of
> > the other later.
>
> What I'd like specifically is to have the option of an external
> keyring as a first class key store, where the keys stored in the
> external keyring are used directly to encrypt the database contents.

Right- understood, but that's not what the patch does today and there isn't anyone who has proposed such a patch, nor explained why we couldn't add that capability later.

> The examples in this discussion so far have all put the internal
> keyring between any other crypto infrastructure and the file
> encryption process. Your description above of changes to pass keys out
> of the command sound like they may have addressed this.

The updates are intended to make it so that the KEK which wraps the internal keyring will be obtained directly from the cluster key command, pushing the transformation of a passphrase (should one be provided) into a proper key to the script, but otherwise allowing the result of things like 'openssl rand -hex 32' to be used directly as the KEK.

> Regarding the on-disk format, separate storage of the key HMACs and
> the wrapped keys sounds like a requirement for implementing a fully
> external keyring or doing key wrapping externally via an OpenSSL or
> nss tls pluggable engine. I'm still looking through the code.

This seems a bit confusing, as the question at hand is the on-disk format of the internal keyring, not anything to do with an external keyring (which we wouldn't have any storage of, and so I don't understand how it makes sense to even discuss the idea of storage for the external keyring..?).

Again, we will need the ability to perform the encryption using a simple passphrase provided by the user, while using multiple randomly generated data encryption keys, and that's what the latest set of patches are geared towards. I'm generally in support of adding additional capabilities in this area in the future, but I don't think we need to bog down the current effort by demanding that be implemented today.

Thanks,

Stephen
Re: \gsetenv
On Sun, Dec 20, 2020 at 11:07 AM Tom Lane wrote:
> If we could draw a line between "safe" and "unsafe" environment
> variables, I'd be willing to consider a patch that allows directly
> setting only the former. But I don't see how to draw that line.

IIUC, the threat here is for users that write:

    SELECT * FROM view \gset

Because if you are writing

    SELECT col1, col2, col3  or  SELECT expression AS col1 \gset

the query author has explicitly stated exactly which variable names they want to change/create, and nothing the server can do will be able to alter those names.

Or *is* that the problem - the server might decide to send back a column named "breakme1" in the first column position even though the user aliased the column name as "col1"?

Would a "\gsetenv (col1, col2, col3, skip, col4)" be acceptable that leaves the names locally defined while relying on column position to match?

How much do we want to handicap a useful feature because someone can use it alongside "SELECT *"? Can we prevent it from working in such a case outright - force an explicit column name list at minimum, and ideally aliases for expressions? I suspect not; too much of that has to happen on the server. That makes doing this by column position and defining the names strictly locally a compromise worth considering. At worst, there is no way to get an unwanted variable to appear on the client, even if the data for wanted variables is made bogus by the compromised server. I don't see how avoiding the bogus-data problem is even possible.

David J.
Re: [PATCH] Logical decoding of TRUNCATE
On Sun, Dec 20, 2020 at 3:13 PM Andres Freund wrote:
> Hm. Do I understand correctly that this problem is hit solely because
> the parallel mode code relies on a transaction snapshot already having
> been set, thus avoiding the error? And that the code normally only
> works because GetTransactionSnapshot() will already have been called
> somewhere, before EnterParallelMode()?

It seems unlikely that InitializeParallelDSM() was ever intended to be run in a background worker.

--
Peter Geoghegan
Re: \gsetenv
On Sun, Dec 20, 2020 at 10:42:40PM +0200, Heikki Linnakangas wrote:
> On 20/12/2020 21:05, David Fetter wrote:
> > We have plenty of ways to spawn shells and cause havoc, and we
> > wouldn't be able to block them all even if we decided to put a bunch
> > of pretty onerous restrictions on psql at this very late date. We have
> > \set, backticks, \!, and bunches of things less obvious that could,
> > even without a compromised server, cause real mischief.
>
> There is a big difference between having to trust the server or not. Yeah,
> you could cause a lot of mischief if you let a user run arbitrary psql
> scripts on your behalf. But that's no excuse for opening up a whole other
> class of problems.

I'm skittish about putting exploits out in public in advance of discussions about how to mitigate them, but I have constructed several that do pretty bad things using only hostile content on a server and the facilities `psql` already provides.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [PATCH] Logical decoding of TRUNCATE
Hi, On 2020-12-20 04:13:19 +, Noah Misch wrote: > postgres: subscriber: logical replication worker for subscription 16411 > (GetTransactionSnapshot+0x168) [0x951ce8] > postgres: subscriber: logical replication worker for subscription 16411 > (InitializeParallelDSM+0x16) [0x52cf86] > postgres: subscriber: logical replication worker for subscription 16411 > (btbuild+0x26a) [0x50905a] > postgres: subscriber: logical replication worker for subscription 16411 > (index_build+0x14b) [0x569c1b] > postgres: subscriber: logical replication worker for subscription 16411 > (reindex_index+0x19a) [0x56caea] > postgres: subscriber: logical replication worker for subscription 16411 > (reindex_relation+0xc0) [0x56d090] > postgres: subscriber: logical replication worker for subscription 16411 > (ExecuteTruncateGuts+0x376) [0x62f0d6] > postgres: subscriber: logical replication worker for subscription 16411 > () [0x78d592] > postgres: subscriber: logical replication worker for subscription 16411 > (ApplyWorkerMain+0x5ab) [0x78e4eb] > postgres: subscriber: logical replication worker for subscription 16411 > (StartBackgroundWorker+0x23f) [0x75522f] > postgres: subscriber: logical replication worker for subscription 16411 > () [0x762a6d] > postgres: subscriber: logical replication worker for subscription 16411 > () [0x7635ee] > /lib64/libpthread.so.0(+0xf630) [0x7fe081e97630] > /lib64/libc.so.6(__select+0x13) [0x7fe0805c0983] > postgres: subscriber: logical replication worker for subscription 16411 > () [0x4887ac] > postgres: subscriber: logical replication worker for subscription 16411 > (PostmasterMain+0x1118) [0x764c88] > postgres: subscriber: logical replication worker for subscription 16411 > (main+0x6f2) [0x48aae2] > /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fe0804ed555] > postgres: subscriber: logical replication worker for subscription 16411 > () [0x48ab49] > 2020-12-19 17:54:04.683 PST [3629353:5] LOG: background worker "logical > replication worker" (PID 3629509) exited with 
exit code 1

Hm. Do I understand correctly that this problem is hit solely because the parallel mode code relies on there already having been a transaction snapshot set, thus avoiding the error? And that the code normally only works because GetTransactionSnapshot() will already have been called somewhere, before EnterParallelMode()?

Greetings,

Andres Freund
Re: Proposed patch for key managment
Greetings Alastair, * Alastair Turner (min...@decodable.me) wrote: > On Sat, 19 Dec 2020 at 16:58, Bruce Momjian wrote: > > To enable the direct injection of keys into the server, we would need a > > new command for this, since trying to make the passphrase command do > > this will lead to unnecessary complexity. The passphrase command should > > do one thing, and you can't have it changing its behavior based on > > parameters, since the parameters are for the script to process, not to > > change behavior of the server. > > > > If we wanted to do this, we would need a new command, and one of the > > parameters would be %k or something that would identify the key number > > we want to retrieve from the KMS. Stephen pointed out that we could > > still validate that key; the key would not be stored wrapped in the > > file system, but we could store an HMAC in the file system, and use that > > for validation. > > I think that this is where we have ended up talking at cross-purposes. > My idea (after some refining based on Stephen's feedback) is that > there should be only this new command (the cluster key command) and > that the program for unwrapping stored keys should be called this way. > As could a program to contact an HSM, etc. This moves the generating > and wrapping functionality out of the core Postgres processes, making > it far easier to add alternatives. I see this approach was discussed > on the email thread you linked to, but I can't see where or how a > conclusion was reached about it... I do generally like the idea of having the single command able to be used to either provide the KEK (where PG manages the keyring internally), or called multiple times to retrieve the DEKs (in which case PG wouldn't be managing the keyring). However, after chatting with Bruce about it for a bit this weekend, I'm not sure that we need to tackle the second case today. 
I don't think there's any doubt that there will be users who will want PG to manage the keyring and to be able to work with just a passphrase, as Bruce has worked to make happen here and which we have a patch for. I'm hoping Bruce will post a new patch soon based on the work that he and I discussed today (mostly just clarifications around keys vs. passphrases and having the PG side accept a key which the command that's run will produce). With that, I'm feeling pretty comfortable that we can move forward and at least get this initial work committed. > The pg_strong_random() function uses OpenSSL's RAND_bytes() where > available. With appropriate configuration of an OpenSSL engine > supplying an alternative RAND, this could be handed off to a TRNG. Sure. > This is an example of the other option for providing flexibility to > support specific key generation, wrapping, ... requirements - handing > the operations off to a library like OpenSSL or nss tls which, in > turn, can use pluggable providers for these functions. FWIW, the > proprietary DBMSs use a pluggable engine approach to meet requirements > in this category. OpenSSL provides for this configuration and gives us a pluggable architecture to make this happen, so I'm not sure what the concern here is..? I agree that eventually it'd be nice to allow the administrator to, more directly, control the DEKs but we're still going to need to have a solution for the user who wishes to just provide a passphrase or a KEK and I don't see any particular reason to hold off on the implementation of that. > > My final point is that we can find ways to do what you are suggesting as > > an addition to what we are adding now. What we need is clear > > justification of why these additional features are needed. Requiring > > the use of a true random number generator is a valid argument, but we > > need to figure out, once this is implemented, who really wants that, and > > how to implement it cleanly. 
> > Clean interfaces would be either "above" the database calling out to > commands in user-land or "below" the database in the abstractions > which OpenSSL, nss tls and friends provide. Since the conclusion seems > already to have been reached that the keyring should be inside the > core process and only the passphrase should pop out above, I'll review > the patch in this vein and see if I can suggest any routes to carving > out more manageable subsets of the patch. There's no doubt that there needs to be a solution where the keyring is managed by PG. Perhaps there are options to allow an external keyring as well, but we can add that in the future. > "...once this is implemented..." changes become a lot more difficult. > Particularly when the changes would affect what are regarded as the > database's on-disk files. Which I suspect is a contributing factor to > the level of engagement surrounding this feature. Yes, it's true that after things are implemented it can be more difficult to change them- but if you're concerned about the specific on-disk format of the keyring then please help us
Re: Proposed patch for key managment
Thanks, Bruce On Sat, 19 Dec 2020 at 16:58, Bruce Momjian wrote: > ... > > To enable the direct injection of keys into the server, we would need a > new command for this, since trying to make the passphrase command do > this will lead to unnecessary complexity. The passphrase command should > do one thing, and you can't have it changing its behavior based on > parameters, since the parameters are for the script to process, not to > change behavior of the server. > > If we wanted to do this, we would need a new command, and one of the > parameters would be %k or something that would identify the key number > we want to retrieve from the KMS. Stephen pointed out that we could > still validate that key; the key would not be stored wrapped in the > file system, but we could store an HMAC in the file system, and use that > for validation. > I think that this is where we have ended up talking at cross-purposes. My idea (after some refining based on Stephen's feedback) is that there should be only this new command (the cluster key command) and that the program for unwrapping stored keys should be called this way. As could a program to contact an HSM, etc. This moves the generating and wrapping functionality out of the core Postgres processes, making it far easier to add alternatives. I see this approach was discussed on the email thread you linked to, but I can't see where or how a conclusion was reached about it... > > On other interesting approach would be to allow the server to call out > for a KMS when it needs to generate the initial data keys that are > wrapped by the passphrase; this is in contrast to calling the KMS > everytime it needs the data keys. > ... > ...The data keys are generated using this random value > code: > > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/port/pg_strong_random.c;hb=HEAD > > so someone would have to explain why generating this remotely in a KMS > is superior, not just based on company policy. 
> The pg_strong_random() function uses OpenSSL's RAND_bytes() where available. With appropriate configuration of an OpenSSL engine supplying an alternative RAND, this could be handed off to a TRNG. This is an example of the other option for providing flexibility to support specific key generation, wrapping, ... requirements - handing the operations off to a library like OpenSSL or nss tls which, in turn, can use pluggable providers for these functions. FWIW, the proprietary DBMSs use a pluggable engine approach to meet requirements in this category. > > We should create code for the general use-case, which is currently > passphrase-wrapped system-generated data keys, and then get feedback on > what else we need. However, I should point out that the community is > somewhat immune to the "my company needs this" kind of argument without > logic to back it up, though sometimes I think this entire feature is in > that category. ... Yeah. Security in general, and crypto in particular, are often "because someone told me to" requirements. > > My final point is that we can find ways to do what you are suggesting as > an addition to what we are adding now. What we need is clear > justification of why these additional features are needed. Requiring > the use of a true random number generator is a valid argument, but we > need to figure out, once this is implemented, who really wants that, and > how to implement it cleanly. > Clean interfaces would be either "above" the database calling out to commands in user-land or "below" the database in the abstractions which OpenSSL, nss tls and friends provide. Since the conclusion seems already to have been reached that the keyring should be inside the core process and only the passphrase should pop out above, I'll review the patch in this vein and see if I can suggest any routes to carving out more manageable subsets of the patch. "...once this is implemented..." changes become a lot more difficult. 
Particularly when the changes would affect what are regarded as the database's on-disk files. Which I suspect is a contributing factor to the level of engagement surrounding this feature. Regards Alastair
Re: pg_preadv() and pg_pwritev()
Hi,

On 2020-12-20 16:26:42 +1300, Thomas Munro wrote:
> > 1. port.h cannot assume that has already been included;
> > nor do I want to fix that by including there. Do we
> > really need to define a fallback value of IOV_MAX? If so,
> > maybe the answer is to put the replacement struct iovec and
> > IOV_MAX in some new header.
>
> Ok, I moved all this stuff into port/pg_uio.h.

Can we come up with a better name than 'uio'? I find that a not exactly meaningful name. Or perhaps we could just leave the functions in port/port.h, but extract the value of the define at configure time? That way only pread.c / pwrite.c would need it.

> > 3. The patch as given won't prove anything except that the code
> > compiles. Is it worth fixing at least one code path to make
> > use of pg_preadv and pg_pwritev, so we can make sure this code
> > is tested before there's layers of other new code on top?
>
> OK, here's a patch to zero-fill fresh WAL segments with pwritev().

I think that's a good idea. However, I see two small issues: 1) If we write larger amounts at once, we need to handle partial writes. That's a large enough amount of IO that it's much more likely to hit a memory shortage or such in the kernel - we had to do that a while ago for WAL writes (which can also be larger), if memory serves. Perhaps we should have pg_pwritev/readv unconditionally go through pwrite.c/pread.c and add support for "continuing" partial writes/reads in one central place?

> I'm drawing a blank on trivial candidate uses for preadv(), without
> infrastructure from later patches.

Can't immediately think of something either.

Greetings,

Andres Freund
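The idea of handling "continuing" partial writes in one central place could look roughly like the sketch below. This is an illustration only, not code from the patch: the helper name pg_pwritev_retry and its exact shape are invented, and a real implementation would probably copy the iovec array (and cap iovcnt at IOV_MAX) rather than modify the caller's.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: keep calling pwritev() until every iovec has been
 * written, skipping past elements completed by a partial write and trimming
 * the element that was only partially written.  Note that this mutates the
 * caller's iovec array, which a production version should avoid.
 */
static ssize_t
pg_pwritev_retry(int fd, struct iovec *iov, int iovcnt, off_t offset)
{
	ssize_t		total = 0;

	while (iovcnt > 0)
	{
		ssize_t		n = pwritev(fd, iov, iovcnt, offset);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* retry if interrupted by a signal */
			return -1;
		}

		total += n;
		offset += n;

		/* Skip iovec entries that were written completely ... */
		while (iovcnt > 0 && (size_t) n >= iov->iov_len)
		{
			n -= iov->iov_len;
			iov++;
			iovcnt--;
		}

		/* ... and trim the entry that was only partially written. */
		if (iovcnt > 0)
		{
			iov->iov_base = (char *) iov->iov_base + n;
			iov->iov_len -= n;
		}
	}

	return total;
}
```

The same advance-and-trim loop works for preadv() on the read side, which is why centralizing it in pread.c/pwrite.c is attractive.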
Re: Commit fest manager for 2021-01
On Sun, Dec 20, 2020 at 10:27 PM Magnus Hagander wrote: > > On Sat, Dec 19, 2020 at 6:00 AM Michael Paquier wrote: > > > > On Sat, Dec 19, 2020 at 10:03:47AM +0530, Amit Kapila wrote: > > > Glad to hear. I am confident that you can do justice to this role. > > > > I also think you will do just fine. Thanks for taking care of this. > > +1 on both accounts. > > If you haven't been one before (which I think?), please let me know > what username your account in the system has, and I will make sure you > get the required permissions- Thanks! My username is masahikosawada. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Weird special case in jsonb_concat()
Hi,

w.r.t. the patch,

+select '[3]'::jsonb || '{}'::jsonb;
+ ?column?
+----------
+ [3, {}]
+(1 row)
+
+select '3'::jsonb || '[]'::jsonb;

Should cases where the empty array precedes non-empty jsonb be added ?

select '[]'::jsonb || '3'::jsonb;
select '{}'::jsonb || '[3]'::jsonb;

Cheers
Re: optimizer/clauses.h needn't include access/htup.h
Hi, On 2020-11-23 19:44:37 -0300, Alvaro Herrera wrote: > I was contemplating commands/trigger.c this morning (after Heikki split > copy.c) thinking about the three pieces embedded in there -- one > catalog/pg_trigger.c, one in executor (execTrigger.c?) and what seems a > very small piece to remain where it is. One thing that's not clear enough in general - at least in my view - is what belongs into executor/ and what should be somewhere more general. E.g. a lot of what I assume you would move to execTrigger.c is not just used within the real executor, but also from e.g. copy. There's plenty of pre-existing examples for that (e.g. tuple slots), but I wonder if we should try to come up with a better split at some point. Oh, and definitely +1 on splitting trigger.c. Wonder if the trigger queue stuff, and the functions directly interfacing with the executor, should be split again? It seems to me the trigger queue details are isolated enough that that could come out clean enough. - Andres
Improving LWLock wait events
Hi,

The current wait events are already pretty useful. But I think we could make them more informative without adding real runtime overhead.

1) For lwlocks I think it'd be quite useful to show the mode of acquisition in pg_stat_activity.wait_event_type, instead of just saying 'LWLock'. I think we should split PG_WAIT_LWLOCK into PG_WAIT_LWLOCK_{EXCLUSIVE,SHARED,WAIT_UNTIL_FREE}, and report a different wait_event_type based on the different class. The fact that it'd break people explicitly looking for LWLock in pg_stat_activity doesn't seem to outweigh the benefits to me.

2) I think it's unhelpful that waits for WAL insertion locks to progress show up as LWLock acquisitions. LWLockWaitForVar() feels like a distinct enough operation that passing in a user-specified wait event is worth the minuscule incremental overhead that'd add. I'd probably just make it a different wait class, and have xlog.c compute that based on the number of the slot being waited for.

3) I have observed waking up other processes as part of a lock release to be a significant performance factor. I would like to add a separate wait event type for that. That'd be a near trivial extension to 1)

I also think there's a 4, but I think the tradeoffs are a bit more complicated:

4) For a few types of lwlock just knowing the tranche isn't sufficient. E.g. knowing whether it's one buffer mapping lock or several different ones being waited on is important to judge contention. For wait events right now we use 1 byte for the class, 1 byte is unused and 2 bytes are used for event specific information (the tranche in case of lwlocks). Seems like we could change the split to be a 4bit class and leave 28bit to the specific wait event type? And in lwlocks case we could make something like 4 bit class, 10 bit tranche, 20 bit sub-tranche? 20 bit aren't enough to uniquely identify a lock for the larger tranches (mostly buffer locks, I think), but I think it'd still be enough to disambiguate. 
The hardest part would be to know how to identify individual locks. The easiest would probably be to just mask in a part of the lwlock address (e.g. shift it right by INTALIGN, and then mask the result into the eventId). That seems a bit unsatisfying. We could probably do a bit better: We could just store the information about tranche / offset within tranche at LWLockInitialize() time, instead of computing something just before waiting. While LWLock.tranche is only 16bits right now, the following two bytes are currently padding... That'd allow us to have proper numerical identification for nearly all tranches, without needing to go back to the complexity of having tranches specify base & stride. Even more API churn around lwlock initialization isn't desirable :(, but we could just add a LWLockInitializeIdentified() or such.

Greetings,

Andres Freund
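The bit-packing being proposed can be sketched as below. This is a toy illustration, not PostgreSQL code: the WE_* macros and function names are invented, and the field widths here (4-bit class, 10-bit tranche, 18-bit per-lock sub-id) are merely one assumed split that fits in 32 bits; the address-derived sub-id is deliberately non-unique, just enough to disambiguate locks within a tranche.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative, invented field widths; they must sum to 32 bits. */
#define WE_CLASS_BITS	4
#define WE_TRANCHE_BITS	10
#define WE_SUBID_BITS	18

#define WE_TRANCHE_MASK	((1U << WE_TRANCHE_BITS) - 1)
#define WE_SUBID_MASK	((1U << WE_SUBID_BITS) - 1)

static uint32_t
make_lwlock_wait_event(uint32_t class_id, uint32_t tranche, const void *lock)
{
	/*
	 * Derive a (non-unique) per-lock sub-id from the lock's address, as
	 * suggested above: shift away the alignment bits, then mask to fit.
	 */
	uint32_t	sub_id = ((uint32_t) ((uintptr_t) lock >> 3)) & WE_SUBID_MASK;

	return (class_id << (WE_TRANCHE_BITS + WE_SUBID_BITS)) |
		((tranche & WE_TRANCHE_MASK) << WE_SUBID_BITS) |
		sub_id;
}

static uint32_t
wait_event_class(uint32_t we)
{
	return we >> (WE_TRANCHE_BITS + WE_SUBID_BITS);
}

static uint32_t
wait_event_tranche(uint32_t we)
{
	return (we >> WE_SUBID_BITS) & WE_TRANCHE_MASK;
}
```

As noted above, computing the sub-id from the address at wait time is a bit unsatisfying; storing a stable identifier at LWLockInitialize() time would make the reported value reproducible across runs.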
Re: proposal: schema variables
Hi,

This is continuation of the previous review.

+* We should to use schema variable buffer, when
+* it is available.

'should use' (no to)

+ /* When buffer of used schema variables loaded from shared memory */

A verb seems missing in the above comment.

+ elog(ERROR, "unexpected non-SELECT command in LET ... SELECT");

Since non-SELECT is mentioned, I wonder if the trailing SELECT can be omitted.

+* some collision can be solved simply here to reduce errors based
+* on simply existence of some variables. Often error can be using

simply occurred twice above - I think one should be enough. If you want to keep the second, it should be 'simple'.

Cheers

On Sun, Dec 20, 2020 at 11:25 AM Zhihong Yu wrote: > Hi, > I took a look at the rebased patch. > > + varisnotnull > + boolean > + > + > + True if the schema variable doesn't allow null value. The default > value is false. > > I wonder whether the field can be named in positive tense: e.g. > varallowsnull with default of true. > > + vareoxaction > + n = no action, d = drop the > variable, > + r = reset the variable to its default value. > > Looks like there is only one action allowed. I wonder if there is a > possibility of having two actions at the same time in the future. > > + The PL/pgSQL language has not packages > + and then it has not package variables and package constants. The > > 'has not' -> 'has no' > > + a null value. A variable created as NOT NULL and without an > explicitely > > explicitely -> explicitly > > + int nnewmembers; > + Oid*oldmembers; > + Oid*newmembers; > > I wonder if naming nnewmembers newmembercount would be more readable. > > For pg_variable_aclcheck: > > + return ACLCHECK_OK; > + else > + return ACLCHECK_NO_PRIV; > > The 'else' can be omitted. > > + * Ownership check for a schema variables (specified by OID). 
> > 'a schema variable' (no s) > > For NamesFromList(): > > + if (IsA(n, String)) > + { > + result = lappend(result, n); > + } > + else > + break; > > There would be no more name if current n is not a String ? > > +* both variants, and returns InvalidOid with not_uniq flag, > when > > 'and return' (no s) > > + return InvalidOid; > + } > + else if (OidIsValid(varoid_without_attr)) > > 'else' is not needed (since the if block ends with return). > > For clean_cache_callback(), > > + if (hash_search(schemavarhashtab, > + (void *) >varid, > + HASH_REMOVE, > + NULL) == NULL) > + elog(DEBUG1, "hash table corrupted"); > > Maybe add more information to the debug, such as var name. > Should we come out of the while loop in this scenario ? > > Cheers >
Re: \gsetenv
On 20/12/2020 21:05, David Fetter wrote: We have plenty of ways to spawn shells and cause havoc, and we wouldn't be able to block them all even if we decided to put a bunch of pretty onerous restrictions on psql at this very late date. We have \set, backticks, \!, and bunches of things less obvious that could, even without a compromised server, cause real mischief. There is a big difference between having to trust the server or not. Yeah, you could cause a lot of mischief if you let a user run arbitrary psql scripts on your behalf. But that's no excuse for opening up a whole another class of problems. - Heikki
Re: proposal: schema variables
Hi,

I took a look at the rebased patch.

+ varisnotnull
+ boolean
+
+
+ True if the schema variable doesn't allow null value. The default value is false.

I wonder whether the field can be named in positive tense: e.g. varallowsnull with default of true.

+ vareoxaction
+ n = no action, d = drop the variable,
+ r = reset the variable to its default value.

Looks like there is only one action allowed. I wonder if there is a possibility of having two actions at the same time in the future.

+ The PL/pgSQL language has not packages
+ and then it has not package variables and package constants. The

'has not' -> 'has no'

+ a null value. A variable created as NOT NULL and without an explicitely

explicitely -> explicitly

+ int nnewmembers;
+ Oid*oldmembers;
+ Oid*newmembers;

I wonder if naming nnewmembers newmembercount would be more readable.

For pg_variable_aclcheck:

+ return ACLCHECK_OK;
+ else
+ return ACLCHECK_NO_PRIV;

The 'else' can be omitted.

+ * Ownership check for a schema variables (specified by OID).

'a schema variable' (no s)

For NamesFromList():

+ if (IsA(n, String))
+ {
+ result = lappend(result, n);
+ }
+ else
+ break;

There would be no more name if current n is not a String ?

+* both variants, and returns InvalidOid with not_uniq flag, when

'and return' (no s)

+ return InvalidOid;
+ }
+ else if (OidIsValid(varoid_without_attr))

'else' is not needed (since the if block ends with return).

For clean_cache_callback(),

+ if (hash_search(schemavarhashtab,
+ (void *) >varid,
+ HASH_REMOVE,
+ NULL) == NULL)
+ elog(DEBUG1, "hash table corrupted");

Maybe add more information to the debug, such as var name. Should we come out of the while loop in this scenario ?

Cheers
Re: \gsetenv
On Sun, Dec 20, 2020 at 01:07:12PM -0500, Tom Lane wrote: > David Fetter writes: > > On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote: > >> SELECT 'Calvin' AS foo \gset > >> \setenv FOO :foo > >> \! echo $FOO > >> Calvin > > > You're the second person who's mentioned this workaround, which goes > > to a couple of points I tried to make earlier: > > > - This is not by any means a new capability, just a convenience, and > > - In view of the fact that it's a very old capability, the idea that > > it has implications for controlling access or other parts of the > > space of threat models is pretty silly. > > This is essentially the same workaround as what we recommend for anyone > who's unhappy with the fix for CVE-2020-25696: do \gset into a non-special > variable and then copy to the special variable. The reason it seems okay > is that now it is clear that client-side logic intends the special > variable change to happen. Thus a compromised server cannot hijack your > client-side session all by itself. There's nonzero risk in letting the > server modify your PROMPT1, PATH, or whatever, but you took the risk > intentionally (and, presumably, it's necessary to your purposes). > > I tend to agree with you that the compromised-server argument is a little > bit of a stretch. Still, we did have an actual user complaining about > the case for \gset, and it's clear that in at least some scenarios this > sort of attack could be used to parlay a server compromise into additional > access. So we're not likely to undo the CVE-2020-25696 fix, and we're > equally unlikely to provide an unrestricted way to set environment > variables directly from the server. > > If we could draw a line between "safe" and "unsafe" environment > variables, I'd be willing to consider a patch that allows directly > setting only the former. But I don't see how to draw that line. 
> Most of the point of any such feature would have to be to affect > the behavior of shell commands subsequently invoked with \! ... > and we can't know what a given variable would do to those. So on > the whole I'm inclined to leave things as-is, where people have to > do the assignment manually. I suppose now's not a great time for this from an optics point of view. Taking on the entire security theater industry is way out of scope for the PostgreSQL project. We have plenty of ways to spawn shells and cause havoc, and we wouldn't be able to block them all even if we decided to put a bunch of pretty onerous restrictions on psql at this very late date. We have \set, backticks, \!, and bunches of things less obvious that could, even without a compromised server, cause real mischief. I believe that a more effective way to deal with this reality in a way that helps users is to put clear warnings in the documentation about the fact that psql programs are at least as capable as shell programs in that they are innately capable of doing anything that the psql's system user is authorized to do. Would a patch along that line help? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: pgbench failed when -f option contains a char '@'
Fabien COELHO writes: > The "parser" looks for the last @ in the argument, so the simple > workaround is to append "@1". > I suggest the attached doc update, or anything in better English. Agreed, done. regards, tom lane
Re: \gsetenv
David Fetter writes: > On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote: >> SELECT 'Calvin' AS foo \gset >> \setenv FOO :foo >> \! echo $FOO >> Calvin > You're the second person who's mentioned this workaround, which goes > to a couple of points I tried to make earlier: > - This is not by any means a new capability, just a convenience, and > - In view of the fact that it's a very old capability, the idea that > it has implications for controlling access or other parts of the > space of threat models is pretty silly. This is essentially the same workaround as what we recommend for anyone who's unhappy with the fix for CVE-2020-25696: do \gset into a non-special variable and then copy to the special variable. The reason it seems okay is that now it is clear that client-side logic intends the special variable change to happen. Thus a compromised server cannot hijack your client-side session all by itself. There's nonzero risk in letting the server modify your PROMPT1, PATH, or whatever, but you took the risk intentionally (and, presumably, it's necessary to your purposes). I tend to agree with you that the compromised-server argument is a little bit of a stretch. Still, we did have an actual user complaining about the case for \gset, and it's clear that in at least some scenarios this sort of attack could be used to parlay a server compromise into additional access. So we're not likely to undo the CVE-2020-25696 fix, and we're equally unlikely to provide an unrestricted way to set environment variables directly from the server. If we could draw a line between "safe" and "unsafe" environment variables, I'd be willing to consider a patch that allows directly setting only the former. But I don't see how to draw that line. Most of the point of any such feature would have to be to affect the behavior of shell commands subsequently invoked with \! ... and we can't know what a given variable would do to those. 
So on the whole I'm inclined to leave things as-is, where people have to do the assignment manually. regards, tom lane
Re: \gsetenv
On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote: > Hello David, > > > We have \gset to set some parameters, but not ones in the environment, > > so I fixed this with a new analogous command, \gsetenv. I considered > > refactoring SetVariable to include environment variables, but for a > > first cut, I just made a separate function and an extra if. > > My 0.02€: ISTM that you do not really need that, it can already be achieved > with gset, so I would not bother to add a gsetenv. > > sh> psql > SELECT 'Calvin' AS foo \gset > \setenv FOO :foo > \! echo $FOO > Calvin Thanks! You're the second person who's mentioned this workaround, which goes to a couple of points I tried to make earlier: - This is not by any means a new capability, just a convenience, and - In view of the fact that it's a very old capability, the idea that it has implications for controlling access or other parts of the space of threat models is pretty silly. Having dispensed with the idea that there's a new attack surface here, I'd like to request that people at least have a look at it as a feature psql users might appreciate having. As the author, I obviously see it that way, but again as the author, it's not for me to make that call. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: multi-install PostgresNode
On 12/19/20 11:19 AM, Andrew Dunstan wrote:
>
> This turns out to be remarkably short, with the use of a little eval magic.
>
> Given the attached, this test program works just fine:
>
> #!/bin/perl
> use PostgresNodePath;
> $ENV{PG_REGRESS} =
>     '/home/andrew/pgl/vpath.12/src/test/regress/pg_regress';
> my $node = get_new_node('/home/andrew/pgl/inst.12.5711','blurfl');
> print $node->info;
> print $node->connstr(),"\n";
> $node->init();
>

This time with a typo removed.

cheers

andrew

-- Andrew Dunstan EDB: https://www.enterprisedb.com

PostgresNodePath.pm Description: Perl program
Re: how to find log
On 12/20/20 11:31 AM, Tom Lane wrote: > Dmitry Markman writes: >> suppose I started the server with the following command >> pg_ctl -D . . . start -l >> is there a way to get later by sending some query to the >> server or > No, the server has no way to know where its stdout/stderr were > pointed to. You might want to enable the syslogger output method > (see logging_collector) to have something a bit more featureful. > > https://www.postgresql.org/docs/current/runtime-config-logging.html > > Alternatively, asking the OS in many cases will work, e.g. on my linux machine: ls -l /proc/{postmasterpid}/fd/1 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: how to find log
Dmitry Markman writes: > suppose I started the server with the following command > pg_ctl -D . . . start -l > is there a way to get later by sending some query to the server > or No, the server has no way to know where its stdout/stderr were pointed to. You might want to enable the syslogger output method (see logging_collector) to have something a bit more featureful. https://www.postgresql.org/docs/current/runtime-config-logging.html regards, tom lane
how to find log
Hi all suppose I started the server with the following command pg_ctl -D . . . start -l is there a way to get later by sending some query to the server or reading some configuration file (for example I can get data directory with the query “show data_directory”) thanks in advance Dimitry Markman
Re: pgbench failed when -f option contains a char '@'
Hello Tom,

I think we should just leave this as it is. The user can simply rename the file.

Yeah. The assumption when we defined the script-weight syntax was that there's no particular reason to use "@" in a script file name, and I don't see why that's a bad assumption.

The "parser" looks for the last @ in the argument, so the simple workaround is to append "@1". I suggest the attached doc update, or anything in better English.

-- 
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 7180fedd65..bba3cf05b0 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -461,6 +461,8 @@ pgbench options d
 the list of executed scripts. An optional integer weight after
 @ allows to adjust the probability of drawing the test.
+If the filename includes a @ character, append a weight so
+that there is no ambiguity: filen@me@1.
 See below for details.
Re: pgbench failed when -f option contains a char '@'
Hello,

> pgbench use -f filename[@weight] to receive a sql script file with a weight,

ISTM that I thought of this: "pgbench -f filen@me@1" does work.

sh> touch foo@bla
sh> pgbench -f foo@bla@1
pgbench: fatal: empty command list for script "foo@bla"

The documentation could point this out, though.

-- 
Fabien.
Re: Commit fest manager for 2021-01
On Sat, Dec 19, 2020 at 6:00 AM Michael Paquier wrote: > > On Sat, Dec 19, 2020 at 10:03:47AM +0530, Amit Kapila wrote: > > Glad to hear. I am confident that you can do justice to this role. > > I also think you will do just fine. Thanks for taking care of this. +1 on both accounts. If you haven't been one before (which I think?), please let me know what username your account in the system has, and I will make sure you get the required permissions- -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: \gsetenv
Hello David, We have \gset to set some parameters, but not ones in the environment, so I fixed this with a new analogous command, \gsetenv. I considered refactoring SetVariable to include environment variables, but for a first cut, I just made a separate function and an extra if. My 0.02€: ISTM that you do not really need that, it can already be achieved with \gset plus \setenv, so I would not bother to add a \gsetenv. sh> psql SELECT 'Calvin' AS foo \gset \setenv FOO :foo \! echo $FOO Calvin -- Fabien.
Re: pg_preadv() and pg_pwritev()
On Sun, Dec 20, 2020 at 8:07 PM Tom Lane wrote: > One minor thought is that in > > + struct iovec iov[Min(IOV_MAX, 1024)]; /* cap stack space */ > > it seems like pretty much every use of IOV_MAX would want some > similar cap. Should we centralize that idea with, say, > > #define PG_IOV_MAX Min(IOV_MAX, 1024) > > ? Or will the plausible cap vary across uses? Hmm. For the real intended user of this, namely worker processes that simulate AIO when native AIO isn't available, higher level code will limit the iov count to much smaller numbers anyway. It wants to try to stay under typical device limits for vectored I/O, because split requests would confound attempts to model and limit queue depth and control latency. In Andres's AIO prototype he currently has a macro PGAIO_MAX_COMBINE set to 16 (meaning approximately 16 data block or wal reads/writes = 128KB worth of scatter/gather per I/O request); I guess it should really be Min(IOV_MAX, ), but I don't currently have an opinion on the , except that it should surely be closer to 16 than 1024 (for example /sys/block/nvme0n1/queue/max_segments is 33 here). I mention all this to explain that I don't think the code in patch 0002 is going to turn out to be very typical: it's trying to minimise system calls by staying under an API limit (though I cap it for allocation sanity), whereas more typical code probably wants to stay under a device limit, so I don't immediately have another use for eg PG_IOV_MAX.
Re: Asynchronous Append on postgres_fdw nodes.
On Thu, Dec 10, 2020 at 3:38 PM Andrey V. Lepikhov wrote: > On 11/17/20 2:56 PM, Etsuro Fujita wrote: > > On Mon, Oct 5, 2020 at 3:35 PM Etsuro Fujita > > wrote: > > Comments welcome! The attached is still WIP and maybe I'm missing > > something, though. > I reviewed your patch and used it in my TPC-H benchmarks. It is still > WIP. Will you improve this patch? Yeah, will do. > I also want to say that, in my opinion, Horiguchi-san's version seems > preferable: it is more structured, simple to understand, executor-native > and allows to reduce FDW interface changes. I’m not sure what you mean by “executor-native”, but I partly agree that Horiguchi-san’s version would be easier to understand, because his version was made so that a tuple is requested from an async subplan using our Volcano Iterator model almost as-is. But my concerns about his version would be: 1) it’s actually pretty invasive, because it changes the contract of the ExecProcNode() API [1], and 2) IIUC it wouldn’t allow us to find ready subplans from occurred wait events when we extend to the case where subplans are joins or aggregates over ForeignScans [2]. > This code really only needs > one procedure - IsForeignPathAsyncCapable. This isn’t correct: his version uses ForeignAsyncConfigureWait() as well. Thank you for reviewing! Sorry for the delay. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/CAPmGK16YXCADSwsFLSxqTBBLbt3E_%3DiigKTtjS%3Ddqu%2B8K8DWCw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAPmGK16rA5ODyRrVK9iPsyW-td2RcRZXsdWoVhMmLLmUhprsTg%40mail.gmail.com
Re: Asynchronous Append on postgres_fdw nodes.
On Thu, Nov 26, 2020 at 10:28 AM movead...@highgo.ca wrote: > I tested the patch and hit several issues, as below: Thank you for the review! > Issue one: > I get an Assert failure at 'Assert(bms_is_member(i, node->as_needrequest));' in the > ExecAppendAsyncRequest() function when I use more than two foreign tables > on different foreign servers. > > I researched the code and made a change after which the Assert problem disappears. Could you show a test case causing the assertion failure? > Issue two: > I then tested and found that if I have a sync subplan and an async subplan, it runs > the sync subplan to completion before the async one's turn; I do not know if that is intended. Did you use a partitioned table with only two partitions where one is local and the other is remote? If so, that would be expected, because in that case, 1) the patch would first send an asynchronous query to the remote, 2) it would then process the local partition until the end, 3) it would then wait/poll the async event, and 4) it would finally process the remote partition when the event occurs. Sorry for the delay. Best regards, Etsuro Fujita
Re: how to use valgrind for TAP tests
Hello, 18.12.2020 19:02, Tom Lane wrote: > "osumi.takami...@fujitsu.com" writes: >> I have a question about how to execute valgrind with TAP tests >> in order to check some patches in the community. >> My main interest is testing src/test/subscription now but >> is there any general way to do it ? > The standard solution is > > (1) Build normally (well, with -DUSE_VALGRIND) > (2) Move the postgres executable aside, say > mv src/backend/postgres src/backend/postgres.orig > (3) Replace the executable with a wrapper script that invokes > valgrind on the original executable > (4) Now you can run "make check" with a valgrind'ed server, > as well as things that depend on "make check", such as TAP tests > > The script I use for (3) is attached; adjust paths and options to taste. I use the attached patch for this purpose, that slightly simplifies things and covers all the other binaries: git apply .../install-vrunner.patch CPPFLAGS="-DUSE_VALGRIND -Og" ./configure --enable-tap-tests --enable-debug --enable-cassert && make && make check `make check-world` is possible too, with src/bin/pg_ctl/t/001_start_stop.pl disabled (removed). Best regards, Alexander diff --git a/install-vrunner.sh b/install-vrunner.sh new file mode 100755 index 000..21881d4c729 --- /dev/null +++ b/install-vrunner.sh @@ -0,0 +1,31 @@ +#!/bin/bash +set -e + +SRCDIR=$1 +PGPREFIX=$SRCDIR/tmp_install/usr/local/pgsql + +cd $PGPREFIX/bin +cp $SRCDIR/src/tools/valgrind.supp ./ + +echo "" +cat << 'EOF' > vrunner +#!/bin/bash +set -e +BD="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" +SN="$(basename "${BASH_SOURCE[0]}")" +OBIN="_$SN" +[[ $LD_PRELOAD = *"valgrind"* ]] && exec $BD/$OBIN "$@" + +exec valgrind --quiet --exit-on-first-error=yes --error-exitcode=1 --leak-check=no --time-stamp=yes \ + --gen-suppressions=all --suppressions=$BD/valgrind.supp \ + --trace-children=yes --trace-children-skip="/bin/*,/usr/bin/*" \ + $BD/$OBIN "$@" +EOF +chmod a+x vrunner + +for f in *; do +if [ "$f" = "vrunner" ] || [ ! 
-x "$f" ]; then continue; fi +mv "$f" "_$f" +ln -s vrunner "$f" +chmod a+x "$f" +done diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 884d66920b6..87af7fd9ab8 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -411,6 +411,7 @@ ifeq ($(MAKELEVEL),0) $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 $(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 + '$(abs_top_builddir)'/install-vrunner.sh '$(abs_top_builddir)' endif endif endif diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 7f427e7d60e..e6115881f6b 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -689,12 +689,7 @@ wait_for_postmaster(pgpid_t pm_pid, bool do_checkpoint) pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]); pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]); if (pmstart >= start_time - 2 && -#ifndef WIN32 -pmpid == pm_pid -#else - /* Windows can only reject standalone-backend PIDs */ pmpid > 0 -#endif ) { /*