Re: bool_plperl transform
On 2/29/20 4:55 PM, Ivan Panchenko wrote:
> Hi,
> While using PL/Perl I have found that it obtains boolean arguments
> from Postgres as ‘t’ and ‘f’, which is extremely inconvenient because
> ‘f’ is not false from the perl viewpoint.
> So the problem is how to convert the SQL booleans into Perl style.
>
> There are 3 ways to do this:
>
> 1. make plperl automatically convert bools into something acceptable
> for perl. This looks simple, but probably is not acceptable as it
> breaks compatibility.
> 2. try to make some trick like it is done with arrays, i.e. convert
> bools into special Perl objects which look like ‘t’ and ‘f’ when
> treated as text, but are true and false for boolean operations. I
> am not sure that it is possible and reliable.
> 3. make a transform which transforms bool, like it is done with
> jsonb. This does not break compatibility and is rather
> straightforward.
>
> So I propose to take the third way and make such transform. This is
> very simple, a patch is attached.
> Also this patch improves the plperl documentation page, which now has
> nothing said about the transforms.
>

Patch appears to be missing all the new files.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.
Hi Amit,

Thanks, I didn't check the thread.
It looks to me like the latest patch was submitted by Fujita-san in Oct 2018.
Then, Tom pointed out that this simple approach has a problem: the remote
query plan is inefficient, because there is no intelligence about the
structure of the remote tables mapped by postgres_fdw. After that, the
patch has been left for a year.

Indeed, it is not an ideal query plan to execute for each updated row...

postgres=# explain select * from rtable_parent where tableoid = 126397 and ctid = '(0,11)'::tid;
                                QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..5.18 rows=2 width=50)
   ->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
         Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
   ->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
         TID Cond: (ctid = '(0,11)'::tid)
         Filter: (tableoid = '126397'::oid)
(6 rows)

Rather than refactoring postgres_fdw, is it possible to have a built-in
partition pruning rule when "tableoid = " is supplied?
If the partitioning mechanism had that feature, this should not be a
complicated problem.

Best regards,

On Sun, Mar 1, 2020 at 12:39, Amit Langote wrote:
>
> Hi,
>
> On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai wrote:
> >
> > Hello,
> >
> > I noticed the following scenario under the development of truncate
> > support on FDW.
> >
> > In case when 'ftable' maps a remote table that has inherited children,...
> >
> > postgres=# create table rtable_parent (id int, label text, x text);
> > CREATE TABLE
> > postgres=# create table rtable_child () inherits (rtable_parent);
> > CREATE TABLE
> > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
> > from generate_series(1,10) x);
> > INSERT 0 10
> > postgres=# insert into rtable_child (select x, 'child', md5(x::text)
> > from generate_series(6,15) x);
> > INSERT 0 10
> > postgres=# create foreign table ftable (id int, label text, x text)
> >            server loopback options (table_name 'rtable_parent');
> > CREATE FOREIGN TABLE
> >
> > The 'ftable' shows the results from both of the parent and children.
> > postgres=# select * from ftable;
> >  id | label  |                x
> > ----+--------+----------------------------------
> >   1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >   2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >   3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >   4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >   5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >   6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | parent | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | parent | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | parent | d3d9446802a44259755d38e6d163e820
> >   6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | child  | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | child  | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | child  | d3d9446802a44259755d38e6d163e820
> >  11 | child  | 6512bd43d9caa6e02c990b0a82652dca
> >  12 | child  | c20ad4d76fe97759aa27a0c99bff6710
> >  13 | child  | c51ce410c124a10e0db5e4b97fc2af39
> >  14 | child  | aab3238922bcc25a6f606eb525ffdc56
> >  15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
> > (20 rows)
> >
> > When we try to update the foreign-table without DirectUpdate mode,
> > remote query tries to update the rows specified by "ctid" system column.
> > However, it was not a unique key in this case.
> >
> > postgres=# explain update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> >                                  QUERY PLAN
> > ------------------------------------------------------------------------------
> >  Update on ftable  (cost=100.00..133.80 rows=414 width=74)
> >    ->  Result  (cost=100.00..133.80 rows=414 width=74)
> >          One-Time Filter: (pg_backend_pid() > 0)
> >          ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
> > (4 rows)
> >
> > [*] Note that pg_backend_pid() prevent direct update.
> >
> > postgres=# update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> > UPDATE 5
> > postgres=# select ctid,* from ftable;
> >   ctid  | id | label  |                x
> > --------+----+--------+----------------------------------
> >  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >  (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >  (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >  (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >  (0,11) |  6 | parent | updated
> >  (0,12) |  7 | parent | updated
> >  (0,13) |  8 | parent | updated
> >  (0,14) |  9 | parent | updated
> >  (0,15) | 10 | parent | updated
> >  (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >  (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
Re: [PATCH] Add schema and table names to partition error
Hi Chris,

On Sun, Mar 1, 2020 at 4:34 AM Chris Bandy wrote:
> Hello,
>
> I'm writing telemetry data into a table partitioned by time. When there
> is no partition for a particular date, my app notices the constraint
> violation, creates the partition, and retries the insert.
>
> I'm used to handling constraint violations by observing the constraint
> name in the error fields. However, this error had none. I set out to add
> the name to the error field, but after a bit of reading my impression is
> that partition constraints are more like a property of a table.

This makes sense to me.  The btree code which implements unique constraints
also does this; see the _bt_check_unique() function in
src/backend/access/nbtree/nbtinsert.c:

    ereport(ERROR,
            (errcode(ERRCODE_UNIQUE_VIOLATION),
             errmsg("duplicate key value violates unique constraint \"%s\"",
                    RelationGetRelationName(rel)),
             key_desc ? errdetail("Key %s already exists.",
                                  key_desc) : 0,
             errtableconstraint(heapRel,
                                RelationGetRelationName(rel))));

> I've attached a patch that adds the schema and table name fields to
> errors for my use case:

Instead of using errtable(), use errtableconstraint() like the btree code
does, if only just for consistency.

> - Insert data into a partitioned table for which there is no partition.
> - Insert data directly into an incorrect partition.

There are a couple more instances in src/backend/commands/tablecmds.c where
the partition constraint is checked.  In ATRewriteTable():

    if (partqualstate && !ExecCheck(partqualstate, econtext))
    {
        if (tab->validate_default)
            ereport(ERROR,
                    (errcode(ERRCODE_CHECK_VIOLATION),
                     errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
                            RelationGetRelationName(oldrel))));
        else
            ereport(ERROR,
                    (errcode(ERRCODE_CHECK_VIOLATION),
                     errmsg("partition constraint of relation \"%s\" is violated by some row",
                            RelationGetRelationName(oldrel))));
    }

Maybe better to fix these too, for completeness.

Thanks, Amit
Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.
Hi, On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai wrote: > > Hello, > > I noticed the following scenario under the development of truncate > support on FDW. > > In case when 'ftable' maps a remote table that has inherited children,... > > postgres=# create table rtable_parent (id int, label text, x text); > CREATE TABLE > postgres=# create table rtable_child () inherits (rtable_parent); > CREATE TABLE > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text) > from generate_series(1,10) x); > INSERT 0 10 > postgres=# insert into rtable_child (select x, 'child', md5(x::text) > from generate_series(6,15) x); > INSERT 0 10 > postgres=# create foreign table ftable (id int, label text, x text) >server loopback options (table_name 'rtable_parent'); > CREATE FOREIGN TABLE > > The 'ftable' shows the results from both of the parent and children. > postgres=# select * from ftable; > id | label |x > ++-- > 1 | parent | c4ca4238a0b923820dcc509a6f75849b > 2 | parent | c81e728d9d4c2f636f067f89cc14862c > 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 > 4 | parent | a87ff679a2f3e71d9181a67b7542122c > 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 > 6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc > 7 | parent | 8f14e45fceea167a5a36dedd4bea2543 > 8 | parent | c9f0f895fb98ab9159f51fd0297e236d > 9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26 > 10 | parent | d3d9446802a44259755d38e6d163e820 > 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc > 7 | child | 8f14e45fceea167a5a36dedd4bea2543 > 8 | child | c9f0f895fb98ab9159f51fd0297e236d > 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 > 10 | child | d3d9446802a44259755d38e6d163e820 > 11 | child | 6512bd43d9caa6e02c990b0a82652dca > 12 | child | c20ad4d76fe97759aa27a0c99bff6710 > 13 | child | c51ce410c124a10e0db5e4b97fc2af39 > 14 | child | aab3238922bcc25a6f606eb525ffdc56 > 15 | child | 9bf31c7ff062936a96d3c8bd1f8f2ff3 > (20 rows) > > When we try to update the foreign-table without DirectUpdate mode, > remote query tries to update the rows specified by "ctid" system column. > However, it was not a unique key in this case. > > postgres=# explain update ftable set x = 'updated' where id > 10 and > pg_backend_pid() > 0; > QUERY PLAN > - > Update on ftable (cost=100.00..133.80 rows=414 width=74) >-> Result (cost=100.00..133.80 rows=414 width=74) > One-Time Filter: (pg_backend_pid() > 0) > -> Foreign Scan on ftable (cost=100.00..133.80 rows=414 width=42) > (4 rows) > > [*] Note that pg_backend_pid() prevent direct update. 
>
> postgres=# update ftable set x = 'updated' where id > 10 and
> pg_backend_pid() > 0;
> UPDATE 5
> postgres=# select ctid,* from ftable;
>   ctid  | id | label  |                x
> --------+----+--------+----------------------------------
>  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
>  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
>  (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
>  (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
>  (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
>  (0,11) |  6 | parent | updated
>  (0,12) |  7 | parent | updated
>  (0,13) |  8 | parent | updated
>  (0,14) |  9 | parent | updated
>  (0,15) | 10 | parent | updated
>  (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
>  (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
>  (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
>  (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
>  (0,11) | 11 | child  | updated
>  (0,12) | 12 | child  | updated
>  (0,13) | 13 | child  | updated
>  (0,14) | 14 | child  | updated
>  (0,15) | 15 | child  | updated
> (20 rows)
>
> The WHERE-clause (id > 10) should affect only child table.
> However, it updated the rows in the parent table with same ctid.
>
> How about your thought?
> Probably, we need to fetch a pair of tableoid and ctid to identify
> the remote table exactly, if not direct-update cases.

This was discussed on this thread:
https://www.postgresql.org/message-id/CAFjFpRfcgwsHRmpvoOK-GUQi-n8MgAS%2BOxcQo%3DaBDn1COywmcg%40mail.gmail.com

Solutions have been proposed too, but none finalized yet.

Thanks, Amit
[BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.
Hello,

I noticed the following scenario under the development of truncate
support on FDW.

In case when 'ftable' maps a remote table that has inherited children,...

postgres=# create table rtable_parent (id int, label text, x text);
CREATE TABLE
postgres=# create table rtable_child () inherits (rtable_parent);
CREATE TABLE
postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
from generate_series(1,10) x);
INSERT 0 10
postgres=# insert into rtable_child (select x, 'child', md5(x::text)
from generate_series(6,15) x);
INSERT 0 10
postgres=# create foreign table ftable (id int, label text, x text)
           server loopback options (table_name 'rtable_parent');
CREATE FOREIGN TABLE

The 'ftable' shows the results from both of the parent and children.
postgres=# select * from ftable;
 id | label  |                x
----+--------+----------------------------------
  1 | parent | c4ca4238a0b923820dcc509a6f75849b
  2 | parent | c81e728d9d4c2f636f067f89cc14862c
  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | parent | a87ff679a2f3e71d9181a67b7542122c
  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
  6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | parent | 8f14e45fceea167a5a36dedd4bea2543
  8 | parent | c9f0f895fb98ab9159f51fd0297e236d
  9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | parent | d3d9446802a44259755d38e6d163e820
  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | child  | d3d9446802a44259755d38e6d163e820
 11 | child  | 6512bd43d9caa6e02c990b0a82652dca
 12 | child  | c20ad4d76fe97759aa27a0c99bff6710
 13 | child  | c51ce410c124a10e0db5e4b97fc2af39
 14 | child  | aab3238922bcc25a6f606eb525ffdc56
 15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
(20 rows)

When we try to update the foreign table without DirectUpdate mode, the
remote query tries to update the rows specified by the "ctid" system
column. However, it was not a unique key in this case.

postgres=# explain update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Update on ftable  (cost=100.00..133.80 rows=414 width=74)
   ->  Result  (cost=100.00..133.80 rows=414 width=74)
         One-Time Filter: (pg_backend_pid() > 0)
         ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
(4 rows)

[*] Note that pg_backend_pid() prevents direct update.

postgres=# update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
UPDATE 5
postgres=# select ctid,* from ftable;
  ctid  | id | label  |                x
--------+----+--------+----------------------------------
 (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
 (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
 (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
 (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
 (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
 (0,11) |  6 | parent | updated
 (0,12) |  7 | parent | updated
 (0,13) |  8 | parent | updated
 (0,14) |  9 | parent | updated
 (0,15) | 10 | parent | updated
 (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
 (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
 (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
 (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
 (0,11) | 11 | child  | updated
 (0,12) | 12 | child  | updated
 (0,13) | 13 | child  | updated
 (0,14) | 14 | child  | updated
 (0,15) | 15 | child  | updated
(20 rows)

The WHERE-clause (id > 10) should affect only the child table.
However, it updated the rows in the parent table with the same ctid.

What are your thoughts?
Probably, we need to fetch a pair of tableoid and ctid to identify the
remote table exactly in non-direct-update cases.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei
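As a concrete illustration of the ambiguity: for a non-direct UPDATE, the
remote statement postgres_fdw builds is parameterized on ctid alone,
roughly as in the first statement below. A (tableoid, ctid)-qualified
variant of the kind suggested above could look like the second statement.
This is only a sketch; the parameter numbering and the exact deparsed form
are illustrative, not taken from an actual patch.

-- shape of the remote UPDATE issued today: ctid alone, which is not
-- unique across the remote parent and its inherited children
UPDATE public.rtable_parent SET x = $2 WHERE ctid = $1;

-- possible fix: also fetch tableoid in the remote scan and use the
-- (tableoid, ctid) pair to pin down the exact remote row
UPDATE public.rtable_parent SET x = $3 WHERE tableoid = $1 AND ctid = $2;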
Re: Auxiliary Processes and MyAuxProc
On Sat, Nov 30, 2019 at 9:15 PM Michael Paquier wrote: > > On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote: > > Color me unconvinced. > > The latest comments of the thread have not been addressed yet. so I am > marking the patch as returned with feedback. If you think that's not > correct, please feel free to update the status of the patch. If you > do so, please provide at the same time a rebased version of the patch, > as it is failing to apply on HEAD. I've finally had some time to update the patchset with an attempt at addressing Andres' concerns. Note that the current spin has a bug which does not allow syslogger to run properly. Additionally, it is squashed into one patch again. I intend to split the patch out into separate functional patches and fix the syslogger issue, but wanted to get this on the ticket for the open CommitFest before it closes. I'll go ahead and re-add it and will be pushing updates as I have them. I intend to give this patchset the time it deserves, so will be much more responsive this go-around. Thanks, -- Mike Palmiotto https://crunchydata.com From 34feb7399fe218eab0da851032299f81e7ad6689 Mon Sep 17 00:00:00 2001 From: Mike Palmiotto Date: Tue, 19 Feb 2019 15:29:33 + Subject: [PATCH 2/2] Add a hook to allow extensions to set worker metadata This patch implements a hook that allows extensions to modify a worker process' metadata in backends. Additional work done by Yuli Khodorkovskiy Original discussion here: https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO%3DhtC6LnA6aW4r-%2Bjq%3D3Q5RAoFQgW8EtA%40mail.gmail.com --- src/backend/postmaster/fork_process.c | 11 +++ src/include/postmaster/fork_process.h | 4 2 files changed, 15 insertions(+) diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c index def3cee37e..e392f5207e 100644 --- a/src/backend/postmaster/fork_process.c +++ b/src/backend/postmaster/fork_process.c @@ -22,6 +22,14 @@ #include "postmaster/fork_process.h" +/* + * This hook allows plugins to get control of the child process following + * a successful fork_process(). It can be used to perform some action in + * extensions to set metadata in backends (including special backends) upon + * setting of the session user. + */ +ForkProcess_post_hook_type ForkProcess_post_hook = NULL; + #ifndef WIN32 /* * Wrapper for fork(). Return values are the same as those for fork(): @@ -114,6 +122,9 @@ fork_process(void) #ifdef USE_OPENSSL RAND_cleanup(); #endif + + if (ForkProcess_post_hook) + (*ForkProcess_post_hook) (); } return result; diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h index a42f859a08..b4d1560b72 100644 --- a/src/include/postmaster/fork_process.h +++ b/src/include/postmaster/fork_process.h @@ -20,4 +20,8 @@ extern pid_t fork_process(void); extern pid_t postmaster_forkexec(int argc, char *argvp[]); #endif +/* Hook for plugins to get control after a successful fork_process() */ +typedef void (*ForkProcess_post_hook_type) (); +extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook; + #endif /* FORK_PROCESS_H */ -- 2.21.0 From 6fb91ba335153882d16d2dff28d584c3c0bc83a2 Mon Sep 17 00:00:00 2001 From: Mike Palmiotto Date: Fri, 27 Sep 2019 12:28:19 -0400 Subject: [PATCH 1/2] Introduce subprocess infrastructure This is an initial attempt at coalescing all of the postmaster subprocess startups into one central framework. The patchset introduces a MySubprocess global, which contains an PgSubprocess entry from the process_types array. 
Each entry has defined entrypoints, cleanup functions, prep functions, and a few other control-oriented variables. This is a rework of https://commitfest.postgresql.org/25/2259/, which attempts to address the feedback from Andres Freund. --- src/backend/bootstrap/bootstrap.c | 78 +- src/backend/main/main.c | 3 + src/backend/postmaster/Makefile | 1 + src/backend/postmaster/autovacuum.c | 165 +--- src/backend/postmaster/bgworker.c | 151 +++- src/backend/postmaster/bgwriter.c | 2 +- src/backend/postmaster/checkpointer.c | 2 +- src/backend/postmaster/pgarch.c | 83 +-- src/backend/postmaster/pgstat.c | 89 +-- src/backend/postmaster/postmaster.c | 786 +++- src/backend/postmaster/startup.c| 8 +- src/backend/postmaster/subprocess.c | 206 + src/backend/postmaster/syslogger.c | 278 +++ src/backend/postmaster/walwriter.c | 2 +- src/backend/replication/walreceiver.c | 2 +- src/backend/utils/init/globals.c| 3 +- src/backend/utils/init/postinit.c | 2 +- src/include/bootstrap/bootstrap.h | 4 +- src/include/miscadmin.h | 1 + src/include/pgstat.h| 5 +- src/include/postmaster/autovacu
Re: TRUNCATE on foreign tables
Hello,

The attached is the revised version.

> > If callback is invoked with a foreign-relation that is specified by TRUNCATE
> > command with ONLY, it seems to me reasonable that remote TRUNCATE
> > command specifies the relation on behalf of the foreign table with ONLY.
> >
> > So, if ExecForeignTruncate() has another list to inform the context for each
> > relation, postgres_fdw can build proper remote query that may specify the
> > remote tables with ONLY-clause.
>
> Yeah, TRUNCATE can specify ONLY on a per-table basis, so having a
> second list makes sense.  Then in the FDW, just make sure to
> elog(ERROR) if the lengths do no match, and then use forboth() to loop
> over them.  One thing that you need to be careful about is that tables
> which are added to the list because of inheritance should not be
> marked with ONLY when generating the command to the remote.
>
The v5 patch added a separate list for the FDW callback, to inform it of
the context in which relations are specified by the TRUNCATE command.

The frels_extra argument is a list of integers. 0 means that the relevant
foreign table is specified without the "ONLY" clause, and a positive value
means it is specified with the "ONLY" clause. A negative value means that
the foreign table is not specified in the TRUNCATE command, but is
truncated due to a dependency (like a partition's child leaf).

The remote SQL generates a TRUNCATE command according to the above "extra"
information. So "TRUNCATE ONLY ftable" generates a remote query with
"TRUNCATE ONLY mapped_remote_table".

On the other hand, it can produce strange results, although it is a corner
case. The example below shows the result of TRUNCATE ONLY on a foreign
table that maps a remote table with inherited children. The rows with
id < 10 belong to the parent table, thus TRUNCATE ONLY tru_ftable
eliminated the remote parent; however, it looks like tru_ftable still
contains rows after the TRUNCATE command. I wonder whether this is
reasonable behavior for users. Of course, the "ONLY" clause controls the
local hierarchy of partitioned / inherited tables; however, I'm not
certain whether the concept should be extended to the structure of remote
tables.

+SELECT * FROM tru_ftable;
+ id |                x
+----+----------------------------------
+  5 | e4da3b7fbbce2345d7772b0674a318d5
+  6 | 1679091c5a880faf6fb5e6087eb1b2dc
+  7 | 8f14e45fceea167a5a36dedd4bea2543
+  8 | c9f0f895fb98ab9159f51fd0297e236d
+  9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ 10 | d3d9446802a44259755d38e6d163e820
+ 11 | 6512bd43d9caa6e02c990b0a82652dca
+ 12 | c20ad4d76fe97759aa27a0c99bff6710
+ 13 | c51ce410c124a10e0db5e4b97fc2af39
+ 14 | aab3238922bcc25a6f606eb525ffdc56
+(10 rows)
+
+TRUNCATE ONLY tru_ftable;    -- truncate only parent portion
+SELECT * FROM tru_ftable;
+ id |                x
+----+----------------------------------
+ 10 | d3d9446802a44259755d38e6d163e820
+ 11 | 6512bd43d9caa6e02c990b0a82652dca
+ 12 | c20ad4d76fe97759aa27a0c99bff6710
+ 13 | c51ce410c124a10e0db5e4b97fc2af39
+ 14 | aab3238922bcc25a6f606eb525ffdc56
+(5 rows)

> > Regarding to the other comments, it's all Ok for me. I'll update the patch.
> > And, I forgot "updatable" option at postgres_fdw. It should be checked on
> > the truncate also, right?
>
> Hmm.  Good point.  Being able to filter that silently through a
> configuration parameter is kind of interesting.  Now I think that this
> should be a separate option because updatable applies to DMLs.  Like,
> truncatable?
>
Ok, the "truncatable" option was added. Please check the regression test
and documentation updates.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei

pgsql13-truncate-on-foreign-table.v5.patch
Description: Binary data
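For readers not familiar with postgres_fdw's option machinery: the new
option would presumably be attached the same way as the existing
"updatable" option, i.e. per foreign table or per server. The exact
spelling and the resulting error below are assumptions based on the
description above, not text from the patch.

-- disable truncation for a single foreign table
ALTER FOREIGN TABLE tru_ftable OPTIONS (ADD truncatable 'false');
TRUNCATE tru_ftable;   -- expected to fail with an error from postgres_fdw

-- or for every foreign table belonging to a server
ALTER SERVER loopback OPTIONS (ADD truncatable 'false');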
Re: bool_plperl transform
Ivan Panchenko writes:
> While using PL/Perl I have found that it obtains boolean arguments from
> Postgres as ‘t’ and ‘f’, which is extremely inconvenient because ‘f’ is not
> false from the perl viewpoint.
> ...
> * make a transform which transforms bool, like it is done with jsonb. This
> does not break compatibility and is rather straightforward.

Please register this patch in the commitfest app, so we don't lose track
of it.

https://commitfest.postgresql.org/27/

regards, tom lane
Re: Allowing ALTER TYPE to change storage strategy
Tomas Vondra writes:
> On Fri, Feb 28, 2020 at 08:35:33PM -0500, Tom Lane wrote:
>> You'd need a moderately strong lock on each such table, which means
>> there'd be serious deadlock hazards.  I'm dubious that it's worth
>> troubling with.

> Yeah, I don't plan to do this in v1 (and I have no immediate plan to
> work on it after that). But I wonder how is the deadlock risk any
> different compared e.g. to DROP TYPE ... CASCADE?

True, but dropping a type you're actively using seems pretty improbable;
whereas the whole point of the patch you're proposing is that people
*would* want to use it in production.

regards, tom lane
bool_plperl transform
Hi,

While using PL/Perl I have found that it obtains boolean arguments from
Postgres as ‘t’ and ‘f’, which is extremely inconvenient because ‘f’ is
not false from the perl viewpoint.
So the problem is how to convert the SQL booleans into Perl style.

There are 3 ways to do this:

* make plperl automatically convert bools into something acceptable for
perl. This looks simple, but probably is not acceptable as it breaks
compatibility.
* try to make some trick like it is done with arrays, i.e. convert bools
into special Perl objects which look like ‘t’ and ‘f’ when treated as
text, but are true and false for boolean operations. I am not sure that
it is possible and reliable.
* make a transform which transforms bool, like it is done with jsonb.
This does not break compatibility and is rather straightforward.

So I propose to take the third way and make such a transform. This is
very simple, a patch is attached.
Also this patch improves the plperl documentation page, which currently
says nothing about the transforms.

Regards,
Ivan Panchenko

bool_plperl_transform_v1.patch
Description: Binary data
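A minimal sketch of how such a transform would be used from SQL, assuming
the extension ends up being named bool_plperl and follows the pattern of
the existing jsonb_plperl transform (the extension and function names here
are assumptions, not taken from the patch):

CREATE EXTENSION bool_plperl;

CREATE FUNCTION perl_and(a bool, b bool) RETURNS bool
TRANSFORM FOR TYPE bool
LANGUAGE plperl
AS $$
  my ($a, $b) = @_;
  # with the transform, $a and $b arrive as real Perl booleans, so an
  # ordinary truth test works instead of comparing against 't'/'f'
  return $a && $b;
$$;

SELECT perl_and(true, false);   -- returns f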
Re: Broken resetting of subplan hash tables
Andres Freund writes: > On 2020-02-29 14:02:59 -0500, Tom Lane wrote: >> TBH, I think that this tuple table API is seriously misdesigned; >> it is confusing and very error-prone to have the callers need to >> reset the tuple context separately from calling ResetTupleHashTable. > Do you have an alternative proposal? I'd be inclined to let the tuple hashtable make its own tuple-storage context and reset that for itself. Is it really worth the complexity and bug hazards to share such a context with other uses? > We could change it so more of the metadata for execGrouping.c is > computed outside of BuildTupleHashTableExt(), and continue to destroy > the entire hashtable. But we'd still have to reallocate the hashtable, > the slot, etc. So having a reset interface seems like the right thing. Agreed, the reset interface is a good idea. I'm just not happy that in addition to resetting, you have to remember to reset some vaguely-related context (and heaven help you if you reset that context but not the hashtable). >> Also, the callers all look like their resets are intended to destroy >> the whole hashtable not just its contents (as indeed they were doing, >> before the faulty commit). But I didn't attempt to fix that today. > Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like > that to me? Why would they want to drop the hashtable metadata when > resetting? What am I missing? They may not look like it to you; but Andreas misread that, and so did I at first --- not least because that *is* how it used to work, and there are still comments suggesting that that's how it works, eg this in ExecInitRecursiveUnion: * If hashing, we need a per-tuple memory context for comparisons, and a * longer-lived context to store the hash table. The table can't just be * kept in the per-query context because we want to be able to throw it * away when rescanning. "throw it away" sure looks like it means the entire hashtable, not just its tuple contents. regards, tom lane
Re: Allow to_date() and to_timestamp() to accept localized names
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= writes: > On Tue, Jan 28, 2020 at 9:35 PM Peter Eisentraut < > peter.eisentr...@2ndquadrant.com> wrote: >> So I'm withdrawing my concerns with respect to this patch. As long as >> it can do a roundtrip conversion with to_char(), it's fine. > We can avoid issues with non injective case conversion languages with a > double conversion, so both strings in the comparison end up in the same > state. > I propose an upper/lower conversion as in the attached patch. This seems pretty reasonable to me, with a couple of caveats: * I don't think it's appropriate to hard-wire DEFAULT_COLLATION_OID as the collation to do case-folding with. For to_date/to_timestamp, we have collatable text input so we can rely on the function's input collation instead, at the cost of having to pass down the collation OID through a few layers of subroutines :-(. For parse_datetime, I punted for now and let it use DEFAULT_COLLATION_OID, because that's currently only called by JSONB code that probably hasn't got a usable input collation anyway (since jsonb isn't considered collatable). * I think it's likely worthwhile to do a quick check for an exact match before we go through all these case-folding pushups. If the expected use-case is reversing to_char() output, that will win all the time. We're probably ahead of the game even if it only matches a few percent of the time. Attached v10 incorporates those improvements, plus a bit of polishing of docs and comments. Barring objections, this seems committable to me. regards, tom lane diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 28035f1..8b73e05 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5968,7 +5968,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); TM prefix -translation mode (print localized day and month names based on +translation mode (use localized day and month names based on ) TMMonth @@ -5999,9 +5999,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); - TM does not include trailing blanks. - to_timestamp and to_date ignore - the TM modifier. + TM suppresses trailing blanks whether or + not FM is specified. + + + + + + to_timestamp and to_date + ignore letter case in the input; so for + example MON, Mon, + and mon all accept the same strings. When using + the TM modifier, case-folding is done according to + the rules of the function's input collation (see + ). 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index d029468..db99a6b 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -1038,7 +1038,7 @@ static void parse_format(FormatNode *node, const char *str, const KeyWord *kw, static void DCH_to_char(FormatNode *node, bool is_interval, TmToChar *in, char *out, Oid collid); static void DCH_from_char(FormatNode *node, const char *in, TmFromChar *out, - bool std, bool *have_error); + Oid collid, bool std, bool *have_error); #ifdef DEBUG_TO_FROM_CHAR static void dump_index(const KeyWord *k, const int *index); @@ -1057,11 +1057,14 @@ static int from_char_parse_int_len(int *dest, const char **src, const int len, FormatNode *node, bool *have_error); static int from_char_parse_int(int *dest, const char **src, FormatNode *node, bool *have_error); -static int seq_search(const char *name, const char *const *array, int *len); +static int seq_search_ascii(const char *name, const char *const *array, int *len); +static int seq_search_localized(const char *name, char **array, int *len, + Oid collid); static int from_char_seq_search(int *dest, const char **src, const char *const *array, + char **localized_array, Oid collid, FormatNode *node, bool *have_error); -static void do_to_timestamp(text *date_txt, text *fmt, bool std, +static void do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, struct pg_tm *tm, fsec_t *fsec, int *fprec, uint32 *flags, bool *have_error); static char *fill_str(char *str, int c, int max); @@ -2457,7 +2460,7 @@ from_char_parse_int(int *dest, const char **src, FormatNode *node, bool *have_er * suitable for comparisons to ASCII strings. */ static int -seq_search(const char *name, const char *const *array, int *len) +seq_search_ascii(const char *name, const char *const *array, int *len) { unsigned char firstc; const char *const *a; @@ -2503,8 +2506,89 @@ seq_search(const char *name, const char *const *array, int *len) } /* - * Perform a sequential search in 'array' for an entry matching the first - * character(s) of the 'src' string case-insensitively. + * Sequentially search an array of possibly non-English words for
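As a concrete illustration of what the patch enables (a sketch only; the
available locale names and the exact capitalization of the output depend
on the locales installed on the system):

SET lc_time = 'es_ES.utf8';

SELECT to_char(date '2020-03-01', 'TMDay, DD TMMonth YYYY');
-- Domingo, 01 Marzo 2020

-- with the patch, the localized names produced by to_char() round-trip,
-- matched case-insensitively according to the input collation:
SELECT to_date('Domingo, 01 Marzo 2020', 'TMDay, DD TMMonth YYYY');
-- 2020-03-01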
Re: Allowing ALTER TYPE to change storage strategy
On Fri, Feb 28, 2020 at 08:35:33PM -0500, Tom Lane wrote:
> Tomas Vondra writes:
>> I think we might check if there are any attributes with the given data
>> type, and allow the change if there are none. That would still allow the
>> change when the type is used only for things like function parameters
>> etc. But we'd also have to check for domains (recursively).
>
> Still has race conditions.

Yeah, I have no problem believing that.

>> One thing I haven't mentioned in the original message is CASCADE. It
>> seems useful to optionally change storage for all attributes with the
>> given data type. But I'm not sure it's actually a good idea, and the
>> amount of code seems non-trivial (it'd have to copy quite a bit of code
>> from ALTER TABLE).
>
> You'd need a moderately strong lock on each such table, which means
> there'd be serious deadlock hazards.  I'm dubious that it's worth
> troubling with.

Yeah, I don't plan to do this in v1 (and I have no immediate plan to
work on it after that). But I wonder how is the deadlock risk any
different compared e.g. to DROP TYPE ... CASCADE?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
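For context, the per-column form of this already exists, while the patch
adds a type-level equivalent; the table, column and type names below are
invented, and the ALTER TYPE syntax is only a sketch of the proposal, not
necessarily the final grammar:

-- today: storage can only be adjusted column by column
ALTER TABLE measurements ALTER COLUMN payload SET STORAGE EXTERNAL;

-- proposed: change the default storage strategy of the data type itself,
-- so that columns created afterwards pick up the new strategy
ALTER TYPE mytexttype SET (STORAGE = external);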
Re: SLRU statistics
On Sat, Feb 29, 2020 at 11:44:26AM -0300, Alvaro Herrera wrote: On 2020-Feb-29, Tomas Vondra wrote: Did we actually remove track-enabling GUCs? I think we still have - track_activities - track_counts - track_io_timing - track_functions But maybe I'm missing something? Hm I remembered we removed the one for row-level stats (track_row_stats), but what we really did is merge it with block-level stats (track_block_stats) into track_counts -- commit 48f7e6439568. Funnily enough, if you disable that autovacuum won't work, so I'm not sure it's a very useful tunable. And it definitely has more overhead than what this new GUC would have. OK > I find SlruType pretty odd, and the accompanying "if" list in > pg_stat_get_slru() correspondingly so. Would it be possible to have > each SLRU enumerate itself somehow? Maybe add the name in SlruCtlData > and query that, somehow. (I don't think we have an array of SlruCtlData > anywhere though, so this might be a useless idea.) Well, maybe. We could have a system to register SLRUs dynamically, but the trick here is that by having a fixed predefined number of SLRUs simplifies serialization in pgstat.c and so on. I don't think the "if" branches in pg_stat_get_slru() are particularly ugly, but maybe we could replace the enum with a registry of structs, something like rmgrlist.h. It seems like an overkill to me, though. Yeah, maybe we don't have to fix that now. IMO the current solution is sufficient for the purpose. I guess we could just stick a name into the SlruCtlData (and remove SlruType entirely), and use that to identify the stats entries. That might be enough, and in fact we already have that - SimpleLruInit gets a name parameter and copies that to the lwlock_tranche_name. One of the main reasons why I opted to use the enum is that it makes tracking, lookup and serialization pretty trivial - it's just an index lookup, etc. But maybe it wouldn't be much more complex with the name, considering the name length is limited by SLRU_MAX_NAME_LENGTH. And we probably don't expect many entries, so we could keep them in a simple list, or maybe a simplehash. I'm not sure what to do with data for SLRUs that might have disappeared after a restart (e.g. because someone removed an extension). Until now those would be in the all in the "other" entry. The attached v2 fixes the issues in your first message: - I moved the page_miss() call after SlruRecentlyUsed(), but then I realized it's entirely duplicate with the page_read() update done in SlruPhysicalReadPage(). I removed the call from SlruPhysicalReadPage() and renamed page_miss to page_read - that's more consistent with shared buffers stats, which also have buffers_hit and buffer_read. - I've also implemented the reset. I ended up adding a new option to pg_stat_reset_shared, which always resets all SLRU entries. We track the reset timestamp for each SLRU entry, but the value is always the same. I admit this is a bit weird - I did it like this because (a) I'm not sure how to identify the individual entries and (b) the SLRU is shared, so pg_stat_reset_shared seems kinda natural. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index f8e7670f8d..3f45db7ea9 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -692,7 +692,8 @@ CLOGShmemInit(void) { ClogCtl->PagePrecedes = CLOGPagePrecedes; SimpleLruInit(ClogCtl, "clog", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, - CLogControlLock, "pg_xact", LWTRANCHE_CLOG_BUFFERS); + CLogControlLock, "pg_xact", LWTRANCHE_CLOG_BUFFERS, + SLRU_CLOG); } /* diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 630df672cc..44d7ca4483 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -494,7 +494,7 @@ CommitTsShmemInit(void) CommitTsCtl->PagePrecedes = CommitTsPagePrecedes; SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0, CommitTsControlLock, "pg_commit_ts", - LWTRANCHE_COMMITTS_BUFFERS); + LWTRANCHE_COMMITTS_BUFFERS, SLRU_COMMIT_TS); commitTsShared = ShmemInitStruct("CommitTs shared", sizeof(CommitTimestampShared), diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 50e98caaeb..37a5854284 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1831,11 +1831,11 @@ MultiXactShmemInit(void) SimpleLruInit(MultiXactOffsetCtl,
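To make the interface concrete, usage would presumably look like this; the
function name follows the pg_stat_get_slru() mentioned above, while the
'slru' argument to pg_stat_reset_shared() is an assumption based on the
description of the new reset option:

-- per-SLRU counters (clog, commit_timestamp, multixact, ...)
SELECT * FROM pg_stat_get_slru();

-- reset all SLRU entries at once
SELECT pg_stat_reset_shared('slru');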
Re: Catalog invalidations vs catalog scans vs ScanPgRelation()
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
>
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away, and replaced with a different row. That does seem like a serious
> issue?
>
> I think there's likely a lot of places that can hit this? What makes it
> safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups and their
> modifications?
>
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?

I still think that's true.  In a first iteration I hacked around the
problem by explicitly registering a catalog snapshot in
RemoveTempRelations(). That *sometimes* allows us to get through the
regression tests without the assertions triggering.

But I don't think that's good enough (even if we fixed the other potential
crashes similarly). The only reason that avoids the asserts is because in
nearly all other cases there's also a user snapshot that's pushed. But
that pushed snapshot can have an xmin that's newer than the catalog
snapshot, which means we're still in danger of tids from catalog scans
being outdated.

My preliminary conclusion is that it's simply not safe to do
SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need to
defer the SnapshotResetXmin() call until at least
CommitTransactionCommand()?  Outside of that there ought (with exception
of multi-transaction commands, but they have to be careful anyway) to be
no "in progress" sequences of related catalog lookups/modifications.

Alternatively we could ensure that all catalog lookup/mod sequences ensure
that the first catalog snapshot is registered. But that seems like a
gargantuan task?

> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets up things so that GetCatalogSnapshot() will
> return a freshly taken snapshot?  ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that
> behaviour?

A related question is in which cases is it actually safe to use a snapshot
that's not registered, nor pushed as the active snapshot.  snapmgr.c just
provides:

 * Note that the return value may point at static storage that will be modified
 * by future calls and by CommandCounterIncrement(). Callers should call
 * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
 * used very long.

but doesn't clarify what 'very long' means.

As far as I can tell, there's very little that is actually safe. It's
probably ok to do a single visibility test, but anything that e.g. has a
chance of accepting invalidations is entirely unsafe?

Greetings,

Andres Freund
Re: vacuum verbose detail logs are unclear; log at *start* of each stage
On Thu, Feb 27, 2020 at 10:10:57AM +0100, Peter Eisentraut wrote: > On 2020-01-26 06:36, Justin Pryzby wrote: > >Subject: [PATCH v3 1/4] Remove gettext erronously readded at 580ddce > > > >-appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0)); > >+appendStringInfo(&buf, "%s.", pg_rusage_show(&ru0)); > > Why do you think it was erroneously added? I was wrong. It seemed useless to me to translate "%s.", but I see now that at least JA.PO uses a different terminator than ".". $ git grep -C3 'msgid "%s."$' '*.po' |grep msgstr src/backend/po/de.po-msgstr "%s." src/backend/po/es.po-msgstr "%s." src/backend/po/fr.po-msgstr "%s." src/backend/po/id.po-msgstr "%s." src/backend/po/it.po-msgstr "%s." src/backend/po/ja.po-msgstr "%s。" BTW I only *happened* to see your message on the www interface. I didn't get the original message. And the "Resend email" button didn't get it to me, either.. -- Justin
Re: Broken resetting of subplan hash tables
Hi, On 2020-02-29 14:02:59 -0500, Tom Lane wrote: > > 3. Additionally since the memory context used by the hash tables is > > reset in buildSubPlanHash() if we start resetting hash tables we will > > get a use after free bug. > > Nope, not right. The hash table metadata is now allocated in the > es_query_cxt; what is in the hashtablecxt is just tuples, and that > does need to be cleared, per the comment for ResetTupleHashTable. > Your patch as given results in a nasty memory leak, which is easily > demonstrated with a small mod to the regression test case I added: > select sum(ss.tst::int) from > generate_series(1,1000) o cross join lateral ( > select i.ten in (select f1 from int4_tbl where f1 <= o) as tst, > random() as r > from onek i where i.unique1 = o%1000 ) ss; > > > Since the current behavior of the code in HEAD is not actually broken, > > it is just an optimization which is not used, this fix does not have to > > be backpatched. > > Unfortunately ... this test case also leaks memory like mad in > HEAD, v12, and v11, because all of them are rebuilding the hash > table from scratch without clearing the old one. So this is > indeed broken and a back-patch is necessary. Yea :(. Thanks for doing that. > I noted while looking at this that most of the calls of > ResetTupleHashTable are actually never reached in the regression > tests (per code coverage report) so I made up some test cases > that do reach 'em, and included those in the commit. Cool. > TBH, I think that this tuple table API is seriously misdesigned; > it is confusing and very error-prone to have the callers need to > reset the tuple context separately from calling ResetTupleHashTable. Do you have an alternative proposal? Before committing the patch adding it that way, I'd waited for quite a while asking for input... In several cases (nodeAgg.c, nodeSetOp.c) there's memory from outside execGrouping.c that's also allocated in the same context as the table contents (via TupleHashTable->additional) - just resetting the context passed to BuildTupleHashTableExt() as part of ResetTupleHashTable() seems problematic too. We could change it so more of the metadata for execGrouping.c is computed outside of BuildTupleHashTableExt(), and continue to destroy the entire hashtable. But we'd still have to reallocate the hashtable, the slot, etc. So having a reset interface seems like the right thing. I guess we could set it up so that BuildTupleHashTableExt() registers a memory context reset callback on tablecxt, which'd reinitialize the hashtable. But that seems like it'd be at least as confusing? > Also, the callers all look like their resets are intended to destroy > the whole hashtable not just its contents (as indeed they were doing, > before the faulty commit). But I didn't attempt to fix that today. Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like that to me? Why would they want to drop the hashtable metadata when resetting? What am I missing? Greetings, Andres Freund
[PATCH] Add schema and table names to partition error
Hello, I'm writing telemetry data into a table partitioned by time. When there is no partition for a particular date, my app notices the constraint violation, creates the partition, and retries the insert. I'm used to handling constraint violations by observing the constraint name in the error fields. However, this error had none. I set out to add the name to the error field, but after a bit of reading my impression is that partition constraints are more like a property of a table. I've attached a patch that adds the schema and table name fields to errors for my use case: - Insert data into a partitioned table for which there is no partition. - Insert data directly into an incorrect partition. Thanks, Chris >From 8261b366c49b2d04baeb882d39dfa626b9315889 Mon Sep 17 00:00:00 2001 From: Chris Bandy Date: Sat, 29 Feb 2020 12:47:56 -0600 Subject: [PATCH] Add schema and table names to partition error --- src/backend/executor/execMain.c | 3 ++- src/backend/executor/execPartition.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git src/backend/executor/execMain.c src/backend/executor/execMain.c index ee5c3a60ff..7cb486b211 100644 --- src/backend/executor/execMain.c +++ src/backend/executor/execMain.c @@ -1878,7 +1878,8 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo, (errcode(ERRCODE_CHECK_VIOLATION), errmsg("new row for relation \"%s\" violates partition constraint", RelationGetRelationName(resultRelInfo->ri_RelationDesc)), - val_desc ? errdetail("Failing row contains %s.", val_desc) : 0)); + val_desc ? errdetail("Failing row contains %s.", val_desc) : 0, + errtable(resultRelInfo->ri_RelationDesc))); } /* diff --git src/backend/executor/execPartition.c src/backend/executor/execPartition.c index c13b1d3501..a5542b92c7 100644 --- src/backend/executor/execPartition.c +++ src/backend/executor/execPartition.c @@ -345,7 +345,8 @@ ExecFindPartition(ModifyTableState *mtstate, RelationGetRelationName(rel)), val_desc ? errdetail("Partition key of the failing row contains %s.", - val_desc) : 0)); + val_desc) : 0, + errtable(rel))); } if (partdesc->is_leaf[partidx]) -- 2.11.0
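To make the use case concrete, the create-and-retry cycle described above
boils down to something like the following (the table and partition names
are invented for illustration). With the patch, the error raised by the
first INSERT also carries the schema and table name fields, so a client
can react to it without parsing the message text:

CREATE TABLE telemetry (recorded_at timestamptz NOT NULL, payload jsonb)
    PARTITION BY RANGE (recorded_at);

INSERT INTO telemetry VALUES ('2020-03-01 12:00', '{}');
-- ERROR:  no partition of relation "telemetry" found for row

CREATE TABLE telemetry_2020_03 PARTITION OF telemetry
    FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');

INSERT INTO telemetry VALUES ('2020-03-01 12:00', '{}');   -- retry succeeds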
Re: subplan resets wrong hashtable
I wrote:
> Right.  So the incorrect ResetTupleHashTable call is unreachable
> (and a look at the code coverage report confirms that).  The whole
> thing obviously is a bit hasty and unreviewed, but it doesn't have
> a live bug AFAICS ... or at least, if there's a bug, it's a memory
> leakage issue across repeat executions, not a crash hazard.

For the archives' sake: this *is* a memory leak, and we dealt with it
at 58c47ccfff20b8c125903482725c1dbfd30beade.

regards, tom lane
Re: Broken resetting of subplan hash tables
Andreas Karlsson writes: > The code for resetting the hash tables of the SubPlanState node in > buildSubPlanHash() looks like it can never run, and additionally it > would be broken if it would ever run. This issue was introduced in > commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd. Right. Justin Pryzby also noted this a couple weeks back, but we didn't deal with it at that point because we were up against a release deadline. > As far as I gather the following is true: > 1. It sets node->hashtable and node->hashnulls to NULL a few lines > before checking if they are not NULL which means the code for resetting > them cannot ever be reached. Yeah. Those lines should have been removed when the ResetTupleHashTable logic was added. > 2. But if we changed to code so that the ResetTupleHashTable() calls are > reachable we would hit a typo. It resets node->hashtable twice and never > resets node->hashnulls which would cause non-obvious bugs. Right. > 3. Additionally since the memory context used by the hash tables is > reset in buildSubPlanHash() if we start resetting hash tables we will > get a use after free bug. Nope, not right. The hash table metadata is now allocated in the es_query_cxt; what is in the hashtablecxt is just tuples, and that does need to be cleared, per the comment for ResetTupleHashTable. Your patch as given results in a nasty memory leak, which is easily demonstrated with a small mod to the regression test case I added: select sum(ss.tst::int) from generate_series(1,1000) o cross join lateral ( select i.ten in (select f1 from int4_tbl where f1 <= o) as tst, random() as r from onek i where i.unique1 = o%1000 ) ss; > Since the current behavior of the code in HEAD is not actually broken, > it is just an optimization which is not used, this fix does not have to > be backpatched. Unfortunately ... this test case also leaks memory like mad in HEAD, v12, and v11, because all of them are rebuilding the hash table from scratch without clearing the old one. So this is indeed broken and a back-patch is necessary. I noted while looking at this that most of the calls of ResetTupleHashTable are actually never reached in the regression tests (per code coverage report) so I made up some test cases that do reach 'em, and included those in the commit. TBH, I think that this tuple table API is seriously misdesigned; it is confusing and very error-prone to have the callers need to reset the tuple context separately from calling ResetTupleHashTable. Also, the callers all look like their resets are intended to destroy the whole hashtable not just its contents (as indeed they were doing, before the faulty commit). But I didn't attempt to fix that today. regards, tom lane
Re: Portal->commandTag as an enum
> On Feb 28, 2020, at 5:42 PM, Tom Lane wrote: > > Mark Dilger writes: >>> On Feb 28, 2020, at 3:05 PM, Tom Lane wrote: >>> Is there a way to drop that logic altogether by making the tagname string >>> be "INSERT 0" for the INSERT case? Or would the zero bleed into other >>> places where we don't want it? > >> In general, I don't think we want to increase the number of distinct >> tags. Which command you finished running and whether you want a >> rowcount and/or lastoid are orthogonal issues. > > Well, my thought is that last_oid is gone and it isn't ever coming back. > So the less code we use supporting a dead feature, the better. > > If we can't remove the special case in EndCommand() altogether, I'd be > inclined to hard-code it as "if (tag == CMDTAG_INSERT ..." rather than > expend infrastructure on treating last_oid as a live option for commands > to have. You may want to think about the embedding of InvalidOid into the EndCommand output differently from how you think about the embedding of the rowcount into the EndCommand output, but my preference is to treat these issues the same and make a strong distinction between the commandtag and the embedded oid and/or rowcount. It's hard to say how many future features would be crippled by having the embedded InvalidOid in the commandtag, but as an example *right now* in the works, we have a feature to count how many commands of a given type have been executed. It stands to reason that feature, whether accepted in its current form or refactored, would not want to show users a pg_stats table like this: cnt command - 5INSERT 0 37 SELECT What the heck is the zero doing after the INSERT? That's the hardcoded InvalidOid that you are apparently arguing for. You could get around that by having the pg_sql_stats patch have its own separate set of command tag strings, but why would we intentionally design that sort of duplication into the solution? As for hardcoding the behavior of whether to embed a rowcount in the output from EndCommand; In src/backend/replication/walsender.c, exec_replication_command() returns "SELECT" from EndCommand, and not "SELECT $rowcount" like everywhere else. The patch as submitted does not change behavior. It only refactors the code while preserving the current behavior. So we would have to agree that the patch can change how exec_replication_command() behaves and start embedding a rowcount there, too, if we want to make SELECT behave the same everywhere. There is another problem, though, which is that if we're hoping to eventually abate this historical behavior and stop embedding InvalidOid and/or rowcount in the commandtag returned from EndCommand, it might be necessary (for backward compatibility with clients) to do that incrementally, in which case we still need the distinction between commandtags and formats to exist in the code. How else can you say that, for example, in the next rev of the protocol that we're not going to embed InvalidOid anymore, but we will continue to return it for clients who connect via the older protocol? What if the next rev of the protocol still returns rowcount, but in a way that doesn't require the clients to implement (or link to) a parser that extracts the rowcount by parsing a string? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pgbench: option delaying queries till connections establishment?
Hi, On 2020-02-29 15:29:19 +0100, Fabien COELHO wrote: > Pgbench is more or less designed to run a long hopefully steady-state > benchmark, so that the initial connection setup is always negligeable. Not > complying with this hypothesis quite often leads to weird results. I don't think this is a good starting point. Sure, a longer run will yield more precise results, and one needs more than just an instantaneous measurement. But in a lot of cases we want to use pgbench to measure a lot of different variations, making it infeasible for each run to be all that long. Of course whether that's feasible depends on the workload (e.g. readonly runs can be shorter than read/write runs). Also note that in the case that made me look at this, you'd have to run the test for *weeks* to drown out the performance difference that's solely caused by difference in how long individual connects are established. Partially because the "excluding connection establishing" number is entirely broken, but also because fewer connections having been established changes the performance so much. I think we should also consider making pgbench actually use non-blocking connection establishment. It seems pretty weird that that's the one libpq operation where we don't? In particular for -C, with -c > -j, that makes the results pretty meaningless. > Adding a synchronization barrier should be simple enough, I thought about it > in the past. > > However, I'd still be wary that it is no silver bullet: if you start a lot > of threads compared to the number of available cores, pgbench would > basically overload the system, and you would experience a lot of waiting > time which reflects that the client code has not got enough cpu time. > Basically you would be testing the OS process/thread management performance. Sure, that's possible. But I don't see what that has to do with the barrier? Also, most scripts actually have client/server interaction... Greetings, Andres Freund
Re: Yet another fast GiST build (typo)
> On 29 Feb 2020, at 17:20, Erik Rijkers wrote: > > Small typo alert: > In v5-0002-Implement-GiST-build-using-sort-support.patch there is: > > + method is also optional and is used diring fast GiST build. > > 'diring' should be 'during' > Thanks! I've fixed this and added a patch with a GiST reloption to provide a sort support function. Best regards, Andrey Borodin. v6-0001-Add-sort-support-for-point-gist_point_sortsupport.patch Description: Binary data v6-0002-Implement-GiST-build-using-sort-support.patch Description: Binary data v6-0003-Function-relopt-for-gist-build.patch Description: Binary data
Re: proposal \gcsv
On Sat, Feb 29, 2020 at 11:59:22AM +0100, Pavel Stehule wrote: > so 29. 2. 2020 v 11:34 odesílatel Vik Fearing > napsal: > > > On 29/02/2020 06:43, Pavel Stehule wrote: > > > Hi > > > > > > I would to enhance \g command about variant \gcsv > > > > > > proposed command has same behave like \g, only the result will be every > > > time in csv format. > > > > > > It can helps with writing psql macros wrapping \g command. > > > > > > Options, notes? > > > > But then we would need \ghtml and \glatex etc. If we want a shortcut > > for setting a one-off format, I would go for \gf or something. > > > > \gf csv > > \gf html > > \gf latex > > > > usability of html or latex format in psql is significantly lower than csv > format. There is only one generic format for data - csv. Not exactly. There's a lot of uses for things along the lines of \gf json \gf yaml I'd rather add a new \gf that takes arguments, as it seems more extensible. For example, there are uses for \gf csv header if no header is the default, or \gf csv noheader if header is the default. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: ALTER tbl rewrite loses CLUSTER ON index
On Fri, Feb 28, 2020 at 08:42:02PM -0600, Justin Pryzby wrote: > On Fri, Feb 28, 2020 at 06:26:04PM -0500, Tom Lane wrote: > > Justin Pryzby writes: > > > I think the attached is 80% complete (I didn't touch pg_dump). > > > One objection to this change would be that all relations (including > > > indices) > > > end up with relclustered fields, and pg_index already has a number of > > > bools, so > > > it's not like this one bool is wasting a byte. > > > I think relisclustered was a's clever way of avoiding that overhead > > > (c0ad5953). > > > So I would be -0.5 on moving it to pg_class.. > > > But I think 0001 and 0002 are worthy. Maybe the test in 0002 should live > > > somewhere else. > > > > 0001 has been superseded by events (faade5d4c), so the cfbot is choking > > on that one's failure to apply, and not testing any further. Please > > repost without 0001 so that we can get this testing again. > > I've just noticed while working on (1) that this separately affects REINDEX > CONCURRENTLY, which would be a new bug in v12. Without CONCURRENTLY there's > no > issue. I guess we need a separate patch for that case. Rebased Amit's patch and included my own 0002 to fix the REINDEX CONCURRENTLY issue. -- Justin >From dccc6f08450eacafc3a08bc8b2836d2feb23a533 Mon Sep 17 00:00:00 2001 From: Amit Langote Date: Thu, 6 Feb 2020 18:14:16 +0900 Subject: [PATCH v4 1/2] ALTER tbl rewrite loses CLUSTER ON index --- src/backend/commands/tablecmds.c | 39 +++ src/test/regress/expected/alter_table.out | 34 src/test/regress/sql/alter_table.sql | 16 +- 3 files changed, 88 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 02a7c04fdb..6b2469bd09 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, static void RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid, Relation rel, List *domname, const char *conname); +static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid); static void TryReuseIndex(Oid oldId, IndexStmt *stmt); static void TryReuseForeignKey(Oid oldId, Constraint *con); static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName, @@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd, newcmd->def = (Node *) stmt; tab->subcmds[AT_PASS_OLD_INDEX] = lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd); + + /* Preserve index's indisclustered property, if set. */ + PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId); } else if (IsA(stm, AlterTableStmt)) { @@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd, rel, NIL, indstmt->idxname); + + /* Preserve index's indisclustered property, if set. */ + PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid); } else if (cmd->subtype == AT_AddConstraint) { @@ -11996,6 +12003,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid, tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd); } +/* + * For a table's index that is to be recreated due to PostAlterType + * processing, preserve its indisclustered property by issuing ALTER TABLE + * CLUSTER ON command on the table that will run after the command to recreate + * the index. 
+ */ +static void +PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid) +{ + HeapTuple indexTuple; + Form_pg_index indexForm; + + Assert(OidIsValid(indoid)); + Assert(pass == AT_PASS_OLD_INDEX); + + indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid)); + if (!HeapTupleIsValid(indexTuple)) + elog(ERROR, "cache lookup failed for index %u", indoid); + indexForm = (Form_pg_index) GETSTRUCT(indexTuple); + + if (indexForm->indisclustered) + { + AlterTableCmd *newcmd = makeNode(AlterTableCmd); + + newcmd->subtype = AT_ClusterOn; + newcmd->name = get_rel_name(indoid); + tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd); + } + + ReleaseSysCache(indexTuple); +} + /* * Subroutine for ATPostAlterTypeParse(). Calls out to CheckIndexCompatible() * for the real analysis, then mutates the IndexStmt based on that verdict. diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index fb6d86a269..a01c6d6ec5 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -4296,3 +4296,37 @@ create trigger xtrig update bar1 set a = a + 1; INFO: a=1, b=1 /* End test case for bug #16242 */ +-- alter type rewrite/rebuild should preserve cluster marking on index +create table alttype_cluster (a int); +create index alttype_cluster_a on alttype_cluster (a); +alter table alttype_cluster cluster on alttype_cluster_a; +select
Re: Binary support for pgoutput plugin
On Fri, 28 Feb 2020 at 18:34, Tom Lane wrote: > Dave Cramer writes: > > Rebased against head > > The cfbot's failing to apply this [1]. It looks like the reason is only > that you included a catversion bump in what you submitted. Protocol is to > *not* do that in submitted patches, but rely on the committer to add it at > the last minute --- otherwise you'll waste a lot of time rebasing the > patch, which is what it needs now. > > regards, tom lane > > [1] http://cfbot.cputube.org/patch_27_2152.log rebased and removed the catversion bump. 0006-make-sure-the-subcription-is-marked-as-binary.patch Description: Binary data 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch Description: Binary data 0002-add-binary-column-to-pg_subscription.patch Description: Binary data 0004-get-relid-inside-of-logical_read_insert.patch Description: Binary data 0005-Remove-C99-declaration-and-code.patch Description: Binary data 0001-First-pass-at-working-code-without-subscription-opti.patch Description: Binary data 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch Description: Binary data
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Sat, Feb 29, 2020 at 03:35:27PM +0300, Alexey Kondratov wrote: > Anyway, new version is attached. It is rebased in order to resolve conflicts > with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes > this small comment fix. Thanks for rebasing - I actually started to do that yesterday. I extracted the bits from your original 0001 patch which handled CLUSTER and VACUUM FULL. I don't think if there's any interest in combining that with ALTER anymore. On another thread (1), I tried to implement that, and Tom pointed out problem with the implementation, but also didn't like the idea. I'm including some proposed fixes, but didn't yet update the docs, errors or tests for that. (I'm including your v8 untouched in hopes of not messing up the cfbot). My fixes avoid an issue if you try to REINDEX onto pg_default, I think due to moving system toast indexes. template1=# REINDEX DATABASE template1 TABLESPACE pg_default; 2020-02-29 08:01:41.835 CST [23382] WARNING: cannot change tablespace of indexes for mapped relations, skipping all WARNING: cannot change tablespace of indexes for mapped relations, skipping all 2020-02-29 08:01:41.894 CST [23382] ERROR: SMgrRelation hashtable corrupted 2020-02-29 08:01:41.894 CST [23382] STATEMENT: REINDEX DATABASE template1 TABLESPACE pg_default; 2020-02-29 08:01:41.894 CST [23382] WARNING: AbortTransaction while in COMMIT state 2020-02-29 08:01:41.895 CST [23382] PANIC: cannot abort transaction 491, it was already committed -- Justin (1) https://www.postgresql.org/message-id/flat/20200208150453.GV403%40telsasoft.com >From e3f7d8e08a00d3c6d02464b81e25b220b5945c89 Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Sat, 29 Feb 2020 15:35:27 +0300 Subject: [PATCH v9 1/3] Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly On 2020-02-11 19:48, Justin Pryzby wrote: > For your v7 patch, which handles REINDEX to a new tablespace, I have a > few > minor comments: > > + * the relation will be rebuilt. If InvalidOid is used, the default > > => should say "currrent", not default ? > Yes, it keeps current index tablespace in that case, thanks. > > +++ b/doc/src/sgml/ref/reindex.sgml > +TABLESPACE > ... > + class="parameter">new_tablespace > > => I saw you split the description of TABLESPACE from new_tablespace > based on > comment earlier in the thread, but I suggest that the descriptions for > these > should be merged, like: > > + > +TABLESPACE class="parameter">new_tablespace > + > + > + Allow specification of a tablespace where all rebuilt indexes > will be created. > + Cannot be used with "mapped" relations. If > SCHEMA, > + DATABASE or SYSTEM are > specified, then > + all unsuitable relations will be skipped and a single > WARNING > + will be generated. > + > + > + > It sounds good to me, but here I just obey the structure, which is used all around. Documentation of ALTER TABLE/DATABASE, REINDEX and many others describes each literal/parameter in a separate entry, e.g. new_tablespace. So I would prefer to keep it as it is for now. > > The existing patch is very natural, especially the parts in the > original patch > handling vacuum full and cluster. Those were removed to concentrate on > REINDEX, and based on comments that it might be nice if ALTER handled > CLUSTER > and VACUUM FULL. On a separate thread, I brought up the idea of ALTER > using > clustered order. Tom pointed out some issues with my implementation, > but > didn't like the idea, either. 
> > So I suggest to re-include the CLUSTER/VAC FULL parts as a separate > 0002 patch, > the same way they were originally implemented. > > BTW, I think if "ALTER" were updated to support REINDEX (to allow > multiple > operations at once), it might be either: > |ALTER INDEX i SET TABLESPACE , REINDEX -- to reindex a single index > on a given tlbspc > or > |ALTER TABLE tbl REINDEX USING INDEX TABLESPACE spc; -- to reindex all > inds on table inds moved to a given tblspc > "USING INDEX TABLESPACE" is already used for ALTER..ADD column/table > CONSTRAINT. > Yes, I also think that allowing REINDEX/CLUSTER/VACUUM FULL to put resulting relation in a different tablespace is a very natural operation. However, I did a couple of attempts to integrate latter two with ALTER TABLE and failed with it, since it is already complex enough. I am still willing to proceed with it, but not sure how soon it will be. Anyway, new version is attached. It is rebased in order to resolve conflicts with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes this small comment fix. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com The Russian Postgres Company >From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Mon, 30 Dec 2019 20:00:37 +0300 Subject: [PATCH v8] Allow REINDEX to change tablespace REINDEX already does full relation rewrite, this
[REPORT] Possible Memory Leak Postgres Windows
Hi, I'm sending this report from DrMemory, which shows some leaks from the current postgres. DrMemory is it is a reliable tool, but it is not perfect. ( https://drmemory.org/) regards, Ranier Vilela Dr. Memory version 2.2.0 build 1 built on Jul 1 2019 00:42:20 Windows version: WinVer=105;Rel=1909;Build=18363;Edition=Core Dr. Memory results for pid 24772: "postgres.exe" Application cmdline: "postgres.exe -D c:/postgres/data -F -c listen_addresses=localhost -k """ Recorded 117 suppression(s) from default c:\DrMemory\bin\suppress-default.txt Error #1: UNINITIALIZED READ: reading register eax # 0 KERNELBASE.dll!FindNextFileW # 1 KERNELBASE.dll!FindNextFileA # 2 readdir [C:\dll\postgres\src\port\dirent.c:96] # 3 ReadDir [C:\dll\postgres\src\backend\storage\file\fd.c:2660] # 4 SlruScanDirectory [C:\dll\postgres\src\backend\access\transam\slru.c:1401] # 5 AsyncShmemInit [C:\dll\postgres\src\backend\commands\async.c:564] # 6 CreateSharedMemoryAndSemaphores [C:\dll\postgres\src\backend\storage\ipc\ipci.c:265] # 7 PostmasterMain [C:\dll\postgres\src\backend\postmaster\postmaster.c:1008] # 8 main [C:\dll\postgres\src\backend\main\main.c:210] Note: @0:00:12.076 in thread 15692 Note: instruction: cmp%eax $0x0018 Error #2: UNINITIALIZED READ: reading register eax # 0 KERNELBASE.dll!FindNextFileW # 1 KERNELBASE.dll!FindNextFileA # 2 readdir [C:\dll\postgres\src\port\dirent.c:96] # 3 ReadDir [C:\dll\postgres\src\backend\storage\file\fd.c:2660] # 4 SlruScanDirectory [C:\dll\postgres\src\backend\access\transam\slru.c:1401] # 5 AsyncShmemInit [C:\dll\postgres\src\backend\commands\async.c:564] # 6 CreateSharedMemoryAndSemaphores [C:\dll\postgres\src\backend\storage\ipc\ipci.c:265] # 7 PostmasterMain [C:\dll\postgres\src\backend\postmaster\postmaster.c:1008] # 8 main [C:\dll\postgres\src\backend\main\main.c:210] Note: @0:00:12.076 in thread 15692 Note: instruction: data16 mov%cx -> 0x0234(%esi,%eax,2) Error #3: UNINITIALIZED READ: reading register eax # 0 KERNELBASE.dll!FindNextFileW # 1 KERNELBASE.dll!FindNextFileA # 2 readdir [C:\dll\postgres\src\port\dirent.c:96] # 3 ReadDir [C:\dll\postgres\src\backend\storage\file\fd.c:2660] # 4 SlruScanDirectory [C:\dll\postgres\src\backend\access\transam\slru.c:1401] # 5 AsyncShmemInit [C:\dll\postgres\src\backend\commands\async.c:564] # 6 CreateSharedMemoryAndSemaphores [C:\dll\postgres\src\backend\storage\ipc\ipci.c:265] # 7 PostmasterMain [C:\dll\postgres\src\backend\postmaster\postmaster.c:1008] # 8 main [C:\dll\postgres\src\backend\main\main.c:210] Note: @0:00:12.076 in thread 15692 Note: instruction: test %eax %eax Error #4: UNINITIALIZED READ: reading register eax # 0 KERNELBASE.dll!FindNextFileW # 1 KERNELBASE.dll!FindNextFileA # 2 readdir [C:\dll\postgres\src\port\dirent.c:96] # 3 RemovePgTempRelationFiles [C:\dll\postgres\src\backend\storage\file\fd.c:3142] # 4 RemovePgTempFiles [C:\dll\postgres\src\backend\storage\file\fd.c:3020] # 5 PostmasterMain [C:\dll\postgres\src\backend\postmaster\postmaster.c:1325] # 6 main [C:\dll\postgres\src\backend\main\main.c:210] Note: @0:00:13.826 in thread 15692 Note: instruction: cmp%eax $0x0018 Error #5: UNINITIALIZED READ: reading register eax # 0 KERNELBASE.dll!FindNextFileW # 1 KERNELBASE.dll!FindNextFileA # 2 readdir [C:\dll\postgres\src\port\dirent.c:96] # 3 RemovePgTempRelationFiles [C:\dll\postgres\src\backend\storage\file\fd.c:3142] # 4 RemovePgTempFiles [C:\dll\postgres\src\backend\storage\file\fd.c:3020] # 5 PostmasterMain [C:\dll\postgres\src\backend\postmaster\postmaster.c:1325] # 6 main 
[C:\dll\postgres\src\backend\main\main.c:210] Note: @0:00:13.826 in thread 15692 Note: instruction: data16 mov%cx -> 0x0234(%esi,%eax,2) Error #6: UNINITIALIZED READ: reading register eax # 0 KERNELBASE.dll!FindNextFileW # 1 KERNELBASE.dll!FindNextFileA # 2 readdir [C:\dll\postgres\src\port\dirent.c:96] # 3 RemovePgTempRelationFilesInDbspace [C:\dll\postgres\src\backend\storage\file\fd.c:3170] # 4 RemoveP
Re: SLRU statistics
On 2020-Feb-29, Tomas Vondra wrote: > Did we actually remove track-enabling GUCs? I think we still have > > - track_activities > - track_counts > - track_io_timing > - track_functions > > But maybe I'm missing something? Hm I remembered we removed the one for row-level stats (track_row_stats), but what we really did is merge it with block-level stats (track_block_stats) into track_counts -- commit 48f7e6439568. Funnily enough, if you disable that autovacuum won't work, so I'm not sure it's a very useful tunable. And it definitely has more overhead than what this new GUC would have. > > I find SlruType pretty odd, and the accompanying "if" list in > > pg_stat_get_slru() correspondingly so. Would it be possible to have > > each SLRU enumerate itself somehow? Maybe add the name in SlruCtlData > > and query that, somehow. (I don't think we have an array of SlruCtlData > > anywhere though, so this might be a useless idea.) > > Well, maybe. We could have a system to register SLRUs dynamically, but > the trick here is that by having a fixed predefined number of SLRUs > simplifies serialization in pgstat.c and so on. I don't think the "if" > branches in pg_stat_get_slru() are particularly ugly, but maybe we could > replace the enum with a registry of structs, something like rmgrlist.h. > It seems like an overkill to me, though. Yeah, maybe we don't have to fix that now. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
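For illustration only, an rmgrlist.h-style registry could look roughly like this; the file name, macro name and entries below are invented, not part of the submitted patch. A single list would then drive both the enum used for serialization in pgstat.c and the name lookup in pg_stat_get_slru():

/* hypothetical slrulist.h: one entry per built-in SLRU, in a fixed order */
/* PG_SLRU(enum symbol, display name) */
PG_SLRU(SLRU_CLOG, "clog")
PG_SLRU(SLRU_SUBTRANS, "subtrans")
PG_SLRU(SLRU_MULTIXACT_OFFSET, "multixact_offset")
/* ... remaining SLRUs ... */

/* elsewhere: expand the list once to build the enum ... */
#define PG_SLRU(sym, name) sym,
typedef enum SlruType
{
#include "slrulist.h"
	SLRU_OTHER					/* keep last so pgstat arrays stay fixed-size */
} SlruType;
#undef PG_SLRU

/* ... and once more to build the name table for pg_stat_get_slru() */
#define PG_SLRU(sym, name) [sym] = name,
static const char *const slru_names[] = {
#include "slrulist.h"
	[SLRU_OTHER] = "other"
};
#undef PG_SLRU

Whether that buys much over the current "if" list is debatable, as noted above.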
Re: pgbench: option delaying queries till connections establishment?
Hello Kyotaro-san, I think this also shows that "including/excluding connections establishing" as well as some of the other stats reported pretty bogus. In the 'before' case a substantial numer of the connections had not yet been established until the end of the test run! I see it useful. In most cases we don't care connection time of pgbench. Especially in the mentioned case the result is just bogus. I think the reason for "including/excluding connection establishing" is not that people wants to see how long connection took to establish but that how long the substantial work took. If each client did run with continuously re-establishing new connections the connection time would be useful, but actually all the connections are established at once at the beginning. So FWIW I prefer that the barrier is applied by default Yep. (that is, it can be disabled) On reflection, I'm not sure I see a use case for not running the barrier if it is available. and the progress time starts at the time all clients has been established. Yep, the start time should be set after the connection barrier, and possibly before a start barrier to ensure that no transaction has started before the start time: although performance measures are expected to be slightly false because of how they are measured (measuring takes time), from a benchmarking perspective the displayed result should be <= the actual performance. Now, again, if long benchmarks are run, which for a db should more or less always be the case, this should not matter much. starting vacuum...end. + time to established 5000 connections: 1323ms Yep, maybe showing the initial connection time separately. -- Fabien.
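To make the proposed ordering concrete, a rough sketch of what each client thread would do (not actual pgbench code; pgbench has its own thread abstraction, and the helper names here are placeholders):

#include <pthread.h>

/* initialized elsewhere with the number of benchmark threads */
extern pthread_barrier_t conn_barrier;

static void *
thread_main(void *arg)
{
	open_connections(arg);			/* placeholder: establish this thread's clients */

	/* nobody proceeds until every thread has all of its connections */
	pthread_barrier_wait(&conn_barrier);

	/*
	 * Take the start timestamp only after the barrier, so tps, latency and
	 * -P progress lines all exclude connection setup and no transaction can
	 * begin before the recorded start time.
	 */
	note_start_time(arg);			/* placeholder */
	run_script_loop(arg);			/* placeholder */
	return NULL;
}

The time spent before the barrier could then be reported on its own line, as in the quoted "time to established 5000 connections" output.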
Re: pgbench: option delaying queries till connections establishment?
Hello Andres, Therefore I'd like to make pgbench wait till it has established all connections, before they run queries. Does anybody else see this as being useful? Yes, I think that having this behavior available would make sense. If so, should this be done unconditionally? Dunno. I should think about it. I'd say probably. Pgbench is more or less designed to run a long hopefully steady-state benchmark, so that the initial connection setup is always negligeable. Not complying with this hypothesis quite often leads to weird results. A new option? Maybe, if not unconditional. If there is an unconditional barrier, the excluding/including connection stuff does not make a lot of sense when not under -C, if it did make any sense before… Included in an existing one somehow? Which one would you suggest? Adding a synchronization barrier should be simple enough, I thought about it in the past. However, I'd still be wary that it is no silver bullet: if you start a lot of threads compared to the number of available cores, pgbench would basically overload the system, and you would experience a lot of waiting time which reflects that the client code has not got enough cpu time. Basically you would be testing the OS process/thread management performance. On my 4-core laptop, with a do-nothing script (\set i 0): sh> pgbench -T 10 -f nope.sql -P 1 -j 10 -c 10 latency average = 0.000 ms latency stddev = 0.049 ms tps = 21048841.630291 (including connections establishing) tps = 21075139.938887 (excluding connections establishing) sh> pgbench -T 10 -f nope.sql -P 1 -j 100 -c 100 latency average = 0.002 ms latency stddev = 0.470 ms tps = 23846777.241370 (including connections establishing) tps = 24132252.146257 (excluding connections establishing) Throughput is slightly better, latency average and variance explode because each thread is given stretches of cpu time to advance, then wait for the next round of cpu time. -- Fabien.
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2020-02-11 19:48, Justin Pryzby wrote: For your v7 patch, which handles REINDEX to a new tablespace, I have a few minor comments: + * the relation will be rebuilt. If InvalidOid is used, the default => should say "currrent", not default ? Yes, it keeps current index tablespace in that case, thanks. +++ b/doc/src/sgml/ref/reindex.sgml +TABLESPACE ... +class="parameter">new_tablespace => I saw you split the description of TABLESPACE from new_tablespace based on comment earlier in the thread, but I suggest that the descriptions for these should be merged, like: + +TABLESPACEnew_tablespace + + + Allow specification of a tablespace where all rebuilt indexes will be created. + Cannot be used with "mapped" relations. If SCHEMA, + DATABASE or SYSTEM are specified, then + all unsuitable relations will be skipped and a single WARNING + will be generated. + + + It sounds good to me, but here I just obey the structure, which is used all around. Documentation of ALTER TABLE/DATABASE, REINDEX and many others describes each literal/parameter in a separate entry, e.g. new_tablespace. So I would prefer to keep it as it is for now. The existing patch is very natural, especially the parts in the original patch handling vacuum full and cluster. Those were removed to concentrate on REINDEX, and based on comments that it might be nice if ALTER handled CLUSTER and VACUUM FULL. On a separate thread, I brought up the idea of ALTER using clustered order. Tom pointed out some issues with my implementation, but didn't like the idea, either. So I suggest to re-include the CLUSTER/VAC FULL parts as a separate 0002 patch, the same way they were originally implemented. BTW, I think if "ALTER" were updated to support REINDEX (to allow multiple operations at once), it might be either: |ALTER INDEX i SET TABLESPACE , REINDEX -- to reindex a single index on a given tlbspc or |ALTER TABLE tbl REINDEX USING INDEX TABLESPACE spc; -- to reindex all inds on table inds moved to a given tblspc "USING INDEX TABLESPACE" is already used for ALTER..ADD column/table CONSTRAINT. Yes, I also think that allowing REINDEX/CLUSTER/VACUUM FULL to put resulting relation in a different tablespace is a very natural operation. However, I did a couple of attempts to integrate latter two with ALTER TABLE and failed with it, since it is already complex enough. I am still willing to proceed with it, but not sure how soon it will be. Anyway, new version is attached. It is rebased in order to resolve conflicts with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes this small comment fix. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com The Russian Postgres Company From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Mon, 30 Dec 2019 20:00:37 +0300 Subject: [PATCH v8] Allow REINDEX to change tablespace REINDEX already does full relation rewrite, this patch adds a possibility to specify a new tablespace where new relfilenode will be created. 
--- doc/src/sgml/ref/reindex.sgml | 24 +- src/backend/catalog/index.c | 75 -- src/backend/commands/cluster.c| 2 +- src/backend/commands/indexcmds.c | 96 --- src/backend/commands/tablecmds.c | 2 +- src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/parser/gram.y | 14 ++-- src/backend/tcop/utility.c| 6 +- src/bin/psql/tab-complete.c | 6 ++ src/include/catalog/index.h | 7 +- src/include/commands/defrem.h | 6 +- src/include/nodes/parsenodes.h| 1 + src/test/regress/input/tablespace.source | 49 src/test/regress/output/tablespace.source | 66 15 files changed, 323 insertions(+), 33 deletions(-) diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index c54a7c420d4..0628c94bb1e 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name +REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ] where option can be one of: @@ -174,6 +174,28 @@ REINDEX [ ( option [, ...] ) ] { IN + +TABLESPACE + + + This specifies a tablespace, where all rebuilt indexes will be created. + Cannot be used with "mapped" relations. If SCHEMA, + DATABASE or SYSTEM is specified, then + all unsuitable relations will be skipped and a single WARNING + will be generated. + + + + + +new_tablespace + + + The name of th
Re: SLRU statistics
On Fri, Feb 28, 2020 at 08:19:18PM -0300, Alvaro Herrera wrote: On 2020-Jan-21, Tomas Vondra wrote: On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote: > I've not tested the performance impact but perhaps we might want to > disable these counter by default and controlled by a GUC. And similar > to buffer statistics it might be better to inline > pgstat_count_slru_page_xxx function for better performance. Hmmm, yeah. Inlining seems like a good idea, and maybe we should have something like track_slru GUC. I disagree with adding a GUC. If a performance impact can be measured let's turn the functions to static inline, as already proposed. My guess is that pgstat_count_slru_page_hit() is the only candidate for that; all the other paths involve I/O or lock acquisition or even WAL generation, so the impact won't be measurable anyhow. We removed track-enabling GUCs years ago. Did we actually remove track-enabling GUCs? I think we still have - track_activities - track_counts - track_io_timing - track_functions But maybe I'm missing something? That being said, I'm not sure we need to add a GUC. I'll do some measurements and we'll see. Maybe the static inline will be enough. BTW, this comment: /* update the stats counter of pages found in shared buffers */ is not strictly true, because we don't use what we normally call "shared buffers" for SLRUs. Oh, right. Will fix. Patch applies cleanly. I suggest to move the page_miss() call until after SlruRecentlyUsed(), for consistency with the other case. OK. I find SlruType pretty odd, and the accompanying "if" list in pg_stat_get_slru() correspondingly so. Would it be possible to have each SLRU enumerate itself somehow? Maybe add the name in SlruCtlData and query that, somehow. (I don't think we have an array of SlruCtlData anywhere though, so this might be a useless idea.) Well, maybe. We could have a system to register SLRUs dynamically, but the trick here is that by having a fixed predefined number of SLRUs simplifies serialization in pgstat.c and so on. I don't think the "if" branches in pg_stat_get_slru() are particularly ugly, but maybe we could replace the enum with a registry of structs, something like rmgrlist.h. It seems like an overkill to me, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
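As a concrete shape for the inlining idea — purely illustrative; the struct, array and field names below are assumptions, not the actual patch — the hit-path counter could live in a header:

/* hypothetical sketch in pgstat.h style (assumes PgStat_Counter from pgstat.h) */
typedef struct PgStat_SLRUStats
{
	PgStat_Counter blocks_hit;
	PgStat_Counter blocks_read;
	PgStat_Counter blocks_written;
	PgStat_Counter blocks_zeroed;
} PgStat_SLRUStats;

extern PgStat_SLRUStats slru_stats[];	/* one slot per SlruType */

static inline void
pgstat_count_slru_page_hit(int slru)
{
	/* cheap enough for the buffer-hit fast path: a single array increment */
	slru_stats[slru].blocks_hit++;
}

The miss, read and write counters could stay as ordinary functions, since those paths already pay for I/O or lock acquisition anyway.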
Re: Yet another fast GiST build (typo)
On 2020-02-29 13:13, Andrey M. Borodin wrote: Hi! On 24 Feb 2020, at 13:50, Andrey M. Borodin wrote: Hi Thomas! Thanks for looking into this! I’ll fix your notices asap. PFA v5. Thomas, I've used your wording almost exactly, with an explanation of how point_zorder_internal() works. It has more explanatory power than my attempts to compose a good comment. Small typo alert: In v5-0002-Implement-GiST-build-using-sort-support.patch there is: + method is also optional and is used diring fast GiST build. 'diring' should be 'during'
Re: Yet another fast GiST build
Hi! > On 24 Feb 2020, at 13:50, Andrey M. Borodin wrote: > > Hi Thomas! > > Thanks for looking into this! I’ll fix your notices asap. PFA v5. Thomas, I've used your wording almost exactly, with an explanation of how point_zorder_internal() works. It has more explanatory power than my attempts to compose a good comment. There is one design decision that worries me most: should we use an opclass function or an index option to provide this sorting information? It is needed only during index creation, actually. And having an extra i-class only for fast build seems excessive. I think we can provide both ways and let opclass developers decide? Thanks! Best regards, Andrey Borodin. v5-0001-Add-sort-support-for-point-gist_point_sortsupport.patch Description: Binary data v5-0002-Implement-GiST-build-using-sort-support.patch Description: Binary data
Re: BUG #15858: could not stat file - over 4GB
On Sat, Feb 29, 2020 at 9:40 AM Juan José Santamaría Flecha < juanjo.santama...@gmail.com> wrote: > On Sat, Feb 29, 2020 at 12:44 AM Tom Lane wrote: > >> >> The cfbot thinks this doesn't compile on Windows [1]. Looks like perhaps >> a missing-#include problem? > > > The define logic for _WIN32_WINNT includes testing of _MSC_VER, and is not > a proper choice for MSVC 2013 as the cfbot is showing. > The cfbot is not happy yet. I will backtrack a bit on the cruft cleanup. Please find attached a new version addressing this issue. Regards, Juan José Santamaría Flecha > >> > 0001-Support-for-large-files-on-Win32-v7.patch Description: Binary data
Re: proposal \gcsv
so 29. 2. 2020 v 11:34 odesílatel Vik Fearing napsal: > On 29/02/2020 06:43, Pavel Stehule wrote: > > Hi > > > > I would to enhance \g command about variant \gcsv > > > > proposed command has same behave like \g, only the result will be every > > time in csv format. > > > > It can helps with writing psql macros wrapping \g command. > > > > Options, notes? > > But then we would need \ghtml and \glatex etc. If we want a shortcut > for setting a one-off format, I would go for \gf or something. > > \gf csv > \gf html > \gf latex > usability of html or latex format in psql is significantly lower than csv format. There is only one generic format for data - csv. Regards Pavel > -1 on \gcsv > -- > Vik Fearing >
Re: proposal \gcsv
On 29/02/2020 06:43, Pavel Stehule wrote: > Hi > > I would to enhance \g command about variant \gcsv > > proposed command has same behave like \g, only the result will be every > time in csv format. > > It can helps with writing psql macros wrapping \g command. > > Options, notes? But then we would need \ghtml and \glatex etc. If we want a shortcut for setting a one-off format, I would go for \gf or something. \gf csv \gf html \gf latex -1 on \gcsv -- Vik Fearing
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema
On Fri, Feb 28, 2020 at 07:23:38PM -0500, Tom Lane wrote: > Will push that, thanks for looking. Thanks for the commit. -- Michael signature.asc Description: PGP signature
Re: proposal: schema variables
pá 28. 2. 2020 v 16:30 odesílatel Pavel Stehule napsal: > > > čt 27. 2. 2020 v 15:37 odesílatel Pavel Stehule > napsal: > >> >> Hi >> >> >>> 3) Any way to define CONSTANTs ? >>> We already talked a bit about this subject and also Gilles Darold >>> introduces it in this mailing-list topic but I'd like to insist on it. >>> I think it would be nice to have a way to say that a variable should not >>> be changed once defined. >>> Maybe it's hard to implement and can be implemented later, but I just >>> want to know if this concern is open. >>> >> >> I played little bit with it and I didn't find any nice solution, but >> maybe I found the solution. I had ideas about some variants, but almost all >> time I had a problem with parser's shifts because all potential keywords >> are not reserved. >> >> last variant, but maybe best is using keyword WITH >> >> So the syntax can looks like >> >> CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT >> expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ] >> >> What do you think about this syntax? It doesn't need any new keyword, and >> it easy to enhance it. >> >> CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT); >> > > After some more thinking and because in other patch I support syntax > CREATE TRANSACTION VARIABLE ... I change my opinion and implemented support > for > syntax CREATE IMMUTABLE VARIABLE for define constants. > second try to fix pg_dump Regards Pavel > > See attached patch > > Regards > > Pavel > > >> >> ? >> >> Regards >> >> Pavel >> >> >> schema-variables-20200229.patch.gz Description: application/gzip
Broken resetting of subplan hash tables
Hi, I looked again at one of the potential issues Ranier Vilela's static analysis found and after looking more at it I still think this one is a real bug. But my original patch was incorrect and introduced a use after free bug. The code for resetting the hash tables of the SubPlanState node in buildSubPlanHash() looks like it can never run, and additionally it would be broken if it would ever run. This issue was introduced in commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd. As far as I gather the following is true: 1. It sets node->hashtable and node->hashnulls to NULL a few lines before checking if they are not NULL which means the code for resetting them cannot ever be reached. 2. But if we changed to code so that the ResetTupleHashTable() calls are reachable we would hit a typo. It resets node->hashtable twice and never resets node->hashnulls which would cause non-obvious bugs. 3. Additionally since the memory context used by the hash tables is reset in buildSubPlanHash() if we start resetting hash tables we will get a use after free bug. I have attached a patch which makes sure the code for resetting the hash tables is actually run while also fixing the code for resetting them. Since the current behavior of the code in HEAD is not actually broken, it is just an optimization which is not used, this fix does not have to be backpatched. Andreas diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index ff95317879..4d7306ff06 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -494,9 +494,6 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext) * If it's not necessary to distinguish FALSE and UNKNOWN, then we don't * need to store subplan output rows that contain NULL. */ - MemoryContextReset(node->hashtablecxt); - node->hashtable = NULL; - node->hashnulls = NULL; node->havehashrows = false; node->havenullrows = false; @@ -533,7 +530,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext) } if (node->hashnulls) - ResetTupleHashTable(node->hashtable); + ResetTupleHashTable(node->hashnulls); else node->hashnulls = BuildTupleHashTableExt(node->parent, node->descRight, @@ -549,6 +546,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext) node->hashtempcxt, false); } + else + node->hashnulls = NULL; /* * We are probably in a short-lived expression-evaluation context. Switch
Re: Parallel copy
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma wrote: > > On Tue, 25 Feb 2020 at 18:00, Tomas Vondra > wrote: > > Perhaps. I guess it'll depend on the CSV file (number of fields, ...), > > so I still think we need to do some measurements first. I'm willing to > > do that, but (a) I doubt I'll have time for that until after 2020-03, > > and (b) it'd be good to agree on some set of typical CSV files. > > I agree that getting a nice varied dataset would be nice. Including > things like narrow integer only tables, strings with newlines and > escapes in them, extremely wide rows. > > I tried to capture a quick profile just to see what it looks like. > Grabbed a random open data set from the web, about 800MB of narrow > rows CSV [1]. > > Script: > CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count > text); > COPY census FROM '.../Data8277.csv' WITH (FORMAT 'csv', HEADER true); > > Profile: > # Samples: 59K of event 'cycles:u' > # Event count (approx.): 57644269486 > # > # Overhead Command Shared Object Symbol > # .. > ... > # > 18.24% postgres postgres[.] CopyReadLine > 9.23% postgres postgres[.] NextCopyFrom > 8.87% postgres postgres[.] NextCopyFromRawFields > 5.82% postgres postgres[.] pg_verify_mbstr_len > 5.45% postgres postgres[.] pg_strtoint32 > 4.16% postgres postgres[.] heap_fill_tuple > 4.03% postgres postgres[.] heap_compute_data_size > 3.83% postgres postgres[.] CopyFrom > 3.78% postgres postgres[.] AllocSetAlloc > 3.53% postgres postgres[.] heap_form_tuple > 2.96% postgres postgres[.] InputFunctionCall > 2.89% postgres libc-2.30.so[.] __memmove_avx_unaligned_erms > 1.82% postgres libc-2.30.so[.] __strlen_avx2 > 1.72% postgres postgres[.] AllocSetReset > 1.72% postgres postgres[.] RelationPutHeapTuple > 1.47% postgres postgres[.] heap_prepare_insert > 1.31% postgres postgres[.] heap_multi_insert > 1.25% postgres postgres[.] textin > 1.24% postgres postgres[.] int4in > 1.05% postgres postgres[.] tts_buffer_heap_clear > 0.85% postgres postgres[.] pg_any_to_server > 0.80% postgres postgres[.] pg_comp_crc32c_sse42 > 0.77% postgres postgres[.] cstring_to_text_with_len > 0.69% postgres postgres[.] AllocSetFree > 0.60% postgres postgres[.] appendBinaryStringInfo > 0.55% postgres postgres[.] > tts_buffer_heap_materialize.part.0 > 0.54% postgres postgres[.] palloc > 0.54% postgres libc-2.30.so[.] __memmove_avx_unaligned > 0.51% postgres postgres[.] palloc0 > 0.51% postgres postgres[.] pg_encoding_max_length > 0.48% postgres postgres[.] enlargeStringInfo > 0.47% postgres postgres[.] ExecStoreVirtualTuple > 0.45% postgres postgres[.] PageAddItemExtended > > So that confirms that the parsing is a huge chunk of overhead with > current splitting into lines being the largest portion. Amdahl's law > says that splitting into tuples needs to be made fast before > parallelizing makes any sense. > I have ran very simple case on table with 2 indexes and I can see a lot of time is spent in index insertion. I agree that there is a good amount of time spent in tokanizing but it is not very huge compared to index insertion. I have expanded the time spent in the CopyFrom function from my perf report and pasted here. We can see that a lot of time is spent in ExecInsertIndexTuples(77%). I agree that we need to further evaluate that out of which how much is I/O vs CPU operations. But, the point I want to make is that it's not true for all the cases that parsing is taking maximum amout of time. 
- 99.50% CopyFrom - 82.90% CopyMultiInsertInfoFlush - 82.85% CopyMultiInsertBufferFlush + 77.68% ExecInsertIndexTuples + 3.74% table_multi_insert + 0.89% ExecClearTuple - 12.54% NextCopyFrom - 7.70% NextCopyFromRawFields - 5.72% CopyReadLine 3.96% CopyReadLineText + 1.49% pg_any_to_server 1.86% CopyReadAttributesCSV + 3.68% InputFunctionCall + 2.11% ExecMaterializeSlot + 0.94% MemoryContextReset My test: -- Prepare: CREATE TABLE t (a int, b int, c varchar); insert into t select i,i, '' from generate_series(1,1000) as i; copy t to '/home/dilipkumar/a.csv' WITH (FORMAT 'csv', HEADER true); truncate table t; create index idx on t(a); create index idx1 on t(
Re: BUG #15858: could not stat file - over 4GB
On Sat, Feb 29, 2020 at 12:44 AM Tom Lane wrote: > > The cfbot thinks this doesn't compile on Windows [1]. Looks like perhaps > a missing-#include problem? The define logic for _WIN32_WINNT includes testing of _MSC_VER, and is not a proper choice for MSVC 2013 as the cfbot is showing. Please find attached a new version addressing this issue. Regards, Juan José Santamaría Flecha > > 0001-Support-for-large-files-on-Win32-v6.patch Description: Binary data
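For context, the pre-existing _WIN32_WINNT selection in src/include/port/win32.h is roughly of the following shape (paraphrased; treat the exact values as approximate), which is why an MSVC 2013 build (_MSC_VER 1800) ends up targeting an older Windows API level than the large-file support expects:

#if defined(_MSC_VER) && _MSC_VER >= 1900
#define MIN_WINNT 0x0600		/* Windows Vista+, Visual Studio 2015 and later */
#else
#define MIN_WINNT 0x0501		/* Windows XP, for older MSVC and MinGW */
#endif

#ifndef _WIN32_WINNT
#define _WIN32_WINNT MIN_WINNT
#endif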