Re: bool_plperl transform

2020-02-29 Thread Andrew Dunstan


On 2/29/20 4:55 PM, Ivan Panchenko wrote:
> Hi,
> While using PL/Perl I have found that it obtains boolean arguments
> from Postgres as ‘t’ and ‘f’, which is extremely inconvenient because
> ‘f’ is not false from the perl viewpoint.
> So the problem is how to convert the SQL booleans into Perl style.
>  
> There are 3 ways to do this:
>
>  1. make plperl automatically convert bools into something acceptable
> for perl. This looks simple, but probably is not acceptable as it
> breaks compatibility.
>  2. try to make some trick like it is done with arrays, i.e. convert
> bools into special Perl objects which look like ‘t’ and ‘f’ when
> treated as text, but are true and false for boolean operations. I
> am not sure that it is possible and reliable.
>  3. make a transform which transforms bool, like it is done with
> jsonb. This does not break compatibility and is rather
> straightforward.
>
> So I propose to take the third way and make such transform. This is
> very simple, a patch is attached.
> Also this patch improves the plperl documentation page, which now has
> nothing said about the transforms.
>  
>


Patch appears to be missing all the new files.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.

2020-02-29 Thread Kohei KaiGai
Hi Amit,

Thanks, I didn't check the thread.

It looks to me like the latest patch was submitted by Fujita-san in Oct 2018.
Then Tom pointed out that this simple approach has a problem of producing
inefficient remote query plans, because postgres_fdw has no intelligence about
the structure of the remote tables it maps. After that, the patch was left
alone for a year.

Indeed, it is not an ideal query plan to execute for each updated row...

postgres=# explain select * from rtable_parent where tableoid = 126397
and ctid = '(0,11)'::tid;
   QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..5.18 rows=2 width=50)
   ->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
 Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
   ->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
 TID Cond: (ctid = '(0,11)'::tid)
 Filter: (tableoid = '126397'::oid)
(6 rows)

Rather than refactoring postgres_fdw, is it possible to have a built-in
partition pruning rule for when "tableoid = " is supplied?
If the partition mechanism had that feature, this should not be a
complicated problem.

Best regards,

On Sun, Mar 1, 2020 at 12:39 PM Amit Langote  wrote:
>
> Hi,
>
> On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai  wrote:
> >
> > Hello,
> >
> > I noticed the following scenario under the development of truncate
> > support on FDW.
> >
> > In case when 'ftable' maps a remote table that has inherited children,...
> >
> > postgres=# create table rtable_parent (id int, label text, x text);
> > CREATE TABLE
> > postgres=# create table rtable_child () inherits (rtable_parent);
> > CREATE TABLE
> > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
> > from generate_series(1,10) x);
> > INSERT 0 10
> > postgres=# insert into rtable_child (select x, 'child', md5(x::text)
> > from generate_series(6,15) x);
> > INSERT 0 10
> > postgres=# create foreign table ftable (id int, label text, x text)
> >server loopback options (table_name 'rtable_parent');
> > CREATE FOREIGN TABLE
> >
> > The 'ftable' shows the results from both of the parent and children.
> > postgres=# select * from ftable;
> >  id | label  |x
> > ++--
> >   1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >   2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >   3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >   4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >   5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >   6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | parent | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | parent | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | parent | d3d9446802a44259755d38e6d163e820
> >   6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | child  | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | child  | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | child  | d3d9446802a44259755d38e6d163e820
> >  11 | child  | 6512bd43d9caa6e02c990b0a82652dca
> >  12 | child  | c20ad4d76fe97759aa27a0c99bff6710
> >  13 | child  | c51ce410c124a10e0db5e4b97fc2af39
> >  14 | child  | aab3238922bcc25a6f606eb525ffdc56
> >  15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
> > (20 rows)
> >
> > When we try to update the foreign-table without DirectUpdate mode,
> > remote query tries to update the rows specified by "ctid" system column.
> > However, it was not a unique key in this case.
> >
> > postgres=# explain update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> >  QUERY PLAN
> > -
> >  Update on ftable  (cost=100.00..133.80 rows=414 width=74)
> >->  Result  (cost=100.00..133.80 rows=414 width=74)
> >  One-Time Filter: (pg_backend_pid() > 0)
> >  ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
> > (4 rows)
> >
> > [*] Note that pg_backend_pid() prevent direct update.
> >
> > postgres=# update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> > UPDATE 5
> > postgres=# select ctid,* from ftable;
> >   ctid  | id | label  |x
> > +++--
> >  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >  (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >  (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >  (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >  (0,11) |  6 | parent | updated
> >  (0,12) |  7 | parent | updated
> >  (0,13) |  8 | parent | updated
> >  (0,14) |  9 | parent | updated
> >  (0,15) | 10 | parent | updated
> >  (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >  (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
> 

Re: [PATCH] Add schema and table names to partition error

2020-02-29 Thread Amit Langote
Hi Chris,

On Sun, Mar 1, 2020 at 4:34 AM Chris Bandy  wrote:
> Hello,
>
> I'm writing telemetry data into a table partitioned by time. When there
> is no partition for a particular date, my app notices the constraint
> violation, creates the partition, and retries the insert.
>
> I'm used to handling constraint violations by observing the constraint
> name in the error fields. However, this error had none. I set out to add
> the name to the error field, but after a bit of reading my impression is
> that partition constraints are more like a property of a table.

This makes sense to me.  The btree code, which implements unique
constraints, also does this; see the _bt_check_unique() function in
src/backend/access/nbtree/nbtinsert.c:

ereport(ERROR,
        (errcode(ERRCODE_UNIQUE_VIOLATION),
         errmsg("duplicate key value violates unique constraint \"%s\"",
                RelationGetRelationName(rel)),
         key_desc ? errdetail("Key %s already exists.",
                              key_desc) : 0,
         errtableconstraint(heapRel,
                            RelationGetRelationName(rel))));

> I've attached a patch that adds the schema and table name fields to
> errors for my use case:

Instead of using errtable(), use errtableconstraint() like the btree
code does, if only for consistency.
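
For example, the partitioned-table case could look roughly like this (a
sketch of that suggestion only, reusing the names from your patch and
passing the relation name where the btree code passes a constraint name,
since a partition constraint has no name of its own; untested):

    ereport(ERROR,
            (errcode(ERRCODE_CHECK_VIOLATION),
             errmsg("new row for relation \"%s\" violates partition constraint",
                    RelationGetRelationName(resultRelInfo->ri_RelationDesc)),
             val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
             errtableconstraint(resultRelInfo->ri_RelationDesc,
                                RelationGetRelationName(resultRelInfo->ri_RelationDesc))));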

> - Insert data into a partitioned table for which there is no partition.
> - Insert data directly into an incorrect partition.

There are a couple more instances in src/backend/commands/tablecmds.c
where the partition constraint is checked:

In ATRewriteTable():

if (partqualstate && !ExecCheck(partqualstate, econtext))
{
    if (tab->validate_default)
        ereport(ERROR,
                (errcode(ERRCODE_CHECK_VIOLATION),
                 errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
                        RelationGetRelationName(oldrel))));
    else
        ereport(ERROR,
                (errcode(ERRCODE_CHECK_VIOLATION),
                 errmsg("partition constraint of relation \"%s\" is violated by some row",
                        RelationGetRelationName(oldrel))));
}

Maybe better to fix these too, for completeness.

Thanks,
Amit




Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.

2020-02-29 Thread Amit Langote
Hi,

On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai  wrote:
>
> Hello,
>
> I noticed the following scenario under the development of truncate
> support on FDW.
>
> In case when 'ftable' maps a remote table that has inherited children,...
>
> postgres=# create table rtable_parent (id int, label text, x text);
> CREATE TABLE
> postgres=# create table rtable_child () inherits (rtable_parent);
> CREATE TABLE
> postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
> from generate_series(1,10) x);
> INSERT 0 10
> postgres=# insert into rtable_child (select x, 'child', md5(x::text)
> from generate_series(6,15) x);
> INSERT 0 10
> postgres=# create foreign table ftable (id int, label text, x text)
>server loopback options (table_name 'rtable_parent');
> CREATE FOREIGN TABLE
>
> The 'ftable' shows the results from both of the parent and children.
> postgres=# select * from ftable;
>  id | label  |x
> ++--
>   1 | parent | c4ca4238a0b923820dcc509a6f75849b
>   2 | parent | c81e728d9d4c2f636f067f89cc14862c
>   3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
>   4 | parent | a87ff679a2f3e71d9181a67b7542122c
>   5 | parent | e4da3b7fbbce2345d7772b0674a318d5
>   6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
>   7 | parent | 8f14e45fceea167a5a36dedd4bea2543
>   8 | parent | c9f0f895fb98ab9159f51fd0297e236d
>   9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  10 | parent | d3d9446802a44259755d38e6d163e820
>   6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
>   7 | child  | 8f14e45fceea167a5a36dedd4bea2543
>   8 | child  | c9f0f895fb98ab9159f51fd0297e236d
>   9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  10 | child  | d3d9446802a44259755d38e6d163e820
>  11 | child  | 6512bd43d9caa6e02c990b0a82652dca
>  12 | child  | c20ad4d76fe97759aa27a0c99bff6710
>  13 | child  | c51ce410c124a10e0db5e4b97fc2af39
>  14 | child  | aab3238922bcc25a6f606eb525ffdc56
>  15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
> (20 rows)
>
> When we try to update the foreign-table without DirectUpdate mode,
> remote query tries to update the rows specified by "ctid" system column.
> However, it was not a unique key in this case.
>
> postgres=# explain update ftable set x = 'updated' where id > 10 and
> pg_backend_pid() > 0;
>  QUERY PLAN
> -
>  Update on ftable  (cost=100.00..133.80 rows=414 width=74)
>->  Result  (cost=100.00..133.80 rows=414 width=74)
>  One-Time Filter: (pg_backend_pid() > 0)
>  ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
> (4 rows)
>
> [*] Note that pg_backend_pid() prevent direct update.
>
> postgres=# update ftable set x = 'updated' where id > 10 and
> pg_backend_pid() > 0;
> UPDATE 5
> postgres=# select ctid,* from ftable;
>   ctid  | id | label  |x
> +++--
>  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
>  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
>  (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
>  (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
>  (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
>  (0,11) |  6 | parent | updated
>  (0,12) |  7 | parent | updated
>  (0,13) |  8 | parent | updated
>  (0,14) |  9 | parent | updated
>  (0,15) | 10 | parent | updated
>  (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
>  (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
>  (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
>  (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
>  (0,11) | 11 | child  | updated
>  (0,12) | 12 | child  | updated
>  (0,13) | 13 | child  | updated
>  (0,14) | 14 | child  | updated
>  (0,15) | 15 | child  | updated
> (20 rows)
>
> The WHERE-clause (id > 10) should affect only child table.
> However, it updated the rows in the parent table with same ctid.
>
> How about your thought?
> Probably, we need to fetch a pair of tableoid and ctid to identify
> the remote table exactly, if not direct-update cases.

This was discussed on this thread:

https://www.postgresql.org/message-id/CAFjFpRfcgwsHRmpvoOK-GUQi-n8MgAS%2BOxcQo%3DaBDn1COywmcg%40mail.gmail.com

Solutions have been proposed too, but none finalized yet.

Thanks,
Amit




[BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.

2020-02-29 Thread Kohei KaiGai
Hello,

I noticed the following scenario under the development of truncate
support on FDW.

Consider the case where 'ftable' maps a remote table that has inherited children...

postgres=# create table rtable_parent (id int, label text, x text);
CREATE TABLE
postgres=# create table rtable_child () inherits (rtable_parent);
CREATE TABLE
postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
from generate_series(1,10) x);
INSERT 0 10
postgres=# insert into rtable_child (select x, 'child', md5(x::text)
from generate_series(6,15) x);
INSERT 0 10
postgres=# create foreign table ftable (id int, label text, x text)
   server loopback options (table_name 'rtable_parent');
CREATE FOREIGN TABLE

The 'ftable' shows the results from both the parent and the children.
postgres=# select * from ftable;
 id | label  |x
++--
  1 | parent | c4ca4238a0b923820dcc509a6f75849b
  2 | parent | c81e728d9d4c2f636f067f89cc14862c
  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | parent | a87ff679a2f3e71d9181a67b7542122c
  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
  6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | parent | 8f14e45fceea167a5a36dedd4bea2543
  8 | parent | c9f0f895fb98ab9159f51fd0297e236d
  9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | parent | d3d9446802a44259755d38e6d163e820
  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | child  | d3d9446802a44259755d38e6d163e820
 11 | child  | 6512bd43d9caa6e02c990b0a82652dca
 12 | child  | c20ad4d76fe97759aa27a0c99bff6710
 13 | child  | c51ce410c124a10e0db5e4b97fc2af39
 14 | child  | aab3238922bcc25a6f606eb525ffdc56
 15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
(20 rows)

When we try to update the foreign table without DirectUpdate mode, the
remote query tries to update the rows identified by the "ctid" system column.
However, ctid is not a unique key in this case.

postgres=# explain update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
 QUERY PLAN
-----------------------------------------------------------------------
 Update on ftable  (cost=100.00..133.80 rows=414 width=74)
   ->  Result  (cost=100.00..133.80 rows=414 width=74)
 One-Time Filter: (pg_backend_pid() > 0)
 ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
(4 rows)

[*] Note that pg_backend_pid() prevents direct update.

postgres=# update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
UPDATE 5
postgres=# select ctid,* from ftable;
  ctid  | id | label  |x
+++--
 (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
 (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
 (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
 (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
 (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
 (0,11) |  6 | parent | updated
 (0,12) |  7 | parent | updated
 (0,13) |  8 | parent | updated
 (0,14) |  9 | parent | updated
 (0,15) | 10 | parent | updated
 (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
 (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
 (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
 (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
 (0,11) | 11 | child  | updated
 (0,12) | 12 | child  | updated
 (0,13) | 13 | child  | updated
 (0,14) | 14 | child  | updated
 (0,15) | 15 | child  | updated
(20 rows)

The WHERE clause (id > 10) should affect only the child table.
However, it also updated rows in the parent table that have the same ctid.

What are your thoughts?
Probably we need to fetch the pair of tableoid and ctid to identify the
remote row exactly, at least in the non-direct-update case.
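
For illustration, the (tableoid, ctid) pair distinguishes remote rows that
share a ctid across the inheritance tree (query against the tables above,
run before the UPDATE; output shown only as an illustration):

    SELECT tableoid::regclass, ctid, id, label
      FROM rtable_parent
     WHERE id IN (6, 11);
        tableoid    | ctid  | id | label
     ---------------+-------+----+--------
      rtable_parent | (0,6) |  6 | parent
      rtable_child  | (0,6) | 11 | child
    (2 rows)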

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Auxiliary Processes and MyAuxProc

2020-02-29 Thread Mike Palmiotto
On Sat, Nov 30, 2019 at 9:15 PM Michael Paquier  wrote:
>
> On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote:
> > Color me unconvinced.
>
> The latest comments of the thread have not been addressed yet, so I am
> marking the patch as returned with feedback.  If you think that's not
> correct, please feel free to update the status of the patch.  If you
> do so, please provide at the same time a rebased version of the patch,
> as it is failing to apply on HEAD.

I've finally had some time to update the patchset with an attempt at
addressing Andres' concerns. Note that the current spin has a bug
which does not allow syslogger to run properly. Additionally, it is
squashed into one patch again.  I intend to split the patch out into
separate functional patches and fix the syslogger issue, but wanted to
get this on the ticket for the open CommitFest before it closes. I'll
go ahead and re-add it and will be pushing updates as I have them.

I intend to give this patchset the time it deserves, so will be much
more responsive this go-around.

Thanks,

-- 
Mike Palmiotto
https://crunchydata.com
From 34feb7399fe218eab0da851032299f81e7ad6689 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Tue, 19 Feb 2019 15:29:33 +
Subject: [PATCH 2/2] Add a hook to allow extensions to set worker metadata

This patch implements a hook that allows extensions to modify a worker
process' metadata in backends.

Additional work done by Yuli Khodorkovskiy 
Original discussion here: https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO%3DhtC6LnA6aW4r-%2Bjq%3D3Q5RAoFQgW8EtA%40mail.gmail.com
---
 src/backend/postmaster/fork_process.c | 11 +++
 src/include/postmaster/fork_process.h |  4 
 2 files changed, 15 insertions(+)

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index def3cee37e..e392f5207e 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,14 @@
 
 #include "postmaster/fork_process.h"
 
+/*
+ * This hook allows plugins to get control of the child process following
+ * a successful fork_process(). It can be used to perform some action in
+ * extensions to set metadata in backends (including special backends) upon
+ * setting of the session user.
+ */
+ForkProcess_post_hook_type ForkProcess_post_hook = NULL;
+
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
@@ -114,6 +122,9 @@ fork_process(void)
 #ifdef USE_OPENSSL
 		RAND_cleanup();
 #endif
+
+		if (ForkProcess_post_hook)
+			(*ForkProcess_post_hook) ();
 	}
 
 	return result;
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index a42f859a08..b4d1560b72 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -20,4 +20,8 @@ extern pid_t fork_process(void);
 extern pid_t postmaster_forkexec(int argc, char *argvp[]);
 #endif
 
+/* Hook for plugins to get control after a successful fork_process() */
+typedef void (*ForkProcess_post_hook_type) ();
+extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
+
 #endif			/* FORK_PROCESS_H */
-- 
2.21.0

From 6fb91ba335153882d16d2dff28d584c3c0bc83a2 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Fri, 27 Sep 2019 12:28:19 -0400
Subject: [PATCH 1/2] Introduce subprocess infrastructure

This is an initial attempt at coalescing all of the postmaster
subprocess startups into one central framework. The patchset introduces
a MySubprocess global, which contains a PgSubprocess entry from the
process_types array. Each entry has defined entrypoints, cleanup
functions, prep functions, and a few other control-oriented variables.

This is a rework of https://commitfest.postgresql.org/25/2259/, which
attempts to address the feedback from Andres Freund.
---
 src/backend/bootstrap/bootstrap.c   |  78 +-
 src/backend/main/main.c |   3 +
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/autovacuum.c | 165 +---
 src/backend/postmaster/bgworker.c   | 151 +++-
 src/backend/postmaster/bgwriter.c   |   2 +-
 src/backend/postmaster/checkpointer.c   |   2 +-
 src/backend/postmaster/pgarch.c |  83 +--
 src/backend/postmaster/pgstat.c |  89 +--
 src/backend/postmaster/postmaster.c | 786 +++-
 src/backend/postmaster/startup.c|   8 +-
 src/backend/postmaster/subprocess.c | 206 +
 src/backend/postmaster/syslogger.c  | 278 +++
 src/backend/postmaster/walwriter.c  |   2 +-
 src/backend/replication/walreceiver.c   |   2 +-
 src/backend/utils/init/globals.c|   3 +-
 src/backend/utils/init/postinit.c   |   2 +-
 src/include/bootstrap/bootstrap.h   |   4 +-
 src/include/miscadmin.h |   1 +
 src/include/pgstat.h|   5 +-
 src/include/postmaster/autovacu

Re: TRUNCATE on foreign tables

2020-02-29 Thread Kohei KaiGai
Hello,

The attached is the revised version.

> > If callback is invoked with a foreign-relation that is specified by TRUNCATE
> > command with ONLY, it seems to me reasonable that remote TRUNCATE
> > command specifies the relation on behalf of the foreign table with ONLY.
> >
> > So, if ExecForeignTruncate() has another list to inform the context for each
> > relation, postgres_fdw can build proper remote query that may specify the
> > remote tables with ONLY-clause.
>
> Yeah, TRUNCATE can specify ONLY on a per-table basis, so having a
> second list makes sense.  Then in the FDW, just make sure to
> elog(ERROR) if the lengths do no match, and then use forboth() to loop
> over them.  One thing that you need to be careful about is that tables
> which are added to the list because of inheritance should not be
> marked with ONLY when generating the command to the remote.
>
The v5 patch adds a separate list for the FDW callback, to inform it of the
context in which each relation was specified in the TRUNCATE command. The
frels_extra argument is a list of integers: 0 means the relevant foreign
table was specified without an "ONLY" clause, a positive value means it was
specified with "ONLY", and a negative value means the foreign table was not
specified in the TRUNCATE command itself but is truncated due to a
dependency (like a partition's child leaf).

The remote SQL generation builds the TRUNCATE command according to the above
"extra" information. So "TRUNCATE ONLY ftable" generates a remote query
with "TRUNCATE ONLY mapped_remote_table".
On the other hand, it can produce strange results, although this is a corner
case. The example below shows the result of TRUNCATE ONLY on a foreign table
that maps a remote table with inherited children. The rows with id < 10
belong to the parent table, so TRUNCATE ONLY tru_ftable empties the remote
parent; however, tru_ftable still appears to contain rows after the TRUNCATE
command.

I wonder whether this behavior is sensible for users. Of course, the "ONLY"
clause controls the local hierarchy of partitioned / inherited tables, but
I'm not certain whether that concept should be extended to the structure of
the remote tables.

+SELECT * FROM tru_ftable;
+ id |x
++--
+  5 | e4da3b7fbbce2345d7772b0674a318d5
+  6 | 1679091c5a880faf6fb5e6087eb1b2dc
+  7 | 8f14e45fceea167a5a36dedd4bea2543
+  8 | c9f0f895fb98ab9159f51fd0297e236d
+  9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ 10 | d3d9446802a44259755d38e6d163e820
+ 11 | 6512bd43d9caa6e02c990b0a82652dca
+ 12 | c20ad4d76fe97759aa27a0c99bff6710
+ 13 | c51ce410c124a10e0db5e4b97fc2af39
+ 14 | aab3238922bcc25a6f606eb525ffdc56
+(10 rows)
+
+TRUNCATE ONLY tru_ftable;  -- truncate only parent portion
+SELECT * FROM tru_ftable;
+ id |x
++--
+ 10 | d3d9446802a44259755d38e6d163e820
+ 11 | 6512bd43d9caa6e02c990b0a82652dca
+ 12 | c20ad4d76fe97759aa27a0c99bff6710
+ 13 | c51ce410c124a10e0db5e4b97fc2af39
+ 14 | aab3238922bcc25a6f606eb525ffdc56
+(5 rows)

> > Regarding to the other comments, it's all Ok for me. I'll update the patch.
> > And, I forgot "updatable" option at postgres_fdw. It should be checked on
> > the truncate also, right?
>
> Hmm.  Good point.  Being able to filter that silently through a
> configuration parameter is kind of interesting.  Now I think that this
> should be a separate option because updatable applies to DMLs.  Like,
> truncatable?
>
Ok, "truncatable" option was added.
Please check the regression test and documentation updates.
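
Assuming it mirrors the syntax of postgres_fdw's existing "updatable"
option, usage would look like (example only, please correct me if the patch
spells it differently):

    ALTER SERVER loopback OPTIONS (ADD truncatable 'false');
    ALTER FOREIGN TABLE ftable OPTIONS (ADD truncatable 'true');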

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql13-truncate-on-foreign-table.v5.patch
Description: Binary data


Re: bool_plperl transform

2020-02-29 Thread Tom Lane
Ivan Panchenko  writes:
> While using PL/Perl I have found that it obtains boolean arguments from 
> Postgres as ‘t’ and ‘f’, which is extremely inconvenient because ‘f’ is not 
> false from the perl viewpoint.
> ...
> *  make a transform which transforms bool, like it is done with jsonb. This 
> does not break compatibility and is rather straightforward.

Please register this patch in the commitfest app, so we don't lose track
of it.

https://commitfest.postgresql.org/27/

regards, tom lane




Re: Allowing ALTER TYPE to change storage strategy

2020-02-29 Thread Tom Lane
Tomas Vondra  writes:
> On Fri, Feb 28, 2020 at 08:35:33PM -0500, Tom Lane wrote:
>> You'd need a moderately strong lock on each such table, which means
>> there'd be serious deadlock hazards.  I'm dubious that it's worth
>> troubling with.

> Yeah, I don't plan to do this in v1 (and I have no immediate plan to
> work on it after that). But I wonder how is the deadlock risk any
> different compared e.g. to DROP TYPE ... CASCADE?

True, but dropping a type you're actively using seems pretty improbable;
whereas the whole point of the patch you're proposing is that people
*would* want to use it in production.

regards, tom lane




bool_plperl transform

2020-02-29 Thread Ivan Panchenko

Hi,
While using PL/Perl I have found that it receives boolean arguments from 
Postgres as ‘t’ and ‘f’, which is extremely inconvenient because ‘f’ is not 
false from the Perl viewpoint.
So the problem is how to convert SQL booleans into Perl style.
 
There are 3 ways to do this:
*  make plperl automatically convert bools into something acceptable for Perl. 
This looks simple, but probably is not acceptable as it breaks compatibility.
*  try some trick like what is done with arrays, i.e. convert bools into 
special Perl objects which look like ‘t’ and ‘f’ when treated as text, but are 
true and false for boolean operations. I am not sure that this is possible and 
reliable.
*  make a transform which transforms bool, as is done with jsonb. This 
does not break compatibility and is rather straightforward.
So I propose to take the third way and make such a transform. This is very 
simple; a patch is attached.
Also, this patch improves the plperl documentation page, which currently says 
nothing about transforms.
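
To illustrate the problem concretely (a minimal example, independent of the
patch):

    CREATE FUNCTION perl_bool_test(b boolean) RETURNS text LANGUAGE plperl AS $$
        my ($b) = @_;
        # Without a transform, $b arrives as the string 'f' for SQL false,
        # and any non-empty string is true in Perl.
        return $b ? 'perl saw true' : 'perl saw false';
    $$;

    SELECT perl_bool_test(false);   -- returns 'perl saw true'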
 
Regards,
Ivan Panchenko
 
 

bool_plperl_transform_v1.patch
Description: Binary data


Re: Broken resetting of subplan hash tables

2020-02-29 Thread Tom Lane
Andres Freund  writes:
> On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
>> TBH, I think that this tuple table API is seriously misdesigned;
>> it is confusing and very error-prone to have the callers need to
>> reset the tuple context separately from calling ResetTupleHashTable.

> Do you have an alternative proposal?

I'd be inclined to let the tuple hashtable make its own tuple-storage
context and reset that for itself.  Is it really worth the complexity
and bug hazards to share such a context with other uses?

> We could change it so more of the metadata for execGrouping.c is
> computed outside of BuildTupleHashTableExt(), and continue to destroy
> the entire hashtable. But we'd still have to reallocate the hashtable,
> the slot, etc.  So having a reset interface seems like the right thing.

Agreed, the reset interface is a good idea.  I'm just not happy that
in addition to resetting, you have to remember to reset some
vaguely-related context (and heaven help you if you reset that context
but not the hashtable).

>> Also, the callers all look like their resets are intended to destroy
>> the whole hashtable not just its contents (as indeed they were doing,
>> before the faulty commit).  But I didn't attempt to fix that today.

> Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
> that to me? Why would they want to drop the hashtable metadata when
> resetting? What am I missing?

They may not look like it to you; but Andreas misread that, and so did
I at first --- not least because that *is* how it used to work, and
there are still comments suggesting that that's how it works, eg this
in ExecInitRecursiveUnion:

 * If hashing, we need a per-tuple memory context for comparisons, and a
 * longer-lived context to store the hash table.  The table can't just be
 * kept in the per-query context because we want to be able to throw it
 * away when rescanning.

"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.

regards, tom lane




Re: Allow to_date() and to_timestamp() to accept localized names

2020-02-29 Thread Tom Lane
Juan José Santamaría Flecha  writes:
> On Tue, Jan 28, 2020 at 9:35 PM Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>> So I'm withdrawing my concerns with respect to this patch.  As long as
>> it can do a roundtrip conversion with to_char(), it's fine.

> We can avoid issues with non injective case conversion languages with a
> double conversion, so both strings in the comparison end up in the same
> state.
> I propose an upper/lower conversion as in the attached patch.

This seems pretty reasonable to me, with a couple of caveats:

* I don't think it's appropriate to hard-wire DEFAULT_COLLATION_OID
as the collation to do case-folding with.  For to_date/to_timestamp,
we have collatable text input so we can rely on the function's input
collation instead, at the cost of having to pass down the collation
OID through a few layers of subroutines :-(.  For parse_datetime,
I punted for now and let it use DEFAULT_COLLATION_OID, because that's
currently only called by JSONB code that probably hasn't got a usable
input collation anyway (since jsonb isn't considered collatable).

* I think it's likely worthwhile to do a quick check for an exact
match before we go through all these case-folding pushups.  If the
expected use-case is reversing to_char() output, that will win all
the time.  We're probably ahead of the game even if it only matches
a few percent of the time.
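
(For illustration, the fast path amounts to something like the following;
the variable names are just for exposition, not the actual patch code:)

    /* Try a verbatim prefix match before any case-folding. */
    for (i = 0; i < n_names; i++)
    {
        int         len = strlen(names[i]);

        if (strncmp(name, names[i], len) == 0)
        {
            *result_len = len;
            return i;           /* exact hit, no case-folding needed */
        }
    }
    /* ... otherwise fall back to the case-folded comparison ... */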

Attached v10 incorporates those improvements, plus a bit of
polishing of docs and comments.  Barring objections, this seems
committable to me.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28035f1..8b73e05 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,7 +5968,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');


 TM prefix
-translation mode (print localized day and month names based on
+translation mode (use localized day and month names based on
  )
 TMMonth

@@ -5999,9 +5999,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
  
   
-   TM does not include trailing blanks.
-   to_timestamp and to_date ignore
-   the TM modifier.
+   TM suppresses trailing blanks whether or
+   not FM is specified.
+  
+ 
+
+ 
+  
+   to_timestamp and to_date
+   ignore letter case in the input; so for
+   example MON, Mon,
+   and mon all accept the same strings.  When using
+   the TM modifier, case-folding is done according to
+   the rules of the function's input collation (see
+   ).
   
  
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index d029468..db99a6b 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1038,7 +1038,7 @@ static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 static void DCH_to_char(FormatNode *node, bool is_interval,
 		TmToChar *in, char *out, Oid collid);
 static void DCH_from_char(FormatNode *node, const char *in, TmFromChar *out,
-		  bool std, bool *have_error);
+		  Oid collid, bool std, bool *have_error);
 
 #ifdef DEBUG_TO_FROM_CHAR
 static void dump_index(const KeyWord *k, const int *index);
@@ -1057,11 +1057,14 @@ static int	from_char_parse_int_len(int *dest, const char **src, const int len,
 	FormatNode *node, bool *have_error);
 static int	from_char_parse_int(int *dest, const char **src, FormatNode *node,
 bool *have_error);
-static int	seq_search(const char *name, const char *const *array, int *len);
+static int	seq_search_ascii(const char *name, const char *const *array, int *len);
+static int	seq_search_localized(const char *name, char **array, int *len,
+ Oid collid);
 static int	from_char_seq_search(int *dest, const char **src,
  const char *const *array,
+ char **localized_array, Oid collid,
  FormatNode *node, bool *have_error);
-static void do_to_timestamp(text *date_txt, text *fmt, bool std,
+static void do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std,
 			struct pg_tm *tm, fsec_t *fsec, int *fprec,
 			uint32 *flags, bool *have_error);
 static char *fill_str(char *str, int c, int max);
@@ -2457,7 +2460,7 @@ from_char_parse_int(int *dest, const char **src, FormatNode *node, bool *have_er
  * suitable for comparisons to ASCII strings.
  */
 static int
-seq_search(const char *name, const char *const *array, int *len)
+seq_search_ascii(const char *name, const char *const *array, int *len)
 {
 	unsigned char firstc;
 	const char *const *a;
@@ -2503,8 +2506,89 @@ seq_search(const char *name, const char *const *array, int *len)
 }
 
 /*
- * Perform a sequential search in 'array' for an entry matching the first
- * character(s) of the 'src' string case-insensitively.
+ * Sequentially search an array of possibly non-English words for

Re: Allowing ALTER TYPE to change storage strategy

2020-02-29 Thread Tomas Vondra

On Fri, Feb 28, 2020 at 08:35:33PM -0500, Tom Lane wrote:

Tomas Vondra  writes:

I think we might check if there are any attributes with the given data
type, and allow the change if there are none. That would still allow the
change when the type is used only for things like function parameters
etc. But we'd also have to check for domains (recursively).


Still has race conditions.



Yeah, I have no problem believing that.


One thing I haven't mentioned in the original message is CASCADE. It
seems useful to optionally change storage for all attributes with the
given data type. But I'm not sure it's actually a good idea, and the
amount of code seems non-trivial (it'd have to copy quite a bit of code
from ALTER TABLE).


You'd need a moderately strong lock on each such table, which means
there'd be serious deadlock hazards.  I'm dubious that it's worth
troubling with.



Yeah, I don't plan to do this in v1 (and I have no immediate plan to
work on it after that). But I wonder how is the deadlock risk any
different compared e.g. to DROP TYPE ... CASCADE?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-02-29 Thread Tomas Vondra

On Sat, Feb 29, 2020 at 11:44:26AM -0300, Alvaro Herrera wrote:

On 2020-Feb-29, Tomas Vondra wrote:


Did we actually remove track-enabling GUCs? I think we still have

 - track_activities
 - track_counts
 - track_io_timing
 - track_functions

But maybe I'm missing something?


Hm I remembered we removed the one for row-level stats
(track_row_stats), but what we really did is merge it with block-level
stats (track_block_stats) into track_counts -- commit 48f7e6439568.
Funnily enough, if you disable that autovacuum won't work, so I'm not
sure it's a very useful tunable.  And it definitely has more overhead
than what this new GUC would have.



OK


> I find SlruType pretty odd, and the accompanying "if" list in
> pg_stat_get_slru() correspondingly so.  Would it be possible to have
> each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
> and query that, somehow.  (I don't think we have an array of SlruCtlData
> anywhere though, so this might be a useless idea.)

Well, maybe. We could have a system to register SLRUs dynamically, but
the trick here is that having a fixed, predefined number of SLRUs
simplifies serialization in pgstat.c and so on. I don't think the "if"
branches in pg_stat_get_slru() are particularly ugly, but maybe we could
replace the enum with a registry of structs, something like rmgrlist.h.
It seems like overkill to me, though.


Yeah, maybe we don't have to fix that now.



IMO the current solution is sufficient for the purpose. I guess we could
just stick a name into the SlruCtlData (and remove SlruType entirely),
and use that to identify the stats entries. That might be enough, and in
fact we already have that - SimpleLruInit gets a name parameter and
copies that to the lwlock_tranche_name.

One of the main reasons why I opted to use the enum is that it makes
tracking, lookup and serialization pretty trivial - it's just an index
lookup, etc. But maybe it wouldn't be much more complex with the name,
considering the name length is limited by SLRU_MAX_NAME_LENGTH. And we
probably don't expect many entries, so we could keep them in a simple
list, or maybe a simplehash.

I'm not sure what to do with data for SLRUs that might have disappeared
after a restart (e.g. because someone removed an extension). Until now
those would all have ended up in the "other" entry.


The attached v2 fixes the issues in your first message:

- I moved the page_miss() call after SlruRecentlyUsed(), but then I
  realized it's entirely duplicate with the page_read() update done in
  SlruPhysicalReadPage(). I removed the call from SlruPhysicalReadPage()
  and renamed page_miss to page_read - that's more consistent with
  shared buffers stats, which also have buffers_hit and buffer_read.

- I've also implemented the reset. I ended up adding a new option to
  pg_stat_reset_shared, which always resets all SLRU entries. We track
  the reset timestamp for each SLRU entry, but the value is always the
  same. I admit this is a bit weird - I did it like this because (a) I'm
  not sure how to identify the individual entries and (b) the SLRU is
  shared, so pg_stat_reset_shared seems kinda natural.
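
  So a reset looks like the following (assuming the new target keyword
  ends up being spelled 'slru'):

      SELECT pg_stat_reset_shared('slru');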


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f8e7670f8d..3f45db7ea9 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -692,7 +692,8 @@ CLOGShmemInit(void)
 {
 	ClogCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(ClogCtl, "clog", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  CLogControlLock, "pg_xact", LWTRANCHE_CLOG_BUFFERS);
+				  CLogControlLock, "pg_xact", LWTRANCHE_CLOG_BUFFERS,
+				  SLRU_CLOG);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 630df672cc..44d7ca4483 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -494,7 +494,7 @@ CommitTsShmemInit(void)
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0,
 				  CommitTsControlLock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFERS);
+				  LWTRANCHE_COMMITTS_BUFFERS, SLRU_COMMIT_TS);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 									 sizeof(CommitTimestampShared),
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 50e98caaeb..37a5854284 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1831,11 +1831,11 @@ MultiXactShmemInit(void)
 	SimpleLruInit(MultiXactOffsetCtl,

Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-02-29 Thread Andres Freund
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
> 
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away, and replaced with a different row. That does seem like a serious
> issue?
> 
> I think there's likely a lot of places that can hit this? What makes it
> safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups an their
> modifications?
> 
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?

I still think that's true.  In a first iteration I hacked around the
problem by explicitly registering a catalog snapshot in
RemoveTempRelations(). That *sometimes* allows to get through the
regression tests without the assertions triggering.

But I don't think that's good enough (even if we fixed the other
potential crashes similarly). The only reason that avoids the asserts is
because in nearly all other cases there's also a user snapshot that's
pushed. But that pushed snapshot can have an xmin that's newer than the
catalog snapshot, which means we're still in danger of tids from catalog
scans being outdated.

My preliminary conclusion is that it's simply not safe to do
SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
to defer the SnapshotResetXmin() call until at least
CommitTransactionCommand()? Outside of that there ought (with exception
of multi-transaction commands, but they have to be careful anyway) to be
no "in progress" sequences of related catalog lookups/modifications.

Alternatively we could ensure that all catalog lookup/mod sequences
ensure that the first catalog snapshot is registered. But that seems
like a gargantuan task?


> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets up things so that GetCatalogSnapshot() will
> return a freshly taken snapshot?  ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that behaviour?


A related question is in which cases is it actually safe to use a
snapshot that's not registered, nor pushed as the active snapshot.
snapmgr.c just provides:

* Note that the return value may point at static storage that will be modified
 * by future calls and by CommandCounterIncrement().  Callers should call
 * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
 * used very long.

but doesn't clarify what 'very long' means. As far as I can tell,
there's very little that's actually safe. It's probably ok to do a single
visibility test, but anything that e.g. has a chance of accepting
invalidations is entirely unsafe?

Greetings,

Andres Freund




Re: vacuum verbose detail logs are unclear; log at *start* of each stage

2020-02-29 Thread Justin Pryzby
On Thu, Feb 27, 2020 at 10:10:57AM +0100, Peter Eisentraut wrote:
> On 2020-01-26 06:36, Justin Pryzby wrote:
> >Subject: [PATCH v3 1/4] Remove gettext erronously readded at 580ddce
> >
> >-appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
> >+appendStringInfo(&buf, "%s.", pg_rusage_show(&ru0));
> 
> Why do you think it was erroneously added?

I was wrong.

It seemed useless to me to translate "%s.", but I see now that at least JA.PO
uses a different terminator than ".".

$ git grep -C3 'msgid "%s."$' '*.po' |grep msgstr
src/backend/po/de.po-msgstr "%s."
src/backend/po/es.po-msgstr "%s."
src/backend/po/fr.po-msgstr "%s."
src/backend/po/id.po-msgstr "%s."
src/backend/po/it.po-msgstr "%s."
src/backend/po/ja.po-msgstr "%s。"

BTW I only *happened* to see your message on the www interface.  I didn't get
the original message.  And the "Resend email" button didn't get it to me,
either..

-- 
Justin




Re: Broken resetting of subplan hash tables

2020-02-29 Thread Andres Freund
Hi,

On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
> > 3. Additionally since the memory context used by the hash tables is 
> > reset in buildSubPlanHash() if we start resetting hash tables we will 
> > get a use after free bug.
 > 
> Nope, not right.  The hash table metadata is now allocated in the
> es_query_cxt; what is in the hashtablecxt is just tuples, and that
> does need to be cleared, per the comment for ResetTupleHashTable.
> Your patch as given results in a nasty memory leak, which is easily
> demonstrated with a small mod to the regression test case I added:

> select sum(ss.tst::int) from
>   generate_series(1,1000) o cross join lateral (
> select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
>  random() as r
> from onek i where i.unique1 = o%1000 ) ss;
> 
> > Since the current behavior of the code in HEAD is not actually broken, 
> > it is just an optimization which is not used, this fix does not have to 
> > be backpatched.
> 
> Unfortunately ... this test case also leaks memory like mad in
> HEAD, v12, and v11, because all of them are rebuilding the hash
> table from scratch without clearing the old one.  So this is
> indeed broken and a back-patch is necessary.

Yea :(. Thanks for doing that.


> I noted while looking at this that most of the calls of
> ResetTupleHashTable are actually never reached in the regression
> tests (per code coverage report) so I made up some test cases
> that do reach 'em, and included those in the commit.

Cool.


> TBH, I think that this tuple table API is seriously misdesigned;
> it is confusing and very error-prone to have the callers need to
> reset the tuple context separately from calling ResetTupleHashTable.

Do you have an alternative proposal? Before committing the patch adding
it that way, I'd waited for quite a while asking for input...  In
several cases (nodeAgg.c, nodeSetOp.c) there's memory from outside
execGrouping.c that's also allocated in the same context as the table
contents (via TupleHashTable->additional) - just resetting the context
passed to BuildTupleHashTableExt() as part of ResetTupleHashTable()
seems problematic too.

We could change it so more of the metadata for execGrouping.c is
computed outside of BuildTupleHashTableExt(), and continue to destroy
the entire hashtable. But we'd still have to reallocate the hashtable,
the slot, etc.  So having a reset interface seems like the right thing.

I guess we could set it up so that BuildTupleHashTableExt() registers a
memory context reset callback on tablecxt, which'd reinitialize the
hashtable. But that seems like it'd be at least as confusing?


> Also, the callers all look like their resets are intended to destroy
> the whole hashtable not just its contents (as indeed they were doing,
> before the faulty commit).  But I didn't attempt to fix that today.

Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
that to me? Why would they want to drop the hashtable metadata when
resetting? What am I missing?

Greetings,

Andres Freund




[PATCH] Add schema and table names to partition error

2020-02-29 Thread Chris Bandy

Hello,

I'm writing telemetry data into a table partitioned by time. When there 
is no partition for a particular date, my app notices the constraint 
violation, creates the partition, and retries the insert.


I'm used to handling constraint violations by observing the constraint 
name in the error fields. However, this error had none. I set out to add 
the name to the error field, but after a bit of reading my impression is 
that partition constraints are more like a property of a table.


I've attached a patch that adds the schema and table name fields to 
errors for my use case:


- Insert data into a partitioned table for which there is no partition.
- Insert data directly into an incorrect partition.

Thanks,
Chris
From 8261b366c49b2d04baeb882d39dfa626b9315889 Mon Sep 17 00:00:00 2001
From: Chris Bandy 
Date: Sat, 29 Feb 2020 12:47:56 -0600
Subject: [PATCH] Add schema and table names to partition error

---
 src/backend/executor/execMain.c  | 3 ++-
 src/backend/executor/execPartition.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git src/backend/executor/execMain.c src/backend/executor/execMain.c
index ee5c3a60ff..7cb486b211 100644
--- src/backend/executor/execMain.c
+++ src/backend/executor/execMain.c
@@ -1878,7 +1878,8 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 			(errcode(ERRCODE_CHECK_VIOLATION),
 			 errmsg("new row for relation \"%s\" violates partition constraint",
 	RelationGetRelationName(resultRelInfo->ri_RelationDesc)),
-			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
+			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
+			 errtable(resultRelInfo->ri_RelationDesc)));
 }
 
 /*
diff --git src/backend/executor/execPartition.c src/backend/executor/execPartition.c
index c13b1d3501..a5542b92c7 100644
--- src/backend/executor/execPartition.c
+++ src/backend/executor/execPartition.c
@@ -345,7 +345,8 @@ ExecFindPartition(ModifyTableState *mtstate,
 			RelationGetRelationName(rel)),
 	 val_desc ?
 	 errdetail("Partition key of the failing row contains %s.",
-			   val_desc) : 0));
+			   val_desc) : 0,
+	 errtable(rel)));
 		}
 
 		if (partdesc->is_leaf[partidx])
-- 
2.11.0



Re: subplan resets wrong hashtable

2020-02-29 Thread Tom Lane
I wrote:
> Right.  So the incorrect ResetTupleHashTable call is unreachable
> (and a look at the code coverage report confirms that).  The whole
> thing obviously is a bit hasty and unreviewed, but it doesn't have
> a live bug AFAICS ... or at least, if there's a bug, it's a memory
> leakage issue across repeat executions, not a crash hazard.

For the archives' sake: this *is* a memory leak, and we dealt with
it at 58c47ccfff20b8c125903482725c1dbfd30beade.

regards, tom lane




Re: Broken resetting of subplan hash tables

2020-02-29 Thread Tom Lane
Andreas Karlsson  writes:
> The code for resetting the hash tables of the SubPlanState node in 
> buildSubPlanHash() looks like it can never run, and additionally it 
> would be broken if it would ever run. This issue was introduced in 
> commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.

Right.  Justin Pryzby also noted this a couple weeks back, but we
didn't deal with it at that point because we were up against a
release deadline.

> As far as I gather the following is true:

> 1. It sets node->hashtable and node->hashnulls to NULL a few lines 
> before checking if they are not NULL which means the code for resetting 
> them cannot ever be reached.

Yeah.  Those lines should have been removed when the ResetTupleHashTable
logic was added.

> 2. But if we changed to code so that the ResetTupleHashTable() calls are 
> reachable we would hit a typo. It resets node->hashtable twice and never 
> resets node->hashnulls which would cause non-obvious bugs.

Right.

> 3. Additionally since the memory context used by the hash tables is 
> reset in buildSubPlanHash() if we start resetting hash tables we will 
> get a use after free bug.

Nope, not right.  The hash table metadata is now allocated in the
es_query_cxt; what is in the hashtablecxt is just tuples, and that
does need to be cleared, per the comment for ResetTupleHashTable.
Your patch as given results in a nasty memory leak, which is easily
demonstrated with a small mod to the regression test case I added:

select sum(ss.tst::int) from
  generate_series(1,1000) o cross join lateral (
select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
 random() as r
from onek i where i.unique1 = o%1000 ) ss;

> Since the current behavior of the code in HEAD is not actually broken, 
> it is just an optimization which is not used, this fix does not have to 
> be backpatched.

Unfortunately ... this test case also leaks memory like mad in
HEAD, v12, and v11, because all of them are rebuilding the hash
table from scratch without clearing the old one.  So this is
indeed broken and a back-patch is necessary.

I noted while looking at this that most of the calls of
ResetTupleHashTable are actually never reached in the regression
tests (per code coverage report) so I made up some test cases
that do reach 'em, and included those in the commit.

TBH, I think that this tuple table API is seriously misdesigned;
it is confusing and very error-prone to have the callers need to
reset the tuple context separately from calling ResetTupleHashTable.
Also, the callers all look like their resets are intended to destroy
the whole hashtable not just its contents (as indeed they were doing,
before the faulty commit).  But I didn't attempt to fix that today.

regards, tom lane




Re: Portal->commandTag as an enum

2020-02-29 Thread Mark Dilger



> On Feb 28, 2020, at 5:42 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>>> On Feb 28, 2020, at 3:05 PM, Tom Lane  wrote:
>>> Is there a way to drop that logic altogether by making the tagname string
>>> be "INSERT 0" for the INSERT case?  Or would the zero bleed into other
>>> places where we don't want it?
> 
>> In general, I don't think we want to increase the number of distinct
>> tags.  Which command you finished running and whether you want a
>> rowcount and/or lastoid are orthogonal issues.
> 
> Well, my thought is that last_oid is gone and it isn't ever coming back.
> So the less code we use supporting a dead feature, the better.
> 
> If we can't remove the special case in EndCommand() altogether, I'd be
> inclined to hard-code it as "if (tag == CMDTAG_INSERT ..." rather than
> expend infrastructure on treating last_oid as a live option for commands
> to have.

You may want to think about the embedding of InvalidOid into the EndCommand 
output differently from how you think about the embedding of the rowcount into 
the EndCommand output, but my preference is to treat these issues the same and 
make a strong distinction between the commandtag and the embedded oid and/or 
rowcount.  It's hard to say how many future features would be crippled by 
having the embedded InvalidOid in the commandtag, but as an example *right now* 
in the works, we have a feature to count how many commands of a given type have 
been executed.  It stands to reason that feature, whether accepted in its 
current form or refactored, would not want to show users a pg_stats table like 
this:

   cnt   command
   ---   ---------
     5   INSERT 0
    37   SELECT

What the heck is the zero doing after the INSERT?  That's the hardcoded 
InvalidOid that you are apparently arguing for.  You could get around that by 
having the pg_sql_stats patch have its own separate set of command tag strings, 
but why would we intentionally design that sort of duplication into the 
solution?

As for hardcoding the behavior of whether to embed a rowcount in the output 
from EndCommand; In src/backend/replication/walsender.c, 
exec_replication_command() returns "SELECT" from EndCommand, and not "SELECT 
$rowcount" like everywhere else.  The patch as submitted does not change 
behavior.  It only refactors the code while preserving the current behavior.  
So we would have to agree that the patch can change how 
exec_replication_command() behaves and start embedding a rowcount there, too, 
if we want to make SELECT behave the same everywhere.

There is another problem, though, which is that if we're hoping to eventually 
abate this historical behavior and stop embedding InvalidOid and/or rowcount in 
the commandtag returned from EndCommand, it might be necessary (for backward 
compatibility with clients) to do that incrementally, in which case we still 
need the distinction between commandtags and formats to exist in the code.  How 
else can you say that, for example, in the next rev of the protocol that we're 
not going to embed InvalidOid anymore, but we will continue to return it for 
clients who connect via the older protocol?  What if the next rev of the 
protocol still returns rowcount, but in a way that doesn't require the clients 
to implement (or link to) a parser that extracts the rowcount by parsing a 
string?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pgbench: option delaying queries till connections establishment?

2020-02-29 Thread Andres Freund
Hi,

On 2020-02-29 15:29:19 +0100, Fabien COELHO wrote:
> Pgbench is more or less designed to run a long hopefully steady-state
> benchmark, so that the initial connection setup is always negligible. Not
> complying with this hypothesis quite often leads to weird results.

I don't think this is a good starting point. Sure, a longer run will
yield more precise results, and one needs more than just an
instantaneous measurement. But in a lot of cases we want to use pgbench
to measure a lot of different variations, making it infeasible for each
run to be all that long.

Of course whether that's feasible depends on the workload (e.g. readonly
runs can be shorter than read/write runs).

Also note that in the case that made me look at this, you'd have to run
the test for *weeks* to drown out the performance difference that's
solely caused by differences in how long individual connections take to be
established. Partially because the "excluding connection establishing"
number is entirely broken, but also because fewer connections having
been established changes the performance so much.


I think we should also consider making pgbench actually use non-blocking
connection establishment. It seems pretty weird that that's the one
libpq operation where we don't? In particular for -C, with -c > -j,
that makes the results pretty meaningless.
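
(For reference, a bare-bones sketch of what non-blocking establishment with
libpq looks like; real pgbench code would of course fold this into its existing
per-thread socket wait loop, and error handling is omitted here:)

    PGconn *conn = PQconnectStart(conninfo);

    if (conn != NULL && PQstatus(conn) != CONNECTION_BAD)
    {
        PostgresPollingStatusType st = PGRES_POLLING_WRITING;

        while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
        {
            fd_set  rfds, wfds;
            int     sock = PQsocket(conn);

            FD_ZERO(&rfds);
            FD_ZERO(&wfds);
            if (st == PGRES_POLLING_READING)
                FD_SET(sock, &rfds);
            else
                FD_SET(sock, &wfds);
            (void) select(sock + 1, &rfds, &wfds, NULL, NULL);

            st = PQconnectPoll(conn);   /* advance the handshake */
        }
    }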


> Adding a synchronization barrier should be simple enough, I thought about it
> in the past.
> 
> However, I'd still be wary that it is no silver bullet: if you start a lot
> of threads compared to the number of available cores, pgbench would
> basically overload the system, and you would experience a lot of waiting
> time which reflects that the client code has not got enough cpu time.
> Basically you would be testing the OS process/thread management performance.

Sure, that's possible. But I don't see what that has to do with the
barrier?

Also, most scripts actually have client/server interaction...

Greetings,

Andres Freund




Re: Yet another fast GiST build (typo)

2020-02-29 Thread Andrey M. Borodin


> On Feb 29, 2020, at 17:20, Erik Rijkers  wrote:
> 
> Small typo alert:
> In v5-0002-Implement-GiST-build-using-sort-support.patch there is:
> 
> +   method is also optional and is used diring fast GiST build.
> 
> 'diring' should be 'during'
> 

Thanks!

I've fixed this and added a patch with a GiST reloption to provide the sort 
support function.

Best regards, Andrey Borodin.



v6-0001-Add-sort-support-for-point-gist_point_sortsupport.patch
Description: Binary data


v6-0002-Implement-GiST-build-using-sort-support.patch
Description: Binary data


v6-0003-Function-relopt-for-gist-build.patch
Description: Binary data


Re: proposal \gcsv

2020-02-29 Thread David Fetter
On Sat, Feb 29, 2020 at 11:59:22AM +0100, Pavel Stehule wrote:
> On Sat, Feb 29, 2020 at 11:34, Vik Fearing 
> wrote:
> 
> > On 29/02/2020 06:43, Pavel Stehule wrote:
> > > Hi
> > >
> > > I would to enhance \g command about variant \gcsv
> > >
> > > proposed command has same behave like \g, only the result will be every
> > > time in csv format.
> > >
> > > It can helps with writing psql macros wrapping \g command.
> > >
> > > Options, notes?
> >
> > But then we would need \ghtml and \glatex etc.  If we want a shortcut
> > for setting a one-off format, I would go for \gf or something.
> >
> > \gf csv
> > \gf html
> > \gf latex
> >
> 
> usability of html or latex format in psql is significantly lower than csv
> format. There is only one generic format for data - csv.

Not exactly.  There are a lot of uses for things along the lines of 

\gf json
\gf yaml

I'd rather add a new \gf that takes arguments, as it seems more
extensible. For example, there are uses for

\gf csv header

if no header is the default, or 

\gf csv noheader

if header is the default.
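
For comparison, the closest equivalent today needs a format switch around the
query, while the proposal would be a one-liner (the \gf syntax below is of
course hypothetical at this point):

    -- today: switch the output format, run the query, switch back
    \pset format csv
    SELECT * FROM pg_stat_activity \g activity.csv
    \pset format aligned

    -- proposed one-off format (hypothetical syntax and options)
    SELECT * FROM pg_stat_activity \gf csv header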

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: ALTER tbl rewrite loses CLUSTER ON index

2020-02-29 Thread Justin Pryzby
On Fri, Feb 28, 2020 at 08:42:02PM -0600, Justin Pryzby wrote:
> On Fri, Feb 28, 2020 at 06:26:04PM -0500, Tom Lane wrote:
> > Justin Pryzby  writes:
> > > I think the attached is 80% complete (I didn't touch pg_dump).
> > > One objection to this change would be that all relations (including 
> > > indices)
> > > end up with relclustered fields, and pg_index already has a number of 
> > > bools, so
> > > it's not like this one bool is wasting a byte.
> > > I think relisclustered was a clever way of avoiding that overhead 
> > > (c0ad5953).
> > > So I would be -0.5 on moving it to pg_class..
> > > But I think 0001 and 0002 are worthy.  Maybe the test in 0002 should live
> > > somewhere else.
> > 
> > 0001 has been superseded by events (faade5d4c), so the cfbot is choking
> > on that one's failure to apply, and not testing any further.  Please
> > repost without 0001 so that we can get this testing again.
> 
> I've just noticed while working on (1) that this separately affects REINDEX
> CONCURRENTLY, which would be a new bug in v12.  Without CONCURRENTLY there's 
> no
> issue.  I guess we need a separate patch for that case.

Rebased Amit's patch and included my own 0002 to fix the REINDEX CONCURRENTLY
issue.
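
For anyone wanting to reproduce the underlying problem, it boils down to this
(object names are illustrative; the regression test in the patch does the same
thing):

    CREATE TABLE t (a int);
    CREATE INDEX t_a ON t (a);
    ALTER TABLE t CLUSTER ON t_a;
    ALTER TABLE t ALTER COLUMN a TYPE bigint;   -- forces a rewrite/rebuild
    SELECT indisclustered FROM pg_index WHERE indexrelid = 't_a'::regclass;
    -- without the fix this returns false: the CLUSTER ON marking is lost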

-- 
Justin
>From dccc6f08450eacafc3a08bc8b2836d2feb23a533 Mon Sep 17 00:00:00 2001
From: Amit Langote 
Date: Thu, 6 Feb 2020 18:14:16 +0900
Subject: [PATCH v4 1/2] ALTER tbl rewrite loses CLUSTER ON index

---
 src/backend/commands/tablecmds.c  | 39 +++
 src/test/regress/expected/alter_table.out | 34 
 src/test/regress/sql/alter_table.sql  | 16 +-
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 02a7c04fdb..6b2469bd09 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 static void RebuildConstraintComment(AlteredTableInfo *tab, int pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
+static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
@@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			newcmd->def = (Node *) stmt;
 			tab->subcmds[AT_PASS_OLD_INDEX] =
 lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+
+			/* Preserve index's indisclustered property, if set. */
+			PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId);
 		}
 		else if (IsA(stm, AlterTableStmt))
 		{
@@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			 rel,
 			 NIL,
 			 indstmt->idxname);
+
+	/* Preserve index's indisclustered property, if set. */
+	PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid);
 }
 else if (cmd->subtype == AT_AddConstraint)
 {
@@ -11996,6 +12003,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 	tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd);
 }
 
+/*
+ * For a table's index that is to be recreated due to PostAlterType
+ * processing, preserve its indisclustered property by issuing ALTER TABLE
+ * CLUSTER ON command on the table that will run after the command to recreate
+ * the index.
+ */
+static void
+PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	Assert(OidIsValid(indoid));
+	Assert(pass == AT_PASS_OLD_INDEX);
+
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indoid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	if (indexForm->indisclustered)
+	{
+		AlterTableCmd *newcmd = makeNode(AlterTableCmd);
+
+		newcmd->subtype = AT_ClusterOn;
+		newcmd->name = get_rel_name(indoid);
+		tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd);
+	}
+
+	ReleaseSysCache(indexTuple);
+}
+
 /*
  * Subroutine for ATPostAlterTypeParse().  Calls out to CheckIndexCompatible()
  * for the real analysis, then mutates the IndexStmt based on that verdict.
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index fb6d86a269..a01c6d6ec5 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -4296,3 +4296,37 @@ create trigger xtrig
 update bar1 set a = a + 1;
 INFO:  a=1, b=1
 /* End test case for bug #16242 */
+-- alter type rewrite/rebuild should preserve cluster marking on index
+create table alttype_cluster (a int);
+create index alttype_cluster_a on alttype_cluster (a);
+alter table alttype_cluster cluster on alttype_cluster_a;
+select 

Re: Binary support for pgoutput plugin

2020-02-29 Thread Dave Cramer
On Fri, 28 Feb 2020 at 18:34, Tom Lane  wrote:

> Dave Cramer  writes:
> > Rebased against head
>
> The cfbot's failing to apply this [1].  It looks like the reason is only
> that you included a catversion bump in what you submitted.  Protocol is to
> *not* do that in submitted patches, but rely on the committer to add it at
> the last minute --- otherwise you'll waste a lot of time rebasing the
> patch, which is what it needs now.
>
> regards, tom lane
>
> [1] http://cfbot.cputube.org/patch_27_2152.log


rebased and removed the catversion bump.


0006-make-sure-the-subcription-is-marked-as-binary.patch
Description: Binary data


0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Description: Binary data


0002-add-binary-column-to-pg_subscription.patch
Description: Binary data


0004-get-relid-inside-of-logical_read_insert.patch
Description: Binary data


0005-Remove-C99-declaration-and-code.patch
Description: Binary data


0001-First-pass-at-working-code-without-subscription-opti.patch
Description: Binary data


0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
Description: Binary data


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-02-29 Thread Justin Pryzby
On Sat, Feb 29, 2020 at 03:35:27PM +0300, Alexey Kondratov wrote:
> Anyway, new version is attached. It is rebased in order to resolve conflicts
> with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes
> this small comment fix.

Thanks for rebasing - I actually started to do that yesterday.

I extracted the bits from your original 0001 patch which handled CLUSTER and
VACUUM FULL.  I don't think there's any interest in combining that with
ALTER anymore.  On another thread (1), I tried to implement that, and Tom
pointed out a problem with the implementation, but also didn't like the idea.

I'm including some proposed fixes, but didn't yet update the docs, errors or
tests for that.  (I'm including your v8 untouched in hopes of not messing up
the cfbot).  My fixes avoid an issue if you try to REINDEX onto pg_default, I
think due to moving system toast indexes.

template1=# REINDEX DATABASE template1 TABLESPACE pg_default;
2020-02-29 08:01:41.835 CST [23382] WARNING:  cannot change tablespace of 
indexes for mapped relations, skipping all
WARNING:  cannot change tablespace of indexes for mapped relations, skipping all
2020-02-29 08:01:41.894 CST [23382] ERROR:  SMgrRelation hashtable corrupted
2020-02-29 08:01:41.894 CST [23382] STATEMENT:  REINDEX DATABASE template1 
TABLESPACE pg_default;
2020-02-29 08:01:41.894 CST [23382] WARNING:  AbortTransaction while in COMMIT 
state
2020-02-29 08:01:41.895 CST [23382] PANIC:  cannot abort transaction 491, it 
was already committed

-- 
Justin

(1) 
https://www.postgresql.org/message-id/flat/20200208150453.GV403%40telsasoft.com
>From e3f7d8e08a00d3c6d02464b81e25b220b5945c89 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Sat, 29 Feb 2020 15:35:27 +0300
Subject: [PATCH v9 1/3] Allow CLUSTER, VACUUM FULL and REINDEX to change
 tablespace on the fly

On 2020-02-11 19:48, Justin Pryzby wrote:
> For your v7 patch, which handles REINDEX to a new tablespace, I have a
> few
> minor comments:
>
> + * the relation will be rebuilt.  If InvalidOid is used, the default
>
> => should say "currrent", not default ?
>

Yes, it keeps current index tablespace in that case, thanks.

>
> +++ b/doc/src/sgml/ref/reindex.sgml
> +TABLESPACE
> ...
> + class="parameter">new_tablespace
>
> => I saw you split the description of TABLESPACE from new_tablespace
> based on
> comment earlier in the thread, but I suggest that the descriptions for
> these
> should be merged, like:
>
> +   
> +TABLESPACE class="parameter">new_tablespace
> +
> + 
> +  Allow specification of a tablespace where all rebuilt indexes
> will be created.
> +  Cannot be used with "mapped" relations. If
> SCHEMA,
> +  DATABASE or SYSTEM are
> specified, then
> +  all unsuitable relations will be skipped and a single
> WARNING
> +  will be generated.
> + 
> +
> +   
>

It sounds good to me, but here I just obey the structure, which is used
all around. Documentation of ALTER TABLE/DATABASE, REINDEX and many
others describes each literal/parameter in a separate entry, e.g.
new_tablespace. So I would prefer to keep it as it is for now.

>
> The existing patch is very natural, especially the parts in the
> original patch
> handling vacuum full and cluster.  Those were removed to concentrate on
> REINDEX, and based on comments that it might be nice if ALTER handled
> CLUSTER
> and VACUUM FULL.  On a separate thread, I brought up the idea of ALTER
> using
> clustered order.  Tom pointed out some issues with my implementation,
> but
> didn't like the idea, either.
>
> So I suggest to re-include the CLUSTER/VAC FULL parts as a separate
> 0002 patch,
> the same way they were originally implemented.
>
> BTW, I think if "ALTER" were updated to support REINDEX (to allow
> multiple
> operations at once), it might be either:
> |ALTER INDEX i SET TABLESPACE , REINDEX -- to reindex a single index
> on a given tlbspc
> or
> |ALTER TABLE tbl REINDEX USING INDEX TABLESPACE spc; -- to reindex all
> inds on table inds moved to a given tblspc
> "USING INDEX TABLESPACE" is already used for ALTER..ADD column/table
> CONSTRAINT.
>

Yes, I also think that allowing REINDEX/CLUSTER/VACUUM FULL to put
resulting relation in a different tablespace is a very natural
operation. However, I did a couple of attempts to integrate latter two
with ALTER TABLE and failed with it, since it is already complex enough.
I am still willing to proceed with it, but not sure how soon it will be.

Anyway, new version is attached. It is rebased in order to resolve
conflicts with a recent fix of REINDEX CONCURRENTLY + temp relations,
and includes this small comment fix.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company

>From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 30 Dec 2019 20:00:37 +0300
Subject: [PATCH v8] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this

[REPORT] Possible Memory Leak Postgres Windows

2020-02-29 Thread Ranier Vilela
Hi,
I'm sending this report from DrMemory, which shows some leaks from the
current postgres.
DrMemory is a reliable tool, but it is not perfect. (
https://drmemory.org/)

regards,
Ranier Vilela
Dr. Memory version 2.2.0 build 1 built on Jul  1 2019 00:42:20
Windows version: WinVer=105;Rel=1909;Build=18363;Edition=Core
Dr. Memory results for pid 24772: "postgres.exe"
Application cmdline: "postgres.exe -D c:/postgres/data -F -c 
listen_addresses=localhost -k """
Recorded 117 suppression(s) from default c:\DrMemory\bin\suppress-default.txt

Error #1: UNINITIALIZED READ: reading register eax
# 0 KERNELBASE.dll!FindNextFileW 
# 1 KERNELBASE.dll!FindNextFileA 
# 2 readdir   
[C:\dll\postgres\src\port\dirent.c:96]
# 3 ReadDir   
[C:\dll\postgres\src\backend\storage\file\fd.c:2660]
# 4 SlruScanDirectory 
[C:\dll\postgres\src\backend\access\transam\slru.c:1401]
# 5 AsyncShmemInit
[C:\dll\postgres\src\backend\commands\async.c:564]
# 6 CreateSharedMemoryAndSemaphores   
[C:\dll\postgres\src\backend\storage\ipc\ipci.c:265]
# 7 PostmasterMain
[C:\dll\postgres\src\backend\postmaster\postmaster.c:1008]
# 8 main  
[C:\dll\postgres\src\backend\main\main.c:210]
Note: @0:00:12.076 in thread 15692
Note: instruction: cmp%eax $0x0018

Error #2: UNINITIALIZED READ: reading register eax
# 0 KERNELBASE.dll!FindNextFileW 
# 1 KERNELBASE.dll!FindNextFileA 
# 2 readdir   
[C:\dll\postgres\src\port\dirent.c:96]
# 3 ReadDir   
[C:\dll\postgres\src\backend\storage\file\fd.c:2660]
# 4 SlruScanDirectory 
[C:\dll\postgres\src\backend\access\transam\slru.c:1401]
# 5 AsyncShmemInit
[C:\dll\postgres\src\backend\commands\async.c:564]
# 6 CreateSharedMemoryAndSemaphores   
[C:\dll\postgres\src\backend\storage\ipc\ipci.c:265]
# 7 PostmasterMain
[C:\dll\postgres\src\backend\postmaster\postmaster.c:1008]
# 8 main  
[C:\dll\postgres\src\backend\main\main.c:210]
Note: @0:00:12.076 in thread 15692
Note: instruction: data16 mov%cx -> 0x0234(%esi,%eax,2)

Error #3: UNINITIALIZED READ: reading register eax
# 0 KERNELBASE.dll!FindNextFileW 
# 1 KERNELBASE.dll!FindNextFileA 
# 2 readdir   
[C:\dll\postgres\src\port\dirent.c:96]
# 3 ReadDir   
[C:\dll\postgres\src\backend\storage\file\fd.c:2660]
# 4 SlruScanDirectory 
[C:\dll\postgres\src\backend\access\transam\slru.c:1401]
# 5 AsyncShmemInit
[C:\dll\postgres\src\backend\commands\async.c:564]
# 6 CreateSharedMemoryAndSemaphores   
[C:\dll\postgres\src\backend\storage\ipc\ipci.c:265]
# 7 PostmasterMain
[C:\dll\postgres\src\backend\postmaster\postmaster.c:1008]
# 8 main  
[C:\dll\postgres\src\backend\main\main.c:210]
Note: @0:00:12.076 in thread 15692
Note: instruction: test   %eax %eax

Error #4: UNINITIALIZED READ: reading register eax
# 0 KERNELBASE.dll!FindNextFileW 
# 1 KERNELBASE.dll!FindNextFileA 
# 2 readdir 
[C:\dll\postgres\src\port\dirent.c:96]
# 3 RemovePgTempRelationFiles   
[C:\dll\postgres\src\backend\storage\file\fd.c:3142]
# 4 RemovePgTempFiles   
[C:\dll\postgres\src\backend\storage\file\fd.c:3020]
# 5 PostmasterMain  
[C:\dll\postgres\src\backend\postmaster\postmaster.c:1325]
# 6 main
[C:\dll\postgres\src\backend\main\main.c:210]
Note: @0:00:13.826 in thread 15692
Note: instruction: cmp%eax $0x0018

Error #5: UNINITIALIZED READ: reading register eax
# 0 KERNELBASE.dll!FindNextFileW 
# 1 KERNELBASE.dll!FindNextFileA 
# 2 readdir 
[C:\dll\postgres\src\port\dirent.c:96]
# 3 RemovePgTempRelationFiles   
[C:\dll\postgres\src\backend\storage\file\fd.c:3142]
# 4 RemovePgTempFiles   
[C:\dll\postgres\src\backend\storage\file\fd.c:3020]
# 5 PostmasterMain  
[C:\dll\postgres\src\backend\postmaster\postmaster.c:1325]
# 6 main
[C:\dll\postgres\src\backend\main\main.c:210]
Note: @0:00:13.826 in thread 15692
Note: instruction: data16 mov%cx -> 0x0234(%esi,%eax,2)

Error #6: UNINITIALIZED READ: reading register eax
# 0 KERNELBASE.dll!FindNextFileW 
# 1 KERNELBASE.dll!FindNextFileA 
# 2 readdir  
[C:\dll\postgres\src\port\dirent.c:96]
# 3 RemovePgTempRelationFilesInDbspace   
[C:\dll\postgres\src\backend\storage\file\fd.c:3170]
# 4 RemoveP

Re: SLRU statistics

2020-02-29 Thread Alvaro Herrera
On 2020-Feb-29, Tomas Vondra wrote:

> Did we actually remove track-enabling GUCs? I think we still have
> 
>  - track_activities
>  - track_counts
>  - track_io_timing
>  - track_functions
> 
> But maybe I'm missing something?

Hm I remembered we removed the one for row-level stats
(track_row_stats), but what we really did is merge it with block-level
stats (track_block_stats) into track_counts -- commit 48f7e6439568. 
Funnily enough, if you disable that autovacuum won't work, so I'm not
sure it's a very useful tunable.  And it definitely has more overhead
than what this new GUC would have.

> > I find SlruType pretty odd, and the accompanying "if" list in
> > pg_stat_get_slru() correspondingly so.  Would it be possible to have
> > each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
> > and query that, somehow.  (I don't think we have an array of SlruCtlData
> > anywhere though, so this might be a useless idea.)
> 
> Well, maybe. We could have a system to register SLRUs dynamically, but
> the trick here is that having a fixed, predefined number of SLRUs
> simplifies serialization in pgstat.c and so on. I don't think the "if"
> branches in pg_stat_get_slru() are particularly ugly, but maybe we could
> replace the enum with a registry of structs, something like rmgrlist.h.
> It seems like an overkill to me, though.

Yeah, maybe we don't have to fix that now.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench: option delaying queries till connections establishment?

2020-02-29 Thread Fabien COELHO



Hello Kyotaro-san,


I think this also shows that "including/excluding connections
establishing" as well as some of the other reported stats are pretty
bogus. In the 'before' case a substantial number of the connections had
still not been established by the end of the test run!


I see it as useful. In most cases we don't care about pgbench's connection
time. Especially in the mentioned case the result is just bogus.  I
think the reason for "including/excluding connection establishing" is
not that people want to see how long connections took to establish but
how long the substantial work took.  If each client kept
re-establishing new connections the connection time would
be useful, but actually all the connections are established at once at
the beginning.

So FWIW I prefer that the barrier is applied by default


Yep.


(that is, it can be disabled)


On reflection, I'm not sure I see a use case for not running the barrier 
if it is available.


and the progress time starts at the time all clients have been 
established.


Yep, the start time should be set after the connection barrier, and 
possibly before a start barrier to ensure that no transaction has started 
before the start time: although performance measures are expected to be 
slightly false because of how they are measured (measuring takes time), 
from a benchmarking perspective the displayed result should be <= the 
actual performance.
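
A rough sketch of that double-barrier scheme, with POSIX barriers standing in
for whatever pgbench's own thread abstraction would actually use:

    /* once, before the threads start:
     *   pthread_barrier_init(&conn_barrier, NULL, nthreads);
     *   pthread_barrier_init(&start_barrier, NULL, nthreads);
     */

    /* in each thread, after all of its clients are connected: */
    pthread_barrier_wait(&conn_barrier);

    /* a single designated thread records the benchmark start time here,
     * so connection time is excluded and no transaction predates it */

    pthread_barrier_wait(&start_barrier);
    /* ... main transaction loop ... */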


Now, again, if long benchmarks are run, which for a db should more or less 
always be the case, this should not matter much.



starting vacuum...end.

+ time to established 5000 connections: 1323ms


Yep, maybe showing the initial connection time separately.

--
Fabien.




Re: pgbench: option delaying queries till connections establishment?

2020-02-29 Thread Fabien COELHO


Hello Andres,


Therefore I'd like to make pgbench wait till it has established all
connections, before they run queries.



Does anybody else see this as being useful?


Yes, I think that having this behavior available would make sense.


If so, should this be done unconditionally?


Dunno. I should think about it. I'd say probably.

Pgbench is more or less designed to run a long hopefully steady-state 
benchmark, so that the initial connection setup is always negligible. Not 
complying with this hypothesis quite often leads to weird results.



A new option?


Maybe, if not unconditional.

If there is an unconditional barrier, the excluding/including connection 
stuff does not make a lot of sense when not under -C, if it did make any 
sense before…



Included in an existing one somehow?


Which one would you suggest?

Adding a synchronization barrier should be simple enough, I thought about 
it in the past.


However, I'd still be wary that it is no silver bullet: if you start a lot 
of threads compared to the number of available cores, pgbench would 
basically overload the system, and you would experience a lot of waiting 
time which reflects that the client code has not got enough cpu time. 
Basically you would be testing the OS process/thread management 
performance.


On my 4-core laptop, with a do-nothing script (\set i 0):

  sh> pgbench -T 10 -f nope.sql -P 1 -j 10 -c 10
  latency average = 0.000 ms
  latency stddev = 0.049 ms
  tps = 21048841.630291 (including connections establishing)
  tps = 21075139.938887 (excluding connections establishing)

  sh> pgbench -T 10 -f nope.sql -P 1 -j 100 -c 100
  latency average = 0.002 ms
  latency stddev = 0.470 ms
  tps = 23846777.241370 (including connections establishing)
  tps = 24132252.146257 (excluding connections establishing)

Throughput is slightly better, latency average and variance explode 
because each thread is given stretches of cpu time to advance, then waits 
for the next round of cpu time.


--
Fabien.

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-02-29 Thread Alexey Kondratov

On 2020-02-11 19:48, Justin Pryzby wrote:
For your v7 patch, which handles REINDEX to a new tablespace, I have a 
few

minor comments:

+ * the relation will be rebuilt.  If InvalidOid is used, the default

=> should say "currrent", not default ?



Yes, it keeps current index tablespace in that case, thanks.



+++ b/doc/src/sgml/ref/reindex.sgml
+TABLESPACE
...
+class="parameter">new_tablespace


=> I saw you split the description of TABLESPACE from new_tablespace 
based on
comment earlier in the thread, but I suggest that the descriptions for 
these

should be merged, like:

+   
+TABLESPACEnew_tablespace
+
+ 
+  Allow specification of a tablespace where all rebuilt indexes
will be created.
+  Cannot be used with "mapped" relations. If 
SCHEMA,

+  DATABASE or SYSTEM are
specified, then
+  all unsuitable relations will be skipped and a single
WARNING
+  will be generated.
+ 
+
+   



It sounds good to me, but here I just obey the structure, which is used 
all around. Documentation of ALTER TABLE/DATABASE, REINDEX and many 
others describes each literal/parameter in a separate entry, e.g. 
new_tablespace. So I would prefer to keep it as it is for now.




The existing patch is very natural, especially the parts in the 
original patch

handling vacuum full and cluster.  Those were removed to concentrate on
REINDEX, and based on comments that it might be nice if ALTER handled 
CLUSTER
and VACUUM FULL.  On a separate thread, I brought up the idea of ALTER 
using
clustered order.  Tom pointed out some issues with my implementation, 
but

didn't like the idea, either.

So I suggest to re-include the CLUSTER/VAC FULL parts as a separate 
0002 patch,

the same way they were originally implemented.

BTW, I think if "ALTER" were updated to support REINDEX (to allow 
multiple

operations at once), it might be either:
|ALTER INDEX i SET TABLESPACE , REINDEX -- to reindex a single index
on a given tlbspc
or
|ALTER TABLE tbl REINDEX USING INDEX TABLESPACE spc; -- to reindex all
inds on table inds moved to a given tblspc
"USING INDEX TABLESPACE" is already used for ALTER..ADD column/table 
CONSTRAINT.




Yes, I also think that allowing REINDEX/CLUSTER/VACUUM FULL to put the 
resulting relation in a different tablespace is a very natural 
operation. However, I made a couple of attempts to integrate the latter two 
with ALTER TABLE and failed, since it is already complex enough. 
I am still willing to proceed with it, but not sure how soon it will be.


Anyway, new version is attached. It is rebased in order to resolve 
conflicts with a recent fix of REINDEX CONCURRENTLY + temp relations, 
and includes this small comment fix.
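
For reference, with the attached patch the new syntax reads like this in use
(the tablespace names are only examples):

    REINDEX INDEX my_index TABLESPACE fast_ssd;
    REINDEX (VERBOSE) TABLE my_table TABLESPACE fast_ssd;
    -- SCHEMA/DATABASE/SYSTEM skip unsuitable (mapped) relations with a single WARNING
    REINDEX SCHEMA public TABLESPACE fast_ssd;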



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 30 Dec 2019 20:00:37 +0300
Subject: [PATCH v8] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml | 24 +-
 src/backend/catalog/index.c   | 75 --
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 96 ---
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/parser/gram.y | 14 ++--
 src/backend/tcop/utility.c|  6 +-
 src/bin/psql/tab-complete.c   |  6 ++
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h |  6 +-
 src/include/nodes/parsenodes.h|  1 +
 src/test/regress/input/tablespace.source  | 49 
 src/test/regress/output/tablespace.source | 66 
 15 files changed, 323 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index c54a7c420d4..0628c94bb1e 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ]
 
 where option can be one of:
 
@@ -174,6 +174,28 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of th

Re: SLRU statistics

2020-02-29 Thread Tomas Vondra

On Fri, Feb 28, 2020 at 08:19:18PM -0300, Alvaro Herrera wrote:

On 2020-Jan-21, Tomas Vondra wrote:


On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:



> I've not tested the performance impact but perhaps we might want to
> disable these counter by default and controlled by a GUC. And similar
> to buffer statistics it might be better to inline
> pgstat_count_slru_page_xxx function for better performance.

Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
something like track_slru GUC.


I disagree with adding a GUC.  If a performance impact can be measured
let's turn the functions to static inline, as already proposed.  My
guess is that pgstat_count_slru_page_hit() is the only candidate for
that; all the other paths involve I/O or lock acquisition or even WAL
generation, so the impact won't be measurable anyhow.  We removed
track-enabling GUCs years ago.



Did we actually remove track-enabling GUCs? I think we still have

 - track_activities
 - track_counts
 - track_io_timing
 - track_functions

But maybe I'm missing something?

That being said, I'm not sure we need to add a GUC. I'll do some
measurements and we'll see. Maybe the static inline will be enough.


BTW, this comment:
/* update the stats counter of pages found in shared 
buffers */

is not strictly true, because we don't use what we normally call "shared
buffers"  for SLRUs.



Oh, right. Will fix.


Patch applies cleanly.  I suggest to move the page_miss() call until
after SlruRecentlyUsed(), for consistency with the other case.



OK.


I find SlruType pretty odd, and the accompanying "if" list in
pg_stat_get_slru() correspondingly so.  Would it be possible to have
each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
and query that, somehow.  (I don't think we have an array of SlruCtlData
anywhere though, so this might be a useless idea.)



Well, maybe. We could have a system to register SLRUs dynamically, but
the trick here is that having a fixed, predefined number of SLRUs
simplifies serialization in pgstat.c and so on. I don't think the "if"
branches in pg_stat_get_slru() are particularly ugly, but maybe we could
replace the enum with a registry of structs, something like rmgrlist.h.
It seems like an overkill to me, though.
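
(For the record, such a registry would presumably be a macro list in the style
of rmgrlist.h, something like the sketch below; the file name, macro name and
entries are made up:)

    /* hypothetical slrulist.h */
    /* symbol, human-readable name reported by pg_stat_get_slru() */
    PG_STAT_SLRU(SLRU_CLOG,             "clog")
    PG_STAT_SLRU(SLRU_SUBTRANS,         "subtrans")
    PG_STAT_SLRU(SLRU_MULTIXACT_OFFSET, "multixact_offset")
    PG_STAT_SLRU(SLRU_MULTIXACT_MEMBER, "multixact_member")

    /* each consumer defines PG_STAT_SLRU(sym, name) before including the list,
     * e.g. once to build the enum and once to build the name lookup */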

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Yet another fast GiST build (typo)

2020-02-29 Thread Erik Rijkers

On 2020-02-29 13:13, Andrey M. Borodin wrote:

Hi!

On Feb 24, 2020, at 13:50, Andrey M. Borodin 
 wrote:


Hi Thomas!

Thanks for looking into this! I’ll fix your notices asap.


PFA v5.
Thomas, I've used your wording almost exactly with explanation how
point_zorder_internal() works. It has more explanation power than my 
attempts

to compose good comment.


Small typo alert:
In v5-0002-Implement-GiST-build-using-sort-support.patch there is:

+   method is also optional and is used diring fast GiST build.

'diring' should be 'during'





Re: Yet another fast GiST build

2020-02-29 Thread Andrey M. Borodin
Hi!

> On Feb 24, 2020, at 13:50, Andrey M. Borodin  wrote:
> 
> Hi Thomas!
> 
> Thanks for looking into this! I’ll fix your notices asap.

PFA v5.
Thomas, I've used your wording almost exactly, with an explanation of how
point_zorder_internal() works. It has more explanatory power than my attempts
to compose a good comment.

There is one design decision that worries me most:
should we use an opclass function or an index option to provide this sorting
information?
It is needed only during index creation, actually, and having an extra i-class
only for the fast build seems excessive.
I think we can provide both ways and let opclass developers decide?

Thanks!

Best regards, Andrey Borodin.



v5-0001-Add-sort-support-for-point-gist_point_sortsupport.patch
Description: Binary data


v5-0002-Implement-GiST-build-using-sort-support.patch
Description: Binary data


Re: BUG #15858: could not stat file - over 4GB

2020-02-29 Thread Juan José Santamaría Flecha
On Sat, Feb 29, 2020 at 9:40 AM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

> On Sat, Feb 29, 2020 at 12:44 AM Tom Lane  wrote:
>
>>
>> The cfbot thinks this doesn't compile on Windows [1].  Looks like perhaps
>> a missing-#include problem?
>
>
> The define logic for _WIN32_WINNT includes testing of _MSC_VER, and is not
> a proper choice for MSVC 2013 as the cfbot is showing.
>

The cfbot is not happy yet. I will backtrack a bit on the cruft cleanup.

Please find attached a new version addressing this issue.

Regards,

Juan José Santamaría Flecha

>
>>
>


0001-Support-for-large-files-on-Win32-v7.patch
Description: Binary data


Re: proposal \gcsv

2020-02-29 Thread Pavel Stehule
On Sat, Feb 29, 2020 at 11:34, Vik Fearing 
wrote:

> On 29/02/2020 06:43, Pavel Stehule wrote:
> > Hi
> >
> > I would to enhance \g command about variant \gcsv
> >
> > proposed command has same behave like \g, only the result will be every
> > time in csv format.
> >
> > It can helps with writing psql macros wrapping \g command.
> >
> > Options, notes?
>
> But then we would need \ghtml and \glatex etc.  If we want a shortcut
> for setting a one-off format, I would go for \gf or something.
>
> \gf csv
> \gf html
> \gf latex
>

The usability of the html or latex formats in psql is significantly lower than
that of the csv format. There is only one generic format for data - csv.

Regards

Pavel



> -1 on \gcsv
> --
> Vik Fearing
>


Re: proposal \gcsv

2020-02-29 Thread Vik Fearing
On 29/02/2020 06:43, Pavel Stehule wrote:
> Hi
> 
> I would to enhance \g command about variant \gcsv
> 
> proposed command has same behave like \g, only the result will be every
> time in csv format.
> 
> It can helps with writing psql macros wrapping \g command.
> 
> Options, notes?

But then we would need \ghtml and \glatex etc.  If we want a shortcut
for setting a one-off format, I would go for \gf or something.

\gf csv
\gf html
\gf latex

-1 on \gcsv
-- 
Vik Fearing




Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-02-29 Thread Michael Paquier
On Fri, Feb 28, 2020 at 07:23:38PM -0500, Tom Lane wrote:
> Will push that, thanks for looking.

Thanks for the commit.
--
Michael


signature.asc
Description: PGP signature


Re: proposal: schema variables

2020-02-29 Thread Pavel Stehule
On Fri, Feb 28, 2020 at 16:30, Pavel Stehule 
wrote:

>
>
> On Thu, Feb 27, 2020 at 15:37, Pavel Stehule 
> wrote:
>
>>
>> Hi
>>
>>
>>> 3) Any way to define CONSTANTs ?
>>> We already talked a bit about this subject and also Gilles Darold
>>> introduces it in this mailing-list topic but I'd like to insist on it.
>>> I think it would be nice to have a way to say that a variable should not
>>> be changed once defined.
>>> Maybe it's hard to implement and can be implemented later, but I just
>>> want to know if this concern is open.
>>>
>>
>> I played little bit with it and I didn't find any nice solution, but
>> maybe I found the solution. I had ideas about some variants, but almost all
>> time I had a problem with parser's shifts because all potential keywords
>> are not reserved.
>>
>> last variant, but maybe best is using keyword WITH
>>
>> So the syntax can looks like
>>
>> CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT
>> expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ]
>>
>> What do you think about this syntax? It doesn't need any new keyword, and
>> it is easy to enhance.
>>
>> CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT);
>>
>
> After some more thinking, and because in another patch I support the syntax
> CREATE TRANSACTION VARIABLE ..., I changed my opinion and implemented support
> for the syntax CREATE IMMUTABLE VARIABLE to define constants.
>

second try to fix pg_dump
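
For readers catching up on the thread, the syntax now being implemented reads
roughly like this (names and types are only examples):

    CREATE VARIABLE var1 AS int DEFAULT 10;
    CREATE IMMUTABLE VARIABLE var2 AS numeric DEFAULT 3.14;  -- may not be changed once defined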

Regards

Pavel


>
> See attached patch
>
> Regards
>
> Pavel
>
>
>>
>> ?
>>
>> Regards
>>
>> Pavel
>>
>>
>>


schema-variables-20200229.patch.gz
Description: application/gzip


Broken resetting of subplan hash tables

2020-02-29 Thread Andreas Karlsson

Hi,

I looked again at one of the potential issues Ranier Vilela's static 
analysis found, and after looking more at it I still think this one is a 
real bug. But my original patch was incorrect and introduced a use-after-free 
bug.


The code for resetting the hash tables of the SubPlanState node in 
buildSubPlanHash() looks like it can never run, and additionally it 
would be broken if it ever did run. This issue was introduced in 
commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.


As far as I gather the following is true:

1. It sets node->hashtable and node->hashnulls to NULL a few lines 
before checking if they are not NULL, which means the code for resetting 
them can never be reached.


2. But if we changed the code so that the ResetTupleHashTable() calls were 
reachable, we would hit a typo. It resets node->hashtable twice and never 
resets node->hashnulls, which would cause non-obvious bugs.


3. Additionally since the memory context used by the hash tables is 
reset in buildSubPlanHash() if we start resetting hash tables we will 
get a use after free bug.


I have attached a patch which makes sure the code for resetting the hash 
tables is actually run while also fixing the code for resetting them.


Since the current behavior of the code in HEAD is not actually broken 
(it is just an optimization which is never used), this fix does not have to 
be backpatched.


Andreas
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index ff95317879..4d7306ff06 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -494,9 +494,6 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 	 * If it's not necessary to distinguish FALSE and UNKNOWN, then we don't
 	 * need to store subplan output rows that contain NULL.
 	 */
-	MemoryContextReset(node->hashtablecxt);
-	node->hashtable = NULL;
-	node->hashnulls = NULL;
 	node->havehashrows = false;
 	node->havenullrows = false;
 
@@ -533,7 +530,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 		}
 
 		if (node->hashnulls)
-			ResetTupleHashTable(node->hashtable);
+			ResetTupleHashTable(node->hashnulls);
 		else
 			node->hashnulls = BuildTupleHashTableExt(node->parent,
 	 node->descRight,
@@ -549,6 +546,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 	 node->hashtempcxt,
 	 false);
 	}
+	else
+		node->hashnulls = NULL;
 
 	/*
 	 * We are probably in a short-lived expression-evaluation context. Switch


Re: Parallel copy

2020-02-29 Thread Dilip Kumar
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma  wrote:
>
> On Tue, 25 Feb 2020 at 18:00, Tomas Vondra  
> wrote:
> > Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> > so I still think we need to do some measurements first. I'm willing to
> > do that, but (a) I doubt I'll have time for that until after 2020-03,
> > and (b) it'd be good to agree on some set of typical CSV files.
>
> I agree that getting a nice varied dataset would be nice. Including
> things like narrow integer only tables, strings with newlines and
> escapes in them, extremely wide rows.
>
> I tried to capture a quick profile just to see what it looks like.
> Grabbed a random open data set from the web, about 800MB of narrow
> rows CSV [1].
>
> Script:
> CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count 
> text);
> COPY census FROM '.../Data8277.csv' WITH (FORMAT 'csv', HEADER true);
>
> Profile:
> # Samples: 59K of event 'cycles:u'
> # Event count (approx.): 57644269486
> #
> # Overhead  Command   Shared Object   Symbol
> #     ..
> ...
> #
> 18.24%  postgres  postgres[.] CopyReadLine
>  9.23%  postgres  postgres[.] NextCopyFrom
>  8.87%  postgres  postgres[.] NextCopyFromRawFields
>  5.82%  postgres  postgres[.] pg_verify_mbstr_len
>  5.45%  postgres  postgres[.] pg_strtoint32
>  4.16%  postgres  postgres[.] heap_fill_tuple
>  4.03%  postgres  postgres[.] heap_compute_data_size
>  3.83%  postgres  postgres[.] CopyFrom
>  3.78%  postgres  postgres[.] AllocSetAlloc
>  3.53%  postgres  postgres[.] heap_form_tuple
>  2.96%  postgres  postgres[.] InputFunctionCall
>  2.89%  postgres  libc-2.30.so[.] __memmove_avx_unaligned_erms
>  1.82%  postgres  libc-2.30.so[.] __strlen_avx2
>  1.72%  postgres  postgres[.] AllocSetReset
>  1.72%  postgres  postgres[.] RelationPutHeapTuple
>  1.47%  postgres  postgres[.] heap_prepare_insert
>  1.31%  postgres  postgres[.] heap_multi_insert
>  1.25%  postgres  postgres[.] textin
>  1.24%  postgres  postgres[.] int4in
>  1.05%  postgres  postgres[.] tts_buffer_heap_clear
>  0.85%  postgres  postgres[.] pg_any_to_server
>  0.80%  postgres  postgres[.] pg_comp_crc32c_sse42
>  0.77%  postgres  postgres[.] cstring_to_text_with_len
>  0.69%  postgres  postgres[.] AllocSetFree
>  0.60%  postgres  postgres[.] appendBinaryStringInfo
>  0.55%  postgres  postgres[.] 
> tts_buffer_heap_materialize.part.0
>  0.54%  postgres  postgres[.] palloc
>  0.54%  postgres  libc-2.30.so[.] __memmove_avx_unaligned
>  0.51%  postgres  postgres[.] palloc0
>  0.51%  postgres  postgres[.] pg_encoding_max_length
>  0.48%  postgres  postgres[.] enlargeStringInfo
>  0.47%  postgres  postgres[.] ExecStoreVirtualTuple
>  0.45%  postgres  postgres[.] PageAddItemExtended
>
> So that confirms that the parsing is a huge chunk of overhead with
> current splitting into lines being the largest portion. Amdahl's law
> says that splitting into tuples needs to be made fast before
> parallelizing makes any sense.
>

I have run a very simple case on a table with 2 indexes and I can see that a
lot of time is spent in index insertion.  I agree that a good
amount of time is spent in tokenizing, but it is not very large compared to
index insertion.

I have expanded the time spent in the CopyFrom function from my perf
report and pasted it here.  We can see that a lot of time is spent in
ExecInsertIndexTuples (77%).  I agree that we need to further evaluate
how much of that is I/O vs. CPU.  But the point I
want to make is that it is not true for all cases that parsing takes
the maximum amount of time.

   - 99.50% CopyFrom
  - 82.90% CopyMultiInsertInfoFlush
 - 82.85% CopyMultiInsertBufferFlush
+ 77.68% ExecInsertIndexTuples
+ 3.74% table_multi_insert
+ 0.89% ExecClearTuple
  - 12.54% NextCopyFrom
 - 7.70% NextCopyFromRawFields
- 5.72% CopyReadLine
 3.96% CopyReadLineText
   + 1.49% pg_any_to_server
  1.86% CopyReadAttributesCSV
 + 3.68% InputFunctionCall
  + 2.11% ExecMaterializeSlot
  + 0.94% MemoryContextReset

My test:
-- Prepare:
CREATE TABLE t (a int, b int, c varchar);
insert into t select i,i, '' from
generate_series(1,1000) as i;
copy t to '/home/dilipkumar/a.csv'  WITH (FORMAT 'csv', HEADER true);
truncate table t;
create index idx on t(a);
create index idx1 on t(

Re: BUG #15858: could not stat file - over 4GB

2020-02-29 Thread Juan José Santamaría Flecha
On Sat, Feb 29, 2020 at 12:44 AM Tom Lane  wrote:

>
> The cfbot thinks this doesn't compile on Windows [1].  Looks like perhaps
> a missing-#include problem?


The define logic for _WIN32_WINNT includes testing of _MSC_VER, and is not
a proper choice for MSVC 2013 as the cfbot is showing.
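
(For context, the guard in question has this general shape; the exact values
and macro names in the patch may differ:)

    /* pick the lowest Windows API level based on compiler version */
    #if defined(_MSC_VER) && _MSC_VER >= 1900          /* VS 2015 and newer */
    #define MIN_WINNT 0x0600
    #else                                              /* includes MSVC 2013 (_MSC_VER 1800) */
    #define MIN_WINNT 0x0501
    #endif

    #if !defined(_WIN32_WINNT) || _WIN32_WINNT < MIN_WINNT
    #undef _WIN32_WINNT
    #define _WIN32_WINNT MIN_WINNT
    #endif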

Please find attached a new version addressing this issue.

Regards,

Juan José Santamaría Flecha

>
>


0001-Support-for-large-files-on-Win32-v6.patch
Description: Binary data