Re: [HACKERS] logical decoding of two-phase transactions

2021-01-08 Thread Amit Kapila
On Tue, Jan 5, 2021 at 4:26 PM Amit Kapila  wrote:
>
> On Tue, Jan 5, 2021 at 2:11 PM Ajin Cherian  wrote:
> >
> >
> > I've addressed the above comments and the patch is attached. I've
> > called it v36-0007.
> >
>
> Thanks, I have pushed this after minor wordsmithing.
>

The test case is failing on one of the build farm machines. See email
from Tom Lane [1]. The symptom clearly shows that we are decoding
empty xacts which can happen due to background activity by autovacuum.
I think we need a fix similar to what we have done in
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=82a0ba7707e010a29f5fe1a0020d963c82b8f1cb.

I'll try to reproduce and provide a fix for this later today or tomorrow.


[1] - https://www.postgresql.org/message-id/363512.1610171267%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-01-08 Thread Amit Kapila
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith  wrote:
>
> On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie  wrote:
> >
>
> > 3.
> > +   /*
> > +    * To build a slot name for the sync work, we are limited to
> > +    * NAMEDATALEN - 1 characters.
> > +    *
> > +    * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
> > +    * (It's actually the NAMEDATALEN on the remote that matters, but this
> > +    * scheme will also work reasonably if that is different.)
> > +    */
> > +   StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small");   /* for sanity */
> > +
> > +   syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
> >
> > The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
> > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which seems not
> > NAMEDATALEN - 1.
> > Should we change the comment here?
> >
>
> The comment wording is a remnant from older code which had a
> differently formatted slot name.
> I think the comment is still valid, albeit maybe unnecessary, since in
> the current code the tablesync slot name length is fixed. But I left the
> older comment here as a safety reminder in case some future change wants
> to modify the slot name. What do you think?
>

I find it quite confusing. The comments should reflect the latest
code. You can probably say in some form that the length of slotname
shouldn't exceed NAMEDATALEN because of remote node constraints on
slot name length. Also, probably the StaticAssert on NAMEDATALEN is
not required.
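
One possible rewording along those lines (just a sketch, not the actual patch text):

	/*
	 * To build a slot name for the sync work, we restrict it to
	 * NAMEDATALEN - 1 characters, since that is the limit for slot names
	 * on the remote node.  The name pg_%u_sync_%u is at most
	 * 3 + 10 + 6 + 10 = 29 characters plus the terminator, so it always
	 * fits.
	 */
	syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);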

1.
+   
+Additional table synchronization slots are normally transient, created
+internally and dropped automatically when they are no longer needed.
+These table synchronization slots have generated names:
+   pg_%u_sync_%u (parameters: Subscription oid, Table relid)
+   

The last line seems too long. I think we are not strict about the 80-char
limit in docs, but it is good to stay close to it; this one appears quite
long.

2.
+ /*
+ * Cleanup any remaining tablesync resources.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;

I have already mentioned previously that we should not use this new style
of code (using { to localize the scope of variables). I don't know about
others, but I find such code difficult to read. You might want to consider
moving this whole block to a separate function.

3.
/*
+ * XXX - Should optimize this to avoid multiple
+ * connect/disconnect.
+ */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, );

I think it is better to avoid multiple connect/disconnect here. In
this same function, we have already connected to the publisher, so we
should be able to use the same connection.

4.
process_syncing_tables_for_sync()
{
..
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+                                                MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
..
}

Both here and in DropSubscription(), it seems we are using
PG_TRY..PG_FINALLY just to free the memory, even though
ReplicationSlotDropAtPubNode already has try..finally. Can we rearrange
the code to move the allocation of syncslotname inside
ReplicationSlotDropAtPubNode and avoid the additional try..finally? BTW, if
the usage of try..finally here is only to free the memory, I am not
sure it is required, because I think we will anyway reset the memory
context where this memory is allocated as part of error handling.
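
A rough sketch of that rearrangement (the function shape and body below are
illustrative, keeping the routine's existing try..finally; this is not the
actual patch):

static void
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, Oid suboid, Oid relid)
{
	char	   *syncslotname = ReplicationSlotNameForTablesync(suboid, relid);

	PG_TRY();
	{
		elog(DEBUG1, "dropping the tablesync slot \"%s\"", syncslotname);

		/* ... issue DROP_REPLICATION_SLOT for syncslotname over wrconn ... */
	}
	PG_FINALLY();
	{
		pfree(syncslotname);
	}
	PG_END_TRY();
}

With that, callers such as process_syncing_tables_for_sync() and
DropSubscription() would not need their own PG_TRY blocks just to free the
name.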

5.
 #define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
  * NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
 #define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
  * apply (sublsn set) */

I am not very happy with the new state name SUBREL_STATE_TCOPYDONE, as
it is quite different from the other adjoining state names and does not
fit well with the code. How about SUBREL_STATE_ENDCOPY 'e' or
SUBREL_STATE_FINISHEDCOPY 'f'?
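
For instance, the latter suggestion would read like this (a sketch following
the adjoining definitions):

#define SUBREL_STATE_FINISHEDCOPY 'f'	/* tablesync copy phase is completed
										 * (sublsn NULL) */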

-- 
With Regards,
Amit Kapila.




Re: logical replication worker accesses catalogs in error context callback

2021-01-08 Thread Bharath Rupireddy
On Thu, Jan 7, 2021 at 5:47 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada  wrote:
> >
> > On Wed, Jan 6, 2021 at 11:02 AM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > Due to a debug ereport I just noticed that worker.c's
> > > slot_store_error_callback is doing something quite dangerous:
> > >
> > > static void
> > > slot_store_error_callback(void *arg)
> > > {
> > > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
> > > LogicalRepRelMapEntry *rel;
> > > char   *remotetypname;
> > > Oid remotetypoid,
> > > localtypoid;
> > >
> > > /* Nothing to do if remote attribute number is not set */
> > > if (errarg->remote_attnum < 0)
> > > return;
> > >
> > > rel = errarg->rel;
> > > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];
> > >
> > > /* Fetch remote type name from the LogicalRepTypMap cache */
> > > remotetypname = logicalrep_typmap_gettypname(remotetypoid);
> > >
> > > /* Fetch local type OID from the local sys cache */
> > > localtypoid = get_atttype(rel->localreloid, errarg->local_attnum 
> > > + 1);
> > >
> > > errcontext("processing remote data for replication target 
> > > relation \"%s.%s\" column \"%s\", "
> > >"remote type %s, local type %s",
> > >rel->remoterel.nspname, rel->remoterel.relname,
> > >rel->remoterel.attnames[errarg->remote_attnum],
> > >remotetypname,
> > >format_type_be(localtypoid));
> > > }
> > >
> > >
> > > that's not code that can run in an error context callback. It's
> > > theoretically possible (but unlikely) that
> > > logicalrep_typmap_gettypname() is safe to run in an error context
> > > callback. But get_atttype() definitely isn't.
> > >
> > > get_atttype() may do catalog accesses. That definitely can't happen
> > > inside an error context callback - the entire transaction might be
> > > borked at this point!
> >
> > You're right. Perhaps calling format_type_be() is also dangerous,
> > since it does catalog access. We should have added the local type
> > names to SlotErrCallbackArg so we avoid catalog access in the error
> > context.
> >
> > I'll try to fix this.
>
> Attached the patch that fixes this issue.
>
> Since logicalrep_typmap_gettypname() could search the syscache by
> calling format_type_be(), I stored both the local and remote type names
> in SlotErrCallbackArg so that we can just set the names in the error
> callback without a syscache lookup.
>
> Please review it.

Patch looks good, except one minor comment - can we store
rel->remoterel.atttyps[remoteattnum] into a local variable and use it
in logicalrep_typmap_gettypname, just to save on indentation?
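
For reference, a minimal sketch of the pattern described above: capture the
type names before processing the row, so the callback itself does no catalog
or syscache access (the *_typename field names are illustrative, not
necessarily the exact names used in the patch):

typedef struct SlotErrCallbackArg
{
	LogicalRepRelMapEntry *rel;
	int			remote_attnum;
	char	   *remote_typename;	/* filled in before row processing */
	char	   *local_typename;		/* likewise */
} SlotErrCallbackArg;

static void
slot_store_error_callback(void *arg)
{
	SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
	LogicalRepRelMapEntry *rel;

	/* Nothing to do if remote attribute number is not set */
	if (errarg->remote_attnum < 0)
		return;

	rel = errarg->rel;

	/* Use only data captured earlier; no catalog lookups here. */
	errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
			   "remote type %s, local type %s",
			   rel->remoterel.nspname, rel->remoterel.relname,
			   rel->remoterel.attnames[errarg->remote_attnum],
			   errarg->remote_typename,
			   errarg->local_typename);
}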

I quickly searched the places where error callbacks are being used, and I
think we need a similar kind of fix for conversion_error_callback in
postgres_fdw.c, because get_attname and get_rel_name are being used
there, both of which do catalog lookups. ISTM that all the other error
callbacks are fine in the sense that they are not doing system catalog
lookups.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Simple progress reporting for COPY command

2021-01-08 Thread Bharath Rupireddy
On Fri, Jan 8, 2021 at 7:00 PM Matthias van de Meent
 wrote:
> On Thu, 7 Jan 2021 at 23:00, Josef Šimánek  wrote:
> >
> > On Thu, Jan 7, 2021 at 22:37, Tomas Vondra
> >  wrote:
> > >
> > > I'm not particularly attached to the "lines" naming, it just seemed OK
> > > to me. So if there's consensus to rename this somehow, I'm OK with it.
> >
> > The problem I see here is that it depends on the "way" of COPY. If
> > you're copying from a CSV file to a table, those are actually lines
> > (since 1 line = 1 tuple). But copying from the DB to a file is copying
> > tuples (though 1 tuple = 1 file line). "Line" works better here for me
> > personally.
> >
> > Once I fix the problem with triggers (and also other cases, if any are
> > found), I think we can consider it lines. It will represent the number
> > of lines processed from the file on COPY FROM and the number of lines
> > written to the file on COPY TO (at least in CSV format). I'm not sure
> > how BINARY format works; I'll check.
>
> Counterexample that 1 tuple need not be 1 line, in csv/binary:

Yes in binary format file, 1 tuple is definitely not 1 line.

> /*
>  * create a table with one tuple containing 1 text field, which consists of
>  * 10 newline characters.
>  * If you want windows-style lines, replace '\x0A' (\n) with '\x0D0A' (\r\n).
>  */
> # CREATE TABLE ttab (val) AS
>   SELECT * FROM (values (
> repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text
>   )) as v;
>
> # -- indeed, one unix-style line, according to $ wc -l copy.txt
> # COPY ttab TO 'copy.txt' (format text);
> COPY 1
> # TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text);
> COPY 1
>
> # -- 11 lines
> # COPY ttab TO 'copy.csv' (format csv);
> COPY 1
> # TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv);
> COPY 1
>
> # -- 13 lines
> # COPY ttab TO 'copy.bin' (format binary);
> COPY 1
> # TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary);
> COPY 1
>
> All of the above copy statements would only report 'lines_processed = 1'
> in the progress reporting, while the csv/binary line counts are definitively
> inconsistent with what the progress reporting shows, because progress
> reporting counts tuples / table rows, not the number of lines in the
> external file.

I agree with that. +1 to changing the name of the 'lines_processed' column
to 'tuples_processed'. The meaning of 'tuples_processed' would then be:
for COPY FROM, the number of tuples that are written to the
target table; for COPY TO, the number of tuples from the source
table or source plan that are written out to the file/stdout.

IMHO, it's good to have at least the following info: the type of
command (COPY TO or COPY FROM), the format (CSV, BINARY, TEXT), and where
the input comes from or the output goes to (FILE, STDIN, STDOUT or
PROGRAM). Otherwise users will have to join the pg_stat_progress_copy view
with other views to get the type of copy command or the entire query,
and it is sometimes cumbersome to figure out the other view's name
and write join queries.
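
As a rough illustration of how such extra columns could be fed on the backend
side (the parameter indexes, macro names, and helper below are hypothetical,
not taken from the actual patch), the COPY code could report them as
additional progress parameters:

/* Hypothetical parameter slots; real index values would differ. */
#define PROGRESS_COPY_COMMAND	3	/* 1 = COPY FROM, 2 = COPY TO */
#define PROGRESS_COPY_IO_TARGET	4	/* 1 = FILE, 2 = PROGRAM, 3 = STDIO */

static void
report_copy_kind(bool is_from, bool is_program, bool uses_file)
{
	pgstat_progress_update_param(PROGRESS_COPY_COMMAND, is_from ? 1 : 2);
	pgstat_progress_update_param(PROGRESS_COPY_IO_TARGET,
								 is_program ? 2 : (uses_file ? 1 : 3));
}

A view definition could then decode these integers into readable labels, the
way the other pg_stat_progress_* views do.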

Another point related to documentation: I see that the
pg_stat_progress_copy view details are added to monitoring.sgml, but I
didn't find any mention of it in copy.sgml. For instance, we have
pg_stat_progress_basebackup documented in monitoring.sgml and
a mention of it, with a link, in pg_basebackup.sgml. Usually, users
will look at the COPY documentation to find any information
related to it, so it would be better to mention the view there as well.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-01-08 Thread japin


On Sat, 09 Jan 2021 at 09:38, Bharath Rupireddy wrote:
> On Fri, Jan 8, 2021 at 9:50 PM japin  wrote:
>> > Attaching v3 patches, please consider these for further review.
>> >
>>
>> I find that both the declaration and definition of 
>> match_matview_with_new_data()
>> have a tab between type and variable.  We can use pgindent to fix it.
>> What do you think?
>>
>>
>> static void
>> match_matview_with_new_data(RefreshMatViewStmt *stmt, Relation  matviewRel,
>>   ^
>> Oid matviewOid, Oid OIDNewHeap, Oid relowner,
>> int save_sec_context, char relpersistence,
>> uint64  processed)
>>   ^
>
> I ran pgindent on 0001 patch to fix the above. 0002 patch has no
> changes. If I'm correct, pgindent will be run periodically on master.
>

Thanks for pointing that out. I didn't know that before.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Improper use about DatumGetInt32

2021-01-08 Thread Michael Paquier
On Fri, Jan 08, 2021 at 04:54:47PM +0100, Peter Eisentraut wrote:
> Updated patch that does that.

Thanks.  Looks sane seen from here.

+/* LCOV_EXCL_START */
Does it really make sense to add those markers here?  It seems to me
that we would ignore any new coverage if regression tests based on
older versions are added (we may really want to have such tests for
more in-core extensions to be able to verify the portability of an
extension, but that's not the job of this patch of course).

-   elog(ERROR, "block 0 is a meta page");
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("block 0 is a meta page")));
[...]
+errmsg("block number %llu is out of range for relation \"%s\"",
This does not follow the usual style for error reports, which should not
be written as full sentences.  Maybe something like "invalid block
number %u referring to meta page" and "block number out of range for
relation %s: %llu"?
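
Spelled out, those suggested wordings would look roughly like this (the
variable names are only illustrative):

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("invalid block number %u referring to meta page", blkno)));

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("block number out of range for relation \"%s\": %llu",
					RelationGetRelationName(rel), (unsigned long long) blkno)));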
--
Michael


signature.asc
Description: PGP signature


Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-01-08 Thread Bharath Rupireddy
On Fri, Jan 8, 2021 at 9:50 PM japin  wrote:
> > Attaching v3 patches, please consider these for further review.
> >
>
> I find that both the declaration and definition of 
> match_matview_with_new_data()
> have a tab between type and variable.  We can use pgindent to fix it.
> What do you think?
>
>
> static void
> match_matview_with_new_data(RefreshMatViewStmt *stmt, Relation  matviewRel,
>   ^
> Oid matviewOid, Oid OIDNewHeap, Oid relowner,
> int save_sec_context, char relpersistence,
> uint64  processed)
>   ^

I ran pgindent on 0001 patch to fix the above. 0002 patch has no
changes. If I'm correct, pgindent will be run periodically on master.

Attaching v4 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Rearrange-Refresh-Mat-View-Code.patch
Description: Binary data


v4-0002-EXPLAIN-EXPLAIN-ANALYZE-REFRESH-MATERIALIZED-VIEW.patch
Description: Binary data


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-08 Thread Bharath Rupireddy
On Fri, Jan 8, 2021 at 9:55 AM Bharath Rupireddy
 wrote:
> I will make the changes and post a new patch set soon.

Attaching v9 patch set that has addressed the review comments on the
disconnect function returning setof records, documentation changes,
and postgres_fdw--1.0-1.1.sql changes.

Please consider the v9 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v9-0001-postgres_fdw-function-to-discard-cached-connectio.patch
Description: Binary data


v9-0002-postgres_fdw-add-keep_connections-GUC-to-not-cach.patch
Description: Binary data


v9-0003-postgres_fdw-server-level-option-keep_connection-.patch
Description: Binary data


Re: Add table access method as an option to pgbench

2021-01-08 Thread David Zhang

tablespace is an extraneous word ?


Thanks a lot for pointing this out. I will fix it in the next patch once all
the issues are clarified.


On Sun, Dec 27, 2020 at 09:14:53AM -0400, Fabien COELHO wrote:

src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE 
HANDLER heap_tableam_handler;
...
src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2;


Do you suggest adding a new positive test case using the table access method 
*heap2* created with the existing heap_tableam_handler?


If you found pre-existing inconsistencies/errors/deficiencies, I'd suggest to
fix them in a separate patch.  Typically those are small, and you can make them
0001 patch.  Errors might be worth backpatching, so in addition to making the
"feature" patches small, it's good if they're separate.



Like writing NAME vs TABLESPACE.  Or escape vs escapes.

I will provide a separate small patch to fix the *tablespace* typo mentioned 
above. Can you help clarify which releases or branches need to be 
back-patched?


Or maybe using SET default_tablespace instead of modifying the CREATE sql.
That'd need to be done separately for indexes, and RESET after creating the
tables, to avoid accidentally affecting indexes, too.

Why should it not affect indexes?

I rarely use pgbench, and probably never looked at its source before, but I saw
that it has a separate --tablespace and --index-tablespace, and that
--tablespace is *not* the default for indexes.

So if we changed it to use SET default_tablespace instead of amending the DDL
sql, we'd need to make sure the SET applied only to the intended CREATE
command, and not all following create commands.  In the case that
--index-tablespace is not specified, it would be buggy to do this:

SET default_tablespace=foo;
CREATE TABLE ...
CREATE INDEX ...
CREATE TABLE ...
CREATE INDEX ...
...


If this `tablespace` issue also applies to `table-access-method`, then yes, I 
agree it could be buggy too. But the purpose of this patch is to provide an 
option for end users to test a newly created table access method. If the *new* 
table access method hasn't fully implemented all the interfaces yet, then no 
matter whether the end user uses the option provided by this patch or the GUC 
parameter, the results will be the same.

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca


Re: PoC/WIP: Extended statistics on expressions

2021-01-08 Thread Tomas Vondra
On 1/8/21 3:35 AM, Justin Pryzby wrote:
> On Fri, Jan 08, 2021 at 01:57:29AM +0100, Tomas Vondra wrote:
>> Attached is a patch fixing most of the issues. There are a couple
>> exceptions:
> 
> In the docs:
> 
> ...
>

Thanks! Checking the docs and comments is tedious work; I appreciate
you going through all that. I'll fix those in the next version.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: list of extended statistics on psql

2021-01-08 Thread Tomas Vondra


On 1/8/21 1:14 AM, Tomas Vondra wrote:
> 
> 
> On 1/8/21 12:52 AM, Tatsuro Yamada wrote:
>> Hi,
>>
>> On 2021/01/08 0:56, Tomas Vondra wrote:
>>> On 1/7/21 3:47 PM, Alvaro Herrera wrote:
 On 2021-Jan-07, Tomas Vondra wrote:

> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:

>> I overlooked the check for MCV in the logic building query
>> because I created the patch as a new feature on PG14.
>> I'm not sure whether we should do back patch or not. However, I'll
>> add the check on the next patch because it is useful if you decide to
>> do the back patch on PG10, 11, 12, and 13.
>
> BTW perhaps a quick look at the other \d commands would show if
> there are
> precedents. I didn't have time for that.

 Yes, we do promise that new psql works with older servers.

>>>
>>> Yeah, makes sense. That means we need add the check for 12 / MCV.
>>
>>
>> Ah, I got it.
>> I fixed the patch to work with older servers by adding version checks.
>> And I tested the \dX command on older servers (PG10 - 13).
>> The results look fine.
>>
>> 0001:
>>   Added check code to handle pre-PG12, which has neither MCV nor
>>    pg_statistic_ext_data.
>> 0002:
>>   This patch is the same as the previous patch (not changed).
>>
>> Please find the attached files.
>>
> 
> OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to
> squash the patches into a single commit.
> 

Attached is a patch I plan to commit - 0001 is the last submitted
version with a couple minor tweaks, mostly in docs/comments, and small
rework of branching to be more like the other functions in describe.c.

While working on that, I realized that 'defined' might be a bit
ambiguous; I initially thought it meant 'NOT NULL' (which it does not).
I propose changing it to 'requested' instead. Tatsuro, do you agree, or
do you think 'defined' is better?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 3d2f4ef2ecba9fd7987df665237add6fc4ec03c1 Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada 
Date: Thu, 7 Jan 2021 14:28:20 +0900
Subject: [PATCH 1/2] psql \dX: list extended statistics objects

The new command lists extended statistics objects, possibly with their
sizes. All past releases with extended statistics are supported.

Author: Tatsuro Yamada
Reviewed-by: Julien Rouhaud, Alvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1
---
 doc/src/sgml/ref/psql-ref.sgml  |  14 +++
 src/bin/psql/command.c  |   3 +
 src/bin/psql/describe.c | 150 
 src/bin/psql/describe.h |   3 +
 src/bin/psql/help.c |   1 +
 src/bin/psql/tab-complete.c |   4 +-
 src/test/regress/expected/stats_ext.out |  94 +++
 src/test/regress/sql/stats_ext.sql  |  31 +
 8 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..d01acc92b8 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1918,6 +1918,20 @@ testdb=
 
 
   
+  
+  
+\dX [ pattern ]
+
+
+Lists extended statistics.
+If pattern
+is specified, only those extended statistics whose names match the
+pattern are listed.
+If + is appended to the command name, each extended
+statistics is listed with its size.
+
+
+  
 
   
 \dy[+] [ pattern ]
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 303e7c3ad8..c5ebc1c3f4 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -928,6 +928,9 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 else
 	success = listExtensions(pattern);
 break;
+			case 'X':			/* Extended Statistics */
+success = listExtendedStats(pattern, show_verbose);
+break;
 			case 'y':			/* Event Triggers */
 success = listEventTriggers(pattern, show_verbose);
 break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index caf97563f4..46f54199fb 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4392,6 +4392,156 @@ listEventTriggers(const char *pattern, bool verbose)
 	return true;
 }
 
+/*
+ * \dX
+ *
+ * Describes extended statistics.
+ */
+bool
+listExtendedStats(const char *pattern, bool verbose)
+{
+	PQExpBufferData buf;
+	PGresult   *res;
+	printQueryOpt myopt = pset.popt;
+
+	if (pset.sversion < 10)
+	{
+		char		sverbuf[32];
+
+		pg_log_error("The server (version %s) does not support extended statistics.",
+	 formatPGVersionNumber(pset.sversion, false,
+		   sverbuf, sizeof(sverbuf)));
+		return true;
+	}
+
+	initPQExpBuffer();
+	printfPQExpBuffer(,
+	  "SELECT \n"
+	  "es.stxnamespace::pg_catalog.regnamespace::text AS 

Re: Enhance traceability of wal_level changes for backup management

2021-01-08 Thread Stephen Frost
Greetings,

* osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> On Thursday, January 7, 2021 2:40 AM Stephen Frost  wrote:
> > * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> > > You said
> > > > The use case I imagined is that the user temporarily changes
> > > > wal_level to 'none' from 'replica' or 'logical' to speed up loading
> > > > and changes back to the normal. In this case, the backups taken
> > > > before the wal_level change cannot be used to restore the database
> > > > to the point after the wal_level change. So I think backup
> > > > management tools would want to recognize the time or LSN when/where
> > > > wal_level is changed to ‘none’ in order to do some actions such as
> > > > invalidating older backups, re-calculating backup redundancy etc.
> > > > Actually the same is true when the user changes to ‘minimal’. The
> > > > tools would need to recognize the time or LSN in this case too.
> > > > Since temporarily changing wal_level has been an uncommon use case
> > > > some tools perhaps are not aware of that yet. But since introducing
> > > > wal_level = ’none’ could make the change common, I think we need to
> > provide a way for the tools.
> > 
> > I continue to be against the idea of introducing another wal_level.  If 
> > there's
> > additional things we can do to reduce WAL traffic while we continue to use 
> > it to
> > keep the system in a consistent state then we should implement those for the
> > 'minimal' case and be done with it.

> The new wal_level=none patch clearly achieves something that cannot be
> done or implemented together with wal_level='minimal'.

I disagree.

> Did you have a look at the performance evaluation result that I conducted in
> [1]?
> It proved that data loading of 'none' is much faster than that of 'minimal'.

That test claims to have generated 10G worth of WAL, which makes it seem
pretty likely that you didn't use UNLOGGED tables, and didn't create the
table in the same transaction as you performed the data loading for that
table, so, naturally, you got the full amount of WAL.

> > Adding another wal_level is just going to be confusing and increase 
> > complexity,
> > and the chances that someone will end up in a bad state.
> Even if we committed the other idea,
> that is, "ALTER TABLE tbl SET UNLOGGED/LOGGED without copying relation data",
> the complexity of taking a full backup before bulk data loading wouldn't
> change, and if any accident happened while WAL logging was skipped for a
> specific table with that improvement, the user would need to start from the
> backup again. This looks the same to me.

Yes, there is still the issue that running with wal_level = minimal for
a period of time means that you can't perform PITR from before the
change to minimal through to the time after it's been changed back to
something higher than minimal.  That's no different, I agree.  That
isn't an argument to introduce another WAL level though.

> Additionally, the patch itself in that thread is big and more complicated.

This isn't a reason to avoid introducing another wal_level.

> Is the complexity you meant the wal_level's impact on backup management tools,
> or something else?

The complexity of having another wal_level and having to document it and
explain how it behaves to users, particularly when, effectively, it
shouldn't actually be any different from wal_level = minimal, assuming
we've found and implemented the interesting optimizations regarding
wal_level = minimal.

> > > I wondered, couldn't backup management tools utilize the information
> > > in the backup that is needed to be made when wal_level is changed to 
> > > "none"
> > for example ?
> > 
> > What information is that, exactly?  If there's a way to detect that the 
> > wal_level
> > has been flipped to 'minimal' and then back to 'replica', other than 
> > scanning the
> > WAL, we'd certainly like to hear of it, so we can implement logic in 
> > pgbackrest
> > to detect that happening.  I'll admit that I've not gone hunting for such 
> > of late,
> > but I don't recall seeing anything that would change that either.
> Excuse me for the confusion.
> I was thinking of the control file in the backup as that information.
> 
> I'm not familiar with the internals of those backup management tools,
> but do they monitor the control file and its values on the running server at
> short intervals?
> And, if they don't, and if we want an accurate time or LSN that indicates
> the wal_level change,
> I thought we could pick up the exact information from the control file of the
> cluster directory or its backup
> while the server is stopped (because changing wal_level requires a server stop).
> That's all. Sorry for the noise.

If it's in the control file then pg_controldata should be able to pull
it out, and all of the backup tools should be able to manage that, even
if they can't read it directly like tools which are also written in C,
so having it in the control file seems alright to 

Re: Key management with tests

2021-01-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Jan  8, 2021 at 03:33:44PM -0500, Stephen Frost wrote:
> > > Anyway, I think we need to figure out how to trim.  The first part would
> > > be to figure out whether we need 128 _and_ 256-bit tests, and then see
> > > what items are really useful.  Stephen, do you have any ideas on that?
> > > We currently have 10296 tests, and I think we could get away with 100.
> > 
> > Yeah, it's probably still too much, but I don't have any particularly
> > justifiable suggestions as to exactly what we should remove or what we
> > should keep.
> > 
> > Perhaps it'd make sense to try and cover the cases that are more likely
> > to be issues between our wrapper functions and OpenSSL, and not stress
> > too much about constantly testing cases that should really be up to
> > OpenSSL.  As such, I'd propose:
> > 
> > - Add back in some 192-bit tests, so we cover all three bit lengths.
> > - Add back in some additional authenticated test cases, just to make
> >   sure that, until/unless we implement support, the test code properly
> >   skips over those.
> > - Keep tests for various length plaintext/ciphertext (including 0-byte
> >   cases, so we make sure those work, since they really should).
> > - Keep at least one test for each length of tag that's included in the
> >   test suite.
> 
> Makes sense.  I did a simplistic trim-down to 90 tests but it still was
> 40% of the patch;  attached.  The hex strings are very long.

I don't think we actually need to stress over the size of the test data
relative to the size of the patch- it's not like it's all that much perl
code.  I can appreciate that we don't want to add megabytes worth of
test data to the git repo though.

> > I'm not sure how many tests we'd end up with from that, but my swag /
> > gut feeling is that it'd probably be on the order of 100ish and a small
> > enough set that it won't dwarf the rest of the patch.
> > 
> > Would be nice if we had a way for some buildfarm animal or something to
> > pull in the entire suite and test it, imv..  If anyone wants to
> > volunteer, I'd be happy to explain how to make that happen (it's not
> > hard though- download/unzip the files, drop them in the directory,
> > update the test script to add all the files into the array).
> 
> Yes, do we have a place to store more comprehensive tests outside of our
> git tree?   Has this been done before?

Not that I'm aware of.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: WIP: System Versioned Temporal Table

2021-01-08 Thread Ryan Lambert
On Fri, Jan 8, 2021 at 11:38 AM Simon Riggs 
wrote:

> On Fri, Jan 8, 2021 at 4:50 PM Ryan Lambert 
> wrote:
> >
> > On Fri, Jan 8, 2021 at 5:34 AM Simon Riggs 
> wrote:
> >>
> >> On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs <
> simon.ri...@enterprisedb.com> wrote:
> >> >
> >> > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs <
> simon.ri...@enterprisedb.com> wrote:
> >> >
> >> > > I've minimally rebased the patch to current head so that it compiles
> >> > > and passes current make check.
> >> >
> >> > Full version attached
> >>
> >> New version attached with improved error messages, some additional
> >> docs and a review of tests.
> >>
> >
> > The v10 patch fails to make on the current master branch (15b824da).
> Error:
>
> Updated v11 with additional docs and some rewording of messages/tests
> to use "system versioning" correctly.
>
> No changes on the points previously raised.
>
> --
> Simon Riggs    http://www.EnterpriseDB.com/



Thank you!  The v11 applies and installs.  I tried a simple test;
unfortunately, it appears the versioning is not working. The initial value
is not preserved through an update, and a new row does not appear to be
created.

CREATE TABLE t
(
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
v BIGINT NOT NULL
)
WITH SYSTEM VERSIONING
;

Verify start/end time columns created.

t=# \d t
                                       Table "public.t"
  Column   |           Type           | Collation | Nullable |              Default
-----------+--------------------------+-----------+----------+-----------------------------------
 id        | bigint                   |           | not null | generated by default as identity
 v         | bigint                   |           | not null |
 StartTime | timestamp with time zone |           | not null | generated always as row start
 EndTime   | timestamp with time zone |           | not null | generated always as row end
Indexes:
    "t_pkey" PRIMARY KEY, btree (id, "EndTime")


Add a row and check that the timestamps are set as expected.


INSERT INTO t (v) VALUES (1);

SELECT * FROM t;
 id | v |           StartTime           | EndTime
----+---+-------------------------------+----------
  1 | 1 | 2021-01-08 20:56:20.848097+00 | infinity

Update the row.

UPDATE t SET v = -1;

The value of v was updated but StartTime is the same.


SELECT * FROM t;
 id | v  |           StartTime           | EndTime
----+----+-------------------------------+----------
  1 | -1 | 2021-01-08 20:56:20.848097+00 | infinity


Querying the table for all versions only returns the single updated row (v
= -1) with the original row StartTime.  The original value has disappeared
entirely, it seems.

SELECT * FROM t
FOR SYSTEM_TIME FROM '-infinity' TO 'infinity';


I also created a non-versioned table and later added the columns using
ALTER TABLE and encountered the same behavior.


Ryan Lambert


Re: Key management with tests

2021-01-08 Thread Bruce Momjian
On Fri, Jan  8, 2021 at 03:34:23PM -0500, Stephen Frost wrote:
> > All the tests pass now.  The current src/test directory is 19MB, and
> > adding these tests takes it to 23MB, or a 20% increase.  That seems like
> > a lot.  It is testing 128-bit and 256-bit keys --- should we do fewer
> > tests, or just test 256, or use gzip to compress the tests by 50%? 
> > (Does every platform have gzip?)
> 
> Thanks a lot for working on this and figuring out what the issue was and
> fixing it!  That's great that we got all those cases passing for you
> too.

Yes, I was relieved.  The pattern of when zero-length strings fail in
which modes is still very odd, but at least it reports an error, so it
isn't returning incorrect data.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Key management with tests

2021-01-08 Thread Bruce Momjian
On Fri, Jan  8, 2021 at 03:33:44PM -0500, Stephen Frost wrote:
> > No, I don't think so.  Stephen imported the entire NIST test suite.  It
> > was so comprehensive, it detected several OpenSSL bugs for zero-length
> > strings, which I already reported, but we would never be encrypting
> > zero-length strings, so there wasn't a lot of value to it.
> 
> I ran the entire test suite locally to ensure everything worked, but I
> didn't actually include all of it in the PR which you merged- I had
> already reduced it quite a bit by removing all 'additional
> authenticated data' test cases (which the tests will automatically skip
> and which we haven't implemented support for in the common library
> wrappers) and by removing the 192-bit cases.  This reduced the overall
> test set by about 2/3rd's or so, as I recall.

Wow, so that was reduced!

> > Anyway, I think we need to figure out how to trim.  The first part would
> > be to figure out whether we need 128 _and_ 256-bit tests, and then see
> > what items are really useful.  Stephen, do you have any ideas on that?
> > We currently have 10296 tests, and I think we could get away with 100.
> 
> Yeah, it's probably still too much, but I don't have any particularly
> justifiable suggestions as to exactly what we should remove or what we
> should keep.
> 
> Perhaps it'd make sense to try and cover the cases that are more likely
> to be issues between our wrapper functions and OpenSSL, and not stress
> too much about constantly testing cases that should really be up to
> OpenSSL.  As such, I'd propose:
> 
> - Add back in some 192-bit tests, so we cover all three bit lengths.
> - Add back in some additional authenticated test cases, just to make
>   sure that, until/unless we implement support, the test code properly
>   skips over those.
> - Keep tests for various length plaintext/ciphertext (including 0-byte
>   cases, so we make sure those work, since they really should).
> - Keep at least one test for each length of tag that's included in the
>   test suite.

Makes sense.  I did a simplistic trim-down to 90 tests but it still was
40% of the patch;  attached.  The hex strings are very long.

> I'm not sure how many tests we'd end up with from that, but my swag /
> gut feeling is that it'd probably be on the order of 100ish and a small
> enough set that it won't dwarf the rest of the patch.
> 
> Would be nice if we had a way for some buildfarm animal or something to
> pull in the entire suite and test it, imv..  If anyone wants to
> volunteer, I'd be happy to explain how to make that happen (it's not
> hard though- download/unzip the files, drop them in the directory,
> update the test script to add all the files into the array).

Yes, do we have a place to store more comprehensive tests outside of our
git tree?   Has this been done before?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



cryptotest.tgz
Description: application/gtar-compressed


Re: Key management with tests

2021-01-08 Thread Stephen Frost
Greetings Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Jan  1, 2021 at 01:07:50AM -0500, Bruce Momjian wrote:
> > On Thu, Dec 31, 2020 at 11:50:47PM -0500, Bruce Momjian wrote:
> > > I have completed the key management patch with tests created by Stephen
> > > Frost.  Original patch by Masahiko Sawada.  It requires the hex
> > > reorganization patch first.  The key patch is now 2.1MB because of the
> > > tests, so attaching it here seems unwise:
> > > 
> > >   https://github.com/postgres/postgres/compare/master...bmomjian:hex.diff
> > >   https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> > > 
> > > I will add it to the commitfest.  I think we need to figure out how much
> > > of the tests we want to add.
> > 
> > I am getting regression test errors using OpenSSL 1.1.1d  10 Sep 2019
> > with zero-length input data (no -p), while Stephen is able for those
> > tests to pass.   This needs more research, plus I think higher-level
> > tests.
> 
> I have found the cause of the failure, which I added as a C comment:
> 
> /*
>  * OpenSSL 1.1.1d and earlier crashes on some zero-length plaintext
>  * and ciphertext strings.  It crashes on an encryption call to
>  * EVP_EncryptFinal_ex() in GCM mode of zero-length strings if
>  * plaintext is NULL, even though plaintext_len is zero.  Setting
>  * plaintext to non-NULL allows it to work.  In KW/KWP mode,
>  * zero-length strings fail if plaintext_len = 0 and plaintext is
>  * non-NULL (the opposite).  OpenSSL 1.1.1e+ is fine with all options.
>  */
> else if (cipher == PG_CIPHER_AES_GCM)
> {
> plaintext_len = 0;
> plaintext = pg_malloc0(1);
> }
> 
> All the tests pass now.  The current src/test directory is 19MB, and
> adding these tests takes it to 23MB, or a 20% increase.  That seems like
> a lot.  It is testing 128-bit and 256-bit keys --- should we do fewer
> tests, or just test 256, or use gzip to compress the tests by 50%? 
> (Does every platform have gzip?)

Thanks a lot for working on this and figuring out what the issue was and
fixing it!  That's great that we got all those cases passing for you
too.

Thanks again,

Stephen


signature.asc
Description: PGP signature


Re: Key management with tests

2021-01-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Jan  7, 2021 at 04:08:49PM -0300, Álvaro Herrera wrote:
> > On 2021-Jan-07, Bruce Momjian wrote:
> > 
> > > All the tests pass now.  The current src/test directory is 19MB, and
> > > adding these tests takes it to 23MB, or a 20% increase.  That seems like
> > > a lot.  It is testing 128-bit and 256-bit keys --- should we do fewer
> > > tests, or just test 256, or use gzip to compress the tests by 50%? 
> > > (Does every platform have gzip?)
> > 
> > So the tests are about 95% of the patch ... do we really need that many
> > tests?
> 
> No, I don't think so.  Stephen imported the entire NIST test suite.  It
> was so comprehensive, it detected several OpenSSL bugs for zero-length
> strings, which I already reported, but we would never be encrypting
> zero-length strings, so there wasn't a lot of value to it.

I ran the entire test suite locally to ensure everything worked, but I
didn't actually include all of it in the PR which you merged- I had
already reduced it quite a bit by removing all 'additional
authenticated data' test cases (which the tests will automatically skip
and which we haven't implemented support for in the common library
wrappers) and by removing the 192-bit cases.  This reduced the overall
test set by about 2/3rd's or so, as I recall.

> Anyway, I think we need to figure out how to trim.  The first part would
> be to figure out whether we need 128 _and_ 256-bit tests, and then see
> what items are really useful.  Stephen, do you have any ideas on that?
> We currently have 10296 tests, and I think we could get away with 100.

Yeah, it's probably still too much, but I don't have any particularly
justifiable suggestions as to exactly what we should remove or what we
should keep.

Perhaps it'd make sense to try and cover the cases that are more likely
to be issues between our wrapper functions and OpenSSL, and not stress
too much about constantly testing cases that should really be up to
OpenSSL.  As such, I'd propose:

- Add back in some 192-bit tests, so we cover all three bit lengths.
- Add back in some additional authenticated test cases, just to make
  sure that, until/unless we implement support, the test code properly
  skips over those.
- Keep tests for various length plaintext/ciphertext (including 0-byte
  cases, so we make sure those work, since they really should).
- Keep at least one test for each length of tag that's included in the
  test suite.

I'm not sure how many tests we'd end up with from that, but my swag /
gut feeling is that it'd probably be on the order of 100ish and a small
enough set that it won't dwarf the rest of the patch.

Would be nice if we had a way for some buildfarm animal or something to
pull in the entire suite and test it, imv..  If anyone wants to
volunteer, I'd be happy to explain how to make that happen (it's not
hard though- download/unzip the files, drop them in the directory,
update the test script to add all the files into the array).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: A failure of standby to follow timeline switch

2021-01-08 Thread Alvaro Herrera
Masao-san: Are you intending to act as committer for these?  Since the
bug is mine I can look into it, but since you already did all the
reviewing work, I'm good with you giving it the final push.

0001 looks good to me; let's get that one committed quickly so that we
can focus on the interesting stuff.  While the implementation of
find_in_log is quite dumb (not this patch's fault), it seems sufficient
to deal with small log files.  We can improve the implementation later,
if needed, but we have to get the API right on the first try.

0003: The fix looks good to me.  I verified that the test fails without
the fix, and it passes with the fix.


The test added in 0002 is a bit optimistic regarding timing, as well as
potentially slow; it loops 1000 times and sleeps 100 milliseconds each
time.  In a very slow server (valgrind or clobber_cache animals) this
could not be sufficient time, while on fast servers it may end up
waiting longer than needed.  Maybe we can do something like this:

for (my $i = 0; $i < 1000; $i++)
{
	my $current_log_size = determine_current_log_size();

	if ($node_standby_3->find_in_log(
			"requested WAL segment [0-9A-F]+ has already been removed",
			$logstart))
	{
		last;
	}
	elsif ($node_standby_3->find_in_log(
			"End of WAL reached on timeline",
			$logstart))
	{
		$success = 1;
		last;
	}
	$logstart = $current_log_size;

	while (determine_current_log_size() == $current_log_size)
	{
		usleep(10_000);
		# with a retry count?
	}
}

With test patch, make check PROVE_FLAGS="--timer" 
PROVE_TESTS=t/001_stream_rep.pl

ok 6386 ms ( 0.00 usr  0.00 sys +  1.14 cusr  0.93 csys =  2.07 CPU)
ok 6352 ms ( 0.00 usr  0.00 sys +  1.10 cusr  0.94 csys =  2.04 CPU)
ok 6255 ms ( 0.01 usr  0.00 sys +  0.99 cusr  0.97 csys =  1.97 CPU)

without test patch:

ok 4954 ms ( 0.00 usr  0.00 sys +  0.71 cusr  0.64 csys =  1.35 CPU)
ok 5033 ms ( 0.01 usr  0.00 sys +  0.71 cusr  0.73 csys =  1.45 CPU)
ok 4991 ms ( 0.01 usr  0.00 sys +  0.73 cusr  0.59 csys =  1.33 CPU)

-- 
Álvaro Herrera




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-01-08 Thread Alvaro Herrera
On 2020-Dec-01, Alvaro Herrera wrote:

> On 2020-Nov-30, Justin Pryzby wrote:

> Thanks for all the comments. I'll incorporate everything and submit an
> updated version later.

Here's a rebased version 5, with the typos fixed.  More comments below.

> > The attname "detached" is a stretch of what's intuitive (it's more like
> > "detachING" or half-detached).  But I think psql should for sure show 
> > something
> > more obvious to users.  Esp. seeing as psql output isn't documented.  Let's
> > figure out what to show to users and then maybe rename the column that, too.
> 
> OK.  I agree that "being detached" is the state we want users to see, or
> maybe "detach pending", or "unfinisheddetach" (ugh).  I'm not sure that
> "pg_inherits.inhbeingdetached" is a great column name.  Opinions welcome.

I haven't changed this yet; I can't make up my mind about what I like
best.

Partition of: parent FOR VALUES IN (1) UNFINISHED DETACH
Partition of: parent FOR VALUES IN (1) UNDER DETACH
Partition of: parent FOR VALUES IN (1) BEING DETACHED

> > ATExecDetachPartition:
> > Doesn't this need to lock the table before testing for default partition ?
> 
> Correct, it does.

I failed to point out that by the time ATExecDetachPartition is called,
the relation has already been locked by the invoking ALTER TABLE support
code.

> > I ended up with apparently broken constraint when running multiple loops 
> > around
> > a concurrent detach / attach:
> > 
> > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; 
> > do :; done&
> > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; 
> > do :; done&
> > 
> > "p1_check" CHECK (true)
> > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> 
> Not good.

Haven't had time to investigate this problem yet.

-- 
Álvaro Herrera
From d21acb0e99ed3d296db70fed2d5d036d721d611b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v5 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredRelationInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 993da56d43..a8528a3423 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -156,6 +156,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4405,7 +4407,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4414,10 +4415,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4429,7 +4430,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-			relation_close(rel, NoLock);
+			if (tab->rel)
+			{
+relation_close(tab->rel, NoLock);
+tab->rel = NULL;
+			}
 		}
 	}
 
@@ -4455,11 +4460,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,

Re: pgbench: option delaying queries till connections establishment?

2021-01-08 Thread Thomas Munro
On Sun, Jan 3, 2021 at 9:49 AM Fabien COELHO  wrote:
> > Just for fun, the attached 0002 patch is a quick prototype of a
> > replacement for that stuff that seems to work OK on a Mac here.  (I'm
> > not sure if the Windows part makes sense or works.)
>
> Thanks! That will definitely help because I do not have a Mac. I'll do
> some cleanup.

I think the main things to clean up are:

1.  pthread_barrier_init() should check for errors from
pthread_cond_init() and pthread_mutex_init(), and return -1.
2.  pthread_barrier_destroy() should call pthread_cond_destroy() and
pthread_mutex_destroy().
3.  Decide if it's sane for the Windows-based emulation to be in here
too, or if it should stay in pgbench.c.  Or alternatively, if we're
emulating pthread stuff on Windows, why not also put the other pthread
emulation stuff from pgbench.c into a "ports" file; that seems
premature and overkill for your project.  I dunno.
4.  cfbot shows that it's not building on Windows because
HAVE_PTHREAD_BARRIER_WAIT is missing from Solution.pm.

As far as I know, it's OK and common practice to ignore the return
code from eg pthread_mutex_lock() and pthread_cond_wait(), with
rationale along the lines that there'd have to be a programming error
for them to fail in simple cases.
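
For illustration, a minimal sketch of such an emulation with the error
checking from points 1 and 2 above (names and layout are illustrative; this
is not the actual patch):

#include <pthread.h>

typedef struct pg_barrier_t
{
	pthread_mutex_t mutex;
	pthread_cond_t	cond;
	int				count;		/* number of threads to wait for */
	int				arrived;	/* threads currently waiting */
	int				generation; /* cycle counter, makes the barrier reusable */
} pg_barrier_t;

static int
pg_barrier_init(pg_barrier_t *b, int count)
{
	b->count = count;
	b->arrived = 0;
	b->generation = 0;
	if (pthread_mutex_init(&b->mutex, NULL) != 0)
		return -1;
	if (pthread_cond_init(&b->cond, NULL) != 0)
	{
		pthread_mutex_destroy(&b->mutex);
		return -1;
	}
	return 0;
}

static int
pg_barrier_wait(pg_barrier_t *b)
{
	int			gen;

	pthread_mutex_lock(&b->mutex);
	gen = b->generation;
	if (++b->arrived == b->count)
	{
		/* last thread: release everyone and reset for the next cycle */
		b->arrived = 0;
		b->generation++;
		pthread_cond_broadcast(&b->cond);
	}
	else
	{
		while (gen == b->generation)
			pthread_cond_wait(&b->cond, &b->mutex);
	}
	pthread_mutex_unlock(&b->mutex);
	return 0;
}

static int
pg_barrier_destroy(pg_barrier_t *b)
{
	pthread_cond_destroy(&b->cond);
	pthread_mutex_destroy(&b->mutex);
	return 0;
}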

Unfortunately, cfbot can only tell us that it's building OK on a Mac,
but doesn't actually run the pgbench code to reach this stuff.  It's
not running check-world on that platform yet for the following asinine
reason:

connection to database failed: Unix-domain socket path
"/private/var/folders/3y/l0z1x3693dl_8n0qybp4dqwhgn/T/cirrus-ci-build/src/bin/pg_upgrade/.s.PGSQL.58080"
is too long (maximum 103 bytes)




Re: proposal: schema variables

2021-01-08 Thread Erik Rijkers

On 2021-01-08 07:20, Pavel Stehule wrote:

Hi

just rebase

[schema-variables-20200108.patch]


Hey Pavel,

My gcc 8.3.0 compile says:
(on debian 10/Buster)

utility.c: In function ‘CreateCommandTag’:
utility.c:2332:8: warning: this statement may fall through 
[-Wimplicit-fallthrough=]

tag = CMDTAG_SELECT;
^~~
utility.c:2334:3: note: here
   case T_LetStmt:
   ^~~~


Compile, check, and check-world run without further problems.

I also fixed a few typos and made small improvements in the documentation; see
attached.


One thing I wasn't sure of: I have assumed that
  ON TRANSACTIONAL END RESET

should be
  ON TRANSACTION END RESET

and changed it accordingly, please double-check.


Erik Rijkers
--- doc/src/sgml/ref/create_variable.sgml.orig	2021-01-08 17:40:20.181823036 +0100
+++ doc/src/sgml/ref/create_variable.sgml	2021-01-08 17:59:46.976127524 +0100
@@ -16,7 +16,7 @@
 
  
   CREATE VARIABLE
-  define a new permissioned typed schema variable
+  define a schema variable
  
 
  
@@ -29,24 +29,24 @@
   Description
 
   
-   The CREATE VARIABLE command creates a new schema variable.
+   The CREATE VARIABLE command creates a schema variable.
Schema variables, like relations, exist within a schema and their access is
controlled via GRANT and REVOKE commands.
-   Their changes are non-transactional by default.
+   Changing a schema variable is non-transactional by default.
   
 
   
The value of a schema variable is local to the current session. Retrieving
a variable's value returns either a NULL or a default value, unless its value
is set to something else in the current session with a LET command. By default,
-   the content of variable is not transactional, alike regular variables in PL languages.
+   the content of a variable is not transactional. This is the same as in regular variables in PL languages.
   
 
   
-   Schema variables are retrieved by the regular SELECT SQL command.
-   Their value can be set with the LET SQL command.
-   Notably, while schema variables share properties with tables, they cannot be updated
-   with UPDATE commands.
+   Schema variables are retrieved by the SELECT SQL command.
+   Their value is set with the LET SQL command.
+   While schema variables share properties with tables, their value cannot be updated
+   with an UPDATE command.
   
  
 
@@ -76,7 +76,7 @@
 name
 
  
-  The name (optionally schema-qualified) of the variable to create.
+  The name, optionally schema-qualified, of the variable.
  
 

@@ -85,7 +85,7 @@
 data_type
 
  
-  The name (optionally schema-qualified) of the data type of the variable to be created.
+  The name, optionally schema-qualified, of the data type of the variable.
  
 

@@ -105,7 +105,7 @@
 NOT NULL
 
  
-  The NOT NULL clause forbid to set the variable to
+  The NOT NULL clause forbids to set the variable to
   a null value. A variable created as NOT NULL and without an explicitly
   declared default value cannot be read until it is initialized by a LET
   command. This obliges the user to explicitly initialize the variable
@@ -118,22 +118,22 @@
 DEFAULT default_expr
 
  
-  The DEFAULT clause assigns a default data to
-  schema variable.
+  The DEFAULT clause can be used to assign a default value to
+  a schema variable.
  
 

 

-ON COMMIT DROP, ON TRANSACTIONAL END RESET
+ON COMMIT DROP, ON TRANSACTION END RESET
 
  
   The ON COMMIT DROP clause specifies the behaviour
-  of temporary schema variable at transaction commit. With this clause the
+  of a temporary schema variable at transaction commit. With this clause the
       variable is dropped at commit time. The clause is only allowed
-      for temporary variables. The ON TRANSACTIONAL END RESET
+      for temporary variables. The ON TRANSACTION END RESET
   clause enforces the variable to be reset to its default value when
-  the transaction is either commited or rolled back.
+  the transaction is committed or rolled back.
  
 

@@ -145,7 +145,7 @@
   Notes
 
   
-   Use DROP VARIABLE command to remove a variable.
+   Use the DROP VARIABLE command to remove a variable.
   
  
 
--- doc/src/sgml/ref/discard.sgml.orig	2021-01-08 18:02:25.837531779 +0100
+++ doc/src/sgml/ref/discard.sgml	2021-01-08 18:40:09.973630164 +0100
@@ -104,6 +104,7 @@
 DISCARD PLANS;
 DISCARD TEMP;
 DISCARD SEQUENCES;
+DISCARD VARIABLES;
 
 

--- doc/src/sgml/ref/drop_variable.sgml.orig	2021-01-08 18:05:28.643147771 +0100
+++ doc/src/sgml/ref/drop_variable.sgml	2021-01-08 18:07:17.876113523 +0100
@@ -16,7 +16,7 @@
 
  
   DROP VARIABLE
-  removes a schema variable
+  remove a schema variable
  
 
  
@@ -52,7 +52,7 @@
 name
 
  
-  The name (optionally schema-qualified) of a schema variable.
+  The name, optionally schema-qualified, of a schema 

Re: WIP: System Versioned Temporal Table

2021-01-08 Thread Ryan Lambert
On Fri, Jan 8, 2021 at 5:34 AM Simon Riggs 
wrote:

> On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs 
> wrote:
> >
> > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs 
> wrote:
> >
> > > I've minimally rebased the patch to current head so that it compiles
> > > and passes current make check.
> >
> > Full version attached
>
> New version attached with improved error messages, some additional
> docs and a review of tests.
>
>
The v10 patch fails to make on the current master branch (15b824da).  Error:


make[2]: Entering directory
'/var/lib/postgresql/git/postgresql/src/backend/parser'
'/usr/bin/perl' ./check_keywords.pl gram.y
../../../src/include/parser/kwlist.h
/usr/bin/bison -Wno-deprecated  -d -o gram.c gram.y
gram.y:3685.55-56: error: $4 of ‘ColConstraintElem’ has no declared type
  n->contype = ($4)->contype;
   ^^
gram.y:3687.56-57: error: $4 of ‘ColConstraintElem’ has no declared type
  n->raw_expr = ($4)->raw_expr;
^^
gram.y:3734.41-42: error: $$ of ‘generated_type’ has no declared type
  $$ = n;
 ^^
gram.y:3741.41-42: error: $$ of ‘generated_type’ has no declared type
  $$ = n;
 ^^
gram.y:3748.41-42: error: $$ of ‘generated_type’ has no declared type
  $$ = n;
 ^^
../../../src/Makefile.global:750: recipe for target 'gram.c' failed
make[2]: *** [gram.c] Error 1
make[2]: Leaving directory
'/var/lib/postgresql/git/postgresql/src/backend/parser'
Makefile:137: recipe for target 'parser/gram.h' failed
make[1]: *** [parser/gram.h] Error 2
make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'
src/Makefile.global:389: recipe for target 'submake-generated-headers'
failed
make: *** [submake-generated-headers] Error 2


> * UPDATE doesn't set EndTime correctly, so err... the patch doesn't
> work on this aspect.
> Everything else does actually work, AFAICS, so we "just" need a way to
> update the END_TIME column in place...
> So input from other Hackers/Committers needed on this point to see
> what is acceptable.
>
> * Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved
>
> * No discussion, comments or tests around freezing and whether that
> causes issues here
>
> * What happens if you ask for a future time?
> It will give an inconsistent result as it scans, so we should refuse a
> query for time > current_timestamp.

> * ALTER TABLE needs some work, it's a bit klugey at the moment and
> needs extra tests.
> Should refuse DROP COLUMN on a system time column
>
> * Do StartTime and EndTime show in SELECT *? Currently, yes. Would
> guess we wouldn't want them to, not sure what standard says.
>
>
I prefer to have them hidden by default.  This was mentioned up-thread with
no decision; it seems the standard is ambiguous.  MS SQL appears to
have flip-flopped on this decision [1].

> SELECT * shows the timestamp columns, don't we need to hide the period
> > timestamp columns from this query?
> >
> >
> The SQL standard doesn't dictate hiding the columns, but I agree hiding them by
> default is a good thing, because these columns are used by the system
> to classify the data and are not frequently needed on the user side. I can
> change to that if we have consensus





> * The syntax changes in gram.y probably need some coralling
>
> Overall, it's a pretty good patch and worth working on more. I will
> consider a recommendation on what to do with this.
>
> --
> Simon Riggs    http://www.EnterpriseDB.com/


I am increasingly interested in this feature and have heard others asking
for this type of functionality.  I'll do my best to continue reviewing and
testing.

Thanks,

Ryan Lambert

[1] https://bornsql.ca/blog/temporal-tables-hidden-columns/
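
For readers following the hidden-column question, here is a rough SQL:2011-style
sketch of a system-versioned table and a time-travel query. The keywords and
column handling are assumed from the standard for illustration only; the patch
under review may differ.

CREATE TABLE accounts (
    id         integer,
    balance    numeric,
    start_time timestamp GENERATED ALWAYS AS ROW START,
    end_time   timestamp GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (start_time, end_time)
) WITH SYSTEM VERSIONING;

-- Current-state query; the open question above is whether start_time and
-- end_time should show up here or stay hidden unless selected explicitly.
SELECT * FROM accounts;

-- Historical query; per the discussion above, a timestamp later than
-- current_timestamp would need to be rejected or handled specially.
SELECT * FROM accounts FOR SYSTEM_TIME AS OF '2020-06-01 00:00:00';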


Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-01-08 Thread japin


On Fri, 08 Jan 2021 at 17:24, Bharath Rupireddy wrote:
> On Fri, Jan 8, 2021 at 1:50 PM japin  wrote:
>> Thanks for updating the patch!
>>
>> +   /* Get the data generating query. */
>> +   dataQuery = get_matview_query(stmt, , );
>>
>> -   /*
>> -* Check for active uses of the relation in the current transaction, 
>> such
>> -* as open scans.
>> -*
>> -* NB: We count on this to protect us against problems with 
>> refreshing the
>> -* data using TABLE_INSERT_FROZEN.
>> -*/
>> -   CheckTableNotInUse(matviewRel, "REFRESH MATERIALIZED VIEW");
>> +   relowner = matviewRel->rd_rel->relowner;
>>
>> After apply the patch, there is a duplicate
>>
>> relowner = matviewRel->rd_rel->relowner;
>
> Corrected that.
>
>> +   else if(matviewInfo)
>> +   dest = 
>> CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap);
>>
>> If the `matviewInfo->OIDNewHeap` is invalid, IMO we don't need create
>> DestReceiver, isn't it?  And we should add a space after `if`.
>
> Yes, we can skip creating the dest receiver when OIDNewHeap is
> invalid, this can happen for plain explain refresh mat view case.
>
> if (explainInfo && !explainInfo->es->analyze)
> OIDNewHeap = InvalidOid;
> else
> OIDNewHeap = get_new_heap_oid(stmt, matviewRel, matviewOid,
>   );
>
> Since we don't call ExecutorRun for plain explain, we can skip the
> dest receiver creation. I modified the code as below in explain.c.
>
> if (into)
> dest = CreateIntoRelDestReceiver(into);
> else if (matviewInfo && OidIsValid(matviewInfo->OIDNewHeap))
> dest = CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap);
> else
> dest = None_Receiver;
>
> Thanks for taking a look at the patches.
>

Thanks!

> Attaching v3 patches, please consider these for further review.
>

I find that both the declaration and definition of match_matview_with_new_data()
have a tab between type and variable.  We can use pgindent to fix it.
What do you think?


static void
match_matview_with_new_data(RefreshMatViewStmt *stmt, Relation  matviewRel,
  ^
Oid matviewOid, Oid OIDNewHeap, Oid relowner,
int save_sec_context, char relpersistence,
uint64  processed)
  ^

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
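
For context, the syntax these patches propose looks roughly like the sketch
below; it is not supported by core PostgreSQL today, and the view name is
illustrative.

CREATE MATERIALIZED VIEW mv AS SELECT generate_series(1, 1000) AS g;

-- Plain EXPLAIN only shows the plan, so no transient heap (OIDNewHeap) or
-- dest receiver is needed, as discussed above.
EXPLAIN REFRESH MATERIALIZED VIEW mv;

-- EXPLAIN ANALYZE actually executes the refresh, so the transient heap and
-- its dest receiver are required.
EXPLAIN (ANALYZE) REFRESH MATERIALIZED VIEW mv;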




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread Bruce Momjian
On Fri, Jan  8, 2021 at 05:06:24PM +0100, Magnus Hagander wrote:
> On Fri, Jan 8, 2021 at 4:57 PM Bruce Momjian  wrote:
> > I think once we have better online enabling of checksums people can more
> > easily test the overhead on their workloads.
> 
> Yeah, definitely.
> 
> If they have equivalent hardware they can easily do it now -- create a
> replica, turn off checksums on replica, compare. That is, assuming we
> turn them on by default :) But being able to turn them both on and off
> without a large downtime is obviously going to make experimentation a
> lot more reasonable.

Can someone compute overhead on a real workload for us now?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread Magnus Hagander
On Fri, Jan 8, 2021 at 4:57 PM Bruce Momjian  wrote:
>
> On Fri, Jan  8, 2021 at 09:46:58AM -0500, David Steele wrote:
> > On 1/8/21 5:03 AM, Andres Freund wrote:
> > > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote:
> > > >
> > > > The serious crowd are more likely to choose a non-default setting
> > > > to avoid paying the price for a feature that they don't need.
> > >
> > > I don't really buy this argument. That way we're going to have an ever 
> > > growing set of things that need to be tuned to have a database that's 
> > > usable in an even halfway busy setup. That's unavoidable in some cases, 
> > > but it's a significant cost across use cases.
> > >
> > > Increasing the overhead in the default config from one version to the 
> > > next isn't great - it makes people more hesitant to upgrade. It's also 
> > > not a cost you're going to find all that quickly, and it's a really hard 
> > > to pin down cost.
> >
> > I'm +1 for enabling checksums by default, even with the performance
> > penalties.
> >
> > As far as people upgrading, one advantage is existing pg_upgrade'd databases
> > would not be affected. Only newly init'd clusters would get this setting.
>
> I think once we have better online enabling of checksums people can more
> easily test the overhead on their workloads.

Yeah, definitely.

If they have equivalent hardware they can easily do it now -- create a
replica, turn off checksums on replica, compare. That is, assuming we
turn them on by default :) But being able to turn them both on and off
without a large downtime is obviously going to make experimentation a
lot more reasonable.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread Bruce Momjian
On Fri, Jan  8, 2021 at 09:46:58AM -0500, David Steele wrote:
> On 1/8/21 5:03 AM, Andres Freund wrote:
> > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote:
> > > 
> > > The serious crowd are more likely to choose a non-default setting
> > > to avoid paying the price for a feature that they don't need.
> > 
> > I don't really buy this argument. That way we're going to have an ever 
> > growing set of things that need to be tuned to have a database that's 
> > usable in an even halfway busy setup. That's unavoidable in some cases, but 
> > it's a significant cost across use cases.
> > 
> > Increasing the overhead in the default config from one version to the next 
> > isn't great - it makes people more hesitant to upgrade. It's also not a 
> > cost you're going to find all that quickly, and it's a really hard to pin 
> > down cost.
> 
> I'm +1 for enabling checksums by default, even with the performance
> penalties.
> 
> As far as people upgrading, one advantage is existing pg_upgrade'd databases
> would not be affected. Only newly init'd clusters would get this setting.

I think once we have better online enabling of checksums people can more
easily test the overhead on their workloads.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread Bruce Momjian
On Fri, Jan  8, 2021 at 10:53:35AM +0100, Laurenz Albe wrote:
> On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote:
> > I expected there'd be some disagreement on this, but I do continue to
> > feel that it's sensible to enable checksums by default.
> 
> +1
> 
> I think the problem here (apart from the original line of argumentation)
> is that there are two kinds of PostgreSQL installations:
> 
> - installations done on dubious hardware with minimal tuning
>   (the "cheap crowd")
> 
> - installations done on good hardware, where people make an effort to
>   properly configure the database (the "serious crowd")
> 
> I am aware that this is an oversimplification for the sake of the argument.
> 
> The voices against checksums on by default are probably thinking of
> the serious crowd.
> 
> If checksums were enabled by default, the cheap crowd would benefit
> from the early warnings that something has gone wrong.
> 
> The serious crowd are more likely to choose a non-default setting
> to avoid paying the price for a feature that they don't need.

I think you have captured the major issue here --- it explains a lot.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Improper use about DatumGetInt32

2021-01-08 Thread Peter Eisentraut

On 2021-01-08 10:21, Peter Eisentraut wrote:

I think on 64-bit systems it's actually safe, but on 32-bit systems
(with USE_FLOAT8_BYVAL), if you use the new binaries with the old
SQL-level definitions, you'd get the int4 that is passed in interpreted
as a pointer, which would lead to very bad things.  So I think we need
to create new functions with a different C symbol.  I'll work on that.


Updated patch that does that.
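
For illustration (the relation name is a placeholder): block numbers are
unsigned 32-bit values, so anything above 2^31-1 cannot be expressed through
an int4 argument at all, while the patched bigint signatures accept it and
range-check it against the relation.

-- With the old int4 signature this block number cannot be passed;
-- with the bigint arguments from the patch the call is accepted
-- (and the block number is checked against the relation size).
SELECT * FROM page_header(get_raw_page('some_big_table', 3000000000));
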
>From d0b802478ef2f5fc4200175c56b1f9b2f712e9ed Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 8 Jan 2021 16:49:07 +0100
Subject: [PATCH v3] pageinspect: Change block number arguments to bigint

Block numbers are 32-bit unsigned integers.  Therefore, the smallest
SQL integer type that they can fit in is bigint.  However, in the
pageinspect module, most input and output parameters dealing with
block numbers were declared as int.  The behavior with block numbers
larger than a signed 32-bit integer was therefore dubious.  Change
these arguments to type bigint and add some more explicit error
checking on the block range.

(Other contrib modules appear to do this correctly already.)

Since we are changing argument types of existing functions, in order
to not misbehave if the binary is updated before the extension is
updated, we need to create new C symbols for the entry points, similar
to how it's done in other extensions as well.

Reported-by: Ashutosh Bapat 
Reviewed-by: Alvaro Herrera 
Discussion: 
https://www.postgresql.org/message-id/flat/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local
---
 contrib/pageinspect/Makefile  |  3 +-
 contrib/pageinspect/brinfuncs.c   | 13 ++-
 contrib/pageinspect/btreefuncs.c  | 84 ++
 contrib/pageinspect/hashfuncs.c   | 11 ++-
 contrib/pageinspect/pageinspect--1.8--1.9.sql | 81 +
 contrib/pageinspect/pageinspect.control   |  2 +-
 contrib/pageinspect/pageinspect.h |  9 ++
 contrib/pageinspect/rawpage.c | 87 ++-
 doc/src/sgml/pageinspect.sgml | 12 +--
 9 files changed, 270 insertions(+), 32 deletions(-)
 create mode 100644 contrib/pageinspect/pageinspect--1.8--1.9.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index d9d8177116..3fcbfea293 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -12,7 +12,8 @@ OBJS = \
rawpage.o
 
 EXTENSION = pageinspect
-DATA =  pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
+DATA =  pageinspect--1.8--1.9.sql \
+   pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index e872f65f01..0e3c2deb66 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -252,7 +252,18 @@ brin_page_items(PG_FUNCTION_ARGS)
int att = attno - 1;
 
values[0] = UInt16GetDatum(offset);
-   values[1] = UInt32GetDatum(dtup->bt_blkno);
+   switch (TupleDescAttr(tupdesc, 1)->atttypid)
+   {
+   case INT8OID:
+   values[1] = Int64GetDatum((int64) 
dtup->bt_blkno);
+   break;
+   case INT4OID:
+   /* support for old extension version */
+   values[1] = 
UInt32GetDatum(dtup->bt_blkno);
+   break;
+   default:
+   elog(ERROR, "incorrect output types");
+   }
values[2] = UInt16GetDatum(attno);
values[3] = 
BoolGetDatum(dtup->bt_columns[att].bv_allnulls);
values[4] = 
BoolGetDatum(dtup->bt_columns[att].bv_hasnulls);
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 445605db58..a123955aae 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,8 +41,10 @@
 #include "utils/varlena.h"
 
 PG_FUNCTION_INFO_V1(bt_metap);
+PG_FUNCTION_INFO_V1(bt_page_items_1_9);
 PG_FUNCTION_INFO_V1(bt_page_items);
 PG_FUNCTION_INFO_V1(bt_page_items_bytea);
+PG_FUNCTION_INFO_V1(bt_page_stats_1_9);
 PG_FUNCTION_INFO_V1(bt_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -160,11 +162,11 @@ GetBTPageStatistics(BlockNumber blkno, Buffer buffer, 
BTPageStat *stat)
  * Usage: SELECT * FROM bt_page_stats('t1_pkey', 1);
  * ---
  */
-Datum
-bt_page_stats(PG_FUNCTION_ARGS)
+static Datum

Re: Proposal: Global Index

2021-01-08 Thread Bruce Momjian
On Fri, Jan  8, 2021 at 11:26:48AM +0800, 曾文旌 wrote:
> > On Thu, Jan  7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
> >> I've been following this topic for a long time. It's been a year since the 
> >> last response.
> >> It was clear that our customers wanted this feature as well, and a large 
> >> number of them mentioned it.
> >> 
> >> So, I wish the whole feature to mature as soon as possible.
> >> I summarized the scheme mentioned in the email and completed the POC 
> >> patch(base on PG_13).
> > 
> > I think you need to address the items mentioned in this blog, and the
> > email link it mentions:
> > 
> > https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020
> 
> Thank you for your reply.
> I read your blog and it helped me a lot.
> 
> The blog mentions a specific problem: "A large global index might also 
> reintroduce problems that prompted the creation of partitioning in the first 
> place. "
> I don't quite understand, could you give some specific information?

Well, if you created partitions, you probably did so because:

1.  heap files are smaller, allowing for more targeted sequential scans
2.  smaller indexes
3.  the ability to easily drop tables/indexes that are too old

If you have global indexes, #1 is the same, but #2 is no longer true,
and for #3, you can drop the heap but the index entries still exist in
the global index and must be removed.

So, if you created partitions for one of the three reasons, once you
have global indexes, some of those advantage of partitioning are no
longer true.  I am sure there are some workloads where the advantages of
partitioning, minus the advantages lost when using global indexes, are
useful, but are there enough of them to make the feature useful?  I
don't know.

> In addition you mentioned: "It is still unclear if these use-cases justify 
> the architectural changes needed to enable global indexes."
> Please also describe the problems you see, I will confirm each specific issue 
> one by one.

Well, the email thread I linked to has a lot of them, but the
fundamental issue is that you have to break the logic that a single
index serves a single heap file.  Considering what I said above, is
there enough usefulness to warrant such an architectural change?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
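
To make the tradeoff concrete, here is a minimal SQL sketch of the limitation
that motivates global indexes. It shows current PostgreSQL behavior only; the
global-index syntax itself does not exist yet, and the table names are
illustrative.

-- Partitioned table; every index on it is a set of per-partition (local) indexes.
CREATE TABLE orders (
    order_id   bigint,
    order_date date NOT NULL,
    customer   text
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2020 PARTITION OF orders
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Works: the unique index includes the partition key, so each partition's
-- local index can enforce uniqueness on its own (keeps indexes small, #2).
CREATE UNIQUE INDEX ON orders (order_date, order_id);

-- Fails today: uniqueness on a column that is not part of the partition key
-- would require a single "global" index spanning all partitions.
CREATE UNIQUE INDEX ON orders (order_id);

-- Dropping a partition is cheap with local indexes (#3); with a global index,
-- the dropped partition's entries would still have to be removed from it.
DROP TABLE orders_2020;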





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread David Steele

On 1/8/21 5:03 AM, Andres Freund wrote:

On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote:


The serious crowd are more likely to choose a non-default setting
to avoid paying the price for a feature that they don't need.


I don't really buy this argument. That way we're going to have an ever growing 
set of things that need to be tuned to have a database that's usable in an even 
halfway busy setup. That's unavoidable in some cases, but it's a significant 
cost across use cases.

Increasing the overhead in the default config from one version to the next 
isn't great - it makes people more hesitant to upgrade. It's also not a cost 
you're going to find all that quickly, and it's a really hard to pin down cost.


I'm +1 for enabling checksums by default, even with the performance 
penalties.


As far as people upgrading, one advantage is existing pg_upgrade'd 
databases would not be affected. Only newly init'd clusters would get 
this setting.


Regards,
--
-David
da...@pgmasters.net




Re: WIP: System Versioned Temporal Table

2021-01-08 Thread Andrew Dunstan


On 1/8/21 7:33 AM, Simon Riggs wrote:
>
> * What happens if you ask for a future time?
> It will give an inconsistent result as it scans, so we should refuse a
> query for time > current_timestamp.


That seems like a significant limitation. Can we fix it instead of
refusing the query?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Simple progress reporting for COPY command

2021-01-08 Thread Matthias van de Meent
On Thu, 7 Jan 2021 at 23:00, Josef Šimánek  wrote:
>
> čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra
>  napsal:
> >
> > I'm not particularly attached to the "lines" naming, it just seemed OK
> > to me. So if there's consensus to rename this somehow, I'm OK with it.
>
> The problem I do see here is it depends on the "way" of COPY. If
> you're copying from CSV file to table, those are actually lines (since
> 1 line = 1 tuple). But copying from DB to file is copying tuples (but
> 1 tuple = 1 file line). Line works better here for me personally.
>
> Once I'll fix the problem with triggers (and also another cases if
> found), I think we can consider it lines. It will represent amount of
> lines processed from file on COPY FROM and amount of lines written to
> file in COPY TO form (at least in CSV format). I'm not sure how BINARY
> format works, I'll check.

Counterexample that 1 tuple need not be 1 line, in csv/binary:

/*
 * create a table with one tuple containing 1 text field, which consists of
 * 10 newline characters.
 * If you want windows-style lines, replace '\x0A' (\n) with '\x0D0A' (\r\n).
 */
# CREATE TABLE ttab (val) AS
  SELECT * FROM (values (
repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text
  )) as v;

# -- indeed, one unix-style line, according to $ wc -l copy.txt
# COPY ttab TO 'copy.txt' (format text);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text);
COPY 1

# -- 11 lines
# COPY ttab TO 'copy.csv' (format csv);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv);
COPY 1

# -- 13 lines
# COPY ttab TO 'copy.bin' (format binary);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary);
COPY 1

All of the above COPY statements would only report 'lines_processed = 1'
in the progress reporting, while the csv/binary line counts are definitively
inconsistent with what the progress reporting shows, because progress
reporting counts tuples / table rows, not the number of lines in the
external file.
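
For reference, monitoring a running COPY through the progress view this patch
adds would look roughly like the following; the view and column names
(pg_stat_progress_copy, lines_processed) are the ones discussed above and may
still change before commit.

-- Run from a second session while a long COPY is in progress.
SELECT pid, relid::regclass AS relation, lines_processed
FROM pg_stat_progress_copy;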




Re: Add session statistics to pg_stat_database

2021-01-08 Thread Masahiro Ikeda

On 2021-01-08 18:34, Laurenz Albe wrote:

On Fri, 2021-01-08 at 12:00 +0900, Masahiro Ikeda wrote:

2. monitoring.sgml

> > IIUC, "active_time" includes the time executes a fast-path function
> > and
> > "idle in transaction" includes "idle in transaction(aborted)" time.
> > Why don't you reference pg_stat_activity's "state" column and
> > "active_time" is the total time when the state is "active" and "fast
> > path"?
> > "idle in transaction" is as same too.
>
> Good idea; I have expanded the documentation like that.

BTW, is there any reason to merge the above statistics?
IIUC, to separate statistics' cons is that two columns increase, and
there is no performance penalty. So, I wonder that there is a way to
separate them
corresponding to the state column of pg_stat_activity.


Sure, that could be done.

I decided to do it like this because I thought that few people would
be interested in "time spent doing fast-path function calls"; my guess
was that the more interesting value is "time where the database was
busy calculating results".

I tried to keep the balance between providing reasonable detail
while not creating more additional columns to "pg_stat_database"
than necessary.

This is of course a matter of taste, and it is good to hear different
opinions.  If more people share your opinion, I'll change the code.


OK, I understood.
I don't have any strong opinion about adding them.


There are some following codes in pgstatfuncs.c.
int64 result = 0.0;

But, I think the following is better.
int64 result = 0;


You are right.  That was a silly copy-and-paste error.  Fixed.


Thanks.

Although now pg_stat_get_db_session_time is initialize "result" to 
zero

when it is declared,
another pg_stat_XXX function didn't initialize. Is it better to change
it?


I looked at other similar functions, and the ones I saw returned
NULL if there were no data.  In that case, it makes sense to write

char *result;

if ((result = get_stats_data()) == NULL)
PG_RETURN_NULL();

PG_RETURN_TEXT_P(cstring_to_text(result));

But I want to return 0 for the session time if there are no data yet,
so I think initializing the result to 0 in the declaration makes sense.

There are some functions that do it like this:

int32   result;

result = 0;
for (...)
{
if (...)
result++;
}

PG_RETURN_INT32(result);

Again, it is a matter of taste, and I didn't detect a clear pattern
in the existing code that I feel I should follow in this question.


Thanks, I understood.

I checked that my comments are fixed.
This patch looks good to me for monitoring session statistics.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
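
As a usage sketch, once the patch is applied the new counters can be combined
to see what fraction of the session time a database spent actively executing
statements. The session_time and active_time column names are taken from the
patch's documentation; the query itself is only illustrative.

SELECT datname,
       session_time,
       active_time,
       round((100.0 * active_time / nullif(session_time, 0))::numeric, 1)
           AS pct_active
FROM pg_stat_database
WHERE datname = current_database();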




Re: Global snapshots

2021-01-08 Thread Fujii Masao




On 2021/01/01 12:14, tsunakawa.ta...@fujitsu.com wrote:

Hello,


Fujii-san and I discussed how to move the scale-out development forward.  We
are both worried that Clock-SI is (highly?) likely to infringe the said
Microsoft patent.  So we agreed we are going to investigate Clock-SI and
the patent, and if we have to conclude that we cannot embrace Clock-SI, we will
explore other possibilities.


Yes.




IMO, it seems that Clock-SI overlaps with the patent and we can't use it.  First, looking
back at how to interpret the patent document, the patent "claims" are what we should
pay the greatest attention to.  According to the following citation from the IP guide by
the Software Freedom Law Center (SFLC) [1], software infringes a patent if it implements
everything in any one claim, not all claims.


--
4.2 Patent Infringement
To prove that you infringe a patent, the patent holder must show that you
make, use, offer to sell, or sell the invention as it is defined in at least
one claim of the patent.

For software to infringe a patent, the software essentially must implement 
everything recited in one of the patent's claims. It is crucial to recognize
that infringement is based directly on the claims of the patent, and not on 
what is stated or described in other parts of the patent document.
--


And, Clock-SI implements at least claims 11 and 20 cited below.  It doesn't 
matter whether Clock-SI uses a physical clock or logical one.


Thanks for sharing the result of your investigation!

Regarding at least claim 11, I reached the same conclusion. As far as
I understand, Clock-SI actually does the method described
in claim 11 when determining the commit time and doing the commit
on each node.

I don't intend to offend Clock-SI and any activities based on that. OTOH,
I'm now wondering if it's worth considering another approach for global
transaction support, while I'm still interested in Clock-SI technically.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: plpgsql variable assignment with union is broken

2021-01-08 Thread Merlin Moncure
On Thu, Jan 7, 2021 at 10:55 AM Pavel Stehule  wrote:
>> Again, if this is narrowly confined to assignment into set query
>> operations, maybe this is not so bad. But is it?
>
>  PLpgSQL_Expr: opt_target_list
> <--><--><-->from_clause where_clause
> <--><--><-->group_clause having_clause window_clause
> <--><--><-->opt_sort_clause opt_select_limit opt_for_locking_clause
> <--><--><--><-->{
>
> So SELECT INTO and assignment are not fully compatible now.

OK.  Well, I went ahead and checked the code base for any suspicious
assignments that might fall out of the tightened syntax.  It was
cursory, but didn't turn up anything obvious.  So I'm going to change
my position on this.

My feedback would be to take backwards compatibility very seriously
relating to language changes in the future, and to rely less on
documentation arguments as it relates to change justification. The
current behavior has been in place for decades and is therefore a de
facto standard.  Change freedom ought to be asserted in scenarios
where behavior is reserved as undefined or is non-standard, rather than
assumed.  Rereading the development thread, it seems there was a fairly short
timeframe between idea origination and commit, and hypothetical
impacts to existing code bases were not rigorously assessed.  I guess
it's possible this may ultimately cause some heartburn, but it's hard
to say without strong data to justify that position.

Having said all of that, I'm very excited about the array assignment
improvements, and investment in this space is very much appreciated.

merlin




Re: Added schema level support for publication.

2021-01-08 Thread Amit Kapila
On Thu, Jan 7, 2021 at 10:03 PM vignesh C  wrote:
>
> This feature adds a schema option while creating a publication. Users will
> be able to specify one or more schemas while creating a publication;
> when the user specifies the schema option, the data changes for the
> tables present in the schemas specified by the user will be replicated
> to the subscriber. A few examples are listed below:
>
> Create a publication that publishes all changes for all the tables
> present in production schema:
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
>
> Create a publication that publishes all changes for all the tables
> present in marketing and sales schemas:
> CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
>
> Add some schemas to the publication:
> ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
>
> Drop some schema from the publication:
> ALTER PUBLICATION production_quarterly_publication DROP SCHEMA 
> production_july;
>
> Attached is a POC patch for the same. I felt this feature would be quite 
> useful.
>

What do we do if the user Drops the schema? Do we automatically remove
it from the publication?

I see some use for such a feature, but you haven't described the use
case or how you arrived at the conclusion that it would be quite
useful.

-- 
With Regards,
Amit Kapila.




Re: Disable WAL logging to speed up data loading

2021-01-08 Thread Masahiko Sawada
On Fri, Jan 8, 2021 at 2:12 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Robert Haas 
> > Were the issues that I mentioned regarding GIST (and maybe other AMs)
> > in the last paragraph of
> > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> > dmrev...@mail.gmail.com
> > addressed in some way? That seems like a pretty hard engineering
> > problem to me, and I don't see that there's been any discussion of it.
> > Those are correctness concerns separate from any wal_level tracking we
> > might want to do to avoid accidental mistakes.
>
> Thank you very much for reminding me of this.  I forgot I replied as follows:
>
>
> --
> Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged 
> temporary GiST indexes use backend-local sequence values.  Other unlogged and 
> temporary relations don't set LSNs on pages.  So, I think it's enough to call 
> GetFakeLSNForUnloggedRel() when wal_level = none as well.
> --
>
>
> But this is not correct.  We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) 
> WAL records to be emitted (by tweaking the filter in XLogInsert()), just like 
> those WAL records are emitted when wal_level = minimal and other WAL 
> records are not emitted.

I think it's better to have the index AM (and perhaps table AM) control it
instead of filtering in XLogInsert(), because otherwise third-party
access methods that use LSNs, like gist indexes, cannot write WAL records
at all when wal_level = none even if they want to.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/
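
For context, the intended usage pattern of the proposed setting is roughly the
following sketch (wal_level = none is not in core PostgreSQL, and changing
wal_level always requires a server restart):

-- Before a large initial data load:
ALTER SYSTEM SET wal_level = none;
-- restart, run the bulk load (e.g. COPY), then restore the setting:
ALTER SYSTEM SET wal_level = replica;
-- and restart again; existing base backups and replicas cannot replay the
-- unlogged load, so they would need to be recreated afterwards.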




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread Andres Freund
Hi,

On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote:
> On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote:
> > I expected there'd be some disagreement on this, but I do continue to
> > feel that it's sensible to enable checksums by default.
> 
> +1

I don't disagree with this in principle, but if you want that you need to work 
on making checksum overhead far smaller. That's doable. Afterwards it makes 
sense to have this discussion.

> I think the problem here (apart from the original line of argumentation)
> is that there are two kinds of PostgreSQL installations:
> 
> - installations done on dubious hardware with minimal tuning
>   (the "cheap crowd")
> 
> - installations done on good hardware, where people make an effort to
>   properly configure the database (the "serious crowd")
> 
> I am aware that this is an oversimplification for the sake of the argument.
> 
> The voices against checksums on by default are probably thinking of
> the serious crowd.
> 
> If checksums were enabled by default, the cheap crowd would benefit
> from the early warnings that something has gone wrong.
> 
> The serious crowd are more likely to choose a non-default setting
> to avoid paying the price for a feature that they don't need.

I don't really buy this argument. That way we're going to have an ever growing 
set of things that need to be tuned to have a database that's usable in an even 
halfway busy setup. That's unavoidable in some cases, but it's a significant 
cost across use cases.

Increasing the overhead in the default config from one version to the next 
isn't great - it makes people more hesitant to upgrade. It's also not a cost 
you're going to find all that quickly, and it's a really hard to pin down cost.


Andres





Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-08 Thread Amit Kapila
On Fri, Jan 8, 2021 at 12:21 PM Antonin Houska  wrote:
>
> Amit Kapila  wrote:
>
> > > As an alternative, have you considered allocation of the XID even in 
> > > parallel
> > > mode? I imagine that the first parallel worker that needs the XID for
> > > insertions allocates it and shares it with the other workers as well as 
> > > with
> > > the leader process.
> > >
> >
> > As a matter of this patch
> > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> > never need to allocate xids by workers because Insert is always
> > performed by leader backend.
>
> When writing this comment, I was actually thinking of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
> than v11-0001, see below. On the other hand, if we allowed XID allocation in
> the parallel mode (as a separate patch), even the 0001 patch would get a bit
> simpler.
>
> > Even, if we want to do what you are suggesting it would be tricky because
> > currently, we don't have such an infrastructure where we can pass
> > information among workers.
>
> How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
> workers "meet" at the barrier when they want to insert the first tuple. Then
> one of them allocates the XID, makes it available to others (via shared
> memory) and all the workers can continue.
>

Even if we want to do this I am not sure if we need barriers, because
there is no intrinsic need for all workers to stop before allocating
the XID. After allocation of the XID, we just need some way for other workers
to use it, maybe something like how all workers currently synchronize for
getting the block number to process in parallel sequential scans. But
the question is whether it is really worth it, because in many cases it would
already be allocated by the time the parallel DML operation is started, and we
do share it at the beginning.  I think if we really want to allow
allocation of the xid in parallel mode then we should also think of allowing
it for subtransaction xids, not only for main transactions, and that
will open up some other use cases. I feel it is better to tackle that
problem separately.

-- 
With Regards,
Amit Kapila.




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-08 Thread Laurenz Albe
On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote:
> I expected there'd be some disagreement on this, but I do continue to
> feel that it's sensible to enable checksums by default.

+1

I think the problem here (apart from the original line of argumentation)
is that there are two kinds of PostgreSQL installations:

- installations done on dubious hardware with minimal tuning
  (the "cheap crowd")

- installations done on good hardware, where people make an effort to
  properly configure the database (the "serious crowd")

I am aware that this is an oversimplification for the sake of the argument.

The voices against checksums on by default are probably thinking of
the serious crowd.

If checksums were enabled by default, the cheap crowd would benefit
from the early warnings that something has gone wrong.

The serious crowd are more likely to choose a non-default setting
to avoid paying the price for a feature that they don't need.

Yours,
Laurenz Albe
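
As a concrete illustration of the "early warning" point: with checksums
enabled, verification failures are counted per database, so even the cheap
crowd gets a visible signal. (The pg_stat_database columns below exist since
PostgreSQL 12; the query is only an example.)

SHOW data_checksums;

SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures > 0;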





Re: proposal - psql - use pager for \watch command

2021-01-08 Thread Pavel Stehule
ne 19. 4. 2020 v 19:27 odesílatel Pavel Stehule 
napsal:

> Hi,
>
> last week I finished pspg 3.0 https://github.com/okbob/pspg . pspg now
> supports pipes and named pipes very well. Today pspg can be used as a pager
> for the output of the \watch command. Sure, psql needs the attached patch.
>
> I propose a new psql environment variable PSQL_WATCH_PAGER. When this
> variable is not empty, the \watch command starts the specified pager and
> redirects output to the related pipe. When the pipe is closed by the pager,
> the \watch cycle is exited.
>
> If you want to test proposed feature, you need a pspg with 
> cb4114f98318344d162a84b895a3b7f8badec241
> commit.
>
> Then you can set your env
>
> export PSQL_WATCH_PAGER="pspg --stream"
> psql
>
> SELECT * FROM pg_stat_database;
> \watch 1
>
> Comments, notes?
>
> Regards
>
> Pavel
>

rebase
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 303e7c3ad8..7ab5aed3fa 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -173,6 +173,7 @@ static char *pset_value_string(const char *param, printQueryOpt *popt);
 static void checkWin32Codepage(void);
 #endif
 
+static bool sigpipe_received = false;
 
 
 /*--
@@ -4749,6 +4750,17 @@ do_shell(const char *command)
 	return true;
 }
 
+/*
+ * We want to detect sigpipe to break from watch cycle faster,
+ * before waiting 1 sec in sleep time, and unsuccess write to
+ * pipe.
+ */
+static void
+pagerpipe_sigpipe_handler(int signum)
+{
+	sigpipe_received = true;
+}
+
 /*
  * do_watch -- handler for \watch
  *
@@ -4763,6 +4775,8 @@ do_watch(PQExpBuffer query_buf, double sleep)
 	const char *strftime_fmt;
 	const char *user_title;
 	char	   *title;
+	const char *pagerprog = NULL;
+	FILE	   *pagerpipe = NULL;
 	int			title_len;
 	int			res = 0;
 
@@ -4772,6 +4786,21 @@ do_watch(PQExpBuffer query_buf, double sleep)
 		return false;
 	}
 
+	pagerprog = getenv("PSQL_WATCH_PAGER");
+	if (pagerprog)
+	{
+		sigpipe_received = false;
+
+#ifndef WIN32
+		pqsignal(SIGPIPE, pagerpipe_sigpipe_handler);
+#endif
+
+		pagerpipe = popen(pagerprog, "w");
+		if (!pagerpipe)
+			/*silently proceed without pager */
+			restore_sigpipe_trap();
+	}
+
 	/*
 	 * Choose format for timestamps.  We might eventually make this a \pset
 	 * option.  In the meantime, using a variable for the format suppresses
@@ -4783,7 +4812,8 @@ do_watch(PQExpBuffer query_buf, double sleep)
 	 * Set up rendering options, in particular, disable the pager, because
 	 * nobody wants to be prompted while watching the output of 'watch'.
 	 */
-	myopt.topt.pager = 0;
+	if (!pagerpipe)
+		myopt.topt.pager = 0;
 
 	/*
 	 * If there's a title in the user configuration, make sure we have room
@@ -4817,7 +4847,7 @@ do_watch(PQExpBuffer query_buf, double sleep)
 		myopt.title = title;
 
 		/* Run the query and print out the results */
-		res = PSQLexecWatch(query_buf->data, );
+		res = PSQLexecWatch(query_buf->data, , pagerpipe);
 
 		/*
 		 * PSQLexecWatch handles the case where we can no longer repeat the
@@ -4826,6 +4856,9 @@ do_watch(PQExpBuffer query_buf, double sleep)
 		if (res <= 0)
 			break;
 
+		if (pagerpipe && ferror(pagerpipe))
+			break;
+
 		/*
 		 * Set up cancellation of 'watch' via SIGINT.  We redo this each time
 		 * through the loop since it's conceivable something inside
@@ -4843,14 +4876,27 @@ do_watch(PQExpBuffer query_buf, double sleep)
 		i = sleep_ms;
 		while (i > 0)
 		{
-			long		s = Min(i, 1000L);
+			long		s = Min(i, pagerpipe ? 100L : 1000L);
 
 			pg_usleep(s * 1000L);
 			if (cancel_pressed)
 break;
+
+			if (sigpipe_received)
+break;
+
 			i -= s;
 		}
 		sigint_interrupt_enabled = false;
+
+		if (sigpipe_received)
+			break;
+	}
+
+	if (pagerpipe)
+	{
+		fclose(pagerpipe);
+		restore_sigpipe_trap();
 	}
 
 	pg_free(title);
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 6f507104f4..08bb046462 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -592,12 +592,13 @@ PSQLexec(const char *query)
  * e.g., because of the interrupt, -1 on error.
  */
 int
-PSQLexecWatch(const char *query, const printQueryOpt *opt)
+PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout)
 {
 	PGresult   *res;
 	double		elapsed_msec = 0;
 	instr_time	before;
 	instr_time	after;
+	FILE	   *fout;
 
 	if (!pset.db)
 	{
@@ -638,14 +639,16 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt)
 		return 0;
 	}
 
+	fout = printQueryFout ? printQueryFout : pset.queryFout;
+
 	switch (PQresultStatus(res))
 	{
 		case PGRES_TUPLES_OK:
-			printQuery(res, opt, pset.queryFout, false, pset.logfile);
+			printQuery(res, opt, fout, false, pset.logfile);
 			break;
 
 		case PGRES_COMMAND_OK:
-			fprintf(pset.queryFout, "%s\n%s\n\n", opt->title, PQcmdStatus(res));
+			fprintf(fout, "%s\n%s\n\n", opt->title, PQcmdStatus(res));
 			break;
 
 		case PGRES_EMPTY_QUERY:
@@ -668,7 +671,7 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt)
 
 	PQclear(res);
 
-	fflush(pset.queryFout);
+	fflush(fout);
 
 	

Re: Add session statistics to pg_stat_database

2021-01-08 Thread Laurenz Albe
On Fri, 2021-01-08 at 12:00 +0900, Masahiro Ikeda wrote:
> 2. monitoring.sgml
> 
> > > IIUC, "active_time" includes the time executes a fast-path function 
> > > and
> > > "idle in transaction" includes "idle in transaction(aborted)" time.
> > > Why don't you reference pg_stat_activity's "state" column and
> > > "active_time" is the total time when the state is "active" and "fast
> > > path"?
> > > "idle in transaction" is as same too.
> >
> > Good idea; I have expanded the documentation like that.
> 
> BTW, is there any reason to merge the above statistics?
> IIUC, to separate statistics' cons is that two columns increase, and
> there is no performance penalty. So, I wonder that there is a way to 
> separate them
> corresponding to the state column of pg_stat_activity.

Sure, that could be done.

I decided to do it like this because I thought that few people would
be interested in "time spent doing fast-path function calls"; my guess
was that the more interesting value is "time where the database was
busy calculating results".

I tried to keep the balance between providing reasonable detail
while not creating more additional columns to "pg_stat_database"
than necessary.

This is of course a matter of taste, and it is good to hear different
opinions.  If more people share your opinion, I'll change the code.

> There are some following codes in pgstatfuncs.c.
> int64 result = 0.0;
> 
> But, I think the following is better.
> int64 result = 0;

You are right.  That was a silly copy-and-paste error.  Fixed.

> Although now pg_stat_get_db_session_time is initialize "result" to zero 
> when it is declared,
> another pg_stat_XXX function didn't initialize. Is it better to change 
> it?

I looked at other similar functions, and the ones I saw returned
NULL if there were no data.  In that case, it makes sense to write

char *result;

if ((result = get_stats_data()) == NULL)
PG_RETURN_NULL();

PG_RETURN_TEXT_P(cstring_to_text(result));

But I want to return 0 for the session time if there are no data yet,
so I think initializing the result to 0 in the declaration makes sense.

There are some functions that do it like this:

int32   result;

result = 0;
for (...)
{
if (...)
result++;
}

PG_RETURN_INT32(result);

Again, it is a matter of taste, and I didn't detect a clear pattern
in the existing code that I feel I should follow in this question.

Version 12 of the patch is attached.

Yours,
Laurenz Albe
From 324847353f5d9e5b2899dd93d43fb345df1dcdb8 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 7 Jan 2021 16:33:45 +0100
Subject: [PATCH] Add session statistics to pg_stat_database

If "track_counts" is active, track the following per database:
- total number of connections
- number of sessions that ended by loss of network connection,
  fatal errors and operator intervention
- total time spent in database sessions
- total time spent executing queries
- total idle in transaction time

This is useful to check if connection pooling is working.
It also helps to estimate the size of the connection pool
required to keep the database busy, which depends on the
percentage of the transaction time that is spent idling.

Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at
Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander

(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID)
---
 doc/src/sgml/monitoring.sgml |  77 +++
 src/backend/catalog/system_views.sql |   7 ++
 src/backend/postmaster/pgstat.c  | 134 ++-
 src/backend/tcop/postgres.c  |  11 ++-
 src/backend/utils/adt/pgstatfuncs.c  |  94 +++
 src/backend/utils/error/elog.c   |   8 ++
 src/include/catalog/pg_proc.dat  |  32 +++
 src/include/pgstat.h |  39 
 src/test/regress/expected/rules.out  |   7 ++
 9 files changed, 404 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 43fe8ae383..59622173da 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3737,6 +3737,83 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   session_time double precision
+  
+  
+   Time spent by database sessions in this database, in milliseconds
+   (note that statistics are only updated when the state of a session
+   changes, so if sessions have been idle for a long time, this idle time
+   won't be included)
+  
+ 
+
+ 
+  
+   active_time double precision
+  
+  
+   Time spent executing SQL statements in this database, in milliseconds
+   (this corresponds to the states active and
+   fastpath function call in
+   
+   pg_stat_activity)
+  
+ 
+
+ 
+  
+   

Re: Single transaction in the tablesync worker?

2021-01-08 Thread Peter Smith
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie  wrote:
>
> > PSA the v12 patch for the Tablesync Solution1.
> >
> > Differences from v11:
> >   + Added PG docs to mention the tablesync slot
> >   + Refactored tablesync slot drop (done by
> > DropSubscription/process_syncing_tables_for_sync)
> >   + Fixed PG docs mentioning wrong state code
> >   + Fixed wrong code comment describing TCOPYDONE state
> >
> Hi
>
> I look into the new patch and have some comments.
>
> 1.
> +   /* Setup replication origin tracking. */
> ①+  originid = replorigin_by_name(originname, true);
> +   if (!OidIsValid(originid))
> +   {
>
> ②+  originid = replorigin_by_name(originname, true);
> +   if (originid != InvalidRepOriginId)
> +   {
>
> There are two different style code which check whether originid is valid.
> Both are fine, but do you think it’s better to have a same style here?

Yes. I think the 1st style is better, so I used OidIsValid for all
the new code of the patch.
But the check in DropSubscription is an exception; there I used the 2nd
style, but ONLY to be consistent with another originid check which
already existed in that same function.

>
>
> 2.
>   *  state to SYNCDONE.  There might be zero changes applied 
> between
>   *  CATCHUP and SYNCDONE, because the sync worker might be ahead 
> of the
>   *  apply worker.
> + *- The sync worker has a intermediary state TCOPYDONE which comes 
> after
> + * DATASYNC and before SYNCWAIT. This state indicates that the 
> initial
>
> This comment about TCOPYDONE is better to be placed at [1]*, where is between 
> DATASYNC and SYNCWAIT.
>
>  * - Tablesync worker starts; changes table state from INIT to 
> DATASYNC while
>  *   copying.
>  [1]*
>  * - Tablesync worker finishes the copy and sets table state to 
> SYNCWAIT;
>  *   waits for state change.
>

Agreed. I have moved the comment per your suggestion (and I also
re-worded it again).
Fixed in latest patch [v13]

> 3.
> +   /*
> +* To build a slot name for the sync work, we are limited to 
> NAMEDATALEN -
> +* 1 characters.
> +*
> +* The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). 
> (It's
> +* actually the NAMEDATALEN on the remote that matters, but this 
> scheme
> +* will also work reasonably if that is different.)
> +*/
> +   StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small");   /* 
> for sanity */
> +
> +   syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
>
> The comments says syncslotname is limit to NAMEDATALEN - 1 characters.
> But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30,which seems not 
> NAMEDATALEN - 1.
> Should we change the comment here?
>

The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary since in
the current code the tablesync slot
name length is fixed. But I left the older comment here as a safety reminder
in case some future change would want to modify the slot name. What do
you think?


[v13] = 
https://www.postgresql.org/message-id/CAHut%2BPvby4zg6kM1RoGd_j-xs9OtPqZPPVhbiC53gCCRWdNSrw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.
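
As a small usage note, the generated tablesync slots described above
(pg_<suboid>_sync_<relid>) can be observed on the publisher while the initial
copy runs, for example:

SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%' ESCAPE '\';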




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-08 Thread vignesh C
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow  wrote:
>
> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking
> code (Tsunakawa-san feedback). It turns out that for the problem test
> case, no parallel-safety checking should be occurring that locks
> relations because those inserts are specifying VALUES, not an
> underlying SELECT, so the parallel-safety checking code was updated to
> bail out early if no underlying SELECT is specified for the INSERT. No
> change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

Thanks Greg for fixing and posting a new patch.
A few comments:
+-- Test INSERT with underlying query.
+-- (should create plan with parallel SELECT, Gather parent node)
+--
+explain(costs off) insert into para_insert_p1 select unique1,
stringu1 from tenk1;
+   QUERY PLAN
+
+ Insert on para_insert_p1
+   ->  Gather
+ Workers Planned: 4
+ ->  Parallel Seq Scan on tenk1
+(4 rows)
+
+insert into para_insert_p1 select unique1, stringu1 from tenk1;
+select count(*), sum(unique1) from para_insert_p1;
+ count |   sum
+---+--
+ 1 | 49995000
+(1 row)
+

For one of the tests you can validate that the same transaction has
been used by all the parallel workers; you could use something like
the below to validate:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  para_insert_p1) as dt;

A few includes are not required:
 #include "executor/nodeGather.h"
+#include "executor/nodeModifyTable.h"
 #include "executor/nodeSubplan.h"
 #include "executor/tqueue.h"
 #include "miscadmin.h"
@@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
GatherState *gatherstate;
Plan   *outerNode;
TupleDesc   tupDesc;
+   Index   varno;

This include is not required in nodeModifyTable.c

+#include "catalog/index.h"
+#include "catalog/indexing.h"

@@ -43,7 +49,11 @@
 #include "parser/parse_agg.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"

The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c

There are a few typos:
+table and populate it can use a parallel plan. Another
exeption is the command
+INSERT INTO ... SELECT ... which can use a
parallel plan for
+the underlying SELECT part of the query.

exeption should be exception

+ /*
+ * For the number of workers to use for a parallel
+ * INSERT/UPDATE/DELETE, it seems resonable to use the same number
+ * of workers as estimated for the underlying query.
+ */
+ parallelModifyWorkers = path->parallel_workers;
resonable should be reasonable

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-01-08 Thread Bharath Rupireddy
On Fri, Jan 8, 2021 at 1:50 PM japin  wrote:
> Thanks for updating the patch!
>
> +   /* Get the data generating query. */
> +   dataQuery = get_matview_query(stmt, , );
>
> -   /*
> -* Check for active uses of the relation in the current transaction, 
> such
> -* as open scans.
> -*
> -* NB: We count on this to protect us against problems with 
> refreshing the
> -* data using TABLE_INSERT_FROZEN.
> -*/
> -   CheckTableNotInUse(matviewRel, "REFRESH MATERIALIZED VIEW");
> +   relowner = matviewRel->rd_rel->relowner;
>
> After apply the patch, there is a duplicate
>
> relowner = matviewRel->rd_rel->relowner;

Corrected that.

> +   else if(matviewInfo)
> +   dest = 
> CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap);
>
> If the `matviewInfo->OIDNewHeap` is invalid, IMO we don't need create
> DestReceiver, isn't it?  And we should add a space after `if`.

Yes, we can skip creating the dest receiver when OIDNewHeap is
invalid; this can happen in the plain explain refresh mat view case.

if (explainInfo && !explainInfo->es->analyze)
OIDNewHeap = InvalidOid;
else
OIDNewHeap = get_new_heap_oid(stmt, matviewRel, matviewOid,
  );

Since we don't call ExecutorRun for plain explain, we can skip the
dest receiver creation. I modified the code as below in explain.c.

if (into)
dest = CreateIntoRelDestReceiver(into);
else if (matviewInfo && OidIsValid(matviewInfo->OIDNewHeap))
dest = CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap);
else
dest = None_Receiver;

Thanks for taking a look at the patches.

Attaching v3 patches, please consider these for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From f6f1760e6b460d3265ee343d5b1d53f82f45fee6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 8 Jan 2021 14:04:26 +0530
Subject: [PATCH v3 1/2]  Rearrange Refresh Mat View Code

Currently, the function ExecRefreshMatView in matview.c is having
many lines of code which is not at all good from readability and
maintainability perspectives. This patch adds few functions and
moves the code from ExecRefreshMatView to them making the code
look better.
---
 src/backend/commands/matview.c | 452 -
 1 file changed, 272 insertions(+), 180 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c5c25ce11d..06bd5629a8 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -64,7 +64,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(Oid OIDNewHeap, Query *query,
 	   const char *queryString);
 static char *make_temptable_name_n(char *tempname, int n);
 static void refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
@@ -73,6 +73,16 @@ static void refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersist
 static bool is_usable_unique_index(Relation indexRel);
 static void OpenMatViewIncrementalMaintenance(void);
 static void CloseMatViewIncrementalMaintenance(void);
+static Query *get_matview_query(RefreshMatViewStmt *stmt, Relation *rel,
+Oid *objectId);
+static Query *rewrite_refresh_matview_query(Query *dataQuery);
+static Oid get_new_heap_oid(RefreshMatViewStmt *stmt, Relation matviewRel,
+			Oid matviewOid, char *relpersistence);
+static void match_matview_with_new_data(RefreshMatViewStmt *stmt,
+		Relation matviewRel, Oid matviewOid,
+		Oid OIDNewHeap, Oid relowner,
+		int save_sec_context,
+		char relpersistence, uint64	processed);
 
 /*
  * SetMatViewPopulatedState
@@ -140,127 +150,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 {
 	Oid			matviewOid;
 	Relation	matviewRel;
-	RewriteRule *rule;
-	List	   *actions;
 	Query	   *dataQuery;
-	Oid			tableSpace;
-	Oid			relowner;
 	Oid			OIDNewHeap;
-	DestReceiver *dest;
 	uint64		processed = 0;
-	bool		concurrent;
-	LOCKMODE	lockmode;
+	Oid			relowner;
 	char		relpersistence;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	ObjectAddress address;
 
-	/* Determine strength of lock needed. */
-	concurrent = stmt->concurrent;
-	lockmode = concurrent ? ExclusiveLock : AccessExclusiveLock;
-
-	/*
-	 * Get a lock until end of transaction.
-	 */
-	matviewOid = RangeVarGetRelidExtended(stmt->relation,
-		  lockmode, 0,
-		  RangeVarCallbackOwnsTable, NULL);
-	matviewRel = table_open(matviewOid, NoLock);
-
-	/* Make sure it is a materialized view. */
-	if (matviewRel->rd_rel->relkind 

Re: Improper use about DatumGetInt32

2021-01-08 Thread Peter Eisentraut

On 2020-12-25 08:45, Michael Paquier wrote:

If we really think that we ought to differentiate, then we could do what
pg_stat_statement does, and have a separate C function that's called
with the obsolete signature (pg_stat_statements_1_8 et al).

With the 1.8 flavor, it is possible to pass down a negative number
and it may not fail depending on the number of blocks of the relation,
so I think that you had better have a compatibility layer if a user
has the new binaries but is still on 1.8.  And that's surely a safe
move.


I think on 64-bit systems it's actually safe, but on 32-bit systems 
(without USE_FLOAT8_BYVAL), if you use the new binaries with the old 
SQL-level definitions, you'd get the int4 that is passed in interpreted 
as a pointer, which would lead to very bad things.  So I think we need 
to create new functions with a different C symbol.  I'll work on that.
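
For illustration, a rough sketch of how that pg_stat_statements-style split 
might look in a hypothetical extension upgrade script (the function and 
symbol names below are assumptions, not the actual patch): the new 
int8-based SQL signature is bound to a fresh C symbol, while the old 1.8 
SQL definition keeps calling the old symbol, which continues to read its 
argument as int4.

-- hypothetical pageinspect--1.8--1.9.sql fragment (names assumed);
-- a real upgrade script would likely also drop the old int4 variant
CREATE OR REPLACE FUNCTION get_raw_page(relname text, blkno int8)
RETURNS bytea
AS 'MODULE_PATHNAME', 'get_raw_page_1_9'
LANGUAGE C STRICT;

With that split, a cluster still running the 1.8 SQL definitions against 
the new shared library never hands an int4 Datum to code expecting int8, 
so the 32-bit pass-by-reference case cannot misinterpret the value as a 
pointer.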





Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-08 Thread Amit Kapila
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska  wrote:
>
> Greg Nancarrow  wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
..
>
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
> ---
>
> @@ -1021,12 +1039,15 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
> Assert(!IsInParallelMode() || force_parallel_mode != 
> FORCE_PARALLEL_OFF);
>
> if (IsModifySupportedInParallelMode(commandType))
> {
> +   if (isParallelModifyLeader)
> +   (void) GetCurrentCommandId(true);
>
> I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
> the leader process gets a new commandId for the following statements in the
> same transaction, and thus it can see the rows inserted by the parallel
> workers?
>

Oh no, the leader backend and the worker backends must use the same commandId.
I am also not sure we need this, because for INSERT statements we
already call GetCurrentCommandId(true) in standard_ExecutorStart. We
don't want the row visibility behavior for parallel inserts to be any
different from non-parallel ones.

> If my understanding is correct, I think that the leader should not participate
> in the execution of the Insert node, else it would use higher commandId than
> the workers. That would be weird, although probably not data corruption.
>

Yeah, exactly. This is the reason both the leader and the worker backends
must use the same commandId.

> I
> wonder if parallel_leader_participation should be considered false for the
> "Gather -> Insert -> ..." plans.
>

If what I said above is correct then this is moot.

>
>
> @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
> }
>
> /* Run plan locally if no workers or enabled and not 
> single-copy. */
> -   node->need_to_scan_locally = (node->nreaders == 0)
> +   node->need_to_scan_locally = (node->nworkers_launched <= 0)
> || (!gather->single_copy && 
> parallel_leader_participation);
> node->initialized = true;
> }
>
> Is this change needed? The code just before this test indicates that nreaders
> should be equal to nworkers_launched.
>

This change is required because we don't need to set up readers for
parallel-insert unless there is a returning clause. See the below
check a few lines before this change:

- if (pcxt->nworkers_launched > 0)
+ if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
!isParallelModifyWithReturning))
  {

I think this check could be simplified to
if (pcxt->nworkers_launched > 0 && isParallelModifyWithReturning)
or something like that.

>
> In grouping_planner(), this branch
>
> +   /* Consider a supported parallel table-modification command */
> +   if (IsModifySupportedInParallelMode(parse->commandType) &&
> +   !inheritance_update &&
> +   final_rel->consider_parallel &&
> +   parse->rowMarks == NIL)
> +   {
>
> is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
> opportunity to move the common code into a new function.
>

+1.

>
> @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool 
> inheritance_update,
> }
> }
>
> +   if (parallel_modify_partial_path_count > 0)
> +   {
> +   final_rel->rows = current_rel->rows;/* ??? why hasn't 
> this been
> + 
>* set above somewhere  */
> +   generate_useful_gather_paths(root, final_rel, false);
> +   }
> +
> extra.limit_needed = limit_needed(parse);
> extra.limit_tuples = limit_tuples;
> extra.count_est = count_est;
>
> A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
> need to count the paths using parallel_modify_partial_path_count.
>

Sounds sensible.

>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> PlannerGlobal *glob = root->glob;
> int rtoffset = list_length(glob->finalrtable);
> ListCell   *lc;
> +   Plan   *finalPlan;
>
> /*
>  * Add all the query's RTEs to the flattened rangetable.  The live 
> ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> }
>
> /* Now fix the Plan tree */
> -   return set_plan_refs(root, plan, rtoffset);
> +   finalPlan = set_plan_refs(root, plan, rtoffset);
> +   if (finalPlan != NULL && IsA(finalPlan, Gather))
> +   {
> +   Plan   *subplan = outerPlan(finalPlan);
> +
> +   if (IsA(subplan, ModifyTable) && 

Re: proposal: enhancing plpgsql debug API - returns text value of variable content

2021-01-08 Thread Pavel Stehule
Hi

rebase

Regards

Pavel
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 3a9349b724..208ba181fc 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4123,6 +4123,8 @@ plpgsql_estate_setup(PLpgSQL_execstate *estate,
 	{
 		(*plpgsql_plugin_ptr)->error_callback = plpgsql_exec_error_callback;
 		(*plpgsql_plugin_ptr)->assign_expr = exec_assign_expr;
+		(*plpgsql_plugin_ptr)->eval_datum = exec_eval_datum;
+		(*plpgsql_plugin_ptr)->cast_value = do_cast_value;
 
 		if ((*plpgsql_plugin_ptr)->func_setup)
 			((*plpgsql_plugin_ptr)->func_setup) (estate, func);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 0b0ff4e5de..82a703639e 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -417,6 +417,7 @@ pl_block		: decl_sect K_BEGIN proc_sect exception_sect K_END opt_label
 		new->cmd_type	= PLPGSQL_STMT_BLOCK;
 		new->lineno		= plpgsql_location_to_lineno(@2);
 		new->stmtid		= ++plpgsql_curr_compile->nstatements;
+		new->ns			= plpgsql_ns_top();
 		new->label		= $1.label;
 		new->n_initvars = $1.n_initvars;
 		new->initvarnos = $1.initvarnos;
@@ -911,7 +912,8 @@ stmt_perform	: K_PERFORM
 		new = palloc0(sizeof(PLpgSQL_stmt_perform));
 		new->cmd_type = PLPGSQL_STMT_PERFORM;
 		new->lineno   = plpgsql_location_to_lineno(@1);
-		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		plpgsql_push_back_token(K_PERFORM);
 
 		/*
@@ -947,6 +949,7 @@ stmt_call		: K_CALL
 		new->cmd_type = PLPGSQL_STMT_CALL;
 		new->lineno = plpgsql_location_to_lineno(@1);
 		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->ns = plpgsql_ns_top();
 		plpgsql_push_back_token(K_CALL);
 		new->expr = read_sql_stmt();
 		new->is_call = true;
@@ -963,6 +966,7 @@ stmt_call		: K_CALL
 		new->cmd_type = PLPGSQL_STMT_CALL;
 		new->lineno = plpgsql_location_to_lineno(@1);
 		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->ns = plpgsql_ns_top();
 		plpgsql_push_back_token(K_DO);
 		new->expr = read_sql_stmt();
 		new->is_call = false;
@@ -998,7 +1002,8 @@ stmt_assign		: T_DATUM
 		new = palloc0(sizeof(PLpgSQL_stmt_assign));
 		new->cmd_type = PLPGSQL_STMT_ASSIGN;
 		new->lineno   = plpgsql_location_to_lineno(@1);
-		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->varno = $1.datum->dno;
 		/* Push back the head name to include it in the stmt */
 		plpgsql_push_back_token(T_DATUM);
@@ -1020,6 +1025,7 @@ stmt_getdiag	: K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';'
 		new->cmd_type = PLPGSQL_STMT_GETDIAG;
 		new->lineno   = plpgsql_location_to_lineno(@1);
 		new->stmtid	  = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->is_stacked = $2;
 		new->diag_items = $4;
 
@@ -1192,6 +1198,7 @@ stmt_if			: K_IF expr_until_then proc_sect stmt_elsifs stmt_else K_END K_IF ';'
 		new->cmd_type	= PLPGSQL_STMT_IF;
 		new->lineno		= plpgsql_location_to_lineno(@1);
 		new->stmtid		= ++plpgsql_curr_compile->nstatements;
+		new->ns			= plpgsql_ns_top();
 		new->cond		= $2;
 		new->then_body	= $3;
 		new->elsif_list = $4;
@@ -1297,6 +1304,7 @@ stmt_loop		: opt_loop_label K_LOOP loop_body
 		new->cmd_type = PLPGSQL_STMT_LOOP;
 		new->lineno   = plpgsql_location_to_lineno(@2);
 		new->stmtid   = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->label	  = $1;
 		new->body	  = $3.stmts;
 
@@ -1315,6 +1323,7 @@ stmt_while		: opt_loop_label K_WHILE expr_until_loop loop_body
 		new->cmd_type = PLPGSQL_STMT_WHILE;
 		new->lineno   = plpgsql_location_to_lineno(@2);
 		new->stmtid	  = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->label	  = $1;
 		new->cond	  = $3;
 		new->body	  = $4.stmts;
@@ -1336,6 +1345,7 @@ stmt_for		: opt_loop_label K_FOR for_control loop_body
 			new = (PLpgSQL_stmt_fori *) $3;
 			new->lineno   = plpgsql_location_to_lineno(@2);
 			new->stmtid	  = ++plpgsql_curr_compile->nstatements;
+			new->ns		  = plpgsql_ns_top();
 			new->label	  = $1;
 			new->body	  = $4.stmts;
 			$$ = (PLpgSQL_stmt *) new;
@@ -1351,6 +1361,7 @@ stmt_for		: opt_loop_label K_FOR for_control loop_body
 			new = (PLpgSQL_stmt_forq *) $3;
 			new->lineno   = plpgsql_location_to_lineno(@2);
 			new->stmtid	  = ++plpgsql_curr_compile->nstatements;
+			new->ns		  = plpgsql_ns_top();
 			new->label	  = $1;
 			new->body	  = $4.stmts;
 			$$ = (PLpgSQL_stmt *) new;
@@ -1381,6 +1392,7 @@ for_control		: for_variable K_IN
 			new = 

Re: Single transaction in the tablesync worker?

2021-01-08 Thread Peter Smith
Hi Amit.

PSA the v13 patch for the Tablesync Solution1.

Differences from v12:
+ Fixed whitespace errors of v12-0001
+ Modify TCOPYDONE state comment (houzj feedback)
+ WIP fix for AlterSubscription_refresh (Amit feedback)



Features:

* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.

* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions

* The tablesync worker is now allowing multiple tx instead of single tx

* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.

* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.

* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.

* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.

* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.

* Updates to PG docs

TODO / Known Issues:

* Address review comments

* ALTER PUBLICATION DROP TABLE can mean that knowledge of tablesyncs gets
lost, causing resource cleanup to be missed. There is a WIP fix for
this in the AlterSubscription_refresh, however it is not entirely
correct; there are known race conditions. See FIXME comments.

---

Kind Regards,
Peter Smith.
Fujitsu Australia

On Thu, Jan 7, 2021 at 6:52 PM Peter Smith  wrote:
>
> Hi Amit.
>
> PSA the v12 patch for the Tablesync Solution1.
>
> Differences from v11:
>   + Added PG docs to mention the tablesync slot
>   + Refactored tablesync slot drop (done by
> DropSubscription/process_syncing_tables_for_sync)
>   + Fixed PG docs mentioning wrong state code
>   + Fixed wrong code comment describing TCOPYDONE state
>
> 
>
> Features:
>
> * The tablesync slot is now permanent instead of temporary. The
> tablesync slot name is no longer tied to the Subscription slot name.
>
> * The tablesync slot cleanup (drop) code is added for DropSubscription
> and for process_syncing_tables_for_sync functions
>
> * The tablesync worker is now allowing multiple tx instead of single tx
>
> * A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
> copy_table in LogicalRepSyncTableStart.
>
> * If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
> it will bypass the initial copy_table phase.
>
> * Now tablesync sets up replication origin tracking in
> LogicalRepSyncTableStart (similar as done for the apply worker). The
> origin is advanced when first created.
>
> * The tablesync replication origin tracking is cleaned up during
> DropSubscription and/or process_syncing_tables_for_apply.
>
> * The DropSubscription cleanup code was enhanced (v7+) to take care of
> any crashed tablesync workers.
>
> * Updates to PG docs
>
> TODO / Known Issues:
>
> * Address review comments
>
> * Patch applies with whitespace warning
>
> ---
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia


v13-0002-Tablesync-extra-logging.patch
Description: Binary data


v13-0001-Tablesync-Solution1.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-01-08 Thread Amit Langote
Hi,

On Mon, Dec 28, 2020 at 10:01 PM Arne Roland  wrote:
> While I'd agree that simply deleting with "on delete cascade" on tuple 
> rerouting is a strong enough violation of the spirit of partitioning to justify 
> changing that behavior, I am surprised by the idea of making a differentiation 
> between FKs and other triggers. The way user-defined triggers work is a 
> violation to the same degree; I get complaints about that on a monthly (if not 
> weekly) basis. It's easy to point towards the docs, but the docs offer no 
> solution to achieve the behavior that makes partitioning somewhat transparent. 
> Most people I know don't partition because they like to complicate things, but 
> just because they have too much data.
>
> It isn't just a thing with AFTER triggers. IMO BEFORE triggers are even 
> worse: if we move a row between partitions we fire all three triggers at once 
> (at different leaf partitions, obviously). It's easy to point towards the docs. 
> Heartbleed was well documented, but I am happy that it was fixed. I don't 
> want to imply this totally unrelated security issue has anything to do with 
> our weird behavior. I just want to raise the question whether that's a good 
> thing, because frankly I haven't met anyone thus far who thought it is.

Just to be clear, are you only dissatisfied with the firing of
triggers during a row-moving UPDATEs on partitioned tables or all
firing behaviors of triggers defined on partitioned tables?  Can you
give a specific example of the odd behavior in that case?
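
For anyone following along, here is a minimal sketch of the row-moving
UPDATE case that seems to be at issue (table and trigger names are made
up; this assumes the current behavior where a cross-partition UPDATE is
executed as a DELETE from the source partition plus an INSERT into the
target):

CREATE TABLE t (a int) PARTITION BY LIST (a);
CREATE TABLE t1 PARTITION OF t FOR VALUES IN (1);
CREATE TABLE t2 PARTITION OF t FOR VALUES IN (2);

CREATE FUNCTION trace() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE '% % on %', TG_WHEN, TG_OP, TG_TABLE_NAME;
    IF TG_OP = 'DELETE' THEN
        RETURN OLD;
    END IF;
    RETURN NEW;
END $$;

CREATE TRIGGER t1_row BEFORE INSERT OR UPDATE OR DELETE ON t1
    FOR EACH ROW EXECUTE FUNCTION trace();
CREATE TRIGGER t2_row BEFORE INSERT OR UPDATE OR DELETE ON t2
    FOR EACH ROW EXECUTE FUNCTION trace();

INSERT INTO t VALUES (1);        -- NOTICE: BEFORE INSERT on t1
UPDATE t SET a = 2 WHERE a = 1;  -- NOTICE: BEFORE UPDATE on t1
                                 -- NOTICE: BEFORE DELETE on t1
                                 -- NOTICE: BEFORE INSERT on t2

A single user-level UPDATE thus fires three different BEFORE row triggers
across two partitions, which appears to be the behavior being objected to.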

> To me it seems the RI triggers are just a specific victim of the way we made 
> triggers work in general.

That's true.

> What I tried to express, albeit I apparently failed: I care about having 
> triggers, which make partitioning transparent, on the long run.
>
> > because what can happen
> > today with CASCADE triggers does not seem like a useful behavior by
> > any measure.
>
> I wholeheartedly agree. I just want to point out, that you could state the 
> same about triggers in general.
>
> Backwards compatibility is usually a pretty compelling argument. I would 
> still want to raise the question, whether this should change, because I 
> frankly think this is a bad idea.
>
> You might ask: Why am I raising this question in this thread?
>
> In case we can not (instantly) agree on the fact that this behavior should 
> change, I'd still prefer to think about making that behavior possible for 
> other triggers (possibly later) as well. One possibility would be having an 
> entry in the catalog to mark when the trigger should fire.

Sorry, I don't understand what new setting for triggers you are
thinking of here.

> I don't want to stall your definitely useful RI-Trigger fix, but I sincerely 
> believe, that we have to do better with triggers in general.
>
> If we would make the condition when to fire or not dependent something 
> besides that fact whether the trigger is internal, we could at a later date 
> choose to make both ways available, if anyone makes a good case for this. 
> Even though I still think it's not worth it.
>
> >I don't quite recall if the decision to implement it like this was
> > based on assuming that this is what users would like to see happen in
> > this case or the perceived difficulty of implementing it the other way
> > around, that is, of firing AFTER UPDATE triggers in this case.
>
> I tried to look that up, but I couldn't find any discussion about this. Do 
> you have any ideas in which thread that was handled?

It was discussed here:

https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails.  You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach.  Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)


--
Amit Langote
EDB: http://www.enterprisedb.com




Re: In-placre persistance change of a relation

2021-01-08 Thread Kyotaro Horiguchi
At Fri, 08 Jan 2021 14:47:05 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> In this version, RelationChangePersistence() is changed not to choose the
> in-place method for indexes other than btree. It seems to be usable
> with all kinds of indexes other than GiST, but at the moment it applies
> only to btrees.
> 
> 1: 
> https://www.postgresql.org/message-id/ca+tgmozez5rons49c7mepjhjndqmqtvrz_lcqukprwdmrev...@mail.gmail.com

Hmm. This is not working correctly. I'll repost after fixing that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Proposal: Global Index

2021-01-08 Thread Julien Rouhaud
On Fri, Jan 8, 2021 at 4:02 PM 曾文旌  wrote:
>
> > On 7 Jan 2021, at 22:16, Bruce Momjian  wrote:
> >
> > On Thu, Jan  7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
> >> I've been following this topic for a long time. It's been a year since the 
> >> last response.
> >> It was clear that our customers wanted this feature as well, and a large 
> >> number of them mentioned it.
> >>
> >> So, I wish the whole feature to mature as soon as possible.
> >> I summarized the scheme mentioned in the email and completed the POC 
> >> patch(base on PG_13).
> >
> > I think you need to address the items mentioned in this blog, and the
> > email link it mentions:
> >
> >   https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020
>
> Thank you for your reply.
> I read your blog and it helped me a lot.
>
> The blog mentions a specific problem: "A large global index might also 
> reintroduce problems that prompted the creation of partitioning in the first 
> place. "
> I don't quite understand, could you give some specific information?
>
> In addition you mentioned: "It is still unclear if these use-cases justify 
> the architectural changes needed to enable global indexes."
> Please also describe the problems you see, I will confirm each specific issue 
> one by one.

One example is date partitioning.  People frequently need to store
only the most recent data.  For instance, doing monthly partitioning
and dropping the oldest partition every month once you hit the desired
retention is very efficient for that use case, as it should be almost
instant (provided that you can acquire the necessary locks
immediately).  But if you have a global index, you basically lose that
advantage of partitioning, as dropping a partition would then require
heavy changes to the index.
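
As a concrete sketch of that retention pattern (all names below are made
up), removing the expired month is cheap precisely because every index is
local to its partition; a global index would presumably need its entries
for the dropped partition cleaned out, which is the heavy work referred
to above.

CREATE TABLE events (
    event_time timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2020_12 PARTITION OF events
    FOR VALUES FROM ('2020-12-01') TO ('2021-01-01');
CREATE TABLE events_2021_01 PARTITION OF events
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');

-- once the retention window has passed, the oldest month goes away
-- near-instantly, without rewriting any table or index data
DROP TABLE events_2020_12;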




Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-01-08 Thread japin


On Thu, 07 Jan 2021 at 17:53, Bharath Rupireddy wrote:
> On Mon, Dec 28, 2020 at 5:56 PM Bharath Rupireddy
>  wrote:
>>
>> On Tue, Dec 22, 2020 at 7:01 PM Bharath Rupireddy
>>  wrote:
>> > Currently, $subject is not allowed. We do plan the mat view query
>> > before every refresh. I propose to show the explain/explain analyze of
>> > the select part of the mat view in case of Refresh Mat View(RMV). It
>> > will be useful for the user to know what exactly is being planned and
>> > executed as part of RMV. Please note that we already have
>> > explain/explain analyze CTAS/Create Mat View(CMV), where we show the
>> > explain/explain analyze of the select part. This proposal will do the
>> > same thing.
>> >
>> > The behaviour can be like this:
>> > EXPLAIN REFRESH MATERIALIZED VIEW mv1;   --> will not refresh the mat
>> > view, but shows the select part's plan of mat view.
>> > EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW mv1;   --> will refresh the
>> > mat view and shows the select part's plan of mat view.
>> >
>> > Thoughts? If okay, I will post a patch later.
>>
>> Attaching below patches:
>>
>> 0001 - Rearrange Refresh Mat View Code - Currently, the function
>> ExecRefreshMatView in matview.c is having many lines of code which is
>> not at all good from readability and maintainability perspectives.
>> This patch adds a few functions and moves the code from
>> ExecRefreshMatView to them making the code look better.
>>
>> 0002 - EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW support and tests.
>>
>> If this proposal is useful, I have few open points - 1) In the patch I
>> have added a new mat view info parameter to ExplainOneQuery(), do we
>> also need to add it to ExplainOneQuery_hook_type? 2) Do we document
>> (under respective command pages or somewhere else) that we allow
>> explain/explain analyze for a command?
>>
>> Thoughts?
>
> Attaching v2 patch set reabsed on the latest master f7a1a805cb. And
> also added an entry for upcoming commitfest -
> https://commitfest.postgresql.org/32/2928/
>
> Please consider the v2 patches for further review.
>

Thanks for updating the patch!

+   /* Get the data generating query. */
+   dataQuery = get_matview_query(stmt, , );
 
-   /*
-* Check for active uses of the relation in the current transaction, 
such
-* as open scans.
-*
-* NB: We count on this to protect us against problems with refreshing 
the
-* data using TABLE_INSERT_FROZEN.
-*/
-   CheckTableNotInUse(matviewRel, "REFRESH MATERIALIZED VIEW");
+   relowner = matviewRel->rd_rel->relowner;

After applying the patch, there is a duplicate

relowner = matviewRel->rd_rel->relowner;

+   else if(matviewInfo)
+   dest = CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap);

If the `matviewInfo->OIDNewHeap` is invalid, IMO we don't need to create the
DestReceiver, do we?  And we should add a space after `if`.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.Ltd.




Re: Why latestRemovedXid|cuteoff_xid are always sent?

2021-01-08 Thread Michail Nikolaev
Hello, Peter.

Thanks for your explanation. One of the reasons I was asking is an idea
to use the same technique in the "LP_DEAD index hint bits on standby" WIP
patch to reduce the amount of additional WAL.

Now I am sure such optimization should work correctly.

Thanks,
Michail.


Re: Proposal: Global Index

2021-01-08 Thread 曾文旌


> On 7 Jan 2021, at 22:16, Bruce Momjian  wrote:
> 
> On Thu, Jan  7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
>> I've been following this topic for a long time. It's been a year since the 
>> last response.
>> It was clear that our customers wanted this feature as well, and a large 
>> number of them mentioned it.
>> 
>> So, I wish the whole feature to mature as soon as possible.
>> I summarized the scheme mentioned in the email and completed the POC 
>> patch(base on PG_13).
> 
> I think you need to address the items mentioned in this blog, and the
> email link it mentions:
> 
>   https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020

Thank you for your reply.
I read your blog and it helped me a lot.

The blog mentions a specific problem: "A large global index might also 
reintroduce problems that prompted the creation of partitioning in the first 
place. "
I don't quite understand, could you give some specific information?

In addition you mentioned: "It is still unclear if these use-cases justify the 
architectural changes needed to enable global indexes."
Please also describe the problems you see, I will confirm each specific issue 
one by one.


Thanks

Wenjing


> 
> I am not clear this is a feature we will want.  Yes, people ask for it,
> but if the experience will be bad for them and they will regret using
> it, I am not sure we want it.  Of course, if you code it up and we get
> a good user experience, we would want it --- I am just saying it is not
> clear right now.
> 
> -- 
>  Bruce Momjian  https://momjian.us
>  EnterpriseDB https://enterprisedb.com
> 
>  The usefulness of a cup is in its emptiness, Bruce Lee



smime.p7s
Description: S/MIME cryptographic signature