date:20210715

On Fri, Jul 16, 2021 at 1:04 AM Dean Rasheed  wrote:
> Having pushed [1], I started looking at this, and I think it's mostly
> in good shape.

Thanks a lot for taking a look at this.

> Since we're improving the CREATE COLLATION errors, I think it's also
> worth splitting out the error for LOCALE + LC_COLLATE/LC_CTYPE from
> the error for FROM + any other option.
>
> In the case of LOCALE + LC_COLLATE/LC_CTYPE, there is an identical
> error in CREATE DATABASE, so we should use the same error message and
> detail text here.
>
> Then logically, FROM + any other option should have an error of the
> same general form.
>
> And finally, it then makes sense to make the other errors follow the
> same pattern (with the "specified more than once" text in the detail),
> which is also where we ended up in the discussion over in [1].
>
> So, attached is what I propose.

Here are some comments:

1) Duplicate option check for "FROM" option is unnecessary and will be
a dead code. Because the syntaxer anyways would catch if FROM is
specified more than once, something like CREATE COLLATION mycoll1 FROM
FROM "C";.
+ {
+ if (fromEl)
+ errorDuplicateDefElem(defel, pstate);
  defelp = 

And we might think to catch below errors:

postgres=# CREATE COLLATION coll_dup_chk (FROM = "C", FROM = "C",
VERSION = "1");
ERROR:  conflicting or redundant options
LINE 1: CREATE COLLATION coll_dup_chk (FROM = "C", FROM = "C", VERSI...
   ^
DETAIL:  Option "from" specified more than once.

But IMO, the above should fail with:

postgres=# CREATE COLLATION coll_dup_chk (FROM = "C", FROM = "C",
VERSION = "1");
ERROR:  conflicting or redundant options
DETAIL:  FROM cannot be specified together with any other options.

2) I don't understand the difference between errorConflictingDefElem
and errorDuplicateDefElem. Isn't the following statement "This should
only be used if defel->defname is guaranteed to match the user-entered
option name"
true with errorConflictingDefElem? I mean, aren't we calling
errorConflictingDefElem only if the defel->defname is guaranteed to
match the user-entered option name? I don't see much use of
errdetail("Option \"%s\" specified more than once.", defel->defname),
in errorDuplicateDefElem when we have pstate and that sort of points
to the option that's specified more than once.
+
+/*
+ * Raise an error about a duplicate DefElem.
+ *
+ * This is similar to errorConflictingDefElem(), except that it is intended for
+ * an option that the user explicitly specified more than once.  This should
+ * only be used if defel->defname is guaranteed to match the user-entered
+ * option name, otherwise the detail text might be confusing.
+ */

I personally don't like the new function errorDuplicateDefElem as it
doesn't add any value given the presence of pstate.

Regards,
Bharath Rupireddy.

Re: Introduce pg_receivewal gzip compression tests

2021-07-15 Thread Michael Paquier

On Fri, Jul 16, 2021 at 08:59:11AM +0900, Michael Paquier wrote:
> --compress is used and the sync fails for a non-compressed segment.
> Looking at the code it is pretty obvious that open_walfile() is
> getting confused with the handling of an existing .partial segment
> while walmethods.c uses dir_data->compression in all the places that
> matter.  So that's a legit bug, that happens only when mixing
> pg_receivewal runs for where successive runs use the compression or
> non-compression modes.

Ditto.  After reading the code more carefully, the code is actually
able to work even if it could be cleaner:
1) dir_existsfile() would check for the existence of a
non-compressed, partial segment all the time.
2) If this non-compressed file was padded, the code would use
open_for_write() that would open a compressed, partial segment.
3) The compressed, partial segment would be the one flushed.

This behavior is rather debatable, and it would be more instinctive to
me to just skip any business related to the pre-padding if compression
is enabled, at the cost of one extra callback in WalWriteMethod to
grab the compression level (dir_open_for_write() skips that for
compression) to allow receivelog.c to handle that.  But at the same
time few users are going to care about that as pg_receivewal has most
likely always the same set of options, so complicating this code is
not really appealing either.

> I am amazed that the other buildfarm members are not complaining, to
> be honest.  jacana runs this TAP test with MinGW and ZLIB, and does
> not complain.

I have spent more time on that with my own environment, and while
testing I have bumped on a different issue with zlib, which was
really weird.  In the same scenario as above, gzdopen() has been
failing for me at step 2), causing the test to loop forever.  We
document to use DLLs for ZLIB coming from zlib.net, but the ones
available there are really outdated as far as I can see (found some
called zlib.lib/dll myself, breaking Solution.pm).  For now I have
disabled those tests on Windows to bring back bowerbird to green, but
there is something else going on here.  We don't do much tests with
ZLIB on Windows for pg_basebackup and pg_dump, so there may be some
more issues?

@Andrew: which version of ZLIB are you using on bowerbird?  That's the
one in c:\prog\3p64.  That's a zdll.lib, right?
--
Michael

signature.asc
Description: PGP signature

Re: row filtering for logical replication

2021-07-15 Thread Dilip Kumar

On Fri, Jul 16, 2021 at 8:57 AM Amit Kapila  wrote:
>
> On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar  wrote:
> >
> > On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> >  wrote:
> > >
> > > Is there some reasonable rule which of the old/new tuples (or both) to
> > > use for the WHERE condition? Or maybe it'd be handy to allow referencing
> > > OLD/NEW as in triggers?
> >
> > I think for insert we are only allowing those rows to replicate which
> > are matching filter conditions, so if we updating any row then also we
> > should maintain that sanity right? That means at least on the NEW rows
> > we should apply the filter, IMHO.  Said that, now if there is any row
> > inserted which were satisfying the filter and replicated, if we update
> > it with the new value which is not satisfying the filter then it will
> > not be replicated,  I think that makes sense because if an insert is
> > not sending any row to a replica which is not satisfying the filter
> > then why update has to do that, right?
> >
>
> There is another theory in this regard which is what if the old row
> (created by the previous insert) is not sent to the subscriber as that
> didn't match the filter but after the update, we decide to send it
> because the updated row (new row) matches the filter condition. In
> this case, I think it will generate an update conflict on the
> subscriber as the old row won't be present. As of now, we just skip
> the update but in the future, we might have some conflict handling
> there. If this is true then even if the new row matches the filter,
> there is no guarantee that it will be applied on the subscriber-side
> unless the old row also matches the filter.

Yeah, it's a valid point.

 Sure, there could be a
> case where the user might have changed the filter between insert and
> update but maybe we can have a separate way to deal with such cases if
> required like providing some provision where the user can specify
> whether it would like to match old/new row in updates?

Yeah, I think the best way is that users should get an option whether
they want to apply the filter on the old row or on the new row, or
both, in fact, they should be able to apply the different filters on
old and new rows.  I have one more thought in mind: currently, we are
providing a filter for the publication table, doesn't it make sense to
provide filters for operations of the publication table?  I mean the
different filters for Insert, delete, and the old row of update and
the new row of the update.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: Speed up COMMIT PREPARED

2021-07-15 Thread r.takahash...@fujitsu.com

Hi,


I noticed that anti-virus software slow down the open().
I stopped the anti-virus software and re-run the test.
(Average of 10 times)

master: 1924tps
Hold_xlogreader.patch: 1993tps (+3.5%)
Read_from_walbuffer.patch: 1954tps(+1.5%)

Therefore, the effect of my patch is limited.

I'm sorry for the confusion.

Regards,
Ryohei Takahashi

Re: Added schema level support for publication.

2021-07-15 Thread Greg Nancarrow

On Wed, Jul 14, 2021 at 8:17 PM vignesh C  wrote:
>
> Thanks for your comments, the attached v11 patch fixes the issues.
>

Thanks for your work on this.

I have some minor review comments on the documentation:

(1) wrong link (uses altersubscription instead of alterpublication)
doc/src/sgml/catalogs.sgml

BEFORE:
+   created as an empty publication type. When a table or schema is added to
+   the publication using 
+   ALTER PUBLICATION then the publication type

AFTER:
+   created as an empty publication type. When a table or schema is added to
+   the publication using 
+   ALTER PUBLICATION then the publication type


(2) Improve wording and suggest "or" instead of "and"
doc/src/sgml/catalogs.sgml

BEFORE:
+   If a publication is created without specifying any of
+   FOR ALL TABLES, FOR TABLE and
+   FOR SCHEMA option, then the publication will be

AFTER:
+   If a publication is created without specifying any of the
+   FOR ALL TABLES, FOR TABLE or
+   FOR SCHEMA options, then the publication will be


(3) space at start of literal
doc/src/sgml/catalogs.sgml

+   and  FOR SCHEMA, so for such publications there will be a


(4) Should say "variants of this command change ..." ?

+   The fourth, fifth and sixth variants change which schemas are part of the



Also, there seems to be an issue with ALTER PUBLICATION ... SET SCHEMA ...
(PubType is getting set to 'e' instead of 's'

test_pub=# create publication pub1;
CREATE PUBLICATION
test_pub=# create table myschema.test(i int);
CREATE TABLE
test_pub=# alter publication pub1 set schema myschema;
ALTER PUBLICATION
test_pub=# \dRp pub1
   List of publications
 Name | Owner | All tables | Inserts | Updates | Deletes | Truncates |
Via root | PubType
--+---++-+-+-+---+--+-
 pub1 | gregn | f  | t   | t   | t   | t |
f| e
(1 row)

test_pub=# alter publication pub1 add table test;
ALTER PUBLICATION
test_pub=# \dRp pub1
   List of publications
 Name | Owner | All tables | Inserts | Updates | Deletes | Truncates |
Via root | PubType
--+---++-+-+-+---+--+-
 pub1 | gregn | f  | t   | t   | t   | t |
f| t
(1 row)


When I use "ADD SCHEMA" instead of "SET SCHEMA" on an empty
publication, it seems OK.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: alter table set TABLE ACCESS METHOD

rebased.

Also, there were two redundant checks for multiple SET ACCESS METHOD commands.
But one of them wasn't hit if the ALTER was setting the current AM due to the
no-op test.

I think it's better to fail in every case, and not just sometimes (especially
if we were to use ERRCODE_SYNTAX_ERROR).

I included my 2ndary patch allowing to set the AM of partitioned table, same as
for a tablespace.

-- 
Justin
>From a20b924adbcc6e24a88f8d9bd0287c62456720a3 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 5 May 2021 14:28:59 -0700
Subject: [PATCH 1/2] ALTER TABLE ... SET ACCESS METHOD

Adds support for changing the access method of a table with a
rewrite.

Author: Justin Pryzby, Jeff Davis
---
 doc/src/sgml/ref/alter_table.sgml   | 20 
 src/backend/commands/cluster.c  | 21 +---
 src/backend/commands/matview.c  |  5 +-
 src/backend/commands/tablecmds.c| 68 +++--
 src/backend/parser/gram.y   |  8 +++
 src/bin/psql/tab-complete.c | 10 ++--
 src/include/commands/cluster.h  |  4 +-
 src/include/commands/event_trigger.h|  1 +
 src/include/nodes/parsenodes.h  |  1 +
 src/test/regress/expected/create_am.out | 34 +
 src/test/regress/sql/create_am.sql  | 21 
 11 files changed, 175 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c5e5e84e06..1d0f30d366 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -75,6 +75,7 @@ ALTER TABLE [ IF EXISTS ] name
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
 SET WITHOUT OIDS
+SET ACCESS METHOD new_access_method
 SET TABLESPACE new_tablespace
 SET { LOGGED | UNLOGGED }
 SET ( storage_parameter [= value] [, ... ] )
@@ -692,6 +693,16 @@ WITH ( MODULUS numeric_literal, REM
 

 
+   
+SET ACCESS METHOD
+
+ 
+  This form changes the access method of the table by rewriting it. See
+   for more information.
+ 
+
+   
+

 SET TABLESPACE
 
@@ -1228,6 +1239,15 @@ WITH ( MODULUS numeric_literal, REM
   
  
 
+ 
+  new_access_method
+  
+   
+The name of the access method to which the table will be converted.
+   
+  
+ 
+
  
   new_tablespace
   
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 2912b4c257..45bf1276a5 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -641,6 +641,7 @@ static void
 rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Oid			accessMethod = OldHeap->rd_rel->relam;
 	Oid			tableSpace = OldHeap->rd_rel->reltablespace;
 	Oid			OIDNewHeap;
 	char		relpersistence;
@@ -661,6 +662,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 
 	/* Create the transient table that will receive the re-ordered data */
 	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   accessMethod,
 			   relpersistence,
 			   AccessExclusiveLock);
 
@@ -682,16 +684,16 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 /*
  * Create the transient table that will be filled with new data during
  * CLUSTER, ALTER TABLE, and similar operations.  The transient table
- * duplicates the logical structure of the OldHeap, but is placed in
- * NewTableSpace which might be different from OldHeap's.  Also, it's built
- * with the specified persistence, which might differ from the original's.
+ * duplicates the logical structure of the OldHeap; but will have the
+ * specified physical storage properties NewTableSpace, NewAccessMethod, and
+ * relpersistence.
  *
  * After this, the caller should load the new heap with transferred/modified
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
-			  LOCKMODE lockmode)
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
+			  char relpersistence, LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
 	char		NewHeapName[NAMEDATALEN];
@@ -750,7 +752,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 		  InvalidOid,
 		  InvalidOid,
 		  OldHeap->rd_rel->relowner,
-		  OldHeap->rd_rel->relam,
+		  NewAccessMethod,
 		  OldHeapDesc,
 		  NIL,
 		  RELKIND_RELATION,
@@ -1100,6 +1102,10 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		relform1->reltablespace = relform2->reltablespace;
 		relform2->reltablespace = swaptemp;
 
+		swaptemp = relform1->relam;
+		relform1->relam = relform2->relam;
+		relform2->relam = swaptemp;
+
 		swptmpchr = relform1->relpersistence;
 		relform1->relpersistence = relform2->relpersistence;
 		relform2->relpersistence = swptmpchr;
@@ -1135,6 +1141,9 @@ swap_relation_files(Oid r1, Oid r2, bool

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

On Fri, 16 Jul 2021 at 02:53, Ronan Dunklau  wrote:
> Please find attached a v9 just moving the flag setting to ExecInitSort, and my
> apologies if I misunderstood your point.

I took this and adjusted a few things and ended up with the attached patch.

The changes are fairly minor. I made the bracing consistent between
both tuplesort_begin calls. I rewrote the comment at the top of
ExecSort() to make it more clear about each method used.

I also adjusted the comment down at the end of ExecSort that was
mentioning something about tuplesort_gettupleslot returning NULL.
Your patch didn't touch this, but to me, the comment just looked wrong
both before and after the changes. tuplesort_gettupleslot returns
false and sets the slot to empty when it runs out of tuples.  Anyway,
I wrote something there that I think improves that.

I feel like this patch is commit-worthy now.  However, I'll leave it
for a few days, maybe until after the weekend as there's been a fair
bit of interest and I imagine someone will have comments to make.

David

v10-0001-Make-nodeSort.c-do-Datum-sorts-for-single-column.patch
Description: Binary data

Re: row filtering for logical replication

2021-07-15 Thread Amit Kapila

On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar  wrote:
>
> On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
>  wrote:
> >
> > Is there some reasonable rule which of the old/new tuples (or both) to
> > use for the WHERE condition? Or maybe it'd be handy to allow referencing
> > OLD/NEW as in triggers?
>
> I think for insert we are only allowing those rows to replicate which
> are matching filter conditions, so if we updating any row then also we
> should maintain that sanity right? That means at least on the NEW rows
> we should apply the filter, IMHO.  Said that, now if there is any row
> inserted which were satisfying the filter and replicated, if we update
> it with the new value which is not satisfying the filter then it will
> not be replicated,  I think that makes sense because if an insert is
> not sending any row to a replica which is not satisfying the filter
> then why update has to do that, right?
>

There is another theory in this regard which is what if the old row
(created by the previous insert) is not sent to the subscriber as that
didn't match the filter but after the update, we decide to send it
because the updated row (new row) matches the filter condition. In
this case, I think it will generate an update conflict on the
subscriber as the old row won't be present. As of now, we just skip
the update but in the future, we might have some conflict handling
there. If this is true then even if the new row matches the filter,
there is no guarantee that it will be applied on the subscriber-side
unless the old row also matches the filter. Sure, there could be a
case where the user might have changed the filter between insert and
update but maybe we can have a separate way to deal with such cases if
required like providing some provision where the user can specify
whether it would like to match old/new row in updates?

-- 
With Regards,
Amit Kapila.

Re: Corrected documentation of data type for the logical replication message formats.

2021-07-15 Thread Peter Smith

Hi Vignesh.

FYI - Because the other patch [1] was pushed ahead of this one, I
think your patch now needs to be updated for the new message types
that were introduced.

--
[1] 
https://github.com/postgres/postgres/commit/a8fd13cab0ba815e9925dc9676e6309f699b5f72#diff-331c33fd11c3ed85f9dbfead93f139c20ff3a25176651fc2ed37c486b97630e6

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-15 Thread Yugo NAGATA

Hello,

I attached the updated patch.

On Tue, 13 Jul 2021 15:50:52 +0900
Yugo NAGATA  wrote:

> > >> > I'm a little hesitant about how to count and report such unfinished
> > >> > because of bench timeout transactions, though. Not counting them seems
> > >> > to be the best option.

> > > I will fix to finish the benchmark when the time is over during retrying, 
> > > that is,
> > > change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.

Done.
(I wrote CSTATE_ERROR, but correctly it is CSTATE_FAILURE.)

Now, once the timer is expired during retrying a failed transaction, pgbench 
never start
a new transaction for retry. If the transaction successes, it will counted in 
the result.
Otherwise, if the transaction fails again, it is not counted.

In addition, I fixed to work well with pipeline mode. Previously, pipeline mode 
was not
enough considered, ROLLBACK was not sent correctly. I fixed to handle errors in 
pipeline
mode properly, and now it works.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From 1cd3519f3a1cfbffe7bcce35fd6f12da566625b1 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v15 2/2] Pgbench errors and serialization/deadlock retries

Client's run is aborted in case of a serious error, for example, the
connection with the database server was lost or the end of script reached
without completing the last transaction. In addition, if an execution of SQL
or meta command fails for reasons other than serialization or deadlock errors,
the client is aborted. Otherwise, if an SQL fails with serialization or
deadlock errors, the current transaction is rolled back which also
includes setting the client variables as they were before the run of this
transaction (it is assumed that one transaction script contains only one
transaction).

Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option).  These options can be combined
together; more over, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the option
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.

If there're retries and/or failures their statistics are printed in the
progress, in the transaction / aggregation logs and in the end with other
results (all and for each script). Also retries and failures are printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.

If you want to distinguish all errors and failures (errors without retrying) by
type including which limit for retries was violated and how far it was exceeded
for the serialization/deadlock failures, use the options --verbose-errors.
---
 doc/src/sgml/ref/pgbench.sgml| 402 +++-
 src/bin/pgbench/pgbench.c| 974 +--
 src/bin/pgbench/t/001_pgbench_with_server.pl | 217 -
 src/bin/pgbench/t/002_pgbench_no_server.pl   |  10 +
 src/fe_utils/conditional.c   |  16 +-
 src/include/fe_utils/conditional.h   |   2 +
 6 files changed, 1505 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..0a3ccb3c92 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+maximum number of tries: 1
 latency average = 11.013 ms
 latency stddev = 7.351 ms
 initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)

   The first six lines report some of the most important parameter
-  settings.  The next line reports the number of transactions completed
+  settings.  The seventh line reports the number of transactions completed
   and intended (the latter being just the product of number of clients
   and number of transactions per client); these will be equal unless the run
-  failed before completion.  (In -T mode, only the actual
-  number of transactions is printed.)
+  failed before completion or some SQL command(s) failed.  (In
+  -T mode, only the actual number of transactions is printed.)
+  The next line reports the maximum number of tries for transactions with
+  serialization or deadlock errors (see  for more information).
   The last line reports the number of transactions per second.

@@

Re: row filtering for logical replication

2021-07-15 Thread Peter Smith

On Thu, Jul 15, 2021 at 4:30 AM Euler Taveira  wrote:
>
> On Wed, Jul 14, 2021, at 8:21 AM, Greg Nancarrow wrote:
>
> Some minor v19 patch review points you might consider for your next
> patch version:
>
> Greg, thanks for another review. I agree with all of these changes. It will be
> in the next patch.

Hi, here are a couple more minor review comments for the V19 patch.

(The 2nd one overlaps a bit with one that Greg previously gave).

//

1. doc/src/sgml/ref/create_publication.sgml

+   columns. A NULL value causes the expression to evaluate
+   to false; avoid using columns without not-null constraints in the
+   WHERE clause. The WHERE clause does
+   not allow functions and user-defined operators.
+  

=>

typo: "and user-defined operators." --> "or user-defined operators."

--

2. src/backend/commands/publicationcmds.c - OpenTableList IsA logic

IIUC the tables list can only consist of one kind of list element.

Since there is no expected/permitted "mixture" of kinds then there is
no need to check the IsA within the loop like v19 is doing; instead
you can check only the list head element. If you want to, then you
could Assert that every list element has a consistent kind as the
initial kind, but maybe that is overkill too?

PSA a small tmp patch to demonstrate what this comment is about.

--
Kind Regards,
Peter Smith.
Fujitsu Australia

v19-0001-PS-tmp-OpenTableList-IsA-logic.patch
Description: Binary data

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

On Fri, 16 Jul 2021 at 02:53, Ronan Dunklau  wrote:
>
> Le jeudi 15 juillet 2021, 16:19:23 CEST David Rowley a écrit :>
> > Ronan's latest results plus John's make me think there's no need to
> > separate out the node function as I did in v8.  However, I do think v6
> > could learn a little from v8. I think I'd rather see the sort method
> > determined in ExecInitSort() rather than ExecSort(). I think
> > minimising those few extra instructions in ExecSort() might help the
> > L1 instruction cache.
> >
>
> I'm not sure I understand what you expect from moving that to ExecInitSort ?

The motivation was to reduce the extra code that's being added to
ExecSort. I checked the assembly of ExecSort on v6 and v9 and v6 was
544 lines of assembly and v9 is 534 lines.

> Maybe we should also implement the tuplesort_state initialization in
> ExecInitSort ? (not the actual feeding and sorting of course).

I don't think that would be a good idea.  Setting the datumSort does
not require any new memory to be allocated. That's not the case for
the tuplesort_begin routines.  The difference here is that we can
delay the memory allocation until we pull the first tuple and if we
don't pull any tuples from the outer node then there are no needless
allocations.

> Please find attached a v9 just moving the flag setting to ExecInitSort, and my
> apologies if I misunderstood your point.

That's exactly what I meant. Thanks

David

Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

On Wed, Jul 14, 2021 at 07:42:33AM +0200, Laurenz Albe wrote:
> Besides, schemas are not physical, but logical containers.  So I see a point 
> in
> measuring the storage used in a certain tablespace, but not so much by all 
> objects
> in a certain schema.  It might be useful for accounting purposes, though.

We use only a few schemas, 1) to hide child tables; 2) to exclude some extended
stats from backups, and 1-2 other things.  But it's useful to be able to see
how storage is used by schema, and better to do it conveniently.

I think it'd be even more useful for people who use schemas more widely than we
do:
"Who's using all our space?"
\dn++
"Oh, it's that one - let me clean that up..."

Or, "what's the pg_toast stuff, and do I need to do something about it?"

> But I don't expect it to be in frequent enough demand to add a psql command.
> 
> What about inventing a function pg_schema_size(regnamespace)?

But for "physical" storage it's also possible to get the size from the OS, much
more efficiently, using /bin/df or zfs list (assuming nothing else is using
those filesystems).  The pg_*_size functions are inefficient, but psql \db+ and
\l+ already call them anyway.

For schemas, there's no way to get the size from the OS, so it's nice to make
the size available from psql, conveniently.

v3 patch:
 - fixes an off by one in forkNum loop;
 - removes an unnecessary subquery in describe.c;
 - returns 0 rather than NULL if the schema is empty;
 - adds pg_am_size;

regression=# \dA++
  List of access methods
  Name  | Type  |   Handler|  Description   
|  Size   
+---+--++-
 brin   | Index | brinhandler  | block range index (BRIN) access method 
| 744 kB
 btree  | Index | bthandler| b-tree index access method 
| 21 MB
 gin| Index | ginhandler   | GIN index access method
| 2672 kB
 gist   | Index | gisthandler  | GiST index access method   
| 2800 kB
 hash   | Index | hashhandler  | hash index access method   
| 2112 kB
 heap   | Table | heap_tableam_handler | heap table access method   
| 60 MB
 heap2  | Table | heap_tableam_handler |
| 120 kB
 spgist | Index | spghandler   | SP-GiST index access method
| 5840 kB
(8 rows)

regression=# \dn++
   List of schemas
Name|  Owner  | Access privileges  |  Description   |  
Size   
+-+++-
 fkpart3| pryzbyj ||| 
168 kB
 fkpart4| pryzbyj ||| 
104 kB
 fkpart5| pryzbyj ||| 
40 kB
 fkpart6| pryzbyj ||| 
48 kB
 mvtest_mvschema| pryzbyj ||| 
16 kB
 public | pryzbyj | pryzbyj=UC/pryzbyj+| standard public schema | 
69 MB
| | =UC/pryzbyj|| 
 regress_indexing   | pryzbyj ||| 
48 kB
 regress_rls_schema | pryzbyj ||| 0 
bytes
 regress_schema_2   | pryzbyj ||| 0 
bytes
 testxmlschema  | pryzbyj ||| 
24 kB
(10 rows)

-- 
Justin
>From d7f5446c0aa1a12a512482d2ed121eb44f6e2f59 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH v3 1/4] psql: show size of schemas and AMs in \dn+ and \dA+..

See also: 358a897fa, 528ac10c7

Add pg_am_size() and psql \dA++
---
 src/backend/utils/adt/dbsize.c  | 138 
 src/bin/psql/describe.c |  12 ++-
 src/include/catalog/pg_proc.dat |  19 +
 3 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d5a7fb13f3..62273d4f15 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,18 +13,23 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_tablespace.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenodemap.h"

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

Em qui., 15 de jul. de 2021 às 10:04, Ranier Vilela 
escreveu:

> Em qui., 15 de jul. de 2021 às 10:01, Aleksander Alekseev <
> aleksan...@timescale.com> escreveu:
>
>> Thanks, David.
>>
>> > I lost where. Can you show me?
>>
>> See the attached warnings.txt.
>>
> Thank you.
>
>
>>
>> > But the benchmark came from:
>> > pgbench -i -p 5432 -d postgres
>> > pgbench -c 50 -T 300 -S -n
>>
>> I'm afraid this tells nothing unless you also provide the
>> configuration files and the hardware description, and also some
>> information on how you checked that there is no performance
>> degradation on all the other supported platforms and possible
>> configurations.
>
>
>
>> Benchmarking is a very complicated topic - trust me,
>> been there!
>>
> Absolutely.
>
>
>>
>> It would be better to submit two separate patches, the one that
>> addresses Size_t and another that addresses shadowing. Refactorings
>> only, nothing else.
>>
>> Regarding the code formatting, please see src/tools/pgindent.
>>
> I will try.
>
Here are the two patches.
As suggested, reclassified as refactoring only.

regards,
Ranier Vilela
From 57eaa5fa92489b217239ec44e2ccf83556a1da4f Mon Sep 17 00:00:00 2001
From: Ranier Vilela 
Date: Thu, 15 Jul 2021 21:36:21 -0300
Subject: [PATCH] Promove unshadowing of two variables PGPROC type.

ProcArrayGroupClearXid function has a parameter already named *proc*,
When declaring a local variable with the same name, people may get confused when using them.
Better to rename local variables to nextproc and maintain readability.

No need to backpatch, this patch is classified as refactoring only.
---
 src/backend/storage/ipc/procarray.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4c91e721d0..286132c696 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -829,12 +829,12 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 	/* Walk the list and clear all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
-		PGPROC	   *proc = [nextidx];
+		PGPROC	   *nextproc = [nextidx];
 
-		ProcArrayEndTransactionInternal(proc, proc->procArrayGroupMemberXid);
+		ProcArrayEndTransactionInternal(nextproc, proc->procArrayGroupMemberXid);
 
 		/* Move to next proc in list. */
-		nextidx = pg_atomic_read_u32(>procArrayGroupNext);
+		nextidx = pg_atomic_read_u32(>procArrayGroupNext);
 	}
 
 	/* We're done with the lock now. */
@@ -849,18 +849,18 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 	 */
 	while (wakeidx != INVALID_PGPROCNO)
 	{
-		PGPROC	   *proc = [wakeidx];
+		PGPROC	   *nextproc = [wakeidx];
 
-		wakeidx = pg_atomic_read_u32(>procArrayGroupNext);
-		pg_atomic_write_u32(>procArrayGroupNext, INVALID_PGPROCNO);
+		wakeidx = pg_atomic_read_u32(>procArrayGroupNext);
+		pg_atomic_write_u32(>procArrayGroupNext, INVALID_PGPROCNO);
 
 		/* ensure all previous writes are visible before follower continues. */
 		pg_write_barrier();
 
-		proc->procArrayGroupMember = false;
+		nextproc->procArrayGroupMember = false;
 
-		if (proc != MyProc)
-			PGSemaphoreUnlock(proc->sem);
+		if (nextproc != MyProc)
+			PGSemaphoreUnlock(nextproc->sem);
 	}
 }
 
-- 
2.25.1

From fafb23a62835f7d35bd3f4642b06253cf16cbfca Mon Sep 17 00:00:00 2001
From: Ranier Vilela 
Date: Thu, 15 Jul 2021 21:40:20 -0300
Subject: [PATCH] Reduce Wsign-compare warnings from clang-12 compiler.

All size_t variables declared in procarray.c, are true ints.
Lets promove a changes for int, reducing warnings like
"comparison of integers of different signs".

No need to backpatch, why this patch is classified as
refactoring only.
---
 src/backend/storage/ipc/procarray.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 286132c696..efb489bc5d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -703,7 +703,7 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 static inline void
 ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
 {
-	size_t		pgxactoff = proc->pgxactoff;
+	int			pgxactoff = proc->pgxactoff;
 
 	/*
 	 * Note: we need exclusive lock here because we're going to change other
@@ -875,7 +875,7 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 void
 ProcArrayClearTransaction(PGPROC *proc)
 {
-	size_t		pgxactoff;
+	int			pgxactoff;
 
 	/*
 	 * Currently we need to lock ProcArrayLock exclusively here, as we
@@ -1355,7 +1355,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	TransactionId topxid;
 	TransactionId latestCompletedXid;
 	int			mypgxactoff;
-	size_t		numProcs;
+	int			numProcs;
 	int			j;
 
 	/*
@@ -1432,7 +1432,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	/* No shortcuts, gotta grovel through the array */
 	mypgxactoff = MyProc->pgxactoff;
 	numProcs =

Re: Introduce pg_receivewal gzip compression tests

2021-07-15 Thread Michael Paquier

On Thu, Jul 15, 2021 at 09:35:52PM +0900, Michael Paquier wrote:
> For this one, I'll try to test harder on my own host.  I am curious to
> see if the other Windows members running the TAP tests have anything
> to say.  Looking at the code of zlib, this would come from gz_zero()
> in gzflush(), which could blow up on a write() in gz_comp().

bowerbird has just failed for the second time in a row on EACCESS, so
there is more here than meets the eye.  Looking at the code, I think I
have spotted what it is and the buildfarm logs give a very good hint:
# Running: pg_receivewal -D
:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal
--verbose --endpos 0/328 --compress 1
pg_receivewal: starting log streaming at 0/200 (timeline 1)
pg_receivewal: fatal: could not fsync existing write-ahead log file
"00010002.partial": Permission denied
not ok 20 - streaming some WAL using ZLIB compression

--compress is used and the sync fails for a non-compressed segment.
Looking at the code it is pretty obvious that open_walfile() is
getting confused with the handling of an existing .partial segment
while walmethods.c uses dir_data->compression in all the places that
matter.  So that's a legit bug, that happens only when mixing
pg_receivewal runs for where successive runs use the compression or
non-compression modes.

I am amazed that the other buildfarm members are not complaining, to
be honest.  jacana runs this TAP test with MinGW and ZLIB, and does
not complain.
--
Michael


signature.asc
Description: PGP signature

RE: Speed up COMMIT PREPARED

2021-07-15 Thread r.takahash...@fujitsu.com

I noticed that the previous Read_from_walbuffer.patch has a mistake in 
xlogreader.c.
Could you please use the attached v2 patch?

The performance result of the previous mail is the result of v2 patch.


Regards,
Ryohei Takahashi


v2-Read_from_walbuffer.patch
Description: v2-Read_from_walbuffer.patch

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thursday, July 15, 2021, Jan Wieck  wrote:

> On 7/15/21 4:24 PM, David G. Johnston wrote:
>
> OK, especially as this seems useful outside of pg_upgrade, and if done
>> separately is something pg_upgrade could just run as part of its new
>> cluster evaluation scripts.  Knowing whether an extension is outdated
>> doesn't require the old cluster.
>>
>
> Knowing that (the extension is outdated) exactly how? Can you give us an
> example query, maybe a few SQL snippets explaining what exactly you are
> talking about? Because at this point you completely lost me.
>

I was mostly going off other people saying it was possible.  In any case,
looking at pg_available_extension_versions, once you figure out how to
order by the version text column, would let you check if any installed
extensions are not their latest version.

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly


On 7/15/21 4:24 PM, David G. Johnston wrote:

OK, especially as this seems useful outside of pg_upgrade, and if done 
separately is something pg_upgrade could just run as part of its new 
cluster evaluation scripts.  Knowing whether an extension is outdated 
doesn't require the old cluster.


Knowing that (the extension is outdated) exactly how? Can you give us an 
example query, maybe a few SQL snippets explaining what exactly you are 
talking about? Because at this point you completely lost me.


Sorry, Jan

--
Jan Wieck

Re: data corruption hazard in reorderbuffer.c

> On Jul 15, 2021, at 3:32 PM, Tomas Vondra  
> wrote:
> 
> I think it's mostly futile to list all the possible issues this might have 
> caused - if you skip arbitrary decoded changes, that can trigger pretty much 
> any bug in reorder buffer. But those bugs can be triggered by various other 
> issues, of course.

I thought that at first, too, but when I searched for bug reports with the 
given error messages, they all had to do with logical replication or logical 
decoding.  That seems a bit fishy to me.  If these bugs were coming from all 
over the system, why would that be so?

> It's hard to say what was the cause, but the "logic" bugs are probably 
> permanent, while the issues triggered by I/O probably disappear after a 
> restart?

If you mean "logic" bugs like passing an invalid file descriptor to close(), 
then yes, those would be permanent, but I don't believe we have any bugs of 
that kind.

The consequences of I/O errors are not going to go away after restart.  On the 
subscriber side, if logical replication has replayed less than a full 
transaction worth of changes without raising an error, the corruption will be 
permanent.

> That being said, I agree this seems like an issue and we should not ignore 
> I/O errors.

I agree.

> I'd bet other places using transient files (like sorting or hashagg spilling 
> to disk) has the same issue, although in that case the impact is likely 
> limited to a single query.

I'm not so convinced.  In most places, the result is ignored for close()-type 
operations only when the prior operation failed and we're closing the handle in 
preparation for raising an error.  There are a small number of other places, 
such as pg_import_system_collations and perform_base_backup, which ignore the 
result from closing a handle, but those are reading from the handle, not 
writing to it, so the situation is not comparable.

I believe the oversight in reorderbuffer.c really is a special case.

> I wonder if sync before the close is an appropriate solution, though. It 
> seems rather expensive, and those files are meant to be "temporary" (i.e. we 
> don't keep them over restart). So maybe we could ensure the consistency is a 
> cheaper way - perhaps tracking some sort of checksum for each file, or 
> something like that?

I'm open to suggestions of what to do, but thinking of these files as temporary 
may be what misleads developers to believe they don't have to be treated 
carefully.  The code was first committed in 2014 and as far as I am aware 
nobody else has complained about this before.  In some part that might be 
because CloseTemporaryFile() is less familiar than close() to most developers, 
and they may be assuming that it contains its own error handling and just don't 
realize that it returns an error code just like regular close() does.

The point of the reorder buffer is to sort the changes from a single 
transaction so that they can be replayed in order.  The files used for the 
sorting are temporary, but there is nothing temporary about the failure to 
include all the changes in the files, as that will have permanent consequences 
for whoever replays them.  If lucky, the attempt to replay the changes will 
abort because they don't make sense, but I've demonstrated to my own 
satisfaction that nothing prevents silent data loss if the failure to write 
changes happens to destroy a complete rather than partial change.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: data corruption hazard in reorderbuffer.c

2021-07-15 Thread Tomas Vondra


Hi,

I think it's mostly futile to list all the possible issues this might 
have caused - if you skip arbitrary decoded changes, that can trigger 
pretty much any bug in reorder buffer. But those bugs can be triggered 
by various other issues, of course.


It's hard to say what was the cause, but the "logic" bugs are probably 
permanent, while the issues triggered by I/O probably disappear after a 
restart?


That being said, I agree this seems like an issue and we should not 
ignore I/O errors. I'd bet other places using transient files (like 
sorting or hashagg spilling to disk) has the same issue, although in 
that case the impact is likely limited to a single query.


I wonder if sync before the close is an appropriate solution, though. It 
seems rather expensive, and those files are meant to be "temporary" 
(i.e. we don't keep them over restart). So maybe we could ensure the 
consistency is a cheaper way - perhaps tracking some sort of checksum 
for each file, or something like that?



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: speed up verifying UTF-8

2021-07-15 Thread John Naylor

I wrote:

> To simplify the constants, I do shift down to uint32, and I didn't bother
working around that. v16alpha regressed on worst-case input, so for v16beta
I went back to earlier coding for the one-byte ascii check. That helped,
but it's still slower than v14.

It occurred to me that I could rewrite the switch test into simple
comparisons, like I already had for the 2- and 4-byte lead cases. While at
it, I folded the leading byte and continuation tests into a single
operation, like this:

/* 3-byte lead with two continuation bytes */
else if ((chunk & 0xF0C0C000) == 0xE0808000)

...and also tried using 64-bit constants to avoid shifting. Still didn't
quite beat v14, but got pretty close:

> The numbers on Power8 / gcc 4.8 (little endian):
>
> HEAD:
>
>  chinese | mixed | ascii | mixed16 | mixed8
> -+---+---+-+
> 2951 |  1521 |   871 |1474 |   1508
>
> v14:
>
>  chinese | mixed | ascii | mixed16 | mixed8
> -+---+---+-+
>  885 |   607 |   179 | 774 |   1325

v16gamma:

 chinese | mixed | ascii | mixed16 | mixed8
-+---+---+-+
 952 |   632 |   180 | 800 |   1333

A big-endian 64-bit platform just might shave enough cycles to beat v14
this way... or not.

--
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 0636b8765b..f48d79638c 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -13,8 +13,41 @@
 #include "c.h"
 
 #include "mb/pg_wchar.h"
+#include "port/pg_bswap.h"
 
 
+/* Verify a chunk of bytes for valid ASCII including a zero-byte check. */
+static inline int
+check_ascii(const uint64 chunk)
+{
+   uint64
+   highbits_set,
+   highbit_carry;
+
+   /* Check if any bytes in this chunk have the high bit set. */
+   highbits_set = chunk & UINT64CONST(0x8080808080808080);
+   if (highbits_set)
+   return 0;
+
+   /*
+* Check if there are any zero bytes in this chunk.
+*
+* First, add 0x7f to each byte. This sets the high bit in each byte,
+* unless it was a zero. We already checked that none of the bytes had 
the
+* high bit set previously, so the max value each byte can have after 
the
+* addition is 0x7f + 0x7f = 0xfe, and we don't need to worry about
+* carrying over to the next byte.
+*/
+   highbit_carry = chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f);
+
+   /* Then check that the high bit is set in each byte. */
+   highbit_carry &= UINT64CONST(0x8080808080808080);
+   if (highbit_carry == UINT64CONST(0x8080808080808080))
+   return sizeof(chunk);
+   else
+   return 0;
+}
+
 /*
  * Operations on multi-byte encodings are driven by a table of helper
  * functions.
@@ -1728,6 +1761,67 @@ pg_gb18030_verifystr(const unsigned char *s, int len)
return s - start;
 }
 
+/*
+ * Workhorse for pg_utf8_verifychar(). Returns the length of the character
+ * at *s in bytes, or -1 on invalid input or premature end of input.
+ * Static inline for the benefit of pg_utf8_verifystr().
+ */
+static inline int
+pg_utf8_verifychar_internal(const uint64 chunk_orig)
+{
+   const uint64 chunk = (pg_hton64(chunk_orig));
+
+   /* high bit should be set */
+   Assert((chunk & 0x8000) != 0);
+
+   /* 2-byte lead with one continuation byte */
+   if ((chunk & 0xE0C0) == 0xC080)
+   {
+   /* check 2-byte overlong: 1100.000x.10xx. */
+   if (chunk < 0xC200)
+   return -1;
+
+   /* found valid sequence for code points U+0080 through U+07FF */
+   return 2;
+   }
+   /* 3-byte lead with two continuation bytes */
+   else if ((chunk & 0xF0C0C000) == 0xE0808000)
+   {
+   /* check 3-byte overlong: 1110. 100x. 10xx. */
+   if (chunk < 0xE0A0)
+   return -1;
+
+   /* check surrogate: 1110.1101 101x. 10xx. */
+   if (chunk > 0xED9FBFFF && chunk < 0xEE00)
+   return -1;
+
+   /*
+* found valid sequence for code points U+0800 through U+D7FF or
+* U+E000 through U+
+*/
+   return 3;
+   }
+   /* 4-byte lead with three continuation bytes */
+   else if ((chunk & 0xF8C0C0C0) == 0xF0808080)
+   {
+   /*
+* check 4-byte overlong: . 1000. 10xx. 
10xx.
+*/
+   if (chunk < 0xF090)
+   return -1;
+
+   /* check too large: .0100 1001. 10xx. 10xx. */
+   if (chunk > 0xF48FBFBF)
+

Re: data corruption hazard in reorderbuffer.c

> On Jul 15, 2021, at 1:51 PM, Mark Dilger  wrote:
> 
> one common error I see

Another common error is of the form

ERROR:  could not map filenode "base/16384/16413" to relation OID

resulting from a ddl statement having not been written correctly, I think.  
This, too, has shown up in [1] fairly recently against a 12.3 database.  A 
similar looking bug was reported a couple years earlier in [2], for which 
Andres pushed the fix e9edc1ba0be21278de8f04a068c2fb3504dc03fc and backpatched 
as far back as 9.2.  Postgres 12.3 was released quite a bit more recently than 
that, so it appears the bug report from [1] occurred despite having the fix 
from [2].

[1] 
https://www.postgresql-archive.org/BUG-16812-Logical-decoding-error-td6170022.html
[2] 
https://www.postgresql.org/message-id/flat/20180914021046.oi7dm4ra3ot2g2kt%40alap3.anarazel.de

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: data corruption hazard in reorderbuffer.c

> On Jul 15, 2021, at 1:03 PM, Mark Dilger  wrote:
> 
> Skipping some writes while not others easily creates a variety of failures, 
> and for brevity I won't post a patch to demonstrate that here.

If anybody is curious, one common error I see when simulating a close() 
skipping partial changes rather than whole ones looks like:

ERROR:  got sequence entry 31 for toast chunk 16719 instead of seq 21

(where the exact numbers differ, of course).  This symptom has shown up in at 
least two ([1], [2] below) unsolved user bug reports specifically mentioning 
replication.  That doesn't prove a connection between the those reports and 
this issue, but it makes me wonder.

[1] 
https://www.postgresql.org/message-id/ca+_m4obs2apkjqd1gxnx2ykutjogycfq8tzgr1upp3ztbty...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/3bd6953d-3da6-040c-62bf-9522808d5c2f%402ndquadrant.com#f6f165ebea024f851d47a17723de5d29

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 11:52 AM Dave Cramer 
wrote:

>
> On Thu, 15 Jul 2021 at 14:31, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> Yep, and IIUC running "CREATE EXTENSION pg_stat_statements VERSION
>> '1.5';" works correctly in v13 as does executing
>>
> While it does work there are issues with dumping and restoring a database
> with the old version of pg_stat_statements in it that would only be found
> by dumping and restoring.
>

I'm unsure how this impacts the broader discussion.  At worse you'd have a
chance to manually run the update command on the new cluster before using
dump/restore.

>
>> "ALTER EXTENSION pg_stat_statements UPDATE;" while version 1.5 is
>> installed.
>>
>
>
>
>> So even without doing the copying of the old contrib libraries to the new
>> server such a "one at a time" procedure would work just fine for this
>> particular contrib extension.
>>
>
> You cannot copy the old contrib libraries into the new server. This will
> fail due to changes in the API and various exported variables either not
> being there or being renamed.
>

If this is true then the docs have a bug.  It sounds more like the
documentation should say "ensure that the new cluster has extension
libraries installed that are compatible with the version of the extension
installed on the old cluster".  Whether we want to be even more specific
with regards to contrib I cannot say - it seems like newer versions largely
retain backward compatibility so this is basically a non-issue for contrib
(though maybe individual extensions have their own requirements?)

but it fails to say that the versions that are installed may need to be
> updated.
>

OK, especially as this seems useful outside of pg_upgrade, and if done
separately is something pg_upgrade could just run as part of its new
cluster evaluation scripts.  Knowing whether an extension is outdated
doesn't require the old cluster.
David J.

data corruption hazard in reorderbuffer.c

Hackers,

I believe there is a hazard in reorderbuffer.c if a call to write() buffers 
data rather than flushing it to disk, only to fail when flushing the data 
during close().  The relevant code is in ReorderBufferSerializeTXN(), which 
iterates over changes for a transaction, opening transient files as needed for 
the changes to be written, writing the transaction changes to the transient 
files using ReorderBufferSerializeChange(), and closing the files when finished 
using CloseTransientFile(), the return code from which is ignored.  If 
ReorderBufferSerializeChange() were fsync()ing the writes, then ignoring the 
failures on close() would likely be ok, or at least in line with what we do 
elsewhere.  But as far as I can see, no call to sync the file is performed.

I expect a failure here could result in a partially written change in the 
transient file, perhaps preceded or followed by additional complete or partial 
changes.  Perhaps even worse, a failure could result in a change not being 
written at all (rather than partially), potentially with preceding and 
following changes written intact, with no indication that one change is missing.

Upon testing, both of these expectations appear to be true.  Skipping some 
writes while not others easily creates a variety of failures, and for brevity I 
won't post a patch to demonstrate that here.  The following code change causes 
whole rather than partial changes to be written or skipped.  After applying 
this change and running the tests in contrib/test_decoding/, "test toast" shows 
that an UPDATE command has been silently skipped.

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 7378beb684..a6c60b81c9 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -108,6 +108,7 @@
 #include "utils/rel.h"
 #include "utils/relfilenodemap.h"
 
+static bool lose_writes_until_close = false;
 
 /* entry for a hash table we use to map from xid to our transaction state */
 typedef struct ReorderBufferTXNByIdEnt
@@ -3523,6 +3524,8 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
Sizespilled = 0;
Sizesize = txn->size;
 
+   lose_writes_until_close = false;
+
elog(DEBUG2, "spill %u changes in XID %u to disk",
 (uint32) txn->nentries_mem, txn->xid);
 
@@ -3552,7 +3555,10 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
charpath[MAXPGPATH];
 
if (fd != -1)
+   {
CloseTransientFile(fd);
+   lose_writes_until_close = 
!lose_writes_until_close;
+   }
 
XLByteToSeg(change->lsn, curOpenSegNo, 
wal_segment_size);
 
@@ -3600,6 +3606,8 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
 
if (fd != -1)
CloseTransientFile(fd);
+
+   lose_writes_until_close = false;
 }
 
 /*
@@ -3790,7 +3798,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, 
ReorderBufferTXN *txn,
 
errno = 0;
pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
-   if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+
+   if (lose_writes_until_close)
+   ;   /* silently do nothing with buffer contents */
+   else if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
{
int save_errno = errno;


The attached patch catches the failures on close(), but to really fix this 
properly, we should call pg_fsync() before close().

Any thoughts on this?  It seems to add a fair amount of filesystem burden to 
add all the extra fsync activity, though I admit to having not benchmarked that 
yet.  Perhaps doing something like Thomas's work for commit dee663f7843 where 
closing files is delayed so that fewer syncs are needed?  I'm not sure how much 
that would help here, and would like feedback before pursuing anything of that 
sort.

The relevant code exists back as far as the 9_4_STABLE branch, where commit 
b89e151054a from 2014 first appears.



v1-0001-No-longer-ignoring-failures-on-file-close.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Avoid repeated PQfnumber() in pg_dump

2021-07-15 Thread Daniel Gustafsson

> On 15 Jul 2021, at 04:51, houzj.f...@fujitsu.com wrote:
> On July 15, 2021 5:35 AM Daniel Gustafsson  wrote:

>> Out of curiosity, how many columns are "many columns"?
> 
> I tried dump 10 table definitions while each table has 1000 columns
> (maybe not real world case).

While unlikely to be common, very wide tables aren’t unheard of.  Either way, I
think it has merit to pull out the PQfnumber before the loop even if it’s a
wash performance wise for many users, as it’s a pattern used elsewhere in
pg_dump.

--
Daniel Gustafsson   https://vmware.com/

Re: CREATE COLLATION - check for duplicate options and error out if found one

2021-07-15 Thread Dean Rasheed

On Mon, 31 May 2021 at 15:10, vignesh C  wrote:
>
> On Sat, May 29, 2021 at 9:20 PM Bharath Rupireddy
>  wrote:
> >
> > Thanks. PSA v3 patch.
>
> Thanks for the updated patch, the changes look good to me.
>

Hi,

Having pushed [1], I started looking at this, and I think it's mostly
in good shape.

Since we're improving the CREATE COLLATION errors, I think it's also
worth splitting out the error for LOCALE + LC_COLLATE/LC_CTYPE from
the error for FROM + any other option.

In the case of LOCALE + LC_COLLATE/LC_CTYPE, there is an identical
error in CREATE DATABASE, so we should use the same error message and
detail text here.

Then logically, FROM + any other option should have an error of the
same general form.

And finally, it then makes sense to make the other errors follow the
same pattern (with the "specified more than once" text in the detail),
which is also where we ended up in the discussion over in [1].

So, attached is what I propose.

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/CAEZATCXHWa9OoSAetiZiGQy1eM2raa9q-b3K4ZYDwtcARypCcA%40mail.gmail.com
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
new file mode 100644
index ebb0994..40e2626
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -86,19 +86,47 @@ DefineCollation(ParseState *pstate, List
 		DefElem   **defelp;
 
 		if (strcmp(defel->defname, "from") == 0)
+		{
+			if (fromEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else if (strcmp(defel->defname, "locale") == 0)
+		{
+			if (localeEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else if (strcmp(defel->defname, "lc_collate") == 0)
+		{
+			if (lccollateEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else if (strcmp(defel->defname, "lc_ctype") == 0)
+		{
+			if (lcctypeEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else if (strcmp(defel->defname, "provider") == 0)
+		{
+			if (providerEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else if (strcmp(defel->defname, "deterministic") == 0)
+		{
+			if (deterministicEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else if (strcmp(defel->defname, "version") == 0)
+		{
+			if (versionEl)
+errorDuplicateDefElem(defel, pstate);
 			defelp = 
+		}
 		else
 		{
 			ereport(ERROR,
@@ -112,11 +140,17 @@ DefineCollation(ParseState *pstate, List
 		*defelp = defel;
 	}
 
-	if ((localeEl && (lccollateEl || lcctypeEl))
-		|| (fromEl && list_length(parameters) != 1))
+	if (localeEl && (lccollateEl || lcctypeEl))
 		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("conflicting or redundant options")));
+errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting or redundant options"),
+errdetail("LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE."));
+
+	if (fromEl && list_length(parameters) != 1)
+		ereport(ERROR,
+errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting or redundant options"),
+errdetail("FROM cannot be specified together with any other options."));
 
 	if (fromEl)
 	{
diff --git a/src/backend/commands/define.c b/src/backend/commands/define.c
new file mode 100644
index aafd755..8dd260d
--- a/src/backend/commands/define.c
+++ b/src/backend/commands/define.c
@@ -359,3 +359,21 @@ errorConflictingDefElem(DefElem *defel,
 			errmsg("conflicting or redundant options"),
 			parser_errposition(pstate, defel->location));
 }
+
+/*
+ * Raise an error about a duplicate DefElem.
+ *
+ * This is similar to errorConflictingDefElem(), except that it is intended for
+ * an option that the user explicitly specified more than once.  This should
+ * only be used if defel->defname is guaranteed to match the user-entered
+ * option name, otherwise the detail text might be confusing.
+ */
+void
+errorDuplicateDefElem(DefElem *defel, ParseState *pstate)
+{
+	ereport(ERROR,
+			errcode(ERRCODE_SYNTAX_ERROR),
+			errmsg("conflicting or redundant options"),
+			errdetail("Option \"%s\" specified more than once.", defel->defname),
+			parser_errposition(pstate, defel->location));
+}
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
new file mode 100644
index f84d099..91743ca
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -154,5 +154,6 @@ extern TypeName *defGetTypeName(DefElem
 extern int	defGetTypeLength(DefElem *def);
 extern List *defGetStringList(DefElem *def);
 extern void errorConflictingDefElem(DefElem *defel, ParseState *pstate) pg_attribute_noreturn();
+extern void errorDuplicateDefElem(DefElem *defel, ParseState *pstate) pg_attribute_noreturn();
 
 #endif			/* DEFREM_H */
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
new file mode 100644
index 0b60b55..c08d8f2
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -701,6 +701,65 @@ View definition:
  SELECT ss.c1

Re: unnesting multirange data types

2021-07-15 Thread Jonathan S. Katz

On 7/15/21 12:26 PM, Alexander Korotkov wrote:
> On Thu, Jul 15, 2021 at 6:47 PM Tom Lane  wrote:
>> Yeah, that seems pretty horrid.  I still don't like the way the
>> array casts were done, but I'd be okay with pushing the unnest
>> addition.
> 
> +1 for just unnest().

...which was my original ask at the beginning of the thread :) So +1.

Thanks,

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature

Re: Removing unneeded self joins

2021-07-15 Thread Zhihong Yu

On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu  wrote:

>
>
> On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov 
> wrote:
>
>> On 6/7/21 13:49, Hywel Carver wrote:
>> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
>> > mailto:a.lepik...@postgrespro.ru>> wrote:
>> > Looking through the email chain, a previous version of this patch added
>> > ~0.6% to planning time in the worst case tested - does that meet the
>> > "essentially free" requirement?
>> I think these tests weren't full coverage of possible use cases. It will
>> depend on a number of relations in the query. For the JOIN of
>> partitioned tables, for example, the overhead could grow. But in the
>> context of overall planning time this overhead will be small till the
>> large number of relations.
>> Also, we made this feature optional to solve possible problems.
>> Rebased on 768ea9bcf9
>>
>> --
>> regards,
>> Andrey Lepikhov
>> Postgres Professional
>>
> Hi,
>
> bq. We can proof the uniqueness
>
> proof -> prove
>
> 1. Collect all mergejoinable join quals looks like a.x = b.x
>
>  quals looks like -> quals which look like
>
> For update_ec_sources(), the variable cc is not used.
>
> Cheers
>
Hi,

+   *otherjoinquals = rjoinquals;

Maybe rename rjoinquals as ojoinquals to align with the target variable
name.

+   int k; /* Index of kept relation */
+   int r = -1; /* Index of removed relation */

Naming k as kept, r as removed would make the code more readable (remain
starts with r but has opposite meaning).

+   if (bms_is_member(r, info->syn_righthand) &&
+   !bms_is_member(k, info->syn_righthand))
+   jinfo_check = false;
+
+   if (!jinfo_check)
+   break;

There are 4 if statements where jinfo_check is assigned false. Once
jinfo_check is assigned, we can break out of the loop - instead of checking
the remaining conditions.

+   else if (!innerrel_is_unique(root, joinrelids, outer->relids,

nit: the 'else' is not needed since the if block above it goes to next
iteration of the loop.

+   /* See for row marks. */
+   foreach (lc, root->rowMarks)

It seems once imark and omark are set, we can come out of the loop.

Cheers

Re: [PATCH] psql: \dn+ to show size of each schema..

On Wed, Jul 14, 2021 at 02:05:29PM +0900, Ian Lawrence Barwick wrote:
> 2021年7月14日(水) 12:07 Justin Pryzby :
> >
> > \db+ and \l+ show sizes of tablespaces and databases, so I was surprised in 
> > the
> > past that \dn+ didn't show sizes of schemas.  I would find that somewhat
> > convenient, and I assume other people would use it even more useful.
> 
> It's something which would be useful to have. But see this previous proposal:
> 
>
> https://www.postgresql.org/message-id/flat/2d6d2ebf-4dbc-4f74-17d8-05461f4782e2%40dalibo.com

Thanks for finding that.

It sounds like the objections were:
1) it may be too slow - I propose the size should be shown only with \n++;
I think \db and \l should get the same treatment, and probably everywhere
should change to use the "int verbose".  I moved the ++ columns to the
right-most column.

2) it may fail or be misleading if user lacks permissions.  
I think Tom's concern was that at some point we might decide to avoid showing a
relation's size to a user who has no access to the rel, and then \dn+ would
show misleading information, or fail.
I implemented this a server-side function for super-user/monitoring role only.  

I think \dn++ is also a reasonable way to address the second concern - if
someone asksk for "very verbose" outpu, they get more of an internal,
implementation dependant output, which might be more likely to change in future
releases.  For example, if we move the ++ columns to the right, someone might
jusifiably think that the \n and \n+ columns would be less likely to change in
the future than the \n++ columns.

I imagine ++ would find more uses in the future.  Like, say, size of an access
methods \dA++.  I'll add that in a future revision - I hope that PG15 will also
have create table like (INCLUDE ACCESS METHOD), ALTER TABLE SET ACCESS METHOD,
and pg_restore --no-tableam.

++ may also allow improved testing of psql features - platform dependent stuff
like size can be in ++, allowing better/easier/testing of +.

-- 
Justin
>From 21bb5d22492089a804903a817b8dd5c179643056 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH v2 1/5] psql: \dn+ to show size of each schema..

See also: 358a897fa, 528ac10c7
---
 src/backend/utils/adt/dbsize.c  | 81 +
 src/bin/psql/describe.c |  7 +++
 src/include/catalog/pg_proc.dat | 10 
 3 files changed, 98 insertions(+)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d5a7fb13f3..d13e691807 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,8 +13,10 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -25,6 +27,8 @@
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenodemap.h"
@@ -832,6 +836,83 @@ pg_size_bytes(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+/* Compute the size of a schema (namespace) */
+static int64
+calculate_namespace_size(Oid nspOid)
+{
+	int64		totalsize = 0;
+	AclResult	aclresult;
+	Relation	pg_class;
+	ScanKeyData 	skey;
+	SysScanDesc	scan;
+	HeapTuple	tuple;
+
+	/*
+	 * User must be a member of pg_read_all_stats or have CREATE privilege for
+	 * target namespace.
+	 */
+	if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		aclresult = pg_namespace_aclcheck(nspOid, GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+		   get_namespace_name(nspOid));
+	}
+
+	ScanKeyInit(, Anum_pg_class_relnamespace,
+			BTEqualStrategyNumber, F_OIDEQ, nspOid);
+
+	pg_class = table_open(RelationRelationId, AccessShareLock);
+	scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, );
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Relation rel;
+		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
+
+		rel = try_relation_open(classtuple->oid, AccessShareLock);
+		if (!rel)
+			continue;
+
+		for (int forkNum = 1; forkNum <= MAX_FORKNUM; forkNum++)
+			totalsize += calculate_relation_size(>rd_node, rel->rd_backend, forkNum);
+
+		relation_close(rel, AccessShareLock);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_class, AccessShareLock);
+	return totalsize;
+}
+
+Datum
+pg_namespace_size_oid(PG_FUNCTION_ARGS)
+{
+	int64		size;
+	Oid			nspOid = PG_GETARG_OID(0);
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size == 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_namespace_size_name(PG_FUNCTION_ARGS)
+{
+	int64		size;
+	Name		nspName = PG_GETARG_NAME(0);
+	Oid			nspOid = get_namespace_oid(NameStr(*nspName), false);
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size == 0)
+

Re: Git revision in tarballs

2021-07-15 Thread Magnus Hagander

On Thu, Jul 15, 2021 at 4:35 PM Tom Lane  wrote:
>
> Magnus Hagander  writes:
> > Putting it in the tarball making script certainly works for me,
> > though, if that's what people prefer. And that does away with the
> > "clean" part as that one blows away the whole directory between each
> > run.
>
> Actually, we *have* to do it over there, because what that script
> does is
>
># Export the selected git ref
>git archive ${i} | tar xf - -C pgsql
>
>cd pgsql
>./configure
># some irrelevant stuff
>make dist
>
> So there's no .git subdirectory in the directory it runs "make dist"
> in.  Now maybe it'd work anyway because of the GIT_DIR environment
> variable, but what I think is more likely is that the file would
> end up containing the current master-branch HEAD commit, whereas
> the thing we actually want to record here is ${i}.

Hah, yeah, that certainly decides it. And I was even poking around
that script as well today, just nt at the same time :)


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, 15 Jul 2021 at 14:31, David G. Johnston 
wrote:

> On Thu, Jul 15, 2021 at 11:14 AM Jan Wieck  wrote:
>
>> On 7/15/21 1:10 PM, David G. Johnston wrote:
>> > ... But while
>> > PostgreSQL doesn't really have a choice here - it cannot be expected to
>> > subdivide itself - extensions (at least external ones - PostGIS is one
>> I
>> > have in mind presently) - can and often do attempt to support multiple
>> > versions of PostgreSQL for whatever major versions of their product
>> they
>> > are offering.  For these it is possible to adhere to the "change one
>> > thing at a time principle" and to treat the extensions as not being
>> part
>> > of "ALL the changes from one major version to the target version..."
>>
>> You may make that exception for an external extension like PostGIS. But
>> I don't think it is valid for one distributed in sync with the core
>> system in the contrib package, like pg_stat_statements. Which happens to
>> be the one named in the subject line of this entire discussion.
>>
>>
> Yep, and IIUC running "CREATE EXTENSION pg_stat_statements VERSION '1.5';"
> works correctly in v13 as does executing
>
While it does work there are issues with dumping and restoring a database
with the old version of pg_stat_statements in it that would only be found
by dumping and restoring.


> "ALTER EXTENSION pg_stat_statements UPDATE;" while version 1.5 is
> installed.
>



> So even without doing the copying of the old contrib libraries to the new
> server such a "one at a time" procedure would work just fine for this
> particular contrib extension.
>

You cannot copy the old contrib libraries into the new server. This will
fail due to changes in the API and various exported variables either not
being there or being renamed.


>
> And since the OP was unaware of the presence of the existing ALTER
> EXTENSION UPDATE command I'm not sure at what point a "lack of features"
> complaint here is due to lack of knowledge or actual problems (yes, I did
> forget too but at least this strengthens my position that one-at-a-time
> methods are workable, even today).
>

You are correct I was not aware of the ALTER EXTENSION UPDATE command, but
that doesn't change the issue.
It's not so much the lack of features that I am complaining about; it is
the incompleteness of the upgrade. pg_upgrade does a wonderful job telling
me what extensions are incompatible with the upgrade before it does the
upgrade, but it fails to say that the versions that are installed may need
to be updated.

Dave

Re: Using a stock openssl BIO

2021-07-15 Thread Andres Freund

Hi,

On 2021-07-15 13:59:26 -0400, Bruce Momjian wrote:
> On Wed, Jul 14, 2021 at 07:17:47PM -0700, Andres Freund wrote:
> > FWIW, I don't think hardware tls acceleration is a particularly crucial 
> > thing
> > for now. Outside of backups it's rare to have the symmetric encryption part 
> > of
> > tls be the problem these days thanks, to the AES etc functions in most of 
> > the
> > common CPUs.
> >
> > I don't plan to work on this, but Thomas encouraged me to mention this on 
> > the
> > list when I mention it to him.
>
> So, I am aware of CPU AES acceleration and I assume PG uses that.

Yes, it does so via openssl. But that still happens on the CPU. And
what's more, there's a lot of related work in TLS that's fairly
expensive (chunking up the data into TLS records etc). Some of the
better NICs can do that work in the happy path, so the CPU doesn't have
to do encryption nor framing. In some cases that can avoid the
to-be-sent data ever being pulled into the CPU caches, but instead it
can be DMA directly to the NIC.

In PG's case that's particularly interesting when sending out file data
in bulk, say in basebackup.c or walsender.c - the data can be sent via
sendfile(), so it never goes to userspace.

Here's an overview of the kernel TLS / TLS offload
https://legacy.netdevconf.info/1.2/papers/ktls.pdf

> It is the public key certificate verification part of TLS that we
> don't use hardware acceleration for, right?

That's true, but separate from what I was talking about. For most
database workloads the public key stuff shouldn't be a major piece,
because connection establishment shouldn't be that frequent...

Greetings,

Andres Freund

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 11:14 AM Jan Wieck  wrote:

> On 7/15/21 1:10 PM, David G. Johnston wrote:
> > ... But while
> > PostgreSQL doesn't really have a choice here - it cannot be expected to
> > subdivide itself - extensions (at least external ones - PostGIS is one I
> > have in mind presently) - can and often do attempt to support multiple
> > versions of PostgreSQL for whatever major versions of their product they
> > are offering.  For these it is possible to adhere to the "change one
> > thing at a time principle" and to treat the extensions as not being part
> > of "ALL the changes from one major version to the target version..."
>
> You may make that exception for an external extension like PostGIS. But
> I don't think it is valid for one distributed in sync with the core
> system in the contrib package, like pg_stat_statements. Which happens to
> be the one named in the subject line of this entire discussion.
>
>
Yep, and IIUC running "CREATE EXTENSION pg_stat_statements VERSION '1.5';"
works correctly in v13 as does executing "ALTER EXTENSION
pg_stat_statements UPDATE;" while version 1.5 is installed.  So even
without doing the copying of the old contrib libraries to the new server
such a "one at a time" procedure would work just fine for this particular
contrib extension.

And since the OP was unaware of the presence of the existing ALTER
EXTENSION UPDATE command I'm not sure at what point a "lack of features"
complaint here is due to lack of knowledge or actual problems (yes, I did
forget too but at least this strengthens my position that one-at-a-time
methods are workable, even today).

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 02:14:28PM -0400, Jan Wieck wrote:
> On 7/15/21 1:10 PM, David G. Johnston wrote:
> > ... But while PostgreSQL doesn't really have a choice here - it cannot
> > be expected to subdivide itself - extensions (at least external ones -
> > PostGIS is one I have in mind presently) - can and often do attempt to
> > support multiple versions of PostgreSQL for whatever major versions of
> > their product they are offering.  For these it is possible to adhere to
> > the "change one thing at a time principle" and to treat the extensions
> > as not being part of "ALL the changes from one major version to the
> > target version..."
> 
> You may make that exception for an external extension like PostGIS. But I
> don't think it is valid for one distributed in sync with the core system in
> the contrib package, like pg_stat_statements. Which happens to be the one
> named in the subject line of this entire discussion.

Yes, I think one big issue is that the documentation of the new server
might not match the API of the extension installed on the old server.

There has been a lot of discussion from years ago about why we can't
auto-upgrade extensions, so it might be good to revisit that.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_upgrade does not upgrade pg_stat_statements properly


On 7/15/21 1:10 PM, David G. Johnston wrote:
... But while 
PostgreSQL doesn't really have a choice here - it cannot be expected to 
subdivide itself - extensions (at least external ones - PostGIS is one I 
have in mind presently) - can and often do attempt to support multiple 
versions of PostgreSQL for whatever major versions of their product they 
are offering.  For these it is possible to adhere to the "change one 
thing at a time principle" and to treat the extensions as not being part 
of "ALL the changes from one major version to the target version..."


You may make that exception for an external extension like PostGIS. But 
I don't think it is valid for one distributed in sync with the core 
system in the contrib package, like pg_stat_statements. Which happens to 
be the one named in the subject line of this entire discussion.



Regards, Jan

--
Jan Wieck

Re: speed up verifying UTF-8

2021-07-15 Thread John Naylor

On Thu, Jul 15, 2021 at 1:10 AM Amit Khandekar 
wrote:

> - check_ascii() seems to be used only for 64-bit chunks. So why not
> remove the len argument and the len <= sizeof(int64) checks inside the
> function. We can rename it to check_ascii64() for clarity.

Thanks for taking a look!

Well yes, but there's nothing so intrinsic to 64 bits that the name needs
to reflect that. Earlier versions worked on 16 bytes at time. The compiler
will optimize away the len check, but we could replace with an assert
instead.

> - I was thinking, why not have a pg_utf8_verify64() that processes
> 64-bit chunks (or a 32-bit version). In check_ascii(), we anyway
> extract a 64-bit chunk from the string. We can use the same chunk to
> extract the required bits from a two byte char or a 4 byte char. This
> way we can avoid extraction of separate bytes like b1 = *s; b2 = s[1]
> etc.

Loading bytes from L1 is really fast -- I wouldn't even call it
"extraction".

> More importantly, we can avoid the separate continuation-char
> checks for each individual byte.

On a pipelined superscalar CPU, I wouldn't expect it to matter in the
slightest.

> Additionally, we can try to simplify
> the subsequent overlong or surrogate char checks. Something like this

My recent experience with itemptrs has made me skeptical of this kind of
thing, but the idea was interesting enough that I couldn't resist trying it
out. I have two attempts, which are attached as v16*.txt and apply
independently. They are rough, and some comments are now lies. To simplify
the constants, I do shift down to uint32, and I didn't bother working
around that. v16alpha regressed on worst-case input, so for v16beta I went
back to earlier coding for the one-byte ascii check. That helped, but it's
still slower than v14.

That was not unexpected, but I was mildly shocked to find out that v15 is
also slower than the v14 that Heikki posted. The only non-cosmetic
difference is using pg_utf8_verifychar_internal within pg_utf8_verifychar.
I'm not sure why it would make such a big difference here. The numbers on
Power8 / gcc 4.8 (little endian):

HEAD:

 chinese | mixed | ascii | mixed16 | mixed8
-+---+---+-+
2951 |  1521 |   871 |1474 |   1508

v14:

 chinese | mixed | ascii | mixed16 | mixed8
-+---+---+-+
 885 |   607 |   179 | 774 |   1325

v15:

 chinese | mixed | ascii | mixed16 | mixed8
-+---+---+-+
1085 |   671 |   180 |1032 |   1799

v16alpha:

 chinese | mixed | ascii | mixed16 | mixed8
-+---+---+-+
1268 |   822 |   180 |1410 |   2518

v16beta:

 chinese | mixed | ascii | mixed16 | mixed8
-+---+---+-+
1096 |   654 |   182 | 814 |   1403

As it stands now, for v17 I'm inclined to go back to v15, but without the
attempt at being clever that seems to have slowed it down from v14.

Any interest in testing on 64-bit Arm?

--
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 0636b8765b..177c607be6 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -13,8 +13,47 @@
 #include "c.h"

 #include "mb/pg_wchar.h"
+#include "port/pg_bswap.h"

+/* for UTF-8 */
+#define IS_CONTINUATION_BYTE(c)(((c) & 0xC0) == 0x80)
+#define IS_TWO_BYTE_LEAD(c)(((c) & 0xE0) == 0xC0)
+#define IS_THREE_BYTE_LEAD(c)  (((c) & 0xF0) == 0xE0)
+#define IS_FOUR_BYTE_LEAD(c)   (((c) & 0xF8) == 0xF0)
+
+/* Verify a chunk of bytes for valid ASCII including a zero-byte check. */
+static inline int
+check_ascii(const uint64 chunk)
+{
+   uint64
+   highbits_set,
+   highbit_carry;
+
+   /* Check if any bytes in this chunk have the high bit set. */
+   highbits_set = chunk & UINT64CONST(0x8080808080808080);
+   if (highbits_set)
+   return 0;
+
+   /*
+* Check if there are any zero bytes in this chunk.
+*
+* First, add 0x7f to each byte. This sets the high bit in each byte,
+* unless it was a zero. We already checked that none of the bytes had 
the
+* high bit set previously, so the max value each byte can have after 
the
+* addition is 0x7f + 0x7f = 0xfe, and we don't need to worry about
+* carrying over to the next byte.
+*/
+   highbit_carry = chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f);
+
+   /* Then check that the high bit is set in each byte. */
+   highbit_carry &= UINT64CONST(0x8080808080808080);
+   if (highbit_carry == UINT64CONST(0x8080808080808080))
+   return sizeof(chunk);
+   else
+   return 0;
+}
+
 /*
  * Operations on multi-byte encodings are driven by a table of helper
  * functions.
@@ -1728,6 +1767,97 @@ pg_gb18030_verifystr(const unsigned char *s, int len)
return s - start;
 }

+/*
+ * Workhorse for

Re: Using a stock openssl BIO

On Wed, Jul 14, 2021 at 07:17:47PM -0700, Andres Freund wrote:
> FWIW, I don't think hardware tls acceleration is a particularly crucial thing
> for now. Outside of backups it's rare to have the symmetric encryption part of
> tls be the problem these days thanks, to the AES etc functions in most of the
> common CPUs.
> 
> I don't plan to work on this, but Thomas encouraged me to mention this on the
> list when I mention it to him.

So, I am aware of CPU AES acceleration and I assume PG uses that.  It is
the public key certificate verification part of TLS that we don't use
hardware acceleration for, right?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 9:58 AM Jan Wieck  wrote:

> On 7/15/21 12:46 PM, David G. Johnston wrote:
>
> > I am an application developer who operates on the principle of "change
> > only one thing at a time".
>
> Which pg_upgrade by definition isn't. It is bringing ALL the changes
> from one major version to the target version, which may be multiple at
> once. Including, but not limited to, catalog schema changes, SQL
> language changes, extension behavior changes and utility command
> behavior changes.
>
> On that principle, you should advocate against using pg_upgrade in the
> first place.
>
>
Not that I use extensions a whole lot (yes, my overall experience here is
slim) but I would definitely prefer those that allow me to stay on a single
PostgreSQL major version while migrating between major versions of their
own product.  Extensions that don't fit this model (i.e., choose to treat
their major version as being the same as the major version of PostgreSQL
they were developed for) must by necessity be upgraded simultaneously with
the PostgreSQL server.  But while PostgreSQL doesn't really have a choice
here - it cannot be expected to subdivide itself - extensions (at least
external ones - PostGIS is one I have in mind presently) - can and often do
attempt to support multiple versions of PostgreSQL for whatever major
versions of their product they are offering.  For these it is possible to
adhere to the "change one thing at a time principle" and to treat the
extensions as not being part of "ALL the changes from one major version to
the target version..."

David J.

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-15 Thread James Coleman

On Thu, Jul 15, 2021 at 10:19 AM David Rowley  wrote:
>
> On Fri, 16 Jul 2021 at 01:44, James Coleman  wrote:
> >
> > On Wed, Jul 14, 2021 at 9:22 PM David Rowley  wrote:
> > >
> > > On Thu, 15 Jul 2021 at 12:30, Ranier Vilela  wrote:
> > > >
> > > > Em qua., 14 de jul. de 2021 às 21:21, David Rowley 
> > > >  escreveu:
> > > >> But, in v8 there is no additional branch, so no branch to mispredict.
> > > >> I don't really see how your explanation fits.
> > > >
> > > > In v8 the branch occurs at :
> > > > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
> > >
> > > You do know that branch is in a function that's only executed once
> > > during executor initialization, right?
> >
> > This is why I have a hard time believing there's a "real" change here
> > and not the result of either noise or something not really
> > controllable like executable layout changing.
>
> Yeah, I think we likely are at the level where layout changes in the
> compiled code are going to make things hard to measure.  I just want
> to make sure we're not going to end up with some regression that's
> actual and not random depending on layout changes of unrelated code.
> I think a branch that's taken consistently *should* be predicted
> correctly each time.
>
> Anyway, I think all the comparisons with v7b can safely be ignored. As
> Ronan pointed out, v7b has some issues due to it not recording the
> sort method in the executor state that leads to it forgetting which
> method it used once we start pulling tuples from it. The reproductions
> of that are it calling tuplesort_gettupleslot() from the 2nd tuple
> onwards regardless of if we've done a datum or tuple sort.
>
> Ronan's latest results plus John's make me think there's no need to
> separate out the node function as I did in v8.  However, I do think v6
> could learn a little from v8. I think I'd rather see the sort method
> determined in ExecInitSort() rather than ExecSort(). I think
> minimising those few extra instructions in ExecSort() might help the
> L1 instruction cache.

I ran master/v6/v8 tests for 90s each with David's test script on an
AWS c5n.metal instance (so should be immune to noise neighbor issues).
Here are comparative results:

Test1 Test2 Test3 Test4 Test5 Test6 Test7 Test8
v6 68.66% 0.05% 32.21% -0.83% 12.58% 10.42% -1.48% 50.98%
v8 69.78% -0.44% 32.45% -1.11% 12.01% 10.58% -1.40% 49.30%

So I see a consistent change in the data, but I don't really see a
good explanation for it not being noise. Can't prove that yet though.

James

Re: pg_upgrade does not upgrade pg_stat_statements properly


On 7/15/21 12:46 PM, David G. Johnston wrote:

I am an application developer who operates on the principle of "change 
only one thing at a time".


Which pg_upgrade by definition isn't. It is bringing ALL the changes 
from one major version to the target version, which may be multiple at 
once. Including, but not limited to, catalog schema changes, SQL 
language changes, extension behavior changes and utility command 
behavior changes.


On that principle, you should advocate against using pg_upgrade in the 
first place.



Regards, Jan

--
Jan Wieck

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 9:36 AM Jan Wieck  wrote:

>
> I am a bit confused here. From the previous exchange I get the feeling
> that you haven't created and maintained a single extension that survived
> a single version upgrade of itself or PostgreSQL (in the latter case
> requiring code changes to the extension due to internal API changes
> inside the PostgreSQL version).
>

Correct.

>
> I have. PL/Profiler to be explicit.
>
> Can you please elaborate what experience your opinion is based on?
>
>
I am an application developer who operates on the principle of "change only
one thing at a time".

David J.

Next Steps with Hash Indexes

2021-07-15 Thread Simon Riggs

Hi,

I've been investigating hash indexes and have what I think is a clear
picture in my head, so time for discussion.

It would be very desirable to allow Hash Indexes to become Primary Key
Indexes, which requires both
amroutine->amcanunique = true;
amroutine->amcanmulticol = true;

Every other hash index TODO seems like performance tuning, so can wait
awhile, even if it is tempting to do that first.

1. Multi-column Indexes
seems to have floundered because of this thread "Multicolumn Hash Indexes",
https://www.postgresql.org/message-id/29263.1506483172%40sss.pgh.pa.us,
but those issues don't apply in most common cases and so they seem
acceptable restrictions, especially since some already apply to btrees
etc..
(And noting that Hash indexes already assume strict hash operators, so
that is not an issue).
For me, this separates into two sub-concerns:
1.1 Allow multi-columns to be defined for hash indexes
Enabling this a simple one line patch
amroutine->amcanmulticol = true;
which works just fine on current HEAD without further changes (manual
testing, as yet).
If we do this first, then any work on uniqueness checking can take
into account multiple columns.

1.2 Combining multi-column hashes into one hash value
Trivially, this is already how it works, in the sense that we just use
the first column, however many columns there are in the index! Doing
more is an already solved problem in Postgres,
[TupleHashTableHash_internal() in src/backend/executor/execGrouping.c]
as pointed out here: "Combining hash values"
https://www.postgresql.org/message-id/CAEepm%3D3rdgjfxW4cKvJ0OEmya2-34B0qHNG1xV0vK7TGPJGMUQ%40mail.gmail.com
though noting there was no discussion on that point [1]. This just
needs a little refactoring to improve things, but it seems more like a
nice to have than an essential aspect of hash indexes that need not
block us from enabling multi-column hash indexes.

2. Unique Hash Indexes have been summarized here:
https://www.postgresql.org/message-id/CAA4eK1KATC1TA5bR5eobYQVO3RWsnH6djNpk3P376em4V8MuUA%40mail.gmail.com
which also seems to have two parts to it.

2.1 Uniqueness Check
Amit: "to ensure that there is no duplicate entry we need to traverse
the whole bucket chain"
Agreed. That seems straightforward and can also be improved later.

2.2 Locking
Amit's idea of holding ExclusiveLock on the bucket page works for me,
but there was some doubt about splitting.

Splitting seems to be an awful behavior that users would wish to avoid
if they knew about the impact and duration. In my understanding,
splitting involves moving 50% of rows and likely touches all blocks
eventually. If the existing index is the wrong shape then just run
REINDEX. If we tune the index build, looks like REINDEX would be
quicker and easier way of growing an index than trying to split an
existing index. i.e. rely on ecdysis not internal growth. This is much
more viable now because of the CIC changes in PG14.

(I would argue that removing splitting completely is a good idea,
similar to the way we have removed the original VACUUM FULL algorithm,
but that will take a while to swallow that thought). Instead, I
suggest we introduce a new indexam option for hash indexes of
autosplit=on (default) | off, so that users can turn splitting off.
Which means we would probably also need another option for
initial_buckets=0 (default) means use number of existing rows to size,
or N=use that specific size. Note that turning off splitting does not
limit the size of the index, it just stops the index from re-freshing
its number of buckets. B-trees are the default for PKs, so Hash
indexes are an option for larger tables only, so there is less need to
have hash indexes cope with tables of unknown size - we wouldn't even
be using hash unless we already know it is a big table.

If splitting causes any difficulty at all, then we should simply say
that Unique Hash Index indexes should initially force autosplit=off,
so we don't have to worry about the correctness of the locking. I
suggest we implement that first and then decide if we really care
about splitting, cos I'd bet we don't. Yes, I consider uniqueness much
more desirable than splitting.

I've written a patch that refactors index build so we *never* need to
perform a split during index build, allowing us to more credibly skip
index splitting completely. (Incidentally, it avoids the need to
update the metapage for each row during the build, allowing us to
consider writing in batches to the index as a next step). So there
need be no *requirement* for splitting to be supported with
uniqueness, while build/reindex looks like it can be much faster. I
can post it if anyone wants to see it, but I don't want to distract us
from discussion of the main requirements.

I have other performance tuning ideas, but they can wait.

Anyway, that's what I think at present. Thoughts?

--
Simon Riggshttp://www.EnterpriseDB.com/

Re: pg_upgrade does not upgrade pg_stat_statements properly


On 7/15/21 12:31 PM, Robert Eckhardt wrote:
  I don’t know if this is a terrible flaw in pg_upgrade, it is a 
terrible flaw in the overall Postgres experience.


+1 (that is the actual problem here)


--
Jan Wieck

Re: pg_upgrade does not upgrade pg_stat_statements properly


On 7/15/21 12:25 PM, David G. Johnston wrote:

I would say that it probably should be "--upgrade-extension=aaa 
--upgrade_extension=bbb" though if we are going to the effort to offer 
something.


I am a bit confused here. From the previous exchange I get the feeling 
that you haven't created and maintained a single extension that survived 
a single version upgrade of itself or PostgreSQL (in the latter case 
requiring code changes to the extension due to internal API changes 
inside the PostgreSQL version).


I have. PL/Profiler to be explicit.

Can you please elaborate what experience your opinion is based on?


Regards, Jan


--
Jan Wieck

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, 15 Jul 2021 at 12:25, David G. Johnston 
wrote:

> On Thu, Jul 15, 2021 at 9:16 AM Dave Cramer  wrote:
>
>> Eh, and
>>>   pg_upgrade [other switches] --upgrade-extensions
>>> sounds good too ...
>>>
>>
>> Ultimately I believe this is the solution, however we still need to teach
>> extensions how to upgrade themselves or emit a message saying they can't,
>> or even ignore if it truly is a NOP.
>>
>>
> If it's opt-in and simple I don't really care but I doubt I would use it
> as personally I'd rather the upgrade not touch my application at all (to
> the extent possible) and just basically promise that I'll get a reliable
> upgrade.  Then I'll go ahead and ensure I have the backups of the new
> version and that my application works correctly, then just run the "ALTER
> EXTENSION" myself.  But anything that will solve pain points for
> same-PostgreSQL-version extension upgrading is great.
>

I may have not communicated this clearly. In this case the application
worked fine, the extension worked fine. The issue only arose when doing a
dump and restore of the database and then the only reason it failed was due
to trying to revoke permissions from pg_stat_statements_reset.

There may have been other things that were not working correctly but since
it did not cause any errors it was difficult to know.

As Robert points out in the next message this is not a particularly great
user experience

Dave

>
>

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread Robert Eckhardt

On Thu, Jul 15, 2021 at 8:43 AM Dave Cramer 
mailto:davecra...@gmail.com>> wrote:

On Thu, 15 Jul 2021 at 11:29, David G. Johnston 
mailto:david.g.johns...@gmail.com>> wrote:

I’m not familiar with what hoops extensions jump through to facilitate upgrades 
but even if it was as simple as “create extension upgrade” I wouldn’t have 
pg_upgrade execute that command (or at least not by default).  I would maybe 
have pg_upgrade help move the libraries over from the old server (and we must 
be dealing with different databases having different extension versions in some 
manner…).

Well IMHO the status quo is terrible. Perhaps you have a suggestion on how to 
make it better ?

To a certain extent it is beyond pg_upgrade's purview to care about extension 
explicitly - it considers them "data" on the database side and copies over the 
schema and, within reason, punts on the filesystem by saying "ensure that the 
existing versions of your extensions in the old cluster can correctly run in 
the new cluster" (which basically just takes a simple file copy/install and the 
assumption you are upgrading to a server version that is supported by the 
extension in question - also a reasonable requirement).  In short, I don't have 
a suggestion on how to improve that and don't really consider it a terrible 
flaw in pg_upgrade.

 I don’t know if this is a terrible flaw in pg_upgrade, it is a terrible flaw 
in the overall Postgres experience.

 We are currently working through various upgrade processes and it seems like 
the status quo is:

Drop the extension, upgrade and reinstall
OR
Upgrade the cluster then upgrade the extension

The issue is that it often isn’t clear which path to choose and choosing the 
wrong path can lead to data loss.

I don’t think it is ok to expect end users to understand when it is an isn’t ok 
to just drop and recreate and often it
Isn’t clear in the extension documentation itself.  I’m not sure what core 
can/should do about it but it is a major pain.

-- Rob

Re: unnesting multirange data types

2021-07-15 Thread Alexander Korotkov

On Thu, Jul 15, 2021 at 6:47 PM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > On 2021-Jun-19, Tom Lane wrote:
> >> I'd say let's sit on the unnest code for a little bit and see what
> >> happens.
>
> > ... So, almost a month has gone by, and we still don't have multirange
> > unnest().  Looking at the open items list, it doesn't look like we have
> > anything that would require a catversion bump.  Does that mean that
> > we're going to ship pg14 without multirange unnest?
>
> > That seems pretty sad, as the usability of the feature is greatly
> > reduced.  Just look at what's being suggested:
> >   https://postgr.es/m/20210715121508.ga30...@depesz.com
> > To me this screams of an incomplete datatype.  I far prefer a beta3
> > initdb than shipping 14GA without multirange unnest.
>
> Yeah, that seems pretty horrid.  I still don't like the way the
> array casts were done, but I'd be okay with pushing the unnest
> addition.

I agree that array casts require better polymorphism and should be
considered for pg15.

+1 for just unnest().

--
Regards,
Alexander Korotkov

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 9:16 AM Dave Cramer  wrote:

> Eh, and
>>   pg_upgrade [other switches] --upgrade-extensions
>> sounds good too ...
>>
>
> Ultimately I believe this is the solution, however we still need to teach
> extensions how to upgrade themselves or emit a message saying they can't,
> or even ignore if it truly is a NOP.
>
>
If it's opt-in and simple I don't really care but I doubt I would use it as
personally I'd rather the upgrade not touch my application at all (to the
extent possible) and just basically promise that I'll get a reliable
upgrade.  Then I'll go ahead and ensure I have the backups of the new
version and that my application works correctly, then just run the "ALTER
EXTENSION" myself.  But anything that will solve pain points for
same-PostgreSQL-version extension upgrading is great.

I would say that it probably should be "--upgrade-extension=aaa
--upgrade_extension=bbb" though if we are going to the effort to offer
something.

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, 15 Jul 2021 at 12:13, Alvaro Herrera 
wrote:

> On 2021-Jul-15, Alvaro Herrera wrote:
>
> > On 2021-Jul-15, Dave Cramer wrote:
> >
> > > Well IMHO the status quo is terrible. Perhaps you have a suggestion on
> how
> > > to make it better ?
> >
> > I thought the suggestion of having pg_upgrade emit a file with a list of
> > all extensions needing upgrade in each database was a fairly decent one.
>
> I think this is the minimum we should be doing.

> Eh, and
>   pg_upgrade [other switches] --upgrade-extensions
> sounds good too ...
>

Ultimately I believe this is the solution, however we still need to teach
extensions how to upgrade themselves or emit a message saying they can't,
or even ignore if it truly is a NOP.

Dave

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, Jul 15, 2021 at 8:43 AM Dave Cramer  wrote:

>
> On Thu, 15 Jul 2021 at 11:29, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> I’m not familiar with what hoops extensions jump through to facilitate
>> upgrades but even if it was as simple as “create extension upgrade” I
>> wouldn’t have pg_upgrade execute that command (or at least not by
>> default).  I would maybe have pg_upgrade help move the libraries over from
>> the old server (and we must be dealing with different databases having
>> different extension versions in some manner…).
>>
>
> Well IMHO the status quo is terrible. Perhaps you have a suggestion on how
> to make it better ?
>

To a certain extent it is beyond pg_upgrade's purview to care about
extension explicitly - it considers them "data" on the database side and
copies over the schema and, within reason, punts on the filesystem by
saying "ensure that the existing versions of your extensions in the old
cluster can correctly run in the new cluster" (which basically just takes a
simple file copy/install and the assumption you are upgrading to a server
version that is supported by the extension in question - also a reasonable
requirement).  In short, I don't have a suggestion on how to improve that
and don't really consider it a terrible flaw in pg_upgrade.

I'll readily admit that I lack sufficient knowledge here to make such
suggestions as I don't hold any optionions that things are "quite terrible"
and haven't been presented with concrete problems to consider alternatives
for.

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread Alvaro Herrera

On 2021-Jul-15, Alvaro Herrera wrote:

> On 2021-Jul-15, Dave Cramer wrote:
> 
> > Well IMHO the status quo is terrible. Perhaps you have a suggestion on how
> > to make it better ?
> 
> I thought the suggestion of having pg_upgrade emit a file with a list of
> all extensions needing upgrade in each database was a fairly decent one.

Eh, and 
  pg_upgrade [other switches] --upgrade-extensions
sounds good too ...

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread Alvaro Herrera

On 2021-Jul-15, Dave Cramer wrote:

> Well IMHO the status quo is terrible. Perhaps you have a suggestion on how
> to make it better ?

I thought the suggestion of having pg_upgrade emit a file with a list of
all extensions needing upgrade in each database was a fairly decent one.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/

Re: unnesting multirange data types

Alvaro Herrera  writes:
> On 2021-Jun-19, Tom Lane wrote:
>> I'd say let's sit on the unnest code for a little bit and see what
>> happens.

> ... So, almost a month has gone by, and we still don't have multirange
> unnest().  Looking at the open items list, it doesn't look like we have
> anything that would require a catversion bump.  Does that mean that
> we're going to ship pg14 without multirange unnest?

> That seems pretty sad, as the usability of the feature is greatly
> reduced.  Just look at what's being suggested:
>   https://postgr.es/m/20210715121508.ga30...@depesz.com
> To me this screams of an incomplete datatype.  I far prefer a beta3
> initdb than shipping 14GA without multirange unnest.

Yeah, that seems pretty horrid.  I still don't like the way the
array casts were done, but I'd be okay with pushing the unnest
addition.

regards, tom lane

Re: psql - add SHOW_ALL_RESULTS option

2021-07-15 Thread Fabien COELHO





The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".


Ok. I noticed. The patch got significantly broken by the watch pager 
commit. I also have to enhance the added tests (per Peter request).


--
Fabien.

Re: enable_resultcache confusion

On Mon, Jul 12, 2021 at 11:47:27AM +1200, David Rowley wrote:
> On Mon, 12 Jul 2021 at 03:22, Justin Pryzby  wrote:
> > |This is useful if only a small percentage of rows is checked on
> > |the inner side and is controlled by  > |linkend="guc-enable-resultcache"/>.
> 
> You might be right there, but I'm not too sure if I changed that that
> it might cause a mention of the rename to be missed in the changes
> since beta2 notes.

The commit will appear in the PG 14 git log, and someone will then see
the rename is part of the changes since the previous beta.  Also, when
the release notes "as of" date is updated, all commits since the
previous "as of" date will be reviewed.  Yes, someone might try to
update the release notes for this change and see it was already done,
but that is easily handled.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, 15 Jul 2021 at 11:29, David G. Johnston 
wrote:

> On Thursday, July 15, 2021, Dave Cramer  wrote:
>
>>
>> I'm thinking at this point we need something a bit more sophisticated like
>>
>> ALTER EXTENSION ... UPGRADE. And the extension knows how to upgrade
>> itself.
>>
>
> I’m not familiar with what hoops extensions jump through to facilitate
> upgrades but even if it was as simple as “create extension upgrade” I
> wouldn’t have pg_upgrade execute that command (or at least not by
> default).  I would maybe have pg_upgrade help move the libraries over from
> the old server (and we must be dealing with different databases having
> different extension versions in some manner…).
>

Well IMHO the status quo is terrible. Perhaps you have a suggestion on how
to make it better ?

Dave


>
> David J.
>
>

Re: Replacing pg_depend PIN entries with a fixed range check

John Naylor  writes:
> If no one else has anything, I think this is ready for commit.

Pushed, after adopting the suggestion to dispense with
isSharedObjectPinned.

regards, tom lane

Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-15 Thread Fujii Masao





On 2021/07/09 22:44, Masahiko Sawada wrote:

On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao  wrote:

As far as I read the code, keep using old API for foreign subtransaction doesn't
cause any actual bug. But it's just strange and half-baked to manage top and
sub transaction in the differenet layer and to use old and new API for them.


That's a valid concern. I'm really not sure what we should do here but
I guess that even if we want to support subscriptions we have another
API dedicated for subtransaction commit and rollback.

Ok, so if possible I will write POC patch for new API for foreign 
subtransactions
and consider whether it's enough simple that we can commit into core or not.


+#define FDWXACT_FLAG_PARALLEL_WORKER   0x02/* is parallel worker? */

This implies that parallel workers may execute PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to the foreign server for atomic commit?
If so, what happens if the PREPARE TRANSACTION that one of
parallel workers issues fails? In this case, not only that parallel worker
but also other parallel workers and the leader should rollback the transaction
at all. That is, they should issue ROLLBACK PREPARED to the foreign servers.
This issue was already handled and addressed in the patches?

This seems not actual issue if only postgres_fdw is used. Because postgres_fdw
doesn't have IsForeignScanParallelSafe API. Right? But what about other FDW?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: unnesting multirange data types

2021-07-15 Thread Alvaro Herrera

On 2021-Jun-19, Tom Lane wrote:

> Alexander Korotkov  writes:
> > I also don't feel comfortable hurrying with unnest part to beta2.
> > According to the open items wiki page, there should be beta3.  Does
> > unnest part have a chance for beta3?
> 
> Hm.  I'd prefer to avoid another forced initdb after beta2.  On the
> other hand, it's entirely likely that there will be some other thing
> that forces that; in which case there'd be no reason not to push in
> the unnest feature as well.
> 
> I'd say let's sit on the unnest code for a little bit and see what
> happens.

... So, almost a month has gone by, and we still don't have multirange
unnest().  Looking at the open items list, it doesn't look like we have
anything that would require a catversion bump.  Does that mean that
we're going to ship pg14 without multirange unnest?

That seems pretty sad, as the usability of the feature is greatly
reduced.  Just look at what's being suggested:
  https://postgr.es/m/20210715121508.ga30...@depesz.com
To me this screams of an incomplete datatype.  I far prefer a beta3
initdb than shipping 14GA without multirange unnest.

I haven't seen any announcements about beta3, but it's probably not far
off; I think if we're going to have it, it would be much better to give
it buildfarm cycles several days in advance and not just the last
weekend.

What do others think?

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thursday, July 15, 2021, Dave Cramer  wrote:

>
> I'm thinking at this point we need something a bit more sophisticated like
>
> ALTER EXTENSION ... UPGRADE. And the extension knows how to upgrade itself.
>

I’m not familiar with what hoops extensions jump through to facilitate
upgrades but even if it was as simple as “create extension upgrade” I
wouldn’t have pg_upgrade execute that command (or at least not by
default).  I would maybe have pg_upgrade help move the libraries over from
the old server (and we must be dealing with different databases having
different extension versions in some manner…).

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, 15 Jul 2021 at 11:15, David G. Johnston 
wrote:

> On Thursday, July 15, 2021, David G. Johnston 
> wrote:
>
>> On Thursday, July 15, 2021, Dave Cramer  wrote:
>>
>>>
>>>   Install any custom shared object files (or DLLs) used by the old
>>> cluster
>>>   into the new cluster, e.g., pgcrypto.so,
>>>   whether they are from contrib
>>> - or some other source.
>>> However it may be
>>> + necessary to recreate the extension on the new server after the
>>> upgrade
>>> + to ensure compatibility with the new library.
>>>
>>>
>>
>>  My uncertainty revolves around core extensions since it seems odd to
>> tell the user to overwrite them with versions from an older version of
>> PostgreSQL.
>>
>
> Ok. Just re-read the docs a third time…no uncertainty regarding contrib
> now…following the first part of the instructions means that before one
> could re-run create extension they would need to restore the original
> contrib library files to avoid the new extension code using the old
> library.  So that whole part about recreation is inconsistent with the
> existing unchanged text.
>
>
The way I solved the original problem of having old function definitions
for pg_stat_statement functions in the *new* library was by recreating the
extension which presumably redefines the functions correctly.

I'm thinking at this point we need something a bit more sophisticated like

ALTER EXTENSION ... UPGRADE. And the extension knows how to upgrade itself.

Dave


>
>
>

Re: Removing unneeded self joins

2021-07-15 Thread Zhihong Yu

On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov 
wrote:

> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of
> partitioned tables, for example, the overhead could grow. But in the
> context of overall planning time this overhead will be small till the
> large number of relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
>
> --
> regards,
> Andrey Lepikhov
> Postgres Professional
>
Hi,

bq. We can proof the uniqueness

proof -> prove

1. Collect all mergejoinable join quals looks like a.x = b.x

 quals looks like -> quals which look like

For update_ec_sources(), the variable cc is not used.

Cheers

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thursday, July 15, 2021, Dave Cramer  wrote:

> Well clearly my suggestion was not clear if you interpreted that as over
> writing them with versions from an older version of PostgreSQL.
>
>>
>>
Ignoring my original interpretation as being moot; the section immediately
preceding your edit says to do exactly that.

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thursday, July 15, 2021, David G. Johnston 
wrote:

> On Thursday, July 15, 2021, Dave Cramer  wrote:
>
>>
>>   Install any custom shared object files (or DLLs) used by the old
>> cluster
>>   into the new cluster, e.g., pgcrypto.so,
>>   whether they are from contrib
>> - or some other source.
>> However it may be
>> + necessary to recreate the extension on the new server after the
>> upgrade
>> + to ensure compatibility with the new library.
>>
>>
>
>  My uncertainty revolves around core extensions since it seems odd to tell
> the user to overwrite them with versions from an older version of
> PostgreSQL.
>

Ok. Just re-read the docs a third time…no uncertainty regarding contrib
now…following the first part of the instructions means that before one
could re-run create extension they would need to restore the original
contrib library files to avoid the new extension code using the old
library.  So that whole part about recreation is inconsistent with the
existing unchanged text.

David J.

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thu, 15 Jul 2021 at 11:01, David G. Johnston 
wrote:

> On Thursday, July 15, 2021, Dave Cramer  wrote:
>
>>
>> As a first step I propose the following
>>
>> diff --git a/doc/src/sgml/ref/pgupgrade.sgml
>> b/doc/src/sgml/ref/pgupgrade.sgml
>> index a83c63cd98..f747a4473a 100644
>> --- a/doc/src/sgml/ref/pgupgrade.sgml
>> +++ b/doc/src/sgml/ref/pgupgrade.sgml
>> @@ -305,9 +305,10 @@ make prefix=/usr/local/pgsql.new install
>>   Install any custom shared object files (or DLLs) used by the old
>> cluster
>>   into the new cluster, e.g., pgcrypto.so,
>>   whether they are from contrib
>> - or some other source. Do not install the schema definitions, e.g.,
>> - CREATE EXTENSION pgcrypto, because these will be
>> upgraded
>> - from the old cluster.
>> + or some other source. Do not execute CREATE EXTENSION on the new
>> cluster.
>> + The extensions will be upgraded from the old cluster. However it
>> may be
>> + necessary to recreate the extension on the new server after the
>> upgrade
>> + to ensure compatibility with the new library.
>>   Also, any custom full text search files (dictionary, synonym,
>>   thesaurus, stop words) must also be copied to the new cluster.
>>  
>>
>>
>
> I think this needs some work to distinguish between core extensions where
> we know the new server already has a library installed and external
> extensions where it’s expected that the library that is added to the new
> cluster is compatible with the version being migrated (not upgraded).  In
> short, it should never be necessary to recreate the extension.  My
> uncertainty revolves around core extensions since it seems odd to tell the
> user to overwrite them with versions from an older version of PostgreSQL.
>


Well clearly my suggestion was not clear if you interpreted that as over
writing them with versions from an older version of PostgreSQL.


Dave Cramer

>
>

Re: pg_upgrade does not upgrade pg_stat_statements properly

On Thursday, July 15, 2021, Dave Cramer  wrote:

>
> As a first step I propose the following
>
> diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.
> sgml
> index a83c63cd98..f747a4473a 100644
> --- a/doc/src/sgml/ref/pgupgrade.sgml
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -305,9 +305,10 @@ make prefix=/usr/local/pgsql.new install
>   Install any custom shared object files (or DLLs) used by the old
> cluster
>   into the new cluster, e.g., pgcrypto.so,
>   whether they are from contrib
> - or some other source. Do not install the schema definitions, e.g.,
> - CREATE EXTENSION pgcrypto, because these will be
> upgraded
> - from the old cluster.
> + or some other source. Do not execute CREATE EXTENSION on the new
> cluster.
> + The extensions will be upgraded from the old cluster. However it may
> be
> + necessary to recreate the extension on the new server after the
> upgrade
> + to ensure compatibility with the new library.
>   Also, any custom full text search files (dictionary, synonym,
>   thesaurus, stop words) must also be copied to the new cluster.
>  
>
>

I think this needs some work to distinguish between core extensions where
we know the new server already has a library installed and external
extensions where it’s expected that the library that is added to the new
cluster is compatible with the version being migrated (not upgraded).  In
short, it should never be necessary to recreate the extension.  My
uncertainty revolves around core extensions since it seems odd to tell the
user to overwrite them with versions from an older version of PostgreSQL.

David J.

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-15 Thread Ronan Dunklau

Le jeudi 15 juillet 2021, 16:19:23 CEST David Rowley a écrit :> 
> Ronan's latest results plus John's make me think there's no need to
> separate out the node function as I did in v8.  However, I do think v6
> could learn a little from v8. I think I'd rather see the sort method
> determined in ExecInitSort() rather than ExecSort(). I think
> minimising those few extra instructions in ExecSort() might help the
> L1 instruction cache.
> 

I'm not sure I understand what you expect from moving that to ExecInitSort ? 
Maybe we should also implement the tuplesort_state initialization in 
ExecInitSort ? (not the actual feeding and sorting of course).

Please find attached a v9 just moving the flag setting to ExecInitSort, and my 
apologies if I misunderstood your point.

-- 
Ronan Dunklaucommit 2bf8db23c8ddb53f7dbcba33c60350bda9d703ea
Author: Ronan Dunklau 
Date:   Tue Jul 6 16:34:31 2021 +0200

Use optimized single-datum tuplesort in ExecSort

Previously, with optimization was only done in ExecAgg.
Doing this optimization on regular sort nodes provides for a nice
performance improvement, when the input / output tuple has only one
attribute.

diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index b99027e0d7..861e4c6e9e 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -29,6 +29,10 @@
  *		which saves the results in a temporary file or memory. After the
  *		initial call, returns a tuple from the file with each call.
  *
+ *		The tuplesort can either occur on the whole tuple (this is the nominal
+ *		case) or, when the input / output tuple consists of only one attribute,
+ *		we switch to the tuplesort_*_datum API, optimized for that specific case.
+ *
  *		Conditions:
  *		  -- none.
  *
@@ -86,31 +90,63 @@ ExecSort(PlanState *pstate)
 		outerNode = outerPlanState(node);
 		tupDesc = ExecGetResultType(outerNode);
 
-		tuplesortstate = tuplesort_begin_heap(tupDesc,
-			  plannode->numCols,
-			  plannode->sortColIdx,
-			  plannode->sortOperators,
-			  plannode->collations,
-			  plannode->nullsFirst,
-			  work_mem,
-			  NULL,
-			  node->randomAccess);
+		/*
+		 * Switch to the tuplesort_*_datum interface when we have only one
+		 * key, as it is much more efficient especially when the type is
+		 * pass-by-value.
+		 */
+		if (node->datumSort)
+		{
+			tuplesortstate = tuplesort_begin_datum(TupleDescAttr(tupDesc, 0)->atttypid,
+   plannode->sortOperators[0],
+   plannode->collations[0],
+   plannode->nullsFirst[0],
+   work_mem,
+   NULL,
+   node->randomAccess);
+		}
+		else
+			tuplesortstate = tuplesort_begin_heap(tupDesc,
+  plannode->numCols,
+  plannode->sortColIdx,
+  plannode->sortOperators,
+  plannode->collations,
+  plannode->nullsFirst,
+  work_mem,
+  NULL,
+  node->randomAccess);
 		if (node->bounded)
 			tuplesort_set_bound(tuplesortstate, node->bound);
 		node->tuplesortstate = (void *) tuplesortstate;
 
 		/*
-		 * Scan the subplan and feed all the tuples to tuplesort.
+		 * Scan the subplan and feed all the tuples to tuplesort, using either
+		 * the _putdatum or _puttupleslot API as appropriate.
 		 */
-
-		for (;;)
+		if (node->datumSort)
 		{
-			slot = ExecProcNode(outerNode);
-
-			if (TupIsNull(slot))
-break;
-
-			tuplesort_puttupleslot(tuplesortstate, slot);
+			for (;;)
+			{
+slot = ExecProcNode(outerNode);
+
+if (TupIsNull(slot))
+	break;
+slot_getsomeattrs(slot, 1);
+tuplesort_putdatum(tuplesortstate,
+   slot->tts_values[0],
+   slot->tts_isnull[0]);
+			}
+		}
+		else
+		{
+			for (;;)
+			{
+slot = ExecProcNode(outerNode);
+
+if (TupIsNull(slot))
+	break;
+tuplesort_puttupleslot(tuplesortstate, slot);
+			}
 		}
 
 		/*
@@ -150,9 +186,18 @@ ExecSort(PlanState *pstate)
 	 * next fetch from the tuplesort.
 	 */
 	slot = node->ss.ps.ps_ResultTupleSlot;
-	(void) tuplesort_gettupleslot(tuplesortstate,
-  ScanDirectionIsForward(dir),
-  false, slot, NULL);
+	if (node->datumSort)
+	{
+		ExecClearTuple(slot);
+		if (tuplesort_getdatum(tuplesortstate, ScanDirectionIsForward(dir),
+			   &(slot->tts_values[0]), &(slot->tts_isnull[0]), NULL))
+			ExecStoreVirtualTuple(slot);
+	}
+	else
+		(void) tuplesort_gettupleslot(tuplesortstate,
+	  ScanDirectionIsForward(dir),
+	  false, slot, NULL);
+
 	return slot;
 }
 
@@ -191,6 +236,7 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
 	sortstate->bounded = false;
 	sortstate->sort_Done = false;
 	sortstate->tuplesortstate = NULL;
+	sortstate->datumSort = false;
 
 	/*
 	 * Miscellaneous initialization
@@ -220,6 +266,15 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
 	 */
 	ExecInitResultTupleSlotTL(>ss.ps, );

Re: Removing unneeded self joins

2021-07-15 Thread Andrey Lepikhov


On 6/7/21 13:49, Hywel Carver wrote:
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:
Looking through the email chain, a previous version of this patch added 
~0.6% to planning time in the worst case tested - does that meet the 
"essentially free" requirement?
I think these tests weren't full coverage of possible use cases. It will 
depend on a number of relations in the query. For the JOIN of 
partitioned tables, for example, the overhead could grow. But in the 
context of overall planning time this overhead will be small till the 
large number of relations.

Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9

--
regards,
Andrey Lepikhov
Postgres Professional
From 6f9d11ec64b5b8e2304156deaea7842e0fd77c3e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can proof the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 890 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 399 ++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 189 +
 12 files changed, 1550 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..a8e638f6e7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids 
joinrelids);
+ Relids 
joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int 
*nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
   RelOptInfo 
*innerrel,
   JoinType 
jointype,
   List 
*restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
remove_rel_from_query(root, innerrelid,
  
bms_union(sjinfo->min_lefthand,
-   
sjinfo->min_righthand));
+   
sjinfo->min_righthand), 0);
 
/* We verify that exactly one reference gets removed from 
joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo 
*sjinfo)
 
 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to

Re: What are exactly bootstrap processes, auxiliary processes, standalone backends, normal backends(user sessions)?

On Fri, Jul 09, 2021 at 09:24:19PM +0530, Bharath Rupireddy wrote:
> I've always had a hard time distinguishing various types of
> processes/terms used in postgres. I look at the source code every time
> to understand them, yet I don't feel satisfied with my understanding.
> I request any hacker (having a better idea than me) to help me with
> what each different process does and how they are different from each
> other? Of course, I'm clear with normal backends (user sessions), bg
> workers, but the others need a bit more understanding.

It sounds like something that should be in the glossary, which currently refers
to but doesn't define "auxiliary processes".

 * Background writer, checkpointer, WAL writer and archiver run during normal
 * operation.  Startup process and WAL receiver also consume 2 slots, but WAL
 * writer is launched only after startup has exited, so we only need 5 slots.
 */
#define NUM_AUXILIARY_PROCS 5

Bootstrap is run by initdb:
src/bin/initdb/initdb.c: "\"%s\" --boot -x0 %s 
%s "

Standalone backend is run by --single, right ?

-- 
Justin

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

Em qui., 15 de jul. de 2021 às 11:19, David Rowley 
escreveu:

> On Fri, 16 Jul 2021 at 01:44, James Coleman  wrote:
> >
> > On Wed, Jul 14, 2021 at 9:22 PM David Rowley 
> wrote:
> > >
> > > On Thu, 15 Jul 2021 at 12:30, Ranier Vilela 
> wrote:
> > > >
> > > > Em qua., 14 de jul. de 2021 às 21:21, David Rowley <
> dgrowle...@gmail.com> escreveu:
> > > >> But, in v8 there is no additional branch, so no branch to
> mispredict.
> > > >> I don't really see how your explanation fits.
> > > >
> > > > In v8 the branch occurs at :
> > > > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
> > >
> > > You do know that branch is in a function that's only executed once
> > > during executor initialization, right?
> >
> > This is why I have a hard time believing there's a "real" change here
> > and not the result of either noise or something not really
> > controllable like executable layout changing.
>
> Yeah, I think we likely are at the level where layout changes in the
> compiled code are going to make things hard to measure.  I just want
> to make sure we're not going to end up with some regression that's
> actual and not random depending on layout changes of unrelated code.
> I think a branch that's taken consistently *should* be predicted
> correctly each time.


> Anyway, I think all the comparisons with v7b can safely be ignored. As
> Ronan pointed out, v7b has some issues due to it not recording the
> sort method in the executor state that leads to it forgetting which
> method it used once we start pulling tuples from it. The reproductions
> of that are it calling tuplesort_gettupleslot() from the 2nd tuple
> onwards regardless of if we've done a datum or tuple sort.
>
Sorry for insisting on this.
Assuming v7b is doing it the wrong way, which I still don't think it is.
Why is it still faster than v6 and v8?

regards,
Ranier Vilela

Re: storing an explicit nonce

On Wed, Jul 14, 2021 at 09:45:12PM +0530, vignesh C wrote:
> On Sat, Jun 26, 2021 at 2:52 AM Bruce Momjian  wrote:
> >
> > On Wed, May 26, 2021 at 05:02:01PM -0400, Bruce Momjian wrote:
> > > For these reasons, if we decide to go in the direction of using a
> > > non-LSN nonce, I no longer plan to continue working on this feature. I
> > > would rather work on things that have a more positive impact.  Maybe a
> > > non-LSN nonce is a better long-term plan, but there are too many
> > > unknowns and complexity for me to feel comfortable with it.
> >
> > As stated above, I have no plans to continue working on this feature.  I
> > am attaching my final patches here in case anyone wants to make use of
> > them;  it passes check-world and all my private tests.  I have removed
> > my patches from the feature wiki page:
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> >
> > and replaced it with a link to this email.
> 
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Oh, I forgot this was in the commitfast.  I have marked it as Withdrawn.
Sorry for the confusion.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Git revision in tarballs

Magnus Hagander  writes:
> Putting it in the tarball making script certainly works for me,
> though, if that's what people prefer. And that does away with the
> "clean" part as that one blows away the whole directory between each
> run.

Actually, we *have* to do it over there, because what that script
does is

   # Export the selected git ref
   git archive ${i} | tar xf - -C pgsql

   cd pgsql
   ./configure
   # some irrelevant stuff
   make dist

So there's no .git subdirectory in the directory it runs "make dist"
in.  Now maybe it'd work anyway because of the GIT_DIR environment
variable, but what I think is more likely is that the file would
end up containing the current master-branch HEAD commit, whereas
the thing we actually want to record here is ${i}.

regards, tom lane

Re: SQL:2011 PERIODS vs Postgres Ranges?

2021-07-15 Thread Paul A Jungwirth

On Thu, Jul 15, 2021 at 6:21 AM Ibrar Ahmed  wrote:
> Based on last comments of Paul and David S I am changing the status to 
> "Waiting on Author".

I thought the subject was quite out of date, so I sent my last patch
here: 
https://www.postgresql.org/message-id/CA%2BrenyUApHgSZF9-nd-a0%2BOPGharLQLO%3DmDHcY4_qQ0%2BnoCUVg%40mail.gmail.com

I also added that thread to the commitfest item.

I'm going to change the commitfest entry back to Needs Review, but
please let me know if you disagree. Sorry for the confusion!

Yours,
Paul

Re: What are exactly bootstrap processes, auxiliary processes, standalone backends, normal backends(user sessions)?

On Tue, Jul 13, 2021 at 3:00 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Bharath Rupireddy (bharath.rupireddyforpostg...@gmail.com) wrote:
> > I've always had a hard time distinguishing various types of
> > processes/terms used in postgres. I look at the source code every time
> > to understand them, yet I don't feel satisfied with my understanding.
> > I request any hacker (having a better idea than me) to help me with
> > what each different process does and how they are different from each
> > other? Of course, I'm clear with normal backends (user sessions), bg
> > workers, but the others need a bit more understanding.
>
> There was an effort to try to pull these things together because, yeah,
> it seems a bit messy.
>
> I'd suggest you take a look at:
>
> https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO=htC6LnA6aW4r-+jq=3q5raofqgw8...@mail.gmail.com

Thanks. I will check it.

Regards,
Bharath Rupireddy.

Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

On Mon, Jul 12, 2021 at 9:26 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> As suggested in [1], starting a new thread for discussing $subject
> separately. {pre, post}_auth_delay waiting  logic currently uses
> pg_usleep which can't detect postmaster death. So, there are chances
> that some of the backends still stay in the system even when a
> postmaster crashes (for whatever reasons it may be). Please have a
> look at the attached patch that does $subject. I pulled out some of
> the comments from the other thread related to the $subject, [2], [3],
> [4], [5].
>
> [1] - https://www.postgresql.org/message-id/YOv8Yxd5zrbr3k%2BH%40paquier.xyz
> [2] - https://www.postgresql.org/message-id/162764.1624892517%40sss.pgh.pa.us
> [3] - 
> https://www.postgresql.org/message-id/20210705.145251.462698229911576780.horikyota.ntt%40gmail.com
> [4] - 
> https://www.postgresql.org/message-id/flat/2021070513.GD20766%40tamriel.snowman.net
> [5] - https://www.postgresql.org/message-id/YOOnlP4NtWVzfsyb%40paquier.xyz

I added this to the commitfest - https://commitfest.postgresql.org/34/3255/

Regards,
Bharath Rupireddy.

Re: Inaccurate error message when set fdw batch_size to 0

On Mon, Jul 12, 2021 at 10:11 PM Bharath Rupireddy
 wrote:
>
> On Mon, Jul 12, 2021 at 9:20 PM Fujii Masao  
> wrote:
> > +  
> > +   Avoid Using non-negative Word in Error 
> > Messages
> > +
> > +   
> > +Do not use non-negative word in error messages as it 
> > looks
> > +ambiguous. Instead, use foo must be an integer value greater 
> > than zero
> > +or  foo must be an integer value greater than or equal to 
> > zero
> > +if option foo expects an integer value.
> > +   
> > +  
> >
> > It seems suitable to put this guide under "Tricky Words to Avoid"
> > rather than adding it as separate section. Thought?
>
> +1. I will change.

PSA v7 patch with the above change.

Regards,
Bharath Rupireddy.


v7-0001-Disambiguate-error-messages-that-use-non-negative.patch
Description: Binary data

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

Em qua., 14 de jul. de 2021 às 22:22, David Rowley 
escreveu:

> On Thu, 15 Jul 2021 at 12:30, Ranier Vilela  wrote:
> >
> > Em qua., 14 de jul. de 2021 às 21:21, David Rowley 
> escreveu:
> >> But, in v8 there is no additional branch, so no branch to mispredict.
> >> I don't really see how your explanation fits.
> >
> > In v8 the branch occurs at :
> > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
>
> You do know that branch is in a function that's only executed once
> during executor initialization, right?
>
There's a real difference between v8 and v6, if I understood correctly.

v6 the branches is per tuple:
+ if (tupDesc->natts == 1)

v8 the branches is per state:
+ if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)

I think that a big different way to solve the problem.
Or am I getting it wrong?

If the sortstate number of attributes is equal to 1, is it worth the same
for each tuple?
Can you explain this, please?

regards,
Ranier Vilela

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

On Fri, 16 Jul 2021 at 01:44, James Coleman  wrote:
>
> On Wed, Jul 14, 2021 at 9:22 PM David Rowley  wrote:
> >
> > On Thu, 15 Jul 2021 at 12:30, Ranier Vilela  wrote:
> > >
> > > Em qua., 14 de jul. de 2021 às 21:21, David Rowley  
> > > escreveu:
> > >> But, in v8 there is no additional branch, so no branch to mispredict.
> > >> I don't really see how your explanation fits.
> > >
> > > In v8 the branch occurs at :
> > > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
> >
> > You do know that branch is in a function that's only executed once
> > during executor initialization, right?
>
> This is why I have a hard time believing there's a "real" change here
> and not the result of either noise or something not really
> controllable like executable layout changing.

Yeah, I think we likely are at the level where layout changes in the
compiled code are going to make things hard to measure.  I just want
to make sure we're not going to end up with some regression that's
actual and not random depending on layout changes of unrelated code.
I think a branch that's taken consistently *should* be predicted
correctly each time.

Anyway, I think all the comparisons with v7b can safely be ignored. As
Ronan pointed out, v7b has some issues due to it not recording the
sort method in the executor state that leads to it forgetting which
method it used once we start pulling tuples from it. The reproductions
of that are it calling tuplesort_gettupleslot() from the 2nd tuple
onwards regardless of if we've done a datum or tuple sort.

Ronan's latest results plus John's make me think there's no need to
separate out the node function as I did in v8.  However, I do think v6
could learn a little from v8. I think I'd rather see the sort method
determined in ExecInitSort() rather than ExecSort(). I think
minimising those few extra instructions in ExecSort() might help the
L1 instruction cache.

David

Re: Git revision in tarballs

2021-07-15 Thread Magnus Hagander

On Thu, Jul 15, 2021 at 3:53 PM Tom Lane  wrote:
>
> Magnus Hagander  writes:
> > On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek  
> > wrote:
> >> The only problem I do see is adding "git" as a new dependency. That
> >> can potentially cause troubles.
>
> > But only for *creating* the tarballs,  and not for using them. I'm not
> > sure what the usecase would be to create a tarball from an environment
> > that doesn't have git?
>
> I agree, this objection seems silly.  If we ever move off of git, the
> process could be adapted at that time.  However, there *is* a reasonable
> question whether this ought to be handled by "make dist" versus the
> tarball-wrapping script.
>
> >> For the file name, I have seen GIT_VERSION or REVISION file names used
> >> before in another projects. Using ".gitrevision" doesn't make sense to
> >> me since it will be hidden on Unix by default and I'm not sure that is
> >> intended.
>
> > It was definitely intended, as I'd assume it's normally a file that
> > most people don't care about, but more something that scripts that
> > verify things would. But I'm more than happy to change it to a
> > different name if that's preferred. I looked around a bit and couldn't
> > find any general consensus for a name for such a file, but I may not
> > have looked carefully enough.
>
> We already have that convention in place:
>
> $ ls -a
> ./  .gitignore  README.git  contrib/
> ../ COPYRIGHT   aclocal.m4  doc/
> .dir-locals.el  GNUmakefile config/ src/
> .editorconfig   GNUmakefile.in  config.log  tmp_install/
> .git/   HISTORY config.status*
> .git-blame-ignore-revs  Makefileconfigure*
> .gitattributes  README  configure.ac
>
> So ".gitrevision" or the like seems fine to me.
>
> My thoughts about the proposed patch are (1) you'd better have a
> .gitignore entry too, and (2) what is the mechanism that removes
> this file?  It seems weird to have a make rule that makes a
> generated file but none to remove it.  Perhaps maintainer-clean
> should remove it?

maintainer-clean sounds reasonable for that, yes.


> Both of those issues vanish if this is delegated to the tarball
> making script; as does the need to cope with a starting point
> that isn't a specific commit.  So on the whole I'm leaning to
> the idea that it would be better done over there.

I'd be fine with either. The argument for putting it in the makefile
would be, uh, maybe it makes it a tad bit easier to verify builds
because you get it in your local build as well. But it's not like it's
very *hard* to do it...

Putting it in the tarball making script certainly works for me,
though, if that's what people prefer. And that does away with the
"clean" part as that one blows away the whole directory between each
run.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Git revision in tarballs

Magnus Hagander  writes:
> On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek  wrote:
>> The only problem I do see is adding "git" as a new dependency. That
>> can potentially cause troubles.

> But only for *creating* the tarballs,  and not for using them. I'm not
> sure what the usecase would be to create a tarball from an environment
> that doesn't have git?

I agree, this objection seems silly.  If we ever move off of git, the
process could be adapted at that time.  However, there *is* a reasonable
question whether this ought to be handled by "make dist" versus the
tarball-wrapping script.

>> For the file name, I have seen GIT_VERSION or REVISION file names used
>> before in another projects. Using ".gitrevision" doesn't make sense to
>> me since it will be hidden on Unix by default and I'm not sure that is
>> intended.

> It was definitely intended, as I'd assume it's normally a file that
> most people don't care about, but more something that scripts that
> verify things would. But I'm more than happy to change it to a
> different name if that's preferred. I looked around a bit and couldn't
> find any general consensus for a name for such a file, but I may not
> have looked carefully enough.

We already have that convention in place:

$ ls -a
./  .gitignore  README.git  contrib/
../ COPYRIGHT   aclocal.m4  doc/
.dir-locals.el  GNUmakefile config/ src/
.editorconfig   GNUmakefile.in  config.log  tmp_install/
.git/   HISTORY config.status*
.git-blame-ignore-revs  Makefileconfigure*
.gitattributes  README  configure.ac

So ".gitrevision" or the like seems fine to me.

My thoughts about the proposed patch are (1) you'd better have a
.gitignore entry too, and (2) what is the mechanism that removes
this file?  It seems weird to have a make rule that makes a 
generated file but none to remove it.  Perhaps maintainer-clean
should remove it?

Both of those issues vanish if this is delegated to the tarball
making script; as does the need to cope with a starting point
that isn't a specific commit.  So on the whole I'm leaning to
the idea that it would be better done over there.

regards, tom lane

Re: 回复： Why is XLOG_FPI_FOR_HINT always need backups?

2021-07-15 Thread Fujii Masao





On 2021/07/07 16:11, Kyotaro Horiguchi wrote:

Hello.

At Tue, 6 Jul 2021 20:42:23 +0800, "zwj" <757634...@qq.com> wrote in

But I wonder whether it is necessary or not while my file system can protect the 
blocks of database to be torn. And I read a comment in 
functionMarkBufferDirtyHint:

/*
  * If we need to protect hint bit updates from torn writes, WAL-log a
  * full page image of the page. This full page image is only necessary
  * if the hint bit update is the first change to the page since the
  * last checkpoint.
  *
  * We don't check full_page_writes here because that logic is included
  * when we call XLogInsert() since the value changes dynamically.
  */

However, the code tell me it has nothing to do with full_page_writes. I can't 
figure it out.


The doc of wal_log_hints says that "*even* for non-critical
modifications of so-called hint bits", which seems to me implies it is
following full_page_writes (and I think it is nonsense otherwise, as
you suspect).

XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE since 2c03216d83116 when
the symbol was introduced. As my understanding XLogInsert did not have
an ability to enforce FPIs before the commit. The code comment above
is older than that commit. So it seems to me a thinko that
XLogSaveBufferForHint sets REGBUF_FORCE_IMAGE.

I think the attached fixes that thinko.


With the patch, I got the following error during crash recovery.
I guess this happened because XLOG_FPI_FOR_HINT record had
no backup blocks even though the replay logic for XLOG_FPI_FOR_HINT
assumes it contains backup blocks.

FATAL:  unexpected XLogReadBufferForRedo result when restoring backup block
CONTEXT:  WAL redo at 0/169C600 for XLOG/FPI_FOR_HINT: ; blkref #0: rel 
1663/13236/16385, blk 0

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Introduce pg_receivewal gzip compression tests

2021-07-15 Thread gkokolatos




‐‐‐ Original Message ‐‐‐

On Thursday, July 15th, 2021 at 14:35, Michael Paquier  
wrote:

> On Thu, Jul 15, 2021 at 08:35:27PM +0900, Michael Paquier wrote:

>
> > 2.  curculio:
> > Looking at the OpenBSD code (usr.bin/compress/main.c), long options
> > are supported, where --version does exit(0) without printing
> > set_outfile() is doing a discard of the file suffixes it does not
> > recognize, and I think that their implementation bumps on .gz.partial
> > and generates an exit code of 512 to map with WARNING. I still wish
> > to keep this test, and I'd like to think that the contents of
> > @zlib_wals are enough in terms of coverage. What do you think?
>
> After thinking more about this one, I have taken the course to just
> remove the .gz.partial segment from the check, a full segment should
> be enough in terms of coverage. I prefer this simplification over a
> rename of the .partial segment or a tweak of the error code to map
> with WARNING.

Fair enough.

Cheers,
//Georgios

> ---
>
> Michael

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-15 Thread James Coleman

On Wed, Jul 14, 2021 at 9:22 PM David Rowley  wrote:
>
> On Thu, 15 Jul 2021 at 12:30, Ranier Vilela  wrote:
> >
> > Em qua., 14 de jul. de 2021 às 21:21, David Rowley  
> > escreveu:
> >> But, in v8 there is no additional branch, so no branch to mispredict.
> >> I don't really see how your explanation fits.
> >
> > In v8 the branch occurs at :
> > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
>
> You do know that branch is in a function that's only executed once
> during executor initialization, right?

This is why I have a hard time believing there's a "real" change here
and not the result of either noise or something not really
controllable like executable layout changing.

James

Re: SQL:2011 PERIODS vs Postgres Ranges?

2021-07-15 Thread Ibrar Ahmed

On Fri, Apr 9, 2021 at 4:54 PM David Steele  wrote:

> On 4/8/21 7:40 PM, Paul A Jungwirth wrote:
> > On Thu, Apr 8, 2021 at 7:22 AM David Steele  wrote:
> >>
> >> Paul, you can submit to the next CF when you are ready with a new patch.
> >
> > Thanks David! I've made a lot of progress but still need to finish
> > support for CASCADE on temporal foreign keys. I've been swamped with
> > other things, but hopefully I can get something during this current
> > CF.
>
> The next CF starts on July 1 so you have some time.
>
> Regards,
> --
> -David
> da...@pgmasters.net


Based on last comments of Paul and David S I am changing the status to
"Waiting on Author".


-- 
Ibrar Ahmed

Re: Asymmetric partition-wise JOIN

2021-07-15 Thread Ibrar Ahmed

On Thu, Jul 15, 2021 at 11:32 AM Andrey Lepikhov 
wrote:

> On 5/7/21 23:15, Zhihong Yu wrote:
> > On Mon, Jul 5, 2021 at 2:57 AM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > +* Can't imagine situation when join relation already
> > exists. But in
> > +* the 'partition_join' regression test it happens.
> > +* It may be an indicator of possible problems.
> >
> > Should a log be added in the above case ?
> I worked more on this case and found more serious mistake. During
> population of additional paths on the existed RelOptInfo we can remove
> some previously generated paths that pointed from a higher-level list of
> subplans and it could cause to lost of subplan links. I prohibit such
> situation (you can read comments in the new version of the patch).
> Also, choosing of a cheapest path after appendrel creation was added.
> Unstable tests were fixed.
>
> --
> regards,
> Andrey Lepikhov
> Postgres Professional
>

Patch is failing the regression, can you please take a look at that.

partition_join ... FAILED 6328 ms

--
Ibrar Ahmed

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

Em qui., 15 de jul. de 2021 às 10:01, Aleksander Alekseev <
aleksan...@timescale.com> escreveu:

> Thanks, David.
>
> > I lost where. Can you show me?
>
> See the attached warnings.txt.
>
Thank you.


>
> > But the benchmark came from:
> > pgbench -i -p 5432 -d postgres
> > pgbench -c 50 -T 300 -S -n
>
> I'm afraid this tells nothing unless you also provide the
> configuration files and the hardware description, and also some
> information on how you checked that there is no performance
> degradation on all the other supported platforms and possible
> configurations.



> Benchmarking is a very complicated topic - trust me,
> been there!
>
Absolutely.


>
> It would be better to submit two separate patches, the one that
> addresses Size_t and another that addresses shadowing. Refactorings
> only, nothing else.
>
> Regarding the code formatting, please see src/tools/pgindent.
>
I will try.

regards,
Ranier Vilela

Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-07-15 Thread Ibrar Ahmed

On Thu, Jul 15, 2021 at 2:35 PM Etsuro Fujita 
wrote:

> On Thu, Jul 15, 2021 at 4:17 AM Ibrar Ahmed  wrote:
> > On Wed, Jul 14, 2021 at 1:41 AM Tom Lane  wrote:
> >> Seems to just need an update of the expected-file to account for test
> >> cases added recently.  (I take no position on whether the new results
> >> are desirable; some of these might be breaking the intent of the case.
> >> But this should quiet the cfbot anyway.)
>
> > The test case was added by commit "Add support for asynchronous
> execution"
> > "27e1f14563cf982f1f4d71e21ef24782a052" by Etsuro Fujita. He can
> comment
> > whether the new results are desirable or not.
>
> The new results aren't what I intended.  I'll update the patch to
> avoid that by modifying the original test cases properly, if there are
> no objections.
>
> Best regards,
> Etsuro Fujita
>

Thanks Etsuro,

I have changed the status to "Waiting On Author", because patch need
changes.
Etsuro, can you make yourself a reviewer/co-author to keep track of that?


-- 
Ibrar Ahmed

Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-07-15 Thread Ronan Dunklau

Le mardi 13 juillet 2021, 06:44:12 CEST David Rowley a écrit :
> I've attached the updated patches.

The approach of building a pathkey for the first order by we find, then 
appending to it as needed seems sensible but I'm a bit worried about users 
starting to rely on this as an optimization. Even if we don't document it, 
people may start to change the order of their target lists to "force" a 
specific sort on the lower nodes. How confident are we that we won't change 
this 
or that we will be willing to break it ?

Generating all possible pathkeys and costing the resulting plans would be too 
expensive, but maybe a more "stable" (and limited) approach would be fine, like 
generating the pathkeys only if every ordered aggref shares the same prefix. I 
don't think there would be any ambiguity here.

-- 
Ronan Dunklau

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

2021-07-15 Thread Aleksander Alekseev

Thanks, David.

> I lost where. Can you show me?

See the attached warnings.txt.

> But the benchmark came from:
> pgbench -i -p 5432 -d postgres
> pgbench -c 50 -T 300 -S -n

I'm afraid this tells nothing unless you also provide the
configuration files and the hardware description, and also some
information on how you checked that there is no performance
degradation on all the other supported platforms and possible
configurations. Benchmarking is a very complicated topic - trust me,
been there!

It would be better to submit two separate patches, the one that
addresses Size_t and another that addresses shadowing. Refactorings
only, nothing else.

Regarding the code formatting, please see src/tools/pgindent.

-- 
Best regards,
Aleksander Alekseev
procarray.c:780:31: warning: passing 'const pg_atomic_uint32 *' (aka 'const 
struct pg_atomic_uint32 *') to parameter of type
  'volatile pg_atomic_uint32 *' (aka 'volatile struct pg_atomic_uint32 *') 
discards qualifiers
  [-Wincompatible-pointer-types-discards-qualifiers]
nextidx = pg_atomic_read_u32(>procArrayGroupFirst);
 ^~~~
../../../../src/include/port/atomics.h:241:47: note: passing argument to 
parameter 'ptr' here
pg_atomic_read_u32(volatile pg_atomic_uint32 *ptr)
  ^
procarray.c:785:38: warning: passing 'const pg_atomic_uint32 *' (aka 'const 
struct pg_atomic_uint32 *') to parameter of type
  'volatile pg_atomic_uint32 *' (aka 'volatile struct pg_atomic_uint32 *') 
discards qualifiers
  [-Wincompatible-pointer-types-discards-qualifiers]
if 
(pg_atomic_compare_exchange_u32(>procArrayGroupFirst,
   
^~~~
../../../../src/include/port/atomics.h:311:59: note: passing argument to 
parameter 'ptr' here
pg_atomic_compare_exchange_u32(volatile pg_atomic_uint32 *ptr,
  ^
procarray.c:829:35: warning: passing 'const pg_atomic_uint32 *' (aka 'const 
struct pg_atomic_uint32 *') to parameter of type
  'volatile pg_atomic_uint32 *' (aka 'volatile struct pg_atomic_uint32 *') 
discards qualifiers
  [-Wincompatible-pointer-types-discards-qualifiers]
nextidx = pg_atomic_exchange_u32(>procArrayGroupFirst,
 ^~~~
../../../../src/include/port/atomics.h:292:51: note: passing argument to 
parameter 'ptr' here
pg_atomic_exchange_u32(volatile pg_atomic_uint32 *ptr, uint32 newval)
  ^
procarray.c:3132:9: warning: returning 'const PGPROC *' (aka 'const struct 
PGPROC *') from a function with result type 'PGPROC *'
  (aka 'struct PGPROC *') discards qualifiers 
[-Wincompatible-pointer-types-discards-qualifiers]
return result;
   ^~
procarray.c:3159:11: warning: returning 'const PGPROC *' (aka 'const struct 
PGPROC *') from a function with result type 'PGPROC *'
  (aka 'struct PGPROC *') discards qualifiers 
[-Wincompatible-pointer-types-discards-qualifiers]
return proc;
   ^~~~
procarray.c:3778:23: warning: passing 'const List *' (aka 'const struct List 
*') to parameter of type 'List *' (aka 'struct List *')
  discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
pids = lappend_int(pids, proc->pid);
   ^~~~
../../../../src/include/nodes/pg_list.h:540:45: note: passing argument to 
parameter 'list' here
extern pg_nodiscard List *lappend_int(List *list, int datum);
^
6 warnings generated.
^Cmake[3]: *** [pg_checksums.o] Interrupt: 2
make[3]: *** wait: Interrupted system call.  Stop.
make[3]: *** Waiting for unfinished jobs
make[3]: *** [pg_basebackup.o] Interrupt: 2
make[2]: *** wait: Interrupted system call.  Stop.
make[2]: *** Waiting for unfinished jobs
make[2]: *** [all-pg_checksums-recurse] Interrupt: 2
make[2]: *** wait: Interrupted system call.  Stop.
make[1]: *** [all-bin-recurse] Error 2
make: *** [world-src-recurse] Interrupt: 2
make[3]: *** [html-stamp] Interrupt: 2
make[3]: *** [man-stamp] Interrupt: 2
make[2]: *** [all] Interrupt: 2
make[1]: *** [all] Interrupt: 2
make: *** [world-doc-recurse] Interrupt: 2

Aleksanders-MacBook-Pro:postgresql eax$ cc --version
Apple clang version 11.0.0 (clang-1100.0.33.16)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

Em qui., 15 de jul. de 2021 às 09:45, David Rowley 
escreveu:

> On Thu, 15 Jul 2021 at 23:38, Aleksander Alekseev
>  wrote:
> > I'm updating the status to "Ready for Committer".
>
> I think that might be a bit premature.  I can't quite see how changing
> the pids List to a const List makes any sense, especially when the
> code goes and calls lappend_int() on it to assign it some different
> value.
>
> There are also problems in BackendPidGetProcWithLock around consts.
>
> Much of this patch kinda feels like another one of those "I've got a
> fancy new static analyzer" patches.  Unfortunately, it just introduces
> a bunch of compiler warnings as a result of the changes it makes.
>
> I'd suggest splitting each portion of the patch out into parts related
> to what it aims to achieve.  For example,  it looks like there's some
> renaming going on to remove a local variable from shadowing a function
> parameter.  Yet the patch is claiming performance improvements.  I
> don't see how that part relates to performance. The changes to
> ProcArrayClearTransaction() seem also unrelated to performance.
>
> I'm not sure what the point of changing things like for (int i =0...
> to move the variable declaration somewhere else is about.  That just
> seems like needless stylistic changes that achieve nothing but more
> headaches for committers doing backpatching work.
>
> I'd say if this patch wants to be taken seriously it better decide
> what it's purpose is, because to me it looks just like a jumble of
> random changes that have no clear purpose.
>
> I'm going to set this back to waiting on author.
>
I understood.
I will try to address all concerns in the new version.

regards,
Ranier Vilela

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

On Thu, 15 Jul 2021 at 23:38, Aleksander Alekseev
 wrote:
> I'm updating the status to "Ready for Committer".

I think that might be a bit premature.  I can't quite see how changing
the pids List to a const List makes any sense, especially when the
code goes and calls lappend_int() on it to assign it some different
value.

There are also problems in BackendPidGetProcWithLock around consts.

Much of this patch kinda feels like another one of those "I've got a
fancy new static analyzer" patches.  Unfortunately, it just introduces
a bunch of compiler warnings as a result of the changes it makes.

I'd suggest splitting each portion of the patch out into parts related
to what it aims to achieve.  For example,  it looks like there's some
renaming going on to remove a local variable from shadowing a function
parameter.  Yet the patch is claiming performance improvements.  I
don't see how that part relates to performance. The changes to
ProcArrayClearTransaction() seem also unrelated to performance.

I'm not sure what the point of changing things like for (int i =0...
to move the variable declaration somewhere else is about.  That just
seems like needless stylistic changes that achieve nothing but more
headaches for committers doing backpatching work.

I'd say if this patch wants to be taken seriously it better decide
what it's purpose is, because to me it looks just like a jumble of
random changes that have no clear purpose.

I'm going to set this back to waiting on author.

David

Re: Rename of triggers for partitioned tables

2021-07-15 Thread Arne Roland

Hello Vignesh,

thank you for your interest!

From: vignesh C 
Sent: Thursday, July 15, 2021 14:08
To: Arne Roland
Cc: Zhihong Yu; Alvaro Herrera; Pg Hackers
Subject: Re: Rename of triggers for partitioned tables

> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".


Please let me know, whether the attached patch works for you.

Regards
Arne

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 952c8d582a..e02da5e52f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -1327,7 +1327,6 @@ get_trigger_oid(Oid relid, const char *trigname, bool missing_ok)
 
 	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
 NULL, 2, skey);
-
 	tup = systable_getnext(tgscan);
 
 	if (!HeapTupleIsValid(tup))
@@ -1387,10 +1386,11 @@ RangeVarCallbackForRenameTrigger(const RangeVar *rv, Oid relid, Oid oldrelid,
 }
 
 /*
- *		renametrig		- changes the name of a trigger on a relation
+ *		renameonlytrig		- changes the name of a trigger on a relation
  *
  *		trigger name is changed in trigger catalog.
  *		No record of the previous name is kept.
+ *		This function assumes that the row is already locked.
  *
  *		get proper relrelation from relation catalog (if not arg)
  *		scan trigger catalog
@@ -1400,41 +1400,28 @@ RangeVarCallbackForRenameTrigger(const RangeVar *rv, Oid relid, Oid oldrelid,
  *		update row in catalog
  */
 ObjectAddress
-renametrig(RenameStmt *stmt)
+renameonlytrig(RenameStmt *stmt, Relation tgrel, Oid relid, Oid tgparent)
 {
 	Oid			tgoid;
 	Relation	targetrel;
-	Relation	tgrel;
 	HeapTuple	tuple;
 	SysScanDesc tgscan;
 	ScanKeyData key[2];
-	Oid			relid;
 	ObjectAddress address;
+	int			matched;
+	int			stmtTgnameMatches;
 
-	/*
-	 * Look up name, check permissions, and acquire lock (which we will NOT
-	 * release until end of transaction).
-	 */
-	relid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock,
-	 0,
-	 RangeVarCallbackForRenameTrigger,
-	 NULL);
 
 	/* Have lock already, so just need to build relcache entry. */
 	targetrel = relation_open(relid, NoLock);
 
+
 	/*
 	 * Scan pg_trigger twice for existing triggers on relation.  We do this in
 	 * order to ensure a trigger does not exist with newname (The unique index
 	 * on tgrelid/tgname would complain anyway) and to ensure a trigger does
 	 * exist with oldname.
 	 *
-	 * NOTE that this is cool only because we have AccessExclusiveLock on the
-	 * relation, so the trigger set won't be changing underneath us.
-	 */
-	tgrel = table_open(TriggerRelationId, RowExclusiveLock);
-
-	/*
 	 * First pass -- look for name conflict
 	 */
 	ScanKeyInit([0],
@@ -1455,34 +1442,50 @@ renametrig(RenameStmt *stmt)
 	systable_endscan(tgscan);
 
 	/*
-	 * Second pass -- look for trigger existing with oldname and update
+	 * Second pass -- look for trigger existing and update if pgparent is
+	 * matching
 	 */
 	ScanKeyInit([0],
 Anum_pg_trigger_tgrelid,
 BTEqualStrategyNumber, F_OIDEQ,
 ObjectIdGetDatum(relid));
-	ScanKeyInit([1],
-Anum_pg_trigger_tgname,
-BTEqualStrategyNumber, F_NAMEEQ,
-PointerGetDatum(stmt->subname));
+
 	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
-NULL, 2, key);
-	if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
+NULL, 1, key);
+	matched = 0;
+	while (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
 	{
 		Form_pg_trigger trigform;
 
+		trigform = (Form_pg_trigger) GETSTRUCT(tuple);
+
+stmtTgnameMatches = strcmp(stmt->subname, NameStr(trigform->tgname));
+		/*
+		 * Skip triggers that not relevant. Relevant is the parent trigger as
+		 * well as parts of the current distributed trigger.
+		 */
+		if (tgparent == 0 && stmtTgnameMatches
+|| tgparent && trigform->tgparentid != tgparent)
+			continue;
+
+		if (stmtTgnameMatches)
+		{
+			ereport(NOTICE,
+	(errmsg("trigger \"%s\" for table \"%s\" was not named \"%s\", renaming anyways",
+			NameStr(trigform->tgname), RelationGetRelationName(targetrel), stmt->subname)));
+		}
+
 		/*
 		 * Update pg_trigger tuple with new tgname.
 		 */
 		tuple = heap_copytuple(tuple);	/* need a modifiable copy */
 		trigform = (Form_pg_trigger) GETSTRUCT(tuple);
-		tgoid = trigform->oid;
-
 		namestrcpy(>tgname,
    stmt->newname);
-
 		CatalogTupleUpdate(tgrel, >t_self, tuple);
 
+		tgoid = trigform->oid;
+
 		InvokeObjectPostAlterHook(TriggerRelationId,
   tgoid, 0);
 
@@ -1492,29 +1495,122 @@ renametrig(RenameStmt *stmt)
 		 * entries.  (Ideally this should happen automatically...)
 		 */
 		CacheInvalidateRelcache(targetrel);
+		matched++;
 	}
-	else
+
+	systable_endscan(tgscan);
+
+	/*
+	 * There always should be exactly one matching trigger on the child
+	 * partition. If there isn't fail with an error.
+	 */
+	if (!matched)
 	{
 		ereport(ERROR,
-(errcode(ERRCODE_UNDEFINED_OBJECT),
-

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

Em qui., 15 de jul. de 2021 às 09:27, Ronan Dunklau 
escreveu:

> Le jeudi 15 juillet 2021, 14:09:26 CEST Ranier Vilela a écrit :
> > Is there a special reason to not share v7b tests and results?
> >
>
> The v7b patch is wrong, as it loses the type of tuplesort being used

I don't see 'node->datumSort' being anywhere else yet.


> and as
> such always tries to fetch results using tuplesort_gettupleslot after the
> first
> tuple is fetched.

Is that why it is faster than v6?

regards,
Ranier Vilela

Re: Introduce pg_receivewal gzip compression tests

2021-07-15 Thread Michael Paquier

On Thu, Jul 15, 2021 at 08:35:27PM +0900, Michael Paquier wrote:
> 1) bowerbird on Windows/MSVC:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird=2021-07-15%2010%3A30%3A36
> pg_receivewal: fatal: could not fsync existing write-ahead log file
> "00010002.partial": Permission denied
> not ok 20 - streaming some WAL using ZLIB compression
> I don't think the existing code can be blamed for that as this means a
> failure with gzflush().  Likely a concurrency issue as that's an
> EACCES.  If that's repeatable, that could point to an actual issue
> with pg_receivewal --compress.

For this one, I'll try to test harder on my own host.  I am curious to
see if the other Windows members running the TAP tests have anything
to say.  Looking at the code of zlib, this would come from gz_zero()
in gzflush(), which could blow up on a write() in gz_comp().

> 2) curculio:
> 
> Looking at the OpenBSD code (usr.bin/compress/main.c), long options
> are supported, where --version does exit(0) without printing
> anything, and --test is supported even if that's not on the man pages.
> set_outfile() is doing a discard of the file suffixes it does not
> recognize, and I think that their implementation bumps on .gz.partial
> and generates an exit code of 512 to map with WARNING.  I still wish
> to keep this test, and I'd like to think that the contents of
> @zlib_wals are enough in terms of coverage.  What do you think?

After thinking more about this one, I have taken the course to just
remove the .gz.partial segment from the check, a full segment should
be enough in terms of coverage.  I prefer this simplification over a
rename of the .partial segment or a tweak of the error code to map
with WARNING.
--
Michael


signature.asc
Description: PGP signature

Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)