RE: [HACKERS] logical decoding of two-phase transactions

2021-03-05 Thread osumi.takami...@fujitsu.com
Hi


On Saturday, March 6, 2021 10:49 AM Peter Smith  wrote:
> Please find attached the latest patch set v50*

When I read through the patch set, I found a weird errmsg in
apply_handle_begin_prepare(), which seems to be a mistake.

File : v50-0003-Add-support-for-apply-at-prepare-time-to-built-i.patch

+* The gid must not already be prepared.
+*/
+   if (LookupGXact(begin_data.gid, begin_data.end_lsn, begin_data.committime))
+   ereport(ERROR,
+   (errcode(ERRCODE_DUPLICATE_OBJECT),
+errmsg("transaction?identifier?\"%s\"?is?already?in?use",
+   begin_data.gid)));

Please fix this in the next update.


Best Regards,
Takamichi Osumi



Re: Different compression methods for FPI

2021-03-05 Thread Andrey Borodin



> On 1 March 2021, at 10:03, Justin Pryzby  wrote:

Justin, Michael, thanks for comments!

As far as I understand, the TODO list for the patch looks as follows:

1. Reuse the compression API of other patches. But which one? Or should I invent 
a new one? "compressamapi.h" from "custom compression methods" bothers only with 
varlena <-> varlena conversions, and only for lz4. And it is an "access method" 
after all, residing in the backend...
ZStream looks good, but it lacks OID identification for compression methods and 
lz4.
2. Store OID in FPIs instead of 1-byte CompressionId. Make sure frontend is 
able to recognize builtin compression OIDs.
3. Complain if wal_compression_method is set to a library that is not compiled in.
4. Add tests for different WAL compression methods similar to src/test/ssl

Did I miss something?
I would appreciate help with item 1; I do not know how to choose a starting 
point.

> wal_compression_method (why not just using wal_compression?)

I hope one day we will compress all WAL, not just FPIs. Advanced archive 
management tools already do so, so why not compress it in the walwriter?
When that is implemented, we could have wal_compression = {off, fpi, all}.
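For illustration only, the knobs could then look roughly like this in
postgresql.conf (the "all" value is purely hypothetical; wal_compression_method
is the GUC this patch set proposes):

wal_compression = fpi            # off | fpi | all (hypothetical)
wal_compression_method = lz4     # must be a method compiled into the server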

Best regards, Andrey Borodin.



Re: Some regular-expression performance hacking

2021-03-05 Thread Joel Jacobson
On Fri, Feb 26, 2021, at 19:55, Tom Lane wrote:
> "Joel Jacobson"  writes:
> > On Fri, Feb 26, 2021, at 01:16, Tom Lane wrote:
> >> 0007-smarter-regex-allocation-2.patch
> 
> > I've successfully tested this patch.
> 
> Cool, thanks for testing!

I thought it would be interesting to see if there are any differences
in *where* matches occur, not only in *what* matches.

I've compared the output from regexp_positions()
between REL_13_STABLE and HEAD.

I'm happy to report no differences were found,
except some new expected

invalid regular expression: invalid character range

errors due to the fixes.

This time I also ran into the

(["'`])(?:\\\1|.)*?\1

pattern due to using the flags,
which caused a timeout on REL_13_STABLE,
but the same pattern is fast on HEAD.

All good.

/Joel

Re: Add client connection check during the execution of the query

2021-03-05 Thread Zhihong Yu
For v4-0002-some-fixups.patch :

+   if (client_connection_check_interval > 0)
+   enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,

+   /* Start timeout for checking if the client has gone away if necessary.
*/
+   if (client_connection_check_interval)

It would be better if the second if condition is aligned with that of the
first (> 0).
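
For example, both call sites could spell the test the same way (just an
illustration of the suggested style, not a line taken from the patch):

    if (client_connection_check_interval > 0)
        enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
                             client_connection_check_interval);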

For v4-0001-Detect-dropped-connections-while-running-queries.patch :

+Sets a time interval, in milliseconds, between periodic

I wonder if the interval should be expressed in seconds, since the
description says: while running very long queries.

Cheers

On Fri, Mar 5, 2021 at 8:07 PM Thomas Munro  wrote:

> On Mon, Mar 1, 2021 at 6:18 PM Thomas Munro 
> wrote:
> > I've done a quick rebase of this patch and added it to the
> > commitfest.  No other changes.  Several things were mentioned earlier
> > that still need to be tidied up.
>
> Rebased again due to bitrot.  This time I did some actual work:
>
> 1.  I didn't like the way it was rearming the timer *in the timer
> handler*; I think it should be done in the CFI(), and only if it
> determines that you're still running a query (otherwise you'll get
> periodic wakeups while you're idle between queries, which is bad for
> the arctic ice cap; we already handle client going away efficiently
> between queries with WaitEventSet socket readiness).
> 2.  The timer handler surely has to set the latch to close a race (cf.
> other similar handlers; between the CFI() and the beginning of the
> sleep, you could handle the signal, set the flag, and then go to sleep
> for 100 years).
> 3.  The test might as well use pg_sleep() instead of doing a plpgsql
> busy loop of SELECT queries.
> 4.  I prefer the name CLIENT_CONNECTION_CHECK_TIMEOUT instead of
> SKIP_CLIENT_CHECK_TIMEOUT; let's make up only one new name for a
> concept instead of two.
> 5.  Minuscule doc change.
>
> I put these into a separate patch for ease of review.  I don't claim
> this is ready -- still needs more testing etc -- but it seems to be
> generating the right system calls at the right times now.
>


Re: Add client connection check during the execution of the query

2021-03-05 Thread Thomas Munro
On Mon, Mar 1, 2021 at 6:18 PM Thomas Munro  wrote:
> I've done a quick rebase of this patch and added it to the
> commitfest.  No other changes.  Several things were mentioned earlier
> that still need to be tidied up.

Rebased again due to bitrot.  This time I did some actual work:

1.  I didn't like the way it was rearming the timer *in the timer
handler*; I think it should be done in the CFI(), and only if it
determines that you're still running a query (otherwise you'll get
periodic wakeups while you're idle between queries, which is bad for
the arctic ice cap; we already handle client going away efficiently
between queries with WaitEventSet socket readiness).
2.  The timer handler surely has to set the latch to close a race (cf.
other similar handlers; between the CFI() and the beginning of the
sleep, you could handle the signal, set the flag, and then go to sleep
for 100 years).
3.  The test might as well use pg_sleep() instead of doing a plpgsql
busy loop of SELECT queries.
4.  I prefer the name CLIENT_CONNECTION_CHECK_TIMEOUT instead of
SKIP_CLIENT_CHECK_TIMEOUT; let's make up only one new name for a
concept instead of two.
5.  Minuscule doc change.

I put these into a separate patch for ease of review.  I don't claim
this is ready -- still needs more testing etc -- but it seems to be
generating the right system calls at the right times now.
From 8962b1ac40cf63c90af3e65dc15d554d764483dc Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 1 Mar 2021 18:08:23 +1300
Subject: [PATCH v4 1/2] Detect dropped connections while running queries.

Provide a new optional GUC that can be used to periodically check whether the
client connection has gone away while running very long queries.

Author: Sergey Cherkashin 
Author: 
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml  |  20 
 src/backend/libpq/pqcomm.c|  31 +
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/init/globals.c  |   1 +
 src/backend/utils/init/postinit.c |  13 +++
 src/backend/utils/misc/guc.c  |  10 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/include/libpq/libpq.h |   2 +
 src/include/miscadmin.h   |   3 +-
 src/include/utils/timeout.h   |   1 +
 src/test/modules/Makefile |   1 +
 src/test/modules/connection/Makefile  |  16 +++
 .../connection/t/001_close_connection.pl  | 107 ++
 13 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/connection/Makefile
 create mode 100644 src/test/modules/connection/t/001_close_connection.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 967de73596..cadf6a9e7d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9231,6 +9231,26 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+Sets a time interval, in milliseconds, between periodic
+verification of client-server connection during query execution.
+If the client aborts the connection, the query is terminated.
+   
+   
+Default value is zero - it disables connection
+checks, so the backend will detect client disconnection only when trying
+to send a response to the query.
+   
+  
+ 
+
  
 

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..9f33138105 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -104,6 +104,7 @@
  */
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
+int 		client_connection_check_interval;
 
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
@@ -1921,3 +1922,33 @@ pq_settcpusertimeout(int timeout, Port *port)
 
 	return STATUS_OK;
 }
+
+/* 
+ *	pq_check_client_connection - check if client connected to socket or not
+ * 
+ */
+void pq_check_client_connection(void)
+{
+	CheckClientConnectionPending = false;
+	if (IsUnderPostmaster &&
+		MyProcPort != NULL && !PqCommReadingMsg && !PqCommBusy)
+	{
+		char nextbyte;
+		int r;
+
+#ifdef WIN32
+		pgwin32_noblock = 1;
+#endif
+		r = recv(MyProcPort->sock, &nextbyte, 1, MSG_PEEK);
+#ifdef WIN32
+		pgwin32_noblock = 0;
+#endif
+
+		if (r == 0 || (r == -1 &&
+			errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR))
+		{
+			ClientConnectionLost = true;
+			InterruptPending = true;
+		}
+	}
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8a0332dde9..c9be532362 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tco

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
On Fri, Mar 5, 2021, at 20:46, Joel Jacobson wrote:
> My conclusion is that we should use setof int4range[] as the return value for 
> regexp_positions().

If acceptable to the project, it would be even nicer if we could just return
the suggested composite type.

I don't see any existing catalog functions returning composite types though?
Is this due to some policy of not wanting composite types as return values for 
built-ins or just a coincidence?

Example of regexp_positions -> setof range[],
where range is:

CREATE TYPE range AS (start int8, stop int8);

SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
regexp_positions
--
{"(6,6)"}
(1 row)

SELECT r FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
   r
---
{"(3,6)","(6,11)"}
{"(11,16)","(16,20)"}
(2 rows)

SELECT r[1].*, r[2].* FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
 start | stop | start | stop
-------+------+-------+------
     3 |    6 |     6 |   11
    11 |   16 |    16 |   20
(2 rows)

SELECT r[1].* FROM regexp_positions('','^','g') AS r;
 start | stop
-------+------
     0 |    0
(1 row)

Thoughts?

/Joel

Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Kohei KaiGai
On Sat, Mar 6, 2021 at 11:21, Tom Lane  wrote:
>
> Kohei KaiGai  writes:
> > On Sat, Mar 6, 2021 at 1:41, Tom Lane  wrote:
> >> Scanning the code, I have a couple of gripes.  I'm not sure it's
> >> a good plan to just send the "header" field raw like that ---
> >> would breaking it into a dimension field and a point bool be
> >> better?  In any case, the receive function has to be more careful
> >> than this about accepting only valid header values.
> >>
> > I have a different opinion here.
> > Do we never reinterpret the unused header fields (bits 8-30) for another 
> > purpose
> > in the future version?
>
> Right, that's what to be concerned about.
>
> The best way might be to send the header as-is, as you've done,
> but for cube_recv to throw error if the reserved bits aren't
> all zero.  That way we can't get into a situation where we
> aren't sure what's in stored values.  If we do expand the header
> in future, values should be forward compatible.
>
OK, the attached v4 sends the raw header as-is, and then cube_recv
validates the header.
If the number of dimensions is larger than CUBE_MAX_DIM, then obviously
either the unused bits (8-30) are set or the value is out of range.
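
A minimal sketch of what such a check could look like, reusing the DIM_MASK
and CUBE_MAX_DIM macros from cubedata.h (illustration only, not the patch
itself):

    /* read the raw header exactly as cube_send() wrote it */
    unsigned int header = (unsigned int) pq_getmsgint(buf, sizeof(int32));
    int          dim = header & DIM_MASK;

    /*
     * A dimension count above CUBE_MAX_DIM means either that the reserved
     * bits (8-30) are set or that the value is simply out of range, so
     * reject it.
     */
    if (dim > CUBE_MAX_DIM)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                 errmsg("invalid dimension count in external \"cube\" value")));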

It also changes the manner of offsetof() as you suggested.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-cube-binary-inout-handler.v4.patch
Description: Binary data


Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Tom Lane
Kohei KaiGai  writes:
> On Sat, Mar 6, 2021 at 1:41, Tom Lane  wrote:
>> Scanning the code, I have a couple of gripes.  I'm not sure it's
>> a good plan to just send the "header" field raw like that ---
>> would breaking it into a dimension field and a point bool be
>> better?  In any case, the receive function has to be more careful
>> than this about accepting only valid header values.
>> 
> I have a different opinion here.
> Do we never reinterpret the unused header fields (bits 8-30) for another 
> purpose
> in the future version?

Right, that's what to be concerned about.

The best way might be to send the header as-is, as you've done,
but for cube_recv to throw error if the reserved bits aren't
all zero.  That way we can't get into a situation where we
aren't sure what's in stored values.  If we do expand the header
in future, values should be forward compatible.

regards, tom lane




Re: DROP INDEX CONCURRENTLY on partitioned index

2021-03-05 Thread Michael Paquier
On Fri, Mar 05, 2021 at 09:27:05AM -0500, David Steele wrote:
> It appears there are still some issues to be resolved with this patch, but
> the next step seems to be for you to have a look at Justin's most recent
> patch.

Not sure if I'll be able to do that by the end of this month.  Looking
quickly at the patch, I am not much of a fan of the new code path
introduced for the deletion of dependent objects in the partition
tree, so my gut is telling me that we need to think harder about the
problem at hand first.
--
Michael


signature.asc
Description: PGP signature


Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Kohei KaiGai
On Sat, Mar 6, 2021 at 1:41, Tom Lane  wrote:
>
> Kohei KaiGai  writes:
> > Thanks, the attached patch adds cube--1.5 for binary send/recv functions and
> > modification of cube type using the new ALTER TYPE.
>
> Hm, that was already superseded by events (112d411fb).
> As long as we get this done for v14, we can just treat it
> as an add-on to cube 1.5, so here's a quick rebase onto HEAD.
>
Thanks for this revision.

> Scanning the code, I have a couple of gripes.  I'm not sure it's
> a good plan to just send the "header" field raw like that ---
> would breaking it into a dimension field and a point bool be
> better?  In any case, the receive function has to be more careful
> than this about accepting only valid header values.
>
I have a different opinion here.
Will we never reinterpret the unused header fields (bits 8-30) for another
purpose in a future version?
If an application saves the raw header field as-is, it can at least keep the
header field without information loss.
On the other hand, if cube_send() individually sent the number of dimensions
and the point flag, an application built for the current version would drop
the currently unused bit fields, but a new version of the server might
reinterpret those bits for something.

Of course, it's better to have more careful validation in cube_recv()
when it receives the header field.

> Also, I don't think "offsetof(NDBOX, x[nitems])" is per project
> style.  It at least used to be true that MSVC couldn't cope
> with that, so we prefer
>
> offsetof(NDBOX, x) + nitems * sizeof(whatever)
>
OK, I'll fix it in the next patch.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Disallow SSL compression?

2021-03-05 Thread Michael Paquier
On Fri, Mar 05, 2021 at 05:44:20PM +0100, Magnus Hagander wrote:
> We've broken stats views before. While it'd be nice if we could group
> multiple breakages at the same time, I don't think it's that
> important. Better to get rid of it once and for all from as many
> places as possible.

Okay, cool.  I'd rather wait more for Peter before doing anything, so
if there are no objections, I'll look at that stuff again at the
beginning of next week and perhaps apply it.  If you wish to take care
of that yourself, please feel free to do so, of course.
--
Michael


signature.asc
Description: PGP signature


Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Michael Paquier
On Fri, Mar 05, 2021 at 11:16:55AM -0500, Tom Lane wrote:
> That would have uncertain effects on other TAP tests, so I'm disinclined
> to do it that way.

+1.  There may be tests out-of-core that rely on this value as
default.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2021-03-05 Thread Alvaro Herrera
v33 was indeed marked a pass by cfbot.  However, it only *builds* the
test program, it does not *run* it.  I guess we'll have to wait for the
buildfarm to tell us more.

In the meantime, I implemented PQsendQuery() as callable in pipeline
mode; it does that by using the extended-query protocol directly rather
than sending 'Q' as in non-pipeline mode.  I also adjusted the docs a
little bit more.  That's what you see here as v34.
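
For anyone wanting to see the shape of the API, a rough client-side sketch
(function names as in this patch; error handling omitted, table name made up):

    #include <libpq-fe.h>

    static void
    pipeline_demo(PGconn *conn)
    {
        PQenterPipelineMode(conn);

        /* queue work without waiting for individual results */
        PQsendQueryParams(conn, "INSERT INTO t VALUES (1)",
                          0, NULL, NULL, NULL, NULL, 0);
        PQsendQuery(conn, "SELECT count(*) FROM t");  /* now allowed in pipeline mode */

        /* establish a synchronization point and flush the queue */
        PQpipelineSync(conn);

        /* drain results until the PGRES_PIPELINE_SYNC marker comes back */
        for (;;)
        {
            PGresult   *res = PQgetResult(conn);

            if (res == NULL)
                continue;       /* NULL separates the results of queued queries */
            if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
            {
                PQclear(res);
                break;
            }
            PQclear(res);
        }

        PQexitPipelineMode(conn);
    }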

I'll take the weekend to think about the issue with conn->last_query and
conn->queryclass that I mentioned yesterday; other than that detail my
feeling is that this is committable, so I'll be looking at getting this
pushed early next week, barring opinions from others.

-- 
Álvaro Herrera   Valdivia, Chile
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 0553279314..e3cd5c377b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3173,6 +3173,33 @@ ExecStatusType PQresultStatus(const PGresult *res);

   
  
+
+ 
+  PGRES_PIPELINE_SYNC
+  
+   
+The PGresult represents a
+synchronization point in pipeline mode, requested by 
+.
+This status occurs only when pipeline mode has been selected.
+   
+  
+ 
+
+ 
+  PGRES_PIPELINE_ABORTED
+  
+   
+The PGresult represents a pipeline that has
+received an error from the server.  PQgetResult
+must be called repeatedly, and each time it will return this status code
+until the end of the current pipeline, at which point it will return
+PGRES_PIPELINE_SYNC and normal processing can
+resume.
+   
+  
+ 
+
 
 
 If the result status is PGRES_TUPLES_OK or
@@ -4919,6 +4946,473 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Pipeline Mode
+
+  
+   libpq
+   pipeline mode
+  
+
+  
+   pipelining
+   in libpq
+  
+
+  
+   batch mode
+   in libpq
+  
+
+  
+   libpq pipeline mode allows applications to
+   send a query without having to read the result of the previously
+   sent query.  Taking advantage of the pipeline mode, a client will wait
+   less for the server, since multiple queries/results can be
+   sent/received in a single network transaction.
+  
+
+  
+   While pipeline mode provides a significant performance boost, writing
+   clients using the pipeline mode is more complex because it involves
+   managing a queue of pending queries and finding which result
+   corresponds to which query in the queue.
+  
+
+  
+   Pipeline mode also generally consumes more memory on both the client and server,
+   though careful and aggressive management of the send/receive queue can mitigate
+   this.  This applies whether or not the connection is in blocking or non-blocking
+   mode.
+  
+
+  
+   Using Pipeline Mode
+
+   
+To issue pipelines, the application must switch the connection
+into pipeline mode,
+which is done with .
+ can be used
+to test whether pipeline mode is active.
+In pipeline mode, only asynchronous operations
+are permitted, and COPY is disallowed.
+Using synchronous command execution functions
+such as PQfn,
+PQexec,
+PQexecParams,
+PQprepare,
+PQexecPrepared,
+PQdescribePrepared,
+PQdescribePortal,
+is an error condition.
+Once all dispatched commands have had their results processed, and
+the end pipeline result has been consumed, the application may return
+to non-pipelined mode with .
+   
+
+   
+
+ It is best to use pipeline mode with libpq in
+ non-blocking mode. If used
+ in blocking mode it is possible for a client/server deadlock to occur.
+  
+   
+The client will block trying to send queries to the server, but the
+server will block trying to send results to the client from queries
+it has already processed. This only occurs when the client sends
+enough queries to fill both its output buffer and the server's receive
+buffer before it switches to processing input from the server,
+but it's hard to predict exactly when that will happen.
+   
+  
+
+   
+
+   
+Issuing Queries
+
+
+ After entering pipeline mode, the application dispatches requests using
+ , 
+ , 
+ or its prepared-query sibling
+ .
+ These requests are queued on the client-side until flushed to the server;
+ this occurs when  is used to
+ establish a synchronization point in the pipeline,
+ or when  is called.
+ The functions ,
+ , and
+  also work in pipeline mode.
+ Result processing is described below.
+
+
+
+ The server executes statements, and returns results, in the order the
+ client sends them.  The server will 

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2021-03-05 Thread Zhihong Yu
Hi,

+  \gset and \aset cannot be used
+  pipeline mode, since query results are not immediately

'used pipeline mode' -> 'used in pipeline mode'

--- /dev/null
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -0,0 +1,1144 @@
+/*
+ * src/test/modules/libpq_pipeline/libpq_pipeline.c

Looks like license information is missing from the header.

Cheers

On Thu, Mar 4, 2021 at 1:40 PM Alvaro Herrera 
wrote:

> On 2021-Mar-04, Alvaro Herrera wrote:
>
> > I don't know where do __WSAFDIsSet and __imp_select come from or what to
> > do about them.  Let's see if adding pgport and pgcommon fixes things.
>
> Indeed all those other problems were fixed and these remain.  New
> failure is:
>
> "C:\projects\postgresql\pgsql.sln" (default target) (1) ->
> 6007"C:\projects\postgresql\libpq_pipeline.vcxproj" (default target) (55)
> ->
> 6008(Link target) ->
> 6009  libpq_pipeline.obj : error LNK2019: unresolved external symbol
> __WSAFDIsSet referenced in function test_pipelined_insert
> [C:\projects\postgresql\libpq_pipeline.vcxproj]
> 6010  libpq_pipeline.obj : error LNK2019: unresolved external symbol
> __imp_select referenced in function test_pipelined_insert
> [C:\projects\postgresql\libpq_pipeline.vcxproj]
> 6011  .\Release\libpq_pipeline\libpq_pipeline.exe : fatal error LNK1120: 2
> unresolved externals [C:\projects\postgresql\libpq_pipeline.vcxproj]
>
> I did notice that isolationtester.c is using select(), and one
> difference is that it includes  which libpq_pipeline.c
> does not -- and it also pulls in ws2_32.lib.  Let's see if those two
> changes fix things.
>
> --
> Álvaro Herrera   Valdivia, Chile
>


Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

2021-03-05 Thread Daniel Gustafsson
> On 3 Mar 2021, at 23:19, Thomas Munro  wrote:
> On Thu, Mar 4, 2021 at 4:18 AM Daniel Gustafsson  wrote:

>> Since there is no way to make the first destroy_tablespace_directories 
>> call
>> fail on purpose in testing, the assertion coverage may have limited use 
>> though?
> 
> There is: all you have to do is drop a table, and then drop the
> tablespace that held it without a checkpoint in between.

Of course, that makes a lot of sense.

> One thing I am pretty sure of is that it's never OK to
> wait for a ProcSignalBarrier when you're not interruptible;

Agreed.

> for one thing, you won't process the request yourself (self deadlock) and for
> another, it would be hypocritical of you to expect others to process 
> interrupts
> when you can't (interprocess deadlock); perhaps there should be an assertion
> about that, but it's pretty obvious if you screw that up: it hangs.


An assertion for interrupts not being held off doesn't seem like a terrible
idea, if only to document the intent of the code for readers.

> That's why I release and reacquire that LWLock.  But does that break some
> logic?


One clear change to current behavior is naturally that a concurrent
TablespaceCreateDbspace can happen while barrier absorption is performed.
Given where we are that might not be a problem, but I don't have enough
caffeine at the moment to conclude anything there.  Testing by inducing
concurrent calls while absorption was stalled didn't trigger anything, but I'm
sure I didn't test every scenario.  Do you see anything off the cuff?

--
Daniel Gustafsson   https://vmware.com/





Re: Extend more usecase for planning time partition pruning and init partition pruning.

2021-03-05 Thread Andy Fan
Hi Amit:
  Thanks for your review!

On Thu, Mar 4, 2021 at 5:07 PM Amit Langote  wrote:

> Hi Andy,
>
> On Sun, Jan 24, 2021 at 7:34 PM Andy Fan  wrote:
> >  I recently found a use case like this.  SELECT * FROM p, q WHERE
> p.partkey =
> >  q.colx AND (q.colx = $1 OR q.colx = $2); Then we can't do either
> planning time
> >  partition prune or init partition prune.  Even though we have run-time
> >  partition pruning work at last, it is too late in some cases since we
> have
> >  to init all the plan nodes in advance.  In my case, there are 10+
> >  partitioned relation in one query and the execution time is short, so
> the
> >  init plan a lot of plan nodes cares a lot.
> >
> > The attached patches fix this issue. It just get the "p.partkey = q.colx"
> > case in root->eq_classes or rel->joinlist (outer join), and then check
> if there
> > is some baserestrictinfo in another relation which can be used for
> partition
> > pruning. To make the things easier, both partkey and colx must be Var
> > expression in implementation.
> >
> > - v1-0001-Make-some-static-functions-as-extern-and-extend-C.patch
> >
> > Just some existing refactoring and extending ChangeVarNodes to be able
> > to change var->attno.
> >
> > - v1-0002-Build-some-implied-pruning-quals-to-extend-the-us.patch
>
> IIUC, your proposal is to transpose the "q.b in (1, 2)" in the
> following query as "p.a in (1, 2)" and pass it down as a pruning qual
> for p:
>
> select * from p, q where p.a = q.b and q.b in (1, 2);
>
> or "(q.b = 1 or q.b = 2)" in the following query as "(p.a = 1 or p.a = 2)":
>
> select * from p, q where p.a = q.b and (q.b = 1 or q.b = 2);
>
>
Yes,  you understand me correctly.


> While that transposition sounds *roughly* valid, I have some questions
> about the approach:
>
> * If the transposed quals are assumed valid to use for partition
> pruning, could they also not be used by, say, the surviving
> partitions' index scan paths?  So, perhaps, it doesn't seem right that
> partprune.c builds the clauses on-the-fly for pruning and dump them
> once done.
>
> * On that last part, I wonder if partprune.c isn't the wrong place to
> determine that "q.b in (1, 2)" and "p.a in (1, 2)" are in fact
> equivalent.  That sort of thing is normally done in the phase of
> planning when distribute_qual_to_rels() runs and any equivalences
> found stored in PlannerInfo.eq_classes.  Have you investigated why the
> process_ machinery doesn't support working with ScalarArrayOpExpr and
> BoolExpr to begin with?
>
> * Or maybe have you considered generalizing what
> build_implied_pruning_quals() does so that other places like
> indxpath.c can use the facility?
>
>
Actually, at the beginning of this work I did think I should put the implied
quals into baserestrictinfo at the distribute_qual_to_rels stage.  That would
probably fix all the issues you reported.  However, that is probably more
complex than what I did, with more risks, and I had a very limited timeline to
handle the real issue at hand, so I chose this strategy.  But it is time to
rethink the baserestrictinfo way now.  I will spend some time in this
direction; thank you for this kind of push. :)  I just checked this on Oracle,
and Oracle does use this strategy.

SQL> explain plan for select * from t1, t2 where t1.a = t2.a and t1.a > 2;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

Plan hash value: 1838229974

---
| Id  | Operation   | Name | Rows  | Bytes | Cost (%CPU)| Time  |
---
|   0 | SELECT STATEMENT   |  | 1 |52 | 4   (0)| 00:00:01 |
|*  1 |  HASH JOIN   |  | 1 |52 | 4   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 1 |26 | 2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   | 1 |26 | 2   (0)| 00:00:01 |
---


PLAN_TABLE_OUTPUT

Predicate Information (identified by operation id):
---

   1 - access("T1"."A"="T2"."A")

   2 - filter("T1"."A">2)
   3 - filter("T2"."A">2)

17 rows selected.


postgres=# explain (costs off) select * from t1, t2 where t1.a = t2.a and
t1.a > 2;
  QUERY PLAN
---
 Merge Join
   Merge Cond: (t1.a = t2.a)
   ->  Sort
 Sort Key: t1.a
 ->  Seq Scan on t1
   Filter: (a > 2)
   ->  Sort
 Sort Key: t2.a
 ->  Seq Scan on t2
(9 rows)
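
For comparison, if PostgreSQL derived the implied qual as well, one would hope
for the same plan shape with the filter applied on both sides, roughly
(hand-written illustration, not real EXPLAIN output):

 Merge Join
   Merge Cond: (t1.a = t2.a)
   ->  Sort
         Sort Key: t1.a
         ->  Seq Scan on t1
               Filter: (a > 2)
   ->  Sort
         Sort Key: t2.a
         ->  Seq Scan on t2
               Filter: (a > 2)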


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-03-05 Thread Andy Fan
On Fri, Mar 5, 2021 at 4:16 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Fri, Mar 05, 2021 at 10:22:45AM +0800, Andy Fan wrote:
> > > > I checked again and found I did miss the check on JoinExpr->quals.  I
> > > > have fixed it in the v3 patch. Thanks for the review!
> > > >
> > > > In the attached v3, commit 1 is the real patch, and commit 2 just adds
> > > > some logs to help local testing.  notnull.sql/notnull.out is the test
> > > > case for this patch; both commit 2 and notnull.* are not intended to
> > > > be committed at last.
> > >
> > > Just to clarify, this version of notnullattrs here is the latest one,
> > > and another one from "UniqueKey on Partitioned table" thread should be
> > > disregarded?
> > >
> >
> > Actually they are different sections for UniqueKey.  Since I don't want to
> > mess two topics in one place, I open another thread.  The topic here is how
> > to represent a not null attribute, which is a precondition for all UniqueKey
> > stuff.  The thread "UniqueKey on Partitioned table[1]" is talking about how
> > to maintain the UniqueKey on a partitioned table only.
>
> Sure, those two threads are addressing different topics. But [1] also
> includes the patch for notnullattrs (I guess it's the same as one of the
> older versions from this thread), so it would be good to specify which
> one should be used to avoid any confusion.
>
> > > I'm not sure about this, but couldn't there still be
> > > some cases when a Var belongs to nullable_baserels, but still has some
> > > constraints preventing it from being nullable (e.g. a silly example when
> > > the not nullable column belongs to the table, and the query does a full
> > > join of this table on itself using this column)?
> > >
> > Do you mean something like "SELECT * FROM t1 left join t2 on t1.a = t2.a
> > WHERE t2.b = 3;"?  In this case, the outer join will be reduced to an inner
> > join at the reduce_outer_join stage, which means t2 will not be shown in
> > nullable_baserels.
>
> Nope, as I said it's a bit useless example of a full self join of t1 on
> itself. In this case the not null column "a" will be considered nullable,
> but following your description of is_var_nullable it's fine (although a
> couple of comments on this function are clearly necessary).
>
> > > Is this function necessary for the following patches? I've got an
> > > impression that the discussion in this thread was mostly evolving about
> > > correct description when notnullattrs could be used, not making it
> > > bullet proof.
> > >
> >
> > Exactly, that is the blocker issue right now. I hope more authorities can
> > give
> > some suggestions to move on.
>
> Hm...why essentially a documentation question is the blocker? Or if you
> mean it's a question of the patch scope, are there any arguments for
> extending it?
>

I treat the below comment as the blocker issue:

> It would be good to agree on the correct representation for Vars that
> cannot produce NULLs so that we don't shut the door on classes of
> optimisation that require something other than what you need for your
> case.

David/Tom/Ashutosh, do you mind sharing more insights on this?
I mean the target is to get the patch into a committable state.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Allow matching whole DN from a client certificate

2021-03-05 Thread Andrew Dunstan

On 3/4/21 2:16 PM, Joel Jacobson wrote:
> On Sat, Jan 30, 2021, at 22:18, Andrew Dunstan wrote:
>> ssl-match-client-cert-dn-v3.patch
>
> Spelling error of "conjunction":
> +   This option is probably best used in comjunction with a
> username map.
>
>



Yeah, fixed this, added a bit more docco, and got rid of some bitrot.
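
For anyone skimming, a minimal sketch of how the new option is meant to be
combined with a username map (all names and the DN below are made up):

# pg_hba.conf: match against the whole certificate DN rather than the CN
hostssl  all  all  0.0.0.0/0  cert  clientname=DN  map=dnmap

# pg_ident.conf: map an RFC 2253 DN (regular expression) to a database role
dnmap   /^CN=admin,OU=Eng,O=Example$   admin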


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index b420486a0a..61b646c20a 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -598,7 +598,7 @@ hostnogssenc  database  user
 
   
-   In addition to the method-specific options listed below, there is one
+   In addition to the method-specific options listed below, there is a
method-independent authentication option clientcert, which
can be specified in any hostssl record.
This option can be set to verify-ca or
@@ -612,6 +612,27 @@ hostnogssenc  database  userhostssl entries.
   
+  
+   On any record using client certificate authentication (i.e. one
+   using the cert authentication method or one
+   using the clientcert option), you can specify
+   which part of the client certificate credentials to match using
+   the clientname option. This option can have one
+   of two values. If you specify clientname=CN, which
+   is the default, the username is matched against the certificate's
+   Common Name (CN). If instead you specify
+   clientname=DN the username is matched against the
+   entire Distinguished Name (DN) of the certificate.
+   This option is probably best used in conjunction with a username map.
+   The comparison is done with the DN in RFC2253 format.
+   To see the DN of a client certificate in this format,
+   do
+
+openssl x509 -in myclient.crt -noout --subject -nameopt RFC2253 | sed "s/^subject=//"
+
+Care need to be taken when using this option, especially when using
+regular expression matching against the DN.
+  
  
 

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 994251e7d9..5b4bad4cff 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -2800,12 +2800,15 @@ static int
 CheckCertAuth(Port *port)
 {
 	int			status_check_usermap = STATUS_ERROR;
+	char	   *peer_username;
 
 	Assert(port->ssl);
 
 	/* Make sure we have received a username in the certificate */
-	if (port->peer_cn == NULL ||
-		strlen(port->peer_cn) <= 0)
+	peer_username = port->hba->clientcertname == clientCertCN ? port->peer_cn : port->peer_dn;
+
+	if (peer_username == NULL ||
+		strlen(peer_username) <= 0)
 	{
 		ereport(LOG,
 (errmsg("certificate authentication failed for user \"%s\": client certificate contains no user name",
@@ -2813,8 +2816,8 @@ CheckCertAuth(Port *port)
 		return STATUS_ERROR;
 	}
 
-	/* Just pass the certificate cn to the usermap check */
-	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, port->peer_cn, false);
+	/* Just pass the certificate cn/dn to the usermap check */
+	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, peer_username, false);
 	if (status_check_usermap != STATUS_OK)
 	{
 		/*
@@ -2825,7 +2828,7 @@ CheckCertAuth(Port *port)
 		if (port->hba->clientcert == clientCertFull && port->hba->auth_method != uaCert)
 		{
 			ereport(LOG,
-	(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN mismatch",
+	(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN/DN mismatch",
 			port->user_name)));
 		}
 	}
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 4c4f025eb1..d0184a2ce2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -543,22 +543,25 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name / Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *peer_cn;
+		char	   *peer_dn;
+		BIO		   *bio = NULL;
+		BUF_MEM*bio_buf = NULL;
 
-		len = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		NID_commonName, NULL, 0);
+		len = X509_NAME_get_text_by_NID(x509name, NID_commonName, NULL, 0);
 		if (len != -1)
 		{
-			char	   *peer_cn;
-
 			peer_cn = MemoryContextAlloc(TopMemoryContext, len + 1);
-			r = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		  NID_commonName, peer_cn, len + 1);
+			r = X509_NAME_get_text_by_NID(x509name, NID_commonName, peer_cn,
+		  len + 1);
 			peer_cn[len] = '\0';
 			if (r != len)
 			{
@@ -582,6 +585,36 @@ aloop:

Re: CI/windows docker vs "am a service" autodetection on windows

2021-03-05 Thread Andres Freund
Hi,

On 2021-03-05 10:57:52 -0800, Andres Freund wrote:
> On 2021-03-04 12:48:59 -0800, Andres Freund wrote:
> > On 2021-03-04 21:36:23 +0100, Magnus Hagander wrote:
> > > > I think that's a good answer for pg_ctl - not so sure about postgres
> > > > itself, at least once it's up and running. I don't know what lead to all
> > > > of this autodetection stuff, but is there the possibility of blocking on
> > > > whatever stderr is set too as a service?
> > > >
> > > > Perhaps we could make the service detection more reliable by checking
> > > > whether stderr is actually something useful?
> > > 
> > > So IIRC, and mind that this is like 15 years ago, there is something
> > > that looks like stderr, but the contents are thrown away. It probably
> > > exists specifically so that programs won't crash when run as a
> > > service...
> > 
> > Yea, that'd make sense.
> > 
> > I wish we had tests for the service stuff, but that's from long before
> > there were tap tests...
> 
> After fighting with a windows VM for a bit (ugh), it turns out that yes,
> there is stderr, but that fileno(stderr) returns -2, and
> GetStdHandle(STD_ERROR_HANDLE) returns NULL (not INVALID_HANDLE_VALUE).
> 
> The complexity however is that while that's true for pg_ctl within
> pgwin32_ServiceMain:
> checking what stderr=7FF8687DFCB0 is (handle: 0, fileno=-2)
> but not for postmaster or backends
> WARNING: 01000: checking what stderr=7FF880F5FCB0 is (handle: 92, 
> fileno=2)
> 
> which makes sense in a way, because we don't tell CreateProcessAsUser()
> that it should pass stdin/out/err down (which then seems to magically
> get access to the "topmost" console applications output - damn, this
> stuff is weird).

That part is not too hard to address - it seems we only need to do that
in pg_ctl pgwin32_doRunAsService(). It seems that the
stdin/stderr/stdout being set to invalid will then be passed down to
postmaster children.

https://docs.microsoft.com/en-us/windows/console/getstdhandle
"If an application does not have associated standard handles, such as a
service running on an interactive desktop, and has not redirected them,
the return value is NULL."

There does seem to be some difference between what services get as std*
- GetStdHandle() returns NULL, and what explicitly passing down invalid
handles to postmaster does - GetStdHandle() returns
INVALID_HANDLE_VALUE. But passing down NULL rather than
INVALID_HANDLE_VALUE to postmaster seems to lead to postmaster
re-opening console buffers.
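
To make that concrete, the distinction being relied on is roughly the
following (a sketch of the idea only, not the attached patch):

    #include <windows.h>
    #include <stdio.h>
    #include <stdbool.h>

    /*
     * Sketch: decide whether stderr is actually usable.  A real service gets
     * a NULL standard handle and an invalid fileno; a console process (or a
     * child that was handed explicit handles) does not.
     */
    static bool
    stderr_is_useful(void)
    {
        HANDLE  h = GetStdHandle(STD_ERROR_HANDLE);

        if (h == NULL || h == INVALID_HANDLE_VALUE)
            return false;
        if (fileno(stderr) < 0)
            return false;
        return true;
    }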

Patch attached.

I'd like to commit something to address this issue to master soon - it
allows us to run a lot more tests in cirrus-ci... But probably not
backpatch it [yet] - there've not really been field complaints, and who
knows if there end up being some unintentional side-effects...


> You'd earlier mentioned that other distributions may not use pg_ctl
> register - but I assume they use pg_ctl runservice? Or do they actually
> re-implement all those magic incantations in pg_ctl.c?

It seems that we, in addition to the above patch, should add a guc that
pg_ctl runservice passes down to postgres. And then rip out the call to
pgwin32_is_service() from the backend.  That doesn't require other
distributions to use pg_ctl register, just pg_ctl runservice - which I
think they need to do anyway, unless they want to duplicate all the
logic around pgwin32_SetServiceStatus()?

Greetings,

Andres Freund
>From 86fb1f3da2db85d513f491d88b638ea7cc1327b8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 2 Mar 2021 21:24:11 -0800
Subject: [PATCH v2] windows: Only consider us to be running as service if
 stderr is invalid.

Previously pgwin32_is_service() would falsely return true when
postgres is started from somewhere within a service, but not as a
service. That is e.g. always the case with windows docker containers,
which some CI services use to run windows tests in.

In addition to this change, it likely would be a good idea to have
pg_ctl runservice pass down a flag indicating that postgres is running
as a service.

Author: Andres Freund 
Discussion: https://postgr.es/m/20210305185752.3up5eq2eanb7o...@alap3.anarazel.de
---
 src/port/win32security.c | 13 +++--
 src/bin/pg_ctl/pg_ctl.c  | 33 +
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/src/port/win32security.c b/src/port/win32security.c
index 4a673fde19a..b57ce61d752 100644
--- a/src/port/win32security.c
+++ b/src/port/win32security.c
@@ -95,8 +95,9 @@ pgwin32_is_admin(void)
  * We consider ourselves running as a service if one of the following is
  * true:
  *
- * 1) We are running as LocalSystem (only used by services)
- * 2) Our token contains SECURITY_SERVICE_RID (automatically added to the
+ * 1) Standard error is not valid (always the case for services)
+ * 2) We are running as LocalSystem (only used by services)
+ * 3) Our token contains SECURITY_SERVICE_RID (automatically added to the
  *	  process token by the SCM when starting a 

Re: Nicer error when connecting to standby with hot_standby=off

2021-03-05 Thread Alvaro Herrera
On 2021-Mar-05, James Coleman wrote:

> Do you have any thoughts on what you'd like to see the message be? I
> could change the PM_RECOVERY (without hot standby enabled) to return
> CAC_RECOVERY which would give us the message "the database system is
> in recovery mode", but that would be a change from what that state
> returns now in a way that's unrelated to the goal of the patch.

Here's an idea:

* hot_standby=on, before reaching consistent state
  FATAL:  database is not accepting connections
  DETAIL:  Consistent state has not yet been reached.

* hot_standby=off, past consistent state
  FATAL:  database is not accepting connections
  DETAIL:  Hot standby mode is disabled.

* hot_standby=off, before reaching consistent state
  FATAL:  database is not accepting connections
  DETAIL:  Hot standby mode is disabled.
  or maybe
  DETAIL:  Consistent state has not yet been reached, and hot standby mode is 
disabled.

-- 
Álvaro Herrera   39°49'30"S 73°17'W
"Ed is the standard text editor."
  http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3




Re: Nicer error when connecting to standby with hot_standby=off

2021-03-05 Thread James Coleman
On Fri, Mar 5, 2021 at 12:36 PM Fujii Masao  wrote:
>
>
>
> On 2021/03/05 22:45, David Steele wrote:
> > Hi Fujii,
> >
> > On 9/8/20 1:17 PM, James Coleman wrote:
> >> On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
> >>  wrote:
> >>> Thanks for updating the patch! But it failed to be applied to the master 
> >>> branch
> >>> cleanly because of the recent commit 0038f94387. So could you update the 
> >>> patch
> >>> again?
> >>
> >> Updated patch attached.
> >
> > Any thoughts on the updated patch?
>
> Thanks for the ping!
>
> With the patch, if hot_standby is enabled, the message
> "the database system is starting up" is output while the server is
> in PM_RECOVERY state until it reaches the consistent recovery point.
> On the other hand, if hot_standby is not enabled, the message
> "the database system is up, but hot_standby is off" is output even
> while the server is in that same situation. That is, opposite
> messages can be output for the same situation based on the setting
> of hot_standby. One message is "system is starting up", the other
> is "system is up". Isn't this rather confusing?

Do you have any thoughts on what you'd like to see the message be? I
could change the PM_RECOVERY (without hot standby enabled) to return
CAC_RECOVERY which would give us the message "the database system is
in recovery mode", but that would be a change from what that state
returns now in a way that's unrelated to the goal of the patch.

Thanks,
James




Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing.  I think the following should not 
> error, but does:
> 
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR:  range lower bound must be less than or equal to range upper bound

Doh! How stupid of me. I realize now I had an off-by-one thinko in my 0001 patch 
using int4range.

I didn't use the raw "so" and "eo" values in regexp.c like I should have;
instead, I incorrectly used (so + 1) as the startpos,
and just eo as the endpos.

This is what caused all the problems.

The fix is simple:
-  lower.val = Int32GetDatum(so + 1);
+ lower.val = Int32GetDatum(so);

The example that gave the error now works properly:

SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
regexp_positions
--
{"[6,7)"}
(1 row)

I've also created a SQL PoC of the composite range type idea,
and convenience wrapper functions for int4range and int8range.

CREATE TYPE range AS (start int8, stop int8);

Helper functions:
range(start int8, stop int8) -> range
range(int8range) -> range
range(int4range) -> range
range(int8range[]) -> range[]
range(int4range[]) -> range[]

Demo:

regexp_positions() returns setof int4range[]:

SELECT r FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
   r
---
{"[3,7)","[6,12)"}
{"[11,17)","[16,21)"}
(2 rows)

Convert int4range[] -> range[]:

SELECT range(r) FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
 range
---
{"(3,6)","(6,11)"}
{"(11,16)","(16,20)"}
(2 rows)

"start" and "stop" fields:

SELECT (range(r[1])).* FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
 start | stop
-------+------
     3 |    6
    11 |   16
(2 rows)

zero-length match at beginning:

SELECT r FROM regexp_positions('','^','g') AS r;
 r
---
{"[0,1)"}
(1 row)

SELECT (range(r[1])).* FROM regexp_positions('','^','g') AS r;
 start | stop
-------+------
     0 |    0
(1 row)

My conclusion is that we should use setof int4range[] as the return value for 
regexp_positions().

New patch attached.

The composite range type and helper functions are of course not at all 
necessary, but I think they would be a nice addition, to make it easier to 
work with ranges as composite types. I intentionally didn't create anyrange 
versions of them, since they can only support composite types, which don't 
require the inclusive/exclusive semantics.

/Joel

range.sql
Description: Binary data


0003-regexp-positions.patch
Description: Binary data


Re: PROXY protocol support

2021-03-05 Thread Jacob Champion
On Fri, 2021-03-05 at 10:22 +0100, Magnus Hagander wrote:
> On Fri, Mar 5, 2021 at 12:21 AM Jacob Champion  wrote:
> > A small nitpick on the current separate-port PoC is that I'm forced to
> > set up a "regular" TCP port, even if I only want the PROXY behavior.
> 
> Yeah. I'm not sure there's a good way to avoid that without making
> configuations a lot more complex.

A generic solution would also solve the "I want to listen on more than
one port" problem, but that's probably not something to tackle at the
same time.

> > The original-host logging isn't working for me:
> > 
> > [...]
> 
> That's interesting -- it works perfectly fine here. What platform are
> you testing on?

Ubuntu 20.04.

> But yes, you are correct, it should do that. I guess it's a case of
> the salen actually ending up being uninitialized in the copy, and thus
> failing at a later stage.

That seems right; EAI_FAMILY can be returned for a mismatched addrlen.

> (I sent for sizeof(SockAddr) to make it
> easier to read without having to look things up, but the net result is
> the same)

Cool. Did you mean to attach a patch?

== More Notes ==

(Stop me if I'm digging too far into a proof of concept patch.)

> + proxyaddrlen = pg_ntoh16(proxyheader.len);
> +
> + if (proxyaddrlen > sizeof(proxyaddr))
> + {
> + ereport(COMMERROR,
> + (errcode(ERRCODE_PROTOCOL_VIOLATION),
> +  errmsg("oversized proxy packet")));
> + return STATUS_ERROR;
> + }

I think this is not quite right -- if there's additional data beyond
the IPv6 header size, that just means there are TLVs tacked onto the
header that we should ignore. (Or, eventually, use.)

Additionally, we need to check for underflow as well. A misbehaving
proxy might not send enough data to fill up the address block for the
address family in use.

> + /* If there is any more header data present, skip past it */
> + if (proxyaddrlen > sizeof(proxyaddr))
> + pq_discardbytes(proxyaddrlen - sizeof(proxyaddr));

This looks like dead code, given that we'll error out for the same
check above -- but once it's no longer dead code, the return value of
pq_discardbytes should be checked for EOF.
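
Putting those two points together, the length handling could look roughly like
this (a sketch only; proxyheader/proxyaddr and pq_discardbytes are taken from
the PoC patch, the ip6 field name is an assumption):

    /* size of the address block implied by the declared protocol/family */
    size_t      addrsize = (proxyheader.fam == 0x11) ?
        sizeof(proxyaddr.ip4) : sizeof(proxyaddr.ip6);

    if (proxyaddrlen < addrsize)
    {
        /* underflow: the proxy sent less than a full address block */
        ereport(COMMERROR,
                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                 errmsg("incomplete proxy packet")));
        return STATUS_ERROR;
    }

    /* anything beyond the address block is TLV data; skip it, don't reject it */
    if (proxyaddrlen > addrsize &&
        pq_discardbytes(proxyaddrlen - addrsize) == EOF)
        return STATUS_ERROR;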

> + else if (proxyheader.fam == 0x11)
> + {
> + /* TCPv4 */
> + port->raddr.addr.ss_family = AF_INET;
> + port->raddr.salen = sizeof(struct sockaddr_in);
> + ((struct sockaddr_in *) &port->raddr.addr)->sin_addr.s_addr = 
> proxyaddr.ip4.src_addr;
> + ((struct sockaddr_in *) &port->raddr.addr)->sin_port = 
> proxyaddr.ip4.src_port;
> + }

I'm trying to reason through the fallout of setting raddr and not
laddr. I understand why we're not setting laddr -- several places in
the code rely on the laddr to actually refer to a machine-local address
-- but the fact that there is no actual connection from raddr to laddr
could cause shenanigans. For example, the ident auth protocol will just
break (and it might be nice to explicitly disable it for PROXY
connections). Are there any other situations where a "faked" raddr
could throw off Postgres internals?

--Jacob


Re: CI/windows docker vs "am a service" autodetection on windows

2021-03-05 Thread Andres Freund
Hi,

On 2021-03-04 12:48:59 -0800, Andres Freund wrote:
> On 2021-03-04 21:36:23 +0100, Magnus Hagander wrote:
> > > I think that's a good answer for pg_ctl - not so sure about postgres
> > > itself, at least once it's up and running. I don't know what lead to all
> > > of this autodetection stuff, but is there the possibility of blocking on
> > > whatever stderr is set too as a service?
> > >
> > > Perhaps we could make the service detection more reliable by checking
> > > whether stderr is actually something useful?
> > 
> > So IIRC, and mind that this is like 15 years ago, there is something
> > that looks like stderr, but the contents are thrown away. It probably
> > exists specifically so that programs won't crash when run as a
> > service...
> 
> Yea, that'd make sense.
> 
> I wish we had tests for the service stuff, but that's from long before
> there were tap tests...

After fighting with a windows VM for a bit (ugh), it turns out that yes,
there is stderr, but that fileno(stderr) returns -2, and
GetStdHandle(STD_ERROR_HANDLE) returns NULL (not INVALID_HANDLE_VALUE).

The complexity however is that while that's true for pg_ctl within
pgwin32_ServiceMain:
checking what stderr=7FF8687DFCB0 is (handle: 0, fileno=-2)
but not for postmaster or backends
WARNING: 01000: checking what stderr=7FF880F5FCB0 is (handle: 92, fileno=2)

which makes sense in a way, because we don't tell CreateProcessAsUser()
that it should pass stdin/out/err down (which then seems to magically
get access to the "topmost" console applications output - damn, this
stuff is weird).

You'd earlier mentioned that other distributions may not use pg_ctl
register - but I assume they use pg_ctl runservice? Or do they actually
re-implement all those magic incantations in pg_ctl.c?

Greetings,

Andres Freund




Re: We should stop telling users to "vacuum that database in single-user mode"

2021-03-05 Thread Alvaro Herrera
On 2021-Mar-02, David Rowley wrote:

> However, I wonder if it's worth going a few steps further to try and
> reduce the chances of that message being seen in the first place.
> Maybe it's worth considering ditching any (auto)vacuum cost limits for
> any table which is within X transaction from wrapping around.
> Likewise for "VACUUM;" when the database's datfrozenxid is getting
> dangerously high.

Yeah, I like this kind of approach, and I think one change we can do
that can have a very large effect is to disable index cleanup when in
emergency situations.  That way, the XID limit is advanced as much as
possible with as little effort as possible; once the system is back in
normal conditions, indexes can be cleaned up at a leisurely pace.
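
For reference, a manual version of that already exists as a VACUUM option,
which gives an idea of the intended effect (table name made up):

-- advance relfrozenxid as cheaply as possible; postpone index cleanup
VACUUM (FREEZE, INDEX_CLEANUP FALSE, VERBOSE) my_bloated_table;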

-- 
Álvaro Herrera   Valdivia, Chile
"El destino baraja y nosotros jugamos" (A. Schopenhauer)




Re: Nicer error when connecting to standby with hot_standby=off

2021-03-05 Thread Fujii Masao




On 2021/03/05 22:45, David Steele wrote:

Hi Fujii,

On 9/8/20 1:17 PM, James Coleman wrote:

On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
 wrote:

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?


Updated patch attached.


Any thoughts on the updated patch?


Thanks for the ping!

With the patch, if hot_standby is enabled, the message
"the database system is starting up" is output while the server is
in PM_RECOVERY state until it reaches the consistent recovery point.
On the other hand, if hot_standby is not enabled, the message
"the database system is up, but hot_standby is off" is output even
while the server is in that same situation. That is, opposite
messages can be output for the same situation based on the setting
of hot_standby. One message is "system is starting up", the other
is "system is up". Isn't this rather confusing?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PoC] Non-volatile WAL buffer

2021-03-05 Thread Tomas Vondra
Hello Takashi-san,

On 3/5/21 9:08 AM, Takashi Menjo wrote:
> Hi Tomas,
> 
> Thank you so much for your report. I have read it with great interest.
> 
> Your conclusion sounds reasonable to me. My patchset you call "NTT /
> segments" got as good performance as "NTT / buffer" patchset. I have
> been worried that calling mmap/munmap for each WAL segment file could
> have a lot of overhead. Based on your performance tests, however, the
> overhead looks less than what I thought. In addition, "NTT / segments"
> patchset is more compatible to the current PG and more friendly to
> DBAs because that patchset uses WAL segment files and does not
> introduce any other new WAL-related file.
> 

I agree. I was actually a bit surprised it performs this well, mostly in
line with the "NTT / buffer" patchset. I've seen significant issues with
our simple experimental patches, which however went away with larger WAL
segments. But the "NTT / segments" patch does not have that issue, so
either our patches were doing something wrong, or perhaps there was some
other issue (not sure why larger WAL segments would improve that).

Do these results match your benchmarks? Or are you seeing significantly
different behavior?

Do you have any thoughts regarding the impact of full-page writes? So
far all the benchmarks we did focused on small OLTP transactions on data
sets that fit into RAM. The assumption was that that's the workload that
would benefit from this, but maybe that's missing something important
about workloads producing much larger WAL records? Say, workloads
working with large BLOBs, bulk loads etc.

The other question is whether simply placing WAL on DAX (without any
code changes) is safe. If it's not, then all the "speedups" are computed
with respect to an unsafe configuration and so are useless. And BTT should
be used instead, which would of course produce very different results.

> I also think that supporting both file I/O and mmap is better than
> supporting only mmap. I will continue my work on "NTT / segments"
> patchset to support both ways.
> 

+1

> In the following, I will answer "Issues & Questions" you reported.
> 
> 
>> While testing the "NTT / segments" patch, I repeatedly managed to crash the 
>> cluster with errors like this:
>>
>> 2021-02-28 00:07:21.221 PST client backend [3737139] WARNING:  creating 
>> logfile segment just before
>> mapping; path "pg_wal/00010007002F"
>> 2021-02-28 00:07:21.670 PST client backend [3737142] WARNING:  creating 
>> logfile segment just before
>> mapping; path "pg_wal/000100070030"
>> ...
>> 2021-02-28 00:07:21.698 PST client backend [3737145] WARNING:  creating 
>> logfile segment just before
>> mapping; path "pg_wal/000100070030"
>> 2021-02-28 00:07:21.698 PST client backend [3737130] PANIC:  could not open 
>> file
>> "pg_wal/000100070030": No such file or directory
>>
>> I do believe this is a thinko in the 0008 patch, which does XLogFileInit in 
>> XLogFileMap. Notice there are multiple
>> "creating logfile" messages with the ..0030 segment, followed by the 
>> failure. AFAICS the XLogFileMap may be
>> called from multiple backends, so they may call XLogFileInit concurrently, 
>> likely triggering some sort of race
>> condition. It's fairly rare issue, though - I've only seen it twice from ~20 
>> runs.
> 
> Thank you for your report. I found that rather the patch 0009 has an
> issue, and that will also cause WAL loss. I should have set
> use_existent to true, or InstallXlogFileSegment and BasicOpenFile in
> XLogFileInit can be racy. I have misunderstood that use_existent can
> be true because I am creating a brand-new file with XLogFileInit.
> 
> I will fix the issue.
> 

OK, thanks for looking into this.

> 
>> The other question I have is about WALInsertLockUpdateInsertingAt. 0003 
>> removes this function, but leaves
>> behind some of the other bits working with insert locks and insertingAt. But 
>> it does not explain how it works without
>> WaitXLogInsertionsToFinish() - how does it ensure that when we commit 
>> something, all the preceding WAL is
>> "complete" (i.e. written by other backends etc.)?
> 
> To wait for *all* the WALInsertLocks to be released, no matter whether each of
> them precedes or follows the current insertion.
> 
> It would have worked functionally, but I rethink it is not good for
> performance because XLogFileMap in GetXLogBuffer (where
> WaitXLogInsertionsToFinish is removed) can block because it can
> eventually call write() in XLogFileInit.
> 
> I will restore the WALInsertLockUpdateInsertingAt function and related
> code for mmap.
> 

OK. I'm still not entirely sure I understand if the current version is
correct, but I'll wait for the reworked version.


kind regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: n_mod_since_analyze isn't reset at table truncation

2021-03-05 Thread Julien Rouhaud
On Fri, Mar 05, 2021 at 10:43:51PM +0900, Masahiko Sawada wrote:
> 
> I think we can use n_live_tup for that but since it's an estimation
> value it doesn't necessarily have the same result as DELETE and I'm
> not sure it's reliable.

I agree that it's not 100% reliable, but in my experience those estimates are
quite good and of the same order of magnitude, which should be enough for this
use case.  It will in any case be better than simply keeping the old value, and
I doubt that we can do better anyway.




Re: Allow an alias to be attached directly to a JOIN ... USING

2021-03-05 Thread David Steele

On 11/14/20 3:49 AM, Peter Eisentraut wrote:

On 2020-11-10 16:15, Georgios Kokolatos wrote:

I noticed that this patch fails on the cfbot.
For this, I changed the status to: 'Waiting on Author'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author


Here is a rebased and lightly retouched patch.


There don't seem to be any objections to just documenting the slight 
divergence from the spec.


So, does it make sense to just document that and proceed?

Regards,
--
-David
da...@pgmasters.net




RE: [POC] Fast COPY FROM command for the table with foreign partitions

2021-03-05 Thread tsunakawa.ta...@fujitsu.com
From: Justin Pryzby 
> I think this change to the regression tests is suspicous:
> 
> -CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES
> ($1, $2)
> -COPY rem2, line 1: "-1 xyzzy"
> +CONTEXT:  COPY loc2, line 1: "-1   xyzzy"
> +remote SQL command: COPY public.loc2(f1, f2) FROM STDIN
> +COPY rem2, line 2
> 
> I think it shouldn't say "COPY rem2, line 2" but rather a remote version of 
> the
> same:
> |COPY loc2, line 1: "-1   xyzzy"

No, the output is OK.  The remote message is included as the first line of the 
CONTEXT message field.  The last line of the CONTEXT field is something that 
was added by the local COPY command.  (Anyway, useful enough information is 
included in the message -- the constraint violation and the data that caused 
it.)


> I have rebased this on my side over yesterday's libpq changes - I'll send it 
> if
> you want, but it's probably just as easy if you do it.

I've managed to rebased it, although it took unexpectedly long.  The patch is 
attached.  It passes make check against core and postgres_fdw.  I'll turn the 
CF status back to ready for committer shortly.



Regards
Takayuki Tsunakawa



v19-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch
Description:  v19-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch


Re: Disallow SSL compression?

2021-03-05 Thread Magnus Hagander
On Fri, Mar 5, 2021 at 1:37 PM Michael Paquier  wrote:
>
> On Fri, Mar 05, 2021 at 01:21:01PM +0100, Daniel Gustafsson wrote:
> > On 5 Mar 2021, at 08:04, Michael Paquier  wrote:
> >> FWIW, I would vote to nuke it from all those places, reducing a bit
> >> pg_stat_get_activity() while on it.  Keeping it around in the system
> >> catalogs may cause confusion IMHO, by making people think that it is
> >> still possible to get into configurations where sslcompression could
> >> be really enabled.  The rest of the patch looks fine to me.
> >
> > Attached is a version which removes that as well.
>
> Peter, Magnus, any comments about this point?

We've broken stats views before. While it'd be nice if we could group
multiple breakages at the same time, I don't think it's that
important. Better to get rid of it once and for all from as many
places as possible.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Tom Lane
Kohei KaiGai  writes:
> Thanks, the attached patch add cube--1.5 for binary send/recv functions and
> modification of cube type using the new ALTER TYPE.

Hm, that was already superseded by events (112d411fb).
As long as we get this done for v14, we can just treat it
as an add-on to cube 1.5, so here's a quick rebase onto HEAD.

Scanning the code, I have a couple of gripes.  I'm not sure it's
a good plan to just send the "header" field raw like that ---
would breaking it into a dimension field and a point bool be
better?  In any case, the receive function has to be more careful
than this about accepting only valid header values.

Also, I don't think "offsetof(NDBOX, x[nitems])" is per project
style.  It at least used to be true that MSVC couldn't cope
with that, so we prefer

offsetof(NDBOX, x) + nitems * sizeof(whatever)

regards, tom lane

diff --git a/contrib/cube/cube--1.4--1.5.sql b/contrib/cube/cube--1.4--1.5.sql
index 54492e5d18..4b5bf8d205 100644
--- a/contrib/cube/cube--1.4--1.5.sql
+++ b/contrib/cube/cube--1.4--1.5.sql
@@ -6,3 +6,16 @@
 -- Remove @ and ~
 DROP OPERATOR @ (cube, cube);
 DROP OPERATOR ~ (cube, cube);
+
+-- Add binary input/output handlers
+CREATE FUNCTION cube_recv(internal)
+RETURNS cube
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION cube_send(cube)
+RETURNS bytea
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+ALTER TYPE cube SET ( RECEIVE = cube_recv, SEND = cube_send );
diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 6f810b26c5..d6b0dd75b0 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -13,6 +13,7 @@
 #include "access/gist.h"
 #include "access/stratnum.h"
 #include "cubedata.h"
+#include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
 
@@ -31,6 +32,8 @@ PG_FUNCTION_INFO_V1(cube_in);
 PG_FUNCTION_INFO_V1(cube_a_f8_f8);
 PG_FUNCTION_INFO_V1(cube_a_f8);
 PG_FUNCTION_INFO_V1(cube_out);
+PG_FUNCTION_INFO_V1(cube_send);
+PG_FUNCTION_INFO_V1(cube_recv);
 PG_FUNCTION_INFO_V1(cube_f8);
 PG_FUNCTION_INFO_V1(cube_f8_f8);
 PG_FUNCTION_INFO_V1(cube_c_f8);
@@ -319,6 +322,55 @@ cube_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(buf.data);
 }
 
+/*
+ * cube_send - a binary output handler for cube type
+ */
+Datum
+cube_send(PG_FUNCTION_ARGS)
+{
+	NDBOX	   *cube = PG_GETARG_NDBOX_P(0);
+	StringInfoData buf;
+	int32		i, nitems = DIM(cube);
+
+	pq_begintypsend(&buf);
+	pq_sendint32(&buf, cube->header);
+	for (i=0; i < nitems; i++)
+	{
+		pq_sendfloat8(&buf, LL_COORD(cube, i));
+	}
+	if (!cube_is_point_internal(cube))
+	{
+		for (i=0; i < nitems; i++)
+		{
+			pq_sendfloat8(&buf, UR_COORD(cube, i));
+		}
+	}
+	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
+}
+
+/*
+ * cube_recv - a binary input handler for cube type
+ */
+Datum
+cube_recv(PG_FUNCTION_ARGS)
+{
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
+	int32		header;
+	int			i, nitems;
+	NDBOX	   *cube;
+
+	header = pq_getmsgint(buf, sizeof(int32));
+	nitems = (header & DIM_MASK);
+	if ((header & POINT_BIT) == 0)
+		nitems += nitems;
+	cube = palloc(offsetof(NDBOX, x[nitems]));
+	SET_VARSIZE(cube, offsetof(NDBOX, x[nitems]));
+	cube->header = header;
+	for (i=0; i < nitems; i++)
+		cube->x[i] = pq_getmsgfloat8(buf);
+
+	PG_RETURN_NDBOX_P(cube);
+}
 
 /*
  *		   GiST functions


Re: How to retain lesser paths at add_path()?

2021-03-05 Thread David Steele

On 11/5/20 10:41 AM, Dmitry Dolgov wrote:

On Tue, Jan 14, 2020 at 12:46:02AM +0900, Kohei KaiGai wrote:
The v2 patch is attached.


Thanks for the patch! I had a quick look at it and have a few questions:

KaiGai, any thoughts on Dmitry's questions?

Regards,
--
-David
da...@pgmasters.net




Re: pgbench - add pseudo-random permutation function

2021-03-05 Thread Alvaro Herrera
Hi David

On 2021-Mar-05, David Steele wrote:

> On 4/2/20 3:01 AM, Alvaro Herrera wrote:
> > On 2020-Apr-02, Fabien COELHO wrote:
> > 
> > > > I'm planning to mark this patch RwF on April 8 and I suggest you
> > > > resubmit if you are able to get some consensus.
> > > 
> > > People interested in non-uniform benchmarks would see the point. Why many
> > > people would be happy with uniform benchmarks only while life is not 
> > > uniform
> > > at all fails me.
> > 
> > I don't think we should boot this patch.  I don't think I would be able
> > to get this over the commit line in this CF, but let's not discard it.
> 
> As far as I can see you are the only committer who has shown real interest
> in this patch. It's been sitting idle for the last year.
> 
> What are your current thoughts?

Thanks for prodding.  I still think it's a useful feature.  However I
don't think I'll have time to get it done in the current commitfest.
I suggest to let it sit in the commitfest to see if somebody else will
pick it up -- and if not, we move it to the next one, with apologies to
author and reviewers.

I may have time to become familiar or at least semi-comfortable with all
that weird math in it by then.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)




Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Kohei KaiGai
Thanks, the attached patch add cube--1.5 for binary send/recv functions and
modification of cube type using the new ALTER TYPE.

Best regards,

2021年3月4日(木) 0:45 Tom Lane :
>
> I wrote:
> > You can add the I/O functions with ALTER TYPE nowadays.
>
> To be concrete, see 949a9f043eb70a4986041b47513579f9a13d6a33
>
> regards, tom lane



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-cube-binary-inout-handler.v2.patch
Description: Binary data


Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Tom Lane
Kyotaro Horiguchi  writes:
> So I think we need to remove the shared_buffers setting for the
> allows_streamig case in PostgresNode.pm

That would have uncertain effects on other TAP tests, so I'm disinclined
to do it that way.  011_crash_recovery.pl doesn't actually use a standby
server, so just removing its use of the allows_streaming option seems
sufficient.

But, of course, first we need a fix for the bug we now know exists.
Was anyone volunteering to make the patch?

regards, tom lane




Re: [PATCH] remove deprecated v8.2 containment operators

2021-03-05 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Mar 04, 2021 at 08:58:39PM -0500, Tom Lane wrote:
>> I'm confused by why this patch is only dropping the operators'
>> opclass-membership links.  Don't we want to actually DROP OPERATOR
>> too?

> Okay

Pushed.  Since hstore already had a new-in-v14 edition, I just added
the commands to hstore--1.7--1.8.sql rather than make another update
script.  (Also, you forgot to drop ~ in that one?)

> Also , I think it's unrelated to this patch, but shouldn't these be removed ?

Right, done.

regards, tom lane




Re: problem with RETURNING and update row movement

2021-03-05 Thread David Steele

On 10/30/20 6:00 AM, Amit Langote wrote:

The question
I guess is whether that thread must conclude before the fix here can
be committed.


Indeed. But it seems like there is a candidate patch on [1] though that 
thread has also stalled. I'll try to get some status on that thread but 
the question remains if this patch will be stalled until [1] is resolved.


Regards,
--
-David
da...@pgmasters.net

[1] 
https://www.postgresql.org/message-id/flat/141051591267657%40mail.yandex.ru





Re: shared-memory based stats collector

2021-03-05 Thread Fujii Masao




On 2021/03/05 17:18, Kyotaro Horiguchi wrote:

At Thu, 21 Jan 2021 12:03:48 +0900 (JST), Kyotaro Horiguchi 
 wrote in

Commit 960869da08 (database statistics) conflicted with this. Rebased.

I'm concerned about the behavior that pgstat_update_connstats calls
GetCurrentTimestamp() every time stats update happens (with intervals
of 10s-60s in this patch). But I didn't change that design since that
happens with about 0.5s intervals in master and the rate is largely
reduced in this patch, to make this patch simpler.


I stepped on my foot, and another commit conflicted. Just rebased.


Thanks for rebasing the patches!

I think that the 0003 patch is self-contained and useful; for example, it
enables us to monitor the archiver process in pg_stat_activity. So IMO
it's worth pushing the 0003 patch first.

Here are the review comments for 0003 patch.

+   /* Archiver process's latch */
+   Latch  *archiverLatch;
+   /* Current shared estimate of appropriate spins_per_delay value */

The last line in the above seems not necessary.

In proc.h, NUM_AUXILIARY_PROCS needs to be incremented.

/* --
 * Functions called from postmaster
 * --
 */
extern int  pgarch_start(void);

In pgarch.h, the above is not necessary.

+extern void XLogArchiveWakeup(void);

This seems no longer necessary.

+extern void XLogArchiveWakeupStart(void);
+extern void XLogArchiveWakeupEnd(void);
+extern void XLogArchiveWakeup(void);

These seem also no longer necessary.

PgArchPID = 0;
if (!EXIT_STATUS_0(exitstatus))
-   LogChildExit(LOG, _("archiver process"),
-pid, exitstatus);
-   if (PgArchStartupAllowed())
-   PgArchPID = pgarch_start();
+   HandleChildCrash(pid, exitstatus,
+_("archiver 
process"));

I don't think that we should treat a non-zero exit condition as a crash;
we should keep the previous behavior. Otherwise, when archive_command fails
on a signal, the archiver emits a FATAL error, which leads to a server restart.

- * walwriter, autovacuum, or background worker.
+ * walwriter, autovacuum, archiver or background worker.
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
@@ -3609,6 +3606,18 @@ HandleChildCrash(int pid, int exitstatus, const char 
*procname)
signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
}
 
+	/* Take care of the archiver too */

+   if (pid == PgArchPID)
+   PgArchPID = 0;
+   else if (PgArchPID != 0 && take_action)
+   {
+   ereport(DEBUG2,
+   (errmsg_internal("sending %s to process %d",
+(SendStop ? "SIGSTOP" : 
"SIGQUIT"),
+(int) 
PgArchPID)));
+   signal_child(PgArchPID, (SendStop ? SIGSTOP : SIGQUIT));
+   }
+

Same as above.


In xlogarchive.c, "#include "storage/pmsignal.h"" is no longer necessary.


pgarch_forkexec() should be removed from pgarch.c because it's no longer used.


/* 
 * Public functions called from postmaster follow
 * 
 */

The definition of PgArchiverMain() should be placed just
after the above comment.


exit(0) in PgArchiverMain() should be proc_exit(0)?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Andreas Karlsson

On 3/4/21 4:40 PM, Tom Lane wrote:

I wonder if a 2-D integer array wouldn't be a better idea,
ie {{startpos1,length1},{startpos2,length2},...}.  My experience
with working with parallel arrays in SQL has been unpleasant.


Hm, I can see your point but on the other hand I can't say my 
experiences working with 2-D arrays have been that pleasant either. The 
main issue is that there is no simple way to unnest just one dimension 
of the array. Maybe it would be worth considering implementing a 
function for that.


As far as I know to unnest just one dimension you would need to use 
generate_series() or something like the query below. Please correct me 
if I am wrong and there is some more ergonomic way to do it.


WITH d (a) AS (SELECT '{{2,3},{4,5}}'::int[])
SELECT array_agg(unnest) FROM d, unnest(a) WITH ORDINALITY GROUP BY 
(ordinality - 1) / array_length(a, 2);
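
A sketch of such a helper function, built on the same trick as the query
above (the name and exact signature are hypothetical; it assumes a 2-D
array with default lower bounds):

    CREATE FUNCTION unnest_2d_1d(a anyarray)
    RETURNS SETOF anyarray
    LANGUAGE sql IMMUTABLE STRICT
    AS $$
      -- Unnest all elements with their ordinal position, then re-aggregate
      -- them into one 1-D array per row of the original 2-D array.
      SELECT array_agg(v ORDER BY ord)
      FROM unnest(a) WITH ORDINALITY AS u(v, ord)
      GROUP BY (ord - 1) / array_length(a, 2)
      ORDER BY min(ord)
    $$;

    -- SELECT * FROM unnest_2d_1d('{{2,3},{4,5}}'::int[]);  -- {2,3}, then {4,5}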


Andreas




Re: Which PG version does CVE-2021-20229 affected?

2021-03-05 Thread Tom Lane
Michael Banck  writes:
> On Fri, Mar 05, 2021 at 04:38:17PM +0900, Michael Paquier wrote:
>> This link includes incorrect information.  CVE-2021-20229 is only a
>> problem in 13.0 and 13.1, fixed in 13.2.  Please see for example here:
>> https://www.postgresql.org/support/security/

> Probably because the referenced Red Hat bugzilla bug claims it's
> affecting all back branches and they scrape that info from there:

> https://bugzilla.redhat.com/show_bug.cgi?id=1925296

Indeed.  Must have been some internal miscommunication in Red Hat,
because we certainly gave them the right info when we filed for the
CVE number.  I've commented on that BZ entry, hopefully that'll be
enough to get them to update things.

regards, tom lane




Re: PoC/WIP: Extended statistics on expressions

2021-03-05 Thread Dean Rasheed
On Thu, 4 Mar 2021 at 22:16, Tomas Vondra  wrote:
>
> Attached is a slightly improved version of the patch series, addressing
> most of the issues raised in the previous message.

Cool. Sorry for the delay replying.

> 0003-Extended-statistics-on-expressions-20210304.patch
>
> Mostly unchanged, The one improvement is removing some duplicate code in
> in mvc.c.
>
> 0004-WIP-rework-tracking-of-expressions-20210304.patch
>
> This is mostly unchanged of the patch reworking how we assign artificial
> attnums to expressions (negative instead of (MaxHeapAttributeNumber+i)).

Looks good.

I see you undid the change to get_relation_statistics() in plancat.c,
which offset the attnums of plain attributes in the StatisticExtInfo
struct. I was going to suggest that as a simplification to the
previous 0004 patch. Related to that, is this comment in
dependencies_clauselist_selectivity():

/*
 * Count matching attributes - we have to undo two attnum offsets.
 * First, the dependency is offset using the number of expressions
 * for that statistics, and then (if it's a plain attribute) we
 * need to apply the same offset as above, by unique_exprs_cnt.
 */

which needs updating, since there is now just one attnum offset, not
two. Only the unique_exprs_cnt offset is relevant now.

Also, related to that change, I don't think that
stat_covers_attributes() is needed anymore. I think that the code that
calls it can now just be reverted back to using bms_is_subset(), since
that bitmapset holds plain attributes that aren't offset.

> 0005-WIP-unify-handling-of-attributes-and-expres-20210304.patch
>
> This reworks how we build statistics on attributes and expressions.
> Instead of treating attributes and expressions separately, this  allows
> handling them uniformly.
>
> Until now, the various "build" functions (for different statistics
> kinds) extracted attribute values from sampled tuples, but expressions
> were pre-calculated in a separate array. Firstly to save CPU time (not
> having to evaluate expensive expressions repeatedly) and to keep the
> different stats consistent (there might be volatile functions etc.).
>
> So the build functions had to look at the attnum, determine if it's
> attribute or expression, and in some cases it was tricky / easy to get
> wrong.
>
> This patch replaces this "split" view with a simple "consistent"
> representation merging values from attributes and expressions, and just
> passes that to the build functions. There's no need to check the attnum,
> and handle expressions in some special way, so the build functions are
> much simpler / easier to understand (at least I think so).
>
> The build data is represented by "StatsBuildData" struct - not sure if
> there's a better name.
>
> I'm mostly happy with how this turned out. I'm sure there's a bit more
> cleanup needed (e.g. the merging/remapping of dependencies needs some
> refactoring, I think) but overall this seems reasonable.

Agreed. That's a nice improvement.

I wonder if dependency_is_compatible_expression() can be merged with
dependency_is_compatible_clause() to reduce code duplication. It
probably also ought to be possible to support "Expr IN Array" there,
in a similar way to the other code in statext_is_compatible_clause().
Also, should this check rinfo->clause_relids against the passed-in
relid to rule out clauses referencing other relations, in the same way
that statext_is_compatible_clause() does?

> I did some performance testing, I don't think there's any measurable
> performance degradation. I'm actually wondering if we need to transform
> the AttrNumber arrays into bitmaps in various places - maybe we should
> just do a plain linear search. We don't really expect many elements, as
> each statistics has 8 attnums at most. So maybe building the bitmapsets
> is a net loss? The one exception might be functional dependencies, where
> we can "merge" multiple statistics together. But even then it'd require
> many statistics objects to make a difference.

Possibly. There's a danger in trying to change too much at once
though. As it stands, I think it's fairly close to being committable,
with just a little more tidying up.
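
For readers who have not followed the whole thread, the kind of object the
patch series enables looks roughly like this (the names are made up, and the
syntax as finally committed may differ from the in-review patches):

    -- Extended statistics on expressions, not just plain columns:
    CREATE STATISTICS s_ext (ndistinct, dependencies, mcv)
        ON (lower(a)), (b + 1), c FROM some_table;
    ANALYZE some_table;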

Regards,
Dean




Re: DROP INDEX CONCURRENTLY on partitioned index

2021-03-05 Thread David Steele

Hi Michael,

On 10/27/20 8:44 PM, Justin Pryzby wrote:



3. I have a patch which changes index_drop() to "expand" a partitioned index 
into
its list of children.  Each of these becomes a List:
| indexId, heapId, userIndexRelation, userHeapRelation, heaplocktag, heaprelid, 
indexrelid
The same process is followed as for a single index, but handling all partitions
at once in two transactions total.  Arguably, this is bad since that function
currently takes a single Oid but would now ends up operating on a list of 
indexes.


This is what's implemented in the attached.  It's very possible I've missed
opportunities for better simplification/integration.


It appears there are still some issues to be resolved with this patch, 
but the next step seems to be for you to have a look at Justin's most 
recent patch.


Regards,
--
-David
da...@pgmasters.net




Re: pgbench - add pseudo-random permutation function

2021-03-05 Thread David Steele

Hi Álvaro,

On 4/2/20 3:01 AM, Alvaro Herrera wrote:

On 2020-Apr-02, Fabien COELHO wrote:


I'm planning to mark this patch RwF on April 8 and I suggest you
resubmit if you are able to get some consensus.


People interested in non-uniform benchmarks would see the point. Why many
people would be happy with uniform benchmarks only while life is not uniform
at all fails me.


I don't think we should boot this patch.  I don't think I would be able
to get this over the commit line in this CF, but let's not discard it.


As far as I can see you are the only committer who has shown real 
interest in this patch. It's been sitting idle for the last year.


What are your current thoughts?

Regards,
--
-David
da...@pgmasters.net




Re: A reloption for partitioned tables - parallel_workers

2021-03-05 Thread Laurenz Albe
On Fri, 2021-03-05 at 22:55 +0900, Amit Langote wrote:
> On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe  wrote:
> > On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> > > For example, with the attached PoC patch:
> > 
> > I have incorporated your POC patch and added a regression test.
> > 
> > I didn't test it thoroughly though.
> 
> Thanks.  Although, I wonder if we should rather consider it a
> standalone patch to fix a partition planning code deficiency.

Oh - I didn't realize that your patch was independent.

Yours,
Laurenz Albe





Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2021-03-05 Thread David Steele

Hi Georgios,

On 9/13/20 12:04 PM, Nikolay Shaplov wrote:

In a message of Monday, 20 July 2020 18:36:44 MSK, Georgios Kokolatos wrote:


thank you for the patch. It applies cleanly, compiles and passes check,
check-world.


Thank you for reviewing efforts.





Please find new version of the patch in the attachment.


Any thoughts on the updated patch?

Regards,
--
-David
da...@pgmasters.net




Re: A reloption for partitioned tables - parallel_workers

2021-03-05 Thread Amit Langote
On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe  wrote:
> On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> > For example, with the attached PoC patch:
>
> I have incorporated your POC patch and added a regression test.
>
> I didn't test it thoroughly though.

Thanks.  Although, I wonder if we should rather consider it a
standalone patch to fix a partition planning code deficiency.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: PROXY protocol support

2021-03-05 Thread Álvaro Hernández



On 5/3/21 10:03, Magnus Hagander wrote:
> On Fri, Mar 5, 2021 at 1:33 AM Álvaro Hernández  wrote:
>>
>>
>> On 5/3/21 0:21, Jacob Champion wrote:
>>> On Thu, 2021-03-04 at 21:45 +0100, Magnus Hagander wrote:
 On Thu, Mar 4, 2021 at 9:07 PM Jacob Champion  wrote:
> Idle thought I had while setting up a local test rig: Are there any
> compelling cases for allowing PROXY packets to arrive over Unix
> sockets? (By which I mean, the proxy is running on the same machine as
> Postgres, and connects to it using the .s.PGSQL socket file instead of
> TCP.) Are there cases where you want some other software to interact
> with the TCP stack instead of Postgres, but it'd still be nice to have
> the original connection information available?
 I'm uncertain what that usecase would be for something like haproxy,
 tbh. It can't do connection pooling, so adding it on the same machine
 as postgres itself wouldn't really add anything, I think?
>>> Yeah, I wasn't thinking HAproxy so much as some unspecified software
>>> appliance that's performing Some Task before allowing a TCP client to
>>> speak to Postgres. But it'd be better to hear from someone that has an
>>> actual use case, instead of me spitballing.
>> Here's a use case: Envoy's Postgres filter (see [1], [2]). Right now
>> is able to capture protocol-level metrics and send them to a metrics
>> collector (eg. Prometheus) while proxying the traffic. More capabilities
>> are being added as of today, and will eventually manage HBA too. It
>> would greatly benefit from this proposal, since it proxies the traffic
>> with, obviously, its IP, not the client's. It may be used (we do)
>> locally fronting Postgres, via UDS (so it can be easily trusted).
> Yeah, Envoy is definitely a great example of a usecase for the proxy
> protocol in general.

    Actually Envoy already implements the Proxy protocol:
https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/proxy_protocol.html
But I believe it would need some further cooperation with the Postgres
filter, unless they can be chained directly. Still, Postgres needs to
understand it, which is what your patch would add (thanks!).

>
> Specifically about the Unix socket though -- doesn't envoy normally
> run on a different instance (or in a different container at least),
> thus normally uses tcp between envoy and postgres? Or would it be a
> reasonable usecase that you ran it locally on the postgres server,
> having it speak IP to the clients but unix sockets to the postgres
> backend? I guess maybe it is outside of the containerized world?
>

    This is exactly the architecture we use at StackGres [1][2]. We use
Envoy as a sidecar (so it runs on the same pod, server as Postgres) and
connects via UDS. But then exposes the connection to the outside clients
via TCP/IP. So in my opinion it is quite applicable to the container
world :)


    Álvaro


[1] https://stackgres.io
[2]
https://stackgres.io/doc/latest/intro/architecture/#stackgres-pod-architecture-diagram

-- 

Alvaro Hernandez


---
OnGres






Re: A reloption for partitioned tables - parallel_workers

2021-03-05 Thread Laurenz Albe
On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:
> For example, with the attached PoC patch:

I have incorporated your POC patch and added a regression test.

I didn't test it thoroughly though.

Yours,
Laurenz Albe
From 34f7da98b34bc1dbf7daca9e2ca6055c80d77f43 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 5 Mar 2021 14:43:34 +0100
Subject: [PATCH] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase when used with cstore_fdw
column store partitions.

This is also useful to lower the number of parallel workers, for
example when the executor is expected to prune most partitions.

The setting also affects parallel appends for partitionwise aggregates
on a partitioned table, but not partitionwise joins.

Authors: Seamus Abshere, Amit Langote, Laurenz Albe
Discussion: https://postgr.es/m/95b1dd96-8634-4545-b1de-e2ac779beb44%40www.fastmail.com
---
 doc/src/sgml/ref/create_table.sgml|  15 +-
 src/backend/access/common/reloptions.c|  14 +-
 src/backend/optimizer/path/allpaths.c | 155 ++
 src/backend/optimizer/plan/planner.c  |  19 +++
 src/include/utils/rel.h   |  15 +-
 .../regress/expected/partition_aggregate.out  |  73 -
 src/test/regress/sql/partition_aggregate.sql  |  19 ++-
 7 files changed, 231 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 3b2b227683..c8c14850c1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of parallel_workers,
+are not supported on partitioned tables, but you may specify them for
+individual leaf partitions.

 

@@ -1401,9 +1402,13 @@ WITH ( MODULUS numeric_literal, REM
  
   This sets the number of workers that should be used to assist a parallel
   scan of this table.  If not set, the system will determine a value based
-  on the relation size.  The actual number of workers chosen by the planner
-  or by utility statements that use parallel scans may be less, for example
-  due to the setting of .
+  on the relation size and the number of scanned partitions.
+  When set on a partitioned table, the specified number of workers will
+  work on distinct partitions, so the number of partitions affected by the
+  parallel operation should be taken into account.
+  The actual number of workers chosen by the planner or by utility
+  statements that use parallel scans may be less, for example due
+  to the setting of .
  
 

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 	  RELOPT_KIND_PARTITIONED,
-	  0, NULL, 0);
+	  sizeof(PartitionedTableRdOptions),
+	  tab, lengthof(tab));
 }
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..e22cb6f33e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,6 +97,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	Index rti, RangeTblEntry *rte);
+static int compute_append_parallel_workers(RelOptInfo *rel, List *s

Re: Nicer error when connecting to standby with hot_standby=off

2021-03-05 Thread David Steele

Hi Fujii,

On 9/8/20 1:17 PM, James Coleman wrote:

On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
 wrote:

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?


Updated patch attached.


Any thoughts on the updated patch?

Regards,
--
-David
da...@pgmasters.net




Re: n_mod_since_analyze isn't reset at table truncation

2021-03-05 Thread Masahiko Sawada
On Fri, Mar 5, 2021 at 6:51 PM Julien Rouhaud  wrote:
>
> On Fri, Mar 05, 2021 at 06:07:05PM +0900, Fujii Masao wrote:
> >
> > On 2021/03/05 15:59, Julien Rouhaud wrote:
> > >
> > > I don't especially want to defer autoanalyze in that case.  But an 
> > > autoanalyze
> > > happening quickly after a TRUNCATE is critical for performance, I'd 
> > > prefer to
> > > find a way to trigger autoanalyze reliably.
> >
> > One just idea is to make TRUNCATE increase n_mod_since_analyze by
> > the number of records to truncate. That is, we treat TRUNCATE
> > in the same way as "DELETE without WHERE".

Makes sense. I had been thinking we could treat TRUNCATE like "DROP
TABLE and CREATE TABLE" in terms of the statistics, but it's rather
"DELETE without WHERE", as you mentioned.

>
> > If the table has lots of records and is truncated, n_mod_since_analyze
> > will be increased very much and which would trigger autoanalyze soon.
> > This might be expected behavior because the statistics collected before
> > truncate is very "different" from the status of the table after truncate.
> >
> > OTOH, if the table is very small, TRUNCATE doesn't increase
> > n_mod_since_analyze so much. So analyze might not be triggered soon.
> > But this might be ok because the statistics collected before truncate is
> > not so "different" from the status of the table after truncate.
> >
> > I'm not sure how much this idea is "reliable" and would be helpful in
> > practice, though.
>
> It seems like a better approach as it would have the same results on
> autovacuum as a DELETE, so +1 from me.

I think we can use n_live_tup for that but since it's an estimation
value it doesn't necessarily have the same result as DELETE and I'm
not sure it's reliable.
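
For reference, both counters being discussed here are visible per table in
the cumulative statistics views, e.g. (the table name is hypothetical):

    SELECT relname, n_live_tup, n_mod_since_analyze
    FROM pg_stat_user_tables
    WHERE relname = 'my_table';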

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: libpq debug log

2021-03-05 Thread alvhe...@alvh.no-ip.org
On 2021-Mar-05, tsunakawa.ta...@fujitsu.com wrote:

> From: Tom Lane 
> > But I think passing the message start address explicitly might be
> > better than having it understand the buffering behavior in enough
> > detail to know where to find the message.  Part of the point here
> > (IMO) is to decouple the tracing logic from the core libpq logic, in
> > hopes of not having common-mode bugs.
> 
> Ouch, you're perfectly right.  Then let's make the signature:
> 
> void pqLogMessage(PGconn *conn, const char *message, bool toServer);

Yeah, looks good!  I agree that going this route will result in more
trustworthy trace output.

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: Which PG version does CVE-2021-20229 affected?

2021-03-05 Thread Michael Banck
On Fri, Mar 05, 2021 at 04:38:17PM +0900, Michael Paquier wrote:
> On Fri, Mar 05, 2021 at 12:32:43AM -0700, bchen90 wrote:
> > NVD link:
> > 
> > https://nvd.nist.gov/vuln/detail/CVE-2021-20229#vulnCurrentDescriptionTitle
> 
> This link includes incorrect information.  CVE-2021-20229 is only a
> problem in 13.0 and 13.1, fixed in 13.2.  Please see for example here:
> https://www.postgresql.org/support/security/

Probably because the referenced Red Hat bugzilla bug claims it's
affecting all back branches and they scrape that info from there:

https://bugzilla.redhat.com/show_bug.cgi?id=1925296


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: Parallel INSERT (INTO ... SELECT ...)

2021-03-05 Thread Amit Kapila
On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow  wrote:
>
> On Fri, Mar 5, 2021 at 9:35 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow  wrote:
> > >
> >
> > In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are
> > introducing GUC (enable_parallel_dml) and table option
> > (parallel_dml_enabled) for this feature. I am a bit worried about
> > using *_dml in the names because it is quite possible that for
> > parallel updates and parallel deletes we might not need any such GUC.
> > The reason we mainly need here is due to checking of parallel-safety
> > of partitioned tables and updates/deletes handle partitioned tables
> > differently than inserts so those might not be that costly. It is
> > possible that they are costly due to a different reason but not sure
> > mapping those to one GUC or table option is a good idea. Can we
> > consider using *_insert instead? I think GUC having _insert can be
> > probably used for a parallel copy (from) as well which I think will
> > have a similar overhead.
> >
>
> I'll need to think about that one.
>

Okay, one more minor comment on the same patch:
+ /*
+ * Check if parallel_dml_enabled is enabled for the target table,
+ * if not, skip the safety checks.
+ *
+ * (Note: if the target table is partitioned, the parallel_dml_enabled
+ * option setting of the partitions are ignored).
+ */
+ rte = rt_fetch(parse->resultRelation, parse->rtable);
+ rel = table_open(rte->relid, NoLock);

I think here the patch is using NoLock based on the assumption that
the relation is already locked in parse/analyze phase, so it's better
to have a comment like the one we have in
target_rel_max_parallel_hazard.

> I may be wrong, but I would have thought at least updates would have
> similar parallel-safety checking requirements to inserts and would
> have similar potential cost issues.
>

For updates and deletes, we go via inheritance planner where we
already do some work related to partitions, so I think we might not
need to open all the partitions for parallel-safety checks as we do
for inserts. I am sure some part of the code for inserts, like what we
have in max_parallel_hazard, will be used for updates as well, but there
will probably be some changes to how parallel-safety is checked for
partitioned relations.
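
For context, the knobs from the quoted v21-0003 patch are meant to be used
roughly as follows (object names are made up, and the GUC/option names are
exactly what is being debated above, so they may change, e.g. to *_insert):

    SET enable_parallel_dml = on;
    ALTER TABLE target_tbl SET (parallel_dml_enabled = true);
    INSERT INTO target_tbl SELECT * FROM source_tbl;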

-- 
With Regards,
Amit Kapila.




Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Pavel Stehule
Hi

On Fri, 5 Mar 2021 at 13:44, Joel Jacobson  wrote:

> Idea #5:
>
> Allow disabling canonicalization via optional parameter to range
> constructor functions.
>

I think the rules describing ranges and multiranges are already long enough,
so adding more functionality there doesn't look like a practical idea.

I prefer a special simple composite type like you described in the previous
email, (start, stop) or (start, length). It can also be reused in other cases
where using a range or multirange is not practical.

The composite types are more natural for this purpose than 2D arrays.
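
One possible shape for such a type, purely as an illustration (the name is
made up):

    CREATE TYPE match_span AS (start integer, length integer);
    -- regexp_positions() could then return SETOF match_span[],
    -- one array element per capturing group.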

Regards

Pavel


> This would then allow using the range type,
> to create inclusive/inclusive integer ranges,
> where lower() and upper() would return what you expect.
>
> /Joel
>


RE: libpq debug log

2021-03-05 Thread k.jami...@fujitsu.com
Hi Alvaro-san and Horiguchi-san,
CC) Iwata-san, Tsunakawa-san

Attached is the V23 of the libpq trace patch.

(1)
From: Álvaro Herrera 
> It appears that something is still wrong.  I applied lipq pipeline v27 from 
> [1]
> and ran src/test/modules/test_libpq/pipeline singlerow, after patching it to
> do PQtrace() after PQconn(). Below is the output I get from that. The
> noteworthy point is that "ParseComplete" messages appear multiple times
> for some reason ... but that's quite odd, because if I look at the network 
> traffic
> with Wireshark I certainly do not see the ParseComplete message being sent
> three times.

I applied Alvaro's libpq pipeline patch (v33) [1] on top of this libpq trace 
patch,
then traced the src/test/modules/libpq_pipeline/libpq_pipeline.c 
test_singlerowmode.
It's still the same trace output even after applying the fix recommendations
of Horiguchi-san.

>   Parse   38 "" "SELECT generate_series(42, $1)" #0
>   Bind20 "" "" #0 #1 2 '44' #1 #0
>   Describe6 P ""
>   Execute 9 "" 0
>   Parse   38 "" "SELECT generate_series(42, $1)" #0
>   Bind20 "" "" #0 #1 2 '45' #1 #0
>   Describe6 P ""
>   Execute 9 "" 0
>   Parse   38 "" "SELECT generate_series(42, $1)" #0
>   Bind20 "" "" #0 #1 2 '46' #1 #0
>   Describe6 P ""
>   Execute 9 "" 0
>   Sync4
<   ParseComplete   4
<   BindComplete4
<   RowDescription  40 #1 "generate_series" 0 #0 23 #4 -1 #0
<   DataRow 12 #1 2 '42'
<   DataRow 12 #1 2 '43'
<   DataRow 12 #1 2 '44'
<   CommandComplete 13 "SELECT 3"
<   ParseComplete   4
<   ParseComplete   4
<   ParseComplete   4
<   BindComplete4
<   RowDescription  40 #1 "generate_series" 0 #0 23 #4 -1 #0
<   DataRow 12 #1 2 '42'
<   DataRow 12 #1 2 '43'
<   DataRow 12 #1 2 '44'
<   DataRow 12 #1 2 '45'
<   CommandComplete 13 "SELECT 4"
<   ParseComplete   4
<   ParseComplete   4
<   ParseComplete   4
<   BindComplete4
<   RowDescription  40 #1 "generate_series" 0 #0 23 #4 -1 #0
<   DataRow 12 #1 2 '42'
<   DataRow 12 #1 2 '43'
<   DataRow 12 #1 2 '44'
<   DataRow 12 #1 2 '45'
<   DataRow 12 #1 2 '46'
<   CommandComplete 13 "SELECT 5"
<   ReadyForQuery   5 I

ParseComplete still appears 3 times, but I wonder if that is expected only for 
pipeline's singlerowmode test.
From my observation, when I traced the whole regression test output
by putting PQtrace() in fe-connect.c: connectDBComplete(),
ParseComplete appears only once or twice, etc. for other commands.
In other words, ISTM it varies case by case.
Some examples from the whole regression trace output are below:
a.
>   Describe6 P ""
>   Execute 9 "" 0
>   Sync4
<   ParseComplete   4
<   BindComplete4
b.
<   ErrorResponse   217 S "ERROR" V "ERROR" C "42883" M "function 
no_such_function(integer) does not exist" H "No function matches the given name 
and argument types. You might need to add explicit type casts." P "8" F 
"parse_func.c" L "629" R "ParseFuncOrColumn" \x00
<   ReadyForQuery   5 I
<   ParseComplete   4
<   ParseComplete   4
c.
>   Bind31 "" "my_insert" #0 #1 4 '9652' #1 #0
>   Describe6 P ""
>   Execute 9 "" 0
<   ParseComplete   4
<   ParseComplete   4
<   ParseComplete   4
<   BindComplete4
<   NoData  4
d.
<   CommandComplete 15 "INSERT 0 2"
<   ReadyForQuery   5 T
>   Parse   89 "i" "insert into test (name, amount, letter) select name, 
amount+$1, letter from test" #0
>   Sync4
<   ParseComplete   4
<   ReadyForQuery   5 T

As you can see from the examples above, ParseComplete appears once or
multiple times depending on the test.
As for libpq_pipeline's test_singlerowmode, it appears 3 times for the
CommandComplete SELECT.
I am now wondering whether or not it's a bug. Maybe it's not.
Thoughts?


How I traced the whole regression test: fe-connect.c: connectDBComplete()

@@ -2114,6 +2137,7 @@ connectDBComplete(PGconn *conn)
int timeout = 0;
int last_whichhost = -2;/* certainly different 
from whichhost */
struct addrinfo *last_addr_cur = NULL;
+   FILE*trace_file;
 
if (conn == NULL || conn->status == CONNECTION_BAD)
return 0;
@@ -2171,6 +2195,9 @@ connectDBComplete(PGconn *conn)
switch (flag)
{
case PGRES_POLLING_OK:
+   trace_file = 
fopen("/home/postgres/tracelog/regression_trace

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
Idea #5:

Allow disabling canonicalization via optional parameter to range constructor 
functions.

This would then allow using the range type,
to create inclusive/inclusive integer ranges,
where lower() and upper() would return what you expect.

/Joel

Re: [patch] bit XOR aggregate functions

2021-03-05 Thread Alexey Bashtanov

Hi all,

Thanks for your reviews.
I've updated my patch to the current master and added a documentation 
line suggesting using the new function as a checksum.
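
As a usage illustration of the checksum idea (assuming the patch is applied;
hashtext() is an existing built-in hash whose result is not portable across
versions):

    -- Order-independent checksum of a table's rows, since XOR is commutative:
    SELECT bit_xor(hashtext(t::text)) AS checksum FROM my_table AS t;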


Best regards, Alex

On 04/03/2021 17:14, Ibrar Ahmed wrote:



On Wed, Mar 3, 2021 at 7:30 PM Peter Eisentraut  wrote:


On 10.02.21 06:42, Kyotaro Horiguchi wrote:
> We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought
> as it falls into the same category with BIT_AND and BIT_OR, that is,
> we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set. 
BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself.  But it can be used as a checksum, so it
seems
pretty reasonable to me to add it.  Perhaps the use case could be
pointed out in the documentation.




Hi Alex,

The patch is failing just because of a comment, which is already 
changed by another patch


-/* Define to build with OpenSSL support. (--with-ssl=openssl) */

+/* Define to 1 if you have OpenSSL support. */


Do you mind sending an updated patch?

http://cfbot.cputube.org/patch_32_2980.log.

I am changing the status to "Waiting for Author"


In my opinion that change is no longer required, so I removed it and 
attached the patch.


--
Ibrar Ahmed


diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..dff158e99a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19198,6 +19198,33 @@ SELECT NULLIF(value, '(none)') ...
Yes
   
 
+  
+   
+
+ bit_xor
+
+bit_xor ( smallint )
+smallint
+   
+   
+bit_xor ( integer )
+integer
+   
+   
+bit_xor ( bigint )
+bigint
+   
+   
+bit_xor ( bit )
+bit
+   
+   
+Computes the bitwise exclusive OR of all non-null input values.
+May be useful as a checksum for an unordered set of values.
+   
+   Yes
+  
+
   

 
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 5c1f962251..0d8c5a922a 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -505,18 +505,26 @@
   aggcombinefn => 'int2and', aggtranstype => 'int2' },
 { aggfnoid => 'bit_or(int2)', aggtransfn => 'int2or', aggcombinefn => 'int2or',
   aggtranstype => 'int2' },
+{ aggfnoid => 'bit_xor(int2)', aggtransfn => 'int2xor', aggcombinefn => 'int2xor',
+  aggtranstype => 'int2' },
 { aggfnoid => 'bit_and(int4)', aggtransfn => 'int4and',
   aggcombinefn => 'int4and', aggtranstype => 'int4' },
 { aggfnoid => 'bit_or(int4)', aggtransfn => 'int4or', aggcombinefn => 'int4or',
   aggtranstype => 'int4' },
+{ aggfnoid => 'bit_xor(int4)', aggtransfn => 'int4xor', aggcombinefn => 'int4xor',
+  aggtranstype => 'int4' },
 { aggfnoid => 'bit_and(int8)', aggtransfn => 'int8and',
   aggcombinefn => 'int8and', aggtranstype => 'int8' },
 { aggfnoid => 'bit_or(int8)', aggtransfn => 'int8or', aggcombinefn => 'int8or',
   aggtranstype => 'int8' },
+{ aggfnoid => 'bit_xor(int8)', aggtransfn => 'int8xor', aggcombinefn => 'int8xor',
+  aggtranstype => 'int8' },
 { aggfnoid => 'bit_and(bit)', aggtransfn => 'bitand', aggcombinefn => 'bitand',
   aggtranstype => 'bit' },
 { aggfnoid => 'bit_or(bit)', aggtransfn => 'bitor', aggcombinefn => 'bitor',
   aggtranstype => 'bit' },
+{ aggfnoid => 'bit_xor(bit)', aggtransfn => 'bitxor', aggcombinefn => 'bitxor',
+  aggtranstype => 'bit' },
 
 # xml
 { aggfnoid => 'xmlagg', aggtransfn => 'xmlconcat2', aggtranstype => 'xml' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 59d2b71ca9..506689d8ac 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7995,24 +7995,36 @@
 { oid => '2237', descr => 'bitwise-or smallint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
+{ oid => '8452', descr => 'bitwise-xor smallint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2238', descr => 'bitwise-and integer aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int4', proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2239', descr => 'bitwise-or integer aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
+{ oid => '8453', descr => 'bitwise-xor integer aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proargtypes => 'int4', prosrc => 'aggrega

Re: Confusing behavior of psql's \e

2021-03-05 Thread Laurenz Albe
On Thu, 2021-03-04 at 16:51 +, Jacob Champion wrote:
> > I am not against fixing or improving the behavior, but given the
> > fact that we can't portably get less than second precision, it
> > seems impossible.
> 
> You could backdate the temporary file, so that any save is guaranteed
> to move the timestamp forward. That should work even if the filesystem
> has extremely poor precision.

Ah, of course, that is the way to go.
Attached is version 3 which does it like this.

Yours,
Laurenz Albe
From 7ff01271cb8ea5c9011a57ed613026ec7511d785 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 5 Mar 2021 13:34:12 +0100
Subject: [PATCH] Improve \e, \ef and \ev if the editor is quit

Before, if you edited a function or view definition or a
script file, quitting the editor without saving would cause
the *previous query* to be executed, which is certainly not
what anyone would expect.

It is better to execute nothing and clear the query buffer
in this case.

The behavior when editing the query buffer is unchanged:
quitting without saving will retain the query buffer.

Author: Laurenz Albe 
Reviewed-by: Chapman Flack, Jacob Champion
Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel%40cybertec.at
---
 doc/src/sgml/ref/psql-ref.sgml | 10 --
 src/bin/psql/command.c | 56 +-
 2 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 13c1edfa4d..0bf69174ff 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1970,7 +1970,9 @@ testdb=>
 
 
 
-The new contents of the query buffer are then re-parsed according to
+If you edit a file or the previous query, and you quit the editor without
+modifying the file, the query buffer is cleared.
+Otherwise, the new contents of the query buffer are re-parsed according to
 the normal rules of psql, treating the
 whole buffer as a single line.  Any complete queries are immediately
 executed; that is, if the query buffer contains or ends with a
@@ -2039,7 +2041,8 @@ Tue Oct 26 21:40:57 CEST 1999
  in the form of a CREATE OR REPLACE FUNCTION or
  CREATE OR REPLACE PROCEDURE command.
  Editing is done in the same way as for \edit.
- After the editor exits, the updated command is executed immediately
+ If you quit the editor without saving, the statement is discarded.
+ If you save and exit the editor, the updated command is executed immediately
  if you added a semicolon to it.  Otherwise it is redisplayed;
  type semicolon or \g to send it, or \r
  to cancel.
@@ -2115,7 +2118,8 @@ Tue Oct 26 21:40:57 CEST 1999
  This command fetches and edits the definition of the named view,
  in the form of a CREATE OR REPLACE VIEW command.
  Editing is done in the same way as for \edit.
- After the editor exits, the updated command is executed immediately
+ If you quit the editor without saving, the statement is discarded.
+ If you save and exit the editor, the updated command is executed immediately
  if you added a semicolon to it.  Otherwise it is redisplayed;
  type semicolon or \g to send it, or \r
  to cancel.
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c98e3d31d0..bd01d59d96 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifndef WIN32
 #include 			/* for stat() */
 #include /* open() flags */
@@ -151,7 +152,7 @@ static void copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previous_buf)
 static bool do_connect(enum trivalue reuse_previous_specification,
 	   char *dbname, char *user, char *host, char *port);
 static bool do_edit(const char *filename_arg, PQExpBuffer query_buf,
-	int lineno, bool *edited);
+	int lineno, bool *edited, bool discard_on_quit);
 static bool do_shell(const char *command);
 static bool do_watch(PQExpBuffer query_buf, double sleep);
 static bool lookup_object_oid(EditableObjectType obj_type, const char *desc,
@@ -1003,6 +1004,13 @@ exec_command_edit(PsqlScanState scan_state, bool active_branch,
 			}
 			if (status != PSQL_CMD_ERROR)
 			{
+/*
+ * If the query buffer is empty, we'll edit the previous
+ * query. But in that case, we don't want to keep that if the
+ * editor is quit.
+ */
+bool		discard_on_quit = (query_buf->len == 0);
+
 expand_tilde(&fname);
 if (fname)
 	canonicalize_path(fname);
@@ -1010,7 +1018,7 @@ exec_command_edit(PsqlScanState scan_state, bool active_branch,
 /* If query_buf is empty, recall previous query for editing */
 copy_previous_query(query_buf, previous_buf);
 
-if (do_edit(fname, query_buf, lineno, NULL))
+if (do_edit(fname, query_buf, lineno, NULL, discard_on_quit))
 

Re: Disallow SSL compression?

2021-03-05 Thread Michael Paquier
On Fri, Mar 05, 2021 at 01:21:01PM +0100, Daniel Gustafsson wrote:
> On 5 Mar 2021, at 08:04, Michael Paquier  wrote:
>> FWIW, I would vote to nuke it from all those places, reducing a bit
>> pg_stat_get_activity() while on it.  Keeping it around in the system
>> catalogs may cause confusion IMHO, by making people think that it is
>> still possible to get into configurations where sslcompression could
>> be really enabled.  The rest of the patch looks fine to me.
> 
> Attached is a version which removes that as well.

Peter, Magnus, any comments about this point?

> I left the compression
> keyword in PQsslAttribute on purpose, not really for backwards compatibility
> (PQsslAttributeNames takes care of that) but rather since it's a more generic
> connection-info function.

Makes sense.
--
Michael


signature.asc
Description: PGP signature


Re: Disallow SSL compression?

2021-03-05 Thread Daniel Gustafsson
> On 5 Mar 2021, at 08:04, Michael Paquier  wrote:
> 
> On Thu, Mar 04, 2021 at 11:52:56PM +0100, Daniel Gustafsson wrote:
>> The attached version takes a step further and removes sslcompression from
>> pg_conn and just eats the value as there is no use in setting a dummy alue.  
>> It
>> also removes compression from PgBackendSSLStatus and be_tls_get_compression 
>> as
>> raised by Michael downthread.  I opted for keeping the column in pg_stat_ssl
>> with a note in the documentation that it will be removed, for the same
>> backwards compatibility reason of eating the connection param without acting 
>> on
>> it.  This might be overthinking it however.
> 
> FWIW, I would vote to nuke it from all those places, reducing a bit
> pg_stat_get_activity() while on it.  Keeping it around in the system
> catalogs may cause confusion IMHO, by making people think that it is
> still possible to get into configurations where sslcompression could
> be really enabled.  The rest of the patch looks fine to me.

Attached is a version which removes that as well.  I left the compression
keyword in PQsslAttribute on purpose, not really for backwards compatibility
(PQsslAttributeNames takes care of that) but rather since it's a more generic
connection-info function.

--
Daniel Gustafsson   https://vmware.com/



v5-0001-Disallow-SSL-compression.patch
Description: Binary data


Re: POC: GROUP BY optimization

2021-03-05 Thread Pavel Borisov
> Regression (stats_ext) is failing because you forgot to drop the table
> created in a test case (aggregates). It's a minor change, so the attached
> patch fixes that issue.
>
> https://cirrus-ci.com/task/6704792446697472
>
> Thank you very much!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-03-05 Thread Bharath Rupireddy
On Fri, Mar 5, 2021 at 9:32 AM Japin Li  wrote:
> On Thu, 04 Mar 2021 at 14:53, Bharath Rupireddy 
>  wrote:
> > Thanks! I will look forward for more review comments.
> >
>
> v4-0001-Rearrange-Refresh-Mat-View-Code.patch
> -
>
> +static Oid
> +get_new_heap_oid(RefreshMatViewStmt *stmt, Relation matviewRel, Oid 
> matviewOid,
> +char *relpersistence)
> +{
> +   Oid OIDNewHeap;
> +   boolconcurrent;
> +   Oid tableSpace;
> +
> +   concurrent = stmt->concurrent;
> +
> +   /* Concurrent refresh builds new data in temp tablespace, and does 
> diff. */
> +   if (concurrent)
> +   {
> +   tableSpace = GetDefaultTablespace(RELPERSISTENCE_TEMP, false);
> +   *relpersistence = RELPERSISTENCE_TEMP;
> +   }
>
> Since concurrent is only used in one place, I think we can remove the local
> variable concurrent in get_new_heap_oid().

Done.

> The others looks good to me.

Thanks.

Attaching v5 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 26970ffd33e67324609a03f0f61eeb6406216a7f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 5 Mar 2021 15:47:12 +0530
Subject: [PATCH v5 1/2] Rearrange Refresh Mat View Code

Currently, the function ExecRefreshMatView in matview.c is having
many lines of code which is not at all good from readability and
maintainability perspectives. This patch adds few functions and
moves the code from ExecRefreshMatView to them making the code
look better.
---
 src/backend/commands/matview.c | 449 -
 1 file changed, 269 insertions(+), 180 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c5c25ce11d..18e18fa627 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -64,7 +64,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(Oid OIDNewHeap, Query *query,
 	   const char *queryString);
 static char *make_temptable_name_n(char *tempname, int n);
 static void refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
@@ -73,6 +73,16 @@ static void refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersist
 static bool is_usable_unique_index(Relation indexRel);
 static void OpenMatViewIncrementalMaintenance(void);
 static void CloseMatViewIncrementalMaintenance(void);
+static Query *get_matview_query(RefreshMatViewStmt *stmt, Relation *rel,
+Oid *objectId);
+static Query *rewrite_refresh_matview_query(Query *dataQuery);
+static Oid	get_new_heap_oid(RefreshMatViewStmt *stmt, Relation matviewRel,
+			 Oid matviewOid, char *relpersistence);
+static void match_matview_with_new_data(RefreshMatViewStmt *stmt,
+		Relation matviewRel, Oid matviewOid,
+		Oid OIDNewHeap, Oid relowner,
+		int save_sec_context,
+		char relpersistence, uint64 processed);
 
 /*
  * SetMatViewPopulatedState
@@ -140,127 +150,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 {
 	Oid			matviewOid;
 	Relation	matviewRel;
-	RewriteRule *rule;
-	List	   *actions;
 	Query	   *dataQuery;
-	Oid			tableSpace;
-	Oid			relowner;
 	Oid			OIDNewHeap;
-	DestReceiver *dest;
 	uint64		processed = 0;
-	bool		concurrent;
-	LOCKMODE	lockmode;
+	Oid			relowner;
 	char		relpersistence;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	ObjectAddress address;
 
-	/* Determine strength of lock needed. */
-	concurrent = stmt->concurrent;
-	lockmode = concurrent ? ExclusiveLock : AccessExclusiveLock;
-
-	/*
-	 * Get a lock until end of transaction.
-	 */
-	matviewOid = RangeVarGetRelidExtended(stmt->relation,
-		  lockmode, 0,
-		  RangeVarCallbackOwnsTable, NULL);
-	matviewRel = table_open(matviewOid, NoLock);
-
-	/* Make sure it is a materialized view. */
-	if (matviewRel->rd_rel->relkind != RELKIND_MATVIEW)
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("\"%s\" is not a materialized view",
-		RelationGetRelationName(matviewRel;
-
-	/* Check that CONCURRENTLY is not specified if not populated. */
-	if (concurrent && !RelationIsPopulated(matviewRel))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("CONCURRENTLY cannot be used when the materialized view is not populated")));
-
-	/* Check that conflicting options have not been specified. */
-	if (concurrent && stmt->skipData)
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("CONCURRENTLY and WITH NO DATA 

Re: POC: GROUP BY optimization

2021-03-05 Thread Ibrar Ahmed
On Thu, Oct 29, 2020 at 8:15 PM Pavel Borisov 
wrote:

> In case if I'm missing something and Pavel's proposal is significantly
>> different from the original patch (if I understand correctly, at the
>> moment the latest patch posted here is a rebase and adjusting the old
>> patch to work with the latest changes in master, right?), then indeed
>> they could be merged, but please in the newer thread [1].
>>
>
> Sure, my patch has the only difference from the previous Theodor's code
> for compatibility with v.13, though it is not small, and I appreciate the
> changes in paths processing. The only thing that caused my notice, is that
> some useful changes which I've mentioned before, are discarded now. But as
> long as they are planned to be put in later it is completely fine. I agree
> to discuss the thing in any thread, though I don't quite understand the
> reason for a switch.
>
> Still I don't see a problem.
>
> --
> Best regards,
> Pavel Borisov
>
> Postgres Professional: http://postgrespro.com 
>

Regression (stats_ext) is failing because you forgot to drop the table
created in a test case (aggregates). It's a minor change, so the attached
patch fixes that issue.

https://cirrus-ci.com/task/6704792446697472


-- 
Ibrar Ahmed


v12-0001-GROUP-BY-optimization-made-compatible-with-chang.patch
Description: Binary data


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-05 Thread Masahiro Ikeda

On 2021-03-05 12:47, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is 
also

incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" 
or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening parenthesis 
here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query 
the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This is 
also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this 
event is
reported in wal_buffers_full in) This is undesirable 
because ..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing the 
write
statistics.  This is because of the presence of 
issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while the 
WAL
receiver path does not.  It seems technically straight-forward 
to
refactor here to avoid the almost-duplicated logic in the two 
places,
though I suspect there may be a trade-off for not adding 
another
function call to the stack given the importance of WAL 
processing
(though that seems marginalized compared to the cost of 
actually
writing the WAL).  Or, as Fujii noted, go the other way and 
don't have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver 
stats messages between the WAL receiver and the stats 
collector, and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats 
are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process 
running
at that moment. IOW, it seems strange that some values show 
dynamic
stats and the others show collected stats, even though they are 
in

the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal 
view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.
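
For illustration, such an early return might look roughly like this (a
sketch of the idea only, not the actual v12 code; SYNC_METHOD_* and
enableFsync are the existing symbols):

    /* Nothing to fsync (and hence nothing to time) in these cases. */
    if (!enableFsync ||
        sync_method == SYNC_METHOD_OPEN ||
        sync_method == SYNC_METHOD_OPEN_DSYNC)
        return;

    /* ... existing switch on sync_method, with timing around it ... */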



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed 
to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() 
is called.
For example, if wal_writer_delay is set to several seconds, some 
values in
pg_stat_wal would be not up-to-date meaninglessly for those 
seconds.
So I'm thinking to withdraw my previous comment and it's ok to 
send

the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't we make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = GetCurrentTi

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
On Thu, Mar 4, 2021, at 17:55, Gilles Darold wrote:
> I also think that it should return a setof 2-D integer array, an other 
> solution is to return all start/end positions of an occurrence chained 
> in an integer array {start1,end1,start2,end2,..}.

Hmm. Seems like we've in total managed to come up with three flawed ideas.

Pros/cons I see:

Idea #1: setof 2-D integer array
+ Packs the values into one single value.
- Difficult to work with 2-D arrays, doesn't work well with unnest(), has to 
inspect the dims and use for loops to extract values.
- Looking at a 2-D value, it's not obvious what the integer values in it
mean. Which one is "startpos" and do we have "length" or "endpos" values?

Idea #2: setof (start_pos integer[], end_pos integer[])
+ It's obvious to the user what type of integers "start_pos" and "end_pos" 
contain.
- Decouples the values into two separate values.
- Tom mentioned some bad experiences with separate array values. (Details on 
this would be interesting.)

Idea #3: chained integer array {start1,end1,start2,end2,..}
- Mixes different values into the same value
- Requires maths (although simple calculations) to extract values

I think all three ideas (including mine) are ugly. None of them is wart free.

Idea #4: add a new composite built-in type.

A simple composite type with two int8 fields.

The field names seems to vary a lot between languages:

Rust: "start", "end" [1]
C++: "begin", "end" [2]
Python: "start", "stop" [3]

Such a simple composite type, could then always be used,
when you want to represent simple integer ranges,
between two exact values, arguably a very common need.

Such type could be converted to/from int8range,
but would have easily accessible field names,
which is simpler than using lower() and upper(),
since upper() always returns the canonical
exclusive upper bound for discrete types,
which is not usually what you want when
dealing with "start" and "end" integer ranges.

Since there is no type named just "range", why not just use this name?

Since "end" is a keyword, I suggest the "stop" name:

PoC:

CREATE TYPE range AS (start int8, stop int8);

A real implementation would of course also verify CHECK (start <= stop),
and would add conversions to/from int8range.

I realise this is probably a controversial idea.
But, I think this is a general common problem that deserves a clean general 
solution.

Thoughts? More ideas?

[1] https://doc.rust-lang.org/std/ops/struct.Range.html
[2] https://en.cppreference.com/w/cpp/ranges
[3] https://www.w3schools.com/python/ref_func_range.asp


Re: Parallel INSERT (INTO ... SELECT ...)

2021-03-05 Thread Amit Kapila
On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow  wrote:
>

In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are
introducing GUC (enable_parallel_dml) and table option
(parallel_dml_enabled) for this feature. I am a bit worried about
using *_dml in the names because it is quite possible that for
parallel updates and parallel deletes we might not need any such GUC.
The reason we mainly need here is due to checking of parallel-safety
of partitioned tables and updates/deletes handle partitioned tables
differently than inserts so those might not be that costly. It is
possible that they are costly due to a different reason but not sure
mapping those to one GUC or table option is a good idea. Can we
consider using *_insert instead? I think GUC having _insert can be
probably used for a parallel copy (from) as well which I think will
have a similar overhead.

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-05 Thread vignesh C
On Fri, Mar 5, 2021 at 12:21 PM Ajin Cherian  wrote:
>
> On Thu, Mar 4, 2021 at 9:53 PM Peter Smith  wrote:
>
> > [05a] Now syncing the psf file at prepare time
>
> The patch v46-0008 does not handle spooling of streaming prepare if
> the Subscription is configured for both two-phase and streaming.
> I feel that it would be best if we don't support both two-phase and
> streaming together in a subscription in this release.
> Probably a future release could handle this. So, changing the patch to
> not allow streaming and two-phase together.
> This new patch v49 has the following changes.
>
> * Don't support creating a subscription with both streaming and
> two-phase enabled.
> * Don't support altering a subscription enabling streaming if it was
> created with two-phase enabled.
> * Remove stream_prepare callback as a "required" callback, make it an
> optional callback and remove all code related to stream_prepare in the
> pgoutput plugin as well as in worker.c
>
> Also fixed
> * Don't support the alter of subscription setting two-phase. Toggling
> of two-phase mode using the alter command on the subscription can
> cause transactions to be missed and result in an inconsistent replica.
>

Thanks for the updated patch.
Few minor comments:

I'm not sure if we plan to change this workaround; if we are not
planning to change it, we can reword the comments suitably. We
generally don't use "workaround" in our comments.
+   /*
+* Workaround Part 1 of 2:
+*
+* Make sure every tablesync has reached at least SYNCDONE state
+* before letting the apply worker proceed.
+*/
+   elog(DEBUG1,
+"apply_handle_begin_prepare, end_lsn = %X/%X,
final_lsn = %X/%X, lstate_lsn = %X/%X",
+LSN_FORMAT_ARGS(begin_data.end_lsn),
+LSN_FORMAT_ARGS(begin_data.final_lsn),
+LSN_FORMAT_ARGS(MyLogicalRepWorker->relstate_lsn));
+

We should include two_phase in tab completion (tab-complete.c file
psql_completion(const char *text, int start, int end) function) :
postgres=# create subscription sub1 connection 'port=5441
dbname=postgres' publication  pub1 with (
CONNECT COPY_DATA   CREATE_SLOT ENABLED
 SLOT_NAME   SYNCHRONOUS_COMMIT
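
For reference, a sketch of the corresponding tab-complete.c hunk (the
existing option list is paraphrased here and may not match the source
exactly):

    /* in psql_completion(): options of CREATE SUBSCRIPTION ... WITH ( */
    else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
        COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
                      "enabled", "slot_name", "streaming",
                      "synchronous_commit", "two_phase");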

+
+ 
+  It is not allowed to combine streaming set to
+  true and two_phase set to
+  true.
+ 
+
+
+   
+   
+two_phase (boolean)
+
+ 
+  Specifies whether two-phase commit is enabled for this subscription.
+  The default is false.
+ 
+
+ 
+  When two-phase commit is enabled then the decoded
transactions are sent
+  to the subscriber on the PREPARE TRANSACTION. By default,
the transaction
+  preapred on publisher is decoded as normal transaction at commit.
+ 
+
+ 
+  It is not allowed to combine two_phase set to
+  true and streaming set to
+  true.
+ 

It is not allowed to combine streaming set to true and two_phase set to true.
Should this be:
streaming option is not supported along with two_phase option.

Similarly here too:
It is not allowed to combine two_phase set to true and streaming set to true.
Should this be:
two_phase option is not supported along with streaming option.

Few indentation issues are present, we can run pgindent:
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+
  XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+
 LogicalRepPreparedTxnData *prepare_data);
+extern void logicalrep_write_commit_prepared(StringInfo out,
ReorderBufferTXN* txn,
+
  XLogRecPtr commit_lsn);

ReorderBufferTXN * should be ReorderBufferTXN*

Line exceeds 80 chars:
+   /*
+* Now that we replayed the psf it is no longer
needed. Just delete it.
+*/
+   prepare_spoolfile_delete(psfpath);

There is a typo, preapred should be prepared.
+ 
+  When two-phase commit is enabled then the decoded
transactions are sent
+  to the subscriber on the PREPARE TRANSACTION. By default,
the transaction
+  preapred on publisher is decoded as normal transaction at commit.
+ 

Regards,
Vignesh




Re: n_mod_since_analyze isn't reset at table truncation

2021-03-05 Thread Julien Rouhaud
On Fri, Mar 05, 2021 at 06:07:05PM +0900, Fujii Masao wrote:
> 
> On 2021/03/05 15:59, Julien Rouhaud wrote:
> > 
> > I don't especially want to defer autoanalyze in that case.  But an 
> > autoanalyze
> > happening quickly after a TRUNCATE is critical for performance, I'd prefer 
> > to
> > find a way to trigger autoanalyze reliably.
> 
> One just idea is to make TRUNCATE increase n_mod_since_analyze by
> the number of records to truncate. That is, we treat TRUNCATE
> in the same way as "DELETE without WHERE".

Yes, that's the approach I had in mind to make it more reliable.

> If the table has lots of records and is truncated, n_mod_since_analyze
> will be increased very much and which would trigger autoanalyze soon.
> This might be expected behavior because the statistics collected before
> truncate is very "different" from the status of the table after truncate.
> 
> OTOH, if the table is very small, TRUNCATE doesn't increase
> n_mod_since_analyze so much. So analyze might not be triggered soon.
> But this might be ok because the statistics collected before truncate is
> not so "different" from the status of the table after truncate.
> 
> I'm not sure how much this idea is "reliable" and would be helpful in
> practice, though.

It seems like a better approach as it it would have the same results on
autovacuum as a DELETE, so +1 from me.




Re: Printing backtrace of postgres processes

2021-03-05 Thread torikoshia

On 2021-03-04 21:55, Bharath Rupireddy wrote:
On Mon, Mar 1, 2021 at 10:43 AM torikoshia  
wrote:

Since the current patch uses BackendPidGetProc(), this feature is
unsupported not only for the postmaster, logger, and stats collector,
but also for the checkpointer, background writer, and walwriter.

And when I specify pid of these PostgreSQL processes, it
says "PID  is not a PostgreSQL server process".

I think it may confuse users, so it might be worth
changing messages for those PostgreSQL processes.
AuxiliaryPidGetProc() may help to do it.


Exactly this was the doubt I got when I initially reviewed this patch.
And I felt it should be discussed in a separate thread, you may want
to update your thoughts there [1].

[1] -
https://www.postgresql.org/message-id/CALj2ACW7Rr-R7mBcBQiXWPp%3DJV5chajjTdudLiF5YcpW-BmHhg%40mail.gmail.com


Thanks!
I'm going to join the discussion there.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: [PATCH] pgbench: Bug fix for the -d option

2021-03-05 Thread Fujii Masao




On 2021/03/05 16:33, Michael Paquier wrote:

On Fri, Mar 05, 2021 at 01:30:11PM +0900, Fujii Masao wrote:

if ((env = getenv("PGDATABASE")) != NULL && *env != '\0')
dbName = env;
-   else if (login != NULL && *login != '\0')
-   dbName = login;
+   else if ((env = getenv("PGUSER")) != NULL && *env != '\0')
+   dbName = env;
else
-   dbName = "";
+   dbName = get_user_name_or_exit(progname);

If dbName is set by libpq, the above also is not necessary?


If you remove this part, pgbench loses some log information if
PQconnectdbParams() returns NULL, like on OOM if the database name is
NULL.  Perhaps that's not worth caring about here for a single log
failure, but that's the reason is why I left this part around.


Understood. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PROXY protocol support

2021-03-05 Thread Magnus Hagander
On Fri, Mar 5, 2021 at 12:21 AM Jacob Champion  wrote:
>
> On Thu, 2021-03-04 at 21:45 +0100, Magnus Hagander wrote:
> > On Thu, Mar 4, 2021 at 9:07 PM Jacob Champion  wrote:
> > > Idle thought I had while setting up a local test rig: Are there any
> > > compelling cases for allowing PROXY packets to arrive over Unix
> > > sockets? (By which I mean, the proxy is running on the same machine as
> > > Postgres, and connects to it using the .s.PGSQL socket file instead of
> > > TCP.) Are there cases where you want some other software to interact
> > > with the TCP stack instead of Postgres, but it'd still be nice to have
> > > the original connection information available?
> >
> > I'm uncertain what that usecase would be for something like haproxy,
> > tbh. It can't do connection pooling, so adding it on the same machine
> > as postgres itself wouldn't really add anything, I think?
>
> Yeah, I wasn't thinking HAproxy so much as some unspecified software
> appliance that's performing Some Task before allowing a TCP client to
> speak to Postgres. But it'd be better to hear from someone that has an
> actual use case, instead of me spitballing.
>
> > I did think about the other end, if you had a proxy on a different
> > machine accepting unix connections and passing them on over
> > PROXY-over-tcp. But I doubt it's useful to know it was unix in that
> > case  (since it still couldn't do peer or such for the auth) --
> > instead, that seems like an argument where it'd be better to proxy
> > without using PROXY and just letting the IP address be.
>
> You could potentially design a system that lets you proxy a "local all
> all trust" setup from a different (trusted) machine, without having to
> actually let people onto the machine that's running Postgres. That
> would require some additional authentication on the PROXY connection
> (i.e. something stronger than host-based auth) to actually be useful.
>
> -- other notes --
>
> A small nitpick on the current separate-port PoC is that I'm forced to
> set up a "regular" TCP port, even if I only want the PROXY behavior.

Yeah. I'm not sure there's a good way to avoid that without making
configurations a lot more complex.


> The original-host logging isn't working for me:
>
>   WARNING:  pg_getnameinfo_all() failed: ai_family not supported
>   LOG:  proxy connection from: host=??? port=???
>
> and I think the culprit is this:
>
> >/* Store a copy of the original address, for logging */
> >memcpy(&raddr_save, &port->raddr, port->raddr.salen);
>
> port->raddr.salen is the length of port->raddr.addr; we want the length
> of the copy to be sizeof(port->raddr) here, no?

That's interesting -- it works perfectly fine here. What platform are
you testing on?

But yes, you are correct, it should do that. I guess it's a case of
the salen actually ending up being uninitialized in the copy, and thus
failing at a later stage. (I sent for sizeof(SockAddr) to make it
easier to read without having to look things up, but the net result is
the same)
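
In other words, the corrected copy would be along these lines (a sketch):

    /* Keep the original client address for logging; copy the whole
     * struct so that salen is preserved as well. */
    memcpy(&raddr_save, &port->raddr, sizeof(SockAddr));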

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: PROXY protocol support

2021-03-05 Thread Magnus Hagander
On Fri, Mar 5, 2021 at 12:08 AM Tatsuo Ishii  wrote:
>
> >> On Thu, 2021-03-04 at 10:42 +0900, Tatsuo Ishii wrote:
> >> > Is there any formal specification for the "a protocol common and very
> >> > light weight in proxies"?
> >>
> >> See
> >>
> >> https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
> >
> > Yeah, it's currently in one of the comments, but should probably be
> > added to the docs side as well.
>
> It seems the protocol is HAproxy product specific and I think it would
> be better to be mentioned in the docs.

It's definitely not HAProxy specific, it's more or less an industry
standard. It's just maintained by them. That said, yes, it should be
referenced in the docs.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Kyotaro Horiguchi
At Fri, 05 Mar 2021 16:51:17 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> The difference comes from the difference of shared_buffers. In the
> "allows_streaming" case, PostgresNode::init() *reduces* the number
> down to '1MB'(128 blocks) which leads to only 8 XLOGbuffers, which
> will very soon be wrap-arounded, which leads to XLogWrite.
> 
> When allows_streaming=0 case, the default size of shared_buffers is
> 128MB (16384 blocks).  WAL buffer (512) doesn't get wrap-arounded
> during the test and no WAL buffer is written out in that case.

So I think we need to remove the shared_buffers setting for the
allows_streaming case in PostgresNode.pm.
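
For context, this is roughly how xlog.c derives the default number of
WAL buffers from shared_buffers (paraphrased from XLOGChooseNumBuffers);
with shared_buffers = 1MB, i.e. 128 blocks, 128/32 = 4 falls below the
floor, so only 8 WAL buffers are used and they wrap around quickly:

    xbuffers = NBuffers / 32;
    if (xbuffers > (wal_segment_size / XLOG_BLCKSZ))
        xbuffers = (wal_segment_size / XLOG_BLCKSZ);
    if (xbuffers < 8)
        xbuffers = 8;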

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: n_mod_since_analyze isn't reset at table truncation

2021-03-05 Thread Fujii Masao




On 2021/03/05 15:59, Julien Rouhaud wrote:

On Fri, Mar 05, 2021 at 03:31:33PM +0900, Fujii Masao wrote:


On 2021/03/04 12:40, Julien Rouhaud wrote:

In that case, conversely, we want to trigger autoanalyze ASAP because the 
contents in the table was changed very much?


We might want, but wouldn't keeping the current n_mod_since_analyze make
things unpredictable?


I don't think so. autoanalyze still works based on the number of
modifications since last analyze, so ISTM that's predictable...


If we keep n_mod_since_analyze, autoanalyze might trigger after the first write
or might wait for a full cycle of writes, depending on the kept value.  So yes
it can make autoanalyze triggers earlier in some cases, but that's not
predictable from the TRUNCATE even point of view.


Also the selectivity estimation functions already take into account the actual
relation size, so the estimates aren't completely off after a truncate.


But the statistics is out of date and which may affect the planning badly?
Also if generic plan is used, it's not updated until next analyze, i.e.,
generic plan created based on old statistics is used for a while.

So I'm still not sure why you want to defer autoanalyze in that case.


I don't especially want to defer autoanalyze in that case.  But an autoanalyze
happening quickly after a TRUNCATE is critical for performance, I'd prefer to
find a way to trigger autoanalyze reliably.


One just idea is to make TRUNCATE increase n_mod_since_analyze by
the number of records to truncate. That is, we treat TRUNCATE
in the same way as "DELETE without WHERE".

If the table has lots of records and is truncated, n_mod_since_analyze
will be increased very much and which would trigger autoanalyze soon.
This might be expected behavior because the statistics collected before
truncate is very "different" from the status of the table after truncate.

OTOH, if the table is very small, TRUNCATE doesn't increase
n_mod_since_analyze so much. So analyze might not be triggered soon.
But this might be ok because the statistics collected before truncate is
not so "different" from the status of the table after truncate.

I'm not sure how much this idea is "reliable" and would be helpful in
practice, though.
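
As a very rough sketch of that idea (hypothetical; the helper and field
names below are illustrative, not the real pgstat API), the truncate
path could credit the old contents to the same per-table counter that a
mass DELETE would bump:

    /* Hypothetical: account for the truncated contents as if they had
     * been deleted, so that autoanalyze sees the change. */
    static void
    pgstat_count_truncate_as_delete(Relation rel)
    {
        if (rel->pgstat_info != NULL)
            rel->pgstat_info->t_counts.t_changed_tuples +=
                (PgStat_Counter) rel->rd_rel->reltuples;
    }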

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PROXY protocol support

2021-03-05 Thread Magnus Hagander
On Fri, Mar 5, 2021 at 1:33 AM Álvaro Hernández  wrote:
>
>
>
> On 5/3/21 0:21, Jacob Champion wrote:
> > On Thu, 2021-03-04 at 21:45 +0100, Magnus Hagander wrote:
> >> On Thu, Mar 4, 2021 at 9:07 PM Jacob Champion  wrote:
> >>> Idle thought I had while setting up a local test rig: Are there any
> >>> compelling cases for allowing PROXY packets to arrive over Unix
> >>> sockets? (By which I mean, the proxy is running on the same machine as
> >>> Postgres, and connects to it using the .s.PGSQL socket file instead of
> >>> TCP.) Are there cases where you want some other software to interact
> >>> with the TCP stack instead of Postgres, but it'd still be nice to have
> >>> the original connection information available?
> >> I'm uncertain what that usecase would be for something like haproxy,
> >> tbh. It can't do connection pooling, so adding it on the same machine
> >> as postgres itself wouldn't really add anything, I think?
> > Yeah, I wasn't thinking HAproxy so much as some unspecified software
> > appliance that's performing Some Task before allowing a TCP client to
> > speak to Postgres. But it'd be better to hear from someone that has an
> > actual use case, instead of me spitballing.
>
> Here's a use case: Envoy's Postgres filter (see [1], [2]). Right now it
> is able to capture protocol-level metrics and send them to a metrics
> collector (eg. Prometheus) while proxying the traffic. More capabilities
> are being added as of today, and will eventually manage HBA too. It
> would greatly benefit from this proposal, since it proxies the traffic
> with, obviously, its IP, not the client's. It may be used (we do)
> locally fronting Postgres, via UDS (so it can be easily trusted).

Yeah, Envoy is definitely a great example of a usecase for the proxy
protocol in general.

Specifically about the Unix socket though -- doesn't envoy normally
run on a different instance (or in a different container at least),
thus normally uses tcp between envoy and postgres? Or would it be a
reasonable usecase that you ran it locally on the postgres server,
having it speak IP to the clients but unix sockets to the postgres
backend? I guess maybe it is outside of the containerized world?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: PROXY protocol support

2021-03-05 Thread Magnus Hagander
On Fri, Mar 5, 2021 at 12:57 AM Hannu Krosing  wrote:
>
> The current proposal seems to miss the case of transaction pooling
> (and statement pooling) where the same established connection
> multiplexes transactions / statements from multiple remote clients.

Not at all.

The current proposal is there to implement the PROXY protocol. It
doesn't try to do anything with connection pooling at all.

Solving a similar problem for connection poolers would also definitely
be a useful thing, but it is entirely out of scope of this patch, and
is a completely separate implementation.

I'd definitely like to see that one solved as well, but let's look at
it on a different thread so we don't derail this one.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Thomas Munro
On Fri, Mar 5, 2021 at 5:40 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Fri, Mar 5, 2021 at 5:10 PM Tom Lane  wrote:
> >> Alternatively, maybe we can salvage the function's usefulness by making it
> >> flush WAL before returning?
>
> > To make pg_xact_status()'s result reliable, don't you need to make
> > pg_current_xact_id() flush?  In other words, isn't it at the point
> > that you *observe* the transaction that you have to make sure that
> > this transaction ID won't be reused after crash recovery.  Before
> > that, it's simultaneously allocated and unallocated, like the cat.
>
> We need to be sure that the XID is written out to WAL before we
> let the client see it, yeah.  I've not looked to see exactly
> where in the code would be the best place.

One idea would be for TransactionStateData's bool member didLogXid to
become an LSN, initially invalid, that points one past the first
record logged for this transaction, maintained by
MarkCurrentTransactionIdLoggedIfAny().  Then, pg_current_xact_id()
(and any similar xid-reporting function that we deem to be an
officially sanctioned way for a client to "observe" xids) could check
if it's valid yet; if not, it could go ahead and log something
containing the xid to make it valid.  Then it could flush the log up
to that LSN.
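
A sketch of what the observing function could then do (the names marked
hypothetical are not existing APIs):

    /*
     * Hypothetical sketch for pg_current_xact_id(): before handing the
     * xid to the client, make sure WAL carrying this xid is on disk so
     * the xid cannot be reused after crash recovery.
     */
    XLogRecPtr  xid_lsn = GetCurrentXidLoggedLSN();     /* hypothetical */

    if (XLogRecPtrIsInvalid(xid_lsn))
        xid_lsn = LogXidAssignmentRecord();             /* hypothetical */

    XLogFlush(xid_lsn);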




Re: Is it useful to record whether plans are generic or custom?

2021-03-05 Thread Fujii Masao




On 2021/02/08 14:02, torikoshia wrote:

On 2021-02-04 11:19, Kyotaro Horiguchi wrote:

At Thu, 04 Feb 2021 10:16:47 +0900, torikoshia
 wrote in

Chengxi Sun, Yamada-san, Horiguchi-san,

Thanks for all your comments.
Adding only the number of generic plan execution seems acceptable.

On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi
 wrote:
> Note that ActivePortal is the closest nested portal. So it gives the
> wrong result for nested portals.

I may be wrong, but I thought it was ok since the closest nested
portal is the portal to be executed.


After executing the inner-most portal, is_plan_type_generic holds the
value for the inner-most portal and is never changed afterwards. The
ExecutorEnd of all the upper portals then sees the value left behind by
the inner-most portal in is_plan_type_generic, even though the portals
at every nest level are independent.


ActivePortal is used in ExecutorStart hook in the patch.
And as far as I read PortalStart(), ActivePortal is changed to the
portal to be executed before ExecutorStart().

If possible, could you tell me the specific case which causes wrong
results?


Running a plpgsql function that does PREPARE in a query that does
PREPARE?


Thanks for your explanation!

I confirmed that it in fact happened.

To avoid it, attached patch preserves the is_plan_type_generic before changing 
it and sets it back at the end of pgss_ExecutorEnd().

Any thoughts?


I just tried this feature. When I set plan_cache_mode to force_generic_plan
and executed the following queries, I found that 
pg_stat_statements.generic_calls
and pg_prepared_statements.generic_plans were not the same.
Is this behavior expected? I was thinking that they are basically the same.


DEALLOCATE ALL;
SELECT pg_stat_statements_reset();
PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;
EXECUTE hoge(1);
EXECUTE hoge(1);
EXECUTE hoge(1);

SELECT generic_plans, statement FROM pg_prepared_statements WHERE statement 
LIKE '%hoge%';
 generic_plans |   statement
---+
 3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;

SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE 
'%hoge%';
 calls | generic_calls | query
---+---+---
 3 | 2 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE 
aid = $1




When I executed the prepared statements via EXPLAIN ANALYZE, I found
pg_stat_statements.generic_calls was not incremented. Is this behavior expected?
Or we should count generic_calls even when executing the queries via 
ProcessUtility()?

DEALLOCATE ALL;
SELECT pg_stat_statements_reset();
PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;
EXPLAIN ANALYZE EXECUTE hoge(1);
EXPLAIN ANALYZE EXECUTE hoge(1);
EXPLAIN ANALYZE EXECUTE hoge(1);

SELECT generic_plans, statement FROM pg_prepared_statements WHERE statement 
LIKE '%hoge%';
 generic_plans |   statement
---+
 3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;

SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE 
'%hoge%';
 calls | generic_calls | query
---+---+---
 3 | 0 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE 
aid = $1
 3 | 0 | EXPLAIN ANALYZE EXECUTE hoge(1)


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Is it useful to record whether plans are generic or custom?

2021-03-05 Thread Fujii Masao




On 2021/01/28 8:11, Tatsuro Yamada wrote:

Hi Torikoshi-san,

On 2021/01/12 20:36, torikoshia wrote:

I suppose it would be normal practice to store past results of
pg_stat_statements for future comparisons.
If this is the case, I think that if we only add the number of
generic plan execution, it will give us a hint to notice the cause
of performance degradation due to changes in the plan between
generic and custom.

For example, if there is a clear difference in the number of times
the generic plan is executed between before and after performance
degradation as below, it would be natural to check if there is a
problem with the generic plan.

...

Attached a patch that just adds a generic call counter to
pg_stat_statements.



I think that I'd like to use the view when we faced a performance
problem and find the reason. If we did the fixed-point observation
(should I say time-series analysis?) of generic_calls, it allows us to
realize the counter changes, and we can know whether the suspect is
generic_plan or not. So the patch helps DBA, I believe.


In that use case maybe what you actually want to see is whether the plan was
changed or not, rather than whether generic plan or custom plan is used?
If so, it's better to expose seq_scan (num of sequential scans processed by
the query) and idx_scan (num of index scans processed by the query) like
pg_stat_all_tables, per query in pg_stat_statements?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Kyotaro Horiguchi
At Fri, 5 Mar 2021 13:20:53 +0500, Andrey Borodin  wrote 
in 
> > On 5 March 2021, at 13:00, Kyotaro Horiguchi
> > wrote:
> > 
> > The problem records have 15 pages of FPIs.  The reduction of their
> > size may prevent WAL-buffer wrap around and wal writes.  If no wal is
> > written the test fails.
> Thanks, I've finally understood the root cause.
> So, test verifies guarantee that is not provided (durability of aborted 
> transactions)?

I think that's right.

> Maybe flip it to test that transaction effects are not committed\visible?

Maybe not. The objective of the test is to check whether a maybe-committed
transaction at crash time ends up committed or aborted, without directly
confirming the result data, I think.  And that feature is found not to
be working as expected.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 011_crash_recovery.pl failes using wal_block_size=16K

2021-03-05 Thread Kyotaro Horiguchi
Oops! I forgot that the issue starts from this mail.

At Thu, 4 Mar 2021 22:34:38 +0800, "walker"  wrote in 
> 011_crash_recovery.pl ..
> 1..3
> ok 1 - own xid is in-progress
> not ok 2 - new xid after restart is greater

> But if I modified something in t/011_crash_recovery.pl, this perl script 
> works fine, as follows:
> is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
>    'in progress', 'own xid is in-progress');
> 
> 
> sleep(1);  # newly added here, just to make sure the CREATE TABLE WAL record
> can be flushed to the WAL segment file on disk.

The sleep lets the unwritten WAL records go out to disk (buffer).

> I think the problem is that before the crash (simulated by a stop in immediate
> mode), the WAL of "create table mine" didn't get flushed into the WAL file on
> disk. Instead, if we delay some time, e.g. 200ms or more, after issuing CREATE
> TABLE, in theory the data in the WAL buffer should be written to disk by the
> WAL writer.

Right.

> However, I'm not sure of the root cause. What's the difference between
> wal_blocksize=8k and wal_blocksize=16k when flushing WAL buffer data to disk?

I'm sorry that I didn't follow this message.  However, the explanation
is in the following mail.

https://www.postgresql.org/message-id/20210305.135342.384699732619433016.horikyota.ntt%40gmail.com

In short, the doubled block size prevents wal-writes from happen.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Andrey Borodin



> On 5 March 2021, at 13:00, Kyotaro Horiguchi
> wrote:
> 
> The problem records have 15 pages of FPIs.  The reduction of their
> size may prevent WAL-buffer wrap around and wal writes.  If no wal is
> written the test fails.
Thanks, I've finally understood the root cause.
So, test verifies guarantee that is not provided (durability of aborted 
transactions)?
Maybe flip it to test that transaction effects are not committed\visible?

Best regards, Andrey Borodin.



Re: Which PG version does CVE-2021-20229 affected?

2021-03-05 Thread Thomas Kellerer
Michael Paquier wrote on 05.03.2021 at 08:38:
> On Fri, Mar 05, 2021 at 12:32:43AM -0700, bchen90 wrote:
>> NVD link:
>>
>> https://nvd.nist.gov/vuln/detail/CVE-2021-20229#vulnCurrentDescriptionTitle
>
> This link includes incorrect information.  CVE-2021-20229 is only a
> problem in 13.0 and 13.1, fixed in 13.2.  Please see for example here:
> https://www.postgresql.org/support/security/
>
> The commit that fixed the issue is c028faf, mentioning 9ce77d7 as the
> origin point, a commit introduced in Postgres 13.

I think the information is correct as it says "Up to (excluding) 13.2"

I understand the "(excluding)" part, such that the "excluded" version
is _not_ affected by it.

But it's really a confusing way to present that kind of information.





Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-03-05 Thread Dmitry Dolgov
> On Fri, Mar 05, 2021 at 10:22:45AM +0800, Andy Fan wrote:
> > > I checked again and found I do miss the check on JoinExpr->quals.  I have
> > > fixed it in v3 patch. Thanks for the review!
> > >
> > > In the attached v3,  commit 1 is the real patch, and commit 2 is just add
> > > some logs to help local testing.  notnull.sql/notnull.out is the test
> > case
> > > for
> > > this patch, both commit 2 and notnull.* are not intended to be committed
> > > at last.
> >
> > Just to clarify, this version of notnullattrs here is the latest one,
> > and another one from "UniqueKey on Partitioned table" thread should be
> > disregarded?
> >
>
> Actually they are different sections for UniqueKey.  Since I don't want to
> mess
> two topics in one place, I open another thread.  The topic here is how to
> represent
> a not null attribute, which is a precondition for all UniqueKey stuff.  The
> thread
> " UniqueKey on Partitioned table[1] " is talking about how to maintain the
> UniqueKey on a partitioned table only.

Sure, those two threads are addressing different topics. But [1] also
includes the patch for notnullattrs (I guess it's the same as one of the
older versions from this thread), so it would be good to specify which
one should be used to avoid any confusion.

> > I'm not sure about this, but couldn't be there still
> > some cases when a Var belongs to nullable_baserels, but still has some
> > constraints preventing it from being nullable (e.g. a silly example when
> > the not nullable column belong to the table, and the query does full
> > join of this table on itself using this column)?
> >
> > Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a
> WHERE
> t2.b = 3; "?   In this case, the outer join will be reduced to inner join
> at
> reduce_outer_join stage, which means t2 will not be shown in
> nullable_baserels.

Nope, as I said it's a bit useless example of full self join t1 on
itself. In this case not null column "a" will be considered as nullable,
but following your description for is_var_nullable it's fine (although
couple of commentaries to this function are clearly necessary).

> > Is this function necessary for the following patches? I've got an
> > impression that the discussion in this thread was mostly evolving about
> > correct description when notnullattrs could be used, not making it
> > bullet proof.
> >
>
> Exactly, that is the blocker issue right now. I hope more authorities can
> give
> some suggestions to move on.

Hm... why is what's essentially a documentation question the blocker? Or if you
mean it's a question of the patch scope, are there any arguments for
extending it?




Re: Add support for PROVE_FLAGS and PROVE_TESTS in MSVC scripts

2021-03-05 Thread Julien Rouhaud
On Fri, Mar 5, 2021 at 9:28 AM Michael Paquier  wrote:
>
> On Thu, Mar 04, 2021 at 06:37:56PM +0100, Juan José Santamaría Flecha wrote:
> > LGTM.
>
> Thanks.  I have tested more combinations of options, came back a bit
> to the documentation for buildenv.pl where copying the values from the
> docs would result in a warning, and applied it.

Great news!




Re: [PoC] Non-volatile WAL buffer

2021-03-05 Thread Takashi Menjo
Hi Tomas,

Thank you so much for your report. I have read it with great interest.

Your conclusion sounds reasonable to me. My patchset you call "NTT /
segments" got as good performance as "NTT / buffer" patchset. I have
been worried that calling mmap/munmap for each WAL segment file could
have a lot of overhead. Based on your performance tests, however, the
overhead looks less than what I thought. In addition, "NTT / segments"
patchset is more compatible to the current PG and more friendly to
DBAs because that patchset uses WAL segment files and does not
introduce any other new WAL-related file.

I also think that supporting both file I/O and mmap is better than
supporting only mmap. I will continue my work on "NTT / segments"
patchset to support both ways.

In the following, I will answer "Issues & Questions" you reported.


> While testing the "NTT / segments" patch, I repeatedly managed to crash the 
> cluster with errors like this:
>
> 2021-02-28 00:07:21.221 PST client backend [3737139] WARNING:  creating 
> logfile segment just before
> mapping; path "pg_wal/00010007002F"
> 2021-02-28 00:07:21.670 PST client backend [3737142] WARNING:  creating 
> logfile segment just before
> mapping; path "pg_wal/000100070030"
> ...
> 2021-02-28 00:07:21.698 PST client backend [3737145] WARNING:  creating 
> logfile segment just before
> mapping; path "pg_wal/000100070030"
> 2021-02-28 00:07:21.698 PST client backend [3737130] PANIC:  could not open 
> file
> "pg_wal/000100070030": No such file or directory
>
> I do believe this is a thinko in the 0008 patch, which does XLogFileInit in 
> XLogFileMap. Notice there are multiple
> "creating logfile" messages with the ..0030 segment, followed by the failure. 
> AFAICS the XLogFileMap may be
> called from multiple backends, so they may call XLogFileInit concurrently, 
> likely triggering some sort of race
> condition. It's fairly rare issue, though - I've only seen it twice from ~20 
> runs.

Thank you for your report. I found that rather the patch 0009 has an
issue, and that will also cause WAL loss. I should have set
use_existent to true, or InstallXlogFileSegment and BasicOpenFile in
XLogFileInit can be racy. I have misunderstood that use_existent can
be true because I am creating a brand-new file with XLogFileInit.

I will fix the issue.
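
A minimal sketch of that fix, assuming the existing XLogFileInit()
interface (which takes use_existent by reference):

    /* Let XLogFileInit() reuse a segment that another backend may have
     * installed concurrently, instead of insisting on a brand-new file. */
    bool        use_existent = true;
    int         fd;

    fd = XLogFileInit(segno, &use_existent, true);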


> The other question I have is about WALInsertLockUpdateInsertingAt. 0003 
> removes this function, but leaves
> behind some of the other bits working with insert locks and insertingAt. But 
> it does not explain how it works without
> WaitXLogInsertionsToFinish() - how does it ensure that when we commit 
> something, all the preceding WAL is
> "complete" (i.e. written by other backends etc.)?

It waits for *all* the WALInsertLocks to be released, no matter whether each of
them precedes or follows the current insertion.

It would have worked functionally, but I rethink it is not good for
performance because XLogFileMap in GetXLogBuffer (where
WaitXLogInsertionsToFinish is removed) can block because it can
eventually call write() in XLogFileInit.

I will restore the WALInsertLockUpdateInsertingAt function and related
code for mmap.


Best regards,
Takashi


On Tue, Mar 2, 2021 at 5:40 AM Tomas Vondra
 wrote:
>
> Hi,
>
> I've performed some additional benchmarking and testing on the patches
> sent on 26/1 [1], and I'd like to share some interesting results.
>
> I did the tests on two different machines, with slightly different
> configurations. Both machines use the same CPU generation with slightly
> different frequency, a different OS (Ubuntu vs. RH), kernel (5.3 vs.
> 4.18) and so on. A more detailed description is in the attached PDF,
> along with the PostgreSQL configuration.
>
> The benchmark is fairly simple - pgbench with scale 500 (fits into
> shared buffers) and 5000 (fits into RAM). The runs were just 1 minute
> each, which is fairly short - it's however intentional, because I've
> done this with both full_page_writes=on/off to test how this behaves
> with many and no FPIs. This models extreme behaviors at the beginning
> and at the end of a checkpoint.
>
> This thread is rather confusing because there are far too many patches
> with over-lapping version numbers - even [1] contains two very different
> patches. I'll refer to them as "NTT / buffer" (for the patch using one
> large PMEM buffer) and "NTT / segments" for the patch using regular WAL
> segments.
>
> The attached PDF shows all these results along with charts. The two
> systems have a bit different performance (throughput), the conclusions
> seem to be mostly the same, so I'll just talk about results from one of
> the systems here (aka "System A").
>
> Note: Those systems are hosted / provided by Intel SDP, and Intel is
> interested in providing access to other devs interested in PMEM.
>
> Furthermore, these patches seem to be very insensitive to WAL segment
> size (unlike the experimental patches I shared some time ago), 

Re: 011_crash_recovery.pl intermittently fails

2021-03-05 Thread Kyotaro Horiguchi
(Sorry for my slippery fingers.)

At Fri, 5 Mar 2021 10:07:06 +0500, Andrey Borodin  wrote 
in 
> Maybe it's offtopic here, but anyway...
> While working on "lz4 for FPIs" I've noticed that this test fails with 
> wal_compression = on.
> I did not investigate the case at that moment, but I think that it would be 
> good to run recovery regression tests with compression too.
> 
> Best regards, Andrey Borodin.

The problem records have 15 pages of FPIs.  The reduction of their
size may prevent WAL-buffer wrap around and wal writes.  If no wal is
written the test fails.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center