Re: Column Filtering in Logical Replication

2021-09-03 Thread Amit Kapila
On Sat, Sep 4, 2021 at 10:12 AM Amit Kapila  wrote:
>
> On Thu, Sep 2, 2021 at 2:51 AM Rahila Syed  wrote:
> >
> >>>
> >>> Do we want to consider that the columns specified in the filter must
> >>> not have NOT NULL constraint? Because, otherwise, the subscriber will
> >>> error out inserting such rows?
> >>>
> >> I think you mean columns *not* specified in the filter must not have NOT 
> >> NULL constraint
> >> on the subscriber, as this will break during insert, as it will try to 
> >> insert NULL for columns
> >> not sent by the publisher.
> >> I will look into fixing this. Probably this won't be a problem in
> >> case the column is auto generated or contains a default value.
> >>
> >
> > I am not sure if this needs to be handled. Ideally, we need to prevent the 
> > subscriber tables from having a NOT NULL
> > constraint if the publisher uses column filters to publish the values of 
> > the table. There is no way
> > to do this at the time of creating a table on subscriber.
> >
> > As this involves querying the publisher for this information, it can be 
> > done at the time of initial table synchronization.
> > i.e. error out if any of the subscribed tables has a NOT NULL constraint
> > on non-filter columns.
> > This will lead to the user dropping and recreating the subscription after 
> > removing the
> > NOT NULL constraint from the table.
> > I think the same can be achieved by doing nothing and letting the 
> > subscriber error out while inserting rows.
> >
>
> That makes sense, and it is also quite possible that users don't have
> such columns in the tables on subscribers. I guess we can add such a
> recommendation in the docs instead of doing anything in the code.
>
> Few comments:
> 
>

Did you give any thought to my earlier suggestion related to the syntax [1]?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1J9b_0_PMnJ2jq9E55bcbmTKdUmy6jPnkf1Zwy2jxah_g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2021-09-03 Thread Amit Kapila
On Thu, Sep 2, 2021 at 2:51 AM Rahila Syed  wrote:
>
>>>
>>> Do we want to consider that the columns specified in the filter must
>>> not have NOT NULL constraint? Because, otherwise, the subscriber will
>>> error out inserting such rows?
>>>
>> I think you mean columns *not* specified in the filter must not have NOT 
>> NULL constraint
>> on the subscriber, as this will break during insert, as it will try to 
>> insert NULL for columns
>> not sent by the publisher.
>> I will look into fixing this. Probably this won't be a problem in
>> case the column is auto generated or contains a default value.
>>
>
> I am not sure if this needs to be handled. Ideally, we need to prevent the 
> subscriber tables from having a NOT NULL
> constraint if the publisher uses column filters to publish the values of the 
> table. There is no way
> to do this at the time of creating a table on subscriber.
>
> As this involves querying the publisher for this information, it can be done 
> at the time of initial table synchronization.
> i.e. error out if any of the subscribed tables has a NOT NULL constraint
> on non-filter columns.
> This will lead to the user dropping and recreating the subscription after 
> removing the
> NOT NULL constraint from the table.
> I think the same can be achieved by doing nothing and letting the subscriber 
> error out while inserting rows.
>

That makes sense, and it is also quite possible that users don't have
such columns in the tables on subscribers. I guess we can add such a
recommendation in the docs instead of doing anything in the code.
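
To make the failure mode concrete, a minimal sketch (the column-filter
syntax is the one proposed by this patch, and all object names are
illustrative):

  -- On the publisher: only (id, a) are published.
  CREATE TABLE t (id int PRIMARY KEY, a text, b text);
  CREATE PUBLICATION pub FOR TABLE t (id, a);

  -- On the subscriber: "b" is NOT NULL and has no default.
  CREATE TABLE t (id int PRIMARY KEY, a text, b text NOT NULL);

  -- Apply then fails with a not-null violation on "b", since the
  -- publisher never sends it; a DEFAULT or generated column on "b"
  -- avoids the error.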

Few comments:

1.
+
+ /*
+ * Cannot specify column filter when REPLICA IDENTITY IS FULL
+ * or if column filter does not contain REPLICA IDENITY columns
+ */
+ if (targetcols != NIL)
+ {
+ if (replidentfull)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add relation \"%s\" to publication",
+ RelationGetRelationName(targetrel)),
+ errdetail("Cannot have column filter with REPLICA IDENTITY FULL")));

Why do we want to have such a restriction for REPLICA IDENTITY FULL? I
think it is better to expand the comments in that regard.
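
For reference, a sketch of the case the check rejects (filter syntax as
proposed in the patch); whether the reason is that REPLICA IDENTITY FULL
treats the whole old row as the identity, which a column filter would
truncate, is exactly what the expanded comment should spell out:

  CREATE TABLE t (id int, a text, b text);
  ALTER TABLE t REPLICA IDENTITY FULL;
  CREATE PUBLICATION pub FOR TABLE t (id, a);  -- rejected by the patch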

2.
@@ -839,7 +839,6 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  ereport(ERROR,
  (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
  errmsg("extra data after last expected column")));
-
  fieldno = 0;

@@ -944,7 +992,6 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
  flags |= LOGICALREP_IS_REPLICA_IDENTITY;

  pq_sendbyte(out, flags);
-
  /* attribute name */
  pq_sendstring(out, NameStr(att->attname));

@@ -953,6 +1000,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)

  /* attribute mode */
  pq_sendint32(out, att->atttypmod);
+
  }

Spurious line removals and addition.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2021-09-03 Thread Amit Kapila
On Thu, Sep 2, 2021 at 2:19 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-02, Rahila Syed wrote:
>
> > After thinking about this, I think it is best to remove the entire table
> > from publication,
> > if a column specified in the column filter is dropped from the table.
>
> Hmm, I think it would be cleanest to give responsibility to the user: if
> the column to be dropped is in the filter, then raise an error, aborting
> the drop.
>

Do you think that will make sense if the user used CASCADE (ALTER
TABLE ... DROP COLUMN ... CASCADE)?
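
In SQL terms, the two behaviors being weighed (assuming column "a" is in
t's column filter; filter syntax as proposed in the patch):

  ALTER TABLE t DROP COLUMN a;           -- Alvaro's proposal: raise an error
  ALTER TABLE t DROP COLUMN a CASCADE;   -- with CASCADE, erroring seems
                                         -- wrong; adjust or drop the
                                         -- publication entry instead?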

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-09-03 Thread Amit Kapila
On Sat, Sep 4, 2021 at 8:54 AM Amit Kapila  wrote:
>
> On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger  
> wrote:
> >
> > > On Aug 30, 2021, at 12:06 AM, Masahiko Sawada  
> > > wrote:
> > >
> > > I've attached rebased patches.
> > For the v12-0003 patch:
> >
> > I believe this feature is needed, but it also seems like a very powerful 
> > foot-gun.  Can we do anything to make it less likely that users will hurt 
> > themselves with this tool?
> >
>
This won't do any more harm than what users can already do via
pg_replication_slot_advance, and the same is documented as well; see
[1].
>

Sorry, forgot to give the link.

[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-09-03 Thread Amit Kapila
On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger  wrote:
>
> > On Aug 30, 2021, at 12:06 AM, Masahiko Sawada  wrote:
> >
> > I've attached rebased patches.
> For the v12-0003 patch:
>
> I believe this feature is needed, but it also seems like a very powerful 
> foot-gun.  Can we do anything to make it less likely that users will hurt 
> themselves with this tool?
>

This won't do any more harm than what users can already do via
pg_replication_slot_advance, and the same is documented as well; see
[1]. This will be allowed only to superusers. Its effect will be
documented with a precautionary note to use it only when the other
available ways can't be used. Any better ideas?
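
For reference, the escape hatch the linked docs describe is to advance
the subscription's replication origin past the failing transaction on
the subscriber; a sketch, with placeholder origin name and LSN:

  SELECT pg_replication_origin_advance('pg_16389', '0/1D34F48'::pg_lsn);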

> I am thinking back to support calls I have attended.  When a production 
> system is down, there is often some hesitancy to perform ad-hoc operations on 
> the database, but once the decision has been made to do so, people try to get 
> the whole process done as quickly as possible.  If multiple transactions on 
> the publisher fail on the subscriber, they will do so in series, not in 
> parallel.
>

The subscriber will know about only one transaction failure at a time;
until that is resolved, apply won't move ahead, and it won't know
whether other transactions are going to fail in the future.

>
> If the user could instead clear all failed transactions of the same type, 
> that might make it less likely that they unthinkingly also skip subsequent 
> errors of some different type.  Perhaps something like ALTER SUBSCRIPTION ... 
> SET (skip_failures = 'duplicate key value violates unique constraint 
> "test_pkey"')?
>

I think, if we want, we can allow skipping a particular error via
skip_error_code instead of via error message, but I am not sure it
would be better to skip a particular operation of the transaction
rather than the entire transaction. For atomicity purposes, a
transaction can normally be either committed or rolled back, but not
partially done, so I think it would be preferable to skip the entire
transaction rather than skipping it partially.
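
That is, skipping happens at transaction granularity, roughly like this
(option name per the proposed patch; the xid value is a placeholder the
user would read from the subscriber's error log):

  ALTER SUBSCRIPTION mysub SET (skip_xid = '739');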

>  This is arguably a different feature request, and not something your patch 
> is required to address, but I wonder how much we should limit people shooting 
> themselves in the foot?  If we built something like this using your skip_xid 
> feature, rather than instead of your skip_xid feature, would your feature 
> need to be modified?
>

Sawada-san can answer better, but I don't see any problem building any
such feature on top of what is currently proposed.

>
> I'm having trouble thinking of an example conflict where skipping a 
> transaction would be better than writing a BEFORE INSERT trigger on the 
> conflicting table which suppresses or redirects conflicting rows somewhere 
> else.  Particularly for larger transactions containing multiple statements, 
> suppressing the conflicting rows using a trigger would be less messy than 
> skipping the transaction.  I think your patch adds a useful tool to the 
> toolkit, but maybe we should mention more alternatives in the docs?  
> Something like, "changing the data on the subscriber so that it doesn't 
> conflict with incoming changes, or dropping the conflicting constraint or 
> unique index, or writing a trigger on the subscriber to suppress or redirect 
> conflicting incoming changes, or as a last resort, by skipping the whole 
> transaction"?
>

+1 for extending the docs as per this suggestion.
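
A sketch of the trigger-based alternative described above (the table,
key, and trigger names are assumptions for illustration). Note ENABLE
ALWAYS, since ordinary triggers don't fire during logical replication
apply:

  CREATE FUNCTION suppress_conflicts() RETURNS trigger AS $$
  BEGIN
      IF EXISTS (SELECT 1 FROM test WHERE id = NEW.id) THEN
          RETURN NULL;    -- silently drop the conflicting incoming row
      END IF;
      RETURN NEW;
  END $$ LANGUAGE plpgsql;

  CREATE TRIGGER suppress_dups BEFORE INSERT ON test
      FOR EACH ROW EXECUTE FUNCTION suppress_conflicts();
  ALTER TABLE test ENABLE ALWAYS TRIGGER suppress_dups;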

-- 
With Regards,
Amit Kapila.




Re: when the startup process doesn't (logging startup delays)

2021-09-03 Thread Justin Pryzby
On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> Please find the updated patch attached.

Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=rwtpyhc7-...@mail.gmail.com

I agree with Robert that a standby server should not continuously show timing
messages for WAL replay.

Some doc comments:

+ Sets the time interval between each progress update of the operations
+ performed during startup process. This produces the log message after

Either say "performed by the startup process" or "performed during startup".

s/the/a/

+ every interval of time for the operations that take longer time. The unit

..for those operations which take longer than the specified duration.

+ used to specify the value is seconds. For example, if you set it to
+ <literal>10s</literal>, then after every <literal>10s</literal> there

remove "after"

+ is a log message indicating which operation is going on and what is the

say "..every 10s, a log is emitted indicating which operation is ongoing, and
the elapsed time from the beginning of the operation.."

+ elapsed time from beginning. If the value is set to <literal>0</literal>,
+ then it logs all the available messages for such operations. <literal>-1</literal>

"..then all messages for such operations are logged."

+ disables the feature. The default value is set to <literal>10s</literal>.

"The default value is <literal>10s</literal>."




Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters

2021-09-03 Thread Tom Lane
Phil Krylov  writes:
> Attaching the new version, with the test case and free-before-exit 
> removed.

Pushed with minor cosmetic adjustments.  Thanks for the patch!

regards, tom lane




Re: prevent immature WAL streaming

2021-09-03 Thread Andres Freund
Hi,

On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote:
> From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera 
> Date: Tue, 31 Aug 2021 20:55:10 -0400
> Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready" files
>  too early"

> This reverts commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b.

I'd prefer to see this pushed soon. I've a bunch of patches to xlog.c that
conflict with the prior changes, and rebasing back and forth isn't that much
fun...



> From f767cdddb3120f1f6c079c8eb00eaff38ea98c79 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera 
> Date: Thu, 2 Sep 2021 17:21:46 -0400
> Subject: [PATCH v3 2/5] Implement FIRST_IS_ABORTED_CONTRECORD
>
> ---
>  src/backend/access/transam/xlog.c   | 53 +++--
>  src/backend/access/transam/xlogreader.c | 39 +-
>  src/include/access/xlog_internal.h  | 14 ++-
>  src/include/access/xlogreader.h |  3 ++
>  4 files changed, 103 insertions(+), 6 deletions(-)
>
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index e51a7a749d..411f1618df 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -586,6 +586,8 @@ typedef struct XLogCtlData
>   XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any 
> slot */
>
>   XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG 
> segment */
> + XLogRecPtr  abortedContrecordPtr; /* LSN of incomplete record at 
> end of
> +* 
> WAL */
>
>   /* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
>   XLogRecPtr  unloggedLSN;
> @@ -848,6 +850,7 @@ static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
>  /* State information for XLOG reading */
>  static XLogRecPtr ReadRecPtr;/* start of last record read */
>  static XLogRecPtr EndRecPtr; /* end+1 of last record read */
> +static XLogRecPtr abortedContrecordPtr;  /* end+1 of incomplete record */
>
>  /*
>   * Local copies of equivalent fields in the control file.  When running
> @@ -2246,6 +2249,30 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool 
> opportunistic)
>   if (!Insert->forcePageWrites)
>   NewPage->xlp_info |= XLP_BKP_REMOVABLE;
>
> + /*
> +  * If the last page ended with an aborted partial continuation 
> record,
> +  * mark the new page to indicate that the partial record can be
> +  * omitted.
> +  *
> +  * This happens only once at the end of recovery, so there's no 
> race
> +  * condition here.
> +  */
> + if (XLogCtl->abortedContrecordPtr >= NewPageBeginPtr)
> + {

Can we move this case out of AdvanceXLInsertBuffer()? As the comment says,
this only happens at the end of recovery, so putting it into
AdvanceXLInsertBuffer() doesn't really seem necessary?


> +#ifdef WAL_DEBUG
> + if (XLogCtl->abortedContrecordPtr != NewPageBeginPtr)
> + elog(PANIC, "inconsistent aborted contrecord 
> location %X/%X, expected %X/%X",
> +  
> LSN_FORMAT_ARGS(XLogCtl->abortedContrecordPtr),
> +  LSN_FORMAT_ARGS(NewPageBeginPtr));
> + ereport(LOG,
> + (errmsg_internal("setting 
> XLP_FIRST_IS_ABORTED_PARTIAL flag at %X/%X",
> +  
> LSN_FORMAT_ARGS(NewPageBeginPtr;
> +#endif
> + NewPage->xlp_info |= XLP_FIRST_IS_ABORTED_PARTIAL;
> +
> + XLogCtl->abortedContrecordPtr = InvalidXLogRecPtr;
> + }

>   /*
>* If first page of an XLOG segment file, make it a long header.
>*/
> @@ -4392,6 +4419,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
>   record = XLogReadRecord(xlogreader, );
>   ReadRecPtr = xlogreader->ReadRecPtr;
>   EndRecPtr = xlogreader->EndRecPtr;
> + abortedContrecordPtr = xlogreader->abortedContrecordPtr;
>   if (record == NULL)
>   {
>   if (readFile >= 0)
> @@ -7691,10 +7719,29 @@ StartupXLOG(void)
>   /*
>* Re-fetch the last valid or last applied record, so we can identify 
> the
>* exact endpoint of what we consider the valid portion of WAL.
> +  *
> +  * When recovery ended in an incomplete record, continue writing from 
> the
> +  * point where it went missing.  This leaves behind an initial part of
> +  * broken record, which rescues downstream which have already received
> +  * that first part.
>*/
> - XLogBeginRead(xlogreader, LastRec);
> - record = ReadRecord(xlogreader, PANIC, false);
> - 

Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

2021-09-03 Thread Andres Freund
Hi,

On 2021-08-31 21:56:50 -0700, Andres Freund wrote:
> On 2021-08-27 13:57:45 +0900, Michael Paquier wrote:
> > On Wed, Aug 25, 2021 at 01:20:03AM -0700, Andres Freund wrote:
> > > On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> > > As I said before, this ship has long sailed:
> > >
> > > typedef struct PgStat_MsgTabstat
> > > {
> > >   PgStat_MsgHdr m_hdr;
> > >   Oid m_databaseid;
> > >   int m_nentries;
> > >   int m_xact_commit;
> > >   int m_xact_rollback;
> > >   PgStat_Counter m_block_read_time;   /* times in microseconds */
> > >   PgStat_Counter m_block_write_time;
> > >   PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
> > > } PgStat_MsgTabstat;
> >
> > Well, I kind of misread what you meant upthread then.
> > PgStat_MsgTabstat has a name a bit misleading, especially if you
> > assign connection stats to it.
>
> ISTM we should just do this fairly obvious change. Given that we already
> transport commit / rollback / IO stats, I don't see why the connection stats
> change anything to a meaningful degree. I'm fairly baffled why that's not the
> obvious thing to do for v14.

Here's how I think that would look. While writing up this draft, I found
two more issues:

- On windows / 32 bit systems, the session time would overflow if idle for
  longer than ~4300s. long is only 32 bit. Easy to fix obviously.
- Right now walsenders, including database connected walsenders, are not
  reported in connection stats. That doesn't seem quite right to me.

In the patch I made the connect message an explicitly reported message;
that seems cleaner, because it then happens at a clearly defined point. I
didn't do the same for disconnecting, but perhaps that would be better? Then
we could get rid of the whole pgStatSessionEndCause variable.

Greetings,

Andres Freund
>From 2b3cf32bd02ea4b73157db023d6475826e32c64a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 3 Sep 2021 17:02:15 -0700
Subject: [PATCH] wip: Reduce overhead of "pg_stat_database counters for
 sessions and session time"

"Fixes" 960869da0803

Author: Andres Freund 
Reviewed-By:
Discussion: https://postgr.es/m/20210901045650.fz4he2b3wx4p7...@alap3.anarazel.de
Backpatch:
---
 src/include/pgstat.h|  34 ++--
 src/backend/postmaster/pgstat.c | 171 
 src/backend/tcop/postgres.c |   2 +
 src/backend/utils/activity/backend_status.c |   4 +-
 4 files changed, 132 insertions(+), 79 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 509849c7ff4..a7b386821f6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -81,7 +81,8 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_DEADLOCK,
 	PGSTAT_MTYPE_CHECKSUMFAILURE,
 	PGSTAT_MTYPE_REPLSLOT,
-	PGSTAT_MTYPE_CONNECTION,
+	PGSTAT_MTYPE_CONNECT,
+	PGSTAT_MTYPE_DISCONNECT,
 } StatMsgType;
 
 /* --
@@ -279,7 +280,7 @@ typedef struct PgStat_TableEntry
  * --
  */
 #define PGSTAT_NUM_TABENTRIES  \
-	((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - 3 * sizeof(int) - 2 * sizeof(PgStat_Counter))	\
+	((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - 3 * sizeof(int) - 5 * sizeof(PgStat_Counter)) \
 	 / sizeof(PgStat_TableEntry))
 
 typedef struct PgStat_MsgTabstat
@@ -291,6 +292,9 @@ typedef struct PgStat_MsgTabstat
 	int			m_xact_rollback;
 	PgStat_Counter m_block_read_time;	/* times in microseconds */
 	PgStat_Counter m_block_write_time;
+	PgStat_Counter m_session_time;
+	PgStat_Counter m_active_time;
+	PgStat_Counter m_idle_in_xact_time;
 	PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
 } PgStat_MsgTabstat;
 
@@ -653,20 +657,26 @@ typedef struct PgStat_MsgChecksumFailure
 } PgStat_MsgChecksumFailure;
 
 /* --
- * PgStat_MsgConn			Sent by the backend to update connection statistics.
+ * PgStat_MsgConnect			Sent by the backend upon connection
+ *establishment
  * --
  */
-typedef struct PgStat_MsgConn
+typedef struct PgStat_MsgConnect
 {
 	PgStat_MsgHdr m_hdr;
 	Oid			m_databaseid;
-	PgStat_Counter m_count;
-	PgStat_Counter m_session_time;
-	PgStat_Counter m_active_time;
-	PgStat_Counter m_idle_in_xact_time;
-	SessionEndType m_disconnect;
-} PgStat_MsgConn;
+} PgStat_MsgConnect;
 
+/* --
+ * PgStat_MsgDisconnect			Sent by the backend when disconnecting
+ * --
+ */
+typedef struct PgStat_MsgDisconnect
+{
+	PgStat_MsgHdr m_hdr;
+	Oid			m_databaseid;
+	SessionEndType m_cause;
+} PgStat_MsgDisconnect;
 
 /* --
  * PgStat_Msg	Union over all possible messages.
@@ -700,7 +710,8 @@ typedef union PgStat_Msg
 	PgStat_MsgTempFile msg_tempfile;
 	PgStat_MsgChecksumFailure msg_checksumfailure;
 	PgStat_MsgReplSlot msg_replslot;
-	PgStat_MsgConn msg_conn;
+	PgStat_MsgConnect msg_connect;
+	PgStat_MsgDisconnect msg_disconnect;
 } PgStat_Msg;
 
 
@@ -1010,6 +1021,7 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t
 extern void 

Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
I thought that the way to get debug output for this new WAL code is to
use WAL_DEBUG; that way it won't bother anyone and we can remove it
later if necessary.

Also, I realized that the way to handle any error in the path that
assembles a record from contrecords is to set a flag that we can test
after the standard "err:" label; no need to create a new label.

I also wrote a lot more comments to try and explain what is going on and
why.

I'm still unsure about the two-flags reporting in xlogreader, so I put
that in a separate commit.  Opinions on that one?

The last commit is something I noticed in pg_rewind ...

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
>From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 31 Aug 2021 20:55:10 -0400
Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready" files
 too early"

This reverts commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b.
---
 src/backend/access/transam/timeline.c|   2 +-
 src/backend/access/transam/xlog.c| 220 ++-
 src/backend/access/transam/xlogarchive.c |  17 +-
 src/backend/postmaster/walwriter.c   |   7 -
 src/backend/replication/walreceiver.c|   6 +-
 src/include/access/xlog.h|   1 -
 src/include/access/xlogarchive.h |   4 +-
 src/include/access/xlogdefs.h|   1 -
 8 files changed, 24 insertions(+), 234 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index acd5c2431d..8d0903c175 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -452,7 +452,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	if (XLogArchivingActive())
 	{
 		TLHistoryFileName(histfname, newTLI);
-		XLogArchiveNotify(histfname, true);
+		XLogArchiveNotify(histfname);
 	}
 }
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..e51a7a749d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -724,18 +724,6 @@ typedef struct XLogCtlData
 	XLogRecPtr	lastFpwDisableRecPtr;
 
 	slock_t		info_lck;		/* locks shared variables shown above */
-
-	/*
-	 * Variables used to track segment-boundary-crossing WAL records.  See
-	 * RegisterSegmentBoundary.  Protected by segtrack_lck.
-	 */
-	XLogSegNo	lastNotifiedSeg;
-	XLogSegNo	earliestSegBoundary;
-	XLogRecPtr	earliestSegBoundaryEndPtr;
-	XLogSegNo	latestSegBoundary;
-	XLogRecPtr	latestSegBoundaryEndPtr;
-
-	slock_t		segtrack_lck;	/* locks shared variables shown above */
 } XLogCtlData;
 
 static XLogCtlData *XLogCtl = NULL;
@@ -932,7 +920,6 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 		   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
-static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1167,56 +1154,23 @@ XLogInsertRecord(XLogRecData *rdata,
 	END_CRIT_SECTION();
 
 	/*
-	 * If we crossed page boundary, update LogwrtRqst.Write; if we crossed
-	 * segment boundary, register that and wake up walwriter.
+	 * Update shared LogwrtRqst.Write, if we crossed page boundary.
 	 */
 	if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
 	{
-		XLogSegNo	StartSeg;
-		XLogSegNo	EndSeg;
-
-		XLByteToSeg(StartPos, StartSeg, wal_segment_size);
-		XLByteToSeg(EndPos, EndSeg, wal_segment_size);
-
-		/*
-		 * Register our crossing the segment boundary if that occurred.
-		 *
-		 * Note that we did not use XLByteToPrevSeg() for determining the
-		 * ending segment.  This is so that a record that fits perfectly into
-		 * the end of the segment causes the latter to get marked ready for
-		 * archival immediately.
-		 */
-		if (StartSeg != EndSeg && XLogArchivingActive())
-			RegisterSegmentBoundary(EndSeg, EndPos);
-
-		/*
-		 * Advance LogwrtRqst.Write so that it includes new block(s).
-		 *
-		 * We do this after registering the segment boundary so that the
-		 * comparison with the flushed pointer below can use the latest value
-		 * known globally.
-		 */
 		SpinLockAcquire(>info_lck);
+		/* advance global request to include new block(s) */
 		if (XLogCtl->LogwrtRqst.Write < EndPos)
 			XLogCtl->LogwrtRqst.Write = EndPos;
 		/* update local result copy while I have the chance */
 		LogwrtResult = XLogCtl->LogwrtResult;
 		SpinLockRelease(>info_lck);
-
-		/*
-		 * There's a chance that the record was already flushed to disk and we
-		 * missed marking segments as ready for archive.  If this happens, we
-		 * nudge the WALWriter, which will take care of notifying segments as
-		 * needed.
-		 */
-		if (StartSeg != EndSeg && 

Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-03 Thread Jacob Champion
Hi Tang,

On Fri, 2021-09-03 at 04:32 +, tanghy.f...@fujitsu.com wrote:
> I'd appreciate it if you can share your test results with me.

Sure! Here's my output (after a `make clean && make`):

cd . && TESTDIR='/home/pchampion/workspace/postgres/src/bin/psql' 
PATH="/home/pchampion/workspace/postgres/tmp_install/usr/local/pgsql-master/bin:$PATH"
 
LD_LIBRARY_PATH="/home/pchampion/workspace/postgres/tmp_install/usr/local/pgsql-master/lib"
  PGPORT='65432' 
PG_REGRESS='/home/pchampion/workspace/postgres/src/bin/psql/../../../src/test/regress/pg_regress'
 /usr/bin/prove -I ../../../src/test/perl/ -I .  t/*.pl
t/010_tab_completion.pl .. 17/? 
#   Failed test 'tab-completion after single quoted text input with equal 
sign'
#   at t/010_tab_completion.pl line 198.
# Actual output was "CREATE SUBSCRIPTION my_sub CONNECTION 'host=localhost 
port=5432 dbname=postgres' \a"
# Did not match "(?^:PUBLICATION)"
# Looks like you failed 1 test of 23.
t/010_tab_completion.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/23 subtests 
t/020_cancel.pl .. ok   

Test Summary Report
---
t/010_tab_completion.pl (Wstat: 256 Tests: 23 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
Files=2, Tests=25,  8 wallclock secs ( 0.02 usr  0.01 sys +  1.48 cusr  
0.37 csys =  1.88 CPU)
Result: FAIL
make: *** [Makefile:87: check] Error 1

Thanks,
--Jacob


Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-09-03 Thread Jacob Champion
On Fri, 2021-09-03 at 09:46 +0900, Michael Paquier wrote:
> Nice.  This is neat.  The split helps a lot to understand how you've
> changed things from the original implementation.  As a whole, this
> looks rather committable to me.

Great!

> One small-ish comment that I have is about all the .config files we
> have at the root of src/test/ssl/, bloating the whole.  I think that
> it would be a bit cleaner to put all of them in a different
> sub-directory, say just config/ or conf/.

That sounds reasonable. I won't be able to get to it before the holiday
weekend, but I can put up a patch sometime next week.

Thanks,
--Jacob


Re: A reloption for partitioned tables - parallel_workers

2021-09-03 Thread Amit Langote
On Sat, Sep 4, 2021 at 3:10 Laurenz Albe  wrote:

> On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:
> > > On 6 Apr 2021, at 09:46, Amit Langote  wrote:
> > > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe 
> wrote:
> >
> > > > I don't know if Seamus is still working on that; if not, we might
> > > > mark it as "returned with feedback".
> > >
> > > I have to agree given the time left.
> >
> > This thread has stalled and the patch no longer applies.  I propose that
> we
> > mark this Returned with Feedback, is that Ok with you Amit?
>
> +1.  That requires more thought.


Yes, I think so too.
-- 
Amit Langote
EDB: http://www.enterprisedb.com


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-03 Thread Andres Freund
Hi,

On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> Yeah, we can surely lock the relation as described by Robert, but IMHO,
> while creating the database we are already holding the exclusive lock on
> the database and there is no one else allowed to be connected to the
> database, so do we actually need to bother about the lock for the
> correctness?

The problem is that the checkpointer, bgwriter, and buffer reclaim don't
care about the database of the buffer they're working on... The exclusive
lock on the database doesn't change anything about that. Perhaps you can
justify that it's safe because there can't be any dirty buffers or such,
though.


> I think we already have such a code in multiple places where we bypass the
> shared buffers for copying the relation
> e.g. index_copy_data(), heapam_relation_copy_data().

That's not at all comparable. We hold an exclusive lock on the relation at
that point, and we don't have a separate implementation of reading tuples from
the table or something like that.

Greetings,

Andres Freund




Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally

2021-09-03 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Sep-03, Tom Lane wrote:
>> TBH, I'd sooner rip out SendStop, and simplify the related postmaster
>> logic.

> I wrote a patch to do that in 2012, after this exchange:
> https://postgr.es/m/1333124720-sup-6...@alvh.no-ip.org
> It obviously doesn't apply at all anymore, but the thing that prevented
> me from sending it was that I couldn't find what the mentioned feature
> was that would cause all backends to dump core at the time of a crash.

Oh, I think you misunderstood what I wrote.  I was thinking of the
ancient habit of most kernels to dump cores to a file just named
"core"; so that even if you went around and manually SIGABRT'd
each stopped process, the cores would all overwrite each other,
leaving you with little to show for the exercise.  Nowadays you're
more likely to get "core.NNN" for each PID, so that it could in
principle be useful to force all the backends to dump core for later
analysis.  But I know of no mechanism that would do that for you.

However, thinking about this afresh, it seems like that Berkeley-era
comment about "the wily post_hacker" was never very apropos.  If what
you wanted was a few GB of core files for later analysis, it'd make
more sense to have the postmaster send SIGABRT or the like.  That
saves a bunch of tedious manual steps, plus the cluster isn't left
in a funny state that requires yet more manual cleanup steps.

So I'm thinking that the *real* use-case for this is for developers
to attach with gdb and do on-the-fly investigation of the state of
other backends, rather than forcing core-dumps.  However, it's still
a pretty half-baked feature because there's no easy way to clean up
afterwards.

The other elephant in the room is that by the time the postmaster
has reacted to the initial backend crash, it's dubious whether the
state of other processes is still able to tell you much.  (IME,
at least, the postmaster doesn't hear about it until the kernel
has finished writing out the dying process's core image, which
takes approximately forever compared to modern CPU speeds.)

> So it seemed to me that we would be ripping out a feature I had used,
> with no replacement.

If we had a really useful feature here I'd be all over it.
But it looks more like somebody's ten-minute hack, so the
fact that it's undocumented and obscure-to-invoke seems
appropriate to me.

(BTW, I think we had exactly this discussion way back when
Peter cleaned up the postmaster/postgres command line switches.
Just about all the other old switches have equivalent GUCs,
and IIRC it is not an oversight that SendStop was left out.)

regards, tom lane




Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Tom Lane wrote:

> "=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?="  writes:
> > I want to share a patch with you, in which I add a guc parameter
> > 'enable_send_stop' to make it possible to set the value of SendStop in
> > postmaster.c more conveniently.  SendStop makes the postmaster send SIGSTOP
> > rather than SIGQUIT to its children when some backend dumps core, and this
> > variable is originally set with the -T parameter when starting postgres,
> > which is inconvenient to control in some scenarios.
> 
> TBH, I'd sooner rip out SendStop, and simplify the related postmaster
> logic.

I wrote a patch to do that in 2012, after this exchange:
https://postgr.es/m/1333124720-sup-6...@alvh.no-ip.org
It obviously doesn't apply at all anymore, but the thing that prevented
me from sending it was that I couldn't find what the mentioned feature
was that would cause all backends to dump core at the time of a crash.
So it seemed to me that we would be ripping out a feature I had used,
with no replacement.

(It applies cleanly on top of 36b7e3da17bc.)

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Cuando no hay humildad las personas se degradan" (A. Christie)
>From d8198f6ad6c814c03c486bcce696fda912393405 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 23 May 2012 22:18:21 -0400
Subject: [PATCH] Remove -T postmaster option

This used to tell postmaster to send SIGSTOP instead of SIGQUIT in case
of trouble, with the intent of letting a hypothetical PG hacker collect
core dumps from every backends by attaching a debugger to them, one by
one.  This has long been superseded by the ability of current operating
systems to dump core of several processes simultaneously.
---
 src/backend/postmaster/postmaster.c | 52 +
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eeea933b19..bf6166caf4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -187,7 +187,6 @@ static char ExtraOptions[MAXPGPATH];
  * shared data structures.	(Reinit is currently dead code, though.)
  */
 static bool Reinit = true;
-static int	SendStop = false;
 
 /* still more option variables */
 bool		EnableSSL = false;
@@ -544,7 +543,7 @@ PostmasterMain(int argc, char *argv[])
 	 * tcop/postgres.c (the option sets should not conflict) and with the
 	 * common help() function in main/main.c.
 	 */
-	while ((opt = getopt(argc, argv, "A:B:bc:C:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
+	while ((opt = getopt(argc, argv, "A:B:bc:C:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:st:W:-:")) != -1)
 	{
 		switch (opt)
 		{
@@ -654,16 +653,6 @@ PostmasterMain(int argc, char *argv[])
 SetConfigOption("log_statement_stats", "true", PGC_POSTMASTER, PGC_S_ARGV);
 break;
 
-			case 'T':
-
-/*
- * In the event that some backend dumps core, send SIGSTOP,
- * rather than SIGQUIT, to all its peers.  This lets the wily
- * post_hacker collect core dumps from everyone.
- */
-SendStop = true;
-break;
-
 			case 't':
 {
 	const char *tmp = get_stats_option_name(optarg);
@@ -2704,9 +2693,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			 * to commit hara-kiri.
 			 *
 			 * SIGQUIT is the special signal that says exit without proc_exit
-			 * and let the user know what's going on. But if SendStop is set
-			 * (-s on command line), then we send SIGSTOP instead, so that we
-			 * can get core dumps from all backends by hand.
+			 * and let the user know what's going on.
 			 *
 			 * We could exclude dead_end children here, but at least in the
 			 * SIGSTOP case it seems better to include them.
@@ -2715,9 +2702,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			{
 ereport(DEBUG2,
 		(errmsg_internal("sending %s to process %d",
-		 (SendStop ? "SIGSTOP" : "SIGQUIT"),
-		 (int) bp->pid)));
-signal_child(bp->pid, (SendStop ? SIGSTOP : SIGQUIT));
+		 "SIGQUIT", (int) bp->pid)));
+signal_child(bp->pid, SIGQUIT);
 			}
 		}
 	}
@@ -2729,9 +2715,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	{
 		ereport(DEBUG2,
 (errmsg_internal("sending %s to process %d",
- (SendStop ? "SIGSTOP" : "SIGQUIT"),
- (int) StartupPID)));
-		signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT));
+ "SIGQUIT", (int) StartupPID)));
+		signal_child(StartupPID, SIGQUIT);
 	}
 
 	/* Take care of the bgwriter too */
@@ -2741,9 +2726,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	{
 		ereport(DEBUG2,
 (errmsg_internal("sending %s to process %d",
- (SendStop ? "SIGSTOP" : "SIGQUIT"),
- (int) BgWriterPID)));
-		signal_child(BgWriterPID, (SendStop ? SIGSTOP : SIGQUIT));
+ "SIGQUIT", (int) BgWriterPID)));
+		signal_child(BgWriterPID, SIGQUIT);
 	}
 
 	/* Take care of the checkpointer too */
@@ -2753,9 

Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-09-03 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Applied the v6 patch to the master branch and ran the regression tests for
contrib; the result was "All tests successful."

Re: pgsql: Set the volatility of the timestamptz version of date_bin() back

2021-09-03 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Sep-03, John Naylor wrote:
>> On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera 
>> wrote:
>>> These catversion bumps in branch 14 this late in the cycle seem suspect.
>>> Didn't we have some hesitation to push multirange unnest around beta2
>>> precisely because of a desire to avoid catversion bumps?

>> This was for correcting a mistake (although the first commit turned out to
>> be a mistake itself), so I understood it to be necessary.

> A crazy idea might have been to return to the original value.

Yes, that is what should have been done, as I complained over
in pgsql-committers before seeing this exchange.  As things
stand, a pg_upgrade is going to be forced on beta3 users
without even the excuse of fixing a bug.

regards, tom lane




Re: Estimating HugePages Requirements?

2021-09-03 Thread Andres Freund
Hi,

On 2021-09-01 15:53:52 +0900, Michael Paquier wrote:
> Hmm.  I am not sure about the addition of huge_pages_required, knowing
> that we would have shared_memory_size.  I'd rather leave the calculation
> part to the user, with a scan of /proc/meminfo.

-1. We can easily do better, what do we gain by making the user do this stuff?
Especially because the right value also depends on huge_page_size.

Greetings,

Andres Freund




Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-09-03 Thread Daniel Gustafsson
> On 19 May 2021, at 09:53, Michael Paquier  wrote:
> 
> On Tue, Apr 27, 2021 at 12:58:52PM +0300, Aleksander Alekseev wrote:
>> I've noticed there is no tab completion for ALTER TABLE xxx ADD. Here
>> is an alternative version of the patch that fixes this as well. Not
>> sure if this should be in the same commit though.
> 
> -   /* If we have ALTER TABLE  DROP, provide COLUMN or CONSTRAINT */
> -   else if (Matches("ALTER", "TABLE", MatchAny, "DROP"))
> +   /* If we have ALTER TABLE  ADD|DROP, provide COLUMN or CONSTRAINT */
> +   else if (Matches("ALTER", "TABLE", MatchAny, "ADD|DROP"))
> Seems to me that the behavior to not complete with COLUMN or
> CONSTRAINT for ADD is intentional, as it is possible to specify a
> constraint or column name without the object type first.  This
> introduces an inconsistent behavior with what we do for columns with
> ADD, for one.  So a more consistent approach would be to list columns,
> constraints, COLUMN and CONSTRAINT in the list of options available
> after ADD.
> 
> +   else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT"))
> +   {
> +   completion_info_charp = prev3_wd;
> +   COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table);
> +   }
> Specifying valid constraints is an authorized grammar, so it does not
> seem that bad to keep things as they are, either.  I would leave that
> alone.

This has stalled, marked Waiting on Author since May, and reading the
above it sounds like marking it Returned with Feedback is the logical
next step (the patch also no longer applies).

--
Daniel Gustafsson   https://vmware.com/





Re: pgsql: Set the volatility of the timestamptz version of date_bin() back

2021-09-03 Thread Justin Pryzby
On Fri, Sep 03, 2021 at 01:56:50PM -0400, Alvaro Herrera wrote:
> On 2021-Sep-03, John Naylor wrote:
> 
> > On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera 
> > wrote:
> > >
> > > On 2021-Sep-03, John Naylor wrote:
> > > These catversion bumps in branch 14 this late in the cycle seem suspect.
> > > Didn't we have some hesitation to push multirange unnest around beta2
> > > precisely because of a desire to avoid catversion bumps?
> > 
> > This was for correcting a mistake (although the first commit turned out to
> > be a mistake itself), so I understood it to be necessary.
> 
> A crazy idea might have been to return to the original value.

+1.  I think the catversion is usually increased even in a "revert", but
in this exceptional case [0] it would be nice if beta4/rc1 had the same number
as b3.

[0] two commits close to each other, with no other catalog changes, and with
the specific goal of allowing trivial upgrade from b3.

-- 
Justin




Re: A reloption for partitioned tables - parallel_workers

2021-09-03 Thread Laurenz Albe
On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:
> > On 6 Apr 2021, at 09:46, Amit Langote  wrote:
> > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  
> > wrote:
> 
> > > I don't know if Seamus is still working on that; if not, we might
> > > mark it as "returned with feedback".
> > 
> > I have to agree given the time left.
> 
> This thread has stalled and the patch no longer applies.  I propose that we
> mark this Returned with Feedback, is that Ok with you Amit?

+1.  That requires more thought.

Yours,
Laurenz Albe





Re: pgsql: Set the volatility of the timestamptz version of date_bin() back

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, John Naylor wrote:

> On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera 
> wrote:
> >
> > On 2021-Sep-03, John Naylor wrote:
> > These catversion bumps in branch 14 this late in the cycle seem suspect.
> > Didn't we have some hesitation to push multirange unnest around beta2
> > precisely because of a desire to avoid catversion bumps?
> 
> This was for correcting a mistake (although the first commit turned out to
> be a mistake itself), so I understood it to be necessary.

A crazy idea might have been to return to the original value.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Andres Freund wrote:

> Hi,
> 
> On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote:
> > Oh, but of course we can't modify XLogReaderState in backbranches to add
> > the new struct member abortedContrecordPtr (or whatever we end up naming
> > that.)
> 
> Why is that? Afaict it's always allocated via XLogReaderAllocate(), so adding
> a new field at the end should be fine?

Hmm, true, that works.

> That said, I'm worried that this stuff is too complicated to get right in the
> backbranches. I suspect letting it stew in master for a while before
> backpatching would be a good move.

Sure, we can put it in master now and backpatch before the November
minors if everything goes well.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: pgsql: Set the volatility of the timestamptz version of date_bin() back

2021-09-03 Thread John Naylor
On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera 
wrote:
>
> On 2021-Sep-03, John Naylor wrote:
> These catversion bumps in branch 14 this late in the cycle seem suspect.
> Didn't we have some hesitation to push multirange unnest around beta2
> precisely because of a desire to avoid catversion bumps?

This was for correcting a mistake (although the first commit turned out to
be a mistake itself), so I understood it to be necessary.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: prevent immature WAL streaming

2021-09-03 Thread Andres Freund
Hi,

On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote:
> Oh, but of course we can't modify XLogReaderState in backbranches to add
> the new struct member abortedContrecordPtr (or whatever we end up naming
> that.)

Why is that? Afaict it's always allocated via XLogReaderAllocate(), so adding
a new field at the end should be fine?

That said, I'm worried that this stuff is too complicated to get right in the
backbranches. I suspect letting it stew in master for a while before
backpatching would be a good move.

Greetings,

Andres Freund




Re: Estimating HugePages Requirements?

2021-09-03 Thread Bossart, Nathan
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi"  wrote:
> By the way I noticed that postgres -C huge_page_size shows 0, which I
> think should have the number used for the calculation if we show
> huge_page_required.

I would agree with this if huge_page_size were a runtime-computed GUC,
but since it's intended for users to explicitly request the huge page
size, it might be slightly confusing.  Perhaps another option would be
to create a new GUC for this.  Or maybe it's enough to note that the
value will be changed from 0 at runtime if huge pages are supported.
In any case, it might be best to handle this separately.
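
To illustrate the behavior in question, a sketch (the reported value
depends on the platform; 2MB huge pages are a common default):

  SHOW huge_page_size;   -- currently reports 0 unless explicitly set,
                         -- even when the server uses the OS default size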

> I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
> whereas the error message above showed 148897792 bytes (142MB). So it
> seems that something is forgotten while calculating
> shared_memory_size.  As a consequence, launching postgres setting
> huge_pages_required (69 pages) as vm.nr_hugepages ended up in the
> "could not map anonymous shared memory" error.

Hm.  I'm not seeing this with the v5 patch set, so maybe I
inadvertently fixed it already.  Can you check this again with v5?

Nathan



Re: pgsql: Set the volatility of the timestamptz version of date_bin() back

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, John Naylor wrote:

> Set the volatility of the timestamptz version of date_bin() back to immutable
> 
> 543f36b43d was too hasty in thinking that the volatility of date_bin()
> had to match date_trunc(), since only the latter references
> session_timezone.
> 
> Bump catversion

These catversion bumps in branch 14 this late in the cycle seem suspect.
Didn't we have some hesitation to push multirange unnest around beta2
precisely because of a desire to avoid catversion bumps?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Kyotaro Horiguchi wrote:

> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera  
> wrote in 

> The name sounds like the start LSN.  Doesn't contrecordAbort(ed)Ptr work?
> 
> > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> > {
> > report_invalid_record(state,
> >   
> > "there is no contrecord flag at %X/%X",
> >   
> > LSN_FORMAT_ARGS(RecPtr));
> > +   goto aborted_contrecord;
> 
> This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
> _IS_ABORTED_PARTIAL.  Is it okay?  (I don't object to removing the check.)

On second thought, I'm not sure that we should make xlogreader report an
invalid record here.  If we do, how is the user going to recover?
Recovery will stop there and lose whatever was written afterwards.
Maybe you could claim that if both bits are set then WAL is corrupted,
so it's okay to stop recovery.  But if WAL is really corrupted, then the
CRC check will fail.  All in all, I think I'd rather ignore the flag if
we see it set.

At most, we could have an

#ifndef FRONTEND
ereport(WARNING, "found unexpected flag xyz");
#endif

or something like that.  However, xlogreader does not currently have
anything like that, so I'm not completely sure.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
Oh, but of course we can't modify XLogReaderState in backbranches to add
the new struct member abortedContrecordPtr (or whatever we end up naming
that.)

I think I'm going to fix this, in backbranches only, by having
xlogreader.c have a global variable, which is going to be used by
ReadRecord instead of accessing the struct member.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)

2021-09-03 Thread Fujii Masao



On 2021/09/03 14:56, kuroda.hay...@fujitsu.com wrote:

Dear Fujii-san,

Thank you for your great work. Attached is the latest version.


Thanks a lot!



I set four testcases:

(1) Sets neither GUC nor server option
(2) Sets server option, but not GUC
(3) Sets GUC but not server option
(4) Sets both GUC and server option


OK.



I confirmed it almost works fine, but I found that
fallback_application_name will never be used in our test environment.
This is because our test runner pg_regress sets PGAPPNAME to "pg_regress"
and libpq prefers the environment variable to fallback_application_name.
(I tried to control it by \setenv, but failed...)


make check uses "pg_regress", but I guess that make installcheck uses
"postgres_fdw". So the patch would cause make installcheck to fail.
I think that case (1) is not so important, so it can be removed. Thoughts?

Attached is the updated version of the patch. I removed the test
for case (1). And I arranged the regression tests so that they are based
on debug_discard_caches, to simplify them. Also I added and updated
some comments and docs. Could you review this version?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 82aa14a65d..705f60a3ae 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -353,10 +353,11 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 	/*
 	 * Construct connection params from generic options of ForeignServer
 	 * and UserMapping.  (Some of them might not be libpq options, in
-	 * which case we'll just waste a few array slots.)  Add 3 extra slots
-	 * for fallback_application_name, client_encoding, end marker.
+	 * which case we'll just waste a few array slots.)  Add 4 extra slots
+	 * for application_name, fallback_application_name, client_encoding,
+	 * end marker.
 	 */
-	n = list_length(server->options) + list_length(user->options) + 3;
+	n = list_length(server->options) + list_length(user->options) + 4;
 	keywords = (const char **) palloc(n * sizeof(char *));
 	values = (const char **) palloc(n * sizeof(char *));
 
@@ -366,7 +367,23 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 	n += ExtractConnectionOptions(user->options,
 								  keywords + n, values + n);
 
-	/* Use "postgres_fdw" as fallback_application_name. */
+	/*
+	 * Use pgfdw_application_name as application_name if set.
+	 *
+	 * PQconnectdbParams() processes the parameter arrays from start to
+	 * end. If any key word is repeated, the last value is used. Therefore
+	 * note that pgfdw_application_name must be added to the arrays after
+	 * options of ForeignServer are, so that it can override
+	 * application_name set in ForeignServer.
+	 */
+	if (pgfdw_application_name && *pgfdw_application_name != '\0')
+	{
+		keywords[n] = "application_name";
+		values[n] = pgfdw_application_name;
+		n++;
+	}
+
+	/* Use "postgres_fdw" as fallback_application_name */
 	keywords[n] = "fallback_application_name";
 	values[n] = "postgres_fdw";
 	n++;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e3ee30f1aa..6a7d83c81f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10761,3 +10761,82 @@ ERROR:  invalid value for integer option "fetch_size": 100$%$#$#
 CREATE FOREIGN TABLE inv_bsz (c1 int )
 	SERVER loopback OPTIONS (batch_size '100$%$#$#');
 ERROR:  invalid value for integer option "batch_size": 100$%$#$#
+-- ===================================================================
+-- test postgres_fdw.application_name GUC
+-- ===================================================================
+-- Turn debug_discard_caches off for this test to make that
+-- the remote connection is alive when checking its application_name.
+-- For each test, close all the existing cached connections manually and
+-- establish connection with new setting of application_name.
+SET debug_discard_caches = 0;
+-- If appname is set as GUC but not as options of server object,
+-- the GUC setting is used as application_name of remote connection.
+SET postgres_fdw.application_name TO 'fdw_guc_appname';
+SELECT 1 FROM postgres_fdw_disconnect_all();
+ ?column? 
+----------
+        1
+(1 row)
+

Re: psql: \dl+ to list large objects privileges

2021-09-03 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi,

There is something I forgot to mention in my previous review.

Typically when describing a thing in psql, it is the column "Description" that
belongs in the verbose version. For example listing indexes produces:

                          List of relations
 Schema |   Name    | Type  |  Owner   | Table

and the verbose version:

                          List of relations
 Schema |   Name    | Type  |  Owner   | Table | Persistence | Access method |    Size    | Description

Since '\dl' already contains Description, it might be worthwhile to consider
adding the column `Access privileges` without introducing a verbose version.
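
For example, the privileges such a column would display come from ordinary
GRANTs on large objects (the OID and role below are placeholders):

  GRANT SELECT ON LARGE OBJECT 16385 TO reader;
  \dl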

Thoughts?

Cheers,
//Georgios

Re: Improve logging when using Huge Pages

2021-09-03 Thread Fujii Masao




On 2021/09/03 23:27, Tom Lane wrote:

Fujii Masao  writes:

IMO, if the level is promoted to LOG, the message should be updated
so that it follows the error message style guide. But I agree that simpler
message would be better in this case. So what about something like
the following?



LOG: could not map anonymous shared memory (%zu bytes) with huge pages enabled
HINT: The server will map anonymous shared memory again with huge pages 
disabled.


That is not a hint.  Maybe it qualifies as errdetail, though.


Yes, agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: psql: \dl+ to list large objects privileges

2021-09-03 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi,

thank you for the patch, I personally think it is a useful addition and thus it
gets my vote. However, I also think that the proposed code will need some
changes.

On a high level I will recommend the addition of tests. There are similar tests
present in:
./src/test/regress/sql/psql.sql

I will also inquire as to the need for renaming the function `do_lo_list` to
`listLargeObjects` and moving it from large_obj.c to describe.c. In itself it
is not necessarily a blocking point, though it will require some strong
arguments for doing so.

Applying the patch generates several whitespace warnings. It would be helpful
if those warnings were removed.

The patch contains:

case 'l':
-   success = do_lo_list();
+   success = listLargeObjects(show_verbose);


It might be of some interest to consider, in the above, checking the value of
the next character in the command and emitting an error if it is not valid.
Such a pattern can be found in the same switch block, for example:

switch (cmd[2])
{
case '\0':
case '+':

success = ...

break;
default:
status = PSQL_CMD_UNKNOWN;
break;
}


The patch contains:

else if (strcmp(cmd + 3, "list") == 0)
-   success = do_lo_list();
+   success = listLargeObjects(false);
+
+   else if (strcmp(cmd + 3, "list+") == 0)
+   success = listLargeObjects(true);


In a fashion similar to `exec_command_list`, it might be interesting to consider
expressing the above as:

show_verbose = strchr(cmd, '+') ? true : false;

else if (strcmp(cmd + 3, "list") == 0)
success = do_lo_list(show_verbose);


Thoughts?

Cheers,
//Georgios

Re: A reloption for partitioned tables - parallel_workers

2021-09-03 Thread Daniel Gustafsson
> On 6 Apr 2021, at 09:46, Amit Langote  wrote:
> On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  wrote:

>> I don't know if Seamus is still working on that; if not, we might
>> mark it as "returned with feedback".
> 
> I have to agree given the time left.

This thread has stalled and the patch no longer applies.  I propose that we
mark this Returned with Feedback; is that OK with you, Amit?

--
Daniel Gustafsson   https://vmware.com/





Re: pg_receivewal starting position

2021-09-03 Thread Bharath Rupireddy
On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau  wrote:
> There is still the concern raised by Bharath about being able to select the
> way to fetch the replication slot information for the user, and what should or
> should not be included in the documentation. I am in favor of documenting the
> process of selecting the wal start, and maybe include a --start-lsn option for
> the user to override it, but that's maybe for another patch.

Let's hear from others.

Thanks for the patches. I have some quick comments on the v5 patch-set:

0001:
1) Do you also want to MemSet values in ReadReplicationSlot?

2) When if clause has single statement we don't generally use "{" and "}"
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+ }
you can just have:
+ if (slot == NULL || !slot->in_use)
+ LWLockRelease(ReplicationSlotControlLock);

3) This is still not copying the slot contents. As I said earlier, you
can just take the required info into some local variables instead of
assigning the slot pointer to a slot_contents pointer.
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
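
A minimal sketch of that approach, copying only the needed fields into locals
while the spinlock is held (field names assume the existing ReplicationSlot
layout):

XLogRecPtr	restart_lsn;
XLogRecPtr	confirmed_flush;
bool		is_logical;

SpinLockAcquire(&slot->mutex);
restart_lsn = slot->data.restart_lsn;
confirmed_flush = slot->data.confirmed_flush;
is_logical = SlotIsLogical(slot);
SpinLockRelease(&slot->mutex);
LWLockRelease(ReplicationSlotControlLock);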

4) As I said earlier, why can't we have separate tli variables for
more readability, instead of just one slots_position_timeline and
timeline_history variable? Also, you are not resetting those 2
variables after the if
(!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) block, so you might
end up sending the restart_lsn timeline ID for confirmed_flush. So, better
to use two separate variables. In fact you can use block-local variables:
+ if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+ {
+ List   *tl_history = NIL;
+ TimeLineID tli;
+ tl_history = readTimeLineHistory(ThisTimeLineID);
+ tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
+   tl_history);
+ values[i] = Int32GetDatum(tli);
+ nulls[i] = false;
+ }
similarly for confirmed_flush.

5) I still don't see the need for below test case:
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
when we have
+ "READ_REPLICATION_SLOT does not produce an error with dropped slot");
Because for a user, a dropped or non-existent slot is the same; it's just
that for a dropped slot we internally don't delete its entry from the
shared memory.

0002:
1) Imagine GetSlotInformation always returns
READ_REPLICATION_SLOT_ERROR: don't you think StreamLog enters an
infinite loop there? Why don't you just exit(1); instead of
returning and retrying? Similarly for READ_REPLICATION_SLOT_NONEXISTENT;
I think you can just do exit(1), no need to retry.
+ case READ_REPLICATION_SLOT_ERROR:
+
+ /*
+ * Error has been logged by GetSlotInformation, return and
+ * maybe retry
+ */
+ return;
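
In other words, something along these lines (a sketch):

case READ_REPLICATION_SLOT_ERROR:

	/*
	 * Error has already been logged by GetSlotInformation;
	 * retrying is unlikely to help, so just bail out.
	 */
	exit(1);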

2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
passed? And why are you having this check after you connect to the
server and fetch the data?
+ /* If no slotinformation has been passed, we can return immediately */
+ if (slot_info == NULL)
+ {
+ PQclear(res);
+ return READ_REPLICATION_SLOT_OK;
+ }
Instead you can just have a single assert:

+ Assert(slot_name && slot_info);

3) How about
pg_log_error("could not read replication slot:
instead of
pg_log_error("could not fetch replication slot:

4) Why are you handling the READ_REPLICATION_SLOT_OK case in the default branch?
+ default:
+ if (slot_info.is_logical)
+ {
+ /*
+ * If the slot is not physical we can't expect to
+ * recover from that
+ */
+ pg_log_error("cannot use the slot provided, physical slot expected.");
+ exit(1);
+ }
+ stream.startpos = slot_info.restart_lsn;
+ stream.timeline = slot_info.restart_tli;
+ }
You can just have another case statement for READ_REPLICATION_SLOT_OK,
and in the default you can throw an error like "unknown read replication
slot status" or some other better wording, and exit(1);

5) Do you want to initialize slot_info to 0?
+ if (replication_slot)
+ {
+ SlotInformation slot_info;
+ MemSet(&slot_info, 0, sizeof(SlotInformation));

6) This isn't readable:
+ slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
How about:
if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
   slot_info->is_logical = true;
You don't need to set it to false, because you would have
MemSet(&slot_info) in the caller.

7) How about
uint32 hi;
uint32 lo;
instead of
+ uint32 hi,
+ lo;

8) Move "SlotInformation *slot_info" to the next line, as it crosses
the 80-char limit.
+GetSlotInformation(PGconn *conn, const char *slot_name,
SlotInformation * slot_info)

9) Instead of a boolean is_logical, I would rather suggest using an
enum or #define macros for the slot types, because someday we might
introduce new slot types and somebody may want an is_physical kind of
variable name.
+typedef struct SlotInformation {
+ bool is_logical;
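
A hypothetical shape for that, with names invented for illustration:

typedef enum SlotKind
{
	SLOT_KIND_PHYSICAL,
	SLOT_KIND_LOGICAL
} SlotKind;

typedef struct SlotInformation
{
	SlotKind	kind;
	XLogRecPtr	restart_lsn;
	TimeLineID	restart_tli;
} SlotInformation;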

Regards,
Bharath Rupireddy.




Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-03 Thread Bharath Rupireddy
On Fri, Sep 3, 2021 at 3:23 PM Daniel Gustafsson  wrote:
> > 4) Instead of just exiting from the main pg_basebackup process when
> > the child WAL receiver dies, can't we think of restarting the child
> > process, probably with the WAL streaming position where it left off or
> > stream from the beginning? This way, the work that the main
> > pg_basebackup has done so far doesn't get wasted. I'm not sure if this
> > affects the pg_basebackup functionality. We can restart the child
> > process for 1 or 2 times, if it still dies, we can kill the main
> > pg_baasebackup process too. Thoughts?
>
> I was toying with the idea, but I ended up not pursuing it.  This error is 
> well
> into the “really shouldn’t happen, but can” territory and it’s quite likely
> that some level of manual intervention is required to make it successfully
> restart.  I’m not convinced that adding complicated logic to restart (and even
> more complicated tests to simulate and test it) will be worthwhile.

I withdraw my suggestion because I now feel that it's better not to
make it complex, and to let the user decide what to do if the child
process exits abnormally.

I think we might still miss abnormal child thread exits on Windows
because we set bgchild_exited = true only if ReceiveXlogStream or
walmethod->finish() returns false. I'm not sure the parent thread on
Windows can detect a child's abnormal exit. Since there is no signal
mechanism on Windows, detecting child exit on the failure of those two
important functions, as the patch does, is the better option.

Overall, the v3 patch looks good to me.

Regards,
Bharath Rupireddy.




Re: [PATCH] Make pkg-config files cross-compile friendly

2021-09-03 Thread Peter Eisentraut

On 05.03.20 22:38, Sebastian Kemper wrote:

This commit addresses this by doing the following two things:

   1. Instead of hard coding the paths in "Cflags" and "Libs"
  "${includedir}" and "${libdir}" are used. Note: these variables can
  be overridden on the pkg-config command line
  ("--define-variable=libdir=/some/path").

   2. Add the variables "prefix" and "exec_prefix". If "includedir"
  and/or "libdir" are using these then construct them accordingly.
  This is done because buildroots (for instance OpenWrt) tend to
  rename the real pkg-config and call it indirectly from a script
  that sets "prefix", "exec_prefix" and "bindir", like so:


Committed.  I simplified your code a little bit, and I also made it so 
that exec_prefix is set to ${prefix} by default.  That way it matches 
what most other .pc files I have found do.





Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally

2021-09-03 Thread Tom Lane
"=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?="  writes:
> I want to share a patch with you, in which I add a GUC parameter 
> 'enable_send_stop' to enable setting the value of SendStop in postmaster.c more 
> conveniently.  SendStop enables the postmaster to send SIGSTOP rather than SIGQUIT 
> to its children when some backend dumps core, and this variable is originally 
> set with the -T parameter when starting postgres, which is inconvenient to control 
> in some scenarios.

TBH, I'd sooner rip out SendStop, and simplify the related postmaster
logic.  I've never used it in twenty-some years of Postgres hacking,
and I doubt anyone else has used it much either.  It's not worth the
overhead of a GUC.  (The argument that you need it in situations
where you can't control the postmaster's command line seems pretty
thin, too.  I'm much more worried about somebody turning it on by
accident and then complaining that the cluster freezes upon crash.)

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-03 Thread Robert Haas
On Fri, Sep 3, 2021 at 6:23 AM Dilip Kumar  wrote:
>> + /* Built-in oids are mapped directly */
>> + if (classForm->oid < FirstGenbkiObjectId)
>> + relfilenode = classForm->oid;
>> + else if (OidIsValid(classForm->relfilenode))
>> + relfilenode = classForm->relfilenode;
>> + else
>> + continue;
>>
>> Am I missing something, or is this totally busted?
>
> Oops, I think the condition should be like below, but I will think carefully 
> before posting the next version if there is something else I am missing.
>
> if (OidIsValid(classForm->relfilenode))
>relfilenode = classForm->relfilenode;
> else if (classForm->oid < FirstGenbkiObjectId)
>relfilenode = classForm->oid;
> else
>continue;

What about mapped rels that have been rewritten at some point?

> Agreed to all, but in general, I think WAL hitting the disk before data is 
> more applicable to the shared buffers, where we want to flush the WAL first 
> before writing the shared buffer so that in case of a torn page we have an 
> option to recover the page from the previous FPI. But in such cases where we are 
> creating a directory or file there is no such requirement.  Anyway, I 
> agree with the comments that it should be more uniform and the comment 
> should be correct.

There have been previous debates about whether WAL records for
filesystem operations should be issued before or after those
operations are performed. I'm not sure how easy those discussion are
to find in the archives, but it's very relevant here. I think the
short version is - if we write a WAL record first and then the
operation fails afterward, we have to PANIC. But if we perform the
operation first and then write the WAL record if it succeeds, then we
could crash before writing WAL and end up out of sync with our
standbys. If we then later do any WAL-logged operation locally that
depends on that operation having been performed, replay will fail on
the standby. There used to be, or maybe still are, comments in the
code defending the latter approach, but more recently it's been
strongly criticized. The thinking, AIUI, is basically that filesystem
operations really ought not to fail, because nobody should be doing
weird things to the data directory, and if they do, panicking is OK.
But having replay fail in strange ways on the standby later is not OK.
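
To make the trade-off concrete, the two orderings look roughly like this
(schematic only; assumes an xl_dbase_create_rec xlrec and a target path dbpath
are in scope):

/* Ordering A: WAL first.  A filesystem failure afterward forces a
 * PANIC, because the WAL already claims the operation happened. */
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xlrec));
(void) XLogInsert(RM_DBASE_ID, XLOG_DBASE_CREATE);
if (MakePGDirectory(dbpath) < 0)
	ereport(PANIC,
			(errcode_for_file_access(),
			 errmsg("could not create directory \"%s\": %m", dbpath)));

/* Ordering B: operation first.  If we crash before the WAL record is
 * written, standbys never learn about the directory, and later WAL
 * that depends on it can fail to replay there. */
if (MakePGDirectory(dbpath) < 0)
	ereport(ERROR,
			(errcode_for_file_access(),
			 errmsg("could not create directory \"%s\": %m", dbpath)));
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xlrec));
(void) XLogInsert(RM_DBASE_ID, XLOG_DBASE_CREATE);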

I'm not sure if everyone agrees with that logic; it seems somewhat
debatable. I *think* I personally agree with it but ... I'm not even
100% sure about that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Improve logging when using Huge Pages

2021-09-03 Thread Tom Lane
Fujii Masao  writes:
> IMO, if the level is promoted to LOG, the message should be updated
> so that it follows the error message style guide. But I agree that a simpler
> message would be better in this case. So what about something like
> the following?

> LOG: could not map anonymous shared memory (%zu bytes) with huge pages enabled
> HINT: The server will map anonymous shared memory again with huge pages 
> disabled.

That is not a hint.  Maybe it qualifies as errdetail, though.
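
For illustration, restructured with errdetail it might read (a sketch only,
reusing the allocsize variable from the patch upthread):

ereport(LOG,
		(errmsg("could not map anonymous shared memory (%zu bytes) with huge pages enabled: %m",
				allocsize),
		 errdetail("The server will map anonymous shared memory again with huge pages disabled.")));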

regards, tom lane




Re: using an end-of-recovery record in all cases

2021-09-03 Thread Robert Haas
On Fri, Sep 3, 2021 at 12:52 AM Kyotaro Horiguchi
 wrote:
> I tried to reproduce this but just replacing the end-of-recovery
> checkpoint request with issuing an end-of-recovery record didn't cause
> make check-workd fail for me.  Do you have an idea of any other
> requirement to cause that?

Did you use the patch I posted, or a different one?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-09-03 Thread Robert Haas
On Thu, Sep 2, 2021 at 10:56 PM Noah Misch  wrote:
> > Is there anything still standing in the way of committing this?
>
> I pushed it as commit 97ddda8.

Oh, thanks. Sorry, I had missed that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Kyotaro Horiguchi wrote:

> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera  
> wrote in 

> 0002:
> > +   XLogRecPtr  abortedContrecordPtr; /* LSN of incomplete record at end of
> > +  * WAL */
> 
> The name sounds like the start LSN. Doesn't contrecordAbort(ed)Ptr work?

I went over various iterations of the name of this, and still not
entirely happy.  I think we need to convey the ideas that

* This is the endptr+1 of the known-good part of the record, that is,
  the beginning of the next part of the record.  I think "endPtr"
  summarizes this well; we use this name elsewhere.

* At some point before recovery, this was the last WAL record that
  existed

* there is an invalid contrecord, or we were looking for a contrecord
  and found invalid data

* this record is incomplete

So maybe
1. incompleteRecEndPtr
2. finalInvalidRecEndPtr
3. brokenContrecordEndPtr

> > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> > {
> > report_invalid_record(state,
> >   "there is no contrecord flag at %X/%X",
> >   LSN_FORMAT_ARGS(RecPtr));
> > -   goto err;
> > +   goto aborted_contrecord;
> 
> This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
> XLP_FIRST_IS_ABORTED_PARTIAL.  Is it okay?  (I don't object to removing the check.)

Yeah, I was unsure about that.  I think it's good to have it as a
cross-check, though it should never occur.  I'll put it back.

Another related point is whether it's a good idea to have the ereport()
about the bit appearing in a not-start-of-page address be a PANIC.
If we degrade it to WARNING then it'll be lost in the noise, but I'm not
sure what else we can do.  (If it's a PANIC, then you end up with an
unusable database.)

> I didn't think of this as an aborted contrecord, but on second
> thought, when we see a record broken in any style, we stop recovery at
> that point. I agree with the change and all the similar changes.
> 
> + /* XXX should we goto aborted_contrecord here? */
> 
> I think it should be aborted_contrecord.
> 
> When that happens, the loaded bytes actually looked like the first
> fragment of a continuation record to xlogreader, even if the cause
> were a broken total_len.  So if we abort the record there, the next
> time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL at the same
> page, and correctly finds a new record there.
> 
> On the other hand if we just errored-out there, we will step-back to
> the beginning of the broken record in the previous page or segment
> then start writing a new record there but that is exactly what we want
> to avoid now.

Hmm, yeah, makes sense.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Multi-Column List Partitioning

2021-09-03 Thread Amit Langote
On Wed, Sep 1, 2021 at 2:31 PM Amit Langote  wrote:
> On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>  wrote:
> > The attached patch also fixes the above comments.
>
> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, 
> false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
>QUERY PLAN
> 
>  Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
>Filter: (a IS NULL)
> (2 rows)
>
> I guess that may be due to the following newly added code being incomplete:
>
> +/*
> + * get_partition_bound_null_index
> + *
> + * Returns the partition index of the partition bound which accepts NULL.
> + */
> +int
> +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
> +{
> +   int i = 0;
> +   int j = 0;
> +
> +   if (!boundinfo->isnulls)
> +   return -1;
>
> -   if (!val->constisnull)
> -   count++;
> +   for (i = 0; i < boundinfo->ndatums; i++)
> +   {
> +   //TODO: Handle for multi-column cases
> +   for (j = 0; j < 1; j++)
> +   {
> +   if (boundinfo->isnulls[i][j])
> +   return boundinfo->indexes[i];
> }
> }
>
> +   return -1;
> +}
>
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.
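
A sketch of such a bitmapset-returning variant (hypothetical; assumes isnulls
is indexed per datum and per column, as in the patch):

Bitmapset *
get_partition_bound_null_indexes(PartitionBoundInfo boundinfo, int partnatts)
{
	Bitmapset  *result = NULL;
	int			i,
				j;

	if (!boundinfo->isnulls)
		return NULL;

	for (i = 0; i < boundinfo->ndatums; i++)
	{
		for (j = 0; j < partnatts; j++)
		{
			if (boundinfo->isnulls[i][j])
			{
				/* remember every partition that accepts NULLs */
				result = bms_add_member(result, boundinfo->indexes[i]);
				break;
			}
		}
	}

	return result;
}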
>
> Some other issues I noticed and suggestions for improvement:
>
> +/*
> + * checkForDuplicates
> + *
> + * Returns TRUE if the list bound element is already present in the list of
> + * list bounds, FALSE otherwise.
> + */
> +static bool
> +checkForDuplicates(List *source, List *searchElem)
>
> This function name may be too generic.  Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)
>
> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."
>
> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.
>
> +   int i = 0;
> +   int j = 0;
>
> Better to initialize such loop counters closer to the loop.
>
> +   colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> +   colname[i] = get_attname(RelationGetRelid(parent),
> +key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().
>
> +   ListCell   *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.
>
> +   RowExpr *rowexpr = NULL;
> +
> +   if (!IsA(expr, RowExpr))
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +   errmsg("Invalid list bound specification"),
> +   parser_errposition(pstate, exprLocation((Node *) spec))));
> +
> +   rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))
>
> +   if (isDuplicate)
> +   continue;
> +
> +   result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);
>
> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
>  typedef struct PartitionListValue
>  {
> int index;
> -   Datum   value;
> +   Datum  *values;
> +   bool   *isnulls;
>  } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it.  Call it, say,
> PartitionListBound?
>
> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.
>
> Will send more comments in a bit...

+ * partition_bound_accepts_nulls
+ *
+ * Returns TRUE if partition bound has NULL value, FALSE otherwise.
  */

I suggest slight rewording, as follows:

"Returns TRUE if any of the partition bounds contains a NULL value,
FALSE otherwise."

-   PartitionListValue *all_values;
+   PartitionListValue **all_values;
...
-   

Re: psql: \dl+ to list large objects privileges

2021-09-03 Thread Pavel Luzanov

Hello,

Thank you very much for the review.

I will prepare a new version of the patch according to your comments. 
For now, I will answer this question:



I will also inquire as to the need for renaming the function `do_lo_list` to
`listLargeObjects` and its move to describe.c. from large_obj.c. In itself it is
not necessarily a blocking point, though it will require some strong arguments
for doing so.


I understand that I needed a good reason for such actions.

On the one hand all the commands for working with large objects are in 
large_obj.c. On the other hand, all commands for displaying the contents 
of system catalogs are in describe.c. The function do_lo_list belongs to 
both groups.


The main reason for moving the function to describe.c is that I wanted 
to use the printACLColumn function to display the lomacl column. The 
printACLColumn function is used in all the other commands to display 
privileges; this function is locally defined in describe.c and there 
is no reason to make it public.


Another option is to duplicate the printACLColumn function (or its 
contents) in large_obj.c. This seemed wrong to me.

Is there any other way?

Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company





Re: Improve logging when using Huge Pages

2021-09-03 Thread Fujii Masao




On 2021/09/03 16:49, Kyotaro Horiguchi wrote:

IF you are thinking to show that in GUC, you might want to look into
the nearby thread [1]


Yes, let's discuss this feature at that thread.



I have some comment about the patch.

-   if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
-   elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
-allocsize);
+   if (ptr != MAP_FAILED)
+   using_huge_pages = true;
+   else if (huge_pages == HUGE_PAGES_TRY)
+   ereport(LOG,
+   (errmsg("could not map anonymous shared memory: %m"),
+(mmap_errno == ENOMEM) ?
+errhint("This error usually means that PostgreSQL's request "

If we set huge_pages to try and postgres fell back to regular pages,
it emits a large message relative to its importance. The user specified
that "I'd like to use huge pages, but it's ok if not available.", so I
think the message should be far smaller.  Maybe just raising the
DEBUG1 message to LOG along with moving to ereport might be
sufficient.


IMO, if the level is promoted to LOG, the message should be updated
so that it follows the error message style guide. But I agree that a simpler
message would be better in this case. So what about something like
the following?

LOG: could not map anonymous shared memory (%zu bytes) with huge pages enabled
HINT: The server will map anonymous shared memory again with huge pages 
disabled.


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: psql: \dl+ to list large objects privileges

2021-09-03 Thread Pavel Luzanov

Hello,

Thank you very much for review.


Since '\dl' already contains description, it might be worthwhile to consider to 
add the column `Access privileges`
without introducing a verbose version.


I thought about it.
The reason I decided to add the verbose version is backward 
compatibility. Perhaps the appearance of a new column in an 
existing command may be undesirable to someone.


If we don't worry about backward compatibility, the patch will be 
easier. But I'm not sure this is the right approach.


Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company





Re: .ready and .done files considered harmful

2021-09-03 Thread Dipesh Pandit
Hi,

Thanks for the feedback.

> Which approach do you think we should use?  I think we have decent
> patches for both approaches at this point, so perhaps we should see if
> we can get some additional feedback from the community on which one we
> should pursue further.

In my opinion both approaches have benefits over the current implementation.
I think in the keep-trying-the-next-file approach we have handled all the rare
and specific scenarios which require us to force a directory scan to archive
the desired files. In addition to this, with the recent change to force a
directory scan at checkpoint, we can avoid an infinite wait for a file which
is still being missed out despite handling the special scenarios. It is also
more efficient in extreme scenarios, as discussed in this thread. However, the
multiple-files-per-readdir approach is cleaner, with the resilience of the
current implementation.

I agree that we should decide on which approach to pursue further based on
additional feedback from the community.

> The problem I see with this is that pgarch_archiveXlog() might end up
> failing.  If it does, we won't retry archiving the file until we do a
> directory scan.  I think we could try to avoid forcing a directory
> scan outside of these failure cases and archiver startup, but I'm not
> sure it's really worth it.  When pgarch_readyXlog() returns false, it
> most likely means that there are no .ready files present, so I'm not
> sure we are gaining a whole lot by avoiding a directory scan in that
> case.  I guess it might help a bit if there are a ton of .done files,
> though.

Yes, I think it will be useful when we have a bunch of .done files and
the frequency of .ready files is such that the archiver goes to wait
state before the next WAL file is ready for archival.

> I agree, but it should probably be something like DEBUG3 instead of
> LOG.

I will update it in the next patch.

Thanks,
Dipesh


Re: Avoid stuck of pbgench due to skipped transactions

2021-09-03 Thread Fujii Masao




On 2021/06/17 1:23, Yugo NAGATA wrote:

I attached the v2 patch to clarify that I withdrew the v3 patch.


Thanks for the patch!

+					 * For very unrealistic rates under -T, some skipped
+					 * transactions are not counted because the catchup
+					 * loop is not fast enough just to do the scheduling
+					 * and counting at the expected speed.
+					 *
+					 * We do not bother with such a degenerate case.
+					 */

ISTM that the patch changes pgbench so that it can skip counting
some skipped transactions here, even for realistic rates under -T.
Of course, that would happen very rarely. Is this understanding right?

On the other hand, even without the patch, in the first place there seems to be
no guarantee that all the skipped transactions are counted under -T.
When the timer is exceeded in CSTATE_END_TX, a client ends without
checking outstanding skipped transactions. Therefore the "issue" that
some skipped transactions are not counted is not one the patch newly introduces,
so that behavior change by the patch would be acceptable.
Is this understanding right?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Added schema level support for publication.

2021-09-03 Thread Amit Kapila
On Thu, Sep 2, 2021 at 5:12 PM Amit Kapila  wrote:
>
> On Thu, Sep 2, 2021 at 11:58 AM vignesh C  wrote:
> >
>
> Below are few comments on v23. If you have already addressed anything
> in v24, then ignore it.
>

More Review comments (on latest version v24):
===
1.
+ Oid			psnspcid BKI_LOOKUP(pg_class);	/* Oid of the schema */
+} FormData_pg_publication_schema;

Why in the above structure you have used pg_class instead of pg_namespace?

2. GetSchemaPublicationRelations() uses two different ways to skip
non-publishable relations in its two while loops. Both are correct, but I
think it would be easier to follow if they used the same way, and in
this case I would prefer a check like if (is_publishable_class(relid,
relForm)). The comments atop the function could also be modified to: "Get
the list of publishable relation oids for a specified schema." I
think it is better to write some comments on why you need to scan and
loop twice.
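
That is, both loops could filter with one and the same test, along these
lines (sketch):

/* Skip relations that cannot be published, using the common test. */
if (!is_publishable_class(relid, relForm))
	continue;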

3. The other point about GetSchemaPublicationRelations() is that I am
afraid that it will collect duplicate relation oids in the final list
when the partitioned table and child tables are in the same schema.

4. In GetRelationPublicationActions(), the handling related to
partitions seems to be missing for the schema. It won't be able to
take care of child tables when they are in a different schema than the
parent table.

5.
If I modify the search path to remove public schema then I get the
below error message:
postgres=# Create publication mypub for all tables in schema current_schema;
ERROR:  no schema has been selected

I think this message is not very clear. How about changing to
something like "current_schema doesn't contain any valid schema"? This
message is used in more than one place, so let's keep it the same at
all the places if you agree to change it.

6. How about naming ConvertPubObjSpecListToOidList() as
ObjectsInPublicationToOids()? I see somewhat similarly named functions
in the code like objectsInSchemaToOids, objectNamesToOids.

7.
+ /*
+ * Schema lock is held until the publication is created to prevent
+ * concurrent schema deletion. No need to unlock the schemas, the
+ * locks will be released automatically at the end of create
+ * publication command.
+ */

In this comment no need to say create publication command as that is
implicit, we can just write " at the end of command".

8. Can you please extract changes like tab-completion, dump/restore in
separate patches? This can help to review the core (backend) part of
the patch in more detail.

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-03 Thread Amit Kapila
On Wed, Sep 1, 2021 at 12:05 PM Amit Kapila  wrote:
>
> On Wed, Sep 1, 2021 at 8:52 AM Greg Nancarrow  wrote:
> >
>
> > I'd expect a lot of users to naturally think that "ALTER PUBLICATION
> > pub1 DROP ALL TABLES IN SCHEMA sc1;" would drop from the publication
> > all tables that are in schema "sc1", which is not what it is currently
> > doing.
> > Since the syntax was changed to specifically refer to FOR ALL TABLES
> > IN SCHEMA rather than FOR SCHEMA, then now it's clear we're referring
> > to tables only, when specifying "... FOR ALL TABLES in sc1, TABLE
> > sc1.test", so IMHO it's reasonable to remove duplicates here, rather
> > than treating these as somehow separate ways of referencing the same
> > table.
> >
>
> I see your point and if we decide to go this path then it is better to
> give an error than silently removing duplicates.
>

Today, I have thought about this point again, and it seems better to
give an error in this case and let the user take action, rather
than silently removing such tables, to avoid any confusion.

-- 
With Regards,
Amit Kapila.




Re: using an end-of-recovery record in all cases

2021-09-03 Thread Amul Sul
On Fri, Sep 3, 2021 at 10:23 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 2 Sep 2021 11:30:59 -0400, Robert Haas  wrote 
> in
> > On Mon, Aug 9, 2021 at 3:00 PM Robert Haas  wrote:
> > > I decided to try writing a patch to use an end-of-recovery record
> > > rather than a checkpoint record in all cases.
> > >
> > > The first problem I hit was that GetRunningTransactionData() does
> > > Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
> > >
> > > Unfortunately we can't just relax the assertion, because the
> > > XLOG_RUNNING_XACTS record will eventually be handed to
> > > ProcArrayApplyRecoveryInfo() for processing ... and that function
> > > contains a matching assertion which would in turn fail. It in turn
> > > passes the value to MaintainLatestCompletedXidRecovery() which
> > > contains yet another matching assertion, so the restriction to normal
> > > XIDs here looks pretty deliberate. There are no comments, though, so
> > > the reader is left to guess why. I see one problem:
> > > MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
> > > expects a normal XID. Perhaps it's best to just dodge the entire issue
> > > by skipping LogStandbySnapshot() if latestCompletedXid happens to be
> > > 2, but that feels like a hack, because AFAICS the real problem is that
> > > StartupXLog() doesn't agree with the rest of the code on whether 2 is
> > > a legal case, and maybe we ought to be storing a value that doesn't
> > > need to be computed via TransactionIdRetreat().
> >
> > Anyone have any thoughts about this?
>
> I tried to reproduce this but just replacing the end-of-recovery
> checkpoint request with issuing an end-of-recovery record didn't cause
> make check-workd fail for me.  Do you have an idea of any other
> requirement to cause that?
>

You might need the following change at the end of StartupXLOG():

- if (promoted)
- RequestCheckpoint(CHECKPOINT_FORCE);
+ RequestCheckpoint(CHECKPOINT_FORCE);

Regards,
Amul




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-03 Thread Dilip Kumar
On Thu, Sep 2, 2021 at 8:52 PM Robert Haas  wrote:

> On Thu, Sep 2, 2021 at 2:06 AM Dilip Kumar  wrote:
> > 0003- The main patch for WAL logging the created database operation.
>
> Andres pointed out that this approach ends up accessing relations
> without taking a lock on them. It doesn't look like you did anything
> about that.
>

I missed that, I have shared my opinion about this in my last email [1]


>
> + /* Built-in oids are mapped directly */
> + if (classForm->oid < FirstGenbkiObjectId)
> + relfilenode = classForm->oid;
> + else if (OidIsValid(classForm->relfilenode))
> + relfilenode = classForm->relfilenode;
> + else
> + continue;
>
> Am I missing something, or is this totally busted?
>

Oops, I think the condition should be like below, but I will think
carefully before posting the next version if there is something else I am
missing.

if (OidIsValid(classForm->relfilenode))
   relfilenode = classForm->relfilenode;
else if (classForm->oid < FirstGenbkiObjectId)
   relfilenode = classForm->oid;
else
   continue;


>   /*
> + * Now drop all buffers holding data of the target database; they should
> + * no longer be dirty so DropDatabaseBuffers is safe.
>
> The way things worked before, this was true, but now AFAICS it's
> false. I'm not sure whether that means that DropDatabaseBuffers() here
> is actually unsafe or whether it just means that you haven't updated
> the comment to explain the reason.
>

I think DropDatabaseBuffers() itself is unsafe; we just copied pages using
the bufmgr and dirtied the buffers, so dropping buffers is definitely unsafe
here.


> + * Since we copy the file directly without looking at the shared buffers,
> + * we'd better first flush out any pages of the source relation that are
> + * in shared buffers.  We assume no new changes will be made while we are
> + * holding exclusive lock on the rel.
>
> Ditto.
>

Yeah this comment no longer makes sense now.


>
> + /* As always, WAL must hit the disk before the data update does. */
>
> Actually, the way it's coded now, part of the on-disk changes are done
> before WAL is issued, and part are done after. I doubt that's the
> right idea.

There's nothing special about writing the actual payload
> bytes vs. the other on-disk changes (creating directories and files).
> In any case the ordering deserves a better-considered comment than
> this one.
>

Agreed to all, but in general, I think WAL hitting the disk before data is
more applicable to the shared buffers, where we want to flush the WAL
first before writing the shared buffer so that in case of a torn page we have
an option to recover the page from the previous FPI. But in such cases where we
are creating a directory or file there is no such requirement.  Anyway, I
agree with the comments that it should be more uniform and the comment
should be correct.

+ XLogRegisterData((char *) PG_MAJORVERSION, nbytes);
>
> Surely this is utterly pointless.
>

Yeah it is.  During the WAL replay also we know the PG_MAJORVERSION :)


> + CopyDatabase(src_dboid, dboid, src_deftablespace, dst_deftablespace);
>   PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
>   PointerGetDatum(&fparms));
>
> I'd leave braces around the code for which we're ensuring error
> cleanup even if it's just one line.
>

Okay


> + if (info == XLOG_DBASEDIR_CREATE)
>   {
>   xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *)
> XLogRecGetData(record);
>
> Seems odd to rename the record but not change the name of the struct.
> I think I would be inclined to keep the existing record name, but if
> we're going to change it we should be more thorough.
>

Right, I think we can leave the record name as it is.

[1]
https://www.postgresql.org/message-id/CAFiTN-sP_6hWv5AxcwnWCgg%3D4hyEeeZcCgFucZsYWpr3XQbP1g%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: pg_receivewal starting position

2021-09-03 Thread Ronan Dunklau
Le jeudi 2 septembre 2021, 10:37:20 CEST Michael Paquier a écrit :
> On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> > I could see the use for sending active_pid for use within pg_basebackup:
> > at
> > least we could fail early if the slot is already in use. But at the same
> > time, maybe it won't be in use anymore once we need it.
> 
> There is no real concurrent protection with this design.  You could
> read that the slot is not active during READ_REPLICATION_SLOT just to
> find out after in the process of pg_basebackup streaming WAL that it
> became in use in-between.  And the backend-side protections would kick
> in at this stage.
> 
> Hmm.  The logic doing the decision-making with pg_receivewal may
> become more tricky when it comes to pg_replication_slots.wal_status,
> max_slot_wal_keep_size and pg_replication_slots.safe_wal_size.  The
> number of cases we'd like to consider impacts directly the amount of
> data send through READ_REPLICATION_SLOT.  That's not really different
> than deciding of a failure, a success or a retry with active_pid at an
> earlier or a later stage of a base backup.  pg_receivewal, on the
> contrary, can just rely on what the backend tells when it begins
> streaming.  So I'd prefer keeping things simple and limit the number
> of fields a minimum for this command.

Ok. Please find attached new patches rebased on master.

0001 is yours without any modification.

0002, for pg_receivewal, tries to simplify the logic of what information to return, 
by using a dedicated struct for this. This accounts for Bharath's request to 
return every possible field.
In particular, an is_logical field simplifies the detection of the type of 
slot. 
In my opinion it makes sense to simplify it like this on the client side while 
being more open-minded on the server side if we ever need to provide a new 
type of slot. Also, GetSlotInformation now returns an enum to be able to 
handle the different modes of failures, which differ between pg_receivewal and 
pg_basebackup. 

0003 is the pg_basebackup one, not much changed except for the concerns you 
had about the log message and handling of different failure modes.

There is still the concern raised by Bharath about being able to select the 
way to fetch the replication slot information for the user, and what should or 
should not be included in the documentation. I am in favor of documenting the 
process of selecting the wal start, and maybe include a --start-lsn option for 
the user to override it, but that's maybe for another patch.

-- 
Ronan Dunklau

From 60b8cedb196f5acdd75b489c1d2155c2698084a4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v5 1/3] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml  |  66 
 src/backend/replication/repl_gram.y |  16 ++-
 src/backend/replication/repl_scanner.l  |   1 +
 src/backend/replication/walsender.c | 112 
 src/include/nodes/nodes.h   |   1 +
 src/include/nodes/replnodes.h   |  10 ++
 src/test/recovery/t/001_stream_rep.pl   |  47 +++-
 src/test/recovery/t/006_logical_decoding.pl |  15 ++-
 src/tools/pgindent/typedefs.list|   1 +
 9 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..8191f17137 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,72 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+ 
+  Read the information of a replication slot. Returns a tuple with
+  NULL values if the replication slot does not
+  exist.
+ 
+ 
+  In response to this command, the server will return a one-row result set,
+  containing the following fields:
+  
+   
+slot_type (text)
+
+ 
+  The replication slot's type, either physical or
+  logical.
+ 
+
+   
+
+   
+restart_lsn (text)
+
+ 
+  The replication slot's restart_lsn.
+ 
+
+   
+
+   
+confirmed_flush_lsn (text)
+
+ 
+  The replication slot's confirmed_flush_lsn.
+ 
+
+   
+
+   
+restart_tli (int4)
+
+ 
+  The timeline ID for the restart_lsn position,
+  when following the current timeline history.
+ 
+
+   
+
+   
+confirmed_flush_tli (int4)
+
+ 
+  The timeline ID for the confirmed_flush_lsn
+  position, when following the current timeline history.
+ 
+
+   
+  
+ 
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git 

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-03 Thread Daniel Gustafsson
> On 1 Sep 2021, at 12:28, Bharath Rupireddy 
>  wrote:
> 
> On Wed, Sep 1, 2021 at 1:56 PM Daniel Gustafsson  wrote:
>> A v2 with the above fixes is attached.
> 
> Thanks for the updated patch. Here are some comments:
> 
> 1) Do we need to set bgchild = -1 before the exit(1); in the code
> below so that we don't kill(bgchild, SIGTERM); unnecessarily in
> kill_bgchild_atexit?

Good point. We can also inspect bgchild_exited in kill_bgchild_atexit.

> 2) Missing "," after "On Windows" in:
> + * that time. On Windows we use a background thread which can communicate
> 
> 3) How about "/* Flag to indicate whether or not child process exited
> */" instead of +/* State of child process */?

Fixed.

> 4) Instead of just exiting from the main pg_basebackup process when
> the child WAL receiver dies, can't we think of restarting the child
> process, probably with the WAL streaming position where it left off or
> stream from the beginning? This way, the work that the main
> pg_basebackup has done so far doesn't get wasted. I'm not sure if this
> affects the pg_basebackup functionality. We can restart the child
> process for 1 or 2 times, if it still dies, we can kill the main
> pg_baasebackup process too. Thoughts?

I was toying with the idea, but I ended up not pursuing it.  This error is well
into the “really shouldn’t happen, but can” territory and it’s quite likely
that some level of manual intervention is required to make it successfully
restart.  I’m not convinced that adding complicated logic to restart (and even
more complicated tests to simulate and test it) will be worthwhile.

--
Daniel Gustafsson   https://vmware.com/



v3-0001-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch
Description: Binary data


Re: Added schema level support for publication.

2021-09-03 Thread Amit Kapila
On Mon, Aug 30, 2021 at 8:56 PM vignesh C  wrote:
>
> On Mon, Aug 30, 2021 at 9:10 AM houzj.f...@fujitsu.com
>  wrote:
> >
>
> > 5)
> > +   if (list_length(pubobj->name) == 1 &&
> > +   (strcmp(relname, "CURRENT_SCHEMA") == 0))
> > +   ereport(ERROR,
> > +   errcode(ERRCODE_SYNTAX_ERROR),
> > +   errmsg("invalid relation name at or near"),
> > +   parser_errposition(pstate, pubobj->location));
> >
> > Maybe we don't need this check, because it will report an error in
> > OpenTableList() anyway, "relation "CURRENT_SCHEMA" does not exist" , and 
> > that
> > message seems readable to me.
>
> Allowing CURRENT_SCHEMA is required to support current schema for
> schema publications, currently I'm allowing this syntax during parsing
> and this error is thrown for relations later, this is done to keep the
> similar error as earlier before this feature support. I felt we can
> keep it like this to maintain the similar error. Thoughts?
>

I find this check quite ad-hoc in the code and I am not sure if we
need to be consistent for the exact message in this case. So, I think
it is better to remove it.

>
> > About  0003
> > 7)
> > The v22-0003 seems simple and can remove lots of code in patch v22-0001, so
> > maybe we can merge 0001 and 0003 into one patch ?
>
> I agree that the code becomes simpler; it removes a lot of code. I had
> kept it like that as the testing effort might be greater, and I was also
> waiting to see whether there was any objection to that syntax from
> anyone else. I will wait for a few more reviews and merge it into 0001
> if there are no objections.
>

+1 to merge the patch as suggested by Hou-San.

-- 
With Regards,
Amit Kapila.




Re: Question about an Extension Project

2021-09-03 Thread Tomas Vondra

Hi,

I don't want to sound overly rude, but I suggest not spamming this list 
with the same message when you don't get an answer right away. If no one 
answered the first time, they're not going to answer the second time.


Providing a quote usually requires some sort of a business relationship, 
and I doubt people on this list will rush to do that when the person is 
entirely anonymous, without any prior history in the community, etc.



As for the technical side, I only quickly skimmed the specification, and 
it's entirely unclear to me


(a) Why? What's the whole point of the proposed extension, and why e.g. 
pgmp is not suitable to achieve that.


(b) What? Are the proposed data types & aritmetics a completely new 
thing, or is that already implemented somewhere? I doubt people on this 
list will be interested in inventing entirely new ways to do math (as 
opposed to using a library that already exists).



regards

On 9/3/21 10:17 AM, A Z wrote:

To who it may concern,

I am trying to get a project completed to enhance PostgreSQL arithmetic 
and elementary functions
prowess by means of two new High Precision mixed decimal number types in 
a self installing

extension.  Hopefully, I want this to be a free or low cost project.

Is there anyone who can read these project specifications and email back to
me here, at powerus...@live.com.au, to give me a quote for this project?
They are in my top posting at this discussion thread, at:

https://github.com/dvarrazzo/pgmp/issues/22

The extension could be called HPPM, High Precision Postgresql 
Mathematics.  It is
to be written in C, and will need a number of offline installers for 
major operating
systems, like Windows 10/11 or rpm based Linux. The project could be 
hosted on SourceForge

or GitHub.

If anyone on this list is interested, or knows which direction to point 
me in,

could they please reply to me here, at powerus...@live.com.au?


ZM.


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-03 Thread Dilip Kumar
On Fri, Jun 18, 2021 at 12:18 AM Andres Freund  wrote:

> Hi,
>
> On 2021-06-17 14:22:52 -0400, Robert Haas wrote:
> > On Thu, Jun 17, 2021 at 2:17 PM Andres Freund 
> wrote:
> > > Adding a hacky special case implementation for cross-database relation
> > > accesses that violates all kinds of assumptions (like holding a lock on
> > > a relation when accessing it / pinning pages, processing relcache
> > > invals, ...) doesn't seem like a good plan.
> >
> > I agree that we don't want hacky code that violates assumptions, but
> > bypassing shared_buffers is a bit hacky, too. Can't we lock the
> > relations as we're copying them? We know pg_class's OID a fortiori,
> > and we can find out all the other OIDs as we go.


> We possibly can - but I'm not sure that won't end up violating some
> other assumptions.
>

Yeah, we can surely lock the relation as described by Robert, but IMHO,
while creating the database we are already holding the exclusive lock on
the database, and no one else is allowed to be connected to the
database, so do we actually need to bother about the lock for
correctness?


> > I'm just thinking that the hackiness of going around shared_buffers
> > feels irreducible, but maybe the hackiness in the patch is something
> > that can be solved with more engineering.
>
> Which bypassing of shared buffers are you talking about here? We'd still
> have to solve a subset of the issues around locking (at least on the
> source side), but I don't think we need to read pg_class contents to be
> able to go through shared_buffers? As I suggested, we can use the init
> fork presence to infer relpersistence?
>

I believe we want to avoid scanning pg_class for identifying the relation
list so that we can avoid this special-purpose code?  IMHO, scanning the
disk, such as going through all the tablespaces and then finding the source
database directory and identifying each relfilenode, also appears to be
special-purpose code, unless I am missing what you mean by special-purpose
code.

Or do you mean that looking at the filesystem at all is bypassing shared
> buffers?
>

I think we already have such code in multiple places where we bypass the
shared buffers for copying the relation,
e.g. index_copy_data(), heapam_relation_copy_data().  So in the current
system as it stands, we cannot claim that we are designing something for the
first time where we are bypassing the shared buffers.  So if we are
thinking that bypassing the shared buffers is hackish and we don't want to
use it for the new patches, then let's avoid it completely, even while
identifying the relfilenodes to be copied.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters

2021-09-03 Thread Phil Krylov

On 2021-09-03 02:09, Tom Lane wrote:

I think that these free() calls you propose to add are a complete
waste of code space.  Certainly a free() right before an exit() call
is that; if anything, it's *delaying* recycling the memory space for
some useful purpose.  But no part of pg_ctl runs long enough for it
to be worth worrying about small leaks.


OK, I have removed the free() before exit().


I do not find your proposed test case to be a useful expenditure
of test cycles, either.  If it ever fails, we'd learn nothing,
except that that particular platform has a surprisingly small
command line length limit.


Hmm, it's a test case that fails with the current code and stops failing 
with my fix, so I've put it there to show the problem. But, truly, it 
does not bring much value after the fix is applied.


Attaching the new version, with the test case and free-before-exit 
removed.


-- Ph.

From 4f8e39f3c5e39b5ec507ec4b07ad5d33c6610524 Mon Sep 17 00:00:00 2001
From: Phil Krylov 
Date: Thu, 2 Sep 2021 21:39:58 +0200
Subject: [PATCH] pg_ctl should not truncate command lines at 1024 characters

---
 src/bin/pg_ctl/pg_ctl.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 7985da0a94..e07b5f02d7 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -442,7 +442,7 @@ free_readfile(char **optlines)
 static pgpid_t
 start_postmaster(void)
 {
-	char		cmd[MAXPGPATH];
+	char	   *cmd;
 
 #ifndef WIN32
 	pgpid_t		pm_pid;
@@ -487,12 +487,12 @@ start_postmaster(void)
 	 * has the same PID as the current child process.
 	 */
 	if (log_file != NULL)
-		snprintf(cmd, MAXPGPATH, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
- exec_path, pgdata_opt, post_opts,
- DEVNULL, log_file);
+		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+	   exec_path, pgdata_opt, post_opts,
+	   DEVNULL, log_file);
 	else
-		snprintf(cmd, MAXPGPATH, "exec \"%s\" %s%s < \"%s\" 2>&1",
- exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
+	   exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
 
@@ -553,12 +553,12 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		snprintf(cmd, MAXPGPATH, "\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
- comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
+		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+	   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		snprintf(cmd, MAXPGPATH, "\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
- comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+	   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, , false))
 	{
@@ -566,6 +566,7 @@ start_postmaster(void)
 	 progname, (unsigned long) GetLastError());
 		exit(1);
 	}
+	free(cmd);
 	/* Don't close command process handle here; caller must do so */
 	postmasterProcess = pi.hProcess;
 	CloseHandle(pi.hThread);
@@ -828,7 +829,7 @@ find_other_exec_or_die(const char *argv0, const char *target, const char *versio
 static void
 do_init(void)
 {
-	char		cmd[MAXPGPATH];
+	char	   *cmd;
 
 	if (exec_path == NULL)
 		exec_path = find_other_exec_or_die(argv0, "initdb", "initdb (PostgreSQL) " PG_VERSION "\n");
@@ -840,17 +841,18 @@ do_init(void)
 		post_opts = "";
 
 	if (!silent_mode)
-		snprintf(cmd, MAXPGPATH, "\"%s\" %s%s",
- exec_path, pgdata_opt, post_opts);
+		cmd = psprintf("\"%s\" %s%s",
+	   exec_path, pgdata_opt, post_opts);
 	else
-		snprintf(cmd, MAXPGPATH, "\"%s\" %s%s > \"%s\"",
- exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("\"%s\" %s%s > \"%s\"",
+	   exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (system(cmd) != 0)
 	{
 		write_stderr(_("%s: database system initialization failed\n"), progname);
 		exit(1);
 	}
+	free(cmd);
 }
 
 static void
@@ -2175,7 +2177,7 @@ set_starttype(char *starttypeopt)
 static void
 adjust_data_dir(void)
 {
-	char		cmd[MAXPGPATH],
+	char	   *cmd,
 filename[MAXPGPATH],
 			   *my_exec_path;
 	FILE	   *fd;
@@ -2207,10 +2209,10 @@ adjust_data_dir(void)
 		my_exec_path = pg_strdup(exec_path);
 
 	/* it's important for -C to be the first option, see main.c */
-	snprintf(cmd, MAXPGPATH, "\"%s\" -C data_directory %s%s",
-			 my_exec_path,
-			 pgdata_opt ? pgdata_opt : "",
-			 post_opts ? post_opts : "");
+	cmd = psprintf("\"%s\" -C data_directory %s%s",
+   my_exec_path,
+   pgdata_opt ? pgdata_opt : "",
+   post_opts ? post_opts : "");
 
 	fd = popen(cmd, "r");
 	if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL)
@@ -2219,6 +2221,7 @@ adjust_data_dir(void)
 		exit(1);
 	}
 	pclose(fd);
+	free(cmd);
 	free(my_exec_path);
 
 	/* strip trailing newline and carriage return */
-- 
2.15.2 (Apple 

Question about an Extension Project

2021-09-03 Thread A Z
To who it may concern,

I am trying to get a project completed to enhance PostgreSQL arithmetic and 
elementary functions
prowess by means of two new High Precision mixed decimal number types in a self 
installing
extension.  Hopefully, I want this to be a free or low cost project.

Is there anyone who can read these project specifications and email back to
me here, at powerus...@live.com.au, to give me a quote for this project?
They are in my top posting at this discussion thread, at:

https://github.com/dvarrazzo/pgmp/issues/22

The extension could be called HPPM, High Precision Postgresql Mathematics.  It 
is
to be written in C, and will need a number of offline installers for major 
operating
systems, like Windows 10/11 or rpm based Linux. The project could be hosted on 
SourceForge
or GitHub.

If anyone on this list is interested, or knows which direction to point me in,
could they please reply to me here, at powerus...@live.com.au?


ZM.


Re: corruption of WAL page header is never reported

2021-09-03 Thread Kyotaro Horiguchi
At Fri, 03 Sep 2021 16:55:36 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > Yes, I'm fine with your latest patch.
> 
> Thanks. Maybe some additional comment is needed.

Something like this.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..b621ad6b0f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12496,9 +12496,21 @@ retry:
 	 *
 	 * Validating the page header is cheap enough that doing it twice
 	 * shouldn't be a big deal from a performance point of view.
+	 *
+	 * Don't call XLogReaderValidatePageHeader() here when not in standby
+	 * mode, so that this function never returns with a valid errmsg_buf.
 	 */
-	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+	if (StandbyMode &&
+		!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
 	{
+		/*
+		 * Emit this error right now, then retry this page immediately.
+		 * The message is already translated.
+		 */
+		if (xlogreader->errormsg_buf[0])
+			ereport(emode_for_corrupt_record(emode, EndRecPtr),
+	(errmsg_internal("%s", xlogreader->errormsg_buf)));
+
 		/* reset any error XLogReaderValidatePageHeader() might have set */
 		xlogreader->errormsg_buf[0] = '\0';
 		goto next_record_is_invalid;


Re: corruption of WAL page header is never reported

2021-09-03 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 21:52:00 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/09/02 16:26, Kyotaro Horiguchi wrote:
> > I believe errmsg_buf is an interface for emitting error messages that is
> > dedicated to xlogreader, which doesn't have access to the elog facility,
> > and xlogreader doesn't (and ought not to) expect non-xlogreader callback
> > functions to set the variable.  In that sense I don't think the
> > originally proposed patch is proper, for the reason that the
> > non-xlogreader callback function may set errmsg_buf.  This is what I
> > meant by the word "modularity".
> > For that reason, in my second proposal I avoided calling
> > XLogReaderValidatePageHeader() at all while not in standby mode,
> > because calling the validator function in non-standby mode results in
> > the non-xlogreader function returning with errmsg_buf set.  Of course
> > we could instead always consume errmsg_buf in the function, but I don't
> > like to shadow the caller's task.
> 
> Understood. Thanks for clarifying this!
> 
> > Does that make sense?
> 
> Yes, I'm fine with your latest patch.

Thanks. Maybe some additional comment is needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: when the startup process doesn't (logging startup delays)

2021-09-03 Thread Nitin Jadhav
> > Anything that's not used in other files should be declared static in
> > the file itself, and not present in the header. Once you fix this for
> > reset_startup_progress_timeout, the header won't need to include
> > datatype/timestamp.h any more, which is good, because we don't want
> > header files to depend on more other header files than necessary.
>
> Thanks for identifying this. I will take care of it in the next patch.

Fixed.

> > This hunk should be removed.
>
> I will remove it in the next patch.

Removed.

Please find the updated patch attached.
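
For readers following along, here is a minimal sketch of the registration
pattern settled on in the quoted discussion below; the timeout id and the
function names are illustrative assumptions, not necessarily the identifiers
used in the attached patch:

/* sketch only: a startup-process-local timeout, static to startup.c */
static volatile sig_atomic_t startup_progress_timer_expired = false;

/* The timeout callback runs in interrupt context, so it only sets a flag. */
static void
startup_progress_timeout_handler(void)
{
	startup_progress_timer_expired = true;
}

/*
 * Called once from StartupProcessMain(), not from InitPostgres(), so the
 * handler and its flag can stay static in this file.
 */
static void
register_startup_progress_timeout(void)
{
	RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
					startup_progress_timeout_handler);
}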

On Wed, Aug 18, 2021 at 12:23 PM Nitin Jadhav
 wrote:
>
> > Anything that's not used in other files should be declared static in
> > the file itself, and not present in the header. Once you fix this for
> > reset_startup_progress_timeout, the header won't need to include
> > datatype/timestamp.h any more, which is good, because we don't want
> > header files to depend on more other header files than necessary.
>
> Thanks for identifying this. I will take care of it in the next patch.
>
> > Looking over this version, I realized something I (or you) should have
> > noticed sooner: you've added the RegisterTimeout call to
> > InitPostgres(), but that's for things that are used by all backends,
> > and this is only used by the startup process. So it seems to me that
> > the right place is StartupProcessMain. That would have the further
> > advantage of allowing startup_progress_timeout_handler to be made
> > static. begin_startup_progress_phase() and
> > startup_progress_timeout_has_expired() are the actual API functions
> > though so they will need to remain extern.
>
> Yes. I had noticed this earlier, and the RegisterTimeout() call was
> only present in StartupProcessMain(), not in InitPostgres(), in the
> earlier versions (v7) of the patch. Since StartupXLOG() gets called in
> two places, I had restricted the InitPostgres() flow by checking
> !AmStartupProcess() in the newly added functions. But later we had a
> discussion and concluded to add the RegisterTimeout() call in the
> InitPostgres() case as well. Kindly refer to the discussion just after
> the v7 patch in this thread and let me know your thoughts.
>
> > This hunk should be removed.
>
> I will remove it in the next patch.
>
> On Tue, Aug 17, 2021 at 1:08 AM Robert Haas  wrote:
> >
> > On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby  wrote:
> > > Should this feature distinguish between crash recovery and archive
> > > recovery on a hot standby?  Otherwise the standby will display this
> > > all the time.
> > >
> > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, 
> > > elapsed time: 124.42 s, current LSN: 0/EEE2100
> > >
> > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert 
> > > finishes
> > > cleanup of xlog.c variables, I can't say that with much confidence).
> >
> > Hmm. My inclination is to think that on an actual standby, you
> > wouldn't want to get messages like this, but if you were doing a
> > point-in-time-recovery, then you would. So I think maybe
> > InArchiveRecovery is not the right thing. Perhaps StandbyMode?
> >
> > --
> > Robert Haas
> > EDB: http://www.enterprisedb.com


v13-0001-startup-process-progress.patch
Description: Binary data


Re: Improve logging when using Huge Pages

2021-09-03 Thread Kyotaro Horiguchi
Hello.

At Fri, 3 Sep 2021 06:28:58 +, "Shinoda, Noriyoshi (PN Japan FSIP)" 
 wrote in 
> Fujii-san, Julien-san
> 
> Thank you very much for your comment.
> I followed your comment and changed the elog function to the ereport
> function and also changed the log level. The output message is the same as
> in the case of non-HugePages memory acquisition failure. I did not simplify
> the error messages, as that would have complicated the preprocessor
> handling.
> 
> > I agree that the message should be promoted to a higher level.  But I 
> > think we should also make that information available at the SQL level, 
> > as the log files may be truncated / rotated before you need the info, 
> > and it can be troublesome to find the information at the OS level, if 
> > you're lucky enough to have OS access.
> 
> In the attached patch, I have added an internal GUC 'using_huge_pages' that
> reports whether the instance is using HugePages. This parameter will be
> true only if the instance is using HugePages.

If you are thinking of exposing that as a GUC, you might want to look into
the nearby thread [1], especially the behavior when invoking
postgres -C using_huge_pages.  (The word "using" in the name suggests a
running server, and I don't think it is neat that the variable shows "no"
when queried by that command but "yes" while the same server is running.)
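
For reference, a preset boolean of that kind would presumably be wired into
guc.c roughly as in the sketch below; the name, group, and flags follow the
proposal in this thread rather than any committed patch, and the runtime code
would set the underlying bool to true after a successful huge-page
allocation:

	{
		{"using_huge_pages", PGC_INTERNAL, PRESET_OPTIONS,
			gettext_noop("Shows whether huge pages are in use by this server."),
			NULL,
			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
		},
		&using_huge_pages,
		false,
		NULL, NULL, NULL
	},

PGC_INTERNAL with PRESET_OPTIONS is how existing read-only facts about the
server (e.g. server_version) are exposed.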

I have some comments about the patch.

-	if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
-		elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
-			 allocsize);
+	if (ptr != MAP_FAILED)
+		using_huge_pages = true;
+	else if (huge_pages == HUGE_PAGES_TRY)
+		ereport(LOG,
+				(errmsg("could not map anonymous shared memory: %m"),
+				 (mmap_errno == ENOMEM) ?
+				 errhint("This error usually means that PostgreSQL's request "

If we set huge_pages to try and postgres fell back to regular pages, it
emits a large message relative to its importance. The user specified "I'd
like to use huge pages, but it's OK if they are not available", so I think
the message should be far smaller.  Maybe just raising the DEBUG1 message
to LOG, along with moving to ereport, would be sufficient.
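
Concretely, the smaller variant suggested here might look like this sketch
(not the patch's code):

	if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
		ereport(LOG,
				(errmsg("mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
						allocsize)));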

-			elog(DEBUG1, "CreateFileMapping(%zu) with SEC_LARGE_PAGES failed, "
-				 "huge pages disabled",
-				 size);
+			ereport(LOG,
+					(errmsg("could not create shared memory segment: error code %lu", GetLastError()),
+					 errdetail("Failed system call was CreateFileMapping(size=%zu, name=%s).",
+							   size, szShareMem)));

It doesn't seem to be a regular user-facing message.  Isn't it
sufficient just to raise the log level to LOG?


[1] 
https://www.postgresql.org/message-id/20210903.141206.103927759882272221.horikyota.ntt%40gmail.com




Re: prevent immature WAL streaming

2021-09-03 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera  
wrote in 
> On 2021-Sep-02, Kyotaro Horiguchi wrote:
> 
> > So, this is a crude PoC of that.
> 
> I had ended up with something very similar, except I was trying to cram
> the flag in via the checkpoint record instead of hacking
> AdvanceXLInsertBuffer().  I removed that stuff and merged both, here's
> the result.
> 
> > 1. This patch is written on the current master, but it doesn't
> >   interfere with the seg-boundary-memorize patch since it removes the
> >   calls to RegisterSegmentBoundary.
> 
> I rebased on top of the revert patch.

Thanks!

> > 2. Since xlogreader cannot emit a log message immediately, we don't
> >   have a means to leave a log message informing that recovery met an
> >   aborted partial continuation record. (In this PoC, it is done by
> >   fprintf :p)
> 
> Shrug.  We can just use an #ifndef FRONTEND / elog(LOG).  (I didn't keep
> this part, sorry.)

No problem, it was merely a development-time message for observing the
behavior.

> > 3. Maybe we need pg_waldump to show partial continuation records,
> >   but I'm not sure how to realize that.
> 
> Ah yes, we'll need to fix that.

I just believe 0001 does the right thing.

0002:
> +	XLogRecPtr	abortedContrecordPtr;	/* LSN of incomplete record at
> +										 * end of WAL */

The name sounds like the start LSN. Doesn't contrecordAbort(ed)Ptr work?

>  	if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
>  	{
>  		report_invalid_record(state,
>  							  "there is no contrecord flag at %X/%X",
>  							  LSN_FORMAT_ARGS(RecPtr));
> -		goto err;
> +		goto aborted_contrecord;

This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
XLP_FIRST_IS_ABORTED_PARTIAL.  Is that okay?  (I don't object to removing
the check.)

I didn't think of this as an aborted contrecord, but on second thought,
when we see a record broken in any way, we stop recovery at that point.
I agree with this change and all the similar changes.

+					/* XXX should we goto aborted_contrecord here? */

I think it should be aborted_contrecord.

When that happens, the loaded bytes actually looked to xlogreader like the
first fragment of a continuation record, even if the cause was a broken
total_len.  So if we abort the record there, the next time xlogreader will
meet XLP_FIRST_IS_ABORTED_PARTIAL on the same page and correctly find a new
record there.

On the other hand, if we just errored out there, we would step back to the
beginning of the broken record in the previous page or segment and then
start writing a new record there, which is exactly what we want to avoid.
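
A rough sketch of the reader-side behavior described above, using the flag
name from this thread (the eventual design and variable names may differ):

	/* at a page boundary, while expecting a record continuation */
	if (pageHeader->xlp_info & XLP_FIRST_IS_ABORTED_PARTIAL)
	{
		/* The fragment seen on the previous page was abandoned by the
		 * writer: skip it and read the fresh record starting here. */
	}
	else if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
	{
		/* A continuation was expected but is missing: treat the record
		 * as aborted rather than reporting corruption. */
		goto aborted_contrecord;
	}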

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Unused variable in TAP tests file

2021-09-03 Thread Michael Paquier
On Fri, Sep 03, 2021 at 10:53:19AM +0530, Amul Sul wrote:
> A few TAP test files have a "tempdir_short" variable which isn't in
> use. The attached patch removes it.

Indeed.  Let's clean that up.  Thanks!
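
For illustration, the removed declarations are of this shape (a generic
example, not a specific hunk from the patch):

	# fetched but never referenced again anywhere in the test file
	my $tempdir_short = TestLib::tempdir_short;
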
--
Michael


signature.asc
Description: PGP signature


Add guc to enable send SIGSTOP to peers when backend exits abnormally

2021-09-03 Thread 蔡梦娟(玊于)

Hi all

I want to share a patch with you, in which I add a GUC parameter
'enable_send_stop' to set the value of SendStop in postmaster.c more
conveniently.  SendStop makes the postmaster send SIGSTOP rather than SIGQUIT
to its children when some backend dumps core. Currently this variable can
only be set with the -T parameter when starting postgres, which is
inconvenient to control in some scenarios.
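
For context, a simplified sketch of what SendStop already controls in the
postmaster's crash handling (the real HandleChildCrash() loops over every
child process and does more bookkeeping):

	/* for each remaining child process after a backend crash */
	if (SendStop)
		signal_child(bp->pid, SIGSTOP); /* freeze children so cores and
										 * state can be examined */
	else
		signal_child(bp->pid, SIGQUIT); /* normal response: immediate exit */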

Thanks & Best Regards

0001-Add-guc-to-set-SendStop.patch
Description: Binary data


Re: suboverflowed subtransactions concurrency performance optimize

2021-09-03 Thread Andrey Borodin
Sorry, for some reason Mail.app converted the message to HTML and the mailing
list mangled that HTML into a mess. I'm resending the previous message as
plain text. Sorry for the noise.

> On 31 Aug 2021, at 11:43, Pengchengliu wrote:
> 
> Hi Andrey,
>  Thanks a lot for your reply and the reference information.
> 
>  The default NUM_SUBTRANS_BUFFERS is 32. In my implementation,
> local_cache_subtrans_pages can be adjusted dynamically.
>  If we configure local_cache_subtrans_pages as 64, every backend uses only
> an extra 64*8192 = 512KB of memory.
>  So the local cache is similar to a first-level cache, and the subtrans
> SLRU is the second-level cache.
>  And I think the extra memory is very well worth it. It really resolves the
> massive-subtrans stall issue which I mentioned in my previous email.
> 
>  I have looked at the patch in [0] before. Adding GUC configuration
> parameters for the SLRU buffers is very nice.
>  I think that for subtrans its optimization is not enough. For
> SubTransGetTopmostTransaction, we should take SubtransSLRULock first and
> then call SubTransGetParent in a loop, to avoid acquiring/releasing
> SubtransSLRULock on every SubTransGetTopmostTransaction ->
> SubTransGetParent iteration.
>  After applying that patch together with my optimization of
> SubTransGetTopmostTransaction, my test case still gets stuck.

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The
problem can arise only when someone reads a page from disk, but if you have a
big enough cache this will never happen. And this cache will be much smaller
than 512KB * max_connections.

If we really want to fix the exclusive SubtransSLRULock, I think the best
option would be to split the SLRU control lock into an array of locks - one
for each bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch).
With this approach we will have to rename s/bank/partition/g for consistency
with lock and buffer partitions. I really liked having my own banks, but
consistency is worth it anyway.
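
A rough sketch of the per-bank locking idea, assuming each bank covers a
fixed number of buffers (all names below are illustrative):

	#define SLRU_BANK_SIZE	8	/* buffers per bank; an assumption */

	/* one lock per bank instead of a single SLRU-wide control lock */
	static inline LWLock *
	SimpleLruBankLock(SlruShared shared, int slotno)
	{
		return &shared->bank_locks[slotno / SLRU_BANK_SIZE].lock;
	}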

Thanks!

Best regards, Andrey Borodin.



Re: Skipping logical replication transactions on subscriber side

2021-09-03 Thread Greg Nancarrow
On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
>
> I've attached rebased patches. The 0004 patch is not in the scope of this
> work; it's borrowed from another thread [1] to fix the assertion
> failure in the newly added tests. Please review them.
>

BTW, these patches need rebasing (broken by recent commits, patches
0001, 0003 and 0004 no longer apply, and it's failing in the cfbot).


Regards,
Greg Nancarrow
Fujitsu Australia




RE: Improve logging when using Huge Pages

2021-09-03 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Fujii-san, Julien-san

Thank you very much for your comment.
I followed your comment and changed the elog function to the ereport function
and also changed the log level. The output message is the same as in the case
of non-HugePages memory acquisition failure. I did not simplify the error
messages, as that would have complicated the preprocessor handling.

> I agree that the message should be promoted to a higher level.  But I 
> think we should also make that information available at the SQL level, 
> as the log files may be truncated / rotated before you need the info, 
> and it can be troublesome to find the information at the OS level, if 
> you're lucky enough to have OS access.

In the attached patch, I have added an internal GUC 'using_huge_pages' that
reports whether the instance is using HugePages. This parameter will be true
only if the instance is using HugePages.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Fujii Masao [mailto:masao.fu...@oss.nttdata.com] 
Sent: Wednesday, September 1, 2021 2:06 AM
To: Julien Rouhaud ; Shinoda, Noriyoshi (PN Japan FSIP) 

Cc: pgsql-hack...@postgresql.org
Subject: Re: Improve logging when using Huge Pages



On 2021/08/31 15:57, Julien Rouhaud wrote:
> On Tue, Aug 31, 2021 at 1:37 PM Shinoda, Noriyoshi (PN Japan FSIP) 
>  wrote:
>>
>> In the current version, when GUC huge_pages=try, which is the default
>> setting, no log is output regardless of whether the HugePages acquisition
>> succeeds or fails. If you want to output logs, you need to set
>> log_min_messages=DEBUG3, but that outputs a huge amount of extra logs.
>> With the huge_pages=try setting, if the kernel parameter vm.nr_hugepages
>> is not large enough, the administrator will not notice that HugePages is
>> not being used. I think a log should be output if HugePages is not
>> available.

+1

-		elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
+		elog(WARNING, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",

elog() should be used only for internal errors and low-level debug logging.
So per your proposal, elog() is not suitable here. Instead, ereport() should be 
used.

The log level should be LOG rather than WARNING because this message indicates 
the information about server activity that administrators are interested in.

The message should be updated so that it follows the Error Message Style Guide.
https://www.postgresql.org/docs/devel/error-style-guide.html 

Currently, with huge_pages=on, an error message is reported if shared memory
fails to be allocated. Even with huge_pages=try, should this error message be
used, to simplify the code as follows?

 	errno = mmap_errno;
-	ereport(FATAL,
+	ereport((huge_pages == HUGE_PAGES_ON) ? FATAL : LOG,
 			(errmsg("could not map anonymous shared memory: %m"),
 			 (mmap_errno == ENOMEM) ?
 			 errhint("This error usually means that PostgreSQL's request "



> I agree that the message should be promoted to a higher level.  But I 
> think we should also make that information available at the SQL level, 
> as the log files may be truncated / rotated before you need the info, 
> and it can be troublesome to find the information at the OS level, if 
> you're lucky enough to have OS access.

+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


huge_pages_log_v2.diff
Description: huge_pages_log_v2.diff


Re: Read-only vs read only vs readonly

2021-09-03 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 22:07:02 +, "Bossart, Nathan"  
wrote in 
> On 9/2/21, 11:30 AM, "Magnus Hagander"  wrote:
> > I had a customer point out to me that we're inconsistent in how we
> > spell read-only. Turns out we're not as inconsistent as I initially
> > thought :), but they did manage to spot the one actual log
> > message we have that spells it differently from everything else -- and
> > that broke their grepping...
> >
> > Almost everywhere we use read-only. Attached patch changes the one log
> > message where we didn't, as well as a few places in the docs for it. I
> > did not bother with things like comments in the code.
> > 
> > Two questions:
> >
> > 1. Is it worth fixing? Or just silly nitpicking?
> 
> It seems entirely reasonable to me to consistently use "read-only" in
> the log messages and documentation.
> 
> > 2. What about translations? This string exists in translations --
> > should we just "fix" it there, without touching the translated string?
> > Or try to fix both? Or leave it for the translators who will get a
> > diff on it?
> 
> I don't have a strong opinion, but if I had to choose, I would say to
> leave it to the translators to decide.

+1 for both.  As a translator, that kind of change seems usual.  Many
changes to full stops, spacing, capitalization and so on happen
especially in the near-release season, like now.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center