Re: Column Filtering in Logical Replication
On Sat, Sep 4, 2021 at 10:12 AM Amit Kapila wrote:
>
> On Thu, Sep 2, 2021 at 2:51 AM Rahila Syed wrote:
> >
> >>>
> >>> Do we want to consider that the columns specified in the filter must
> >>> not have NOT NULL constraint? Because, otherwise, the subscriber will
> >>> error out inserting such rows?
> >>>
> >> I think you mean columns *not* specified in the filter must not have
> >> NOT NULL constraint on the subscriber, as this will break during
> >> insert, as it will try to insert NULL for columns not sent by the
> >> publisher.
> >> I will look into fixing this. Probably this won't be a problem in
> >> case the column is auto generated or contains a default value.
> >>
> > I am not sure if this needs to be handled. Ideally, we need to prevent
> > the subscriber tables from having a NOT NULL constraint if the
> > publisher uses column filters to publish the values of the table.
> > There is no way to do this at the time of creating a table on
> > subscriber.
> >
> > As this involves querying the publisher for this information, it can be
> > done at the time of initial table synchronization, i.e. error out if
> > any of the subscribed tables has NOT NULL constraint on non-filter
> > columns.
> > This will lead to the user dropping and recreating the subscription
> > after removing the NOT NULL constraint from the table.
> > I think the same can be achieved by doing nothing and letting the
> > subscriber error out while inserting rows.
>
> That makes sense and also it is quite possible that users don't have
> such columns in the tables on subscribers. I guess we can add such a
> recommendation in the docs instead of doing anything in the code.
>
> Few comments:
>
> Did you give any thoughts to my earlier suggestion related to syntax [1]?

[1] - https://www.postgresql.org/message-id/CAA4eK1J9b_0_PMnJ2jq9E55bcbmTKdUmy6jPnkf1Zwy2jxah_g%40mail.gmail.com

--
With Regards,
Amit Kapila.
Re: Column Filtering in Logical Replication
On Thu, Sep 2, 2021 at 2:51 AM Rahila Syed wrote:
>
>>>
>>> Do we want to consider that the columns specified in the filter must
>>> not have NOT NULL constraint? Because, otherwise, the subscriber will
>>> error out inserting such rows?
>>>
>> I think you mean columns *not* specified in the filter must not have NOT
>> NULL constraint on the subscriber, as this will break during insert, as
>> it will try to insert NULL for columns not sent by the publisher.
>> I will look into fixing this. Probably this won't be a problem in
>> case the column is auto generated or contains a default value.
>>
> I am not sure if this needs to be handled. Ideally, we need to prevent the
> subscriber tables from having a NOT NULL constraint if the publisher uses
> column filters to publish the values of the table. There is no way to do
> this at the time of creating a table on subscriber.
>
> As this involves querying the publisher for this information, it can be
> done at the time of initial table synchronization, i.e. error out if any
> of the subscribed tables has NOT NULL constraint on non-filter columns.
> This will lead to the user dropping and recreating the subscription after
> removing the NOT NULL constraint from the table.
> I think the same can be achieved by doing nothing and letting the
> subscriber error out while inserting rows.

That makes sense and also it is quite possible that users don't have such
columns in the tables on subscribers. I guess we can add such a
recommendation in the docs instead of doing anything in the code.

Few comments:

1.
+
+	/*
+	 * Cannot specify column filter when REPLICA IDENTITY IS FULL
+	 * or if column filter does not contain REPLICA IDENITY columns
+	 */
+	if (targetcols != NIL)
+	{
+		if (replidentfull)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("cannot add relation \"%s\" to publication",
+							RelationGetRelationName(targetrel)),
+					 errdetail("Cannot have column filter with REPLICA IDENTITY FULL")));

Why do we want to have such a restriction for REPLICA IDENTITY FULL? I
think it is better to expand comments in that regard.

2.
@@ -839,7 +839,6 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 						 errmsg("extra data after last expected column")));
-
 		fieldno = 0;

@@ -944,7 +992,6 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 			flags |= LOGICALREP_IS_REPLICA_IDENTITY;

 		pq_sendbyte(out, flags);
-
 		/* attribute name */
 		pq_sendstring(out, NameStr(att->attname));

@@ -953,6 +1000,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
 		/* attribute mode */
 		pq_sendint32(out, att->atttypmod);
+
 	}

Spurious line removals and addition.

--
With Regards,
Amit Kapila.
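The failure mode discussed in this thread (a subscriber-side NOT NULL column that the publisher's column filter omits) can be sketched as follows. This is an illustration only: the `(id, data)` column-list syntax is the one proposed in the patch under review and may well change, and the error text is the generic constraint violation, not anything added by the patch.

```
-- Publisher: publish only two of the three columns (proposed syntax).
CREATE TABLE t (id int PRIMARY KEY, data text, note text);
CREATE PUBLICATION pub_t FOR TABLE t (id, data);

-- Subscriber: "note" is not in the filter, so it arrives as NULL ...
CREATE TABLE t (id int PRIMARY KEY, data text, note text NOT NULL);
CREATE SUBSCRIPTION sub_t
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub_t;

-- ... so every replicated INSERT fails on the subscriber with
--   ERROR:  null value in column "note" violates not-null constraint
-- unless "note" is a generated column or has a default.
```

This is the scenario behind the "do nothing and let the subscriber error out" conclusion above: the failure is self-evident at apply time, so a docs note may suffice.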
Re: Column Filtering in Logical Replication
On Thu, Sep 2, 2021 at 2:19 PM Alvaro Herrera wrote:
>
> On 2021-Sep-02, Rahila Syed wrote:
>
> > After thinking about this, I think it is best to remove the entire
> > table from publication, if a column specified in the column filter is
> > dropped from the table.
>
> Hmm, I think it would be cleanest to give responsibility to the user: if
> the column to be dropped is in the filter, then raise an error, aborting
> the drop.

Do you think that will make sense if the user used Cascade (Alter Table
... Drop Column ... Cascade)?

--
With Regards,
Amit Kapila.
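For reference, the CASCADE form in question looks like this (`t` and `data` are placeholder names); the open question above is whether an error is still the right behavior when the user has explicitly asked for dependent objects to be dropped:

```
-- CASCADE normally drops objects that depend on the column
-- (views, constraints, ...).  Should a publication column filter
-- referencing "data" abort this, or be adjusted/removed like any
-- other dependent object?
ALTER TABLE t DROP COLUMN data CASCADE;
```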
Re: Skipping logical replication transactions on subscriber side
On Sat, Sep 4, 2021 at 8:54 AM Amit Kapila wrote:
>
> On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger wrote:
> >
> > > On Aug 30, 2021, at 12:06 AM, Masahiko Sawada wrote:
> > >
> > > I've attached rebased patches.
> >
> > For the v12-0003 patch:
> >
> > I believe this feature is needed, but it also seems like a very
> > powerful foot-gun. Can we do anything to make it less likely that
> > users will hurt themselves with this tool?
>
> This won't do any more harm than currently, users can do via
> pg_replication_slot_advance and the same is documented as well, see
> [1].

Sorry, forgot to give the link.

[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

--
With Regards,
Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger wrote:
>
> > On Aug 30, 2021, at 12:06 AM, Masahiko Sawada wrote:
> >
> > I've attached rebased patches.
>
> For the v12-0003 patch:
>
> I believe this feature is needed, but it also seems like a very powerful
> foot-gun. Can we do anything to make it less likely that users will hurt
> themselves with this tool?

This won't do any more harm than currently, users can do via
pg_replication_slot_advance and the same is documented as well, see [1].
This will be allowed only to superusers. Its effect will be documented
with a precautionary note to use it only when the other available ways
can't be used. Any better ideas?

> I am thinking back to support calls I have attended. When a production
> system is down, there is often some hesitancy to perform ad-hoc
> operations on the database, but once the decision has been made to do
> so, people try to get the whole process done as quickly as possible. If
> multiple transactions on the publisher fail on the subscriber, they will
> do so in series, not in parallel.

The subscriber will know only one transaction failure at a time; till
that is resolved, the apply won't move ahead, and it won't even know if
there are other transactions that are going to fail in the future.

> If the user could instead clear all failed transactions of the same
> type, that might make it less likely that they unthinkingly also skip
> subsequent errors of some different type. Perhaps something like ALTER
> SUBSCRIPTION ... SET (skip_failures = 'duplicate key value violates
> unique constraint "test_pkey"')?

I think if we want, we can allow skipping a particular error via
skip_error_code instead of via the error message, but I am not sure it
would be better to skip a particular operation of the transaction rather
than the entire transaction. Normally, from an atomicity perspective, a
transaction can be either committed or rolled back but not partially
applied, so I think it would be preferable to skip the entire transaction
rather than skipping it partially.

> This is arguably a different feature request, and not something your
> patch is required to address, but I wonder how much we should limit
> people shooting themselves in the foot? If we built something like this
> using your skip_xid feature, rather than instead of your skip_xid
> feature, would your feature need to be modified?

Sawada-San can answer better but I don't see any problem building any
such feature on top of what is currently proposed.

> I'm having trouble thinking of an example conflict where skipping a
> transaction would be better than writing a BEFORE INSERT trigger on the
> conflicting table which suppresses or redirects conflicting rows
> somewhere else. Particularly for larger transactions containing multiple
> statements, suppressing the conflicting rows using a trigger would be
> less messy than skipping the transaction. I think your patch adds a
> useful tool to the toolkit, but maybe we should mention more
> alternatives in the docs? Something like, "changing the data on the
> subscriber so that it doesn't conflict with incoming changes, or
> dropping the conflicting constraint or unique index, or writing a
> trigger on the subscriber to suppress or redirect conflicting incoming
> changes, or as a last resort, by skipping the whole transaction"?

+1 for extending the docs as per this suggestion.

--
With Regards,
Amit Kapila.
Re: when the startup process doesn't (logging startup delays)
On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> Please find the updated patch attached.

Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=rwtpyhc7-...@mail.gmail.com
I agree with Robert that a standby server should not continuously show
timing messages for WAL replay.

Some doc comments:

+        Sets the time interval between each progress update of the operations
+        performed during startup process. This produces the log message after

Either say "performed by the startup process" or "performed during
startup".

s/the/a/

+        every interval of time for the operations that take longer time. The unit

"..for those operations which take longer than the specified duration."

+        used to specify the value is seconds. For example, if you set it to
+        10s , then after every 10s there

remove "after"

+        is a log message indicating which operation is going on and what is the

say "..every 10s, a log is emitted indicating which operation is
ongoing, and the elapsed time from the beginning of the operation.."

+        elapsed time from beginning. If the value is set to 0 ,
+        then it logs all the available messages for such operations.

-1

"..then all messages for such operations are logged."

+        disables the feature. The default value is set to 10s
+

"The default value is >10s<."
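Assuming the GUC keeps the semantics described in the doc comments above (an interval value, with 0 and -1 as special cases), it would be set like any other postgresql.conf interval. The parameter name below is a guess for illustration; the exact name and defaults are still under review in this thread.

```
# Hypothetical postgresql.conf excerpt for the proposed GUC.
log_startup_progress_interval = 10s   # emit a progress line every 10s (default)
#log_startup_progress_interval = 0    # log all available messages
#log_startup_progress_interval = -1   # disable the feature
```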
Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters
Phil Krylov writes:
> Attaching the new version, with the test case and free-before-exit
> removed.

Pushed with minor cosmetic adjustments. Thanks for the patch!

			regards, tom lane
Re: prevent immature WAL streaming
Hi,

On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote:
> From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Tue, 31 Aug 2021 20:55:10 -0400
> Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready"
>  files too early"
>
> This reverts commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b.

I'd prefer to see this pushed soon. I've a bunch of patches to xlog.c
that conflict with the prior changes, and rebasing back and forth isn't
that much fun...

> From f767cdddb3120f1f6c079c8eb00eaff38ea98c79 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Thu, 2 Sep 2021 17:21:46 -0400
> Subject: [PATCH v3 2/5] Implement FIRST_IS_ABORTED_CONTRECORD
>
> ---
>  src/backend/access/transam/xlog.c       | 53 +++--
>  src/backend/access/transam/xlogreader.c | 39 +-
>  src/include/access/xlog_internal.h      | 14 ++-
>  src/include/access/xlogreader.h         |  3 ++
>  4 files changed, 103 insertions(+), 6 deletions(-)
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index e51a7a749d..411f1618df 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -586,6 +586,8 @@ typedef struct XLogCtlData
>  	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
>
>  	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
> +	XLogRecPtr	abortedContrecordPtr;	/* LSN of incomplete record at end of
> +										 * WAL */
>
>  	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
>  	XLogRecPtr	unloggedLSN;
> @@ -848,6 +850,7 @@ static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
>  /* State information for XLOG reading */
>  static XLogRecPtr ReadRecPtr;	/* start of last record read */
>  static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
> +static XLogRecPtr abortedContrecordPtr; /* end+1 of incomplete record */
>
>  /*
>   * Local copies of equivalent fields in the control file. When running

> @@ -2246,6 +2249,30 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>  		if (!Insert->forcePageWrites)
>  			NewPage->xlp_info |= XLP_BKP_REMOVABLE;
>
> +		/*
> +		 * If the last page ended with an aborted partial continuation record,
> +		 * mark the new page to indicate that the partial record can be
> +		 * omitted.
> +		 *
> +		 * This happens only once at the end of recovery, so there's no race
> +		 * condition here.
> +		 */
> +		if (XLogCtl->abortedContrecordPtr >= NewPageBeginPtr)
> +		{

Can we move this case out of AdvanceXLInsertBuffer()? As the comment
says, this only happens at the end of recovery, so putting it into
AdvanceXLInsertBuffer() doesn't really seem necessary?

> +#ifdef WAL_DEBUG
> +			if (XLogCtl->abortedContrecordPtr != NewPageBeginPtr)
> +				elog(PANIC, "inconsistent aborted contrecord location %X/%X, expected %X/%X",
> +					 LSN_FORMAT_ARGS(XLogCtl->abortedContrecordPtr),
> +					 LSN_FORMAT_ARGS(NewPageBeginPtr));
> +			ereport(LOG,
> +					(errmsg_internal("setting XLP_FIRST_IS_ABORTED_PARTIAL flag at %X/%X",
> +									 LSN_FORMAT_ARGS(NewPageBeginPtr;
> +#endif
> +			NewPage->xlp_info |= XLP_FIRST_IS_ABORTED_PARTIAL;
> +
> +			XLogCtl->abortedContrecordPtr = InvalidXLogRecPtr;
> +		}
>  		/*
>  		 * If first page of an XLOG segment file, make it a long header.
>  		 */
> @@ -4392,6 +4419,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
>  		record = XLogReadRecord(xlogreader, );
>  		ReadRecPtr = xlogreader->ReadRecPtr;
>  		EndRecPtr = xlogreader->EndRecPtr;
> +		abortedContrecordPtr = xlogreader->abortedContrecordPtr;
>  		if (record == NULL)
>  		{
>  			if (readFile >= 0)
> @@ -7691,10 +7719,29 @@ StartupXLOG(void)
>  	/*
>  	 * Re-fetch the last valid or last applied record, so we can identify the
>  	 * exact endpoint of what we consider the valid portion of WAL.
> +	 *
> +	 * When recovery ended in an incomplete record, continue writing from the
> +	 * point where it went missing. This leaves behind an initial part of
> +	 * broken record, which rescues downstream which have already received
> +	 * that first part.
>  	 */
> -	XLogBeginRead(xlogreader, LastRec);
> -	record = ReadRecord(xlogreader, PANIC, false);
> -
Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead
Hi,

On 2021-08-31 21:56:50 -0700, Andres Freund wrote:
> On 2021-08-27 13:57:45 +0900, Michael Paquier wrote:
> > On Wed, Aug 25, 2021 at 01:20:03AM -0700, Andres Freund wrote:
> > > On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> > > As I said before, this ship has long sailed:
> > >
> > > typedef struct PgStat_MsgTabstat
> > > {
> > > 	PgStat_MsgHdr m_hdr;
> > > 	Oid m_databaseid;
> > > 	int m_nentries;
> > > 	int m_xact_commit;
> > > 	int m_xact_rollback;
> > > 	PgStat_Counter m_block_read_time; /* times in microseconds */
> > > 	PgStat_Counter m_block_write_time;
> > > 	PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
> > > } PgStat_MsgTabstat;
> >
> > Well, I kind of misread what you meant upthread then.
> > PgStat_MsgTabstat has a name a bit misleading, especially if you
> > assign connection stats to it.
>
> ISTM we should just do this fairly obvious change. Given that we already
> transport commit / rollback / IO stats, I don't see why the connection
> stats change anything to a meaningful degree. I'm fairly baffled why
> that's not the obvious thing to do for v14.

Here's how I think that would look. While writing up this draft, I found
two more issues:

- On windows / 32 bit systems, the session time would overflow if idle
  for longer than ~4300s. long is only 32 bit. Easy to fix obviously.

- Right now walsenders, including database connected walsenders, are not
  reported in connection stats. That doesn't seem quite right to me.

In the patch I made the message for connecting an explicitly reported
message, that seems cleaner, because it then happens at a clearly
defined point. I didn't do the same for disconnecting, but perhaps that
would be better? Then we could get rid of the whole
pgStatSessionEndCause variable.
Greetings, Andres Freund >From 2b3cf32bd02ea4b73157db023d6475826e32c64a Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 3 Sep 2021 17:02:15 -0700 Subject: [PATCH] wip: Reduce overhead of "pg_stat_database counters for sessions and session time" "Fixes" 960869da0803 Author: Andres Freund Reviewed-By: Discussion: https://postgr.es/m/20210901045650.fz4he2b3wx4p7...@alap3.anarazel.de Backpatch: --- src/include/pgstat.h| 34 ++-- src/backend/postmaster/pgstat.c | 171 src/backend/tcop/postgres.c | 2 + src/backend/utils/activity/backend_status.c | 4 +- 4 files changed, 132 insertions(+), 79 deletions(-) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 509849c7ff4..a7b386821f6 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -81,7 +81,8 @@ typedef enum StatMsgType PGSTAT_MTYPE_DEADLOCK, PGSTAT_MTYPE_CHECKSUMFAILURE, PGSTAT_MTYPE_REPLSLOT, - PGSTAT_MTYPE_CONNECTION, + PGSTAT_MTYPE_CONNECT, + PGSTAT_MTYPE_DISCONNECT, } StatMsgType; /* -- @@ -279,7 +280,7 @@ typedef struct PgStat_TableEntry * -- */ #define PGSTAT_NUM_TABENTRIES \ - ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - 3 * sizeof(int) - 2 * sizeof(PgStat_Counter)) \ + ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - 3 * sizeof(int) - 5 * sizeof(PgStat_Counter)) \ / sizeof(PgStat_TableEntry)) typedef struct PgStat_MsgTabstat @@ -291,6 +292,9 @@ typedef struct PgStat_MsgTabstat int m_xact_rollback; PgStat_Counter m_block_read_time; /* times in microseconds */ PgStat_Counter m_block_write_time; + PgStat_Counter m_session_time; + PgStat_Counter m_active_time; + PgStat_Counter m_idle_in_xact_time; PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES]; } PgStat_MsgTabstat; @@ -653,20 +657,26 @@ typedef struct PgStat_MsgChecksumFailure } PgStat_MsgChecksumFailure; /* -- - * PgStat_MsgConn Sent by the backend to update connection statistics. 
+ * PgStat_MsgConnect Sent by the backend upon connection + *establishment * -- */ -typedef struct PgStat_MsgConn +typedef struct PgStat_MsgConnect { PgStat_MsgHdr m_hdr; Oid m_databaseid; - PgStat_Counter m_count; - PgStat_Counter m_session_time; - PgStat_Counter m_active_time; - PgStat_Counter m_idle_in_xact_time; - SessionEndType m_disconnect; -} PgStat_MsgConn; +} PgStat_MsgConnect; +/* -- + * PgStat_MsgDisconnect Sent by the backend when disconnecting + * -- + */ +typedef struct PgStat_MsgDisconnect +{ + PgStat_MsgHdr m_hdr; + Oid m_databaseid; + SessionEndType m_cause; +} PgStat_MsgDisconnect; /* -- * PgStat_Msg Union over all possible messages. @@ -700,7 +710,8 @@ typedef union PgStat_Msg PgStat_MsgTempFile msg_tempfile; PgStat_MsgChecksumFailure msg_checksumfailure; PgStat_MsgReplSlot msg_replslot; - PgStat_MsgConn msg_conn; + PgStat_MsgConnect msg_connect; + PgStat_MsgDisconnect msg_disconnect; } PgStat_Msg; @@ -1010,6 +1021,7 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t extern void
Re: prevent immature WAL streaming
I thought that the way to have debug output for this new WAL code is to
use WAL_DEBUG; that way it won't bother anyone and we can remove them
later if necessary.

Also, I realized that the way to handle any error in the path that
assembles a record from contrecords is to set a flag that we can test
after the standard "err:" label; no need to create a new label. I also
wrote a lot more comments to try and explain what is going on and why.

I'm still unsure about the two-flags reporting in xlogreader, so I put
that in a separate commit. Opinions on that one?

The last commit is something I noticed in pg_rewind ...

--
Álvaro Herrera        Valdivia, Chile  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)

>From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Tue, 31 Aug 2021 20:55:10 -0400
Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready" files
 too early"

This reverts commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b.
--- src/backend/access/transam/timeline.c| 2 +- src/backend/access/transam/xlog.c| 220 ++- src/backend/access/transam/xlogarchive.c | 17 +- src/backend/postmaster/walwriter.c | 7 - src/backend/replication/walreceiver.c| 6 +- src/include/access/xlog.h| 1 - src/include/access/xlogarchive.h | 4 +- src/include/access/xlogdefs.h| 1 - 8 files changed, 24 insertions(+), 234 deletions(-) diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c index acd5c2431d..8d0903c175 100644 --- a/src/backend/access/transam/timeline.c +++ b/src/backend/access/transam/timeline.c @@ -452,7 +452,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, if (XLogArchivingActive()) { TLHistoryFileName(histfname, newTLI); - XLogArchiveNotify(histfname, true); + XLogArchiveNotify(histfname); } } diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 24165ab03e..e51a7a749d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -724,18 +724,6 @@ typedef struct XLogCtlData XLogRecPtr lastFpwDisableRecPtr; slock_t info_lck; /* locks shared variables shown above */ - - /* - * Variables used to track segment-boundary-crossing WAL records. See - * RegisterSegmentBoundary. Protected by segtrack_lck. 
- */ - XLogSegNo lastNotifiedSeg; - XLogSegNo earliestSegBoundary; - XLogRecPtr earliestSegBoundaryEndPtr; - XLogSegNo latestSegBoundary; - XLogRecPtr latestSegBoundaryEndPtr; - - slock_t segtrack_lck; /* locks shared variables shown above */ } XLogCtlData; static XLogCtlData *XLogCtl = NULL; @@ -932,7 +920,6 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo); static void UpdateLastRemovedPtr(char *filename); static void ValidateXLOGDirectoryStructure(void); -static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos); static void CleanupBackupHistory(void); static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, @@ -1167,56 +1154,23 @@ XLogInsertRecord(XLogRecData *rdata, END_CRIT_SECTION(); /* - * If we crossed page boundary, update LogwrtRqst.Write; if we crossed - * segment boundary, register that and wake up walwriter. + * Update shared LogwrtRqst.Write, if we crossed page boundary. */ if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ) { - XLogSegNo StartSeg; - XLogSegNo EndSeg; - - XLByteToSeg(StartPos, StartSeg, wal_segment_size); - XLByteToSeg(EndPos, EndSeg, wal_segment_size); - - /* - * Register our crossing the segment boundary if that occurred. - * - * Note that we did not use XLByteToPrevSeg() for determining the - * ending segment. This is so that a record that fits perfectly into - * the end of the segment causes the latter to get marked ready for - * archival immediately. - */ - if (StartSeg != EndSeg && XLogArchivingActive()) - RegisterSegmentBoundary(EndSeg, EndPos); - - /* - * Advance LogwrtRqst.Write so that it includes new block(s). - * - * We do this after registering the segment boundary so that the - * comparison with the flushed pointer below can use the latest value - * known globally. 
- */ SpinLockAcquire(>info_lck); + /* advance global request to include new block(s) */ if (XLogCtl->LogwrtRqst.Write < EndPos) XLogCtl->LogwrtRqst.Write = EndPos; /* update local result copy while I have the chance */ LogwrtResult = XLogCtl->LogwrtResult; SpinLockRelease(>info_lck); - - /* - * There's a chance that the record was already flushed to disk and we - * missed marking segments as ready for archive. If this happens, we - * nudge the WALWriter, which will take care of notifying segments as - * needed. - */ - if (StartSeg != EndSeg &&
Re: [PATCH] support tab-completion for single quote input with equal sign
Hi Tang,

On Fri, 2021-09-03 at 04:32 +, tanghy.f...@fujitsu.com wrote:
> I'd appreciate it if you can share your test results with me.

Sure! Here's my output (after a `make clean && make`):

cd . && TESTDIR='/home/pchampion/workspace/postgres/src/bin/psql' \
PATH="/home/pchampion/workspace/postgres/tmp_install/usr/local/pgsql-master/bin:$PATH" \
LD_LIBRARY_PATH="/home/pchampion/workspace/postgres/tmp_install/usr/local/pgsql-master/lib" \
PGPORT='65432' \
PG_REGRESS='/home/pchampion/workspace/postgres/src/bin/psql/../../../src/test/regress/pg_regress' \
/usr/bin/prove -I ../../../src/test/perl/ -I . t/*.pl

t/010_tab_completion.pl .. 17/?
#   Failed test 'tab-completion after single quoted text input with equal sign'
#   at t/010_tab_completion.pl line 198.
#   Actual output was "CREATE SUBSCRIPTION my_sub CONNECTION 'host=localhost port=5432 dbname=postgres' \a"
#   Did not match "(?^:PUBLICATION)"
# Looks like you failed 1 test of 23.
t/010_tab_completion.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/23 subtests
t/020_cancel.pl .. ok

Test Summary Report
-------------------
t/010_tab_completion.pl (Wstat: 256 Tests: 23 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
Files=2, Tests=25,  8 wallclock secs ( 0.02 usr  0.01 sys +  1.48 cusr  0.37 csys =  1.88 CPU)
Result: FAIL
make: *** [Makefile:87: check] Error 1

Thanks,
--Jacob
Re: [PATCH] test/ssl: rework the sslfiles Makefile target
On Fri, 2021-09-03 at 09:46 +0900, Michael Paquier wrote:
> Nice. This is neat. The split helps a lot to understand how you've
> changed things from the original implementation. As a whole, this
> looks rather committable to me.

Great!

> One small-ish comment that I have is about all the .config files we
> have at the root of src/test/ssl/, bloating the whole. I think that
> it would be a bit cleaner to put all of them in a different
> sub-directory, say just config/ or conf/.

That sounds reasonable. I won't be able to get to it before the holiday
weekend, but I can put up a patch sometime next week.

Thanks,
--Jacob
Re: A reloption for partitioned tables - parallel_workers
On Sat, Sep 4, 2021 at 3:10 Laurenz Albe wrote:
> On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:
> > > On 6 Apr 2021, at 09:46, Amit Langote wrote:
> > > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe wrote:
> > > >
> > > > I don't know if Seamus is still working on that; if not, we might
> > > > mark it as "returned with feedback".
> > >
> > > I have to agree given the time left.
> >
> > This thread has stalled and the patch no longer applies. I propose
> > that we mark this Returned with Feedback, is that Ok with you Amit?
>
> +1. That requires more thought.

Yes, I think so too.

--
Amit Langote
EDB: http://www.enterprisedb.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Hi,

On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> Yeah, we can surely lock the relation as described by Robert, but IMHO,
> while creating the database we are already holding the exclusive lock on
> the database and there is no one else allowed to be connected to the
> database, so do we actually need to bother about the lock for the
> correctness?

The problem is that checkpointer, bgwriter, buffer reclaim don't care
about the database of the buffer they're working on... The exclusive
lock on the database doesn't change anything about that. Perhaps you
can justify it's safe because there can't be any dirty buffers or such
though.

> I think we already have such a code in multiple places where we bypass
> the shared buffers for copying the relation,
> e.g. index_copy_data(), heapam_relation_copy_data().

That's not at all comparable. We hold an exclusive lock on the relation
at that point, and we don't have a separate implementation of reading
tuples from the table or something like that.

Greetings,

Andres Freund
Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally
Alvaro Herrera writes:
> On 2021-Sep-03, Tom Lane wrote:
>> TBH, I'd sooner rip out SendStop, and simplify the related postmaster
>> logic.

> I wrote a patch to do that in 2012, after this exchange:
> https://postgr.es/m/1333124720-sup-6...@alvh.no-ip.org
> I obviously doesn't apply at all anymore, but the thing that prevented
> me from sending it was I couldn't find what the mentioned feature was
> that would cause all backends to dump core at the time of a crash.

Oh, I think you misunderstood what I wrote. I was thinking of the
ancient habit of most kernels to dump cores to a file just named "core";
so that even if you went around and manually SIGABRT'd each stopped
process, the cores would all overwrite each other, leaving you with
little to show for the exercise. Nowadays you're more likely to get
"core.NNN" for each PID, so that it could in principle be useful to
force all the backends to dump core for later analysis. But I know of
no mechanism that would do that for you.

However, thinking about this afresh, it seems like that Berkeley-era
comment about "the wily post_hacker" was never very apropos. If what
you wanted was a few GB of core files for later analysis, it'd make more
sense to have the postmaster send SIGABRT or the like. That saves a
bunch of tedious manual steps, plus the cluster isn't left in a funny
state that requires yet more manual cleanup steps.

So I'm thinking that the *real* use-case for this is for developers to
attach with gdb and do on-the-fly investigation of the state of other
backends, rather than forcing core-dumps. However, it's still a pretty
half-baked feature because there's no easy way to clean up afterwards.

The other elephant in the room is that by the time the postmaster has
reacted to the initial backend crash, it's dubious whether the state of
other processes is still able to tell you much. (IME, at least, the
postmaster doesn't hear about it until the kernel has finished writing
out the dying process's core image, which takes approximately forever
compared to modern CPU speeds.)

> So it seemed to me that we would be ripping out a feature I had used,
> with no replacement.

If we had a really useful feature here I'd be all over it. But it looks
more like somebody's ten-minute hack, so the fact that it's undocumented
and obscure-to-invoke seems appropriate to me.

(BTW, I think we had exactly this discussion way back when Peter cleaned
up the postmaster/postgres command line switches. Just about all the
other old switches have equivalent GUCs, and IIRC it is not an oversight
that SendStop was left out.)

			regards, tom lane
Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally
On 2021-Sep-03, Tom Lane wrote:

> "=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?=" writes:
> > I want to share a patch with you, in which I add a guc parameter
> > 'enable_send_stop' to enable set the value of SendStop in postmaster.c
> > more conveniently. SendStop enable postmaster to send SIGSTOP rather
> > than SIGQUIT to its children when some backend dumps core, and this
> > variable is originally set with -T parameter when start postgres,
> > which is inconvenient to control in some scenarios.
>
> TBH, I'd sooner rip out SendStop, and simplify the related postmaster
> logic.

I wrote a patch to do that in 2012, after this exchange:
https://postgr.es/m/1333124720-sup-6...@alvh.no-ip.org

It obviously doesn't apply at all anymore, but the thing that prevented
me from sending it was I couldn't find what the mentioned feature was
that would cause all backends to dump core at the time of a crash. So it
seemed to me that we would be ripping out a feature I had used, with no
replacement.

(It applies cleanly on top of 36b7e3da17bc.)

--
Álvaro Herrera 39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Cuando no hay humildad las personas se degradan" (A. Christie)

>From d8198f6ad6c814c03c486bcce696fda912393405 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 23 May 2012 22:18:21 -0400
Subject: [PATCH] Remove -T postmaster option

This used to tell postmaster to send SIGSTOP instead of SIGQUIT in case
of trouble, with the intent of letting a hypothetical PG hacker collect
core dumps from every backend by attaching a debugger to them, one by
one. This has long been superseded by the ability of current operating
systems to dump core of several processes simultaneously.
--- src/backend/postmaster/postmaster.c | 52 + 1 file changed, 16 insertions(+), 36 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index eeea933b19..bf6166caf4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -187,7 +187,6 @@ static char ExtraOptions[MAXPGPATH]; * shared data structures. (Reinit is currently dead code, though.) */ static bool Reinit = true; -static int SendStop = false; /* still more option variables */ bool EnableSSL = false; @@ -544,7 +543,7 @@ PostmasterMain(int argc, char *argv[]) * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ - while ((opt = getopt(argc, argv, "A:B:bc:C:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) + while ((opt = getopt(argc, argv, "A:B:bc:C:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:st:W:-:")) != -1) { switch (opt) { @@ -654,16 +653,6 @@ PostmasterMain(int argc, char *argv[]) SetConfigOption("log_statement_stats", "true", PGC_POSTMASTER, PGC_S_ARGV); break; - case 'T': - -/* - * In the event that some backend dumps core, send SIGSTOP, - * rather than SIGQUIT, to all its peers. This lets the wily - * post_hacker collect core dumps from everyone. - */ -SendStop = true; -break; - case 't': { const char *tmp = get_stats_option_name(optarg); @@ -2704,9 +2693,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname) * to commit hara-kiri. * * SIGQUIT is the special signal that says exit without proc_exit - * and let the user know what's going on. But if SendStop is set - * (-s on command line), then we send SIGSTOP instead, so that we - * can get core dumps from all backends by hand. + * and let the user know what's going on. * * We could exclude dead_end children here, but at least in the * SIGSTOP case it seems better to include them. 
@@ -2715,9 +2702,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname) { ereport(DEBUG2, (errmsg_internal("sending %s to process %d", - (SendStop ? "SIGSTOP" : "SIGQUIT"), - (int) bp->pid))); -signal_child(bp->pid, (SendStop ? SIGSTOP : SIGQUIT)); + "SIGQUIT", (int) bp->pid))); +signal_child(bp->pid, SIGQUIT); } } } @@ -2729,9 +2715,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname) { ereport(DEBUG2, (errmsg_internal("sending %s to process %d", - (SendStop ? "SIGSTOP" : "SIGQUIT"), - (int) StartupPID))); - signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT)); + "SIGQUIT", (int) StartupPID))); + signal_child(StartupPID, SIGQUIT); } /* Take care of the bgwriter too */ @@ -2741,9 +2726,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname) { ereport(DEBUG2, (errmsg_internal("sending %s to process %d", - (SendStop ? "SIGSTOP" : "SIGQUIT"), - (int) BgWriterPID))); - signal_child(BgWriterPID, (SendStop ? SIGSTOP : SIGQUIT)); + "SIGQUIT", (int) BgWriterPID))); + signal_child(BgWriterPID, SIGQUIT); } /* Take care of the checkpointer too */ @@ -2753,9
Re: ORDER BY pushdowns seem broken in postgres_fdw
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested Applied the v6 patch to the master branch and ran the regression tests for contrib; the result was "All tests successful."
Re: pgsql: Set the volatility of the timestamptz version of date_bin() back
Alvaro Herrera writes: > On 2021-Sep-03, John Naylor wrote: >> On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera >> wrote: >>> These catversion bumps in branch 14 this late in the cycle seem suspect. >>> Didn't we have some hesitation to push multirange unnest around beta2 >>> precisely because of a desire to avoid catversion bumps? >> This was for correcting a mistake (although the first commit turned out to >> be a mistake itself), so I understood it to be necessary. > A crazy idea might have been to return to the original value. Yes, that is what should have been done, as I complained over in pgsql-committers before seeing this exchange. As things stand, a pg_upgrade is going to be forced on beta3 users without even the excuse of fixing a bug. regards, tom lane
Re: Estimating HugePages Requirements?
Hi, On 2021-09-01 15:53:52 +0900, Michael Paquier wrote: > Hmm. I am not sure about the addition of huge_pages_required, knowing > that we would have shared_memory_size. I'd rather let the calculation > part to the user with a scan of /proc/meminfo. -1. We can easily do better, what do we gain by making the user do this stuff? Especially because the right value also depends on huge_page_size. Greetings, Andres Freund
Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
> On 19 May 2021, at 09:53, Michael Paquier wrote: > > On Tue, Apr 27, 2021 at 12:58:52PM +0300, Aleksander Alekseev wrote: >> I've noticed there is no tab completion for ALTER TABLE xxx ADD. Here >> is an alternative version of the patch that fixes this as well. Not >> sure if this should be in the same commit though. > > - /* If we have ALTER TABLE DROP, provide COLUMN or CONSTRAINT */ > - else if (Matches("ALTER", "TABLE", MatchAny, "DROP")) > + /* If we have ALTER TABLE ADD|DROP, provide COLUMN or CONSTRAINT */ > + else if (Matches("ALTER", "TABLE", MatchAny, "ADD|DROP")) > Seems to me that the behavior to not complete with COLUMN or > CONSTRAINT for ADD is intentional, as it is possible to specify a > constraint or column name without the object type first. This > introduces an inconsistent behavior with what we do for columns with > ADD, for one. So a more consistent approach would be to list columns, > constraints, COLUMN and CONSTRAINT in the list of options available > after ADD. > > + else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT")) > + { > + completion_info_charp = prev3_wd; > + COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table); > + } > Specifying valid constraints is an authorized grammar, so it does not > seem that bad to keep things as they are, either. I would leave that > alone. This has stalled, being marked Waiting on Author since May, and reading the above it sounds like marking it Returned with Feedback is the logical next step (the patch also no longer applies). -- Daniel Gustafsson https://vmware.com/
Re: pgsql: Set the volatility of the timestamptz version of date_bin() back
On Fri, Sep 03, 2021 at 01:56:50PM -0400, Alvaro Herrera wrote: > On 2021-Sep-03, John Naylor wrote: > > > On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera > > wrote: > > > > > > On 2021-Sep-03, John Naylor wrote: > > > These catversion bumps in branch 14 this late in the cycle seem suspect. > > > Didn't we have some hesitation to push multirange unnest around beta2 > > > precisely because of a desire to avoid catversion bumps? > > > > This was for correcting a mistake (although the first commit turned out to > > be a mistake itself), so I understood it to be necessary. > > A crazy idea might have been to return to the original value. +1. I think the catversion usually is always increased even in a "revert", but in this exceptional case [0] it would be nice if beta4/rc1 had the same number as b3. [0] two commits close to each other, with no other catalog changes, and with the specific goal of allowing trivial upgrade from b3. -- Justin
Re: A reloption for partitioned tables - parallel_workers
On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote: > > On 6 Apr 2021, at 09:46, Amit Langote wrote: > > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe > > wrote: > > > > I don't know if Seamus is still working on that; if not, we might > > > mark it as "returned with feedback". > > > > I have to agree given the time left. > > This thread has stalled and the patch no longer applies. I propose that we > mark this Returned with Feedback, is that Ok with you Amit? +1. That requires more thought. Yours, Laurenz Albe
Re: pgsql: Set the volatility of the timestamptz version of date_bin() back
On 2021-Sep-03, John Naylor wrote: > On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera > wrote: > > > > On 2021-Sep-03, John Naylor wrote: > > These catversion bumps in branch 14 this late in the cycle seem suspect. > > Didn't we have some hesitation to push multirange unnest around beta2 > > precisely because of a desire to avoid catversion bumps? > > This was for correcting a mistake (although the first commit turned out to > be a mistake itself), so I understood it to be necessary. A crazy idea might have been to return to the original value. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: prevent immature WAL streaming
On 2021-Sep-03, Andres Freund wrote: > Hi, > > On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote: > > Oh, but of course we can't modify XLogReaderState in backbranches to add > > the new struct member abortedContrecordPtr (or whatever we end up naming > > that.) > > Why is that? Afaict it's always allocated via XLogReaderAllocate(), so adding > a new field at the end should be fine? Hmm, true, that works. > That said, I'm worried that this stuff is too complicated to get right in the > backbranches. I suspect letting it stew in master for a while before > backpatching would be a good move. Sure, we can put it in master now and backpatch before the November minors if everything goes well. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: pgsql: Set the volatility of the timestamptz version of date_bin() back
On Fri, Sep 3, 2021 at 1:46 PM Alvaro Herrera wrote: > > On 2021-Sep-03, John Naylor wrote: > These catversion bumps in branch 14 this late in the cycle seem suspect. > Didn't we have some hesitation to push multirange unnest around beta2 > precisely because of a desire to avoid catversion bumps? This was for correcting a mistake (although the first commit turned out to be a mistake itself), so I understood it to be necessary. -- John Naylor EDB: http://www.enterprisedb.com
Re: prevent immature WAL streaming
Hi, On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote: > Oh, but of course we can't modify XLogReaderState in backbranches to add > the new struct member abortedContrecordPtr (or whatever we end up naming > that.) Why is that? Afaict it's always allocated via XLogReaderAllocate(), so adding a new field at the end should be fine? That said, I'm worried that this stuff is too complicated to get right in the backbranches. I suspect letting it stew in master for a while before backpatching would be a good move. Greetings, Andres Freund
Re: Estimating HugePages Requirements?
On 9/2/21, 10:12 PM, "Kyotaro Horiguchi" wrote: > By the way I noticed that postgres -C huge_page_size shows 0, which I > think should have the number used for the calculation if we show > huge_page_required. I would agree with this if huge_page_size was a runtime-computed GUC, but since it's intended for users to explicitly request the huge page size, it might be slightly confusing. Perhaps another option would be to create a new GUC for this. Or maybe it's enough to note that the value will be changed from 0 at runtime if huge pages are supported. In any case, it might be best to handle this separately. > I noticed that postgres -C shared_memory_size showed 137 (= 144703488) > whereas the error message above showed 148897792 bytes (142MB). So it > seems that something is forgotten while calculating > shared_memory_size. As the consequence, launching postgres setting > huge_pages_required (69 pages) as vm.nr_hugepages ended up in the > "could not map anonymous shared memory" error. Hm. I'm not seeing this with the v5 patch set, so maybe I inadvertently fixed it already. Can you check this again with v5? Nathan
Re: pgsql: Set the volatility of the timestamptz version of date_bin() back
On 2021-Sep-03, John Naylor wrote: > Set the volatility of the timestamptz version of date_bin() back to immutable > > 543f36b43d was too hasty in thinking that the volatility of date_bin() > had to match date_trunc(), since only the latter references > session_timezone. > > Bump catversion These catversion bumps in branch 14 this late in the cycle seem suspect. Didn't we have some hesitation to push multirange unnest around beta2 precisely because of a desire to avoid catversion bumps? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: prevent immature WAL streaming
On 2021-Sep-03, Kyotaro Horiguchi wrote: > At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera > wrote in > The name sounds like the start LSN. Doesn't contrecordAbort(ed)Ptr work? > > > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD)) > > { > > report_invalid_record(state, > > > > "there is no contrecord flag at %X/%X", > > > > LSN_FORMAT_ARGS(RecPtr)); > > + goto aborted_contrecord; > This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and > _IS_ABORTED_PARTIAL. Is it okay? (I don't object to removing the check.) On second thought, I'm not sure that we should make xlogreader report an invalid record here. If we do, how is the user going to recover? Recovery will stop there and lose whatever was written afterwards. Maybe you could claim that if both bits are set then WAL is corrupted, so it's okay to stop recovery. But if WAL is really corrupted, then the CRC check will fail. All in all, I think I'd rather ignore the flag if we see it set. At most, we could have an #ifndef FRONTEND ereport(WARNING, "found unexpected flag xyz"); #endif or something like that. However, xlogreader does not currently have anything like that, so I'm not completely sure. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: prevent immature WAL streaming
Oh, but of course we can't modify XLogReaderState in backbranches to add the new struct member abortedContrecordPtr (or whatever we end up naming that.) I think I'm going to fix this, in backbranches only, by having xlogreader.c have a global variable, which is going to be used by ReadRecord instead of accessing the struct member. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
On 2021/09/03 14:56, kuroda.hay...@fujitsu.com wrote: Dear Fujii-san, Thank you for your great work. Attached is the latest version. Thanks a lot! I set four testcases: (1) Sets neither GUC nor server option (2) Sets server option, but not GUC (3) Sets GUC but not server option (4) Sets both GUC and server option OK. I confirmed it almost works fine, but I found that fallback_application_name will never be used in our test environment. This is because our test runner pg_regress sets PGAPPNAME to "pg_regress" and libpq prefers the environment variable to fallback_appname. (I tried to control it by \setenv, but failed...) make check uses "pg_regress", but I guess that make installcheck uses "postgres_fdw". So the patch would cause make installcheck to fail. I think that the case (1) is not so important, so it can be removed. Thoughts? Attached is the updated version of the patch. I removed the test for case (1). And I arranged the regression tests so that they are based on debug_discard_caches, to simplify them. Also I added and updated some comments and docs. Could you review this version? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 82aa14a65d..705f60a3ae 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -353,10 +353,11 @@ connect_pg_server(ForeignServer *server, UserMapping *user) /* * Construct connection params from generic options of ForeignServer * and UserMapping. (Some of them might not be libpq options, in -* which case we'll just waste a few array slots.) Add 3 extra slots -* for fallback_application_name, client_encoding, end marker. +* which case we'll just waste a few array slots.) Add 4 extra slots +* for application_name, fallback_application_name, client_encoding, +* end marker. 
*/ - n = list_length(server->options) + list_length(user->options) + 3; + n = list_length(server->options) + list_length(user->options) + 4; keywords = (const char **) palloc(n * sizeof(char *)); values = (const char **) palloc(n * sizeof(char *)); @@ -366,7 +367,23 @@ connect_pg_server(ForeignServer *server, UserMapping *user) n += ExtractConnectionOptions(user->options, keywords + n, values + n); - /* Use "postgres_fdw" as fallback_application_name. */ + /* +* Use pgfdw_application_name as application_name if set. +* +* PQconnectdbParams() processes the parameter arrays from start to +* end. If any key word is repeated, the last value is used. Therefore +* note that pgfdw_application_name must be added to the arrays after +* options of ForeignServer are, so that it can override +* application_name set in ForeignServer. +*/ + if (pgfdw_application_name && *pgfdw_application_name != '\0') + { + keywords[n] = "application_name"; + values[n] = pgfdw_application_name; + n++; + } + + /* Use "postgres_fdw" as fallback_application_name */ keywords[n] = "fallback_application_name"; values[n] = "postgres_fdw"; n++; diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index e3ee30f1aa..6a7d83c81f 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -10761,3 +10761,82 @@ ERROR: invalid value for integer option "fetch_size": 100$%$#$# CREATE FOREIGN TABLE inv_bsz (c1 int ) SERVER loopback OPTIONS (batch_size '100$%$#$#'); ERROR: invalid value for integer option "batch_size": 100$%$#$# +-- === +-- test postgres_fdw.application_name GUC +-- === +-- Turn debug_discard_caches off for this test to make that +-- the remote connection is alive when checking its application_name. +-- For each test, close all the existing cached connections manually and +-- establish connection with new setting of application_name. 
+SET debug_discard_caches = 0; +-- If appname is set as GUC but not as options of server object, +-- the GUC setting is used as application_name of remote connection. +SET postgres_fdw.application_name TO 'fdw_guc_appname'; +SELECT 1 FROM postgres_fdw_disconnect_all(); + ?column? +-- +1 +(1 row) +
Re: psql: \dl+ to list large objects privileges
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested Hi, There is something I forgot to mention in my previous review. Typically when describing a thing in psql, it is the column "Description" that belongs in the verbose version. For example listing indexes produces: List of relations Schema | Name | Type | Owner | Table and the verbose version: List of relations Schema | Name | Type | Owner | Table | Persistence | Access method | Size | Description Since '\dl' already contains description, it might be worthwhile to consider adding the column `Access privileges` without introducing a verbose version. Thoughts? Cheers, //Georgios
Re: Improve logging when using Huge Pages
On 2021/09/03 23:27, Tom Lane wrote: Fujii Masao writes: IMO, if the level is promoted to LOG, the message should be updated so that it follows the error message style guide. But I agree that simpler message would be better in this case. So what about something like the following? LOG: could not map anonymous shared memory (%zu bytes) with huge pages enabled HINT: The server will map anonymous shared memory again with huge pages disabled. That is not a hint. Maybe it qualifies as errdetail, though. Yes, agreed. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: psql: \dl+ to list large objects privileges
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested Hi, thank you for the patch, I personally think it is a useful addition and thus it gets my vote. However, I also think that the proposed code will need some changes. On a high level I will recommend the addition of tests. There are similar tests present in: ./src/test/regress/sql/psql.sql I will also inquire as to the need for renaming the function `do_lo_list` to `listLargeObjects` and its move to describe.c from large_obj.c. In itself it is not necessarily a blocking point, though it will require some strong arguments for doing so. Applying the patch generates several whitespace warnings. It will be helpful if those warnings are removed. The patch contains: case 'l': - success = do_lo_list(); + success = listLargeObjects(show_verbose); It might be of some interest to consider in the above to check the value of the next character in the command, or emit an error if it is not valid. Such a pattern can be found in the same switch block, for example: switch (cmd[2]) { case '\0': case '+': success = ... break; default: status = PSQL_CMD_UNKNOWN; break; } The patch contains: else if (strcmp(cmd + 3, "list") == 0) - success = do_lo_list(); + success = listLargeObjects(false); + + else if (strcmp(cmd + 3, "list+") == 0) + success = listLargeObjects(true); In a fashion similar to `exec_command_list`, it might be interesting to consider expressing the above as: show_verbose = strchr(cmd, '+') ? true : false; else if (strcmp(cmd + 3, "list") == 0) success = do_lo_list(show_verbose); Thoughts? Cheers, //Georgios
Re: A reloption for partitioned tables - parallel_workers
> On 6 Apr 2021, at 09:46, Amit Langote wrote: > On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe wrote: >> I don't know if Seamus is still working on that; if not, we might >> mark it as "returned with feedback". > > I have to agree given the time left. This thread has stalled and the patch no longer applies. I propose that we mark this Returned with Feedback, is that Ok with you Amit? -- Daniel Gustafsson https://vmware.com/
Re: pg_receivewal starting position
On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau wrote: > There is still the concern raised by Bharath about being able to select the > way to fetch the replication slot information for the user, and what should or > should not be included in the documentation. I am in favor of documenting the > process of selecting the wal start, and maybe include a --start-lsn option for > the user to override it, but that's maybe for another patch. Let's hear from others. Thanks for the patches. I have some quick comments on the v5 patch-set: 0001: 1) Do you also want to MemSet values in ReadReplicationSlot? 2) When an if clause has a single statement we don't generally use "{" and "}" + if (slot == NULL || !slot->in_use) + { + LWLockRelease(ReplicationSlotControlLock); + } you can just have: + if (slot == NULL || !slot->in_use) + LWLockRelease(ReplicationSlotControlLock); 3) This is still not copying the slot contents; as I said earlier, you can just take the required info into some local variables instead of copying the slot pointer's contents into slot_contents. + /* Copy slot contents while holding spinlock */ + SpinLockAcquire(&slot->mutex); + slot_contents = *slot; + SpinLockRelease(&slot->mutex); + LWLockRelease(ReplicationSlotControlLock); 4) As I said earlier, why can't we have separate tli variables for more readability instead of just one slots_position_timeline and timeline_history variable? And you are not even resetting those 2 variables after if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), so you might end up sending the restart_lsn timelineid for confirmed_flush. So, better use two separate variables. In fact you can use block local variables: + if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) + { + List *tl_history = NIL; + TimeLineID tli; + tl_history = readTimeLineHistory(ThisTimeLineID); + tli = tliOfPointInHistory(slot_contents.data.restart_lsn, + tl_history); + values[i] = Int32GetDatum(tli); + nulls[i] = false; + } similarly for confirmed_flush. 
5) I still don't see the need for the test case below: + "READ_REPLICATION_SLOT does not produce an error with non existent slot"); when we have + "READ_REPLICATION_SLOT does not produce an error with dropped slot"); Because for a user, a dropped or non-existent slot is the same; it's just that for a dropped slot we internally don't delete its entry from the shared memory. 0002: 1) Imagine GetSlotInformation always returns READ_REPLICATION_SLOT_ERROR; don't you think StreamLog enters an infinite loop there? Instead, why don't you just exit(1); instead of return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I think you can just do exit(1), no need to retry. + case READ_REPLICATION_SLOT_ERROR: + + /* + * Error has been logged by GetSlotInformation, return and + * maybe retry + */ + return; 2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't passed? And why are you having this check after you connect to the server and fetch the data? + /* If no slotinformation has been passed, we can return immediately */ + if (slot_info == NULL) + { + PQclear(res); + return READ_REPLICATION_SLOT_OK; + } Instead you can just have a single assert: + Assert(slot_name && slot_info); 3) How about pg_log_error("could not read replication slot: instead of pg_log_error("could not fetch replication slot: 4) Why are you having the READ_REPLICATION_SLOT_OK case in default? + default: + if (slot_info.is_logical) + { + /* + * If the slot is not physical we can't expect to + * recover from that + */ + pg_log_error("cannot use the slot provided, physical slot expected."); + exit(1); + } + stream.startpos = slot_info.restart_lsn; + stream.timeline = slot_info.restart_tli; + } You can just have another case statement for READ_REPLICATION_SLOT_OK, and in the default you can throw an error "unknown read replication slot status" (or some other better wording) and exit(1); 5) Do you want to initialize slot_info to 0? 
+ if (replication_slot) + { + SlotInformation slot_info; + MemSet(&slot_info, 0, sizeof(SlotInformation)); 6) This isn't readable: + slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0; How about: if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0) slot_info->is_logical = true; You don't need to set it to false, because you would have MemSet(&slot_info) in the caller. 7) How about uint32 hi; uint32 lo; instead of + uint32 hi, + lo; 8) Move SlotInformation * slot_info) to the next line as it crosses the 80 char limit. +GetSlotInformation(PGconn *conn, const char *slot_name, SlotInformation * slot_info) 9) Instead of a boolean is_logical, I would rather suggest using an enum or #define macros for the slot types, because someday we might introduce new types of slots, and somebody may want an is_physical kind of variable name. +typedef struct SlotInformation { + bool is_logical; Regards, Bharath Rupireddy.
Re: Trap errors from streaming child in pg_basebackup to exit early
On Fri, Sep 3, 2021 at 3:23 PM Daniel Gustafsson wrote: > > 4) Instead of just exiting from the main pg_basebackup process when > > the child WAL receiver dies, can't we think of restarting the child > > process, probably with the WAL streaming position where it left off or > > stream from the beginning? This way, the work that the main > > pg_basebackup has done so far doesn't get wasted. I'm not sure if this > > affects the pg_basebackup functionality. We can restart the child > > process for 1 or 2 times, if it still dies, we can kill the main > > pg_basebackup process too. Thoughts? > > I was toying with the idea, but I ended up not pursuing it. This error is well > into the “really shouldn’t happen, but can” territory and it’s quite likely > that some level of manual intervention is required to make it successfully > restart. I’m not convinced that adding complicated logic to restart (and even > more complicated tests to simulate and test it) will be worthwhile. I withdraw my suggestion because I now feel that it's better not to make it complex and to let the user decide what to do if the child process exits abnormally. I think we might still miss abnormal child thread exits on Windows because we set bgchild_exited = true only if ReceiveXlogStream or walmethod->finish() returns false. I'm not sure the parent thread on Windows can detect a child's abnormal exit. Since there is no signal mechanism on Windows, detecting child exit via failures of those two important functions, as the patch does, is the better option. Overall, the v3 patch looks good to me. Regards, Bharath Rupireddy.
Re: [PATCH] Make pkg-config files cross-compile friendly
On 05.03.20 22:38, Sebastian Kemper wrote: This commit addresses this by doing the following two things: 1. Instead of hard-coding the paths in "Cflags" and "Libs", "${includedir}" and "${libdir}" are used. Note: these variables can be overridden on the pkg-config command line ("--define-variable=libdir=/some/path"). 2. Add the variables "prefix" and "exec_prefix". If "includedir" and/or "libdir" are using these then construct them accordingly. This is done because buildroots (for instance OpenWrt) tend to rename the real pkg-config and call it indirectly from a script that sets "prefix", "exec_prefix" and "bindir", like so: Committed. I simplified your code a little bit, and I also made it so that exec_prefix is set to ${prefix} by default. That way it matches what most other .pc files I have found do.
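The resulting layout can be sketched as a hypothetical .pc fragment (the prefix path and version are illustrative, not what the build actually emits):

```
# Hypothetical libpq.pc fragment illustrating the change: paths are
# expressed via variables so a buildroot can override them, e.g.
#   pkg-config --define-variable=libdir=/staging/usr/lib --libs libpq
prefix=/usr/local/pgsql
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: libpq
Description: PostgreSQL libpq library
Version: 14.0
Libs: -L${libdir} -lpq
Cflags: -I${includedir}
```

Because libdir and includedir are defined in terms of ${prefix} and ${exec_prefix}, redefining either root variable on the pkg-config command line rewrites every derived path consistently.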
Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally
"=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?=" writes: > I want to share a patch with you, in which I add a guc parameter > 'enable_send_stop' to enable set the value of SendStop in postmaster.c more > conveniently. SendStop enable postmaster to send SIGSTOP rather than SIGQUIT > to its children when some backend dumps core, and this variable is originally > set with -T parameter when start postgres, which is inconvenient to control > in some scenarios. TBH, I'd sooner rip out SendStop, and simplify the related postmaster logic. I've never used it in twenty-some years of Postgres hacking, and I doubt anyone else has used it much either. It's not worth the overhead of a GUC. (The argument that you need it in situations where you can't control the postmaster's command line seems pretty thin, too. I'm much more worried about somebody turning it on by accident and then complaining that the cluster freezes upon crash.) regards, tom lane
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Sep 3, 2021 at 6:23 AM Dilip Kumar wrote: >> + /* Built-in oids are mapped directly */ >> + if (classForm->oid < FirstGenbkiObjectId) >> + relfilenode = classForm->oid; >> + else if (OidIsValid(classForm->relfilenode)) >> + relfilenode = classForm->relfilenode; >> + else >> + continue; >> >> Am I missing something, or is this totally busted? > > Oops, I think the condition should be like below, but I will think carefully > before posting the next version if there is something else I am missing. > > if (OidIsValid(classForm->relfilenode)) >relfilenode = classForm->relfilenode; > else if (classForm->oid < FirstGenbkiObjectId) >relfilenode = classForm->oid; > else > continue What about mapped rels that have been rewritten at some point? > Agreed to all, but in general, I think WAL hitting the disk before data is more applicable for the shared buffers, where we want to flush the WAL first before writing the shared buffer so that in case of a torn page we have an option to recover the page from the previous FPI. But in such cases where we are creating a directory or file there is no such requirement. Anyway, I agreed with the comments that it should be more uniform and the comment should be correct. There have been previous debates about whether WAL records for filesystem operations should be issued before or after those operations are performed. I'm not sure how easy those discussions are to find in the archives, but it's very relevant here. I think the short version is - if we write a WAL record first and then the operation fails afterward, we have to PANIC. But if we perform the operation first and then write the WAL record if it succeeds, then we could crash before writing WAL and end up out of sync with our standbys. If we then later do any WAL-logged operation locally that depends on that operation having been performed, replay will fail on the standby. 
There used to be, or maybe still are, comments in the code defending the latter approach, but more recently it's been strongly criticized. The thinking, AIUI, is basically that filesystem operations really ought not to fail, because nobody should be doing weird things to the data directory, and if they do, panicking is OK. But having replay fail in strange ways on the standby later is not OK. I'm not sure if everyone agrees with that logic; it seems somewhat debatable. I *think* I personally agree with it but ... I'm not even 100% sure about that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Improve logging when using Huge Pages
Fujii Masao writes: > IMO, if the level is promoted to LOG, the message should be updated > so that it follows the error message style guide. But I agree that simpler > message would be better in this case. So what about something like > the following? > LOG: could not map anonymous shared memory (%zu bytes) with huge pages enabled > HINT: The server will map anonymous shared memory again with huge pages > disabled. That is not a hint. Maybe it qualifies as errdetail, though. regards, tom lane
Re: using an end-of-recovery record in all cases
On Fri, Sep 3, 2021 at 12:52 AM Kyotaro Horiguchi wrote:
> I tried to reproduce this but just replacing the end-of-recovery
> checkpoint request with issuing an end-of-recovery record didn't cause
> make check-world to fail for me. Do you have an idea of any other
> requirement to cause that?

Did you use the patch I posted, or a different one?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
On Thu, Sep 2, 2021 at 10:56 PM Noah Misch wrote: > > Is there anything still standing in the way of committing this? > > I pushed it as commit 97ddda8. Oh, thanks. Sorry, I had missed that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: prevent immature WAL streaming
On 2021-Sep-03, Kyotaro Horiguchi wrote:
> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera wrote in
> 0002:
> > + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at end of
> > + * WAL */
>
> The name sounds like the start LSN. Doesn't contrecordAbort(ed)Ptr work?

I went over various iterations of the name of this, and I'm still not
entirely happy. I think we need to convey the ideas that

* This is the endptr+1 of the known-good part of the record, that is,
  the beginning of the next part of the record. I think "endPtr"
  summarizes this well; we use this name elsewhere.
* At some point before recovery, this was the last WAL record that existed
* There is an invalid contrecord, or we were looking for a contrecord
  and found invalid data
* This record is incomplete

So maybe
1. incompleteRecEndPtr
2. finalInvalidRecEndPtr
3. brokenContrecordEndPtr

> > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> > {
> > report_invalid_record(state,
> > "there is no contrecord flag at %X/%X",
> > LSN_FORMAT_ARGS(RecPtr));
> > - goto err;
> > + goto aborted_contrecord;
>
> This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
> _IS_ABORTED_PARTIAL. Is it okay? (I don't object to removing the check.)

Yeah, I was unsure about that. I think it's good to have it as a
cross-check, though it should never occur. I'll put it back.

Another related point is whether it's a good idea to have the ereport()
about the bit appearing in a not-start-of-page address be a PANIC. If we
degrade it to WARNING then it'll be lost in the noise, but I'm not sure
what else we can do. (If it's a PANIC, then you end up with an unusable
database.)

> I didn't think of this as an aborted contrecord, but on second
> thought, when we see a record broken in any style, we stop recovery at
> that point. I agree with the change and all the similar changes.

> > + /* XXX should we goto aborted_contrecord here? */
>
> I think it should be aborted_contrecord.
> > When that happens, the loaded bytes actually looked like the first > fragment of a continuation record to xlogreader, even if the cause > were a broken total_len. So if we abort the record there, the next > time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL at the same > page, and correctly finds a new record there. > > On the other hand if we just errored-out there, we will step-back to > the beginning of the broken record in the previous page or segment > then start writing a new record there but that is exactly what we want > to avoid now. Hmm, yeah, makes sense. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: Multi-Column List Partitioning
On Wed, Sep 1, 2021 at 2:31 PM Amit Langote wrote: > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav > wrote: > > The attached patch also fixes the above comments. > > I noticed that multi-column list partitions containing NULLs don't > work correctly with partition pruning yet. > > create table p0 (a int, b text, c bool) partition by list (a, b, c); > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, > false)); > create table p02 partition of p0 for values in ((1, NULL, false)); > explain select * from p0 where a is null; >QUERY PLAN > > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37) >Filter: (a IS NULL) > (2 rows) > > I guess that may be due to the following newly added code being incomplete: > > +/* > + * get_partition_bound_null_index > + * > + * Returns the partition index of the partition bound which accepts NULL. > + */ > +int > +get_partition_bound_null_index(PartitionBoundInfo boundinfo) > +{ > + int i = 0; > + int j = 0; > + > + if (!boundinfo->isnulls) > + return -1; > > - if (!val->constisnull) > - count++; > + for (i = 0; i < boundinfo->ndatums; i++) > + { > + //TODO: Handle for multi-column cases > + for (j = 0; j < 1; j++) > + { > + if (boundinfo->isnulls[i][j]) > + return boundinfo->indexes[i]; > } > } > > + return -1; > +} > > Maybe this function needs to return a "bitmapset" of indexes, because > multiple partitions can now contain NULL values. > > Some other issues I noticed and suggestions for improvement: > > +/* > + * checkForDuplicates > + * > + * Returns TRUE if the list bound element is already present in the list of > + * list bounds, FALSE otherwise. > + */ > +static bool > +checkForDuplicates(List *source, List *searchElem) > > This function name may be too generic. 
Given that it is specific to > implementing list bound de-duplication, maybe the following signature > is more appropriate: > > static bool > checkListBoundDuplicated(List *list_bounds, List *new_bound) > > Also, better if the function comment mentions those parameter names, like: > > "Returns TRUE if the list bound element 'new_bound' is already present > in the target list 'list_bounds', FALSE otherwise." > > +/* > + * transformPartitionListBounds > + * > + * Converts the expressions of list partition bounds from the raw grammar > + * representation. > > A sentence about the result format would be helpful, like: > > The result is a List of Lists of Const nodes to account for the > partition key possibly containing more than one column. > > + int i = 0; > + int j = 0; > > Better to initialize such loop counters closer to the loop. > > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char)); > + colname[i] = get_attname(RelationGetRelid(parent), > +key->partattrs[i], false); > > The palloc in the 1st statement is wasteful, because the 2nd statement > overwrites its pointer by the pointer to the string palloc'd by > get_attname(). > > + ListCell *cell2 = NULL; > > No need to explicitly initialize the loop variable. 
> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))
>
> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
>     result = lappend(result, values);
>
> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?
>
> Also, please keep part of the existing comment that says that the
> bound belongs to the index'th partition.
>
> Will send more comments in a bit...

+ * partition_bound_accepts_nulls
+ *
+ * Returns TRUE if partition bound has NULL value, FALSE otherwise.
+ */

I suggest slight rewording, as follows:

"Returns TRUE if any of the partition bounds contains a NULL value,
FALSE otherwise."

- PartitionListValue *all_values;
+ PartitionListValue **all_values;
...
Re: psql: \dl+ to list large objects privileges
Hello,

Thank you very much for the review. I will prepare a new version of the
patch according to your comments. For now, I will answer this question:

I will also inquire as to the need for renaming the function
`do_lo_list` to `listLargeObjects` and its move to describe.c from
large_obj.c. In itself it is not necessarily a blocking point, though it
will require some strong arguments for doing so.

I understand that I needed a good reason for such actions. On the one
hand, all the commands for working with large objects are in
large_obj.c. On the other hand, all commands for displaying the contents
of system catalogs are in describe.c. The function do_lo_list belongs to
both groups.

The main reason for moving the function to describe.c is that I wanted
to use the printACLColumn function to display the lomacl column. The
printACLColumn function is used in all the other commands that display
privileges, and it is defined locally in describe.c; there is no reason
to make it public. Another option is to duplicate the printACLColumn
function (or its contents) in large_obj.c. This seemed wrong to me. Is
there another way?

Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company
Re: Improve logging when using Huge Pages
On 2021/09/03 16:49, Kyotaro Horiguchi wrote:

If you are thinking to show that in a GUC, you might want to look into
the nearby thread [1]

Yes, let's discuss this feature at that thread.

I have some comments about the patch.

- if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
- elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
-allocsize);
+ if (ptr != MAP_FAILED)
+ using_huge_pages = true;
+ else if (huge_pages == HUGE_PAGES_TRY)
+ ereport(LOG,
+ (errmsg("could not map anonymous shared memory: %m"),
+(mmap_errno == ENOMEM) ?
+errhint("This error usually means that PostgreSQL's request "

If we set huge_pages to try and postgres fell back to regular pages, it
emits a large message relative to its importance. The user specified
that "I'd like to use huge pages, but it's OK if not available.", so I
think the message should be far smaller. Maybe just raising the DEBUG1
message to LOG along with moving to ereport might be sufficient.

IMO, if the level is promoted to LOG, the message should be updated so
that it follows the error message style guide. But I agree that a
simpler message would be better in this case. So what about something
like the following?

LOG: could not map anonymous shared memory (%zu bytes) with huge pages enabled
HINT: The server will map anonymous shared memory again with huge pages disabled.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: psql: \dl+ to list large objects privileges
Hello,

Thank you very much for the review.

> Since '\dl' already contains description, it might be worthwhile to
> consider adding the column `Access privileges` without introducing a
> verbose version.

I thought about it. The reason I decided to add the verbose version is
backward compatibility. Perhaps the appearance of a new column in an
existing command may be undesirable to someone. If we don't worry about
backward compatibility, the patch will be easier. But I'm not sure this
is the right approach.

Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company
Re: .ready and .done files considered harmful
Hi,

Thanks for the feedback.

> Which approach do you think we should use? I think we have decent
> patches for both approaches at this point, so perhaps we should see if
> we can get some additional feedback from the community on which one we
> should pursue further.

In my opinion both approaches have benefits over the current
implementation. I think in the keep-trying-the-next-file approach we
have handled all the rare and specific scenarios which require us to
force a directory scan to archive the desired files. In addition, with
the recent change to force a directory scan at checkpoint, we can avoid
an infinite wait for a file which is still being missed despite handling
the special scenarios. It is also more efficient in extreme scenarios,
as discussed in this thread. However, the multiple-files-per-readdir
approach is cleaner, with the resilience of the current implementation.
I agree that we should decide which approach to pursue further based on
additional feedback from the community.

> The problem I see with this is that pgarch_archiveXlog() might end up
> failing. If it does, we won't retry archiving the file until we do a
> directory scan. I think we could try to avoid forcing a directory
> scan outside of these failure cases and archiver startup, but I'm not
> sure it's really worth it. When pgarch_readyXlog() returns false, it
> most likely means that there are no .ready files present, so I'm not
> sure we are gaining a whole lot by avoiding a directory scan in that
> case. I guess it might help a bit if there are a ton of .done files,
> though.

Yes, I think it will be useful when we have a bunch of .done files and
the frequency of .ready files is such that the archiver goes to the wait
state before the next WAL file is ready for archival.

> I agree, but it should probably be something like DEBUG3 instead of
> LOG.

I will update it in the next patch.

Thanks,
Dipesh
Re: Avoid stuck of pbgench due to skipped transactions
On 2021/06/17 1:23, Yugo NAGATA wrote:

I attached the v2 patch to clarify that I withdrew the v3 patch.

Thanks for the patch!

+* For very unrealistic rates under -T, some skipped
+* transactions are not counted because the catchup
+* loop is not fast enough just to do the scheduling
+* and counting at the expected speed.
+*
+* We do not bother with such a degenerate case.
+*/

ISTM that the patch changes pgbench so that it can skip counting some
skipped transactions here even for realistic rates under -T. Of course,
that would happen very rarely. Is this understanding right?

On the other hand, even without the patch, in the first place, there
seems to be no guarantee that all the skipped transactions are counted
under -T. When the timer is exceeded in CSTATE_END_TX, a client ends
without checking outstanding skipped transactions. Therefore the "issue"
that some skipped transactions are not counted is not one the patch
newly introduces. So that behavior change by the patch would be
acceptable. Is this understanding right?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: Added schema level support for publication.
On Thu, Sep 2, 2021 at 5:12 PM Amit Kapila wrote:
>
> On Thu, Sep 2, 2021 at 11:58 AM vignesh C wrote:
> >
> > Below are few comments on v23. If you have already addressed anything
> > in v24, then ignore it.
>

More review comments (on latest version v24):
===
1.
+ Oid psnspcid BKI_LOOKUP(pg_class); /* Oid of the schema */
+} FormData_pg_publication_schema;

Why in the above structure have you used pg_class instead of pg_namespace?

2. GetSchemaPublicationRelations() uses two different ways to skip
non-publishable relations in two while loops. Both are correct, but I
think it would be easier to follow if they used the same way, and in
this case I would prefer a check like if (is_publishable_class(relid,
relForm)). The comment atop the function could also be modified to:
"Get the list of publishable relation oids for a specified schema." I
think it is better to write some comments on why you need to scan and
loop twice.

3. The other point about GetSchemaPublicationRelations() is that I am
afraid it will collect duplicate relation oids in the final list when
the partitioned table and child tables are in the same schema.

4. In GetRelationPublicationActions(), the handling related to
partitions seems to be missing for the schema. It won't be able to take
care of child tables when they are in a different schema than the parent
table.

5. If I modify the search path to remove the public schema then I get
the below error message:

postgres=# Create publication mypub for all tables in schema current_schema;
ERROR: no schema has been selected

I think this message is not very clear. How about changing it to
something like "current_schema doesn't contain any valid schema"? This
message is used in more than one place, so let's keep it the same in all
the places if you agree to change it.

6. How about naming ConvertPubObjSpecListToOidList() as
ObjectsInPublicationToOids()? I see somewhat similarly named functions
in the code, like objectsInSchemaToOids, objectNamesToOids.

7.
+ /* + * Schema lock is held until the publication is created to prevent + * concurrent schema deletion. No need to unlock the schemas, the + * locks will be released automatically at the end of create + * publication command. + */ In this comment no need to say create publication command as that is implicit, we can just write " at the end of command". 8. Can you please extract changes like tab-completion, dump/restore in separate patches? This can help to review the core (backend) part of the patch in more detail. -- With Regards, Amit Kapila.
Re: Added schema level support for publication.
On Wed, Sep 1, 2021 at 12:05 PM Amit Kapila wrote: > > On Wed, Sep 1, 2021 at 8:52 AM Greg Nancarrow wrote: > > > > > I'd expect a lot of users to naturally think that "ALTER PUBLICATION > > pub1 DROP ALL TABLES IN SCHEMA sc1;" would drop from the publication > > all tables that are in schema "sc1", which is not what it is currently > > doing. > > Since the syntax was changed to specifically refer to FOR ALL TABLES > > IN SCHEMA rather than FOR SCHEMA, then now it's clear we're referring > > to tables only, when specifying "... FOR ALL TABLES in sc1, TABLE > > sc1.test", so IMHO it's reasonable to remove duplicates here, rather > > than treating these as somehow separate ways of referencing the same > > table. > > > > I see your point and if we decide to go this path then it is better to > give an error than silently removing duplicates. > Today, I have thought about this point again and it seems better to give an error in this case and let the user take the action rather than silently removing such tables to avoid any confusion. -- With Regards, Amit Kapila.
Re: using an end-of-recovery record in all cases
On Fri, Sep 3, 2021 at 10:23 AM Kyotaro Horiguchi wrote:
>
> At Thu, 2 Sep 2021 11:30:59 -0400, Robert Haas wrote in
> > On Mon, Aug 9, 2021 at 3:00 PM Robert Haas wrote:
> > > I decided to try writing a patch to use an end-of-recovery record
> > > rather than a checkpoint record in all cases.
> > >
> > > The first problem I hit was that GetRunningTransactionData() does
> > > Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
> > >
> > > Unfortunately we can't just relax the assertion, because the
> > > XLOG_RUNNING_XACTS record will eventually be handed to
> > > ProcArrayApplyRecoveryInfo() for processing ... and that function
> > > contains a matching assertion which would in turn fail. It in turn
> > > passes the value to MaintainLatestCompletedXidRecovery() which
> > > contains yet another matching assertion, so the restriction to normal
> > > XIDs here looks pretty deliberate. There are no comments, though, so
> > > the reader is left to guess why. I see one problem:
> > > MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
> > > expects a normal XID. Perhaps it's best to just dodge the entire issue
> > > by skipping LogStandbySnapshot() if latestCompletedXid happens to be
> > > 2, but that feels like a hack, because AFAICS the real problem is that
> > > StartupXLog() doesn't agree with the rest of the code on whether 2 is
> > > a legal case, and maybe we ought to be storing a value that doesn't
> > > need to be computed via TransactionIdRetreat().
> >
> > Anyone have any thoughts about this?
>
> I tried to reproduce this but just replacing the end-of-recovery
> checkpoint request with issuing an end-of-recovery record didn't cause
> make check-world to fail for me. Do you have an idea of any other
> requirement to cause that?
>

You might need the following change at the end of StartupXLOG():

- if (promoted)
- RequestCheckpoint(CHECKPOINT_FORCE);
+ RequestCheckpoint(CHECKPOINT_FORCE);

Regards,
Amul
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Sep 2, 2021 at 8:52 PM Robert Haas wrote:
> On Thu, Sep 2, 2021 at 2:06 AM Dilip Kumar wrote:
> > 0003- The main patch for WAL logging the created database operation.
>
> Andres pointed out that this approach ends up accessing relations
> without taking a lock on them. It doesn't look like you did anything
> about that.

I missed that; I have shared my opinion about this in my last email [1]

> + /* Built-in oids are mapped directly */
> + if (classForm->oid < FirstGenbkiObjectId)
> + relfilenode = classForm->oid;
> + else if (OidIsValid(classForm->relfilenode))
> + relfilenode = classForm->relfilenode;
> + else
> + continue;
>
> Am I missing something, or is this totally busted?

Oops, I think the condition should be like below, but I will think
carefully before posting the next version in case there is something
else I am missing.

if (OidIsValid(classForm->relfilenode))
    relfilenode = classForm->relfilenode;
else if (classForm->oid < FirstGenbkiObjectId)
    relfilenode = classForm->oid;
else
    continue;

> /*
> + * Now drop all buffers holding data of the target database; they should
> + * no longer be dirty so DropDatabaseBuffers is safe.
>
> The way things worked before, this was true, but now AFAICS it's
> false. I'm not sure whether that means that DropDatabaseBuffers() here
> is actually unsafe or whether it just means that you haven't updated
> the comment to explain the reason.

I think DropDatabaseBuffers() itself is unsafe: we just copied pages
using the bufmgr and dirtied the buffers, so dropping the buffers is
definitely unsafe here.

> + * Since we copy the file directly without looking at the shared buffers,
> + * we'd better first flush out any pages of the source relation that are
> + * in shared buffers. We assume no new changes will be made while we are
> + * holding exclusive lock on the rel.
>
> Ditto.

Yeah, this comment no longer makes sense now.

> + /* As always, WAL must hit the disk before the data update does.
> */
>
> Actually, the way it's coded now, part of the on-disk changes are done
> before WAL is issued, and part are done after. I doubt that's the
> right idea. There's nothing special about writing the actual payload
> bytes vs. the other on-disk changes (creating directories and files).
> In any case the ordering deserves a better-considered comment than
> this one.

Agreed to all, but in general, I think WAL hitting the disk before data
is more applicable to shared buffers, where we want to flush the WAL
first before writing the shared buffer so that in case of a torn page we
have an option to recover the page from the previous FPI. But in cases
where we are creating a directory or file there is no such requirement.
Anyway, I agree with the comments that it should be more uniform and the
comment should be correct.

> + XLogRegisterData((char *) PG_MAJORVERSION, nbytes);
>
> Surely this is utterly pointless.

Yeah, it is. During WAL replay we also know PG_MAJORVERSION :)

> + CopyDatabase(src_dboid, dboid, src_deftablespace, dst_deftablespace);
> PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
> PointerGetDatum());
>
> I'd leave braces around the code for which we're ensuring error
> cleanup even if it's just one line.

Okay.

> + if (info == XLOG_DBASEDIR_CREATE)
> {
> xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *)
> XLogRecGetData(record);
>
> Seems odd to rename the record but not change the name of the struct.
> I think I would be inclined to keep the existing record name, but if
> we're going to change it we should be more thorough.

Right, I think we can leave the record name as it is.

[1] https://www.postgresql.org/message-id/CAFiTN-sP_6hWv5AxcwnWCgg%3D4hyEeeZcCgFucZsYWpr3XQbP1g%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: pg_receivewal starting position
On Thursday, September 2, 2021 at 10:37:20 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> > I could see the use for sending active_pid for use within pg_basebackup:
> > at least we could fail early if the slot is already in use. But at the
> > same time, maybe it won't be in use anymore once we need it.
>
> There is no real concurrent protection with this design. You could
> read that the slot is not active during READ_REPLICATION_SLOT just to
> find out after in the process of pg_basebackup streaming WAL that it
> became in use in-between. And the backend-side protections would kick
> in at this stage.
>
> Hmm. The logic doing the decision-making with pg_receivewal may
> become more tricky when it comes to pg_replication_slots.wal_status,
> max_slot_wal_keep_size and pg_replication_slots.safe_wal_size. The
> number of cases we'd like to consider directly impacts the amount of
> data sent through READ_REPLICATION_SLOT. That's not really different
> from deciding on a failure, a success or a retry with active_pid at an
> earlier or a later stage of a base backup. pg_receivewal, on the
> contrary, can just rely on what the backend tells it when it begins
> streaming. So I'd prefer keeping things simple and limiting the number
> of fields to a minimum for this command.

OK. Please find attached new patches rebased on master.

0001 is yours without any modification.

0002, for pg_receivewal, tries to simplify the logic of what information
to return by using a dedicated struct for this. This accounts for
Bharath's request to return every possible field. In particular, an
is_logical field simplifies the detection of the type of slot. In my
opinion it makes sense to simplify it like this on the client side while
being more open-minded on the server side if we ever need to provide a
new type of slot.
Also, GetSlotInformation now returns an enum to be able to handle the different modes of failures, which differ between pg_receivewal and pg_basebackup. 0003 is the pg_basebackup one, not much changed except for the concerns you had about the log message and handling of different failure modes. There is still the concern raised by Bharath about being able to select the way to fetch the replication slot information for the user, and what should or should not be included in the documentation. I am in favor of documenting the process of selecting the wal start, and maybe include a --start-lsn option for the user to override it, but that's maybe for another patch. -- Ronan Dunklau>From 60b8cedb196f5acdd75b489c1d2155c2698084a4 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 2 Sep 2021 16:25:25 +0900 Subject: [PATCH v5 1/3] Add READ_REPLICATION_SLOT command --- doc/src/sgml/protocol.sgml | 66 src/backend/replication/repl_gram.y | 16 ++- src/backend/replication/repl_scanner.l | 1 + src/backend/replication/walsender.c | 112 src/include/nodes/nodes.h | 1 + src/include/nodes/replnodes.h | 10 ++ src/test/recovery/t/001_stream_rep.pl | 47 +++- src/test/recovery/t/006_logical_decoding.pl | 15 ++- src/tools/pgindent/typedefs.list| 1 + 9 files changed, 266 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index a232546b1d..8191f17137 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2052,6 +2052,72 @@ The commands accepted in replication mode are: + +READ_REPLICATION_SLOT slot_name + READ_REPLICATION_SLOT + + + + Read the information of a replication slot. Returns a tuple with + NULL values if the replication slot does not + exist. + + + In response to this command, the server will return a one-row result set, + containing the following fields: + + +slot_type (text) + + + The replication slot's type, either physical or + logical. 
+ + + + + +restart_lsn (text) + + + The replication slot's restart_lsn. + + + + + +confirmed_flush_lsn (text) + + + The replication slot's confirmed_flush_lsn. + + + + + +restart_tli (int4) + + + The timeline ID for the restart_lsn position, + when following the current timeline history. + + + + + +confirmed_flush_tli (int4) + + + The timeline ID for the confirmed_flush_lsn + position, when following the current timeline history. + + + + + + + + START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ] START_REPLICATION diff --git
Re: Trap errors from streaming child in pg_basebackup to exit early
> On 1 Sep 2021, at 12:28, Bharath Rupireddy wrote:
>
> On Wed, Sep 1, 2021 at 1:56 PM Daniel Gustafsson wrote:
>> A v2 with the above fixes is attached.
>
> Thanks for the updated patch. Here are some comments:
>
> 1) Do we need to set bgchild = -1 before the exit(1); in the code
> below so that we don't kill(bgchild, SIGTERM); unnecessarily in
> kill_bgchild_atexit?

Good point. We can also inspect bgchild_exited in kill_bgchild_atexit.

> 2) Missing "," after "On Windows, we use a ."
> + * that time. On Windows we use a background thread which can communicate

> 3) How about "/* Flag to indicate whether or not child process exited
> */" instead of +/* State of child process */?

Fixed.

> 4) Instead of just exiting from the main pg_basebackup process when
> the child WAL receiver dies, can't we think of restarting the child
> process, probably with the WAL streaming position where it left off or
> stream from the beginning? This way, the work that the main
> pg_basebackup has done so far doesn't get wasted. I'm not sure if this
> affects the pg_basebackup functionality. We can restart the child
> process for 1 or 2 times; if it still dies, we can kill the main
> pg_basebackup process too. Thoughts?

I was toying with the idea, but I ended up not pursuing it. This error
is well into the "really shouldn't happen, but can" territory and it's
quite likely that some level of manual intervention is required to make
it successfully restart. I'm not convinced that adding complicated logic
to restart (and even more complicated tests to simulate and test it)
will be worthwhile.

-- 
Daniel Gustafsson
https://vmware.com/

v3-0001-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch
Description: Binary data
Re: Added schema level support for publication.
On Mon, Aug 30, 2021 at 8:56 PM vignesh C wrote: > > On Mon, Aug 30, 2021 at 9:10 AM houzj.f...@fujitsu.com > wrote: > > > > > 5) > > + if (list_length(pubobj->name) == 1 && > > + (strcmp(relname, "CURRENT_SCHEMA") == 0)) > > + ereport(ERROR, > > + > > errcode(ERRCODE_SYNTAX_ERROR), > > + errmsg("invalid relation > > name at or near"), > > + parser_errposition(pstate, > > pubobj->location)); > > > > Maybe we don't need this check, because it will report an error in > > OpenTableList() anyway, "relation "CURRENT_SCHEMA" does not exist" , and > > that > > message seems readable to me. > > Allowing CURRENT_SCHEMA is required to support current schema for > schema publications, currently I'm allowing this syntax during parsing > and this error is thrown for relations later, this is done to keep the > similar error as earlier before this feature support. I felt we can > keep it like this to maintain the similar error. Thoughts? > I find this check quite ad-hoc in the code and I am not sure if we need to be consistent for the exact message in this case. So, I think it is better to remove it. > > > About 0003 > > 7) > > The v22-0003 seems simple and can remove lots of code in patch v22-0001, so > > maybe we can merge 0001 and 0003 into one patch ? > > I agree that the code becomes simpler, it reduces a lot of code. I had > kept it like that as the testing effort might be more and also I was > waiting if there was no objection for that syntax from anyone else. I > will wait for a few more reviews and merge it to 0001 if there are no > objections. > +1 to merge the patch as suggested by Hou-San. -- With Regards, Amit Kapila.
Re: Question about an Extension Project
Hi, I don't want to sound overly rude, but I suggest not spamming this list with the same message when you don't get an answer right away. If no one answered the first time, they're not going to answer the second time. Providing a quote usually requires some sort of a business relationship, and I doubt people on this list will rush to do that when the person is entirely anonymous, without any prior history in the community, etc. As for the technical side, I only quickly skimmed the specification, and it's entirely unclear to me (a) Why? What's the whole point of the proposed extension, and why is e.g. pgmp not suitable to achieve it? (b) What? Are the proposed data types & arithmetic a completely new thing, or is that already implemented somewhere? I doubt people on this list will be interested in inventing entirely new ways to do math (as opposed to using a library that already exists). regards On 9/3/21 10:17 AM, A Z wrote: To whom it may concern, I am trying to get a project completed to enhance PostgreSQL arithmetic and elementary functions prowess by means of two new High Precision mixed decimal number types in a self-installing extension. Hopefully, I want this to be a free or low cost project. Is there anyone who can read these project specifications and email back to me here, at powerus...@live.com.au, to give me a quote for this project? They are in my top posting at this discussion thread, at: https://github.com/dvarrazzo/pgmp/issues/22 The extension could be called HPPM, High Precision PostgreSQL Mathematics. It is to be written in C, and will need a number of offline installers for major operating systems, like Windows 10/11 or rpm-based Linux. The project could be hosted on SourceForge or GitHub. If anyone on this list is interested, or knows which direction to point me in, could they please reply to me here, at powerus...@live.com.au? ZM. -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Jun 18, 2021 at 12:18 AM Andres Freund wrote: > Hi, > > On 2021-06-17 14:22:52 -0400, Robert Haas wrote: > > On Thu, Jun 17, 2021 at 2:17 PM Andres Freund > wrote: > > > Adding a hacky special case implementation for cross-database relation > > > accesses that violates all kinds of assumptions (like holding a lock on > > > a relation when accessing it / pinning pages, processing relcache > > > invals, ...) doesn't seem like a good plan. > > > > I agree that we don't want hacky code that violates assumptions, but > > bypassing shared_buffers is a bit hacky, too. Can't we lock the > > relations as we're copying them? We know pg_class's OID a fortiori, > > and we can find out all the other OIDs as we go. > We possibly can - but I'm not sure that won't end up violating some > other assumptions. > Yeah, we can surely lock the relation as described by Robert, but IMHO, while creating the database we are already holding the exclusive lock on the database and there is no one else allowed to be connected to the database, so do we actually need to bother about the lock for the correctness? > > I'm just thinking that the hackiness of going around shared_buffers > > feels irreducible, but maybe the hackiness in the patch is something > > that can be solved with more engineering. > > Which bypassing of shared buffers are you talking about here? We'd still > have to solve a subset of the issues around locking (at least on the > source side), but I don't think we need to read pg_class contents to be > able to go through shared_buffers? As I suggested, we can use the init > fork presence to infer relpersistence? > I believe we want to avoid scanning pg_class for identifying the relation list so that we can avoid this special-purpose code? 
IMHO, scanning the disk (going through all the tablespaces, finding the source database directory, and identifying each relfilenode) also appears to be special-purpose code, unless I am missing what you mean by special-purpose code. Or do you mean that looking at the filesystem at all is bypassing shared > buffers? > I think we already have such code in multiple places where we bypass the shared buffers for copying a relation, e.g. index_copy_data() and heapam_relation_copy_data(). So, as the system currently stands, we cannot claim that we are designing something that bypasses the shared buffers for the first time. So if we think that bypassing the shared buffers is hackish and we don't want to use it for the new patches, then let's avoid it completely, even while identifying the relfilenodes to be copied. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters
On 2021-09-03 02:09, Tom Lane wrote: I think that these free() calls you propose to add are a complete waste of code space. Certainly a free() right before an exit() call is that; if anything, it's *delaying* recycling the memory space for some useful purpose. But no part of pg_ctl runs long enough for it to be worth worrying about small leaks. OK, I have removed the free() before exit(). I do not find your proposed test case to be a useful expenditure of test cycles, either. If it ever fails, we'd learn nothing, except that that particular platform has a surprisingly small command line length limit. Hmm, it's a test case that fails with the current code and stops failing with my fix, so I've put it there to show the problem. But, truly, it does not bring much value after the fix is applied. Attaching the new version, with the test case and free-before-exit removed. -- Ph.From 4f8e39f3c5e39b5ec507ec4b07ad5d33c6610524 Mon Sep 17 00:00:00 2001 From: Phil Krylov Date: Thu, 2 Sep 2021 21:39:58 +0200 Subject: [PATCH] pg_ctl should not truncate command lines at 1024 characters --- src/bin/pg_ctl/pg_ctl.c | 43 +++ 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 7985da0a94..e07b5f02d7 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -442,7 +442,7 @@ free_readfile(char **optlines) static pgpid_t start_postmaster(void) { - char cmd[MAXPGPATH]; + char *cmd; #ifndef WIN32 pgpid_t pm_pid; @@ -487,12 +487,12 @@ start_postmaster(void) * has the same PID as the current child process. 
*/ if (log_file != NULL) - snprintf(cmd, MAXPGPATH, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1", - exec_path, pgdata_opt, post_opts, - DEVNULL, log_file); + cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1", + exec_path, pgdata_opt, post_opts, + DEVNULL, log_file); else - snprintf(cmd, MAXPGPATH, "exec \"%s\" %s%s < \"%s\" 2>&1", - exec_path, pgdata_opt, post_opts, DEVNULL); + cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1", + exec_path, pgdata_opt, post_opts, DEVNULL); (void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL); @@ -553,12 +553,12 @@ start_postmaster(void) else close(fd); - snprintf(cmd, MAXPGPATH, "\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"", - comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file); + cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"", + comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file); } else - snprintf(cmd, MAXPGPATH, "\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"", - comspec, exec_path, pgdata_opt, post_opts, DEVNULL); + cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"", + comspec, exec_path, pgdata_opt, post_opts, DEVNULL); if (!CreateRestrictedProcess(cmd, , false)) { @@ -566,6 +566,7 @@ start_postmaster(void) progname, (unsigned long) GetLastError()); exit(1); } + free(cmd); /* Don't close command process handle here; caller must do so */ postmasterProcess = pi.hProcess; CloseHandle(pi.hThread); @@ -828,7 +829,7 @@ find_other_exec_or_die(const char *argv0, const char *target, const char *versio static void do_init(void) { - char cmd[MAXPGPATH]; + char *cmd; if (exec_path == NULL) exec_path = find_other_exec_or_die(argv0, "initdb", "initdb (PostgreSQL) " PG_VERSION "\n"); @@ -840,17 +841,18 @@ do_init(void) post_opts = ""; if (!silent_mode) - snprintf(cmd, MAXPGPATH, "\"%s\" %s%s", - exec_path, pgdata_opt, post_opts); + cmd = psprintf("\"%s\" %s%s", + exec_path, pgdata_opt, post_opts); else - snprintf(cmd, MAXPGPATH, "\"%s\" %s%s > \"%s\"", - exec_path, pgdata_opt, 
post_opts, DEVNULL); + cmd = psprintf("\"%s\" %s%s > \"%s\"", + exec_path, pgdata_opt, post_opts, DEVNULL); if (system(cmd) != 0) { write_stderr(_("%s: database system initialization failed\n"), progname); exit(1); } + free(cmd); } static void @@ -2175,7 +2177,7 @@ set_starttype(char *starttypeopt) static void adjust_data_dir(void) { - char cmd[MAXPGPATH], + char *cmd, filename[MAXPGPATH], *my_exec_path; FILE *fd; @@ -2207,10 +2209,10 @@ adjust_data_dir(void) my_exec_path = pg_strdup(exec_path); /* it's important for -C to be the first option, see main.c */ - snprintf(cmd, MAXPGPATH, "\"%s\" -C data_directory %s%s", - my_exec_path, - pgdata_opt ? pgdata_opt : "", - post_opts ? post_opts : ""); + cmd = psprintf("\"%s\" -C data_directory %s%s", + my_exec_path, + pgdata_opt ? pgdata_opt : "", + post_opts ? post_opts : ""); fd = popen(cmd, "r"); if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL) @@ -2219,6 +2221,7 @@ adjust_data_dir(void) exit(1); } pclose(fd); + free(cmd); free(my_exec_path); /* strip trailing newline and carriage return */ -- 2.15.2 (Apple
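The problem the patch above fixes is easy to demonstrate. The sketch below is only an illustration: xpsprintf() is a simplified stand-in for PostgreSQL's psprintf() (measure with vsnprintf, then allocate exactly the needed size), the 1024-byte buffer plays the role of cmd[MAXPGPATH], and the option string and binary path are invented for the demo.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* simplified stand-in for PostgreSQL's psprintf(): measure, then print
 * into a freshly allocated buffer of exactly the needed size */
static char *
xpsprintf(const char *fmt, ...)
{
	va_list		ap,
				ap2;
	int			need;
	char	   *buf;

	va_start(ap, fmt);
	va_copy(ap2, ap);
	need = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);

	buf = malloc(need + 1);
	vsnprintf(buf, need + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}

/* returns 0 if dynamic allocation preserved a command line that the
 * fixed-size buffer truncated */
static int
run_demo(void)
{
	char		post_opts[2048];
	char		fixed[1024];	/* stand-in for cmd[MAXPGPATH] */
	int			want;
	char	   *cmd;
	int			ok;

	memset(post_opts, 'x', sizeof(post_opts) - 1);
	post_opts[sizeof(post_opts) - 1] = '\0';

	/* old approach: snprintf reports the length it wanted to write */
	want = snprintf(fixed, sizeof(fixed), "exec \"%s\" %s",
					"/usr/bin/postgres", post_opts);

	/* new approach: the buffer is sized to fit the whole command */
	cmd = xpsprintf("exec \"%s\" %s", "/usr/bin/postgres", post_opts);

	ok = want > (int) sizeof(fixed) - 1 &&	/* it didn't fit... */
		strlen(fixed) == sizeof(fixed) - 1 &&	/* ...and was truncated */
		(int) strlen(cmd) == want;	/* ...but the dynamic copy kept it all */
	free(cmd);
	return ok ? 0 : 1;
}

#ifdef DEMO_MAIN
int
main(void)
{
	assert(run_demo() == 0);
	printf("ok\n");
	return 0;
}
#endif
```

Compile with -DDEMO_MAIN to run the check standalone; with a long enough post_opts the fixed buffer silently loses the tail of the command line, which is exactly the 1024-character truncation the patch removes.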
Question about an Extension Project
To whom it may concern, I am trying to get a project completed to enhance PostgreSQL's arithmetic and elementary-function prowess by means of two new high-precision mixed decimal number types in a self-installing extension. Ideally, I want this to be a free or low-cost project. Is there anyone who can read these project specifications and email me back here, at powerus...@live.com.au, to give me a quote for this project? They are in my top posting in this discussion thread, at: https://github.com/dvarrazzo/pgmp/issues/22 The extension could be called HPPM, High Precision PostgreSQL Mathematics. It is to be written in C, and will need a number of offline installers for major operating systems, like Windows 10/11 or rpm-based Linux. The project could be hosted on SourceForge or GitHub. If anyone on this list is interested, or knows which direction to point me in, could they please reply to me here, at powerus...@live.com.au? ZM.
Re: corruption of WAL page header is never reported
At Fri, 03 Sep 2021 16:55:36 +0900 (JST), Kyotaro Horiguchi wrote in > > Yes, I'm fine with your latest patch. > > Thanks. Maybe some additional comment is needed. Something like this. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 24165ab03e..b621ad6b0f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -12496,9 +12496,21 @@ retry: * * Validating the page header is cheap enough that doing it twice * shouldn't be a big deal from a performance point of view. + * + * Don't call XLogReaderValidatePageHeader here while not in standby mode + * so that this function won't return with a valid errmsg_buf. */ - if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf)) + if (StandbyMode && + !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf)) { + /* + * emit this error right now then retry this page immediately + * the message is already translated + */ + if (xlogreader->errormsg_buf[0]) + ereport(emode_for_corrupt_record(emode, EndRecPtr), + (errmsg_internal("%s", xlogreader->errormsg_buf))); + /* reset any error XLogReaderValidatePageHeader() might have set */ xlogreader->errormsg_buf[0] = '\0'; goto next_record_is_invalid;
Re: corruption of WAL page header is never reported
At Thu, 2 Sep 2021 21:52:00 +0900, Fujii Masao wrote in > > > On 2021/09/02 16:26, Kyotaro Horiguchi wrote: > > I believe errmsg_buf is an interface for emitting error messages dedicated > > to xlogreader, which doesn't have access to the elog facility, and > > xlogreader doesn't (and ought not to) expect > > non-xlogreader callback functions to set the variable. In that sense I > > don't think the originally proposed patch is proper, for the reason that > > the non-xlogreader callback function may set errmsg_buf. This is what > > I meant by the word "modularity". > > For that reason, in my second proposal I avoided calling > > XLogReaderValidatePageHeader() at all while not in standby mode, > > because calling the validator function while in non-standby mode > > results in the non-xlogreader function setting errmsg_buf. Of course > > we could instead always consume errmsg_buf in the function, but I don't > > like to shadow the caller's task. > > Understood. Thanks for clarifying this! > > > Does that make sense? > > Yes, I'm fine with your latest patch. Thanks. Maybe some additional comment is needed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: when the startup process doesn't (logging startup delays)
> > Anything that's not used in other files should be declared static in > > the file itself, and not present in the header. Once you fix this for > > reset_startup_progress_timeout, the header won't need to include > > datatype/timestamp.h any more, which is good, because we don't want > > header files to depend on more other header files than necessary. > > Thanks for identifying this. I will take care in the next patch. Fixed. > > This hunk should be removed. > > I will remove it in the next patch. Removed. Please find the updated patch attached. On Wed, Aug 18, 2021 at 12:23 PM Nitin Jadhav wrote: > > > Anything that's not used in other files should be declared static in > > the file itself, and not present in the header. Once you fix this for > > reset_startup_progress_timeout, the header won't need to include > > datatype/timestamp.h any more, which is good, because we don't want > > header files to depend on more other header files than necessary. > > Thanks for identifying this. I will take care in the next patch. > > > Looking over this version, I realized something I (or you) should have > > noticed sooner: you've added the RegisterTimeout call to > > InitPostgres(), but that's for things that are used by all backends, > > and this is only used by the startup process. So it seems to me that > > the right place is StartupProcessMain. That would have the further > > advantage of allowing startup_progress_timeout_handler to be made > > static. begin_startup_progress_phase() and > > startup_progress_timeout_has_expired() are the actual API functions > > though so they will need to remain extern. > > Yes. I had noticed this earlier and the RegisterTimeout() call was > only present in StartupProcessMain() and not in InitPostgres() in the > earlier versions (v7) of the patch. Since StartupXLOG() gets called in > the 2 places, I had restricted the InitPostgres() flow by checking for > the !AmStartupProcess() in the newly added functions. 
But later we had > discussion and concluded to add the RegisterTimeout() call even in > case of InitPostgres(). Kindly refer to the discussion just after the > v7 patch in this thread and let me know your thoughts. > > > This hunk should be removed. > > I will remove it in the next patch. > > On Tue, Aug 17, 2021 at 1:08 AM Robert Haas wrote: > > > > On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby wrote: > > > Should this feature distinguish between crash recovery and archive > > > recovery on > > > a hot standby ? Otherwise the standby will display this all the time. > > > > > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, > > > elapsed time: 124.42 s, current LSN: 0/EEE2100 > > > > > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert > > > finishes > > > cleanup of xlog.c variables, I can't say that with much confidence). > > > > Hmm. My inclination is to think that on an actual standby, you > > wouldn't want to get messages like this, but if you were doing a > > point-in-time-recovery, then you would. So I think maybe > > InArchiveRecovery is not the right thing. Perhaps StandbyMode? > > > > -- > > Robert Haas > > EDB: http://www.enterprisedb.com v13-0001-startup-process-progress.patch Description: Binary data
Re: Improve logging when using Huge Pages
Hello. At Fri, 3 Sep 2021 06:28:58 +, "Shinoda, Noriyoshi (PN Japan FSIP)" wrote in > Fujii-san, Julien-san > > Thank you very much for your comment. > I followed your comment and changed the elog function to ereport function and > also changed the log level. The output message is the same as in the case of > non-HugePages memory acquisition failure.I did not simplify the error > messages as it would have complicated the response to the preprocessor. > > > I agree that the message should be promoted to a higher level. But I > > think we should also make that information available at the SQL level, > > as the log files may be truncated / rotated before you need the info, > > and it can be troublesome to find the information at the OS level, if > > you're lucky enough to have OS access. > > In the attached patch, I have added an Internal GUC 'using_huge_pages' to > know that it is using HugePages. This parameter will be True only if the > instance is using HugePages. IF you are thinking to show that in GUC, you might want to look into the nearby thread [1], especially about the behavior when invoking postgres -C using_huge_pages. (Even though the word "using" in the name may suggest that the server is running, but I don't think it is neat that the variable shows "no" by the command but shows "yes" while the same server is running.) I have some comment about the patch. - if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED) - elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", -allocsize); + if (ptr != MAP_FAILED) + using_huge_pages = true; + else if (huge_pages == HUGE_PAGES_TRY) + ereport(LOG, + (errmsg("could not map anonymous shared memory: %m"), +(mmap_errno == ENOMEM) ? +errhint("This error usually means that PostgreSQL's request " If we set huge_pages to try and postgres falled back to regular pages, it emits a large message relative to its importance. 
The user specified "I'd like to use huge pages, but it's OK if they are not available", so I think the message should be far smaller. Maybe just raising the DEBUG1 message to LOG, along with moving to ereport, would be sufficient. - elog(DEBUG1, "CreateFileMapping(%zu) with SEC_LARGE_PAGES failed, " -"huge pages disabled", -size); + ereport(LOG, + (errmsg("could not create shared memory segment: error code %lu", GetLastError()), +errdetail("Failed system call was CreateFileMapping(size=%zu, name=%s).", + size, szShareMem))); This doesn't seem to be a regular user-facing message. Isn't it sufficient just to raise the log level to LOG? [1] https://www.postgresql.org/message-id/20210903.141206.103927759882272221.horikyota.ntt%40gmail.com
Re: prevent immature WAL streaming
At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera wrote in > On 2021-Sep-02, Kyotaro Horiguchi wrote: > > > So, this is a crude PoC of that. > > I had ended up with something very similar, except I was trying to cram > the flag via the checkpoint record instead of hacking > AdvanceXLInsertBuffer(). I removed that stuff and merged both, here's > the result. > > > 1. This patch is written on the current master, but it doesn't > > interfere with the seg-boundary-memorize patch since it removes the > > calls to RegisterSegmentBoundary. > > I rebased on top of the revert patch. Thanks! > > 2. Since xlogreader cannot emit a log message immediately, we don't > > have a means to leave a log message informing that recovery met an > > aborted partial continuation record. (In this PoC, it is done by > > fprintf:p) > > Shrug. We can just use an #ifndef FRONTEND / elog(LOG). (I didn't keep > this part, sorry.) No problem, it was merely a development-time message for behavior observation. > > 3. Maybe we need pg_waldump to show partial continuation records, > > but I'm not sure how to realize that. > > Ah yes, we'll need to fix that. I just believe 0001 does the right thing. 0002: > + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at > end of > +* > WAL */ The name sounds like the start LSN. Doesn't contrecordAbort(ed)Ptr work? > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD)) > { > report_invalid_record(state, > > "there is no contrecord flag at %X/%X", > > LSN_FORMAT_ARGS(RecPtr)); > - goto err; > + goto aborted_contrecord; This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and _IS_ABORTED_PARTIAL. Is it okay? (I don't object to removing the check.) I hadn't thought of this as an aborted contrecord, but on second thought, when we see a record broken in any style, we stop recovery at that point. I agree with the change and all the similar changes. + /* XXX should we goto aborted_contrecord here? */ I think it should be aborted_contrecord.
When that happens, the loaded bytes actually look like the first fragment of a continuation record to xlogreader, even if the cause was a broken total_len. So if we abort the record there, the next time xlogreader will see XLP_FIRST_IS_ABORTED_PARTIAL at the same page and correctly find a new record there. On the other hand, if we just errored out there, we would step back to the beginning of the broken record in the previous page or segment and then start writing a new record there, but that is exactly what we want to avoid now. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Unused variable in TAP tests file
On Fri, Sep 03, 2021 at 10:53:19AM +0530, Amul Sul wrote: > A few TAP test files have the "tempdir_short" variable, which isn't in > use. The attached patch removes it. Indeed. Let's clean that up. Thanks! -- Michael signature.asc Description: PGP signature
Add guc to enable send SIGSTOP to peers when backend exits abnormally
Hi all, I want to share a patch in which I add a GUC parameter 'enable_send_stop' to set the value of SendStop in postmaster.c more conveniently. SendStop makes the postmaster send SIGSTOP rather than SIGQUIT to its children when some backend dumps core. This variable is currently set only with the -T parameter when starting postgres, which is inconvenient to control in some scenarios. Thanks & Best Regards 0001-Add-guc-to-set-SendStop.patch Description: Binary data
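For context on why SIGSTOP is useful here: a stopped child keeps its memory image available for a debugger, whereas SIGQUIT terminates it. The sketch below is generic POSIX process handling, not postmaster code; it just shows how a SIGSTOP'd child is observed with WUNTRACED and later cleaned up.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* returns 0 if the child was observed stopped by SIGSTOP (rather than
 * terminated, as SIGQUIT would do), then cleans it up */
static int
run_demo(void)
{
	int			status;
	pid_t		child = fork();

	if (child == 0)
	{
		for (;;)
			pause();			/* stand-in for a crashing backend */
	}

	/* what the postmaster does under SendStop: freeze instead of kill */
	kill(child, SIGSTOP);
	waitpid(child, &status, WUNTRACED);
	if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP)
		return 1;

	/* the frozen process could now be attached with a debugger; clean up */
	kill(child, SIGKILL);
	waitpid(child, &status, 0);
	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : 1;
}

#ifdef DEMO_MAIN
int
main(void)
{
	assert(run_demo() == 0);
	printf("ok\n");
	return 0;
}
#endif
```

Compile with -DDEMO_MAIN to try it; SIGSTOP cannot be caught or ignored, so the child reliably ends up in the stopped state where all sibling backends' memory can be inspected.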
Re: suboverflowed subtransactions concurrency performance optimize
Sorry, for some reason Mail.app converted the message to HTML and the mailing list mangled this HTML into a mess. I'm resending the previous message as plain text again. Sorry for the noise. > On 31 Aug 2021, at 11:43, Pengchengliu wrote: > > Hi Andrey, > Thanks a lot for your reply and the reference information. > > The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, > local_cache_subtrans_pages can be adjusted dynamically. > If we configure local_cache_subtrans_pages as 64, every backend uses only an > extra 64*8192=512KB of memory. > So the local cache is similar to a first-level cache, and the subtrans SLRU is > the second-level cache. > And I think the extra memory is very well worth it. It really resolves the massive > subtrans stuck issue which I mentioned in my previous email. > > I have viewed the patch in [0] before. Adding GUC > configuration parameters for SLRU buffers is very nice. > I think for subtrans, its optimization is not enough. In > SubTransGetTopmostTransaction, we should get the SubtransSLRULock first, then > call SubTransGetParent in a loop, preventing the acquire/release of SubtransSLRULock > on each SubTransGetParent call inside SubTransGetTopmostTransaction. > After I applied this patch, in which I optimized SubTransGetTopmostTransaction, > with my test case I still got the stuck result. SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem may arise only when someone reads a page from disk. But if you have a big enough cache, this will never happen. And this cache will be much less than 512KB*max_connections. If we really want to fix the exclusive SubtransSLRULock, I think the best option would be to split the SLRU control lock into an array of locks, one for each bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch). With this approach we will have to rename s/bank/partition/g for consistency with lock and buffer partitions. I really liked having my own banks, but consistency is worth it anyway. Thanks! Best regards, Andrey Borodin.
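To make the locking discussion above concrete, here is a toy model of the parent-chain walk that SubTransGetTopmostTransaction() performs. The arrays and names are invented stand-ins, not the real SLRU code: each sub_trans_get_parent() call models one pg_subtrans page access, which is where the SubtransSLRULock traffic (and the benefit of a per-backend cache) comes from.

```c
#include <assert.h>
#include <stdio.h>

#define INVALID_XID 0			/* stand-in for InvalidTransactionId */

/* toy pg_subtrans: parent[xid] is the parent xid, INVALID_XID for
 * top-level transactions; each lookup models one (possibly locked)
 * SLRU page access in SubTransGetParent() */
static int
sub_trans_get_parent(const int *parent, int xid)
{
	return parent[xid];
}

/* walk the chain up to the top-level transaction, counting lookups so
 * the cost of deep subtransaction nesting is visible */
static int
sub_trans_get_topmost(const int *parent, int xid, int *nlookups)
{
	int			cur = xid;
	int			next;

	while ((next = sub_trans_get_parent(parent, cur)) != INVALID_XID)
	{
		cur = next;
		if (nlookups)
			(*nlookups)++;
	}
	if (nlookups)
		(*nlookups)++;			/* count the final lookup that hit invalid */
	return cur;
}

#ifdef DEMO_MAIN
int
main(void)
{
	int			parent[6] = {0, 0, 0, 0, 3, 4};	/* chain: 5 -> 4 -> 3 (top) */
	int			n = 0;

	assert(sub_trans_get_topmost(parent, 5, &n) == 3);
	assert(n == 3);				/* three page accesses for a depth-2 chain */
	printf("ok\n");
	return 0;
}
#endif
```

With deep nesting, every step of the walk is a potential lock acquisition, which is why caching pages locally per backend, or taking the lock once around the whole loop, was proposed in this thread.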
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada wrote: > > I've attached rebased patches. 0004 patch is not the scope of this > patch. It's borrowed from another thread[1] to fix the assertion > failure for newly added tests. Please review them. > BTW, these patches need rebasing (broken by recent commits, patches 0001, 0003 and 0004 no longer apply, and it's failing in the cfbot). Regards, Greg Nancarrow Fujitsu Australia
RE: Improve logging when using Huge Pages
Fujii-san, Julien-san Thank you very much for your comment. I followed your comment and changed the elog function to ereport function and also changed the log level. The output message is the same as in the case of non-HugePages memory acquisition failure.I did not simplify the error messages as it would have complicated the response to the preprocessor. > I agree that the message should be promoted to a higher level. But I > think we should also make that information available at the SQL level, > as the log files may be truncated / rotated before you need the info, > and it can be troublesome to find the information at the OS level, if > you're lucky enough to have OS access. In the attached patch, I have added an Internal GUC 'using_huge_pages' to know that it is using HugePages. This parameter will be True only if the instance is using HugePages. Regards, Noriyoshi Shinoda -Original Message- From: Fujii Masao [mailto:masao.fu...@oss.nttdata.com] Sent: Wednesday, September 1, 2021 2:06 AM To: Julien Rouhaud ; Shinoda, Noriyoshi (PN Japan FSIP) Cc: pgsql-hack...@postgresql.org Subject: Re: Improve logging when using Huge Pages On 2021/08/31 15:57, Julien Rouhaud wrote: > On Tue, Aug 31, 2021 at 1:37 PM Shinoda, Noriyoshi (PN Japan FSIP) > wrote: >> >> In the current version, when GUC huge_pages=try, which is the default >> setting, no log is output regardless of the success or failure of the >> HugePages acquisition. If you want to output logs, you need to set >> log_min_messages=DEBUG3, but it will output a huge amount of extra logs. >> With huge_pages=try setting, if the kernel parameter vm.nr_hugepages is not >> enough, the administrator will not notice that HugePages is not being used. >> I think it should output a log if HugePages was not available. 
+1 - elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", + elog(WARNING, "mmap(%zu) with MAP_HUGETLB failed, huge pages +disabled: %m", elog() should be used only for internal errors and low-level debug logging. So per your proposal, elog() is not suitable here. Instead, ereport() should be used. The log level should be LOG rather than WARNING because this message indicates the information about server activity that administrators are interested in. The message should be updated so that it follows the Error Message Style Guide. https://www.postgresql.org/docs/devel/error-style-guide.html With huge_pages=on, if shared memory fails to be allocated, the error message is reported currently. Even with huge_page=try, this error message should be used to simplify the code as follows? errno = mmap_errno; - ereport(FATAL, + ereport((huge_pages == HUGE_PAGES_ON) ? FATAL : LOG, (errmsg("could not map anonymous shared memory: %m"), (mmap_errno == ENOMEM) ? errhint("This error usually means that PostgreSQL's request " > I agree that the message should be promoted to a higher level. But I > think we should also make that information available at the SQL level, > as the log files may be truncated / rotated before you need the info, > and it can be troublesome to find the information at the OS level, if > you're lucky enough to have OS access. +1 Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION huge_pages_log_v2.diff Description: huge_pages_log_v2.diff
Re: Read-only vs read only vs readonly
At Thu, 2 Sep 2021 22:07:02 +, "Bossart, Nathan" wrote in > On 9/2/21, 11:30 AM, "Magnus Hagander" wrote: > > I had a customer point out to me that we're inconsistent in how we > > spell read-only. Turns out we're not as inconsistent as I initially > > thought :), but that they did manage to spot the one actual log > > message we have that writes it differently than everything else -- but > > that broke their grepping... > > > > Almost everywhere we use read-only. Attached patch changes the one log > > message where we didn't, as well as a few places in the docs for it. I > > did not bother with things like comments in the code. > > > > Two questions: > > > > 1. Is it worth fixing? Or just silly nitpicking? > > It seems entirely reasonable to me to consistently use "read-only" in > the log messages and documentation. > > > 2. What about translations? This string exists in translations -- > > should we just "fix" it there, without touching the translated string? > > Or try to fix both? Or leave it for the translators who will get a > > diff on it? > > I don't have a strong opinion, but if I had to choose, I would say to > leave it to the translators to decide. +1 for both. As a translator, it seems that that kind of changes are usual. Many changes about full-stops, spacings, capitalizing and so happen especially at near-release season like now. regards. -- Kyotaro Horiguchi NTT Open Source Software Center