Re: Can 64bit libpq access 32bit postgresDB on X86_64 platform

2022-04-09 Thread Tom Lane
陈  writes:
> I tried to access the 32bit PostgresDB with 64bit libpq and it works very
> nicely.
> So my question is: can we use 64bit libpq to access a 32bit PostgresDB on the
> X86_64 platform?

Seems like you know your answer already.

In any case, the PG wire protocol is platform-independent, so yes.
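
For reference, a minimal 64bit libpq check of this kind might look like the
following (just a sketch; the connection string is a placeholder):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Placeholder connection string; point it at the 32bit PG11 server. */
    PGconn     *conn = PQconnectdb("host=localhost port=5432 dbname=postgres");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    /* The wire protocol is the same regardless of the server's word size. */
    res = PQexec(conn, "SELECT version()");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("server: %s\n", PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return EXIT_SUCCESS;
}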

regards, tom lane




Can 64bit libpq access 32bit postgresDB on X86_64 platform

2022-04-09 Thread 陈
Hi, all


We use 32bit postgres PG11 in our product on the X86_64 platform. Now we want to
add a new 64bit process that accesses the 32bit postgres using 64bit libpq.


I tried to access the 32bit PostgresDB with 64bit libpq and it works very nicely.
So my question is: can we use 64bit libpq to access a 32bit PostgresDB on the
X86_64 platform?


Thank you,
BoChen

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-09 Thread Tatsuo Ishii
> The buildfarm is still complaining about the synopsis being too
> wide for PDF format.  I think what we ought to do is give up on
> using a <synopsis> for log lines at all, and instead convert the
> documentation into a tabular list of fields.  Proposal attached,
> which also fixes a couple of outright errors.

I once thought about that too. Looks good to me.

> One thing that this doesn't fix is that the existing text appears
> to suggest that the "failures" column is something different from
> the sum of the serialization_failures and deadlock_failures
> columns, which it's obvious from the code is not so.  If this isn't
> a code bug then I think we ought to just drop that column entirely,
> because it's redundant.

+1.

> (BTW, now that I've read this stuff I am quite horrified by how
> the non-aggregated log format has been mangled for error retries,
> and will probably be submitting a proposal to change that.
> But that's a different issue.)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Tom Lane
Andres Freund  writes:
> It's been broken in different ways all the way back to 9.0, from what I can
> see, but I didn't check every single version.

> Afaics the fix is to nuke the idea of doing anything substantial in the signal
> handler from orbit, and instead just set a flag in the handler.

+1.  This is probably more feasible given the latch infrastructure
than it was when that code was first written.

regards, tom lane




Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Andres Freund
Hi,

On 2022-04-09 16:10:02 -0700, Andres Freund wrote:
> It's not that (although I still suspect it's a problem). It's a self-deadlock,
> because StandbyTimeoutHandler(), which ResolveRecoveryConflictWithBufferPin()
> *explicitly enables*, calls SendRecoveryConflictWithBufferPin(). Which does
> CancelDBBackends(). Which ResolveRecoveryConflictWithBufferPin() also calls,
> if the deadlock timeout is reached.
> 
> To make it easier to hit, I put a pg_usleep(1) in CancelDBBackends(), and 
> boom:
> 
> [... backtrace ... ]
>
> It's reproducible on Linux.
> 
> 
> I'm lacking words I dare to put in an email to describe how bad an idea it is
> to call CancelDBBackends() from within a timeout function, particularly when
> the function enabling the timeout also calls that function itself, before
> disabling timers.

It's been broken in different ways all the way back to 9.0, from what I can
see, but I didn't check every single version.

Afaics the fix is to nuke the idea of doing anything substantial in the signal
handler from orbit, and instead just set a flag in the handler. Then check after
ProcWaitForSignal() whether the timeout happened, and if so call
SendRecoveryConflictWithBufferPin().
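
In rough sketch form (illustrative only, not an actual patch), that'd be
something like:

static volatile sig_atomic_t standby_timeout_pending = false;

static void
StandbyTimeoutHandler(void)
{
    /* just remember that the timeout fired; no locks, no signalling here */
    standby_timeout_pending = true;
    SetLatch(MyLatch);
}

/* ... and back in ResolveRecoveryConflictWithBufferPin(), after the wait: */
ProcWaitForSignal(PG_WAIT_BUFFER_PIN);

if (standby_timeout_pending)
{
    standby_timeout_pending = false;
    /* now outside the signal handler, so taking LWLocks etc. is fine */
    SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
}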


Also worth noting that the disable_all_timeouts() call appears to break
STARTUP_PROGRESS_TIMEOUT.

Greetings,

Andres Freund




GSOC: New and improved website for pgjdbc (JDBC) (2022)

2022-04-09 Thread S.R Keshav



postgreSql_ New and improved website for pgjdbc (JDBC) (2022).pdf
Description: Adobe PDF document


Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Andres Freund
Hi,

On 2022-04-09 15:00:54 -0700, Andres Freund wrote:
> What are we expecting to wake the startup process up, once it does
> SendRecoveryConflictWithBufferPin()?
>
> It's likely not the problem here, because we never seem to have even reached
> that path, but afaics once we've called disable_all_timeouts() at the bottom
> of ResolveRecoveryConflictWithBufferPin() and then re-entered
> ResolveRecoveryConflictWithBufferPin(), and go into the "We're already behind,
> so clear a path as quickly as possible."  path, there's no guarantee for any
> timeout to be pending anymore?
>
> If there's either no backend that we're still conflicting with (an entirely
> possible race condition), or if there's e.g. a snapshot or database conflict,
> there's afaics nobody setting the startup processes' latch.

It's not that (although I still suspect it's a problem). It's a self-deadlock,
because StandbyTimeoutHandler(), which ResolveRecoveryConflictWithBufferPin()
*explicitly enables*, calls SendRecoveryConflictWithBufferPin(). Which does
CancelDBBackends(). Which ResolveRecoveryConflictWithBufferPin() also calls,
if the deadlock timeout is reached.

To make it easier to hit, I put a pg_usleep(1) in CancelDBBackends(), and 
boom:

(gdb) bt
#0  __futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x7fd4cb001138, 
expected=expected@entry=0, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=<optimized out>,
cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:87
#1  0x7fd4ce5a215b in __GI___futex_abstimed_wait_cancelable64 
(futex_word=futex_word@entry=0x7fd4cb001138, expected=expected@entry=0,
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ../sysdeps/nptl/futex-internal.c:123
#2  0x7fd4ce59e44f in do_futex_wait (sem=sem@entry=0x7fd4cb001138, 
abstime=0x0, clockid=0) at sem_waitcommon.c:112
#3  0x7fd4ce59e4e8 in __new_sem_wait_slow64 (sem=0x7fd4cb001138, 
abstime=0x0, clockid=0) at sem_waitcommon.c:184
#4  0x7fd4cf20d823 in PGSemaphoreLock (sema=0x7fd4cb001138) at pg_sema.c:327
#5  0x7fd4cf2dd675 in LWLockAcquire (lock=0x7fd4cb001600, 
mode=LW_EXCLUSIVE) at 
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1324
#6  0x7fd4cf2c36e7 in CancelDBBackends (databaseid=0, 
sigmode=PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, conflictPending=false)
at /home/andres/src/postgresql/src/backend/storage/ipc/procarray.c:3638
#7  0x7fd4cf2cc579 in SendRecoveryConflictWithBufferPin 
(reason=PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)
at /home/andres/src/postgresql/src/backend/storage/ipc/standby.c:846
#8  0x7fd4cf2cc69d in StandbyTimeoutHandler () at 
/home/andres/src/postgresql/src/backend/storage/ipc/standby.c:911
#9  0x7fd4cf4e68d7 in handle_sig_alarm (postgres_signal_arg=14) at 
/home/andres/src/postgresql/src/backend/utils/misc/timeout.c:421
#10 <signal handler called>
#11 0x7fd4cddfffc4 in __GI___select (nfds=0, readfds=0x0, writefds=0x0, 
exceptfds=0x0, timeout=0x7ffc6e5561c0) at ../sysdeps/unix/sysv/linux/select.c:71
#12 0x7fd4cf52ea2a in pg_usleep (microsec=1) at 
/home/andres/src/postgresql/src/port/pgsleep.c:56
#13 0x7fd4cf2c36f1 in CancelDBBackends (databaseid=0, 
sigmode=PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK, conflictPending=false)
at /home/andres/src/postgresql/src/backend/storage/ipc/procarray.c:3640
#14 0x7fd4cf2cc579 in SendRecoveryConflictWithBufferPin 
(reason=PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK)
at /home/andres/src/postgresql/src/backend/storage/ipc/standby.c:846
#15 0x7fd4cf2cc50f in ResolveRecoveryConflictWithBufferPin () at 
/home/andres/src/postgresql/src/backend/storage/ipc/standby.c:820
#16 0x7fd4cf2a996f in LockBufferForCleanup (buffer=43) at 
/home/andres/src/postgresql/src/backend/storage/buffer/bufmgr.c:4336
#17 0x7fd4ceec911c in XLogReadBufferForRedoExtended (record=0x7fd4d106a618, 
block_id=0 '\000', mode=RBM_NORMAL, get_cleanup_lock=true, buf=0x7ffc6e556394)
at /home/andres/src/postgresql/src/backend/access/transam/xlogutils.c:427
#18 0x7fd4cee1aa41 in heap_xlog_prune (record=0x7fd4d106a618) at 
/home/andres/src/postgresql/src/backend/access/heap/heapam.c:8634

It's reproducible on Linux.


I'm lacking words I dare to put in an email to describe how bad an idea it is
to call CancelDBBackends() from within a timeout function, particularly when
the function enabling the timeout also calls that function itself, before
disabling timers.

I ...

Greetings,

Andres Freund




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-04-09 Thread Andres Freund
Hi,

On 2022-04-09 14:39:16 -0700, Andres Freund wrote:
> On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> > Thomas Munro  writes:
> > > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > > doesn't just set a sig_atomic_t flag and poke the latch.  Is the extra
> > > stuff it does safe?  For example, is this call stack OK (to pick one
> > > that jumps out, but not the only one)?
> > 
> > > procsignal_sigusr1_handler
> > > -> RecoveryConflictInterrupt
> > >  -> HoldingBufferPinThatDelaysRecovery
> > >   -> GetPrivateRefCount
> > >-> GetPrivateRefCountEntry
> > > -> hash_search(...hash table that might be in the middle of an 
> > > update...)
> > 
> > Ugh.  That one was safe before somebody decided we needed a hash table
> > for buffer refcounts, but it's surely not safe now.
> 
> Mea culpa. This is 4b4b680c3d6d - from 2014.

Whoa. There's way worse: StandbyTimeoutHandler() calls
SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which
acquires lwlocks etc.

Which very plausibly is the cause for the issue I'm investigating in
https://www.postgresql.org/message-id/20220409220054.fqn5arvbeesmxdg5%40alap3.anarazel.de

Greetings,

Andres Freund




Re: Commitfest wrapup

2022-04-09 Thread Jonathan S. Katz

On 4/9/22 1:14 PM, Greg Stark wrote:

On Sat, 9 Apr 2022 at 10:51, Tom Lane  wrote:



Sound like bugfixes to be backpatched.


Yeah.  I'm not sure why these have received so little love.


Since bug fixes are important enough that they'll definitely get done
(and can happen after feature freeze) there's a bit of a perverse
incentive to focus on other things...

"Everybody was sure that Somebody would do it. Anybody could have done
it, but Nobody did it"

I think every project struggles with bugs that sit in bug tracking
systems indefinitely. The Open Issues list is the biggest hammer we have.
Maybe we should be tracking "Open Issues" from earlier in the process
-- things that we think we shouldn't do a release without addressing.


The RMT does delineate and track both open issues arising from newly 
committed features and a subset of issues in existing releases. 
Traditionally the RMT is more hands-off on the latter, unless it determines 
that leaving an issue unfixed in the release would have stability or other 
consequences.


Jonathan






Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Andres Freund
Hi,

On 2022-04-08 22:05:01 -0700, Andres Freund wrote:
> On 2022-04-08 21:55:15 -0700, Andres Freund wrote:
> > on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is
> > interesting, because I ran it there dozens if not hundreds of times before
> > commit, with - I think - only cosmetic changes.
> 
> Scratch that part - I found an instance of the freebsd failure earlier, just
> didn't notice because that run failed for other reasons as well. So this might
> just have uncovered an older bug around recovery conflict handling,
> potentially platform dependent.
> 
> I guess I'll try to reproduce it on freebsd...

That failed.


Adding a bunch of debug statements, I think I might have found at least some
problems.

The code in LockBufferForCleanup() that actually interacts with other backends
is:

/* Publish the bufid that Startup process waits on */
SetStartupBufferPinWaitBufId(buffer - 1);
/* Set alarm and then wait to be signaled by UnpinBuffer() */
ResolveRecoveryConflictWithBufferPin();
/* Reset the published bufid */
SetStartupBufferPinWaitBufId(-1);

where ResolveRecoveryConflictWithBufferPin() sends procsignals to all other
backends and then waits with ProcWaitForSignal():

void
ProcWaitForSignal(uint32 wait_event_info)
{
    (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
                     wait_event_info);
    ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();
}

One problem is that we pretty much immediately get a SIGALRM whenever we're in
that WaitLatch(), which does a SetLatch(), interrupting the WaitLatch(). The
startup process then proceeds to SetStartupBufferPinWaitBufId(-1).

In the unlucky cases, the backend holding the pin only processes the interrupt
(in RecoveryConflictInterrupt()) after the
SetStartupBufferPinWaitBufId(-1). The backend then sees that
HoldingBufferPinThatDelaysRecovery() returns false, and happily continues.


But that's not the whole story, I think. It's a problem that leads to conflicts
being handled more slowly, but it shouldn't eventually result in more timeouts.

Here's debugging output from a failing run, where I added a few debugging 
statements:
https://api.cirrus-ci.com/v1/artifact/task/6179111512047616/log/src/test/recovery/tmp_check/log/000_0recovery_conflict_standby.log
https://github.com/anarazel/postgres/commit/212268e753093861aa22a51657c6598c65eeb81b

Curiously, there's only
20644: received interrupt 11

Which is PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK, not
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN.

I guess we've never gotten past the standby timeout.


20644: received interrupt 11
2022-04-09 21:18:23.824 UTC [20630][startup] DEBUG:  one cycle of 
LockBufferForCleanup() iterating in HS
2022-04-09 21:18:23.824 UTC [20630][startup] CONTEXT:  WAL redo at 0/3432800 
for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 1; blkref #0: rel 
1663/16385/16386, blk 0
2022-04-09 21:18:23.824 UTC [20630][startup] DEBUG:  setting timeout() in 0 
1
2022-04-09 21:18:23.824 UTC [20630][startup] CONTEXT:  WAL redo at 0/3432800 
for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 1; blkref #0: rel 
1663/16385/16386, blk 0
2022-04-09 21:18:23.835 UTC [20630][startup] DEBUG:  setting latch()
2022-04-09 21:18:23.835 UTC [20630][startup] CONTEXT:  WAL redo at 0/3432800 
for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 1; blkref #0: rel 
1663/16385/16386, blk 0
2022-04-09 21:18:23.835 UTC [20630][startup] DEBUG:  setting timeout() in 0 3481
2022-04-09 21:18:23.835 UTC [20630][startup] CONTEXT:  WAL redo at 0/3432800 
for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 1; blkref #0: rel 
1663/16385/16386, blk 0
20644: received interrupt 11
2022-04-09 21:23:47.975 UTC [20631][walreceiver] FATAL:  could not receive data 
from WAL stream: server closed the connection unexpectedly

So we sent a conflict interrupt, and then waited. And nothing happened.


What are we expecting to wake the startup process up, once it does
SendRecoveryConflictWithBufferPin()?

It's likely not the problem here, because we never seem to have even reached
that path, but afaics once we've called disable_all_timeouts() at the bottom
of ResolveRecoveryConflictWithBufferPin() and then re-entered
ResolveRecoveryConflictWithBufferPin(), and go into the "We're already behind,
so clear a path as quickly as possible."  path, there's no guarantee for any
timeout to be pending anymore?

If there's either no backend that we're still conflicting with (an entirely
possible race condition), or if there's e.g. a snapshot or database conflict,
there's afaics nobody setting the startup processes' latch.

Greetings,

Andres Freund




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-04-09 Thread Andres Freund
Hi,

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > doesn't just set a sig_atomic_t flag and poke the latch.  Is the extra
> > stuff it does safe?  For example, is this call stack OK (to pick one
> > that jumps out, but not the only one)?
> 
> > procsignal_sigusr1_handler
> > -> RecoveryConflictInterrupt
> >  -> HoldingBufferPinThatDelaysRecovery
> >   -> GetPrivateRefCount
> >-> GetPrivateRefCountEntry
> > -> hash_search(...hash table that might be in the middle of an 
> > update...)
> 
> Ugh.  That one was safe before somebody decided we needed a hash table
> for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.


> Which, of course, demonstrates the folly of allowing signal handlers to call
> much of anything; but especially doing so without clearly marking the called
> functions as needing to be signal safe.

Yea. Particularly when just going through bufmgr and updating places that look
at pin counts, it's not immediately obvious that
HoldingBufferPinThatDelaysRecovery() runs in a signal handler. Partially
because RecoveryConflictInterrupt() - which is mentioned in the comment above
HoldingBufferPinThatDelaysRecovery() - sounds a lot like it's called from
ProcessInterrupts(), which doesn't run in a signal handler...

RecoveryConflictInterrupt() calls a lot of functions, some of which quite
plausibly could be changed to not be signal safe, even if they currently are.


Is there really a reason for RecoveryConflictInterrupt() to run in a signal
handler? Given that we only react to conflicts in ProcessInterrupts(), it's
not immediately obvious that we need to do anything in
RecoveryConflictInterrupt() but set some flags.  There are probably some minor
efficiency gains, but that seems unconvincing.
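
E.g. a sketch of that shape (the flag array and HandleRecoveryConflict() are
made-up names, not existing code):

/* one pending flag per conflict reason; only ever set in the handler */
static volatile sig_atomic_t recoveryConflictPendingReason[NUM_PROCSIGNALS];

void
RecoveryConflictInterrupt(ProcSignalReason reason)
{
    /* signal handler context: just set flags and poke the latch */
    recoveryConflictPendingReason[reason] = true;
    InterruptPending = true;
    SetLatch(MyLatch);
}

/* ... while in ProcessInterrupts(), i.e. outside any signal handler: */
for (int reason = 0; reason < NUM_PROCSIGNALS; reason++)
{
    if (recoveryConflictPendingReason[reason])
    {
        recoveryConflictPendingReason[reason] = false;
        /* safe to inspect buffer pins, locks, snapshots, etc. here */
        HandleRecoveryConflict((ProcSignalReason) reason);  /* made-up name */
    }
}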


The comments really need a rewrite - it sounds like
RecoveryConflictInterrupt() will error out itself:

/*
 * If we can abort just the current subtransaction then we are
 * OK to throw an ERROR to resolve the conflict. Otherwise
 * drop through to the FATAL case.
 *
 * XXX other times that we can throw just an ERROR *may* be
 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
 * parent transactions
 *
 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
 * by parent transactions and the transaction is not
 * transaction-snapshot mode
 *
 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
 * cursors open in parent transactions
 */

it's technically not *wrong* because it's setting up state that then leads to
ERROR / FATAL being thrown, but ...


Greetings,

Andres Freund




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-04-09 Thread Tom Lane
Thomas Munro  writes:
> Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> doesn't just set a sig_atomic_t flag and poke the latch.  Is the extra
> stuff it does safe?  For example, is this call stack OK (to pick one
> that jumps out, but not the only one)?

> procsignal_sigusr1_handler
> -> RecoveryConflictInterrupt
>  -> HoldingBufferPinThatDelaysRecovery
>   -> GetPrivateRefCount
>-> GetPrivateRefCountEntry
> -> hash_search(...hash table that might be in the middle of an update...)

Ugh.  That one was safe before somebody decided we needed a hash table
for buffer refcounts, but it's surely not safe now.  Which, of course,
demonstrates the folly of allowing signal handlers to call much of
anything; but especially doing so without clearly marking the called
functions as needing to be signal safe.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-09 Thread Tom Lane
Tatsuo Ishii  writes:
> Patch pushed. Thanks.

The buildfarm is still complaining about the synopsis being too
wide for PDF format.  I think what we ought to do is give up on
using a <synopsis> for log lines at all, and instead convert the
documentation into a tabular list of fields.  Proposal attached,
which also fixes a couple of outright errors.

One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so.  If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.

(BTW, now that I've read this stuff I am quite horrified by how
the non-aggregated log format has been mangled for error retries,
and will probably be submitting a proposal to change that.
But that's a different issue.)

regards, tom lane

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9ba26e5e86..d12cbaa8ab 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2289,33 +2289,95 @@ END;
   
 
   
-   The format of the log is:
-
-
-client_id transaction_no time script_no time_epoch time_us  schedule_lag   retries 
-
-
-   where
-   client_id indicates which client session ran the transaction,
-   transaction_no counts how many transactions have been
-   run by that session,
-   time is the total elapsed transaction time in microseconds,
-   script_no identifies which script file was used (useful when
-   multiple scripts were specified with -f or -b),
-   and time_epoch/time_us are a
-   Unix-epoch time stamp and an offset
-   in microseconds (suitable for creating an ISO 8601
-   time stamp with fractional seconds) showing when
-   the transaction completed.
-   The schedule_lag field is the difference between the
-   transaction's scheduled start time, and the time it actually started, in
-   microseconds. It is only present when the --rate option is used.
+   Each line in a log file describes one transaction.
+   It contains the following space-separated fields:
+
+   
+
+ client_id
+ 
+  
+   identifies the client session that ran the transaction
+  
+ 
+
+
+
+ transaction_no
+ 
+  
+   counts how many transactions have been run by that session
+  
+ 
+
+
+
+ time
+ 
+  
+   transaction's elapsed time, in microseconds
+  
+ 
+
+
+
+ script_no
+ 
+  
+   identifies the script file that was used for the transaction
+   (useful when multiple scripts are specified
+   with -f or -b)
+  
+ 
+
+
+
+ time_epoch
+ 
+  
+   transaction's completion time, as a Unix-epoch time stamp
+  
+ 
+
+
+
+ time_us
+ 
+  
+   fractional-second part of transaction's completion time, in
+   microseconds
+  
+ 
+
+
+
+ schedule_lag
+ 
+  
+   difference between the transaction's scheduled start time and the
+   time it actually started, in microseconds
+   (present only if --rate is specified)
+  
+ 
+
+
+
+ retries
+ 
+  
+   count of retries after serialization or deadlock errors during the
+   transaction
+   (present only if --max-tries is not equal to one)
+  
+ 
+
+   
+  
+
+  
When both --rate and --latency-limit are used,
the time for a skipped transaction will be reported as
skipped.
-   retries is the sum of all retries after the
-   serialization or deadlock errors during the current script execution. It is
-   present only if the --max-tries option is not equal to 1.
If the transaction ends with a failure, its time
will be reported as failed. If you use the
--failures-detailed option, the
@@ -2398,66 +2460,171 @@ END;
 
   
With the --aggregate-interval option, a different
-   format is used for the log files:
-
-
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
-sum_lag sum_lag_2 min_lag max_lag skipped
-retried retries failures serialization_failures deadlock_failures
-
-
-   where
-   interval_start is the start of the interval (as a Unix
-   epoch time stamp),
-   num_transactions is the number of transactions
-   within the interval,
-   sum_latency is the sum of the transaction
-   latencies within the interval,
-   sum_latency_2 is the sum of squares of the
-   transaction latencies within the interval,
-   min_latency is the minimum latency within the interval,
-   and
-   max_latency is the maximum latency within the interval,
-   failures is the number of transactions that ended
-   with a failed SQL command within the interval.
-  
-  
-   The next fields,
-   sum_lag, sum_lag_2, min_lag,
-   and max_lag, only meaningful if the --rate
-   option is used. Otherwise, they are all 0.0.
-   They pro

Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-04-09 Thread Thomas Munro
Hi,

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
doesn't just set a sig_atomic_t flag and poke the latch.  Is the extra
stuff it does safe?  For example, is this call stack OK (to pick one
that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
 -> HoldingBufferPinThatDelaysRecovery
  -> GetPrivateRefCount
   -> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

(I noticed this incidentally while trying to follow along with the
nearby thread on 031_recovery_conflict.pl, but the question of why we
really need this of interest to me for a back-burner project I have to
try to remove all use of signals except for latches, and then remove
the signal emulation for Windows.  It may turn out to be a pipe dream,
but this stuff is one of the subproblems.)




Re: shared-memory based stats collector - v70

2022-04-09 Thread David G. Johnston
On Sat, Apr 9, 2022 at 12:07 PM Andres Freund  wrote:

>
> > ... specific counters.  In particular, replay will not increment
> > pg_stat_database or pg_stat_all_tables columns, and the startup process
> > will not report reads and writes for the pg_statio views.
> >
> > It would helpful to give at least one specific example of what is being
> > recorded normally, especially since we give three of what is not.
>
> The second sentence is a set of examples - or do you mean examples for what
> actions by the startup process are counted?
>
>
Specific views that these statistics will be updating, in the same way that
pg_stat_database is given as an example of a view that is not updated.

David J.


Re: shared-memory based stats collector - v70

2022-04-09 Thread Andres Freund
Hi,

On 2022-04-07 21:39:55 -0700, David G. Johnston wrote:
> On Thu, Apr 7, 2022 at 8:59 PM Andres Freund  wrote:
> 
> >
> >   
> >Cumulative statistics are collected in shared memory. Every
> >PostgreSQL process collects statistics
> > locally
> >then updates the shared data at appropriate intervals.  When a server,
> >including a physical replica, shuts down cleanly, a permanent copy of
> > the
> >statistics data is stored in the pg_stat
> > subdirectory,
> >so that statistics can be retained across server restarts.  In contrast,
> >when starting from an unclean shutdown (e.g., after an immediate
> > shutdown,
> >a server crash, starting from a base backup, and point-in-time
> > recovery),
> >all statistics counters are reset.
> >   
> >
> 
> I like this.  My comment regarding using "i.e.," here stands though.

Argh. I'd done that for e.g., but not for i.e..


> >
> >
> > The cumulative statistics system is active during recovery. All scans,
> > reads, blocks, index usage, etc., will be recorded normally on the
> > standby. However, WAL replay will not increment relation and database
> > specific counters. I.e. replay will not increment pg_stat_all_tables
> > columns (like n_tup_ins), nor will reads or writes performed by the
> > startup process be tracked in the pg_statio views, nor will associated
> > pg_stat_database columns be incremented.
> >
> >
> >
> I like this too.  The second part with three nors is a bit rough.  Maybe:

Agreed. I tried to come up with a smoother formulation, but didn't (perhaps
because I was a tad tired).


> ... specific counters.  In particular, replay will not increment
> pg_stat_database or pg_stat_all_tables columns, and the startup process
> will not report reads and writes for the pg_statio views.
> 
> It would helpful to give at least one specific example of what is being
> recorded normally, especially since we give three of what is not.

The second sentence is a set of examples - or do you mean examples for what
actions by the startup process are counted?

Greetings,

Andres Freund




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Jonathan S. Katz

On 4/9/22 12:27 PM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

-1, at least for the moment. Sometimes a user doesn't know what they're
looking for coupled with being unaware of what the default value is. If
a setting is set to a default value and that value is the problematic
setting, a user should be able to see that even in a full list.


Sure, but then you do "\dconfig *".  With there being several hundred
GUCs (and no doubt more coming), I'm not sure that "show me every GUC"
is a common use-case at all, let alone so common as to deserve being
the default behavior.

One thing we could perhaps do to reduce confusion is to change the
table heading when doing this, say from "List of configuration parameters"
to "List of non-default configuration parameters".


Reasonable points. I don't have any objections to this proposal.

Jonathan




Re: Commitfest wrapup

2022-04-09 Thread Greg Stark
On Sat, 9 Apr 2022 at 10:51, Tom Lane  wrote:
>
> > Sound like bugfixes to be backpatched.
>
> Yeah.  I'm not sure why these have received so little love.

Since bug fixes are important enough that they'll definitely get done
(and can happen after feature freeze) there's a bit of a perverse
incentive to focus on other things...

"Everybody was sure that Somebody would do it. Anybody could have done
it, but Nobody did it"

I think every project struggles with bugs that sit in bug tracking
systems indefinitely. The Open Issues list is the biggest hammer we have.
Maybe we should be tracking "Open Issues" from earlier in the process
-- things that we think we shouldn't do a release without addressing.

-- 
greg




Re: why pg_walfile_name() cannot be executed during recovery?

2022-04-09 Thread Robert Haas
On Sat, Apr 9, 2022 at 12:25 PM Andrey Borodin  wrote:
> Please excuse me if I'm not attentive enough. I've read this thread. And I 
> could not find what is the problem that you are solving. What is the purpose 
> of the WAL file name you want to obtain?

Yeah, I'd also like to know this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Vik Fearing

On 4/9/22 16:31, Tom Lane wrote:

Christoph Berg  writes:


I would think that if \dconfig showed the non-default settings only,
it would be much more useful; the full list would still be available
with "\dconfig *". This is in line with \dt only showing tables on the
search_path, and "\dt *.*" showing all.


Hm, I could get on board with that -- any other opinions?


+1
--
Vik Fearing




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Robert Haas
On Sat, Apr 9, 2022 at 10:31 AM Tom Lane  wrote:
> Christoph Berg  writes:
> > The name has evolved from \dcp over various longer \d-things to the
> > more verbose \dconfig. How about we evolve it even more and just call
> > it \config?
>
> I think people felt that it should be part of the \d family.
> Also, because we have \connect and \conninfo, you'd need to
> type at least five characters before you could tab-complete,
> whereas \dconfig is unique at four (you just need \dco).

Regarding this point, I kind of think that \dconfig is a break with
established precedent, but in a good way. Previous additions have
generally tried to pick some vaguely mnemonic sequence of letters that
somehow corresponds to what's being listed, but you have to be pretty
hard-core to remember what \db and \dAc and \drds do. \dconfig is
probably easier to remember.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread David G. Johnston
On Sat, Apr 9, 2022 at 9:27 AM Tom Lane  wrote:

> "Jonathan S. Katz"  writes:
> > -1, at least for the moment. Sometimes a user doesn't know what they're
> > looking for coupled with being unaware of what the default value is. If
> > a setting is set to a default value and that value is the problematic
> > setting, a user should be able to see that even in a full list.
>
> Sure, but then you do "\dconfig *".  With there being several hundred
> GUCs (and no doubt more coming), I'm not sure that "show me every GUC"
> is a common use-case at all, let alone so common as to deserve being
> the default behavior.
>
>
I'm for having a default that doesn't mean "show everything".

I'm also wondering whether we can invent GUC namespaces for the different
contexts, so I can use a pattern like: context.internal.*

A similar ability for category would be nice but we'd have to invent labels
to make it practical.

\dconfig [pattern [mode]]

mode: all, overridden

So mode is overridden if pattern is absent but all if pattern is present,
with the ability to specify overridden.

pattern: [[{context.{context label}}|{category.{category
label}}.]...]{parameter name pattern}
parameter name pattern: [{two part name prefix}.]{base parameter pattern}


> One thing we could perhaps do to reduce confusion is to change the
> table heading when doing this, say from "List of configuration parameters"
> to "List of non-default configuration parameters".
>
>
I'd be inclined to echo a note after the output table that says that not
all configuration parameters are displayed - possibly even providing a
count [all - overridden].  The header is likely to be ignored even if it
still ends up on screen after scrolling.

David J.


Re: Add parameter jit_warn_above_fraction

2022-04-09 Thread Julien Rouhaud
On Sat, Apr 09, 2022 at 12:31:23PM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
> >> Also, good luck with "looking in the logs", because by default
> >> WARNING-level messages don't go to the postmaster log.
> 
> > I must be missing something because by default log_min_messages is WARNING, 
> > and
> > AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
> > vanilla .psqlrc.
> 
> Ah, sorry, I was looking at the wrong thing, namely
> log_min_error_statement.  The point still holds though: if we emit this
> at level WARNING, what will get logged is just the bare message and not
> the statement that triggered it.  How useful do you think that will be?

Ah right, I did miss that "small detail".  And of course I agree, as-is that
would be entirely useless and it should be LOG, like the rest of similar
features.




Re: Mingw task for Cirrus CI

2022-04-09 Thread Andrew Dunstan


On 4/8/22 21:02, Andres Freund wrote:
> Hi,
>
> On 2022-04-08 19:27:58 -0500, Justin Pryzby wrote:
>> On Thu, Apr 07, 2022 at 10:10:21AM -0700, Andres Freund wrote:
>>> On 2022-04-06 11:03:37 -0400, Andrew Dunstan wrote:
 On 3/30/22 20:26, Andres Freund wrote:
> Could you try using dash to invoke configure here, and whether it makes 
> configure faster?
 I got weird failures re libxml/parser.h when I tried with dash. See
  (It would be nice if we
 could see config.log on failure.)
>>> Since dash won't help us to get the build time down sufficiently, and the
>>> tests don't pass without a separate build tree, I looked at what makes
>>> config/prep_buildtree so slow.
>>>
>>> It's largely just bad code. The slowest part are spawning one expr and mkdir
>>> -p for each directory. One 'cmp' for each makefile doesn't help either.
>>>
>>> The expr can be replaced with
>>>   subdir=${item#$sourcetree}
>>> that's afaics posix syntax ([1]), not bash.
>>>
>>> Spawning one mkdir for each directory can be replaced by a single mkdir
>>> invocation with all the directories. On my linux workstation that gets the
>>> time for the first loop down from 1005ms to 38ms, really.
>> Even better?
>>
>> (cd "$sourcetree" && find . -print |grep -E '/Makefile$|/GNUmakefile$' |grep 
>> -v "$sourcetree/doc/src/sgml/images/" |xargs tar c) |
>> (cd "$buildtree" && tar x)
> Don't think we depend on tar for building, at the moment. But yes, it'd be
> faster... Tar is certainly a smaller dependency than perl, not sure if there's
> any relevant platform without it?
>
>


Couple of things here.


1. The second grep should lose "$sourcetree" I think.

2. Isn't this going to be a change in behaviour in that it will copy
rather than symlinking the Makefiles? That is in fact the default
behaviour of msys2's 'ln -s', as I pointed out upthread, but I don't
think it's what we really want, especially in the general case. If you
modify the Makefile and you're using a vpath you want to see the change
reflected in your vpath.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add parameter jit_warn_above_fraction

2022-04-09 Thread Tom Lane
Julien Rouhaud  writes:
> On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
>> Also, good luck with "looking in the logs", because by default
>> WARNING-level messages don't go to the postmaster log.

> I must be missing something because by default log_min_messages is WARNING, 
> and
> AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
> vanilla .psqlrc.

Ah, sorry, I was looking at the wrong thing, namely
log_min_error_statement.  The point still holds though: if we emit this
at level WARNING, what will get logged is just the bare message and not
the statement that triggered it.  How useful do you think that will be?

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Tom Lane
"Jonathan S. Katz"  writes:
> -1, at least for the moment. Sometimes a user doesn't know what they're 
> looking for coupled with being unaware of what the default value is. If 
> a setting is set to a default value and that value is the problematic 
> setting, a user should be able to see that even in a full list.

Sure, but then you do "\dconfig *".  With there being several hundred
GUCs (and no doubt more coming), I'm not sure that "show me every GUC"
is a common use-case at all, let alone so common as to deserve being
the default behavior.

One thing we could perhaps do to reduce confusion is to change the
table heading when doing this, say from "List of configuration parameters"
to "List of non-default configuration parameters".

regards, tom lane




Re: why pg_walfile_name() cannot be executed during recovery?

2022-04-09 Thread Andrey Borodin



> On 9 Apr 2022, at 18:30, Bharath Rupireddy 
>  wrote:
> 
> Using insert tli when not in recovery and using tli of the last WAL
> replayed record in crash/archive/standby recovery, seems a reasonable
> choice to me. 

Please excuse me if I'm not attentive enough. I've read this thread, and I 
could not find what problem it is that you are solving. What is the purpose of 
the WAL file name you want to obtain?

pg_walfile_name() is a formatting function, with the TLI as a hidden argument. 
If we want it to work on a standby we should just convert it to a pure 
formatting function that does not access the DB state, and pass the TLI as an 
argument. Making an implicit TLI computation with certain expectations is not a 
good idea IMV.
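
For illustration, the formatting part itself is only about this much (a sketch;
the helper name is made up, and the SQL-callable wrapper would still have to
get the TLI from somewhere):

/*
 * Pure formatting: the caller supplies the timeline explicitly, no lookup of
 * recovery or insert state.  fname needs room for MAXFNAMELEN bytes.
 * (Hypothetical helper, not existing code.)
 */
static void
walfile_name_for_tli(char *fname, TimeLineID tli, XLogRecPtr lsn)
{
    XLogSegNo   segno;

    XLByteToSeg(lsn, segno, wal_segment_size);
    XLogFileName(fname, tli, segno, wal_segment_size);
}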

pg_walfile_name() could just read the .history file, determine which TLI 
contains the given LSN, and format the name. And still there are tricky 
segments during a TLI switch.

Either way we can rename the function to 
pg_walfile_name_as_if_on_timeline_of_last_wal_replayed().

Thanks!

Best regards, Andrey Borodin.



Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Jonathan S. Katz

On 4/9/22 11:58 AM, Julien Rouhaud wrote:

On Sat, Apr 09, 2022 at 10:31:17AM -0400, Tom Lane wrote:

Christoph Berg  writes:


I would think that if \dconfig showed the non-default settings only,
it would be much more useful; the full list would still be available
with "\dconfig *". This is in line with \dt only showing tables on the
search_path, and "\dt *.*" showing all.


Hm, I could get on board with that -- any other opinions?


+1 for it, that's often what I'm interested in when looking at the GUCs in
general.


-1, at least for the moment. Sometimes a user doesn't know what they're 
looking for coupled with being unaware of what the default value is. If 
a setting is set to a default value and that value is the problematic 
setting, a user should be able to see that even in a full list.


(The \dt behavior of searching only tables on the "search_path" vs. the whole 
database has also bitten me. I did ultimately learn about "\dt *.*", but this 
makes the user have to unpack more layers to do simple things.)


Jonathan




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Julien Rouhaud
On Sat, Apr 09, 2022 at 10:31:17AM -0400, Tom Lane wrote:
> Christoph Berg  writes:
> 
> > I would think that if \dconfig showed the non-default settings only,
> > it would be much more useful; the full list would still be available
> > with "\dconfig *". This is in line with \dt only showing tables on the
> > search_path, and "\dt *.*" showing all.
> 
> Hm, I could get on board with that -- any other opinions?

+1 for it, that's often what I'm interested in when looking at the GUCs in
general.

> (Perhaps there's an argument for omitting "override"
> settings as well?)

-0.1.  Most are usually not useful, but I can see at least data_checksums and
wal_buffers that are still interesting.




Re: Add parameter jit_warn_above_fraction

2022-04-09 Thread Julien Rouhaud
On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
>
> Also, good luck with "looking in the logs", because by default
> WARNING-level messages don't go to the postmaster log.  If that's
> the intended use-case then the message ought to appear at LOG
> level (which'd also change the desirable name for the GUC).

I must be missing something because by default log_min_messages is WARNING, and
AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
vanilla .psqlrc.




Re: Commitfest wrapup

2022-04-09 Thread Tom Lane
Greg Stark  writes:
> On Sat, 9 Apr 2022 at 06:44, Alvaro Herrera  wrote:
>>> * Simplify some RI checks to reduce SPI overhead

>> Move to next; a lot more work is required.

> If it's going to be part of a much larger patch set I wonder if it
> shouldn't just be marked Rejected and start a new thread and new CF
> entry for the whole suite.

IMV, "rejected" means "we don't want this patch nor any plausible
rework of it".  In this case, the feedback is more like "why aren't
we changing all of ri_triggers this way", so I'd call it RWF.

>>> * Map WAL segment files on PMEM as WAL buffers
>>> * Support custom authentication methods using hooks
>>> * Implement INSERT SET syntax
>>> * Logical insert/update/delete WAL records for custom table AMs

>> New features.

> Yeah, this bunch definitely consists of new features, just not sure if
> they should be moved forward or Rejected or RWF.

Probably just move them forward.  The only one of these four that's
been really sitting around for a long time is INSERT SET, and I think
there we've just not quite made up our minds if we want it or not.

regards, tom lane




Re: Commitfest wrapup

2022-04-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Apr-08, Greg Stark wrote:
>> * fix psql pattern handling

> Sounds like a bugfix, but how old is it, and how backpatchable is a fix?
> Thread is quite long.

There's been more contention than one could wish about both what
the behavior should be and how to refactor the code to achieve it.
Nonetheless, I think there is general consensus that the current
behavior isn't very desirable, so if we can reach agreement I think
this should be treated as a bug fix.

>> * Avoid erroring out when unable to remove or parse logical rewrite
>> files to save checkpoint work

It seems like people are not convinced that this cure is better than
the disease.

>> * standby recovery fails when re-replaying due to missing directory
>> which was removed in previous replay.
>> * Logical replication failure "ERROR: could not map filenode
>> "base/13237/442428" to relation OID" with catalog modifying txns
>> * Fix pg_rewind race condition just after promotion
>> * pg_receivewal fail to streams when the partial file to write is not
>> fully initialized present in the wal receiver directory

> Sound like bugfixes to be backpatched.

Yeah.  I'm not sure why these have received so little love.
The only one of this ilk that I've personally looked at was

>> * Make message at end-of-recovery less scary

which for me adds far too much complication for the claimed benefit.
Maybe somebody else with a lot more familiarity with the current WAL
code will care to push that, but I'm not touching it.  Post-FF
doesn't seem like a good time for it anyway from a risk/reward
standpoint.

regards, tom lane




Re: Add parameter jit_warn_above_fraction

2022-04-09 Thread Tom Lane
Julien Rouhaud  writes:
> On Fri, Apr 08, 2022 at 09:39:18AM -0400, Stephen Frost wrote:
>> Having this in pg_stat_statements is certainly helpful but having a
>> warning also is.  I don't think we have to address this in only one way.
>> A lot faster to flip this guc and then look in the logs on a busy system
>> than to install pg_stat_statements, restart the cluster once you get
>> permission to do so, and then query it.

> +1, especially if you otherwise don't really need or want to have
> pg_stat_statements enabled, as it's far from being free.  Sure you could 
> enable
> it by default with pg_stat_statements.track = none, but that seems a lot more
> troublesome than just dynamically enabling a GUC, possibly for a short time
> and/or for a specific database/role.

The arguments against the GUC were not about whether it's convenient.
They were about whether the information it gives you is actually going
to be of any use.

Also, good luck with "looking in the logs", because by default
WARNING-level messages don't go to the postmaster log.  If that's
the intended use-case then the message ought to appear at LOG
level (which'd also change the desirable name for the GUC).

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Tom Lane
Christoph Berg  writes:
> The name has evolved from \dcp over various longer \d-things to the
> more verbose \dconfig. How about we evolve it even more and just call
> it \config?

I think people felt that it should be part of the \d family.
Also, because we have \connect and \conninfo, you'd need to
type at least five characters before you could tab-complete,
whereas \dconfig is unique at four (you just need \dco).

> I would think that if \dconfig showed the non-default settings only,
> it would be much more useful; the full list would still be available
> with "\dconfig *". This is in line with \dt only showing tables on the
> search_path, and "\dt *.*" showing all.

Hm, I could get on board with that -- any other opinions?
(Perhaps there's an argument for omitting "override"
settings as well?)

regards, tom lane




Re: why pg_walfile_name() cannot be executed during recovery?

2022-04-09 Thread Bharath Rupireddy
On Fri, Apr 8, 2022 at 7:28 PM Robert Haas  wrote:
>
> On Fri, Apr 8, 2022 at 9:31 AM Bharath Rupireddy
>  wrote:
> > Fundamental question - should the pg_walfile_{name, name_offset} check
> > whether the file with the computed WAL file name exists on the server
> > right now or ever existed earlier? Right now, they don't do that, see
> > [1].
>
> I don't think that checking whether the file exists is the right
> approach. However, I do think that it's important to be precise about
> which TLI is going to be used. I think it would be reasonable to
> redefine this function (on both the primary and the standby) so that
> the TLI that is used is the one that was in effect at the time record
> at the given LSN was either written or replayed. Then, you could
> potentially use this function to figure out whether you still have the
> WAL files that are needed to replay up to some previous point in the
> WAL stream. However, what about the segments where we switched from
> one TLI to the next in the middle of the segment? There, you probably
> need both the old and the new segments, or maybe if you're trying to
> stream them you only need the new one because we have some weird
> special case that will send the segment from the new timeline when the
> segment from the old timeline is requested. So you couldn't just call
> this function on one LSN per segment and call it good, and it wouldn't
> necessarily be the case that the filenames you got back were exactly
> the ones you needed.
>
> So I'm not entirely sure this proposal is good enough, but it at least
> would have the advantage of meaning that the filename you get back is
> one that existed at some point in time and somebody used it for
> something.

Using insert tli when not in recovery and using tli of the last WAL
replayed record in crash/archive/standby recovery, seems a reasonable
choice to me. I've also added a note in the docs.

Attaching v2 with the above change. Please review it further.

Regards,
Bharath Rupireddy.
From 37a7587abb11b6ebcb82d0fbf3cff7505355a679 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 9 Apr 2022 13:14:03 +
Subject: [PATCH v2] Allow pg_walfile_{name, name_offset} to run in recovery

Right now, pg_walfile_name and pg_walfile_name_offset don't run
while the server is in recovery (standby, PITR/archive, crash).
This reduces their usability if the server opens up for
read-only connections in recovery.

This patch enables them to compute the WAL file name even in recovery,
using the timeline ID of the last successfully replayed WAL
record. When not in recovery, they continue to use the timeline ID with
which new WAL records are being inserted and flushed.
---
 doc/src/sgml/func.sgml |  8 +
 src/backend/access/transam/xlogfuncs.c | 48 +-
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5047e090db..0a08e34813 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28442,6 +28442,14 @@ postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);
 needs to be archived.

 
+   
+Both pg_walfile_name_offset and pg_walfile_name
+use a timeline ID internally to extract the write-ahead log file name. When
+not in recovery, they use the timeline ID with which new write-ahead log
+records are being inserted and flushed. When in recovery, they use the
+timeline ID of the last successfully replayed write-ahead log record.
+   
+
   
 
   
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index b61ae6c0b4..795fd8d26b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -44,6 +44,8 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
+static TimeLineID GetTLIForWALFileNameComputation(void);
+
 /*
  * pg_backup_start: set up for taking an on-line backup dump
  *
@@ -316,6 +318,30 @@ pg_last_wal_replay_lsn(PG_FUNCTION_ARGS)
 	PG_RETURN_LSN(recptr);
 }
 
+/*
+ * Get timeLine ID for computing WAL file name.
+ *
+ * When not in recovery, it returns the timeline into which new WAL is being
+ * inserted and flushed.
+ *
+ * When in crash/archive/standby recovery, it returns the timeline of the last
+ * WAL record that is successfully replayed.
+ */
+static TimeLineID
+GetTLIForWALFileNameComputation(void)
+{
+	TimeLineID	tli;
+
+	if (RecoveryInProgress())
+		(void) GetXLogReplayRecPtr(&tli);
+	else
+		tli = GetWALInsertionTimeLine();
+
+	Assert(tli > 0);
+
+	return tli;
+}
+
 /*
  * Compute an xlog file name and decimal byte offset given a WAL location,
  * such as is returned by pg_backup_stop() or pg_switch_wal().
@@ -336,13 +362,9 @@ pg_walfile_name_offset(PG_FUNCTION_ARGS)
 	TupleDesc	resultTupleDesc;
 	HeapTuple	resultHeapTuple;
 	Datum		result;
+	TimeLineID	tli;
 
-	if (RecoveryInProgress())
-		ereport(ERROR,
-(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE)

Re: Commitfest wrapup

2022-04-09 Thread Greg Stark
On Sat, 9 Apr 2022 at 06:44, Alvaro Herrera  wrote:
>
> > * Simplify some RI checks to reduce SPI overhead
>
> Move to next; a lot more work is required.

If it's going to be part of a much larger patch set I wonder if it
shouldn't just be marked Rejected and start a new thread and new CF
entry for the whole suite.

> > * Map WAL segment files on PMEM as WAL buffers
> > * Support custom authentication methods using hooks
> > * Implement INSERT SET syntax
> > * Logical insert/update/delete WAL records for custom table AMs
>
> New features.

Yeah, this bunch definitely consists of new features, just not sure if
they should be moved forward or Rejected or RWF. Some of them had some
negative feedback or the development has taken some turns that make me
think starting new patches specifically for the parts that remain may
make more sense.


-- 
greg




Re: make MaxBackends available in _PG_init

2022-04-09 Thread Julien Rouhaud
On Wed, Mar 30, 2022 at 09:30:51AM -0700, Nathan Bossart wrote:
> On Tue, Mar 29, 2022 at 12:22:21PM -0400, Robert Haas wrote:
> > It's not, though, because the original proposal was to change things
> > around so that the value of MaxBackends would have been reliable in
> > _PG_init(). If we'd done that, then extensions that are using it in
> > _PG_init() would have gone from being buggy to being not-buggy. But
> > since you advocated against that change, we instead committed
> > something that caused them to go from being buggy to failing outright.
> > That's pretty painful for people with such extensions. And IMHO, it's
> > *much* more legitimate to want to size a data structure based on the
> > value of MaxBackends than it is for extensions to override GUC values.
> > If we can make the latter use case work in a sane way, OK, although I
> > have my doubts about how sane it really is, but it can't be at the
> > expense of telling extensions that have been (incorrectly) using
> > MaxBackends in _PG_init() that we're throwing them under the bus.
> > 
> > IMHO, the proper thing to do if certain GUC values are required for an
> > extension to work is to put that information in the documentation and
> > error out at an appropriate point if the user does not follow the
> > directions. Then this issue does not arise. But there's no reasonable
> > workaround for being unable to size data structures based on
> > MaxBackends.
> 
> FWIW I would be on board with reverting all the GetMaxBackends() stuff if
> we made the value available in _PG_init() and stopped supporting GUC
> overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
> process_shared_preload_libraries_in_progress is true).

Yeah I would prefer this approach too, although it couldn't prevent extensions
from directly modifying the underlying variables, so I don't know how effective
it would be.

On the bright side, I see that citus is using SetConfigOption() to increase
max_prepared_transactions [1].  That's the only extension mentioned in that
thread that does modify some GUC, and this GUC has been marked as
PGDLLIMPORT since 2017, so it probably wasn't done that way for Windows
compatibility reasons.

In the meantime, should we add an open item?

[1] 
https://github.com/citusdata/citus/blob/master/src/backend/distributed/transaction/transaction_management.c#L656-L657




Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

2022-04-09 Thread Julien Rouhaud
On Sat, Apr 09, 2022 at 02:38:50PM +0530, Bharath Rupireddy wrote:
> On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
>  wrote:
> >
> >> >  wrote:
> >> > >
> >> > > Hi,
> >> > >
> >> > > I'm thinking if there's a way in core postgres to achieve $subject. In
> >> > > reality, the sync/async standbys can either be closer/farther (which
> >> > > means sync/async standbys can receive WAL at different times) to
> >> > > primary, especially in cloud HA environments with primary in one
> >> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> >> > > $subject may not be possible on dev systems (say, for testing some HA
> >> > > features) unless we can inject a delay in WAL senders before sending
> >> > > WAL.
> >
> > Simulation will be helpful even for end customers to simulate faults in the
> > production environments during availability zone/disaster recovery drills.
>
> Right.

I'm not sure that's actually helpful.  If you want to do some realistic testing
you need to fully simulate various network incidents, and only delaying postgres
replication is never going to be close to that.  You should instead rely on a
tool like tc, which can do much more than what $subject could ever do, and do
that for your whole HA stack.  At the very least you don't want to validate that
your setup is working as expected by just simulating a faulty postgres
replication connection while all your clients and your HA agent have no network
issues at all.




Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-04-09 Thread Bharath Rupireddy
On Tue, Jan 4, 2022 at 1:40 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> On Sun, Jan 2, 2022 at 11:56 PM Michael Paquier  wrote:
>>
>> On Sun, Jan 02, 2022 at 09:27:43PM -0800, SATYANARAYANA NARLAPURAM wrote:
>> > I noticed that pg_receivewal fails to stream when the partial file to write
>> > is not fully initialized and fails with the error message something like
>> > below. This requires an extra step of deleting the partial file that is not
>> > fully initialized before starting the pg_receivewal. Attaching a simple
>> > patch that creates a temp file, fully initialize it and rename the file to
>> > the desired wal segment name.
>>
>> Are you referring to the pre-padding when creating a new partial
>> segment, aka when we write chunks of XLOG_BLCKSZ full of zeros until
>> the file is fully created?  What kind of error did you see?  I guess
>> that a write() with ENOSPC would be more likely, but you got a
>> different problem?
>
> I see two cases, 1/ when no space  is left on the device and 2/ when the 
> process is taken down forcibly (a VM/container crash)

Yeah, these cases can occur, leaving uninitialized .partial files behind, which
can be a problem for both pg_receivewal and pg_basebackup, since both use
dir_open_for_write (CreateWalDirectoryMethod).

>>   I don't disagree with improving such cases, but we
>> should not do things so as there is a risk of leaving behind an
>> infinite set of segments in case of repeated errors
>
> Do you see a problem with the proposed patch that leaves the files behind, at 
> least in my testing I don't see any files left behind?

With the proposed patch, unpadded .partial files are no longer left behind.
Also, the v2 patch always removes a leftover .partial.temp file before creating
a new one.

>> , and partial
>> segments are already a kind of temporary file.
>
>
> if the .partial file exists but is not zero-padded up to the WAL segment size
> (WalSegSz), then open_walfile fails with the below error. I have two options
> here: 1/ continue padding the existing partial file until it is zeroed up to
> WalSegSz, or 2/ create a temp file as I did in the patch. I thought the latter
> is safe because it can handle corrupt cases as described below. Thoughts?

The temp file approach looks clean.
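
For illustration only, here is a rough sketch of that approach in plain POSIX
calls (the actual patch lives in walmethods.c and uses durable_rename() plus
the dir_data->sync handling quoted below; the helper name and paths here are
made up for this example):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch: pad a ".partial.tmp" file to the full segment size first and only
 * then rename it into place, so a crash can never leave behind a partially
 * padded ".partial" file.  Hypothetical helper, not the actual walmethods.c
 * code.
 */
static int
open_padded_walfile(const char *tmppath, const char *finalpath, size_t segsize)
{
    char    zeros[8192] = {0};
    size_t  written = 0;
    int     fd;

    unlink(tmppath);            /* discard any leftover temp file */
    fd = open(tmppath, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    while (written < segsize)
    {
        size_t  chunk = segsize - written;

        if (chunk > sizeof(zeros))
            chunk = sizeof(zeros);
        if (write(fd, zeros, chunk) != (ssize_t) chunk)
        {
            close(fd);
            unlink(tmppath);    /* don't leave an unpadded file behind */
            return -1;
        }
        written += chunk;
    }

    /* the real patch uses durable_rename() here when syncing is enabled */
    if (fsync(fd) != 0 || rename(tmppath, finalpath) != 0 ||
        lseek(fd, 0, SEEK_SET) < 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    return fd;                  /* positioned at the segment start */
}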

>> -   if (dir_data->sync)
>> +   if (shouldcreatetempfile)
>> +   {
>> +   if (durable_rename(tmpsuffixpath, targetpath) != 0)
>> +   {
>> +   close(fd);
>> +   unlink(tmpsuffixpath);
>> +   return NULL;
>> +   }
>> +   }
>>
>> durable_rename() does a set of fsync()'s, but --no-sync should not
>> flush any data.

Fixed this in v2.

Another thing I found while working on this is the way dir_open_for_write does
padding - it doesn't retry in case of partial writes of XLOG_BLCKSZ-sized
blocks, unlike what core postgres does with pg_pwritev_with_retry in
XLogFileInitInternal. Maybe dir_open_for_write should use the same approach.
Thoughts?
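
As a rough illustration of what that retry looks like (a simplified analogue
of pg_pwritev_with_retry using plain write(); the helper name is made up for
this example):

#include <errno.h>
#include <unistd.h>

/*
 * Keep writing until the whole buffer has been handed to the kernel or a
 * real error occurs, instead of treating a short write as a failure.
 * Simplified sketch of the retry behaviour discussed above, not the core
 * implementation.
 */
static ssize_t
write_all(int fd, const char *buf, size_t len)
{
    size_t  total = 0;

    while (total < len)
    {
        ssize_t rc = write(fd, buf + total, len - total);

        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing, just retry */
            return -1;          /* genuine error, including ENOSPC */
        }
        total += rc;            /* short write: loop and write the rest */
    }
    return (ssize_t) total;
}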

I fixed a couple of issues with v1 (it was using static local variables in
dir_open_for_write, it didn't use durable_rename/rename for the
dir_data->sync true/false cases, and it didn't consider compression method
none while setting shouldcreatetempfile to true), improved the comments and
added a commit message.

Please review the v2 further.

Regards,
Bharath Rupireddy.


v2-0001-Avoid-partially-padded-files-under-pg_receivewal-.patch
Description: Binary data


Re: PG DOCS - logical replication filtering

2022-04-09 Thread Alvaro Herrera
On 2022-Apr-08, Peter Smith wrote:

> PSA patch v6 to address some of Amit's review comments [1].

I think the text is good (didn't read it all, just about the first
half), but why is there one paragraph per sentence?  Seems a bit too
sparse.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."  (Andrew Sullivan)




Re: Commitfest wrapup

2022-04-09 Thread Alvaro Herrera
On 2022-Apr-08, Greg Stark wrote:

> These remaining CF entries look like they're bugs that are maybe Open
> Issues for release?
> 
> * fix spinlock contention in LogwrtResult

I don't have a good enough grip on barriers needed for this, so I'd
rather move it to pg16 to have time for further study.

> * fix psql pattern handling

Sounds like a bugfix, but how old is it, and how backpatchable would a fix be?
The thread is quite long.

> * Avoid erroring out when unable to remove or parse logical rewrite
> files to save checkpoint work
> * standby recovery fails when re-replaying due to missing directory
> which was removed in previous replay.
> * Logical replication failure "ERROR: could not map filenode
> "base/13237/442428" to relation OID" with catalog modifying txns
> * Fix pg_rewind race condition just after promotion
> * pg_receivewal fail to streams when the partial file to write is not
> fully initialized present in the wal receiver directory

Sound like bugfixes to be backpatched.

> A couple minor documentation, testing, and diagnostics patches that
> may be committable even after feature freeze?
> 
> * Improve role attributes docs

Let's get it pushed.

There's also a logical replication "row filter" doc patch that I don't
see in your list; we should get that in.  Maybe it's not in the CF.  I had a
quibble with paragraph length in that one; it requires a stylistic edit.

https://postgr.es/m/cahut+pvyxmedyy-jhat9ysfephv0ju2-cz8f_npvhup0b95...@mail.gmail.com

> * Reloption tests iprovement. Test resetting illegal option that was
> actually set for some reason

I think we should push this one.

> * jit_warn_above_fraction parameter

Unsure; seems a good patch, but is there enough consensus?

> * Simplify some RI checks to reduce SPI overhead

Move to next; a lot more work is required.

> * Map WAL segment files on PMEM as WAL buffers
> * Support custom authentication methods using hooks
> * Implement INSERT SET syntax
> * Logical insert/update/delete WAL records for custom table AMs

New features.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"




Re: Mingw task for Cirrus CI

2022-04-09 Thread Alvaro Herrera
On 2022-Apr-08, Andres Freund wrote:

> I just realized that the second find is pretty expensive compared to the
> first.
> 
> time find "$sourcetree" -type d \( \( -name CVS -prune \) -o \( -name .git 
> -prune \) -o -print \) | grep -v "$sourcetree/doc/src/sgml/\+" > /dev/null
> real  0m0.019s
> user  0m0.008s
> sys   0m0.017s
> 
> second:
> time find "$sourcetree" -name Makefile -print -o -name GNUmakefile -print | 
> grep -v "$sourcetree/doc/src/sgml/images/" > /dev/null
> 
> real  0m0.118s
> user  0m0.071s
> sys   0m0.053s

Hmm, ISTM that time can be reduced a bit with -prune,

time find "$sourcetree"  \( -name .git -prune \) -o -name Makefile -print -o 
-name GNUmakefile -print | grep -v "$sourcetree/doc/src/sgml/images/" > 
/dev/null

I thought it might work to do away with the grep and use find's -path
instead to prune that subdir, but "time" shows almost no difference for
me:

time find "$sourcetree"  \( -name .git -prune \) -o \( -path 
'*doc/src/sgml/images' -prune \) -o -name Makefile -print -o -name GNUmakefile 
-print > /dev/null

Maybe find's -path is equally expensive.  Still, that seems like a good
change anyway.  (The times are lower on my system than those you show.)

> I think we could just obsolete the second find, by checking for the existence
> of Makefile / GNUmakefile in the first loop...

Hmm, if that's going to require one more shell command per dir, it
sounds more expensive. It's worth trying, I guess.

> The invocation of ln -s is quite measurable - looks like it's mostly the
> process startup overhead (on linux, at least). Doing a ln --version > 
> /dev/null
> each iteration takes about the same time as actually creating the symlinks.

Is this running with some locale settings enabled?  Maybe we can save
some time by making sure we're under LC_ALL=C or something like that, to
avoid searching for translation files.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes"   (Germán Poo)




Re: How about a psql backslash command to show GUCs?

2022-04-09 Thread Christoph Berg
Re: Tom Lane
> Looks like the consensus has shifted to \dconfig.  I'll do it like that.

A bit late to the party, but two more ideas:


The name has evolved from \dcp over various longer \d-things to the
more verbose \dconfig. How about we evolve it even more and just call
it \config? That would be much easier to remember - in fact, after
seeing the patch the other day, I wanted to try it today and I was
confused when \config didn't work, and had to read the git log to see
what it's actually called.

It also doesn't conflict with tab completion too much; \conf
would work.


The other bit is hiding non-default values. The output of "\dconfig" by
itself is very long and not very interesting. I have this in my .psqlrc that
I use very often on servers I'm visiting:

\set config 'SELECT name, current_setting(name), CASE source WHEN
$$configuration file$$ THEN regexp_replace(sourcefile, $$^/.*/$$,
$$$$)||$$:$$||sourceline ELSE source END FROM pg_settings WHERE source <>
$$default$$;'

I would think that if \dconfig showed the non-default settings only,
it would be much more useful; the full list would still be available
with "\dconfig *". This is in line with \dt only showing tables on the
search_path, and "\dt *.*" showing all.

Christoph




Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

2022-04-09 Thread Bharath Rupireddy
On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
 wrote:
>
>> >  wrote:
>> > >
>> > > Hi,
>> > >
>> > > I'm thinking if there's a way in core postgres to achieve $subject. In
>> > > reality, the sync/async standbys can either be closer/farther (which
>> > > means sync/async standbys can receive WAL at different times) to
>> > > primary, especially in cloud HA environments with primary in one
>> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
>> > > $subject may not be possible on dev systems (say, for testing some HA
>> > > features) unless we can inject a delay in WAL senders before sending
>> > > WAL.
>
> Simulation will be helpful even for end customers to simulate faults in the 
> production environments during availability zone/disaster recovery drills.

Right.

>> > > How about having two developer-only GUCs {async,
>> > > sync}_wal_sender_delay? When set, the async and sync WAL senders will
>> > > delay sending WAL by {async, sync}_wal_sender_delay
>> > > milliseconds/seconds? Although, I can't think of any immediate use, it
>> > > will be useful someday IMO, say for features like [1], if it gets in.
>> > > With this set of GUCs, one can even add core regression tests for HA
>> > > features.
>
> I would suggest doing this at the slot level, instead of two GUCs that 
> control the behavior of all the slots (physical/logical). Something like 
> "pg_suspend_replication_slot and pg_Resume_replication_slot"?

Having the control at the replication slot level seems more reasonable than at
the WAL sender level. As there can be many slots on the primary, we need a way
to specify which slots should be delayed and by how much before sending WAL.
If we used GUCs for that, they would have to be list types, and I'm not sure
that would come out well.

Instead, we could add two functions (superuser-only, or usable by roles with
the replication attribute) such as pg_replication_slot_set_delay(slot_name,
delay_in_milliseconds) and pg_replication_slot_unset_delay(slot_name).
pg_replication_slot_set_delay would set ReplicationSlot->delay, and the
WAL sender would check MyReplicationSlot->delay > 0 and wait before
sending WAL. pg_replication_slot_unset_delay would reset
ReplicationSlot->delay to 0; alternatively,
pg_replication_slot_set_delay(slot_name, 0) could serve the same purpose,
so only a single function would be needed.
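
A minimal sketch of the check on the WAL sender side (ReplicationSlot->delay
and the function name below are assumptions of this proposal, not existing
fields or APIs):

#include "postgres.h"
#include "miscadmin.h"
#include "replication/slot.h"

/*
 * Hypothetical: before sending the next batch of WAL, honor a per-slot
 * delay previously set with pg_replication_slot_set_delay().
 * ReplicationSlot->delay (in milliseconds) does not exist today; it is an
 * assumption of this proposal.
 */
static void
WalSndDelayIfRequested(void)
{
    if (MyReplicationSlot != NULL && MyReplicationSlot->delay > 0)
        pg_usleep(MyReplicationSlot->delay * 1000L);    /* ms -> us */
}

A real implementation would presumably wait on the process latch rather than
sleep, so the sender stays responsive to shutdown and config reload, but the
idea is the same.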

If the users want a standby to receive WAL with a delay, they can use
pg_replication_slot_set_delay after creating the replication slot.

Thoughts?

> Alternatively a GUC on the standby side instead of primary so that the wal 
> receiver stops responding to the wal sender?

I think the wal_receiver_status_interval GUC on the WAL receiver already
achieves the above, i.e. not responding to the primary at all: one can set
wal_receiver_status_interval to, say, 1 day.

[1]
    {
        {"wal_receiver_status_interval", PGC_SIGHUP, REPLICATION_STANDBY,
            gettext_noop("Sets the maximum interval between WAL receiver status reports to the sending server."),
            NULL,
            GUC_UNIT_S
        },
        &wal_receiver_status_interval,
        10, 0, INT_MAX / 1000,
        NULL, NULL, NULL
    },

Regards,
Bharath Rupireddy.