Re: Why is hot_standby_feedback off by default?
On Sun, Oct 22, 2023 at 4:56 AM Vik Fearing wrote:
> On 10/22/23 09:50, sirisha chamarthi wrote:
> > Is there any specific reason hot_standby_feedback default is set to off?
>
> Yes. No one wants a rogue standby to ruin production.

Agreed. I believe that any reasonable use of a standby server for queries requires hot_standby_feedback to be turned on. Otherwise, we can see query cancellations and increased replication lag caused by conflicts while replaying vacuum cleanup records on the standby, which in turn means longer failover times if the server is configured for disaster recovery plus read scaling. The recent logical decoding on standby feature also requires hot_standby_feedback to be turned on to avoid slot invalidation [1]. If there is no requirement to query the standby, admins can always turn hot_standby off. My goal here is to minimize the amount of configuration tuning required to use these features.

[1]: https://www.postgresql.org/docs/current/logicaldecoding-explanation.html

Thanks,
Sirisha
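[For readers following along, a minimal sketch of enabling the feedback on an existing standby today; hot_standby_feedback only needs a reload, not a restart, and the commands below are illustrative rather than part of any patch.]

-- Run on the standby; the setting is reload-level (sighup).
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();
-- Confirm the new value took effect.
SHOW hot_standby_feedback;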
Re: Why is hot_standby_feedback off by default?
Hi Andres,

On Sun, Oct 22, 2023 at 12:08 PM Andres Freund wrote:
> Hi,
>
> On October 22, 2023 4:56:15 AM PDT, Vik Fearing wrote:
> > On 10/22/23 09:50, sirisha chamarthi wrote:
> > > Is there any specific reason hot_standby_feedback default is set to off?
> >
> > Yes. No one wants a rogue standby to ruin production.
>
> Medium term, I think we need an approximate xid->"time of assignment"
> mapping that's continually maintained on the primary. One of the things
> that'd allow us to do is introduce a GUC to control the maximum effect of
> hs_feedback on the primary, in a useful unit. Numbers of xids are not a
> useful unit (100k xids is forever on some systems, a few minutes at best on
> others, the rate is not necessarily that steady when plpgsql exception
> handlers are used, ...)

+1 on this idea. Please let me give this a try.

Thanks,
Sirisha
Why is hot_standby_feedback off by default?
Hi Hackers,

Is there any specific reason hot_standby_feedback defaults to off? I see some explanation in thread [1] about recovery_min_apply_delay values > 0 causing table bloat. However, recovery_min_apply_delay is 0 by default, so if a server admin wants to change that value, they can change hot_standby_feedback as well if needed, right?

Thanks!

[1]: https://www.postgresql.org/message-id/55f981ec.5040...@gmx.net
Re: Fix documentation for max_wal_size and min_wal_size
Hi,

On Mon, Apr 17, 2023 at 9:38 PM Michael Paquier wrote:
> On Mon, Apr 17, 2023 at 07:57:58PM -0700, sirisha chamarthi wrote:
> > On Fri, Apr 14, 2023 at 1:01 AM Kyotaro Horiguchi <horikyota@gmail.com> wrote:
> > > So, I personally think it should be written like this: "The default
> > > size is 80MB. However, if you have changed the WAL segment size from
> > > the default of 16MB, it will be five times the segment size.", but I'm
> > > not sure what the others think about this..
>
> Yes, I was under the impression that this should mention 16MB, but
> I'd also add a note about initdb when a non-default value is specified
> for the segment size.

How about the text below?

"The default size is 80MB. However, if you have changed the WAL segment size from the default of 16MB with the initdb option --wal-segsize, it will be five times the segment size."
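[To illustrate the proposed wording, a small sketch of how the defaults scale with the segment size, following the 5x/64x formulas quoted earlier in this thread; the 64MB segment size is only an example.]

-- Cluster initialized with the default 16MB segments:
SHOW wal_segment_size;   -- 16MB
SHOW min_wal_size;       -- 80MB  (5 x 16MB)
SHOW max_wal_size;       -- 1GB   (64 x 16MB)
-- A cluster initialized with "initdb --wal-segsize=64" would instead
-- default to min_wal_size = 320MB (5 x 64MB), per the same formula.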
Re: Fix documentation for max_wal_size and min_wal_size
Hi,

On Fri, Apr 14, 2023 at 1:01 AM Kyotaro Horiguchi wrote:
> At Thu, 13 Apr 2023 12:01:04 -0700, sirisha chamarthi <sirichamarth...@gmail.com> wrote in
> > The documentation [1] says max_wal_size and min_wal_size defaults are 1GB
> > and 80 MB respectively. However, these are configured based on the
> > wal_segment_size and documentation is not clear about it. Attached a patch
> > to fix the documentation.
> >
> > [1] https://www.postgresql.org/docs/devel/runtime-config-wal.html
>
> Good catch! Now wal_segment_size is easily changed.
>
> -The default is 1 GB.
> +The default value is configured to maximum of 64 times the
> wal_segment_size or 1 GB.
> -The default is 80 MB.
> +The default value is configured to maximum of 5 times the
> wal_segment_size or 80 MB.
>
> However, I believe that most users don't change the WAL segment size,
> so the primary information is that the default sizes are 1GB and 80MB.
>
> So, I personally think it should be written like this: "The default
> size is 80MB. However, if you have changed the WAL segment size from
> the default of 16MB, it will be five times the segment size.", but I'm
> not sure what the others think about this..

This looks good to me.

Thanks,
Sirisha
Fix documentation for max_wal_size and min_wal_size
Hi,

The documentation [1] says the max_wal_size and min_wal_size defaults are 1 GB and 80 MB respectively. However, these defaults are actually computed from wal_segment_size, and the documentation is not clear about that. Attached is a patch to fix the documentation.

[1] https://www.postgresql.org/docs/devel/runtime-config-wal.html

Thanks,
Sirisha

0001-Fix-documentation-for-max_wal_size-and-min_wal_size-.patch
Description: Binary data
Add ps display while waiting for wal in read_local_xlog_page_guts
Hi,

pg_create_logical_replication_slot can take longer than usual on a standby when there is no activity on the primary. We don't have enough information in pg_stat_activity or the process title to debug why it is taking so long. Attached is a small patch to update the process title while waiting for WAL in read_local_xlog_page_guts. Any thoughts on introducing a new wait event as well?

For example, in my setup, slot creation took 8 minutes 13 seconds. It only succeeded after I ran select txid_current() on the primary.

postgres=# select pg_create_logical_replication_slot('s1','test_decoding');
 pg_create_logical_replication_slot
-------------------------------------
 (s1,0/C096D10)
(1 row)

Time: 493365.995 ms (08:13.366)

Thanks,
Sirisha

0001-set-ps-display_while-waiting-for-wal.patch
Description: Binary data
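[As context for the patch, about the only thing an admin can do today is poll pg_stat_activity from another session; a rough sketch is below (the filter is illustrative). The point of the patch is that this does not make the "waiting for WAL" state obvious.]

-- What is the slot-creating backend doing right now?
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE query ILIKE '%pg_create_logical_replication_slot%'
  AND pid <> pg_backend_pid();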
Wasted Vacuum cycles when OldestXmin is not moving
Hi Hackers,

Vacuum is not able to clean up dead tuples when OldestXmin is not moving (because of a long-running transaction, or because hot_standby_feedback is behind). Even though OldestXmin has not moved since the last time it was checked, vacuum keeps retrying every autovacuum_naptime, wasting CPU cycles and I/O when the pages are not in memory. Can we not bypass the dead tuple collection and cleanup step until OldestXmin has advanced?

The log below shows the vacuum running every minute.

2023-01-09 08:13:01.364 UTC [727219] LOG:  automatic vacuum of table "postgres.public.t1": index scans: 0
        pages: 0 removed, 6960 remain, 6960 scanned (100.00% of total)
        tuples: 0 removed, 1572864 remain, 786432 are dead but not yet removable
        removable cutoff: 852, which was 2 XIDs old when operation ended
        frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
        index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        buffer usage: 13939 hits, 0 misses, 0 dirtied
        WAL usage: 0 records, 0 full page images, 0 bytes
        system usage: CPU: user: 0.15 s, system: 0.00 s, elapsed: 0.29 s
2023-01-09 08:14:01.363 UTC [727289] LOG:  automatic vacuum of table "postgres.public.t1": index scans: 0
        pages: 0 removed, 6960 remain, 6960 scanned (100.00% of total)
        tuples: 0 removed, 1572864 remain, 786432 are dead but not yet removable
        removable cutoff: 852, which was 2 XIDs old when operation ended
        frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
        index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        buffer usage: 13939 hits, 0 misses, 0 dirtied
        WAL usage: 0 records, 0 full page images, 0 bytes
        system usage: CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.29 s

Thanks,
Sirisha
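[For context, a quick way to see what is holding OldestXmin back is to look at the oldest backend_xmin across sessions and at the slot horizons; a sketch, not part of any patch:]

-- Sessions (including walsenders with hot_standby_feedback) holding back OldestXmin
SELECT pid, backend_xmin, age(backend_xmin) AS xmin_age, state, left(query, 40) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Replication slots pinning old data/catalog horizons
SELECT slot_name, xmin, catalog_xmin FROM pg_replication_slots;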
Re: Introduce a new view for checkpointer related stats
On Thu, Dec 1, 2022 at 9:50 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy wrote:
> >
> > I don't have a strong opinion about changing column names. However, if
> > we were to change it, I prefer to use names that
> > PgStat_CheckpointerStats has. BTW, that's what
> > PgStat_BgWriterStats/pg_stat_bgwriter and
> > PgStat_ArchiverStats/pg_stat_archiver uses.
>
> After thinking about this a while, I convinced myself to change the
> column names to be a bit more meaningful. I still think having
> checkpoints in the column names is needed because it also has other
> backend related columns. I'm attaching the v4 patch for further
> review.
>
> CREATE VIEW pg_stat_checkpointer AS
>     SELECT
>         pg_stat_get_timed_checkpoints() AS timed_checkpoints,
>         pg_stat_get_requested_checkpoints() AS requested_checkpoints,
>         pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
>         pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
>         pg_stat_get_buf_written_checkpoints() AS buffers_written_checkpoints,
>         pg_stat_get_buf_written_backend() AS buffers_written_backend,
>         pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend,
>         pg_stat_get_checkpointer_stat_reset_time() AS stats_reset;

IMO, "buffers_written_checkpoints" is confusing. What do you think?

> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
Re: Prefetch the next tuple's memory during seqscans
On Tue, Nov 22, 2022 at 11:44 PM David Rowley wrote:
> On Wed, 23 Nov 2022 at 20:29, sirisha chamarthi wrote:
> > I ran your test1 exactly like your setup except the row count is 300 (with
> > 13275 blocks). Shared_buffers is 128MB and the hardware configuration
> > details are at the bottom of the mail. It appears Master + 0001 + 0005
> > regressed compared to master slightly.
>
> Thank you for running these tests.
>
> Can you share if the plans used for these queries was a parallel plan?
> I had set max_parallel_workers_per_gather to 0 to remove the
> additional variability from parallel query.
>
> Also, 13275 blocks is 104MBs, does EXPLAIN (ANALYZE, BUFFERS) indicate
> that all pages were in shared buffers? I used pg_prewarm() to ensure
> they were so that the runs were consistent.

I reran the test with max_parallel_workers_per_gather = 0 and with pg_prewarm. It appears I missed some step while testing on master; thanks for sharing the details. The new numbers show that master has higher latency than Master + 0001 + 0005.

Master
Before vacuum: latency average = 452.881 ms
After vacuum:  latency average = 393.880 ms

Master + 0001 + 0005
Before vacuum: latency average = 441.832 ms
After vacuum:  latency average = 369.591 ms
Re: Prefetch the next tuple's memory during seqscans
On Tue, Nov 22, 2022 at 1:58 PM David Rowley wrote:
> On Thu, 3 Nov 2022 at 06:25, Andres Freund wrote:
> > Attached is an experimental patch/hack for that. It ended up being more
> > beneficial to make the access ordering more optimal than prefetching the tuple
> > contents, but I'm not at all sure that's the be-all-end-all.
>
> Thanks for writing that patch. I've been experimenting with it.
>
> I tried unrolling the loop (patch 0003) as you mentioned in:
>
> + * FIXME: Worth unrolling so that we don't fetch the same cacheline
> + * over and over, due to line items being smaller than a cacheline?
>
> but didn't see any gains from doing that.
>
> I also adjusted your patch a little so that instead of doing:
>
> - OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */
> + OffsetNumber *rs_vistuples;
> + OffsetNumber rs_vistuples_d[MaxHeapTuplesPerPage]; /* their offsets */
>
> to work around the issue of having to populate rs_vistuples_d in
> reverse, I added a new field called rs_startindex to mark where the
> first element in the rs_vistuples array is. The way you wrote it seems
> to require fewer code changes, but per the FIXME comment you left, I
> get the idea you just did it the way you did to make it work enough
> for testing.
>
> I'm quite keen to move forward in committing the 0001 patch to add the
> pg_prefetch_mem macro. What I'm a little undecided about is what the
> best patch is to commit first to make use of the new macro.
>
> I did some tests on the attached set of patches:
>
> alter system set max_parallel_workers_per_gather = 0;
> select pg_reload_conf();
>
> create table t as select a from generate_series(1,1000)a;
> alter table t set (autovacuum_enabled=false);
>
> $ cat bench.sql
> select * from t where a = 0;
>
> psql -c "select pg_prewarm('t');" postgres
>
> -- Test 1 no frozen tuples in "t"
>
> Master (@9c6ad5eaa):
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 383.332 ms
> latency average = 375.747 ms
> latency average = 376.090 ms
>
> Master + 0001 + 0002:
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 370.133 ms
> latency average = 370.149 ms
> latency average = 370.157 ms
>
> Master + 0001 + 0005:
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 372.662 ms
> latency average = 371.034 ms
> latency average = 372.709 ms
>
> -- Test 2 "select count(*) from t" with all tuples frozen
>
> $ cat bench1.sql
> select count(*) from t;
>
> psql -c "vacuum freeze t;" postgres
> psql -c "select pg_prewarm('t');" postgres
>
> Master (@9c6ad5eaa):
> $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 406.238 ms
> latency average = 407.029 ms
> latency average = 406.962 ms
>
> Master + 0001 + 0005:
> $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 345.470 ms
> latency average = 345.775 ms
> latency average = 345.354 ms
>
> My current thoughts are that it might be best to go with 0005 to start
> with. I know Melanie is working on making some changes in this area,
> so perhaps it's best to leave 0002 until that work is complete.

I ran your test1 exactly like your setup except the row count is 300 (with 13275 blocks). Shared_buffers is 128MB and the hardware configuration details are at the bottom of the mail. It appears Master + 0001 + 0005 regressed compared to master slightly.

Master (@56d0ed3b756b2e3799a7bbc0ac89bc7657ca2c33)
Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency"
latency average = 430.287 ms
After vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency"
latency average = 369.046 ms

Master + 0001 + 0002:
Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency"
latency average = 427.983 ms
After vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency"
latency average = 367.185 ms

Master + 0001 + 0005:
Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency"
latency average = 447.045 ms
After vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10 postgres | grep -E "^latency"
latency average = 374.484 ms

lscpu output:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:
Re: Catalog_xmin is not advanced when a logical slot is lost
On Mon, Nov 21, 2022 at 10:56 AM Alvaro Herrera wrote:
> On 2022-Nov-21, sirisha chamarthi wrote:
>
> > I have an old .partial file in the data directory to reproduce this.
>
> I don't think the .partial file is in itself important. But I think
> this whole thing is a distraction.

Yes, sorry for the confusion.

> I managed to reproduce it
> eventually, by messing with the slot and WAL at random, and my
> conclusion is that we shouldn't mess with this at all for this bugfix.

Agreed.

> Instead I'm going to do what Ashutosh mentioned at the start, which is
> to verify both the restart_lsn and the invalidated_at, when deciding
> whether to ignore the slot.

Sounds good to me. Thanks!

> It seems to me that there is a bigger mess here, considering that we use
> the effective_xmin in some places and the other xmin (the one that's
> saved to disk) in others. I have no patience for trying to disentangle
> that at this point, though.
>
> --
> Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
> "Having your biases confirmed independently is how scientific progress is
> made, and hence made our great society what it is today" (Mary Gardiner)
Re: Catalog_xmin is not advanced when a logical slot is lost
On Mon, Nov 21, 2022 at 10:40 AM sirisha chamarthi <sirichamarth...@gmail.com> wrote:
> On Mon, Nov 21, 2022 at 10:11 AM Alvaro Herrera wrote:
> > On 2022-Nov-21, sirisha chamarthi wrote:
> >
> > > It appears to be. wal_sender is setting restart_lsn to a valid LSN even
> > > when the slot is invalidated.
> >
> > > postgres@pgvm:~$ /usr/local/pgsql/bin/pg_receivewal -S s1 -D .
> > > pg_receivewal: error: unexpected termination of replication stream: ERROR:
> > > requested WAL segment 000100EB has already been removed
> > > pg_receivewal: disconnected; waiting 5 seconds to try again
> > > ^Cpostgres@pgvm:~$ /usr/local/pgsql/bin/psql
> > > psql (16devel)
> > > Type "help" for help.
> > >
> > > postgres=# select * from pg_replication_slots;
> > > server closed the connection unexpectedly
> > >         This probably means the server terminated abnormally
> > >         before or while processing the request.
> >
> > Whoa, I cannot reproduce this :-(
>
> I have an old .partial file in the data directory to reproduce this.
>
> postgres=# select * from pg_replication_slots;
>  slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
> -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
>  s2        |        | physical  |        |          | f         | f      |            |      |              | 2/DC00      |                     | lost       |               | f
> (1 row)
>
> postgres=# \q
> postgres@pgvm:~$ ls
> 0001000200D8  0001000200D9  0001000200DA  0001000200DB  0001000200DC.partial

Just to be clear, it was hitting the assert I added in slotfuncs.c, but not in the code you mentioned. Apologies for the confusion. Also, it appears that in the case above the slot is not invalidated yet, because the checkpointer has not run, even though the state says it is lost.

> > --
> > Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
> > "Java is clearly an example of money oriented programming" (A. Stepanov)
Re: Catalog_xmin is not advanced when a logical slot is lost
On Mon, Nov 21, 2022 at 10:11 AM Alvaro Herrera wrote:
> On 2022-Nov-21, sirisha chamarthi wrote:
>
> > It appears to be. wal_sender is setting restart_lsn to a valid LSN even
> > when the slot is invalidated.
>
> > postgres@pgvm:~$ /usr/local/pgsql/bin/pg_receivewal -S s1 -D .
> > pg_receivewal: error: unexpected termination of replication stream: ERROR:
> > requested WAL segment 000100EB has already been removed
> > pg_receivewal: disconnected; waiting 5 seconds to try again
> > ^Cpostgres@pgvm:~$ /usr/local/pgsql/bin/psql
> > psql (16devel)
> > Type "help" for help.
> >
> > postgres=# select * from pg_replication_slots;
> > server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
>
> Whoa, I cannot reproduce this :-(

I have an old .partial file in the data directory to reproduce this.

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        |        | physical  |        |          | f         | f      |            |      |              | 2/DC00      |                     | lost       |               | f
(1 row)

postgres=# \q
postgres@pgvm:~$ ls
0001000200D8  0001000200D9  0001000200DA  0001000200DB  0001000200DC.partial

> --
> Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
> "Java is clearly an example of money oriented programming" (A. Stepanov)
Re: Catalog_xmin is not advanced when a logical slot is lost
On Mon, Nov 21, 2022 at 9:12 AM Alvaro Herrera wrote:
> On 2022-Nov-21, sirisha chamarthi wrote:
>
> > On Mon, Nov 21, 2022 at 8:05 AM Alvaro Herrera wrote:
> > > Thank you. I had pushed mine for CirrusCI to test, and it failed the
> > > assert I added in slot.c:
> > > https://cirrus-ci.com/build/4786354503548928
> > > Not yet sure why, looking into it.
> >
> > Can this be because restart_lsn is not set to InvalidXLogRecPtr for the
> > physical slots?
>
> Hmm, that makes no sense. Is that yet another bug? Looking.

It appears to be. wal_sender is setting restart_lsn to a valid LSN even when the slot is invalidated.

postgres=# select pg_create_physical_replication_slot('s1');
 pg_create_physical_replication_slot
--------------------------------------
 (s1,)
(1 row)

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s1        |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |   -8254390272 | f
(1 row)

postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s1        |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |   -8374095064 | f
(1 row)

postgres=# \q
postgres@pgvm:~$ /usr/local/pgsql/bin/pg_receivewal -S s1 -D .
pg_receivewal: error: unexpected termination of replication stream: ERROR: requested WAL segment 000100EB has already been removed
pg_receivewal: disconnected; waiting 5 seconds to try again
^Cpostgres@pgvm:~$ /usr/local/pgsql/bin/psql
psql (16devel)
Type "help" for help.

postgres=# select * from pg_replication_slots;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> ^C
!?>

In the log:

2022-11-21 17:31:48.159 UTC [3953664] STATEMENT:  START_REPLICATION SLOT "s1" 0/EB00 TIMELINE 1
TRAP: failed Assert("XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)"), File: "slotfuncs.c", Line: 371, PID: 3953707

> --
> Álvaro Herrera                 48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
> "No es bueno caminar con un hombre muerto"
Re: Catalog_xmin is not advanced when a logical slot is lost
On Mon, Nov 21, 2022 at 8:05 AM Alvaro Herrera wrote:
> On 2022-Nov-21, sirisha chamarthi wrote:
>
> > > > I am a fan of stricter, all-assumption-covering conditions. In case we
> > > > don't want to check restart_lsn, an Assert might be useful to validate
> > > > our assumption.
> > >
> > > Agreed. I'll throw in an assert.
> >
> > Changed this in the patch to throw an assert.
>
> Thank you. I had pushed mine for CirrusCI to test, and it failed the
> assert I added in slot.c:
> https://cirrus-ci.com/build/4786354503548928
> Not yet sure why, looking into it.

Can this be because restart_lsn is not set to InvalidXLogRecPtr for the physical slots? My repro is as follows:

select pg_create_physical_replication_slot('s5');
// Load some data to invalidate the slot

postgres@pgvm:~$ /usr/local/pgsql/bin/pg_receivewal -S s5 -D .
pg_receivewal: error: unexpected termination of replication stream: ERROR: requested WAL segment 000100EB has already been removed
pg_receivewal: disconnected; waiting 5 seconds to try again
pg_receivewal: error: unexpected termination of replication stream: ERROR: requested WAL segment 000100EB has already been removed
pg_receivewal: disconnected; waiting 5 seconds to try again
^Cpostgres@pgvm:~$ /usr/local/pgsql/bin/psql
psql (16devel)
Type "help" for help.

postgres=# select * from pg_replication_slots;
 slot_name | plugin        | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s3        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          769 |             | 0/A992E7D0          | lost       |               | f
 s5        |               | physical  |        |          | f         | f      |            |      |              | 0/EB00      |                     | lost       |               | f

> --
> Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Re: Catalog_xmin is not advanced when a logical slot is lost
Thanks Alvaro and Ashutosh for your comments.

On Mon, Nov 21, 2022 at 6:20 AM Alvaro Herrera wrote:
> On 2022-Nov-21, Ashutosh Bapat wrote:
>
> > Maybe. In that case pg_get_replication_slots() should be changed. We
> > should use the same criteria to decide whether a slot is invalidated
> > or not at all the places.
>
> Right.

Agreed.

> > I am a fan of stricter, all-assumption-covering conditions. In case we
> > don't want to check restart_lsn, an Assert might be useful to validate
> > our assumption.
>
> Agreed. I'll throw in an assert.

Changed this in the patch to throw an assert.

> --
> Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/

0002-Ignore-invalidated-slots-while-computing-the-oldest-.patch
Description: Binary data
Proposal: Allow user with pg_monitor role to call pg_stat_reset* functions
Hi Hackers,

At present, calling the pg_stat_reset* functions requires superuser access unless execute permission is explicitly granted on them. In this thread, I am proposing to grant execute on them to users with the pg_monitor role. This comes in handy for monitoring users (members of pg_monitor) who want to reset the stats and analyze them fresh. Do you see any concerns with this approach?

Thanks,
Sirisha
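[For reference, the current workaround is to grant execute on each function individually; a sketch of what an administrator has to run today (function list abbreviated, not exhaustive):]

GRANT EXECUTE ON FUNCTION pg_stat_reset() TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_stat_reset_shared(text) TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) TO pg_monitor;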
Re: Fix comments atop pg_get_replication_slots
Amit, thanks for looking into this!

On Sun, Nov 20, 2022 at 11:38 PM Amit Kapila wrote:
> On Mon, Nov 21, 2022 at 12:45 PM sirisha chamarthi wrote:
> >
> > Hi Hackers,
> >
> > The comments atop seem to indicate that it only shows active replication
> > slots. The comment is ambiguous, as the function also shows all slots,
> > including lost and inactive ones. Attached a small patch to fix it.
>
> I agree that it is a bit confusing. How about "SQL SRF showing all
> replication slots that currently exist on the database cluster"?

Looks good to me. Attached a patch for the same.

> --
> With Regards,
> Amit Kapila.

0002-Fix-atop-pg_get_replication_slots-function-to-reflec.patch
Description: Binary data
Fix comments atop pg_get_replication_slots
Hi Hackers,

The comments atop pg_get_replication_slots seem to indicate that it only shows active replication slots. The comment is ambiguous, as the function also shows all slots, including lost and inactive ones. Attached is a small patch to fix it.

Thanks,
Sirisha

0001-Fix-atop-pg_get_replication_slots-function-to-reflec.patch
Description: Binary data
Catalog_xmin is not advanced when a logical slot is lost
Hi Hackers,

Forking this thread from the discussion [1] as suggested by Amit. Catalog_xmin is not advanced when a logical slot is invalidated (lost) until the invalidated slot is dropped. The attached small patch ignores invalidated slots while computing the oldest xmin; the output after the patch is shown below.

postgres=# select * from pg_replication_slots;
 slot_name | plugin        | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          771 | 0/30466368  | 0/304663A0          | reserved   |      28903824 | f
(1 row)

postgres=# create table t2(c int, c1 char(100));
CREATE TABLE
postgres=# drop table t2;
DROP TABLE
postgres=# vacuum pg_class;
VACUUM
postgres=# select n_dead_tup from pg_stat_all_tables where relname = 'pg_class';
 n_dead_tup
------------
          2
(1 row)

postgres=# select * from pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+------------
(0 rows)

postgres=# insert into t1 select * from t1;
INSERT 0 2097152
postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_replication_slots;
 slot_name | plugin        | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          771 |             | 0/304663A0          | lost       |               | f
(1 row)

postgres=# vacuum pg_class;
VACUUM
postgres=# select n_dead_tup from pg_stat_all_tables where relname = 'pg_class';
 n_dead_tup
------------
          0
(1 row)

[1] https://www.postgresql.org/message-id/flat/CAKrAKeW-sGqvkw-2zKuVYiVv%3DEOG4LEqJn01RJPsHfS2rQGYng%40mail.gmail.com

Thanks,
Sirisha

0001-Ignore-invalidated-slots-while-computing-the-oldest-.patch
Description: Binary data
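[Without the patch, the only way to let catalog_xmin move forward is to find and drop the lost slot manually; a sketch of that workaround, with the slot name taken from the repro above:]

-- Find slots that have been invalidated but still pin catalog_xmin
SELECT slot_name, wal_status, catalog_xmin
FROM pg_replication_slots
WHERE wal_status = 'lost';

-- Dropping the lost slot finally lets vacuum remove the dead catalog tuples
SELECT pg_drop_replication_slot('s2');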
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 12:32 AM Kyotaro Horiguchi wrote:
> I don't think walsenders fetching segments from archive is totally
> stupid. With that feature, we can use fast and expensive but small
> storage for pg_wal, while avoiding replication from dying even in
> emergency.

Thanks! If there is general agreement on this in this forum, I would like to start working on that patch.

> At Tue, 8 Nov 2022 19:39:58 -0800, sirisha chamarthi <sirichamarth...@gmail.com> wrote in
> > > If it's a streaming replication slot, the standby will anyway jump to
> > > archive mode ignoring the replication slot and the slot will never be
> > > usable again unless somebody creates a new replication slot and
> > > provides it to the standby for reuse.
> > > If it's a logical replication slot, the subscriber will start to
> > > diverge from the publisher and the slot will have to be revived
> > > manually i.e. created again.
> >
> > Physical slots can be revived with the standby downloading the WAL from the
> > archive directly. This patch is helpful for the logical slots.
>
> However, supposing that WalSndSegmentOpen() fetches segments from
> archive as the fallback and that succeeds, the slot can survive
> missing WAL in pg_wal in the first place. So this patch doesn't seem
> to be needed for the purpose.

Agreed. If we add the proposed support, we don't need this patch.

> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 2:37 AM Amit Kapila wrote:
> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >
> > Is the intent of setting restart_lsn to InvalidXLogRecPtr to disallow reviving the slot?
>
> I think the intent is to compute the correct value for
> replicationSlotMinLSN as we use restart_lsn for it and using the
> invalidated slot's restart_lsn value for it doesn't make sense.

Correct. If a slot is invalidated (lost), then shouldn't we also ignore the slot when computing the catalog_xmin? I don't see it being set to InvalidTransactionId in ReplicationSlotsComputeRequiredXmin. Attached is a small patch to address this; the output after the patch is shown below.

postgres=# select * from pg_replication_slots;
 slot_name | plugin        | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          771 | 0/30466368  | 0/304663A0          | reserved   |      28903824 | f
(1 row)

postgres=# create table t2(c int, c1 char(100));
CREATE TABLE
postgres=# drop table t2;
DROP TABLE
postgres=# vacuum pg_class;
VACUUM
postgres=# select n_dead_tup from pg_stat_all_tables where relname = 'pg_class';
 n_dead_tup
------------
          2
(1 row)

postgres=# select * from pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+------------
(0 rows)

postgres=# insert into t1 select * from t1;
INSERT 0 2097152
postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_replication_slots;
 slot_name | plugin        | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          771 |             | 0/304663A0          | lost       |               | f
(1 row)

postgres=# vacuum pg_class;
VACUUM
postgres=# select n_dead_tup from pg_stat_all_tables where relname = 'pg_class';
 n_dead_tup
------------
          0
(1 row)

> --
> With Regards,
> Amit Kapila.
Re: Reviving lost replication slots
On Mon, Nov 7, 2022 at 11:17 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> > >
> > > On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> > > >
> > > > A replication slot can be lost when a subscriber is not able to catch up with the load on the primary and the WAL to catch up exceeds max_slot_wal_keep_size. When this happens, the target has to be reseeded (pg_dump) from scratch, and this can take a long time. I am investigating options to revive a lost slot.
> > >
> > > Why in the first place one has to set max_slot_wal_keep_size if they
> > > care for WAL more than that?
> >
> > Disk full is a typical case where we can't wait until the logical slots catch up before truncating the log.
>
> If the max_slot_wal_keep_size is set appropriately and the replication
> lag is monitored properly along with some automatic actions such as
> replacing/rebuilding the standbys or subscribers (which may not be
> easy and cheap though), the chances of hitting the "lost replication"
> problem becomes less, but not zero always.

pg_dump and pg_restore can take several hours to days on a large database, and keeping all the WAL in the pg_wal folder (faster, smaller, and costlier disks?) is not always an option.

> > > If you have a case where you want to
> > > handle this case for some particular slot (where you are okay with the
> > > invalidation of other slots exceeding max_slot_wal_keep_size) then the
> > > other possibility could be to have a similar variable at the slot
> > > level but not sure if that is a good idea because you haven't
> > > presented any such case.
> >
> > IIUC, the ability to fetch WAL from the archive as a fallback mechanism should automatically take care of all the lost slots. Do you see a need to take care of a specific slot? If the idea is not to download the WAL files into the pg_wal directory, they can be placed in a slot-specific folder (data/pg_replslot/<slot_name>/) until they are needed for decoding and can be removed afterwards.
>
> Is the idea here the core copying back the WAL files from the archive?
> If yes, I think it is not something the core needs to do. This very
> well fits the job of an extension or an external module that revives
> the lost replication slots by copying WAL files from archive location.

The current code throws an error that the slot is lost, because restart_lsn is set to an invalid LSN when the WAL is truncated by the checkpointer. In order to build an external service that can revive a lost slot, at a minimum we need the attached patch.

> Having said above, what's the best way to revive a lost replication
> slot today? Any automated way exists today? It seems like
> pg_replication_slot_advance() doesn't do anything for the
> invalidated/lost slots.

If the WAL is available in the pg_wal directory, the replication stream resumes normally when the client connects with the patch I posted.

> If it's a streaming replication slot, the standby will anyway jump to
> archive mode ignoring the replication slot and the slot will never be
> usable again unless somebody creates a new replication slot and
> provides it to the standby for reuse.
> If it's a logical replication slot, the subscriber will start to
> diverge from the publisher and the slot will have to be revived
> manually i.e. created again.

Physical slots can be revived with the standby downloading the WAL from the archive directly. This patch is helpful for the logical slots.

> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
Re: Reviving lost replication slots
On Tue, Nov 8, 2022 at 1:36 AM Amit Kapila wrote:
> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> > >
> > > On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> > > >
> > > > A replication slot can be lost when a subscriber is not able to catch up with the load on the primary and the WAL to catch up exceeds max_slot_wal_keep_size. When this happens, the target has to be reseeded (pg_dump) from scratch, and this can take a long time. I am investigating options to revive a lost slot.
> > >
> > > Why in the first place one has to set max_slot_wal_keep_size if they
> > > care for WAL more than that?
> >
> > Disk full is a typical case where we can't wait until the logical slots catch up before truncating the log.
>
> Ideally, in such a case the subscriber should fall back to the
> physical standby of the publisher but unfortunately, we don't yet have
> a functionality where subscribers can continue logical replication
> from physical standby. Do you think if we had such functionality it
> would serve our purpose?

I don't think streaming from a standby helps, as the disk layout (and hence the WAL retained on disk) is expected to be the same on the physical standby as on the primary.

> > > If you have a case where you want to
> > > handle this case for some particular slot (where you are okay with the
> > > invalidation of other slots exceeding max_slot_wal_keep_size) then the
> > > other possibility could be to have a similar variable at the slot
> > > level but not sure if that is a good idea because you haven't
> > > presented any such case.
> >
> > IIUC, the ability to fetch WAL from the archive as a fallback mechanism should automatically take care of all the lost slots. Do you see a need to take care of a specific slot?
>
> No, I was just trying to see if your use case can be addressed in some
> other way. BTW, won't copying the WAL again back from archive can lead
> to a disk full situation.

The idea is to download the WAL from the archive on demand as the slot requires it and throw away each segment once it has been processed.

> --
> With Regards,
> Amit Kapila.
Re: Reviving lost replication slots
Hi Amit,

Thanks for your comments!

On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >
> > A replication slot can be lost when a subscriber is not able to catch up with the load on the primary and the WAL to catch up exceeds max_slot_wal_keep_size. When this happens, the target has to be reseeded (pg_dump) from scratch, and this can take a long time. I am investigating options to revive a lost slot.
>
> Why in the first place one has to set max_slot_wal_keep_size if they
> care for WAL more than that?

Disk full is a typical case where we can't wait until the logical slots catch up before truncating the log.

> If you have a case where you want to
> handle this case for some particular slot (where you are okay with the
> invalidation of other slots exceeding max_slot_wal_keep_size) then the
> other possibility could be to have a similar variable at the slot
> level but not sure if that is a good idea because you haven't
> presented any such case.

IIUC, the ability to fetch WAL from the archive as a fallback mechanism should automatically take care of all the lost slots. Do you see a need to take care of a specific slot? If the idea is not to download the WAL files into the pg_wal directory, they can be placed in a slot-specific folder (data/pg_replslot/<slot_name>/) until they are needed for decoding and can be removed afterwards.

> --
> With Regards,
> Amit Kapila.
Reviving lost replication slots
Hi,

A replication slot can be lost when a subscriber is not able to catch up with the load on the primary and the WAL needed to catch up exceeds max_slot_wal_keep_size. When this happens, the target has to be reseeded (pg_dump) from scratch, and this can take a long time. I am investigating options to revive a lost slot.

With the attached patch, and after copying the WAL files from the archive to the pg_wal directory, I was able to revive the lost slot. I also verified that a lost slot does not let vacuum clean up the catalog tuples deleted by any transaction later than catalog_xmin. One side effect of this approach is that the checkpointer creates .ready files in the archive_status folder corresponding to the copied WAL files; the archive command has to handle this case. At the same time, the checkpointer can potentially delete a copied file again before the subscriber consumes it.

In the proposed patch, I am not setting restart_lsn to InvalidXLogRecPtr, but instead relying on the invalidated_at field to tell whether the slot is lost. Was the intent of setting restart_lsn to InvalidXLogRecPtr to disallow reviving the slot?

If the overall direction seems OK, I will continue the work to revive the slot by copying the WAL files from the archive. Appreciate your feedback.

Thanks,
Sirisha

0001-Allow-revive-a-lost-replication-slot.patch
Description: Binary data
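[For anyone following along, this is how the situation shows up from SQL today; a sketch of the checks I use while testing, with the caveat that resuming a 'lost' slot only works together with the attached patch and after restoring the missing segments into pg_wal:]

-- How much more WAL can be written before each slot is invalidated?
SELECT slot_name, wal_status, pg_size_pretty(safe_wal_size) AS safe_wal_size
FROM pg_replication_slots;

-- Once wal_status reports 'lost', streaming from the slot fails on current
-- code; the patch plus the restored segments is what lets it resume.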
Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value
On Wed, Oct 19, 2022 at 7:59 PM Kyotaro Horiguchi wrote:
> At Wed, 19 Oct 2022 13:06:08 +0530, Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote in
> > On Wed, Oct 19, 2022 at 12:39 PM sirisha chamarthi wrote:
> > >
> > > The current code comment says that the replication stream on a slot with the given targetLSN can't continue after a restart, but even without a restart the stream cannot continue. The slot is invalidated and the walsender process is terminated by the checkpoint process. Attaching a small patch to fix the comment.
>
> In short, the proposed fix alone seems fine to me. If we want to show
> further details, I would add a bit as follows.
>
> | * * WALAVAIL_REMOVED means it has been removed. A replication stream on
> | *   a slot with this LSN cannot continue.  Note that the affected
> | *   processes have been terminated by checkpointer, too.

Thanks for your comments! Attached the patch with your suggestions.

Thanks,
Sirisha

v2-0001-Fix-GetWALAvailability-function-code-comments.patch
Description: Binary data
Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value
Hi Hackers,

The current code comment says that the replication stream on a slot with the given targetLSN can't continue after a restart, but even without a restart the stream cannot continue: the slot is invalidated and the walsender process is terminated by the checkpoint process. Attaching a small patch to fix the comment.

2022-10-19 06:26:22.387 UTC [144482] STATEMENT:  START_REPLICATION SLOT "s2" LOGICAL 0/0
2022-10-19 06:27:41.998 UTC [2553755] LOG:  checkpoint starting: time
2022-10-19 06:28:04.974 UTC [2553755] LOG:  terminating process 144482 to release replication slot "s2"
2022-10-19 06:28:04.974 UTC [144482] FATAL:  terminating connection due to administrator command
2022-10-19 06:28:04.974 UTC [144482] CONTEXT:  slot "s2", output plugin "test_decoding", in the change callback, associated LSN 0/1E23AB68
2022-10-19 06:28:04.974 UTC [144482] STATEMENT:  START_REPLICATION SLOT "s2" LOGICAL 0/0

Thanks,
Sirisha

0001-Fix-GetWALAvailability-function-code-comments.patch
Description: Binary data
Unable to connect to Postgres13 server from psql client built on master
Hi hackers,

I am unable to connect to my Postgres server (version 13, running in Azure Database for PostgreSQL) from a psql client built from the latest master. However, I am able to connect to a Postgres 15 server running locally on the same machine. I installed an earlier psql client (v12) and was able to connect to both the Azure PG instance and the local instance. Can this be a bug in master? I tried looking at the server logs in Azure but couldn't get anything meaningful from them. Any tips on how I can debug the psql client further? My local server is running with trust authentication and the remote server with md5 in pg_hba.conf; I am not sure if this changes the psql behavior somehow.

root@userspgdev:/usr/local/pgsql# ./psql -U postgres -h inst.postgres.database.azure.com -d postgres
bash: ./psql: No such file or directory
root@userspgdev:/usr/local/pgsql# psql -U postgres -h inst.postgres.database.azure.com -d postgres
Password for user postgres:
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 13.6)
WARNING: psql major version 12, server major version 13.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> \q

bin/psql -U postgres -h inst.postgres.database.azure.com -d postgres
psql: error: connection to server at "inst.postgres.database.azure.com" (20.116.167.xx), port 5432 failed: FATAL:  no pg_hba.conf entry for host "20.125.61.xx", user "postgres", database "postgres", SSL off

Also, I am wondering why no error is emitted by the psql client when the connection attempt fails.

Thanks,
Sirisha
Documentation issue with pg_stat_recovery_prefetch
Hi,

I was going through the pg_stat_recovery_prefetch documentation and noticed an issue with the formatting. Attached is a small patch to fix it. This is the first time I am sending an email to -hackers; please educate me if I missed something.

https://www.postgresql.org/docs/devel/monitoring-stats.html#PG-STAT-RECOVERY-PREFETCH-VIEW

Thanks,
Sirisha

0001-Fix-documentation-bug-for-pg_stat_recovery_prefetch.patch
Description: Binary data