Re: [11.6] WALs recovery ordering with Restore Command - timelines
One more log example:

2019-12-06 07:11:16 CST LOG: database system was shut down in recovery at 2019-12-06 07:11:08 CST
Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:16 CST LOG: restored log file "00000022.history" from archive
Searching WAL: 00000023.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:16 CST LOG: entering standby mode
Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:16 CST LOG: restored log file "00000022.history" from archive
Searching WAL: 00000022000018C60000003F, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018C60000003F, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:20 CST LOG: restored log file "00000021000018C60000003F" from archive
Searching WAL: 00000021.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:20 CST LOG: restored log file "00000021.history" from archive
Searching WAL: 00000022000018BF0000001B, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001B, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:27 CST LOG: restored log file "00000021000018BF0000001B" from archive
2019-12-06 07:11:27 CST LOG: redo starts at 18BF/1B311260
Searching WAL: 00000022000018BF0000001C, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001C, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:34 CST LOG: restored log file "00000021000018BF0000001C" from archive
Searching WAL: 00000022000018BF0000001D, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001D, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:40 CST LOG: restored log file "00000021000018BF0000001D" from archive
Searching WAL: 00000022000018BF0000001E, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001E, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:46 CST LOG: restored log file "00000021000018BF0000001E" from archive
Searching WAL: 00000022000018BF0000001F, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001F, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:53 CST LOG: restored log file "00000021000018BF0000001F" from archive

Command:

restore_command = 'echo -e "Searching WAL: %f, location: %p"; /usr/bin/pgbackrest --stanza=platform archive-get %f "%p"'

As you can see, Postgres tries to restore the timeline-0x22 WALs before the timeline switch LSN has been reached while restoring timeline 0x21. Should I resend this issue to pgsql-bugs <https://www.postgresql.org/list/pgsql-bugs/>?

On Mon, 25 Nov 2019 at 23:26, Pavel Suderevsky wrote:

> Hi,
>
> PostgreSQL 11.6
> CentOS 7.6
> pgBackRest 2.19
>
> Case:
> The master is on timeline 15 while a standby that should be synchronized with the master as a hot standby is on timeline 13. WALs are to be obtained from the archive.
>
> recovery.conf:
> primary_conninfo = 'user=usename host=10.10.10.3 port=5432 sslmode=prefer application_name=pg2'
> recovery_target_timeline = 'latest'
> restore_command = '/usr/bin/pgbackrest --stanza=platform archive-get %f "%p" --log-level-console info'
> standby_mode = 'on'
>
> Standby recovery proceeds like this:
> 1. try to restore 0000000F00000A27000000E5
> 2. unable to find 0000000F00000A27000000E5 in the archive
> 3. try to restore 0000000E00000A27000000E5
> 4. found 0000000E00000A27000000E5 in the archive
> -- then the next WAL is tried for BOTH timelines
> 1. try to restore 0000000F00000A27000000E6
> 2. unable to find 0000000F00000A27000000E6 in the archive
> 3. try to restore 0000000E00000A27000000E6
> 4. found 0000000E00000A27000000E6 in the archive
>
> Why does Postgres not restore WALs in timeline order (first 13, then 14, then 15) up to each timeline switch position? The WALs it tries to restore for the latest timeline cannot exist yet, and the long "unable to find" lookups lead to terrible recovery performance.
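For reference, a WAL segment file name is built from three 8-hex-digit fields: timeline ID, log file number, and segment number. A minimal sketch (plain POSIX shell) composing the two candidate names that recovery probes for one and the same segment on timelines 0x22 and 0x21, as in the log above:

```shell
# Compose a WAL segment file name the way PostgreSQL does with the
# default 16 MB segment size: timeline, log number and segment number,
# each rendered as 8 uppercase hex digits.
wal_name() {
    printf '%08X%08X%08X\n' "$1" "$2" "$3"
}

# The two candidates probed for the same segment on timelines 0x22/0x21:
wal_name 0x22 0x18C6 0x3F   # 00000022000018C60000003F
wal_name 0x21 0x18C6 0x3F   # 00000021000018C60000003F
```

With recovery_target_timeline = 'latest', recovery asks for the higher-timeline candidate first, which is why in the log a failed 0x22 lookup precedes each successful 0x21 restore.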
[11.6] WALs recovery ordering with Restore Command - timelines
Hi,

PostgreSQL 11.6
CentOS 7.6
pgBackRest 2.19

Case:
The master is on timeline 15 while a standby that should be synchronized with the master as a hot standby is on timeline 13. WALs are to be obtained from the archive.

recovery.conf:

primary_conninfo = 'user=usename host=10.10.10.3 port=5432 sslmode=prefer application_name=pg2'
recovery_target_timeline = 'latest'
restore_command = '/usr/bin/pgbackrest --stanza=platform archive-get %f "%p" --log-level-console info'
standby_mode = 'on'

Standby recovery proceeds like this:
1. try to restore 0000000F00000A27000000E5
2. unable to find 0000000F00000A27000000E5 in the archive
3. try to restore 0000000E00000A27000000E5
4. found 0000000E00000A27000000E5 in the archive
-- then the next WAL is tried for BOTH timelines
1. try to restore 0000000F00000A27000000E6
2. unable to find 0000000F00000A27000000E6 in the archive
3. try to restore 0000000E00000A27000000E6
4. found 0000000E00000A27000000E6 in the archive

Why does Postgres not restore WALs in timeline order (first 13, then 14, then 15) up to each timeline switch position? The WALs it tries to restore for the latest timeline cannot exist yet, and the long "unable to find" lookups lead to terrible recovery performance.
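Since every segment on the not-yet-reached timeline is guaranteed to be missing, each probe pays the full cost of a remote archive lookup. One way to make those misses cheap is a restore_command wrapper that answers from a local mirror of the archive first. This is only a sketch under assumptions: MIRROR_DIR and the commented pgbackrest fallback are placeholders, not an existing pgBackRest feature.

```shell
# Hypothetical restore_command wrapper (sketch, not a tested setup).
# PostgreSQL would invoke it as: restore_wal %f %p
MIRROR_DIR="${MIRROR_DIR:-/var/lib/wal-mirror}"   # placeholder path

restore_wal() {
    wal_file=$1   # %f: file name requested by recovery
    wal_dest=$2   # %p: destination path for the restored file
    if [ -f "$MIRROR_DIR/$wal_file" ]; then
        cp "$MIRROR_DIR/$wal_file" "$wal_dest"
    else
        # restore_command must exit non-zero when the file is absent so
        # that recovery moves on to the next timeline candidate. A real
        # wrapper would first try the remote archive here, e.g.:
        #   /usr/bin/pgbackrest --stanza=platform archive-get "$wal_file" "$wal_dest"
        return 1
    fi
}
```

Whether a local mirror is practical depends on archive size; the point is only that the failure path, which this recovery hits once per segment, should be fast.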
Re: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
Hi,

Got this issue again. Settings on the platform (PG 9.6.11):

max_pred_locks_per_transaction = 3000
max_connections = 800

Despite the fact that the documentation says:

> with the exception of fast-path locks, each lock manager will deliver a consistent set of results

I've noticed the following:

1. pg_locks showed 2 million SIReadLocks for a pid that had already been in the "idle" state for a dozen seconds.
2. pg_locks showed 5 million SIReadLocks granted, although with the settings above I expected a limit of 2.4 million SIReadLocks.

And there are still no signs of serializable transactions that could live for 1 billion xids.

--
Pavel Suderevsky
E: psuderevsky(at)gmail(dot)com

On Tue, 9 Apr 2019 at 16:00, Pavel Suderevsky wrote:

>> On Sun, Apr 7, 2019 at 2:31 AM Pavel Suderevsky wrote:
>>> Probably if you advise me what could cause the "pg_serial": apparent wraparound messages, I would have more chances to handle all the performance issues.
>>
>> Did you see that warning at some point before the later error?
>
> Thomas,
>
> Thank you for your reply!
>
> No, there have never been such warnings.
>
>> I wonder if this condition required you to have a serializable transaction running (or prepared) while you consume 2^30 AKA ~1 billion xids. I think it is unreachable in v11+ because commit e5eb4fa8 allowed for more SLRU pages to avoid this artificially early wrap.
>
> Do I understand right that this is about virtual txids? I have no idea how anything even close to a billion transaction ids could be consumed on this system.
>
> --
> Pavel Suderevsky
> E: psuderev...@gmail.com
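The 2.4 million expectation is simple arithmetic over the two settings; a quick check (assuming the ceiling is just the product, while the actual shared-memory sizing also reserves entries for prepared transactions):

```shell
# Rough expected ceiling on granted SIReadLocks for the settings above.
# Assumption: ceiling ~= max_pred_locks_per_transaction * max_connections;
# the real allocation also accounts for prepared transactions.
max_pred_locks_per_transaction=3000
max_connections=800
echo $(( max_pred_locks_per_transaction * max_connections ))   # 2400000
```

Seeing 5 million granted locks is thus roughly twice this naive ceiling, which is the anomaly being reported.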
Re: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
> On Sun, Apr 7, 2019 at 2:31 AM Pavel Suderevsky wrote:
>> Probably if you advise me what could cause the "pg_serial": apparent wraparound messages, I would have more chances to handle all the performance issues.
>
> Did you see that warning at some point before the later error?

Thomas,

Thank you for your reply!

No, there have never been such warnings.

> I wonder if this condition required you to have a serializable transaction running (or prepared) while you consume 2^30 AKA ~1 billion xids. I think it is unreachable in v11+ because commit e5eb4fa8 allowed for more SLRU pages to avoid this artificially early wrap.

Do I understand right that this is about virtual txids? I have no idea how anything even close to a billion transaction ids could be consumed on this system.

--
Pavel Suderevsky
E: psuderev...@gmail.com
Re: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
Guys, I still need your help.

Previous night:

2019-04-05 00:35:04 UTC LOG: could not truncate directory "pg_serial": apparent wraparound
2019-04-05 00:40:04 UTC LOG: could not truncate directory "pg_serial": apparent wraparound

(2 checkpoints)

It turned out that I have a performance problem related to predicate locking on this platform. Between 00:30 and 00:45 there were a lot of long prepared statements with the SerializableXactHashLock and predicate_lock_manager wait events, followed by high CPU usage. During this period there were 55k predicate locks granted at peak and 30k on average. Probably because of the high CPU usage, some statements spent a lot of time in the bind/parse steps.

If you can advise what could cause the "pg_serial": apparent wraparound messages, I will have a better chance of handling all the performance issues.

Thank you!

--
Pavel Suderevsky
E: psuderev...@gmail.com

On Mon, 11 Mar 2019 at 19:09, Pavel Suderevsky wrote:

> Hi,
>
> PG: 9.6.11
> OS: CentOS
> Env: AWS EC2
>
> I've faced the following exceptions in the PostgreSQL server log:
>
> could not truncate directory "pg_serial": apparent wraparound
>
> Sometimes it repeats every 5 minutes; the longest period was 40 minutes.
>
> In fact, I can't find any suspicious events happening during those periods. pg_wait_sampling didn't catch any events; no long queries (more than 60 s), autovacuum workers, or transactions in the "idle in transaction" state were in action at that time.
>
> The only related thing I could find in the archive is:
> https://www.postgresql.org/message-id/CACjxUsON4Vya3a6r%3DubwmN-4qTDDfZjuwSzjnL1QjdUc8_gzLw%40mail.gmail.com
>
>> You should not see the errors you are reporting nor the warning I mentioned unless a serializable transaction remains active long enough for about 1 billion transaction IDs to be consumed.
>
> The database age is now just 18.5 million transactions.
>
> The server has two standbys (sync and async); hot_standby_feedback is off.
>
> Please advise what I can do to find the reason for these exceptions.
9.6.11- could not truncate directory "pg_serial": apparent wraparound
Hi,

PG: 9.6.11
OS: CentOS
Env: AWS EC2

I've faced the following exceptions in the PostgreSQL server log:

could not truncate directory "pg_serial": apparent wraparound

Sometimes it repeats every 5 minutes; the longest period was 40 minutes.

In fact, I can't find any suspicious events happening during those periods. pg_wait_sampling didn't catch any events; no long queries (more than 60 s), autovacuum workers, or transactions in the "idle in transaction" state were in action at that time.

The only related thing I could find in the archive is:
https://www.postgresql.org/message-id/CACjxUsON4Vya3a6r%3DubwmN-4qTDDfZjuwSzjnL1QjdUc8_gzLw%40mail.gmail.com

> You should not see the errors you are reporting nor the warning I mentioned unless a serializable transaction remains active long enough for about 1 billion transaction IDs to be consumed.

The database age is now just 18.5 million transactions.

The server has two standbys (sync and async); hot_standby_feedback is off.

Please advise what I can do to find the reason for these exceptions.
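The "about 1 billion transaction IDs" in the quoted reply is 2^30; comparing it with the reported database age shows how far this cluster is from that condition:

```shell
# ~2^30 xids must be consumed while a serializable transaction stays
# active before the early pg_serial wraparound condition can occur.
threshold=$(( 1 << 30 ))
db_age=18500000   # reported database age: 18.5 million transactions
echo "$threshold"                 # 1073741824
echo $(( threshold / db_age ))    # threshold is ~58x the whole database age
```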
Long statement execution. Data fetch by client
Hi,

OS: 7.5.1804
PG version: 9.6.12
Client: Java application (JDBC driver 42.0.0)

I have an issue with unexpectedly long statements:

LOG: duration: 9284.887 ms parse : SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
LOG: duration: 12739.364 ms parse : SHOW TRANSACTION ISOLATION LEVEL
LOG: duration: 10500.315 ms parse : SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
LOG: duration: 10467.578 ms parse : SHOW TRANSACTION ISOLATION LEVEL
LOG: duration: 25901.488 ms execute S_3: COMMIT

It's not an fsync issue. Other statements executed without issues while the statements above were running. pg_stat_statements didn't catch any such long statement. No wait events were caught for these statements (checked by taking pg_stat_activity snapshots every 100 ms). The processes were in the R state (according to atop, for the moments that were caught).

My assumption is that the issue happens while data is being fetched by the Java client. Could you please assist with the following questions?

1. Does a query have the 'active' state while data is being fetched by the client?
2. Do I understand correctly that statement logging includes fetch time in the duration it records, while pg_stat_statements includes only server-side computation time?
3. Is there any way to catch a client's long fetch on the PostgreSQL server side?
4. Is this all about FETCH, or what else can cause such long execution of statements like 'SHOW TRANSACTION ISOLATION LEVEL'?

Thank you very much in advance.

--
Pavel Suderevsky
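For server-side narrowing, logging settings that correlate the long phases with lock waits or checkpoint activity can help. A postgresql.conf sketch with illustrative values, not recommendations:

```
# Logging sketch (illustrative values)
log_min_duration_statement = 1000   # ms; extended-protocol clients such as
                                    # JDBC get separate parse/bind/execute lines
log_lock_waits = on                 # log waits longer than deadlock_timeout
deadlock_timeout = 1s               # threshold used by log_lock_waits
log_checkpoints = on                # rule checkpoint activity in or out
```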