Link WAL record to session
Hi, all! Is it possible to associate a WAL record with its session/process id? For example, if I have this record in WAL:

rmgr: Standby len (rec/tot): 42/42, tx: 1142213518, lsn: 36174/FDC19050, prev 36174/FDC19020, desc: LOCK xid 1142213518 db 13091 rel 733108033

can I find the session that wrote it?
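The closest I could get is matching the tx field against a live backend, since pg_stat_activity exposes backend_xid. A minimal sketch (the xid here is just the one from the record above; this only helps while the transaction is still open):

  select pid, backend_xid, query
    from pg_stat_activity
   where backend_xid = '1142213518'::xid;

But once the transaction has committed this finds nothing, and logging every pid/xid pair via log_line_prefix ('%p %x') only works going forward. Is there a way to do it after the fact?

-- Sergey Burladyan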
How does postgres sort large strings?
Hi! I thought that the values being sorted are kept entirely in work_mem, but in fact it works somehow differently. Can anyone suggest how this works?

For example, I created this 512MB incompressible file and test table:

tr -dcs '[:print:]' '[:print:]' < /dev/urandom | tr -d '"'\' | dd bs=1K count=512K of=asciidump

create unlogged table t1 (v text);
insert into t1 select pg_read_file('asciidump') from generate_series(1, 10);

select pg_column_size(v), octet_length(v) from t1 limit 1;
 pg_column_size | octet_length
----------------+--------------
      536870912 |    536870912

set work_mem to '64MB';

Now I would think that 64MB is not enough to sort such large values and postgres will use temp files, but in fact it does not:

select temp_files, temp_bytes from pg_stat_database where datname = current_catalog;
 temp_files | temp_bytes
------------+------------
          0 |          0

explain (analyze, verbose, buffers) select v from t1 order by v;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Sort  (cost=94.38..97.78 rows=1360 width=32) (actual time=6433.138..6433.140 rows=10 loops=1)
   Output: v
   Sort Key: t1.v
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=543881 read=679794 written=118012
   ->  Seq Scan on public.t1  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.007..0.009 rows=10 loops=1)
         Output: v
         Buffers: shared hit=1
 Planning Time: 0.035 ms
 Execution Time: 6433.155 ms

> Sort Method: quicksort  Memory: 25kB

select temp_files, temp_bytes from pg_stat_database where datname = current_catalog;
 temp_files | temp_bytes
------------+------------
          0 |          0

WOW! How does it work?! :-)
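In case it matters, the values are stored out-of-line in TOAST rather than in the heap itself; a quick check (assuming I am reading the size functions right):

  select pg_size_pretty(pg_relation_size('t1'))       as heap,
         pg_size_pretty(pg_total_relation_size('t1')) as heap_plus_toast;

The heap is only a few pages while the total is several gigabytes, so my guess is that the sort keeps just the small TOAST pointers in work_mem and detoasts the values during comparisons, which would also explain the huge Buffers numbers. Is that right?

-- Sergey Burladyan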
Re: Running from 9.6 backups sometimes fails with fatal error
I started trying to debug a backup with the minimal set of WALs needed to achieve consistency; it is from 'server 2' and needs only 20 WAL files. Its error was:

ERROR: xlog flush request 44B/7E5DAB28 is not satisfied --- flushed only to 44B/7305B560
CONTEXT: writing block 0 of relation base/16506/16891_vm

=== backup_label ===
START WAL LOCATION: 44B/6002F280 (file 000000010000044B00000060)
CHECKPOINT LOCATION: 44B/7305B4F0
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 2019-02-13 03:19:54 MSK
LABEL: pg_basebackup base backup

=== controldata ===
Database cluster state:               in archive recovery
pg_control last modified:             Wed 13 Feb 2019 03:19:54 AM MSK
Latest checkpoint location:           44B/7305B4F0
Latest checkpoint's REDO location:    44B/6002F280
Latest checkpoint's REDO WAL file:    000000010000044B00000060
Time of latest checkpoint:            Wed 13 Feb 2019 02:19:36 AM MSK
Minimum recovery ending location:     44B/7305B560

I scanned all *_vm files in DB 16506 for pages with an LSN greater than the 'Minimum recovery ending location' and found three files:

base/16506/190024_vm 000 44B/9411F5C0
base/16506/190005_vm 000 44B/94121DE8
base/16506/16891_vm  000 44B/7E5DAB28

Then I parsed (pg_xlogdump) all these 20 WALs needed for consistency and searched for these three OIDs; for 190024 and 190005 I found VISIBLE WAL records:

=== $ grep 190024 wals-text | grep VISIBLE ===
rmgr: Heap2 len (rec/tot): 64/ 8256, tx: 0, lsn: 44B/651EDAE8, prev 44B/651EDA58, desc: VISIBLE cutoff xid 752732359 flags 1, blkref #0: rel 1663/16506/190024 fork vm blk 0 FPW, blkref #1: rel 1663/16506/190024 blk 0
rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: 44B/6B19A6E8, prev 44B/6B19A658, desc: VISIBLE cutoff xid 752734764 flags 1, blkref #0: rel 1663/16506/190024 fork vm blk 0, blkref #1: rel 1663/16506/190024 blk 0
rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: 44B/71194D40, prev 44B/71194CB0, desc: VISIBLE cutoff xid 752737874 flags 1, blkref #0: rel 1663/16506/190024 fork vm blk 0, blkref #1: rel 1663/16506/190024 blk 0

=== $ grep 190005 wals-text | grep VISIBLE ===
rmgr: Heap2 len (rec/tot): 64/ 8256, tx: 0, lsn: 44B/651F3728, prev 44B/651F36C8, desc: VISIBLE cutoff xid 752732355 flags 1, blkref #0: rel 1663/16506/190005 fork vm blk 0 FPW, blkref #1: rel 1663/16506/190005 blk 0
rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: 44B/6B19AE80, prev 44B/6B19AE20, desc: VISIBLE cutoff xid 752732667 flags 1, blkref #0: rel 1663/16506/190005 fork vm blk 0, blkref #1: rel 1663/16506/190005 blk 0
rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: 44B/6B19CF10, prev 44B/6B19AEC0, desc: VISIBLE cutoff xid 752734752 flags 1, blkref #0: rel 1663/16506/190005 fork vm blk 0, blkref #1: rel 1663/16506/190005 blk 1
rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: 44B/711954D8, prev 44B/71195478, desc: VISIBLE cutoff xid 752737869 flags 1, blkref #0: rel 1663/16506/190005 fork vm blk 0, blkref #1: rel 1663/16506/190005 blk 0

But I did not find anything in the parsed WALs about the vm fork of 16891. Is that OK, or is something wrong here?

PS: I tried removing all problematic visibility maps (those with a page LSN greater than 'Minimum recovery ending location') and starting from the backup without them; postgres then started successfully.
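For reference, the LSN scan itself is simple (a rough sketch, assuming 8kB pages and a little-endian machine; pd_lsn is the first 8 bytes of every page header, xlogid then xrecoff):

  # print the page-0 LSN of a vm fork as two hex words,
  # e.g. "0000044b 7e5dab28" means 44B/7E5DAB28
  od -A n -N 8 -t x4 base/16506/16891_vm

  # for page N, skip N*8192 bytes first
  od -A n -j $((1 * 8192)) -N 8 -t x4 base/16506/190005_vm

-- Sergey Burladyan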
Running from 9.6 backups sometimes fails with fatal error
Hi! I need help investigating what happened here. I have two different master servers, each with a standby, version:

PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

I create backups with pg_basebackup from their standbys, but sometimes the backup check fails: the backup simply cannot start. For example:

server 1:
2019-02-18 14:18:51 MSK [11901-2] LOG: checkpoint starting: end-of-recovery immediate wait
2019-02-18 14:18:51 MSK [11901-3] DEBUG: performing replication slot checkpoint
2019-02-18 14:18:51 MSK [11901-4] LOG: request to flush past end of generated WAL; request 44B/7E5DAB28, currpos 44B/7305B560
2019-02-18 14:18:51 MSK [11901-5] CONTEXT: writing block 0 of relation base/16506/16891_vm
2019-02-18 14:18:51 MSK [11901-6] ERROR: xlog flush request 44B/7E5DAB28 is not satisfied --- flushed only to 44B/7305B560
2019-02-18 14:18:51 MSK [11901-7] CONTEXT: writing block 0 of relation base/16506/16891_vm
2019-02-18 14:18:51 MSK [11858-110] FATAL: checkpoint request failed

server 2:
2019-02-07 16:07:40 MSK [7911-37] LOG: checkpoint starting: end-of-recovery immediate wait
2019-02-07 16:07:40 MSK [7911-38] DEBUG: performing replication slot checkpoint
2019-02-07 16:07:41 MSK [7911-39] LOG: request to flush past end of generated WAL; request 1CB/8E94EE90, currpos 1CB/8A195210
2019-02-07 16:07:41 MSK [7911-40] CONTEXT: writing block 0 of relation base/18413/18529_vm
2019-02-07 16:07:41 MSK [7911-41] ERROR: xlog flush request 1CB/8E94EE90 is not satisfied --- flushed only to 1CB/8A195210
2019-02-07 16:07:41 MSK [7911-42] CONTEXT: writing block 0 of relation base/18413/18529_vm
2019-02-07 16:07:41 MSK [7868-665] FATAL: checkpoint request failed

The backups are checked with:

PostgreSQL 9.6.11 on x86_64-pc-linux-gnu (Debian 9.6.11-1.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

recovery.conf:
restore_command = '/usr/local/bin/restore_cmd /mnt/wals %f %p'
recovery_target = 'immediate'

postgresql.conf:
wal_log_hints = on

How can I debug this?
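So far the only check I do is to compare the flush request in the error against what pg_control says recovery needs (a sketch; the binary path is the one from the Debian packages):

  $ /usr/lib/postgresql/9.6/bin/pg_controldata $PGDATA | grep -E 'cluster state|Minimum recovery'

The failing request LSN is always well past 'Minimum recovery ending location', which is why I started looking at the page LSNs inside the *_vm files.

-- Sergey Burladyan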