Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common
Hi Craig,

Anyway, this OS is a guest OS in VMware (vSphere).

Thanks for your response and help.

On Tue, Oct 3, 2017 at 8:49 PM, Craig Ringer wrote:
>
> Can you get stacks please?
>
> Use -g

# Events: 2K cpu-clock
#
# Overhead  Command   Shared Object      Symbol
# ........  .......   .............      ......
#
    86.96%  postgres  [kernel.kallsyms]  [k] __mutex_lock_common.isra.5
            |
            --- __mutex_lock_common.isra.5
                read

     2.85%  postgres  [kernel.kallsyms]  [k] do_raw_spin_lock
            |
            --- do_raw_spin_lock
               |
               |--90.48%-- read
               |
               |--8.33%-- recv
               |
                --1.19%-- write

     2.44%  postgres  [kernel.kallsyms]  [k] mutex_unlock
            |
            --- mutex_unlock
                read

     2.03%  postgres  [kernel.kallsyms]  [k] arch_local_irq_restore
            |
            --- arch_local_irq_restore
                read

     1.32%  postgres  postgres           [.] ValidXLogRecord
            |
            --- ValidXLogRecord

     1.25%  postgres  [kernel.kallsyms]  [k] mutex_lock
            |
            --- mutex_lock
                read

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
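PS: For anyone wanting to reproduce this, stacks like the above can be captured with something like the following (a sketch; the pid is one of the wal senders from my earlier mail, and the 30-second sampling window is arbitrary):

perf record -e cpu-clock -g -p 55382 -- sleep 30   # sample one wal sender with call graphs (-g)
perf report --stdio                                # print the collapsed stacks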
[GENERAL] BDR, wal sender, high system cpu, mutex_lock_common
Hi all,

I have a 9.4 + BDR environment:

PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
kernel version: 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux

This is a consolidation database; on this machine there are around 250+ wal sender processes.

top output revealed high system CPU:

%Cpu(s):  1.4 us, 49.7 sy,  0.0 ni, 48.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Profiling the CPU with perf:

perf top -e cpu-clock

Events: 142K cpu-clock
 82.37%  [kernel]   [k] __mutex_lock_common.isra.5
  4.49%  [kernel]   [k] do_raw_spin_lock
  2.23%  [kernel]   [k] mutex_lock
  2.16%  [kernel]   [k] mutex_unlock
  2.12%  [kernel]   [k] arch_local_irq_restore
  1.73%  postgres   [.] ValidXLogRecord
  0.87%  [kernel]   [k] __mutex_unlock_slowpath
  0.78%  [kernel]   [k] arch_local_irq_enable
  0.63%  [kernel]   [k] sys_recvfrom

Finally, checking which processes (wal senders) are using the mutexes:

perf top -e task-clock -p 55382

Events: 697 task-clock
 88.08%  [kernel]   [k] __mutex_lock_common.isra.5
  3.27%  [kernel]   [k] do_raw_spin_lock
  2.34%  [kernel]   [k] arch_local_irq_restore
  2.10%  postgres   [.] ValidXLogRecord
  1.87%  [kernel]   [k] mutex_unlock
  1.87%  [kernel]   [k] mutex_lock
  0.47%  [kernel]   [k] sys_recvfrom

strace summary of a wal sender process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------
 98.30    1.030072           5    213063    201463 read
  1.69    0.017686           0    201464    201464 recvfrom
  0.01    0.000110           0       806           lseek
  0.00    0.000043           0       474       468 rt_sigreturn
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
------ ----------- ----------- --------- --------- ------------
100.00    1.047911                415819    403395 total

strace detail; the majority are reads from a pipe and recvfrom on a socket (but most of them return EAGAIN):

read(15, "~\320\5\0\1\0\0\0\0@\235\1\360\16\0\0\334\26\0\0\0\0\0\0\365\27\0\0\0\0\0\0"..., 8192) = 8192 <0.25>
read(6, 0x7fffdd837b3f, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000116>
recvfrom(10, 0x7fffdd837b17, 1, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.49>

ls -l /proc/62388/fd/15
lr-x------ 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/15 -> /data/pg_xlog/00010EF00061

ls -l /proc/62388/fd/6
lr-x------ 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/6 -> pipe:[1090892506]

ls -l /proc/62388/fd/10
lrwx------ 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/10 -> socket:[1096584060]

I wonder, is there a kernel version that handles these mutexes better? Or is this the expected behavior?

Sorry for cross-posting; I have posted the same on pgsql-performance too.

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
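PS: To pick a wal sender pid for perf/strace, something like this works (a sketch; column names per 9.4's pg_stat_replication):

-- list wal sender pids with the peer each one serves
select pid, application_name, state, sent_location
from pg_stat_replication
order by application_name;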
Re: [GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory
Hi Craig,

So, is it safe to drop the identifiers listed by this query?

select riname
from pg_replication_identifier
where riname not in (select external_id from pg_replication_identifier_progress);

I cannot read the pg_get_replication_identifier_progress function; is it likely a C function?

On Fri, Sep 15, 2017 at 11:14 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 15 September 2017 at 11:46, milist ujang <ujang.mil...@gmail.com> wrote:
>
>> Hi Craig,
>>
>> Thanks again for pointing to the inactive replication slot.
>> After the inactive replication slot was dropped, relfrozenxid is now moving.
>>
>> I wonder if a replication identifier will cause some issue if left
>> un-chained? Since on the other side there is an inactive replication identifier.
>>
>
> No, that's harmless.
>
> However, if there's still an "other side" at all, you presumably have
> broken replication.
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
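PS: One way to check whether it is a C function is to look it up in the catalogs (a sketch; a lanname of 'internal' or 'c' would mean it is built in, and prosrc then holds the C symbol name):

-- show the implementation language of the replication identifier functions
select p.proname, l.lanname, p.prosrc
from pg_proc p
join pg_language l on l.oid = p.prolang
where p.proname like '%replication_identifier%';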
Re: [GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory
Hi Craig,

Thanks again for pointing to the inactive replication slot.
After the inactive replication slot was dropped, relfrozenxid is now moving.

I wonder if a replication identifier will cause some issue if left
un-chained? Since on the other side there is an inactive replication identifier.

On Fri, Sep 15, 2017 at 9:36 AM, Craig Ringer wrote:
>
> Do you have any idle/old replication slots, perhaps from failed node joins
> or abandoned nodes not properly parted?
>
> SELECT *
> FROM pg_replication_slots;
>
> Also check
>
> SELECT oid, *
> FROM pg_database;
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
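PS: For the archives, this is roughly how the offending slot can be found and removed on 9.4 (a sketch; '<slot_name>' is a placeholder to fill in):

-- inactive slots still pin xmin/catalog_xmin and block freezing
select slot_name, active, age(xmin) as xmin_age, age(catalog_xmin) as catalog_xmin_age
from pg_replication_slots
where not active;

-- drop the stale slot
select pg_drop_replication_slot('<slot_name>');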
[GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory
Hi list,

I have a database in a BDR environment which keeps logging these messages:

HINT: Close open transactions soon to avoid wraparound problems.
WARNING: oldest xmin is far in the past

Querying pg_stat_activity where state = 'active':

datname | template1
query   | autovacuum: VACUUM pg_catalog.pg_depend (to prevent wraparound)

datname | template1
query   | autovacuum: VACUUM pg_toast.pg_toast_1255 (to prevent wraparound)

datname | template1
query   | autovacuum: VACUUM pg_catalog.pg_ts_parser (to prevent wraparound)

SELECT pg_namespace.nspname
     , c.relname AS relname
     , greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS age
     , c.relfrozenxid
     , t.relfrozenxid
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
LEFT JOIN pg_namespace ON pg_namespace.oid = c.relnamespace
WHERE c.relkind = 'r' AND c.relname = 'pg_depend';

-[ RECORD 1 ]+-----------
nspname      | pg_catalog
relname      | pg_depend
age          | 1165907267
relfrozenxid | 102246720
relfrozenxid |

Trying to vacuum manually, but no luck:

postgres=# vacuum pg_catalog.pg_depend;
WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.
VACUUM
postgres=# vacuum freeze pg_catalog.pg_depend;
WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.
VACUUM
postgres=# vacuum full pg_catalog.pg_depend;
WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.
VACUUM

Checking relfrozenxid again after the vacuum, it is not moving:

SELECT pg_namespace.nspname
     , c.relname AS relname
     , greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS age
     , c.relfrozenxid
     , t.relfrozenxid
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
LEFT JOIN pg_namespace ON pg_namespace.oid = c.relnamespace
WHERE c.relkind = 'r' AND c.relname = 'pg_depend';

-[ RECORD 1 ]+-----------
nspname      | pg_catalog
relname      | pg_depend
age          | 1165908742
relfrozenxid | 102246720
relfrozenxid |

Searching for a similar situation, I found this url:
https://postgrespro.com/list/thread-id/1556972

The above url was about streaming replication, not BDR, but the symptoms look the same:
there are a lot of files inside the pg_subtrans directory, many of them older than the
latest cluster restart.

Querying pg_prepared_xacts, pg_stat_activity, and pg_locks gives no indication of
long-running queries or even open transactions.

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
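PS: For completeness, these are the horizon checks I ran (a sketch; note that on 9.4 replication slots can also hold back the oldest xmin, which pg_stat_activity alone will not show):

-- stuck prepared transactions
select gid, prepared, owner, database from pg_prepared_xacts;

-- backends holding an old snapshot
select pid, state, age(backend_xmin) as xmin_age
from pg_stat_activity
where backend_xmin is not null
order by age(backend_xmin) desc;

-- replication slots pinning the horizon
select slot_name, active, age(xmin), age(catalog_xmin)
from pg_replication_slots;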
[GENERAL] BDR, limit number of groups in a cluster
Hi all,

Based on the docs and looking at the processes, it seems there is one wal sender on each node per group.

In a scenario of consolidating many databases (say hundreds) into one database (so in this central cluster there are hundreds of wal senders), what is the limit on the number of groups?

Has anyone run such a scenario?

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
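PS: For context, the knobs I expect to matter on the central node look like this (illustrative values only; I have not tested at this scale):

# postgresql.conf on the consolidation node
max_wal_senders       = 300   # at least one per downstream peer
max_replication_slots = 300   # one slot per peer connection
max_worker_processes  = 300   # BDR apply workers run as background workers
max_connections       = 400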
Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?
Hi Craig,

On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer wrote:
>
> You could drop and re-create the replication slot, I guess. But your nodes
> would be hopelessly out of sync and need manual resync (with data
> replication disabled) of one node vs another.
>

Thanks for pointing to the replication slot.

I simulated a similar situation in a dev environment by removing the wal segment on node1 while node2 kept inserting into a table. It can now move forward to the latest wal segment perfectly, but the situation differs in node_status: in production node_status is 'i' on node1 but 'r' on node2, whereas in my dev environment both nodes stay 'r', even though I waited quite a long time for it to change.

Can I safely update node_status directly in bdr.bdr_nodes?

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
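PS: For reference, this is what I am looking at before touching anything (assuming bdr.bdr_nodes has these columns in this version; the UPDATE is the hypothetical change I am asking about, left commented out on purpose, and 'node1' is a placeholder):

-- current view of node statuses
select node_sysid, node_name, node_status from bdr.bdr_nodes;

-- the change under consideration; unverified, do not run without confirmation:
-- update bdr.bdr_nodes set node_status = 'r' where node_name = 'node1';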
Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?
Hi Craig,

On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer wrote:
>
>
> BDR can, see bdr.skip_changes_upto .
>

Unluckily, my BDR is 0.9.3.

> But PostgreSQL's logical decoding requires a contiguous WAL stream to
> maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
> advance. So your replication from that node is broken, and you have to part
> the node then rejoin. You'll need to manually recover any diverged data.
>

Yup, I am willing to reconcile the data manually via dblink or otherwise. Is it still possible to move forward past the missing wal segment in 0.9.3?

I've tried these in a dev environment to simulate it, but no luck:

# select * from pg_replication_identifier_progress;
 local_id |              external_id               | remote_lsn | local_lsn
----------+----------------------------------------+------------+-----------
        1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950  | 0/3007588
(1 row)

# SELECT pg_replication_identifier_setup_replaying_from('bdr_6461744703437035137_1_34424_30406_');
 pg_replication_identifier_setup_replaying_from
-------------------------------------------------

(1 row)

# SELECT pg_replication_identifier_setup_tx_origin('0/4288960','2017-09-06');
 pg_replication_identifier_setup_tx_origin
--------------------------------------------

(1 row)

# select * from pg_replication_identifier_progress;
 local_id |              external_id               | remote_lsn | local_lsn
----------+----------------------------------------+------------+-----------
        1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950  | 0/3007588
(1 row)

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
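PS: After reading more, my guess is that setup_replaying_from/setup_tx_origin only persist progress when a transaction subsequently commits in the same session, which would explain why the view did not move. If this build has pg_replication_identifier_advance (the ancestor of 9.5's pg_replication_origin_advance), it should move the stored progress directly; that the function exists in a 0.9.3-era build is an assumption I still need to verify:

-- assumption: pg_replication_identifier_advance exists in this build
# select pg_replication_identifier_advance('bdr_6461744703437035137_1_34424_30406_', '0/4288960');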
[GENERAL] BDR, wal segment has been removed, is it possible move forward?
Hi all,

Due to a space issue and high transaction volume, some wal segments were removed from pg_xlog in a BDR environment.

The log at node1 warns "requested WAL segment ... has already been removed", followed by "Connection reset by peer".

The log at node2 says:

Sending replication command: START_REPLICATION SLOT ... ...
XX000: data stream ended

I have worked with Streams and GoldenGate (Oracle products), where on the capture side you can move forward to a certain sequence (archived log/redo log; the wal segment in Postgres).

So, is it possible to move forward and read a recent wal segment in a BDR environment? (Assume data reconciliation will be done manually.)

--
regards
ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
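PS: Before anything else, this shows how far behind each peer's slot is, and therefore how much pg_xlog it wants retained (a sketch; function names per 9.4):

select slot_name, active, restart_lsn,
       pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) as bytes_behind
from pg_replication_slots;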