Hi guys, Besides command run(like pg_ctl) can cause "received immediate shutdown request" any other reason can cause this please?
This production DB, support colleague said didn't run it Yi Sun <yina...@gmail.com> 于2020年11月18日周三 上午11:54写道: > Hi all > > There are 3 nodes in our prd db in patroni cluster, vm01 is leader, vm02 > and vm03 are standby, vm01 received immediate shutdown request caused > failover to vm02, after that vm03 received fast shutdown request > > As vm03 not in cluster so have to reinit vm03 > > What's the possible root caused vm01 received immediate shutdown request > please? > > and What's the possible root caused vm03 received fast shutdown request > > Below are detail info, please check, if need any other info, I will > provide, thanks > > postgresql version: 11.4 > OS: centos 7 > HA: patroni upgraded from 1.6.3 to 2.0.1 these days > > --vm01 pg log > > [2020-11-16 12:03:56.592 UTC] p=31485:3@ c= > pgwatch2@127.0.0.1/eu4_baas_bckp_epmgr:pgwatch2 LOG: disconnection: > session time: 0:00:00.037 user=pgwatch2 database=eu4_baas_bckp_epmgr > host=127.0.0.1 port=36916 > [2020-11-16 12:03:57.005 UTC] p=32103:14@ c=@/: LOG: received immediate > shutdown request > [2020-11-16 12:03:57.017 UTC] p=31189:3@314/0 c= > atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] > <http://atp_si_user@10.253.85.203/eu4_baas_software_inventory:%5Bunknown%5D> > WARNING: terminating connection because of crash of another server process > [2020-11-16 12:03:57.017 UTC] p=31189:4@314/0 c= > atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] > <http://atp_si_user@10.253.85.203/eu4_baas_software_inventory:%5Bunknown%5D> > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another server process exited > abnormally and possibly corrupted shared memory. > [2020-11-16 12:03:57.017 UTC] p=31189:5@314/0 c= > atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] > <http://atp_si_user@10.253.85.203/eu4_baas_software_inventory:%5Bunknown%5D> > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > [2020-11-16 12:03:57.017 UTC] p=31183:3@313/0 c= > atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] > <http://atp_pm@10.253.86.62/eu4_baas_atp_pm:%5Bunknown%5D> WARNING: > terminating connection because of crash of another server process > [2020-11-16 12:03:57.017 UTC] p=31183:4@313/0 c= > atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] > <http://atp_pm@10.253.86.62/eu4_baas_atp_pm:%5Bunknown%5D> DETAIL: The > postmaster has commanded this server process to roll back the current > transaction and exit, because another server process exited abnormally and > possibly corrupted shared memory. > [2020-11-16 12:03:57.017 UTC] p=31183:5@313/0 c= > atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] > <http://atp_pm@10.253.86.62/eu4_baas_atp_pm:%5Bunknown%5D> HINT: In a > moment you should be able to reconnect to the database and repeat your > command. > [2020-11-16 12:03:57.017 UTC] p=31182:3@310/281059 c= > bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] > <http://bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:%5Bunknown%5D> > WARNING: terminating connection because of crash of another server process > [2020-11-16 12:03:57.017 UTC] p=31182:4@310/281059 c= > bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] > <http://bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:%5Bunknown%5D> > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another server process exited > abnormally and possibly corrupted shared memory. > [2020-11-16 12:03:57.017 UTC] p=31182:5@310/281059 c= > bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] > <http://bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:%5Bunknown%5D> HINT: > In a moment you should be able to reconnect to the database and repeat > your command. > > --vm01 patroni log > Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:56,922", "name": "patroni.dcs.consul", "process": 32085, > "thread": 140350381303616, "level": "WARNING", "message": "Could not > register service: unknown role type promoted"} > Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085, > "thread": 140350381303616, "level": "INFO", "message": "Lock owner: > eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"} > Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085, > "thread": 140350381303616, "level": "INFO", "message": "does not have lock"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,158", "name": "patroni.dcs.consul", "process": 32085, > "thread": 140350381303616, "level": "INFO", "message": "Deregister service > pgcluster11/eu4-baas-patroni-cluster-vm01"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,180", "name": "patroni", "process": 32085, "thread": > 140350381303616, "level": "INFO", "message": "demoting self because i do > not have the lock and i was a leader"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,181", "name": "patroni", "process": 32085, "thread": > 140350381303616, "level": "WARNING", "message": "Loop time exceeded, > rescheduling immediately."} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,184", "name": "patroni.ha", "process": 32085, > "thread": 140350381303616, "level": "INFO", "message": "Lock owner: > eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,185", "name": "patroni", "process": 32085, "thread": > 140350381303616, "level": "INFO", "message": "starting after demotion in > progress"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,186", "name": "patroni.postgresql.connection", > "process": 32085, "thread": 140349984724736, "level": "INFO", "message": > "closed patroni connection to the postgresql cluster"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": > "2020-11-16 12:03:59,668", "name": "patroni.postgresql.postmaster", > "process": 32085, "thread": 140349984724736, "level": "INFO", "message": > "postmaster pid=31495"} > Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: > localhost:5432 - no response > > --vm03 pg log > [2020-11-16 12:03:53.484 UTC] p=24090:1@ > c=[unknown]@::1/[unknown]:[unknown] LOG: connection received: host=::1 > port=39442 > [2020-11-16 12:03:53.485 UTC] p=24090:2@9/1878954 > c=replicator@::1/[unknown]:[unknown] > LOG: replication connection authorized: user=replicator > [2020-11-16 12:03:53.487 UTC] p=24090:3@ c=replicator@::1/[unknown]:[unknown] > LOG: disconnection: session time: 0:00:00.002 user=replicator database= > host=::1 port=39442 > WARNING: terminating connection because of crash of another server process > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another server process exited > abnormally and possibly corrupted shared memory. > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > [2020-11-16 12:03:57.108 UTC] p=27711:6@ c=@/: FATAL: could not receive > data from WAL stream: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > [2020-11-16 12:03:57.144 UTC] p=27701:7@1/0 c=@/: LOG: invalid record > length at E052/93EED170: wanted 24, got 0 > [2020-11-16 12:03:57.419 UTC] p=24097:1@ c=@/: FATAL: could not connect > to the primary server: FATAL: the database system is shutting down > > [2020-11-16 12:05:38.581 UTC] p=25380:2@ c=postgres@::1/postgres:[unknown] > FATAL: the database system is starting up > [2020-11-16 12:05:41.010 UTC] p=24113:4@1/0 c=@/: LOG: consistent > recovery state reached at E052/93EED170 > [2020-11-16 12:05:41.010 UTC] p=24113:5@1/0 c=@/: LOG: invalid record > length at E052/93EED170: wanted 24, got 0 > [2020-11-16 12:05:41.011 UTC] p=24109:8@ c=@/: LOG: database system is > ready to accept read only connections > [2020-11-16 12:05:43.280 UTC] p=25400:1@ > c=[unknown]@::1/[unknown]:[unknown] LOG: connection received: host=::1 > port=40922 > [2020-11-16 12:05:43.392 UTC] p=25401:1@ > c=[unknown]@::1/[unknown]:[unknown] LOG: connection received: host=::1 > port=40924 > [2020-11-16 12:05:43.393 UTC] p=25401:2@2/2 c=postgres@::1/postgres:[unknown] > LOG: connection authorized: user=postgres database=postgres > [2020-11-16 12:05:43.580 UTC] p=25402:1@ > c=[unknown]@::1/[unknown]:[unknown] LOG: connection received: host=::1 > port=40926 > [2020-11-16 12:05:43.628 UTC] p=25402:2@3/1 > c=replicator@::1/[unknown]:[unknown] > LOG: replication connection authorized: user=replicator > [2020-11-16 12:05:43.630 UTC] p=25402:3@ c=replicator@::1/[unknown]:[unknown] > LOG: disconnection: session time: 0:00:00.049 user=replicator database= > host=::1 port=40926 > [2020-11-16 12:05:43.647 UTC] p=25406:1@ > c=[unknown]@::1/[unknown]:[unknown] LOG: connection received: host=::1 > port=40932 > [2020-11-16 12:05:43.648 UTC] p=25406:2@3/2 > c=replicator@::1/[unknown]:[unknown] > LOG: replication connection authorized: user=replicator > [2020-11-16 12:05:43.650 UTC] p=25406:3@ c=replicator@::1/[unknown]:[unknown] > LOG: disconnection: session time: 0:00:00.002 user=replicator database= > host=::1 port=40932 > [2020-11-16 12:05:44.006 UTC] p=24109:9@ c=@/: LOG: received fast > shutdown request > [2020-11-16 12:05:44.015 UTC] p=24109:10@ c=@/: LOG: aborting any active > transactions > [2020-11-16 12:05:44.015 UTC] p=25401:3@2/0 c=postgres@::1/postgres:Patroni > FATAL: terminating connection due to administrator command > [2020-11-16 12:05:44.015 UTC] p=25401:4@ c=postgres@::1/postgres:Patroni > LOG: disconnection: session time: 0:00:00.622 user=postgres > database=postgres host=::1 port=40924 > [2020-11-16 12:05:44.509 UTC] p=24120:1@ c=@/: LOG: shutting down > [2020-11-16 12:05:45.621 UTC] p=24109:11@ c=@/: LOG: database system is > shut down > > --vm03 patroni log > Nov 16 12:03:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:03:51,436", "name": "patroni", "process": 27664, "thread": > 140371489281856, "level": "INFO", "message": "following a different leader > because i am not the healthiest node"} > > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: > pg_controldata: could not read file > "/var/lib/pgsql/11/data/global/pg_control": read 0 of 288 > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,016", "name": "patroni.postgresql", "process": 27664, > "thread": 140371489281856, "level": "ERROR", "message": "Error when calling > pg_controldata"} > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: Traceback > (most recent call last): > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File > "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", > line 718, in controldata > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: data = > subprocess.check_output([self.pgcommand('pg_controldata'), self._data_dir], > env=env) > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File > "/usr/lib64/python3.6/subprocess.py", line 356, in check_output > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: > **kwargs).stdout > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File > "/usr/lib64/python3.6/subprocess.py", line 438, in run > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: > output=stdout, stderr=stderr) > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: > subprocess.CalledProcessError: Command > '['/usr/pgsql-11/bin/pg_controldata', '/var/lib/pgsql/11/data']' returned > non-zero exit status 1. > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,018", "name": "patroni.ha", "process": 27664, > "thread": 140371489281856, "level": "INFO", "message": "pg_controldata: > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: "} > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664, > "thread": 140371489281856, "level": "INFO", "message": "Lock owner: > eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"} > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664, > "thread": 140371489281856, "level": "INFO", "message": "Lock owner: > eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"} > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,031", "name": "patroni.dcs.consul", "process": 27664, > "thread": 140371489281856, "level": "INFO", "message": "Deregister service > pgcluster11/eu4-baas-patroni-cluster-vm03"} > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,048", "name": "patroni", "process": 27664, "thread": > 140371489281856, "level": "INFO", "message": "starting as a secondary"} > Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": > "2020-11-16 12:05:51,049", "name": "patroni.postgresql.connection", > "process": 27664, "thread": 140371151345408, "level": "INFO", "message": > "closed patroni connection to the postgresql cluster"} > > > Best regards > Oliver Sun > > > > > > > >