Hi guys,

Besides command run(like pg_ctl) can cause "received immediate shutdown
request"  any other reason can cause this please?

This production DB, support colleague said didn't run it

Yi Sun <yina...@gmail.com> 于2020年11月18日周三 上午11:54写道:

> Hi all
>
> There are 3 nodes in our prd db in patroni cluster, vm01 is leader, vm02
> and vm03 are standby, vm01 received immediate shutdown request  caused
> failover to vm02, after that vm03 received fast shutdown request
>
> As vm03 not in cluster so have to reinit vm03
>
> What's the possible root caused vm01 received immediate shutdown request
> please?
>
> and What's the possible root caused vm03 received fast shutdown request
>
> Below are detail info, please check, if need any other info, I will
> provide, thanks
>
> postgresql version: 11.4
> OS: centos 7
> HA: patroni upgraded from 1.6.3 to 2.0.1 these days
>
> --vm01 pg log
>
> [2020-11-16 12:03:56.592 UTC] p=31485:3@ c=
> pgwatch2@127.0.0.1/eu4_baas_bckp_epmgr:pgwatch2 LOG:  disconnection:
> session time: 0:00:00.037 user=pgwatch2 database=eu4_baas_bckp_epmgr
> host=127.0.0.1 port=36916
> [2020-11-16 12:03:57.005 UTC] p=32103:14@ c=@/: LOG:  received immediate
> shutdown request
> [2020-11-16 12:03:57.017 UTC] p=31189:3@314/0 c=
> atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown]
> <http://atp_si_user@10.253.85.203/eu4_baas_software_inventory:%5Bunknown%5D>
> WARNING:  terminating connection because of crash of another server process
> [2020-11-16 12:03:57.017 UTC] p=31189:4@314/0 c=
> atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown]
> <http://atp_si_user@10.253.85.203/eu4_baas_software_inventory:%5Bunknown%5D>
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> [2020-11-16 12:03:57.017 UTC] p=31189:5@314/0 c=
> atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown]
> <http://atp_si_user@10.253.85.203/eu4_baas_software_inventory:%5Bunknown%5D>
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> [2020-11-16 12:03:57.017 UTC] p=31183:3@313/0 c=
> atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown]
> <http://atp_pm@10.253.86.62/eu4_baas_atp_pm:%5Bunknown%5D> WARNING:
>  terminating connection because of crash of another server process
> [2020-11-16 12:03:57.017 UTC] p=31183:4@313/0 c=
> atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown]
> <http://atp_pm@10.253.86.62/eu4_baas_atp_pm:%5Bunknown%5D> DETAIL:  The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process exited abnormally and
> possibly corrupted shared memory.
> [2020-11-16 12:03:57.017 UTC] p=31183:5@313/0 c=
> atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown]
> <http://atp_pm@10.253.86.62/eu4_baas_atp_pm:%5Bunknown%5D> HINT:  In a
> moment you should be able to reconnect to the database and repeat your
> command.
> [2020-11-16 12:03:57.017 UTC] p=31182:3@310/281059 c=
> bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown]
> <http://bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:%5Bunknown%5D>
> WARNING:  terminating connection because of crash of another server process
> [2020-11-16 12:03:57.017 UTC] p=31182:4@310/281059 c=
> bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown]
> <http://bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:%5Bunknown%5D>
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> [2020-11-16 12:03:57.017 UTC] p=31182:5@310/281059 c=
> bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown]
> <http://bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:%5Bunknown%5D> HINT:
>  In a moment you should be able to reconnect to the database and repeat
> your command.
>
> --vm01 patroni log
> Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:56,922", "name": "patroni.dcs.consul", "process": 32085,
> "thread": 140350381303616, "level": "WARNING", "message": "Could not
> register service: unknown role type promoted"}
> Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085,
> "thread": 140350381303616, "level": "INFO", "message": "Lock owner:
> eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
> Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085,
> "thread": 140350381303616, "level": "INFO", "message": "does not have lock"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,158", "name": "patroni.dcs.consul", "process": 32085,
> "thread": 140350381303616, "level": "INFO", "message": "Deregister service
> pgcluster11/eu4-baas-patroni-cluster-vm01"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,180", "name": "patroni", "process": 32085, "thread":
> 140350381303616, "level": "INFO", "message": "demoting self because i do
> not have the lock and i was a leader"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,181", "name": "patroni", "process": 32085, "thread":
> 140350381303616, "level": "WARNING", "message": "Loop time exceeded,
> rescheduling immediately."}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,184", "name": "patroni.ha", "process": 32085,
> "thread": 140350381303616, "level": "INFO", "message": "Lock owner:
> eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,185", "name": "patroni", "process": 32085, "thread":
> 140350381303616, "level": "INFO", "message": "starting after demotion in
> progress"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,186", "name": "patroni.postgresql.connection",
> "process": 32085, "thread": 140349984724736, "level": "INFO", "message":
> "closed patroni connection to the postgresql cluster"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
> "2020-11-16 12:03:59,668", "name": "patroni.postgresql.postmaster",
> "process": 32085, "thread": 140349984724736, "level": "INFO", "message":
> "postmaster pid=31495"}
> Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]:
> localhost:5432 - no response
>
> --vm03 pg log
> [2020-11-16 12:03:53.484 UTC] p=24090:1@
> c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1
> port=39442
> [2020-11-16 12:03:53.485 UTC] p=24090:2@9/1878954 
> c=replicator@::1/[unknown]:[unknown]
> LOG:  replication connection authorized: user=replicator
> [2020-11-16 12:03:53.487 UTC] p=24090:3@ c=replicator@::1/[unknown]:[unknown]
> LOG:  disconnection: session time: 0:00:00.002 user=replicator database=
> host=::1 port=39442
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> [2020-11-16 12:03:57.108 UTC] p=27711:6@ c=@/: FATAL:  could not receive
> data from WAL stream: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> [2020-11-16 12:03:57.144 UTC] p=27701:7@1/0 c=@/: LOG:  invalid record
> length at E052/93EED170: wanted 24, got 0
> [2020-11-16 12:03:57.419 UTC] p=24097:1@ c=@/: FATAL:  could not connect
> to the primary server: FATAL:  the database system is shutting down
>
> [2020-11-16 12:05:38.581 UTC] p=25380:2@ c=postgres@::1/postgres:[unknown]
> FATAL:  the database system is starting up
> [2020-11-16 12:05:41.010 UTC] p=24113:4@1/0 c=@/: LOG:  consistent
> recovery state reached at E052/93EED170
> [2020-11-16 12:05:41.010 UTC] p=24113:5@1/0 c=@/: LOG:  invalid record
> length at E052/93EED170: wanted 24, got 0
> [2020-11-16 12:05:41.011 UTC] p=24109:8@ c=@/: LOG:  database system is
> ready to accept read only connections
> [2020-11-16 12:05:43.280 UTC] p=25400:1@
> c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1
> port=40922
> [2020-11-16 12:05:43.392 UTC] p=25401:1@
> c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1
> port=40924
> [2020-11-16 12:05:43.393 UTC] p=25401:2@2/2 c=postgres@::1/postgres:[unknown]
> LOG:  connection authorized: user=postgres database=postgres
> [2020-11-16 12:05:43.580 UTC] p=25402:1@
> c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1
> port=40926
> [2020-11-16 12:05:43.628 UTC] p=25402:2@3/1 
> c=replicator@::1/[unknown]:[unknown]
> LOG:  replication connection authorized: user=replicator
> [2020-11-16 12:05:43.630 UTC] p=25402:3@ c=replicator@::1/[unknown]:[unknown]
> LOG:  disconnection: session time: 0:00:00.049 user=replicator database=
> host=::1 port=40926
> [2020-11-16 12:05:43.647 UTC] p=25406:1@
> c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1
> port=40932
> [2020-11-16 12:05:43.648 UTC] p=25406:2@3/2 
> c=replicator@::1/[unknown]:[unknown]
> LOG:  replication connection authorized: user=replicator
> [2020-11-16 12:05:43.650 UTC] p=25406:3@ c=replicator@::1/[unknown]:[unknown]
> LOG:  disconnection: session time: 0:00:00.002 user=replicator database=
> host=::1 port=40932
> [2020-11-16 12:05:44.006 UTC] p=24109:9@ c=@/: LOG:  received fast
> shutdown request
> [2020-11-16 12:05:44.015 UTC] p=24109:10@ c=@/: LOG:  aborting any active
> transactions
> [2020-11-16 12:05:44.015 UTC] p=25401:3@2/0 c=postgres@::1/postgres:Patroni
> FATAL:  terminating connection due to administrator command
> [2020-11-16 12:05:44.015 UTC] p=25401:4@ c=postgres@::1/postgres:Patroni
> LOG:  disconnection: session time: 0:00:00.622 user=postgres
> database=postgres host=::1 port=40924
> [2020-11-16 12:05:44.509 UTC] p=24120:1@ c=@/: LOG:  shutting down
> [2020-11-16 12:05:45.621 UTC] p=24109:11@ c=@/: LOG:  database system is
> shut down
>
> --vm03 patroni log
> Nov 16 12:03:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:03:51,436", "name": "patroni", "process": 27664, "thread":
> 140371489281856, "level": "INFO", "message": "following a different leader
> because i am not the healthiest node"}
>
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]:
> pg_controldata: could not read file
> "/var/lib/pgsql/11/data/global/pg_control": read 0 of 288
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,016", "name": "patroni.postgresql", "process": 27664,
> "thread": 140371489281856, "level": "ERROR", "message": "Error when calling
> pg_controldata"}
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: Traceback
> (most recent call last):
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File
> "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py",
> line 718, in controldata
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: data =
> subprocess.check_output([self.pgcommand('pg_controldata'), self._data_dir],
> env=env)
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File
> "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]:
> **kwargs).stdout
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File
> "/usr/lib64/python3.6/subprocess.py", line 438, in run
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]:
> output=stdout, stderr=stderr)
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]:
> subprocess.CalledProcessError: Command
> '['/usr/pgsql-11/bin/pg_controldata', '/var/lib/pgsql/11/data']' returned
> non-zero exit status 1.
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,018", "name": "patroni.ha", "process": 27664,
> "thread": 140371489281856, "level": "INFO", "message": "pg_controldata:
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: "}
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664,
> "thread": 140371489281856, "level": "INFO", "message": "Lock owner:
> eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"}
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664,
> "thread": 140371489281856, "level": "INFO", "message": "Lock owner:
> eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"}
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,031", "name": "patroni.dcs.consul", "process": 27664,
> "thread": 140371489281856, "level": "INFO", "message": "Deregister service
> pgcluster11/eu4-baas-patroni-cluster-vm03"}
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,048", "name": "patroni", "process": 27664, "thread":
> 140371489281856, "level": "INFO", "message": "starting as a secondary"}
> Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time":
> "2020-11-16 12:05:51,049", "name": "patroni.postgresql.connection",
> "process": 27664, "thread": 140371151345408, "level": "INFO", "message":
> "closed patroni connection to the postgresql cluster"}
>
>
> Best regards
> Oliver Sun
>
>
>
>
>
>
>
>

Reply via email to