Re: [GENERAL] Error stopping postgresql service on a standby server.

2012-09-05 Thread Dipti Bharvirkar
Hi,

Has anyone encountered this issue? Why would the WAL receiver process not
stop when postmaster is shutdown?
Any suggestions on how to avoid running into this error or ways to recover
from it?

Thank you in advance for any inputs on this,

Dipti

On Fri, Aug 31, 2012 at 1:17 PM, Dipti Bharvirkar wrote:

 Hi,

 In our project, we use Postgres 9.1.3 version and asynchronous streaming
 replication.
 In recent times, on couple of our setups, we saw issues stopping Postgres
 service on the standby server after streaming replication was setup.

 The command service postgresql stop returned with a failure message. We
 use pg_ctl stop -D '$PGDATA' -s -m fast in the Postgres service script to
 stop the server.
 To see if there were some active client connections that were causing a
 failure in stopping Postgres service, I ran the query SELECT * FROM
 pg_stat_activity;.
 It failed with the following error: psql: FATAL:  the database system is
 shutting down

 ps -ef | grep postgres returned the following:
 postgres 14033 1  0 Aug28 ?00:00:01
 /usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/data
 postgres 14044 14033  0 Aug28 ?00:00:00 postgres: logger process
 postgres 14046 14033  0 Aug28 ?00:00:00 postgres: writer process
 postgres 14047 14033  0 Aug28 ?00:00:00 postgres: stats collector
 process
 postgres 14912 14033  0 Aug28 ?00:00:00 postgres: wal receiver
 process
 root 31519  3003  0 06:18 pts/200:00:00 grep postgres

 netstat -anp | grep 5432 returns the following:
 tcp0  0 0.0.0.0:54320.0.0.0:*
   LISTEN  14033/postmaster
 tcp0  0 127.0.0.1:5432  127.0.0.1:60597
   TIME_WAIT   -
 tcp0  0 127.0.0.1:5432  127.0.0.1:60589
   TIME_WAIT   -
 tcp67288  0 1.1.1.1:61500 http://47.11.49.176:61500  2
 .2.2.2:5432 http://47.11.49.190:5432   ESTABLISHED
 14912/postgres: wal

 I had a few queries based on some of the observations -

1. On one of the setups where similar issue was observed, we stopped
Postgres service on the master server. As a result of this, the sender
process on the master server and consequently the receiver process on
standby stopped. After this, Postgres service could successfully be stopped
on the standby server. This fact coupled with the output of the two
commands mentioned above makes me believe that it is the wal receiver
process that is not getting terminated because of which the Postgres
service on standby server does not stop. Is this assumption right?
2. If yes, what could be the possible cause for the receiver process
to not terminate? Shouldn't it stop gracefully when a shutdown command is
received? When the issue occurred, we had minimal activity on the master
server. There were no long running transactions being committed to the
master and streamed to the standby when the issue occurred. Even if there
were, could it cause the receiver process to not terminate?
3. How can we avoid running into this issue? Could we be missing some
step that is essential for a graceful shutdown of the service on a standby?
4. On one setup where the issue was seen, since -m fast option with
pg_ctl stop did not help in stopping the service, I used the -m
immediate option. The service stopped (I understand that this option
aborts the processes without a clean shutdown and so is not a safe option).
The service would not start back up. We saw the invalid record length
error during the startup (I guess this was expected since it wasn't a clean
shutdown). A pg_resetxlog helped recover from this issue. However, that
seems risky too since there is a chance of data inconsistency. What is the
best way to recover from this error if it occurs again?

 Thanks,
 Dipti



[GENERAL] Error stopping postgresql service on a standby server.

2012-08-31 Thread Dipti Bharvirkar
Hi,

In our project, we use Postgres 9.1.3 version and asynchronous streaming
replication.
In recent times, on couple of our setups, we saw issues stopping Postgres
service on the standby server after streaming replication was setup.

The command service postgresql stop returned with a failure message. We
use pg_ctl stop -D '$PGDATA' -s -m fast in the Postgres service script to
stop the server.
To see if there were some active client connections that were causing a
failure in stopping Postgres service, I ran the query SELECT * FROM
pg_stat_activity;.
It failed with the following error: psql: FATAL:  the database system is
shutting down

ps -ef | grep postgres returned the following:
postgres 14033 1  0 Aug28 ?00:00:01
/usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 14044 14033  0 Aug28 ?00:00:00 postgres: logger process
postgres 14046 14033  0 Aug28 ?00:00:00 postgres: writer process
postgres 14047 14033  0 Aug28 ?00:00:00 postgres: stats collector
process
postgres 14912 14033  0 Aug28 ?00:00:00 postgres: wal receiver
process
root 31519  3003  0 06:18 pts/200:00:00 grep postgres

netstat -anp | grep 5432 returns the following:
tcp0  0 0.0.0.0:54320.0.0.0:*
LISTEN  14033/postmaster
tcp0  0 127.0.0.1:5432  127.0.0.1:60597
TIME_WAIT   -
tcp0  0 127.0.0.1:5432  127.0.0.1:60589
TIME_WAIT   -
tcp67288  0 1.1.1.1:61500 http://47.11.49.176:61500  2
.2.2.2:5432 http://47.11.49.190:5432   ESTABLISHED
14912/postgres: wal

I had a few queries based on some of the observations -

   1. On one of the setups where similar issue was observed, we stopped
   Postgres service on the master server. As a result of this, the sender
   process on the master server and consequently the receiver process on
   standby stopped. After this, Postgres service could successfully be stopped
   on the standby server. This fact coupled with the output of the two
   commands mentioned above makes me believe that it is the wal receiver
   process that is not getting terminated because of which the Postgres
   service on standby server does not stop. Is this assumption right?
   2. If yes, what could be the possible cause for the receiver process to
   not terminate? Shouldn't it stop gracefully when a shutdown command is
   received? When the issue occurred, we had minimal activity on the master
   server. There were no long running transactions being committed to the
   master and streamed to the standby when the issue occurred. Even if there
   were, could it cause the receiver process to not terminate?
   3. How can we avoid running into this issue? Could we be missing some
   step that is essential for a graceful shutdown of the service on a standby?
   4. On one setup where the issue was seen, since -m fast option with
   pg_ctl stop did not help in stopping the service, I used the -m
   immediate option. The service stopped (I understand that this option
   aborts the processes without a clean shutdown and so is not a safe option).
   The service would not start back up. We saw the invalid record length
   error during the startup (I guess this was expected since it wasn't a clean
   shutdown). A pg_resetxlog helped recover from this issue. However, that
   seems risky too since there is a chance of data inconsistency. What is the
   best way to recover from this error if it occurs again?

Thanks,
Dipti