Re: [PERFORM] URGENT issue: pg-xlog growing on master!

Matheus de Oliveira Mon, 10 Jun 2013 11:01:15 -0700

On Mon, Jun 10, 2013 at 12:35 PM, Niels Kristian Schjødt <
[email protected]> wrote:


>
> Den 10/06/2013 kl. 16.36 skrev bricklen <[email protected]>:
>
> On Mon, Jun 10, 2013 at 4:29 AM, Niels Kristian Schjødt <
> [email protected]> wrote:
>
>>
>> 2013-06-10 11:21:45 GMT FATAL:  could not connect to the primary server:
>> could not connect to server: No route to host
>>                 Is the server running on host "192.168.0.4" and accepting
>>                 TCP/IP connections on port 5432?
>>
>
> Did anything get changed on the standby or master around the time this
> message started occurring?
> On the master, what do the following show?
> show port;
> show listen_addresses;
>
> The master's IP is still 192.168.0.4?
>
> Have you tried connecting to the master using something like:
> psql -h 192.168.0.4 -p 5432 -U postgres -d postgres
>
> Does that throw a useful error or warning?
>
>
>
> It turned out that the switch port that the server was connected to was
> faulty, and hence no successful connection between master and slave was
> established. This resulted in pg_xlog building up very fast, because our
> system performs a lot of changes on the data we store.
>
> I ended up running pg_archivecleanup on the master to get some space freed
> urgently. Then I had the switch replaced with a new one. Now I'm trying to
> set up the streaming replication from scratch again, but with no luck.
>
> I can't seem to figure out which steps I need to do, to get the standby
> server wiped and get it started as a streaming replication again from
> scratch. I tried to follow the steps, from step 6, in here
> http://wiki.postgresql.org/wiki/Streaming_Replication but the process
> seems to fail when I reach the point where I try to do a psql -c "SELECT
> pg_stop_backup()". It just says:
>
> NOTICE:  pg_stop_backup cleanup done, waiting for required WAL segments to
> be archived
> WARNING:  pg_stop_backup still waiting for all required WAL segments to be
> archived (60 seconds elapsed)
> HINT:  Check that your archive_command is executing properly.
>  pg_stop_backup can be canceled safely, but the database backup will not be
> usable without all the WAL segments.
> (...)
>
> When looking at ps aux on the master, I see the following:
>
> postgres 30930  0.0  0.0  98412  1632 ?        Ss   15:59   0:02 postgres:
> archiver process   failed on 0000000200000E1B000000A9
>
> The file mentioned is the one that it was about to archive when the
> standby server failed. Somehow it must still be trying to "catch up" from
> that file, which of course isn't there any more, since I had to remove those
> files in order to get more space on the HDD. Instead of trying to catch up
> from the last successfully archived file, I want it to start over from
> scratch with the replication - I just don't know how.
>
>
That is because you manually removed some xlog files, which you should never
do. To "cancel" the archiving, a better way (IMHO) is to set
archive_command to a dummy command, like:

    archive_command = '/bin/true'

And reload PostgreSQL:

    psql -c "SELECT pg_reload_conf()"

With that, PostgreSQL will stop archiving, so you'll **have no backup at
all** while the dummy command is in place. Once some archives have been
removed, you can set your old archive_command again and reload the server.
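
The switch to a dummy command and back can be scripted. The following is only a sketch under assumptions: the helper name set_archive_command and the `.bak` backup suffix are hypothetical (not part of PostgreSQL), and it assumes archive_command is set directly in a postgresql.conf-style file rather than in an included file.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): point every active
# archive_command line in a postgresql.conf-style file at a new command.
# Limitation: the new command must not contain single quotes.
set_archive_command() {
    conf=$1
    new_cmd=$2
    tmp=$(mktemp) || return 1
    # Keep a copy so the old archive_command can be restored later.
    cp "$conf" "$conf.bak" || { rm -f "$tmp"; return 1; }
    NEW_CMD=$new_cmd awk '
        # Replace uncommented archive_command settings.
        /^[ \t]*archive_command[ \t]*=/ {
            print "archive_command = \047" ENVIRON["NEW_CMD"] "\047"
            found = 1
            next
        }
        { print }
        # Append the setting if the file had no active one.
        END { if (!found) print "archive_command = \047" ENVIRON["NEW_CMD"] "\047" }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

It would be called with the path to the real configuration file and '/bin/true' as the second argument, followed by the pg_reload_conf() call shown above. Restoring the old command then amounts to copying the `.bak` file back and reloading once more.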

BTW, check why the archive_command is failing (look at PG's log files).
Is it because there is no space left on disk? If so, removing some old
archives may help.
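
Besides the log files, two quick checks can help here. This is a rough sketch: the function names are hypothetical, and it relies on the fact that segments waiting to be archived have a matching `.ready` file under pg_xlog/archive_status in the data directory.

```shell
#!/bin/sh
# Hypothetical helpers (names are not part of PostgreSQL).

# Count WAL segments still waiting for the archiver, i.e. the *.ready
# marker files in the given archive_status directory.
count_ready_segments() {
    status_dir=$1
    n=0
    for f in "$status_dir"/*.ready; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Print the available space, in kilobytes, on the filesystem that
# holds the given path (for example the archive destination).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, `count_ready_segments "$PGDATA/pg_xlog/archive_status"` (assuming PGDATA points at the data directory) shows how large the backlog is, and `free_kb` on the archive destination shows whether the disk there is full.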

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
