On Mon, Jun 10, 2013 at 12:35 PM, Niels Kristian Schjødt <nielskrist...@autouncle.com> wrote:
> On 10/06/2013 at 16:36, bricklen <brick...@gmail.com> wrote:
>
> > On Mon, Jun 10, 2013 at 4:29 AM, Niels Kristian Schjødt
> > <nielskrist...@autouncle.com> wrote:
> >
> > > 2013-06-10 11:21:45 GMT FATAL: could not connect to the primary server:
> > > could not connect to server: No route to host
> > > Is the server running on host "192.168.0.4" and accepting
> > > TCP/IP connections on port 5432?
> >
> > Did anything get changed on the standby or master around the time this
> > message started occurring?
> > On the master, what do the following show?
> >     show port;
> >     show listen_addresses;
> >
> > The master's IP is still 192.168.0.4?
> >
> > Have you tried connecting to the master using something like:
> >     psql -h 192.168.0.4 -p 5432 -U postgres -d postgres
> >
> > Does that throw a useful error or warning?
>
> It turned out that the switch port that the server was connected to was
> faulty, and hence no successful connection between master and slave was
> established. This resulted in pg_xlog building up very fast, because our
> system performs a lot of changes on the data we store.
>
> I ended up running pg_archivecleanup on the master to get some space freed
> urgently. Then I got the switch replaced with a new one. Now I'm trying to
> set up the streaming replication from scratch again, but with no luck.
>
> I can't seem to figure out which steps I need to do to get the standby
> server wiped and started as a streaming replica again from scratch. I
> tried to follow the steps, from step 6, in
> http://wiki.postgresql.org/wiki/Streaming_Replication but the process
> seems to fail when I reach the point where I run
> psql -c "SELECT pg_stop_backup()". It just says:
>
> NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to
> be archived
> WARNING: pg_stop_backup still waiting for all required WAL segments to be
> archived (60 seconds elapsed)
> HINT: Check that your archive_command is executing properly.
> pg_stop_backup can be canceled safely, but the database backup will not be
> usable without all the WAL segments.
> (...)
>
> When looking at ps aux on the master, I see the following:
>
> postgres 30930 0.0 0.0 98412 1632 ? Ss 15:59 0:02 postgres:
> archiver process failed on 0000000200000E1B000000A9
>
> The file mentioned is the one that it was about to archive when the
> standby server failed. Somehow it must still be trying to "catch up" from
> that file, which of course isn't there any more, since I had to remove
> those files in order to get more space on the HDD. Instead of trying to
> catch up from the last file that succeeded, I want it to start over from
> scratch with the replication. I just don't know how.

That is because you manually removed some xlog files, and you shouldn't
ever do that. The archiver keeps retrying the segment it failed on, and
that segment no longer exists.

To "cancel" the archiving, the better way (IMHO) is to set archive_command
to a dummy command, like:

    archive_command = '/bin/true'

And reload PostgreSQL:

    psql -c "SELECT pg_reload_conf()"

With that, PostgreSQL will stop archiving, and so you'll **have no backup
at all**. Once some archives have been removed, you can restore your old
archive_command and reload the server again.

BTW, check why the archive_command is not working properly (look at PG's
log files). Is it because there is no space left on disk? If so, removing
some archives may work.

Regards,
-- 
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br level F!
www.dextra.com.br/postgres
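
To see which segment the archiver is stuck on, here is a minimal sketch. It assumes the archiver's process title has the same "archiver process failed on <segment>" form as the ps output quoted above. The function name failed_wal_segment is hypothetical and not part of PostgreSQL.

```shell
# Hypothetical helper: read ps-style lines on stdin and print the
# 24-character WAL segment name that the archiver reports as failed.
# Prints nothing if no such line is present.
failed_wal_segment() {
  sed -n 's/.*archiver process *failed on \([0-9A-F]\{24\}\).*/\1/p'
}

# Usage on the master (assumes the process title format shown above):
#   ps aux | failed_wal_segment
```

If the printed segment is no longer present in pg_xlog, the original archive_command will most likely keep failing on it, which is consistent with pg_stop_backup waiting indefinitely.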