Hello, my setup is: 1 master 1 synchronous slave (up and running) 2 asynchronous slave (up and running) Now I'm setting up asynchronous slave: first step of this setup is pg_basebackup connecting to master.
Indirections are because it is encapsulated in a script. This script is aimed to serve for automated replication recovery. This script is launched by crm_mon -d daemon. You are right to check without indirections: now I tried to run the command directly from command line without variables- it works fine... This means: no problem in pg_basebackup itself but some problem in environment of process launched from crm_mon daemon. Thanks for your hint (remove indirections). Pg_basebackup works fine when launched from command-line.. Unfortenutely I have still no clue how to solve the issue- su works. I have to find out why pg_basebackup cannot fork when launched from crm_mon. Best Regards, Andrej 2016-04-16 1:17 GMT+02:00 Adrian Klaver <adrian.kla...@aklaver.com>: > On 04/15/2016 03:28 PM, Andrej Vanek wrote: > >> Hello, >> >> I tried to run pg_basebackup. Return value is 1. >> >> How to find out its reason? >> (I suspect that some wal after backup is missing- but how to find >> out the real reason? How to fix it?) >> > > First it is not clear to me where you are taking the backup from, the > master or the standby? > > Second there is a lot of redirection going on. What happens if you run the > pg_basebackup directly (without doing su - postgres ...) and use hardcoded > values instead of shell variables? > > > >> thanks, Andrej >> --------------details: >> environment: CentOS 6.7, postgres 9.5.1 >> ( PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 >> 20120313 (Red Hat 4.4.7-16), 64-bit) >> >> I tried 2 forms of pg_basebackup (-X fetch and -X stream). Both were >> issued from a script: >> # su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP} >> -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X fetch" 2>${LOG_FILE}.stderr >> >> ${LOG_FILE} >> # echo $? >> 1 <--------------pg_basebackup failed! >> # cat log.stderr >> # cat /var/log/cluster/geo_repair.log.err >> transaction log start point: 0/E3000028 on timeline 1 >> WARNING: skipping special file "./pg_hba.conf" >> WARNING: skipping special file "./pg_hba.conf.save" >> transaction log end point: 0/E30000F8 >> pg_basebackup: base backup completed <------------------no >> reason for pg_basebackup failure! >> # cp /tmp/pg_hba.conf /tmp/postgresql.conf /pg_data/ >> # su - postgres -c "/usr/pgsql-9.5/bin/pg_ctl -D /pg_data/ start" >> # tail /pg_data/pg_log/postgresql-Fri.log >> `pg_xlog/0000000100000000000000E2' -> >> `../backups/arc/0000000100000000000000E2' >> 2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING: skipping >> special file "./pg_hba.conf" >> 2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING: skipping >> special file "./pg_hba.conf.save" <---------------recorded in >> pg_log on master node and copied by pg_basebackup (note time difference >> between two servers) >> 2016-04-15 23:15:02 CEST:@:[23321] LOG: database system was >> interrupted; last known up at 2016-04-15 23:15:10 CEST >> 2016-04-15 23:15:02 CEST:postgres@postgres:[23329] FATAL: the database >> system is starting up >> 2016-04-15 23:15:03 CEST:@:[23321] LOG: entering standby mode >> 2016-04-15 23:15:03 CEST:@:[23321] LOG: database system was not >> properly shut down; automatic recovery in progress <---------something >> missing from pg_basebackup >> 2016-04-15 23:15:03 CEST:@:[23321] LOG: redo starts at 0/E3000028 >> 2016-04-15 23:15:03 CEST:@:[23321] LOG: consistent recovery state >> reached at 0/E4000000 >> 2016-04-15 23:15:03 CEST:@:[23295] LOG: database system is ready to >> accept read only connections >> 2016-04-15 23:15:03 CEST:@:[23356] LOG: started streaming WAL from >> primary at 0/E4000000 on timeline 1 >> -------second trial >> # su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP} >> -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X stream" >> # echo $? >> 1 >> # cat /var/log/cluster/geo_repair.log.err >> transaction log start point: 0/E5000028 on timeline 1 >> pg_basebackup: starting background WAL receiver >> WARNING: skipping special file "./pg_hba.conf" >> WARNING: skipping special file "./pg_hba.conf.save" >> transaction log end point: 0/E50000F8 >> pg_basebackup: waiting for background process to finish streaming ... >> pg_basebackup: could not wait for child process: No child processes >> <----what does this mean? I think it failed to start process to >> fetching wal logs created during backup: but neither on master node >> neither on pg_basebackup output here is any information about reason.. >> (max_wal_senders on master is 10: I see no reason to fail). >> >> postgres logs: >> `pg_xlog/0000000100000000000000E4' -> >> `../backups/arc/0000000100000000000000E4' >> 2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING: skipping >> special file "./pg_hba.conf" >> 2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING: skipping >> special file "./pg_hba.conf.save" >> 2016-04-15 23:35:01 CEST:@:[28926] LOG: database system was >> interrupted; last known up at 2016-04-15 23:35:09 CEST >> 2016-04-15 23:35:01 CEST:postgres@postgres:[28938] FATAL: the database >> system is starting up >> 2016-04-15 23:35:02 CEST:@:[28926] LOG: entering standby mode >> 2016-04-15 23:35:02 CEST:@:[28926] LOG: database system was not >> properly shut down; automatic recovery in progress <------------this >> means something missing from pg_basebackup >> 2016-04-15 23:35:02 CEST:@:[28926] LOG: redo starts at 0/E5000028 >> 2016-04-15 23:35:02 CEST:@:[28926] LOG: consistent recovery state >> reached at 0/E6000000 >> 2016-04-15 23:35:02 CEST:@:[28904] LOG: database system is ready to >> accept read only connections >> 2016-04-15 23:35:02 CEST:@:[28989] LOG: started streaming WAL from >> primary at 0/E6000000 on timeline 1 >> >> postgres params on master node: >> log_line_prefix = '%t:%u@%d:[%p] ' >> logging_collector = on >> wal_buffers = 16MB >> max_wal_size = 200MB >> log_temp_files = 1MB >> max_connections = 170 >> shared_buffers = 512MB >> effective_cache_size = 1500MB >> work_mem = 48MB >> log_lock_waits = on >> log_min_duration_statement = 10000 >> shared_preload_libraries = 'pg_stat_statements' >> include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA >> wal_level = hot_standby >> archive_mode = on >> max_wal_senders = 10 >> hot_standby = on >> wal_keep_segments = 128 >> archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f' >> wal_receiver_status_interval = 2 >> max_standby_streaming_delay = -1 >> max_standby_archive_delay = -1 >> restart_after_crash = off >> hot_standby_feedback = on >> >> > > -- > Adrian Klaver > adrian.kla...@aklaver.com >