Hello, I tried to run pg_basebackup. Return value is 1.
How to find out its reason? (I suspect that some wal after backup is missing- but how to find out the real reason? How to fix it?) thanks, Andrej --------------details: environment: CentOS 6.7, postgres 9.5.1 ( PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit) I tried 2 forms of pg_basebackup (-X fetch and -X stream). Both were issued from a script: # su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP} -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X fetch" 2>${LOG_FILE}.stderr >> ${LOG_FILE} # echo $? 1 <--------------pg_basebackup failed! # cat log.stderr # cat /var/log/cluster/geo_repair.log.err transaction log start point: 0/E3000028 on timeline 1 WARNING: skipping special file "./pg_hba.conf" WARNING: skipping special file "./pg_hba.conf.save" transaction log end point: 0/E30000F8 pg_basebackup: base backup completed <------------------no reason for pg_basebackup failure! # cp /tmp/pg_hba.conf /tmp/postgresql.conf /pg_data/ # su - postgres -c "/usr/pgsql-9.5/bin/pg_ctl -D /pg_data/ start" # tail /pg_data/pg_log/postgresql-Fri.log `pg_xlog/0000000100000000000000E2' -> `../backups/arc/0000000100000000000000E2' 2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING: skipping special file "./pg_hba.conf" 2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING: skipping special file "./pg_hba.conf.save" <---------------recorded in pg_log on master node and copied by pg_basebackup (note time difference between two servers) 2016-04-15 23:15:02 CEST:@:[23321] LOG: database system was interrupted; last known up at 2016-04-15 23:15:10 CEST 2016-04-15 23:15:02 CEST:postgres@postgres:[23329] FATAL: the database system is starting up 2016-04-15 23:15:03 CEST:@:[23321] LOG: entering standby mode 2016-04-15 23:15:03 CEST:@:[23321] LOG: database system was not properly shut down; automatic recovery in progress <---------something missing from pg_basebackup 2016-04-15 23:15:03 CEST:@:[23321] LOG: redo starts at 0/E3000028 2016-04-15 23:15:03 CEST:@:[23321] LOG: consistent recovery state reached at 0/E4000000 2016-04-15 23:15:03 CEST:@:[23295] LOG: database system is ready to accept read only connections 2016-04-15 23:15:03 CEST:@:[23356] LOG: started streaming WAL from primary at 0/E4000000 on timeline 1 -------second trial # su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP} -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X stream" # echo $? 1 # cat /var/log/cluster/geo_repair.log.err transaction log start point: 0/E5000028 on timeline 1 pg_basebackup: starting background WAL receiver WARNING: skipping special file "./pg_hba.conf" WARNING: skipping special file "./pg_hba.conf.save" transaction log end point: 0/E50000F8 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: could not wait for child process: No child processes <----what does this mean? I think it failed to start process to fetching wal logs created during backup: but neither on master node neither on pg_basebackup output here is any information about reason.. (max_wal_senders on master is 10: I see no reason to fail). postgres logs: `pg_xlog/0000000100000000000000E4' -> `../backups/arc/0000000100000000000000E4' 2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING: skipping special file "./pg_hba.conf" 2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING: skipping special file "./pg_hba.conf.save" 2016-04-15 23:35:01 CEST:@:[28926] LOG: database system was interrupted; last known up at 2016-04-15 23:35:09 CEST 2016-04-15 23:35:01 CEST:postgres@postgres:[28938] FATAL: the database system is starting up 2016-04-15 23:35:02 CEST:@:[28926] LOG: entering standby mode 2016-04-15 23:35:02 CEST:@:[28926] LOG: database system was not properly shut down; automatic recovery in progress <------------this means something missing from pg_basebackup 2016-04-15 23:35:02 CEST:@:[28926] LOG: redo starts at 0/E5000028 2016-04-15 23:35:02 CEST:@:[28926] LOG: consistent recovery state reached at 0/E6000000 2016-04-15 23:35:02 CEST:@:[28904] LOG: database system is ready to accept read only connections 2016-04-15 23:35:02 CEST:@:[28989] LOG: started streaming WAL from primary at 0/E6000000 on timeline 1 postgres params on master node: log_line_prefix = '%t:%u@%d:[%p] ' logging_collector = on wal_buffers = 16MB max_wal_size = 200MB log_temp_files = 1MB max_connections = 170 shared_buffers = 512MB effective_cache_size = 1500MB work_mem = 48MB log_lock_waits = on log_min_duration_statement = 10000 shared_preload_libraries = 'pg_stat_statements' include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA wal_level = hot_standby archive_mode = on max_wal_senders = 10 hot_standby = on wal_keep_segments = 128 archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f' wal_receiver_status_interval = 2 max_standby_streaming_delay = -1 max_standby_archive_delay = -1 restart_after_crash = off hot_standby_feedback = on