Howdy,

We use NetApp FlexClones whenever we need to move our DB between
machines.

One specific case where we do that is when we're creating a new streaming 
replication target.

The basic steps we're using are:
pg_start_backup();
<flex clone within the netapp>
pg_stop_backup();

The problem I'm seeing is that periodically the backup_label is empty,
which means I can't start the new standby.

I believe that since the clone happens entirely within the SAN, this
file hasn't been fsynced to disk by the time we take the snapshot.

One option would be to run "sync" prior to the clone. However, that
seems like a heavy operation, and it's slightly more complicated to
script: we'd need a user account on the system to sudo, rather than
just connecting to the DB to issue pg_start_backup(...).

Another option is to add pg_fsync(fileno(fp)) after the fflush() when
creating the file. (fsync doesn't imply fflush: fflush only hands the
stdio buffer to the kernel, and fsync then forces the kernel's copy to
disk, so both calls are needed.)
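
For illustration, here is a minimal standalone sketch of the write
sequence I have in mind. It uses plain POSIX fsync() rather than
PostgreSQL's pg_fsync() wrapper, and the function name
write_file_durably is made up for this example (it is not anything in
the PostgreSQL tree). It assumes len > 0.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#endif

#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes (len > 0) to path and force them
 * to stable storage before returning.  Returns 0 on success, -1 on error.
 */
int
write_file_durably(const char *path, const char *data, size_t len)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;

    if (fwrite(data, len, 1, fp) != 1 ||  /* caller's data -> stdio buffer */
        fflush(fp) != 0 ||                /* stdio buffer -> kernel cache */
        fsync(fileno(fp)) != 0)           /* kernel cache -> storage */
    {
        fclose(fp);
        return -1;
    }

    /* fclose() can still report an error, so check it too. */
    return (fclose(fp) == 0) ? 0 : -1;
}
```

Without the fsync() step the data can sit in the kernel's cache for a
while, which is the window in which I think a storage-level snapshot
sees an empty file.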

I think this type of snapshot is fairly common. I've been doing them
since 2000 with EMC, and I'm sure that most SAN vendors support it.

I also suspect that this type of problem could show up on AWS if you
tried to use their EBS snapshots.


Attached is a patch for the suggested change.


Dave
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 9311,9316 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9311,9317 ----
  								BACKUP_LABEL_FILE)));
  			if (fwrite(labelfbuf.data, labelfbuf.len, 1, fp) != 1 ||
  				fflush(fp) != 0 ||
+ 				pg_fsync(fileno(fp)) != 0 ||
  				ferror(fp) ||
  				FreeFile(fp))
  				ereport(ERROR,