Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Josh Berkus wrote:
 Here's a way to trap yourself:
 
 (1) Set up an HS/SR master
 (2) pg_start_backup on the master
 (3) clone the master to 1 or more slaves
 (4) Fast shutdown the master (without pg_stop_backup)
 (5) Restart the master
 (6) Bring up the slaves
 
 Result: the slaves will come up fine in recovery mode.  However, they
 will never switch over to HS mode or start SR.  You will not be able to
 pg_stop_backup() on the master.  At this point, you have no option but
 to shut down the slaves and re-clone.
 
 The only reason why this is somewhat problematic for users is that you
 will not get any messages from the master or the slaves to indicate why
 they won't switch modes.  So I can imagine someone wasting a lot of time
 troubleshooting the wrong problems.
 
 Suggested resolution: I don't think there's any logical fix for this
 case; it should just be added to the docs as a failure/troubleshooting
 condition.

Hmm, we could throw an error in the standby, when we see a shutdown
checkpoint while we're waiting for an end-backup record. If the database
was shut down before pg_stop_backup(), we know that the backup was
cancelled and the end-backup record we're waiting for will never arrive.
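
A minimal sketch of that check, written in Python for illustration only.
It is not the committed patch: the record labels ("data", "end_backup",
"shutdown_checkpoint") and the function name are hypothetical, and WAL is
modelled as a plain sequence of record kinds.

```python
from typing import Iterable, Optional


class BackupCancelledError(Exception):
    """The end-backup record the standby is waiting for will never arrive."""


def replay_until_consistent(records: Iterable[str]) -> Optional[int]:
    """Scan WAL record kinds in order while waiting for the end-backup record.

    Returns the index of the end-backup record, or None if the stream ran
    out before either deciding record was seen (the standby keeps waiting).
    """
    for index, kind in enumerate(records):
        if kind == "end_backup":
            # pg_stop_backup() completed on the master: the backup is usable.
            return index
        if kind == "shutdown_checkpoint":
            # The master was shut down before pg_stop_backup(), so the backup
            # was cancelled. Fail now instead of waiting forever.
            raise BackupCancelledError(
                "shutdown checkpoint seen while waiting for end-backup record; "
                "take a fresh base backup"
            )
    return None
```

The point of the sketch is only the ordering: whichever of the two records
shows up first decides the outcome, so the standby can fail early with a
useful message.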

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Fujii Masao
On Tue, Apr 27, 2010 at 4:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Hmm, we could throw an error in the standby, when we see a shutdown
 checkpoint while we're waiting for an end-backup record. If the database
 was shut down before pg_stop_backup(), we know that the backup was
 cancelled and the end-backup record we're waiting for will never arrive.

Sounds good. This would work fine even if an immediate shutdown is done
instead since the primary ends up generating a shutdown checkpoint record
when restarting.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Fujii Masao wrote:
 On Tue, Apr 27, 2010 at 4:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Hmm, we could throw an error in the standby, when we see a shutdown
 checkpoint while we're waiting for an end-backup record. If the database
 was shut down before pg_stop_backup(), we know that the backup was
 cancelled and the end-backup record we're waiting for will never arrive.
 
 Sounds good. This would work fine even if an immediate shutdown is done
 instead since the primary ends up generating a shutdown checkpoint record
 when restarting.

Yep. I've committed a patch to do that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Robert Haas
On Tue, Apr 27, 2010 at 5:25 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Yep. I've committed a patch to do that.

Is there no way for the slave to recover from this situation?

...Robert



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Fujii Masao
On Tue, Apr 27, 2010 at 8:07 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 27, 2010 at 5:25 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Yep. I've committed a patch to do that.

 Is there no way for the slave to recover from this situation?

Probably Yes. You would need to take a fresh base backup and
restart the slave from it.

On second thought, seeing a shutdown checkpoint while waiting for the
end-backup record mostly means that the database has already reached
a consistent state. We might be able to relax the error check.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 12:25 +0300, Heikki Linnakangas wrote:
 Fujii Masao wrote:
  On Tue, Apr 27, 2010 at 4:19 PM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com wrote:
  Hmm, we could throw an error in the standby, when we see a shutdown
  checkpoint while we're waiting for an end-backup record. If the database
  was shut down before pg_stop_backup(), we know that the backup was
  cancelled and the end-backup record we're waiting for will never arrive.
  
  Sounds good. This would work fine even if an immediate shutdown is done
  instead since the primary ends up generating a shutdown checkpoint record
  when restarting.
 
 Yep. I've committed a patch to do that.

We should be able to do this earlier in the run.

If pg_stop_backup() is run it creates the .backup file in the archive.
In the absence of that file, we should be able to work out that
pg_stop_backup() was not run. Almost, because we support starting
recovery without needing to run start/stop backup. If we introduced a
special option for that in recovery.conf it would be much simpler to
fail if the file were unavailable.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Hmm, we could throw an error in the standby, when we see a shutdown
 checkpoint while we're waiting for an end-backup record. If the database
 was shut down before pg_stop_backup(), we know that the backup was
 cancelled and the end-backup record we're waiting for will never arrive.

Isn't the above statement complete nonsense?  There's nothing to stop
the DBA from issuing pg_stop_backup() after he restarts the master.

regards, tom lane



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Hmm, we could throw an error in the standby, when we see a shutdown
 checkpoint while we're waiting for an end-backup record. If the database
 was shut down before pg_stop_backup(), we know that the backup was
 cancelled and the end-backup record we're waiting for will never arrive.
 
 Isn't the above statement complete nonsense?  There's nothing to stop
 the DBA from issuing pg_stop_backup() after he restarts the master.

pg_stop_backup() can't be called if there's no backup in progress.
Restart cancels it.
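
As a toy illustration of that rule (a Python sketch under simplifying
assumptions, not PostgreSQL code: the class, its method names and the WAL
labels are hypothetical, and a restart is modelled as simply writing a
shutdown checkpoint):

```python
from typing import List


class ToyMaster:
    """Tracks only whether a backup is in progress and which records were written."""

    def __init__(self) -> None:
        self.backup_in_progress = False
        self.wal: List[str] = []

    def start_backup(self) -> None:
        self.backup_in_progress = True

    def stop_backup(self) -> None:
        if not self.backup_in_progress:
            # Nothing to stop: no end-backup record can be written.
            raise RuntimeError("a backup is not in progress")
        self.backup_in_progress = False
        self.wal.append("end_backup")

    def restart(self) -> None:
        # A restart cancels any backup that was in progress.
        self.backup_in_progress = False
        self.wal.append("shutdown_checkpoint")
```

After start_backup() followed by restart(), stop_backup() raises, so the
end-backup record can never appear after that shutdown checkpoint.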

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Robert Haas wrote:
 On Tue, Apr 27, 2010 at 5:25 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Yep. I've committed a patch to do that.
 
 Is there no way for the slave to recover from this situation?

No, it will never open up for hot standby, and it will error at the end
of recovery anyway. This just makes it happen earlier and with a smarter
error message.

In theory, if the data directory was fully copied by the time of the
shutdown/crash and the only thing that was missing was pg_stop_backup(),
all the data is there, so you could get a consistent database. But we
can't know if it's consistent or not, so we don't allow it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Simon Riggs wrote:
 If pg_stop_backup() is run it creates the .backup file in the archive.
 In the absence of that file, we should be able to work out that
 pg_stop_backup() was not run. 

It's just as likely that the file is there even though the backup didn't
finish, though.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 Isn't the above statement complete nonsense?  There's nothing to stop
 the DBA from issuing pg_stop_backup() after he restarts the master.

 pg_stop_backup() can't be called if there's no backup in progress.
 Restart cancels it.

Doh, right (not enough caffeine yet).

Given that, I concur this change is a good idea.

regards, tom lane



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 18:13 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  If pg_stop_backup() is run it creates the .backup file in the archive.
  In the absence of that file, we should be able to work out that
  pg_stop_backup() was not run. 
 
 It's just as likely that the file is there even though the backup didn't
 finish, though.

It's possible, but not likely. It would need to break at a very specific
place for that to be the case. Whereas the test I explained would work
for about 99% of the time between start and stop backup, except for the
caveat I explained also. I'm not sure that pointing out a minor hole
stops it being a worthwhile test? Surely if you care to fix the problem
then a better test can only be a good thing?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2010-04-27 at 18:13 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 If pg_stop_backup() is run it creates the .backup file in the archive.
 In the absence of that file, we should be able to work out that
 pg_stop_backup() was not run. 
 It's just as likely that the file is there even though the backup didn't
 finish, though.
 
 It's possible, but not likely. It would need to break at a very specific
 place for that to be the case. Whereas the test I explained would work
 for about 99% of the time between start and stop backup, except for the
 caveat I explained also.

I don't understand how you arrived at that figure. Roughly speaking,
there are two possibilities: backup_label is backed up before the bulk of
the data in base-directory or tablespaces, in which case it will almost
certainly be included in the backup, or it will be backed up after the
bulk of the data, in which case it will almost certainly not be included
if the backup is stopped prematurely. I don't know which is more common,
but both seem plausible.

 I'm not sure that pointing out a minor hole
 stops it being a worthwhile test? Surely if you care to fix the problem
 then a better test can only be a good thing?

Yeah, it might be worthwhile if it's not a lot of code.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 20:14 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Tue, 2010-04-27 at 18:13 +0300, Heikki Linnakangas wrote:
  Simon Riggs wrote:
  If pg_stop_backup() is run it creates the .backup file in the archive.
  In the absence of that file, we should be able to work out that
  pg_stop_backup() was not run. 
  It's just as likely that the file is there even though the backup didn't
  finish, though.
  
  It's possible, but not likely. It would need to break at a very specific
  place for that to be the case. Whereas the test I explained would work
  for about 99% of the time between start and stop backup, except for the
  caveat I explained also.
 
 I don't understand how you arrived at that figure. 

You're talking about the backup_label file, I'm talking about
the .backup file in the archive.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2010-04-27 at 20:14 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 On Tue, 2010-04-27 at 18:13 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 If pg_stop_backup() is run it creates the .backup file in the archive.
 In the absence of that file, we should be able to work out that
 pg_stop_backup() was not run. 
 It's just as likely that the file is there even though the backup didn't
 finish, though.
 It's possible, but not likely. It would need to break at a very specific
 place for that to be the case. Whereas the test I explained would work
 for about 99% of the time between start and stop backup, except for the
 caveat I explained also.
 I don't understand how you arrived at that figure. 
 
 You're talking about the backup_label file, I'm talking about
 the .backup file in the archive.

Oh, the backup history file. We stopped relying on that with the
introduction of the end-of-backup record, to make life easier for
streaming replication, and because it's simpler anyway. I don't think we
should go back to it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-27 Thread Fujii Masao
On Wed, Apr 28, 2010 at 4:12 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Simon Riggs wrote:
 On Tue, 2010-04-27 at 20:14 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 On Tue, 2010-04-27 at 18:13 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 If pg_stop_backup() is run it creates the .backup file in the archive.
 In the absence of that file, we should be able to work out that
 pg_stop_backup() was not run.
 It's just as likely that the file is there even though the backup didn't
 finish, though.
 It's possible, but not likely. It would need to break at a very specific
 place for that to be the case. Whereas the test I explained would work
 for about 99% of the time between start and stop backup, except for the
 caveat I explained also.
 I don't understand how you arrived at that figure.

 You're talking about the backup_label file, I'm talking about
 the .backup file in the archive.

 Oh, the backup history file. We stopped relying on that with the
 introduction of the end-of-backup record, to make life easier for
 streaming replication, and because it's simpler anyway. I don't think we
 should go back to it.

Right.

When restore_command is not given, the backup history file would be
unavailable in the standby. We cannot treat the absence of the file
as proof that pg_stop_backup() was not run.
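
Roughly, the argument can be sketched as a three-valued check (Python,
illustration only; the function name and its parameters are hypothetical,
and the "file absent means not finished" branch is just the heuristic
discussed above, not a guarantee):

```python
from typing import Optional


def backup_completed(history_file_found: bool,
                     restore_command: Optional[str]) -> Optional[bool]:
    """Guess whether pg_stop_backup() was run, from the backup history file.

    Returns True or False when the file gives a usable hint, and None when
    its absence tells us nothing.
    """
    if history_file_found:
        return True
    if not restore_command:
        # The standby has no way to fetch the file from the archive, so a
        # missing file is not evidence either way.
        return None
    # The archive is reachable and the file is not there: heuristically,
    # pg_stop_backup() was not run.
    return False
```

A streaming-only standby always lands in the None branch, which is why the
test cannot replace the end-of-backup record.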

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] Wierd quirk of HS/SR, probably not fixable

2010-04-04 Thread Josh Berkus
Hackers,

Here's a way to trap yourself:

(1) Set up an HS/SR master
(2) pg_start_backup on the master
(3) clone the master to 1 or more slaves
(4) Fast shutdown the master (without pg_stop_backup)
(5) Restart the master
(6) Bring up the slaves

Result: the slaves will come up fine in recovery mode.  However, they
will never switch over to HS mode or start SR.  You will not be able to
pg_stop_backup() on the master.  At this point, you have no option but
to shut down the slaves and re-clone.

The only reason why this is somewhat problematic for users is that you
will not get any messages from the master or the slaves to indicate why
they won't switch modes.  So I can imagine someone wasting a lot of time
troubleshooting the wrong problems.

Suggested resolution: I don't think there's any logical fix for this
case; it should just be added to the docs as a failure/troubleshooting
condition.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com
