Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Wed, Mar 30, 2011 at 4:56 PM, Heikki Linnakangas wrote:
>> Attached patch reverts that. Comments?
>
> Looks good, committed.

Thanks!

> We could also improve the error message. If we haven't reached the
> end-of-backup location, we could say something along the lines of:
>
> ERROR: WAL ends before the end of online backup
> HINT: Online backup must be ended with pg_stop_backup(), and all the WAL up
> to that point must be available at recovery.

+1

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On 30.03.2011 09:25, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 11:39 AM, Fujii Masao wrote:
>> On Wed, Mar 30, 2011 at 12:54 AM, Heikki Linnakangas wrote:
>>> Hmm, why did we change that?
>>
>> I'm not sure, but I guess that's because I missed the case where crash
>> recovery starts from the backup :(
>>
>>> It seems like a mistake, the database is not consistent until you reach
>>> the backup stop location, whether or not you're doing archive recovery.
>>> +1 for reverting that, and backpatching it as well.
>>
>> Agreed.
>
> Attached patch reverts that. Comments?

Looks good, committed.

We could also improve the error message. If we haven't reached the
end-of-backup location, we could say something along the lines of:

ERROR: WAL ends before the end of online backup
HINT: Online backup must be ended with pg_stop_backup(), and all the WAL up
to that point must be available at recovery.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Wed, Mar 30, 2011 at 11:39 AM, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 12:54 AM, Heikki Linnakangas wrote:
>> Hmm, why did we change that?
>
> I'm not sure, but I guess that's because I missed the case where crash
> recovery starts from the backup :(
>
>> It seems like a mistake, the database is not consistent until you reach
>> the backup stop location, whether or not you're doing archive recovery.
>> +1 for reverting that, and backpatching it as well.
>
> Agreed.

Attached patch reverts that. Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: backup_stop_location_v1.patch (binary data)
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Wed, Mar 30, 2011 at 12:54 AM, Heikki Linnakangas wrote:
> Hmm, why did we change that?

I'm not sure, but I guess that's because I missed the case where crash
recovery starts from the backup :(

> It seems like a mistake, the database is not consistent until you reach
> the backup stop location, whether or not you're doing archive recovery.
> +1 for reverting that, and backpatching it as well.

Agreed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On 29.03.2011 14:27, Fujii Masao wrote:
> On Tue, Mar 29, 2011 at 6:46 PM, hubert depesz lubaczewski wrote:
>>> Did you use recovery.conf to start standalone PostgreSQL? If not,
>>> recovery doesn't check whether it reaches the recovery ending position
>>> or not. So I guess that is why no problem happened.
>>
>> no, i don't use.
>>
>> hmm .. i am nearly 100% certain that previous pgs did in fact check if
>> the end of recovery is reached.
>
> Yes. In 8.4, that was checked only when starting recovery from the backup
> (i.e., which includes backup_label and backup history file) without
> recovery.conf. But in 9.0, the behavior was changed so that only archive
> recovery (i.e., with recovery.conf) checks that. IIRC, we don't have a
> strong opinion about this change. Should we revert it, in order to make
> even crash recovery check whether it reaches the ending location?

Hmm, why did we change that? It seems like a mistake, the database is not
consistent until you reach the backup stop location, whether or not you're
doing archive recovery. +1 for reverting that, and backpatching it as well.

"pg_basebackup -x", which includes all the WAL required to restore in the
pg_xlog directory of the base backup itself, is also affected. Without the
check that you reach the end-of-backup, an aborted base backup will appear
to restore fine, even though some WAL segments are missing and the backup
is incomplete.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hubert depesz lubaczewski wrote:
> it worked. now the slave2 is working as stand alone.
>
> what does it tell us? will any work happening after checkpoint
> break it anyway?

I'm less sure about what will put it into a bad state again than I
was that an immediate checkpoint would put you into a good state.

I have a vague feeling that I've seen or heard something which
suggests that doing this during a spread checkpoint might be a
problem as things currently stand. I can't be more specific without
digging through code, and I'm pretty swamped at the moment.

-Kevin
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Tue, Mar 29, 2011 at 6:46 PM, hubert depesz lubaczewski wrote:
>> Did you use recovery.conf to start standalone PostgreSQL? If not,
>> recovery doesn't check whether it reaches the recovery ending position
>> or not. So I guess that is why no problem happened.
>
> no, i don't use.
>
> hmm .. i am nearly 100% certain that previous pgs did in fact check if
> the end of recovery is reached.

Yes. In 8.4, that was checked only when starting recovery from the backup
(i.e., which includes backup_label and backup history file) without
recovery.conf. But in 9.0, the behavior was changed so that only archive
recovery (i.e., with recovery.conf) checks that. IIRC, we don't have a
strong opinion about this change. Should we revert it, in order to make
even crash recovery check whether it reaches the ending location?

Of course, even if we do that, your problem is not solved at all. So I
think that the right direction is to implement the ability to easily take
a base backup from the standby, in 9.2.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Tue, Mar 29, 2011 at 11:13:07AM +0900, Fujii Masao wrote:
> Yes, it's intentional. In streaming replication, at first the master must
> stream a backup history file to the standby in order to let it know the
> recovery ending position. But streaming replication doesn't have the
> ability to send a text file, so we changed the code so that the recovery
> ending position was also written as a WAL record which can be streamed.

ok, this makes sense.

> BTW, in my system, I use another trick to take a base backup from the
> standby:
>
> (All of these operations are expected to be performed on the standby)
> (1) Run CHECKPOINT
> (2) Copy pg_control to temporary area
> (3) Take a base backup of $PGDATA
> (4) Copy back pg_control from temporary area to the backup taken in (3).
> (5) Calculate the recovery ending position from current pg_control in
>     $PGDATA by using pg_controldata
>
> When recovery starts from that backup, it doesn't automatically check
> whether it has reached the ending position or not. So the user needs to
> check that manually.
>
> Yeah, this trick is very fragile and complicated. I'd like to improve the
> way in 9.2.

I know about it, but I feel very worried about doing stuff like this -
i.e. meddling with internal files of pg.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Tue, Mar 29, 2011 at 11:20:48AM +0900, Fujii Masao wrote:
> On Tue, Mar 29, 2011 at 12:11 AM, hubert depesz lubaczewski wrote:
> > On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
> >> In 9.0, recovery doesn't read a backup history file. That FATAL error
> >> happens if recovery ends before it reads the WAL record which was
> >> generated by pg_stop_backup(). IOW, recovery gets the recovery ending
> >> location from WAL record not backup history file. Since you didn't run
> >> pg_stop_backup() and there is no WAL record containing the recovery
> >> ending location, you got that error.
> >>
> >> If you want to take hot backup from the standby, you need to do the
> >> procedure explained in
> >> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
> >
> > one more question. how come that I can use this backup to make
> > standalone pg, and it starts without any problem, but when I start it as
> > sr slave, let it run for some time, and then promote to standalone, it
> > breaks?
>
> Did you use recovery.conf to start standalone PostgreSQL? If not,
> recovery doesn't check whether it reaches the recovery ending position
> or not. So I guess that is why no problem happened.

no, i don't use.

hmm .. i am nearly 100% certain that previous pgs did in fact check if the
end of recovery is reached.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 05:29:22PM -0500, Kevin Grittner wrote:
> I have a theory. Can you try it in what would be the failure case,
> but run an explicit CHECKPOINT on the master, wait for
> pg_controldata to show that checkpoint on the slave, and (as soon as
> you see that) try to trigger the slave to come up in production?

=$ ( pg_controldata master/; pg_controldata slave2/ ) | grep "Latest checkpoint location:"
Latest checkpoint location: 0/2D58
Latest checkpoint location: 0/2C58

=$ psql -p 54001 -c "checkpoint"
CHECKPOINT

=$ ( pg_controldata master/; pg_controldata slave2/ ) | grep "Latest checkpoint location:"
Latest checkpoint location: 0/2E58
Latest checkpoint location: 0/2C58

... ~ 1.5 minutes later

=$ ( pg_controldata master/; pg_controldata slave2/ ) | grep "Latest checkpoint location:"
Latest checkpoint location: 0/2E58
Latest checkpoint location: 0/2E58

=$ touch /home/depesz/slave2/finish.recovery

it worked. now the slave2 is working as stand alone.

what does it tell us? will any work happening after checkpoint break it
anyway?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
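The manual polling in the message above (run CHECKPOINT on the master, then re-run pg_controldata until the slave shows the same checkpoint) could be automated roughly as follows. This is only a sketch: the function names are made up for illustration, the two reader callables are assumed to return the text printed by pg_controldata for the master and the slave, and "at or past" is used instead of strict equality as a guess at what is safe.

```python
import time
from typing import Callable


def wal_location_to_int(text: str) -> int:
    """Turn 'hi/lo' (two hex numbers) into one integer that sorts correctly."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def checkpoint_location(controldata_output: str) -> str:
    """Extract the 'Latest checkpoint location' value from pg_controldata output."""
    for line in controldata_output.splitlines():
        if line.startswith("Latest checkpoint location:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no 'Latest checkpoint location' line found")


def wait_for_checkpoint(
    read_master: Callable[[], str],
    read_slave: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the slave's latest checkpoint is at or past the master's."""
    target = wal_location_to_int(checkpoint_location(read_master()))
    deadline = time.monotonic() + timeout
    while True:
        current = wal_location_to_int(checkpoint_location(read_slave()))
        if current >= target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

In practice each reader would probably wrap something like subprocess.run(["pg_controldata", "slave2/"], capture_output=True, text=True, check=True).stdout, and the trigger file would be created only after the function returns True.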
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Tue, Mar 29, 2011 at 12:11 AM, hubert depesz lubaczewski wrote:
> On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
>> In 9.0, recovery doesn't read a backup history file. That FATAL error
>> happens if recovery ends before it reads the WAL record which was
>> generated by pg_stop_backup(). IOW, recovery gets the recovery ending
>> location from WAL record not backup history file. Since you didn't run
>> pg_stop_backup() and there is no WAL record containing the recovery
>> ending location, you got that error.
>>
>> If you want to take hot backup from the standby, you need to do the
>> procedure explained in
>> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
>
> one more question. how come that I can use this backup to make
> standalone pg, and it starts without any problem, but when I start it as
> sr slave, let it run for some time, and then promote to standalone, it
> breaks?

Did you use recovery.conf to start standalone PostgreSQL? If not,
recovery doesn't check whether it reaches the recovery ending position
or not. So I guess that is why no problem happened.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 9:19 PM, hubert depesz lubaczewski wrote:
> On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
>> In 9.0, recovery doesn't read a backup history file. That FATAL error
>> happens if recovery ends before it reads the WAL record which was
>> generated by pg_stop_backup(). IOW, recovery gets the recovery ending
>> location from WAL record not backup history file. Since you didn't run
>> pg_stop_backup() and there is no WAL record containing the recovery
>> ending location, you got that error.
>>
>> If you want to take hot backup from the standby, you need to do the
>> procedure explained in
>> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
>
> Is it intentional and/or does it serve some greater good?

Yes, it's intentional. In streaming replication, at first the master must
stream a backup history file to the standby in order to let it know the
recovery ending position. But streaming replication doesn't have the
ability to send a text file, so we changed the code so that the recovery
ending position was also written as a WAL record which can be streamed.

IIRC another reason is that it's more reliable to write down important
information like the recovery ending position to a WAL record than to a
backup history file.

> I mean - ability to make backups on slave without ever bothering master
> was pretty interesting.

Me, too. We would need to implement that in 9.2.

BTW, in my system, I use another trick to take a base backup from the
standby:

(All of these operations are expected to be performed on the standby)
(1) Run CHECKPOINT
(2) Copy pg_control to temporary area
(3) Take a base backup of $PGDATA
(4) Copy back pg_control from temporary area to the backup taken in (3).
(5) Calculate the recovery ending position from current pg_control in
    $PGDATA by using pg_controldata

When recovery starts from that backup, it doesn't automatically check
whether it has reached the ending position or not. So the user needs to
check that manually.

Yeah, this trick is very fragile and complicated. I'd like to improve the
way in 9.2.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
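For readers trying to picture the file shuffling in steps (2) to (4) of the trick above, here is a rough sketch on plain directories. It is not a backup tool: the function name and the scratch-file name are invented, a directory copy stands in for whatever backup method is actually used, and steps (1) and (5) are left to the operator. The only PostgreSQL-specific fact relied on is that pg_control lives in the global subdirectory of the data directory.

```python
import shutil
import tempfile
from pathlib import Path


def backup_with_saved_control(pgdata: Path, backup_dir: Path, scratch: Path) -> Path:
    """Steps (2)-(4): save pg_control, copy the data directory, restore the saved file."""
    saved = scratch / "pg_control.saved"  # hypothetical scratch-file name
    # Step (2): keep the pg_control that describes the state right after CHECKPOINT.
    shutil.copy2(pgdata / "global" / "pg_control", saved)
    # Step (3): take the base backup (a plain directory copy in this sketch).
    shutil.copytree(pgdata, backup_dir)
    # Step (4): overwrite the copied pg_control with the one saved in step (2).
    shutil.copy2(saved, backup_dir / "global" / "pg_control")
    return saved


# Demo on a throwaway directory tree; no PostgreSQL instance is involved.
root = Path(tempfile.mkdtemp())
pgdata = root / "pgdata"
(pgdata / "global").mkdir(parents=True)
(pgdata / "global" / "pg_control").write_bytes(b"control-at-checkpoint")
scratch = root / "scratch"
scratch.mkdir()
backup_with_saved_control(pgdata, root / "backup", scratch)
print((root / "backup" / "global" / "pg_control").read_bytes())
```

The point of step (4) is that the backup starts recovery from the checkpoint taken before the copy began, not from whatever pg_control happened to contain when it was copied mid-backup.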
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 05:43:15PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski wrote:
>
> > have you seen this mail -
> > http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php
>
> One more thing: Am I correct in understanding that you are trying to
> do a PITR-style backup without using pg_start_backup() and
> pg_stop_backup()? If so, why?

because this is backup on slave, and the point was to make the backup work
without *any* bothering master.

so far it worked fine. and generally even with 9.0 it still works, and
backup *can* be used to setup new pg instance. but it cannot be used to
make sr slave, which we could later on promote.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 05:29:22PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski wrote:
>
> > have you seen this mail -
> > http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php
>
> Ah, OK.
>
> I have a theory. Can you try it in what would be the failure case,
> but run an explicit CHECKPOINT on the master, wait for
> pg_controldata to show that checkpoint on the slave, and (as soon as
> you see that) try to trigger the slave to come up in production?

yes. will check, but it will happen in ~ 10 hours.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hubert depesz lubaczewski wrote:
> have you seen this mail -
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php

One more thing: Am I correct in understanding that you are trying to
do a PITR-style backup without using pg_start_backup() and
pg_stop_backup()? If so, why?

-Kevin
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hubert depesz lubaczewski wrote:
> have you seen this mail -
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php

Ah, OK.

I have a theory. Can you try it in what would be the failure case,
but run an explicit CHECKPOINT on the master, wait for
pg_controldata to show that checkpoint on the slave, and (as soon as
you see that) try to trigger the slave to come up in production?

-Kevin
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 04:53:37PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski wrote:
> > On Mon, Mar 28, 2011 at 04:24:23PM -0500, Kevin Grittner wrote:
> >> hubert depesz lubaczewski wrote:
> >>
> >>> how come that I can use this backup to make standalone pg, and
> >>> it starts without any problem, but when I start it as sr slave,
> >>> let it run for some time, and then promote to standalone, it
> >>> breaks?
> >>
> >> We need more detail to make much of a guess about that.
> >
> > what details can I provide?
> >
> > I can provide scripts that I use to test it, and also access to
> > test machine that I was testing it on.
>
> For starters, what do you mean by "it breaks"? What, exactly
> happens? What is in the logs? What version of PostgreSQL? Are you
> using pg_standby or custom scripts?

hmm ... i thought that all details are in the first mail in thread. I can
probably repost it, but it seems to me that it includes all of the
information - which scripts, how it fails, in what cases, and what exactly
i'm doing.

have you seen this mail -
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php ?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hubert depesz lubaczewski wrote:
> On Mon, Mar 28, 2011 at 04:24:23PM -0500, Kevin Grittner wrote:
>> hubert depesz lubaczewski wrote:
>>
>>> how come that I can use this backup to make standalone pg, and
>>> it starts without any problem, but when I start it as sr slave,
>>> let it run for some time, and then promote to standalone, it
>>> breaks?
>>
>> We need more detail to make much of a guess about that.
>
> what details can I provide?
>
> I can provide scripts that I use to test it, and also access to
> test machine that I was testing it on.

For starters, what do you mean by "it breaks"? What, exactly
happens? What is in the logs? What version of PostgreSQL? Are you
using pg_standby or custom scripts?

-Kevin
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 04:24:23PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski wrote:
>
> > how come that I can use this backup to make standalone pg, and it
> > starts without any problem, but when I start it as sr slave, let
> > it run for some time, and then promote to standalone, it breaks?
>
> We need more detail to make much of a guess about that.

what details can I provide?

I can provide scripts that I use to test it, and also access to test
machine that I was testing it on.

if you'd need something else - just tell me what, i'll do my best to
provide.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hubert depesz lubaczewski wrote:
> how come that I can use this backup to make standalone pg, and it
> starts without any problem, but when I start it as sr slave, let
> it run for some time, and then promote to standalone, it breaks?

We need more detail to make much of a guess about that.

-Kevin
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
> In 9.0, recovery doesn't read a backup history file. That FATAL error
> happens if recovery ends before it reads the WAL record which was
> generated by pg_stop_backup(). IOW, recovery gets the recovery ending
> location from WAL record not backup history file. Since you didn't run
> pg_stop_backup() and there is no WAL record containing the recovery
> ending location, you got that error.
>
> If you want to take hot backup from the standby, you need to do the
> procedure explained in
> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

one more question. how come that I can use this backup to make standalone
pg, and it starts without any problem, but when I start it as sr slave,
let it run for some time, and then promote to standalone, it breaks?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
> In 9.0, recovery doesn't read a backup history file. That FATAL error
> happens if recovery ends before it reads the WAL record which was
> generated by pg_stop_backup(). IOW, recovery gets the recovery ending
> location from WAL record not backup history file. Since you didn't run
> pg_stop_backup() and there is no WAL record containing the recovery
> ending location, you got that error.
>
> If you want to take hot backup from the standby, you need to do the
> procedure explained in
> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

Is it intentional and/or does it serve some greater good? I mean - ability
to make backups on slave without ever bothering master was pretty
interesting.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Mon, Mar 28, 2011 at 12:48 AM, Fujii Masao wrote:
> If you want to take hot backup from the standby, you need to do the
> procedure explained in
> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

It'd be nice to improve this in 9.2. Relying on users to get this just
right seems both inconvenient and error-prone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
On Sat, Mar 26, 2011 at 5:31 AM, hubert depesz lubaczewski wrote:
> I can also setup streaming slave, and it also works, but when I create
> trigger file to promote this slave to master it fails with error:
>
> 2011-03-24 21:01:58.051 CET @ 9680 LOG: trigger file found: /home/depesz/slave2/finish.recovery
> 2011-03-24 21:01:58.051 CET @ 9930 FATAL: terminating walreceiver process due to administrator command
> 2011-03-24 21:01:58.151 CET @ 9680 LOG: redo done at 0/1F58
> 2011-03-24 21:01:58.151 CET @ 9680 LOG: last completed transaction was at log time 2011-03-24 20:58:25.836333+01
> 2011-03-24 21:01:58.238 CET @ 9680 FATAL: WAL ends before consistent recovery point
>
> Which is interesting, because this particular backup was done using
> .backup file containing:
>
> START WAL LOCATION: 0/A20 (file 0001000A)
> STOP WAL LOCATION: 0/12C9D7E8 (file 00010012)
> CHECKPOINT LOCATION: 0/B803050
> START TIME: 2011-03-24 20:52:46 CET
> STOP TIME: 2011-03-24 20:53:41 CET
> LABEL: OmniPITR_Slave_Hot_Backup
>
> Which means that minimum recovery ending location was in fact reached (it
> was on 0/12C9D7E8, and recovery continued till 0/1F58).

In 9.0, recovery doesn't read a backup history file. That FATAL error
happens if recovery ends before it reads the WAL record which was
generated by pg_stop_backup(). IOW, recovery gets the recovery ending
location from WAL record not backup history file. Since you didn't run
pg_stop_backup() and there is no WAL record containing the recovery ending
location, you got that error.

If you want to take hot backup from the standby, you need to do the
procedure explained in
http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
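The quoted report argues by comparing two WAL locations by eye. A minimal sketch of that comparison, assuming the usual "hi/lo" notation of two hexadecimal numbers, is shown below; the helper names and the sample values are invented for illustration and are not taken from the logs. As the reply above explains, 9.0 recovery does not perform this comparison against the .backup file at all: it waits for the WAL record written by pg_stop_backup(), so passing the STOP WAL LOCATION numerically is not sufficient.

```python
def parse_wal_location(text: str) -> int:
    """Convert a WAL location written as 'hi/lo' (two hex numbers) to one integer.

    The result is only meant for ordering two locations, not as a byte offset.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def reached(replayed: str, target: str) -> bool:
    """True if the replayed location is at or past the target location."""
    return parse_wal_location(replayed) >= parse_wal_location(target)


# Hypothetical values for illustration only.
print(reached("0/1F000058", "0/12C9D7E8"))
```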
[HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hi,

So, I hit a strange problem with Streaming Replication, that I cannot
explain.

Executive summary:

when using hot backup made on streaming replication slave, sometimes
(depending on load) generated backup is created in such a way, that while
it can be brought back as standalone Pg, and it can be brought back as
streaming slave, such slave (created off the backup) cannot be promoted to
standalone.

Disclaimer:

I know that making hot backups on slave is not the suggested way, yet I
was doing it without any problem on earlier Postgres versions
(8.2, 8.3, 8.4), and do not have this problem with backups generated from
the masters, so the problem I hit now is so peculiar, that I thought that
it might be just an effect of some underlying, more serious, condition.

Longer explanation:

First, let me explain how omnipitr-backup-slave works, because it's the
tool that I use to make backups on slave.

Steps that it does:

1. gets pg_controldata for $PGDATA

2. compresses $PGDATA to data tar.gz, putting inside backup_label file,
   which contains:

   START WAL LOCATION: %s (file %s)
   CHECKPOINT LOCATION: %s
   START TIME: %s
   LABEL: OmniPITR_Slave_Hot_Backup

   where START WAL LOCATION uses value from "Latest checkpoint's REDO
   location" from pg_controldata from step #1, "CHECKPOINT LOCATION" is
   taken from "Latest checkpoint location" from pg_controldata taken in
   step #1, and START TIME is based on current (before starting
   compression of $PGDATA) timestamp.

3. gets another copy of pg_controldata for $PGDATA

4. repeats step #3 until value in "Latest checkpoint location" changes

5. waits until file that contains WAL location, from "Minimum recovery
   ending location" from pg_controldata from step #4, is available

6. creates .backup file which is named based on "START WAL LOCATION" (from
   step #2), and contains the same lines as backup_label file from step
   #2, plus two more lines:

   STOP WAL LOCATION: %s (file %s)
   STOP TIME: %s

   where STOP WAL LOCATION is taken from "Minimum recovery ending
   location" from pg_controldata from step #4, and STOP TIME is current
   timestamp as of before starting compression of wal segments.

7. compresses xlogs plus the .backup file generated in step #6.

This approach worked for a long time on various hosts, systems, versions,
etc.

But now, it fails.

I'm using for tests PostgreSQL 9.0.2 and 9.0.3 (mostly 9.0.2 as this is
the most critical for me, but I tested on 9.0.3 too, and the problem is
the same), on linux (ubuntu), 64bit.

I do the procedure as always, and it produces backup. With this backup I
can setup new standalone server, and it works.

I can also setup streaming slave, and it also works, but when I create
trigger file to promote this slave to master it fails with error:

2011-03-24 21:01:58.051 CET @ 9680 LOG: trigger file found: /home/depesz/slave2/finish.recovery
2011-03-24 21:01:58.051 CET @ 9930 FATAL: terminating walreceiver process due to administrator command
2011-03-24 21:01:58.151 CET @ 9680 LOG: redo done at 0/1F58
2011-03-24 21:01:58.151 CET @ 9680 LOG: last completed transaction was at log time 2011-03-24 20:58:25.836333+01
2011-03-24 21:01:58.238 CET @ 9680 FATAL: WAL ends before consistent recovery point

Which is interesting, because this particular backup was done using
.backup file containing:

START WAL LOCATION: 0/A20 (file 0001000A)
STOP WAL LOCATION: 0/12C9D7E8 (file 00010012)
CHECKPOINT LOCATION: 0/B803050
START TIME: 2011-03-24 20:52:46 CET
STOP TIME: 2011-03-24 20:53:41 CET
LABEL: OmniPITR_Slave_Hot_Backup

Which means that minimum recovery ending location was in fact reached (it
was on 0/12C9D7E8, and recovery continued till 0/1F58).

I have a set of scripts that can be used to replicate the problem, but the
test takes some time (~ 30 minutes).

What's most interesting is that this problem does not happen always. It
happens only when there was non-trivial load on db server - this is in my
tests where both master and slave are the same machine. I think that in
normal cases load on slave is more important.

If anyone would be able to help, I can give you access to test machine
and/or provide the set of scripts which (usually) replicate the problem.

Alternatively - if there is anything I can do to help you solve the
mystery - I'd be very willing to.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
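Steps #1 and #2 of the procedure described above amount to reading two fields from pg_controldata output and filling in the backup_label template. The sketch below shows one way that could look; it is not OmniPITR's actual code, the function names are invented, the WAL file name and start time are simply passed in by the caller, and the sample pg_controldata lines are hypothetical.

```python
def parse_controldata(output: str) -> dict:
    """Split pg_controldata output into a {field name: value} dictionary."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_backup_label(fields: dict, wal_file: str, start_time: str) -> str:
    """Fill in the backup_label template quoted in step #2."""
    redo = fields["Latest checkpoint's REDO location"]
    checkpoint = fields["Latest checkpoint location"]
    return (
        f"START WAL LOCATION: {redo} (file {wal_file})\n"
        f"CHECKPOINT LOCATION: {checkpoint}\n"
        f"START TIME: {start_time}\n"
        "LABEL: OmniPITR_Slave_Hot_Backup\n"
    )


# Hypothetical pg_controldata excerpt, for illustration only.
sample = (
    "Latest checkpoint location:           0/B803050\n"
    "Latest checkpoint's REDO location:    0/A000020\n"
    "Minimum recovery ending location:     0/12C9D7E8\n"
)
print(build_backup_label(parse_controldata(sample),
                         "00000001000000000000000A",
                         "2011-03-24 20:52:46 CET"))
```

The same parsed dictionary also carries "Minimum recovery ending location", which is the field steps #5 and #6 rely on for the STOP WAL LOCATION line.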