Re: [HACKERS] patch proposal
On Wed, Aug 17, 2016 at 11:27 PM, Stephen Frost wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > Agreed. An additional option like "pause" would help. As long as there is
> > an option to ensure the following happens if the recovery target is not
> > reached:
> >
> > a) PG pauses the recovery at the end of the WAL
> > b) Generates a warning in the log file saying that the recovery target
> >    point was not reached (there is a patch being worked on by Thom for this)
> > c) Does not open up the database by exiting the recovery process, leaving
> >    room to resume the replay of WALs
>
> One thing to consider is just how different this is from simply bringing
> PG up as a warm standby instead, with the warning added to indicate if
> the recovery point wasn't reached.

I am referring to a specific scenario (performing point-in-time recovery) wherein a DBA attempts to bring up a standalone PG instance by restoring a backup and recovering to a particular recovery target (XID, time, or a named restore point) in the past by replaying all the available WAL archives, which is not quite possible through a warm-standby setup.

Warm standby is more of a high-availability solution, and I do not think it addresses this kind of PITR situation.

I will share more details defining PG behaviour across various recovery scenarios (as asked by David Steele) when using the various recovery_* parameters, and will also explain what difference the proposed patch could make.

Regards,
Venkata B N
Fujitsu Australia
Re: [HACKERS] patch proposal
On Thu, Aug 18, 2016 at 3:37 PM, Michael Paquier wrote:

> On Tue, Aug 16, 2016 at 11:06 PM, Stephen Frost wrote:
> > I could see supporting an additional "pause" option that means "pause at
> > the end of WAL if you don't reach the recovery target point". I'd also
> > be happy with a warning being emitted in the log if the recovery target
> > point isn't reached before reaching the end of WAL, but I don't think it
> > makes sense to change the existing behavior.
>
> Indeed, let's not change the existing behavior. A warning showing up
> by default would be useful in itself, even if there are people that I
> think set up overly high recovery targets to be sure to replay as much
> WAL as possible. As recovery_target_action has meaning when a
> recovery target has been reached, I would guess that we would want a
> new option that has the same mapping values as recovery_target_action,
> except that it activates when the recovery target is *not* reached.
> Hence it would be possible to shutdown, pause or promote at will when
> recovery completes, and be able to take a separate action if the
> recovery target is indeed reached. The default of this parameter would
> be "promote", which is what happens now.

Agreed. I understand the complexities of backward compatibility in changing the existing behaviour. I was also more inclined towards introducing a new parameter and, as suggested, will consider the options pause, shutdown and promote for the new parameter. Thanks for the inputs and advice.

Regards,
Venkata B N
Fujitsu Australia
Re: [HACKERS] patch proposal
On Thu, Aug 18, 2016 at 7:20 PM, Stephen Frost wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > I am referring to a specific scenario (performing point-in-time recovery)
> > wherein a DBA attempts to bring up a standalone PG instance by restoring
> > a backup and recovering to a particular recovery target (XID, time or a
> > named restore point) in the past by replaying all the available WAL
> > archives, which is not quite possible through a warm-standby setup.
> >
> > Warm standby is more of a high-availability solution and I do not think
> > it addresses this kind of PITR situation.
>
> No, a warm standby is one which just plays through the WAL but doesn't
> bring the database up or start its own set of WAL, which is what you're
> asking about.

I understand that, in warm standby, PG continuously replays WAL without bringing up the database, as the database stays in standby mode, which sounds ideal. But when recovering the database to a particular recovery target point (with no requirement to set up a standby), there is a risk that the database will get promoted due to missing or corrupt WALs. Which means there is no way to avoid promotion and resume WAL recovery. An option like "pause" with a new parameter would be ideal to prevent database promotion at the default recovery target, which is "immediate" or end-of-WAL.

Regards,
Venkata B N
Fujitsu Australia
Re: [HACKERS] patch proposal
On Thu, Aug 18, 2016 at 3:37 PM, Michael Paquier wrote:

> On Tue, Aug 16, 2016 at 11:06 PM, Stephen Frost wrote:
> > I could see supporting an additional "pause" option that means "pause at
> > the end of WAL if you don't reach the recovery target point". I'd also
> > be happy with a warning being emitted in the log if the recovery target
> > point isn't reached before reaching the end of WAL, but I don't think it
> > makes sense to change the existing behavior.
>
> Indeed, let's not change the existing behavior. A warning showing up
> by default would be useful in itself, even if there are people that I
> think set up overly high recovery targets to be sure to replay as much
> WAL as possible. As recovery_target_action has meaning when a
> recovery target has been reached, I would guess that we would want a
> new option that has the same mapping values as recovery_target_action,
> except that it activates when the recovery target is *not* reached.
> Hence it would be possible to shutdown, pause or promote at will when
> recovery completes, and be able to take a separate action if the
> recovery target is indeed reached. The default of this parameter would
> be "promote", which is what happens now.

Yes, a new parameter with the same options as recovery_target_action is the idea I had in mind as well, and I have the following queries while working through the patch design:

*Query 1*

What about the existing parameter "recovery_target", which accepts only one value, "immediate"? Its behaviour will be similar to the "promote" option of the to-be-introduced new parameter. Since this parameter's behaviour will be incorporated into the new parameter, I think it can be deprecated from the next PostgreSQL version?

*Query 2*

I am thinking the new parameter name should be "recovery_target_incomplete" or "recovery_target_incomplete_action", which (by name) suggests that the recovery target point is not yet reached, and accepts the options "pause", "promote" and "shutdown".

The other alternative name I thought of was "recovery_target_immediate_action", which (by name) suggests the action to be taken when recovery does not reach the actual set recovery target and reaches the immediate consistent point.

Any comments, suggestions?

Regards,
Venkata B N
Fujitsu Australia
Re: [HACKERS] patch proposal
On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > *Query 1*
> >
> > What about the existing parameter called "recovery_target" which accepts
> > only one value "immediate", which will be similar to the "promote" option
> > with the to-be-introduced new parameter.
> > Since this parameter's behaviour will be incorporated into the new
> > parameter, I think, this parameter can be deprecated from the next
> > PostgreSQL version ?
>
> I don't think we can really consider that without having a good answer
> for what the "new parameter" is, in particular...

Sure, I agree.

> > *Query 2*
> >
> > I am thinking that the new parameter name should be
> > "recovery_target_incomplete" or "recovery_target_incomplete_action" which
> > (by name) suggests that recovery target point is not yet reached and
> > accepts options "pause", "promote" and "shutdown".
> >
> > The other alternative name i thought of was -
> > "recovery_target_immediate_action", which (by name) suggests the action to
> > be taken when the recovery does not reach the actual set recovery target
> > and reaches immediate consistent point.
>
> I don't really care for any of those names. Note that "immediate" and
> "the point at which we realize that we didn't find the recovery target"
> are *not* necessarily the same. Whatever we do here needs to also cover
> the 'recovery_target_name' option, where we're only going to realize
> that we didn't find the restore point when we reach the end of the WAL
> stream.

Yes, all the recovery targets (including recovery_target_name) will be taken into consideration. The whole idea of this patch is to ensure PG realises that the recovery is incomplete if the specified recovery target point is not found by the end of the WAL stream.

> I'm not a fan of the "recovery_target" option, particularly as it's only
> got one value even though it can mean two things (either "immediate" or
> "not set"), but we need a complete solution before we can consider
> deprecating it. Further, we could consider making it an alias for
> whatever better name we come up with.

The new parameter will accept the options "pause", "shutdown" and "promote":

*"promote"*

This option ensures the database starts up once the "immediate" consistent recovery point is reached, even if that is well before the specified recovery target point (XID, name or time). This behaviour is similar to that of recovery_target="immediate" and can be aliased.

*"pause"*

This option ensures database recovery pauses if the specified recovery target ("XID", "name" or "time") is not reached, so that WAL replay can be resumed if required.

*"shutdown"*

This option ensures the database shuts down if the specified recovery target point (XID, name or time) is not reached.

Regards,
Venkata B N
Fujitsu Australia
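The three proposed behaviours can be made concrete with a small sketch. This is illustrative Python only (the function name is invented and the real implementation would live in PostgreSQL's C recovery code), not code from any patch in this thread:

```python
# Illustrative sketch of the proposed end-of-WAL semantics: what should
# happen when replay exhausts the WAL without reaching the recovery target.
# The function name and return strings are invented for illustration.

def on_target_not_reached(action):
    if action == "promote":
        return "open the database at the consistent point reached"
    if action == "pause":
        return "pause recovery so WAL replay can be resumed later"
    if action == "shutdown":
        return "shut the server down without opening the database"
    raise ValueError(f"unknown action: {action!r}")
```

For example, `on_target_not_reached("pause")` describes the behaviour the thread argues should be available to avoid an unwanted promotion.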
Re: [HACKERS] patch proposal
On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > The new parameter will accept options : "pause", "shutdown" and "promote"
> >
> > *"promote"*
> >
> > This option will ensure database starts up once the "immediate" consistent
> > recovery point is reached even if it is well before the mentioned recovery
> > target point (XID, Name or time).
> > This behaviour will be similar to that of recovery_target="immediate" and
> > can be aliased.
>
> I don't believe we're really going at this the right way. Clearly,
> there will be cases where we'd like promotion at the end of the WAL
> stream (as we currently have) even if the recovery point is not found,
> but if the new option's "promote" is the same as "immediate" then we
> don't have that.

My apologies for the confusion. Correction: I meant that the "promote" option would promote the database once the consistent point is reached at the end of the WAL.

> We need to break this down into all the different possible combinations
> and then come up with names for the options to define them. I don't
> believe a single option is going to be able to cover all of the cases.
>
> The cases which I'm considering are:
>
> recovery target is immediate (as soon as we have consistency)
> recovery target is a set point (name, xid, time, whatever)
>
> action to take if recovery target is found
> action to take if recovery target is not found
>
> Generally, "action" is one of "promote", "pause", or "shutdown".
>
> Clearly, not all actions are valid for all recovery target cases- in
> particular, "immediate" with "recovery target not found" can not support
> the "promote" or "pause" options. Otherwise, we can support:

I agree. This is a valid point to consider. I might have a few questions on this at a later stage.

> Recovery Target | Found | Action
> ----------------|-------|---------
> immediate       | Yes   | promote
> immediate       | Yes   | pause
> immediate       | Yes   | shutdown
>
> immediate       | No    | shutdown
>
> name/xid/time   | Yes   | promote
> name/xid/time   | Yes   | pause
> name/xid/time   | Yes   | shutdown
>
> name/xid/time   | No    | promote
> name/xid/time   | No    | pause
> name/xid/time   | No    | shutdown
>
> We could clearly support this with these options:
>
> recovery_target = immediate, other
>
> recovery_action_target_found = promote, pause, shutdown

This is currently supported by the existing parameter called "recovery_target_action".

> recovery_action_target_not_found = promote, pause, shutdown

This is exactly what the newly proposed parameter will do.

> One question to ask is if we need to support an option for xid and time
> related to when we realize that we won't find the recovery target. If
> consistency is reached at a time which is later than the recovery target
> for time, what then? Do we go through the rest of the WAL and perform
> the "not found" action at the end of the WAL stream? If we use that
> approach, then at least all of the recovery target types are handled the
> same, but I can definitely see cases where an administrator might prefer
> an "error" option.

Currently, this situation is handled by the recovery_target_inclusive parameter. The administrator can choose to stop the recovery at any consistent point before or after the specified recovery target. Is this what you were referring to? I might need a bit of clarification here: the parameter I am proposing would be effective only if the specified recovery target is not reached, and may not be effective if the recovery goes beyond the recovery target point.
Regards, Venkata B N Fujitsu Australia
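The combination matrix discussed in the email above can be expressed as a small validity table plus a chooser. The sketch below is hypothetical Python with invented names, written only to make the combinations concrete; the real mechanism would be recovery_target_action plus the proposed not-found parameter inside PostgreSQL:

```python
# Illustrative sketch of the action matrix from the thread: which actions
# are valid for each (recovery target type, target found?) combination.
# Names are invented; this is not PostgreSQL source code.

VALID_ACTIONS = {
    ("immediate", True): {"promote", "pause", "shutdown"},
    # If consistency is never reached, the server cannot promote or pause.
    ("immediate", False): {"shutdown"},
    ("name/xid/time", True): {"promote", "pause", "shutdown"},
    ("name/xid/time", False): {"promote", "pause", "shutdown"},
}

def choose_action(target_type, found, found_action, not_found_action):
    """Mirror recovery_target_action for the 'found' case and the proposed
    new parameter for the 'not found' case, rejecting invalid combinations."""
    action = found_action if found else not_found_action
    if action not in VALID_ACTIONS[(target_type, found)]:
        raise ValueError(f"action {action!r} is not valid for "
                         f"target={target_type!r}, found={found}")
    return action
```

For example, `choose_action("name/xid/time", False, "promote", "pause")` yields "pause": the not-found action applies because the named target was never reached.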
Re: [HACKERS] patch proposal
On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > My apologies for the confusion. Correction - I meant, "promote" option
> > would promote the database once the consistent point is reached at the
> > end-of-the-WAL.
>
> "consistent point" and "end-of-the-WAL" are not the same.
>
> > > recovery_target = immediate, other
> > >
> > > recovery_action_target_found = promote, pause, shutdown
> >
> > This is currently supported by the existing parameter called
> > "recovery_target_action"
>
> Right, sure, we could possibly use that, or create an alias, etc.
>
> > > recovery_action_target_not_found = promote, pause, shutdown
> >
> > This is exactly what newly proposed parameter will do.
>
> Then it isn't the same as 'recovery_target = immediate'.
>
> > Currently, this situation is handled by recovery_target_inclusive
> > parameter.
>
> No, that's not the same.
>
> > Administrator can choose to stop the recovery at any consistent
> > point before or after the specified recovery target. Is this what you
> > were referring to ?
>
> Not quite.
>
> > I might need a bit of clarification here, the parameter i am proposing
> > would be effective only if the specified recovery target is not reached
> > and may not be effective if the recovery goes beyond recovery target
> > point.
>
> Ok, let's consider this scenario:
>
> Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
> recovery_target is not set
> recovery_target_time = 2016-08-26 08:29:30
> recovery_target_inclusive = false
>
> In such a case, we will necessarily commit transactions which happened
> between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
> reached consistency. We could possibly detect that case and throw an
> error instead. The same could happen with xid.

Yes, currently PG just recovers until consistency is reached. It makes sense to throw an error instead.

> Working through more scenarios would be useful, I believe. For example,
> what if the xid or time specified happened before the backup started?
>
> Basically, what I was trying to get at is that we might be able to
> detect that we'll never find a given recovery target point without
> actually replaying all of WAL, and wondering if we should provide options
> to control what to do in such a case.

Ah, yes, I got the point and I agree. Thanks for the clarification. It is a good idea, and it is important to ensure PG errors out and warns the administrator with an appropriate message indicating that "a much earlier backup must be used..." if the specified recovery target xid, name or time is earlier than the backup time.

Regards,
Venkata B N
Fujitsu Australia
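The pre-recovery check discussed in this exchange could look roughly like the following. This is an illustrative Python sketch with invented names (the actual patch would be C inside PostgreSQL's startup code); it uses the timestamps from Stephen's example scenario:

```python
from datetime import datetime

# Illustrative sketch of the validation discussed above: reject a
# recovery_target_time that falls before the backup started, or inside
# the backup window (before the consistent recovery point is reached).
# Function and message wording are invented for illustration.

def check_recovery_target_time(target_time, backup_start, backup_end):
    if target_time < backup_start:
        raise ValueError("recovery target time precedes backup start; "
                         "a much earlier backup must be used")
    if target_time < backup_end:
        raise ValueError("recovery target time is before the consistent "
                         "recovery point (backup end)")
    # Otherwise the target is at or after consistency and recovery may
    # plausibly stop there.

# Stephen's scenario: backup ran 08:29:08 .. 08:30:00, target 08:29:30.
start = datetime(2016, 8, 26, 8, 29, 8)
end = datetime(2016, 8, 26, 8, 30, 0)
```

With these values, `check_recovery_target_time(datetime(2016, 8, 26, 8, 29, 30), start, end)` raises the "before the consistent recovery point" error instead of silently replaying past the target.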
[HACKERS] pg_xlog error on the master
I just did a "git pull" to test one of my patches and I get the following error:

2016-10-23 18:51:47.679 AEDT [31930] FATAL:  could not open archive status directory "pg_xlog/archive_status": No such file or directory
2016-10-23 18:51:47.679 AEDT [31841] LOG:  archiver process (PID 31930) exited with exit code 1

Is this expected? I notice that pg_xlog has been renamed to pg_wal. I am not sure about this.

Regards,
Venkata B N
Database Consultant / Architect
Re: [HACKERS] pg_xlog error on the master
On Sunday, 23 October 2016, Michael Paquier wrote:

> On Sun, Oct 23, 2016 at 5:05 PM, Venkata B Nagothi wrote:
> > I just did a "git pull" to test one of my patches and I get the
> > following error:
> >
> > 2016-10-23 18:51:47.679 AEDT [31930] FATAL: could not open archive status
> > directory "pg_xlog/archive_status": No such file or directory
> > 2016-10-23 18:51:47.679 AEDT [31841] LOG: archiver process (PID 31930)
> > exited with exit code 1
> >
> > Is this expected?
>
> No.
>
> > I notice that pg_xlog's name has been changed to pg_wal. I am not sure
> > about this.
>
> WAL archiving works correctly here, and tests in src/test/recovery/
> work. Are you sure that you cleaned up your source tree before
> recompiling?

Oops. My bad. All works fine now. Sorry for the confusion.

Thanks,
Venkata B N
Re: [HACKERS] patch proposal
Attached is the patch which introduces the new parameter "recovery_target_incomplete". Currently this patch addresses two scenarios:

*Scenario 1:*

Provide options to the DBA when the recovery target is not reached and recovery has stopped mid-way due to unavailability of WALs. When performing recovery to a specific recovery target, the "recovery_target_incomplete" parameter can be used to either promote, pause or shutdown when recovery does not reach the specified recovery target and reaches end-of-WAL.

*Scenario 2:*

Generate errors and hints when the specified recovery target is prior to the backup's current position (xid, time and lsn). This behaviour is integrated with the parameters "recovery_target_time", "recovery_target_lsn" and "recovery_target_xid".

I would like to know if the approach this patch incorporates sounds ok. My other comments are inline.

On Mon, Aug 29, 2016 at 3:17 PM, Venkata B Nagothi wrote:

> On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost wrote:
>
>> * Venkata B Nagothi (nag1...@gmail.com) wrote:
>> > My apologies for the confusion. Correction - I meant, "promote" option
>> > would promote the database once the consistent point is reached at the
>> > end-of-the-WAL.
>>
>> "consistent point" and "end-of-the-WAL" are not the same.
>>
>> > > recovery_target = immediate, other
>> > >
>> > > recovery_action_target_found = promote, pause, shutdown
>> >
>> > This is currently supported by the existing parameter called
>> > "recovery_target_action"
>>
>> Right, sure, we could possibly use that, or create an alias, etc.
>>
>> > > recovery_action_target_not_found = promote, pause, shutdown
>> >
>> > This is exactly what newly proposed parameter will do.
>>
>> Then it isn't the same as 'recovery_target = immediate'.
>>
>> > > One question to ask is if we need to support an option for xid and
>> > > time related to when we realize that we won't find the recovery
>> > > target. If consistency is reached at a time which is later than the
>> > > recovery target for time, what then? Do we go through the rest of the
>> > > WAL and perform the "not found" action at the end of the WAL stream?
>> > > If we use that approach, then at least all of the recovery target
>> > > types are handled the same, but I can definitely see cases where an
>> > > administrator might prefer an "error" option.
>> >
>> > Currently, this situation is handled by recovery_target_inclusive
>> > parameter.
>>
>> No, that's not the same.
>>
>> > Administrator can choose to stop the recovery at any consistent
>> > point before or after the specified recovery target. Is this what you
>> > were referring to ?
>>
>> Not quite.
>>
>> > I might need a bit of clarification here, the parameter i am proposing
>> > would be effective only if the specified recovery target is not reached
>> > and may not be effective if the recovery goes beyond recovery target
>> > point.
>>
>> Ok, let's consider this scenario:
>>
>> Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
>> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
>> recovery_target is not set
>> recovery_target_time = 2016-08-26 08:29:30
>> recovery_target_inclusive = false
>>
>> In such a case, we will necessarily commit transactions which happened
>> between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
>> reached consistency. We could possibly detect that case and throw an
>> error instead. The same could happen with xid.
>
> Yes, currently, PG just recovers until the consistency is reached. It
> makes sense to throw an error instead.
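For illustration, a hypothetical recovery.conf fragment using the parameter proposed in this thread might look as follows. Note that "recovery_target_incomplete" is only a proposal here, not a released PostgreSQL option, and the path and target value are invented examples:

```ini
# Standard PITR settings (example values):
restore_command = 'cp /mnt/archive/%f %p'
recovery_target_time = '2016-08-26 08:29:30'

# Proposed in this thread: if end-of-WAL is reached before the target,
# pause instead of promoting, so WAL replay can be resumed later.
recovery_target_incomplete = 'pause'
```

The default of 'promote' would preserve today's behaviour of opening the database at end-of-WAL.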
[HACKERS] Contents of "backup_label" and "*.backup" in pg_wal location
Hello Hackers,

I have a question regarding the contents written to the backup_label file and the .backup file in the pg_wal location, both generated when an online backup is done.

The backup_label contents are as follows; they do not contain the backup stop position (timestamp and stop WAL location). The information below is written when pg_start_backup() is done:

START WAL LOCATION: 0/4460 (file 00010044)
CHECKPOINT LOCATION: 0/4498
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-11-04 14:24:25 AEDT
LABEL: 222

I see the following contents in the file "00010044.0060.backup", which was generated in the pg_wal location during the online backup. When pg_stop_backup() is executed, the following content is written, which includes the content copied from the backup_label file:

START WAL LOCATION: 0/4460 (file 00010044)
STOP WAL LOCATION: 0/44000168 (file 00010044)
CHECKPOINT LOCATION: 0/4498
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-11-04 14:24:25 AEDT
LABEL: 222
STOP TIME: 2016-11-04 14:25:09 AEDT

Can someone please help me understand the importance of the above file? How about having the same contents in the backup_label file as well? I am working on another patch and was thinking it would be good if the same contents could be written to the backup_label file too; actually, it would be more beneficial to do so. As of now, the backup_label file does not have any information about when and at what position the backup actually completed.

Any thoughts / apprehensions?

Regards,
Venkata B N
Database Consultant
Fujitsu Australia
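Both backup_label and the .backup history file shown above use the same simple "KEY: value" line format, so they can be read with a tiny parser. The sketch below is illustrative Python written for this discussion, not anything shipped with PostgreSQL:

```python
# Illustrative parser for backup_label-style files: each line is
# "KEY: value", and values (such as START TIME) may themselves contain
# colons, so only the first colon separates key from value.

def parse_backup_label(text):
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            info[key.strip()] = value.strip()
    return info
```

Running it over a label with a STOP TIME line (as in the .backup history file) would expose exactly the completion information that backup_label currently lacks.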
Re: [HACKERS] Contents of "backup_label" and "*.backup" in pg_wal location
On Fri, Nov 4, 2016 at 3:44 PM, Michael Paquier wrote:

> On Fri, Nov 4, 2016 at 1:18 PM, Venkata B Nagothi wrote:
> > I see the following contents in the file
> > "00010044.0060.backup" which was generated in the pg_wal
> > location during the online backup. When pg_stop_backup() is executed, the
> > following content is written which includes the content copied from the
> > backup_label file.
> >
> > [...]
> >
> > Can someone please help me know the importance of the above file?
>
> It is not actually critical, and useful for debugging (you could say
> the same about backup_label.old).
>
> > How about having the same contents in the backup_label file as well?
> >
> > As of now, the backup_label file does not have any information related to
> > when and at what position the backup actually completed.
>
> Yes, and it is not actually possible to write the stop information,
> because when a backup finishes the backup_label is simply removed, and
> it has been included in the backup before it finishes. The role of
> this file is to provide the LSN start location from which the backup
> is able to replay things cleanly. The stop position, aka when
> everything on disk is consistent, is determined at replay by the
> XLOG_BACKUP_END record. This stop position is not something you can
> know when the backup_label file is generated. And I am of course
> talking about exclusive backups here.

Sure. I will look at the possibility of using XLOG_BACKUP_END in my patch. I am looking at the possibility of keeping the backup_label at the source until pg_stop_backup() is executed, then writing the STOP information and moving the file across to the backup location. I see that when the START/STOP information is written to the WAL history file, the content from the backup_label file is copied, and I am thinking of doing the same the other way around. Am I making sense? Is that in any way not possible?

If this makes sense, I will start working on an optimal design and look at the possibility of achieving this.

Regards,
Venkata B N
Database Consultant
Fujitsu Australia
Re: [HACKERS] Contents of "backup_label" and "*.backup" in pg_wal location
> On Fri, Nov 4, 2016 at 7:04 PM, Venkata B Nagothi wrote:
> > Sure. I will look at the possibility of using XLOG_BACKUP_END in my patch.
> > I am looking at the possibility of keeping the backup_label at source
> > until pg_stop_backup() is executed and then write the STOP information
> > and then move it across to the backup location.
>
> Non-exclusive backups already do that, except that as the backup is
> already stopped at the moment the backup_label information is sent
> back to the caller, and it is expected that it will be the caller that
> will write its contents into the backed up PGDATA in a file named as
> backup_label. Anyway, any design in this area for exclusive backups
> would be really inconsistent. For example take the case of
> pg_start_backup -> cp -> pg_stop_backup for an exclusive backup: when
> are you going to update the backup_label file with the stop info?

I was wondering whether to write the stop info into the backup_label at all. I am not sure about this possibility.

> > I see that when the START/STOP information is written to the WAL history
> > file, the content from the backup_label file is being copied and I am
> > thinking to do the same other way around.
> >
> > Am i making sense ? is that anyway not possible ?
> >
> > If this makes sense, then i would start working on an optimal design and
> > look at the possibility of achieving this.
>
> Before writing any code, I would be curious about the need behind
> that, and you give no reason in this thread where this would help. Do
> you actually want the timestamp when the backup ends? This could
> be added as a new field of pg_stop_backup for both the exclusive and
> non-exclusive cases. Just an idea.

Sure. The timestamp (and possibly the lsn as well) could be added as a new field to pg_stop_backup. Will work on this.

I am writing a patch (being discussed in the other thread) which will ensure appropriate errors/hints are thrown in various scenarios when recovering to a particular target like time, lsn or xid (as suggested by Stephen, which makes a lot of sense). For example:

*Scenario 1* - If you specify a recovery_target_time which falls prior to the backup start time, then errors/hints will be generated without initiating the recovery process, which is not the case now: PostgreSQL currently prefers recovering until the immediate consistent recovery point or end-of-WAL. This can be achieved and is already addressed in the patch, so no issues there.

*Scenario 2* - Similarly, if the specified recovery_target_time is later than the backup start time but falls before the backup end time, then an error/hint would be generated saying "recovery target time is well before the consistent recovery point...". The same could be done for recovery_target_lsn as well. To achieve this, it would be good to have the backup end time and the backup end WAL position in the backup_label. This needs to be addressed, and I am exploring ways to achieve it.

Regards,
Venkata B N
Database Consultant
Fujitsu Australia
Re: [HACKERS] patch proposal
Attached is the 2nd version of the patch with some enhancements. *Scenario 2 :* > > Generates Errors, Hints when the specified recovery target is prior to the > backup's current position (xid, time and lsn). This behaviour is integrated > with the parameters "recovery_target_time","recovery_target_lsn" and > "recovery_target_xid". > In the second version of the patch, recovery_target_xid will be compared with the "latestCompletedXid" instead of "oldestXid" of the backup and if the specified recovery_target_xid is prior to the "latestCompletedXid" of the backup, then errors/hints would be generated indicating the DBA to use the appropriate backup. Now, with this patch, pg_controldata also displays the "latestCompletedXid" of the cluster. > Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08 >>> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00 >>> recovery_target is not set >>> recovery_target_time = 2016-08-26 08:29:30 >>> recovery_target_inclusive = false >>> >>> In such a case, we will necessairly commit transactions which happened >>> between 8:29:30 and 8:30:00 and only stop (I believe..) once we've >>> reached consistency. We could possibly detect that case and throw an >>> error instead. The same could happen with xid. >>> >> >> Yes, currently, PG just recovers until the consistency is reached. It >> makes sense to throw an error instead. >> > > This has not been addressed yet. Currently, the patch only considers > generating an error if the recovery target position is prior the current > backup's position and this is irrespective of weather backup_label is > present or not. > I will share my updates on how this can be addressed. > This does not seem to be a possibility as the information in the backup_label is not enough to handle this scenario in specific cases like xid or time and it does not seem possible to add the backup stop info to the backup_label. 
I would prefer leaving this at the original behaviour for the moment, which is: PostgreSQL generates errors and exits when *EndOfLog < minRecoveryPoint* during recovery. Documentation is still pending from my end and I will submit it once the patch design/approach is agreed. Regards, Venkata B N Database Consultant Fujitsu Australia recoveryTargetIncomplete.patch-version2 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch proposal
On Tue, Jan 31, 2017 at 6:49 AM, David Steele wrote: > On 1/27/17 3:19 AM, Venkata B Nagothi wrote: > > > I will be adding the tests in > > src/test/recovery/t/003_recovery_targets.pl > > <http://003_recovery_targets.pl>. My tests would look more or less > > similar to recovery_target_action. I do not see any of the tests added > > for the parameter recovery_target_action ? Anyways, i will work on > > adding tests for recovery_target_incomplete. > > Do you know when those will be ready? > Apologies for the delayed response. Actually, I am working on adding the tests for both patches, and have started with the tests for the recovery_target_incomplete parameter. I have hit an issue while adding the tests, which I have explained below. > > > > > > 4) I'm not convinced that fatal errors by default are the way to go. > > It's a pretty big departure from what we have now. > > > > > > I have changed the code to generate ERRORs instead of FATALs. Which is > > in the patch recoveryStartPoint.patch > > I think at this point in recovery any ERROR at all will probably be > fatal. Once you have some tests in place we'll know for sure. > Sure. I can add the tests for the first patch. > Either way, the goal would be to keep recovery from proceeding and let > the user adjust their targets. Since this is a big behavioral change > which will need buy in from the community. > Sure. Agreed. > > Also, I don't think it's very intuitive how > recovery_target_incomplete > > works. For instance, if I set recovery_target_incomplete=promote > and > > set recovery_target_time, but pick a backup after the specified time, > > then there will be an error "recovery_target_time %s is older..." > rather > > than a promotion. > > > > > > Yes, that is correct and that is the expected behaviour. Let me explain - > > My point was that this combination of options could lead to confusion on > the part of users. However, I'd rather leave that alone for the time > being and focus on the first patch.
> Sure. > > I have split the patch into two different > > pieces. One is to determine if the recovery can start at all and other > > patch deals with the incomplete recovery situation. > > I think the first patch looks promising and I would rather work through > that before considering the second patch. Once there are tests for the > first patch I will complete my review. > Sure. Will submit the first patch along with tests once I can get past the issue I am facing. *The issue I am facing while writing the tests* Below is the problem I am hitting while adding the tests to 003_recovery_targets.pl: 1. Test case: test incomplete recovery with the parameters restore_command, recovery_target_xid and recovery_target_incomplete = 'shutdown' (the options 'pause' and 'promote' work fine). 2. Expected test result: the standby node shuts down cleanly if the recovery process gets only part-way and does not reach the intended recovery target because of unavailable WAL. 3. All good. Everything works as expected. The problem is that the test exits with exit code 256, bails out, and shows up as "pg_ctl failed". Is there any way I can ensure the test does not bail out, and exits with exit code '0', when "pg_ctl start" starts the server and it then shuts down for any reason? If you look at the log file, pg_ctl actually starts successfully and then shuts down cleanly because the intended recovery target is not reached. The messages in the log file are also appropriate. Since it is a clean start and a clean shutdown (which means the test is successful), the test should return success, which is not happening. Below is the log (from a test for the second patch). I believe Perl reports the status 256 because pg_ctl responds with an error - "pg_ctl: could not start the server". I would need my test case to return success in this scenario. Any suggestions would be great.
[dba@buildhost ~]$ /data/PostgreSQL-Repo/postgresql/tmp_install/disk1/pginstall/pgsql-10-dev-master-xlog-2-wal/bin/pg_ctl -D /data/PostgreSQL-Repo/postgresql/src/test/recovery/tmp_check/data_pitr_3_U_oA/pgdata start
waiting for server to start... 2017-02-22 12:06:41.773 AEDT [23858] LOG: database system was interrupted while in recovery at log time 2017-02-15 16:40:41 AEDT
2017-02-22 12:06:41.773 AEDT [23858] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-02-22 12:06:41.773 AEDT [23858] LOG: starting point-in-time recovery to XID 559
2017-02-22 12:06:41.782 AEDT [23858] LOG: restored log file "0001000
Re: [HACKERS] patch proposal
On Tue, Jan 31, 2017 at 10:41 AM, Michael Paquier wrote: > On Tue, Jan 31, 2017 at 4:49 AM, David Steele wrote: > > On 1/27/17 3:19 AM, Venkata B Nagothi wrote: > >> I have split the patch into two different > >> pieces. One is to determine if the recovery can start at all and other > >> patch deals with the incomplete recovery situation. > > > > I think the first patch looks promising and I would rather work through > > that before considering the second patch. Once there are tests for the > > first patch I will complete my review. > > Based on that, I am moving the patch to next CF with "Needs Review". > Venkata, please be careful in updating correctly the patch status, it > was still on "Waiting on Author". > Apologies. Sure. Will make a note. Regards, Venkata B N Database Consultant
[HACKERS] Range Partitioning behaviour - query
Hi Hackers, I have noticed the following behaviour in range partitioning which I felt is not quite correct (I missed reporting this earlier). I have tested by creating a date-range partitioned table. I created the following table:

db03=# CREATE TABLE orders (
    o_orderkey INTEGER,
    o_custkey INTEGER,
    o_orderstatus CHAR(1),
    o_totalprice REAL,
    o_orderdate DATE,
    o_orderpriority CHAR(15),
    o_clerk CHAR(15),
    o_shippriority INTEGER,
    o_comment VARCHAR(79)) partition by range (o_orderdate);
CREATE TABLE

Created the following partitions:

db03=# CREATE TABLE orders_y1992 PARTITION OF orders FOR VALUES FROM ('1992-01-01') TO ('1992-12-31');
CREATE TABLE
db03=# CREATE TABLE orders_y1993 PARTITION OF orders FOR VALUES FROM ('1993-01-01') TO ('*1993-12-31'*);
CREATE TABLE
db03=# CREATE TABLE orders_y1994 PARTITION OF orders FOR VALUES FROM ('1994-01-01') TO ('1994-12-31');
CREATE TABLE

The rows with the date "1993-12-31" get rejected as shown below:

db03=# copy orders from '/data/orders.csv' delimiter '|';
ERROR: no partition of relation "orders" found for row
DETAIL: Failing row contains (353, 8878, F, 273342, *1993-12-31*, 5-LOW , Clerk#02241, 0, quiet ideas sleep. even instructions cajole slyly. silently spe).
CONTEXT: COPY orders, line 89: "353|8878|F|273342|*1993-12-31*|5-LOW |Clerk#02241|0| quiet ideas sleep. even instructions..."

I would want the partition "orders_y1993" to accept all the rows with the date 1993-12-31.
To confirm this behaviour, I did another simple test with numbers. I created two partitions with range values from 1 to 5 and from 6 to 10, as shown below:

db03=# create table test_part ( col int) partition by range (col);
CREATE TABLE
db03=# create table test_part_5 partition of test_part for values from (1) to (5);
CREATE TABLE
db03=# create table test_part_10 partition of test_part for values from (6) to (10);
CREATE TABLE

When I try to insert the value 5, it gets rejected as shown below:

db03=# insert into test_part values (5);
ERROR: no partition of relation "test_part" found for row
DETAIL: Failing row contains (5).

Isn't the partition "test_part_5" supposed to accept the value 5? Am I missing anything here? Regards, Venkata B N Database Consultant
Re: [HACKERS] Range Partitioning behaviour - query
On Thu, Feb 23, 2017 at 3:14 PM, Amit Langote wrote: > Hi, > > On 2017/02/23 11:55, Venkata B Nagothi wrote: > > Hi Hackers, > > > > I have noticed the following behaviour in range partitioning which i felt > > is not quite correct (i missed reporting this) - > > > > I have tested by creating a date ranged partition. > > > > I created the following table. > > > > db03=# CREATE TABLE orders ( > > o_orderkey INTEGER, > > o_custkey INTEGER, > > o_orderstatus CHAR(1), > > o_totalprice REAL, > > o_orderdate DATE, > > o_orderpriority CHAR(15), > > o_clerk CHAR(15), > > o_shippriority INTEGER, > > o_comment VARCHAR(79)) partition by range (o_orderdate); > > CREATE TABLE > > > > Created the following partitioned tables : > > > > > > db03=# CREATE TABLE orders_y1992 > > PARTITION OF orders FOR VALUES FROM ('1992-01-01') TO ('1992-12-31'); > > CREATE TABLE > > > > db03=# CREATE TABLE orders_y1993 > > PARTITION OF orders FOR VALUES FROM ('1993-01-01') TO ('*1993-12-31 > '*); > > CREATE TABLE > > > > db03=# CREATE TABLE orders_y1994 > >PARTITION OF orders FOR VALUES FROM ('1994-01-01') TO ('1994-12-31'); > > CREATE TABLE > > > > > > The rows with the date "1993-12-31" gets rejected as shown below - > > > > db03=# copy orders from '/data/orders.csv' delimiter '|'; > > ERROR: no partition of relation "orders" found for row > > DETAIL: Failing row contains (353, 8878, F, 273342, *1993-12-31*, 5-LOW > >, Clerk#02241, 0, quiet ideas sleep. even instructions cajole > > slyly. silently spe). > > CONTEXT: COPY orders, line 89: "353|8878|F|273342|*1993-12-31*|5-LOW > >|Clerk#02241|0| quiet ideas sleep. even instructions..." > > > > I would want the partition "orders_y1993" to accept all the rows with the > > date 1993-12-31. > > [ ... ] > > > Am i missing anything here ? > > Upper bound of a range partition is an exclusive bound. A note was added > recently to the CREATE TABLE page to make this clear. > > https://www.postgresql.org/docs/devel/static/sql-createtable.html Thanks. 
Actually, my confusion was that I expected the upper bound value to be included when the "TO" clause is used in the syntax. Also, there are no alternatives such as a "<" or "LESS THAN" clause available, so "TO" translates to "<". That is what I wanted to confirm. Regards, Venkata B N Database Consultant
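Since the upper bound is exclusive, the earlier examples would need their upper bounds bumped to the start of the next range. A sketch (the parent tables are as defined earlier in the thread):

```sql
-- date-ranged case: make each year's upper bound the first day of the
-- next year, so '1993-12-31' falls inside orders_y1993
CREATE TABLE orders_y1993 PARTITION OF orders
    FOR VALUES FROM ('1993-01-01') TO ('1994-01-01');

-- integer case: an exclusive upper bound of 6 means test_part_5
-- accepts the values 1 through 5, including 5
CREATE TABLE test_part_5 PARTITION OF test_part
    FOR VALUES FROM (1) TO (6);
```

Because each upper bound doubles as the next partition's lower bound, the half-open ranges tile the domain with no gaps and no overlaps.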
Re: [HACKERS] Range Partitioning behaviour - query
On Fri, Feb 24, 2017 at 1:01 PM, Amit Langote wrote: > On 2017/02/24 10:38, David G. Johnston wrote: > > On Thu, Feb 23, 2017 at 6:17 PM, Amit Langote < > langote_amit...@lab.ntt.co.jp > >> wrote: > > > >> On 2017/02/24 8:38, Venkata B Nagothi wrote: > >>> On Thu, Feb 23, 2017 at 3:14 PM, Amit Langote wrote: > >>>> Upper bound of a range partition is an exclusive bound. A note was > >> added > >>>> recently to the CREATE TABLE page to make this clear. > >>>> > >>>> https://www.postgresql.org/docs/devel/static/sql-createtable.html > >>> > >>> > >>> Thanks. Actually, my confusion was that the upper bound value would be > >>> included when "TO" clause is used in the syntax. > >> > >> Hmm, TO sounds like it implies inclusive. > >> > > > > "I think most common usage of the word ends up being inclusive but the > word > > itself doesn't really care." > > > > Dictionary.com has a good example: > > > > "We work from nine to five." - you leave at the beginning of the 5 > o'clock > > hour (I'm going for casual usage here) > > Thanks for that example. > > One problem I've seen people mention is one of cognitive dissonance of > having to define partition_y2013 as FROM ('2013-01-01') TO ('2014-01-01'), > given that that's the only way to get what one needs. But we concluded > that that's a reasonable compromise. > Agreed. I see a similar approach adopted by other traditional RDBMS products as well. Regards, Venkata B N Database Consultant
Re: [HACKERS] Range Partitioning behaviour - query
On Fri, Feb 24, 2017 at 12:38 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > On Thu, Feb 23, 2017 at 6:17 PM, Amit Langote < > langote_amit...@lab.ntt.co.jp> wrote: > >> On 2017/02/24 8:38, Venkata B Nagothi wrote: >> > On Thu, Feb 23, 2017 at 3:14 PM, Amit Langote wrote: >> >> Upper bound of a range partition is an exclusive bound. A note was >> added >> >> recently to the CREATE TABLE page to make this clear. >> >> >> >> https://www.postgresql.org/docs/devel/static/sql-createtable.html >> > >> > >> > Thanks. Actually, my confusion was that the upper bound value would be >> > included when "TO" clause is used in the syntax. >> >> Hmm, TO sounds like it implies inclusive. >> > > "I think most common usage of the word ends up being inclusive but the > word itself doesn't really care." > > Dictionary.com has a good example: > > "We work from nine to five." - you leave at the beginning of the 5 o'clock > hour (I'm going for casual usage here) > True. > Since our implementation of ranges is half-open the usage here is > consistent with that concept. That it doesn't match BETWEEN is actually > somewhat nice since you can use ranges for half-open and BETWEEN if you > want to be concise with fully-closed endpoints. But it is one more thing > to remember. > Agreed. Regards, Venkata B N Database Consultant
Re: [HACKERS] patch proposal
Hi David, On Tue, Jan 31, 2017 at 6:49 AM, David Steele wrote: > On 1/27/17 3:19 AM, Venkata B Nagothi wrote: > > > I will be adding the tests in > > src/test/recovery/t/003_recovery_targets.pl > > <http://003_recovery_targets.pl>. My tests would look more or less > > similar to recovery_target_action. I do not see any of the tests added > > for the parameter recovery_target_action ? Anyways, i will work on > > adding tests for recovery_target_incomplete. > > Do you know when those will be ready? > Attached are both the patches with tests included. > > > > > > > 4) I'm not convinced that fatal errors by default are the way to go. > > It's a pretty big departure from what we have now. > > > > > > I have changed the code to generate ERRORs instead of FATALs. Which is > > in the patch recoveryStartPoint.patch > > I think at this point in recovery any ERROR at all will probably be > fatal. Once you have some tests in place we'll know for sure. > > Either way, the goal would be to keep recovery from proceeding and let > the user adjust their targets. Since this is a big behavioral change > which will need buy in from the community. > Hoping to get some comments / feedback from the community on the second patch too. > Also, I don't think it's very intuitive how recovery_target_incomplete > > works. For instance, if I set recovery_target_incomplete=promote > and > > set recovery_target_time, but pick a backup after the specified time, > > then there will be an error "recovery_target_time %s is older..." > rather > > than a promotion. > > > > > > Yes, that is correct and that is the expected behaviour. Let me explain - > > My point was that this combination of options could lead to confusion on > the part of users. However, I'd rather leave that alone for the time > being and focus on the first patch. > > > I have split the patch into two different > > pieces. 
One is to determine if the recovery can start at all and other > > patch deals with the incomplete recovery situation. > > I think the first patch looks promising and I would rather work through > that before considering the second patch. Once there are tests for the > first patch I will complete my review. > I have added all the tests for the second patch and all seem to be working fine. Regarding the first patch, I have included all the tests except for the test case on recovery_target_time. I did write the test, but it is behaving oddly, and I am working through resolving that. Regards, Venkata B N Database Consultant recoveryStartPoint.patch-version3 Description: Binary data recoveryTargetIncomplete.patch-version4 Description: Binary data
Re: [HACKERS] patch proposal
On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi wrote: > Hi David, > > On Tue, Jan 31, 2017 at 6:49 AM, David Steele wrote: > >> On 1/27/17 3:19 AM, Venkata B Nagothi wrote: >> >> > I will be adding the tests in >> > src/test/recovery/t/003_recovery_targets.pl >> > <http://003_recovery_targets.pl>. My tests would look more or less >> > similar to recovery_target_action. I do not see any of the tests added >> > for the parameter recovery_target_action ? Anyways, i will work on >> > adding tests for recovery_target_incomplete. >> >> Do you know when those will be ready? >> > > Attached are both the patches with tests included. > > >> >> > >> > >> > 4) I'm not convinced that fatal errors by default are the way to go. >> > It's a pretty big departure from what we have now. >> > >> > >> > I have changed the code to generate ERRORs instead of FATALs. Which is >> > in the patch recoveryStartPoint.patch >> >> I think at this point in recovery any ERROR at all will probably be >> fatal. Once you have some tests in place we'll know for sure. >> >> Either way, the goal would be to keep recovery from proceeding and let >> the user adjust their targets. Since this is a big behavioral change >> which will need buy in from the community. >> > > Hoping to get some comments / feedback from the community on the second > patch too. > > > Also, I don't think it's very intuitive how >> recovery_target_incomplete >> > works. For instance, if I set recovery_target_incomplete=promote >> and >> > set recovery_target_time, but pick a backup after the specified >> time, >> > then there will be an error "recovery_target_time %s is older..." >> rather >> > than a promotion. >> > >> > >> > Yes, that is correct and that is the expected behaviour. Let me explain >> - >> >> My point was that this combination of options could lead to confusion on >> the part of users. However, I'd rather leave that alone for the time >> being and focus on the first patch. 
>> >> > I have split the patch into two different >> > pieces. One is to determine if the recovery can start at all and other >> > patch deals with the incomplete recovery situation. >> >> I think the first patch looks promising and I would rather work through >> that before considering the second patch. Once there are tests for the >> first patch I will complete my review. >> > > I have added all the tests for the second patch and all seem to be working > fine. > > Regarding the first patch, i have included all the tests except for the > test case on recovery_target_time. > I did write the test, but, the test is kind of behaving weird which i am > working through to resolve. > This issue has been resolved. All good now. To avoid any further confusion, I have attached the latest versions (4) of both patches with all the tests included. I have already changed the patch status to "Needs review". Thank you! Regards, Venkata B N Database Consultant recoveryStartPoint.patch-version4 Description: Binary data recoveryTargetIncomplete.patch-version4 Description: Binary data
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello. I added pgsql-hackers. > > This occurs also on git master and back to 9.4. > > At Fri, 13 Jan 2017 08:47:06 -0600, Jonathon Nelson > wrote in gmail.com> > > On Mon, Nov 28, 2016 at 1:39 PM, Jonathon Nelson > wrote: > > > First, the postgresql configuration differs only minimally from the > stock > > > config: > > > > > > Assume wal_keep_segments = 0. > > > Assume the use of physical replication slots. > > > Assume one master, one standby. > > > > > > Lastly, we have observed the behavior "in the wild" at least twice and > in > > > the lab a dozen or so times. > > > > > > EXAMPLE #1 (logs are from the replica): > > > > > > user=,db=,app=,client= DEBUG: creating and filling new WAL file > > > user=,db=,app=,client= DEBUG: done creating and filling new WAL file > > > user=,db=,app=,client= DEBUG: sending write 6/8B00 flush > 6/8A00 > > > apply 5/748425A0 > > > user=,db=,app=,client= DEBUG: sending write 6/8B00 flush > 6/8B00 > > > apply 5/74843020 > > > > > > user=,db=,app=,client= DEBUG: postmaster received signal 2 > > > user=,db=,app=,client= LOG: received fast shutdown request > > > user=,db=,app=,client= LOG: aborting any active transactions > > > > > > And, upon restart: > > > > > > user=,db=,app=,client= LOG: restartpoint starting: xlog > > > user=,db=,app=,client= DEBUG: updated min recovery point to > 6/67002390 on > > > timeline 1 > > > user=,db=,app=,client= DEBUG: performing replication slot checkpoint > > > user=,db=,app=,client= DEBUG: updated min recovery point to > 6/671768C0 on > > > timeline 1 > > > user=,db=,app=,client= CONTEXT: writing block 589 of relation > > > base/13294/16501 > > > user=,db=,app=,client= LOG: invalid magic number in log segment > > > 00010006008B, offset 0 > > > user=,db=,app=,client= DEBUG: switched WAL source from archive to > stream > > > after failure > > > user=,db=,app=,client= LOG: started streaming WAL from primary at 
> > > 6/8A00 on timeline 1 > > > user=,db=,app=,client= FATAL: could not receive data from WAL stream: > > > ERROR: requested WAL segment 00010006008A has already been > > > removed > > I managed to reproduce this. A little tweak as the first patch > lets the standby to suicide as soon as walreceiver sees a > contrecord at the beginning of a segment. > > - M(aster): createdb as a master with wal_keep_segments = 0 > (default), min_log_messages = debug2 > - M: Create a physical repslot. > - S(tandby): Setup a standby database. > - S: Edit recovery.conf to use the replication slot above then > start it. > - S: touch /tmp/hoge > - M: Run pgbench ... > - S: After a while, the standby stops. > > LOG: STOP THE SERVER > > - M: Stop pgbench. > - M: Do 'checkpoint;' twice. > - S: rm /tmp/hoge > - S: Fails to catch up with the following error. > > > FATAL: could not receive data from WAL stream: ERROR: requested WAL > segment 0001002B has already been removed > > I have been testing / reviewing the latest patch "0001-Fix-a-bug-of-physical-replication-slot.patch" and I think I might need some more clarification on this. Before applying the patch, I tried reproducing the above error: - I had master->standby in streaming replication - Took the backup of the master, with a low max_wal_size and wal_keep_segments = 0 - Configured the standby with recovery.conf - Created a replication slot on the master - Configured the replication slot on the standby and started the standby - I got the below error >> 2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at 0/F2000140: wanted 24, got 0 >> 2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from primary at 0/F200 on timeline 1 >> 2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000100F2 has already been removed and I noticed that the file "000100F2" had been removed from the master.
This can easily be reproduced, and it occurs irrespective of whether replication slots are configured. As long as the file "000100F2" is available on the master, the standby continues to stream WAL without any issues. Some more details - contents of the file "000100F2" on the standby before pg_stop_backup():

rmgr: Standby len (rec/tot): 24/50, tx: 0, lsn: 0/F228, prev 0/F198, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Standby len (rec/tot): 24/50, tx: 0, lsn: 0/F260, prev 0/F228, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG len (rec/tot): 80/106, tx: 0, lsn: 0/F298, prev 0/F260, desc: CHECKPOINT_ONLINE redo 0/F260; tli 1; prev tli 1; fpw true; xid 0:638; oid 16487; multi 1; offset 0; oldest xid 54
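For anyone else trying to reproduce this, the slot setup on the master amounts to something like the following (the slot name here is arbitrary, chosen for illustration):

```sql
-- on the master: create the physical replication slot the standby will use
SELECT pg_create_physical_replication_slot('standby_slot');

-- restart_lsn shows how much WAL the slot is pinning against removal
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

with primary_slot_name = 'standby_slot' in the standby's recovery.conf, so that the master is supposed to retain any WAL the standby has not yet flushed.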
Re: [HACKERS] patch proposal
On Tue, Mar 21, 2017 at 8:46 AM, David Steele wrote: > Hi Venkata, > > On 2/28/17 11:59 PM, Venkata B Nagothi wrote: > >> On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi > <mailto:nag1...@gmail.com>> wrote: >> On Tue, Jan 31, 2017 at 6:49 AM, David Steele > <mailto:da...@pgmasters.net>> wrote: >> >> Do you know when those will be ready? >> >> Attached are both the patches with tests included. >> > > Thanks for adding the tests. They bring clarity to the patch. > Thank you again for reviewing the patch and for your comments! > Unfortunately, I don't think the first patch (recoveryStartPoint) will > work as currently implemented. The problem I see is that the new function > recoveryStartsHere() depends on pg_control containing a checkpoint right at > the end of the backup. There's no guarantee that this is true, even if > pg_control is copied last. That means a time, lsn, or xid that occurs > somewhere in the middle of the backup can be selected without complaint > from this code depending on timing. > Yes, that is true. The latest checkpoint position, xid and timestamp of the restored backup show up in pg_controldata, which means that is the position from which recovery would start. That in turn means WALs start getting replayed from that position towards --> the minimum recovery position (which is the backup end, i.e. applying the WAL generated between backup start and backup end) and all the way through to --> the recovery target position. The best start position to check against would be the position shown in the pg_control file, which is much better than the current PostgreSQL behaviour. The tests pass (or rather fail as expected) because they are written using > values before the start of the backup. It's actually the end of the backup > (where the database becomes consistent on recovery) that defines where PITR > can start and this distinction definitely matters for very long backups.
A > better test would be to start the backup, get a time/lsn/xid after writing > some data, and then make sure that time/lsn/xid is flagged as invalid on > recovery. > Yes, I agree: a database restored from the backup would start recovery and become consistent once the backup end is reached. The point here is that, when the backup is restored and recovery proceeds further, there is NO WAY to identify the next consistent position or the end position of the backup. This patch only implements a check to ensure recovery starts and proceeds from the right position instead of the database prematurely getting promoted, which is the current behaviour. The current behaviour is that if the XID shown by pg_controldata is, say, 1400 and the specified recovery_target_xid is, say, 200, then PostgreSQL just completes the recovery and promotes at the first consistent recovery target, which means the DBA has to start the restoration and recovery process all over again to promote the database at the intended position. It is also problematic to assume that transaction IDs commit in order. If > checkPoint.latestCompletedXid contains 5 then a recovery to xid 4 may or > may not be successful depending on the commit order. However, it appears > in this case the patch would disallow recovery to 4. > If txid 4 is the recovery target xid, then an appropriate backup taken prior to txid 4 must be used, or an alternative recovery target like recovery_target_timestamp must be used to proceed to the respective recovery target xid. > I think this logic would need to be baked into recoveryStopsAfter() and/or > recoveryStopsBefore() in order to work. It's not clear to me what that > would look like, however, or if the xid check is even practical. > The above two functions determine whether the recovery process has to stop at a particular position or not; the patch has nothing to do with that. To summarise, this patch only determines whether the WAL replay should start at all.
In other words, if the specified recovery target falls prior to the restored database's position, then the WAL replay should not start at all, which prevents the database from being promoted at an unintended recovery target. Any thoughts/comments? Regards, Venkata Balaji N Database Consultant
Re: [HACKERS] patch proposal
Hi David, On Thu, Mar 23, 2017 at 4:21 AM, David Steele wrote: > On 3/21/17 8:45 PM, Venkata B Nagothi wrote: > >> On Tue, Mar 21, 2017 at 8:46 AM, David Steele > >> Unfortunately, I don't think the first patch (recoveryStartPoint) >> will work as currently implemented. The problem I see is that the >> new function recoveryStartsHere() depends on pg_control containing a >> checkpoint right at the end of the backup. There's no guarantee >> that this is true, even if pg_control is copied last. That means a >> time, lsn, or xid that occurs somewhere in the middle of the backup >> can be selected without complaint from this code depending on timing. >> >> >> Yes, that is true. The latest best position, checkpoint position, xid >> and timestamp of the restored backup of the backup is shown up in the >> pg_controldata, which means, that is the position from which the >> recovery would start. >> > > Backup recovery starts from the checkpoint in the backup_label, not from > the checkpoint in pg_control. The original checkpoint that started the > backup is generally overwritten in pg_control by the end of the backup. Yes, I totally agree. My initial intention was to compare the recovery target position(s) with the contents of the backup_label, but then the checks would fail if the recovery is performed without the backup_label file. So I decided to compare the recovery target positions with the contents of the pg_control file instead. > Which in-turn means, WALs start getting replayed >> from that position towards --> minimum recovery position (which is the >> end backup, which again means applying WALs generated between start and >> the end backup) all the way through to --> recovery target position. >> > > minRecoveryPoint is only used when recovering a backup that was made from > a standby. For backups taken on the master, the backup end WAL record is > used.
> > The best start position to check with would the position shown up in the >> pg_control file, which is way much better compared to the current >> postgresql behaviour. >> > > I don't agree, for the reasons given previously. > As explained above, my intention was to ensure that the recovery start position checks are performed successfully irrespective of the presence of the backup_label file. I did some analysis before deciding to use pg_controldata's output instead of the backup_label file contents. Comparing the output of pg_controldata with the contents of the backup_label: *Recovery Target LSN* START WAL LOCATION (which is 0/9C28) in the backup_label = pg_controldata's latest checkpoint's REDO location (Latest checkpoint's REDO location: 0/9C28) *Recovery Target TIME* backup start time in the backup_label (START TIME: 2017-03-26 11:55:46 AEDT) = pg_controldata's latest checkpoint time (Time of latest checkpoint : Sun 26 Mar 2017 11:55:46 AM AEDT) *Recovery Target XID* To begin with, the backup_label does not contain any start XID, so the only option is to depend on pg_controldata's output. After a few quick tests and thorough observation, I notice that the pg_control file is copied as-is to the backup location at pg_start_backup time. I performed some quick tests by running a few transactions between pg_start_backup and pg_stop_backup. So, I believe this is the ideal start point for WAL replay. Am I missing anything here? The tests pass (or rather fail as expected) because they are written >> using values before the start of the backup. It's actually the end >> of the backup (where the database becomes consistent on recovery) >> that defines where PITR can start and this distinction definitely >> matters for very long backups.
>> >> The current behaviour is that, if the XID shown by pg_controldata
>> is, say, 1400 and the specified recovery_target_xid is, say, 200, then
>> postgresql just completes the recovery and promotes at the immediate
>> consistent recovery target, which means the DBA has to start the
>> restoration and recovery process all over again to promote the database
>> at the intended position.
>
> I'm not saying that the current behavior is good, only that the proposed
> behavior is incorrect as far as I can tell.

Please consider my explanation above and share your thoughts.

> It is also problematic to assume that transaction IDs commit in
> order. If checkPoint.latestCompletedXid conta
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Regards,
Venkata B N
Database Consultant

On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote:
> This conflicts with 6912acc (replication lag tracker) so just
> rebased on a6f22e8.

I tried applying this patch to the latest master, and it is not getting applied:

[dba@buildhost postgresql]$ git apply /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:28: trailing whitespace.
/*
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:29: trailing whitespace.
 * This variable corresponds to restart_lsn in pg_replication_slots for a
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:30: trailing whitespace.
 * physical slot. This has a valid value only when it differs from the current
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:31: trailing whitespace.
 * flush pointer.
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:32: trailing whitespace.
 */
error: patch failed: src/backend/replication/walsender.c:210
error: src/backend/replication/walsender.c: patch does not apply

Regards,
Venkata Balaji N
Database Consultant
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Thu, Mar 30, 2017 at 10:55 AM, Michael Paquier wrote: > On Thu, Mar 30, 2017 at 8:49 AM, Venkata B Nagothi > wrote: > > On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI > > wrote: > > I tried applying this patch to latest master, it is not getting applied > > > > [dba@buildhost postgresql]$ git apply > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:28: > > trailing whitespace. > > /* > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:29: > > trailing whitespace. > > * This variable corresponds to restart_lsn in pg_replication_slots for a > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:30: > > trailing whitespace. > > * physical slot. This has a valid value only when it differs from the > > current > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:31: > > trailing whitespace. > > * flush pointer. > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:32: > > trailing whitespace. > > */ > > error: patch failed: src/backend/replication/walsender.c:210 > > error: src/backend/replication/walsender.c: patch does not apply > > git apply and git am can be very picky sometimes, so you may want to > fallback to patch -p1 if things don't work. In this case it does. > patch -p1 seems to be working. Thanks ! Regards, Venkata B N Database Consultant
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Thu, Mar 30, 2017 at 3:51 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > At Thu, 30 Mar 2017 11:12:56 +1100, Venkata B Nagothi > wrote in gmail.com> > > On Thu, Mar 30, 2017 at 10:55 AM, Michael Paquier < > michael.paqu...@gmail.com > > > wrote: > > > > > On Thu, Mar 30, 2017 at 8:49 AM, Venkata B Nagothi > > > wrote: > > > > On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI > > > > wrote: > > > > I tried applying this patch to latest master, it is not getting > applied > > > > > > > > [dba@buildhost postgresql]$ git apply > > > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > > > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch > > > > /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/ > > > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:28: > > > > trailing whitespace. > > > > /* > > > 0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:29: > > > > trailing whitespace. > > > > * This variable corresponds to restart_lsn in pg_replication_slots > for a > ... > > > git apply and git am can be very picky sometimes, so you may want to > > > fallback to patch -p1 if things don't work. In this case it does. > > > > > > > patch -p1 seems to be working. Thanks ! > > That's quite strange. The patch I sent doesn't cantain trailing > spaces at all. The cited lines doesn't seem to contain them. It > applied cleanly with "git am" for me. > > The file restored from the attachment of received mail also don't. > > The original files contains the following, > > 0002440 66 6f 72 20 61 0a 2b 20 2a 20 70 68 79 73 69 63 > f o r a \n + * p h y s i c > > The corresponding part of the file restored from mail on Windows > is the following, > 0002460 63 61 74 69 6f 6e 5f 73 6c 6f 74 73 20 66 6f 72 > c a t i o n _ s l o t s f o r > 0002500 20 61 0d 0a 2b 20 2a 20 70 68 79 73 69 63 61 6c > a \r \n + * p h y s i c a l > > Both doesn't contain a space at the end of a line. How did you > retrieve the patch from the mail? 
Yes, I downloaded it from the email on Windows, copied it across to Linux, and did "git apply".

Regards,
Venkata B N
Database Consultant
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Thu, Mar 30, 2017 at 4:46 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello, > > At Thu, 30 Mar 2017 15:59:14 +1100, Venkata B Nagothi > wrote in xwu0j05x_j...@mail.gmail.com> > > Yes, downloaded from the email on Windows and copied across to Linux and > > did "git apply". > > The same works for me. But --keep-cr gave me the same result with > you. > > > $ git am --keep-cr ~/work/patches/0001-Fix-a-bug- > of-physical-replication-slot_a6f22e8.patch > > Applying: Fix a bug of physical replication slot. > > .git/rebase-apply/patch:13: trailing whitespace. > > /* > for me too - [dba@buildhost postgresql]$ git am --keep-cr /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch Applying: Fix a bug of physical replication slot. /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:13: trailing whitespace. /* /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:14: trailing whitespace. * This variable corresponds to restart_lsn in pg_replication_slots for a /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:15: trailing whitespace. * physical slot. This has a valid value only when it differs from the current /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:16: trailing whitespace. * flush pointer. /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:17: trailing whitespace. */ error: patch failed: src/backend/replication/walsender.c:210 error: src/backend/replication/walsender.c: patch does not apply Patch failed at 0001 Fix a bug of physical replication slot. The copy of the patch that failed is found in: /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch When you have resolved this problem, run "git am --resolved". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". 
> https://git-scm.com/docs/git-am
>
> | --[no-]keep-cr
> |
> | With --keep-cr, call git mailsplit (see git-mailsplit[1]) with
> | the same option, to prevent it from stripping CR at the end of
> | lines. am.keepcr configuration variable can be used to specify
> | the default behaviour. --no-keep-cr is useful to override
> | am.keepcr.
>
> I don't know why it preserves CRs only for the lines, but anyway,
> don't you have am.keepcr in your configuration?

Maybe — I do not think I have am.keepcr in my configuration, but I am not 100% sure of it. I only did "git apply ...", which produced the whitespace errors.

Regards,
Venkata B N
Database Consultant
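The "trailing whitespace" complaints in the transcripts above are git reporting the carriage returns that a Windows download (or `--keep-cr`) left at the ends of lines; the hunks then no longer match the LF-only source tree, so the patch fails to apply. A quick, illustrative way to check a saved patch file for DOS line endings (the sample bytes below are made up to resemble the patch fragment in the thread):

```python
def count_crlf_lines(data: bytes) -> int:
    """Count lines ending in CRLF; git apply shows the stray CR as
    'trailing whitespace' and the hunks then fail to match an LF-only tree."""
    return sum(1 for ln in data.splitlines(keepends=True) if ln.endswith(b"\r\n"))

# A patch fragment where two of the three lines carry DOS line endings:
fragment = b"+ /*\r\n+  * flush pointer.\r\n+  */\n"
print(count_crlf_lines(fragment))  # 2
```

If the count is non-zero, re-saving the attachment in binary mode (or stripping the CRs) makes `git am`/`git apply` behave.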
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Fri, Mar 17, 2017 at 6:48 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello, > > At Mon, 13 Mar 2017 11:06:00 +1100, Venkata B Nagothi > wrote in gmail.com> > > On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI < > > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > I managed to reproduce this. A little tweak as the first patch > > > lets the standby to suicide as soon as walreceiver sees a > > > contrecord at the beginning of a segment. > > > > > > - M(aster): createdb as a master with wal_keep_segments = 0 > > > (default), min_log_messages = debug2 > > > - M: Create a physical repslot. > > > - S(tandby): Setup a standby database. > > > - S: Edit recovery.conf to use the replication slot above then > > > start it. > > > - S: touch /tmp/hoge > > > - M: Run pgbench ... > > > - S: After a while, the standby stops. > > > > LOG: STOP THE SERVER > > > > > > - M: Stop pgbench. > > > - M: Do 'checkpoint;' twice. > > > - S: rm /tmp/hoge > > > - S: Fails to catch up with the following error. > > > > > > > FATAL: could not receive data from WAL stream: ERROR: requested > WAL > > > segment 0001002B has already been removed > > > > > > > > I have been testing / reviewing the latest patch > > "0001-Fix-a-bug-of-physical-replication-slot.patch" and i think, i might > > need some more clarification on this. > > > > Before applying the patch, I tried re-producing the above error - > > > > - I had master->standby in streaming replication > > - Took the backup of master > >- with a low max_wal_size and wal_keep_segments = 0 > > - Configured standby with recovery.conf > > - Created replication slot on master > > - Configured the replication slot on standby and started the standby > > I suppose the "configure" means primary_slot_name in recovery.conf. 
> > > - I got the below error > > > >>> 2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at > > 0/F2000140: wanted 24, got 0 > >>> 2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from > > primary at 0/F200 on timeline 1 > >>> 2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data > > from WAL stream: ERROR: requested WAL segment 000100F2 > has > > already been removed > > Maybe you created the master slot with non-reserve (default) mode > and put a some-minites pause after making the backup and before > starting the standby. For the case the master slot doesn't keep > WAL segments unless the standby connects so a couple of > checkpoints can blow away the first segment required by the > standby. This is quite reasonable behavior. The following steps > makes this more sure. > > > - Took the backup of master > >- with a low max_wal_size = 2 and wal_keep_segments = 0 > > - Configured standby with recovery.conf > > - Created replication slot on master > + - SELECT pg_switch_wal(); on master twice. > + - checkpoint; on master twice. > > - Configured the replication slot on standby and started the standby > > Creating the slot with the following command will save it. > > =# select pg_create_physical_replication_slot('s1', true); > I did a test again, by applying the patch and I am not sure if the patch is doing the right thing ? Here is test case - - I ran pgbench - I took the backup of the master first - Below are the WALs on master after the stop backup - postgres=# select pg_stop_backup(); NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup pg_stop_backup 0/8C000130 (1 row) postgres=# \q [dba@buildhost data]$ ls -ltrh pgdata-10dev-prsb-1/pg_wal/ total 65M drwx--. 2 dba dba 4.0K Mar 31 09:36 archive_status -rw---. 1 dba dba 16M Mar 31 11:09 0001008E -rw---. 1 dba dba 16M Mar 31 11:17 0001008F -rw---. 1 dba dba 16M Mar 31 11:18 0001008C -rw---. 
1 dba dba 16M Mar 31 11:18 0001008D

- After the backup, I created the physical replication slot:

postgres=# select pg_create_physical_replication_slot('repslot',true);
 pg_create_physical_replication_slot
-
 (repslot,0/8D28)
(1 row)

postgres=# select pg_walfile_name('0/8D28');
 pg_walfile_name
---
 0001008D
(1 row)

Here
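For reference, pg_walfile_name() used above maps a timeline and an LSN to a WAL segment file name with simple arithmetic (assuming the default 16MB segment size). A Python sketch, using an illustrative full-length LSN since the values in the archived transcript are truncated:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024                 # default WAL segment size (16MB)
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 segments per 4GB "xlogid"

def walfile_name(timeline: int, lsn: str) -> str:
    """Mimic pg_walfile_name(): 8 hex digits of timeline, then the WAL
    segment number split into two 8-hex-digit halves."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segno = ((hi << 32) | lo) // WAL_SEG_SIZE
    return "%08X%08X%08X" % (timeline, segno // SEGS_PER_XLOGID, segno % SEGS_PER_XLOGID)

# e.g. a restart_lsn of 0/8D000028 on timeline 1 lives in segment ...8D:
print(walfile_name(1, "0/8D000028"))  # 00000001000000000000008D
```

This only models the default configuration; clusters built with a non-default `--with-wal-segsize` would use a different divisor.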
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On Fri, Mar 31, 2017 at 4:05 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Thank you having a look on this. > > # I removed -bugs in CC:. > > At Fri, 31 Mar 2017 13:40:00 +1100, Venkata B Nagothi > wrote in gmail.com> > > On Fri, Mar 17, 2017 at 6:48 PM, Kyotaro HORIGUCHI < > > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > > > Hello, > > > > > > At Mon, 13 Mar 2017 11:06:00 +1100, Venkata B Nagothi < > nag1...@gmail.com> > > > wrote in > > gmail.com> > > > > On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI < > > > > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > > > I managed to reproduce this. A little tweak as the first patch > > > > > lets the standby to suicide as soon as walreceiver sees a > > > > > contrecord at the beginning of a segment. > > > > > > > > > > - M(aster): createdb as a master with wal_keep_segments = 0 > > > > > (default), min_log_messages = debug2 > > > > > - M: Create a physical repslot. > > > > > - S(tandby): Setup a standby database. > > > > > - S: Edit recovery.conf to use the replication slot above then > > > > > start it. > > > > > - S: touch /tmp/hoge > > > > > - M: Run pgbench ... > > > > > - S: After a while, the standby stops. > > > > > > LOG: STOP THE SERVER > > > > > > > > > > - M: Stop pgbench. > > > > > - M: Do 'checkpoint;' twice. > > > > > - S: rm /tmp/hoge > > > > > - S: Fails to catch up with the following error. > > > > > > > > > > > FATAL: could not receive data from WAL stream: ERROR: > requested > > > WAL > > > > > segment 0001002B has already been removed > > > > > > > > > > > > > > I have been testing / reviewing the latest patch > > > > "0001-Fix-a-bug-of-physical-replication-slot.patch" and i think, i > might > > > > need some more clarification on this. 
> > > > > > > > Before applying the patch, I tried re-producing the above error - > > > > > > > > - I had master->standby in streaming replication > > > > - Took the backup of master > > > >- with a low max_wal_size and wal_keep_segments = 0 > > > > - Configured standby with recovery.conf > > > > - Created replication slot on master > > > > - Configured the replication slot on standby and started the standby > > > > > > I suppose the "configure" means primary_slot_name in recovery.conf. > > > > > > > - I got the below error > > > > > > > >>> 2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length > at > > > > 0/F2000140: wanted 24, got 0 > > > >>> 2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL > from > > > > primary at 0/F200 on timeline 1 > > > >>> 2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive > data > > > > from WAL stream: ERROR: requested WAL segment > 000100F2 > > > has > > > > already been removed > > > > > > Maybe you created the master slot with non-reserve (default) mode > > > and put a some-minites pause after making the backup and before > > > starting the standby. For the case the master slot doesn't keep > > > WAL segments unless the standby connects so a couple of > > > checkpoints can blow away the first segment required by the > > > standby. This is quite reasonable behavior. The following steps > > > makes this more sure. > > > > > > > - Took the backup of master > > > >- with a low max_wal_size = 2 and wal_keep_segments = 0 > > > > - Configured standby with recovery.conf > > > > - Created replication slot on master > > > + - SELECT pg_switch_wal(); on master twice. > > > + - checkpoint; on master twice. > > > > - Configured the replication slot on standby and started the standby > > > > > > Creating the slot with the following command will save it. 
> > > > > > =# select pg_create_physical_replication_slot('s1', true); > > > > > > > I did a test again, by applying the patch and I am not sure if the patch > is > > doing the right thing ?
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Sat, Jan 14, 2017 at 5:10 AM, Vladimir Rusinov wrote:
> Attached are two new versions of the patch: one keeps aliases, one doesn't.
> Also, remove stray reference to xlog function in one of the tests.

Both the patches (with and without aliases) are not getting applied to the latest master. Below is the error:

Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git a/src/include/access/xlog_fn.h b/src/include/access/xlog_fn.h
|index a013d047a2..a46b2f995c 100644
|--- a/src/include/access/xlog_fn.h
|+++ b/src/include/access/xlog_fn.h
--
File to patch:

> I've lost vote count. Should we create a form to calculate which one of
> the patches should be committed?
>
> If we decide to go the extension way, perhaps it can be maintained outside
> of core Postgres?

My take on having aliases to the functions: in my opinion as a DBA, I agree with having no aliases. Having functions that do the same thing under two different names could be confusing. I have been doing quite a number of PostgreSQL upgrades over the last few years, and I do not see the function name changes as a major issue or a road-blocker in the upgrade exercise. If the function name has changed, it has changed. What would be a serious thing to consider during an upgrade is a change in the functionality of a particular function, as that needs testing. Changing the function names in scripts and custom tools has to be done and might take a bit of extra time. IMHO, it is not a good idea to keep aliases, as this would make DBAs lazier about changing the function names promptly in their automated scripts/jobs/tools during upgrades.
Especially in bigger and more complex environments, where databases are handled by multiple different groups of DBAs, it is quite possible that both *xlog* functions and *wal* functions will be used in the scripts and all of them will be working fine; once the *xlog* function names are completely removed, some of those jobs/scripts will start failing. I would vote for "no-aliases".

Regards,
Venkata Balaji N
Database Consultant
Re: [HACKERS] patch proposal
Hi David,

On Tue, Jan 24, 2017 at 9:22 AM, David Steele wrote:
> Hi Venkata,
>
> On 11/8/16 5:47 PM, Venkata B Nagothi wrote:
> > Attached is the 2nd version of the patch with some enhancements.
>
> Here's my review of the patch.

Thank you very much for reviewing the patch.

> 1) There are a number of whitespace errors when applying the patch:
>
> $ git apply ../other/recoveryTargetIncomplete.patch-version2
> ../other/recoveryTargetIncomplete.patch-version2:158: indent with spaces.
> else
> ../other/recoveryTargetIncomplete.patch-version2:160: space before tab in indent.
> ereport(LOG,
> ../other/recoveryTargetIncomplete.patch-version2:166: indent with spaces.
> /*
> ../other/recoveryTargetIncomplete.patch-version2:167: indent with spaces.
> * This is the position where we can choose to shutdown, pause
> ../other/recoveryTargetIncomplete.patch-version2:168: indent with spaces.
> * or promote at the end-of-the-wal if the intended recovery
> warning: squelched 10 whitespace errors
> warning: 15 lines add whitespace errors.
>
> The build did succeed after applying the patch.

Sorry, my bad. The attached patch should fix those errors.

> 2) There is no documentation. Serious changes to recovery are going to
> need to be documented very carefully. It will also help during review
> to determine your intent.

Attached patches include the documentation as well.

> 3) There are no tests. I believe that
> src/test/recovery/t/006_logical_decoding.pl would be the best place to
> insert your tests.

I will be adding the tests in src/test/recovery/t/003_recovery_targets.pl. My tests would look more or less similar to those for recovery_target_action. I do not see any tests added for the parameter recovery_target_action, though. Anyway, I will work on adding tests for recovery_target_incomplete.

> 4) I'm not convinced that fatal errors by default are the way to go.
> It's a pretty big departure from what we have now.

I have changed the code to generate ERRORs instead of FATALs.
This is in the patch recoveryStartPoint.patch.

> Also, I don't think it's very intuitive how recovery_target_incomplete
> works. For instance, if I set recovery_target_incomplete=promote and
> set recovery_target_time, but pick a backup after the specified time,
> then there will be an error "recovery_target_time %s is older..." rather
> than a promotion.

Yes, that is correct, and that is the expected behaviour. Let me explain:

a) When you use the parameter recovery_target_time and pick up a backup taken after the specified target time, recovery will not proceed further, and ideally it should not. This is not an incomplete-recovery situation either; it is a situation where the recovery process cannot start at all. With this patch, "recovery_target_time" (and similarly recovery_target_xid and recovery_target_lsn) first checks whether the specified target time is later than the backup's timestamp: if yes, recovery proceeds ahead; otherwise it generates an error, which is what happened in your case. The ideal way to deal with this situation is to change the recovery_target_time to a much later time (later than the backup's time), or use "recovery_target=immediate" to complete the recovery and start the database, or choose the correct backup (one with a timestamp prior to the recovery_target_time).

b) recovery_target_incomplete has no effect in the above situation, as this parameter only deals with incomplete recovery: a situation wherein the recovery process stops mid-way (for example, due to missing or corrupt WALs) without reaching the intended recovery target (XID, time or LSN).

c) In a real-world environment, when a DBA is required to perform PITR to a particular recovery target (say, recovery_target_time), it is the DBA who knows up to where the database needs to be recovered and which backup needs to be used for it.
The first thing a DBA would do is choose an appropriate backup taken prior to the recovery target. This is the basic first step needed.

> It seems to me that there needs to be a separate setting to raise a
> fatal error.
>
> 5) There are really two patches here. One deals with notifying the user
> that replay to their recovery target is not possible (implemented here
> with fatal errors) and the other allows options when their recovery
> target appears to be possible but never arrives.
>
> As far as I can see, at this point, nobody has really signed on to this
> approach. I believe you will need a consensus before you can proceed
> with a patch
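The precheck behaviour described in (a) above — refuse to start recovery when the requested target time predates the backup — can be sketched as follows. This is an illustrative Python model of the logic only, not the patch's actual C code, and the timestamps are made up:

```python
from datetime import datetime

def check_recovery_target_time(target: datetime, backup_checkpoint: datetime) -> None:
    """Error out early when the requested target time is older than the
    backup's checkpoint timestamp, instead of recovering and silently
    promoting at the first consistent point."""
    if target < backup_checkpoint:
        raise ValueError("recovery_target_time %s is older than backup time %s"
                         % (target, backup_checkpoint))

backup_time = datetime(2017, 3, 26, 11, 55, 46)
check_recovery_target_time(datetime(2017, 3, 27), backup_time)      # fine: target is later
try:
    check_recovery_target_time(datetime(2017, 3, 25), backup_time)  # target predates backup
except ValueError as err:
    print("ERROR:", err)
```

The same shape of check applies to recovery_target_xid and recovery_target_lsn, substituting XID or LSN comparison for the timestamp comparison.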
Re: [HACKERS] Declarative partitioning - another take
Hi, I am testing the partitioning feature from the latest master and got the following error while loading the data - db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM ('1993-01-01') TO ('1993-12-31'); CREATE TABLE db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; *ERROR: could not read block 6060 in file "base/16384/16412": read only 0 of 8192 bytes* *CONTEXT: COPY orders, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack"* Am i doing something wrong ? Regards, Venkata B N Database Consultant On Fri, Dec 9, 2016 at 3:58 PM, Amit Langote wrote: > On 2016/12/09 0:25, Robert Haas wrote: > > On Wed, Dec 7, 2016 at 11:42 PM, Michael Paquier > > wrote: > >>> Congrats to everyone working on this! This is a large step forward. > >> > >> Congratulations to all! It was a long way to this result. > > > > Yes. The last effort in this area which I can remember was by Itagaki > > Takahiro in 2010, so we've been waiting for this for more than 6 > > years. It's really good that Amit was able to put in the effort to > > produce a committable patch, and I think he deserves all of our thanks > > for getting that done - and NTT deserves our thanks for paying him to > > do it. > > > > Even though I know he put in a lot more work than I did, let me just > > say: phew, even reviewing that was a ton of work. > > Absolutely! Your review comments and design suggestions have been > instrumental in improving (and cutting down on the size of) the patches. > > > Of course, this is the beginning, not the end. > > +1000! 
> > > I've been thinking > > about next steps -- here's an expanded list: > > > > - more efficient plan-time partition pruning (constraint exclusion is > too slow) > > - run-time partition pruning > > - partition-wise join (Ashutosh Bapat is already working on this) > > - try to reduce lock levels > > - hash partitioning > > - the ability to create an index on the parent and have all of the > > children inherit it; this should work something like constraint > > inheritance. you could argue that this doesn't add any real new > > capability but it's a huge usability feature. > > - teaching autovacuum enough about inheritance hierarchies for it to > > update the parent statistics when they get stale despite the lack of > > any actual inserts/updates/deletes to the parent. this has been > > pending for a long time, but it's only going to get more important > > - row movement (aka avoiding the need for an ON UPDATE trigger on each > > partition) > > - insert (and eventually update) tuple routing for foreign partitions > > - not scanning the parent > > - fixing the insert routing so that we can skip tuple conversion where > possible > > - fleshing out the documentation > > I would definitely want to contribute to some of these items. It's great > that many others plan to contribute toward this as well. > > > One thing I'm wondering is whether we can optimize away some of the > > heavyweight locks. For example, if somebody does SELECT * FROM ptab > > WHERE id = 1, they really shouldn't need to lock the entire > > partitioning hierarchy, but right now they do. If the root knows > > based on its own partitioning key that only one child is relevant, it > > would be good to lock *only that child*. For this feature to be > > competitive, it needs to scale to at least a few thousand partitions, > > and locking thousands of objects instead of one or two is bound to be > > slow. 
Similarly, you can imagine teaching COPY to lock partitions > > only on demand; if no tuples are routed to a particular partition, we > > don't need to lock it. There's a manageability component here, too: > > not locking partitions unnecessarily makes ti easier to get DDL on > > other partitions through. Alternatively, maybe we could rewrite the > > lock manager to be hierarchical, so that you can take a single lock > > that represents an AccessShareLock on all partitions and only need to > > make one entry in the lock table to do it. That means that attempts > > to lock individual partitions need to check not only for a lock on > > that partition but also on anything further up in the hierarchy, but > > that might be a good trade if it gives us O(1) locking on the parent. > > And maybe we could also have a level of the hierarchy that represents > > every-table-in-the-database, for the benefit of pg_dump. Of course, > > rewriting the lock manager is a big project not for the faint of > > heart, but I think if we don't it's going to be a scaling bottleneck. > > Hierarchical lock manager stuff is interesting. Are you perhaps alluding > to a new *intention* lock mode as described in the literature on multiple > granularity locking [1]? > > > We also need to consider other parts of the system that may not scale, > > like pg_dump. For a long time, we've been sorta-kinda willing to fix > > the worst of the scalability problems with pg_dump, but that's re
Re: [HACKERS] Declarative partitioning - another take
Regards, Venkata B N Database Consultant On Fri, Dec 9, 2016 at 11:11 PM, Amit Langote wrote: > On Fri, Dec 9, 2016 at 3:16 PM, Venkata B Nagothi > wrote: > > Hi, > > > > I am testing the partitioning feature from the latest master and got the > > following error while loading the data - > > > > db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM > > ('1993-01-01') TO ('1993-12-31'); > > CREATE TABLE > > > > db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; > > ERROR: could not read block 6060 in file "base/16384/16412": read only > 0 of > > 8192 bytes > > CONTEXT: COPY orders, line 376589: > > "9876391|374509|O|54847|1997-07-16|3-MEDIUM > |Clerk#01993|0|ithely > > regular pack" > > Hmm. Could you tell what relation the file/relfilenode 16412 belongs to? > db01=# select relname from pg_class where relfilenode=16412 ; relname -- orders_y1997 (1 row) I VACUUMED the partition and then re-ran the copy command and no luck. db01=# vacuum orders_y1997; VACUUM db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; ERROR: could not read block 6060 in file "base/16384/16412": read only 0 of 8192 bytes CONTEXT: COPY orders, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack" I do not quite understand the below behaviour as well. I VACUUMED 1997 partition and then i got an error for 1992 partition and then after 1996 and then after 1994 and so on. postgres=# \c db01 You are now connected to database "db01" as user "dba". 
db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; ERROR: could not read block 6060 in file "base/16384/16412": read only 0 of 8192 bytes CONTEXT: COPY orders, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack" db01=# vacuum orders_y1997; VACUUM db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; ERROR: could not read block 3942 in file "base/16384/16406": read only 0 of 8192 bytes CONTEXT: COPY orders, line 75445: "1993510|185287|F|42667.9|1992-08-15|2-HIGH |Clerk#00079|0| dugouts above the even " db01=# select relname from pg_class where relfilenode=16406; relname -- orders_y1992 (1 row) db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; ERROR: could not read block 3942 in file "base/16384/16406": read only 0 of 8192 bytes CONTEXT: COPY orders, line 75396: "1993317|260510|F|165852|1992-12-13|5-LOW |Clerk#03023|0|regular foxes. ironic dependenc..." db01=# vacuum orders_y1992; VACUUM db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; ERROR: could not read block 3708 in file "base/16384/16394": read only 0 of 8192 bytes CONTEXT: COPY orders, line 178820: "4713957|286270|O|200492|1996-10-01|1-URGENT |Clerk#01993|0|uriously final packages. slyly " db01=# select relname from pg_class where relfilenode=16394; relname -- orders_y1996 (1 row) db01=# vacuum orders_y1996; VACUUM db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; ERROR: could not read block 5602 in file "base/16384/16403": read only 0 of 8192 bytes CONTEXT: COPY orders, line 147390: "3882662|738010|F|199365|1994-12-26|5-LOW |Clerk#01305|0|ar instructions above the expre..." 
db01=# select relname from pg_class where relfilenode=16403;
 relname
--
 orders_y1994
(1 row)

db01=# vacuum orders_y1994;
VACUUM
db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
ERROR: could not read block 5561 in file "base/16384/16412": read only 0 of 8192 bytes
CONTEXT: COPY orders, line 59276: "1572448|646948|O|25658.6|1997-05-02|4-NOT SPECIFIED|Clerk#01993|0|es. ironic, regular p"

*And finally the error again occurred for the 1997 partition*

db01=# select relname from pg_class where relfilenode=16412;
 relname
--
 orders_y1997
(1 row)

db01=# vacuum orders_y1997;
VACUUM
db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
ERROR: could not read block 6060 in file "base/16384/16412": read only 0 of 8192 bytes
CONTEXT: COPY orders, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack"
db01=#

Am I not understanding anything here?

> Also, is orders_y1993 the only partition of orders? How about \d+ orders?

Yes, I created multiple yearly partitions for the orders table. I wanted to load the 1993 year's data first and see if the data goes into or
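For what it's worth, "could not read block N ... read only 0 of 8192 bytes" means the server tried to read a block past the current end of the relation file: the failing read sits at block_number × BLCKSZ bytes (8192 by default). The arithmetic is trivial; the numbers below just restate the block from the error above:

```python
BLCKSZ = 8192  # default PostgreSQL block size

def block_offset(block: int) -> int:
    """Byte offset of a block within a relation segment file."""
    return block * BLCKSZ

# The failing read for block 6060 starts at this byte offset, so the file
# "base/16384/16412" would need to be at least offset + BLCKSZ bytes long:
print(block_offset(6060))           # 49643520
print(block_offset(6060) + BLCKSZ)  # 49651712
```

Comparing that expected length with the on-disk size of the file is one way to confirm the relation is shorter than the server believes it should be.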
Re: [HACKERS] Declarative partitioning - another take
On Mon, Dec 12, 2016 at 3:06 PM, Amit Langote wrote:
> Hi,
>
> On 2016/12/11 10:02, Venkata B Nagothi wrote:
> > On Fri, Dec 9, 2016 at 11:11 PM, Amit Langote wrote:
> >> On Fri, Dec 9, 2016 at 3:16 PM, Venkata B Nagothi wrote:
> >>> I am testing the partitioning feature from the latest master and got the
> >>> following error while loading the data -
> >>>
> >>> db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM
> >>> ('1993-01-01') TO ('1993-12-31');
> >>> CREATE TABLE
> >>>
> >>> db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
> >>> ERROR:  could not read block 6060 in file "base/16384/16412": read only
> >>> 0 of 8192 bytes
> >>> CONTEXT:  COPY orders, line 376589:
> >>> "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely
> >>> regular pack"
> >>
> >> Hmm. Could you tell what relation the file/relfilenode 16412 belongs to?
> >
> > db01=# select relname from pg_class where relfilenode=16412;
> >    relname
> > --------------
> >  orders_y1997
> > (1 row)
> >
> > I VACUUMed the partition and then re-ran the copy command, and no luck.
> >
> > db01=# vacuum orders_y1997;
> > VACUUM
> >
> > db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
> > ERROR:  could not read block 6060 in file "base/16384/16412": read only 0
> > of 8192 bytes
> > CONTEXT:  COPY orders, line 376589:
> > "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely
> > regular pack"
> >
> > I do not quite understand the behaviour below either. I VACUUMed the 1997
> > partition and then got an error for the 1992 partition, then for 1996,
> > then 1994, and so on.
> >
> > [ ...
]
> > db01=# vacuum orders_y1997;
> > VACUUM
> > db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
> > ERROR:  could not read block 6060 in file "base/16384/16412": read only 0
> > of 8192 bytes
> > CONTEXT:  COPY orders, line 376589:
> > "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely
> > regular pack"
> > db01=#
> >
> > Am I not understanding something here?
>
> I could not reproduce this issue. Also, I could not say what might have
> gone wrong based only on the information I have seen so far.
>
> Have you tried inserting the same data using insert?

I can load the data into the appropriate partitions using INSERT, so no issues there.

db01=# CREATE TABLE orders2 (
           o_orderkey INTEGER,
           o_custkey INTEGER,
           o_orderstatus CHAR(1),
           o_totalprice REAL,
           o_orderdate DATE,
           o_orderpriority CHAR(15),
           o_clerk CHAR(15),
           o_shippriority INTEGER,
           o_comment VARCHAR(79))
       PARTITION BY RANGE (o_orderdate);

*db01=# insert into orders2 select * from orders where o_orderdate='1995-10-11';*
*INSERT 0 3110*

> create table orders_unpartitioned (like orders);
> copy orders_unpartitioned from '/data/orders-1993.csv';
> insert into orders select * from orders_unpartitioned;

Loading the data into a normal table is not an issue (in fact, the CSV was generated from the table itself). The issue occurs only when I try to load the data from the CSV file into a partitioned table -

db01=# CREATE TABLE orders_y1992 PARTITION OF orders2 FOR VALUES FROM ('1992-01-01') TO ('1992-12-31');
CREATE TABLE

db01=# copy orders2 from '/data/orders-1993.csv' delimiter '|';
ERROR:  could not read block 6060 in file "base/16384/16407": read only 0 of 8192 bytes
CONTEXT:  COPY orders2, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack"

Not sure why COPY is failing.

Regards,
Venkata B N
Database Consultant
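For completeness, Amit's suggested workaround spelled out as one sequence (table and file names taken from the thread; whether this avoids the COPY failure is exactly what is being tested):

```sql
-- Stage the CSV into an unpartitioned copy of the table first,
-- then route the rows through the partitioned parent.
CREATE TABLE orders_unpartitioned (LIKE orders);
COPY orders_unpartitioned FROM '/data/orders-1993.csv' DELIMITER '|';
INSERT INTO orders SELECT * FROM orders_unpartitioned;
```

This isolates tuple routing (the INSERT) from the CSV-reading path (the COPY), which helps narrow down where the failure occurs.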
[HACKERS] patch proposal
Hi,

During the recovery process, it would be nice if PostgreSQL aborted the recovery process with an error (instead of starting up the cluster) when the intended recovery target point is not reached, and gave the DBA an option to resume the recovery process from where it stopped.

The issue here is: if the required WALs are not available, or a WAL file is missing or corrupted at the restore_command location, then PostgreSQL recovers until the end of the last available WAL and starts up the cluster. If the missing WAL is found at a later point in time, it is not possible to resume the recovery process from where it previously stopped. The whole backup restoration + recovery process must be initiated from the beginning, which can be time consuming, especially when recovering huge clusters sized in gigabytes or terabytes that require thousands of WALs to be copied over for recovery.

Any thoughts? Comments? I can work on this patch based on the comments/feedback.

Regards,
Venkata B N
Fujitsu Australia
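For concreteness, the scenario described above corresponds to a recovery.conf along these lines (the path and target value are illustrative only, not taken from the thread); if the archive behind restore_command is missing a segment, replay simply stops there and the cluster is promoted:

```
# recovery.conf (pre-10 style, as used in this thread)
restore_command = 'cp /path/to/walarchive/%f %p'    # archive may be incomplete
recovery_target_time = '2016-08-15 11:00:00'        # may lie beyond the last available WAL
```

Nothing in this configuration distinguishes "target reached" from "ran out of WAL", which is the gap the proposal addresses.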
Re: [HACKERS] patch proposal
On Tue, Aug 16, 2016 at 2:50 AM, David Steele wrote:
> On 8/15/16 2:33 AM, Venkata B Nagothi wrote:
>
> > During the recovery process, It would be nice if PostgreSQL generates an
> > error by aborting the recovery process (instead of starting-up the
> > cluster) if the intended recovery target point is not reached and give
> > an option to DBA to resume the recovery process from where it exactly
> > stopped.
>
> Thom wrote a patch [1] recently that gives warnings in this case. You
> might want to have a look at that first.

That is good to know. Yes, that patch is about generating more meaningful output messages for the recovery process, which makes sense.

> > The issue here is, if by any chance, the required WALs are not available
> > or if there is any WAL missing or corrupted at the restore_command
> > location, then PostgreSQL recovers until the end of the last available
> > WAL and starts-up the cluster.
>
> You can use pause_at_recovery_target/recovery_target_action (depending
> on your version) to prevent promotion. That would work for your stated
> scenario but not for the scenario where replay starts (or the database
> reaches consistency) after the recovery target.

The above parameters can be configured to pause, shutdown or prevent promotion only after reaching the recovery target point. To clarify, I am referring to a scenario where the recovery target point is not reached at all (I mean, a half-complete or incomplete recovery) and there are lots of WALs still pending to be replayed - in this situation, PostgreSQL just completes the archive recovery until the end of the last available WAL (WAL file "0001001E" in my case) and starts up the cluster, generating an error message (saying "0001001F" not found).
Note: I am testing on PostgreSQL 9.5.

LOG:  restored log file "0001001E" from archive
cp: cannot stat '/data/pgrestore9531/0001001F': No such file or directory
LOG:  redo done at 0/1EFFDBB8
LOG:  last completed transaction was at log time 2016-08-15 11:04:26.795902+10

I have used the following recovery* parameters in the recovery.conf file, and have intentionally not supplied all the WAL archives needed for the recovery process to reach the target XID:

recovery_target_xid = ,
recovery_target_inclusive = true
recovery_target_action = pause

It would be nice if PostgreSQL paused the recovery in case it is not complete (because of a missing or corrupt WAL), shut down the cluster, and allowed the DBA to restart the replay of the remaining WAL archive files to continue recovery (from where it stopped previously) until the recovery target point is reached.

Regards,
Venkata B N
Fujitsu Australia
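The missing-segment situation above can also be caught before starting recovery by scanning the archive for gaps. A rough sketch of such a check (my own helper, not a PostgreSQL tool; it assumes default 16 MB segments and the standard 24-hex-digit segment names, and ignores timeline switches and .history/.backup/.partial files):

```python
def find_wal_gaps(names):
    """Given WAL segment file names (24 hex digits: 8 timeline, 8 log, 8 seg),
    return the names of segments missing between the lowest and highest
    segment present on each timeline."""
    by_tli = {}
    for n in names:
        if len(n) != 24 or any(c not in "0123456789ABCDEF" for c in n):
            continue  # skip .history, .backup, .partial, etc.
        tli, log, seg = n[:8], int(n[8:16], 16), int(n[16:24], 16)
        # Since 9.3, segment numbers advance continuously: segno = log*256 + seg
        by_tli.setdefault(tli, set()).add(log * 256 + seg)
    gaps = []
    for tli in sorted(by_tli):
        segs = by_tli[tli]
        for segno in range(min(segs), max(segs) + 1):
            if segno not in segs:
                gaps.append("%s%08X%08X" % (tli, segno // 256, segno % 256))
    return gaps
```

Run against the archive directory listing, this would flag the absent segment before replay begins, rather than discovering it mid-recovery via a failed restore_command.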
Re: [HACKERS] patch proposal
On Wed, Aug 17, 2016 at 12:06 AM, Stephen Frost wrote:
> Greetings,
>
> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > The above said parameters can be configured to pause, shutdown or prevent
> > promotion only after reaching the recovery target point.
> > To clarify, I am referring to a scenario where the recovery target point
> > is not reached at all (I mean, a half-complete or incomplete recovery) and
> > there are lots of WALs still pending to be replayed - in this situation,
>
> PG doesn't know that there are still WALs to be replayed.

Since I have given a particular recovery target and PG knows the current replay position, I would say it would be good if PG warned and paused there, saying the recovery target point has not been reached.

> > It would be nice if PostgreSQL pauses the recovery in case it is not
> > complete (because of missing or corrupt WAL), shuts down the cluster and
> > allows the DBA to restart the replay of the remaining WAL archive files
> > to continue recovery (from where it stopped previously) until the
> > recovery target point is reached.
>
> Reaching the end of WAL isn't an error and I don't believe it makes any
> sense to treat it like it is. You can specify any recovery target point
> you wish, including ones that don't exist, and that's not an error
> either.

Agreed. Reaching end-of-WAL is not an error; it is more of a limitation in certain scenarios.

> I could see supporting an additional "pause" option that means "pause at
> the end of WAL if you don't reach the recovery target point". I'd also
> be happy with a warning being emitted in the log if the recovery target
> point isn't reached before reaching the end of WAL, but I don't think it
> makes sense to change the existing behavior.

Agreed. An additional option like "pause" would help.
As long as there is an option to ensure the following happens when the recovery target is not reached:

a) PG pauses the recovery at the end of the WAL
b) Generates a warning in the log file saying that the recovery target point was not reached (there is a patch being worked on by Thom for this)
c) Does not open up the database on exiting from the recovery process, giving room to resume the replay of WALs

Regards,
Venkata B N
Fujitsu Australia
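The behaviour asked for in (a)-(c) amounts to a small decision table at end-of-WAL. A sketch of the proposed semantics (the parameter name and return shape here are illustrative only, not an agreed design; values mirror recovery_target_action, and the default "promote" preserves today's behaviour):

```python
def end_of_wal_action(target_reached, incomplete_action="promote"):
    """Decide what happens when replay exhausts the available WAL.

    incomplete_action applies only when the recovery target was NOT
    reached; 'promote' by default, so existing setups are unaffected.
    Returns (action, log_messages)."""
    if incomplete_action not in ("promote", "pause", "shutdown"):
        raise ValueError("unknown action: %r" % incomplete_action)
    if target_reached:
        # recovery_target_action already governs this case today.
        return "promote", []
    msgs = ["WARNING: end of WAL reached before the recovery target"]
    if incomplete_action == "pause":
        msgs.append("LOG: recovery paused; supply the missing WAL to resume")
    return incomplete_action, msgs
```

With `incomplete_action="pause"`, the server would emit the warning (b), hold at end-of-WAL (a), and leave the database unopened so replay can resume once the missing WAL appears (c).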