Re: using pg_basebackup for point in time recovery
On Mon, Jun 25, 2018 at 12:51:10PM -0400, Bruce Momjian wrote:
> FYI, in recent discussions on the docs list:
>
> https://www.postgresql.org/message-id/CABUevEyumGh3r05U3_mhRrEU=dfacdrr2hew140mvn7fsbm...@mail.gmail.com

I did not recall this one. Thanks for the reminder, Bruce.

> There was the conclusion that:
>
> If it's a clean backpatch I'd say it is -- people who are using
> PostgreSQL 9.6 will be reading the documentation for 9.6 etc, so they
> will not know about the fix then.
>
> If it's not a clean backpatch I can certainly see considering it, but if
> it's not a lot of effort then I'd say it's definitely worth it.
>
> so the rule I have been using for backpatching doc stuff has changed
> recently.

In the case of this thread, I think that the patch applies cleanly anyway as this comes from the period where hot standbys have been introduced. So that would not be a lot of work... Speaking of which, it would be nice to be sure about the wording folks here would prefer using before fixing anything ;p
--
Michael
Re: using pg_basebackup for point in time recovery
On Thu, Jun 21, 2018 at 04:50:38PM -0700, David G. Johnston wrote:
> On Thu, Jun 21, 2018 at 4:26 PM, Vik Fearing wrote:
>
> > On 21/06/18 07:27, Michael Paquier wrote:
> > > Attached is a patch which includes your suggestion. What do you think?
> > > As that's an improvement, only HEAD would get that clarification.
> >
> > Say what? If the clarification applies to previous versions, as it
> > does, it should be backpatched. This isn't a change in behavior, it's a
> > change in the description of existing behavior.
>
> Generally only actual bug fixes get back-patched; but I'd have to say this
> looks like it could easily be classified as one.

FYI, in recent discussions on the docs list:

https://www.postgresql.org/message-id/CABUevEyumGh3r05U3_mhRrEU=dfacdrr2hew140mvn7fsbm...@mail.gmail.com

there was the conclusion that:

    If it's a clean backpatch I'd say it is -- people who are using
    PostgreSQL 9.6 will be reading the documentation for 9.6 etc, so they
    will not know about the fix then.

    If it's not a clean backpatch I can certainly see considering it, but if
    it's not a lot of effort then I'd say it's definitely worth it.

so the rule I have been using for backpatching doc stuff has changed recently.

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Re: using pg_basebackup for point in time recovery
On Thu, Jun 21, 2018 at 04:50:38PM -0700, David G. Johnston wrote:
> Generally only actual bug fixes get back-patched; but I'd have to say
> this looks like it could easily be classified as one.

Everybody is against me here ;)

> Some comments on the patch itself:
>
> "recover up to the wanted recovery point." - "desired recovery point" reads
> better to me
>
> "These backups are typically much faster to backup and restore" - "These
> backups are typically much faster to create and restore"; avoid repeated
> use of the word backup

Okay.

> "but can result as well in larger backup sizes" - "but can result in larger
> backup sizes", drop the unnecessary 'as well'

Okay.

> I like adding "cold backup" here to help contrast and explain why a base
> backup is considered a "hot backup". The rest is style to make that flow
> better.

Indeed. The section uses hot backups a lot. What do all folks here think about the updated attached?
--
Michael

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 982776ca0a..af48aa64c2 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1430,12 +1430,15 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 Standalone Hot Backups
- It is possible to use PostgreSQL's backup facilities to
- produce standalone hot backups. These are backups that cannot be used
- for point-in-time recovery, yet are typically much faster to backup and
- restore than pg_dump dumps. (They are also much larger
- than pg_dump dumps, so in some cases the speed advantage
- might be negated.)
+ It is possible to use PostgreSQL's backup
+ facilities to produce standalone hot backups. These are backups that
+ could be used for point-in-time recovery if combined with a WAL
+ archive able to recover up to the wanted recovery point. These backups
+ are typically much faster to create and restore than
+ pg_dump for large deployments but can result
+ in larger backup sizes. Note also that
+ pg_dump backups cannot be used for
+ point-in-time recovery.
Re: using pg_basebackup for point in time recovery
On Thu, Jun 21, 2018 at 04:42:00PM -0400, Ravi Krishna wrote:
> Same here even though I use Mac mail. But it is not yahoo alone.
> Most of the web email clients have resorted to top posting. I miss
> the old days of Outlook Express which was so '>' friendly. I think
> Gmail allows '>' when you click on the dots to expand the mail you
> are replying to, but it messes up in justifying and formatting it.

Those products have good practices when it comes to break and redefine what the concept behind emails is...
--
Michael
Re: using pg_basebackup for point in time recovery
On Thu, Jun 21, 2018 at 4:26 PM, Vik Fearing wrote:

> On 21/06/18 07:27, Michael Paquier wrote:
> > Attached is a patch which includes your suggestion. What do you think?
> > As that's an improvement, only HEAD would get that clarification.
>
> Say what? If the clarification applies to previous versions, as it
> does, it should be backpatched. This isn't a change in behavior, it's a
> change in the description of existing behavior.

Generally only actual bug fixes get back-patched; but I'd have to say this looks like it could easily be classified as one.

Before: These are backups that cannot be used for PITR
After: These are backups that could be used for PITR if ...

Changing a cannot to a can seems like we are fixing a bug in the documentation.

Some comments on the patch itself:

"recover up to the wanted recovery point." - "desired recovery point" reads better to me

"These backups are typically much faster to backup and restore" - "These backups are typically much faster to create and restore"; avoid repeated use of the word backup

"but can result as well in larger backup sizes" - "but can result in larger backup sizes", drop the unnecessary 'as well'

"sizes, so the speed of one method or the other is to evaluate carefully first" - that is just wrong as-is; suggest just removing it.

To cover the last three items as a whole I'd suggest:

"These backups are typically much faster to create and restore, but generate larger file sizes, compared to pg_dump."

For the last sentence I'd suggest:

"Note that because WAL cannot be applied on top of a restored pg_dump backup it is considered a cold backup and cannot be used for point-in-time-recovery."

I like adding "cold backup" here to help contrast and explain why a base backup is considered a "hot backup". The rest is style to make that flow better.

David J.
Re: using pg_basebackup for point in time recovery
On 21/06/18 07:27, Michael Paquier wrote:
> Attached is a patch which includes your suggestion. What do you think?
> As that's an improvement, only HEAD would get that clarification.

Say what? If the clarification applies to previous versions, as it does, it should be backpatched. This isn't a change in behavior, it's a change in the description of existing behavior.

--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: using pg_basebackup for point in time recovery
> >You should avoid top-posting on the Postgres lists, this is not the
> >usual style used by people around :)
>
> Will do, but Yahoo Mail! does not seem to like that, so I am typing the
> > myself

Same here even though I use Mac mail. But it is not yahoo alone. Most of the web email clients have resorted to top posting. I miss the old days of Outlook Express which was so '>' friendly. I think Gmail allows '>' when you click on the dots to expand the mail you are replying to, but it messes up in justifying and formatting it.

The best for '>': Unix elm :-)
Re: using pg_basebackup for point in time recovery
Hi Michael

On Thursday, June 21, 2018, 7:28:13 AM GMT+2, Michael Paquier wrote:

>You should avoid top-posting on the Postgres lists, this is not the
>usual style used by people around :)

Will do, but Yahoo Mail! does not seem to like that, so I am typing the > myself

>Attached is a patch which includes your suggestion. What do you think?
>As that's an improvement, only HEAD would get that clarification.

Yes I think it is now perfectly clear. Much appreciated to have the chance to contribute to the doc by the way, it is very nice

>Perhaps. There is really nothing preventing one to add a recovery.conf
>afterwards, which is also why pg_basebackup -R exists. I do that as
>well for some of the framework I work with and maintain.

I just went to the doc to check about this -R option :-)

Pierre
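
For readers who, like Pierre, have not used these options before, here is a small sketch of a pg_basebackup call that combines -X (from the documentation excerpt quoted in this thread) with the -R option Michael mentions. The wrapper name take_base_backup, the RUN switch, and the /tmp path are hypothetical and not from the thread; by default the function only prints the command, so the sketch can be tried without a running server.

```shell
#!/bin/sh
# Sketch only: take_base_backup, RUN and the backup path are illustrative
# assumptions, not names used anywhere in the thread.
#
#   -D DIR     directory that receives the base backup
#   -X stream  include the WAL needed to make the backup usable on its own
#   -R         also write a minimal recovery configuration into the backup
take_base_backup() {
  if [ "${RUN:-0}" = "1" ]; then
    # Really take the backup (needs a reachable server and replication rights).
    pg_basebackup -D "$1" -X stream -R
  else
    # Dry run: just show what would be executed.
    echo pg_basebackup -D "$1" -X stream -R
  fi
}

take_base_backup /tmp/standalone_backup
```

As far as the pg_basebackup documentation of that period describes it, the file written by -R is aimed at setting up a standby, so it would still need hand editing before being used for point-in-time recovery.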
Re: using pg_basebackup for point in time recovery
On 06/21/2018 12:27 AM, Michael Paquier wrote:
[snip]
> Attached is a patch which includes your suggestion. What do you think?
> As that's an improvement, only HEAD would get that clarification.

You've *got* to be kidding.

Fixing an ambiguously or poorly worded bit of *documentation* should obviously be pushed to all affected versions.

--
Angular momentum makes the world go 'round.
Re: using pg_basebackup for point in time recovery
Hi Pierre,

On Wed, Jun 20, 2018 at 08:06:31AM +, Pierre Timmermans wrote:
> Hi Michael

You should avoid top-posting on the Postgres lists, this is not the usual style used by people around :)

> Thanks for the confirmation. Your rewording removes the confusion. I
> would maybe take the opportunity to re-instate that pg_dump cannot be
> used for PITR, so in the line of
> "These are backups that could be used for point-in-time recovery if
> combined with a WAL archive able to recover up to the wanted recovery
> point. These backups are typically much faster to backup and restore
> than pg_dump for large deployments but can result as well in larger
> backup sizes, so the speed of one method or the other is to evaluate
> carefully first. Consider also that pg_dump backups cannot be used for
> point-in-time recovery."

Attached is a patch which includes your suggestion. What do you think? As that's an improvement, only HEAD would get that clarification.

> Maybe the confusion stems from the fact that if you restore a
> standalone (self-contained) pg_basebackup then - by default - recovery
> is done with the recovery_target immediate option, so if one needs
> point-in-time recovery he has to edit the recovery.conf and brings the
> archives..

Perhaps. There is really nothing preventing one to add a recovery.conf afterwards, which is also why pg_basebackup -R exists. I do that as well for some of the framework I work with and maintain.
--
Michael

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 982776ca0a..ccc0a66bf3 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1430,12 +1430,15 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 Standalone Hot Backups
- It is possible to use PostgreSQL's backup facilities to
- produce standalone hot backups. These are backups that cannot be used
- for point-in-time recovery, yet are typically much faster to backup and
- restore than pg_dump dumps. (They are also much larger
- than pg_dump dumps, so in some cases the speed advantage
- might be negated.)
+ It is possible to use PostgreSQL's backup
+ facilities to produce standalone hot backups. These are backups that
+ could be used for point-in-time recovery if combined with a WAL
+ archive able to recover up to the wanted recovery point. These backups
+ are typically much faster to backup and restore than pg_dump for large
+ deployments but can result as well in larger backup sizes, so the
+ speed of one method or the other is to evaluate carefully first. Note
+ also that pg_dump backups cannot be used
+ for point-in-time recovery.
Re: using pg_basebackup for point in time recovery
Hi Michael

Thanks for the confirmation. Your rewording removes the confusion. I would maybe take the opportunity to re-instate that pg_dump cannot be used for PITR, so in the line of

"These are backups that could be used for point-in-time recovery if combined with a WAL archive able to recover up to the wanted recovery point. These backups are typically much faster to backup and restore than pg_dump for large deployments but can result as well in larger backup sizes, so the speed of one method or the other is to evaluate carefully first. Consider also that pg_dump backups cannot be used for point-in-time recovery."

Maybe the confusion stems from the fact that if you restore a standalone (self-contained) pg_basebackup then - by default - recovery is done with the recovery_target immediate option, so if one needs point-in-time recovery he has to edit the recovery.conf and brings the archives..

Thanks and regards,
Pierre

On Wednesday, June 20, 2018, 5:38:56 AM GMT+2, Michael Paquier wrote:

> Hi Pierre,
>
> On Tue, Jun 19, 2018 at 12:03:58PM +, Pierre Timmermans wrote:
> > Here is the doc, the sentence that I find misleading is "There are
> > backups that cannot be used for point-in-time recovery", also
> > mentioning that they are faster than pg_dumps add to confusion (since
> > pg_dumps cannot be used for PITR):
> > https://www.postgresql.org/docs/current/static/continuous-archiving.html
>
> Yes, it is indeed perfectly possible to use such backups to do a PITR
> as long as you have a WAL archive able to replay up to the point where
> you want the replay to happen, so I agree that this is a bit confusing.
> This part of the documentation is here since the beginning of times,
> well 6559c4a2 to be exact. Perhaps we would want to reword this
> sentence as follows:
> "These are backups that could be used for point-in-time recovery if
> combined with a WAL archive able to recover up to the wanted recovery
> point. These backups are typically much faster to backup and restore
> than pg_dump for large deployments but can result as well in larger
> backup sizes, so the speed of one method or the other is to evaluate
> carefully first."
>
> I am open to better suggestions of course.
> --
> Michael
Re: using pg_basebackup for point in time recovery
Hi Pierre,

On Tue, Jun 19, 2018 at 12:03:58PM +, Pierre Timmermans wrote:
> Here is the doc, the sentence that I find misleading is "There are
> backups that cannot be used for point-in-time recovery", also
> mentioning that they are faster than pg_dumps add to confusion (since
> pg_dumps cannot be used for PITR):
> https://www.postgresql.org/docs/current/static/continuous-archiving.html

Yes, it is indeed perfectly possible to use such backups to do a PITR as long as you have a WAL archive able to replay up to the point where you want the replay to happen, so I agree that this is a bit confusing. This part of the documentation is here since the beginning of times, well 6559c4a2 to be exact. Perhaps we would want to reword this sentence as follows:

"These are backups that could be used for point-in-time recovery if combined with a WAL archive able to recover up to the wanted recovery point. These backups are typically much faster to backup and restore than pg_dump for large deployments but can result as well in larger backup sizes, so the speed of one method or the other is to evaluate carefully first."

I am open to better suggestions of course.
--
Michael
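
To make the recovery side of this concrete, here is a minimal sketch of the recovery.conf edit Pierre describes in his original message (a restore_command plus a recovery target time). It assumes PostgreSQL 11 or earlier, where recovery settings live in recovery.conf. The archive path is the one from the documentation's restore_command example; the helper name write_recovery_conf, the temporary directory, and the target timestamp are hypothetical placeholders, not values from the thread.

```shell
#!/bin/sh
# Sketch only: write_recovery_conf and the values passed to it below are
# illustrative assumptions.
#
# write_recovery_conf DATA_DIR ARCHIVE_DIR TARGET_TIME
# Writes a recovery.conf into DATA_DIR that fetches archived WAL segments
# from ARCHIVE_DIR and stops replay at TARGET_TIME.
write_recovery_conf() {
  data_dir=$1
  archive_dir=$2
  target_time=$3
  cat > "$data_dir/recovery.conf" <<EOF
restore_command = 'cp $archive_dir/%f %p'
recovery_target_time = '$target_time'
EOF
}

# Demo against a throwaway directory instead of a real restored data directory.
demo_dir=$(mktemp -d)
write_recovery_conf "$demo_dir" /mnt/server/archivedir '2018-06-19 12:00:00'
cat "$demo_dir/recovery.conf"
```

On a real restore the first argument would be the data directory unpacked from the base backup, and the server would replay archived WAL up to the target time at startup. From PostgreSQL 12 on, recovery.conf no longer exists; the same two settings go into postgresql.conf alongside an empty recovery.signal file.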
using pg_basebackup for point in time recovery
Hi,

I find the documentation about pg_basebackup misleading : the documentation states that standalone hot backups cannot be used for point in time recovery, however I don't get the point : if one has a combination of the nightly pg_basebackup and the archived wals, then it is totally OK to do point in time I assume ? (of course the recovery.conf must be manually changed to set the restore_command and the recovery target time)

Here is the doc, the sentence that I find misleading is "There are backups that cannot be used for point-in-time recovery", also mentioning that they are faster than pg_dumps add to confusion (since pg_dumps cannot be used for PITR)

Doc: https://www.postgresql.org/docs/current/static/continuous-archiving.html

    It is possible to use PostgreSQL's backup facilities to produce standalone hot backups. These are backups that cannot be used for point-in-time recovery, yet are typically much faster to backup and restore than pg_dump dumps. (They are also much larger than pg_dump dumps, so in some cases the speed advantage might be negated.)

    As with base backups, the easiest way to produce a standalone hot backup is to use the pg_basebackup tool. If you include the -X parameter when calling it, all the write-ahead log required to use the backup will be included in the backup automatically, and no special action is required to restore the backup.

Thanks and regards,
Pierre

On Tuesday, June 19, 2018, 1:38:40 PM GMT+2, Ron wrote:

> On 06/15/2018 11:26 AM, Data Ace wrote:
> > Well I think my question is somewhat away from my intention cause of
> > my poor understanding and questioning :(
> >
> > Actually, I have 1TB data and have hardware spec enough to handle this
> > amount of data, but the problem is that it needs too many join
> > operations and the analysis process is going too slow right now.
> >
> > I've searched and found that graph model nicely fits for network data
> > like social data in query performance.
>
> If your data is hierarchal, then storing it in a network database is
> perfectly reasonable. I'm not sure, though, that there are many network
> databases for Linux. Raima is the only one I can think of.
>
> > Should I change my DB (I mean my DB for analysis)? or do I need some
> > other solutions or any extension?
> >
> > Thanks
>
> --
> Angular momentum makes the world go 'round.