Re: Something else about Redo Logs disappearing
On Tue, 2020-06-16 at 00:28 +0200, Peter wrote:
> On Mon, Jun 15, 2020 at 09:46:34PM +0200, Laurenz Albe wrote:
> ! On Mon, 2020-06-15 at 19:00 +0200, Peter wrote:
> ! > And that is one of a couple of likely pitfalls I perceived when
> ! > looking at that new API.
> !
> ! That is a property of my scripts, *not* of the non-exclusive
> ! backup API...
>
> Then how would I have figured it before even knowing Your scripts?

By reading the documentation and trying it?

> The API requires such kind of programming, and such kind of
> programming creates dangerous pitfalls. After You have trained some
> hundreds of system administrators, You will also see such things
> right at first glance.

The API can be used by hand, calling two SQL functions.
I have trained some hundreds of administrators, and they could follow.

> But then it's futile to discuss with religious people, as they are
> blind to reasoning: AL II. 58, "The slaves shall serve."

Suspicion becomes certainty: You are not out to understand, but to troll.

Bye.

Laurenz Albe
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 09:46:34PM +0200, Laurenz Albe wrote:
! On Mon, 2020-06-15 at 19:00 +0200, Peter wrote:
! > And that is one of a couple of likely pitfalls I perceived when
! > looking at that new API.
!
! That is a property of my scripts, *not* of the non-exclusive
! backup API...

Then how would I have figured it before even knowing Your scripts?

The API requires such kind of programming, and such kind of
programming creates dangerous pitfalls. After You have trained some
hundreds of system administrators, You will also see such things
right at first glance.

But then it's futile to discuss with religious people, as they are
blind to reasoning: AL II. 58, "The slaves shall serve."

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Sun, Jun 14, 2020 at 03:05:15PM +0200, Magnus Hagander wrote:
! > You can see that all the major attributes (scheduling, error-handling,
! > signalling, ...) of a WAL backup are substantially different to that
! > of any usual backup.
! > This is a different *Class* of backup object, therefore it needs an
! > appropriate infrastructure that can handle these attributes correctly.
!
! Yes, this is *exactly* why special-handling the WAL during the base
! backup makes a lot of sense.

Certainly. Only I prefer to do the special-handling *outside of* the
base backup.

! Is it required? No.
! Will it make your backups more reliable? Yes.

*shrug* I have no benefit in increasing reliability from 250% to 330%,
if that would be the case at all.

! > But, if You never have considered *continuous* archiving, and only
! > intend to take a functional momentary backup of a cluster, then You
! > may well have never noticed these differences. I noticed them mainly
! > because I did *BUILD* such an infrastructure (the 20 lines of shell
! > script, you know).
!
! Yes, if you take a simplistic view of your backups, then yes.

You appear to sound like an insurance salesman who desperately tries
to sell a third health insurance policy to somebody who already has
two of them, by trying to build on unfounded precariousness.

! > ! There is *absolutely* no need for threading to use the current APIs.
! > ! You need to run one query, go do something else, and then run
! > ! another query.
! >
! > Wrong. The point is, I don't want to "go do something else", I have
! > to exit() and get back to the initiator at that place.
!
! That is not a requirement of the current PostgreSQL APIs.

We'll be done with that whole API in a few more lines now. (I'm
getting tired of this.)

! (in fact, using threading would add a significant extra burden there,
! as libpq does not allow sharing of connections between threads)

I never said one would need to thread the DB connections.

! That is a requirement, and indeed a pretty sharp limitation, of the
! *other* APIs you are working with, it sounds like.

What "other"?

! The PostgreSQL APIs discussed do *not* require you to do an exit().
! Nor do they require any form of threading.

Ah, nice try! But, we're *NOT* shifting blame around. We do instead
get things working. We do proper engineering.

! And the fact that you need to do an exit() would negate any threading
! anyway, so that seems to be a false argument regardless.

You do know exactly what I'm talking about.

! > This is also clearly visible in Laurenz' code: he utilizes two
! > unchecked background tasks (processes, in this case) with loose
! > coupling for the purpose, as it does not work otherwise.
!
! Yes, because he is also trying to work around a severely limited API
! *on the other side*.

There is no "other" side. There is only *one* side: to get things
working. And for interaction, Jon Postel's law applies: be
conservative in what you provide, and liberal in what you require.
This is how the Internet was built. The modern-day Linux youngsters
tend to forget that we all stand on the shoulders of giants.

! > The most interesting point in there appears to be this:
! > > that the backup label and tablespace map files are not written to
! > > disk. Instead, their would-be contents are returned in *labelfile
! > > and *tblspcmapfile,
! >
! > This is in do_pg_start_backup() - so we actually HAVE this data
! > already at the *START* time of the backup!
! >
! > Then why in hell do we wait until the END of the backup before we
! > hand this data to the operator: at a time when the DVD with the
!
! Because it cannot be safely written *into the data directory*.
!
! Now, it could be written *somewhere else*, that is true. And then you
! would add an extra step at restore time to rename it back. But then
! your restore would now also require a plugin.

Yes, and as it is now, it requires girl Friday to fetch them from the
line-printer and mix them up - which, as we already got explained, can
end up a *lot* worse.

Or, equivalently and as here practically demonstrated, some consultant
trainee writing some script which, when accidentally invoked twice,
creates an inconsistent backup, and this being invisible to the
operator. That's indeed dangerous enough for my taste.

But let's grab that from the start: yes, I didn't trust the docs.
Because, as people here are so crazy about the old API being
troublesome and dangerous and needing deprecation, and the whole thing
being so imminent, there should be some REASON for that. And from the
docs I could not see any reason - so I supposed there must be
something else in pg_start_backup(); something that is not explained
in the docs, and that would explain the whole fuss. But, in fact,
there is no such thing.

First, the backup_label, which should not stay in the running cluster
tree. So, what bad does happen when it stays there? Nothing at all.
The cluster migh
Re: Something else about Redo Logs disappearing
On Mon, 2020-06-15 at 19:00 +0200, Peter wrote:
> And that is one of a couple of likely pitfalls I perceived when
> looking at that new API.

That is a property of my scripts, *not* of the non-exclusive
backup API...

> We could fix that, but that will then get more complicated - and
> people will usually not do that. And that's why I consider that
> new API as rather dangerous.

... so this is moot.

Yours,
Laurenz Albe
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 03:19:29PM +0200, Laurenz Albe wrote:
! On Mon, 2020-06-15 at 14:50 +0200, Peter wrote:
! > ! An example:
! > !
! > ! - Backup #1 calls "pgpre.sh"
! > ! - Backup #1 starts copying files
! > ! - Backup #2 calls "pgpre.sh".
! > !   This will cancel the first backup.
! > ! - Backup #1 completes copying files.
! > ! - Backup #1 calls "pgpost.sh".
! > !   It will receive an error.
! > !   So it has to invalidate the backup.
! > ! - Backup #2 completes copying files.
! > ! - Backup #2 calls "pgpost.sh".
! > !   It gets a "backup_label" file and completes the backup.
! >
! > That's not true.
!
! Ah, yes, you are right.

Thank You.

! Since "pgpre.sh" and "pgpost.sh" are independent, there
! is no way to tell which of them belongs to which other.

Correct.

! So calling "pgpost.sh" indeed ends the most recently started
! backup and returns "backup_label" accordingly.
!
! That means: the caller of the scripts has to make sure
! not to start a second backup while the first one is running.

Never run two backups in parallel with such an approach, exactly.
And that is one of a couple of likely pitfalls I perceived when
looking at that new API.

We could fix that, but that will then get more complicated - and
people will usually not do that. And that's why I consider that
new API as rather dangerous.

cheerio,
PMc
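[The "never two backups in parallel" rule discussed above can be enforced mechanically, outside the backup API itself, with ordinary mutual exclusion around the whole pre/copy/post sequence. A minimal sketch of such a hypothetical wrapper follows; the lock path is invented, and the final echo stands in for the real pgpre.sh / file copy / pgpost.sh sequence.]

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to start a second backup while one is
# already running. mkdir is atomic, so it serves as a portable lock.
LOCKDIR="${LOCKDIR:-/tmp/pg-backup.$$.lock}"

if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "another backup appears to be running; refusing to start" >&2
    exit 1
fi
trap 'rmdir "$LOCKDIR"' EXIT    # release the lock on any exit path

# the real pgpre.sh / copy / pgpost.sh sequence would run here:
echo "backup sequence would run here"
```

[A second invocation started while the first holds the lock fails immediately with a visible error, instead of silently cancelling the running backup.]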
Re: Something else about Redo Logs disappearing
On Mon, 2020-06-15 at 14:50 +0200, Peter wrote:
> ! An example:
> !
> ! - Backup #1 calls "pgpre.sh"
> ! - Backup #1 starts copying files
> ! - Backup #2 calls "pgpre.sh".
> !   This will cancel the first backup.
> ! - Backup #1 completes copying files.
> ! - Backup #1 calls "pgpost.sh".
> !   It will receive an error.
> !   So it has to invalidate the backup.
> ! - Backup #2 completes copying files.
> ! - Backup #2 calls "pgpost.sh".
> !   It gets a "backup_label" file and completes the backup.
>
> That's not true.

Ah, yes, you are right.
It's a while since I wrote these scripts.

Since "pgpre.sh" and "pgpost.sh" are independent, there
is no way to tell which of them belongs to which other.

So calling "pgpost.sh" indeed ends the most recently started
backup and returns "backup_label" accordingly.

That means: the caller of the scripts has to make sure
not to start a second backup while the first one is running.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 11:44:33AM +0200, Laurenz Albe wrote:
! On Sat, 2020-06-13 at 19:48 +0200, Peter wrote:
! > ! > 4. If, by misconfiguration and/or operator error, the backup system
! > ! >    happens to start a second backup in parallel to the first,
! > ! >    then do I correctly assume, both backups will be rendered
! > ! >    inconsistent while this may not be visible to the operator; and
! > ! >    the earlier backup would be flagged as apparently successful
! > ! >    while carrying the wrong (later) label?
! > !
! > ! If you are using my scripts and start a second backup while the first
! > ! one is still running, the first backup will be interrupted.
! >
! > This is not what I am asking. It appears correct to me, that, on
! > the database, the first backup will be interrupted. But on the
! > tape side, this might go unnoticed, and on completion it will
! > successfully receive the termination code from the *SECOND*
! > backup - which means that on tape we will have a seemingly
! > successful backup, which
! > 1. is corrupted, and
! > 2. carries a wrong label.
!
! That will only happen if the backup that uses my scripts does the
! wrong thing.

Yes. Occasionally software does the wrong thing; it's called "bugs".

That's not true. Now let me see how to compile a bash... and here we go:

! An example:
!
! - Backup #1 calls "pgpre.sh"

> $ ./pgpre.sh
> backup starting location: 1/C858
> $

We now have:

> 24129 10 SJ  0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24130 10 SJ  0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24131 10 SJ  0:00.01 psql -Atq
> 24158 10 SCJ 0:00.00 sleep 5

And:

> postgres=# \d
>         List of relations
>  Schema |  Name  | Type  |  Owner
> --------+--------+-------+----------
>  public | backup | table | postgres
> (1 row)
>
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map
> ----+---------+-------+--------------+----------------
>   1 | running | 24132 |              |
> (1 row)

! - Backup #1 starts copying files

Let's suppose it does now.

! - Backup #2 calls "pgpre.sh".

> $ ./pgpre.sh
> backup starting location: 1/C924
> $ FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> connection to server was lost
> Backup failed
> ./pgpre.sh: line 93: ${PSQL[1]}: ambiguous redirect
>
> $ echo $?
> 0

! This will cancel the first backup.

Yes, it seems it did:

> 25279 10 SJ  0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25280 10 IWJ 0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25281 10 SJ  0:00.01 psql -Atq
> 25402 10 SCJ 0:00.00 sleep 5

> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map
> ----+---------+-------+--------------+----------------
>   1 | running | 25282 |              |
> (1 row)

! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".

> $ ./pgpost.sh
> START WAL LOCATION: 1/C924 (file 0001000100C9)
> CHECKPOINT LOCATION: 1/C958
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2020-06-15 14:09:41 CEST
> LABEL: 2020-06-15 14:09:40
> START TIMELINE: 1
>
> $ echo $?
> 0

! It will receive an error.
! So it has to invalidate the backup.

Where is the error? What we now have is this: no processes anymore, and

>  id |  state   |  pid  |                    backup_label                    | tablespace_map
> ----+----------+-------+----------------------------------------------------+----------------
>   1 | complete | 25282 | START WAL LOCATION: 1/C924 (file 0001000100C9)    +|
>     |          |       | CHECKPOINT LOCATION: 1/C958                       +|
>     |          |       | BACKUP METHOD: streamed                           +|
>     |          |       | BACKUP FROM: master                               +|
>     |          |       | START TIME: 2020-06-15 14:09:41 CEST              +|
>     |          |       | LABEL: 2020-06-15 14:09:40                        +|
>     |          |       | START TIMELINE: 1                                 +|
>     |          |       |                                                    |
Re: Something else about Redo Logs disappearing
On Sat, 2020-06-13 at 19:48 +0200, Peter wrote:
> ! > 4. If, by misconfiguration and/or operator error, the backup system
> ! >    happens to start a second backup in parallel to the first,
> ! >    then do I correctly assume, both backups will be rendered
> ! >    inconsistent while this may not be visible to the operator; and
> ! >    the earlier backup would be flagged as apparently successful
> ! >    while carrying the wrong (later) label?
> !
> ! If you are using my scripts and start a second backup while the first
> ! one is still running, the first backup will be interrupted.
>
> This is not what I am asking. It appears correct to me, that, on
> the database, the first backup will be interrupted. But on the
> tape side, this might go unnoticed, and on completion it will
> successfully receive the termination code from the *SECOND*
> backup - which means that on tape we will have a seemingly
> successful backup, which
> 1. is corrupted, and
> 2. carries a wrong label.

That will only happen if the backup that uses my scripts does the
wrong thing.

An example:

- Backup #1 calls "pgpre.sh"
- Backup #1 starts copying files
- Backup #2 calls "pgpre.sh".
  This will cancel the first backup.
- Backup #1 completes copying files.
- Backup #1 calls "pgpost.sh".
  It will receive an error.
  So it has to invalidate the backup.
- Backup #2 completes copying files.
- Backup #2 calls "pgpost.sh".
  It gets a "backup_label" file and completes the backup.

So the only way that something can go wrong would be if backup #1
somehow does *not* invalidate the backup.

> ! This is specific to my scripts, PostgreSQL's non-exclusive backup
> ! can perform more than one concurrent backup successfully.
> ! I tried to keep things simple.
>
> I understand. But the operator may not know that and/or accidentally
> start a second backup while one is still running. And this will then
> result in ...
>
> ! If you have the wrong "backup_label", you end up with silent data
> ! corruption.
>
> ... this.

Of course, if you do arbitrary nonsense like restoring a backup
without "backup_label", you will get arbitrary data corruption.

It is a fundamental principle that, apart from "backup_label", there
is no way to tell a backup from a crashed data directory. Any
backup/recovery software must refuse to use a backup that lacks that
file.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
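[The "must refuse a backup that lacks backup_label" principle above is cheap to enforce in a restore script. A minimal sketch, assuming invented demo paths; a real version would point at the actual restored data directory and would not create the label itself.]

```shell
#!/bin/sh
# Hypothetical restore-time guard: refuse to start recovery from a
# restored data directory that has no backup_label in it.
RESTORED="${RESTORED:-/tmp/demo-restore}"

# demo setup only: pretend the restore already put the label back
mkdir -p "$RESTORED"
echo "START WAL LOCATION: 0/0 (stub)" > "$RESTORED/backup_label"

if [ ! -f "$RESTORED/backup_label" ]; then
    echo "no backup_label in $RESTORED - refusing to start recovery" >&2
    exit 1
fi
echo "backup_label present, recovery may proceed"
```

[Without the demo setup lines, the same check dropped into an rc script makes the "restored a crashed directory by mistake" failure mode loud instead of silent.]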
Re: Something else about Redo Logs disappearing
On Sat, Jun 13, 2020 at 10:13 PM Peter wrote:
> On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote:
> ! > Okay. So let's behave like professional people and figure how that
> ! > can be achieved:
> ! > At first, we drop that WAL requirement, because with WAL archiving
> ! > it is already guaranteed that an unbroken chain of WAL is always
> ! > present in the backup (except when we have a bug like the one that
> ! > lead to this discussion).
> ! > So this is **not part of the scope**.
> !
> ! I would assume that anybody who deals with backups professionally
> ! wouldn't consider that out of scope,
>
> I strongly disagree. I might suppose You haven't thought this to the
> proper end. See:

You may disagree, but I would argue that this is because you are the
one who has not thought it through. But hey, let's agree to disagree.

> You can see that all the major attributes (scheduling, error-handling,
> signalling, ...) of a WAL backup are substantially different to that
> of any usual backup.
> This is a different *Class* of backup object, therefore it needs an
> appropriate infrastructure that can handle these attributes correctly.

Yes, this is *exactly* why special-handling the WAL during the base
backup makes a lot of sense. Is it required? No. Will it make your
backups more reliable? Yes. But it depends on what your priorities are.

> But, if You never have considered *continuous* archiving, and only
> intend to take a functional momentary backup of a cluster, then You
> may well have never noticed these differences. I noticed them mainly
> because I did *BUILD* such an infrastructure (the 20 lines of shell
> script, you know).

Yes, if you take a simplistic view of your backups, then yes.

> And yes, I was indeed talking about *professional* approaches.

Sure.

> ! There is *absolutely* no need for threading to use the current APIs.
> ! You need to run one query, go do something else, and then run another
> ! query.
>
> Wrong. The point is, I don't want to "go do something else", I have to
> exit() and get back to the initiator at that place.

That is not a requirement of the current PostgreSQL APIs. (in fact,
using threading would add a significant extra burden there, as libpq
does not allow sharing of connections between threads)

That is a requirement, and indeed a pretty sharp limitation, of the
*other* APIs you are working with, it sounds like.

The PostgreSQL APIs discussed do *not* require you to do an exit().
Nor do they require any form of threading.

And the fact that you need to do an exit() would negate any threading
anyway, so that seems to be a false argument regardless.

> This is also clearly visible in Laurenz' code: he utilizes two
> unchecked background tasks (processes, in this case) with loose
> coupling for the purpose, as it does not work otherwise.

Yes, because he is also trying to work around a severely limited API
*on the other side*. There's plenty of backup integrations that don't
have this limitation. They all work perfectly fine with no need for
exit() and certainly no weird need for special threading.

> The most interesting point in there appears to be this:
> > that the backup label and tablespace map files are not written to
> > disk. Instead, their would-be contents are returned in *labelfile
> > and *tblspcmapfile,
>
> This is in do_pg_start_backup() - so we actually HAVE this data
> already at the *START* time of the backup!
> Then why in hell do we wait until the END of the backup before we
> hand this data to the operator: at a time when the DVD with the

Because it cannot be safely written *into the data directory*.

Now, it could be written *somewhere else*, that is true. And then you
would add an extra step at restore time to rename it back. But then
your restore would now also require a plugin.

> backup is already fixated and cannot be changed anymore, so that

You don't need to change the backup, only append to it. If you are
calling pg_stop_backup() at a time when that is no longer possible,
then you are calling pg_stop_backup() at the wrong time.

> As I can read, there is no difference in the function requirements
> between exclusive and non-exclusive mode, in that regard: the
> backup-label file is NOT necessary in the running cluster data tree,
> BUT it should get into the RESTORED data tree before starting it.

Correct. It is in fact actively harmful in the running cluster data
tree.

> And I can't find a single one of those "big problems". What I do find
> is just people whining that their cluster doesn't start and they can't
> simply delete a file, even if told so. Like a soldier complaining that
> his gun doesn't shoot and he has no idea how to reload.

Have you actually tried it? Or dealt with the many people who have run
into corruption around this? Again, as suggested before, review the
discussions that led up to the changes. There are plenty of examples
there.

! > I now hope very much that M
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote:
! > Okay. So let's behave like professional people and figure how that
! > can be achieved:
! > At first, we drop that WAL requirement, because with WAL archiving
! > it is already guaranteed that an unbroken chain of WAL is always
! > present in the backup (except when we have a bug like the one that
! > lead to this discussion).
! > So this is **not part of the scope**.
!
! I would assume that anybody who deals with backups professionally
! wouldn't consider that out of scope,

I strongly disagree. I might suppose You haven't thought this to the
proper end. See:

The WAL backup is, for a couple of reasons, different to most other
backup objects:
- it runs on demand, not per schedule.
- it must be immediately repeated (or alerted), until success,
  otherwise the database might stop.
- it might use a different storage (e.g. twofold/mirrored), for
  obvious reasons.
- it gets deleted after backup, and that is a point-of-no-return.

For the data tree backup, on the contrary, all these attributes are
identical to those of any other incr/full filesystem backup.

You can see that all the major attributes (scheduling, error-handling,
signalling, ...) of a WAL backup are substantially different to that
of any usual backup.

This is a different *Class* of backup object, therefore it needs an
appropriate infrastructure that can handle these attributes correctly.
The WAL do *not belong* into the scope of the data tree backup; they
are to be handled by different schemes in a different way.

But, if You never have considered *continuous* archiving, and only
intend to take a functional momentary backup of a cluster, then You
may well have never noticed these differences. I noticed them mainly
because I did *BUILD* such an infrastructure (the 20 lines of shell
script, you know).

And yes, I was indeed talking about *professional* approaches.

! There is *absolutely* no need for threading to use the current APIs.
! You need to run one query, go do something else, and then run another
! query.

Wrong. The point is, I don't want to "go do something else", I have to
exit() and get back to the initiator at that place.

This is also clearly visible in Laurenz' code: he utilizes two
unchecked background tasks (processes, in this case) with loose
coupling for the purpose, as it does not work otherwise.

! > But then, let's think another step forward: for what purpose do we
! > actually need to call pg_start_backup() and pg_stop_backup() at all?
! > I couldn't find exhaustive information about that, only some partial
! > facts.
!
! Since you don't trust the documentation, I suggest you take a look at
! https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=55cac186dc71fcc2f4628f9974b30850bb51eb5d;hb=92c58fd94801dd5c81ee20e26c5bb71ad64552a8#l10438

Thanks, this indeed gives a most entertaining read. Honestly, I've
never seen such a source before - there is thrice as much storytime
than code. :)

The most interesting point in there appears to be this:

> that the backup label and tablespace map files are not written to
> disk. Instead, their would-be contents are returned in *labelfile
> and *tblspcmapfile,

This is in do_pg_start_backup() - so we actually HAVE this data
already at the *START* time of the backup!

Then why in hell do we wait until the END of the backup before we
hand this data to the operator: at a time when the DVD with the
backup is already fixated and cannot be changed anymore, so that
we have to send this data to quaternary storage (spell: the line
printer) and have girl Friday sort them out (and probably mix them
up)??

Is this indeed Boeing-security: try to avoid a possible mistake by
creating a much bigger risk?

! > Things that remain to be figured out:
! > 1. What does pg_start_backup actually do and why would that be
! >    necessary? I could not find exhaustive information, but this can
! >    probably be figured from the source. Currently I know so much:
! >    - it writes a backup_label file. That is just a few lines of
! >      ASCII and should not be difficult to produce.
!
! It does that only in exclusive mode, and doing that is one of the big
! problems with exclusive mode.

So don't do that.

As I can read, there is no difference in the function requirements
between exclusive and non-exclusive mode, in that regard: the
backup-label file is NOT necessary in the running cluster data tree,
BUT it should get into the RESTORED data tree before starting it.

And I can't find a single one of those "big problems". What I do find
is just people whining that their cluster doesn't start and they can't
simply delete a file, even if told so. Like a soldier complaining that
his gun doesn't shoot and he has no idea how to reload.

I could devise a dozen ways how to detect that situation automatically
during rc.d and remove the file - if it were of any concern to me.

! > I now hop
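[The on-demand, repeat-until-success, mirrored-storage properties of WAL archiving listed above are conventionally handled inside the archive_command itself, since PostgreSQL keeps retrying a segment for as long as that command exits non-zero. A hedged sketch of such a wrapper; the destination paths and the fake segment are invented for the demo.]

```shell
#!/bin/sh
# Hypothetical archive_command wrapper: mirror each WAL segment to two
# destinations and report success only when both copies were written.
# A non-zero exit makes PostgreSQL retry the same segment later, which
# is exactly the "repeat until success" behaviour a WAL backup needs.
archive_wal() {
    # usage: archive_wal <full-path-of-segment> <segment-file-name>
    wal_path=$1
    wal_name=$2
    mkdir -p "$DEST1" "$DEST2" || return 1
    cp "$wal_path" "$DEST1/$wal_name" || return 1
    cp "$wal_path" "$DEST2/$wal_name" || return 1
}

# demo with a fake segment instead of a real file from pg_wal:
DEST1=/tmp/demo-archive1
DEST2=/tmp/demo-archive2
echo "fake WAL data" > /tmp/000000010000000100C9
archive_wal /tmp/000000010000000100C9 000000010000000100C9 \
    && echo "archived ok" || echo "archive FAILED, will be retried"
```

[In postgresql.conf this would be wired up roughly as `archive_command = '/usr/local/bin/archive_wal %p %f'`, with alerting added on the failure path.]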
Re: Something else about Redo Logs disappearing
On Sat, Jun 13, 2020 at 01:53:28PM +0200, Laurenz Albe wrote:
! > I've never seen anybody coding bash - it is strongly shunned in the
! > Berkeley community.
!
! Strange, but then I don't move in these circles.

Never mind.

! > Some Questions:
! > 1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
! >    To where are these written?
!
! Standard error. It is up to the caller of the script to route that
! somewhere useful.

Understood.

! > 2. The result data from pg_stop_backup() are stored into the living
! >    database. But, according to the docs, they should be placed into
! >    the completed backup. Do I have a misunderstanding here?
!
! Right, but these scripts don't know anything about that backup itself.
! They are designed to be called before and after the backup.
! In between, you back up the data directory however you think fit.
!
! It is the responsibility of the caller of the post-backup script
! to add the "backup_label" file to the backup.

I see.

! > 4. If, by misconfiguration and/or operator error, the backup system
! >    happens to start a second backup in parallel to the first,
! >    then do I correctly assume, both backups will be rendered
! >    inconsistent while this may not be visible to the operator; and
! >    the earlier backup would be flagged as apparently successful while
! >    carrying the wrong (later) label?
!
! If you are using my scripts and start a second backup while the first
! one is still running, the first backup will be interrupted.

This is not what I am asking. It appears correct to me, that, on
the database, the first backup will be interrupted. But on the
tape side, this might go unnoticed, and on completion it will
successfully receive the termination code from the *SECOND*
backup - which means that on tape we will have a seemingly
successful backup, which
1. is corrupted, and
2. carries a wrong label.

! This is specific to my scripts, PostgreSQL's non-exclusive backup
! can perform more than one concurrent backup successfully.
! I tried to keep things simple.

I understand. But the operator may not know that and/or accidentally
start a second backup while one is still running. And this will then
result in ...

! If you have the wrong "backup_label", you end up with silent data
! corruption.

... this.

Indeed this is difficult to avoid, because the high risk of silent
data corruption is an elementary architectural feature of the
so-called "new API". Which is why this is not going to run on my ship.

But you will have to wait - the to-be-expected inrush of service sales
due to corrupted clusters will only happen after R.13 is active and
people are *forced* to cope with that "new API".

Thanks for the effort of answering my questions.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Sat, 2020-06-13 at 05:24 +0200, Peter wrote:
> ! https://github.com/cybertec-postgresql/safe-backup
>
> Ah, thank You, very nice.
>
> I've never seen anybody coding bash - it is strongly shunned in the
> Berkeley community.

Strange, but then I don't move in these circles.

> Some Questions:
> 1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
>    To where are these written?

Standard error. It is up to the caller of the script to route that
somewhere useful.

> 2. The result data from pg_stop_backup() are stored into the living
>    database. But, according to the docs, they should be placed into
>    the completed backup. Do I have a misunderstanding here?

Right, but these scripts don't know anything about that backup itself.
They are designed to be called before and after the backup.
In between, you back up the data directory however you think fit.

It is the responsibility of the caller of the post-backup script
to add the "backup_label" file to the backup.

> 3. The most common error cause of a backup might be tape-side
>    malfunction. So far as I see, the way to handle this is currently
>    to provide a timeout for pgpre.sh (which is problematic, because
>    we might as well have just reached end-of-tape and have to wait
>    until Monday for the operator to change it). May I suggest to add
>    a switch to pgpost.sh, in order to voluntarily fail out of the job?

As said above, the backup itself is not the job of these scripts.

> 4. If, by misconfiguration and/or operator error, the backup system
>    happens to start a second backup in parallel to the first,
>    then do I correctly assume, both backups will be rendered
>    inconsistent while this may not be visible to the operator; and
>    the earlier backup would be flagged as apparently successful while
>    carrying the wrong (later) label?

If you are using my scripts and start a second backup while the first
one is still running, the first backup will be interrupted.

This is specific to my scripts, PostgreSQL's non-exclusive backup
can perform more than one concurrent backup successfully.
I tried to keep things simple.

> BTW: what does, in general, happen, if a backup_label file gets
> accidentally swapped with one from a parallel, but slightly later
> backup? Do I correctly assume that such mistake gets somehow detected,
> as otherwise it would have just the same unwelcome effects
> (i.e. silent data corruption) as no backup_label at all?

If you have the wrong "backup_label", you end up with silent data
corruption. That should not happen with my scripts, though.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
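[The division of labour described above - pre-script, file copy, post-script, with the caller storing the returned backup_label inside the finished backup - can be sketched as a small driver. All names here are placeholders: the pgpre/pgpost functions are stubs standing in for the real pgpre.sh/pgpost.sh, so only the control flow is shown.]

```shell
#!/bin/sh
# Hypothetical caller of the pre/post backup scripts. The stubs below
# stand in for the real scripts; pgpost prints the backup_label
# contents on stdout, as the real post-backup script does.
BACKUP_DIR="${BACKUP_DIR:-/tmp/demo-backup}"

pgpre()  { echo "backup starting"; }                       # stub for pgpre.sh
pgpost() { printf 'START WAL LOCATION: 0/0 (stub)\n'; }    # stub for pgpost.sh

mkdir -p "$BACKUP_DIR"
pgpre || { echo "pre-backup failed" >&2; exit 1; }

# ... here the data directory would be copied into $BACKUP_DIR ...

# It is the caller's job to put the returned label INTO the backup:
pgpost > "$BACKUP_DIR/backup_label" \
    || { echo "post-backup failed, backup is invalid" >&2; exit 1; }
echo "backup complete, label stored in $BACKUP_DIR/backup_label"
```

[The key point is the last step: the label returned by the post-script ends up next to the copied files, so the restored tree contains backup_label before the cluster is started.]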
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:53:15PM +0200, Laurenz Albe wrote:
! On Thu, 2020-06-11 at 22:35 +0200, Magnus Hagander wrote:
! > I believe somebody around that time also wrote a set of bash scripts
! > that can be used in a pre/post-backup-job combination with the
! > current APIs.
!
! https://github.com/cybertec-postgresql/safe-backup

Ah, thank You, very nice.

I've never seen anybody coding bash - it is strongly shunned in the
Berkeley community.

Some Questions:

1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
   To where are these written?

2. The result data from pg_stop_backup() are stored into the living
   database. But, according to the docs, they should be placed into
   the completed backup. Do I have a misunderstanding here?

3. The most common error cause of a backup might be tape-side
   malfunction. So far as I see, the way to handle this is currently
   to provide a timeout for pgpre.sh (which is problematic, because
   we might as well have just reached end-of-tape and have to wait
   until Monday for the operator to change it). May I suggest to add
   a switch to pgpost.sh, in order to voluntarily fail out of the job?

4. If, by misconfiguration and/or operator error, the backup system
   happens to start a second backup in parallel to the first,
   then do I correctly assume, both backups will be rendered
   inconsistent while this may not be visible to the operator; and
   the earlier backup would be flagged as apparently successful while
   carrying the wrong (later) label?

BTW: what does, in general, happen, if a backup_label file gets
accidentally swapped with one from a parallel, but slightly later
backup? Do I correctly assume that such mistake gets somehow detected,
as otherwise it would have just the same unwelcome effects
(i.e. silent data corruption) as no backup_label at all?

cheerio,
PMc
Re: Something else about Redo Logs disappearing
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 6/9/20 4:15 AM, Stephen Frost wrote:
> > * Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> >> I use pg_backrest, but it does not look promising for running on BSD:
> >> https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> >
> > That's an unfortunately ancient post, really, considering that
> > pgbackrest has now been fully rewritten into C, and Luca as recently as
> > September 2019 was saying he has it working on FreeBSD.
>
> Yeah, but this:
>
> https://github.com/pgbackrest/pgbackrest/issues/686
>
> is not clear on whether a user can do that w/o a certain amount of hand
> holding.

Luca (at my asking) wrote up a new blog post to clarify things, you can see it here:

https://fluca1978.github.io/2020/06/12/pgbackrestOnFreeBSD.html

No particular hand holding needed.

Thanks,

Stephen
Re: Something else about Redo Logs disappearing
On Thu, 2020-06-11 at 22:35 +0200, Magnus Hagander wrote: > I believe somebody around that time also wrote a set of bash scripts that can > be used in a pre/post-backup-job combination with the current APIs. https://github.com/cybertec-postgresql/safe-backup Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:13 PM Peter wrote: > > Okay. So lets behave like professional people and figure how that > can be achieved: > At first, we drop that WAL requirement, because with WAL archiving > it is already guaranteed that an unbroken chain of WAL is always > present in the backup (except when we have a bug like the one that > lead to this discussion). > So this is **not part of the scope**. > I would assume that anybody who deals with backups professionally wouldn't consider that out of scope, but sure, for the sake of argument, let's do that. ! This is only one option though, there are others- you can also use > ! pgbackrest to push your backups to s3 (or any s3-compatible data storage > ! system, which includes some backup systems), and we'll be adding > ! support > > ! I concur that this is becoming a madhouse, and is pushing past the limit > ! for what I'm willing to deal with when trying to assist someone. > > Well, then that might be a misconception. I'm traditionally a > consultant, and so I am used to *evaluate* solutions. I don't need > assistance for that, I only need precise technical info. > Excellent. Then let's stick to that. This STILL needs threaded programming (as I said, there is no way to > avoid that with those "new API"), but in this case it is effectively > reduced to just grab the return-code of some program that has been > started with "&". > There is *absolutely* no need for threading to use the current APIs. You need to run one query, go do something else, and then run another query. It's 100% sequential, so there is zero need for threads. Now, if you're stuck in shellscript, it's a little more complicated. But it does not need threading. But then, lets think another step forward: for what purpose do we > actually need to call pg_start_backup() and pg_stop_backup() at all? > I couldn't find exhaustive information about that, only some partial > facts. 
Since you don't trust the documentation, I suggest you take a look at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=55cac186dc71fcc2f4628f9974b30850bb51eb5d;hb=92c58fd94801dd5c81ee20e26c5bb71ad64552a8#l10438
It has a fair amount of detail of the underlying reasons, and of course links to all the details.

> Things that remain to be figured out:
> 1. What does pg_start_backup actually do and why would that be
> necessary? I could not find exhaustive information, but this can
> probably be figured from the source. Currently I know so much:
> - it writes a backup_label file. That is just a few lines of
> ASCII and should not be difficult to produce.

It does that only in exclusive mode, and doing that is one of the big problems with exclusive mode. So don't do that.

> I now hope very much that Magnus Hagander will tell some of the
> impending "failure scenarios", because I am getting increasingly
> tired of pondering about probable ones, and searching the old
> list entries for them, without finding something substantial.

Feel free to look at the mailing list archives. Many of them have been explained there before. Pay particular attention to the threads around when the deprecated APIs were actually deprecated. I believe somebody around that time also wrote a set of bash scripts that can be used in a pre/post-backup-job combination with the current APIs.

//Magnus
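[Editor's note: the sequential pattern Magnus insists on above — one query, then the file copy, then another query, on a single connection, with no threads — can be sketched in a few lines. `run_sql` and `copy_data_directory` are stand-ins of mine for "execute SQL on the one open session" and "do the actual file-level backup"; the two SQL calls are the documented non-exclusive API of the v12 era.]

```python
def non_exclusive_backup(run_sql, copy_data_directory, label="nightly"):
    """Purely sequential use of the non-exclusive backup API:
    one query, then the file copy, then another query -- all on the
    same single connection, no threads involved.

    run_sql(sql)          -- stand-in: execute SQL on one open session
    copy_data_directory() -- stand-in: the actual file-level backup
    """
    # second arg: fast checkpoint? no; third arg: exclusive? no
    run_sql(f"SELECT pg_start_backup('{label}', false, false)")
    copy_data_directory()  # the "go do something else" step
    # returns one row: (lsn, labelfile, spcmapfile)
    return run_sql("SELECT * FROM pg_stop_backup(false)")
```

The connection must simply stay open across the copy step; nothing about that requires a second thread of control.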
Re: Something else about Redo Logs disappearing
Greetings,

* Peter (p...@citylink.dinoex.sub.org) wrote:
> On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:
> ! > What repo?? I seem to have missed that at first glance.
> !
> ! Yes, pgbackrest has a repo, like most other tools (though they call them
> ! different things... pg_basebackup has one though it's not really
> ! formal).
> !
> ! > Are You indeed suggesting that one should have their data within
> ! > the database, where it is worked with, and then use Your tool
> ! > to copy it to some "repo" disk playground whatever area, and then
> ! > use their regular backup system to COPY IT AGAIN into their
> ! > backup/archiving system? Are You kiddin'?
> !
> ! No, I'm not kidding and yes, that's what I'm suggesting.
>
> Well, do You think I could seriously sell that to anybody?

That hardly seems to have much relevance. I do know that there are lots of deployments which have exactly that setup though.

> ! You need a
> ! consistent backup of your database that includes all the needed WAL to
> ! perform a restore.
>
> Okay. So let's behave like professional people and figure how that
> can be achieved:

No, thanks, but I'm done. I will not continue to engage with someone who insists on belittling everyone they engage with. That's not appropriate for any professional environment, including these lists.

Thanks,

Stephen
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:
! > What repo?? I seem to have missed that at first glance.
!
! Yes, pgbackrest has a repo, like most other tools (though they call them
! different things... pg_basebackup has one though it's not really
! formal).
!
! > Are You indeed suggesting that one should have their data within
! > the database, where it is worked with, and then use Your tool
! > to copy it to some "repo" disk playground whatever area, and then
! > use their regular backup system to COPY IT AGAIN into their
! > backup/archiving system? Are You kiddin'?
!
! No, I'm not kidding and yes, that's what I'm suggesting.

Well, do You think I could seriously sell that to anybody?

! You need a
! consistent backup of your database that includes all the needed WAL to
! perform a restore.

Okay. So let's behave like professional people and figure how that can be achieved:

At first, we drop that WAL requirement, because with WAL archiving it is already guaranteed that an unbroken chain of WAL is always present in the backup (except when we have a bug like the one that led to this discussion). So this is **not part of the scope**. I'll get to the other item, the "consistent backup", further below.

! This is only one option though, there are others- you can also use
! pgbackrest to push your backups to s3 (or any s3-compatible data storage
! system, which includes some backup systems), and we'll be adding
! support

! I concur that this is becoming a madhouse, and is pushing past the limit
! for what I'm willing to deal with when trying to assist someone.

Well, then that might be a misconception. I'm traditionally a consultant, and so I am used to *evaluating* solutions. I don't need assistance for that, I only need precise technical info.
So let's get serious: It is NOT technically feasible to amplify the storage in a way that the entire backup data gets copied from the live database to some "repo" place first, and then again from that repo place to regular file-based backup/archiving storage. And it does not make a difference WHERE that other place is, if at Jeff's or wherever. It just does not belong into the loop.

So, what could the alternative look like? I for my part consider doing this:

With a proper transactional filesystem we can do recursive filesystem snapshots. That means, given a suitable filesystem layout, we can do a snapshot of the data tree, AND the pg_wal filesystem, AND the respective tablespaces. And this is atomic. So, what we then do in the before-hook is:

* we call pg_start_backup(whatever, false, false).
* we issue the filesystem snapshot, atomically.
* we call pg_stop_backup(whatever).

And then we return with exit(0) (if all succeeded), and the file-based backup software can start to collect the files from the filesystem snapshot, and release it afterwards.

This STILL needs threaded programming (as I said, there is no way to avoid that with those "new API"), but in this case it is effectively reduced to just grabbing the return code of some program that has been started with "&".

So far, so good. There is still one thing to be done, namely, the requirement to collect the data reported by pg_stop_backup() and add that to the backup, at a point in time where that is ALREADY CLOSED! (that's another piece of gross bogus in this "new API") - but with my intended method (and some ugliness) this could now also be solved.

But then, let's think another step forward: for what purpose do we actually need to call pg_start_backup() and pg_stop_backup() at all? I couldn't find exhaustive information about that, only some partial facts.
What we know for certain, is: if we leave these calls away, and just do a filesystem snapshot and make a backup from that, then we have exactly the same thing as if we had a power failure at that time. So this is restoreable, and the server will certainly start. The remaining question is: can we then give it our subsequently archived redologs and make it roll forward before going ready? I don't know that yet, but maybe, if we just throw these WAL into pg_wal, the thing might be stupid enough to swallow them. If that does not work, then there might still be ways to trick it into believing it's a standby server and make it roll forward these WAL. So, there are not only ways to avoid the useless storage amplification, there might even be means to get rid of that whole misconceived "API". Things that remain to be figured out: 1. What does pg_start_backup actually do and why would that be necessary? I could not find exhaustive information, but this can probably figured from the source. Currently I know so much: - it writes a backup_label file. That is just a few lines of ASCII and should not be difficult to produce. - it does a checkpoint. This can probably be done with the respective command (if it is needed at all). - it does temporarily switch to full_page_writes. This cannot be don
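[Editor's note: the roll-forward question above has a supported answer — one does not need to "trick" the server or drop files into pg_wal. A cluster restored from a snapshot can be told to fetch archived WAL and replay it. Since v12 this is done with a recovery.signal file plus restore_command; a minimal fragment, with the archive path a placeholder:]

```
# postgresql.conf on the restored cluster (v12+); archive path is a placeholder
restore_command = 'cp /your/wal-archive/%f "%p"'
recovery_target = 'immediate'   # stop as soon as a consistent state is reached

# and, before starting the server:
#   touch $PGDATA/recovery.signal
```

The caveat remains the one the surrounding discussion is really about: without a backup_label from pg_start_backup()/pg_stop_backup(), such a restore is only safe if the snapshot was truly atomic across the data directory, pg_wal, and all tablespaces.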
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 01:10:36PM +0200, Magnus Hagander wrote: ! > Just having a look at their webpage, something seems to have been updated ! > recently, they now state that they have a new postgres adapter: ! > ! > https://www.bareos.com/en/company_news/postgres-plugin-en1.html ! > Enjoy reading, and tell us what You think. ! > ! ! This one unfortunately rings out of somebody who doesn't know how to back ! up postgres, at least in the past 10-15 years. ! ! They are using an API that has been deprecated for years - in what's ! announced as a brand new product. They are advocating local archiving, ! which basically guarantees dataloss in the event of a disaster. Aye, thank You, that's exactly the impression I got. This is probably still the old thing I was talking about, just made into a new product. ! That's from a 3 minute look, but that's definitely enough to suggest this ! is not something I'd consider using. The matter is, that backup software (as a whole, not this postgres component) offers lots of things exactly as I like them. It is a great concept, a great implementation, but a bad coding quality and a bad maintenance policy. But then, one can get it for free; and I know of no other with such features. So I went thru the effort of fixing it up, so that it now well serves my needs - and use my own scripting for the add-ons. ! > Well, Your own docs show how to do it with a one-liner. So please ! > don't blame me for improving that to 20 lines. ! > ! ! Yes, those docs are unfortunately "known bad" and should definitely be ! improved on. it does very clearly state that the example is just an ! example. But it doesn't clearly state *why* it shouldn't be used. That's why I felt the ethical need to speak up and share my consideration. Now it's up to those in charge and not my issue anymore. ;) ! In my understanding, backup is done via pgdump. The archive logs are ! > for emergencies (data corruption, desaster), only. And emergencies ! 
> would usually be handled by some professional people who know what ! > they have to do. ! > ! ! I'd say it's the exact opposite. backups are done via pg_basebackup or ! manual basebackups. Archive logs are for point in time recovery. pg_dump ! can be used as a secondary "backup to the backups" option, but it is most ! interesting for things that are not backups (such as inspecting data, or ! provisioning partial test systems). ! ! Different for different scenarios of course, but that would be the base ! scenario. And pg_dump are definitely as far from good backups are you can ! get while still having something that can be called approximately backups. ! It might be enough for small databases, but even in those cases ! pg_basebackup (without archive logging) is easier... It's easier to create - but to apply? That depends on how many DBs are in the cluster and how diverse their use. Also at any major version switch these backups get worthless; one cannot use them for longterm. (I suppose this is also true for pg_basebackup.) I'm creating my longterm (and offsite) simply as clones from the regular full backup. So what I came up with for now, is: I run pg_dump over all the present databases, plus globals, chunk that up (in a similar way like chunked HTTP works), feed it onto a pipe and backup that pipe. No need for interim storage, so it can get as large as the backup software can take it. And that should work for longterm - and I don't currently see a better option. (This one does not work in 20 lines shellscript, because I didn't get a reliable chunker running in shell.) ! And yes, I read that whole horrible discussion, and I could tear my ! > hair out, really, concerning the "deprecated API". I suppose You mean ! > the mentioning in the docs that the "exclusive low-level backup" is ! > somehow deprecated. ! > ! ! Yes. There is no "somehow", it's deprecated. 
Then let's not call it "somehow", as, more precisely, from my understanding so far, that so-called "new API" is ill-conceived and troublesome in more than one regard. I would, with my current knowledge, recommend avoiding, or better, abandoning it. Or, in other words: it is similar to what Boeing tried to do, in forcing things upon people via software, for safety reasons - and now see where Boeing got with that.

! > But now, with the now recommended "non-exclusive low-level backup",
! > the task is different: now your before-hook needs to do two things
! > at the same time:
! > 1. keep a socket open in order to hold the connection to postgres
! > (because postgres will terminate the backup when the socket is
! > closed), and
! > 2. invoke exit(0) (because the actual backup will not start until
! > the before-hook has properly delivered a successful exit code.
! > And, that is not only difficult, it is impossible.
!
! It is not impossible. It is harder if you limit your available tools yes,
! but it also *works*.

In this description which I choose, I would think it is actually imposs
Re: Something else about Redo Logs disappearing
Greetings, * Peter (p...@citylink.dinoex.sub.org) wrote: > On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote: > ! > And then 90% of the things offered here become superfluous, because > ! > they are already handled site-wide. And then you will have to > ! > consider integration of both pieces - and that will most likely be > ! > more work and more error-prone than just writing a few adapters in > ! > shell. > ! > ! pgbackrest's repo can be safely backed up using the simple file-based > ! backup utilities that you're referring to here. I suspect some of the > ! other solution's backups also could be, but you'd probably want to make > ! sure. > > What repo?? I seem to have missed that at first glance. Yes, pgbackrest has a repo, like most other tools (though they call them different things... pg_basebackup has one though it's not really formal). > Are You indeed suggesting that one should have their data within > the database, where it is worked with, and then use Your tool > to copy it to some "repo" disk playground whatever area, and then > use their regular backup system to COPY IT AGAIN into their > backup/archiving system? Are You kiddin'? No, I'm not kidding and yes, that's what I'm suggesting. You need a consistent backup of your database that includes all the needed WAL to perform a restore. This is only one option though, there are others- you can also use pgbackrest to push your backups to s3 (or any s3-compatible data storage system, which includes some backup systems), and we'll be adding support for Azure very shortly, and have plans to add GCS too in the future, along with others probably. > Is this becoming a madhouse, or are You going to refund them that? I concur that this is becoming a madhouse, and is pushing past the limit for what I'm willing to deal with when trying to assist someone. > Let me tell You something: the people I used to work for, sometimes > had a problem. 
> They had some amount of data that was created during
> the day, and they had the night to write that data away to backup.
> That would usually mean, four or eight of the big tapes, streaming in
> parallel, fibers saturated, all night thru. And the problem usually was
> that they would need a longer night. At least the math had to be done
> properly.

Indeed, parallel backup is important, which is why pgbackrest supports it, along with compression and encryption, all in-stream between the database server and the repo, along with calculating a SHA to be stored of every single file seen, allowing you to validate that the files haven't changed since the backup was done when restoring.

> Maybe You never encountered these, but there are surroundings where
> there is no spare room for nonsense. Maybe that's why these people
> preferred to use Oracle.

I've both dealt with keeping tape drives fully loaded to avoid breaking the tape by stuttering it (and writing dedicated C code to deal with exactly that), and dealt with backing up and restoring Oracle, including with various "enterprise" backup technologies (with varying levels of success...). None of what is being brought up here is new, novel, or even particularly interesting.

Thanks,

Stephen
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! > And then 90% of the things offered here become superfluous, because
! > they are already handled site-wide. And then you will have to
! > consider integration of both pieces - and that will most likely be
! > more work and more error-prone than just writing a few adapters in
! > shell.
!
! pgbackrest's repo can be safely backed up using the simple file-based
! backup utilities that you're referring to here. I suspect some of the
! other solution's backups also could be, but you'd probably want to make
! sure.

What repo?? I seem to have missed that at first glance.

Are You indeed suggesting that one should have their data within the database, where it is worked with, and then use Your tool to copy it to some "repo" disk playground whatever area, and then use their regular backup system to COPY IT AGAIN into their backup/archiving system? Are You kiddin'?

Are You indeed suggesting that people should buy the amount of disks that they use for their database AGAIN in order for Your software to copy the stuff around? Is this becoming a madhouse, or are You going to refund them that?

Let me tell You something: the people I used to work for, sometimes had a problem. They had some amount of data that was created during the day, and they had the night to write that data away to backup. That would usually mean, four or eight of the big tapes, streaming in parallel, fibers saturated, all night thru. And the problem usually was that they would need a longer night. At least the math had to be done properly.

Maybe You never encountered these, but there are surroundings where there is no spare room for nonsense. Maybe that's why these people preferred to use Oracle.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 1:53 AM Peter wrote:

> On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
> ! The backup solution is?
>
> https://www.bareos.com/
>
> ! Fine rant below. Go forth and work your wonders.
>
> I don't need to, anymore. I did that, for about 20 years - people
> I used to work for as a consultant (major banks and insurance shops)
> would usually run Informix or Oracle. Postgres is just my own private
> fancy.
>
> On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
> ! * Peter (p...@citylink.dinoex.sub.org) wrote:
> ! > This professional backup solution also offers support for postgres.
> ! > Sadly, it only covers postgres up to Rel.9, and that piece of software
> ! > wasn't touched in the last 6 or 7 years.
> !
> ! Then it certainly doesn't work with the changes in v12, and probably has
> ! other issues, as you allude to.
>
> Just having a look at their webpage, something seems to have been updated
> recently, they now state that they have a new postgres adapter:
>
> https://www.bareos.com/en/company_news/postgres-plugin-en1.html
> Enjoy reading, and tell us what You think.

This one unfortunately rings of somebody who doesn't know how to back up postgres, at least in the past 10-15 years.

They are using an API that has been deprecated for years - in what's announced as a brand new product. They are advocating local archiving, which basically guarantees data loss in the event of a disaster.

That's from a 3 minute look, but that's definitely enough to suggest this is not something I'd consider using.

> ! > Actually, I am getting very tired of reading that something which can
> ! > easily be done within 20 lines of shell scripting, would need special
> !
> ! This is just simply false- you can't do it properly in 20 lines of shell
> ! scripting.
>
> Well, Your own docs show how to do it with a one-liner. So please
> don't blame me for improving that to 20 lines.
Yes, those docs are unfortunately "known bad" and should definitely be improved on. It does very clearly state that the example is just an example. But it doesn't clearly state *why* it shouldn't be used.

> In my understanding, backup is done via pg_dump. The archive logs are
> for emergencies (data corruption, disaster), only. And emergencies
> would usually be handled by some professional people who know what
> they have to do.

I'd say it's the exact opposite. Backups are done via pg_basebackup or manual base backups. Archive logs are for point-in-time recovery. pg_dump can be used as a secondary "backup to the backups" option, but it is most interesting for things that are not backups (such as inspecting data, or provisioning partial test systems).

Different for different scenarios of course, but that would be the base scenario. And pg_dump is definitely as far from good backups as you can get while still having something that can be called approximately backups. It might be enough for small databases, but even in those cases pg_basebackup (without archive logging) is easier...

> And yes, I read that whole horrible discussion, and I could tear my
> hair out, really, concerning the "deprecated API". I suppose You mean
> the mentioning in the docs that the "exclusive low-level backup" is
> somehow deprecated.

Yes. There is no "somehow", it's deprecated.
keep a socket open in order to hold the connection to postgres > (because postgres will terminate the backup when the socket is > closed), and > 2. invoke exit(0) (because the actual backup will not start until > the before- hook has properly delivered a successful exit code. > And, that is not only difficult, it is impossible. > It is not impossible. It is harder if you limit your available tools yes, but it also *works*. So yes, this is really a LOT of work. But the point is: this all is > not really necessary, because currently the stuff works fine in the > old way. > It does not, no. It works in the simple cases, but it has multiple failure scenarios that *cannot* be fixed without changing those fundamentals. But you can always go for the actual old way -- just stop postgres in the pre-job and start it again in the post-job. That's by far the easiest. And that *does* work and is fully supported. ! > The only really interesting thing there is the pg_probackup. These > ! > folks seem to have found a way to do row-level incremental backups. > ! > ! pg_probackup doesn't do row-level incremental backups, unless I've > ! misse
Re: Something else about Redo Logs disappearing
On 6/9/20 4:35 PM, Peter wrote:
> On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
> And that can very well be done properly with an incremental filesystem
> backup software plus some 20 lines of shellscript.

Read the caveats here:
https://www.postgresql.org/docs/12/backup-file.html

> Now talking about doing an automated restore, or, having some
> menu-driven solution, or -the worst of all- having a solution that can
> be operated by morons - that's an entirely different matter.
> In my understanding, backup is done via pg_dump. The archive logs are
> for emergencies (data corruption, disaster), only. And emergencies
> would usually be handled by some professional people who know what
> they have to do.

Read the entire section below, for why WALs are for backup also. FYI, if you don't properly set it up then you may not be protected from data corruption. See PITR.
https://www.postgresql.org/docs/12/continuous-archiving.html

Postgres is used by a wide gamut of people of differing abilities, many of whom appreciate the availability of tested solutions to protect their data, as they are not morons and understand there are people who can make their life easier.

> This was actually my job as a consultant: to de-mystify technology,
> and make it understandable as an arrangement of well explainable
> pieces of functionality with well-deducible consequences.

Not seeing it.

> But this is no longer respected today; now people are expected to
> *NOT* understand the technology they handle, and instead believe in
> marketing and that it all is very complicated and unintelligible.
>
> cheerio,
> PMc

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Something else about Redo Logs disappearing
Greetings, * Peter (p...@citylink.dinoex.sub.org) wrote: > On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote: > ! * Peter (p...@citylink.dinoex.sub.org) wrote: > ! > This professional backup solution also offers support for postgres. > ! > Sadly, it only covers postgres up to Rel.9, and that piece of software > ! > wasn't touched in the last 6 or 7 years. > ! > ! Then it certainly doesn't work with the changes in v12, and probably has > ! other issues, as you allude to. > > Just having a look at their webpage, something seems to have been updated > recently, they now state that they have a new postgres adapter: > > https://www.bareos.com/en/company_news/postgres-plugin-en1.html > Enjoy reading, and tell us what You think. I'm afraid I'm not particularly interested in performing a pro bono evaluation of a commercial product, though considering they've put out a press release with obviously broken links, I already have suspicions of what I'd find ... (try clicking on their 'experimental/nightly' link). A quick look at the docs also shows that it's referring to recovery.conf, which no longer exists since v12 was released back in September, so, yeah, isn't exactly current. > ! > Actually, I am getting very tired of reading that something which can > ! > easily be done within 20 lines of shell scripting, would need special > ! > ! This is just simply false- you can't do it properly in 20 lines of shell > ! scripting. > > Well, Your own docs show how to do it with a one-liner. So please > don't blame me for improving that to 20 lines. No, the documentation provides an example for the purpose of understanding how the replacement in the command is done and explicitly says that you probably shouldn't use that command. > ! Sure, you can write something that has probably next to no > ! error checking, > > Before judging that, one should first specify what precisely is the > demand. 
I really don't need to in order to be able to judge the notion of a 20 line shell script being able to manage to perform a backup correctly. > In my understanding, backup is done via pgdump. The archive logs are > for emergencies (data corruption, desaster), only. And emergencies > would usually be handled by some professional people who know what > they have to do. No, that's not the case. pg_dump isn't at all involved in the backups that we're talking about here which are physical, file-level, backups. > ! uses the deprecated API that'll cause your systems to > ! fail to start if you ever happen to have a reboot during a backup > > It is highly unlikely that I did never have that happen during 15 > years. So what does that mean? If I throw in a pg_start_backup('bogus'), > and then restart the cluster, it will not work anymore? If you perform a pg_start_backup(), have a checkpoint happen such that older WAL is removed, and then reboot the box or kill -9 postgres, no, it's not going to start anymore because there's going to be a backup_label file that is telling the cluster that it needs to start replaying WAL from an older point in time than what you've got WAL for. > Lets see... > Clean stop/start - no issue whatsoever. (LOG: online backup mode > canceled) > kill -9 the whole flock - no issue whatsoever (Log: database system > was interrupted) > I won't pull the plug now, but that has certainly happened lots of > times in the past, and also yielded no issue whatsoever - simply > because there *never* was *any* issue whatsover with Postgres (until > I got the idea to install the relatively fresh R.12 - but that's > understandable). Being lucky really isn't what you want to bet on. > So maybe this problem exists only on Windows? No, it's not Windows specific. > And yes, I read that whole horrible discussion, and I could tear my > hair out, really, concerning the "deprecated API". 
I suppose You mean > the mentioning in the docs that the "exclusive low-level backup" is > somehow deprecated. Yes, it's deprecated specifically because of the issues outlined above. They aren't hypothetical, they do happen, and people do get bit by them. > This is a very big bad. Because: normally you can run the base backup > as a strictly ordinary file-level backup in "full" mode, just as any > backup software can do it. You will simply execute the > pg_start_backup() and pg_stop_backup() commands in the before- and > after- hooks - and any backup software will offer these hooks. > > But now, with the now recommended "non-exclusive low-level backup", > the task is different: now your before-hook needs to do two things > at the same time: > 1. keep a socket open in order to hold the connection to postgres > (because postgres will terminate the backup when the socket is > closed), and > 2. invoke exit(0) (because the actual backup will not start until > the before- hook has properly delivered a successful exit code. > And, that is not only difficult, it is impossible. One would imagine that if the commercial
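[Editor's note: the failure scenario Stephen describes above can be made concrete. In exclusive mode, pg_start_backup() leaves a backup_label file in the live data directory, looking roughly like this (illustrative values):]

```
START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
LABEL: bogus
```

If the machine reboots (or postgres is killed) while this file is present, startup obeys it and tries to replay WAL from the recorded start location; if a checkpoint has meanwhile allowed that older WAL segment to be recycled, the server refuses to start. The quick kill -9 test quoted above does not exercise this, because no checkpoint had removed the needed WAL in between.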
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote: ! The backup solution is? https://www.bareos.com/ ! Fine rant below. Go forth and work your wonders. I don't need to, anymore. I did that, for about 20 years - people I used to work for as a consultant (major banks and insurance shops) would usually run Informix or Oracle. Postgres is just my own private fancy. On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote: ! * Peter (p...@citylink.dinoex.sub.org) wrote: ! > This professional backup solution also offers support for postgres. ! > Sadly, it only covers postgres up to Rel.9, and that piece of software ! > wasn't touched in the last 6 or 7 years. ! ! Then it certainly doesn't work with the changes in v12, and probably has ! other issues, as you allude to. Just having a look at their webpage, something seems to have been updated recently, they now state that they have a new postgres adapter: https://www.bareos.com/en/company_news/postgres-plugin-en1.html Enjoy reading, and tell us what You think. ! > Actually, I am getting very tired of reading that something which can ! > easily be done within 20 lines of shell scripting, would need special ! ! This is just simply false- you can't do it properly in 20 lines of shell ! scripting. Well, Your own docs show how to do it with a one-liner. So please don't blame me for improving that to 20 lines. ! Sure, you can write something that has probably next to no ! error checking, Before judging that, one should first specify what precisely is the demand. In a basic approach, the demand may be to get the logs out on tape in a failsafe automated fashion without any miss, and get the data tree out periodically, and have guaranteed that these files are untampered as on disk. And that can very well be done properly with an incremental filesystem backup software plus some 20 lines of shellscript. 
Now talking about doing an automated restore, or, having some menu-driven solution, or - the worst of all - having a solution that can be operated by morons - that's an entirely different matter.

In my understanding, backup is done via pgdump. The archive logs are for emergencies (data corruption, disaster), only. And emergencies would usually be handled by some professional people who know what they have to do.

You may consider different demands, and that is also fine, but doesn't need to concern me.

! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backup

It is highly unlikely that I never had that happen during 15 years. So what does that mean? If I throw in a pg_start_backup('bogus'), and then restart the cluster, it will not work anymore?

Let's see...
Clean stop/start - no issue whatsoever. (LOG: online backup mode canceled)
kill -9 the whole flock - no issue whatsoever (LOG: database system was interrupted)

I won't pull the plug now, but that has certainly happened lots of times in the past, and also yielded no issue whatsoever - simply because there *never* was *any* issue whatsoever with Postgres (until I got the idea to install the relatively fresh R.12 - but that's understandable).

So maybe this problem exists only on Windows?

And yes, I read that whole horrible discussion, and I could tear my hair out, really, concerning the "deprecated API". I suppose You mean the mentioning in the docs that the "exclusive low-level backup" is somehow deprecated.

This is a very big bad. Because: normally you can run the base backup as a strictly ordinary file-level backup in "full" mode, just as any backup software can do it. You will simply execute the pg_start_backup() and pg_stop_backup() commands in the before- and after-hooks - and any backup software will offer these hooks.
But now, with the now recommended "non-exclusive low-level backup", the task is different: now your before-hook needs to do two things at the same time:
1. keep a socket open in order to hold the connection to postgres (because postgres will terminate the backup when the socket is closed), and
2. invoke exit(0) (because the actual backup will not start until the before-hook has properly delivered a successful exit code).
And, that is not only difficult, it is impossible.

So, what has to be done instead: you need to write a separate network daemon, with the only purpose of holding that connection to postgres open. And that network daemon needs to handle the communication to the backup software on one side, and to postgres on the other side.

And that network daemon then needs the full-blown feature requirements a fail-safe network daemon should have (and that is a LOT!), plus it needs to handle all kinds of possible failures (like network interruptions) in that triangle, during the backup, and properly notify both sides of whatever might be ongoing (and that is NOT trivial).

So yes, this is really a LOT of work. But the point is: this all is not really necessary, because currently
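For what it's worth, the "holder" the two hooks need does not have to be a full network daemon; one common shape is a background subshell that parks the database client on a FIFO. The sketch below is illustrative only and untested against a live cluster - the FIFO path, backup label, and psql flags are assumptions, and SESSION_CMD can be swapped for `cat` to dry-run the plumbing without PostgreSQL.

```shell
#!/bin/sh
# Keep one DB session alive across two independent hook invocations.
CTL=${CTL:-/tmp/pg_backup_ctl}
SESSION_CMD=${SESSION_CMD:-psql -X -At -d postgres}   # 'cat' to dry-run

before_hook() {
    mkfifo "$CTL"
    # The subshell holds fd 3 open read/write on the FIFO, so the client
    # never sees EOF when the short-lived hooks close their writers; the
    # session (and thus the non-exclusive backup) survives the hook's exit(0).
    ( exec 3<>"$CTL"; exec $SESSION_CMD <"$CTL" >"$CTL.log" 2>&1 ) &
    HOLDER_PID=$!
    echo "SELECT pg_start_backup('nightly', false, false);" >"$CTL"
    # the hook can now exit 0; the backup stays in progress in the background
}

after_hook() {
    # finish the backup, then tell the client to quit
    printf '%s\n' "SELECT * FROM pg_stop_backup(false);" '\q' >"$CTL"
    rm -f "$CTL"
}
```

The backup software's before- and after-hooks then reduce to calling these two functions; error handling (timeouts, a crashed holder, a vanished FIFO) is deliberately omitted here and is exactly the part that makes production tools bigger than 20 lines.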
Re: Something else about Redo Logs disappearing
On 6/9/20 12:02 PM, Peter wrote:

On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
!
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all
! > platforms.) "
!
! Yeah. There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough. You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.

It certainly does not need to be "more forceful" - because this is not about behavioural education, like training dogs, horses, or monkeys, and neither do we entertain a BDSM studio. What it needs instead is mention of the magic word "fsync". Because, we already know that - we just need a reminder at the proper place.

Systems integrators are professional people. They are not in need of more beating (spell: forceful education), only of substantial technical hints and information.

! > Generally for peace of mind folks use third party tools like:
!
! +1. Rolling your own archive script is seldom advisable.

Well then, using commercial solutions brings its own problems. E.g.,

FYI, the projects Stephen and I mentioned are Open Source. I'm sure you can get paid support for them, but you exist on a higher plane than that, so you can use them for free.

the people I happened to work for often had problems with torsion, which happens when the solution gets longer than, say, twenty meters, and these are walked at high speeds. They didn't have a problem with scripting - rather the opposite, they were happy with it and paid good money for it.
cheerio, PMc -- Adrian Klaver adrian.kla...@aklaver.com
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
!
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all
! > platforms.) "
!
! Yeah. There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough. You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.

It certainly does not need to be "more forceful" - because this is not about behavioural education, like training dogs, horses, or monkeys, and neither do we entertain a BDSM studio. What it needs instead is mention of the magic word "fsync". Because, we already know that - we just need a reminder at the proper place.

Systems integrators are professional people. They are not in need of more beating (spell: forceful education), only of substantial technical hints and information.

! > Generally for peace of mind folks use third party tools like:
!
! +1. Rolling your own archive script is seldom advisable.

Well then, using commercial solutions brings its own problems. E.g., the people I happened to work for often had problems with torsion, which happens when the solution gets longer than, say, twenty meters, and these are walked at high speeds. They didn't have a problem with scripting - rather the opposite, they were happy with it and paid good money for it.

cheerio, PMc
Re: Something else about Redo Logs disappearing
Greetings,

* Peter (p...@citylink.dinoex.sub.org) wrote:
> This professional backup solution also offers support for postgres.
> Sadly, it only covers postgres up to Rel.9, and that piece of software
> wasn't touched in the last 6 or 7 years.

Then it certainly doesn't work with the changes in v12, and probably has other issues, as you allude to.

> Actually, I am getting very tired of reading that something which can
> easily be done within 20 lines of shell scripting, would need special

This is just simply false- you can't do it properly in 20 lines of shell scripting. Sure, you can write something that has probably next to no error checking, uses the deprecated API that'll cause your systems to fail to start if you ever happen to have a reboot during a backup, and has no way to provide verification that the backup was at all successful after the fact, but that's not what I'd consider a proper solution- instead it's one that'll end up causing you a lot of pain down the road.

Even the shell-script based solution (which I've never used and personally wouldn't really recommend, but to each their own) called 'pitrery' (available here: https://github.com/dalibo/pitrery) is thousands of lines of code.

> Does nobody know anymore how to do proper systems management
> scripting? Using just the basic system tools which have proven to
> work for more than 50 years now!?

I've not met anything I'd call 'proper systems management scripting' that's 20 lines of code, shell script or not.

> ! Not sure about pg_probackup.
>
> Okay, I had a -very short- look into these. Just scanning the
> introductory pages.
>
> The only really interesting thing there is the pg_probackup. These
> folks seem to have found a way to do row-level incremental backups.
pg_probackup doesn't do row-level incremental backups, unless I've missed some pretty serious change in its development, but it does provide page-level, with, as I recall, an extension that didn't get good reception when it was posted and discussed on these mailing lists by other PG hackers. I don't know if those concerns about it have been addressed or not; you might ask the pg_probackup folks if you're considering it as a solution.

> This is fine as long as you do not run any computers, and the only
> application you are using is Postgres.
> But, if you have other applications as well, or have computers, then
> you will need a different backup solution, something that will cover
> your site-wide backup demands, in a consistent fashion (think
> something in the style of ADSM, or nowadays called Spectrum Protect).
>
> And then 90% of the things offered here become superfluous, because
> they are already handled site-wide. And then you will have to
> consider integration of both pieces - and that will most likely be
> more work and more error-prone than just writing a few adapters in
> shell.

pgbackrest's repo can be safely backed up using the simple file-based backup utilities that you're referring to here. I suspect some of the other solutions' backups also could be, but you'd probably want to make sure.

PG generally isn't something that can be backed up using the simple file based backup solutions, as you might appreciate from just considering the number of tools written to specifically deal with the complexity of backing up an online PG cluster.

Thanks,

Stephen
Re: Something else about Redo Logs disappearing
On 6/9/20 10:55 AM, Peter wrote:

On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
!
! On 6/8/20 7:33 PM, Peter wrote:
! >
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
!
! Which is why just following the above command in the docs is:
!
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "

So, what You are basically saying is: my worries are justified and correctly founded, and this is indeed a matter that needs to be taken care of. Thank You.

! Generally for peace of mind folks use third party tools like:
!
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).

Hmja. We may on occasion have a look into these...

! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That looks mostly like the usual things which can be fixed.

Now, for the facts: I am already using a professional backup solution. (It is actually a "dual-use/commercial" solution, of the kind which you can either fetch from github and use without support, or buy with a contract or whatever and get support.)
With this professional backup solution I have already identified 28 (spell: twenty-eight) bugs, and fixed/workarounded these, until I got it properly working.

This professional backup solution also offers support for postgres. Sadly, it only covers postgres up to Rel.9, and that piece of software wasn't touched in the last 6 or 7 years. But the bigger issue there is, that backup solution needs its own postgres database as its backend - and it cannot backup the database it is using. Looks quite pointless to me, then. So I just did it all with shell (and it wasn't many lines).

The backup solution is?

So now, as I've been thru identifying and handling all the issues in that one backup solution, and since it is supposed to handle *all* backup demands (and not only postgres), I will certainly not start and go thru the same process again with one of these supposed solutions, where 90% of the code tries to solve the same things redundantly again, but then only for PG.

They are not "supposed". They are in use by many people/organizations across a wide variety of installations.

Actually, I am getting very tired of reading that something which can easily be done within 20 lines of shell scripting, would need special "solutions", solutions that need to be compiled, solutions that would bring along their own fashion of interpreter, solutions that have a lot of their own dependencies and introduce mainly one thing: new bugs.

They were developed because this could not be done in 20 lines of shell scripting and still work at a reliable level.

Fine rant below. Go forth and work your wonders.

Does nobody know anymore how to do proper systems management scripting? Using just the basic system tools which have proven to work for more than 50 years now!?

! Not sure about pg_probackup.

Okay, I had a -very short- look into these. Just scanning the introductory pages.

The only really interesting thing there is the pg_probackup.
These folks seem to have found a way to do row-level incremental backups. And pgbarman seems to have an impressive understanding of ITIL (in case anybody bothers about that).

All these tools do only cover PG, but do that in every possible regard. This is fine as long as you do not run any computers, and the only application you are using is Postgres. But, if you have other applications as well, or have computers, then you will need a different backup solution, something that will cover your site-wide backup demands, in a consistent fashion (think something in the style of ADSM, or nowadays called Spectrum Protect).

And then 90% of the things offered here become superfluous, because they are already handled site-wide. And then you will have to consider integration of both pieces - and that will most likely be more work and more error-prone than just writing a few adapters in shell.

cheerio, PMc

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Something else about Redo Logs disappearing
On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
!
! On 6/8/20 7:33 PM, Peter wrote:
! >
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
!
! Which is why just following the above command in the docs is:
!
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "

So, what You are basically saying is: my worries are justified and correctly founded, and this is indeed a matter that needs to be taken care of. Thank You.

! Generally for peace of mind folks use third party tools like:
!
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).

Hmja. We may on occasion have a look into these...

! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That looks mostly like the usual things which can be fixed.

Now, for the facts: I am already using a professional backup solution. (It is actually a "dual-use/commercial" solution, of the kind which you can either fetch from github and use without support, or buy with a contract or whatever and get support.)

With this professional backup solution I have already identified 28 (spell: twenty-eight) bugs, and fixed/workarounded these, until I got it properly working.
This professional backup solution also offers support for postgres. Sadly, it only covers postgres up to Rel.9, and that piece of software wasn't touched in the last 6 or 7 years. But the bigger issue there is, that backup solution needs its own postgres database as its backend - and it cannot backup the database it is using. Looks quite pointless to me, then. So I just did it all with shell (and it wasn't many lines).

So now, as I've been thru identifying and handling all the issues in that one backup solution, and since it is supposed to handle *all* backup demands (and not only postgres), I will certainly not start and go thru the same process again with one of these supposed solutions, where 90% of the code tries to solve the same things redundantly again, but then only for PG.

Actually, I am getting very tired of reading that something which can easily be done within 20 lines of shell scripting, would need special "solutions", solutions that need to be compiled, solutions that would bring along their own fashion of interpreter, solutions that have a lot of their own dependencies and introduce mainly one thing: new bugs.

Does nobody know anymore how to do proper systems management scripting? Using just the basic system tools which have proven to work for more than 50 years now!?

! Not sure about pg_probackup.

Okay, I had a -very short- look into these. Just scanning the introductory pages.

The only really interesting thing there is the pg_probackup. These folks seem to have found a way to do row-level incremental backups. And pgbarman seems to have an impressive understanding of ITIL (in case anybody bothers about that).

All these tools do only cover PG, but do that in every possible regard. This is fine as long as you do not run any computers, and the only application you are using is Postgres.
But, if you have other applications as well, or have computers, then you will need a different backup solution, something that will cover your site-wide backup demands, in a consistent fashion (think something in the style of ADSM, or nowadays called Spectrum Protect). And then 90% of the things offered here become superfluous, because they are already handled site-wide. And then you will have to consider integration of both pieces - and that will most likely be more work and more error-prone than just writing a few adapters in shell. cheerio, PMc
Re: Something else about Redo Logs disappearing
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 6/9/20 4:15 AM, Stephen Frost wrote:
> >* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> >>I use pg_backrest, but it does not look promising for running on BSD:
> >>https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> >
> >That's an unfortunately ancient post, really, considering that
> >pgbackrest has now been fully rewritten into C, and Luca as recently as
> >September 2019 was saying he has it working on FreeBSD.
>
> Yeah, but this:
>
> https://github.com/pgbackrest/pgbackrest/issues/686
>
> is not clear on whether a user can do that w/o a certain amount of hand
> holding.

I've asked Luca to update his blog post and/or re-test on FreeBSD and he's said he would. We've moved to using autoconf and friends, and it's all in C now, so it really shouldn't be as much of an issue these days. I recall someone else building on FreeBSD not long ago, but not finding a reference to it offhand.

> >If folks do run into issues with pgbackrest on FreeBSD, please let us
> >know.

... again, this.

Thanks,

Stephen
Re: Something else about Redo Logs disappearing
On 6/9/20 4:15 AM, Stephen Frost wrote: Greetings, * Adrian Klaver (adrian.kla...@aklaver.com) wrote: I use pg_backrest, but it does not look promising for running on BSD: https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html That's an unfortunately ancient post, really, considering that pgbackrest has now been fully rewritten into C, and Luca as recently as September 2019 was saying he has it working on FreeBSD. Yeah, but this: https://github.com/pgbackrest/pgbackrest/issues/686 is not clear on whether a user can do that w/o a certain amount of hand holding. If folks do run into issues with pgbackrest on FreeBSD, please let us know. Thanks, Stephen -- Adrian Klaver adrian.kla...@aklaver.com
Re: Something else about Redo Logs disappearing
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> I use pg_backrest, but it does not look promising for running on BSD:
> https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That's an unfortunately ancient post, really, considering that pgbackrest has now been fully rewritten into C, and Luca as recently as September 2019 was saying he has it working on FreeBSD.

If folks do run into issues with pgbackrest on FreeBSD, please let us know.

Thanks,

Stephen
Re: Something else about Redo Logs disappearing
Adrian Klaver writes: > On 6/8/20 7:33 PM, Peter wrote: >> That "cp" is usually not synchronous. So there is the possibility >> that this command terminates successfully, and reports exitcode zero >> back to the Postgres, and then the Postgres will consider that log >> being safely away. > Which is why just following the above command in the docs is: > "(This is an example, not a recommendation, and might not work on all > platforms.) " Yeah. There have been discussions about changing that disclaimer to be more forceful, because in point of fact a plain "cp" is generally not safe enough. You need to fsync the written file, and on many filesystems you also have to fsync the directory it's in. > Generally for peace of mind folks use third party tools like: +1. Rolling your own archive script is seldom advisable. regards, tom lane
Re: Something else about Redo Logs disappearing
On 6/8/20 7:33 PM, Peter wrote:

Actually, the affair had some good side: as usual I was checking my own designs first and looking for flaws, and indeed I found one: If you do copy out the archive logs not directly to tape, but to some disk area for further processing, then there is an issue with possible loss. If you do it like the docs say, with a command like this:

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' # Unix

That "cp" is usually not synchronous. So there is the possibility that this command terminates successfully, and reports exitcode zero back to the Postgres, and then the Postgres will consider that log being safely away.

Which is why just following the above command in the docs is:

"(This is an example, not a recommendation, and might not work on all platforms.) "

Generally for peace of mind folks use third party tools like:

pg_backrest(https://pgbackrest.org/),
pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
Barman(https://www.pgbarman.org/).

as they offer safety checks for your backups.

I use pg_backrest, but it does not look promising for running on BSD:
https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

Not sure about pg_probackup. Barman is a Python package:
http://docs.pgbarman.org/release/2.10/#installation-from-sources

But the target of the copy may not yet have been written to disk. If at that point a power loss happens, the log may become missing/damaged/incomplete, while the database may or may not consider it done when restarting.

Therefore, mounting such a target filesystem in all-synchronous mode might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")

cheerio, PMc

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Something else about Redo Logs disappearing
Actually, the affair had some good side: as usual I was checking my own designs first and looking for flaws, and indeed I found one: If you do copy out the archive logs not directly to tape, but to some disk area for further processing, then there is an issue with possible loss. If you do it like the docs say, with a command like this:

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' # Unix

That "cp" is usually not synchronous. So there is the possibility that this command terminates successfully, and reports exitcode zero back to the Postgres, and then the Postgres will consider that log being safely away. But the target of the copy may not yet have been written to disk. If at that point a power loss happens, the log may become missing/damaged/incomplete, while the database may or may not consider it done when restarting.

Therefore, mounting such a target filesystem in all-synchronous mode might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")

cheerio, PMc
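The concern above can also be addressed in the archive_command itself rather than at mount time: copy to a temporary name, force the data to stable storage, then rename into place. The helper below is an illustrative sketch, not a recommendation (the path and the bare `sync` are assumptions; `fsync(1)` on FreeBSD or GNU `sync -f` would be a more targeted way to flush just the file and its directory where available).

```shell
#!/bin/sh
# Sketch of a somewhat safer archive_command target:
#   archive_command = 'safe_archive %p /mnt/server/archivedir'
# A non-zero return tells postgres the segment was NOT archived, so it
# keeps the WAL file and retries later.
safe_archive() {
    src=$1; destdir=$2
    dest="$destdir/$(basename "$src")"
    [ -e "$dest" ] && return 1        # never overwrite an archived segment
    cp "$src" "$dest.tmp" || return 1
    sync                              # flush the copied data to disk (blunt but portable)
    mv "$dest.tmp" "$dest" || return 1
    sync                              # flush the rename, i.e. the directory entry
}
```

Only after the final rename has reached the disk does the segment appear under its real name, so a power loss mid-copy leaves at worst a stale `.tmp` file and an unarchived segment that postgres will retry - not a truncated file that looks archived.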