Re: Something else about Redo Logs disappearing

2020-06-16 Thread Laurenz Albe
On Tue, 2020-06-16 at 00:28 +0200, Peter wrote:
> On Mon, Jun 15, 2020 at 09:46:34PM +0200, Laurenz Albe wrote:
> ! On Mon, 2020-06-15 at 19:00 +0200, Peter wrote:
> ! > And that is one of a couple of likely pitfalls I perceived when
> ! > looking at that new API.
> ! 
> ! That is a property of my scripts, *not* of the non-exclusive
> ! backup API...
> 
> Then how would I have figured it before even knowing Your scripts?

By reading the documentation and trying it?

> The API requires such kind of programming, and such kind of
> programming creates dangerous pitfalls. After You have trained some
> hundreds of system administrators, You will also see such things
> right at first glance.

The API can be used by hand, calling two SQL functions.
I have trained some hundreds of administrators, and they could follow.
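
For reference, a minimal by-hand session might look like the following sketch (non-exclusive mode, with the function signatures of the PostgreSQL versions discussed in this thread; the label text is arbitrary, and the session must stay open between the two calls):

```sql
-- Start the backup: label, fast=false, exclusive=false.
-- This connection must remain open for the whole backup.
SELECT pg_start_backup('manual backup', false, false);

-- ... copy the data directory with any file-level tool ...

-- Finish from the SAME session. Returns the would-be backup_label
-- contents; the caller must store them with the backup.
SELECT labelfile, spcmapfile FROM pg_stop_backup(false, true);
```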

> But then it's futile to discuss with religious people, as they are
> blind to reasoning: AL II. 58, "The slaves shall serve."

Suspicion becomes certainty:
You are not out to understand, but to troll.

Bye.

Laurenz Albe





Re: Something else about Redo Logs disappearing

2020-06-16 Thread Peter
On Mon, Jun 15, 2020 at 09:46:34PM +0200, Laurenz Albe wrote:
! On Mon, 2020-06-15 at 19:00 +0200, Peter wrote:
! > And that is one of a couple of likely pitfalls I perceived when
! > looking at that new API.
! 
! That is a property of my scripts, *not* of the non-exclusive
! backup API...

Then how would I have figured it before even knowing Your scripts?

The API requires such kind of programming, and such kind of
programming creates dangerous pitfalls. After You have trained some
hundreds of system administrators, You will also see such things
right at first glance.

But then it's futile to discuss with religious people, as they are
blind to reasoning: AL II. 58, "The slaves shall serve."


cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-16 Thread Peter
On Sun, Jun 14, 2020 at 03:05:15PM +0200, Magnus Hagander wrote:

! > You can see that all the major attributes (scheduling, error-handling,
! > signalling, ...) of a WAL backup are substantially different to that
! > of any usual backup.
! 
! > This is a different *Class* of backup object, therefore it needs an
! > appropriate infrastructure that can handle these attributes correctly.
! >
! 
! Yes, this is *exactly* why special-handling the WAL during the base backup
! makes a lot of sense.

Certainly. Only I prefer to do the special-handling *outside of* the
base backup.

! Is it required? No.
! Will it make your backups more reliable? Yes.

*shrug* I have no benefit in increasing reliability from 250% to 330%,
if that would be the case at all.

! > But, if You never have considered *continuous* archiving, and only
! > intend to take a functional momentary backup of a cluster, then You
! > may well have never noticed these differences. I noticed them mainly
! > because I did *BUILD* such an infrastructure (the 20 lines of shell
! > script, you know).
! >
! 
! Yes, if you take a simplistic view of your backups, then yes.

You appear to sound like an insurance salesman who desperately tries
to sell a third health insurance policy to somebody who already has
two of them, by trying to build on unfounded precariousness.

! > ! There is *absolutely* no need for threading to use the current APIs. You
! > ! need to run one query, go do something else, and then run another
! > ! query.
! >
! > Wrong. The point is, I don't want to "go do something else", I have to
! > exit() and get back to the initiator at that place.
! >
! 
! That is not a requirement of the current PostgreSQL APIs.

We'll be done with that whole API in a few more lines now. (I'm getting
tired of this.)

! (in fact, using
! threading would add a significant extra burden there, as libpq does not
! allow sharing of connections between threads)

I never said one would need to thread the DB connections.

! That is a requirement, and indeed a pretty sharp limitation, of the *other*
! APIs you are working with, it sounds like.

What "other"?

! The PostgreSQL APIs discussed do *not* require you to do an exit(). Nor do
! they require any form of threading.

Ah, nice try! But, we're *NOT* shifting blame around. We do instead
get things working. We do proper engineering.
 
! And the fact that you need to do an exit() would negate any threading
! anyway, so that seems to be a false argument regardless.

You do know exactly what I'm talking about.

! > This is also clearly visible in Laurenz' code: he utilizes two
! > unchecked background tasks (processes, in this case) with loose
! > coupling for the purpose, as it does not work otherwise.
! >
! 
! Yes, because he is also trying to work around a severely limited API *on
! the other side*.

There is no "other" side. There is only *one* side: to get things
working. And for interaction, Jon Postel's law applies:

  Be conservative in what you provide, and liberal in what you require.

This is how the Internet was built. The modern-day linux-youngsters
tend to forget that we all stand on the shoulders of giants.

! > The most interesting point in there appears to be this:
! >   > that the backup label and tablespace map files are not written to
! >   > disk. Instead, their would-be contents are returned in *labelfile
! >   > and *tblspcmapfile,
! >
! > This is in do_pg_start_backup() - so we actually HAVE this data
! > already at the *START* time of the backup!
! 
! 
! > Then why in hell do we wait until the END of the backup before we
! > hand this data to the operator: at a time when the DVD with the
! >
! 
! Because it cannot be safely written *into the data directory*.
! 
! Now, it could be written *somewhere else*, that is true. And then you would
! add an extra step at restore time to rename it back. But then your restore
! would now also require a plugin.

Yes, and as it is now, it requires girl Friday to fetch them from
the line-printer and mix them up - which, as we already got explained,
can end up a *lot* worse. Or, equivalently and as here practically
demonstrated, some consultant trainee writing some script which, when
accidentally invoked twice, creates an inconsistent backup, and
this being invisible to the operator. That's indeed dangerous enough
for my taste.

But let's take this from the start:
Yes, I didn't trust the docs. Because people here are so adamant
that the old API is troublesome and dangerous and must be
deprecated, and the whole thing so imminent, there should
be some REASON for that. And from the docs I could not see any reason
 - so I supposed there must be something else in pg_start_backup();
something that is not explained in the docs, and that would explain
the whole fuss.

But, in fact, there is no such thing.

First, the backup_label, which should not stay in the running cluster
tree. So, what harm is done when it stays there? Nothing at all.
The cluster 

Re: Something else about Redo Logs disappearing

2020-06-15 Thread Laurenz Albe
On Mon, 2020-06-15 at 19:00 +0200, Peter wrote:
> And that is one of a couple of likely pitfalls I perceived when
> looking at that new API.

That is a property of my scripts, *not* of the non-exclusive
backup API...

> We could fix that, but that will then get more complicated - and
> people will usually not do that. And that's why I consider that
> new API as rather dangerous.

... so this is moot.

Yours,
Laurenz Albe





Re: Something else about Redo Logs disappearing

2020-06-15 Thread Peter
On Mon, Jun 15, 2020 at 03:19:29PM +0200, Laurenz Albe wrote:
! On Mon, 2020-06-15 at 14:50 +0200, Peter wrote:
! > ! An example:
! > ! 
! > ! - Backup #1 calls "pgpre.sh"
! > ! - Backup #1 starts copying files
! > ! - Backup #2 calls "pgpre.sh".
! > !   This will cancel the first backup.
! > ! - Backup #1 completes copying files.
! > ! - Backup #1 calls "pgpost.sh".
! > !   It will receive an error.
! > !   So it has to invalidate the backup.
! > ! - Backup #2 completes copying files.
! > ! - Backup #2 calls "pgpost.sh".
! > !   It gets a "backup_label" file and completes the backup.
! > 
! > That's not true.
! 
! Ah, yes, you are right.

Thank You.

! Since "pgpre.sh" and "pgpost.sh" are independent, there
! is no way to tell which of them belongs to which other.

Correct.
 
! So calling "pgpost.sh" indeed ends the most recently started
! backup and returns "backup_label" accordingly.
! 
! That means: the caller of the scripts has to make sure
! not to start a second backup while the first one is running.

Never run two backups in parallel with such an approach, exactly.

And that is one of a couple of likely pitfalls I perceived when
looking at that new API.

We could fix that, but that will then get more complicated - and
people will usually not do that. And that's why I consider that
new API as rather dangerous.


cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-15 Thread Laurenz Albe
On Mon, 2020-06-15 at 14:50 +0200, Peter wrote:
> ! An example:
> ! 
> ! - Backup #1 calls "pgpre.sh"
> ! - Backup #1 starts copying files
> ! - Backup #2 calls "pgpre.sh".
> !   This will cancel the first backup.
> ! - Backup #1 completes copying files.
> ! - Backup #1 calls "pgpost.sh".
> !   It will receive an error.
> !   So it has to invalidate the backup.
> ! - Backup #2 completes copying files.
> ! - Backup #2 calls "pgpost.sh".
> !   It gets a "backup_label" file and completes the backup.
> 
> That's not true.

Ah, yes, you are right.
It's a while since I wrote these scripts.

Since "pgpre.sh" and "pgpost.sh" are independent, there
is no way to tell which of them belongs to which other.

So calling "pgpost.sh" indeed ends the most recently started
backup and returns "backup_label" accordingly.

That means: the caller of the scripts has to make sure
not to start a second backup while the first one is running.
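
A minimal caller-side guard is easy to add. This is my own illustration, not part of the scripts; the lock path and function names are invented:

```shell
# Sketch: mutual exclusion for the backup driver. mkdir is atomic,
# so only one concurrent caller can create the lock directory.
LOCKDIR="${TMPDIR:-/tmp}/pg_backup.lock"

acquire_backup_lock() {
    mkdir "$LOCKDIR" 2>/dev/null    # fails if a backup already holds the lock
}

release_backup_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# A driver would wrap pgpre.sh/pgpost.sh like so:
#   acquire_backup_lock || { echo "backup already running" >&2; exit 1; }
#   trap release_backup_lock EXIT
#   ./pgpre.sh ... copy files ... ./pgpost.sh
```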

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Something else about Redo Logs disappearing

2020-06-15 Thread Peter
On Mon, Jun 15, 2020 at 11:44:33AM +0200, Laurenz Albe wrote:
! On Sat, 2020-06-13 at 19:48 +0200, Peter wrote:
! > ! >  4. If, by misconfiguration and/or operator error, the backup system
! > ! > happens to start a second backup in parallel to the first,
! > ! > then do I correctly assume, both backups will be rendered
! > ! > inconsistent while this may not be visible to the operator; and
! > ! > the earlier backup would be flagged as apparently successful while
! > ! > carrying the wrong (later) label?
! > ! 
! > ! If you are using my scripts and start a second backup while the first
! > ! one is still running, the first backup will be interrupted.
! > 
! > This is not what I am asking. It appears correct to me, that, on
! > the database, the first backup will be interrupted. But on the
! > tape side, this might go unnoticed, and on completion it will
! > successfully receive the termination code from the *SECOND*
! > backup - which means that on tape we will have a seemingly
! > successful backup, which
! >  1. is corrupted, and
! >  2. carries a wrong label.
! 
! That will only happen if the backup that uses my scripts does the
! wrong thing.

Yes. Occasionally software does the wrong thing, it's called "bugs".

! An example:
! 
! - Backup #1 calls "pgpre.sh"
! - Backup #1 starts copying files
! - Backup #2 calls "pgpre.sh".
!   This will cancel the first backup.
! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".
!   It will receive an error.
!   So it has to invalidate the backup.
! - Backup #2 completes copying files.
! - Backup #2 calls "pgpost.sh".
!   It gets a "backup_label" file and completes the backup.

That's not true.


Now let me see how to compile a bash... and here we go:

! An example:
! 
! - Backup #1 calls "pgpre.sh"

> $ ./pgpre.sh
> backup starting location: 1/C858
> $

We now have:
> 24129 10  SJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24130 10  SJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24131 10  SJ   0:00.01 psql -Atq
> 24158 10  SCJ  0:00.00 sleep 5

And:
> postgres=# \d
>  List of relations
>  Schema |  Name  | Type  |  Owner   
> ++---+--
>  public | backup | table | postgres
> (1 row)
>  
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map 
> +-+---+--+
>   1 | running | 24132 |  | 
> (1 row)

! - Backup #1 starts copying files

Let's suppose it does now.

! - Backup #2 calls "pgpre.sh".

> $ ./pgpre.sh
> backup starting location: 1/C924
> $ FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
> Backup failed
> ./pgpre.sh: line 93: ${PSQL[1]}: ambiguous redirect
> 
> $ echo $?
> 0

!   This will cancel the first backup.

Yes, it seems it did:

> 25279 10  SJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25280 10  IWJ  0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25281 10  SJ   0:00.01 psql -Atq
> 25402 10  SCJ  0:00.00 sleep 5

> postgres=# \d
>  List of relations
>  Schema |  Name  | Type  |  Owner   
> ++---+--
>  public | backup | table | postgres
> (1 row)
> 
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map 
> +-+---+--+
>   1 | running | 25282 |  | 
> (1 row)

! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".

> $ ./pgpost.sh 
> START WAL LOCATION: 1/C924 (file 0001000100C9)
> CHECKPOINT LOCATION: 1/C958
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2020-06-15 14:09:41 CEST
> LABEL: 2020-06-15 14:09:40
> START TIMELINE: 1
>
> $ echo $?
> 0

!   It will receive an error.
!   So it has to invalidate the backup.

Where is the error?

What we now have is this:
No processes anymore.

>  id |  state   |  pid  |                  backup_label                   | tablespace_map 
> ----+----------+-------+-------------------------------------------------+----------------
>   1 | complete | 25282 | START WAL LOCATION: 1/C924 (file 0001000100C9) +| 
>     |          |       | CHECKPOINT LOCATION: 1/C958                    +| 
>     |          |       | BACKUP METHOD: streamed                        +| 
>     |          |       | BACKUP FROM: master                            +| 
>     |          |       | START TIME: 2020-06-15 14:09:41 CEST           +| 
>     |          |       | LABEL: 2020-06-15 14:09:40                     +| 
>     |          |       | START TIMELINE: 1                              +| 
>     |          |       |                                                 | 

Re: Something else about Redo Logs disappearing

2020-06-15 Thread Laurenz Albe
On Sat, 2020-06-13 at 19:48 +0200, Peter wrote:
> ! >  4. If, by misconfiguration and/or operator error, the backup system
> ! > happens to start a second backup in parallel to the first,
> ! > then do I correctly assume, both backups will be rendered
> ! > inconsistent while this may not be visible to the operator; and
> ! > the earlier backup would be flagged as apparently successful while
> ! > carrying the wrong (later) label?
> ! 
> ! If you are using my scripts and start a second backup while the first
> ! one is still running, the first backup will be interrupted.
> 
> This is not what I am asking. It appears correct to me, that, on
> the database, the first backup will be interrupted. But on the
> tape side, this might go unnoticed, and on completion it will
> successfully receive the termination code from the *SECOND*
> backup - which means that on tape we will have a seemingly
> successful backup, which
>  1. is corrupted, and
>  2. carries a wrong label.

That will only happen if the backup that uses my scripts does the
wrong thing.

An example:

- Backup #1 calls "pgpre.sh"
- Backup #1 starts copying files
- Backup #2 calls "pgpre.sh".
  This will cancel the first backup.
- Backup #1 completes copying files.
- Backup #1 calls "pgpost.sh".
  It will receive an error.
  So it has to invalidate the backup.
- Backup #2 completes copying files.
- Backup #2 calls "pgpost.sh".
  It gets a "backup_label" file and completes the backup.

So the only way that something can go wrong would be if
backup #1 somehow does *not* invalidate the backup.

> ! This is specific to my scripts, PostgreSQL's non-exclusive backup
> ! can perform more than one concurrent backup successfully.
> ! I tried to keep things simple.
> 
> I understand. But the operator may not know that and/or accidentally
> start a second backup while one is still running. And this will then
> result in ...
> 
> ! If you have the wrong "backup_label", you end up with silent data 
> corruption.
> 
>  ... this.

Of course, if you do arbitrary nonsense like restoring a
backup without "backup_label", you will get arbitrary data
corruption.

It is a fundamental principle that, apart from "backup_label", there
is no way to tell a backup from a crashed data directory.
Any backup/recovery software must refuse to use a backup that lacks that file.
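
That refusal rule is mechanical enough to sketch; the function name and messages here are mine, not taken from any existing tool:

```shell
# Sketch: without backup_label, a restored copy is indistinguishable
# from a crashed data directory, so a restore tool must bail out.
backup_is_usable() {
    # $1: path to the restored data directory
    if [ -f "$1/backup_label" ]; then
        return 0
    fi
    echo "refusing to recover $1: backup_label is missing" >&2
    return 1
}
```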

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Something else about Redo Logs disappearing

2020-06-14 Thread Magnus Hagander
On Sat, Jun 13, 2020 at 10:13 PM Peter wrote:

> On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote:
> ! > Okay. So lets behave like professional people and figure how that
> ! > can be achieved:
> ! > At first, we drop that WAL requirement, because with WAL archiving
> ! > it is already guaranteed that an unbroken chain of WAL is always
> ! > present in the backup (except when we have a bug like the one that
> ! > lead to this discussion).
> ! > So this is **not part of the scope**.
> ! >
> !
> ! I would assume that anybody who deals with backups professionally
> wouldn't
> ! consider that out of scope,
>
> I strongly disagree. I might suppose You haven't thought this to the
> proper end. See:
>

You may disagree, but I would argue that this is because you are the one
who has not thought it through. But hey, let's agree to disagree.


> You can see that all the major attributes (scheduling, error-handling,
> signalling, ...) of a WAL backup are substantially different to that
> of any usual backup.

> This is a different *Class* of backup object, therefore it needs an
> appropriate infrastructure that can handle these attributes correctly.
>

Yes, this is *exactly* why special-handling the WAL during the base backup
makes a lot of sense.

Is it required? No.
Will it make your backups more reliable? Yes.

But it depends on what your priorities are.


> But, if You never have considered *continuous* archiving, and only
> intend to take a functional momentary backup of a cluster, then You
> may well have never noticed these differences. I noticed them mainly
> because I did *BUILD* such an infrastructure (the 20 lines of shell
> script, you know).
>

Yes, if you take a simplistic view of your backups, then yes.


> And yes, I was indeed talking about *professional* approaches.
>

Sure.



> ! There is *absolutely* no need for threading to use the current APIs. You
> ! need to run one query, go do something else, and then run another
> ! query.
>
> Wrong. The point is, I don't want to "go do something else", I have to
> exit() and get back to the initiator at that place.
>

That is not a requirement of the current PostgreSQL APIs. (in fact, using
threading would add a significant extra burden there, as libpq does not
allow sharing of connections between threads)

That is a requirement, and indeed a pretty sharp limitation, of the *other*
APIs you are working with, it sounds like.

The PostgreSQL APIs discussed do *not* require you to do an exit(). Nor do
they require any form of threading.

And the fact that you need to do an exit() would negate any threading
anyway, so that seems to be a false argument regardless.


> This is also clearly visible in Laurenz' code: he utilizes two
> unchecked background tasks (processes, in this case) with loose
> coupling for the purpose, as it does not work otherwise.
>

Yes, because he is also trying to work around a severely limited API *on
the other side*.

There's plenty of backup integrations that don't have this limitation. They
all work perfectly fine with no need for exit() and certainly no weird need
for special threading.


> The most interesting point in there appears to be this:
>   > that the backup label and tablespace map files are not written to
>   > disk. Instead, their would-be contents are returned in *labelfile
>   > and *tblspcmapfile,
>
> This is in do_pg_start_backup() - so we actually HAVE this data
> already at the *START* time of the backup!


> Then why in hell do we wait until the END of the backup before we
> hand this data to the operator: at a time when the DVD with the
>

Because it cannot be safely written *into the data directory*.

Now, it could be written *somewhere else*, that is true. And then you would
add an extra step at restore time to rename it back. But then your restore
would now also require a plugin.




> backup is already fixated and cannot be changed anymore, so that
>

You don't need to change the backup, only append to it. If you are
calling pg_stop_backup() at a time when that is no longer possible, then
you are calling pg_stop_backup() at the wrong time.


> As I can read, there is no difference in the function requirements
> between exclusive and non-exclusive mode, in that regard: the
> backup-label file is NOT necessary in the running cluster data tree,
> BUT it should get into the RESTORED data tree before starting it.
>

Correct. It is in fact actively harmful in the running cluster data tree.


> And I can't find a single one of those "big problems". What I do find
> is just people whining that their cluster doesn't start and they can't
> simply delete a file, even if told so. Like soldier complaining that
> his gun doesn't shoot and he has no idea how to reload.
>

Have you actually tried it? Or dealt with the many people who have run into
corruption around this?

Again, as suggested before, review the discussions that led up to the
changes. There are plenty of examples there.


! > I now hope very much that 

Re: Something else about Redo Logs disappearing

2020-06-13 Thread Peter
On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote:
! > Okay. So lets behave like professional people and figure how that
! > can be achieved:
! > At first, we drop that WAL requirement, because with WAL archiving
! > it is already guaranteed that an unbroken chain of WAL is always
! > present in the backup (except when we have a bug like the one that
! > lead to this discussion).
! > So this is **not part of the scope**.
! >
! 
! I would assume that anybody who deals with backups professionally wouldn't
! consider that out of scope,

I strongly disagree. I might suppose You haven't thought this to the
proper end. See:

The WAL backup is, for a couple of reasons, different to most other
backup objects:
 - it runs on demand, not per schedule.
 - it must be immediately repeated (or alerted), until success,
   otherwise the database might stop.
 - it might use a different storage (e.g. twofold/mirrored),
   for obvious reasons.
 - it gets deleted after backup, and that is a point-of-no-return.

For the data tree backup, on the contrary, all these attributes are
identical to those of any other incr/full filesystem backup.

You can see that all the major attributes (scheduling, error-handling,
signalling, ...) of a WAL backup are substantially different to that
of any usual backup.
This is a different *Class* of backup object, therefore it needs an
appropriate infrastructure that can handle these attributes correctly.

The WAL do *not belong* into the scope of the data tree backup, they
are to be handled by different schemes in a different way.
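
As a concrete illustration of those attributes, an archive_command helper might look like the following sketch (all names and paths are my own; the copy-then-rename is the usual trick so the archive never exposes a half-written segment, and any non-zero exit makes PostgreSQL retry the segment until success):

```shell
# Hypothetical archive_command helper: fail loudly on any error so
# PostgreSQL keeps retrying, and rename atomically so readers never
# see a partially copied WAL file.
archive_one_wal() {
    # $1: full path of the WAL segment (%p)
    # $2: the segment's file name (%f)
    # $3: archive directory
    cp "$1" "$3/$2.tmp" || return 1
    mv "$3/$2.tmp" "$3/$2" || return 1   # atomic within one filesystem
}
```

A production version would also fsync the file and directory before reporting success; installed as a script, it would be configured along the lines of `archive_command = '/usr/local/bin/archive_one_wal.sh %p %f /backup/wal'`.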

But, if You never have considered *continuous* archiving, and only
intend to take a functional momentary backup of a cluster, then You
may well have never noticed these differences. I noticed them mainly
because I did *BUILD* such an infrastructure (the 20 lines of shell
script, you know).

And yes, I was indeed talking about *professional* approaches.


! There is *absolutely* no need for threading to use the current APIs. You
! need to run one query, go do something else, and then run another
! query.

Wrong. The point is, I don't want to "go do something else", I have to
exit() and get back to the initiator at that place.

This is also clearly visible in Laurenz' code: he utilizes two
unchecked background tasks (processes, in this case) with loose
coupling for the purpose, as it does not work otherwise.


! But then, lets think another step forward: for what purpose do we
! > actually need to call pg_start_backup() and pg_stop_backup() at all?
! > I couldn't find exhaustive information about that, only some partial
! > facts.
! >
! 
! Since you don't trust the documentation, I suggest you take a look at
! 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=55cac186dc71fcc2f4628f9974b30850bb51eb5d;hb=92c58fd94801dd5c81ee20e26c5bb71ad64552a8#l10438

Thanks, this indeed gives a most entertaining read. Honestly, I've
never seen such a source before - there is thrice as much story in it
as code. :)

The most interesting point in there appears to be this:
  > that the backup label and tablespace map files are not written to
  > disk. Instead, their would-be contents are returned in *labelfile
  > and *tblspcmapfile,

This is in do_pg_start_backup() - so we actually HAVE this data
already at the *START* time of the backup!

Then why in hell do we wait until the END of the backup before we
hand this data to the operator: at a time when the DVD with the
backup is already fixated and cannot be changed anymore, so that
we have to send this data to quaternary storage (spell: the line
printer) and have girl Friday sort them out (and probably mix them
up)??

Is this indeed Boeing-security: try to avoid possible mistake by
creating a much bigger risk?


! Things that remain to be figured out:
! >  1. What does pg_start_backup actually do and why would that be
! > necessary? I could not find exhaustive information, but this can
! > probably figured from the source. Currently I know so much:
! >  - it writes a backup_label file. That is just a few lines of
! >ASCII and should not be difficult to produce.
! >
! 
! It does that only in exclusive mode, and doing that is one of the big
! problems with exclusive mode. So don't do that.

As I can read, there is no difference in the function requirements
between exclusive and non-exclusive mode, in that regard: the
backup-label file is NOT necessary in the running cluster data tree,
BUT it should get into the RESTORED data tree before starting it.

And I can't find a single one of those "big problems". What I do find
is just people whining that their cluster doesn't start and they can't
simply delete a file, even when told so. Like a soldier complaining that
his gun doesn't shoot while he has no idea how to reload.

I could devise a dozen ways how to detect that situation automatically
during rc.d and remove the file - if it were of any concern to me.


! > I now 

Re: Something else about Redo Logs disappearing

2020-06-13 Thread Peter
On Sat, Jun 13, 2020 at 01:53:28PM +0200, Laurenz Albe wrote:

! > I've never seen anybody coding bash - it is strongly shunned in the
! > Berkeley community. 
! 
! Strange, but then I don't move in these circles.

Never mind.

! > Some Questions:
! >  1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
! > To where are these written?
! 
! Standard error.  It is up to the caller of the script to route that
! somewhere useful.

Understood.

! >  2. The result data from pg_stop_backup() are stored into the living
! > database. But, according to the docs, they should be placed into
! > the completed backup. Do I have a misunderstanding here?
! 
! Right, but these scripts don't know anything about that backup itself.
! They are designed to be called before and after the backup.
! In between, you back up the data directory however you think fit.
! 
! It is the responsibility of the caller of the post-backup script
! to add the "backup_label" file to the backup.

I see.

! >  4. If, by misconfiguration and/or operator error, the backup system
! > happens to start a second backup in parallel to the first,
! > then do I correctly assume, both backups will be rendered
! > inconsistent while this may not be visible to the operator; and
! > the earlier backup would be flagged as apparently successful while
! > carrying the wrong (later) label?
! 
! If you are using my scripts and start a second backup while the first
! one is still running, the first backup will be interrupted.

This is not what I am asking. It appears correct to me, that, on
the database, the first backup will be interrupted. But on the
tape side, this might go unnoticed, and on completion it will
successfully receive the termination code from the *SECOND*
backup - which means that on tape we will have a seemingly
successful backup, which
 1. is corrupted, and
 2. carries a wrong label.

! This is specific to my scripts, PostgreSQL's non-exclusive backup
! can perform more than one concurrent backup successfully.
! I tried to keep things simple.

I understand. But the operator may not know that and/or accidentally
start a second backup while one is still running. And this will then
result in ...

! If you have the wrong "backup_label", you end up with silent data corruption.

 ... this.

Indeed this is difficult to avoid, because the high risk of silent
data corruption is an elementary architectural feature of the so-called
"new API". Which is why this is not going to run on my ship.

But you will have to wait - the to-be-expected inrush of service sales
due to corrupted clusters will only happen after R.13 is active and
people are *forced* to cope with that "new API".

Thanks for the effort of answering my questions.

cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-13 Thread Laurenz Albe
On Sat, 2020-06-13 at 05:24 +0200, Peter wrote:
> ! https://github.com/cybertec-postgresql/safe-backup
> 
> Ah, thank You, very nice.
> 
> I've never seen anybody coding bash - it is strongly shunned in the
> Berkeley community. 

Strange, but then I don't move in these circles.

> Some Questions:
>  1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
> To where are these written?

Standard error.  It is up to the caller of the script to route that
somewhere useful.

>  2. The result data from pg_stop_backup() are stored into the living
> database. But, according to the docs, they should be placed into
> the completed backup. Do I have a misunderstanding here?

Right, but these scripts don't know anything about that backup itself.
They are designed to be called before and after the backup.
In between, you back up the data directory however you think fit.

It is the responsibility of the caller of the post-backup script
to add the "backup_label" file to the backup.
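
In script form, the calling sequence described here might look like this sketch (pgpre.sh/pgpost.sh are the scripts under discussion; the backup directory, the tar step, and the helper names are my own illustration):

```shell
# Hypothetical backup driver: the label printed by the post-backup
# step is stored alongside the backup, ready for restore time.
store_backup_label() {
    # $1: backup directory; stdin: backup_label contents
    cat > "$1/backup_label"
}

run_backup() {
    # $1: backup directory (assumed to exist)
    ./pgpre.sh                                  # begin the non-exclusive backup
    tar -C "$PGDATA" -cf "$1/base.tar" .        # file-level copy of the data directory
    ./pgpost.sh | store_backup_label "$1"       # end backup; keep the label WITH the backup
}
```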

>  3. The most common error cause of a backup might be tape-side
> malfunction. So far as I see, the way to handle this is currently,
> to provide a timeout for pgpre.sh (which is problematic, because
> we might as well have just reached end-of-tape and have to wait
> until monday for the operator to change it). May I suggest to add
a switch to pgpost.sh, in order to voluntarily fail out of the job?

As said above, the backup itself is not the job of these scripts.

>  4. If, by misconfiguration and/or operator error, the backup system
> happens to start a second backup in parallel to the first,
> then do I correctly assume, both backups will be rendered
> inconsistent while this may not be visible to the operator; and
> the earlier backup would be flagged as apparently successful while
> carrying the wrong (later) label?

If you are using my scripts and start a second backup while the first
one is still running, the first backup will be interrupted.

This is specific to my scripts, PostgreSQL's non-exclusive backup
can perform more than one concurrent backup successfully.
I tried to keep things simple.

> BTW: what does, in general, happen, if a backup_label file gets
> accidentally swapped with one from a parallel, but slightly later
> backup? Do I correctly assume that such a mistake gets somehow detected,
> as otherwise it would have just the same unwelcome effects
> (i.e. silent data corruption) as no backup_label at all?

If you have the wrong "backup_label", you end up with silent data corruption.

That should not happen with my scripts, though.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Something else about Redo Logs disappearing

2020-06-13 Thread Peter
On Thu, Jun 11, 2020 at 10:53:15PM +0200, Laurenz Albe wrote:
! On Thu, 2020-06-11 at 22:35 +0200, Magnus Hagander wrote:
! > I believe somebody around that time also wrote a set of bash scripts that
! > can be used in a pre/post-backup-job combination with the current APIs.
! 
! https://github.com/cybertec-postgresql/safe-backup

Ah, thank You, very nice.

I've never seen anybody coding bash - it is strongly shunned in the
Berkeley community. 

Some Questions:
 1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
To where are these written?
 2. The result data from pg_stop_backup() are stored into the living
database. But, according to the docs, they should be placed into
the completed backup. Do I have a misunderstanding here?
 3. The most common error cause of a backup might be tape-side
malfunction. So far as I see, the way to handle this currently is
to provide a timeout for pgpre.sh (which is problematic, because
we might as well have just reached end-of-tape and have to wait
until Monday for the operator to change it). May I suggest adding
a switch to pgpost.sh, in order to voluntarily fail out of the job?
 4. If, by misconfiguration and/or operator error, the backup system
happens to start a second backup in parallel to the first,
then do I correctly assume, both backups will be rendered
inconsistent while this may not be visible to the operator; and
the earlier backup would be flagged as apparently successful while
carrying the wrong (later) label?

BTW: what does, in general, happen, if a backup_label file gets
accidentally swapped with one from a parallel, but slightly later
backup? Do I correctly assume that such a mistake gets somehow detected,
as otherwise it would have just the same unwelcome effects
(i.e. silent data corruption) as no backup_label at all?

cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-12 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 6/9/20 4:15 AM, Stephen Frost wrote:
> >* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> >>I use pg_backrest, but it does not look promising for running on BSD:
> >>https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> >
> >That's an unfortunately ancient post, really, considering that
> >pgbackrest has now been fully rewritten into C, and Luca as recently as
> >September 2019 was saying he has it working on FreeBSD.
> 
> Yeah, but this:
> 
> https://github.com/pgbackrest/pgbackrest/issues/686
> 
> is not clear on whether a user can do that w/o a certain amount of hand
> holding.

Luca (at my asking) wrote up a new blog post to clarify things, you can
see it here:

https://fluca1978.github.io/2020/06/12/pgbackrestOnFreeBSD.html

No particular hand holding needed.

Thanks,

Stephen




Re: Something else about Redo Logs disappearing

2020-06-11 Thread Laurenz Albe
On Thu, 2020-06-11 at 22:35 +0200, Magnus Hagander wrote:
> I believe somebody around that time also wrote a set of bash scripts that can 
> be used in a pre/post-backup-job combination with the current APIs.

https://github.com/cybertec-postgresql/safe-backup

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Something else about Redo Logs disappearing

2020-06-11 Thread Magnus Hagander
On Thu, Jun 11, 2020 at 10:13 PM Peter  wrote:

>
> Okay. So let's behave like professional people and figure how that
> can be achieved:
> At first, we drop that WAL requirement, because with WAL archiving
> it is already guaranteed that an unbroken chain of WAL is always
> present in the backup (except when we have a bug like the one that
> led to this discussion).
> So this is **not part of the scope**.
>

I would assume that anybody who deals with backups professionally wouldn't
consider that out of scope, but sure, for the sake of argument, let's do
that.


> ! This is only one option though, there are others- you can also use
> ! pgbackrest to push your backups to s3 (or any s3-compatible data storage
> ! system, which includes some backup systems), and we'll be adding
> ! support
>
> ! I concur that this is becoming a madhouse, and is pushing past the limit
> ! for what I'm willing to deal with when trying to assist someone.
>
> Well, then that might be a misconception. I'm traditionally a
> consultant, and so I am used to *evaluate* solutions. I don't need
> assistance for that, I only need precise technical info.
>

Excellent. Then let's stick to that.


> This STILL needs threaded programming (as I said, there is no way to
> avoid that with those "new API"), but in this case it is effectively
> reduced to just grab the return-code of some program that has been
> started with "&".
>

There is *absolutely* no need for threading to use the current APIs. You
need to run one query, go do something else, and then run another query.
It's 100% sequential, so there is zero need for threads. Now, if you're
stuck in shellscript, it's a little more complicated. But it does not need
threading.
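The purely sequential shape Magnus describes can be sketched in a few lines. The sketch below keeps the database access abstract (`run_sql` stands in for, e.g., executing on one long-lived psycopg2 connection, since the non-exclusive backup is tied to that session); the function name and structure are illustrative, not any particular tool's API:

```python
def non_exclusive_backup(run_sql, copy_data_directory):
    """Sequential sketch of the non-exclusive low-level backup protocol.

    run_sql must execute statements on ONE long-lived session, because
    PostgreSQL ends the backup if that session closes.  Note there are
    no threads: the three steps run strictly one after another.
    """
    # step 1: begin the backup (third argument false = non-exclusive)
    run_sql("SELECT pg_start_backup('label', false, false)")
    # step 2: copy the data directory however you see fit
    copy_data_directory()
    # step 3: end the backup; the labelfile must be stored by the caller
    labelfile = run_sql("SELECT labelfile FROM pg_stop_backup(false)")
    return labelfile
```

The only real constraint is keeping one session alive across the copy, which any ordinary database client library does for free; threading only enters the picture when the orchestration is forced into shell.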


> But then, let's think another step forward: for what purpose do we
> actually need to call pg_start_backup() and pg_stop_backup() at all?
> I couldn't find exhaustive information about that, only some partial
> facts.
>

Since you don't trust the documentation, I suggest you take a look at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=55cac186dc71fcc2f4628f9974b30850bb51eb5d;hb=92c58fd94801dd5c81ee20e26c5bb71ad64552a8#l10438

It has a fair amount of detail of the underlying reasons, and of course
links to all the details.


> Things that remain to be figured out:
>  1. What does pg_start_backup actually do and why would that be
> necessary? I could not find exhaustive information, but this can
> probably figured from the source. Currently I know so much:
>  - it writes a backup_label file. That is just a few lines of
>ASCII and should not be difficult to produce.
>

It does that only in exclusive mode, and doing that is one of the big
problems with exclusive mode. So don't do that.



> I now hope very much that Magnus Hagander will tell some of the
> impeding "failure scenarios", because I am getting increasingly
> tired of pondering about probable ones, and searching the old
> list entries for them, without finding something substantial.
>

Feel free to look at the mailinglist archives. Many of them have been
explained there before. Pay particular attention to the threads around when
the deprecated APIs were actually deprecated. I believe somebody around that
time also wrote a set of bash scripts that can be used in a
pre/post-backup-job combination with the current APIs.

//Magnus


Re: Something else about Redo Logs disappearing

2020-06-11 Thread Stephen Frost
Greetings,

* Peter (p...@citylink.dinoex.sub.org) wrote:
> On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:
> ! > What repo?? I seem to have missed that at first glance.
> ! 
> ! Yes, pgbackrest has a repo, like most other tools (though they call them
> ! different things... pg_basebackup has one though it's not really
> ! formal).
> ! 
> ! > Are You indeed suggesting that one should have their data within
> ! > the database, where it is worked with, and then use Your tool
> ! > to copy it to some "repo" disk playground whatever area, and then
> ! > use their regular backup system to COPY IT AGAIN into their
> ! > backup/archiving system? Are You kiddin'?
> ! 
> ! No, I'm not kidding and yes, that's what I'm suggesting.
> 
> Well, do You think I could seriously sell that to anybody?

That hardly seems to have much relevance.  I do know that there are lots
of deployments which have exactly that setup though.

> ! You need a
> ! consistent backup of your database that includes all the needed WAL to
> ! perform a restore.
> 
> Okay. So let's behave like professional people and figure how that
> can be achieved:

No, thanks, but I'm done.  I will not continue to engage with someone
who insists on belittling everyone they engage with.  That's not
appropriate for any professional environment, including these lists.

Thanks,

Stephen




Re: Something else about Redo Logs disappearing

2020-06-11 Thread Peter
On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:

! > What repo?? I seem to have missed that at first glance.
! 
! Yes, pgbackrest has a repo, like most other tools (though they call them
! different things... pg_basebackup has one though it's not really
! formal).
! 
! > Are You indeed suggesting that one should have their data within
! > the database, where it is worked with, and then use Your tool
! > to copy it to some "repo" disk playground whatever area, and then
! > use their regular backup system to COPY IT AGAIN into their
! > backup/archiving system? Are You kiddin'?
! 
! No, I'm not kidding and yes, that's what I'm suggesting.

Well, do You think I could seriously sell that to anybody?

! You need a
! consistent backup of your database that includes all the needed WAL to
! perform a restore.

Okay. So let's behave like professional people and figure how that
can be achieved:
At first, we drop that WAL requirement, because with WAL archiving
it is already guaranteed that an unbroken chain of WAL is always
present in the backup (except when we have a bug like the one that
led to this discussion).
So this is **not part of the scope**.

I'll get to the other item, the "consistent backup", further below.

! This is only one option though, there are others- you can also use
! pgbackrest to push your backups to s3 (or any s3-compatible data storage
! system, which includes some backup systems), and we'll be adding
! support

! I concur that this is becoming a madhouse, and is pushing past the limit
! for what I'm willing to deal with when trying to assist someone.

Well, then that might be a misconception. I'm traditionally a
consultant, and so I am used to *evaluate* solutions. I don't need
assistance for that, I only need precise technical info.

So let's get serious:
It is NOT technically feasible to amplify the storage in a way that
the entire backup data gets copied from the live database to some "repo"
place first, and then again from that repo place to regular file-based
backup/archiving storage.
And it does not make a difference WHERE that other place is, if at
Jeff's or whereever. It just does not belong into the loop.

So, how could the alternative look like?

I for my part consider doing this:
With a proper transactional filesystem we can do recursive filesystem
snapshots. That means, given a suitable filesystem layout, we can do a
snapshot of the data tree, AND the pg_wal filesystem, AND the
respective tablespaces. And this is atomic.

So, what we then do in the before- hook, is:

 * we call pg_start_backup(whatever, false, false).
 * we issue the filesystem snapshot, atomically.
 * we call pg_stop_backup(whatever).

And then we return with exit(0) (if all succeeded), and the file-based
backup software can start to collect the files from the filesystem
snapshot, and release it afterwards.

This STILL needs threaded programming (as I said, there is no way to
avoid that with those "new API"), but in this case it is effectively
reduced to just grab the return-code of some program that has been
started with "&".

So far, so good.

There is still one thing to be done, namely, the requirement to
collect the data reported by pg_stop_backup() and add that to the
backup, at a point in time where that is ALREADY CLOSED! (that's
another piece of gross bogus in this "new API") - but with my intended
method (and some ugliness) this could now also be solved.

But then, let's think another step forward: for what purpose do we
actually need to call pg_start_backup() and pg_stop_backup() at all?
I couldn't find exhaustive information about that, only some partial
facts.

What we know for certain, is: if we leave these calls away, and
just do a filesystem snapshot and make a backup from that, then we
have exactly the same thing as if we had a power failure at that time.
So this is restoreable, and the server will certainly start.

The remaining question is: can we then give it our subsequently
archived redologs and make it roll forward before going ready? I don't
know that yet, but maybe, if we just throw these WAL into pg_wal, the
thing might be stupid enough to swallow them.

If that does not work, then there might still be ways to trick it
into believing it's a standby server and make it roll forward these
WAL.
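For reference, the documented mechanism for exactly this is archive recovery: since v12 the old recovery.conf is gone, and instead one sets restore_command in postgresql.conf and drops a signal file into the data directory. A sketch, with a made-up archive path:

```ini
# postgresql.conf (PostgreSQL 12+); /backup/wal_archive is an example path
restore_command = 'cp /backup/wal_archive/%f "%p"'

# then, in the data directory:
#   touch recovery.signal    # one-shot archive recovery: roll forward, then go live
#   touch standby.signal     # or keep replaying WAL indefinitely, as a standby
```

So there is no need to "trick" the server: it will fetch and replay the archived WAL itself before accepting connections.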

So, there are not only ways to avoid the useless storage
amplification, there might even be means to get rid of that whole
misconceived "API".

Things that remain to be figured out:
 1. What does pg_start_backup actually do and why would that be
necessary? I could not find exhaustive information, but this can
probably figured from the source. Currently I know so much:
 - it writes a backup_label file. That is just a few lines of
   ASCII and should not be difficult to produce.
 - it does a checkpoint. This can probably be done with the
   respective command (if it is needed at all).
 - it does temporarily switch to full_page_writes. This cannot
   be 

Re: Something else about Redo Logs disappearing

2020-06-11 Thread Peter
On Wed, Jun 10, 2020 at 01:10:36PM +0200, Magnus Hagander wrote:

! > Just having a look at their webpage, something seems to have been updated
! > recently, they now state that they have a new postgres adapter:
! >
! > https://www.bareos.com/en/company_news/postgres-plugin-en1.html
! > Enjoy reading, and tell us what You think.
! >
! 
! This one unfortunately rings out of somebody who doesn't know how to back
! up postgres, at least in the past 10-15 years.
! 
! They are using an API that has been deprecated for years - in what's
! announced as a brand new product. They are advocating local archiving,
! which basically guarantees data loss in the event of a disaster.

Aye, thank You, that's exactly the impression I got. This is probably
still the old thing I was talking about, just made into a new product.
 
! That's from a 3 minute look, but that's definitely enough to suggest this
! is not something I'd consider using.

The matter is that the backup software (as a whole, not this postgres
component) offers lots of things exactly as I like them. It is a great
concept, a great implementation, but a bad coding quality and a bad
maintenance policy. But then, one can get it for free; and I know
of no other with such features. So I went thru the effort of fixing
it up, so that it now well serves my needs - and use my own scripting
for the add-ons.
 
! > Well, Your own docs show how to do it with a one-liner. So please
! > don't blame me for improving that to 20 lines.
! >
! 
! Yes, those docs are unfortunately "known bad" and should definitely be
! improved on. It does very clearly state that the example is just an
! example. But it doesn't clearly state *why* it shouldn't be used.

That's why I felt the ethical need to speak up and share my
consideration. Now it's up to those in charge and not my issue
anymore. ;)

! > In my understanding, backup is done via pgdump. The archive logs are
! > for emergencies (data corruption, disaster), only. And emergencies
! > would usually be handled by some professional people who know what
! > they have to do.
! >
! 
! I'd say it's the exact opposite. backups are done via pg_basebackup or
! manual basebackups. Archive logs are for point in time recovery. pg_dump
! can be used as a secondary "backup to the backups" option, but it is most
! interesting for things that are not backups (such as inspecting data, or
! provisioning partial test systems).
! 
! Different for different scenarios of course, but that would be the base
! scenario. And pg_dump is definitely as far from good backups as you can
! get while still having something that can be called approximately backups.
! It might be enough for small databases, but even in those cases
! pg_basebackup (without archive logging) is easier...

It's easier to create - but to apply? That depends on how many DBs are
in the cluster and how diverse their use. Also at any major version
switch these backups get worthless; one cannot use them for longterm.
(I suppose this is also true for pg_basebackup.)

I'm creating my longterm (and offsite) simply as clones from the regular
full backup. So what I came up with for now, is: I run pg_dump over all
the present databases, plus globals, chunk that up (in a similar way
like chunked HTTP works), feed it onto a pipe and backup that pipe. No
need for interim storage, so it can get as large as the backup
software can take it. And that should work for longterm - and I don't 
currently see a better option.

(This one does not work in 20 lines shellscript, because I didn't get
a reliable chunker running in shell.)
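A chunker of the kind described, with length-prefixed frames in the style of chunked HTTP transfer coding, is indeed awkward in plain shell but small in a general-purpose language. A sketch (the framing format is my own assumption, not the one actually used):

```python
import io

def write_chunked(src, dst, chunk_size=64 * 1024):
    """Copy src to dst as hex-length-prefixed chunks, chunked-HTTP style.

    A zero-length chunk terminates the stream, so on restore a truncated
    backup is distinguishable from a complete one.
    """
    while True:
        block = src.read(chunk_size)
        dst.write(b"%x\r\n" % len(block))   # hex length prefix
        dst.write(block + b"\r\n")
        if not block:                       # zero-length chunk = end marker
            return

def read_chunked(src, dst):
    """Inverse of write_chunked; raises if the stream is truncated."""
    while True:
        size = int(src.readline().strip(), 16)
        block = src.read(size)
        if len(block) != size or src.read(2) != b"\r\n":
            raise IOError("truncated chunk stream")
        if size == 0:
            return
        dst.write(block)
```

Fed from a pg_dump pipe on one end and a backup client on the other, this avoids interim storage while still making truncation detectable.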

! > And yes, I read that whole horrible discussion, and I could tear my
! > hair out, really, concerning the "deprecated API". I suppose You mean
! > the mentioning in the docs that the "exclusive low-level backup" is
! > somehow deprecated.
! >
! 
! Yes. There is no "somehow", it's deprecated.

Then lets not call it "somehow", as, more precisely, from my
understanding so far, that so called "new API" is ill-conceived and
troublesome in more than one regard. I would, with my current
knowledge, recommend to avoid, or better, abandon it.

Or, in other words: it is similar to what Boeing tried to do, in
forcing things upon people via software, for safety reasons - and
now see where Boeing got with that.

! > But now, with the now recommended "non-exclusive low-level backup",
! > the task is different: now your before-hook needs to do two things
! > at the same time:
! >  1. keep a socket open in order to hold the connection to postgres
! > (because postgres will terminate the backup when the socket is
! > closed), and
! >  2. invoke exit(0) (because the actual backup will not start until
! > the before- hook has properly delivered a successful exit code.
! > And, that is not only difficult, it is impossible.
!
! It is not impossible. It is harder if you limit your available tools yes,
! but it also *works*.

In this description which I choose, I would think it is actually

Re: Something else about Redo Logs disappearing

2020-06-10 Thread Stephen Frost
Greetings,

* Peter (p...@citylink.dinoex.sub.org) wrote:
> On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
> ! > And then 90% of the things offered here become superfluous, because
> ! > they are already handled site-wide. And then you will have to
> ! > consider integration of both pieces - and that will most likely be
> ! > more work and more error-prone than just writing a few adapters in
> ! > shell.
> ! 
> ! pgbackrest's repo can be safely backed up using the simple file-based
> ! backup utilities that you're referring to here.  I suspect some of the
> ! other solution's backups also could be, but you'd probably want to make
> ! sure.
> 
> What repo?? I seem to have missed that at first glance.

Yes, pgbackrest has a repo, like most other tools (though they call them
different things... pg_basebackup has one though it's not really
formal).

> Are You indeed suggesting that one should have their data within
> the database, where it is worked with, and then use Your tool
> to copy it to some "repo" disk playground whatever area, and then
> use their regular backup system to COPY IT AGAIN into their
> backup/archiving system? Are You kiddin'?

No, I'm not kidding and yes, that's what I'm suggesting.  You need a
consistent backup of your database that includes all the needed WAL to
perform a restore.

This is only one option though, there are others- you can also use
pgbackrest to push your backups to s3 (or any s3-compatible data storage
system, which includes some backup systems), and we'll be adding support
for Azure very shortly, and have plans to add GCS too in the future,
along with others probably.

> Is this becoming a madhouse, or are You going to refund them that?

I concur that this is becoming a madhouse, and is pushing past the limit
for what I'm willing to deal with when trying to assist someone.

> Let me tell You something: the people I used to work for, sometimes
> had a problem. They had some amount of data that was created during
> the day, and they had the night to write that data away to backup.
> That would usually mean, four or eight of the big tapes, streaming in
> parallel, fibers saturated, all night thru. And the problem usually was
> that they would need a longer night. At least the math had to be done
> properly.

Indeed, parallel backup is important, which is why pgbackrest supports
it, along with compression and encryption, all in-stream between the
database server and the repo, along with calculating a SHA to be stored
of every single file seen, allowing you to validate that the files
haven't changed since the backup was done when restoring.

> Maybe You never encountered these, but there are surroundings where
there is no spare room for nonsense. Maybe that's why these people
preferred to use Oracle.

I've both dealt with keeping tape drives fully loaded to avoid breaking
the tape by stuttering it (and writing dedicated C code to deal with
exactly that), and dealt with backing up and restoring Oracle, including
with various "enterprise" backup technologies (with varying levels of
success...).  None of what is being brought up here is new, novel, or
even particularly interesting.

Thanks,

Stephen




Re: Something else about Redo Logs disappearing

2020-06-10 Thread Peter
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! > And then 90% of the things offered here become superfluous, because
! > they are already handled site-wide. And then you will have to
! > consider integration of both pieces - and that will most likely be
! > more work and more error-prone than just writing a few adapters in
! > shell.
! 
! pgbackrest's repo can be safely backed up using the simple file-based
! backup utilities that you're referring to here.  I suspect some of the
! other solution's backups also could be, but you'd probably want to make
! sure.

What repo?? I seem to have missed that at first glance.

Are You indeed suggesting that one should have their data within
the database, where it is worked with, and then use Your tool
to copy it to some "repo" disk playground whatever area, and then
use their regular backup system to COPY IT AGAIN into their
backup/archiving system? Are You kiddin'?

Are You indeed suggesting that people should buy the amount
of disks that they use for their database AGAIN in order for Your
software to copy the stuff around?

Is this becoming a madhouse, or are You going to refund them that?

Let me tell You something: the people I used to work for, sometimes
had a problem. They had some amount of data that was created during
the day, and they had the night to write that data away to backup.
That would usually mean, four or eight of the big tapes, streaming in
parallel, fibers saturated, all night thru. And the problem usually was
that they would need a longer night. At least the math had to be done
properly.

Maybe You never encountered these, but there are surroundings where
there is no spare room for nonsense. Maybe that's why these people
preferred to use Oracle.


cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-10 Thread Magnus Hagander
On Wed, Jun 10, 2020 at 1:53 AM Peter  wrote:

> On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
>
> ! The backup solution is?
>
> https://www.bareos.com/
>
> ! Fine rant below. Go forth and work your wonders.
>
> I don't need to, anymore. I did that, for about 20 years - people
> I used to work for as a consultant (major banks and insurance shops)
> would usually run Informix or Oracle. Postgres is just my own private
> fancy.
>
> On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
> ! * Peter (p...@citylink.dinoex.sub.org) wrote:
> ! > This professional backup solution also offers support for postgres.
> ! > Sadly, it only covers postgres up to Rel.9, and that piece of software
> ! > wasn't touched in the last 6 or 7 years.
> !
> ! Then it certainly doesn't work with the changes in v12, and probably has
> ! other issues, as you allude to.
>
> Just having a look at their webpage, something seems to have been updated
> recently, they now state that they have a new postgres adapter:
>
> https://www.bareos.com/en/company_news/postgres-plugin-en1.html
> Enjoy reading, and tell us what You think.
>

This one unfortunately rings out of somebody who doesn't know how to back
up postgres, at least in the past 10-15 years.

They are using an API that has been deprecated for years - in what's
announced as a brand new product. They are advocating local archiving,
which basically guarantees data loss in the event of a disaster.

That's from a 3 minute look, but that's definitely enough to suggest this
is not something I'd consider using.


> ! > Actually, I am getting very tired of reading that something which can
> ! > easily be done within 20 lines of shell scripting, would need special
> !
> ! This is just simply false- you can't do it properly in 20 lines of shell
> ! scripting.
>
> Well, Your own docs show how to do it with a one-liner. So please
> don't blame me for improving that to 20 lines.
>

Yes, those docs are unfortunately "known bad" and should definitely be
improved on. It does very clearly state that the example is just an
example. But it doesn't clearly state *why* it shouldn't be used.



> In my understanding, backup is done via pgdump. The archive logs are
> for emergencies (data corruption, disaster), only. And emergencies
> would usually be handled by some professional people who know what
> they have to do.
>

I'd say it's the exact opposite. backups are done via pg_basebackup or
manual basebackups. Archive logs are for point in time recovery. pg_dump
can be used as a secondary "backup to the backups" option, but it is most
interesting for things that are not backups (such as inspecting data, or
provisioning partial test systems).

Different for different scenarios of course, but that would be the base
scenario. And pg_dump is definitely as far from good backups as you can
get while still having something that can be called approximately backups.
It might be enough for small databases, but even in those cases
pg_basebackup (without archive logging) is easier...


> And yes, I read that whole horrible discussion, and I could tear my
> hair out, really, concerning the "deprecated API". I suppose You mean
> the mentioning in the docs that the "exclusive low-level backup" is
> somehow deprecated.
>

Yes. There is no "somehow", it's deprecated.



> This is a very big bad. Because: normally you can run the base backup
> as a strictly ordinary file-level backup in "full" mode, just as any
> backup software can do it. You will simply execute the
> pg_start_backup() and pg_stop_backup() commands in the before- and
> after- hooks - and any backup software will offer these hooks.


> But now, with the now recommended "non-exclusive low-level backup",
> the task is different: now your before-hook needs to do two things
> at the same time:
>  1. keep a socket open in order to hold the connection to postgres
> (because postgres will terminate the backup when the socket is
> closed), and
>  2. invoke exit(0) (because the actual backup will not start until
> the before- hook has properly delivered a successful exit code.
> And, that is not only difficult, it is impossible.
>

It is not impossible. It is harder if you limit your available tools yes,
but it also *works*.


> So yes, this is really a LOT of work. But the point is: this all is
> not really necessary, because currently the stuff works fine in the
> old way.
>

It does not, no. It works in the simple cases, but it has multiple failure
scenarios that *cannot* be fixed without changing those fundamentals.

But you can always go for the actual old way -- just stop postgres in the
pre-job and start it again in the post-job. That's by far the easiest. And
that *does* work and is fully supported.


> ! > The only really interesting thing there is the pg_probackup. These
> ! > folks seem to have found a way to do row-level incremental backups.
> !
> ! pg_probackup doesn't do row-level incremental backups, unless I've
> ! 

Re: Something else about Redo Logs disappearing

2020-06-09 Thread Adrian Klaver

On 6/9/20 4:35 PM, Peter wrote:

On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:





And that can very well be done properly with an incremental filesystem
backup software plus some 20 lines of shellscript.


Read the caveats here:

https://www.postgresql.org/docs/12/backup-file.html



Now talking about doing an automated restore, or having some menu-driven
solution, or (the worst of all) having a solution that can be
operated by morons - that's an entirely different matter.

In my understanding, backup is done via pgdump. The archive logs are
for emergencies (data corruption, disaster), only. And emergencies
would usually be handled by some professional people who know what
they have to do.


Read the entire section below for why WAL files are for backup also. FYI, 
if you don't set it up properly then you may not be protected against data 
corruption. See PITR.


https://www.postgresql.org/docs/12/continuous-archiving.html

Postgres is used by a wide gamut of people of differing abilities, many 
of whom appreciate the availability of tested solutions to protect their 
data as they are not morons and understand there are people who can make 
their life easier.






This was actually my job as a consultant: to de-mystify technology,
and make it understandable as an arrangement of well explainable
pieces of functionality with well-deducible consequences.


Not seeing it.


But this is no longer respected today; now people are expected to
*NOT* understand the technology they handle, and instead believe
in marketing and that it all is very complicated and un-intellegible.


cheerio,
PMc




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-09 Thread Stephen Frost
Greetings,

* Peter (p...@citylink.dinoex.sub.org) wrote:
> On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
> ! * Peter (p...@citylink.dinoex.sub.org) wrote:
> ! > This professional backup solution also offers support for postgres.
> ! > Sadly, it only covers postgres up to Rel.9, and that piece of software
> ! > wasn't touched in the last 6 or 7 years.
> ! 
> ! Then it certainly doesn't work with the changes in v12, and probably has
> ! other issues, as you allude to.
> 
> Just having a look at their webpage, something seems to have been updated
> recently, they now state that they have a new postgres adapter:
> 
> https://www.bareos.com/en/company_news/postgres-plugin-en1.html
> Enjoy reading, and tell us what You think.

I'm afraid I'm not particularly interested in performing a pro bono
evaluation of a commercial product, though considering they've put out a
press release with obviously broken links, I already have suspicions of
what I'd find ... (try clicking on their 'experimental/nightly' link).

A quick look at the docs also shows that it's referring to
recovery.conf, which no longer exists since v12 was released back in
September, so, yeah, isn't exactly current.

> ! > Actually, I am getting very tired of reading that something which can
> ! > easily be done within 20 lines of shell scripting, would need special
> ! 
> ! This is just simply false- you can't do it properly in 20 lines of shell
> ! scripting.
> 
> Well, Your own docs show how to do it with a one-liner. So please
> don't blame me for improving that to 20 lines.

No, the documentation provides an example for the purpose of
understanding how the replacement in the command is done and explicitly
says that you probably shouldn't use that command.

> ! Sure, you can write something that has probably next to no
> ! error checking,
> 
> Before judging that, one should first specify what precisely is the
> demand.

I really don't need to in order to be able to judge the notion of a 20
line shell script being able to manage to perform a backup correctly.

> In my understanding, backup is done via pg_dump. The archive logs are
> for emergencies (data corruption, disaster), only. And emergencies
> would usually be handled by some professional people who know what
> they have to do.

No, that's not the case.  pg_dump isn't at all involved in the backups
that we're talking about here which are physical, file-level, backups.

> ! uses the deprecated API that'll cause your systems to
> ! fail to start if you ever happen to have a reboot during a backup
> 
> It is highly unlikely that this never happened during 15
> years. So what does that mean? If I throw in a pg_start_backup('bogus'),
> and then restart the cluster, it will not work anymore?

If you perform a pg_start_backup(), have a checkpoint happen such that
older WAL is removed, and then reboot the box or kill -9 postgres, no,
it's not going to start anymore because there's going to be a
backup_label file that is telling the cluster that it needs to start
replaying WAL from an older point in time than what you've got WAL for.

> Let's see...
> Clean stop/start - no issue whatsoever. (LOG:  online backup mode
> canceled)
> kill -9 the whole flock - no issue whatsoever (Log: database system
> was interrupted)
> I won't pull the plug now, but that has certainly happened lots of
> times in the past, and also yielded no issue whatsoever - simply
> because there *never* was *any* issue whatsoever with Postgres (until
> I got the idea to install the relatively fresh R.12 - but that's
> understandable).

Being lucky really isn't what you want to bet on.

> So maybe this problem exists only on Windows?

No, it's not Windows specific.

> And yes, I read that whole horrible discussion, and I could tear my
> hair out, really, concerning the "deprecated API". I suppose You mean
> the mentioning in the docs that the "exclusive low-level backup" is
> somehow deprecated.

Yes, it's deprecated specifically because of the issues outlined above.
They aren't hypothetical, they do happen, and people do get bit by them.

> This is a very big bad. Because: normally you can run the base backup
> as a strictly ordinary file-level backup in "full" mode, just as any
> backup software can do it. You will simply execute the
> pg_start_backup() and pg_stop_backup() commands in the before- and
> after- hooks - and any backup software will offer these hooks.
> 
> But now, with the now recommended "non-exclusive low-level backup",
> the task is different: now your before-hook needs to do two things
> at the same time:
>  1. keep a socket open in order to hold the connection to postgres
> (because postgres will terminate the backup when the socket is
> closed), and
>  2. invoke exit(0) (because the actual backup will not start until
> the before-hook has properly delivered a successful exit code).
> And, that is not only difficult, it is impossible.

One would imagine that if the commercial 

Re: Something else about Redo Logs disappearing

2020-06-09 Thread Peter
On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:

! The backup solution is?

https://www.bareos.com/

! Fine rant below. Go forth and work your wonders.

I don't need to, anymore. I did that, for about 20 years - people
I used to work for as a consultant (major banks and insurance shops)
would usually run Informix or Oracle. Postgres is just my own private
fancy.

On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! * Peter (p...@citylink.dinoex.sub.org) wrote:
! > This professional backup solution also offers support for postgres.
! > Sadly, it only covers postgres up to Rel.9, and that piece of software
! > wasn't touched in the last 6 or 7 years.
! 
! Then it certainly doesn't work with the changes in v12, and probably has
! other issues, as you allude to.

Just having a look at their webpage, something seems to have been updated
recently, they now state that they have a new postgres adapter:

https://www.bareos.com/en/company_news/postgres-plugin-en1.html
Enjoy reading, and tell us what You think.

! > Actually, I am getting very tired of reading that something which can
! > easily be done within 20 lines of shell scripting, would need special
! 
! This is just simply false- you can't do it properly in 20 lines of shell
! scripting.

Well, Your own docs show how to do it with a one-liner. So please
don't blame me for improving that to 20 lines.

! Sure, you can write something that has probably next to no
! error checking,

Before judging that, one should first specify what precisely is the
demand.
In a basic approach, the demand may be to get the logs out on tape in
a failsafe automated fashion without any miss, and get the data tree
out periodically, and have it guaranteed that these files are untampered,
exactly as on disk.

And that can very well be done properly with an incremental filesystem
backup software plus some 20 lines of shellscript.

Now talking about doing an automated restore, or, having some menu-
driven solution, or -the worst of all- having a solution that can be
operated by morons - that's an entirely different matter.

In my understanding, backup is done via pg_dump. The archive logs are
for emergencies (data corruption, disaster), only. And emergencies
would usually be handled by some professional people who know what
they have to do.

You may consider different demands, and that is also fine, but doesn't
need to concern me.

! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backup

It is highly unlikely that this never happened during 15
years. So what does that mean? If I throw in a pg_start_backup('bogus'),
and then restart the cluster, it will not work anymore?
Let's see...
Clean stop/start - no issue whatsoever. (LOG:  online backup mode
canceled)
kill -9 the whole flock - no issue whatsoever (Log: database system
was interrupted)
I won't pull the plug now, but that has certainly happened lots of
times in the past, and also yielded no issue whatsoever - simply
because there *never* was *any* issue whatsoever with Postgres (until
I got the idea to install the relatively fresh R.12 - but that's
understandable).

So maybe this problem exists only on Windows?

And yes, I read that whole horrible discussion, and I could tear my
hair out, really, concerning the "deprecated API". I suppose You mean
the mentioning in the docs that the "exclusive low-level backup" is
somehow deprecated.
This is a very big bad. Because: normally you can run the base backup
as a strictly ordinary file-level backup in "full" mode, just as any
backup software can do it. You will simply execute the
pg_start_backup() and pg_stop_backup() commands in the before- and
after- hooks - and any backup software will offer these hooks.

But now, with the now recommended "non-exclusive low-level backup",
the task is different: now your before-hook needs to do two things
at the same time:
 1. keep a socket open in order to hold the connection to postgres
(because postgres will terminate the backup when the socket is
closed), and
 2. invoke exit(0) (because the actual backup will not start until
the before-hook has properly delivered a successful exit code).
And, that is not only difficult, it is impossible.

So, what has to be done instead: you need to write a separate network
daemon, with the only purpose of holding that connection to postgres
open. And that network daemon needs to handle the communication to
the backup software on one side, and to postgres on the other side.
And that network daemon then needs the full-blown feature requirements
as a fail-safe network daemon should have (and that is a LOT!), plus
it needs to handle all kinds of possible failures (like network
interruptions) in that triangle, during the backup, and properly
notify both sides of whatever might be ongoing (and that is NOT
trivial).
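
[For what it's worth, the connection-holding part alone can be sketched
without a full daemon, by feeding a psql coprocess through a FIFO. This
is a fragile sketch under stated assumptions, not a robust solution: the
function name, label, log path and database name are all invented, and
none of the failure handling discussed above is present.]

```shell
#!/bin/sh
# Sketch only: hold a non-exclusive backup session open from a shell hook.
# PSQL is overridable so the FIFO plumbing can be exercised without a
# server; by default it assumes a reachable local cluster.
PSQL=${PSQL:-psql -X -At -d postgres}

run_backup() {
    log=$1
    fifo=$(mktemp -u) && mkfifo "$fifo" || return 1
    $PSQL <"$fifo" >"$log" 2>&1 &           # session reads its SQL from the FIFO
    exec 3>"$fifo"                          # keep the write end open: session stays alive
    echo "SELECT pg_start_backup('example', false, false);" >&3
    # ... perform the file-level copy of the data directory here ...
    echo "SELECT * FROM pg_stop_backup(false);" >&3
    exec 3>&-                               # EOF on the FIFO ends the session
    wait
    rm -f "$fifo"
}
```

With `PSQL` left at its default this assumes a v12 cluster where the
non-exclusive forms `pg_start_backup(label, fast, exclusive)` and
`pg_stop_backup(exclusive)` are available; closing fd 3 is what releases
the backup, which is exactly the property Peter objects to above.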

So yes, this is really a LOT of work. But the point is: this all is
not really necessary, because 

Re: Something else about Redo Logs disappearing

2020-06-09 Thread Adrian Klaver

On 6/9/20 12:02 PM, Peter wrote:

On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver  writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
!
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all
! > platforms.) "
!
! Yeah.  There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough.  You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.

It certainly does not need to be "more forceful" - because this is not
about behavioural education, like training dogs, horses, or monkeys,
and neither do we entertain a BDSM studio.

What it needs instead is mention of the magic word "fsync". Because,
we already know that - we just need a reminder at the proper place.

Systems integrators are professional people. They are not in need of
more beating (spell: forceful education), only of substantial
technical hints and information.

! > Generally for peace of mind folks use third party tools like:
!
! +1.  Rolling your own archive script is seldom advisable.

Well then, using commercial solutions brings its own problems. E.g.,


FYI, the projects Stephen and I mentioned are Open Source. I'm sure you 
can get paid support for them, but you exist on a higher plane than that, 
so you can use them for free.



the people I happened to work for often had problems with torsion,
which happens when the solution gets longer than, say, twenty meters,
and these are walked at high speeds.

They didn't have a problem with scripting - rather the opposite, they
were happy with it and paid good money for it.


cheerio,
PMc




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-09 Thread Peter
On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver  writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
! 
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all 
! > platforms.) "
! 
! Yeah.  There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough.  You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.

It certainly does not need to be "more forceful" - because this is not
about behavioural education, like training dogs, horses, or monkeys,
and neither do we entertain a BDSM studio.

What it needs instead is mention of the magic word "fsync". Because,
we already know that - we just need a reminder at the proper place.

Systems integrators are professional people. They are not in need of
more beating (spell: forceful education), only of substantial
technical hints and information.

! > Generally for peace of mind folks use third party tools like:
! 
! +1.  Rolling your own archive script is seldom advisable.

Well then, using commercial solutions brings its own problems. E.g.,
the people I happened to work for often had problems with torsion,
which happens when the solution gets longer than, say, twenty meters,
and these are walked at high speeds.

They didn't have a problem with scripting - rather the opposite, they
were happy with it and paid good money for it.


cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-09 Thread Stephen Frost
Greetings,

* Peter (p...@citylink.dinoex.sub.org) wrote:
> This professional backup solution also offers support for postgres.
> Sadly, it only covers postgres up to Rel.9, and that piece of software
> wasn't touched in the last 6 or 7 years.

Then it certainly doesn't work with the changes in v12, and probably has
other issues, as you allude to.

> Actually, I am getting very tired of reading that something which can
> easily be done within 20 lines of shell scripting, would need special

This is just simply false- you can't do it properly in 20 lines of shell
scripting.  Sure, you can write something that has probably next to no
error checking, uses the deprecated API that'll cause your systems to
fail to start if you ever happen to have a reboot during a backup, and
has no way to provide verification that the backup was at all successful
after the fact, but that's not what I'd consider a proper solution-
instead it's one that'll end up causing you a lot of pain down the road.

Even the shell-script based solution (which I've never used and
personally wouldn't really recommend, but to each their own) called
'pitrery' (available here: https://github.com/dalibo/pitrery) is
thousands of lines of code.

> Does nobody know anymore how to do proper systems management
> scripting? Using just the basic system tools which have proven to
> work for more than 50 years now!?

I've not met anything I'd call 'proper systems management scripting'
that's 20 lines of code, shell script or not.

> ! Not sure about pg_probackup.
> 
> Okay, I had a -very short- look into these. Just scanning the
> introductory pages.
> 
> The only really interesting thing there is the pg_probackup. These
> folks seem to have found a way to do row-level incremental backups.

pg_probackup doesn't do row-level incremental backups, unless I've
missed some pretty serious change in its development, but it does
provide page-level, with, as I recall, an extension that didn't get
good reception when it was posted and discussed on these mailing lists
by other PG hackers.  I don't know if those concerns about it have been
addressed or not, you might ask the pg_probackup folks if you're
considering it as a solution.

> This is fine as long as you do not run any computers, and the only
> application you are using is Postgres.
> But, if you have other applications as well, or have computers, then
> you will need a different backup solution, something that will cover
> your site-wide backup demands, in a consistent fashion (think
> something in the style of ADSM, or nowadays called Spectrum Protect).
> 
> And then 90% of the things offered here become superfluous, because
> they are already handled site-wide. And then you will have to
> consider integration of both pieces - and that will most likely be
> more work and more error-prone than just writing a few adapters in
> shell.

pgbackrest's repo can be safely backed up using the simple file-based
backup utilities that you're referring to here.  I suspect some of the
other solution's backups also could be, but you'd probably want to make
sure.

PG generally isn't something that can be backed up using the simple file
based backup solutions, as you might appreciate from just considering
the number of tools written to specifically deal with the complexity of
backing up an online PG cluster.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Something else about Redo Logs disappearing

2020-06-09 Thread Adrian Klaver

On 6/9/20 10:55 AM, Peter wrote:

On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
!
! On 6/8/20 7:33 PM, Peter wrote:
! >
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
!
! Which is why just following the above command in the docs is:
!
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "

So, what You are basically saying is: my worries are justified and
correctly founded, and this is indeed a matter that needs to be taken
care of.
Thank You.

! Generally for peace of mind folks use third party tools like:
!
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).

Hmja. We may on occasion have a look into these...

! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That looks mostly like the usual things which can be fixed.

Now, for the facts: I am already using a professional backup
solution. (It is actually a "dual-use/commercial" solution, of the
kind which you can either fetch from github and use without support,
or buy with a contract or whatever and get support.)

With this professional backup solution I have already identified 28
(spell: twenty-eight) bugs, and fixed or worked around these, until I got it
properly working.

This professional backup solution also offers support for postgres.
Sadly, it only covers postgres up to Rel.9, and that piece of software
wasn't touched in the last 6 or 7 years.
But the bigger issue there is, that backup solution needs its
own postgres database as its backend - and it cannot back up the
database it is using. Looks quite pointless to me, then.
So I just did it all with shell (and it wasn't many lines).


The backup solution is?



So now, as I've been thru identifying and handling all the issues in
that one backup solution, and since it is supposed to handle *all*
backup demands (and not only postgres), I will certainly not start
and go thru the same process again with one of these supposed
solutions, where 90% of the code tries to solve the same things
redundantly again, but then only for PG.


They are not "supposed" solutions. They are in use by many people/organizations 
across a wide variety of installations.





Actually, I am getting very tired of reading that something which can
easily be done within 20 lines of shell scripting, would need special
"solutions", solutions that need to be compiled, solutions that would
bring along their own fashion of interpreter, solutions that have a
lot of their own dependencies and introduce mainly one thing: new bugs.


They were developed because this could not be done in 20 lines of shell 
scripting and still work at a reliable level.


Fine rant below. Go forth and work your wonders.



Does nobody know anymore how to do proper systems management
scripting? Using just the basic system tools which have proven to
work for more than 50 years now!?

! Not sure about pg_probackup.

Okay, I had a -very short- look into these. Just scanning the
introductory pages.

The only really interesting thing there is the pg_probackup. These
folks seem to have found a way to do row-level incremental backups.

And pgbarman seems to have an impressive understanding of ITIL (in
case anybody bothers about that).

All these tools do only cover PG, but do that in any possible regards.

This is fine as long as you do not run any computers, and the only
application you are using is Postgres.
But, if you have other applications as well, or have computers, then
you will need a different backup solution, something that will cover
your site-wide backup demands, in a consistent fashion (think
something in the style of ADSM, or nowadays called Spectrum Protect).

And then 90% of the things offered here become superfluous, because
they are already handled site-wide. And then you will have to
consider integration of both pieces - and that will most likely be
more work and more error-prone than just writing a few adapters in
shell.



cheerio,
PMc




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-09 Thread Peter
On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
! 
! On 6/8/20 7:33 PM, Peter wrote:
! > 
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
! 
! Which is why just following the above command in the docs is:
! 
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "

So, what You are basically saying is: my worries are justified and
correctly founded, and this is indeed a matter that needs to be taken
care of.
Thank You.

! Generally for peace of mind folks use third party tools like:
! 
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).

Hmja. We may on occasion have a look into these...

! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That looks mostly like the usual things which can be fixed.

Now, for the facts: I am already using a professional backup
solution. (It is actually a "dual-use/commercial" solution, of the
kind which you can either fetch from github and use without support,
or buy with a contract or whatever and get support.)

With this professional backup solution I have already identified 28
(spell: twenty-eight) bugs, and fixed or worked around these, until I got it
properly working.

This professional backup solution also offers support for postgres.
Sadly, it only covers postgres up to Rel.9, and that piece of software
wasn't touched in the last 6 or 7 years.
But the bigger issue there is, that backup solution needs its
own postgres database as its backend - and it cannot back up the
database it is using. Looks quite pointless to me, then.
So I just did it all with shell (and it wasn't many lines).

So now, as I've been thru identifying and handling all the issues in
that one backup solution, and since it is supposed to handle *all*
backup demands (and not only postgres), I will certainly not start
and go thru the same process again with one of these supposed
solutions, where 90% of the code tries to solve the same things
redundantly again, but then only for PG.


Actually, I am getting very tired of reading that something which can
easily be done within 20 lines of shell scripting, would need special
"solutions", solutions that need to be compiled, solutions that would
bring along their own fashion of interpreter, solutions that have a
lot of their own dependencies and introduce mainly one thing: new bugs.

Does nobody know anymore how to do proper systems management
scripting? Using just the basic system tools which have proven to
work for more than 50 years now!?

! Not sure about pg_probackup.

Okay, I had a -very short- look into these. Just scanning the
introductory pages.

The only really interesting thing there is the pg_probackup. These
folks seem to have found a way to do row-level incremental backups.

And pgbarman seems to have an impressive understanding of ITIL (in
case anybody bothers about that).

All these tools do only cover PG, but do that in any possible regards.

This is fine as long as you do not run any computers, and the only
application you are using is Postgres.
But, if you have other applications as well, or have computers, then
you will need a different backup solution, something that will cover
your site-wide backup demands, in a consistent fashion (think
something in the style of ADSM, or nowadays called Spectrum Protect).

And then 90% of the things offered here become superfluous, because
they are already handled site-wide. And then you will have to
consider integration of both pieces - and that will most likely be
more work and more error-prone than just writing a few adapters in
shell.



cheerio,
PMc




Re: Something else about Redo Logs disappearing

2020-06-09 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 6/9/20 4:15 AM, Stephen Frost wrote:
> >* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> >>I use pg_backrest, but it does not look promising for running on BSD:
> >>https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> >
> >That's an unfortunately ancient post, really, considering that
> >pgbackrest has now been fully rewritten into C, and Luca as recently as
> >September 2019 was saying he has it working on FreeBSD.
> 
> Yeah, but this:
> 
> https://github.com/pgbackrest/pgbackrest/issues/686
> 
> is not clear on whether a user can do that w/o a certain amount of hand
> holding.

I've asked Luca to update his blog post and/or re-test on FreeBSD and
he's said he would.  We've moved to using autoconf and friends, and it's
all in C now, so it really shouldn't be as much of an issue these days.
I recall someone else building on FreeBSD not long ago, but not finding
a reference to it offhand.

> >If folks do run into issues with pgbackrest on FreeBSD, please let us
> >know.

... again, this.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Something else about Redo Logs disappearing

2020-06-09 Thread Adrian Klaver

On 6/9/20 4:15 AM, Stephen Frost wrote:

Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:

I use pg_backrest, but it does not look promising for running on BSD:
https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html


That's an unfortunately ancient post, really, considering that
pgbackrest has now been fully rewritten into C, and Luca as recently as
September 2019 was saying he has it working on FreeBSD.


Yeah, but this:

https://github.com/pgbackrest/pgbackrest/issues/686

is not clear on whether a user can do that w/o a certain amount of hand 
holding.




If folks do run into issues with pgbackrest on FreeBSD, please let us
know.

Thanks,

Stephen




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-09 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> I use pg_backrest, but it does not look promising for running on BSD:
> https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That's an unfortunately ancient post, really, considering that
pgbackrest has now been fully rewritten into C, and Luca as recently as
September 2019 was saying he has it working on FreeBSD.

If folks do run into issues with pgbackrest on FreeBSD, please let us
know.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Something else about Redo Logs disappearing

2020-06-08 Thread Tom Lane
Adrian Klaver  writes:
> On 6/8/20 7:33 PM, Peter wrote:
>> That "cp" is usually not synchronous. So there is the possibility
>> that this command terminates successfully, and reports exitcode zero
>> back to the Postgres, and then the Postgres will consider that log
>> being safely away.

> Which is why just following the above command in the docs is:
> "(This is an example, not a recommendation, and might not work on all 
> platforms.) "

Yeah.  There have been discussions about changing that disclaimer to be
more forceful, because in point of fact a plain "cp" is generally not safe
enough.  You need to fsync the written file, and on many filesystems you
also have to fsync the directory it's in.
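
[For illustration only, a hedged sketch of what "safe enough" might look
like as an archive_command helper. It assumes GNU coreutils — `dd
conv=fsync` and a `sync` that accepts file operands (coreutils >= 8.24);
the function name and paths are invented, and this is a sketch, not a
substitute for the tested tools discussed in this thread.]

```shell
#!/bin/sh
# Sketch of a safer archive helper: copy with fsync of the file,
# atomic rename, then flush the directory entry as well.
archive_wal() {
    src=$1 dst=$2
    test ! -f "$dst" || return 1                       # never overwrite an archived segment
    dd if="$src" of="$dst.tmp" bs=1M conv=fsync status=none || return 1
    mv "$dst.tmp" "$dst" || return 1                   # atomic rename into place
    sync "$dst" "$(dirname "$dst")"                    # flush file data and directory entry
}
# e.g. archive_command = '/usr/local/bin/archive_wal.sh %p /mnt/server/archivedir/%f'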

> Generally for peace of mind folks use third party tools like:

+1.  Rolling your own archive script is seldom advisable.

regards, tom lane




Re: Something else about Redo Logs disappearing

2020-06-08 Thread Adrian Klaver



On 6/8/20 7:33 PM, Peter wrote:


Actually, the affair had some good side: as usual I was checking
my own designs first and looking for flaws, and indeed I found one:
 
If you do copy out the archive logs not directly to tape, but to
some disk area for further processing, then there is an issue with
possible loss. If you do it like the docs say, with a command like
this:
 
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
 
That "cp" is usually not synchronous. So there is the possibility
that this command terminates successfully, and reports exitcode zero
back to the Postgres, and then the Postgres will consider that log
being safely away.


Which is why just following the above command in the docs is:

"(This is an example, not a recommendation, and might not work on all 
platforms.) "


Generally for peace of mind folks use third party tools like:

pg_backrest(https://pgbackrest.org/),
pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) 
or Barman(https://www.pgbarman.org/).


as they offer safety checks for your backups.

I use pg_backrest, but it does not look promising for running on BSD:
https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

Not sure about pg_probackup.

Barman is a Python package:
http://docs.pgbarman.org/release/2.10/#installation-from-sources

 
But the target of the copy may not yet have been written to disk. If
at that point a power loss happens, the log may become missing/damaged/
incomplete, while the database may or may not consider it done
when restarting.
 
Therefore, mounting such a target filesystem in all-synchronous mode
might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")
 
cheerio,

PMc





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Something else about Redo Logs disappearing

2020-06-08 Thread Peter


Actually, the affair had some good side: as usual I was checking
my own designs first and looking for flaws, and indeed I found one: 

If you do copy out the archive logs not directly to tape, but to
some disk area for further processing, then there is an issue with  
possible loss. If you do it like the docs say, with a command like  
this:   

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix

That "cp" is usually not synchronous. So there is the possibility   
that this command terminates successfully, and reports exitcode zero
back to the Postgres, and then the Postgres will consider that log  
being safely away.  

But the target of the copy may not yet have been written to disk. If
at that point a power loss happens, the log may become missing/damaged/ 
incomplete, while the database may or may not consider it done  
when restarting.

Therefore, mounting such a target filesystem in all-synchronous mode
might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")  

cheerio,
PMc