Re: [GENERAL] Backup "Best Practices"

2016-11-29 Thread Jeff Janes
On Mon, Nov 28, 2016 at 11:20 PM, Thomas Kellerer 
wrote:

> Israel Brewster wrote on 28.11.2016 at 23:50:
>
>>
>>> pg_archivecleanup -n /mnt/server/archiverdir
>>> 00010010.0020.backup
>>>
>>
>> Ok, but where does that "00010010.0020.backup"
>> come from? I mean, I can tell it's a WAL segment file name (plus a
>> backup label), but I don't have anything like that in my WAL
>> archives, even though I've run pg_basebackup a couple of times. Do I
>> have to call something to create that file? Some flag to
>> pg_basebackup? At the moment I am running pg_basebackup such that it
>> generates gzipped tar files, if that makes a difference.
>>
>
> The .backup file will be inside the tar file if I'm not mistaken


I don't think it will be, but there will be a backup_label file in there,
which contains much of the same contents as the .backup file does.  But in
this case, the contents of the file are not important, only the name is.
pg_archivecleanup doesn't attempt to open the file name given as the
second argument; it just looks at the name itself.  So you could pull
backup_label out of the tar file, parse the contents and use them to
construct the command to give to pg_archivecleanup.
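
For example (the names here are made up), with GNU tar you can read the label
straight out of the gzipped tarball:

    tar -xzOf base.tar.gz backup_label
    START WAL LOCATION: 0/10000020 (file 000000010000000000000010)
    ...

The segment name in parentheses, plus the offset within that segment (the low
bits of the LSN, zero-padded to 8 hex digits), gives you the name to pass:
000000010000000000000010.00000020.backup.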

I think it would really be nice if pg_basebackup -D backup_dir -Ft would
create the backup_label file not only in the tarball, but also (as a final
step) as a loose file in the backup_dir.

Cheers,

Jeff


Re: [GENERAL] Backup "Best Practices"

2016-11-29 Thread Israel Brewster
On Nov 29, 2016, at 8:12 AM, Israel Brewster wrote:
> 
> On Nov 28, 2016, at 10:04 PM, Jeff Janes wrote:
>> 
>> On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster wrote:
>> 
>>> - What is the "best" (or just a good) method of keeping the WAL archives 
>>> under control? Obviously when I do a new basebackup I can "cleanup" any old 
>>> files that said backup doesn't need, 
>>> 
>>> You have said you might be interested in doing PITR. So you want to delay 
>>> the cleanup so as to not compromise that ability.  You need to develop a 
>>> policy on how far back you want to be able to do a PITR.
>>> 
>>>  
>>> but how do I know what those are?
>>> 
>>> pg_archivecleanup -n /mnt/server/archiverdir 
>>> 00010010.0020.backup
>> 
>> Ok, but where does that "00010010.0020.backup" come 
>> from? I mean, I can tell it's a WAL segment file name (plus a backup label), 
>> but I don't have anything like that in my WAL archives, even though I've run 
>> pg_basebackup a couple of times. 
>> 
>> I get one file like that for every pg_basebackup I run.  Could your 
>> archive_command be doing something to specifically short-circuit the writing 
>> of those files?  Like testing the length of %p or %f?
> 
> My archive command is simply a copy - straight out of the examples given in 
> the documentation, actually. The only test I do is to make sure the file 
> doesn't exist before running the copy.
> 
>> Do I have to call something to create that file? Some flag to pg_basebackup? 
>> At the moment I am running pg_basebackup such that it generates gziped tar 
>> files, if that makes a difference.
>> 
>> 
>> That is how I run it as well.  I don't think there is a flag to 
>> pg_basebackup which even allows you to bypass the creation of those files.  
>> You are looking in the WAL archive itself, correct?  Not somewhere in a 
>> listing of the base.tar.gz file? 
> 
> I am looking at the WAL archive itself. One thing that just occurred to me: 
> in my testing, I've been running the base backup from the secondary slave 
> server. Perhaps that makes a difference? I know the slave itself doesn't 
> archive WAL files, but I would have expected the master to get the message a 
> backup was being run and do any needed archiving itself.

So to test, I ran a base backup from my primary server rather than the 
secondary - and the .backup file WAS indeed created in the WAL archive 
directory. So I guess that means I have to run base backups from the primary 
server. Are there any performance implications to doing this that I should be 
aware of? Something that would imply I need to make sure to run the backup 
during lull periods?
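
For reference, the sort of invocation I'm using is (host, user, and paths are
illustrative):

pg_basebackup -h primary.example.com -U replication -D /backups/base/20161129 -Ft -z -X fetch

I also see that pg_basebackup has a -r/--max-rate option (9.4 and later) that
can throttle the transfer, which might soften the impact if I do end up
having to run it during busy periods.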

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
>> 
>> Cheers,
>> 
>> Jeff



Re: [GENERAL] Backup "Best Practices"

2016-11-29 Thread Israel Brewster
On Nov 28, 2016, at 10:20 PM, Thomas Kellerer  wrote:
> 
> Israel Brewster wrote on 28.11.2016 at 23:50:
>>> 
>>> pg_archivecleanup -n /mnt/server/archiverdir 
>>> 00010010.0020.backup
>> 
>> Ok, but where does that "00010010.0020.backup"
>> come from? I mean, I can tell it's a WAL segment file name (plus a
>> backup label), but I don't have anything like that in my WAL
>> archives, even though I've run pg_basebackup a couple of times. Do I
>> have to call something to create that file? Some flag to
>> pg_basebackup? At the moment I am running pg_basebackup such that it
>> generates gzipped tar files, if that makes a difference.
> 
> The .backup file will be inside the tar file if I'm not mistaken

Oh, ok - thanks. I'll check on that. Although since I am generating gzipped tar 
files, that's not the easiest thing in the world - I'll have to decompress the 
file first, then extract the .backup file. Still, worth looking into I suppose 
:-)

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---



Re: [GENERAL] Backup "Best Practices"

2016-11-29 Thread Israel Brewster
On Nov 28, 2016, at 10:04 PM, Jeff Janes  wrote:
> 
> On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster wrote:
> 
>> - What is the "best" (or just a good) method of keeping the WAL archives 
>> under control? Obviously when I do a new basebackup I can "cleanup" any old 
>> files that said backup doesn't need,
>> 
>> You have said you might be interested in doing PITR. So you want to delay 
>> the cleanup so as to not compromise that ability.  You need to develop a 
>> policy on how far back you want to be able to do a PITR.
>> 
>>  
>> but how do I know what those are?
>> 
>> pg_archivecleanup -n /mnt/server/archiverdir 
>> 00010010.0020.backup
> 
> Ok, but where does that "00010010.0020.backup" come from? 
> I mean, I can tell it's a WAL segment file name (plus a backup label), but I 
> don't have anything like that in my WAL archives, even though I've run 
> pg_basebackup a couple of times.
> 
> I get one file like that for every pg_basebackup I run.  Could your 
> archive_command be doing something to specifically short-circuit the writing 
> of those files?  Like testing the length of %p or %f?

My archive command is simply a copy - straight out of the examples given in the 
documentation, actually. The only test I do is to make sure the file doesn't 
exist before running the copy.
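
Specifically, it's the example from the docs, pointed at our archive directory
(path illustrative):

archive_command = 'test ! -f /mnt/server/archiverdir/%f && cp %p /mnt/server/archiverdir/%f'

If the file already exists, the test fails and the command returns non-zero,
so the archiver reports a failure instead of overwriting anything.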

> Do I have to call something to create that file? Some flag to pg_basebackup? 
> At the moment I am running pg_basebackup such that it generates gzipped tar 
> files, if that makes a difference.
> 
> 
> That is how I run it as well.  I don't think there is a flag to pg_basebackup 
> which even allows you to bypass the creation of those files.  You are looking 
> in the WAL archive itself, correct?  Not somewhere in a listing of the 
> base.tar.gz file? 

I am looking at the WAL archive itself. One thing that just occurred to me: in 
my testing, I've been running the base backup from the secondary slave server. 
Perhaps that makes a difference? I know the slave itself doesn't archive WAL 
files, but I would have expected the master to get the message a backup was 
being run and do any needed archiving itself.

> 
> Cheers,
> 
> Jeff



Re: [GENERAL] Backup "Best Practices"

2016-11-28 Thread Thomas Kellerer

Israel Brewster wrote on 28.11.2016 at 23:50:


pg_archivecleanup -n /mnt/server/archiverdir 
00010010.0020.backup


Ok, but where does that "00010010.0020.backup"
come from? I mean, I can tell it's a WAL segment file name (plus a
backup label), but I don't have anything like that in my WAL
archives, even though I've run pg_basebackup a couple of times. Do I
have to call something to create that file? Some flag to
pg_basebackup? At the moment I am running pg_basebackup such that it
generates gzipped tar files, if that makes a difference.


The .backup file will be inside the tar file if I'm not mistaken






Re: [GENERAL] Backup "Best Practices"

2016-11-28 Thread Jeff Janes
On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster 
wrote:

>
> - What is the "best" (or just a good) method of keeping the WAL archives
>> under control? Obviously when I do a new basebackup I can "cleanup" any old
>> files that said backup doesn't need,
>>
>
> You have said you might be interested in doing PITR. So you want to delay
> the cleanup so as to not compromise that ability.  You need to develop a
> policy on how far back you want to be able to do a PITR.
>
>
>
>> but how do I know what those are?
>>
>
> pg_archivecleanup -n /mnt/server/archiverdir 00010010.0020.backup
>
>
> Ok, but where does that "00010010.0020.backup" come
> from? I mean, I can tell it's a WAL segment file name (plus a backup
> label), but I don't have anything like that in my WAL archives, even though
> I've run pg_basebackup a couple of times.
>

I get one file like that for every pg_basebackup I run.  Could your
archive_command be doing something to specifically short-circuit the
writing of those files?  Like testing the length of %p or %f?




> Do I have to call something to create that file? Some flag to
> pg_basebackup? At the moment I am running pg_basebackup such that it
> generates gzipped tar files, if that makes a difference.
>


That is how I run it as well.  I don't think there is a flag to
pg_basebackup which even allows you to bypass the creation of those files.
You are looking in the WAL archive itself, correct?  Not somewhere in a
listing of the base.tar.gz file?

Cheers,

Jeff


Re: [GENERAL] Backup "Best Practices"

2016-11-28 Thread Israel Brewster
On Nov 28, 2016, at 2:13 PM, John R Pierce  wrote:
> 
> On 11/28/2016 2:50 PM, Israel Brewster wrote:
>>> 
>>> Why wouldn't the streaming replica salvage that?  Are they expected to fail 
>>> together?  Is the NFS share onto which you want to store your basebackup 
>>> and WAL also expected to fail together with them?
>> 
>> That's why I specified *total* failure. If only the primary dies, that's a 
>> simple cutover to the secondary, and not a total failure - no need to go to 
>> backups of any sort under that scenario :-) I'm thinking more along the 
>> lines of server room burns down, natural disaster, etc - something that 
>> causes a total failure of the db system, necessitating recovery from an 
>> offsite backup.
> 
> your base backups and WAL archive should be replicated offsite, then.
> 
> 

Obviously :-)
> 
> -- 
> john r pierce, recycling bits in santa cruz



Re: [GENERAL] Backup "Best Practices"

2016-11-28 Thread Israel Brewster
On Nov 25, 2016, at 1:00 PM, Jeff Janes  wrote:
> 
> On Wed, Nov 23, 2016 at 10:16 AM, Israel Brewster wrote:
> I was wondering if someone could inform me about, or point me to an online 
> article about best practices for backing up a postgresql database cluster? At 
> the moment, I have two servers running with streaming replication for 
> failover purposes, and then I do nightly database dumps for recovery/backup 
> purposes. However, I feel this approach is lacking in a couple areas I can 
> think of:
> 
> - If a total failure occurs towards the end of the day, we could potentially 
> lose a whole day's worth of data.
> 
> Why wouldn't the streaming replica salvage that?  Are they expected to fail 
> together?  Is the NFS share onto which you want to store your basebackup and 
> WAL also expected to fail together with them?

That's why I specified *total* failure. If only the primary dies, that's a 
simple cutover to the secondary, and not a total failure - no need to go to 
backups of any sort under that scenario :-) I'm thinking more along the lines 
of server room burns down, natural disaster, etc - something that causes a 
total failure of the db system, necessitating recovery from an offsite backup.

>  
> Similar argument for user error - there is no way to "undo" a catastrophic 
> user data error without going all the way back to the previous day
> - Less-than-ideal recovery under some scenarios. Since each database in the 
> cluster is dumped individually, this is good should only *one* database need 
> to be restored, but could get tedious should the entire cluster need to be 
> restored.
> 
> To mitigate these issues, I am thinking of supplementing the individual dumps 
> with a full base backup and WAL archiving to a NFS share. This should enable 
> (relatively) quick/easy recovery from backup, plus the ability to do PIT 
> Recovery. I do have a few questions with this approach, however:
> 
> - How do I figure out how often I should take a full base backup? I know this 
> will depend on the amount of traffic my database is doing, and how long I am 
> willing to wait for WAL files to be replayed - the more WAL files needing 
> replayed, the longer recovery will take - but is there some rule of thumb 
> that I can use to calculate how often I need a new base backup? Perhaps based 
> on the number of WAL files?
> 
> You have to try it and see.  Different types of wal records will take 
> different amounts of time to re-play, so there is no rule of thumb. It would 
> depend on the type of traffic you have in your database.  And it could be 
> limited by a single CPU, or by IO.  If the restore_command needs to restore 
> the WAL from a remote server, it is very likely to be limited by the latency 
> of doing that.  In fact, this is often the bottleneck even if it is restoring 
> from the local server, at least if archival is often driven by 
> archive_timeout.
> 
> When I need to re-clone production to get a fresh server to use for dev or 
> testing, I do so using almost exactly the same method I would use for 
> restoring production from a disaster (restore from most recent basebackup, 
> then recovery from WAL archive).  So I know how long it takes for the 
> recovery to happen based on true experience, and I take a new basebackup when 
> that length of time starts to annoy me.

Gotcha. Guess I'll have to start running some tests :-)

>  
> - What is the "best" (or just a good) method of keeping the WAL archives 
> under control? Obviously when I do a new basebackup I can "cleanup" any old 
> files that said backup doesn't need,
> 
> You have said you might be interested in doing PITR. So you want to delay the 
> cleanup so as to not compromise that ability.  You need to develop a policy 
> on how far back you want to be able to do a PITR.
> 
>  
> but how do I know what those are?
> 
> pg_archivecleanup -n /mnt/server/archiverdir 
> 00010010.0020.backup

Ok, but where does that "00010010.0020.backup" come from? I 
mean, I can tell it's a WAL segment file name (plus a backup label), but I 
don't have anything like that in my WAL archives, even though I've run 
pg_basebackup a couple of times. Do I have to call something to create that 
file? Some flag to pg_basebackup? At the moment I am running pg_basebackup such 
that it generates gzipped tar files, if that makes a difference.

> 
>  
> - Should I be looking at any other backup methods in addition to/instead of 
> the basebackup/WAL archive scheme?
> 
> 
> You may want to consider pg_receivexlog to maintain your WAL archive, rather 
> than archive_command.  That way you don't have to worry about the trade-offs 
> caused by setting archive_timeout.  But unless you use it with a replication 
> slot, it is not very safe, as pg_receivexlog could stop working and your 
> database would happily run along without protection.  

Re: [GENERAL] Backup "Best Practices"

2016-11-25 Thread Jeff Janes
On Wed, Nov 23, 2016 at 10:16 AM, Israel Brewster 
wrote:

> I was wondering if someone could inform me about, or point me to an online
> article about best practices for backing up a postgresql database cluster?
> At the moment, I have two servers running with streaming replication for
> failover purposes, and then I do nightly database dumps for recovery/backup
> purposes. However, I feel this approach is lacking in a couple areas I can
> think of:
>
> - If a total failure occurs towards the end of the day, we could
> potentially lose a whole day's worth of data.
>

Why wouldn't the streaming replica salvage that?  Are they expected to fail
together?  Is the NFS share onto which you want to store your basebackup
and WAL also expected to fail together with them?


> Similar argument for user error - there is no way to "undo" a catastrophic
> user data error without going all the way back to the previous day
> - Less-than-ideal recovery under some scenarios. Since each database in
> the cluster is dumped individually, this is good should only *one* database
> need to be restored, but could get tedious should the entire cluster need
> to be restored.
>
> To mitigate these issues, I am thinking of supplementing the individual
> dumps with a full base backup and WAL archiving to a NFS share. This should
> enable (relatively) quick/easy recovery from backup, plus the ability to do
> PIT Recovery. I do have a few questions with this approach, however:
>
> - How do I figure out how often I should take a full base backup? I know
> this will depend on the amount of traffic my database is doing, and how
> long I am willing to wait for WAL files to be replayed - the more WAL files
> needing replayed, the longer recovery will take - but is there some rule of
> thumb that I can use to calculate how often I need a new base backup?
> Perhaps based on the number of WAL files?
>

You have to try it and see.  Different types of wal records will take
different amounts of time to re-play, so there is no rule of thumb. It
would depend on the type of traffic you have in your database.  And it
could be limited by a single CPU, or by IO.  If the restore_command needs
to restore the WAL from a remote server, it is very likely to be limited by
the latency of doing that.  In fact, this is often the bottleneck even if
it is restoring from the local server, at least if archival is often driven
by archive_timeout.

When I need to re-clone production to get a fresh server to use for dev or
testing, I do so using almost exactly the same method I would use for
restoring production from a disaster (restore from most recent basebackup,
then recovery from WAL archive).  So I know how long it takes for the
recovery to happen based on true experience, and I take a new basebackup
when that length of time starts to annoy me.
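
(For that procedure, a minimal recovery.conf dropped into the restored data
directory looks something like this; the path and timestamp are placeholders,
and recovery_target_time is only needed for PITR rather than replaying to the
end of the archive:

restore_command = 'cp /mnt/server/archiverdir/%f %p'
recovery_target_time = '2016-11-22 15:00:00'
)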




> - What is the "best" (or just a good) method of keeping the WAL archives
> under control? Obviously when I do a new basebackup I can "cleanup" any old
> files that said backup doesn't need,
>

You have said you might be interested in doing PITR. So you want to delay
the cleanup so as to not compromise that ability.  You need to develop a
policy on how far back you want to be able to do a PITR.



> but how do I know what those are?
>

pg_archivecleanup -n /mnt/server/archiverdir
00010010.0020.backup



> - Should I be looking at any other backup methods in addition to/instead
> of the basebackup/WAL archive scheme?
>


You may want to consider pg_receivexlog to maintain your WAL archive,
rather than archive_command.  That way you don't have to worry about the
trade-offs caused by setting archive_timeout.  But unless you use it with a
replication slot, it is not very safe, as pg_receivexlog could stop
working and your database would happily run along without protection.  Also,
it is hard to be sure you are reliably issuing fsyncs over NFS, so with
archive_command over NFS there is always the risk your WAL data is not
actually reaching disk in a timely fashion.  So if you can run
pg_receivexlog on the NFS-host machine, pointed at the local storage
rather than looping back over NFS, that is safer.
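
A sketch of that setup, with the slot, host, and directory names invented:
first create a slot on the primary,

SELECT pg_create_physical_replication_slot('wal_archiver');

then run the receiver on the archive host, writing to its local storage:

pg_receivexlog -h primary.example.com -U replication -D /local/wal_archive -S wal_archiver

With the slot in place the primary retains WAL until the receiver has fetched
it, so a stalled receiver shows up as WAL accumulating on the primary rather
than as a silent gap in the archive.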

Cheers,

Jeff


Re: [GENERAL] Backup "Best Practices"

2016-11-24 Thread Vick Khera
On Wed, Nov 23, 2016 at 1:16 PM, Israel Brewster 
wrote:

> I was wondering if someone could inform me about, or point me to an online
> article about best practices for backing up a postgresql database cluster?
> At the moment, I have two servers running with streaming replication for
> failover purposes, and then I do nightly database dumps for recovery/backup
> purposes. However, I feel this approach is lacking in a couple areas I can
> think of:
>

The first thing you have to do is list all of the problems you want to
solve by having backups. You did list some of them here, but you should
think long and hard about it. I did not see anything about off-site backups
for example. Those are necessary if you want to protect yourself against
something like a fire destroying everything at your primary location.
Consider your business costs for losing X hours of data vs the cost of
storing that data. The backup and recovery you come up with will have to be
tailored to your needs.

For my use case I do the following:

1 live backup using slony for immediate server failover.
Nightly pg_dump snapshots, kept for 2 weeks, for customer "pilot error"
recovery.
The nightly pg_dump file from the first of every month is kept for 2 years,
also to help recover from pilot error. So far the furthest back I've needed
to restore has been about 6 months, but disk space is cheap for me. These
files are kept on a third server that is not one of the two primary DB
servers.

The pg_dump files are rsync'd from my data center to a filer at my main
office about 40 miles away, for my off-site disaster recovery.
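
Nothing fancy; a nightly cron job along these lines (host and paths made up):

rsync -az /backups/pgdump/ backup-filer.example.com:/backups/pgdump/

The -z compression helps on our slow office link.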

If I could afford a faster and more reliable network connection at the
office, I'd set up live streaming backup offsite as well. However, the
physical location makes that not likely to happen any time soon :(

Also consider the extra burden that doing your backups puts on your
servers. Do they have enough I/O capacity to handle it, especially at peak
demand times?


Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Andreas Kretschmer


On 23 November 2016 at 20:31:21 CET, John R Pierce wrote:
>On 11/23/2016 11:20 AM, Israel Brewster wrote:
>> How do I determine which those are? Just based on the timestamp of the
>> WAL file, such that I could do something like take the timestamp of
>> the last basebackup and delete all WAL files older than that? Or is
>> there a better way?
>
>there's a tool, barman, I've never used but it's supposed to greatly
>simplify this whole process...
>
>http://www.pgbarman.org/


Barman is a really great tool.

-- 
This message was sent from my Android mobile phone with K-9 Mail.




Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Joe Conway
On 11/23/2016 02:50 PM, Israel Brewster wrote:
> On Nov 23, 2016, at 10:31 AM, John R Pierce wrote:
>> there's a tool, barman, I've never used but it's supposed to greatly
>> simplify this whole process...
>>
>> http://www.pgbarman.org/
> 
> Definitely looks like something to try. Thanks!

I'd recommend you also look at pgbackrest:
https://github.com/pgbackrest/pgbackrest

HTH,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Israel Brewster
On Nov 23, 2016, at 10:31 AM, John R Pierce  wrote:
> 
> On 11/23/2016 11:20 AM, Israel Brewster wrote:
>> How do I determine which those are? Just based on the timestamp of the WAL 
>> file, such that I could do something like take the timestamp of the last 
>> basebackup and delete all WAL files older than that? Or is there a better 
>> way?
> 
> there's a tool, barman, I've never used but it's supposed to greatly simplify 
> this whole process...
> 
> http://www.pgbarman.org/ 

Definitely looks like something to try. Thanks!

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---
> 
> -- 
> john r pierce, recycling bits in santa cruz



Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread John R Pierce

On 11/23/2016 11:20 AM, Israel Brewster wrote:
How do I determine which those are? Just based on the timestamp of the 
WAL file, such that I could do something like take the timestamp of 
the last basebackup and delete all WAL files older than that? Or is 
there a better way?


there's a tool, barman, I've never used but it's supposed to greatly 
simplify this whole process...


http://www.pgbarman.org/

--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Israel Brewster

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


On Nov 23, 2016, at 9:42 AM, John R Pierce wrote:
> On 11/23/2016 10:16 AM, Israel Brewster wrote:
>> To mitigate these issues, I am thinking of supplementing the individual 
>> dumps with a full base backup and WAL archiving to a NFS share. This should 
>> enable (relatively) quick/easy recovery from backup, plus the ability to do 
>> PIT Recovery. I do have a few questions with this approach, however:
> 
> if you do the base backup periodically (daily? weekly? monthly? kinda 
> depends on your workload, data volume, and available space on your WAL 
> archive server)
> 
>> - How do I figure out how often I should take a full base backup? I know 
>> this will depend on the amount of traffic my database is doing, and how 
>> long I am willing to wait for WAL files to be replayed - the more WAL files 
>> needing replayed, the longer recovery will take - but is there some rule of 
>> thumb that I can use to calculate how often I need a new base backup? 
>> Perhaps based on the number of WAL files?
> 
> frequency of doing this is entirely dependent on your data volume, how long 
> the backups take, and your tolerance for restore times. wal archive 
> recoveries are generally quite fast, but of course if there's millions of 
> wal files to recover, that would take a long time.  note, too, 9.6 has some 
> major performance enhancements in how checkpoints are written, which should 
> speed things up overall.

Backups (using pg_basebackup to a tar file, gzipped) take about 45 minutes. 
Part of the goal here is to minimize restore time in a disaster recovery 
scenario, so *fairly* low tolerance for restore times (say, after getting the 
base backup in place, another 5 minutes or less to get it up-and-running). My 
difficulty is that I don't know if that translates into two WAL files or two 
thousand. But that was really my question - is there some way to know how many 
WAL files are too many, and by extension how often I need to do a new base 
backup? Obviously how quickly I get to that point of "too many" WAL files 
could vary widely, but if I could get some idea of what that line is, or at 
least how to figure out that line, that would really help with planning.

From what you said, it sounds like I can go for a while, and allow quite a few 
WAL files to pile up (especially with 9.6, which I will be moving to shortly) 
between base backups without creating too large a recovery delay. So I'm 
thinking I may try weekly or even monthly base backups and see how that goes.

>> - What is the "best" (or just a good) method of keeping the WAL archives 
>> under control? Obviously when I do a new basebackup I can "cleanup" any old 
>> files that said backup doesn't need, but how do I know what those are?
>> - Should I be looking at any other backup methods in addition to/instead of 
>> the basebackup/WAL archive scheme?
> 
> I would keep at least 2 base backups and all wal files since the start of 
> the oldest base backup.

Good thinking. I'll look into implementing that.

> when you create a new base backup, delete the oldest, and all wal archive 
> files up to the start of the following one.

How do I determine which those are? Just based on the timestamp of the WAL 
file, such that I could do something like take the timestamp of the last 
basebackup and delete all WAL files older than that? Or is there a better way?

> the reason for keeping at least two is so you always have a recovery point 
> if something catastrophic happens during the backup process.
> 
> note that having base backups plus wal archives allows PiTR too, point in 
> time recovery.  say you discover a bad script updated a ton of bad data 
> into your database last tuesday at 3pm.  you can restore the preceding base 
> backup, then recover up to tuesday just before this event.

Exactly - that's one of the primary reasons I'm taking this approach.

Thanks again for the info!

> -- 
> john r pierce, recycling bits in santa cruz

Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread John R Pierce

On 11/23/2016 10:16 AM, Israel Brewster wrote:


To mitigate these issues, I am thinking of supplementing the 
individual dumps with a full base backup and WAL archiving to a NFS 
share. This should enable (relatively) quick/easy recovery from 
backup, plus the ability to do PIT Recovery. I do have a few questions 
with this approach, however:


if you do the base backup periodically (daily? weekly?   monthly? kinda 
depends on your workload, data volume, and available space on your WAL 
archive server)


- How do I figure out how often I should take a full base backup? I 
know this will depend on the amount of traffic my database is doing, 
and how long I am willing to wait for WAL files to be replayed - the 
more WAL files needing replayed, the longer recovery will take - but 
is there some rule of thumb that I can use to calculate how often I 
need a new base backup? Perhaps based on the number of WAL files?


frequency of doing this is entirely dependent on your data volume, how 
long the backups take, and your tolerance for restore times. wal archive 
recoveries are generally quite fast, but of course if there's millions 
of wal files to recover, that would take a long time.  note, too, 
9.6 has some major performance enhancements in how checkpoints are 
written, which should speed things up overall.


- What is the "best" (or just a good) method of keeping the WAL 
archives under control? Obviously when I do a new basebackup I can 
"cleanup" any old files that said backup doesn't need, but how do I 
know what those are?
- Should I be looking at any other backup methods in addition 
to/instead of the basebackup/WAL archive scheme?


I would keep at least 2 base backups and all wal files since the start 
of the oldest base backup.  When you create a new base backup, delete 
the oldest, and all wal archive files up to the start of the following 
one.  The reason for keeping at least two is so you always have a 
recovery point if something catastrophic happens during the backup process.


note that having base backups plus wal archives allows PiTR too, point 
in time recovery.   say you discover a bad script updated a ton of bad 
data into your database last tuesday at 3pm.  you can restore the 
preceding base backup, then recover up to tuesday just before this event.
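
a rough sketch of that rotation, using the archive directory from earlier in 
the thread and assuming each pg_basebackup left a *.backup marker file in the 
archive:

# after a new base backup succeeds, prune WAL preceding the start of the
# second-newest backup, i.e. keep the two newest backups recoverable
KEEP=$(ls /mnt/server/archiverdir/*.backup | sort | tail -n 2 | head -n 1)
pg_archivecleanup /mnt/server/archiverdir "$(basename "$KEEP")"

the retired base backup tarballs themselves you delete by hand or from the 
same cron job.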



--
john r pierce, recycling bits in santa cruz





[GENERAL] Backup "Best Practices"

2016-11-23 Thread Israel Brewster
I was wondering if someone could inform me about, or point me to an online 
article about best practices for backing up a postgresql database cluster? At 
the moment, I have two servers running with streaming replication for failover 
purposes, and then I do nightly database dumps for recovery/backup purposes. 
However, I feel this approach is lacking in a couple areas I can think of:

- If a total failure occurs towards the end of the day, we could potentially 
lose a whole day's worth of data. Similar argument for user error - there is 
no way to "undo" a catastrophic user data error without going all the way 
back to the previous day
- Less-than-ideal recovery under some scenarios. Since each database in the 
cluster is dumped individually, this is good should only *one* database need 
to be restored, but could get tedious should the entire cluster need to be 
restored.

To mitigate these issues, I am thinking of supplementing the individual dumps 
with a full base backup and WAL archiving to a NFS share (see the sketch at 
the end of this message). This should enable (relatively) quick/easy recovery 
from backup, plus the ability to do PIT Recovery. I do have a few questions 
with this approach, however:

- How do I figure out how often I should take a full base backup? I know this 
will depend on the amount of traffic my database is doing, and how long I am 
willing to wait for WAL files to be replayed - the more WAL files needing 
replayed, the longer recovery will take - but is there some rule of thumb 
that I can use to calculate how often I need a new base backup? Perhaps based 
on the number of WAL files?
- What is the "best" (or just a good) method of keeping the WAL archives 
under control? Obviously when I do a new basebackup I can "cleanup" any old 
files that said backup doesn't need, but how do I know what those are?
- Should I be looking at any other backup methods in addition to/instead of 
the basebackup/WAL archive scheme?

Thanks for any information!
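
For concreteness, the archiving setup I'm picturing is roughly this in 
postgresql.conf (the path is a placeholder; wal_level is already above 
minimal here because of the streaming replication):

wal_level = hot_standby        # 'replica' as of 9.6
archive_mode = on
archive_command = 'test ! -f /mnt/server/archiverdir/%f && cp %p /mnt/server/archiverdir/%f'

plus a periodic gzipped-tar base backup to the same share, e.g.
pg_basebackup -D /backups/base/$(date +%Y%m%d) -Ft -z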
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---