Re: [GENERAL] Basic Question on Point In Time Recovery

2015-04-07 Thread Jim Nasby

On 3/11/15 6:46 AM, Andrew Sullivan wrote:

Is our current frequent pg_dump approach a sensible way to go about
things.  Or are we missing something?  Is there some other way to
restore one database without affecting the others?

Slony-I, which is a PITA to administer, has a mode where you can ship
logs off and restore them in pieces.  The logs are not WAL, but Slony
logs (which are produced by triggers and some explicit event writing
for schema changes).  So they work per-database.  Schema changes are
really quite involved for Slony, and there's overhead resulting from
the triggers, and as I said it's rather clunky to administer.  But it's
been around some time, it still is actively maintained, and it has
this functionality.  The PITR tools were, last I checked, pretty
primitive.  But the tool might work for your case.  I don't know
whether Bucardo or Londiste (two alternative systems that work on
roughly the same principle) have this functionality, but I kind of
doubt it since both were designed to get rid of several of the
complexities that Slony presented.  (Slony had all those complexities
because it was trying to offer all this functionality at once.)


You could do something very similar with londiste by setting up a second 
queue and delaying when you move data to it from the primary queue, 
based on event_time.


Or now that I think about it... you might be able to do that right in 
the replay process itself.


The big reason I prefer londiste over Slony is that it's extremely 
modular, so it's easy to do stuff like this.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-14 Thread Francisco Olarte
Hi Robert:

On Thu, Mar 12, 2015 at 12:52 PM, Robert Inder rob...@interactive.co.uk wrote:
 On 11 March 2015 at 17:32, Francisco Olarte fola...@peoplecall.com wrote:
 This is, build a streaming replication slave, pg_dump from the slave. If
 needed, restore in the master.
...
 I really like the idea of running pg_dump on the slave, but I don't understand
 how I could do it.

 Or is this something that has changed with Postgres 9?
 We're currently running Postgres 8.4.
 Is this my specific reason to embark on an upgrade?

It's been answered before, but yes, streaming replication is a (some
might say THE) 9.0 feature. My fault, I've been using it for so long
that I forgot the past. If you are using log shipping I would
recommend some testing and considering an upgrade; I've found
replication is much simpler and easier with it.
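
For example, on a 9.1 or later pair a quick sanity check is something
like this (a sketch only; run from a shell on each box):

    # on the standby: how far behind replay is
    psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"

    # on the master: one row per connected streaming standby
    psql -c "SELECT client_addr, state, replay_location FROM pg_stat_replication;"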

Regards.
Francisco Olarte.




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-14 Thread Francisco Olarte
Hi Steven:

On Fri, Mar 13, 2015 at 9:03 PM, Steven Lembark lemb...@wrkhors.com wrote:
  The thing is you can use desktop class machines for the slave. If you do
.

 If load on the backup server becomes an issue you might be able to
 make incremental pg_dump's onto tmpfs.

I'm curious: how do you make incremental backups using pg_dump? I do
not know of it, haven't been able to find it in the docs, and it would
be a great solution for a couple of problems I have.

 Advantage there is that the
 dump itself has effectively no write I/O overhead: you can dump to
 tmpfs and then [bg]zip to stable storage w/o beating up the disks,
 which becomes a real problem with commodity-grade hardware.

I do not think ramdisks are a good thing for that. Let me explain:
if you have tmpfs, you have decent pipe support in the shell. If you
can fit the uncompressed dump in a ramdisk, you are going to be able
to fit the compressed one in a write-through disk cache. So you just
dump compressed (either piping through a compressor or using
pg_dump's custom compressed format) and let the write cache do its
magic (it will also beat the disk less, or at worst the same if your
RAM is really big for the task). If you use a ramdisk and then
compress, you have less RAM to cache the compressed dump, the write
cache is going to perform worse, and the disk is going to be beaten
more. Also, tmpfs means less RAM for the disk cache and shared
buffers for the database. On a machine that needs the RAM my money
is on it making things slower (in a scenario like an 8GB db on a
64GB machine whatever you do is going to be fast enough, so just
pick whichever is simpler to code).

And regarding compression, pg_dump's own normally performs well
enough unless you have strange data. I've done several tests and
found that with normal data it does not pay to either raise the
level (little space gain for the cycles taken) or try alternative
compressors (I nearly always use the custom format due to its
ability to easily do partial restores; compressing it externally
means you need to decompress before you can use it). Also, I did
some compressor tests for intermediate files, and found that for
many read/write workloads using gzip/lzo was faster than
uncompressed (especially on modern multicore machines), xz was the
compression king, and I was severely disappointed by bzip2 (for any
option combo in bzip2 I had another in xz which beat it in BOTH size
AND time, YMMV).
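
For example (a sketch; database names and paths are only
placeholders), the custom format with its built-in compression, plus
a piped external compressor for comparison, looks like this:

    # custom-format dump, compression level set with -Z (0-9)
    pg_dump -Fc -Z 6 -f /backups/clientdb.dump clientdb

    # plain-format dump piped straight through gzip, so nothing
    # uncompressed ever touches a disk or a ramdisk
    pg_dump clientdb | gzip > /backups/clientdb.sql.gz

    # the custom format allows selective restores later, e.g. one table
    pg_restore -d clientdb -t some_table /backups/clientdb.dump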

Regards.

   Francisco Olarte.




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-13 Thread Robert Inder
On 12 March 2015 at 12:31, Thomas Kellerer spam_ea...@gmx.net wrote:

 8.4 cannot run queries on the standby, you need to upgrade to a 
 supported/maintained version for this
 (this feature was introduced in 9.0)

 In 9.x you can start the slave as a hot standby to allow read only queries 
 which is what pg_dump needs.

 You should really upgrade to a current version 9.4 or 9.3

Thought you were going to say that:-(

Well, I guess we have to do it some time, and now there is a reason
for it to happen sooner rather than later...

But even while I'm doing that, Steven's suggestion of making the dump
to a RAM file system, then filing it as a separate step, looks simple
enough to be worth trying as a stop-gap...

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-13 Thread Steven Lembark

 The thing is you can use desktop class machines for the slave. If you do
 not have spare machines I would suggest a desktop class machine with big
 RAM and whatever disks you need for the DB plus an extra disk to pg_dump to
 ( so pg_dump does not compete with DB for the db disks, this really kills
 performance ). Replication slaves do not need that much RAM ( as the only
 query it is going to run is the pg_dump ones, but desktop ram is cheap ).
 We did this with a not so powerful desktop with an extra sata disk to store
 the pg_dumps and it worked really well, and we are presently using two
 servers, using one of the extra gigabit interfaces with a crossover cable
 for the replication connection plus an extra sata disk to make hourly
 pg_dumps and it works quite well.

If load on the backup server becomes an issue you might be able to
make incremental pg_dumps onto tmpfs. The advantage there is that the
dump itself has effectively no write I/O overhead: you can dump to
tmpfs and then [bg]zip to stable storage without beating up the disks,
which becomes a real problem with commodity-grade hardware.
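
Something along these lines, for instance (the mount point, size and
paths are only examples):

    # ramdisk sized for the largest plain dump
    mount -t tmpfs -o size=8G tmpfs /mnt/dumptmp

    # dump into the ramdisk, then compress onto stable storage as a
    # separate step and drop the uncompressed copy
    pg_dump -f /mnt/dumptmp/clientdb.sql clientdb
    gzip -c /mnt/dumptmp/clientdb.sql > /backups/clientdb.sql.gz
    rm /mnt/dumptmp/clientdb.sql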

-- 
Steven Lembark 3646 Flora Pl
Workhorse Computing   St Louis, MO 63110
lemb...@wrkhors.com  +1 888 359 3508




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Stéphane Schildknecht
Hello,

On 11/03/2015 11:54, Robert Inder wrote:
 We are developing a new software system which is now used by a number
 of independent clients for gathering and storing live data as part of
 their day to day work.
 
 We have a number of clients sharing a single server.  It is running
 one Postgres service, and each client is a separate user with access
 to their own database.  Each client's database will contain hundreds
 of thousands of records, and will be supporting occasional queries by
 a small number of users.   So the system is currently running on
 modest hardware.
 
 To guard against the server failing, we have a standby server being
 updated by WAL files, so if the worst comes to the worst we'll only
 lose a few minutes work.  No problems there.
 
 But, at least while the system is under rapid development, we also
 want to have a way to roll a particular client's database back to a
 (recent) known good state, but without affecting any other client.
 
 My understanding is that the WAL files mechanism is installation-wide
 -- it will affect all clients alike.
 
 So to allow us to restore data for an individual client, we're running
 pg_dump once an hour on each database in turn.  In the event of a
 problem with one client's system, we can restore just that one
 database, without affecting any other client.
 
 The problem is that we're finding that as the number of clients grows,
 and with it the amount of data, pg_dump is becoming more intrusive.
 Our perception is that when pg_dump is running for any database,
 performance on all databases is reduced.  I'm guessing this is because
 the dump is making heavy use of the disk.

One way you could go is to have a server acting as a WAL archiver.

pg_basebackup your slave every day, and store all WAL until a new
pg_basebackup is taken.

Whenever you have to restore a single customer, you can recover the
whole instance up to the time *before* the worst happened, pg_dump
that customer, and pg_restore it.

Doing that, you won't have to pg_dump every hour or so for all of your
databases.
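
As a sketch of what that looks like (host names, paths and the target
time are only examples; pg_basebackup itself needs 9.1 or later):

    # daily base backup, taken from the slave
    pg_basebackup -h standby-host -D /backups/base/$(date +%F) -P

    # to restore one client: recover a scratch instance to just before
    # the problem, with a recovery.conf along these lines:
    #   restore_command = 'cp /backups/wal/%f %p'
    #   recovery_target_time = '2015-03-11 09:55:00'
    # then dump and restore only the affected database:
    pg_dump -Fc -h scratch-host -f /tmp/clientdb.dump clientdb
    pg_restore -h production-host -d clientdb --clean /tmp/clientdb.dump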



 
 There is obviously scope for improving performance by getting
 more, or more powerful, hardware.  That's obviously going to be
 necessary at some point, but it is obviously an expense that our
 client would like to defer as long as possible.
 
 So before we go down that route, I'd like to check that we're not
 doing something dopey.
 
 Is our current frequent pg_dump approach a sensible way to go about
 things.  Or are we missing something?  Is there some other way to
 restore one database without affecting the others?
 
 Thanks in advance.
 
 Robert.
 


-- 
Stéphane Schildknecht
Contact régional PostgreSQL pour l'Europe francophone
Loxodata - Conseil, expertise et formations
06.17.11.37.42





Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Joseph Kregloh
Have you looked into Barman? http://www.pgbarman.org/ It does what you
want. You can take a full daily backup and it keeps track of the WAL files
to allow for a PITR. It also allows you to run the backup from one of your
slaves.

The way we have it set up is as follows: we have three servers, one master
and two slaves. The master ships WAL files to both slaves. One of the
slaves has Barman installed on it. Barman takes a copy of the WAL files and
archives them; then nightly we do a full backup from the slave.

This takes the load off the master and allows us to have PITR with a
full backup that is at most one day old.
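
For reference, the day-to-day commands are roughly as follows (a
sketch; "main" is whatever the server is called in barman.conf, and
the target time is just an example):

    # nightly full backup of the server defined in barman.conf
    barman backup main

    # when something goes wrong: list backups, then recover to a point
    # in time into a scratch directory and pg_dump the one database
    barman list-backup main
    barman recover --target-time "2015-03-11 09:55:00" main latest /var/tmp/recovered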

Thanks,
-Joseph Kregloh

On Wed, Mar 11, 2015 at 9:26 AM, Stéphane Schildknecht 
stephane.schildkne...@postgres.fr wrote:

 Hello,

 On 11/03/2015 11:54, Robert Inder wrote:
  We are developing a new software system which is now used by a number
  of independent clients for gathering and storing live data as part of
  their day to day work.
 
  We have a number of clients sharing a single server.  It is running
  one Postgres service, and each client is a separate user with access
  to their own database.  Each client's database will contain hundreds
  of thousands of records, and will be supporting occasional queries by
  a small number of users.   So the system is currently running on
  modest hardware.
 
  To guard against the server failing, we have a standby server being
  updated by WAL files, so if the worst comes to the worst we'll only
  lose a few minutes work.  No problems there.
 
  But, at least while the system is under rapid development, we also
  want to have a way to roll a particular client's database back to a
  (recent) known good state, but without affecting any other client.
 
  My understanding is that the WAL files mechanism is installation-wide
  -- it will affect all clients alike.
 
  So to allow us to restore data for an individual client, we're running
  pg_dump once an hour on each database in turn.  In the event of a
  problem with one client's system, we can restore just that one
  database, without affecting any other client.
 
  The problem is that we're finding that as the number of clients grows,
  and with it the amount of data, pg_dump is becoming more intrusive.
  Our perception is that when pg_dump is running for any database,
  performance on all databases is reduced.  I'm guessing this is because
  the dump is making heavy use of the disk.

 One way you could choose is to have a server acting as WAL archiver.

 pg_basebackup your slave every day, and store all WAL until new
 pg_basebackup
 is taken.

 Whenever you have to restore a single customer, you could recover the whole
 instance up to the time *before* the worst happened and pg_dump the
 customer,
 and pg_restore it.

 Doing that, you won't have to pg_dump every hour or so for all of your
 databases.



 
  There is obviously scope for improving performance by getting
  more, or more powerful, hardware.  That's obviously going to be
  necessary at some point, but it is obviously an expense that our
  client would like to defer as long as possible.
 
  So before we go down that route, I'd like to check that we're not
  doing something dopey.
 
  Is our current frequent pg_dump approach a sensible way to go about
  things.  Or are we missing something?  Is there some other way to
  restore one database without affecting the others?
 
  Thanks in advance.
 
  Robert.
 


 --
 Stéphane Schildknecht
 Contact régional PostgreSQL pour l'Europe francophone
 Loxodata - Conseil, expertise et formations
 06.17.11.37.42




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Robert Inder
Thanks for your comments (so far:-)

I guess I'm pleased that nobody has said that I'm doing something stupid!

I'll certainly look at Slony and Barman.

And Stéphane's suggestion of doing regular base backups and keeping
the WAL files seems neat.  If I understand it, we'd use a standby
server to replay the entire installation up to the point when the
problem occurs, and then use pg_dump to dump just the database we
need.

I'm wondering just how the size of a day's worth of WAL files would
compare to a whole slew of hourly dumps.
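
(A crude way to find out, assuming the WAL archive and the dumps live
in directories like these, is just to compare their sizes; WAL
segments are 16MB each:)

    du -sh /backups/wal/           # a day's worth of archived WAL
    du -sh /backups/hourly-dumps/  # the dumps it would replace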

The other issue would be how long the replay would take.  But, I
realise, that's not a major concern: the delay would only be seen by a
client that
had had a major problem.  Everyone else would see service as normal.

I think I'll be doing some experiments to find out:-)

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Robert Inder
Hi, Francisco,

On 11 March 2015 at 17:32, Francisco Olarte fola...@peoplecall.com wrote:

 This is, build a streaming replication slave, pg_dump from the slave. If
 needed, restore in the master.

I really like the idea of running pg_dump on the slave, but I don't understand
how I could do it.

Postgres on our live machine is regularly pushing WAL files to the standby.

Postgres on the standby machine is continually reading those files.
But that is all it will do. pg_dump just says "The database is starting up".

Could/should I have something configured differently?

Or is this something that has changed with Postgres 9?
We're currently running Postgres 8.4.
Is this my specific reason to embark on an upgrade?

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Thomas Kellerer
Robert Inder schrieb am 12.03.2015 um 12:52:
 Postgres on the standby machine is continually reading those files.
 But that is all it will do. pg_dump just says "The database is starting up".
 
 Could/should I have something configured differently?
 
 Or is this something that has changed with Postgres 9?
 We're currently running Postgres 8.4.
 Is this my specific reason to embark on an upgrade?

8.4 cannot run queries on the standby; you need to upgrade to a
supported/maintained version for this
(the feature was introduced in 9.0).

In 9.x you can start the slave as a hot standby to allow read-only queries,
which is what pg_dump needs.

You should really upgrade to a current version (9.4 or 9.3).
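
The settings involved are roughly these (a minimal sketch; the host
name and replication role are placeholders, and a matching
"replication" entry in pg_hba.conf on the master is also needed):

    # postgresql.conf on the master (9.x)
    wal_level = hot_standby
    max_wal_senders = 3

    # postgresql.conf on the standby
    hot_standby = on

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=master-host user=replicator'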

Thomas





Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-11 Thread Francisco Olarte
Hi Robert...

On Wed, Mar 11, 2015 at 11:54 AM, Robert Inder rob...@interactive.co.uk
wrote:

 Is our current frequent pg_dump approach a sensible way to go about
 things.  Or are we missing something?  Is there some other way to
 restore one database without affecting the others?


As you've been told before, pg_dump is the way to go and it hits hard on
the IO load. Also, depending on where you are dumping to, you may be
shooting yourself in the foot (dump to another disk, or on another machine).

You may try streaming replication + pg_dump; we are currently doing this,
although not in your exact scenario.

This is, build a streaming replication slave, pg_dump from the slave. If
needed, restore in the master.

The thing is, you can use desktop-class machines for the slave. If you do
not have spare machines I would suggest a desktop-class machine with big
RAM and whatever disks you need for the DB, plus an extra disk to pg_dump to
(so pg_dump does not compete with the DB for the db disks, which really
kills performance). Replication slaves do not need that much RAM (the only
queries they are going to run are the pg_dump ones, and desktop RAM is
cheap). We did this with a not-so-powerful desktop with an extra SATA disk
to store the pg_dumps and it worked really well, and we are presently using
two servers, using one of the extra gigabit interfaces with a crossover
cable for the replication connection plus an extra SATA disk to make hourly
pg_dumps, and it works quite well.
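
On the standby that can be as simple as a cron job writing to the dump
disk, something like this (names and paths are only examples; note the
escaped % signs cron requires):

    # crontab on the standby: hourly custom-format dump of one database
    0 * * * *  pg_dump -Fc -f /mnt/dumpdisk/clientdb-$(date +\%Y\%m\%d\%H).dump clientdb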

Francisco Olarte.


[GENERAL] Basic Question on Point In Time Recovery

2015-03-11 Thread Robert Inder
We are developing a new software system which is now used by a number
of independent clients for gathering and storing live data as part of
their day to day work.

We have a number of clients sharing a single server.  It is running
one Postgres service, and each client is a separate user with access
to their own database.  Each client's database will contain hundreds
of thousands of records, and will be supporting occasional queries by
a small number of users.   So the system is currently running on
modest hardware.

To guard against the server failing, we have a standby server being
updated by WAL files, so if the worst comes to the worst we'll only
lose a few minutes work.  No problems there.

But, at least while the system is under rapid development, we also
want to have a way to roll a particular client's database back to a
(recent) known good state, but without affecting any other client.

My understanding is that the WAL files mechanism is installation-wide
-- it will affect all clients alike.

So to allow us to restore data for an individual client, we're running
pg_dump once an hour on each database in turn.  In the event of a
problem with one client's system, we can restore just that one
database, without affecting any other client.
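
(The hourly job is essentially a loop like the following; database
names and paths here are just placeholders:)

    for db in client_a client_b client_c; do
        pg_dump -Fc -f /backups/"$db"-$(date +%Y%m%d%H).dump "$db"
    done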

The problem is that we're finding that as the number of clients grows,
and with it the amount of data, pg_dump is becoming more intrusive.
Our perception is that when pg_dump is running for any database,
performance on all databases is reduced.  I'm guessing this is because
the dump is making heavy use of the disk.

There is obviously scope for improving performance by getting
more, or more powerful, hardware.  That's going to be
necessary at some point, but it is an expense that our
client would like to defer as long as possible.

So before we go down that route, I'd like to check that we're not
doing something dopey.

Is our current frequent pg_dump approach a sensible way to go about
things?  Or are we missing something?  Is there some other way to
restore one database without affecting the others?

Thanks in advance.

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-11 Thread Andrew Sullivan
On Wed, Mar 11, 2015 at 10:54:56AM +, Robert Inder wrote:
 But, at least while the system is under rapid development, we also
 want to have a way to roll a particular client's database back to a
 (recent) known good state, but without affecting any other client.
 
 My understanding is that the WAL files mechanism is installation-wide
 -- it will affect all clients alike.

Right.  It's the WAL, so everything in the relevant Postgres system is
involved.

 So before we go down that route, I'd like to check that we're not
 doing something dopey.

No, frequent pg_dumps are indeed hard on I/O.

 Is our current frequent pg_dump approach a sensible way to go about
 things.  Or are we missing something?  Is there some other way to
 restore one database without affecting the others?

Slony-I, which is a PITA to administer, has a mode where you can ship
logs off and restore them in pieces.  The logs are not WAL, but Slony
logs (which are produced by triggers and some explicit event writing
for schema changes).  So they work per-database.  Schema changes are
really quite involved for Slony, and there's overhead resulting from
the triggers, and as I said it's rather clunky to administer.  But it's
been around for some time, it is still actively maintained, and it has
this functionality.  The PITR tools were, last I checked, pretty
primitive.  But the tool might work for your case.  I don't know
whether Bucardo or Londiste (two alternative systems that work on
roughly the same principle) have this functionality, but I kind of
doubt it since both were designed to get rid of several of the
complexities that Slony presented.  (Slony had all those complexities
because it was trying to offer all this functionality at once.)

Best regards,

A


-- 
Andrew Sullivan
a...@crankycanuck.ca




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-11 Thread Joseph Kregloh
On Wed, Mar 11, 2015 at 9:55 AM, Andrew Sullivan a...@crankycanuck.ca
wrote:

 On Wed, Mar 11, 2015 at 09:40:09AM -0400, Joseph Kregloh wrote:
  Have you looked into Barman? http://www.pgbarman.org/ It does what you
  want.  You can take a full daily backup and it keeps track of the WAL
 files
  to allow for a PITR.

 I just had a look at the documentation (and the rest of your mail),
 and this doesn't actually seem to do what the OP wanted, which is to
 get PITR _per database_ in the same cluster.  Upthread someone
 suggested a way around this, which is to PITR a cluster to a
 known-good point and then pg_dump the target database.


This is where Barman comes in. It would take care of handling all of the
stuff required for a PITR. Then he can do the pg_dump for a selected
database. Barman does back up the entire cluster.


 But if Barman
 can do this automatically, that'd be cool (it's just not in the docs).

 Barman does look like a nice convenience package for managing
 WAL-shipping type backup installations instead of building one's own
 scripts, so this note isn't intended as a criticism of the package.
 I'm just not sure it does the thing requested in this case.

 A

 --
 Andrew Sullivan
 a...@crankycanuck.ca





Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-11 Thread Andrew Sullivan
On Wed, Mar 11, 2015 at 09:40:09AM -0400, Joseph Kregloh wrote:
 Have you looked into Barman? http://www.pgbarman.org/ It does what you
 want.  You can take a full daily backup and it keeps track of the WAL files
 to allow for a PITR.

I just had a look at the documentation (and the rest of your mail),
and this doesn't actually seem to do what the OP wanted, which is to
get PITR _per database_ in the same cluster.  Upthread someone
suggested a way around this, which is to PITR a cluster to a
known-good point and then pg_dump the target database.  But if Barman
can do this automatically, that'd be cool (it's just not in the docs).

Barman does look like a nice convenience package for managing
WAL-shipping type backup installations instead of building one's own
scripts, so this note isn't intended as a criticism of the package.
I'm just not sure it does the thing requested in this case.

A

-- 
Andrew Sullivan
a...@crankycanuck.ca

