Re: [GENERAL] Large Database Restore

2007-05-21 Thread Richard Huxton

Lee Keel wrote:

So then the best way to do this kind of backup\restore is to use pg_dump?
Is there any plan in the future to be able to do some sort of file-level
backup like SqlServer?


Oh, you *can* do a file-level backup, but only of the entire cluster. If 
you have information shared between databases (users, transaction IDs, 
etc.) then I don't see any way for single-database file copying to work.


--
  Richard Huxton
  Archonet Ltd
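
For what it's worth, the cluster-wide file-level approach boils down to a cold
copy of the whole data directory. A minimal sketch, assuming a stock layout;
the data directory path, backup file name, and shutdown mode are placeholders
to adapt:

#!/bin/sh
# Cold file-level backup of the entire cluster (sketch -- paths are assumptions).
# The postmaster must be stopped so the on-disk files are consistent.
PGDATA=/var/lib/pgsql/data                     # adjust to your installation
BACKUP=/backups/pgdata-$(date +%Y%m%d).tar.gz

pg_ctl -D "$PGDATA" stop -m fast               # stop the postmaster cleanly
tar czf "$BACKUP" -C "$PGDATA" .               # base/, global/, pg_xlog/, pg_clog/, ...
pg_ctl -D "$PGDATA" start                      # bring the cluster back up
# NB: any tablespaces living outside $PGDATA have to be archived separately.

Restoring is the mirror image: stop the server, unpack the tarball into an
empty data directory owned by the postgres user, and start it again -- but, as
above, only for the cluster as a whole.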



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Lee Keel
Thanks to everyone for their input on this.  After reading all the emails
and some of the documentation (section 23.3), I think this is all a little
more than what I need.  My database is basically read-only and all I was
looking to do is to be able to take snap-shots of it and be able to restore
on a developer's machine and not take 30 hours.  So I was thinking I would
zip the data directories associated to my database, then the developer could
just copy the zip file and unzip in their own data directory.  My question
now is: what file would a developer need to change to add this new directory
to their database list, or will it just automatically show up when they
refresh the service?

 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Nolan
Sent: Thursday, May 17, 2007 7:03 PM
To: Ron Johnson; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Large Database Restore

On 5/17/07, Ron Johnson [EMAIL PROTECTED] wrote:


On 05/17/07 16:49, Michael Nolan wrote:
 I don't know if my database is typical (as there probably is no such
 thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my 
 backup server, but to restore a low level backup (about 35GB)

Meaning a tarball of $PGDATA?

Actually, it's two different areas because I've got a second tablespace on a
separate physical drive for indexes, but yes, it's a tarball of all the
files that PG uses, following the procedures in section 23.3 of the
documentation.

It works very well, though I still don't understand why, if there are no
changes to the warm standby server tables, only queries, it isn't possible
to keep restoring WAL files to keep the warm standby server in parallel with
the live server.  (I'm guessing there must be SOMETHING that happens at the
end of the recovery process, or some time after that, to make the next WAL
unprocessable, but I can't figure it out from the docs.)
--
Mike Nolan



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Richard Huxton

Lee Keel wrote:

Thanks to everyone for their input on this.  After reading all the emails
and some of the documentation (section 23.3), I think this is all a little
more than what I need.  My database is basically read-only and all I was
looking to do is to be able to take snap-shots of it and be able to restore
on a developer's machine and not take 30 hours.  So I was thinking I would
zip the data directories associated to my database, then the developer could
just copy the zip file and unzip in their own data directory.  My question
now is: what file would a developer need to change to add this new directory
to their database list, or will it just automatically show up when they
refresh the service?


You can't do a file-level backup of one database, I'm afraid, only of all 
the databases in an installation.


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Alvaro Herrera
Lee Keel wrote:
 So then the best way to do this kind of backup\restore is to use pg_dump?
 Is there any plan in the future to be able to do some sort of file-level
 backup like SqlServer?

Actually you can do single databases, but you must also include some
other directories besides the database directory.  You would need to
include everything, excluding the other databases' directories under
base, but keeping your own database's directory in base.

-- 
Alvaro Herrera   http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
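
As a point of reference, "your databases directory in base" means the
subdirectory of base/ named after the database's OID. A quick sketch for
mapping names to directories (assumes superuser access via psql and $PGDATA
pointing at the data directory):

# Show which subdirectory of $PGDATA/base belongs to which database (sketch).
psql -At -c "SELECT oid, datname FROM pg_database ORDER BY datname;"

# Cross-check against the filesystem layout.
ls "$PGDATA/base"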



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Lee Keel
So then the best way to do this kind of backup\restore is to use pg_dump?
Is there any plan in the future to be able to do some sort of file-level
backup like SqlServer?

-LK

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 18, 2007 10:35 AM
To: Lee Keel
Cc: Michael Nolan; Ron Johnson; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Large Database Restore

Lee Keel wrote:
 Thanks to everyone for their input on this.  After reading all the emails
 and some of the documentation (section 23.3), I think this is all a little
 more than what I need.  My database is basically read-only and all I was
 looking to do is to be able to take snap-shots of it and be able to restore
 on a developer's machine and not take 30 hours.  So I was thinking I would
 zip the data directories associated to my database, then the developer could
 just copy the zip file and unzip in their own data directory.  My question
 now is: what file would a developer need to change to add this new directory
 to their database list, or will it just automatically show up when they
 refresh the service?

You can't do a file-level backup of one database I'm afraid, only all 
databases in one installation.

-- 
   Richard Huxton
   Archonet Ltd



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Richard Huxton

Alvaro Herrera wrote:

Lee Keel wrote:

So then the best way to do this kind of backup\restore is to use pg_dump?
Is there any plan in the future to be able to do some sort of file-level
backup like SqlServer?


Actually you can do single databases, but you must also include some
other directories besides the database directory.  You would need to
include everything, excluding things in the base directory, but not
exclude your databases directory in base.


Will that not cause trouble if you've added users though?

--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Alvaro Herrera
Richard Huxton wrote:
 Alvaro Herrera wrote:
 Lee Keel wrote:
 So then the best way to do this kind of backup\restore is to use pg_dump?
 Is there any plan in the future to be able to do some sort of file-level
 backup like SqlServer?
 
 Actually you can do single databases, but you must also include some
 other directories besides the database directory.  You would need to
 include everything, excluding things in the base directory, but not
 exclude your databases directory in base.
 
 Will that not cause trouble if you've added users though?

Huh, maybe it will, but then I don't see how.  Can you be more specific?

-- 
Alvaro Herrera   http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Richard Huxton

Alvaro Herrera wrote:

Richard Huxton wrote:

Alvaro Herrera wrote:

Lee Keel wrote:

So then the best way to do this kind of backup\restore is to use pg_dump?
Is there any plan in the future to be able to do some sort of file-level
backup like SqlServer?

Actually you can do single databases, but you must also include some
other directories besides the database directory.  You would need to
include everything, excluding things in the base directory, but not
exclude your databases directory in base.

Will that not cause trouble if you've added users though?


Huh, maybe it will, but then I don't see how.  Can you be more specific?


Well, user info is stored in global (as is the list of databases, and 
it looks like language handler definitions too). Run strings over the 
files and you'll see.


--
  Richard Huxton
  Archonet Ltd
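
A quick way to see exactly what lives in global/ (and therefore comes along
with any file-level copy) is to list the shared catalogs. A sketch using the
relisshared flag in pg_class:

# Cluster-wide catalogs stored under $PGDATA/global rather than inside any
# single database's directory: users, the database list, tablespaces, etc.
psql -c "SELECT relname FROM pg_class WHERE relisshared AND relkind = 'r' ORDER BY relname;"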



Re: [GENERAL] Large Database Restore

2007-05-18 Thread Alvaro Herrera
Richard Huxton wrote:
 Alvaro Herrera wrote:
 Richard Huxton wrote:
 Alvaro Herrera wrote:
 Lee Keel wrote:
 So then the best way to do this kind of backup\restore is to use 
 pg_dump?
 Is there any plan in the future to be able to do some sort of file-level
 backup like SqlServer?
 Actually you can do single databases, but you must also include some
 other directories besides the database directory.  You would need to
 include everything, excluding things in the base directory, but not
 exclude your databases directory in base.
 Will that not cause trouble if you've added users though?
 
 Huh, maybe it will, but then I don't see how.  Can you be more specific?
 
 Well, user info is stored in global (as is the list of databases, and 
 it looks like language handler definitions too). Run strings over the 
 files and you'll see.

Oh, the fear is that you would overwrite the new user files with the
data from the backup.  Yeah, that's correct, you would.  Also, on restore
you would overwrite the pg_xlog and pg_clog areas and the control file,
which, while good for your newly restored database, would render the
other databases corrupted.

Absolutely true.

The only way I see to make this work would be to freeze the involved
database (with vacuum freeze), then stop the postmaster cleanly, then
make the tarball of just the DB directory.  But then, if shared catalog
state changed between the backup and the restore (say, because you
created a table, which gets registered in the shared catalog
pg_shdepend), said changes would not be rolled back either, leading to
possible problems later on.

I hadn't noticed this was so difficult to do!  (You can solve the
pg_shdepend problem by copying that database's entries elsewhere and
then restoring them along with the tarball, but I'm not sure it's workable
for the other shared catalogs).

A lot more involved than you probably want anyway.

-- 
Alvaro Herrera   http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
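
To see the pg_shdepend side of this for a particular database, something along
these lines works; the database name 'mydb' is a placeholder:

# Shared-dependency rows recorded for one database (sketch).  They live in the
# shared catalog pg_shdepend, outside the database's own directory, which is
# why a per-database file copy can't carry them along.
psql -c "SELECT deptype, count(*)
           FROM pg_shdepend
          WHERE dbid = (SELECT oid FROM pg_database WHERE datname = 'mydb')
          GROUP BY deptype;"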



Re: [GENERAL] Large Database Restore

2007-05-17 Thread Jim C. Nasby
On Thu, May 17, 2007 at 08:19:08AM -0500, Lee Keel wrote:
 I am restoring a 51GB backup file, and the restore has been running for almost
 26 hours.  There have been no errors and things are still working.  I have
 turned fsync off, but that still did not speed things up.  Can anyone provide
 me with the optimal settings for restoring a large database?

Well, fsync=off is the biggest one. The others to look at:

maintenance_work_mem = 1GB (I generally have problems setting it over
1GB, and I'm not sure it would add much benefit)
checkpoint_timeout = 1 hour
checkpoint_segments = huge (you want to reduce the frequency of
checkpoints... probably to no more than one every 20 minutes or so)

Finally, if you've got a multi-CPU machine, you might want to build all
the indexes and table constraints as a separate step, and run them
through perl or something so that you'll utilize more than just one CPU.

Of course the downside to all of this is that it would mean starting
over from scratch.

Ultimately though, once your database gets past a certain size, you
really want to be using PITR and not pg_dump as your main recovery
strategy.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
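
Pulling those suggestions together, a hedged sketch of a restore window on an
8.2-era server; the values, file locations, and dump format are assumptions,
and the settings should be reverted once the restore finishes:

# Temporary bulk-restore settings (sketch).  Appended to postgresql.conf for
# brevity here -- edit the file properly in practice, and revert afterwards.
cat >> "$PGDATA/postgresql.conf" <<'EOF'
fsync = off                   # unsafe for normal running, acceptable for a throwaway restore
maintenance_work_mem = 1GB    # faster index builds
checkpoint_segments = 128     # fewer checkpoints during the load (8.x setting)
checkpoint_timeout = 1h
EOF
pg_ctl -D "$PGDATA" restart

# Restore a custom-format dump; a plain-SQL dump would go through psql instead.
pg_restore -d mydb /backups/mydb.dump

For the multi-CPU idea, pg_restore's -l and -L options let you dump the
archive's table of contents, split the index and constraint entries out into
their own list, and feed the pieces to separate pg_restore runs, which is one
way to use more than one CPU.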



Re: [GENERAL] Large Database Restore

2007-05-17 Thread Ron Johnson

On 05/17/07 11:04, Jim C. Nasby wrote:
[snip]
 
 Ultimately though, once your database gets past a certain size, you
 really want to be using PITR and not pg_dump as your main recovery
 strategy.

But doesn't that just replay each transaction?  It must manage the
index nodes during each update/delete/insert, and multiple UPDATE
statements mean that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] Large Database Restore

2007-05-17 Thread Ben

Yes, but the implication is that large databases probably don't update
every row between backup periods.

On Thu, 17 May 2007, Ron Johnson wrote:



On 05/17/07 11:04, Jim C. Nasby wrote:
[snip]


Ultimately though, once your database gets past a certain size, you
really want to be using PITR and not pg_dump as your main recovery
strategy.


But doesn't that just replay each transaction?  It must manage the
index nodes during each update/delete/insert, and multiple UPDATE
statements means that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] Large Database Restore

2007-05-17 Thread Ron Johnson

Yes, but that's not always a valid assumption.

And still, PITR must update the indexes at each insert, which is much
slower than pg_dump's bulk load followed by index creation.

On 05/17/07 16:01, Ben wrote:
 Yes, but the implication is that large databases probably don't update
 every row between backup periods.
 
 On Thu, 17 May 2007, Ron Johnson wrote:
 
 On 05/17/07 11:04, Jim C. Nasby wrote:
 [snip]

 Ultimately though, once your database gets past a certain size, you
 really want to be using PITR and not pg_dump as your main recovery
 strategy.
 
 But doesn't that just replay each transaction?  It must manage the
 index nodes during each update/delete/insert, and multiple UPDATE
 statements means that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] Large Database Restore

2007-05-17 Thread Michael Nolan

I don't know if my database is typical (as there probably is no such thing),
but to restore a full dump (pg_dumpall) takes over 4 hours on my backup
server, while restoring a low-level backup (about 35GB) and then processing 145
WAL files (since Tuesday morning, when the last low-level backup was run)
took me around 2 hours today.
--
Mike Nolan


Re: [GENERAL] Large Database Restore

2007-05-17 Thread Ron Johnson

On 05/17/07 16:49, Michael Nolan wrote:
 I don't know if my database is typical (as there probably is no such
 thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my
 backup server, but to restore a low level backup (about 35GB) 

Meaning a tarball of $PGDATA?

   and then
 process 145 WAL files (since Tuesday morning when the last low level
 backup was run) took me around 2 hours today

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] Large Database Restore

2007-05-17 Thread Michael Nolan

On 5/17/07, Ron Johnson [EMAIL PROTECTED] wrote:



On 05/17/07 16:49, Michael Nolan wrote:
 I don't know if my database is typical (as there probably is no such
 thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my
 backup server, but to restore a low level backup (about 35GB)

Meaning a tarball of $PGDATA?

Actually, it's two different areas because I've got a second tablespace on
a separate physical drive for indexes, but yes, it's a tarball of all the
files that PG uses, following the procedures in section 23.3 of the
documentation.

It works very well, though I still don't understand why, if there are no
changes to the warm standby server tables, only queries, it isn't possible
to keep restoring WAL files to keep the warm standby server in parallel with
the live server.  (I'm guessing there must be SOMETHING that happens at the
end of the recovery process, or some time after that, to make the next WAL
unprocessable, but I can't figure it out from the docs.)
--
Mike Nolan
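
For comparison, the low-level restore described here amounts to unpacking the
base backup and pointing recovery at the WAL archive, along the lines of
section 23.3. A sketch with every path assumed:

# PITR-style restore from a file-level base backup plus archived WAL (sketch).
PGDATA=/var/lib/pgsql/data
ARCHIVE=/mnt/server/archivedir

pg_ctl -D "$PGDATA" stop -m fast 2>/dev/null   # make sure nothing is running
rm -rf "$PGDATA"/*                             # careful: wipes the old cluster
tar xzf /backups/base-backup.tar.gz -C "$PGDATA"
rm -rf "$PGDATA"/pg_xlog/*                     # replay strictly from the archive
mkdir -p "$PGDATA"/pg_xlog/archive_status

cat > "$PGDATA/recovery.conf" <<EOF
restore_command = 'cp $ARCHIVE/%f %p'
EOF

pg_ctl -D "$PGDATA" start   # replays the archived WAL, then renames recovery.conf to recovery.done

As for why replay stops: once recovery completes and the server opens for
connections, it switches to a new timeline and will not consume further WAL
from the old one, which is presumably the SOMETHING at the end of the recovery
process. The warm-standby recipe in the docs avoids that by never letting
recovery finish: it uses a restore_command that waits for each WAL segment to
appear instead of failing when one is missing.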