subject:"\[HACKERS\] Issues Outstanding for Point In Time Recovery \(PITR\)"

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-18 Thread J. R. Nield


On Thu, 2002-07-18 at 15:36, Richard Tucker wrote:
> Sorry, I didn't delimit my comments correctly before.
> I'm offering our pg_copy/roll forward implementation to PostgreSQL.org if it
> finds it an acceptable contribution. Progress/Multera would hand over all
> rights to any code accepted.
> 
> I'd be willing to post a patch to HEAD -- where can I find instructions on
> how to do this?
> 

Instructions for how to get the latest development source are at:
  http://developer.postgresql.org/TODO/docs/cvs.html

If you want to post a context diff against 7.2.1 to pgsql-hackers first,
that would let us see what it does.

Let me know if there is anything else I can help you with.

-- 
J. R. Nield
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-18 Thread Richard Tucker


Progress/Multera would release the hot backup/roll forward work to the
PostgreSQL Development group.
-regards
richt

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of J. R. Nield
Sent: Thursday, July 18, 2002 2:34 PM
To: [EMAIL PROTECTED]
Cc: Bruce Momjian; PostgreSQL Hacker
Subject: Re: [HACKERS] Issues Outstanding for Point In Time Recovery
(PITR)


Richard:

I can't quite follow this; maybe you sent a draft by accident. If you
want to post a patch against 7.2.1, or even better against HEAD in CVS,
that would be great. Or if you'd rather point me to your source online,
that would be good too.

I just want to clarify though: is this work released to the PostgreSQL
Development group by Progress and Multera, or do they still claim
copyright interest in it?

Regards,
J.R. Nield


On Thu, 2002-07-18 at 12:56, Richard Tucker wrote:
>
>
> -Original Message-
> From: J. R. Nield [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 17, 2002 8:13 PM
> To: [EMAIL PROTECTED]
> Cc: Bruce Momjian
> Subject: RE: [HACKERS] Issues Outstanding for Point In Time Recovery
> (PITR)
>
>
> On Wed, 2002-07-17 at 19:25, Richard Tucker wrote:
> > Regarding hot backup.  Our implementation of "pg_copy" does a hot
backup.
> > It turns off database checkpointing for the duration of the backup.
> Backups
> > all the files of the database cluster up to the wal file currently being
> > logged to.  It then acquires the WalInsertLock lock long enough to
backup
> > the current wal file.
>
> Does it then allow more writes to that WAL file? It would seem like you
> want to advance the log to the next file, so the sysadmin wouldn't have
> to choose which one of log-file number 3 he wants to use at restore.
>
> > Writes to the wal file are allowed during the backup except for the
> backing of the wal file current when the
> > backup completes.  That is the pg_xlog directory is the last directory
to
> be backed up.  The wal_files are backed
> > up in the order they were used.  Continued wal file logging is allowed
> until the backup reaches the current wal
> > file being written to.  To back up this last wal file the WalInsertLock
is
> held until the copy of the wal file
> > is complete.  So the backup stops update activity only long enough to
copy
> this last 16mb file.
>
> Also, what do you mean by 'turns off checkpointing'. You have to do a
> checkpoint, or at least flush the buffers, when you start the backup.
> Otherwise how do you know what LSN to start from at restore?
>
> > The pg_control file also gets backed up.  It contains the point in the
log
> at which to begin
> > the redo/roll forward.
> > By not allowing the redo point to advance while the backup goes on means
> that the startup processes' crash
> > recovery code will capture all the changes made to the database cluster
> while the backup was running.
>
>
> Anyway: Yes we'd love to see the code.
>
> > In what form would you like me to send the code to you e.g. as a patch,
> copy our whole source ...
>
> Since I've pretty-much got create/drop and index stuff working, if your
> code does the rest then we should be good to go.
>
> ;jrnield
>
>
> --
> J. R. Nield
> [EMAIL PROTECTED]
>
>
>
>
--
J. R. Nield
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-18 Thread J. R. Nield


Richard:

I can't quite follow this; maybe you sent a draft by accident. If you
want to post a patch against 7.2.1, or even better against HEAD in CVS,
that would be great. Or if you'd rather point me to your source online,
that would be good too.

I just want to clarify though: is this work released to the PostgreSQL
Development group by Progress and Multera, or do they still claim
copyright interest in it?

Regards,
J.R. Nield


On Thu, 2002-07-18 at 12:56, Richard Tucker wrote:
> 
> 
> -Original Message-
> From: J. R. Nield [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 17, 2002 8:13 PM
> To: [EMAIL PROTECTED]
> Cc: Bruce Momjian
> Subject: RE: [HACKERS] Issues Outstanding for Point In Time Recovery
> (PITR)
> 
> 
> On Wed, 2002-07-17 at 19:25, Richard Tucker wrote:
> > Regarding hot backup.  Our implementation of "pg_copy" does a hot backup.
> > It turns off database checkpointing for the duration of the backup.
> Backups
> > all the files of the database cluster up to the wal file currently being
> > logged to.  It then acquires the WalInsertLock lock long enough to backup
> > the current wal file.
> 
> Does it then allow more writes to that WAL file? It would seem like you
> want to advance the log to the next file, so the sysadmin wouldn't have
> to choose which one of log-file number 3 he wants to use at restore.
> 
> > Writes to the wal file are allowed during the backup except for the
> backing of the wal file current when the
> > backup completes.  That is the pg_xlog directory is the last directory to
> be backed up.  The wal_files are backed
> > up in the order they were used.  Continued wal file logging is allowed
> until the backup reaches the current wal
> > file being written to.  To back up this last wal file the WalInsertLock is
> held until the copy of the wal file
> > is complete.  So the backup stops update activity only long enough to copy
> this last 16mb file.
> 
> Also, what do you mean by 'turns off checkpointing'. You have to do a
> checkpoint, or at least flush the buffers, when you start the backup.
> Otherwise how do you know what LSN to start from at restore?
> 
> > The pg_control file also gets backed up.  It contains the point in the log
> at which to begin
> > the redo/roll forward.
> > By not allowing the redo point to advance while the backup goes on means
> that the startup processes' crash
> > recovery code will capture all the changes made to the database cluster
> while the backup was running.
> 
> 
> Anyway: Yes we'd love to see the code.
> 
> > In what form would you like me to send the code to you e.g. as a patch,
> copy our whole source ...
> 
> Since I've pretty-much got create/drop and index stuff working, if your
> code does the rest then we should be good to go.
> 
> ;jrnield
> 
> 
> --
> J. R. Nield
> [EMAIL PROTECTED]
> 
> 
> 
> 
-- 
J. R. Nield
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian


J. R. Nield wrote:
> I will do something like this, but reserve 16 bytes for it just in case
> we change our minds. It needs to be different among systems on the same
> machine, so there needs to be a time value and a pseudo-random part as
> well. Also, 'hostname' will likely be the same on many machines
> (localhost.localdomain or similar).
> 
> The only reason I bothered with UUID's before is because they have a
> standard setup to make the possibility of collision extremely small, and
> I figured replication will end up using it someday.

Sure.  Problem is, we support so many platforms that any trickery is a
problem.  If they can change it in postgresql.conf, that should be
sufficient.

> Hot backup read/write requires that we force an advance in the logfile
> segment after the backup. We need to save all the logs between backup
> start and completion. Otherwise the files will be useless as a
> standalone system if the current logs somehow get destroyed (fire in the
> machine room, etc.).
> 
> The way I would do this is:
> 
>   create a checkpoint
>   do the block-by-block walk of the files using the bufmgr
>   create a second checkpoint
>   force the log to advance past the end of the current segment
>   save the log segments containing records between the
>first & second checkpont with the backup

Sounds good.

> Then if you restore the backup, you can recover to the point of the
> second checkpoint, even if the logs since then are all gone.

Good, you put the logs that happened during the backup inside the same
backup, make it consistent.  Makes sense.

> Right now the log segment size is fixed, so this means that we'd waste
> 8MB of log space on average to do a backup. Also, the way XLOG reads
> records right now, we have to write placeholder records into the empty
> space, because that's how it finds the end of the log stream. So I need
> to change XLOG to handle "skip records", and then to truncate the file
> when it gets archived, so we don't have to save up to 16MB of zeros.
> 
> Also, if archiving is turned off, then we can't recycle or delete any
> logs for the duration of the backup, and we have to save them.
> 
> So I'll finish the XLOG support for this, and then think about the
> correct way to walk through all the files.

Sounds like a good plan.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread J. R. Nield

On Wed, 2002-07-17 at 01:25, Bruce Momjian wrote:
> 
> We only patch configure.in.  If you post to hackers, they can give you
> assistance and I will try to help however I can.  I can so some
> configure.in stuff for you myself.

Thanks for the offer. The only thing I was changing it for was to test
whether and how to get a ethernet MAC address using ioctl, so libuuid
could use it if available. That is dropped now.

> 
> > Related to that, the other place I need advice is on adding Ted Tso's
> > LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
> > allowed to use this? There is a free OSF/DCE spec for UUID's, so I can
> > re-implement the library if required.
> 
> We talked about this on the replication mailing list.  We decided that
> hostname, properly hashed to an integer, was the proper way to get this
> value.  Also, there should be a postgresql.conf variable so you can
> override the hostname-generated value if you wish.  I think that is
> sufficient.

I will do something like this, but reserve 16 bytes for it just in case
we change our minds. It needs to be different among systems on the same
machine, so there needs to be a time value and a pseudo-random part as
well. Also, 'hostname' will likely be the same on many machines
(localhost.localdomain or similar).

The only reason I bothered with UUID's before is because they have a
standard setup to make the possibility of collision extremely small, and
I figured replication will end up using it someday.

> 
> > We also haven't discussed commands for backup/restore, but I will use
> > what I think is appropriate and we can change the grammar if needed. The
> > initial hot-backup capability will require the database to be in
> > read-only mode and use tar for backup, and I will add the ability to
> > allow writes later.
> 
> Yea, I saw Tom balked at that. I think we have enough manpower and time
> that we can get hot backup in normal read/write mode working before 7.3
> beta so I would just code it assuming the system is live and we can deal
> with making it hot-capable once it is in CVS.  It doesn't have to work
> 100% until beta time.

Hot backup read/write requires that we force an advance in the logfile
segment after the backup. We need to save all the logs between backup
start and completion. Otherwise the files will be useless as a
standalone system if the current logs somehow get destroyed (fire in the
machine room, etc.).

The way I would do this is:

  create a checkpoint
  do the block-by-block walk of the files using the bufmgr
  create a second checkpoint
  force the log to advance past the end of the current segment
  save the log segments containing records between the
   first & second checkpont with the backup

Then if you restore the backup, you can recover to the point of the
second checkpoint, even if the logs since then are all gone.

Right now the log segment size is fixed, so this means that we'd waste
8MB of log space on average to do a backup. Also, the way XLOG reads
records right now, we have to write placeholder records into the empty
space, because that's how it finds the end of the log stream. So I need
to change XLOG to handle "skip records", and then to truncate the file
when it gets archived, so we don't have to save up to 16MB of zeros.

Also, if archiving is turned off, then we can't recycle or delete any
logs for the duration of the backup, and we have to save them.

So I'll finish the XLOG support for this, and then think about the
correct way to walk through all the files.

-- 
J. R. Nield
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Richard Tucker


I don't know how our marketing came up third most popular but I think the
order is, Oracle, MySQL, and PostgreSQL or maybe Oracle, MSSQL and
PostgreSQL.  I'm sure there is some criterion by which PostgreSQL is tenth
and by some other its number one.
Of course, my posting was about Point In Time Recovery and not multera
marketing spin.
-regards
richt

-Original Message-
From: Bruce Momjian [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 17, 2002 6:29 PM
To: Rod Taylor
Cc: PostgreSQL-development; [EMAIL PROTECTED]; J. R. Nield
Subject: Re: [HACKERS] Issues Outstanding for Point In Time Recovery
(PITR)


Rod Taylor wrote:
> > server powered by PostgreSQL, the third most popular database, and a
> >
> >^
>
> > Wonder why we are "the third most popular database".  I think that's
> > good?
>
> You'll notice they didn't qualify where.  On this list, it's probably
> #1.  Within Progress software perhaps we're third most popular (whatever
> two are typically used in the InnerEdge are 1 and 2).

Yea, using that logic, #1 would be the Progress internal db system, #2
would be MySQL (though that seems doubtful at this point with Nusphere),
and PostgreSQL.

Actually, PostgreSQL is #1 in my home.

--
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread D'Arcy J.M. Cain


On July 17, 2002 05:45 pm, Richard Tucker wrote:
> We also have implemented a roll forward recovery mechanism. We modified a
> 7.2.1 version of Postgres.
> ...

Excellent!  I can't wait.  When will it be in current?

> The BIG THING we have not done is address the issue that add/drop tables
> and indexes do not propagate through the roll forward recovery mechanism
> properly.

I can live with that.  Schemas shouldn't change so often that you can't just 
dup any changes to the backup(s).

-- 
D'Arcy J.M. Cain|  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian


Rod Taylor wrote:
> > server powered by PostgreSQL, the third most popular database, and a
> > 
> >^
>  
> > Wonder why we are "the third most popular database".  I think that's
> > good?
> 
> You'll notice they didn't qualify where.  On this list, it's probably
> #1.  Within Progress software perhaps we're third most popular (whatever
> two are typically used in the InnerEdge are 1 and 2).

Yea, using that logic, #1 would be the Progress internal db system, #2
would be MySQL (though that seems doubtful at this point with Nusphere),
and PostgreSQL.

Actually, PostgreSQL is #1 in my home.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Rod Taylor


> server powered by PostgreSQL, the third most popular database, and a
> 
>^
 
> Wonder why we are "the third most popular database".  I think that's
> good?

You'll notice they didn't qualify where.  On this list, it's probably
#1.  Within Progress software perhaps we're third most popular (whatever
two are typically used in the InnerEdge are 1 and 2).


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian

Bruce Momjian wrote:
> Richard Tucker wrote:
> > 
> > We also have implemented a roll forward recovery mechanism. We modified a
> > 7.2.1 version of Postgres.
> > The mechanism is designed to provide a means of recoverying from the loss or
> > corruption of media.  It provides for duplicating wal_files so that if a
> > wal_file is lost roll forward recovery can recover the database using the
> > duplicated wal_files.  Hooks were also added so that the roll forward
> > recovery mechanism can be used to implement a hot standby database.
> > Along with the roll forward recovery mechanism we have also implemented an
> > online database backup utility which is synchronized with the recovery log
> > so that the backup can be a starting point for a roll forward recovery
> > session.
> 
> In researching who has done this work, I found that
> http://www.multera.com/ is Progress Software.  I assume this
> "Distributed but Connected" is partly based on PostgreSQL.

Oh, I see it now, PostgreSQL is right there:

* OuterEdge
  A distributed site technology suite that allows remote users to
maintain the integrity, not the congestion and price, of centralized
data. The OuterEdge includes the affordable UltraSQL^(TM) database
server powered by PostgreSQL, the third most popular database, and a

   ^
Secure Application/web server that supports the industry's most popular
Internet languages. Together with the Replication Engine software, the
OuterEdge gives remote users seamless access to important data without
relying on bandwidth.

Wonder why we are "the third most popular database".  I think that's
good?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian


Richard Tucker wrote:
> 
> We also have implemented a roll forward recovery mechanism. We modified a
> 7.2.1 version of Postgres.
> The mechanism is designed to provide a means of recoverying from the loss or
> corruption of media.  It provides for duplicating wal_files so that if a
> wal_file is lost roll forward recovery can recover the database using the
> duplicated wal_files.  Hooks were also added so that the roll forward
> recovery mechanism can be used to implement a hot standby database.
> Along with the roll forward recovery mechanism we have also implemented an
> online database backup utility which is synchronized with the recovery log
> so that the backup can be a starting point for a roll forward recovery
> session.

In researching who has done this work, I found that
http://www.multera.com/ is Progress Software.  I assume this
"Distributed but Connected" is partly based on PostgreSQL.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Richard Tucker



We also have implemented a roll forward recovery mechanism. We modified a
7.2.1 version of Postgres.
The mechanism is designed to provide a means of recoverying from the loss or
corruption of media.  It provides for duplicating wal_files so that if a
wal_file is lost roll forward recovery can recover the database using the
duplicated wal_files.  Hooks were also added so that the roll forward
recovery mechanism can be used to implement a hot standby database.
Along with the roll forward recovery mechanism we have also implemented an
online database backup utility which is synchronized with the recovery log
so that the backup can be a starting point for a roll forward recovery
session.

Roll forward recovery is enabled for a database cluster by specifying one of
two postmaster configuration parameters.

wal_file_reuse = false  This parameter tells the wal system to not reuse wal
files.  This option is intended for sites wishing to implement a hot standby
database where wal files will be periodically copied to another machine
where they will be rolled forward into the standby database.

wal_file_duplicate =   This parameter tells the system to
mirror the files in the pg_xlog directory to the specified directory.  This
allows for the recovery of a database where a wal file has been damaged or
lost.  It also allows for a variant of a hot standby database where the
duplicate directory is the pg_xlog directory of the standby database.

Since both of these options cause wal files to accumulate indefinately the
dba needs a means of purging wal files when they are no longer needed.  So
an sql command, "ALTER SYSTEM PURGE WAL_FILES ", has also
been implemented.  This command deletes all wal files up to and including
the specified  as long as those wal files are not needed to
recover the database in the event of a system crash.  To find out the status
of the wal files a function has been implemented to return a wal file name.
The function is:

Wal_file( )
Request_type := [ current | last | checkpoint | oldest]

Wal_file ('current') returns the name of the log file currently being
written to.
Wal_file('last') returns the name of the last log file filled.
Wal_file('checkpoint') returns the name of the file containing the current
redo position.  The current redo position is the position in the recovery
log where crash recovery would start if the system were to crash now.  All
logs prior to this one will not be needed to recover the database cluster
and could be safely removed.
Wal_file('oldest') returns the oldest xlog file found in the pg_xlog
directory.


To actually perform a roll forward you use the postmaster configuration
parameter "roll_forward=yes".  This parameter tells the startup process to
perform crash recovery even though the state of the database as found in the
pg_control file indicates a normal shutdown.  This is necessary since the
starting point of roll forward session could be the restore of a database
cluster that was shutdown in order to back it up.  Furthermore this
parameter tells the startup process not to write out a checkpoint record at
the end of the roll forward session. This allows for the database cluster to
receive subsequent wal files and to have those rolled forward as well.  When
starting the postmaster with the roll_forward=yes option, it shuts down the
database as soon as the startup process completes.  So the idea is to
restore a backup, copy all of your saved/duplicated wal files into the
pg_xlog directory of the restored database and start the postmaster with the
roll_forward option.

For point in time recovery there is also a roll_forward_until =  which
rolls forward through the wal files until the first transaction commit note
that is greater than or equal to the specified time.

The pg_copy utility performs an on line copy of a database cluster.  Its
syntax is:
pg_copy  [-h host] [-p port] ...
This makes a copy of the database where backup_directory is what you would
set PGDATA to in order start a postmaster against the backup copy.  The
database can be being updated while the copy occurs.  If you start a
postmaster against this copy it will appear to the startup process as a
database that crashed at the instant the pg_copy operation completed.
Futhermore the pg_copy utility automatically removes any wal files not
needed to recover the database from either pg_xlog directory or the
wal_file_duplicate directory.

So a DBA to protect the database from media loss just needs to set the
wal_file_duplicate paramater and periodically pg_copy the database.


The BIG THING we have not done is address the issue that add/drop tables and
indexes do not propagate through the roll forward recovery mechanism
properly.



-regards
Richard Tucker






---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread Bruce Momjian

J. R. Nield wrote:
> On Tue, 2002-07-16 at 15:36, Bruce Momjian wrote:
> > 
> > J.R., just checking to see how PITR recovery is going.  Do you need any
> > assistance or have any questions for us?
> > 
> > Also, do you have any idea how close you are to having something
> > completed?  Are you aware we are closing development of 7.3 at the end
> > of August and start beta September 1?  Is there any way we can help you?
> > 
> 
> It should be ready to go into CVS by the end of the month. 
> 
> That will include: logging all operations except for rtree and GiST,
> archival of logfiles (with new postgresql.conf params), headers on the
> logfiles to verify the system that created them, standalone backend
> recovery to a point-in-time, and a rudimentary hot backup capability.

Sounds great.  That gives us another month to iron out any remaining
issues.  This will be a great 7.3 feature!

> I could use some advice on the proper way to add tests to configure.in,
> given that the autoconf output is in CVS. Would you ever want a patch to
> include the generated 'configure' file?

We only patch configure.in.  If you post to hackers, they can give you
assistance and I will try to help however I can.  I can so some
configure.in stuff for you myself.

> Related to that, the other place I need advice is on adding Ted Tso's
> LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
> allowed to use this? There is a free OSF/DCE spec for UUID's, so I can
> re-implement the library if required.

We talked about this on the replication mailing list.  We decided that
hostname, properly hashed to an integer, was the proper way to get this
value.  Also, there should be a postgresql.conf variable so you can
override the hostname-generated value if you wish.  I think that is
sufficient.

> We also haven't discussed commands for backup/restore, but I will use
> what I think is appropriate and we can change the grammar if needed. The
> initial hot-backup capability will require the database to be in
> read-only mode and use tar for backup, and I will add the ability to
> allow writes later.

Yea, I saw Tom balked at that. I think we have enough manpower and time
that we can get hot backup in normal read/write mode working before 7.3
beta so I would just code it assuming the system is live and we can deal
with making it hot-capable once it is in CVS.  It doesn't have to work
100% until beta time.

> Does this sound like a reasonable timeframe/feature-set to make the 7.3
> release?

Sounds great.  This is another killer 7.3 feature, and we really need
this for greater enterprise acceptance of PostgreSQL.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread Tom Lane


"J. R. Nield" <[EMAIL PROTECTED]> writes:
> Related to that, the other place I need advice is on adding Ted Tso's
> LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
> allowed to use this?

Uh, why exactly is UUID essential for this?  (The correct answer is
"it's not", IMHO.)

> We also haven't discussed commands for backup/restore, but I will use
> what I think is appropriate and we can change the grammar if needed. The
> initial hot-backup capability will require the database to be in
> read-only mode and use tar for backup, and I will add the ability to
> allow writes later.

There is no read-only mode, and I for one will resist adding one.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread J. R. Nield

On Tue, 2002-07-16 at 15:36, Bruce Momjian wrote:
> 
> J.R., just checking to see how PITR recovery is going.  Do you need any
> assistance or have any questions for us?
> 
> Also, do you have any idea how close you are to having something
> completed?  Are you aware we are closing development of 7.3 at the end
> of August and start beta September 1?  Is there any way we can help you?
> 

It should be ready to go into CVS by the end of the month. 

That will include: logging all operations except for rtree and GiST,
archival of logfiles (with new postgresql.conf params), headers on the
logfiles to verify the system that created them, standalone backend
recovery to a point-in-time, and a rudimentary hot backup capability.

I could use some advice on the proper way to add tests to configure.in,
given that the autoconf output is in CVS. Would you ever want a patch to
include the generated 'configure' file?

Related to that, the other place I need advice is on adding Ted Tso's
LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
allowed to use this? There is a free OSF/DCE spec for UUID's, so I can
re-implement the library if required.

We also haven't discussed commands for backup/restore, but I will use
what I think is appropriate and we can change the grammar if needed. The
initial hot-backup capability will require the database to be in
read-only mode and use tar for backup, and I will add the ability to
allow writes later.

Does this sound like a reasonable timeframe/feature-set to make the 7.3
release?

-- 
J. R. Nield
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread Bruce Momjian



J.R., just checking to see how PITR recovery is going.  Do you need any
assistance or have any questions for us?

Also, do you have any idea how close you are to having something
completed?  Are you aware we are closing development of 7.3 at the end
of August and start beta September 1?  Is there any way we can help you?

---

J. R. Nield wrote:
> On Fri, 2002-07-05 at 01:42, Bruce Momjian wrote:
> > 
> > We have needed
> > point-in-time recovery for a long time, 
> 
> Most thanks should go to vadim (and whoever else worked on this), since
> his WAL code already does most of the work. The key thing is auditing
> the backend to look for every case where we assume some action is not
> visible until after commit, and therefore don't log its effects. Those
> are the main cases that must be changed.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-09 Thread Hannu Krosing

On Tue, 2002-07-09 at 17:26, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > 1) record the lowest uncommitted transaction number (LUTN) , this may
> > have problems with wraparound, but I guess they are solvable. Disllow
> > VACUUM. Do a CHECKPOINT ('alter database begin backup')
> > 3) make a file-level (.tar) backup of data directory.
> > 4) Allow VACUUM. ('alter database end backup')
> 
> Transactions don't necessarily commit in sequence number order, so the
> concept of LUTN seems meaningless.

Not quite. It is the most simple way to be sure that if we invalidate
all transactions >= than it we get back to a fairly recent
Point-In-Time.

The real solution would of course be to remember all committed
transactions at this PIT, which can probably be done by remembering LUTN
and all individual committed transactions > LUTN

> Why is it necessary (or even good) to disallow VACUUM?

So that it would be possible to resurrect these tuples that have been
deleted/updated during disk-level backup.

I would like better the ability to tell VACUUM not to touch tuples where
deleting transaction number >= LUTN . IIRC the original postgres was
able to do that.

> I really dislike
> a design that allows the DBA to cripple the database by forgetting the
> last step in a (long) process.

There are several ways around it.

1. do it in a script, that will not forget.

2. Closing the session that did 'alter database begin backup' session
could do it automatically, but this would make the backup script
trickier.

3. VACUUM should not block but report a warning about being restricted
from running.

4. database can be instructed to send a message to DBA's pager if it has
been in 'begin backup' state too long ;)

Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-09 Thread Tom Lane

Hannu Krosing <[EMAIL PROTECTED]> writes:
> 1) record the lowest uncommitted transaction number (LUTN) , this may
> have problems with wraparound, but I guess they are solvable. Disllow
> VACUUM. Do a CHECKPOINT ('alter database begin backup')
> 3) make a file-level (.tar) backup of data directory.
> 4) Allow VACUUM. ('alter database end backup')

Transactions don't necessarily commit in sequence number order, so the
concept of LUTN seems meaningless.

Why is it necessary (or even good) to disallow VACUUM?  I really dislike
a design that allows the DBA to cripple the database by forgetting the
last step in a (long) process.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Barry Lind


I know that in Oracle there are 'alter database begin backup' and 'alter 
database end backup' commands that allow you to script your hot backups 
through a cron job by calling the begin backup command first, then using 
disk backup method of choice and then finally call the end backup command.

--Barry

Patrick Macdonald wrote:

>Zeugswetter Andreas SB SD wrote:
>  
>
>>>As noted, one of the main problems is knowing where to begin
>>>in the log.  This can be handled by having backup processing
>>>update the control file with the first lsn and log file
>>>required.  At the time of the backup, this information is or
>>>can be made available.  The control file can be the last file
>>>added to the tar and can contain information spanning the entire
>>>backup process.
>>>  
>>>
>>lsn and logfile number (of latest checkpoints) is already in the control
>>file, thus you need control file at start of backup. (To reduce the number
>>of logs needed for restore of an online backup you could force a checkpoint
>>before starting file backup)
>>
>>
>
>Maybe I should have been more clear.  The control file snapshot must 
>be taken at backup start (as you mention) but can be stored in cache.
>The fields can then be modified as we see fit.  At the end of backup,
>we can write this to a temp file and add it to the tar.  Therefore,
>as mentioned, the snapshot spans the entire backup process.
> 
>  
>
>>You will also need lsn and logfile number after file backup, to know how much
>>log needs to at least be replayed to regain a consistent state.
>>
>>
>
>This is a nicety but not a necessity. If you have a backup end log 
>record, you just have to enforce that the PIT recovery encounters 
>that particular log record on forward recovery.  Once encountered,
>you know that you at passed the point of back up end.
>
>Cheers,
>Patrick
>
>
>
>---(end of broadcast)---
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html
>
>
>
>  
>





---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Bruce Momjian

Zeugswetter Andreas SB SD wrote:
> 
> > OK, so you do a tar backup of a file.  While you are doing the tar,
> > certain 8k blocks are being modified in the file.  There is no way to
> > know what blocks are modified as you are doing the tar, and in fact you
> > could read partial page writes during the tar.
> 
> No, I think all OS's (Unix and NT at least) guard against this, as long as 
> the whole 8k block is written in one call. It is only the physical layer (disk) 
> that is prone to partial writes.

Yes, good point.  The kernel will present a unified view of the 8k
block.  Of course, there are still cases where 8k blocks are being
changed in front/behind in the tarred file.  Will WAL allow us to
re-synchronize that file even if part of it has pages from an earlier in
time than other pages.  Uh, I think so.  So maybe we don't need the
pre-write images in WAL after all.  Can we replay the WAL when some
pages in the restored file _have_ the WAL changes and some don't?  Maybe
the LSN on the pages helps with this.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Patrick Macdonald

Zeugswetter Andreas SB SD wrote:
> 
> > As noted, one of the main problems is knowing where to begin
> > in the log.  This can be handled by having backup processing
> > update the control file with the first lsn and log file
> > required.  At the time of the backup, this information is or
> > can be made available.  The control file can be the last file
> > added to the tar and can contain information spanning the entire
> > backup process.
> 
> lsn and logfile number (of latest checkpoints) is already in the control
> file, thus you need control file at start of backup. (To reduce the number
> of logs needed for restore of an online backup you could force a checkpoint
> before starting file backup)

Maybe I should have been more clear.  The control file snapshot must 
be taken at backup start (as you mention) but can be stored in cache.
The fields can then be modified as we see fit.  At the end of backup,
we can write this to a temp file and add it to the tar.  Therefore,
as mentioned, the snapshot spans the entire backup process.

> You will also need lsn and logfile number after file backup, to know how much
> log needs to at least be replayed to regain a consistent state.

This is a nicety but not a necessity. If you have a backup end log 
record, you just have to enforce that the PIT recovery encounters 
that particular log record on forward recovery.  Once encountered,
you know that you at passed the point of back up end.

Cheers,
Patrick

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Zeugswetter Andreas SB SD



> As noted, one of the main problems is knowing where to begin
> in the log.  This can be handled by having backup processing 
> update the control file with the first lsn and log file 
> required.  At the time of the backup, this information is or 
> can be made available.  The control file can be the last file
> added to the tar and can contain information spanning the entire
> backup process.

lsn and logfile number (of latest checkpoints) is already in the control 
file, thus you need control file at start of backup. (To reduce the number 
of logs needed for restore of an online backup you could force a checkpoint
before starting file backup)

You will also need lsn and logfile number after file backup, to know how much 
log needs to at least be replayed to regain a consistent state. 

Andreas



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Zeugswetter Andreas SB SD



> OK, so you do a tar backup of a file.  While you are doing the tar,
> certain 8k blocks are being modified in the file.  There is no way to
> know what blocks are modified as you are doing the tar, and in fact you
> could read partial page writes during the tar.

No, I think all OS's (Unix and NT at least) guard against this, as long as 
the whole 8k block is written in one call. It is only the physical layer (disk) 
that is prone to partial writes.

> any page that was modified while we were backing up is in the WAL.  On
> restore, we can recover whatever tar saw of the file, knowing that the
> WAL page images will recover any page changes made during the tar.

Assuming above, I do not think this is necessary.

> What I suggest is a way for the backup tar to turn on pre-change page
> images while the tar is happening, and turn it off after the tar is
> done.

Again, I do not think this is necessary.

Andreas



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-07 Thread Bruce Momjian

Patrick Macdonald wrote:
> The idea of using the last lsn on the page to detect a partial
> write is used by other dbms systems.  You already have that 
> information available so there is no overhead in computing it. 
> Nothing wrong with CRC though.

Agreed.  Just thought I would point out that is not guaranteed.  Suppose
the 8k block is spread over 16 512 sectors in two cylinders.  The OS or
SCSI tagged queuing could wrote the second part of the page (sectors
9-16) before the first group (1-8).  If it writes 9-16, then writes 1-8
but fails in the middle of 1-8, the LSN will match at the front and back
of the page, but the page will be partially written.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-06 Thread Patrick Macdonald

J.R.,

Nice first draft and a good read. Was going to comment 
in-line but thought this method would be easier to follow. 
The comments/suggestions below assume that PIT recovery is
being performed at the cluster level with a data backup 
image created by a tar-like utility. 

As noted, one of the main problems is knowing where to begin
in the log.  This can be handled by having backup processing 
update the control file with the first lsn and log file 
required.  At the time of the backup, this information is or 
can be made available.  The control file can be the last file
added to the tar and can contain information spanning the entire
backup process.

For data consistency, since the backup is being performed on 
an active cluster, we have to make sure to mark the end of the 
backup.  On restore, to make the cluster consistent, you have 
to force the user to perform forward recovery passed the point
of the backup completion marker in the (archived) log.  This 
can be handled using a backup end log record.  The backup end
log record would have to contain an identifier unique to this 
backup.  If a user requests to stop PIT recovery before this 
log record is encountered, consistency is not guaranteed. 
PIT should either disallow the action or warn of possible / 
impending doom.

The necessary logging for rtee (and others) insertions/deletions
can be added to the base code.  Not much of a worry but I would
expect to encounter other missing log items during testing.

The idea of using the last lsn on the page to detect a partial
write is used by other dbms systems.  You already have that 
information available so there is no overhead in computing it. 
Nothing wrong with CRC though.

As for the DB_SHUTDOWNED state, this could be handled by having
the backup processing update the control file field to 
DB_PIT_REQUIRED (or some such identifier).  After a restore,
users would be blocked from connecting to the cluster's databases 
until a forward recovery passed the backup end log record has
completed successfully.  

At the end of normal crash recovery, the user has to go digging
to identify in-flight transactions still in the system and abort
them manually.  It would be nice if PIT recovery automatically
aborted all in-flight transactions at the end. 

As PostgreSQL heads towards forward recovery functionality, it
may be wise to add headers to the log files.  As the logs from
any cluster are identically named, the header would allow unique
identification of the file and contents (cluster name, unique 
log id, id of the prior log file for chaining purposes, lsn 
ranges, etc). Most helpful.

Just a few notes from the administrative side. PIT recovery
should probably offer the user the following actions:

. forward recover to end of logs [and stop]
  Process log files located in the current directory until you
  read through the last one.  Allow the user the option to stop
  or not, just in case the logs are archived.  Send back the
  timestamp of the last encountered commit log record and the
  series of log files scanned.   

. forward recover to PIT [and stop]
  Similar to that described above but use the commit timestamps
  to gauge PIT progress.  

. forward recover query
  Send back the log series covered and the last commit timestamp
  encountered. 

. forward recover stop
  Stop the current forward recovery session. Undo all in-flight
  transactions and bring the databases down in a consistent
  state.  No other external user actions should be required.

Looking forward to reading draft 2.

Cheers,
Patrick
--
Patrick Macdonald 
Red Hat Canada 

"J. R. Nield" wrote:
> 
> Hello:
> 
> I've got the logging system to the point where I can take a shutdown
> consistent copy of a system, and play forward through multiple
> checkpoints. It seems to handle CREATE TABLE/DROP TABLE/TRUNCATE
> properly, and things are moving forward well. Recovery to an arbitrary
> point-in-time should be just as easy, but will need some administrative
> interface for it.
> 
> At this point, some input would be useful on how I should handle things.
> 
> The most important questions that need answering are in sections 2 & 5,
> since they impact the most other parts of the system. They will also
> require good documentation for sysadmins.
> 
> Issues Outstanding for Point In Time Recovery (PITR)
> 
> $Date: 2002/07/04 14:23:37 $
> 
> $Revision: 1.4 $
> 
> J.R. Nield
> 
> (Enc: ISO 8859-15 Latin-9)
> 
> §0 - Introduction
> 
> This file is where I'm keeping track of all the issues I run into while
> trying to get PITR to work properly. Hopefully it will evolve into a
> description of how PITR actually works once it is implemented.
> 
> I will also try to add feedback as it comes in.
> 
> The big items so-far are:
> §1 - Logging Relation file creation, truncation, and removal
> This is mostly done. Can do infinte play-forward from
> online logs.
> §2 - Partial-

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-05 Thread Bruce Momjian

Bruce Momjian wrote:
> You are saying, "How do we know what WAL records go with that backup
> snapshot of the file?"  OK, lets assume we are shutdown.  You can grab
> the WAL log info from pg_control using contrib/pg_controldata and that
> tells you what WAL logs to roll forward when you need to PIT recover
> that backup later.  If you store that info in the first file you backup,
> you can have that WAL pointer available for later recovery in case you
> are restoring from that backup.  Is that the issue?
> 
> What seems more complicated is doing the backup while the database is
> active, and this may be a requirement for a final PITR solution.  Some
> think we can grab the WAL pointer at 'tar' start and replay that on the
> backup even if the file changes during backup.

OK, I think I understand live backups now using tar and PITR.  Someone
explained this to me months ago but now I understand it.

First, a key issue is that PostgreSQL doesn't fiddle with individual
items on disk.  It reads an 8k block, modifies it, (writes it to WAL if
it hasn't been written to that WAL segment before), and writes it to
disk.  That is key.  (Are there cases where don't do this, like
pg_controldata?)

OK, so you do a tar backup of a file.  While you are doing the tar,
certain 8k blocks are being modified in the file.  There is no way to
know what blocks are modified as you are doing the tar, and in fact you
could read partial page writes during the tar.

One solution would be to read the file using the PostgreSQL page buffer,
but even then, getting a stable snapshot of the file would be difficult.
Now, we could lock the table and prevent writes while it is being backed
up, but there is a better way.

We already have pre-change page images in WAL.  When we do the backup,
any page that was modified while we were backing up is in the WAL.  On
restore, we can recover whatever tar saw of the file, knowing that the
WAL page images will recover any page changes made during the tar.

Now, you mentioned we may not want pre-change page images in WAL
because, with PITR, we can more easily recover from the WAL rather than
having this performance hit for many page writes.

What I suggest is a way for the backup tar to turn on pre-change page
images while the tar is happening, and turn it off after the tar is
done.

We already have this TODO item:

* Turn off after-change writes if fsync is disabled (?)

No sense in doing after-change WAL writes without fsync.  We could
extend this so those after-changes writes could be turned on an off,
allowing fill tar backups and PITR recovery.  In fact, for people with
reliable hardware, we should already be giving them the option of
turning off pre-change writes.  We don't have a way of detecting partial
page writes, but then again, we can't detect failures with fsync off
anyway so it seems to be the same vulnerability.  I guess that's why we
were going to wrap the effect into the same variable, but for PITR, can
see wanting fsync always on and the ability to turn pre-change writes on
and off.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-05 Thread Bruce Momjian


J. R. Nield wrote:
> On Fri, 2002-07-05 at 01:42, Bruce Momjian wrote:
> > 
> > We have needed
> > point-in-time recovery for a long time, 
> 
> Most thanks should go to vadim (and whoever else worked on this), since
> his WAL code already does most of the work. The key thing is auditing
> the backend to look for every case where we assume some action is not
> visible until after commit, and therefore don't log its effects. Those
> are the main cases that must be changed.

Yep.  Glad you can focus on that.

> > ---
> > 
> > J. R. Nield wrote:
> > > Hello:
> > > 
> > > I've got the logging system to the point where I can take a shutdown
> > > consistent copy of a system, and play forward through multiple
> > > checkpoints. It seems to handle CREATE TABLE/DROP TABLE/TRUNCATE
> 
> But notably not for the btree indexes! It looked like they were working,
> because the files were there, and all indexes created before the backup
> would work under insert/delete (including sys catalog indexes). This is
> because btree insert/delete is logged, just not during build. So I
> missed that one case.
> 
> You will end-up with up-to-date table data though, so it is something.
> 
> Adding logging support to btbuild is the next step, and I don't think it
> should be too hard. I am working this now.

Great.

> It is also a major advantage that most everything in the system gets
> stored in the catalog tables, and so is logged already.
> 
> 
> > Uh, we do log pre-page writes to WAL to recover from partial page
> > writes to disk.  Is there something more we need here?
> > 
> > As for bad block detection, we have thought about adding a CRC to each
> > page header, or at least making it optional.  WAL already has a CRC.
> >
> 
> Yes this should be last to do, because it is not necessary for PITR,
> only for performance (the option not to write pre-images without fear of
> data loss). 

Yep.

> > Yes, there are a few places where we actually create a file, and if the
> > server crashes, the file remains out there forever.  We need to track that
> > better.  
> 
> OK, there is a bigger problem then just tracking the file though. We
> sometimes do stuff to that file that we don't log. We assume that if we
> commit, the file must be OK and will not need replay because the
> transaction would not have committed if the file was not in a commitable
> state. If we abort, the system never sees the file, so in a sense we
> undo everything we did to the file. It is a kind of poor-man's rollback
> for certain operations, like btbuild, create table, etc. But it means
> that we cannot recover the file from the log, even after a commit.

Yep.

> > 
> > > 
> > > ?1.1.1 - CREATE DATABASE is also unlogged
> > > 
> > >   This will cause the same replay problems as above.
> > 
> > Yep.  Again, seems a master cleanup on startup is needed.
> 
> The cleanup is not the problem, only a nuisance. Creating the files
> during replay is the problem. I must recreate CREATE DATABASE from the
> log exactly as it was done originally. I think just logging the
> parameters to the command function should be sufficient, but I need to
> think more about it.

OK, makes sense.  Nice when you can bundle a complex action into the
logging of one command and its parameters.

> > 
> > > b) If TRUNCATE TABLE fails, the system must PANIC. Otherwise, the table
> > > may be used in a future command, and a replay-recovered database may
> > > end-up with different data than the original.
> > 
> > We number based on oids.  You mean oid wraparound could cause the file
> > to be used again?
> 
> That's not what I meant. Let's say I issue 'TRUNCATE TABLE foo'. Then,
> right before smgrtruncate is called, I do an XLogInsert saying "Redo a
> TRUNCATE TABLE on foo to nblocks if we crash". Then smgrtruncate fails
> and we do an elog(ERROR)
> 
> Now the user decides that since TRUNCATE TABLE didn't work, he might as
> well use the table, so he inserts some records into it, generating log
> entries.
> 
> When I replay this log sequence later, what happens if the TRUNCATE
> succeeds instead of failing?

You mean the user is now accessing a partially truncated table? That's
just too weird.  I don't see how the WAL would know how far truncation
had gone.  I see why you would need the panic and it seems acceptable.

> I admit that there are other ways of handling it than to PANIC if the
> truncate fails. All the ones I can come up with seem to amount to some
> kind of ad-hoc UNDO log.

Yea, truncate failure seems so rare/impossible to happen, we can do a
panic and see if it ever happens to anyone.  I bet it will not.  Those
are usually cases of an OS crash, so it is the same as a panic.

> > > WAL must be flushed before truncate as well.
> > > 
> > > WAL does not need to be flushed before create, if we don't mind 
> > > leaking files sometimes.
> > 
> > Cleanup?
> 
> Yes, we could garbage

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-05 Thread Zeugswetter Andreas SB SD



> Let me re-write it, and I'll post it in the next version. The section
> dealt with what to do when you have a valid restored controlfile from a
> backup system, which is in the DB_SHUTDOWNED state, and that points to a
> valid shutdown/checkpoint record in the log; only the checkpoint record
> happens not to be the last one in the log. This is a situation that
> could never happen now, but would in PITR.

But it would need to be restore's responsibility to set the flag to 
DB_IN_PRODUCTION, no?

> Even if we shutdown before we copy the file, we don't want a file that
> hasn't been written to in 5 weeks before it was backed up to require
> five weeks of old log files to recover. So we need to track that
> information somehow, because right now if we scanned the blocks in the
> file looking for at the page LSN's, we greatest LSN we would see might
> be much older than where it would be safe to recover from. That is the
> biggest problem, I think.

Well, if you skip a validity test it could be restore's responsibility 
to know which checkpoint was last before the file backup was taken. 
(When doing a backup you would need to include the last checkpoint info
== pg_control at start of backup)

Andreas



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-04 Thread J. R. Nield


On Thu, 2002-07-04 at 11:45, J. R. Nield wrote:

One other item that should be here:
> The big items so-far are:
> §1 - Logging Relation file creation, truncation, and removal 
> This is mostly done. Can do infinte play-forward from
> online logs.
> §2 - Partial-Write and Bad Block detection
> Need input before starting. Migration issues.
> §3 - Detecting Shutdown Consistent System Recovery
> Mostly done.
> §4 - Interactive Play-Forward Recovery for an Entire System
> Need input before starting.
> §5 - Individual file consistent recovery
> Need input. Semi-Major changes required.
  §6 - btbuild is not logged
  Not logged because of same assumptions as for file create.
  Only need to log the index build parameters to recreate
  the index, not each page change.


-- 
J. R. Nield
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

[HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-04 Thread J. R. Nield


Hello:

I've got the logging system to the point where I can take a shutdown
consistent copy of a system, and play forward through multiple
checkpoints. It seems to handle CREATE TABLE/DROP TABLE/TRUNCATE
properly, and things are moving forward well. Recovery to an arbitrary
point-in-time should be just as easy, but will need some administrative
interface for it.

At this point, some input would be useful on how I should handle things.

The most important questions that need answering are in sections 2 & 5,
since they impact the most other parts of the system. They will also
require good documentation for sysadmins.



Issues Outstanding for Point In Time Recovery (PITR)

$Date: 2002/07/04 14:23:37 $

$Revision: 1.4 $

J.R. Nield

(Enc: ISO 8859-15 Latin-9)


§0 - Introduction

This file is where I'm keeping track of all the issues I run into while
trying to get PITR to work properly. Hopefully it will evolve into a
description of how PITR actually works once it is implemented.

I will also try to add feedback as it comes in.

The big items so-far are:
§1 - Logging Relation file creation, truncation, and removal 
This is mostly done. Can do infinte play-forward from
online logs.
§2 - Partial-Write and Bad Block detection
Need input before starting. Migration issues.
§3 - Detecting Shutdown Consistent System Recovery
Mostly done.
§4 - Interactive Play-Forward Recovery for an Entire System
Need input before starting.
§5 - Individual file consistent recovery
Need input. Semi-Major changes required.

§1 - Logging Relation file creation, truncation, and removal 

  §1.1 - Problem:

Without file creation in the log, we can't replay committed 
transactions that create relations. 

The current code assumes that any transaction reaching commit has already
ensured it's files exist, and that those files will never be removed. This
is true now, but not for log-replay from an old backup database system. 

The current XLOG code silently ignores block-write requests for
non-existent files, and assumes that the transaction generating those
requests must have aborted.

Right now a crash during TRUNCATE TABLE will leave the table in an
inconsistent state (partially truncated). This would not work when doing
replay from before the last checkpoint.

§1.1.1 - CREATE DATABASE is also unlogged

  This will cause the same replay problems as above.

  §1.2 - Proposal:

a) Augment the SMGR code to log relation file operations, and to handle
redo requests properly. This is simple in the case of create. Drop must be
logged only IN the commit record. For truncate see (b).

The 'struct f_smgr' needs new operations 'smgr_recreate', 'smgr_reunlink',
and 'smgr_retruncate'. smgr_recreate should accept a RelFileNode instead
of a Relation.

Transactions that abort through system failure (ie.  unlogged aborts) 
will simply continue to leak files.

b) If TRUNCATE TABLE fails, the system must PANIC. Otherwise, the table
may be used in a future command, and a replay-recovered database may
end-up with different data than the original.

WAL must be flushed before truncate as well.

WAL does not need to be flushed before create, if we don't mind 
leaking files sometimes.

c) Redo code should treat writes to non-existent files as an error.
Changes affect heap & nbtree AM's. [Check others]

d) rtree [and GiST? WTF is GiST? ] is not logged. A replay recovery of
  a database should mark all the rtree indices as corrupt.
  [ actually we should do that now, are we? ]

e) CREATE DATABASE must be logged properly, not use system(cp...)

  §1.3 - Status:

All logged SMGR operations are now in a START_CRIT_SECTION()/
END_CRIT_SECTION() pair enclosing the XLogInsert() and the underlying fs
operations.

Code has been added to smgr and xact modules to log:
create (no XLogFlush)
truncate (XLogFlush)
pending deletes on commit record
files to delete on abort record

Code added to md.c to support redo ops

Code added to smgr for RMGR redo/desc callbacks

Code added to xact RMGR callbacks for redo/desc

Database will do infinite shutdown consistent system recovery from the
online logs, if you manually munge the control file to set state ==
DB_IN_PRODUCTION instead of DB_SHUTDOWNED.

Still need to do:
Item (c), recovery cleanup in all AM's
Item (d), logging in other index AM's
Item (e), CREATE DATABASE stuff



§2 - Partial-Write and Bad Block detection

  §2.1 - Problem:

In order to protect against partial writes without logging pages
twice, we need to detect partial pages in system files and report them
to the system administrator. We also might want to be able to det

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

[HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

30 matches

Site Navigation

Mail list logo

Footer information