Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-18 Thread J. R. Nield

Richard:

I can't quite follow this; maybe you sent a draft by accident. If you
want to post a patch against 7.2.1, or even better against HEAD in CVS,
that would be great. Or if you'd rather point me to your source online,
that would be good too.

I just want to clarify though: is this work released to the PostgreSQL
Development group by Progress and Multera, or do they still claim
copyright interest in it?

Regards,
J.R. Nield


On Thu, 2002-07-18 at 12:56, Richard Tucker wrote:
 
 
 -Original Message-
 From: J. R. Nield [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, July 17, 2002 8:13 PM
 To: [EMAIL PROTECTED]
 Cc: Bruce Momjian
 Subject: RE: [HACKERS] Issues Outstanding for Point In Time Recovery
 (PITR)
 
 
 On Wed, 2002-07-17 at 19:25, Richard Tucker wrote:
  Regarding hot backup.  Our implementation of pg_copy does a hot backup.
  It turns off database checkpointing for the duration of the backup,
  backs up all the files of the database cluster up to the WAL file
  currently being logged to, and then acquires the WALInsertLock just
  long enough to back up the current WAL file.
 
 Does it then allow more writes to that WAL file? It would seem like you
 want to advance the log to the next file, so the sysadmin wouldn't have
 to choose which version of log-file number 3 he wants to use at restore.
 
  Writes to the WAL file are allowed during the backup, except while
  backing up the WAL file that is current when the backup completes.
  That is, the pg_xlog directory is the last directory to be backed up,
  and the wal_files are backed up in the order they were used.  Continued
  WAL logging is allowed until the backup reaches the current WAL file
  being written to.  To back up this last WAL file, the WALInsertLock is
  held until the copy of the file is complete.  So the backup stops
  update activity only long enough to copy this last 16MB file.
 
 Also, what do you mean by 'turns off checkpointing'. You have to do a
 checkpoint, or at least flush the buffers, when you start the backup.
 Otherwise how do you know what LSN to start from at restore?
 
  The pg_control file also gets backed up.  It contains the point in the
  log at which to begin the redo/roll-forward.  Not allowing the redo
  point to advance while the backup runs means that the startup process's
  crash-recovery code will capture all the changes made to the database
  cluster while the backup was running.
 
 
 Anyway: Yes we'd love to see the code.
 
  In what form would you like me to send the code to you e.g. as a patch,
 copy our whole source ...
 
 Since I've pretty-much got create/drop and index stuff working, if your
 code does the rest then we should be good to go.
 
 ;jrnield
 
 
 --
 J. R. Nield
 [EMAIL PROTECTED]
 
 
 
 
-- 
J. R. Nield
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-18 Thread Richard Tucker

Progress/Multera would release the hot backup/roll forward work to the
PostgreSQL Development group.
-regards
richt




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Richard Tucker


We have also implemented a roll-forward recovery mechanism, as a
modification of PostgreSQL 7.2.1.
The mechanism is designed to provide a means of recovering from the loss
or corruption of media.  It provides for duplicating wal_files, so that
if a wal_file is lost, roll-forward recovery can recover the database
using the duplicated wal_files.  Hooks were also added so that the
roll-forward recovery mechanism can be used to implement a hot standby
database.
Along with the roll-forward recovery mechanism, we have implemented an
online database backup utility which is synchronized with the recovery
log, so that the backup can be a starting point for a roll-forward
recovery session.

Roll forward recovery is enabled for a database cluster by specifying one of
two postmaster configuration parameters.

wal_file_reuse = false  This parameter tells the wal system to not reuse wal
files.  This option is intended for sites wishing to implement a hot standby
database where wal files will be periodically copied to another machine
where they will be rolled forward into the standby database.

wal_file_duplicate = directory_path  This parameter tells the system to
mirror the files in the pg_xlog directory to the specified directory.  This
allows for the recovery of a database where a wal file has been damaged or
lost.  It also allows for a variant of a hot standby database where the
duplicate directory is the pg_xlog directory of the standby database.
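The two modes above might look like this in postgresql.conf. This is a hedged sketch: both parameters exist only in the patched 7.2.1 build described in this thread, and the mirror directory path is invented for the example.

```conf
# Patched-build parameters (not in stock PostgreSQL 7.2.1); path is an example.

# Hot-standby style: never reuse wal files, so they can be shipped
# periodically to another machine and rolled forward there.
wal_file_reuse = false

# Media-recovery style: mirror pg_xlog into another directory
# (which may itself be a standby cluster's pg_xlog directory).
wal_file_duplicate = '/mnt/walmirror/pg_xlog'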

Since both of these options cause wal files to accumulate indefinitely,
the DBA needs a means of purging wal files when they are no longer
needed.  So an SQL command, ALTER SYSTEM PURGE WAL_FILES wal_file_name,
has also been implemented.  This command deletes all wal files up to and
including the specified wal_file_name, as long as those wal files are not
needed to recover the database in the event of a system crash.  To find
out the status of the wal files, a function has been implemented to
return a wal file name.  The function is:

wal_file( request_type )
request_type := [ 'current' | 'last' | 'checkpoint' | 'oldest' ]

wal_file('current') returns the name of the log file currently being
written to.
wal_file('last') returns the name of the last log file filled.
wal_file('checkpoint') returns the name of the file containing the
current redo position.  The current redo position is the position in the
recovery log where crash recovery would start if the system were to
crash now.  No logs prior to this one will be needed to recover the
database cluster, so they can be safely removed.
wal_file('oldest') returns the oldest xlog file found in the pg_xlog
directory.
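Together, wal_file() and the purge command give a maintenance workflow along these lines. This is a sketch against the patched build only, and the file-name argument is a placeholder rather than a real value:

```sql
-- Which wal file holds the current redo position?  Files older than
-- this are not needed for crash recovery.  (Patched 7.2.1 only.)
SELECT wal_file('checkpoint');

-- Purge everything up to and including a file older than that result;
-- per the description above, the command refuses to delete wal files
-- still needed for crash recovery.
ALTER SYSTEM PURGE WAL_FILES some_older_wal_file_name;
```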


To actually perform a roll forward, you use the postmaster configuration
parameter roll_forward=yes.  This parameter tells the startup process to
perform crash recovery even though the state of the database as found in
the pg_control file indicates a normal shutdown.  This is necessary
since the starting point of a roll-forward session could be the restore
of a database cluster that was shut down in order to back it up.
Furthermore, this parameter tells the startup process not to write out a
checkpoint record at the end of the roll-forward session.  This allows
the database cluster to receive subsequent wal files and to have those
rolled forward as well.  When the postmaster is started with the
roll_forward=yes option, it shuts down the database as soon as the
startup process completes.  So the idea is to restore a backup, copy all
of your saved/duplicated wal files into the pg_xlog directory of the
restored database, and start the postmaster with the roll_forward
option.

For point-in-time recovery there is also a roll_forward_until = time
parameter, which rolls forward through the wal files until the first
transaction commit record that is greater than or equal to the specified
time.
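Putting the restore steps above together, a roll-forward session might look roughly like this. Treat it as a sketch, not a recipe: the paths are invented, roll_forward and roll_forward_until exist only in the patched 7.2.1 build, and passing them via the postmaster's -c option is an assumption.

```shell
# Sketch only: assumes the patched 7.2.1 build; paths are examples.

# 1. Restore the pg_copy backup as the new data directory.
cp -R /backups/pgbackup "$PGDATA"

# 2. Copy all saved/duplicated wal files into the restored pg_xlog.
cp /backups/walsave/* "$PGDATA/pg_xlog/"

# 3. Roll forward; the postmaster shuts down the database as soon as
#    the startup process finishes replaying the logs.
postmaster -D "$PGDATA" -c roll_forward=yes

# Or stop at a point in time instead of the end of the logs:
postmaster -D "$PGDATA" -c "roll_forward_until=2002-07-18 12:00:00"
```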

The pg_copy utility performs an online copy of a database cluster.  Its
syntax is:
pg_copy backup_directory [-h host] [-p port] ...
This makes a copy of the database, where backup_directory is what you
would set PGDATA to in order to start a postmaster against the backup
copy.  The database can continue to be updated while the copy occurs.
If you start a postmaster against this copy, it will appear to the
startup process as a database that crashed at the instant the pg_copy
operation completed.  Furthermore, the pg_copy utility automatically
removes any wal files not needed to recover the database from either the
pg_xlog directory or the wal_file_duplicate directory.

So, to protect the database from media loss, a DBA just needs to set the
wal_file_duplicate parameter and periodically pg_copy the database.
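That ongoing DBA routine reduces to something like the following sketch (pg_copy is the utility from this patch, and the host name and backup directory are invented for the example):

```shell
# Sketch: periodic online backup with the patched build's pg_copy.
# The backup directory becomes a valid PGDATA for a restored postmaster;
# per the description above, pg_copy also trims wal files no longer
# needed from pg_xlog and the wal_file_duplicate directory.
pg_copy /backups/pgbackup-$(date +%Y%m%d) -h dbhost -p 5432
```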


The BIG THING we have not done is address the issue that add/drop of
tables and indexes does not propagate through the roll-forward recovery
mechanism properly.



-regards
Richard Tucker








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian

Richard Tucker wrote:
 
 We also have implemented a roll forward recovery mechanism. We modified a
 7.2.1 version of Postgres.
 The mechanism is designed to provide a means of recovering from the loss or
 corruption of media.  It provides for duplicating wal_files so that if a
 wal_file is lost roll forward recovery can recover the database using the
 duplicated wal_files.  Hooks were also added so that the roll forward
 recovery mechanism can be used to implement a hot standby database.
 Along with the roll forward recovery mechanism we have also implemented an
 online database backup utility which is synchronized with the recovery log
 so that the backup can be a starting point for a roll forward recovery
 session.

In researching who has done this work, I found that
http://www.multera.com/ is Progress Software.  I assume this
"Distributed but Connected" offering is partly based on PostgreSQL.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian

Bruce Momjian wrote:
 Richard Tucker wrote:
  
  We also have implemented a roll forward recovery mechanism. We modified a
  7.2.1 version of Postgres.
  The mechanism is designed to provide a means of recovering from the loss or
  corruption of media.  It provides for duplicating wal_files so that if a
  wal_file is lost roll forward recovery can recover the database using the
  duplicated wal_files.  Hooks were also added so that the roll forward
  recovery mechanism can be used to implement a hot standby database.
  Along with the roll forward recovery mechanism we have also implemented an
  online database backup utility which is synchronized with the recovery log
  so that the backup can be a starting point for a roll forward recovery
  session.
 
 In researching who has done this work, I found that
 http://www.multera.com/ is Progress Software.  I assume this
 "Distributed but Connected" offering is partly based on PostgreSQL.

Oh, I see it now, PostgreSQL is right there:

* OuterEdge
  A distributed site technology suite that allows remote users to
maintain the integrity, not the congestion and price, of centralized
data. The OuterEdge includes the affordable UltraSQL^(TM) database
server powered by PostgreSQL, the third most popular database, and a

   ^
Secure Application/web server that supports the industry's most popular
Internet languages. Together with the Replication Engine software, the
OuterEdge gives remote users seamless access to important data without
relying on bandwidth.

Wonder why we are the third most popular database.  I think that's
good?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Rod Taylor

 server powered by PostgreSQL, the third most popular database, and a
 
^
 
 Wonder why we are the third most popular database.  I think that's
 good?

You'll notice they didn't qualify where.  On this list, it's probably
#1.  Within Progress software perhaps we're third most popular (whatever
two are typically used in the InnerEdge are 1 and 2).





Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian

Rod Taylor wrote:
  server powered by PostgreSQL, the third most popular database, and a
  
 ^
  
  Wonder why we are the third most popular database.  I think that's
  good?
 
 You'll notice they didn't qualify where.  On this list, it's probably
 #1.  Within Progress software perhaps we're third most popular (whatever
 two are typically used in the InnerEdge are 1 and 2).

Yea, using that logic, #1 would be the Progress internal db system, #2
would be MySQL (though that seems doubtful at this point with NuSphere),
and #3 PostgreSQL.

Actually, PostgreSQL is #1 in my home.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread D'Arcy J.M. Cain

On July 17, 2002 05:45 pm, Richard Tucker wrote:
 We also have implemented a roll forward recovery mechanism. We modified a
 7.2.1 version of Postgres.
 ...

Excellent!  I can't wait.  When will it be in current?

 The BIG THING we have not done is address the issue that add/drop tables
 and indexes do not propagate through the roll forward recovery mechanism
 properly.

I can live with that.  Schemas shouldn't change so often that you can't just 
dup any changes to the backup(s).

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Richard Tucker

I don't know how our marketing came up with "third most popular", but I
think the order is Oracle, MySQL, and PostgreSQL, or maybe Oracle, MSSQL,
and PostgreSQL.  I'm sure there is some criterion by which PostgreSQL is
tenth, and by some other it's number one.
Of course, my posting was about Point In Time Recovery and not Multera
marketing spin.
-regards
richt




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread J. R. Nield

On Wed, 2002-07-17 at 01:25, Bruce Momjian wrote:
 
 We only patch configure.in.  If you post to hackers, they can give you
 assistance and I will try to help however I can.  I can so some
 configure.in stuff for you myself.

Thanks for the offer. The only thing I was changing it for was to test
whether and how to get an Ethernet MAC address using ioctl, so libuuid
could use it if available. That is dropped now.

 
  Related to that, the other place I need advice is on adding Ted Tso's
  LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
  allowed to use this? There is a free OSF/DCE spec for UUID's, so I can
  re-implement the library if required.
 
 We talked about this on the replication mailing list.  We decided that
 hostname, properly hashed to an integer, was the proper way to get this
 value.  Also, there should be a postgresql.conf variable so you can
 override the hostname-generated value if you wish.  I think that is
 sufficient.

I will do something like this, but reserve 16 bytes for it just in case
we change our minds. It needs to be different among systems on the same
machine, so there needs to be a time value and a pseudo-random part as
well. Also, 'hostname' will likely be the same on many machines
(localhost.localdomain or similar).

The only reason I bothered with UUID's before is because they have a
standard setup to make the possibility of collision extremely small, and
I figured replication will end up using it someday.

 
  We also haven't discussed commands for backup/restore, but I will use
  what I think is appropriate and we can change the grammar if needed. The
  initial hot-backup capability will require the database to be in
  read-only mode and use tar for backup, and I will add the ability to
  allow writes later.
 
 Yea, I saw Tom balked at that. I think we have enough manpower and time
 that we can get hot backup in normal read/write mode working before 7.3
 beta so I would just code it assuming the system is live and we can deal
 with making it hot-capable once it is in CVS.  It doesn't have to work
 100% until beta time.

Hot backup read/write requires that we force an advance in the logfile
segment after the backup. We need to save all the logs between backup
start and completion. Otherwise the files will be useless as a
standalone system if the current logs somehow get destroyed (fire in the
machine room, etc.).

The way I would do this is:

  create a checkpoint
  do the block-by-block walk of the files using the bufmgr
  create a second checkpoint
  force the log to advance past the end of the current segment
  save the log segments containing records between the
   first and second checkpoint with the backup

Then if you restore the backup, you can recover to the point of the
second checkpoint, even if the logs since then are all gone.

Right now the log segment size is fixed, so this means that we'd waste
8MB of log space on average to do a backup. Also, the way XLOG reads
records right now, we have to write placeholder records into the empty
space, because that's how it finds the end of the log stream. So I need
to change XLOG to handle skip records, and then to truncate the file
when it gets archived, so we don't have to save up to 16MB of zeros.

Also, if archiving is turned off, then we can't recycle or delete any
logs for the duration of the backup, and we have to save them.

So I'll finish the XLOG support for this, and then think about the
correct way to walk through all the files.
 
-- 
J. R. Nield
[EMAIL PROTECTED]







Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-17 Thread Bruce Momjian

J. R. Nield wrote:
 I will do something like this, but reserve 16 bytes for it just in case
 we change our minds. It needs to be different among systems on the same
 machine, so there needs to be a time value and a pseudo-random part as
 well. Also, 'hostname' will likely be the same on many machines
 (localhost.localdomain or similar).
 
 The only reason I bothered with UUID's before is because they have a
 standard setup to make the possibility of collision extremely small, and
 I figured replication will end up using it someday.

Sure.  Problem is, we support so many platforms that any trickery is a
problem.  If they can change it in postgresql.conf, that should be
sufficient.

 Hot backup read/write requires that we force an advance in the logfile
 segment after the backup. We need to save all the logs between backup
 start and completion. Otherwise the files will be useless as a
 standalone system if the current logs somehow get destroyed (fire in the
 machine room, etc.).
 
 The way I would do this is:
 
   create a checkpoint
   do the block-by-block walk of the files using the bufmgr
   create a second checkpoint
   force the log to advance past the end of the current segment
   save the log segments containing records between the
 first and second checkpoint with the backup

Sounds good.

 Then if you restore the backup, you can recover to the point of the
 second checkpoint, even if the logs since then are all gone.

Good, you put the logs that happened during the backup inside the same
backup, make it consistent.  Makes sense.

 Right now the log segment size is fixed, so this means that we'd waste
 8MB of log space on average to do a backup. Also, the way XLOG reads
 records right now, we have to write placeholder records into the empty
 space, because that's how it finds the end of the log stream. So I need
 to change XLOG to handle skip records, and then to truncate the file
 when it gets archived, so we don't have to save up to 16MB of zeros.
 
 Also, if archiving is turned off, then we can't recycle or delete any
 logs for the duration of the backup, and we have to save them.
 
 So I'll finish the XLOG support for this, and then think about the
 correct way to walk through all the files.

Sounds like a good plan.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread Bruce Momjian


J.R., just checking to see how PITR recovery is going.  Do you need any
assistance or have any questions for us?

Also, do you have any idea how close you are to having something
completed?  Are you aware we are closing development of 7.3 at the end
of August and start beta September 1?  Is there any way we can help you?

---

J. R. Nield wrote:
 On Fri, 2002-07-05 at 01:42, Bruce Momjian wrote:
  
  We have needed
  point-in-time recovery for a long time, 
 
 Most thanks should go to Vadim (and whoever else worked on this), since
 his WAL code already does most of the work. The key thing is auditing
 the backend to look for every case where we assume some action is not
 visible until after commit, and therefore don't log its effects. Those
 are the main cases that must be changed.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread J. R. Nield

On Tue, 2002-07-16 at 15:36, Bruce Momjian wrote:
 
 J.R., just checking to see how PITR recovery is going.  Do you need any
 assistance or have any questions for us?
 
 Also, do you have any idea how close you are to having something
 completed?  Are you aware we are closing development of 7.3 at the end
 of August and start beta September 1?  Is there any way we can help you?
 

It should be ready to go into CVS by the end of the month. 

That will include: logging all operations except for rtree and GiST,
archival of logfiles (with new postgresql.conf params), headers on the
logfiles to verify the system that created them, standalone backend
recovery to a point-in-time, and a rudimentary hot backup capability.

I could use some advice on the proper way to add tests to configure.in,
given that the autoconf output is in CVS. Would you ever want a patch to
include the generated 'configure' file?

Related to that, the other place I need advice is on adding Ted Tso's
LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
allowed to use this? There is a free OSF/DCE spec for UUID's, so I can
re-implement the library if required.

We also haven't discussed commands for backup/restore, but I will use
what I think is appropriate and we can change the grammar if needed. The
initial hot-backup capability will require the database to be in
read-only mode and use tar for backup, and I will add the ability to
allow writes later.

Does this sound like a reasonable timeframe/feature-set to make the 7.3
release?

-- 
J. R. Nield
[EMAIL PROTECTED]







Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread Tom Lane

J. R. Nield [EMAIL PROTECTED] writes:
 Related to that, the other place I need advice is on adding Ted Tso's
 LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
 allowed to use this?

Uh, why exactly is UUID essential for this?  (The correct answer is
"it's not", IMHO.)

 We also haven't discussed commands for backup/restore, but I will use
 what I think is appropriate and we can change the grammar if needed. The
 initial hot-backup capability will require the database to be in
 read-only mode and use tar for backup, and I will add the ability to
 allow writes later.

There is no read-only mode, and I for one will resist adding one.

regards, tom lane




Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-16 Thread Bruce Momjian

J. R. Nield wrote:
 On Tue, 2002-07-16 at 15:36, Bruce Momjian wrote:
  
  J.R., just checking to see how PITR recovery is going.  Do you need any
  assistance or have any questions for us?
  
  Also, do you have any idea how close you are to having something
  completed?  Are you aware we are closing development of 7.3 at the end
  of August and start beta September 1?  Is there any way we can help you?
  
 
 It should be ready to go into CVS by the end of the month. 
 
 That will include: logging all operations except for rtree and GiST,
 archival of logfiles (with new postgresql.conf params), headers on the
 logfiles to verify the system that created them, standalone backend
 recovery to a point-in-time, and a rudimentary hot backup capability.

Sounds great.  That gives us another month to iron out any remaining
issues.  This will be a great 7.3 feature!


 I could use some advice on the proper way to add tests to configure.in,
 given that the autoconf output is in CVS. Would you ever want a patch to
 include the generated 'configure' file?

We only patch configure.in.  If you post to hackers, they can give you
assistance and I will try to help however I can.  I can do some
configure.in stuff for you myself.

 Related to that, the other place I need advice is on adding Ted Tso's
 LGPL'd UUID library (stolen from e2fsprogs) to the source. Are we
 allowed to use this? There is a free OSF/DCE spec for UUID's, so I can
 re-implement the library if required.

We talked about this on the replication mailing list.  We decided that
hostname, properly hashed to an integer, was the proper way to get this
value.  Also, there should be a postgresql.conf variable so you can
override the hostname-generated value if you wish.  I think that is
sufficient.

 We also haven't discussed commands for backup/restore, but I will use
 what I think is appropriate and we can change the grammar if needed. The
 initial hot-backup capability will require the database to be in
 read-only mode and use tar for backup, and I will add the ability to
 allow writes later.

Yea, I saw Tom balked at that. I think we have enough manpower and time
that we can get hot backup in normal read/write mode working before 7.3
beta so I would just code it assuming the system is live and we can deal
with making it hot-capable once it is in CVS.  It doesn't have to work
100% until beta time.

 Does this sound like a reasonable timeframe/feature-set to make the 7.3
 release?

Sounds great.  This is another killer 7.3 feature, and we really need
this for greater enterprise acceptance of PostgreSQL.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-09 Thread Hannu Krosing

On Tue, 2002-07-09 at 17:26, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  1) record the lowest uncommitted transaction number (LUTN); this may
  have problems with wraparound, but I guess they are solvable. Disallow
  VACUUM. 2) Do a CHECKPOINT ('alter database begin backup')
  3) make a file-level (.tar) backup of data directory.
  4) Allow VACUUM. ('alter database end backup')
 
 Transactions don't necessarily commit in sequence number order, so the
 concept of LUTN seems meaningless.

Not quite. It is the simplest way to be sure that if we invalidate
all transactions >= it, we get back to a fairly recent
Point-In-Time.

The real solution would of course be to remember all committed
transactions at this PIT, which can probably be done by remembering LUTN
and all individual committed transactions > LUTN.

 Why is it necessary (or even good) to disallow VACUUM?

So that it would be possible to resurrect those tuples that have been
deleted/updated during the disk-level backup.

I would prefer the ability to tell VACUUM not to touch tuples whose
deleting transaction number is >= LUTN. IIRC the original postgres was
able to do that.
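
Hannu's restriction could be expressed as a predicate along these
lines. This is only a sketch of the proposal (all names are invented,
and transaction-id wraparound is ignored, as he notes):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Sketch of the proposed rule: while a hot backup is running, VACUUM
 * must not reclaim any tuple whose deleting transaction is at or after
 * the LUTN recorded at backup start, so a restore from that backup can
 * still resurrect it. */
static bool
vacuum_may_remove(TransactionId tuple_xmax, bool backup_active,
                  TransactionId lutn)
{
    if (backup_active && tuple_xmax >= lutn)
        return false;   /* keep: a restore may need this tuple */
    return true;        /* normal case: dead tuple is reclaimable */
}
```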

 I really dislike
 a design that allows the DBA to cripple the database by forgetting the
 last step in a (long) process.

There are several ways around it.

1. do it in a script, that will not forget.

2. Closing the session that did 'alter database begin backup' could
end the backup state automatically, but this would make the backup
script trickier.

3. VACUUM should not block but report a warning about being restricted
from running.

4. database can be instructed to send a message to DBA's pager if it has
been in 'begin backup' state too long ;)


Hannu






Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Zeugswetter Andreas SB SD


 OK, so you do a tar backup of a file.  While you are doing the tar,
 certain 8k blocks are being modified in the file.  There is no way to
 know what blocks are modified as you are doing the tar, and in fact you
 could read partial page writes during the tar.

No, I think all OS's (Unix and NT at least) guard against this, as long as 
the whole 8k block is written in one call. It is only the physical layer (disk) 
that is prone to partial writes.

 any page that was modified while we were backing up is in the WAL.  On
 restore, we can recover whatever tar saw of the file, knowing that the
 WAL page images will recover any page changes made during the tar.

Assuming above, I do not think this is necessary.

 What I suggest is a way for the backup tar to turn on pre-change page
 images while the tar is happening, and turn it off after the tar is
 done.

Again, I do not think this is necessary.

Andreas








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Zeugswetter Andreas SB SD


 As noted, one of the main problems is knowing where to begin
 in the log.  This can be handled by having backup processing 
 update the control file with the first lsn and log file 
 required.  At the time of the backup, this information is or 
 can be made available.  The control file can be the last file
 added to the tar and can contain information spanning the entire
 backup process.

The lsn and logfile number (of the latest checkpoint) are already in the control
file, thus you need the control file from the start of the backup. (To reduce the number
of logs needed for restore of an online backup you could force a checkpoint
before starting the file backup)

You will also need the lsn and logfile number after the file backup, to know how much 
log must at least be replayed to regain a consistent state. 

Andreas








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Bruce Momjian

Zeugswetter Andreas SB SD wrote:
 
  OK, so you do a tar backup of a file.  While you are doing the tar,
  certain 8k blocks are being modified in the file.  There is no way to
  know what blocks are modified as you are doing the tar, and in fact you
  could read partial page writes during the tar.
 
 No, I think all OS's (Unix and NT at least) guard against this, as long as 
 the whole 8k block is written in one call. It is only the physical layer (disk) 
 that is prone to partial writes.

Yes, good point.  The kernel will present a unified view of the 8k
block.  Of course, there are still cases where 8k blocks are being
changed ahead of or behind the tar's read position in the file.  Will
WAL allow us to re-synchronize that file even if some of its pages are
from an earlier time than others?  Uh, I think so.  So maybe we don't
need the pre-write images in WAL after all.  Can we replay the WAL when
some pages in the restored file _have_ the WAL changes and some don't?
Maybe the LSN on the pages helps with this.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-08 Thread Barry Lind

I know that in Oracle there are 'alter database begin backup' and 'alter 
database end backup' commands that allow you to script your hot backups 
through a cron job by calling the begin backup command first, then using 
disk backup method of choice and then finally call the end backup command.

--Barry

Patrick Macdonald wrote:

Zeugswetter Andreas SB SD wrote:
  

As noted, one of the main problems is knowing where to begin
in the log.  This can be handled by having backup processing
update the control file with the first lsn and log file
required.  At the time of the backup, this information is or
can be made available.  The control file can be the last file
added to the tar and can contain information spanning the entire
backup process.
  

lsn and logfile number (of latest checkpoints) is already in the control
file, thus you need control file at start of backup. (To reduce the number
of logs needed for restore of an online backup you could force a checkpoint
before starting file backup)



Maybe I should have been more clear.  The control file snapshot must 
be taken at backup start (as you mention) but can be stored in cache.
The fields can then be modified as we see fit.  At the end of backup,
we can write this to a temp file and add it to the tar.  Therefore,
as mentioned, the snapshot spans the entire backup process.
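
The snapshot handling Patrick describes could be sketched roughly as
below. The struct and its fields are invented for illustration and are
not the real pg_control layout: capture the fields at backup start, amend
them in memory during the backup, and emit the result as the last tar
member.

```c
#include <stdint.h>

/* Hypothetical control-file snapshot, cached at backup start and
 * finalized when the file copy ends, so it spans the whole backup. */
typedef struct BackupControlSnapshot
{
    uint32_t first_log_file;    /* earliest WAL segment restore needs */
    uint64_t first_lsn;         /* redo start location at backup start */
    uint64_t backup_end_lsn;    /* filled in when the file copy ends */
} BackupControlSnapshot;

static void
finish_snapshot(BackupControlSnapshot *snap, uint64_t end_lsn)
{
    /* Last step before writing the snapshot to a temp file and
     * appending it to the tar. */
    snap->backup_end_lsn = end_lsn;
}
```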
 
  

You will also need lsn and logfile number after file backup, to know how much
log needs to at least be replayed to regain a consistent state.



This is a nicety but not a necessity.  If you have a backup end log 
record, you just have to enforce that the PIT recovery encounters 
that particular log record on forward recovery.  Once encountered,
you know that you are past the point of backup end.

Cheers,
Patrick






  











Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-07 Thread Bruce Momjian

Patrick Macdonald wrote:
 The idea of using the last lsn on the page to detect a partial
 write is used by other dbms systems.  You already have that 
 information available so there is no overhead in computing it. 
 Nothing wrong with CRC though.

Agreed.  Just thought I would point out that it is not guaranteed.  Suppose
the 8k block is spread over 16 512-byte sectors in two cylinders.  The OS or
SCSI tagged queuing could write the second part of the page (sectors
9-16) before the first group (1-8).  If it writes 9-16, then writes 1-8
but fails in the middle of 1-8, the LSN will match at the front and back
of the page, but the page will be partially written.
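
Bruce's failure scenario can be modeled in a few lines. This is an
illustrative toy (the array of per-sector LSN tags is not how pages are
actually laid out): a front-vs-back LSN comparison passes on a torn
page, which is the argument for a whole-page check such as a CRC.

```c
#include <stdint.h>
#include <stdbool.h>

#define SECTORS 16   /* 8k page modeled as 16 512-byte sectors */

/* The front/back check Bruce describes: compare the LSN tag of the
 * first and last sectors only. */
static bool
lsn_check_passes(const uint32_t sector_lsn[SECTORS])
{
    return sector_lsn[0] == sector_lsn[SECTORS - 1];
}

/* Ground truth for the toy model: the page is torn if any sector still
 * carries a stale LSN. A CRC over the whole page would catch this. */
static bool
page_is_torn(const uint32_t sector_lsn[SECTORS], uint32_t new_lsn)
{
    for (int i = 0; i < SECTORS; i++)
        if (sector_lsn[i] != new_lsn)
            return true;
    return false;
}
```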


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-06 Thread Patrick Macdonald

J.R.,

Nice first draft and a good read. Was going to comment 
in-line but thought this method would be easier to follow. 
The comments/suggestions below assume that PIT recovery is
being performed at the cluster level with a data backup 
image created by a tar-like utility. 

As noted, one of the main problems is knowing where to begin
in the log.  This can be handled by having backup processing 
update the control file with the first lsn and log file 
required.  At the time of the backup, this information is or 
can be made available.  The control file can be the last file
added to the tar and can contain information spanning the entire
backup process.

For data consistency, since the backup is being performed on 
an active cluster, we have to make sure to mark the end of the 
backup.  On restore, to make the cluster consistent, you have 
to force the user to perform forward recovery past the point
of the backup completion marker in the (archived) log.  This 
can be handled using a backup end log record.  The backup end
log record would have to contain an identifier unique to this 
backup.  If a user requests to stop PIT recovery before this 
log record is encountered, consistency is not guaranteed. 
PIT should either disallow the action or warn of possible / 
impending doom.
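
The backup-end-record rule could be enforced with state like the
following sketch (all names are invented for illustration): recovery may
only stop once the log record carrying this backup's unique identifier
has been replayed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recovery-side bookkeeping for the backup end record. */
typedef struct RecoveryState
{
    uint64_t backup_id;         /* unique id stamped into the backup */
    bool     backup_end_seen;   /* set once the matching record replays */
} RecoveryState;

static void
replay_backup_end_record(RecoveryState *state, uint64_t record_backup_id)
{
    if (record_backup_id == state->backup_id)
        state->backup_end_seen = true;
}

/* Stopping earlier than the backup end record leaves the cluster
 * inconsistent, so either disallow it or warn of impending doom. */
static bool
may_stop_recovery(const RecoveryState *state)
{
    return state->backup_end_seen;
}
```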

The necessary logging for rtree (and others) insertions/deletions
can be added to the base code.  Not much of a worry, but I would
expect to encounter other missing log items during testing.

The idea of using the last lsn on the page to detect a partial
write is used by other dbms systems.  You already have that 
information available so there is no overhead in computing it. 
Nothing wrong with CRC though.

As for the DB_SHUTDOWNED state, this could be handled by having
the backup processing update the control file field to 
DB_PIT_REQUIRED (or some such identifier).  After a restore,
users would be blocked from connecting to the cluster's databases 
until a forward recovery past the backup end log record has
completed successfully.  

At the end of normal crash recovery, the user has to go digging
to identify in-flight transactions still in the system and abort
them manually.  It would be nice if PIT recovery automatically
aborted all in-flight transactions at the end. 

As PostgreSQL heads towards forward recovery functionality, it
may be wise to add headers to the log files.  As the logs from
any cluster are identically named, the header would allow unique
identification of the file and contents (cluster name, unique 
log id, id of the prior log file for chaining purposes, lsn 
ranges, etc). Most helpful.

Just a few notes from the administrative side. PIT recovery
should probably offer the user the following actions:

. forward recover to end of logs [and stop]
  Process log files located in the current directory until you
  read through the last one.  Allow the user the option to stop
  or not, just in case the logs are archived.  Send back the
  timestamp of the last encountered commit log record and the
  series of log files scanned.   

. forward recover to PIT [and stop]
  Similar to that described above but use the commit timestamps
  to gauge PIT progress.  
  
. forward recover query
  Send back the log series covered and the last commit timestamp
  encountered. 

. forward recover stop
  Stop the current forward recovery session. Undo all in-flight
  transactions and bring the databases down in a consistent
  state.  No other external user actions should be required.

Looking forward to reading draft 2.

Cheers,
Patrick
--
Patrick Macdonald 
Red Hat Canada 

J. R. Nield wrote:
 
 Hello:
 
 I've got the logging system to the point where I can take a shutdown
 consistent copy of a system, and play forward through multiple
 checkpoints. It seems to handle CREATE TABLE/DROP TABLE/TRUNCATE
 properly, and things are moving forward well. Recovery to an arbitrary
 point-in-time should be just as easy, but will need some administrative
 interface for it.
 
 At this point, some input would be useful on how I should handle things.
 
 The most important questions that need answering are in sections 2 & 5,
 since they impact the most other parts of the system. They will also
 require good documentation for sysadmins.
 
 Issues Outstanding for Point In Time Recovery (PITR)
 
 $Date: 2002/07/04 14:23:37 $
 
 $Revision: 1.4 $
 
 J.R. Nield
 
 (Enc: ISO 8859-15 Latin-9)
 
 §0 - Introduction
 
 This file is where I'm keeping track of all the issues I run into while
 trying to get PITR to work properly. Hopefully it will evolve into a
 description of how PITR actually works once it is implemented.
 
 I will also try to add feedback as it comes in.
 
 The big items so-far are:
 §1 - Logging Relation file creation, truncation, and removal
 This is mostly done. Can do infinite play-forward from
 online logs.
 §2 - Partial-Write and Bad Block detection

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-05 Thread Bruce Momjian

Bruce Momjian wrote:
 You are saying, How do we know what WAL records go with that backup
 snapshot of the file?  OK, lets assume we are shutdown.  You can grab
 the WAL log info from pg_control using contrib/pg_controldata and that
 tells you what WAL logs to roll forward when you need to PIT recover
 that backup later.  If you store that info in the first file you backup,
 you can have that WAL pointer available for later recovery in case you
 are restoring from that backup.  Is that the issue?
 
 What seems more complicated is doing the backup while the database is
 active, and this may be a requirement for a final PITR solution.  Some
 think we can grab the WAL pointer at 'tar' start and replay that on the
 backup even if the file changes during backup.

OK, I think I understand live backups now using tar and PITR.  Someone
explained this to me months ago but now I understand it.

First, a key issue is that PostgreSQL doesn't fiddle with individual
items on disk.  It reads an 8k block, modifies it, (writes it to WAL if
it hasn't been written to that WAL segment before), and writes it to
disk.  That is key.  (Are there cases where we don't do this, like
pg_controldata?)

OK, so you do a tar backup of a file.  While you are doing the tar,
certain 8k blocks are being modified in the file.  There is no way to
know what blocks are modified as you are doing the tar, and in fact you
could read partial page writes during the tar.

One solution would be to read the file using the PostgreSQL page buffer,
but even then, getting a stable snapshot of the file would be difficult.
Now, we could lock the table and prevent writes while it is being backed
up, but there is a better way.

We already have pre-change page images in WAL.  When we do the backup,
any page that was modified while we were backing up is in the WAL.  On
restore, we can recover whatever tar saw of the file, knowing that the
WAL page images will recover any page changes made during the tar.

Now, you mentioned we may not want pre-change page images in WAL
because, with PITR, we can more easily recover from the WAL rather than
having this performance hit for many page writes.

What I suggest is a way for the backup tar to turn on pre-change page
images while the tar is happening, and turn it off after the tar is
done.

We already have this TODO item:

* Turn off after-change writes if fsync is disabled (?)

No sense in doing after-change WAL writes without fsync.  We could
extend this so those after-change writes could be turned on and off,
allowing full tar backups and PITR recovery.  In fact, for people with
reliable hardware, we should already be giving them the option of
turning off pre-change writes.  We don't have a way of detecting partial
page writes, but then again, we can't detect failures with fsync off
anyway, so it seems to be the same vulnerability.  I guess that's why we
were going to wrap the effect into the same variable, but for PITR, I
can see wanting fsync always on and the ability to turn pre-change
writes on and off.
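
The policy Bruce sketches above amounts to a small decision function.
The parameter names here are invented (they are not real GUC variables);
the point is that pre-change page images are useful either as the normal
torn-page protection or, independently of that setting, while a tar
backup is running, and buy nothing when fsync is off.

```c
#include <stdbool.h>

/* Illustrative policy: should this page write be preceded by a
 * pre-change (full-page) image in WAL? */
static bool
need_page_image(bool fsync_enabled, bool prechange_writes_enabled,
                bool backup_in_progress)
{
    if (!fsync_enabled)
        return false;   /* without fsync, images can't protect anything */
    return prechange_writes_enabled || backup_in_progress;
}
```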

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-05 Thread Zeugswetter Andreas SB SD


 Let me re-write it, and I'll post it in the next version. The section
 dealt with what to do when you have a valid restored controlfile from a
 backup system, which is in the DB_SHUTDOWNED state, and that points to a
 valid shutdown/checkpoint record in the log; only the checkpoint record
 happens not to be the last one in the log. This is a situation that
 could never happen now, but would in PITR.

But it would need to be restore's responsibility to set the flag to 
DB_IN_PRODUCTION, no?

 Even if we shutdown before we copy the file, we don't want a file that
 hasn't been written to in 5 weeks before it was backed up to require
 five weeks of old log files to recover. So we need to track that
 information somehow, because right now if we scanned the blocks in the
 file looking at the page LSNs, the greatest LSN we would see might
 be much older than where it would be safe to recover from. That is the
 biggest problem, I think.

Well, if you skip a validity test it could be restore's responsibility 
to know which checkpoint was last before the file backup was taken. 
(When doing a backup you would need to include the last checkpoint info
== pg_control at start of backup)

Andreas








Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-05 Thread Bruce Momjian

J. R. Nield wrote:
 On Fri, 2002-07-05 at 01:42, Bruce Momjian wrote:
  
  We have needed
  point-in-time recovery for a long time, 
 
 Most thanks should go to vadim (and whoever else worked on this), since
 his WAL code already does most of the work. The key thing is auditing
 the backend to look for every case where we assume some action is not
 visible until after commit, and therefore don't log its effects. Those
 are the main cases that must be changed.

Yep.  Glad you can focus on that.

  ---
  
  J. R. Nield wrote:
   Hello:
   
   I've got the logging system to the point where I can take a shutdown
   consistent copy of a system, and play forward through multiple
   checkpoints. It seems to handle CREATE TABLE/DROP TABLE/TRUNCATE
 
 But notably not for the btree indexes! It looked like they were working,
 because the files were there, and all indexes created before the backup
 would work under insert/delete (including sys catalog indexes). This is
 because btree insert/delete is logged, just not during build. So I
 missed that one case.
 
 You will end-up with up-to-date table data though, so it is something.
 
 Adding logging support to btbuild is the next step, and I don't think it
 should be too hard. I am working this now.

Great.

 It is also a major advantage that most everything in the system gets
 stored in the catalog tables, and so is logged already.
 
 
  Uh, we do log pre-page writes to WAL to recover from partial page
  writes to disk.  Is there something more we need here?
  
  As for bad block detection, we have thought about adding a CRC to each
  page header, or at least making it optional.  WAL already has a CRC.
 
 
 Yes this should be last to do, because it is not necessary for PITR,
 only for performance (the option not to write pre-images without fear of
 data loss). 

Yep.

  Yes, there are a few places where we actually create a file, and if the
  server crashes, the file remains out there forever.  We need to track that
  better.  
 
 OK, there is a bigger problem then just tracking the file though. We
 sometimes do stuff to that file that we don't log. We assume that if we
 commit, the file must be OK and will not need replay because the
 transaction would not have committed if the file was not in a commitable
 state. If we abort, the system never sees the file, so in a sense we
 undo everything we did to the file. It is a kind of poor-man's rollback
 for certain operations, like btbuild, create table, etc. But it means
 that we cannot recover the file from the log, even after a commit.

Yep.

  
   
   §1.1.1 - CREATE DATABASE is also unlogged
   
 This will cause the same replay problems as above.
  
  Yep.  Again, seems a master cleanup on startup is needed.
 
 The cleanup is not the problem, only a nuisance. Creating the files
 during replay is the problem. I must recreate CREATE DATABASE from the
 log exactly as it was done originally. I think just logging the
 parameters to the command function should be sufficient, but I need to
 think more about it.

OK, makes sense.  Nice when you can bundle a complex action into the
logging of one command and its parameters.

  
   b) If TRUNCATE TABLE fails, the system must PANIC. Otherwise, the table
   may be used in a future command, and a replay-recovered database may
   end-up with different data than the original.
  
  We number based on oids.  You mean oid wraparound could cause the file
  to be used again?
 
 That's not what I meant. Let's say I issue 'TRUNCATE TABLE foo'. Then,
 right before smgrtruncate is called, I do an XLogInsert saying "Redo a
 TRUNCATE TABLE on foo to nblocks if we crash". Then smgrtruncate fails
 and we do an elog(ERROR).
 
 Now the user decides that since TRUNCATE TABLE didn't work, he might as
 well use the table, so he inserts some records into it, generating log
 entries.
 
 When I replay this log sequence later, what happens if the TRUNCATE
 succeeds instead of failing?

You mean the user is now accessing a partially truncated table? That's
just too weird.  I don't see how the WAL would know how far truncation
had gone.  I see why you would need the panic and it seems acceptable.

 I admit that there are other ways of handling it than to PANIC if the
 truncate fails. All the ones I can come up with seem to amount to some
 kind of ad-hoc UNDO log.

Yea, truncate failure seems so rare/impossible to happen, we can do a
panic and see if it ever happens to anyone.  I bet it will not.  Those
are usually cases of an OS crash, so it is the same as a panic.

   WAL must be flushed before truncate as well.
   
   WAL does not need to be flushed before create, if we don't mind 
   leaking files sometimes.
  
  Cleanup?
 
 Yes, we could garbage-collect leaked files. XLogFlush is not that
 expensive though, so I don't have an opinion on this yet.

Right now, if we do CREATE TABLE, and the backend 

Re: [HACKERS] Issues Outstanding for Point In Time Recovery (PITR)

2002-07-04 Thread J. R. Nield

On Thu, 2002-07-04 at 11:45, J. R. Nield wrote:

One other item that should be here:
 The big items so-far are:
 §1 - Logging Relation file creation, truncation, and removal 
 This is mostly done. Can do infinite play-forward from
 online logs.
 §2 - Partial-Write and Bad Block detection
 Need input before starting. Migration issues.
 §3 - Detecting Shutdown Consistent System Recovery
 Mostly done.
 §4 - Interactive Play-Forward Recovery for an Entire System
 Need input before starting.
 §5 - Individual file consistent recovery
 Need input. Semi-Major changes required.
  §6 - btbuild is not logged
  Not logged because of same assumptions as for file create.
  Only need to log the index build parameters to recreate
  the index, not each page change.


-- 
J. R. Nield
[EMAIL PROTECTED]



