Re: [HACKERS] [PATCHES] Restartable Recovery

2006-08-09 Thread Simon Riggs
On Mon, 2006-08-07 at 13:05 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I've implemented this for BTree, GIN, GIST using an additional rmgr
  functionbool rm_safe_restartpoint(void)
  ...
  Recovery checkpoints are now renamed restartpoints to avoid
  confusion with checkpoints. So checkpoints occur during normal
  processing (only) and restartpoints occur during recovery (only).
 
 Applied with revisions.  

errCheckPointGuts() :-)  I guess patch reviews need some spicing up.

 As submitted the patch pushed backup_label out
 of the way immediately upon reading it, which is no good: you need to be
 sure that the starting checkpoint location is written to pg_control
 first, else an immediate crash would allow the thing to try to start
 from whatever checkpoint is listed in the backed-up pg_control.  Also,
 the minimum recovery stopping point that's obtained using the label file
 still has to be enforced if there's a crash during the replay sequence.
 I felt the best way to do that was to copy the minimum stopping point
 into pg_control, so that's what the code does now.

Thanks for checking that.

 Also, as I mentioned earlier, I think that doing restartpoints on the
 basis of elapsed time is simpler and more useful than having an explicit
 distinction between normal and standby modes.  We can always invent
 a standby_mode flag later if we need one, but we don't need it for this.

OK, agreed.

The original thinking was that writing a restartpoint was more crucial
when in standby mode; but this way we've better performance and have a
low ceiling on the restart time if that should ever occur at the worst
moment.

Thanks again to Marko for the concept.

I'll work on the docs for backup.sgml also.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] Restartable Recovery

2006-08-07 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 I've implemented this for BTree, GIN, GIST using an additional rmgr
 function  bool rm_safe_restartpoint(void)
 ...
 Recovery checkpoints are now renamed restartpoints to avoid
 confusion with checkpoints. So checkpoints occur during normal
 processing (only) and restartpoints occur during recovery (only).

Applied with revisions.  As submitted the patch pushed backup_label out
of the way immediately upon reading it, which is no good: you need to be
sure that the starting checkpoint location is written to pg_control
first, else an immediate crash would allow the thing to try to start
from whatever checkpoint is listed in the backed-up pg_control.  Also,
the minimum recovery stopping point that's obtained using the label file
still has to be enforced if there's a crash during the replay sequence.
I felt the best way to do that was to copy the minimum stopping point
into pg_control, so that's what the code does now.

Also, as I mentioned earlier, I think that doing restartpoints on the
basis of elapsed time is simpler and more useful than having an explicit
distinction between normal and standby modes.  We can always invent
a standby_mode flag later if we need one, but we don't need it for this.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Restartable Recovery

2006-07-31 Thread Bruce Momjian

Nice.  I was going to ask if this could make it into 8.2.

---

Simon Riggs wrote:
 On Sun, 2006-07-16 at 20:56 +0100, Simon Riggs wrote:
  On Sun, 2006-07-16 at 15:33 -0400, Tom Lane wrote:
   Simon Riggs [EMAIL PROTECTED] writes:
On Sun, 2006-07-16 at 12:40 -0400, Tom Lane wrote:
A compromise that might be good enough is to add an rmgr routine 
defined
as bool is_idle(void) that tests whether the rmgr has any open state
to worry about.  Then, recovery checkpoints are done only if all rmgrs
say they are idle.  
   
Perhaps that should be extended to say whether there are any
non-idempotent changes made in the last checkpoint period. That might
cover a wider set of potential actions.
   
   Perhaps best to call it safe_to_checkpoint(), and not pre-judge what
   reasons the rmgr might have for not wanting to restart here.
  
  You read my mind.
  
   If we are only going to do a recovery checkpoint at every Nth checkpoint
   record, then occasionally having to skip one seems no big problem ---
   just do it at the first subsequent record that is safe.
  
  Got it.
 
 I've implemented this for BTree, GIN, GIST using an additional rmgr
 function  bool rm_safe_restartpoint(void)
 
 The functions are actually trivial, assuming I've understood this and
 how GIST and GIN work for their xlogging.
 
 Recovery checkpoints are now renamed restartpoints to avoid
 confusion with checkpoints. So checkpoints occur during normal
 processing (only) and restartpoints occur during recovery (only).
 
 Updated patch enclosed, which I believe has no conflicts with the other
 patches on xlog.c just submitted.
 
 Much additional testing required, but the underlying concepts are very
 simple really. Andreas: any further gotchas? :-)
 
 -- 
   Simon Riggs
   EnterpriseDB  http://www.enterprisedb.com

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Restartable Recovery

2006-07-16 Thread Andreas Seltenreich
Simon Riggs [EMAIL PROTECTED] writes:

 [2. text/x-patch; restartableRecovery.patch]

Hmm, wouldn't you have to reboot the resource managers at each
checkpoint? I'm afraid otherwise things like postponed page splits
could get lost on restart from a later checkpoint.

regards,
andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Restartable Recovery

2006-07-16 Thread Tom Lane
Andreas Seltenreich [EMAIL PROTECTED] writes:
 Simon Riggs [EMAIL PROTECTED] writes:
 [2. text/x-patch; restartableRecovery.patch]

 Hmm, wouldn't you have to reboot the resource managers at each
 checkpoint? I'm afraid otherwise things like postponed page splits
 could get lost on restart from a later checkpoint.

Ouch.  That's a bit nasty.  You can't just apply a postponed split at
checkpoint time, because the WAL record could easily be somewhere after
the checkpoint, leading to duplicate insertions.  Right offhand I don't
see how to make this work :-(

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Restartable Recovery

2006-07-16 Thread Simon Riggs
On Sun, 2006-07-16 at 10:51 -0400, Tom Lane wrote:
 Andreas Seltenreich [EMAIL PROTECTED] writes:
  Simon Riggs [EMAIL PROTECTED] writes:
  [2. text/x-patch; restartableRecovery.patch]
 
  Hmm, wouldn't you have to reboot the resource managers at each
  checkpoint? I'm afraid otherwise things like postponed page splits
  could get lost on restart from a later checkpoint.
 
 Ouch.  That's a bit nasty.  You can't just apply a postponed split at
 checkpoint time, because the WAL record could easily be somewhere after
 the checkpoint, leading to duplicate insertions.  Right offhand I don't
 see how to make this work :-(

Yes, ouch. So much for gung-ho code sprints; thanks Andreas.

To do this we would need to have another rmgr specific routine that gets
called at a recovery checkpoint. This would then write to disk the
current state of the incomplete multi-WAL actions, in some manner.
During the startup routines we would check for any pre-existing state
files and use those to initialise the incomplete action cache. Cleanup
would then discard all state files. 

That allows us to not-forget actions, but it doesn't help us if there
are problems repeating actions twice. We would at least know that we are
in a potential double-action zone and could give different kinds of
errors or handling.

Or we can simply mark any indexes incomplete-needs-rebuild if they had a
page split during the overlap time between the last known good recovery
checkpoint and the following one. But that does lead to randomly bounded
recovery time, which might be better to have started from scratch
anyway.

Given time available for 8.2, neither one is a quick fix.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PATCHES] Restartable Recovery

2006-07-11 Thread Simon Riggs

On Marko Kreen's detailed suggestion, I've implemented a restartable
recovery mode for archive recovery (aka PITR). Restart points are known
as recovery checkpoints and are normally taken every 100 checkpoints in
the log to ensure good recovery performance.

An additional mode
standby_mode = 'true'
can also be specified, which ensures that a recovery checkpoint occurs
for each checkpoint in the logs.

Some other code refactorings, though all changes isolated to xlog.c and
to pg_control.h; code comments welcome.

Applies cleanly to cvstip, passes make check.

Further details testing is very desirable. I've tested restarting a
recovery twice and things work successfully. 

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.242
diff -c -r1.242 xlog.c
*** src/backend/access/transam/xlog.c	27 Jun 2006 18:59:17 -	1.242
--- src/backend/access/transam/xlog.c	11 Jul 2006 16:46:21 -
***
*** 124,129 
--- 124,130 
  
  /* File path names (all relative to $PGDATA) */
  #define BACKUP_LABEL_FILE		backup_label
+ #define BACKUP_LABEL_IN_USE	backup_label.in_use
  #define RECOVERY_COMMAND_FILE	recovery.conf
  #define RECOVERY_COMMAND_DONE	recovery.done
  
***
*** 183,188 
--- 184,192 
  static bool recoveryTargetInclusive = true;
  static TransactionId recoveryTargetXid;
  static time_t recoveryTargetTime;
+ static bool InStandby = false;
+ /* How many XLOG_CHECKPOINT* entries since last recovery checkpoint */
+ static int nCheckpoints = 0;
  
  /* if recoveryStopsHere returns true, it saves actual stop xid/time here */
  static TransactionId recoveryStopXid;
***
*** 496,501 
--- 500,506 
  	 uint32 endLogId, uint32 endLogSeg);
  static void WriteControlFile(void);
  static void ReadControlFile(void);
+ static void ValidateControlFile(void);
  static char *str_time(time_t tnow);
  static void issue_xlog_fsync(void);
  
***
*** 505,511 
  static bool read_backup_label(XLogRecPtr *checkPointLoc);
  static void remove_backup_label(void);
  static void rm_redo_error_callback(void *arg);
! 
  
  /*
   * Insert an XLOG record having the specified RMID and info bytes,
--- 510,516 
  static bool read_backup_label(XLogRecPtr *checkPointLoc);
  static void remove_backup_label(void);
  static void rm_redo_error_callback(void *arg);
! static void CheckPointShmem(XLogRecPtr checkPointRedo);
  
  /*
   * Insert an XLOG record having the specified RMID and info bytes,
***
*** 3626,3631 
--- 3631,3663 
  		ereport(FATAL,
  (errmsg(incorrect checksum in control file)));
  
+ ValidateControlFile();
+ 
+ 	if (pg_perm_setlocale(LC_COLLATE, ControlFile-lc_collate) == NULL)
+ 		ereport(FATAL,
+ 			(errmsg(database files are incompatible with operating system),
+ 			 errdetail(The database cluster was initialized with LC_COLLATE \%s\,
+ 	which is not recognized by setlocale().,
+ 	   ControlFile-lc_collate),
+ 			 errhint(It looks like you need to initdb or install locale support.)));
+ 	if (pg_perm_setlocale(LC_CTYPE, ControlFile-lc_ctype) == NULL)
+ 		ereport(FATAL,
+ 			(errmsg(database files are incompatible with operating system),
+ 		errdetail(The database cluster was initialized with LC_CTYPE \%s\,
+    which is not recognized by setlocale().,
+   ControlFile-lc_ctype),
+ 			 errhint(It looks like you need to initdb or install locale support.)));
+ 
+ 	/* Make the fixed locale settings visible as GUC variables, too */
+ 	SetConfigOption(lc_collate, ControlFile-lc_collate,
+ 	PGC_INTERNAL, PGC_S_OVERRIDE);
+ 	SetConfigOption(lc_ctype, ControlFile-lc_ctype,
+ 	PGC_INTERNAL, PGC_S_OVERRIDE);
+ }
+ 
+ static void
+ ValidateControlFile(void)
+ {
  	/*
  	 * Do compatibility checking immediately.  We do this here for 2 reasons:
  	 *
***
*** 3722,3747 
     but the server was compiled with LOCALE_NAME_BUFLEN %d.,
  		   ControlFile-localeBuflen, LOCALE_NAME_BUFLEN),
   errhint(It looks like you need to recompile or initdb.)));
- 	if (pg_perm_setlocale(LC_COLLATE, ControlFile-lc_collate) == NULL)
- 		ereport(FATAL,
- 			(errmsg(database files are incompatible with operating system),
- 			 errdetail(The database cluster was initialized with LC_COLLATE \%s\,
- 	which is not recognized by setlocale().,
- 	   ControlFile-lc_collate),
- 			 errhint(It looks like you need to initdb or install locale support.)));
- 	if (pg_perm_setlocale(LC_CTYPE, ControlFile-lc_ctype) == NULL)
- 		ereport(FATAL,
- 			(errmsg(database files are incompatible with operating system),
- 		errdetail(The database cluster was initialized with LC_CTYPE \%s\,
-    which is not recognized by setlocale().,
-   ControlFile-lc_ctype),
- 			 errhint(It looks