[PATCHES] Writing WAL for relcache invalidation: pg_internal.init

2006-10-27 Thread Simon Riggs
Enclosed is a patch adding new WAL records for relcache invalidation.

It is not conclusively tested, but it seems to work fine, so please
regard this as a prototype that needs review.

We definitely need a regression test framework to support this type of
patch.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/rmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/rmgr.c,v
retrieving revision 1.24
diff -c -r1.24 rmgr.c
*** src/backend/access/transam/rmgr.c	7 Aug 2006 16:57:56 -	1.24
--- src/backend/access/transam/rmgr.c	27 Oct 2006 20:36:09 -
***
*** 20,25 
--- 20,26 
  #include "commands/sequence.h"
  #include "commands/tablespace.h"
  #include "storage/smgr.h"
+ #include "utils/relcache.h"
  
  
  const RmgrData RmgrTable[RM_MAX_ID + 1] = {
***
*** 30,36 
  	{"Database", dbase_redo, dbase_desc, NULL, NULL, NULL},
  	{"Tablespace", tblspc_redo, tblspc_desc, NULL, NULL, NULL},
  	{"MultiXact", multixact_redo, multixact_desc, NULL, NULL, NULL},
! 	{"Reserved 7", NULL, NULL, NULL, NULL, NULL},
  	{"Reserved 8", NULL, NULL, NULL, NULL, NULL},
  	{"Reserved 9", NULL, NULL, NULL, NULL, NULL},
  	{"Heap", heap_redo, heap_desc, NULL, NULL, NULL},
--- 31,37 
  	{"Database", dbase_redo, dbase_desc, NULL, NULL, NULL},
  	{"Tablespace", tblspc_redo, tblspc_desc, NULL, NULL, NULL},
  	{"MultiXact", multixact_redo, multixact_desc, NULL, NULL, NULL},
! 	{"Cache", cache_redo, cache_desc, NULL, NULL, NULL},
  	{"Reserved 8", NULL, NULL, NULL, NULL, NULL},
  	{"Reserved 9", NULL, NULL, NULL, NULL, NULL},
  	{"Heap", heap_redo, heap_desc, NULL, NULL, NULL},
Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.252
diff -c -r1.252 xlog.c
*** src/backend/access/transam/xlog.c	18 Oct 2006 22:44:11 -	1.252
--- src/backend/access/transam/xlog.c	27 Oct 2006 20:36:19 -
***
*** 46,51 
--- 46,52 
  #include "utils/builtins.h"
  #include "utils/nabstime.h"
  #include "utils/pg_locale.h"
+ #include "utils/relcache.h"
  
  
  /*
Index: src/backend/utils/cache/relcache.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.249
diff -c -r1.249 relcache.c
*** src/backend/utils/cache/relcache.c	4 Oct 2006 00:30:00 -	1.249
--- src/backend/utils/cache/relcache.c	27 Oct 2006 20:36:24 -
***
*** 3568,3575 
  
  	if (beforeSend)
  	{
  		/* no interlock needed here */
! 		unlink(initfilename);
  	}
  	else
  	{
--- 3568,3599 
  
  	if (beforeSend)
  	{
+ 		/*
+ 		 * Make a non-transactional XLOG entry showing the file removal.
+ 		 * It's non-transactional because we should replay it whether the
+ 		 * transaction commits or not; the underlying file change is certainly
+ 		 * not reversible. We do this *once* because we don't need to process
+ 		 * invalidation messages during WAL replay, so even though the file
+ 		 * is unlinked twice during the normal invalidation sequence we need
+ 		 * only do this once during WAL replay and so need only write WAL once.
+ 		 */
+ 		XLogRecPtr	lsn;
+ 		XLogRecData rdata;
+ 		xl_cache_init_file_inval xlrec;
+ 
+ 		xlrec.dbOid = MyDatabaseId;
+ 		xlrec.dbTableSpaceOid = MyDatabaseTableSpace;
+ 
+ 		rdata.data = (char *) &xlrec;
+ 		rdata.len = sizeof(xlrec);
+ 		rdata.buffer = InvalidBuffer;
+ 		rdata.next = NULL;
+ 
+ 		lsn = XLogInsert(RM_CACHE_ID, XLOG_CACHE_INIT_FILE_INVAL | XLOG_NO_TRAN,
+ 		 &rdata);
+ 
  		/* no interlock needed here */
! 		unlink(initfilename);
  	}
  	else
  	{
***
*** 3582,3588 
  		 * that we will execute second and successfully unlink the file.
  		 */
  		LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);
! 		unlink(initfilename);
  		LWLockRelease(RelCacheInitLock);
  	}
  }
--- 3606,3651 
  		 * that we will execute second and successfully unlink the file.
  		 */
  		LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);
! 		unlink(initfilename);
  		LWLockRelease(RelCacheInitLock);
  	}
+
+ }
+ 
+ void
+ cache_redo(XLogRecPtr lsn, XLogRecord *record)
+ {
+ 	uint8		info = record->xl_info & ~XLR_INFO_MASK;
+ 
+ 	if (info == XLOG_CACHE_INIT_FILE_INVAL)
+ 	{
+ 		xl_cache_init_file_inval *xlrec = (xl_cache_init_file_inval *) XLogRecGetData(record);
+ 		char	   *dbpath = GetDatabasePath(xlrec->dbOid, xlrec->dbTableSpaceOid);
+ 		char		initfilename[MAXPGPATH];
+ 
+ 		snprintf(initfilename, sizeof(initfilename), "%s/%s",
+ 				 dbpath, RELCACHE_INIT_FILENAME);
+ 
+ 		unlink(initfilename);
+ 		pfree(dbpath);
+ 	}
+ 	else
+ 		elog(PANIC, "cache_redo: unknown op code %u", info);
+ }
+ 
+ void
+ cache_desc(StringInfo buf, uint8 xl_info, cha

Re: [HACKERS] [PATCHES] GUC description cleanup

2006-10-27 Thread Josh Berkus
Neil,

> Sure, I'll wait for 8.3 to branch.

I have some cleanup I want to do for 8.3 too.  


Josh Berkus
PostgreSQL @ Sun
San Francisco 415-752-2500

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-27 Thread Simon Riggs
On Fri, 2006-10-27 at 22:19 +0100, Simon Riggs wrote:

> So we definitely have a nasty problem here.
> 
> VACUUM FREEZE is just a loaded gun right now.
> 
> > Maybe it's OK to say that during WAL replay we keep it
> > all the way back to the freeze horizon, but I'm not sure how we keep the
> > system from wiping clog it still needs right after switching to normal
> > operation.  Maybe we should somehow not xlog updates of datvacuumxid?
> 
> Thinking...

Suggestions:

1. Create a new Utility rmgr that can issue XLOG_UTIL_FREEZE messages
for each block that has had any tuples frozen on it during normal
VACUUMs. We need log only the relid, blockid and vacuum's xid to redo
the freeze operation.

2. VACUUM FREEZE need not generate any additional WAL records, but will
do an immediate sync following execution and before clog truncation.
That way the large number of changed blocks will all reach disk before
we do the updates to the catalog.

3. We don't truncate the clog during WAL replay, so the clog will grow
during recovery. Nothing to do there to make things safe.

4. When InArchiveRecovery we should set all of the datminxid and
datvacuumxid fields to be the Xid from where recovery started, so that
clog is not truncated soon after recovery. Performing a VACUUM FREEZE
after a recovery would be mentioned as an optional task at the end of a
PITR recovery on a failover/second server.

5. At 3.5 billion records during recovery we should halt the replay, do
a full database scan to set hint bits, truncate clog, then restart
replay. (Automatically within the recovery process).

6. During WAL replay, put out a warning message every 1 billion rows
saying that a hint bit scan will eventually be required if recovery
continues.
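
As a rough illustration of suggestion 1, here is a minimal, self-contained
sketch of what such a freeze record and its redo step might look like. It
is not PostgreSQL code: xl_util_freeze, ToyPage and toy_freeze_redo are
hypothetical names, the page is reduced to an array of xmin values, and it
assumes the logged "vacuum's xid" acts as the freeze cutoff. The values 2
and 3 match PostgreSQL's FrozenTransactionId and FirstNormalTransactionId.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;				/* stand-in for TransactionId */

#define FROZEN_XID		 ((Xid) 2)	/* same value as FrozenTransactionId */
#define FIRST_NORMAL_XID ((Xid) 3)	/* same value as FirstNormalTransactionId */
#define TOY_MAX_TUPLES	 8

/* Hypothetical record body: relation, block, and VACUUM's cutoff xid. */
typedef struct
{
	uint32_t	relid;
	uint32_t	blockno;
	Xid			cutoff_xid;
} xl_util_freeze;

/* Hypothetical, heavily simplified page: only each tuple's xmin. */
typedef struct
{
	size_t		ntuples;
	Xid			xmin[TOY_MAX_TUPLES];
} ToyPage;

/* Wraparound-aware "a is older than b", valid for normal xids only. */
static bool
xid_precedes(Xid a, Xid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Redo step: freeze every tuple whose xmin is older than the logged
 * cutoff.  Returns the number of tuples frozen by this call.
 */
static int
toy_freeze_redo(ToyPage *page, const xl_util_freeze *rec)
{
	int			frozen = 0;

	assert(page->ntuples <= TOY_MAX_TUPLES);

	for (size_t i = 0; i < page->ntuples; i++)
	{
		Xid			x = page->xmin[i];

		/* Skip special xids, including tuples that are already frozen. */
		if (x >= FIRST_NORMAL_XID && xid_precedes(x, rec->cutoff_xid))
		{
			page->xmin[i] = FROZEN_XID;
			frozen++;
		}
	}
	return frozen;
}
```

Because already-frozen tuples are skipped, replaying the same record twice
leaves the page unchanged the second time, which is the property a redo
routine needs if recovery restarts partway through.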

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-27 Thread Simon Riggs
On Fri, 2006-10-27 at 12:01 -0400, Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> I think it's premature to start writing
> >> patches until we've decided how this really needs to work.
> 
> > Not logging hint-bit updates seems safe to me. As long as we have the 
> > clog, the hint-bit is just a hint. The problem with freezing is that 
> > after freezing tuples, the corresponding clog page can go away.
> 
> Actually clog can go away much sooner than that, at least in normal
> operation --- that's what datvacuumxid is for, to track where we can
> truncate clog.  

So we definitely have a nasty problem here.

VACUUM FREEZE is just a loaded gun right now.

> Maybe it's OK to say that during WAL replay we keep it
> all the way back to the freeze horizon, but I'm not sure how we keep the
> system from wiping clog it still needs right after switching to normal
> operation.  Maybe we should somehow not xlog updates of datvacuumxid?

Thinking...

Also, we should probably be setting all the hint bits for pages during
recovery then, so we don't need to re-write them again later.

> Another thing I'm concerned about is the scenario where a PITR
> hot-standby machine tracks a master over a period of more than 4 billion
> transactions.  I'm not sure what will happen in the slave's pg_clog
> directory, but I'm afraid it won't be good :-(

I think we'll need to error-out at that point, plus produce messages
when we pass 2 billion transactions recovered. It makes sense to produce
a new base backup regularly anyway.

We'll also need to produce an error message on the primary server so
that we take a new base backup every 2 billion transactions.
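
The thresholds described above could be checked along these lines. This is
only a sketch: check_replayed_xids and the ReplayStatus values are
hypothetical names, and it assumes "2 billion" and "4 billion" mean 2^31
and 2^32 transactions counted since the base backup in a 64-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of the check made as recovery proceeds. */
typedef enum
{
	REPLAY_OK,		/* keep replaying quietly */
	REPLAY_WARN,	/* keep replaying, but ask for a new base backup */
	REPLAY_ERROR	/* stop: the 32-bit xid space has been used up */
} ReplayStatus;

/* Assumed thresholds: 2^31 ("2 billion") and 2^32 ("4 billion"). */
#define WARN_XIDS	(UINT64_C(1) << 31)
#define ERROR_XIDS	(UINT64_C(1) << 32)

/*
 * Classify the number of transactions replayed since the base backup.
 * A 64-bit counter is used so that the count itself cannot wrap.
 */
static ReplayStatus
check_replayed_xids(uint64_t xids_since_base_backup)
{
	if (xids_since_base_backup >= ERROR_XIDS)
		return REPLAY_ERROR;
	if (xids_since_base_backup >= WARN_XIDS)
		return REPLAY_WARN;
	return REPLAY_OK;
}
```

The same check could run on the primary to prompt for a new base backup
before a standby restored from the old one would reach the error threshold.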

There are better solutions, but I'm not sure it makes sense to attempt
them right now, since that could well delay the release. If we think a
fix is necessary for the 8.2 line, then we could get a better one into
8.2.1.

[I've just coded the relcache invalidation WAL logging patch also.]

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-27 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> [I've just coded the relcache invalidation WAL logging patch also.]

What?  That doesn't make any sense to me.

regards, tom lane



Re: [PATCHES] WAL logging freezing

2006-10-27 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I think it's premature to start writing
>> patches until we've decided how this really needs to work.

> Not logging hint-bit updates seems safe to me. As long as we have the 
> clog, the hint-bit is just a hint. The problem with freezing is that 
> after freezing tuples, the corresponding clog page can go away.

Actually clog can go away much sooner than that, at least in normal
operation --- that's what datvacuumxid is for, to track where we can
truncate clog.  Maybe it's OK to say that during WAL replay we keep it
all the way back to the freeze horizon, but I'm not sure how we keep the
system from wiping clog it still needs right after switching to normal
operation.  Maybe we should somehow not xlog updates of datvacuumxid?

Another thing I'm concerned about is the scenario where a PITR
hot-standby machine tracks a master over a period of more than 4 billion
transactions.  I'm not sure what will happen in the slave's pg_clog
directory, but I'm afraid it won't be good :-(

regards, tom lane



Re: [PATCHES] WAL logging freezing

2006-10-27 Thread Heikki Linnakangas

Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> I would've liked to give freezing a new opcode,
>> but we've run out of them (see htup.h).
>
> Hardly ... we have plenty of unused rmgr id's still.

Good point.

> The real issue that still has to be resolved is the interaction of all
> this stuff with PITR scenarios --- is it still safe to not log hint-bit
> updates when PITR is on?  I think it's premature to start writing
> patches until we've decided how this really needs to work.

Not logging hint-bit updates seems safe to me. As long as we have the
clog, the hint-bit is just a hint. The problem with freezing is that
after freezing tuples, the corresponding clog page can go away.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] GUC description cleanup

2006-10-27 Thread Neil Conway
On Fri, 2006-10-27 at 15:59 +0200, Peter Eisentraut wrote:
> I appreciate this effort, but I think it's better to hold the patch.

Sure, I'll wait for 8.3 to branch.

-Neil





Re: [PATCHES] GUC description cleanup

2006-10-27 Thread Peter Eisentraut
On Thursday, 26 October 2006 19:47, Neil Conway wrote:
> Note that this patch breaks the translations of these strings, so I
> haven't applied it yet. Should I apply it now, or wait for 8.3 to
> branch?

I appreciate this effort, but I think it's better to hold the patch.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
