Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-15 Thread Joachim Wieland
On Tue, Dec 7, 2010 at 3:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, I've committed this patch now.

I can confirm that I could continue replaying the logfiles on the
standby host with this patch.


Thanks a lot,
Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-07 Thread Heikki Linnakangas

On 02.12.2010 12:31, Heikki Linnakangas wrote:

On 02.12.2010 13:25, Simon Riggs wrote:

On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the
running-xacts snapshot and writing it to the log is sufficient, I don't
see what moving the running-xacts record buys us. Does it allow some
further simplifications somewhere?


Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.


Most of the changes to procarray.c were about removing code that's no
longer necessary when we close the gap between acquiring and writing the
running-xacts WAL record. You can leave it as it is as a historical
curiosity, but I'd prefer to simplify it, given that we now know that it
doesn't actually work correctly if the gap is not closed.


Ok, I've committed this patch now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 01.12.2010 20:51, Heikki Linnakangas wrote:

Another approach would be to revisit the way the running-xacts snapshot
is taken. Currently, we first take a snapshot, and then WAL-log it.
There is a small window between the steps where backends can begin/end
transactions, and recovery has to deal with that. When this was
designed, there was long discussion on whether we should instead grab
WALInsertLock and ProcArrayLock at the same time, to ensure that the
running-xacts snapshot represents an up-to-date situation at the point
in WAL where it's inserted.

We didn't want to do that because both locks can be heavily contended.
But maybe we should after all. It would make the recovery code simpler.

If we want to get fancy, we wouldn't necessarily need to hold both locks
for the whole duration. We could first grab ProcArrayLock and construct
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
finally write the WAL record and release WALInsertLock. But that would
require small changes to XLogInsert.


I took a look at that approach. We don't actually need to hold 
ProcArrayLock while the WAL-record is written, we need to hold 
XidGenLock. I believe that's less severe than holding the ProcArrayLock 
as there's already precedence for writing a WAL record while holding 
that: we do that when we advance to a new clog page and write a 
zero-clog-page record.


So this is what we should do IMHO.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6095,6102  StartupXLOG(void)
  			StartupSUBTRANS(oldestActiveXID);
  			StartupMultiXact();
  
- 			ProcArrayInitRecoveryInfo(oldestActiveXID);
- 
  			/*
  			 * If we're beginning at a shutdown checkpoint, we know that
  			 * nothing was running on the master at this point. So fake-up an
--- 6095,6100 
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 435,453  ProcArrayClearTransaction(PGPROC *proc)
  }
  
  /*
-  * ProcArrayInitRecoveryInfo
-  *
-  * When trying to assemble our snapshot we only care about xids after this value.
-  * See comments for LogStandbySnapshot().
-  */
- void
- ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
- {
- 	latestObservedXid = oldestActiveXid;
- 	TransactionIdRetreat(latestObservedXid);
- }
- 
- /*
   * ProcArrayApplyRecoveryInfo -- apply recovery info about xids
   *
   * Takes us through 3 states: Initialized, Pending and Ready.
--- 435,440 
***
*** 519,533  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * OK, we need to initialise from the RunningTransactionsData record
! 	 */
! 
! 	/*
! 	 * Remove all xids except xids later than the snapshot. We don't know
! 	 * exactly which ones that is until precisely now, so that is why we allow
! 	 * xids to be added only to remove most of them again here.
  	 */
- 	ExpireOldKnownAssignedTransactionIds(running-nextXid);
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
--- 506,514 
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * Release any locks belonging to old transactions that are not
! 	 * running according to the running-xacts record.
  	 */
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
***
*** 536,544  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
- 	 * Combine the running xact data with already known xids, if any exist.
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * combine them first, sort them and then re-add to KnownAssignedXids.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
--- 517,524 
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * sort them first.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
***
*** 547,597  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	 * xids to subtrans. If RunningXacts is overflowed then we don't have
  	 * enough information to correctly update subtrans anyway.
  	 */
  
  	/*
! 	 * Allocate a temporary array so we can combine xids. The total of both
! 	 * arrays should never normally exceed TOTAL_MAX_CACHED_SUBXIDS.
  	 */
! 	xids = palloc(sizeof(TransactionId) * TOTAL_MAX_CACHED_SUBXIDS);
  
  	/*
! 	 * Get the remaining KnownAssignedXids. In most cases there won't be any
! 	 * at all since this exists only to catch a theoretical race condition.
! 	 */
! 	nxids = KnownAssignedXidsGet(xids, InvalidTransactionId);
! 	if (nxids  0)
! 		

Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Simon Riggs
On Thu, 2010-12-02 at 10:39 +0200, Heikki Linnakangas wrote:
 On 01.12.2010 20:51, Heikki Linnakangas wrote:
  Another approach would be to revisit the way the running-xacts snapshot
  is taken. Currently, we first take a snapshot, and then WAL-log it.
  There is a small window between the steps where backends can begin/end
  transactions, and recovery has to deal with that. When this was
  designed, there was long discussion on whether we should instead grab
  WALInsertLock and ProcArrayLock at the same time, to ensure that the
  running-xacts snapshot represents an up-to-date situation at the point
  in WAL where it's inserted.
 
  We didn't want to do that because both locks can be heavily contended.
  But maybe we should after all. It would make the recovery code simpler.
 
  If we want to get fancy, we wouldn't necessarily need to hold both locks
  for the whole duration. We could first grab ProcArrayLock and construct
  the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
  finally write the WAL record and release WALInsertLock. But that would
  require small changes to XLogInsert.
 
 I took a look at that approach. We don't actually need to hold 
 ProcArrayLock while the WAL-record is written, we need to hold 
 XidGenLock. I believe that's less severe than holding the ProcArrayLock 
 as there's already precedence for writing a WAL record while holding 
 that: we do that when we advance to a new clog page and write a 
 zero-clog-page record.
 
 So this is what we should do IMHO.

Oh, thanks for looking at this. I've been looking at this also and as we
might expect had a slightly different design.

First, your assessment of the locking above is better than mine. I agree
with your analysis so we should do it that way. The locking issue was
the reason I haven't patched this yet so I'm glad you've improved this.

In terms of the rest of the patch, it seems we have different designs, I
think I have a much simpler, less invasive solution:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the 
running-xacts snapshot and writing it to the log is sufficient, I don't 
see what moving the running-xacts record buys us. Does it allow some 
further simplifications somewhere?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Simon Riggs
On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
 On 02.12.2010 11:02, Simon Riggs wrote:
  The cause of the issue is that replay starts at one LSN and there is a
  delay until the RunningXacts WAL record occurs. If there was no delay,
  there would be no issue at all. In CreateCheckpoint() we start by
  grabbing the WAInsertLock and later recording that pointer as part of
  the checkpoint record. My proposal is to replace the grab the lock
  code with the insert of the RunningXacts WAL record (when wal_level
  set), so that recovery always starts with that record type.
 
 Oh, interesting idea. But AFAICS closing the gap between acquiring the 
 running-xacts snapshot and writing it to the log is sufficient, I don't 
 see what moving the running-xacts record buys us. Does it allow some 
 further simplifications somewhere?

Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.

Earlier on this thread, we discussed:

On Wed, 2010-11-24 at 15:19 +, Simon Riggs wrote: 
 On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
  When recovery starts, we fetch the oldestActiveXid from the checkpoint
  record. Let's say that it's 100. We then start replaying WAL records 
  from the Redo pointer, and the first record (heap insert in your case)
  contains an Xid that's much larger than 100, say 1. We call 
  RecordKnownAssignedXids() to make note that all xids between that
  range are in-progress, but there isn't enough room in the array for
  that.
 
 Agreed.

The current code fails because of the gap between the redo pointer and
the XLOG_RUNNING_XACTS WAL record. If there is no gap, there is no
problem.

So my preferred solution would:
* Log XLOG_RUNNING_XACTS while holding XidGenLock, as you suggest
* Move logging to occur at the Redo pointer

That is a much smaller patch with a smaller footprint.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 13:25, Simon Riggs wrote:

On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the
running-xacts snapshot and writing it to the log is sufficient, I don't
see what moving the running-xacts record buys us. Does it allow some
further simplifications somewhere?


Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.


Most of the changes to procarray.c were about removing code that's no 
longer necessary when we close the gap between acquiring and writing the 
running-xacts WAL record. You can leave it as it is as a historical 
curiosity, but I'd prefer to simplify it, given that we now know that it 
doesn't actually work correctly if the gap is not closed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-01 Thread Heikki Linnakangas

On 24.11.2010 12:48, Heikki Linnakangas wrote:

When recovery starts, we fetch the oldestActiveXid from the checkpoint
record. Let's say that it's 100. We then start replaying WAL records
from the Redo pointer, and the first record (heap insert in your case)
contains an Xid that's much larger than 100, say 1. We call
RecordKnownAssignedXids() to make note that all xids between that range
are in-progress, but there isn't enough room in the array for that.

We normally get away with a smallish array because the array is trimmed
at commit and abort records, and the special xid-assignment record to
handle the case of a lot of subtransactions. We initialize the array
from the running-xacts record that's written at a checkpoint. That
mechanism fails in this case because the heap insert record is seen
before the running-xacts record, causing all those xids in the range
100-1 to be considered in-progress. The running-xacts record that
comes later would prune them, but we don't have enough slots to hold
them until that.

Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
the xids we see during WAL replay in private memory instead of putting
them in the KnownAssignedXids array until we see the running-xacts record.


So, here's a patch using that approach.

Another approach would be to revisit the way the running-xacts snapshot 
is taken. Currently, we first take a snapshot, and then WAL-log it. 
There is a small window between the steps where backends can begin/end 
transactions, and recovery has to deal with that. When this was 
designed, there was long discussion on whether we should instead grab 
WALInsertLock and ProcArrayLock at the same time, to ensure that the 
running-xacts snapshot represents an up-to-date situation at the point 
in WAL where it's inserted.


We didn't want to do that because both locks can be heavily contended. 
But maybe we should after all. It would make the recovery code simpler.


If we want to get fancy, we wouldn't necessarily need to hold both locks 
for the whole duration. We could first grab ProcArrayLock and construct 
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and 
finally write the WAL record and release WALInsertLock. But that would 
require small changes to XLogInsert.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6095,6102  StartupXLOG(void)
  			StartupSUBTRANS(oldestActiveXID);
  			StartupMultiXact();
  
- 			ProcArrayInitRecoveryInfo(oldestActiveXID);
- 
  			/*
  			 * If we're beginning at a shutdown checkpoint, we know that
  			 * nothing was running on the master at this point. So fake-up an
--- 6095,6100 
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 435,453  ProcArrayClearTransaction(PGPROC *proc)
  }
  
  /*
-  * ProcArrayInitRecoveryInfo
-  *
-  * When trying to assemble our snapshot we only care about xids after this value.
-  * See comments for LogStandbySnapshot().
-  */
- void
- ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
- {
- 	latestObservedXid = oldestActiveXid;
- 	TransactionIdRetreat(latestObservedXid);
- }
- 
- /*
   * ProcArrayApplyRecoveryInfo -- apply recovery info about xids
   *
   * Takes us through 3 states: Initialized, Pending and Ready.
--- 435,440 
***
*** 519,533  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * OK, we need to initialise from the RunningTransactionsData record
! 	 */
! 
! 	/*
! 	 * Remove all xids except xids later than the snapshot. We don't know
! 	 * exactly which ones that is until precisely now, so that is why we allow
! 	 * xids to be added only to remove most of them again here.
  	 */
- 	ExpireOldKnownAssignedTransactionIds(running-nextXid);
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
--- 506,514 
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * Release any locks belonging to old transactions that are not
! 	 * running according to the running-xacts record.
  	 */
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
***
*** 536,544  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
- 	 * Combine the running xact data with already known xids, if any exist.
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * combine them first, sort them and then re-add to KnownAssignedXids.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
--- 517,524 
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * sort them first.
  	 *
  	 * Some of the new xids are 

Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Heikki Linnakangas

On 24.11.2010 06:56, Joachim Wieland wrote:

On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 19.11.2010 23:46, Joachim Wieland wrote:


FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with
wal_debug=on ? That will print all the replayed WAL records, which is a lot
of data, but it might give a hint what's going on.


Sure, but this gives me only one more line:

[...]
LOG:  redo starts at 1F8/FC00E978
LOG:  REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 4587) exited with exit code 1
LOG:  terminating any other active server processes


Thanks, I can reproduce this now. This happens when you have a wide gap 
between the oldest still active xid and the latest xid.


When recovery starts, we fetch the oldestActiveXid from the checkpoint 
record. Let's say that it's 100. We then start replaying WAL records 
from the Redo pointer, and the first record (heap insert in your case) 
contains an Xid that's much larger than 100, say 1. We call 
RecordKnownAssignedXids() to make note that all xids between that range 
are in-progress, but there isn't enough room in the array for that.


We normally get away with a smallish array because the array is trimmed 
at commit and abort records, and the special xid-assignment record to 
handle the case of a lot of subtransactions. We initialize the array 
from the running-xacts record that's written at a checkpoint. That 
mechanism fails in this case because the heap insert record is seen 
before the running-xacts record, causing all those xids in the range 
100-1 to be considered in-progress. The running-xacts record that 
comes later would prune them, but we don't have enough slots to hold 
them until that.


Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash 
the xids we see during WAL replay in private memory instead of putting 
them in the KnownAssignedXids array until we see the running-xacts record.


To reproduce this, I did this in the master:

postgres=# CREATE FUNCTION insertfunc(n integer) RETURNS VOID AS $$
declare
  i integer;
begin
  FOR i IN 1..n LOOP
BEGIN
  INSERT INTO foo VALUES (1);
EXCEPTION WHEN division_by_zero THEN RAISE NOTICE 'divbyzero';
END;
  END LOOP;
end;
$$ LANGUAGE plpgsql;
postgres=# SELECT insertfunc(1);

After letting that run for a while, so that a couple of checkpoints have 
occurred, kill the master and start standby to recover that from 
archive. After it has replayed all the WAL, stop the standby and restart it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Heikki Linnakangas

On 24.11.2010 12:48, Heikki Linnakangas wrote:

On 24.11.2010 06:56, Joachim Wieland wrote:

On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

On 19.11.2010 23:46, Joachim Wieland wrote:


FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with
wal_debug=on ? That will print all the replayed WAL records, which is
a lot
of data, but it might give a hint what's going on.


Sure, but this gives me only one more line:

[...]
LOG: redo starts at 1F8/FC00E978
LOG: REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG: startup process (PID 4587) exited with exit code 1
LOG: terminating any other active server processes


Thanks, I can reproduce this now. This happens when you have a wide gap
between the oldest still active xid and the latest xid.

When recovery starts, we fetch the oldestActiveXid from the checkpoint
record. Let's say that it's 100. We then start replaying WAL records
from the Redo pointer, and the first record (heap insert in your case)
contains an Xid that's much larger than 100, say 1. We call
RecordKnownAssignedXids() to make note that all xids between that range
are in-progress, but there isn't enough room in the array for that.

We normally get away with a smallish array because the array is trimmed
at commit and abort records, and the special xid-assignment record to
handle the case of a lot of subtransactions. We initialize the array
from the running-xacts record that's written at a checkpoint. That
mechanism fails in this case because the heap insert record is seen
before the running-xacts record, causing all those xids in the range
100-1 to be considered in-progress. The running-xacts record that
comes later would prune them, but we don't have enough slots to hold
them until that.

Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
the xids we see during WAL replay in private memory instead of putting
them in the KnownAssignedXids array until we see the running-xacts record.


Looking closer at RecordKnownAssignedTransactionIds(), there's a related 
much more serious bug there too. When latestObservedXid is initialized 
to the oldest still-running xid, oldestActiveXid, at WAL recovery, we 
zero the CLOG starting from the oldestActiveXid. That will zap away the 
clog bits of any old transactions that had already committed before the 
checkpoint started, but were younger than the oldest still running 
transaction. The transactions will be lost :-(.


It's dangerous to initialize latestObservedXid to anything to an older 
value. The idea of keeping the seen xids in a temporary list private to 
the startup process until the running-xacts record would solve that 
problem too. ProcArrayInitRecoveryInfo() would not be needed anymore, 
the KnownAssignedXids tracking would start at the first running-xacts 
record (or shutdown checkpoint) we see, not any sooner than that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Heikki Linnakangas

On 24.11.2010 13:38, Heikki Linnakangas wrote:

It's dangerous to initialize latestObservedXid to anything to an older
value.


older value than the nextXid-1 from the checkpoint record, I meant to say.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Simon Riggs
On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
 When recovery starts, we fetch the oldestActiveXid from the checkpoint
 record. Let's say that it's 100. We then start replaying WAL records 
 from the Redo pointer, and the first record (heap insert in your case)
 contains an Xid that's much larger than 100, say 1. We call 
 RecordKnownAssignedXids() to make note that all xids between that
 range are in-progress, but there isn't enough room in the array for
 that.

Agreed.

 Hmm. I'm not sure off the top of my head how to fix that. Perhaps
stash 
 the xids we see during WAL replay in private memory instead of
 putting 
 them in the KnownAssignedXids array until we see the running-xacts
 record.

Moving LogStandbySnapshot() earlier will help but won't solve it fully.

Will think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-23 Thread Joachim Wieland
On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 19.11.2010 23:46, Joachim Wieland wrote:

 FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
 pArray-maxKnownAssignedXids: 6890

 Hmm, that's a lot of entries in KnownAssignedXids.

 Can you recompile with WAL_DEBUG, and run the recovery again with
 wal_debug=on ? That will print all the replayed WAL records, which is a lot
 of data, but it might give a hint what's going on.

Sure, but this gives me only one more line:

[...]
LOG:  redo starts at 1F8/FC00E978
LOG:  REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 4587) exited with exit code 1
LOG:  terminating any other active server processes


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-22 Thread Joachim Wieland
On Sun, Nov 21, 2010 at 11:48 PM, Fujii Masao masao.fu...@gmail.com wrote:
 --
 If you suspect a bug in Hot Standby, please set
        trace_recovery_messages = DEBUG2
 in postgresql.conf and repeat the action

 Always useful to know
 * max_connections
 * current number of sessions
 * whether we have two phase commits happening
 --

The trace_recovery_messages parameter does not give more output...

max_connections is set to 100

there have been no sessions on the standby itself, but a few on the
primary database, I don't know how much but probably not more than 10.
The sessions there were doing quite some load however, among them
slony synchronization, since the hot standby master database was
actually a slony replica.

max_prepared_transactions has not been changed from its default value of 0.


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-22 Thread Heikki Linnakangas

On 19.11.2010 23:46, Joachim Wieland wrote:

FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with 
wal_debug=on ? That will print all the replayed WAL records, which is a 
lot of data, but it might give a hint what's going on.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-21 Thread Fujii Masao
On Sat, Nov 20, 2010 at 6:46 AM, Joachim Wieland j...@mcknight.de wrote:
 I still have the server, if you want me to debug anything or send a
 patch against 9.0.1 that gives more output, just let me know.

Per previous Simon's comment, the following information would be useful.
http://archives.postgresql.org/pgsql-general/2010-10/msg00154.php

--
If you suspect a bug in Hot Standby, please set
trace_recovery_messages = DEBUG2
in postgresql.conf and repeat the action

Always useful to know
* max_connections
* current number of sessions
* whether we have two phase commits happening
--

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-19 Thread Joachim Wieland
Hi,

I am seeing the following here on 9.0.1 on Linux x86-64:

LOG:  redo starts at 1F8/FC00E978
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23


and this is the complete history:

postgres was running as HS in foreground, Ctrl-C'ed it for a restart.

LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating walreceiver process due to administrator command
FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down


Started it up again:

$ postgres -D /db/
LOG:  database system was shut down in recovery at 2010-11-19 14:36:30 EST
LOG:  entering standby mode
cp: cannot stat `/archive/000101F90001': No such file or directory
cp: cannot stat `/archive/000101F800FC': No such file or directory
LOG:  redo starts at 1F8/FC00E978
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 30052) exited with exit code 1
LOG:  terminating any other active server processes


(copied the log files over...)


./postgres -D /db/

LOG:  database system was interrupted while in recovery at log time
2010-11-19 14:36:12 EST
HINT:  If this has occurred more than once some data might be
corrupted and you might need to choose an earlier recovery target.
LOG:  entering standby mode
LOG:  restored log file 000101F90001 from archive
LOG:  restored log file 000101F800FC from archive
LOG:  redo starts at 1F8/FC00E978
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 31581) exited with exit code 1
LOG:  terminating any other active server processes


Changing the line in the source code to give some more output gives me:

FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


I still have the server, if you want me to debug anything or send a
patch against 9.0.1 that gives more output, just let me know.


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers