Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-12-02 Thread Koichi Suzuki
Hi,

I found that no one is registered as hot standby reviewer.   I'd like
to review the patch, mainly by testing through some benchmark test.

Regards;

2008/11/2 Simon Riggs [EMAIL PROTECTED]:
 Hot Standby patch, including all major planned features.

 Allows users to connect to server in archive recovery and run queries.

 Applies cleanly, passes make check.

 There's no flaky code, kludges or other last minute rush-for-deadline
 stuff. It all works, though really for a patch this size and scope I
 expect many bugs. As a result I am describing this as WIP, though it
 is much more than a prototype. All the code has been planned out in
 advance, so there's been very little on-the-fly changes required during
 final development. I'm fully committed to making all required changes
 and fixes in reasonable times. I will continue detailed testing over the
 next few weeks to re-check everything prior to commit.

 Initially, I would ask people to spend time thinking about this
 conceptually to check that I have all the correct subsystems and cater
 for all the little side tweaks that exist in various parts of the server

 When you test this, please do it on a server built with --enable-cassert
 and keep a detailed log using trace_recovery_messages = DEBUG4 (or 2-3).

 Code has been specifically designed to be performance neutral or better
 for normal workloads, so the WAL volume, number of times we take WAL
 locks etc should be identical (on 64-bit systems). The patch includes
 further tuning of subtransaction commits, so I hope the patch may even
 be a few % win on more complex workloads with many PL/pgSQL functions
 using EXCEPTION. Enabling the bgwriter during recovery seems likely to
 be a huge gain on performance for larger shared_buffers settings, which
 should offset a considerable increase in CPU usage during recovery.

 Performance test results would be appreciated
 * for normal running - just to test that it really is neutral
 * for recovery - if query access not used is it faster than before

 Points of significant note for detailed reviewers

 * Prepared transactions not implemented yet. No problems foreseen, but
 want to wait to see if other refactorings are required.

 * Touching every block of a btree index during a replay of VACUUM is not
 yet implemented. Would like some second opinions that it is even
 required. I have code prototyped for it, but it feels too wacky even
 after Heikki and I agreed it was required. Do we still think that?

 * locking correctness around flat file refresh still not enabled

 * need some discussiona round how to handle max_connections changes
 cleanly.

-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-12-02 Thread Pavan Deolasee
On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote:

 Hi,

 I found that no one is registered as hot standby reviewer.   I'd like
 to review the patch, mainly by testing through some benchmark test.


You can yourself edit the Wiki page, though you need to register first. But
its very straight forward.

I added myself as reviewer to Hot standby few days back since I did some
code review and testing. I intend to spend some more time after new patch is
posted.

Thanks,
Pavan

Pavan Deolasee.
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-12-02 Thread Koichi Suzuki
Thank you for your advise.   I'll edit the Wiki page.

2008/12/3 Pavan Deolasee [EMAIL PROTECTED]:


 On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote:

 Hi,

 I found that no one is registered as hot standby reviewer.   I'd like
 to review the patch, mainly by testing through some benchmark test.


 You can yourself edit the Wiki page, though you need to register first. But
 its very straight forward.

 I added myself as reviewer to Hot standby few days back since I did some
 code review and testing. I intend to spend some more time after new patch is
 posted.

 Thanks,
 Pavan

 Pavan Deolasee.
 EnterpriseDB http://www.enterprisedb.com




-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-21 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 8:15 PM, Pavan Deolasee [EMAIL PROTECTED]wrote:



 On Thu, Nov 20, 2008 at 7:50 PM, Simon Riggs [EMAIL PROTECTED]wrote:




 (I assume you mean bgwriter, not archiver process).


 Yeah, its the bgwriter, IIRC hung while taking checkpoint.



Sorry, its the startup process thats stuck in the checkpoint. Here is the
stack trace:

(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x0095564b in semop () from /lib/libc.so.6
#2  0x0825c703 in PGSemaphoreLock (sema=0xb7c52c7c, interruptOK=0 '\0') at
pg_sema.c:420
#3  0x0829ff5e in LWLockAcquire (lockid=WALInsertLock, mode=LW_EXCLUSIVE) at
lwlock.c:456
#4  0x080d5c7e in XLogInsert (rmid=0 '\0', info=16 '\020', rdata=0xbfda1798)
at xlog.c:746
#5  0x080e2e0f in CreateCheckPoint (flags=6) at xlog.c:6674
#6  0x080e1afd in StartupXLOG () at xlog.c:6077
#7  0x08104f2f in AuxiliaryProcessMain (argc=2, argv=0xbfda19e4) at
bootstrap.c:421
#8  0x0826d285 in StartChildProcess (type=StartupProcess) at
postmaster.c:4104
#9  0x082690d9 in PostmasterMain (argc=3, argv=0x9c89a60) at
postmaster.c:1034
#10 0x081f90ff in main (argc=3, argv=0x9c89a60) at main.c:188


ISTM that the postmaster somehow does not receive (may be because of what I
reported in my other mail) PMSIGNAL_RECOVERY_START and hence bgwriter is not
started. The startup process then itself tries to take a checkpoint and
hangs.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-21 Thread Simon Riggs

On Fri, 2008-11-21 at 17:08 +0530, Pavan Deolasee wrote:

 Sorry, its the startup process thats stuck in the checkpoint. Here is
 the stack trace:

Already fixed in new version I'm preparing for you.

Both the startup process and bgwriter can perform restartpoints, so its
not a problem whether we reached the point at which we pmsignal or not.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Simon Riggs

On Thu, 2008-11-20 at 11:51 +0530, Pavan Deolasee wrote:

 I wonder if we should refactor lazy_scan_heap() so that *all* the real
 work of collecting information about dead tuples happens only in
 heap_page_prune(). Frankly, there is only a rare chance that a tuple
 may become DEAD after the pruning happened on the page. We can ignore
 such tuples; they will be vacuumed/pruned in the next cycle.
 
 This would save us a second check of HeapTupleSatisfiesVacuum on the
 tuples which are just now checked in heap_page_prune(). In addition,
 the following additional WAL records are then not necessary because
 heap_page_prune() must have already logged the latestRemovedXid.

I like this idea. I've attempted to plug every gap, but perhaps the best
way here is to remove the gap completely.

In my testing, I only saw this case happen a couple of times in many
tests. Rarely executed code gives sporadic bugs, so I would be happy to
remove it and the standby support stuff that goes with it.

I would suggest that we just remove the switch statement:
switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
and alter the following if test since tupgone is also removed.
That will cause HEAPTUPLE_DEAD rows to be fed to heap_freeze_tuple().
Comments on that function claim that is a bad thing, but we know that
any row that was *not* removed by heap_page_prune() and is now dead must
have died very recently and so will never be frozen.

Let me know if you're happy with that change and I'll make it so.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 3:38 PM, Simon Riggs [EMAIL PROTECTED] wrote:


 I would suggest that we just remove the switch statement:
switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
 and alter the following if test since tupgone is also removed.
 That will cause HEAPTUPLE_DEAD rows to be fed to heap_freeze_tuple().
 Comments on that function claim that is a bad thing, but we know that
 any row that was *not* removed by heap_page_prune() and is now dead must
 have died very recently and so will never be frozen.

 Let me know if you're happy with that change and I'll make it so.



Yeah, I think we should be safe. We continuously hold EX lock on the buffer
since the prune operation is carried out. So the only new DEAD tuples may
arrive because some transaction aborted in between, changing
INSERT_IN_PROGRESS tuple to DEAD. But these tuples won't pass the cutoff_xid
test and should never be frozen.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Simon Riggs

On Thu, 2008-11-20 at 12:03 +0530, Pavan Deolasee wrote:

 On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED]
 wrote:
 Hot Standby patch, including all major planned features.
 
 
 While experimenting with the patch, I noticed that sometimes the
 archiver process indefinitely waits for WALInsertLock. I haven't spent
 much time debugging that, but the following chunk clearly seems to be
 buggy. The WALInsertLock is not released if (leavingArchiveRecovery ==
 true).

Mmmm, it seems this is correct. I had to reconstruct this section of
code after recent bitrot, so it looks I introduced a bug doing that.
What I'm surprised about is that I got a similar hang myself in testing
and in my notes I have this ticked as resolved.

The fix is trivial, though I'm sorry it was there for you to find at
all.

(I assume you mean bgwriter, not archiver process).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 7:50 PM, Simon Riggs [EMAIL PROTECTED] wrote:




 (I assume you mean bgwriter, not archiver process).


Yeah, its the bgwriter, IIRC hung while taking checkpoint.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 I wonder if we should refactor lazy_scan_heap() so that *all* the real work
 of collecting information about dead tuples happens only in
 heap_page_prune(). Frankly, there is only a rare chance that a tuple may
 become DEAD after the pruning happened on the page. We can ignore such
 tuples; they will be vacuumed/pruned in the next cycle.

 This would save us a second check of HeapTupleSatisfiesVacuum on the tuples
 which are just now checked in heap_page_prune(). In addition, the following
 additional WAL records are then not necessary because heap_page_prune() must
 have already logged the latestRemovedXid.

I don't think you can do that.  Couldn't someone else have run
heap_page_prune between vacuum's first and second visit to the page?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Simon Riggs

On Thu, 2008-11-20 at 10:33 -0500, Tom Lane wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
  I wonder if we should refactor lazy_scan_heap() so that *all* the real work
  of collecting information about dead tuples happens only in
  heap_page_prune(). Frankly, there is only a rare chance that a tuple may
  become DEAD after the pruning happened on the page. We can ignore such
  tuples; they will be vacuumed/pruned in the next cycle.
 
  This would save us a second check of HeapTupleSatisfiesVacuum on the tuples
  which are just now checked in heap_page_prune(). In addition, the following
  additional WAL records are then not necessary because heap_page_prune() must
  have already logged the latestRemovedXid.
 
 I don't think you can do that.  Couldn't someone else have run
 heap_page_prune between vacuum's first and second visit to the page?

I just looked at that in more detail and decided it was more difficult
than it first appeared. So I've left it for now.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 9:03 PM, Tom Lane [EMAIL PROTECTED] wrote:


 I don't think you can do that.  Couldn't someone else have run
 heap_page_prune between vacuum's first and second visit to the page?




You mean the second visit in the first pass where we again check for
HeapTupleSatisfiesVacuum ? We hold exclusive lock continuously in the first
pass. So its not possible for someone else to call heap_page_prune.  If its
the second visit in the second heap scan, then it removes only the dead
tuples recorded in the first pass. So we should be good there too.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-19 Thread Pavan Deolasee
On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 Hot Standby patch, including all major planned features.



I wonder if we should refactor lazy_scan_heap() so that *all* the real work
of collecting information about dead tuples happens only in
heap_page_prune(). Frankly, there is only a rare chance that a tuple may
become DEAD after the pruning happened on the page. We can ignore such
tuples; they will be vacuumed/pruned in the next cycle.

This would save us a second check of HeapTupleSatisfiesVacuum on the tuples
which are just now checked in heap_page_prune(). In addition, the following
additional WAL records are then not necessary because heap_page_prune() must
have already logged the latestRemovedXid.

+ /*
+  * For Hot Standby we need to know the highest transaction id that will
+  * be removed by any change. VACUUM proceeds in a number of passes so
+  * we need to consider how each pass operates. The first pass runs
+  * heap_page_prune(), which can issue XLOG_HEAP2_CLEAN records as it
+  * progresses - these will have a latestRemovedXid on each record.
+  * In many cases this removes all of the tuples to be removed.
+  * Then we look at tuples to be removed, but do not actually remove them
+  * until phase three. However, index records for those rows are removed
+  * in phase two and index blocks do not have MVCC information attached.
+  * So before we can allow removal of *any* index tuples we need to issue
+  * a WAL record indicating what the latestRemovedXid will be at the end
+  * of phase three. This then allows Hot Standby queries to block at the
+  * correct place, i.e. before phase two, rather than during phase three
+  * as we issue more XLOG_HEAP2_CLEAN records. If we need to run multiple
+  * phase two/three because of memory constraints we need to issue multiple
+  * log records also.
+  */
+ static void
+ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
+ {
+   /*
+* No need to log changes for temp tables, they do not contain
+* data visible on the standby server.
+*/
+   if (rel-rd_istemp)
+   return;
+
+   (void) log_heap_cleanup_info(rel-rd_node,
vacrelstats-latestRemovedXid);
+ }


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-19 Thread Pavan Deolasee
On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 Hot Standby patch, including all major planned features.


While experimenting with the patch, I noticed that sometimes the archiver
process indefinitely waits for WALInsertLock. I haven't spent much time
debugging that, but the following chunk clearly seems to be buggy. The
WALInsertLock is not released if (leavingArchiveRecovery == true).

--- 6565,6592 
}
}

!   if (leavingArchiveRecovery)
!   checkPoint.redo = GetRedoLocationForArchiveCheckpoint();
!   else
{
!   /*
!* Compute new REDO record ptr = location of next XLOG record.
!*
!* NB: this is NOT necessarily where the checkpoint record itself
will be,
!* since other backends may insert more XLOG records while we're off
doing
!* the buffer flush work.  Those XLOG records are logically after
the
!* checkpoint, even though physically before it.  Got that?
!*/
!   checkPoint.redo = GetRedoLocationForCheckpoint();

!   /*
!* Now we can release WAL insert lock, allowing other xacts to
proceed
!* while we are flushing disk buffers.
!*/
!   LWLockRelease(WALInsertLock);
}

/*
 * If enabled, log checkpoint start.  We postpone this until now so as
not
 * to log anything if we decided to skip the checkpoint.
 */


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com