Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-26 Thread Koichi Suzuki
I'm now writing the v3 patch of the PITR improvement, to work with sync rep
and Hot Standby. I would like to change the thread.

2008/12/12 Pavan Deolasee pavan.deola...@gmail.com:
 On Fri, Dec 12, 2008 at 9:08 AM, Koichi Suzuki koichi@gmail.com wrote:
 Hmmm, it really looks like pg_readahead needs to be included in the core.
 I don't think it's a big job, and I will try to do this.


 Yes, I think it's best to have it in core. I would actually combine it
 with the other idea of reading xlog files directly into xlog buffers
 during recovery.

 Thanks,
 Pavan

 --
 Pavan Deolasee
 EnterpriseDB http://www.enterprisedb.com




-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-11 Thread Koichi Suzuki
Hmmm, it really looks like pg_readahead needs to be included in the core.
I don't think it's a big job, and I will try to do this.

2008/12/9 Fujii Masao masao.fu...@gmail.com:
 Hi,

 On Mon, Dec 8, 2008 at 2:54 PM, Koichi Suzuki koichi@gmail.com wrote:
 I understood your point.  In the case of synchronous replication,
 because the slave fails over when the master crashes, there's no need to
 keep FPW from the beginning.

 In this case, only prefetch will work.   Fujii's code at the slave
 looks very similar to pg_standby and pg_readahead will help in this
 case with no modification.

 As a result of the discussion, I will change the way recovery works on the
 standby: we won't use PITR for the WAL that walreceiver received; instead, the
 startup process reads it by *record* from pg_xlog and redoes it. So, I'm
 afraid that synchronous replication doesn't match well with pg_readahead.

 Regards,


 2008/12/4 Simon Riggs si...@2ndquadrant.com:

 On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

  There's clearly a huge gain using prefetch, when we have
  full_page_writes = off. But that does make me think: Why do we need
  prefetch at all if we use full page writes? There's nothing to prefetch
  if we can keep it in cache.

 Agreed.  This is why I proposed making prefetch optional through a GUC.

  So I'm wondering if we only need prefetch because we're using lesslog?
 
  If we integrated lesslog better into the new replication would we be
  able to forget about doing the prefetch altogether?

 In the case of lesslog, almost all the FPW is replaced with the
 corresponding incremental log, and recovery takes longer.  Prefetch
 dramatically improves this, as you will see in the above result.  To
 improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we
 need prefetch.

 It does sound like it is needed, yes. But if you look at the
 architecture of synchronous replication in 8.4 then I don't think it
 makes sense any more. It would be very useful for the architecture we
 had in 8.3, but that time has gone.

 If we have FPW=on on primary then we will stream WAL with FPW to
 standby. There seems little point removing it *after* it has been sent,
 then putting it back again before we recover, especially when it causes
 a drop in performance that then needs to be fixed (by this patch).

 pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

 So if we find a way of streaming WAL without FPW then this patch makes
 sense, but not until then. So far many people have argued in favour of
 using FPW=on, which was the whole point of pg_lesslog. Are we now saying
 that we would run the primary with FPW=off?

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support





 --
 --
 Koichi Suzuki





 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center




-- 
--
Koichi Suzuki



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-11 Thread Pavan Deolasee
On Fri, Dec 12, 2008 at 9:08 AM, Koichi Suzuki koichi@gmail.com wrote:
 Hmmm, it really looks like pg_readahead needs to be included in the core.
 I don't think it's a big job, and I will try to do this.


Yes, I think it's best to have it in core. I would actually combine it
with the other idea of reading xlog files directly into xlog buffers
during recovery.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-08 Thread Fujii Masao
Hi,

On Mon, Dec 8, 2008 at 2:54 PM, Koichi Suzuki [EMAIL PROTECTED] wrote:
 I understood your point.  In the case of synchronous replication,
 because the slave fails over when the master crashes, there's no need to
 keep FPW from the beginning.

 In this case, only prefetch will work.   Fujii's code at the slave
 looks very similar to pg_standby and pg_readahead will help in this
 case with no modification.

As a result of the discussion, I will change the way recovery works on the
standby: we won't use PITR for the WAL that walreceiver received; instead, the
startup process reads it by *record* from pg_xlog and redoes it. So, I'm
afraid that synchronous replication doesn't match well with pg_readahead.

Regards,


 2008/12/4 Simon Riggs [EMAIL PROTECTED]:

 On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

  There's clearly a huge gain using prefetch, when we have
  full_page_writes = off. But that does make me think: Why do we need
  prefetch at all if we use full page writes? There's nothing to prefetch
  if we can keep it in cache.

 Agreed.  This is why I proposed making prefetch optional through a GUC.

  So I'm wondering if we only need prefetch because we're using lesslog?
 
  If we integrated lesslog better into the new replication would we be
  able to forget about doing the prefetch altogether?

 In the case of lesslog, almost all the FPW is replaced with the
 corresponding incremental log, and recovery takes longer.  Prefetch
 dramatically improves this, as you will see in the above result.  To
 improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we
 need prefetch.

 It does sound like it is needed, yes. But if you look at the
 architecture of synchronous replication in 8.4 then I don't think it
 makes sense any more. It would be very useful for the architecture we
 had in 8.3, but that time has gone.

 If we have FPW=on on primary then we will stream WAL with FPW to
 standby. There seems little point removing it *after* it has been sent,
 then putting it back again before we recover, especially when it causes
 a drop in performance that then needs to be fixed (by this patch).

 pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

 So if we find a way of streaming WAL without FPW then this patch makes
 sense, but not until then. So far many people have argued in favour of
 using FPW=on, which was the whole point of pg_lesslog. Are we now saying
 that we would run the primary with FPW=off?

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support





 --
 --
 Koichi Suzuki





-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-07 Thread Koichi Suzuki
I understood your point.  In the case of synchronous replication,
because the slave fails over when the master crashes, there's no need to
keep FPW from the beginning.

In this case, only prefetch will work.   Fujii's code at the slave
looks very similar to pg_standby and pg_readahead will help in this
case with no modification.

2008/12/4 Simon Riggs [EMAIL PROTECTED]:

 On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

  There's clearly a huge gain using prefetch, when we have
  full_page_writes = off. But that does make me think: Why do we need
  prefetch at all if we use full page writes? There's nothing to prefetch
  if we can keep it in cache.

 Agreed.  This is why I proposed making prefetch optional through a GUC.

  So I'm wondering if we only need prefetch because we're using lesslog?
 
  If we integrated lesslog better into the new replication would we be
  able to forget about doing the prefetch altogether?

 In the case of lesslog, almost all the FPW is replaced with the
 corresponding incremental log, and recovery takes longer.  Prefetch
 dramatically improves this, as you will see in the above result.  To
 improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we
 need prefetch.

 It does sound like it is needed, yes. But if you look at the
 architecture of synchronous replication in 8.4 then I don't think it
 makes sense any more. It would be very useful for the architecture we
 had in 8.3, but that time has gone.

 If we have FPW=on on primary then we will stream WAL with FPW to
 standby. There seems little point removing it *after* it has been sent,
 then putting it back again before we recover, especially when it causes
 a drop in performance that then needs to be fixed (by this patch).

 pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

 So if we find a way of streaming WAL without FPW then this patch makes
 sense, but not until then. So far many people have argued in favour of
 using FPW=on, which was the whole point of pg_lesslog. Are we now saying
 that we would run the primary with FPW=off?

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support





-- 
--
Koichi Suzuki



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-04 Thread Simon Riggs

On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

  There's clearly a huge gain using prefetch, when we have
  full_page_writes = off. But that does make me think: Why do we need
  prefetch at all if we use full page writes? There's nothing to prefetch
  if we can keep it in cache.
 
 Agreed.  This is why I proposed making prefetch optional through a GUC.
 
  So I'm wondering if we only need prefetch because we're using lesslog?
 
  If we integrated lesslog better into the new replication would we be
  able to forget about doing the prefetch altogether?
 
 In the case of lesslog, almost all the FPW is replaced with the
 corresponding incremental log, and recovery takes longer.  Prefetch
 dramatically improves this, as you will see in the above result.  To
 improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we
 need prefetch.

It does sound like it is needed, yes. But if you look at the
architecture of synchronous replication in 8.4 then I don't think it
makes sense any more. It would be very useful for the architecture we
had in 8.3, but that time has gone.

If we have FPW=on on primary then we will stream WAL with FPW to
standby. There seems little point removing it *after* it has been sent,
then putting it back again before we recover, especially when it causes
a drop in performance that then needs to be fixed (by this patch).

pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

So if we find a way of streaming WAL without FPW then this patch makes
sense, but not until then. So far many people have argued in favour of
using FPW=on, which was the whole point of pg_lesslog. Are we now saying
that we would run the primary with FPW=off?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-04 Thread Fujii Masao
Hi,

On Thu, Dec 4, 2008 at 6:11 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

  There's clearly a huge gain using prefetch, when we have
  full_page_writes = off. But that does make me think: Why do we need
  prefetch at all if we use full page writes? There's nothing to prefetch
  if we can keep it in cache.

 Agreed.  This is why I proposed making prefetch optional through a GUC.

  So I'm wondering if we only need prefetch because we're using lesslog?
 
  If we integrated lesslog better into the new replication would we be
  able to forget about doing the prefetch altogether?

 In the case of lesslog, almost all the FPW is replaced with the
 corresponding incremental log, and recovery takes longer.  Prefetch
 dramatically improves this, as you will see in the above result.  To
 improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we
 need prefetch.

 It does sound like it is needed, yes. But if you look at the
 architecture of synchronous replication in 8.4 then I don't think it
 makes sense any more. It would be very useful for the architecture we
 had in 8.3, but that time has gone.

Agreed. I also think that lesslog is for archiving on a single node rather
than for replication between multiple nodes. Of course, it's very useful
for users who don't use replication, etc.

 So if we find a way of streaming WAL without FPW then this patch makes
 sense, but not until then. So far many people have argued in favour of
 using FPW=on, which was the whole point of pg_lesslog. Are we now saying
 that we would run the primary with FPW=off?

If we always recover a database from a base backup, the primary can run
with FPW=off. Since we might need a fresh backup anyway when making the failed
server catch up with the current primary, such a restriction (always
recovering from a backup) might not matter.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-03 Thread Koichi Suzuki
Agreed.

I borrowed WAL parsing code from XLogdump and I think WAL parsing
should be another candidate.

2008/12/3 Fujii Masao [EMAIL PROTECTED]:
 Hi,

 On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki [EMAIL PROTECTED] wrote:
 Please find enclosed a revised version of pg_readahead and a patch to
 invoke pg_readahead.

 Some similar functions are in xlog.c and pg_readahead.c (for example,
 RecordIsValid). I think that we should unify them as a common function,
 which would help in developing tools (for example, xlogdump) that process
 WAL in the future.

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center




-- 
--
Koichi Suzuki



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-02 Thread Koichi Suzuki
Hi,

As to the checkpoint timeout, yes, this measurement is hard on the FPW=on
case.  I'll do a similar measurement with checkpoint timeout = 5min
and post the result.  I expect that the recovery time will be almost
the same in the case of FPW=on, lesslog=yes and prefetch=yes.

2008/12/2 Simon Riggs [EMAIL PROTECTED]:

 On Thu, 2008-11-27 at 21:04 +0900, Koichi Suzuki wrote:

 We ran the
 benchmark for one hour with checkpoint timeout 30min and completion_target 0.5.
 Then we collected all the archived logs and ran PITR.

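 ----------------------+------------+--------------------+---------------
 WAL conditions        | Recovery   | Amount of          | Recovery
                       | time (sec) | physical read (MB) | rate (TX/min)
 ----------------------+------------+--------------------+---------------
 w/o prefetch          |            |                    |
 archived with cp      |      6,611 |              5,435 |           402
 FPW=off               |            |                    |
 ----------------------+------------+--------------------+---------------
 With prefetch         |            |                    |
 archived with cp      |      1,161 |              5,543 |         2,290
 FPW=off               |            |                    |
 ----------------------+------------+--------------------+---------------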

 There's clearly a huge gain using prefetch, when we have
 full_page_writes = off. But that does make me think: Why do we need
 prefetch at all if we use full page writes? There's nothing to prefetch
 if we can keep it in cache.

Agreed.  This is why I proposed making prefetch optional through a GUC.


 I notice we set the checkpoint_timeout to 30 mins, which is long enough
 to exceed the cache on the standby. I wonder if we reduced the timeout
 would we use the cache better on the standby and not need readahead at
 all? Do you have any results to examine cache overflow/shorter timeouts?

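 w/o prefetch          |            |                    |
 archived with cp      |      1,683 |                801 |         1,458
 FPW=on                |            |                    | (8.3)
 ----------------------+------------+--------------------+---------------
 w/o prefetch          |            |                    |
 archived with lesslog |      6,644 |              5,090 |           369
 FPW=on                |            |                    |
 ----------------------+------------+--------------------+---------------
 With prefetch         |            |                    |
 archived with cp      |      1,415 |              2,157 |         1,733
 FPW=on                |            |                    |
 ----------------------+------------+--------------------+---------------
 With prefetch         |            |                    |
 archived with lesslog |      1,196 |              5,369 |         2,051
 FPW=on                |            |                    | (This proposal)
 ----------------------+------------+--------------------+---------------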

 So I'm wondering if we only need prefetch because we're using lesslog?

 If we integrated lesslog better into the new replication would we be
 able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the
corresponding incremental log, and recovery takes longer.  Prefetch
dramatically improves this, as you will see in the above result.  To
improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we
need prefetch.
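
As a rough illustration of what prefetch means here: without full-page
images in the WAL, redo has to read each referenced data page from disk
before applying a change, so hinting the kernel ahead of time can hide that
latency. The sketch below assumes the hints are issued with posix_fadvise
and POSIX_FADV_WILLNEED; the thread does not state the mechanism
pg_readahead uses, and the function name prefetch_blocks and its arguments
are hypothetical.

```python
import os

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def prefetch_blocks(path, block_numbers):
    """Hint the kernel to start reading the given blocks of a data file.

    Hypothetical helper, not pg_readahead's actual code. Returns the number
    of hints issued. The reads proceed in the background, so a later redo
    of records touching these blocks is more likely to hit the OS cache.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        issued = 0
        # De-duplicate and sort so each block is hinted once, in file order.
        for block in sorted(set(block_numbers)):
            os.posix_fadvise(fd, block * BLOCK_SIZE, BLOCK_SIZE,
                             os.POSIX_FADV_WILLNEED)
            issued += 1
        return issued
    finally:
        os.close(fd)
```

The block numbers would come from parsing the WAL records ahead of the
redo position. os.posix_fadvise is only available on Unix-like systems.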

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support





-- 
--
Koichi Suzuki



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-02 Thread Fujii Masao
Hi,

On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki [EMAIL PROTECTED] wrote:
 Please find enclosed a revised version of pg_readahead and a patch to
 invoke pg_readahead.

Some similar functions are in xlog.c and pg_readahead.c (for example,
RecordIsValid). I think that we should unify them as a common function,
which would help in developing tools (for example, xlogdump) that process
WAL in the future.
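
To make the idea of a single shared validation routine concrete, here is a
minimal sketch. The record layout (a 4-byte length followed by a 4-byte
CRC-32 of the payload) is purely hypothetical and is not the real WAL
record format; zlib.crc32 stands in for PostgreSQL's own CRC code. The
point is only that the server, pg_readahead and a tool like xlogdump could
all call one function instead of each keeping its own copy.

```python
import struct
import zlib

# Hypothetical header: payload length and CRC-32, both little-endian uint32.
HEADER = struct.Struct("<II")


def record_is_valid(buf):
    """Return True if buf holds one complete record whose CRC matches."""
    if len(buf) < HEADER.size:
        return False
    length, crc = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    # A truncated payload or a checksum mismatch both mean "not valid".
    return len(payload) == length and zlib.crc32(payload) == crc


def make_record(payload):
    """Build a record in the hypothetical layout, for demonstration only."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload
```

For example, record_is_valid(make_record(b"abc")) is true, while a record
with its last byte changed or cut off is rejected.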

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-01 Thread Simon Riggs

On Thu, 2008-11-27 at 21:04 +0900, Koichi Suzuki wrote:

 We ran the
 benchmark for one hour with checkpoint timeout 30min and completion_target 0.5.
 Then we collected all the archived logs and ran PITR.

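 ----------------------+------------+--------------------+---------------
 WAL conditions        | Recovery   | Amount of          | Recovery
                       | time (sec) | physical read (MB) | rate (TX/min)
 ----------------------+------------+--------------------+---------------
 w/o prefetch          |            |                    |
 archived with cp      |      6,611 |              5,435 |           402
 FPW=off               |            |                    |
 ----------------------+------------+--------------------+---------------
 With prefetch         |            |                    |
 archived with cp      |      1,161 |              5,543 |         2,290
 FPW=off               |            |                    |
 ----------------------+------------+--------------------+---------------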

There's clearly a huge gain using prefetch, when we have
full_page_writes = off. But that does make me think: Why do we need
prefetch at all if we use full page writes? There's nothing to prefetch
if we can keep it in cache.

I notice we set the checkpoint_timeout to 30 mins, which is long enough
to exceed the cache on the standby. I wonder if we reduced the timeout
would we use the cache better on the standby and not need readahead at
all? Do you have any results to examine cache overflow/shorter timeouts?

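 w/o prefetch          |            |                    |
 archived with cp      |      1,683 |                801 |         1,458
 FPW=on                |            |                    | (8.3)
 ----------------------+------------+--------------------+---------------
 w/o prefetch          |            |                    |
 archived with lesslog |      6,644 |              5,090 |           369
 FPW=on                |            |                    |
 ----------------------+------------+--------------------+---------------
 With prefetch         |            |                    |
 archived with cp      |      1,415 |              2,157 |         1,733
 FPW=on                |            |                    |
 ----------------------+------------+--------------------+---------------
 With prefetch         |            |                    |
 archived with lesslog |      1,196 |              5,369 |         2,051
 FPW=on                |            |                    | (This proposal)
 ----------------------+------------+--------------------+---------------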
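
To put numbers on the gain, the recovery times in the tables above can be
compared pairwise. The short script below only restates the posted figures;
the dictionary and function names are made up for this illustration.

```python
# Recovery times in seconds, copied from the benchmark tables above.
RECOVERY_SECONDS = {
    ("cp", "FPW=off"): {"without_prefetch": 6611, "with_prefetch": 1161},
    ("cp", "FPW=on"): {"without_prefetch": 1683, "with_prefetch": 1415},
    ("lesslog", "FPW=on"): {"without_prefetch": 6644, "with_prefetch": 1196},
}


def speedup(times):
    """Return how many times faster recovery is with prefetch enabled."""
    return times["without_prefetch"] / times["with_prefetch"]


for (archiver, fpw), times in RECOVERY_SECONDS.items():
    print(f"{archiver:8s} {fpw:8s} {speedup(times):.2f}x")
```

Prefetch cuts recovery time by a factor of roughly 5.7 with FPW=off and
roughly 5.6 with lesslog, but only about 1.2 with FPW=on archived with cp,
which is the case where the full-page images already make most reads
unnecessary.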

So I'm wondering if we only need prefetch because we're using lesslog?

If we integrated lesslog better into the new replication would we be
able to forget about doing the prefetch altogether?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

