Re: [HACKERS] V2 of PITR performance improvement for 8.4
I'm now writing a v3 patch of the PITR improvement that works with synchronous replication and Hot Standby. I would like to move the discussion to a new thread.

2008/12/12 Pavan Deolasee pavan.deola...@gmail.com:

On Fri, Dec 12, 2008 at 9:08 AM, Koichi Suzuki koichi@gmail.com wrote:

Hmmm, it really looks like pg_readahead needs to be included in the core. I don't think it's a big job, and I will try to do this.

Yes, I think it's best to have it in core. I would actually combine it with the other idea of reading xlog files directly into xlog buffers during recovery.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

--
Koichi Suzuki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hmmm, it really looks like pg_readahead needs to be included in the core. I don't think it's a big job, and I will try to do this.

2008/12/9 Fujii Masao masao.fu...@gmail.com:

Hi,

On Mon, Dec 8, 2008 at 2:54 PM, Koichi Suzuki koichi@gmail.com wrote:

I understood your point. In the case of synchronous replication, because the slave fails over when the master crashes, there is no need to keep FPW from the beginning. In this case, only prefetch will work. Fujii's code at the slave looks very similar to pg_standby, and pg_readahead will help in this case with no modification.

As a result of the discussion, I will change the way recovery works on the standby: we don't use PITR for the WAL that walreceiver received; instead, the startup process reads it by *record* from pg_xlog and replays it. So I'm afraid that synchronous replication doesn't match well with pg_readahead.

Regards,

2008/12/4 Simon Riggs si...@2ndquadrant.com:

On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

Agreed. This is why I proposed making prefetch optional through a GUC.

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the corresponding incremental log, and recovery takes longer. Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch.

It does sound like it is needed, yes. But if you look at the architecture of synchronous replication in 8.4 then I don't think it makes sense any more. It would be very useful for the architecture we had in 8.3, but that time has gone.

If we have FPW=on on primary then we will stream WAL with FPW to standby. There seems little point removing it *after* it has been sent, then putting it back again before we recover, especially when it causes a drop in performance that then needs to be fixed (by this patch). pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

So if we find a way of streaming WAL without FPW then this patch makes sense, but not until then. So far many people have argued in favour of using FPW=on, which was the whole point of pg_lesslog. Are we now saying that we would run the primary with FPW=off?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

--
Koichi Suzuki

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Koichi Suzuki
Re: [HACKERS] V2 of PITR performance improvement for 8.4
On Fri, Dec 12, 2008 at 9:08 AM, Koichi Suzuki koichi@gmail.com wrote:

Hmmm, it really looks like pg_readahead needs to be included in the core. I don't think it's a big job, and I will try to do this.

Yes, I think it's best to have it in core. I would actually combine it with the other idea of reading xlog files directly into xlog buffers during recovery.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hi,

On Mon, Dec 8, 2008 at 2:54 PM, Koichi Suzuki wrote:

I understood your point. In the case of synchronous replication, because the slave fails over when the master crashes, there is no need to keep FPW from the beginning. In this case, only prefetch will work. Fujii's code at the slave looks very similar to pg_standby, and pg_readahead will help in this case with no modification.

As a result of the discussion, I will change the way recovery works on the standby: we don't use PITR for the WAL that walreceiver received; instead, the startup process reads it by *record* from pg_xlog and replays it. So I'm afraid that synchronous replication doesn't match well with pg_readahead.

Regards,

2008/12/4 Simon Riggs:

On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

Agreed. This is why I proposed making prefetch optional through a GUC.

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the corresponding incremental log, and recovery takes longer. Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch.

It does sound like it is needed, yes. But if you look at the architecture of synchronous replication in 8.4 then I don't think it makes sense any more. It would be very useful for the architecture we had in 8.3, but that time has gone.

If we have FPW=on on primary then we will stream WAL with FPW to standby. There seems little point removing it *after* it has been sent, then putting it back again before we recover, especially when it causes a drop in performance that then needs to be fixed (by this patch). pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

So if we find a way of streaming WAL without FPW then this patch makes sense, but not until then. So far many people have argued in favour of using FPW=on, which was the whole point of pg_lesslog. Are we now saying that we would run the primary with FPW=off?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

--
Koichi Suzuki

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] V2 of PITR performance improvement for 8.4
I understood your point. In the case of synchronous replication, because the slave fails over when the master crashes, there is no need to keep FPW from the beginning. In this case, only prefetch will work. Fujii's code at the slave looks very similar to pg_standby, and pg_readahead will help in this case with no modification.

2008/12/4 Simon Riggs:

On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

Agreed. This is why I proposed making prefetch optional through a GUC.

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the corresponding incremental log, and recovery takes longer. Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch.

It does sound like it is needed, yes. But if you look at the architecture of synchronous replication in 8.4 then I don't think it makes sense any more. It would be very useful for the architecture we had in 8.3, but that time has gone.

If we have FPW=on on primary then we will stream WAL with FPW to standby. There seems little point removing it *after* it has been sent, then putting it back again before we recover, especially when it causes a drop in performance that then needs to be fixed (by this patch). pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

So if we find a way of streaming WAL without FPW then this patch makes sense, but not until then. So far many people have argued in favour of using FPW=on, which was the whole point of pg_lesslog. Are we now saying that we would run the primary with FPW=off?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

--
Koichi Suzuki
Re: [HACKERS] V2 of PITR performance improvement for 8.4
On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

Agreed. This is why I proposed making prefetch optional through a GUC.

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the corresponding incremental log, and recovery takes longer. Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch.

It does sound like it is needed, yes. But if you look at the architecture of synchronous replication in 8.4 then I don't think it makes sense any more. It would be very useful for the architecture we had in 8.3, but that time has gone.

If we have FPW=on on primary then we will stream WAL with FPW to standby. There seems little point removing it *after* it has been sent, then putting it back again before we recover, especially when it causes a drop in performance that then needs to be fixed (by this patch). pg_lesslog allowed us to write FPW to disk, yet send WAL without FPW.

So if we find a way of streaming WAL without FPW then this patch makes sense, but not until then. So far many people have argued in favour of using FPW=on, which was the whole point of pg_lesslog. Are we now saying that we would run the primary with FPW=off?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hi,

On Thu, Dec 4, 2008 at 6:11 PM, Simon Riggs wrote:

On Wed, 2008-12-03 at 14:22 +0900, Koichi Suzuki wrote:

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

Agreed. This is why I proposed making prefetch optional through a GUC.

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the corresponding incremental log, and recovery takes longer. Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch.

It does sound like it is needed, yes. But if you look at the architecture of synchronous replication in 8.4 then I don't think it makes sense any more. It would be very useful for the architecture we had in 8.3, but that time has gone.

Agreed. I also think that lesslog is meant for archiving on a single node rather than for replication between multiple nodes. Of course, it's very useful for users who don't use replication, etc.

So if we find a way of streaming WAL without FPW then this patch makes sense, but not until then. So far many people have argued in favour of using FPW=on, which was the whole point of pg_lesslog. Are we now saying that we would run the primary with FPW=off?

If we always recover a database from a base backup, the primary can run with FPW=off. Since we might need a fresh backup anyway when making the failed server catch up with the current primary, such a restriction (always recovering from a backup) might not matter.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Agreed. I borrowed the WAL parsing code from XLogdump, and I think WAL parsing should be another candidate for unification.

2008/12/3 Fujii Masao:

Hi,

On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki wrote:

Please find enclosed a revised version of pg_readahead and a patch to invoke pg_readahead.

Some similar functions exist in both xlog.c and pg_readahead.c (for example, RecordIsValid). I think that we should unify them into a common function, which would help in developing tools that handle WAL (for example, xlogdump) in the future.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Koichi Suzuki
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hi,

As for the checkpoint timeout, yes, this measurement is hard for the FPW=on case. I'll run a similar measurement with checkpoint timeout = 5min and post the result. I expect that the recovery time will be almost the same in the case of FPW=on, lesslog=yes and prefetch=yes.

2008/12/2 Simon Riggs:

On Thu, 2008-11-27 at 21:04 +0900, Koichi Suzuki wrote:

We ran the benchmark for one hour with checkpoint timeout 30min and completion_target 0.5. Then we collected all the archive logs and ran PITR.

------------------------+------------+--------------------+---------------
WAL conditions          | Recovery   | Amount of          | Recovery
                        | time (sec) | physical read (MB) | rate (TX/min)
------------------------+------------+--------------------+---------------
w/o prefetch            |            |                    |
archived with cp        |      6,611 |              5,435 |           402
FPW=off                 |            |                    |
------------------------+------------+--------------------+---------------
With prefetch           |            |                    |
archived with cp        |      1,161 |              5,543 |         2,290
FPW=off                 |            |                    |
------------------------+------------+--------------------+---------------

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

Agreed. This is why I proposed making prefetch optional through a GUC.

I notice we set the checkpoint_timeout to 30 mins, which is long enough to exceed the cache on the standby. I wonder if we reduced the timeout would we use the cache better on the standby and not need readahead at all? Do you have any results to examine cache overflow/shorter timeouts?

------------------------+------------+--------------------+---------------
w/o prefetch            |            |                    |
archived with cp        |      1,683 |                801 |         1,458
FPW=on (8.3)            |            |                    |
------------------------+------------+--------------------+---------------
w/o prefetch            |            |                    |
archived with lesslog   |      6,644 |              5,090 |           369
FPW=on                  |            |                    |
------------------------+------------+--------------------+---------------
With prefetch           |            |                    |
archived with cp        |      1,415 |              2,157 |         1,733
FPW=on                  |            |                    |
------------------------+------------+--------------------+---------------
With prefetch           |            |                    |
archived with lesslog   |      1,196 |              5,369 |         2,051
FPW=on (This proposal)  |            |                    |
------------------------+------------+--------------------+---------------

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

In the case of lesslog, almost all the FPW is replaced with the corresponding incremental log, and recovery takes longer. Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

--
Koichi Suzuki
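To make the size of the gain concrete, the recovery times quoted in this message can be turned into speedup ratios. The following is only a small arithmetic sketch: the numbers are copied from the benchmark table in the thread, while the dictionary layout and the `speedup` helper are names made up for illustration.

```python
# Figures copied from the benchmark table quoted in this thread.
# Key:   (prefetch, archiver, full_page_writes)
# Value: (recovery time in sec, physical read in MB, recovery rate in TX/min)
RESULTS = {
    ("off", "cp", "off"): (6611, 5435, 402),
    ("on", "cp", "off"): (1161, 5543, 2290),
    ("off", "cp", "on"): (1683, 801, 1458),
    ("off", "lesslog", "on"): (6644, 5090, 369),
    ("on", "cp", "on"): (1415, 2157, 1733),
    ("on", "lesslog", "on"): (1196, 5369, 2051),
}


def speedup(archiver, fpw):
    """Recovery time without prefetch divided by recovery time with prefetch."""
    without_prefetch = RESULTS[("off", archiver, fpw)][0]
    with_prefetch = RESULTS[("on", archiver, fpw)][0]
    return without_prefetch / with_prefetch


if __name__ == "__main__":
    for archiver, fpw in [("cp", "off"), ("cp", "on"), ("lesslog", "on")]:
        ratio = speedup(archiver, fpw)
        print(f"archived with {archiver:8s} FPW={fpw:3s} speedup {ratio:.2f}x")
```

By this arithmetic, prefetch cuts recovery time by a factor of more than five for FPW=off and for lesslog, but only modestly for plain FPW=on archived with cp, which is the case Simon's question is about.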
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hi,

On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki wrote:

Please find enclosed a revised version of pg_readahead and a patch to invoke pg_readahead.

Some similar functions exist in both xlog.c and pg_readahead.c (for example, RecordIsValid). I think that we should unify them into a common function, which would help in developing tools that handle WAL (for example, xlogdump) in the future.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
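The unification idea in this message can be pictured as one validation routine that both the server and external WAL tools import, instead of each keeping a private copy. The sketch below is purely illustrative and assumes a toy record layout (a length and a CRC-32 in front of the payload); it is not PostgreSQL's actual record format, and `pack_record` and `record_is_valid` are hypothetical names, not the functions in xlog.c or pg_readahead.c.

```python
import struct
import zlib

# Hypothetical header: payload length and CRC-32 of the payload,
# both little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


def pack_record(payload: bytes) -> bytes:
    """Build a toy record: header followed by the payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def record_is_valid(buf: bytes) -> bool:
    """Shared check: the record is complete and its checksum matches."""
    if len(buf) < HEADER.size:
        return False  # header itself is truncated
    length, crc = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        return False  # payload is truncated
    return zlib.crc32(payload) == crc


if __name__ == "__main__":
    rec = pack_record(b"heap insert")
    print(record_is_valid(rec))       # intact record
    print(record_is_valid(rec[:-1]))  # truncated record
```

With a single shared routine like this, a recovery process, a read-ahead tool, and a dump tool would all agree on what counts as a valid record.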
Re: [HACKERS] V2 of PITR performance improvement for 8.4
On Thu, 2008-11-27 at 21:04 +0900, Koichi Suzuki wrote:

We ran the benchmark for one hour with checkpoint timeout 30min and completion_target 0.5. Then we collected all the archive logs and ran PITR.

------------------------+------------+--------------------+---------------
WAL conditions          | Recovery   | Amount of          | Recovery
                        | time (sec) | physical read (MB) | rate (TX/min)
------------------------+------------+--------------------+---------------
w/o prefetch            |            |                    |
archived with cp        |      6,611 |              5,435 |           402
FPW=off                 |            |                    |
------------------------+------------+--------------------+---------------
With prefetch           |            |                    |
archived with cp        |      1,161 |              5,543 |         2,290
FPW=off                 |            |                    |
------------------------+------------+--------------------+---------------

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache.

I notice we set the checkpoint_timeout to 30 mins, which is long enough to exceed the cache on the standby. I wonder if we reduced the timeout would we use the cache better on the standby and not need readahead at all? Do you have any results to examine cache overflow/shorter timeouts?

------------------------+------------+--------------------+---------------
w/o prefetch            |            |                    |
archived with cp        |      1,683 |                801 |         1,458
FPW=on (8.3)            |            |                    |
------------------------+------------+--------------------+---------------
w/o prefetch            |            |                    |
archived with lesslog   |      6,644 |              5,090 |           369
FPW=on                  |            |                    |
------------------------+------------+--------------------+---------------
With prefetch           |            |                    |
archived with cp        |      1,415 |              2,157 |         1,733
FPW=on                  |            |                    |
------------------------+------------+--------------------+---------------
With prefetch           |            |                    |
archived with lesslog   |      1,196 |              5,369 |         2,051
FPW=on (This proposal)  |            |                    |
------------------------+------------+--------------------+---------------

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
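Simon's question about full page writes can be illustrated with a rough sketch of how a read-ahead pass might pick its targets. The thread does not describe how pg_readahead works internally, so everything here is an assumption: the `WalRecord` class, its fields, and both functions are hypothetical, and `os.posix_fadvise` (available on Unix) is used only as one plausible way to hint the kernel. The sketch also assumes the cache is large enough to keep a page once it has been touched.

```python
import os
from dataclasses import dataclass

BLCKSZ = 8192  # PostgreSQL's default block size


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, simplified view of one WAL record."""
    relfile: str         # path of the data file the record touches
    blkno: int           # block number within that file
    has_full_page: bool  # True if the record carries a full-page image


def blocks_to_prefetch(records):
    """Return (relfile, blkno) pairs whose first redo needs the on-disk page.

    If the first record touching a block carries a full-page image, redo can
    rebuild the page from WAL alone, so no disk read is needed. Later records
    for the same block are assumed to find the page in cache.
    """
    seen = set()
    wanted = []
    for rec in records:
        key = (rec.relfile, rec.blkno)
        if key in seen:
            continue
        seen.add(key)
        if not rec.has_full_page:
            wanted.append(key)
    return wanted


def prefetch(blocks):
    """Hint the kernel to start reading each block; return the hint count."""
    issued = 0
    for relfile, blkno in blocks:
        fd = os.open(relfile, os.O_RDONLY)
        try:
            os.posix_fadvise(fd, blkno * BLCKSZ, BLCKSZ,
                             os.POSIX_FADV_WILLNEED)
            issued += 1
        finally:
            os.close(fd)
    return issued
```

Under these assumptions, a WAL stream written with FPW=on yields an almost empty prefetch list, because the first touch of each block after a checkpoint carries a full-page image. With FPW=off, or after lesslog has replaced those images with incremental records, nearly every first touch needs a read, which is consistent with the large gains in the table.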