Re: [HACKERS] New replication mode: write

2012-01-24 Thread Fujii Masao
On Mon, Jan 23, 2012 at 9:53 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao masao.fu...@gmail.com wrote:

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is the backend
 function and the latter is the frontend one. Now I have no good idea to
 merge them cleanly.

 We can wait on the socket wherever it comes from. poll/select doesn't
 care how we got the socket.

 So we just need a common handler that calls either
 walreceiver/libpqwalreceiver function as required to handle the
 wakeup.

 I'm afraid I could not understand your idea. Could you explain it in
 more detail?

 We either tell libpqwalreceiver about the latch, or we tell
 walreceiver about the socket used by libpqwalreceiver.

 In either case we share a pointer from one module to another.

The former seems difficult because it's not easy to link libpqwalreceiver.so
to the latch. I will consider the latter.

 If we send back the reply as soon as the Apply pointer is changed, I'm
 afraid quite lots of reply messages are sent frequently, which might
 cause performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This can reduce the number of useless
 reply from the standby about the Apply pointer.

 We send back one reply per incoming message. The incoming messages
 don't know request state and checking that has a cost which I don't
 think is an appropriate payment since we only need this info when the
 link goes quiet.

 When the link goes quiet we still need to send replies if we have
 apply mode, but we only need to send apply messages if the lsn has
 changed because of a commit. That will considerably reduce the
 messages sent so I don't see a problem.

 You mean to change the meaning of apply_location? Currently it indicates
 the end + 1 of the last replayed WAL record, regardless of whether it's
 a commit record or not. So too many replies can be sent per incoming
 message because it might contain many WAL records. But you mean to
 change apply_location only when a commit record is replayed?

 There is no change to the meaning of apply_location. The only change
 is that we send that message only when it has an updated value of
 committed lsn.

This means that apply_location might return a different location from
pg_last_xlog_replay_location() on the standby, though in 9.1 they return
the same value. That might confuse users, no?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New replication mode: write

2012-01-24 Thread Simon Riggs
On Tue, Jan 24, 2012 at 10:47 AM, Fujii Masao masao.fu...@gmail.com wrote:

 I'm afraid I could not understand your idea. Could you explain it in
 more detail?

 We either tell libpqwalreceiver about the latch, or we tell
 walreceiver about the socket used by libpqwalreceiver.

 In either case we share a pointer from one module to another.

 The former seems difficult because it's not easy to link libpqwalreceiver.so
 to the latch. I will consider about the latter.

Yes, it might be too hard, but let's look.

 You mean to change the meaning of apply_location? Currently it indicates
 the end + 1 of the last replayed WAL record, regardless of whether it's
 a commit record or not. So too many replies can be sent per incoming
 message because it might contain many WAL records. But you mean to
 change apply_location only when a commit record is replayed?

 There is no change to the meaning of apply_location. The only change
 is that we send that message only when it has an updated value of
 committed lsn.

 This means that apply_location might return the different location from
 pg_last_xlog_replay_location() on the standby, though in 9.1 they return
 the same. Which might confuse a user. No?

The two values only match on a quiet system anyway, since both are
moving forwards.

They will still match on a quiet system.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-24 Thread Simon Riggs
On Tue, Jan 24, 2012 at 11:00 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Yes, it might be too hard, but let's look.

Your committer has timed out ;-)

committed write mode only

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-24 Thread Fujii Masao
On Wed, Jan 25, 2012 at 5:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jan 24, 2012 at 11:00 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Yes, it might be too hard, but let's look.

 Your committer has timed out ;-)

 committed write mode only

Thanks for the commit!

The apply mode is attractive, but I need more time to implement it completely.
I might not be able to complete it within this CF, so committing only the
write mode is the right decision, I think. If I have time after all of the
patches I'm interested in have been committed, I will try the apply mode
again, but maybe for 9.3dev.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Fujii Masao
On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

 I notice that the Apply mode isn't fully implemented. I had in mind
 that you would add the latch required to respond more quickly when
 only the Apply pointer has changed.

 Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
 was there another reason for not implementing that?

I agree that the feature you pointed out is useful for the Apply mode. But
I'm afraid that implementing that feature is not easy and would make
the patch big and complicated, so I didn't implement the Apply mode first.

To make the walreceiver call WaitLatchOrSocket(), we would need to
merge it and libpq_select() into one function. But the former is a backend
function and the latter is a frontend one. Right now I have no good idea how
to merge them cleanly.

If we send back the reply as soon as the Apply pointer is changed, I'm
afraid that a large number of reply messages would be sent, which might
cause a performance problem. This is also one of the reasons why I didn't
implement the quick-response feature. To address this problem, we might
need to change the master so that it sends the Wait pointer to the standby,
and change the standby so that it replies whenever the Apply pointer
catches up with the Wait one. This can reduce the number of useless
replies from the standby about the Apply pointer.
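
The Wait-pointer idea above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the class WaitPointerStandby and its methods are hypothetical names, LSNs are modeled as plain integers, and only one outstanding Wait pointer is tracked.

```python
class WaitPointerStandby:
    """Toy standby that replies only when Apply catches up with Wait."""

    def __init__(self):
        self.wait_lsn = None   # latest Wait pointer received from the master
        self.replies = []      # apply positions actually reported back

    def set_wait_pointer(self, lsn):
        # The master tells the standby which LSN a backend is waiting for.
        self.wait_lsn = lsn

    def on_apply(self, apply_lsn):
        # Called after each replayed WAL record; stay silent unless the
        # Apply pointer has reached the outstanding Wait pointer.
        if self.wait_lsn is not None and apply_lsn >= self.wait_lsn:
            self.replies.append(apply_lsn)
            self.wait_lsn = None


standby = WaitPointerStandby()
standby.set_wait_pointer(300)
for lsn in (100, 200, 300, 400):
    standby.on_apply(lsn)
print(standby.replies)  # [300]
```

In this toy run four records are replayed but only one reply is produced, which is the reduction the proposal is after.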

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

 I notice that the Apply mode isn't fully implemented. I had in mind
 that you would add the latch required to respond more quickly when
 only the Apply pointer has changed.

 Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
 was there another reason for not implementing that?

 I agree that the feature you pointed is useful for the Apply mode. But
 I'm afraid that implementing that feature is not easy and would make
 the patch big and complicated, so I didn't implement the Apply mode first.

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is the backend
 function and the latter is the frontend one. Now I have no good idea to
 merge them cleanly.

We can wait on the socket wherever it comes from. poll/select doesn't
care how we got the socket.

So we just need a common handler that calls either
walreceiver/libpqwalreceiver function as required to handle the
wakeup.


 If we send back the reply as soon as the Apply pointer is changed, I'm
 afraid quite lots of reply messages are sent frequently, which might
 cause performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This can reduce the number of useless
 reply from the standby about the Apply pointer.

We send back one reply per incoming message. The incoming messages
don't know request state and checking that has a cost which I don't
think is an appropriate payment since we only need this info when the
link goes quiet.

When the link goes quiet we still need to send replies if we have
apply mode, but we only need to send apply messages if the lsn has
changed because of a commit. That will considerably reduce the
messages sent so I don't see a problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Fujii Masao
On Mon, Jan 23, 2012 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

 I notice that the Apply mode isn't fully implemented. I had in mind
 that you would add the latch required to respond more quickly when
 only the Apply pointer has changed.

 Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
 was there another reason for not implementing that?

 I agree that the feature you pointed is useful for the Apply mode. But
 I'm afraid that implementing that feature is not easy and would make
 the patch big and complicated, so I didn't implement the Apply mode first.

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is the backend
 function and the latter is the frontend one. Now I have no good idea to
 merge them cleanly.

 We can wait on the socket wherever it comes from. poll/select doesn't
 care how we got the socket.

 So we just need a common handler that calls either
 walreceiver/libpqwalreceiver function as required to handle the
 wakeup.

I'm afraid I could not understand your idea. Could you explain it in
more detail?

 If we send back the reply as soon as the Apply pointer is changed, I'm
 afraid quite lots of reply messages are sent frequently, which might
 cause performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This can reduce the number of useless
 reply from the standby about the Apply pointer.

 We send back one reply per incoming message. The incoming messages
 don't know request state and checking that has a cost which I don't
 think is an appropriate payment since we only need this info when the
 link goes quiet.

 When the link goes quiet we still need to send replies if we have
 apply mode, but we only need to send apply messages if the lsn has
 changed because of a commit. That will considerably reduce the
 messages sent so I don't see a problem.

You mean to change the meaning of apply_location? Currently it indicates
the end + 1 of the last replayed WAL record, regardless of whether it's
a commit record or not. So too many replies can be sent per incoming
message because it might contain many WAL records. But you mean to
change apply_location only when a commit record is replayed?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao masao.fu...@gmail.com wrote:

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is the backend
 function and the latter is the frontend one. Now I have no good idea to
 merge them cleanly.

 We can wait on the socket wherever it comes from. poll/select doesn't
 care how we got the socket.

 So we just need a common handler that calls either
 walreceiver/libpqwalreceiver function as required to handle the
 wakeup.

 I'm afraid I could not understand your idea. Could you explain it in
 more detail?

We either tell libpqwalreceiver about the latch, or we tell
walreceiver about the socket used by libpqwalreceiver.

In either case we share a pointer from one module to another.
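
A rough sketch of the second option (walreceiver learns the socket and waits on it together with its latch) is shown below. It is written in Python rather than C and is not the PostgreSQL implementation: the "latch" is modeled as a self-pipe, the function name wait_latch_or_socket is made up for this example, and a POSIX system is assumed so that select() works on pipes.

```python
import os
import select
import socket


def wait_latch_or_socket(latch_fd, sock, timeout):
    """Block until the latch pipe or the socket is readable, or time out."""
    readable, _, _ = select.select([latch_fd, sock], [], [], timeout)
    events = set()
    if latch_fd in readable:
        os.read(latch_fd, 4096)   # drain the pipe, i.e. reset the latch
        events.add("latch")
    if sock in readable:
        events.add("socket")      # caller would now hand off to the libpq side
    return events


latch_r, latch_w = os.pipe()             # stand-in for the latch
wal_sock, peer = socket.socketpair()     # stand-in for the replication socket

os.write(latch_w, b"x")                  # e.g. startup process advanced Apply
print(wait_latch_or_socket(latch_r, wal_sock, 1.0))

peer.sendall(b"WAL data")                # e.g. the master sent more WAL
print(wait_latch_or_socket(latch_r, wal_sock, 1.0))
```

The point of the sketch is only that select()/poll() takes a descriptor regardless of which module opened it, so a single wait loop can dispatch to either handler.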

 If we send back the reply as soon as the Apply pointer is changed, I'm
 afraid quite lots of reply messages are sent frequently, which might
 cause performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This can reduce the number of useless
 reply from the standby about the Apply pointer.

 We send back one reply per incoming message. The incoming messages
 don't know request state and checking that has a cost which I don't
 think is an appropriate payment since we only need this info when the
 link goes quiet.

 When the link goes quiet we still need to send replies if we have
 apply mode, but we only need to send apply messages if the lsn has
 changed because of a commit. That will considerably reduce the
 messages sent so I don't see a problem.

 You mean to change the meaning of apply_location? Currently it indicates
 the end + 1 of the last replayed WAL record, regardless of whether it's
 a commit record or not. So too many replies can be sent per incoming
 message because it might contain many WAL records. But you mean to
 change apply_location only when a commit record is replayed?

There is no change to the meaning of apply_location. The only change
is that we send that message only when it has an updated value of
committed lsn.
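
As an illustration of this rule, here is a minimal Python sketch. It is not taken from the patch: WalRecord, ApplyReplyFilter and their fields are hypothetical names, LSNs are plain integers, and it assumes the reply carries apply_location as it stands when the reply is sent.

```python
from dataclasses import dataclass


@dataclass
class WalRecord:
    end_lsn: int      # position just past this record
    is_commit: bool


class ApplyReplyFilter:
    """Decide when the standby should send an apply reply."""

    def __init__(self):
        self.apply_location = 0       # end + 1 of last replayed record
        self.last_commit_lsn = 0      # end of last replayed commit record
        self.reported_commit_lsn = 0  # commit LSN covered by the last reply

    def replay(self, rec):
        # apply_location keeps its meaning: it moves on every record.
        self.apply_location = rec.end_lsn
        if rec.is_commit:
            self.last_commit_lsn = rec.end_lsn

    def reply_due(self):
        # Reply only if a commit has been replayed since the last reply.
        if self.last_commit_lsn > self.reported_commit_lsn:
            self.reported_commit_lsn = self.last_commit_lsn
            return True
        return False


flt = ApplyReplyFilter()
sent = []
for rec in (WalRecord(100, False), WalRecord(200, False),
            WalRecord(300, True), WalRecord(400, False)):
    flt.replay(rec)
    if flt.reply_due():
        sent.append(flt.apply_location)
print(sent, flt.apply_location)  # [300] 400
```

In this run the last reported value (300) trails the locally replayed position (400) until the next commit or until the regular per-message reply is sent.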

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-22 Thread Simon Riggs
On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

I notice that the Apply mode isn't fully implemented. I had in mind
that you would add the latch required to respond more quickly when
only the Apply pointer has changed.

Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
was there another reason for not implementing that?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-20 Thread Simon Riggs
On Mon, Jan 16, 2012 at 4:17 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Done. Attached is the updated version of the patch.

 Thanks.

 I'll review this first, but can't start immediately. Please expect
 something back in 2 days.

On initial review this looks fine.

I'll do a more thorough hands-on review now and commit if still OK.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-16 Thread Fujii Masao
On Fri, Jan 13, 2012 at 9:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Thought? Comments?

 This is almost exactly the same as my patch series
 syncrep_queues.v[1,2].patch earlier this year. Which I know because
 I was updating that patch myself last night for 9.2. I'm about half
 way through doing that, since you and I agreed in Ottawa I would do
 this. Perhaps it is better if we work together?

 I think this comment is mostly pointless. We don't have time to work
 together and there's no real reason to. You know what you're doing, so
 I'll leave you to do it.

 Please add the Apply mode.

 OK, will do.

Done. Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 1559,1567  SET ENABLE_SEQSCAN TO OFF;
 <para>
  Specifies whether transaction commit will wait for WAL records
  to be written to disk before the command returns a <quote>success</>
! indication to the client.  Valid values are <literal>on</>,
! <literal>local</>, and <literal>off</>.  The default, and safe, value
! is <literal>on</>.  When <literal>off</>, there can be a delay between
  when success is reported to the client and when the transaction is
  really guaranteed to be safe against a server crash.  (The maximum
  delay is three times <xref linkend="guc-wal-writer-delay">.)  Unlike
--- 1559,1567 
 <para>
  Specifies whether transaction commit will wait for WAL records
  to be written to disk before the command returns a <quote>success</>
! indication to the client.  Valid values are <literal>on</>, <literal>write</>,
! <literal>apply</>, <literal>local</>, and <literal>off</>.  The default, and safe,
! value is <literal>on</>.  When <literal>off</>, there can be a delay between
  when success is reported to the client and when the transaction is
  really guaranteed to be safe against a server crash.  (The maximum
  delay is three times <xref linkend="guc-wal-writer-delay">.)  Unlike
***
*** 1579,1589  SET ENABLE_SEQSCAN TO OFF;
  If <xref linkend="guc-synchronous-standby-names"> is set, this
  parameter also controls whether or not transaction commit will wait
  for the transaction's WAL records to be flushed to disk and replicated
! to the standby server.  The commit wait will last until a reply from
! the current synchronous standby indicates it has written the commit
! record of the transaction to durable storage.  If synchronous
  replication is in use, it will normally be sensible either to wait
! both for WAL records to reach both the local and remote disks, or
  to allow the transaction to commit asynchronously.  However, the
  special value <literal>local</> is available for transactions that
  wish to wait for local flush to disk, but not synchronous replication.
--- 1579,1600 
  If <xref linkend="guc-synchronous-standby-names"> is set, this
  parameter also controls whether or not transaction commit will wait
  for the transaction's WAL records to be flushed to disk and replicated
! to the standby server.  When <literal>on</>, the commit wait will last
! until a reply from the current synchronous standby indicates it has flushed
! the commit record of the transaction to durable storage. This will
! avoids any data loss unless the database cluster of both primary and
! standby gets corrupted simultaneously. When <literal>write</>,
! the commit wait will last until a reply from the current synchronous
! standby indicates it has received the commit record of the transaction
! to memory. Normally this causes no data loss at the time of failover.
! However, if both primary and standby crash, and the database cluster of
! the primary gets corrupted, recent committed transactions might
! be lost. When <literal>apply</>, the commit will wait until the current
! synchronous standby has replayed the committed changes successfully.
! This guarantees that any transactions are visible on the synchronous
! standby when they are committed. If synchronous
  replication is in use, it will normally be sensible either to wait
! for both local flush and replication of WAL records, or
  to allow the transaction to commit asynchronously.  However, the
  special value <literal>local</> is available for transactions that
  wish to wait for local flush to disk, but not synchronous replication.
*** 

Re: [HACKERS] New replication mode: write

2012-01-16 Thread Simon Riggs
On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Done. Attached is the updated version of the patch.

Thanks.

I'll review this first, but can't start immediately. Please expect
something back in 2 days.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New replication mode: write

2012-01-13 Thread Fujii Masao
On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Thought? Comments?

 This is almost exactly the same as my patch series
 syncrep_queues.v[1,2].patch earlier this year. Which I know because
 I was updating that patch myself last night for 9.2. I'm about half
 way through doing that, since you and I agreed in Ottawa I would do
 this. Perhaps it is better if we work together?

 I think this comment is mostly pointless. We don't have time to work
 together and there's no real reason to. You know what you're doing, so
 I'll leave you to do it.

 Please add the Apply mode.

OK, will do.

 In my patch, the reason I avoided doing WRITE mode (which we had
 previously referred to as RECV) was that no fsync of the WAL contents
 takes place. In that case we are applying changes using un-fsynced WAL
 data and in case of crash this would cause a problem.

My patch has not changed the execution order of WAL flush and replay.
WAL records are always replayed after they are flushed by walreceiver.
So, such a problem doesn't happen.

But this means that a transaction might need to wait for a WAL flush caused
by a previous transaction even if WRITE mode is chosen. That limits the
performance gain from WRITE mode and should be improved later, I think.

 I was going to
 make the WalWriter available during recovery to cater for that. Do you
 not think that is no longer necessary?

That's still necessary to improve the performance in sync rep further, I think.
What I'd like to do (maybe in 9.3dev) after supporting WRITE mode is:

* Allow WAL records to be replayed before they are flushed to the disk.
* Add a new GUC parameter specifying whether to allow the standby to defer
   WAL flush. If the parameter is false, walreceiver flushes WAL whenever it
   receives WAL (i.e., the same as the current behavior). If true, walreceiver
   doesn't flush WAL at all. Instead, walwriter, a backend or the startup
   process does that. Walwriter periodically checks whether there is an
   unflushed WAL file and flushes it if one exists. When a buffer page is
   written out, the backend or startup process forces a WAL flush up to the
   buffer's LSN.

If the above GUC parameter is set to true (i.e., walreceiver doesn't flush
WAL at all) and WRITE mode is chosen, a transaction doesn't need to wait
for WAL flush on the standby at all. Also the frequency of WAL flush on
the standby would become lower, which significantly reduces I/O load.
As a result, sync rep performance would improve considerably.
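
To make the proposal concrete, here is a toy Python model of the flush paths described above. It is a sketch only: the class StandbyWal, the defer_flush flag (standing in for the new GUC, which is not named in this mail) and the method names are all assumptions made for illustration, and LSNs are plain integers.

```python
class StandbyWal:
    """Toy model of who flushes WAL on the standby."""

    def __init__(self, defer_flush):
        self.defer_flush = defer_flush  # hypothetical GUC from the proposal
        self.write_lsn = 0              # WAL written (not necessarily flushed)
        self.flush_lsn = 0              # WAL known to be on disk
        self.flush_calls = 0            # number of flushes actually issued

    def _flush(self, upto):
        if upto > self.flush_lsn:
            self.flush_lsn = upto
            self.flush_calls += 1

    def walreceiver_receive(self, end_lsn):
        self.write_lsn = end_lsn
        if not self.defer_flush:
            self._flush(end_lsn)        # current behavior: flush on receipt

    def walwriter_tick(self):
        self._flush(self.write_lsn)     # periodic flush of unflushed WAL

    def write_buffer_page(self, page_lsn):
        self._flush(page_lsn)           # WAL must reach disk before the page


for defer in (False, True):
    wal = StandbyWal(defer_flush=defer)
    for lsn in range(100, 1100, 100):   # ten incoming WAL messages
        wal.walreceiver_receive(lsn)
    wal.walwriter_tick()
    print(defer, wal.flush_calls, wal.flush_lsn)
```

With ten incoming messages the model issues ten flushes when flushing on receipt and a single flush when it is deferred to the walwriter, which is the I/O reduction being argued for.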

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] New replication mode: write

2012-01-13 Thread Simon Riggs
On Fri, Jan 13, 2012 at 12:27 PM, Fujii Masao masao.fu...@gmail.com wrote:

 In my patch, the reason I avoided doing WRITE mode (which we had
 previously referred to as RECV) was that no fsync of the WAL contents
 takes place. In that case we are applying changes using un-fsynced WAL
 data and in case of crash this would cause a problem.

 My patch has not changed the execution order of WAL flush and replay.
 WAL records are always replayed after they are flushed by walreceiver.
 So, such a problem doesn't happen.

 But which means that transaction might need to wait for WAL flush caused
 by previous transaction even if WRITE mode is chosen. Which limits the
 performance gain by WRITE mode, and should be improved later, I think.

If the WALreceiver still flushes that is OK.

The latency would be smoother and lower if the WALwriter were active.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] New replication mode: write

2012-01-12 Thread Fujii Masao
Hi,

http://archives.postgresql.org/message-id/AANLkTilgyL3Y1jkDVHX02433COq7JLmqicsqmOsbuyA1%40mail.gmail.com

Previously I proposed the replication mode recv on the above thread, but it
has not been committed yet. Now I'd like to propose that mode again because
it's useful for reducing the overhead of synchronous replication. The attached
patch implements that mode.

If you choose that mode, a transaction waits for its WAL to be write()'d on
the standby, IOW, waits until the standby saves the WAL in memory. This
provides a lower level of durability than current synchronous replication
(i.e., a transaction waits for its WAL to be flushed to disk) does. However,
it's a practically useful setting because it can decrease the response time
of transactions, and it causes no data loss unless both the master and the
standby crash and the database of the master gets corrupted at the same time.

In the patch, you can choose that mode by setting synchronous_commit to write.
I renamed the mode from recv to write on the basis of its actual behavior.
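
The difference between the modes can be summarized in a small Python sketch. This is an illustration of the semantics described in this mail, not code from the patch: the function commit_may_return and its parameters are hypothetical, LSNs are plain integers, and a synchronous standby is assumed to be configured.

```python
def commit_may_return(mode, commit_lsn, local_flush, standby_write, standby_flush):
    """Return True once a commit at commit_lsn may report success."""
    if mode == "off":
        return True                          # no wait at all
    if local_flush < commit_lsn:
        return False                         # local WAL flush still pending
    if mode == "local":
        return True                          # local flush only, no sync rep wait
    if mode == "write":
        return standby_write >= commit_lsn   # standby has write()'d the WAL
    if mode == "on":
        return standby_flush >= commit_lsn   # standby has flushed the WAL
    raise ValueError(f"unknown synchronous_commit mode: {mode}")


# The standby has written WAL up to LSN 500 but flushed only up to 400.
for mode in ("off", "local", "write", "on"):
    print(mode, commit_may_return(mode, 450, 450, 500, 400))
```

For a commit at LSN 450 in this state, write already lets the commit return while on still has to wait for the standby's flush, which is where the response-time gain comes from.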

I measured how much write mode improves the performance in synchronous
replication. Here is the result:

synchronous_commit = on
tps = 424.510843 (including connections establishing)
tps = 420.767883 (including connections establishing)
tps = 419.715658 (including connections establishing)
tps = 428.810001 (including connections establishing)
tps = 337.341445 (including connections establishing)

synchronous_commit = write
tps = 550.752712 (including connections establishing)
tps = 407.104036 (including connections establishing)
tps = 455.576190 (including connections establishing)
tps = 453.548672 (including connections establishing)
tps = 555.171325 (including connections establishing)

I used pgbench (scale factor = 100) as a benchmark and ran the
following command.

pgbench -c 8 -j 8 -T 60 -M prepared

I always ran CHECKPOINT on both master and standby before starting each pgbench
test, to prevent CHECKPOINT from affecting the result of the performance test.
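
For a quick summary of the ten runs listed above, the figures can be aggregated with a few lines of Python. The script is only a convenience for the reader and was not part of the original measurement; the variable names are arbitrary.

```python
from statistics import mean, median

# tps values copied from the results above (five runs per setting)
tps_on = [424.510843, 420.767883, 419.715658, 428.810001, 337.341445]
tps_write = [550.752712, 407.104036, 455.576190, 453.548672, 555.171325]

for name, runs in (("on", tps_on), ("write", tps_write)):
    print(f"{name:5s} mean={mean(runs):.1f} median={median(runs):.1f}")

print(f"mean ratio   write/on = {mean(tps_write) / mean(tps_on):.3f}")
print(f"median ratio write/on = {median(tps_write) / median(tps_on):.3f}")
```

On these numbers the mean is roughly 406 tps for on and 484 tps for write (about 19% higher), while the medians are about 421 and 456 tps (about 8% higher). With only five runs per setting and visible run-to-run variation, the gain is best read as approximate.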

Thoughts? Comments?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 1559,1565  SET ENABLE_SEQSCAN TO OFF;
 <para>
  Specifies whether transaction commit will wait for WAL records
  to be written to disk before the command returns a <quote>success</>
! indication to the client.  Valid values are <literal>on</>,
  <literal>local</>, and <literal>off</>.  The default, and safe, value
  is <literal>on</>.  When <literal>off</>, there can be a delay between
  when success is reported to the client and when the transaction is
--- 1559,1565 
 <para>
  Specifies whether transaction commit will wait for WAL records
  to be written to disk before the command returns a <quote>success</>
! indication to the client.  Valid values are <literal>on</>, <literal>write</>,
  <literal>local</>, and <literal>off</>.  The default, and safe, value
  is <literal>on</>.  When <literal>off</>, there can be a delay between
  when success is reported to the client and when the transaction is
***
*** 1579,1589  SET ENABLE_SEQSCAN TO OFF;
  If <xref linkend="guc-synchronous-standby-names"> is set, this
  parameter also controls whether or not transaction commit will wait
  for the transaction's WAL records to be flushed to disk and replicated
! to the standby server.  The commit wait will last until a reply from
! the current synchronous standby indicates it has written the commit
! record of the transaction to durable storage.  If synchronous
  replication is in use, it will normally be sensible either to wait
! both for WAL records to reach both the local and remote disks, or
  to allow the transaction to commit asynchronously.  However, the
  special value <literal>local</> is available for transactions that
  wish to wait for local flush to disk, but not synchronous replication.
--- 1579,1597 
  If <xref linkend="guc-synchronous-standby-names"> is set, this
  parameter also controls whether or not transaction commit will wait
  for the transaction's WAL records to be flushed to disk and replicated
! to the standby server.  When <literal>write</>, the commit wait will
! last until a reply from the current synchronous standby indicates
! it has received the commit record of the transaction to memory.
! Normally this causes no data loss at the time of failover. However,
! if both primary and standby crash, and the database cluster of
! the primary gets corrupted, recent committed transactions might
! be lost. When <literal>on</>,  the commit wait will last until a reply
! from the current synchronous standby indicates it has flushed
! the commit record of the