Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-12 Thread Heikki Linnakangas

On 12/12/2014 04:29 AM, Michael Paquier wrote:

On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


I propose the attached (I admit I haven't tested it).


Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to 1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.


Ok, committed.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-12 Thread Michael Paquier
On Fri, Dec 12, 2014 at 9:38 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 12/12/2014 04:29 AM, Michael Paquier wrote:

 On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 I propose the attached (I admit I haven't tested it).

 Actually if you do it this way I think that it would be worth adding the
 small optimization Fujii-san mentioned upthread: if priority is equal to
 1,
 we leave the loop earlier and return immediately the pointer. All those
 things gathered give the patch attached, that I actually tested FWIW with
 multiple standbys and multiple entries in s_s_names.


 Ok, committed.
Thanks!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-11 Thread Heikki Linnakangas

On 11/18/2014 11:23 PM, Michael Paquier wrote:

On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs si...@2ndquadrant.com wrote:


Can we just wait on this patch until we have the whole feature?


Well, this may take some time to even define, and even if goals are clearly
defined this may take even more time to have a prototype to discuss about.



We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.


Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove a code duplication between syncrep.c and walsender,c that
exists since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.


Some comments on this patch:

* I don't like filling the priorities-array in 
SyncRepGetSynchronousStandby(). It might be convenient for the one 
caller that needs it, but it seems pretty ad hoc.


In fact, I don't really see the point of having the array at all. You 
could just print the priority in the main loop in 
pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock 
while reading the priority, but it seems over-zealous to be so strict 
about that in pg_stat_wal_senders(), since it's just an informational 
view, and priority changes so rarely that it's highly unlikely to hit a 
race condition there. Also note that when you change the list of 
synchronous standbys in the config file, and SIGHUP, the walsender 
processes will update their priorities in random order. So the idea that 
you get a consistent snapshot of the priorities is a bit bogus anyway. 
Also note that between filling the priorities array and the main loop in 
pg_stat_get_wal_senders(), a new WAL sender might connect and set a 
slot's pid. With the current coding, you'll print an uninitialized value 
from the array if that happens. So the only thing that holding the 
SyncRepLock really protects from is a torn read of the value, if 
reading an int is not atomic. We could use the spinlock to protect from 
that if we care.


* Would be nicer to return a pointer, than index into the wal senders 
array. That's what the caller really wants.


I propose the attached (I admit I haven't tested it).

- Heikki

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..da89a3b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -358,6 +358,54 @@ SyncRepInitConfig(void)
 }
 
 /*
+ * Find the WAL sender servicing the synchronous standby with the lowest
+ * priority value, or NULL if no synchronous standby is connected. If there
+ * are multiple nodes with the same lowest priority value, the first node
+ * found is selected. The caller must hold SyncRepLock.
+ */
+WalSnd *
+SyncRepGetSynchronousStandby(void)
+{
+	WalSnd	   *result = NULL;
+	int			result_priority = 0;
+	int			i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i  max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = WalSndCtl-walsnds[i];
+		int			this_priority;
+
+		/* Must be active */
+		if (walsnd-pid == 0)
+			continue;
+
+		/* Must be streaming */
+		if (walsnd-state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Must be synchronous */
+		this_priority = walsnd-sync_standby_priority;
+		if (this_priority == 0)
+			continue;
+
+		/* Must have a lower priority value than any previous ones */
+		if (result != NULL  result_priority = this_priority)
+			continue;
+
+		/* Must have a valid flush position */
+		if (XLogRecPtrIsInvalid(walsnd-flush))
+			continue;
+
+		result = (WalSnd *) walsnd;
+		result_priority = this_priority;
+	}
+
+	return result;
+}
+
+/*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
  *
@@ -368,11 +416,9 @@ void
 SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
-	volatile WalSnd *syncWalSnd = NULL;
+	WalSnd	   *syncWalSnd;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +434,13 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, 

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-11 Thread Michael Paquier
On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 * I don't like filling the priorities-array in
 SyncRepGetSynchronousStandby(). It might be convenient for the one caller
 that needs it, but it seems pretty ad hoc.

 In fact, I don't really see the point of having the array at all. You
 could just print the priority in the main loop in
 pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock
 while reading the priority, but it seems over-zealous to be so strict about
 that in pg_stat_wal_senders(), since it's just an informational view, and
 priority changes so rarely that it's highly unlikely to hit a race
 condition there. Also note that when you change the list of synchronous
 standbys in the config file, and SIGHUP, the walsender processes will
 update their priorities in random order. So the idea that you get a
 consistent snapshot of the priorities is a bit bogus anyway. Also note that
 between filling the priorities array and the main loop in
 pg_stat_get_wal_senders(), a new WAL sender might connect and set a slot's
 pid. With the current coding, you'll print an uninitialized value from the
 array if that happens. So the only thing that holding the SyncRepLock
 really protects from is a torn read of the value, if reading an int is
 not atomic. We could use the spinlock to protect from that if we care.


That's fair, and it simplifies the whole process as well as the API able to
fetch the synchronous WAL sender.


 * Would be nicer to return a pointer, than index into the wal senders
 array. That's what the caller really wants.

 I propose the attached (I admit I haven't tested it).


Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to 1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.

Regards,
-- 
Michael
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..34530a0 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -358,6 +358,62 @@ SyncRepInitConfig(void)
 }
 
 /*
+ * Find the WAL sender servicing the synchronous standby with the lowest
+ * priority value, or NULL if no synchronous standby is connected. If there
+ * are multiple nodes with the same lowest priority value, the first node
+ * found is selected. The caller must hold SyncRepLock.
+ */
+WalSnd *
+SyncRepGetSynchronousStandby(void)
+{
+	WalSnd	   *result = NULL;
+	int			result_priority = 0;
+	int			i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i  max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = WalSndCtl-walsnds[i];
+		int			this_priority;
+
+		/* Must be active */
+		if (walsnd-pid == 0)
+			continue;
+
+		/* Must be streaming */
+		if (walsnd-state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Must be synchronous */
+		this_priority = walsnd-sync_standby_priority;
+		if (this_priority == 0)
+			continue;
+
+		/* Must have a lower priority value than any previous ones */
+		if (result != NULL  result_priority = this_priority)
+			continue;
+
+		/* Must have a valid flush position */
+		if (XLogRecPtrIsInvalid(walsnd-flush))
+			continue;
+
+		result = (WalSnd *) walsnd;
+		result_priority = this_priority;
+
+		/*
+		 * If priority is equal to 1, we are sure that there are no other
+		 * WAL senders that could be synchronous with a lower prioroty,
+		 * hence leave immediately with this one.
+		 */
+		if (this_priority == 1)
+			return result;
+	}
+
+	return result;
+}
+
+/*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
  *
@@ -368,11 +424,9 @@ void
 SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
-	volatile WalSnd *syncWalSnd = NULL;
+	WalSnd	   *syncWalSnd;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +442,13 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we use the first mentioned standbys.
 	 */

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-18 Thread Michael Paquier
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Can we just wait on this patch until we have the whole feature?

Well, this may take some time to even define, and even if goals are clearly
defined this may take even more time to have a prototype to discuss about.


 We quite often break larger patches down into smaller ones, but if
 they arrive in lots of small pieces its more difficult to understand
 and correct issues that are happening on the larger scale. Churning
 the same piece of code multiple times is rather mind numbing.

Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove a code duplication between syncrep.c and walsender,c that
exists since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.
-- 
Michael


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Simon Riggs
On 16 November 2014 12:07, Michael Paquier michael.paqu...@gmail.com wrote:

 Let's work
 the multiple node thing once we have a better spec of how to do it,
 visibly using a dedicated micro-language within s_s_names.

Hmm, please make sure that is a new post. That is easily something I
could disagree with, even though I support the need for more
functionality.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Michael Paquier
On Mon, Nov 17, 2014 at 10:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 16 November 2014 12:07, Michael Paquier michael.paqu...@gmail.com wrote:

 Let's work
 the multiple node thing once we have a better spec of how to do it,
 visibly using a dedicated micro-language within s_s_names.

 Hmm, please make sure that is a new post. That is easily something I
 could disagree with, even though I support the need for more
 functionality.
Sure. I am not really on that yet though :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Fujii Masao
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I'll send an updated patch tomorrow.

 Here are updated versions. I divided the patch into two for clarity,
 the first one is the actual refactoring patch, renaming
 SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
 like updating synchronous to sync in the comments as you mentioned)
 such as the namings have no conflicts.

 The second one updates the syncrep code, including WAL sender and WAL
 receiver, and its surroundings to use the term node instead of
 standby. This brings in the code the idea that a node using
 replication APIs is not necessarily a standby, making it more generic.
 This can be applied on top of the refactoring patch. If any other
 folks (Heikki or Fujii-san?) have comments about this idea feel free.

I've not read the patches yet. But while I was reading the code around
sync node detection, I thought that it might be good idea to treat the
node with priority as special. That is, if we find the active node with
the priority 1, we can immediately go out of the sync-node-detection-loop.

In many cases we can expect that the sync standby has the priority 1.
If yes, treating the priority 1 as special is good way to do, I think. Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Michael Paquier
On Tue, Nov 18, 2014 at 3:03 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I'll send an updated patch tomorrow.

 Here are updated versions. I divided the patch into two for clarity,
 the first one is the actual refactoring patch, renaming
 SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
 like updating synchronous to sync in the comments as you mentioned)
 such as the namings have no conflicts.

 The second one updates the syncrep code, including WAL sender and WAL
 receiver, and its surroundings to use the term node instead of
 standby. This brings in the code the idea that a node using
 replication APIs is not necessarily a standby, making it more generic.
 This can be applied on top of the refactoring patch. If any other
 folks (Heikki or Fujii-san?) have comments about this idea feel free.

 I've not read the patches yet. But while I was reading the code around
 sync node detection, I thought that it might be good idea to treat the
 node with priority as special. That is, if we find the active node with
 the priority 1, we can immediately go out of the sync-node-detection-loop.

 In many cases we can expect that the sync standby has the priority 1.
 If yes, treating the priority 1 as special is good way to do, I think. 
 Thought?

That would really save some cycles. Now we still need to fetch the
priority numbers of all nodes for pg_stat_get_wal_senders, so in this
case scanning all the entries in the WAL sender array is necessary.
This optimization is orthogonal to the refactoring patch btw, no?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-16 Thread Simon Riggs
On 31 October 2014 00:27, Michael Paquier michael.paqu...@gmail.com wrote:
 On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 If we stick with this version I'd argue it makes more sense to just stick
 the sync_node = and priority = statements into the if block and ditch the
 continue. /nitpick

 Let's go with the cleaner version then, I'd prefer code that can be read
 easily for this code path. Switching back is not much complicated either.

I like where you are headed with this feature set. I will do my best
to comment and review so we get something in 9.5.

Some review comments

* The new function calls them a Node rather than a Standby. That is a
good change, but it needs to be applied consistently, so we no longer
use the word Standby anywhere near the sync rep code. I'd prefer if we
continue to use the abbreviation sync for synchronous, since its less
typing and avoids spelling mistakes in code and comments.

* Rationale...
Robert Haas wrote
 Interestingly, the syncrep code has in some of its code paths the idea
 that a synchronous node is unique, while other code paths assume that
 there can be multiple synchronous nodes. If that's fine I think that
 it would be better to just make the code multiple-sync node aware, by
 having a single function call in walsender.c and syncrep.c that
 returns an integer array of WAL sender positions (WalSndCtl). as that
 seems more extensible long-term.

I thought this was the rationale for the patch, but it doesn't do
this. If you disagree with Robert, it would be best to say so now,
since I would guess he's expecting the patch to be doing as he asked.
Or perhaps I misunderstand.

* Multiple sync standbys were originally avoided because of the code
cost and complexity at ReleaseWaiters(). I'm very keen to keep the
code at that point very robust, so simplicity is essential. It would
also be useful to have some kind of micro-benchmark that allows us to
understand the overhead of various configs. Jim mentions he isn't sure
of the performance there. I don't see too much a problem with this
patch, but the later patches will require this, so we may as well do
this soon.

* We probably don't need a comment like /* Cleanup */ inserted above a
pfree. ;-)

I see this should get committed in next few days, even outside the CF.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-16 Thread Michael Paquier
Thanks for the comments!

On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 31 October 2014 00:27, Michael Paquier michael.paqu...@gmail.com wrote:
 On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 If we stick with this version I'd argue it makes more sense to just stick
 the sync_node = and priority = statements into the if block and ditch the
 continue. /nitpick

 Let's go with the cleaner version then, I'd prefer code that can be read
 easily for this code path. Switching back is not much complicated either.

 I like where you are headed with this feature set. I will do my best
 to comment and review so we get something in 9.5.

 Some review comments

 * The new function calls them a Node rather than a Standby. That is a
 good change, but it needs to be applied consistently, so we no longer
 use the word Standby anywhere near the sync rep code. I'd prefer if we
 continue to use the abbreviation sync for synchronous, since its less
 typing and avoids spelling mistakes in code and comments.
Yes I guess this makes sense.

 * Rationale...
 Robert Haas wrote
 Interestingly, the syncrep code has in some of its code paths the idea
 that a synchronous node is unique, while other code paths assume that
 there can be multiple synchronous nodes. If that's fine I think that
 it would be better to just make the code multiple-sync node aware, by
 having a single function call in walsender.c and syncrep.c that
 returns an integer array of WAL sender positions (WalSndCtl). as that
 seems more extensible long-term.

 I thought this was the rationale for the patch, but it doesn't do
 this. If you disagree with Robert, it would be best to say so now,
 since I would guess he's expecting the patch to be doing as he asked.
 Or perhaps I misunderstand.

This comment is originally from me :)
http://www.postgresql.org/message-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6vrqo5n_t...@mail.gmail.com
Changing with my older opinion, I wrote the patch on this thread with
in mind the idea to keep the code as simple as possible, so for now it
is better to still think that we have a single sync node. Let's work
the multiple node thing once we have a better spec of how to do it,
visibly using a dedicated micro-language within s_s_names.

 * Multiple sync standbys were originally avoided because of the code
 cost and complexity at ReleaseWaiters(). I'm very keen to keep the
 code at that point very robust, so simplicity is essential. It would
 also be useful to have some kind of micro-benchmark that allows us to
 understand the overhead of various configs. Jim mentions he isn't sure
 of the performance there. I don't see too much a problem with this
 patch, but the later patches will require this, so we may as well do
 this soon.
Yes I have been playing with that with the patch series of the
previous commit fest, with something not that much kludgy.

 * We probably don't need a comment like /* Cleanup */ inserted above a
 pfree. ;-)
Sure. I'll send an updated patch tomorrow.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Michael Paquier
Thanks for your review! (No worries for creating a new thread, I don't
mind as this is not a huge patch. I think you could have downloaded
the mbox from postgresql.org btw).

On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby j...@nasby.net wrote:
 SyncRepGetSynchronousNode():
 IMHO, the top comment should mention that if there are multiple standbys of
 the same priority it returns the first one.
OK. Corrected.

 This switches from using a single if() with multiple conditions 'd
 together to a bunch of if() continue's. I don't know if those will perform
 the same, and AFAIK this is pretty performance critical.
Well, we could still use the old notation with a single if(). That's
not much complicated to change.

 grammarZealotIn the comments, Process should be
 Proceed./grammarZealot
Noted. Thanks for correcting my Frenglish.

 pg_stat_get_wal_senders():
 The original version only loops through walsnds[] one time; the new code
 loops twice, both times while holding SyncRepLock(shared). This will block
 transaction commit if syncrep is enabled. That's not great, but presumably
 this function isn't going to be called terribly often, so maybe it's OK. (On
 a side note, perhaps we should add a warning to the documentation for
 pg_stat_replication warning not to hammer it...)
Hm. For code readability I did not really want to do so, but let's do
this: let's add a new argument in SyncRepGetSynchronousNode to
retrieve the priority of each WAL sender and do a single pass through
the array such as there is no performance difference.

Updated patch attached.
Regards,
-- 
Michael
From 1a02c982de0aeb2bd7dcf0bbf34605017619a417 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 99 +++--
 src/backend/replication/walsender.c | 36 +++---
 src/include/replication/syncrep.h   |  1 +
 3 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..70c799b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage to limit the scan
+ * of the WAL sender array to one pass, limiting the amount of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i  max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = WalSndCtl-walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd-flush) ?
+0 : walsnd-sync_standby_priority;
+
+		/*
+		 * Proceed immediately to next WAL sender if one of those
+		 * conditions is satisfied:
+		 * - Not active with PID equal to 0.
+		 * - Not streaming.
+		 * - Priority conditions not satisfied. If priority is not 0 it
+		 *   means that one potential synchronous node has already been
+		 *   found. The node selected needs as well to have the lowest
+		 *   priority in the list of WAL senders.
+		 * - Stream flush position is invalid.
+		 */
+		if (walsnd-pid == 0 ||
+			walsnd-state != WALSNDSTATE_STREAMING ||
+			walsnd-sync_standby_priority == 0 ||
+			(priority != 0 
+			 priority = walsnd-sync_standby_priority) ||
+			XLogRecPtrIsInvalid(walsnd-flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd-sync_standby_priority;
+	}

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Jim Nasby

On 10/30/14, 8:05 AM, Michael Paquier wrote:

This switches from using a single if() with multiple conditions 'd
together to a bunch of if() continue's. I don't know if those will perform
the same, and AFAIK this is pretty performance critical.

Well, we could still use the old notation with a single if(). That's
not much complicated to change.


I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what 
the compiler will do with it though.

If we stick with this version I'd argue it makes more sense to just stick the 
sync_node = and priority = statements into the if block and ditch the continue. 
/nitpick
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Michael Paquier
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 If we stick with this version I'd argue it makes more sense to just stick
 the sync_node = and priority = statements into the if block and ditch the
 continue. /nitpick

Let's go with the cleaner version then, I'd prefer code that can be read
easily for this code path. Switching back is not much complicated either.
-- 
Michael
From 00d85f046d187056a85c6d6dc82e9a79674b132d Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 101 ++--
 src/backend/replication/walsender.c |  36 +++--
 src/include/replication/syncrep.h   |   1 +
 3 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..848c783 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,72 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage to limit the scan
+ * of the WAL sender array to one pass, limiting the amount of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i  max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = WalSndCtl-walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd-flush) ?
+0 : walsnd-sync_standby_priority;
+
+		/* Proceed to next if not active */
+		if (walsnd-pid == 0)
+			continue;
+
+		/* Proceed to next if not streaming */
+		if (walsnd-state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Proceed to next one if asynchronous */
+		if (walsnd-sync_standby_priority == 0)
+			continue;
+
+		/* Proceed to next one if priority conditions not satisfied */
+		if (priority != 0 
+			priority = walsnd-sync_standby_priority)
+			continue;
+
+		/* Proceed to next one if flush position is invalid */
+		if (XLogRecPtrIsInvalid(walsnd-flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd-sync_standby_priority;
+	}
+
+	return sync_node;
+}
+
 /*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +435,9 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
+	int			sync_node;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +453,14 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we use the first mentioned standbys.
 	 */
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	sync_node = SyncRepGetSynchronousNode(NULL);
 
-	for (i = 0; i  max_wal_senders; i++)
-	{
-		/* use volatile pointer to prevent code rearrangement */
-		volatile WalSnd *walsnd = walsndctl-walsnds[i];
-
-		if (walsnd-pid != 0 
-			walsnd-state == WALSNDSTATE_STREAMING 
-			walsnd-sync_standby_priority  0 
-			(priority == 0 ||
-			 priority  walsnd-sync_standby_priority) 
-			!XLogRecPtrIsInvalid(walsnd-flush))
-		{

[HACKERS] Review of Refactoring code for sync node detection

2014-10-29 Thread Jim Nasby

Original message (sorry, don't have a copy to reply to :( ): 
http://www.postgresql.org/message-id/CAB7nPqThX=WvuGA1J-_CQ293dK3FmUivuYkNvHR0W5xjEc=o...@mail.gmail.com
Commitfest URL: https://commitfest.postgresql.org/action/patch_view?id=1579

Summary:
I'd like someone to chime in on the potential performance impact. Other than 
that, looks good modulo a few grammar fixes.

Patch applies cleanly and passes make check.

Searching for both SyncRepLock and sync_standby_priority I find no other code 
this patch needs to modify.

SyncRepGetSynchronousNode():
IMHO, the top comment should mention that if there are multiple standbys of the 
same priority it returns the first one.

This switches from using a single if() with multiple conditions 'd together 
to a bunch of if() continue's. I don't know if those will perform the same, and AFAIK 
this is pretty performance critical.

grammarZealotIn the comments, Process should be Proceed./grammarZealot

The original search logic in SyncRepReleaseWaiters is correctly copied and 
negated.

pg_stat_get_wal_senders():
The original version only loops through walsnds[] one time; the new code loops 
twice, both times while holding SyncRepLock(shared). This will block 
transaction commit if syncrep is enabled. That's not great, but presumably this 
function isn't going to be called terribly often, so maybe it's OK. (On a side 
note, perhaps we should add a warning to the documentation for 
pg_stat_replication warning not to hammer it...)

grammar
+   /* Get first the priorities on each standby as long as we hold a lock */
should be
+   /* First get the priority of each standby as long as we hold a lock */
or even just
+   /* First get the priority of each standby */
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers