On Fri, Sep 12, 2014 at 1:13 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> OK. I see your point.
>
> Now, what about the following assumptions (somewhat restrictions to
> facilitate the user experience for setting syncrep and the
> parametrization of this feature):
> - Nodes are defined within the same set (or group) if they have the
> same priority, aka the same application_name.
> - One node cannot be a part of two sets. That's obvious...

I feel pretty strongly that we should encourage people to use a
different application_name for every server.  The fact that a server
is interchangeable for one purpose does not mean that it's
interchangeable for all purposes; let's try to keep application_name
as the identifier for a server, and design the other facilities we
need around that.

> The current patch has its own merit, but it fails in the case you and
> Heikki are describing: wait for k nodes in set 1 (nodes with lowest
> priority value), l nodes in set 2 (nodes with priority 2nd lowest
> priority value), etc.
> What it does is, if for example we have a set of nodes with priorities
> {0,1,1,2,2,3,3}, backends will wait for flush_position from the first
> s_s_num nodes. By setting s_s_num to 3, we'll wait for {0,1,1}, to 2
> {0,1,1,2}, etc.
>
> Now what about that: instead of waiting for the nodes in "absolute"
> order like the way current patch does, let's do it in a "relative"
> way. By that I mean that a backend waits for flush_position
> confirmation only from *1* node among a set of nodes having the same
> priority. So by using s_s_num = 3, we'll wait for {0, "one node with
> 1", "one node with 2"}, and you can guess the rest.
>
> The point is as well that we can keep s_s_num behavior as it is now:
> - if set at -1, we rely on the default way of doing with s_s_names
> (empty means all nodes async, at least one entry meaning that we need
> to wait for a node)
> - if set at 0, all nodes are forced to be async'd
> - if set at n > 1, we have to wait for one node in each set of the
> N-lowest priority values.
> I'd see enough users happy with those improvements, and that would
> help improving the coverage of test cases that Heikki and you
> envisioned.

Sounds confusing.  I hate to be the guy always suggesting a
mini-language (cf. recent discussion of an expression syntax for
pgbench), but we could do much more powerful and flexible things here
if we had one.  For example, suppose we let each element of
synchronous_standby_names use the constructs (X,Y,Z,...) [meaning one
of the parenthesized servers] and N(X,Y,Z,...) [meaning N of the
parenthesized servers].  Then if you want to consider a commit
acknowledged when you have any two of foo, bar, and baz, you can write:

synchronous_standby_names = 2(foo,bar,baz)

And if you want to acknowledge when you've got either foo or both bar
and baz, you can write:

synchronous_standby_names = (foo,2(bar,baz))

And if you want one of foo and bar and one of baz and bletch, you can write:

synchronous_standby_names = 2((foo,bar),(baz,bletch))

The crazy-complicated policy I mentioned upthread would be:

synchronous_standby_names = (a,2((2(b,c),2(d,e)),f))
or (equivalently, and more simply)
synchronous_standby_names = (a,3(b,c,f),3(d,e,f))
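
To make the intended semantics concrete, here is a rough sketch of a parser and evaluator for the syntax above, written in Python purely for illustration.  Nothing in it is existing PostgreSQL code: the function names (tokenize, parse, satisfied, equivalent) and the tuple representation are assumptions of the sketch, and it assumes standby names are plain identifiers.  It also brute-forces the claim that the two spellings of the complicated policy accept exactly the same sets of acknowledgements.

```python
import re
from itertools import combinations

# A token is a count, a standby name, or one of "(", ")", ",".
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[(),])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse one rule into a name (str) or a (need, members) tuple."""
    tokens = tokenize(text)
    node, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return node


def _parse_expr(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    need = 1  # a bare (X,Y,...) group means "any one of"
    if tok.isdigit():
        need = int(tok)
        pos += 1
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("a count must be followed by '('")
        tok = tokens[pos]
    if tok == "(":
        pos += 1
        members = []
        while True:
            member, pos = _parse_expr(tokens, pos)
            members.append(member)
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[pos] == ")":
                return (need, members), pos + 1
            if tokens[pos] != ",":
                raise ValueError("expected ',' or ')'")
            pos += 1
    if tok in (",", ")"):
        raise ValueError(f"unexpected {tok!r}")
    return tok, pos + 1


def satisfied(node, acked):
    """True if the set of standbys that have acked meets the rule."""
    if isinstance(node, str):
        return node in acked
    need, members = node
    return sum(satisfied(m, acked) for m in members) >= need


def equivalent(rule_a, rule_b, names):
    """Compare two rules over every subset of the given standby names."""
    a, b = parse(rule_a), parse(rule_b)
    return all(
        satisfied(a, set(subset)) == satisfied(b, set(subset))
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    )
```

For example, satisfied(parse("(foo,2(bar,baz))"), {"bar"}) is false while the same call with {"bar", "baz"} is true, and equivalent("(a,2((2(b,c),2(d,e)),f))", "(a,3(b,c,f),3(d,e,f))", "abcdef") comes out true.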

We could have a rule that we fall back to the next rule in
synchronous_standby_names when the first rule can never be satisfied
by the connected standbys.  For example, if you have foo, bar, and
baz, and you want any two of the three, but wish to prefer waiting for
foo over the others when it's connected, then you could write:

synchronous_standby_names = 2(foo,(bar,baz)), 2(bar,baz)

If foo disconnects, the first rule can never be met, so we use the
second rule.  It's still 2 out of 3, just as if we'd written
2(foo,bar,baz) but we won't accept an ack from bar and baz as
sufficient unless foo is dead.
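
A rough sketch of how that fallback could be evaluated, again in Python and purely illustrative.  Rules are written directly as nested (need, members) tuples rather than parsed, and the names active_rule and commit_acknowledged are made up for the sketch.  Two assumptions are baked in: the first rule is read as "foo plus one of bar and baz", which is the reading under which the policy stays 2 out of 3, and if no rule can be satisfied by the connected standbys the commit simply is not acknowledged (the proposal does not say what should happen in that case).

```python
def satisfied(node, standbys):
    """True if the given set of standbys meets the rule."""
    if isinstance(node, str):
        return node in standbys
    need, members = node
    return sum(satisfied(m, standbys) for m in members) >= need


def active_rule(rules, connected):
    """Return the first rule the connected standbys could still satisfy.

    A rule "can never be satisfied" when it fails even if every
    connected standby acknowledges.
    """
    for rule in rules:
        if satisfied(rule, connected):
            return rule
    return None


def commit_acknowledged(rules, connected, acked):
    """Check acks against the active rule (assumption: no rule, no ack)."""
    rule = active_rule(rules, connected)
    return rule is not None and satisfied(rule, acked & connected)


# foo plus one of bar/baz; fall back to bar and baz if foo is gone.
RULES = [
    (2, ["foo", (1, ["bar", "baz"])]),
    (2, ["bar", "baz"]),
]
```

With all three standbys connected, commit_acknowledged(RULES, {"foo", "bar", "baz"}, {"bar", "baz"}) is false because the first rule is still in force and foo has not acked.  Once foo drops out of the connected set, the same acks from bar and baz are accepted under the second rule.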

The exact syntax here is of course debatable; maybe somebody can come up
with something better.  But it doesn't seem like it would be
incredibly painful to implement, and it would give us a lot of
flexibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
