On Fri, Sep 12, 2014 at 1:13 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> OK. I see your point.
>
> Now, what about the following assumptions (somewhat restrictions to
> facilitate the user experience for setting syncrep and the
> parametrization of this feature):
> - Nodes are defined within the same set (or group) if they have the
> same priority, aka the same application_name.
> - One node cannot be a part of two sets. That's obvious...

I feel pretty strongly that we should encourage people to use a
different application_name for every server. The fact that a server
is interchangeable for one purpose does not mean that it's
interchangeable for all purposes; let's try to keep application_name
as the identifier for a server, and design the other facilities we
need around that.

> The current patch has its own merit, but it fails in the case you and
> Heikki are describing: wait for k nodes in set 1 (nodes with lowest
> priority value), l nodes in set 2 (nodes with priority 2nd lowest
> priority value), etc.
> What is does is, if for example we have a set of nodes with priorities
> {0,1,1,2,2,3,3}, backends will wait for flush_position from the first
> s_s_num nodes. By setting s_s_num to 3, we'll wait for {0,1,1}, to 2
> {0,1,1,2}, etc.
>
> Now what about that: instead of waiting for the nodes in "absolute"
> order like the way current patch does, let's do it in a "relative"
> way. By that I mean that a backend waits for flush_position
> confirmation only from *1* node among a set of nodes having the same
> priority. So by using s_s_num = 3, we'll wait for {0, "one node with
> 1", "one node with 2"}, and you can guess the rest.
>
> The point is as well that we can keep s_s_num behavior as it is now:
> - if set at -1, we rely on the default way of doing with s_s_names
> (empty means all nodes async, at least one entry meaning that we need
> to wait for a node)
> - if set at 0, all nodes are forced to be async'd
> - if set at n > 1, we have to wait for one node in each set of the
> N-lowest priority values.
> I'd see enough users happy with those improvements, and that would
> help improving the coverage of test cases that Heikki and you
> envisioned.

Sounds confusing. I hate to be the guy always suggesting a
mini-language (cf. recent discussion of an expression syntax for
pgbench), but we could do much more powerful and flexible things here
if we had one.

For example, suppose we let each element of synchronous_standby_names
use the constructs (X,Y,Z,...) [meaning one of the parenthesized
servers] and N(X,Y,Z,...) [meaning N of the parenthesized servers].
Then if you want to consider a commit acknowledged when you have any
two of foo, bar, and baz you can write:

synchronous_standby_names = 2(foo,bar,baz)

And if you want to acknowledge when you've got either foo or both bar
and baz, you can write:

synchronous_standby_names = (foo,2(bar,baz))

And if you want one of foo and bar and one of baz and bletch, you can
write:

synchronous_standby_names = 2((foo,bar),(baz,bletch))

The crazy-complicated policy I mentioned upthread would be:

synchronous_standby_names = (a,2((2(b,c),2(d,e)),f))

or (equivalently and simpler)

synchronous_standby_names = (a,3(b,c,f),3(d,e,f))

We could have a rule that we fall back to the next rule in
synchronous_standby_names when the first rule can never be satisfied
by the connected standbys. For example, if you have foo, bar, and
baz, and you want any two of the three, but wish to prefer waiting for
foo over the others when it's connected, then you could write:

synchronous_standby_names = 2(foo,2(bar,baz)), 2(bar, baz)

If foo disconnects, the first rule can never be met, so we use the
second rule. It's still 2 out of 3, just as if we'd written
2(foo,bar,baz), but we won't accept an ack from bar and baz as
sufficient unless foo is dead.

The exact syntax here is of course debatable; maybe somebody can come
up with something better. But it doesn't seem like it would be
incredibly painful to implement, and it would give us a lot of
flexibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
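
To make the semantics of the proposed syntax concrete, here is a minimal Python sketch of one way it could be parsed and evaluated. It is an illustration only, not code from any patch. The function names (parse_rules, satisfied, active_rule, commit_acknowledged) are hypothetical, and the sketch makes several assumptions: standby names do not start with a digit; a top-level comma separates fallback rules; a rule "can never be satisfied" when it would fail even if every connected standby acknowledged; and a commit is not acknowledged when no rule is satisfiable, a case the message above does not specify.

```python
import re

# A token is a count, a standby name, or one of "(", ")", ",".
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[(),])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_rules(text):
    """Parse a setting into a list of rules, in fallback order.

    A node is either a standby name (str) or a pair (need, members),
    meaning at least `need` of `members` must be satisfied.
    """
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        tok = take()
        if tok.isdigit():        # N(X,Y,...): N of the listed members
            need = int(tok)
            take("(")
        elif tok == "(":         # (X,Y,...): any one of the listed members
            need = 1
        elif tok in (",", ")"):
            raise ValueError(f"unexpected {tok!r}")
        else:
            return tok           # plain standby name
        members = [expr()]
        while peek() == ",":
            take(",")
            members.append(expr())
        take(")")
        return (need, members)

    rules = [expr()]
    while peek() == ",":         # top-level commas separate fallback rules
        take(",")
        rules.append(expr())
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return rules


def satisfied(node, acked):
    """True if the set of standbys `acked` satisfies `node`."""
    if isinstance(node, str):
        return node in acked
    need, members = node
    return sum(satisfied(m, acked) for m in members) >= need


def active_rule(rules, connected):
    """First rule that the connected standbys could still satisfy."""
    for rule in rules:
        if satisfied(rule, connected):
            return rule
    return None


def commit_acknowledged(setting, connected, acked):
    connected = set(connected)
    rule = active_rule(parse_rules(setting), connected)
    return rule is not None and satisfied(rule, set(acked) & connected)
```

For instance, commit_acknowledged("2(foo,bar,baz)", {"foo", "bar", "baz"}, {"foo", "baz"}) returns True, while the same call with only {"foo"} acknowledged returns False. With the fallback example, commit_acknowledged("2(foo,2(bar,baz)), 2(bar, baz)", {"bar", "baz"}, {"bar", "baz"}) returns True because foo is disconnected and the second rule applies; with foo connected, the same acknowledgements return False.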