On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> <michael.paqu...@gmail.com> wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>>> <michael.paqu...@gmail.com> wrote:
>>>>> Yes, please let's use the custom language, and let's not care of not
>>>>> more than 1 level of nesting so as it is possible to represent
>>>>> pg_stat_replication in a simple way for the user.
>>>>
>>>> "not" is used twice in this sentence in a way that renders me not able
>>>> to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single
> node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>

I agree with adding a new system catalog so that users can easily check
the replication status, and a group name will be needed for this.
What about adding the group name with ":" immediately after the set of
standbys, as follows?

2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]

Also, to show the sync replication status according to the
configuration, the view I'm thinking of has the following definition.

=# \d pg_synchronous_replication
     Column     |  Type   | Modifiers
----------------+---------+-----------
 name           | text    |
 sync_type      | text    |
 wait_num       | integer |
 sync_priority  | integer |
 sync_state     | text    |
 member         | text[]  |
 level          | integer |
 write_location | pg_lsn  |
 flush_location | pg_lsn  |
 apply_location | pg_lsn  |

- "name" : node name or group name, or "main" meaning the top-level node.
- "sync_type" : 'priority' or 'quorum' for a group node, otherwise NULL.
- "wait_num" : number of nodes/groups to wait for in this group.
- "sync_priority" : priority of the node/group within its group. The "main" node has "0".
  - A standby in a quorum group always has priority 1.
  - A standby in a priority group has a priority according to definition order.
- "sync_state" : 'sync', 'potential' or 'quorum'.
  - A standby in a quorum group is always 'quorum'.
  - A standby in a priority group is 'sync' or 'potential'.
- "member" : array of members for a group node, otherwise NULL.
- "level" : nesting level. The "main" node is level 0.
- "write/flush/apply_location" : LSN of the group/node, calculated according to the configuration.

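To make the flattening concrete, here is a rough Python sketch (not the patch itself, and not PostgreSQL code) that parses a setting like the one above and produces one row per group or node. Several things in it are my assumptions for illustration only: "[...]" is read as a priority group and "(...)" as a quorum group, a missing count means 1, an unnamed top-level group is called "main", and the names parse(), flatten() and Group are hypothetical. It fills only the static columns; sync_state and the LSN columns depend on the running standbys and are left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# A token is a name, a number, or one punctuation character.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[\[\](),:])")
# Assumption: '[' opens a priority group and '(' a quorum group.
BRACKETS = {"[": ("]", "priority"), "(": (")", "quorum")}


@dataclass
class Group:
    sync_type: str
    wait_num: int
    members: List[Union[str, "Group"]]
    name: Optional[str] = None


def tokenize(text: str) -> List[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(text: str) -> Group:
    tokens, pos = tokenize(text), 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def take_name() -> str:
        tok = take()
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise ValueError(f"expected a name, got {tok!r}")
        return tok

    def element() -> Union[str, Group]:
        tok = peek()
        if tok is not None and (tok.isdigit() or tok in BRACKETS):
            return group()
        return take_name()

    def group() -> Group:
        tok = peek()
        # Assumption: a group without an explicit count waits for 1 member.
        wait_num = int(take()) if tok is not None and tok.isdigit() else 1
        opener = take()
        if opener not in BRACKETS:
            raise ValueError(f"expected '[' or '(', got {opener!r}")
        closer, sync_type = BRACKETS[opener]
        members = [element()]
        while peek() == ",":
            take(",")
            members.append(element())
        take(closer)
        name = None
        if peek() == ":":
            take(":")
            name = take_name()
        return Group(sync_type, wait_num, members, name)

    top = group()
    if pos != len(tokens):
        raise ValueError(f"trailing input starting at {tokens[pos]!r}")
    if top.name is None:
        top.name = "main"
    return top


def flatten(grp: Group, level: int = 0, priority: int = 0, rows=None):
    """Depth-first walk emitting one dict per group and per node."""
    rows = [] if rows is None else rows
    # Nested groups are expected to be named, as in the setting above.
    members = [m if isinstance(m, str) else m.name for m in grp.members]
    rows.append(dict(name=grp.name, sync_type=grp.sync_type,
                     wait_num=grp.wait_num, sync_priority=priority,
                     member=members, level=level))
    for index, m in enumerate(grp.members, start=1):
        # Definition order in a priority group, always 1 in a quorum group.
        child_priority = index if grp.sync_type == "priority" else 1
        if isinstance(m, str):
            rows.append(dict(name=m, sync_type=None, wait_num=0,
                             sync_priority=child_priority, member=None,
                             level=level + 1))
        else:
            flatten(m, level + 1, child_priority, rows)
    return rows


if __name__ == "__main__":
    setting = "2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]"
    for row in flatten(parse(setting)):
        print(row)
```

Run on the setting above, the sketch yields one row for "main", one for each of its three members, and one for each standby inside "london" and "tokyo", in depth-first order. Because the bracket meaning is my assumption, the sync_type it reports for a group may not match what the final syntax decides.
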
When sync replication is set as above, the new system view shows:

=# select * from pg_stat_replication_group;
  name   | sync_type | wait_num | sync_priority | sync_state |          member           | level | write_location | flush_location | apply_location
---------+-----------+----------+---------------+------------+---------------------------+-------+----------------+----------------+----------------
 main    | priority  |        2 |             0 | sync       | {local,london,tokyo}      |     0 |                |                |
 local   |           |        0 |             1 | sync       |                           |     1 |                |                |
 london  | quorum    |        2 |             2 | potential  | {london1,london2,london3} |     1 |                |                |
 london1 |           |        0 |             1 | potential  |                           |     2 |                |                |
 london2 |           |        0 |             2 | potential  |                           |     2 |                |                |
 london3 |           |        0 |             3 | potential  |                           |     2 |                |                |
 tokyo   | quorum    |        1 |             3 | potential  | {tokyo1,tokyo2}           |     1 |                |                |
 tokyo1  |           |        0 |             1 | quorum     |                           |     2 |                |                |
 tokyo2  |           |        0 |             1 | quorum     |                           |     2 |                |                |
(9 rows)

Thoughts?

Regards,

--
Masahiko Sawada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers