Re: [HACKERS] Support for N synchronous standby servers - take 2

Masahiko Sawada Fri, 05 Feb 2016 01:20:46 -0800

On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas wrote:
>>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier wrote:
>>>>> Yes, please let's use the custom language, and let's not care of not
>>>>> more than 1 level of nesting so as it is possible to represent
>>>>> pg_stat_replication in a simple way for the user.
>>>>
>>>> "not" is used twice in this sentence in a way that renders me not able
>>>> to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations with up to one level of nested objects,
>>> like this:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so that we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, since a quite common case for this feature is something like this:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited for from node1 and from (node2 or node3).
>>
>> Flattening groups of nodes into a new catalog will be necessary to
>> make this data easy for users to view:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the metadata of
> the custom language and represent it properly in a new catalog,
> without messing too much with the existing pg_stat_replication that
> people have been used to for 5 releases, since 9.0. So, I think
> that we will need a new catalog, say
> pg_stat_replication_groups, with the following things:
> - One row of this catalog represents the status of a group or of a single
> node.
> - The status of a node/group is either sync or potential. If a
> node/group is specified more than once, it may be sync in one place and
> potential in another depending on where it is defined, in which
> case setting its status to 'sync' makes the most sense, if it is in sync
> state somewhere, I guess.
> - Move sync_priority and sync_state (or rather an equivalent of them) from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - Group name. By that I mean that we had perhaps better make a name
> mandatory for each quorum or priority group.
> The group at the highest level is forcibly named 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use its application_name.
> - Type of group, quorum or priority
> - Elements in this group; an element can be a group name or a node
> name, aka application_name. If the group is of type priority, the elements
> are listed in increasing order of priority number, so the elements with
> the lowest number come first, etc. We could have one column explicitly
> listing the integers that map to the elements of a group, but it does not
> seem worth it: what users would like to know is which nodes are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language; still, we
> are going to need to design a clean interface at the catalog level, more
> or less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway, that's not a small project, and perhaps I am over-complicating
> the whole thing.
>
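The rule suggested above for a node or group listed more than once can be stated as a small function: each occurrence gets its own status, and the reported status is 'sync' if any occurrence is 'sync', otherwise 'potential'. A minimal Python sketch; `merge_status` is a hypothetical name, not existing PostgreSQL code.

```python
from typing import Iterable


def merge_status(statuses: Iterable[str]) -> str:
    """Combine the per-occurrence statuses of a node/group listed more than once.

    Following the rule suggested above, 'sync' anywhere wins over 'potential'.
    """
    seen = list(statuses)
    if not seen:
        raise ValueError("node/group does not appear in the configuration")
    unknown = set(seen) - {"sync", "potential"}
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    return "sync" if "sync" in seen else "potential"


print(merge_status(["potential", "sync"]))   # sync
print(merge_status(["potential"]))           # potential
```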


I agree with adding a new system catalog so that users can easily check
the replication status, and a group name will be needed for this.
What about appending a group name with ":" immediately after each set of
standbys, as follows?

2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]
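A minimal sketch of a parser for this syntax, in Python. The grammar is an assumption inferred from this example and from the view output below: "[...]" is a priority group, "(...)" a quorum group, an optional leading number is how many members to wait for (1 if omitted, as for tokyo), and an optional ":name" suffix names the group; the unnamed top-level group becomes "main". `Group`, `tokenize`, `Parser` and `parse` are hypothetical names, not existing PostgreSQL code, and standby names are restricted to simple identifiers for brevity.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Assumed syntax: "[...]" = priority group, "(...)" = quorum group,
# optional leading number = members to wait for (default 1),
# optional ":name" suffix = group name.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_][A-Za-z0-9_]*|[\[\]():,])")
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass
class Group:
    sync_type: str                  # 'priority' or 'quorum'
    wait_num: int                   # number of members to wait for
    members: List["Item"] = field(default_factory=list)
    name: Optional[str] = None      # from the ":name" suffix, if any


Item = Union[str, Group]            # a standby name or a nested group


def tokenize(text: str) -> List[str]:
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def take_name(self) -> str:
        tok = self.take()
        if not NAME_RE.fullmatch(tok):
            raise ValueError(f"expected a name, got {tok!r}")
        return tok

    def parse_item(self) -> Item:
        tok = self.peek()
        if tok is not None and (tok.isdigit() or tok in ("[", "(")):
            return self.parse_group()
        return self.take_name()                 # a plain standby name

    def parse_group(self) -> Group:
        wait_num = int(self.take()) if (self.peek() or "").isdigit() else 1
        opener = self.take()
        if opener not in ("[", "("):
            raise ValueError(f"expected '[' or '(', got {opener!r}")
        group = Group("priority" if opener == "[" else "quorum", wait_num)
        group.members.append(self.parse_item())
        while self.peek() == ",":
            self.take(",")
            group.members.append(self.parse_item())
        self.take("]" if opener == "[" else ")")
        if not 1 <= group.wait_num <= len(group.members):
            raise ValueError(f"cannot wait for {wait_num} of {len(group.members)} members")
        if self.peek() == ":":
            self.take(":")
            group.name = self.take_name()
        return group


def parse(text: str) -> Group:
    parser = Parser(text)
    top = parser.parse_group()
    if parser.peek() is not None:
        raise ValueError(f"trailing input: {parser.peek()!r}")
    if top.name is None:
        top.name = "main"                       # unnamed top-level group
    return top


cfg = parse("2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]")
print(cfg.name, cfg.sync_type, cfg.wait_num)    # main priority 2
```

A recursive-descent parser keeps the nesting depth unrestricted; limiting it to one or two levels, as discussed upthread, would be an extra check on the resulting tree.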

Also, to show the synchronous replication status for such a
configuration, the view I have in mind has the following definition.

=# \d pg_synchronous_replication
     Column     |  Type   | Modifiers
----------------+---------+-----------
 name           | text    |
 sync_type      | text    |
 wait_num       | integer |
 sync_priority  | integer |
 sync_state     | text    |
 member         | text[]  |
 level          | integer |
 write_location | pg_lsn  |
 flush_location | pg_lsn  |
 apply_location | pg_lsn  |

- "name" : node name or group name, or "main" meaning the top-level group.
- "sync_type" : 'priority' or 'quorum' for a group, otherwise NULL.
- "wait_num" : number of nodes/groups to wait for in this group.
- "sync_priority" : priority of the node/group within its group. The "main"
  group has 0.
  - A member of a quorum group always has priority 1.
  - A member of a priority group has a priority according to its
    definition order.
- "sync_state" : 'sync', 'potential' or 'quorum'.
  - A member of a quorum group is always 'quorum'.
  - A member of a priority group is 'sync' or 'potential'.
- "member" : array of members for a group, otherwise NULL.
- "level" : nesting level. The "main" group is level 0.
- "write/flush/apply_location" : LSN calculated for the group/node according
  to the configuration.
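How a group's write/flush/apply_location is "calculated according to configuration" is not spelled out here. A minimal Python sketch under one plausible assumption: a priority group has confirmed up to the lowest LSN among its first wait_num available members in definition order (the 'sync' ones), a quorum group up to the wait_num-th highest LSN among its available members, and a group that cannot be satisfied counts as unavailable in its parent. LSNs are modeled as plain integers and a disconnected standby as None; `Group`, `member_lsn` and `group_lsn` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Group:
    sync_type: str              # 'priority' or 'quorum'
    wait_num: int               # number of members to wait for
    members: List["Member"]     # in definition order (= priority order)


Member = Union[str, Group]          # a standby's application_name or a nested group
LsnMap = Dict[str, Optional[int]]   # standby -> LSN as an integer, None if disconnected


def member_lsn(m: Member, lsns: LsnMap) -> Optional[int]:
    if isinstance(m, Group):
        return group_lsn(m, lsns)
    return lsns.get(m)              # unknown or disconnected standby -> None


def group_lsn(g: Group, lsns: LsnMap) -> Optional[int]:
    """LSN up to which group g counts as confirmed, or None if it cannot be satisfied."""
    if g.wait_num < 1:
        raise ValueError("wait_num must be at least 1")
    # Keep definition order; drop members with no usable LSN.
    candidates = (member_lsn(m, lsns) for m in g.members)
    available = [lsn for lsn in candidates if lsn is not None]
    if len(available) < g.wait_num:
        return None
    if g.sync_type == "priority":
        # The first wait_num available members are the 'sync' ones; the group
        # is only as far along as the slowest of them.
        return min(available[: g.wait_num])
    if g.sync_type == "quorum":
        # Any wait_num members suffice: the wait_num-th highest LSN has been
        # reached by at least wait_num members.
        return sorted(available, reverse=True)[g.wait_num - 1]
    raise ValueError(f"unknown sync_type: {g.sync_type!r}")


london = Group("priority", 2, ["london1", "london2", "london3"])
tokyo = Group("quorum", 1, ["tokyo1", "tokyo2"])
main = Group("priority", 2, ["local", london, tokyo])
flush = {"local": 100, "london1": 90, "london2": 80, "london3": 95,
         "tokyo1": 70, "tokyo2": 85}
print(group_lsn(london, flush), group_lsn(tokyo, flush), group_lsn(main, flush))  # 80 85 80
```

With these flush positions, london reports 80 (london1 and london2 are its sync members), tokyo reports 85, and main reports 80, held back by london.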

When synchronous replication is configured as above, the new system view shows:

=# select * from pg_synchronous_replication;
  name   | sync_type | wait_num | sync_priority | sync_state |          member           | level | write_location | flush_location | apply_location
---------+-----------+----------+---------------+------------+---------------------------+-------+----------------+----------------+----------------
 main    | priority  |        2 |             0 | sync       | {local,london,tokyo}      |     0 |                |                |
 local   |           |        0 |             1 | sync       |                           |     1 |                |                |
 london  | priority  |        2 |             2 | potential  | {london1,london2,london3} |     1 |                |                |
 london1 |           |        0 |             1 | potential  |                           |     2 |                |                |
 london2 |           |        0 |             2 | potential  |                           |     2 |                |                |
 london3 |           |        0 |             3 | potential  |                           |     2 |                |                |
 tokyo   | quorum    |        1 |             3 | potential  | {tokyo1,tokyo2}           |     1 |                |                |
 tokyo1  |           |        0 |             1 | quorum     |                           |     2 |                |                |
 tokyo2  |           |        0 |             1 | quorum     |                           |     2 |                |                |
(9 rows)
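The structural columns of this output follow mechanically from the configuration. A minimal Python sketch that flattens a nested configuration into name, sync_type, wait_num, sync_priority, member and level, in pre-order (the row order above), applying the sync_priority rules listed for the view. It assumes group names are already assigned and treats london as a priority group, as its "[...]" brackets and its members' priorities 1, 2, 3 suggest. sync_state and the *_location columns depend on which standbys are connected and are left out; `Group`, `Row` and `flatten` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Group:
    name: str
    sync_type: str                  # 'priority' or 'quorum'
    wait_num: int
    members: List["Member"]         # in definition order


Member = Union[str, Group]          # a standby name or a nested group


@dataclass
class Row:
    name: str
    sync_type: Optional[str]        # NULL (None) for a plain standby
    wait_num: int
    sync_priority: int
    member: Optional[List[str]]     # NULL (None) for a plain standby
    level: int


def flatten(node: Member, priority: int = 0, level: int = 0) -> List[Row]:
    if not isinstance(node, Group):
        # A plain standby: no type, nothing to wait for, no members.
        return [Row(node, None, 0, priority, None, level)]
    names = [m.name if isinstance(m, Group) else m for m in node.members]
    rows = [Row(node.name, node.sync_type, node.wait_num, priority, names, level)]
    for i, m in enumerate(node.members, start=1):
        # Quorum members all share priority 1; priority members follow definition order.
        child_priority = 1 if node.sync_type == "quorum" else i
        rows.extend(flatten(m, child_priority, level + 1))
    return rows


config = Group("main", "priority", 2, [
    "local",
    Group("london", "priority", 2, ["london1", "london2", "london3"]),
    Group("tokyo", "quorum", 1, ["tokyo1", "tokyo2"]),
])

for r in flatten(config):
    print(r.name, r.sync_type, r.wait_num, r.sync_priority, r.member, r.level)
```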

Thoughts?

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
