Hello,

At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada.m...@gmail.com> 
wrote in <cad21aoc9vi8wogtxio3z1nwovfxbjpnftt7+5jadvhkn17u...@mail.gmail.com>
> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> > wrote:
...
> >> This patch is a tool for discussion, so I'm not going to fix this bug
> >> until getting consensus.
> >>
> >> We are still under the discussion to find solution that can get consensus.
> >> I felt that it's difficult to select from the two approaches within
> >> this development cycle, and there would not be time to implement such
> >> big feature even if we selected.
> >> But this feature is obviously needed by many users.
> >> So I'm considering more simple and extensible something solution, the
> >> idea I posted is one of them.
> >> The another worth considering approach is that just specifying the
> >> number of sync standby. It also can cover the main use cases in
> >> some-cases.
> >
> > Yes, it covers main and simple use case like "I want to have multiple
> > synchronous replicas!". Even if we miss quorum commit at the first
> > version, the feature is still very useful.

+1

> It can cover not only the case you mentioned but also main use case
> Michael mentioned by setting same application_name.
> And that first version patch is almost implemented, so just needs to
> be reviewed.
> 
> I think that it would be good to implement the simple feature at the
> first version, and then coordinate the design based on opinion and
> feed backs from more user, use-case.

Yeah. I agree with it. And I have two proposals in this
direction.

- Notation

 synchronous_standby_names, and synchronous_replication_method as
 a variable to provide other syntax is probably no argument
 except its name. But I feel synchronous_standby_num looks bit
 too specific.

 I'd like to propose if this doesn't reprise the argument on
 notation for replication definitions:p

 The following two GUCs would be enough to bear future expansion
 of notation syntax and/or method.

 synchronous_standby_names :  as it is

 synchronous_replication_method:

   default is "1-priority", which means the same with the current
   meaning.  possible additional values so far would be,

    "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
                  where n is the number of required acknowledges.

    "n-quorum":   the format of s_s_names is the same as above, but
                  it is read in quorum context.

 These can be expanded, for example, as follows, but in future.

    "complex" : Michael's format.
    "json"    : JSON?
    "json-ext": specify JSON in external file.

Even after we have complex notations, I suppose that many use
cases are coverd by the first tree notations.


- Internal design

 What should be done in SyncRepReleaseWaiters() is calculating a
 pair of LSNs that can be regarded as synced and decide whether
 *this* walsender have advanced the LSN pair, then trying to
 release backends that wait for the LSNs *if* this walsender has
 advanced them.

 From such point, the proposed patch will make redundant trials
 to release backens.

 Addition to that, the patch looks to be a mixture of the current
 implement and the new feature. These are for the same objective
 so they cannot coexist each other, I think. As the result, codes
 for both quorum/priority judgement appear at multiple level in
 call tree. This would be an obstacle for future (possible)
 expansion.

 So, I think this feature should be implemented as following,

 SyncRepInitConfig reads the configuration and stores the result
 structure into elsewhere such like WalSnd->syncrepset_definition
 instead of WalSnd->sync_standby_priority, which should be
 removed. Nothing would be stored if the current wal sender is
 not a member of the defined replication set. Storing a pointer
 to matching function there would increase the flexibility but
 such implement in contrast will make the code difficult to be
 read.. (I often look for the entity of xlogreader->read_page()
 ;)

 Then SyncRepSyncedLsnAdvancedTo() instead of
 SyncRepGetSynchronousStandbys() returns an LSN pair that can be
 regarded as 'synced' according to specified definition of
 replication set and whether this walsender have advanced the
 LSNs.

 Finally, SyncRepReleaseWaiters() uses it to release backends if
 needed.

 The differences among quorum/priority or others are confined in
 SyncRepSyncedLsnAdvancedTo(). As the result,
 SyncRepReleaseWaiters would look as following.

 | SyncRepReleaseWaiters(void)
 | {
 |   if (MyWalSnd->syncrepset_definition == NULL || ...)
 |      return;
 |   ...
 |   if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
 |   {
 |     /* I haven't advanced the synced LSNs */
 |     LWLockRelease(SyncRepLock);
 |     rerturn;
 |   }
 |   /* Set the lsn first so that when we wake backends they will relase...

 I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
 does but perhaps yes we can:p in effective manner..

 What do you think about this?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to