Hello, At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada.m...@gmail.com> wrote in <cad21aoc9vi8wogtxio3z1nwovfxbjpnftt7+5jadvhkn17u...@mail.gmail.com> > On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao.fu...@gmail.com> wrote: > > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada.m...@gmail.com> > > wrote: ... > >> This patch is a tool for discussion, so I'm not going to fix this bug > >> until getting consensus. > >> > >> We are still under the discussion to find solution that can get consensus. > >> I felt that it's difficult to select from the two approaches within > >> this development cycle, and there would not be time to implement such > >> big feature even if we selected. > >> But this feature is obviously needed by many users. > >> So I'm considering more simple and extensible something solution, the > >> idea I posted is one of them. > >> The another worth considering approach is that just specifying the > >> number of sync standby. It also can cover the main use cases in > >> some-cases. > > > > Yes, it covers main and simple use case like "I want to have multiple > > synchronous replicas!". Even if we miss quorum commit at the first > > version, the feature is still very useful.
+1 > It can cover not only the case you mentioned but also main use case > Michael mentioned by setting same application_name. > And that first version patch is almost implemented, so just needs to > be reviewed. > > I think that it would be good to implement the simple feature at the > first version, and then coordinate the design based on opinion and > feed backs from more user, use-case. Yeah. I agree with it. And I have two proposals in this direction. - Notation synchronous_standby_names, and synchronous_replication_method as a variable to provide other syntax is probably no argument except its name. But I feel synchronous_standby_num looks bit too specific. I'd like to propose if this doesn't reprise the argument on notation for replication definitions:p The following two GUCs would be enough to bear future expansion of notation syntax and/or method. synchronous_standby_names : as it is synchronous_replication_method: default is "1-priority", which means the same with the current meaning. possible additional values so far would be, "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...", where n is the number of required acknowledges. "n-quorum": the format of s_s_names is the same as above, but it is read in quorum context. These can be expanded, for example, as follows, but in future. "complex" : Michael's format. "json" : JSON? "json-ext": specify JSON in external file. Even after we have complex notations, I suppose that many use cases are coverd by the first tree notations. - Internal design What should be done in SyncRepReleaseWaiters() is calculating a pair of LSNs that can be regarded as synced and decide whether *this* walsender have advanced the LSN pair, then trying to release backends that wait for the LSNs *if* this walsender has advanced them. From such point, the proposed patch will make redundant trials to release backens. Addition to that, the patch looks to be a mixture of the current implement and the new feature. These are for the same objective so they cannot coexist each other, I think. As the result, codes for both quorum/priority judgement appear at multiple level in call tree. This would be an obstacle for future (possible) expansion. So, I think this feature should be implemented as following, SyncRepInitConfig reads the configuration and stores the result structure into elsewhere such like WalSnd->syncrepset_definition instead of WalSnd->sync_standby_priority, which should be removed. Nothing would be stored if the current wal sender is not a member of the defined replication set. Storing a pointer to matching function there would increase the flexibility but such implement in contrast will make the code difficult to be read.. (I often look for the entity of xlogreader->read_page() ;) Then SyncRepSyncedLsnAdvancedTo() instead of SyncRepGetSynchronousStandbys() returns an LSN pair that can be regarded as 'synced' according to specified definition of replication set and whether this walsender have advanced the LSNs. Finally, SyncRepReleaseWaiters() uses it to release backends if needed. The differences among quorum/priority or others are confined in SyncRepSyncedLsnAdvancedTo(). As the result, SyncRepReleaseWaiters would look as following. | SyncRepReleaseWaiters(void) | { | if (MyWalSnd->syncrepset_definition == NULL || ...) | return; | ... | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos)) | { | /* I haven't advanced the synced LSNs */ | LWLockRelease(SyncRepLock); | rerturn; | } | /* Set the lsn first so that when we wake backends they will relase... I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo does but perhaps yes we can:p in effective manner.. What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers