Re: [HACKERS] Patch for fail-back without fresh backup

Sawada Masahiko Sun, 07 Jul 2013 00:29:14 -0700

On Sun, Jul 7, 2013 at 4:19 PM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
> On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On 17 June 2013 09:03, Pavan Deolasee <pavan.deola...@gmail.com> wrote:
>>
>>> I agree. We should probably find a better name for this. Any suggestions ?
>>
>> err, I already made one...
>>
>>>> But that's not the whole story. I can see some utility in a patch that
>>>> makes all WAL transfer synchronous, rather than just commits. Some
>>>> name like synchronous_transfer might be appropriate. e.g.
>>>> synchronous_transfer = all | commit (default).
>>
>>> Since commits are more foreground in nature and this feature
>>> does not require us to wait during common foreground activities, we want a
>>> configuration where master can wait for synchronous transfers at other than
>>> commits. Maybe we can solve that by having more granular control over the said
>>> parameter?
>>>
>>>>
>>>> The idea of another slew of parameters that are very similar to
>>>> synchronous replication but yet somehow different seems weird. I can't
>>>> see a reason why we'd want a second lot of parameters. Why not just
>>>> use the existing ones for sync rep? (I'm surprised the Parameter
>>>> Police haven't visited you in the night...) Sure, we might want to
>>>> expand the design for how we specify multi-node sync rep, but that is
>>>> a different patch.
>>>
>>>
>>> How would we then distinguish between synchronous and the new kind of
>>> standby ?
>>
>> That's not the point. The point is "Why would we have a new kind of
>> standby?" and therefore why do we need new parameters?
>>
>>> I am told, one of the very popular setups for DR is to have one
>>> local sync standby and one async (may be cascaded by the local sync). Since
>>> this new feature is more useful for DR because taking a fresh backup on a
>>> slower link is even more challenging, IMHO we should support such setups.
>>
>> ...which still doesn't make sense to me. Let's look at that in detail.
>>
>> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
>> being safety standby at a distance.
>>
>> Either A or B is master, except in disaster. So if A is master, then B
>> would be the failover target. If A fails, then you want to failover to
>> B. Once B is the target, you want to failback to A as the master. C
>> needs to follow the new master, whichever it is.
>>
>> If you set up sync rep between A and B and this new mode between A and
>> C. When B becomes the master, you need to failback from B to A, but
>> you can't because the new mode applied between A and C only, so you
>> have to failback from C to A. So having the new mode not match with
>> sync rep means you are forcing people to failback using the slow link
>> in the common case.
>>
>> You might observe that having the two modes match causes problems if A
>> and B fail, so you are forced to go to C as master and then eventually
>> failback to A or B across a slow link. That case is less common and
>> could be solved by extending sync transfer to more/multi nodes.
>>
>> It definitely doesn't make sense to have sync rep on anything other
>> than a subset of sync transfer. So while it may be sensible in the
>> future to make sync transfer a superset of sync rep nodes, it makes
>> sense to make them the same config for now.
> I have updated the patch.
>
> It supports the following 2 cases.
> 1. A SYNC standby server that is also the failback-safe standby server
> 2. An ASYNC standby server that is also the failback-safe standby server
>
> 1. Changed the parameter names
>   Dropped the 'failback_safe_standby_names' parameter from the first patch,
>   and renamed 'failback_safe_mode' to 'synchronous_transfer'.
>   This parameter accepts 'all', 'data_flush' and 'commit'.
>
>   -'commit'
>     'commit' means that the master waits for the corresponding WAL to be
> flushed to disk on the standby server at commit,
>     but the master doesn't wait for replicated data pages.
>
>   -'data_flush'
>     'data_flush' means that the master waits for the data page
> (e.g., CLOG, pg_control) to be replicated before flushing it to disk on
> the master server.
>     If this parameter is set to 'data_flush', the value of
> 'synchronous_commit' is ignored even if the user has set it.
>
>   -'all'
>     'all' means that the master waits for both replicated WAL and data pages.
>
> 2. Put the SyncRepWaitForLSN() call into XLogFlush()
>   We now call SyncRepWaitForLSN() from inside XLogFlush(),
> and changed the arguments of XLogFlush().
>
> The setup cases and the parameters each one needs are as follows.
>
> - SYNC server and also make same failback safe standby server (case 1)
>   synchronous_transfer = all
>   synchronous_commit = remote_write/on
>   synchronous_standby_names = <ServerName>
>
> - ASYNC server and also make same failback safe standby server (case 2)
>   synchronous_transfer = data_flush
>   (the synchronous_commit value is ignored)
>
> - default SYNC replication
>   synchronous_transfer = commit
>   synchronous_commit = on
>   synchronous_standby_names = <ServerName>
>
> - default ASYNC replication
>   synchronous_transfer = commit
>
> ToDo
> 1. Currently this patch supports synchronous transfer, but we can't set
> a different synchronous transfer mode for each server.
>     We need to improve the patch to support the following cases.
>    - SYNC standby and make separate ASYNC failback safe standby
>    - ASYNC standby and make separate ASYNC failback safe standby
>
> 2. We have not measured performance yet. We need to measure performance.
>
> Please give me your feedback.
>
> Regards,
>
> -------
> Sawada Masahiko


I'm sorry, I forgot to attach the patch.
Please see the attached file.
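
As a rough illustration of the three 'synchronous_transfer' values described
in the quoted mail, here is a minimal Python sketch of when the master waits
for the standby. It is only a model, not code from the patch: the names
master_waits, Event and sync_rep_configured are hypothetical, and it assumes
that the wait at commit applies only when synchronous replication is actually
configured (synchronous_standby_names set and synchronous_commit = on or
remote_write).

```python
from enum import Enum


class Event(Enum):
    """Points at which the master could wait for the standby (hypothetical)."""
    COMMIT = "commit"          # a transaction commit
    DATA_FLUSH = "data_flush"  # a data page (e.g. CLOG, pg_control) about to be flushed


VALID_MODES = ("commit", "data_flush", "all")


def master_waits(synchronous_transfer, event, sync_rep_configured=True):
    """Return True if the master waits for the standby at this event.

    synchronous_transfer: 'commit', 'data_flush' or 'all', as in the mail.
    sync_rep_configured:  assumption, True when synchronous replication is
                          set up; only the commit-time wait depends on it here.
    """
    if synchronous_transfer not in VALID_MODES:
        raise ValueError("unknown synchronous_transfer: %r" % synchronous_transfer)

    if event is Event.COMMIT:
        # 'data_flush' ignores synchronous_commit, so it never waits at commit.
        return synchronous_transfer in ("commit", "all") and sync_rep_configured

    # Data page flush: only 'data_flush' and 'all' wait for the standby.
    return synchronous_transfer in ("data_flush", "all")
```

For example, master_waits("data_flush", Event.COMMIT) is False, which matches
case 2, where the synchronous_commit value is ignored.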

Regards,

-------
Sawada Masahiko

failback_safe_standby_v2.patch
Description: Binary data

