Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-14 Thread Jose Ildefonso Camargo Tolosa
On Sat, Jul 14, 2012 at 12:42 AM, Amit kapila  wrote:
>> From: Jose Ildefonso Camargo Tolosa [ildefonso.cama...@gmail.com]
>> Sent: Saturday, July 14, 2012 9:36 AM
>>On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila  wrote:
>> From: pgsql-hackers-ow...@postgresql.org 
>> [pgsql-hackers-ow...@postgresql.org] on behalf of Jose Ildefonso Camargo 
>> Tolosa [ildefonso.cama...@gmail.com]
>> Sent: Saturday, July 14, 2012 6:08 AM
>> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian  wrote:
>>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>>
>>>>> So how about this for a Postgres TODO:
>>>>>
>>>>> Add configuration variable to allow Postgres to disable synchronous
>>>>> replication after a specified timeout, and add variable to alert
>>>>> administrators of the change.
>>
>>>> I agree we need a TODO for this, but... I think timeout-only is not
>>>> the best choice, there should be a maximum timeout (as a last
>>>> resource: the maximum time we are willing to wait for standby, this
>>>> have to have the option of "forever"), but certainly PostgreSQL have
>>>> to detect the *complete* disconnection of the standby (or all standbys
>>>> on the synchronous_standby_names), if it detects that no standbys are
>>>> eligible for sync standby AND the option to do fallback to async is
>>>> enabled = it will go into standalone mode (as if
>>>> synchronous_standby_names were empty), otherwise (if option is
>>>> disabled) it will just continue to wait for ever (the "last resource"
>>>> timeout is ignored if the fallback option is disabled) I would
>>>> call this "soft_synchronous_standby", and
>>>> "soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane
>>>> value would be ~5 seconds) or something like that (I'm quite bad at
>>>> picking names :( ).
>>
>> >After it has gone to standalone mode, if the standby came back will it be 
>> >able to return back to sync mode with it.
>
>> That's the idea, yes, after the standby comes back, the master would
>> act as if the sync standby connected for the first time: first going
>> through the "catchup" mode, and "once the lag between standby and
>> primary reaches zero "(...)" we move to real-time streaming state"
>> (from 9.1 docs), at that point: normal sync behavior is restored.
>
> Idea-wise it looks okay, but are you sure the current code/design can
> handle it the way you are suggesting?
> I am not sure it can work, because it might be the case that, due to network
> instability, the master has gone into standalone mode,
> and now that the standby is able to communicate again, it might be expecting to
> get more data rather than go into catchup mode.
> I believe someone who is an expert in this area of the code can comment here to
> make it more concrete.

Well, I'd need to dive into the code, but as far as I know, it is the
master that decides to be in "catchup" mode, and the standby just takes
care of sending feedback to the master.  Also, it has to handle this
situation already: currently, if the master goes away because it crashed,
or because of network issues, the standby doesn't really know why, and
it will reconnect to the master and do whatever it needs to do to get in
sync with it again (be it retrying the connection several times while the
master is restarting, or just reconnecting to a waiting master and
requesting the pending WAL segments).  There has to be code in place to
handle those cases, because it is already working.  I'm trying to get
a solution that is as non-intrusive as possible, with as little new
code as possible, reusing the current logic and actions with small
alterations so that performance doesn't suffer.

>
> With Regards,
> Amit Kapila.



--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-13 Thread Jose Ildefonso Camargo Tolosa
On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila  wrote:
> From: pgsql-hackers-ow...@postgresql.org [pgsql-hackers-ow...@postgresql.org] 
> on behalf of Jose Ildefonso Camargo Tolosa [ildefonso.cama...@gmail.com]
> Sent: Saturday, July 14, 2012 6:08 AM
> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian  wrote:
>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>
>>> So how about this for a Postgres TODO:
>>>
>>> Add configuration variable to allow Postgres to disable synchronous
>>> replication after a specified timeout, and add variable to alert
>>> administrators of the change.
>
>> I agree we need a TODO for this, but... I think timeout-only is not
>> the best choice, there should be a maximum timeout (as a last
>> resource: the maximum time we are willing to wait for standby, this
>> have to have the option of "forever"), but certainly PostgreSQL have
>> to detect the *complete* disconnection of the standby (or all standbys
>> on the synchronous_standby_names), if it detects that no standbys are
>> eligible for sync standby AND the option to do fallback to async is
>> enabled = it will go into standalone mode (as if
>> synchronous_standby_names were empty), otherwise (if option is
>> disabled) it will just continue to wait for ever (the "last resource"
>> timeout is ignored if the fallback option is disabled) I would
>> call this "soft_synchronous_standby", and
>> "soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane
>> value would be ~5 seconds) or something like that (I'm quite bad at
>> picking names :( ).
>
> After it has gone to standalone mode, if the standby came back will it be 
> able to return back to sync mode with it.

That's the idea, yes: after the standby comes back, the master would
act as if the sync standby had connected for the first time, first going
through "catchup" mode, and "once the lag between standby and
primary reaches zero (...) we move to real-time streaming state"
(from the 9.1 docs).  At that point, normal sync behavior is restored.
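
The return path described above can be sketched as a tiny state machine.
This is only an illustration of the proposal, not PostgreSQL code: the
state and event names ("standalone", "standby_connected", "lag_zero",
"standby_lost") are made up for the example.

```python
# Hypothetical model of the proposed master-side behaviour.
# None of these names exist in PostgreSQL; they are illustrative only.
SYNC_TRANSITIONS = {
    ("standalone", "standby_connected"): "catchup",
    ("catchup", "lag_zero"): "sync",
    ("catchup", "standby_lost"): "standalone",
    ("sync", "standby_lost"): "standalone",
}


def next_state(state, event):
    """Return the new state, or the unchanged state for irrelevant events."""
    return SYNC_TRANSITIONS.get((state, event), state)


def commits_wait_for_standby(state):
    """Commits only wait for the standby once it has fully caught up."""
    return state == "sync"


state = "standalone"
for event in ["standby_connected", "lag_zero"]:
    state = next_state(state, event)
print(state, commits_wait_for_standby(state))
```

The point of the sketch is that a returning standby takes the same
catchup path as a standby that connects for the first time.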

--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-13 Thread Jose Ildefonso Camargo Tolosa
On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian  wrote:
> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>> How you decide what to do with the servers on failures isn't that
>> important here, really. You can probably run e.g. Pacemaker on 3+
>> machines and have it check for quorums to accomplish this. That's a
>> good approach at least. You can still have only 2 database servers
>> (for cost reasons), if you want. PostgreSQL could have all this
>> built-in, but I don't think it sounds overly useful to only be able
>> to disable synchronous replication on the primary after a timeout.
>> Then you can never safely do a failover to the secondary, because
>> you can't be sure synchronous replication was active on the failed
>> primary...
>
> So how about this for a Postgres TODO:
>
> Add configuration variable to allow Postgres to disable synchronous
> replication after a specified timeout, and add variable to alert
> administrators of the change.

I agree we need a TODO for this, but I think timeout-only is not
the best choice.  There should be a maximum timeout (as a last
resort: the maximum time we are willing to wait for the standby, and this
has to have a "forever" option), but PostgreSQL certainly has
to detect the *complete* disconnection of the standby (or of all standbys
in synchronous_standby_names).  If it detects that no standbys are
eligible to be the sync standby AND the option to fall back to async is
enabled, it will go into standalone mode (as if
synchronous_standby_names were empty); otherwise (if the option is
disabled) it will just continue to wait forever (the "last resort"
timeout is ignored if the fallback option is disabled).  I would
call these "soft_synchronous_standby" and
"soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane
value would be ~5 seconds) or something like that (I'm quite bad at
picking names :( ).
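
A minimal sketch of the decision rule proposed above, in Python for
brevity.  Both settings are the hypothetical names from this message (they
are not real PostgreSQL parameters), and the function only models the
rule; it says nothing about how the server would implement it.

```python
from dataclasses import dataclass


@dataclass
class SoftSyncConfig:
    # Hypothetical settings, named as proposed in this thread.
    soft_synchronous_standby: bool            # allow fallback to standalone
    soft_synchronous_standby_timeout: float   # seconds, 0 = wait forever


def commit_wait_action(cfg, eligible_standbys, waited_seconds):
    """Return "wait" or "standalone" for a commit waiting on a sync standby."""
    if not cfg.soft_synchronous_standby:
        # Fallback disabled: the timeout is ignored, wait forever.
        return "wait"
    if eligible_standbys == 0:
        # Complete disconnection detected: act as if
        # synchronous_standby_names were empty.
        return "standalone"
    timeout = cfg.soft_synchronous_standby_timeout
    if timeout > 0 and waited_seconds >= timeout:
        # Last resort: a standby is connected but has not answered in time.
        return "standalone"
    return "wait"


cfg = SoftSyncConfig(soft_synchronous_standby=True,
                     soft_synchronous_standby_timeout=5)
print(commit_wait_action(cfg, eligible_standbys=0, waited_seconds=0))
print(commit_wait_action(cfg, eligible_standbys=1, waited_seconds=2))
```

Note that the disconnection check comes before the timeout, so the
timeout only matters for a standby that is connected but unresponsive.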

--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-13 Thread Jose Ildefonso Camargo Tolosa
Hi Hampus,

On Fri, Jul 13, 2012 at 2:42 AM, Hampus Wessman  wrote:
> Hi all,
>
> Here are some (slightly too long) thoughts about this.

Nah, not that long.

>
> Shaun Thomas skrev 2012-07-12 22:40:
>
>> On 07/12/2012 12:02 PM, Bruce Momjian wrote:
>>
>>> Well, the problem also exists if we add it as an internal database
>>> feature --- how long do we wait to consider the standby dead, how do
>>> we inform administrators, etc.
>>
>>
>> True. Though if there is no secondary connected, either because it's not
>> there yet, or because it disconnected, that's an easy check. It's the
>> network lag/stall detection that's tricky.
>
>
> It is indeed tricky to detect this. If you don't get an (immediate) reply
> from the secondary (and you never do!), then all you can do is wait and
> *eventually* (after how long? 250ms? 10s?) assume that there is no
> connection between them. The conclusion may very well be wrong sometimes. A
> second problem is that we still don't know if this is caused by some kind of
> network problems or if it's caused by the secondary not running. It's
> perfectly possible that both servers are working, but just can't communicate
> at the moment.

How about using the same logic it currently uses to detect that the
"designated" synchronous standby is no longer there and to move on to
the next one in synchronous_standby_names?

The rule to *know* that a standby went away is already there.

>
> The thing is that what we do next (at least if our data is important and why
> otherwise use synchronous replication of any kind...) depends on what *did*
> happen. Assume that we have two database servers. At any time we need at
> most one primary database to be running. Without that requirement our data
> can get messed up completely... If HA is important to us, we may choose to

Not necessarily, but true: that's why you usually kill the (failing?)
node when promoting the standby, just in case.

> do a failover to the secondary (and live without replication for the moment)
> if the primary fails. With synchronous replication, we can do this without
> losing any data. If the secondary also dies, then we do lose data (and we'll
> know it!), but it might be an acceptable risk. If the secondary isn't
> permanently damaged, then we might even be able to get the data back after
> some down time. Ok, so that's one way to reconfigure the database servers on
> a failure. If the secondary fails instead, then we can do similarly and
> remove it from the "cluster" (or in other words, disable synchronous
> replication to the secondary). Again, we don't lose any data by doing this.

Right, but you have to monitor the standby too!  That is more work on the
Pacemaker side, and non-trivial work.  For example, just blowing
away the standby won't do any good here.  With the master, you can
just power it off, promote the standby, and be done with it!  If the
standby fails, you have to modify the master's config and reload it
there... more code, more chances of failure.

> We're taking a certain risk, however. We can't safely do a failover to the
> secondary anymore... So if the primary fails now, then the only way not to
> lose data is to hope that we can get it back from the failed machine (the
> failure may be temporary).
>
> There's also the third possibility, of course, that the two servers are both
> up and running, but they can't communicate over the network at the moment
> (this is, by the way, a difference from RAID, I guess). What do we do then?

Kill the "failing" node, just in case.  In this case, without the
"extra" work of monitoring the standby, you would just make the standby
kill the master before the standby is promoted.

> Well, we still need at most one primary database server. We'll have to
> (somehow, which doesn't matter as much) decide which database to keep and
> consider the other one "down". Then we can just do as above (with all the

This is arbitrary; we usually just assume the master is failing
when the standby is healthy (from the standby's point of view).

> same implications!). Is it always a good idea to keep the primary? No! What
> if you (as a stupid example) pull the network cable from the primary (or
> maybe turn off a switch so that it's isolated from most of the network)? In

That means that you failed to have redundant connectivity to the
standby (which is a must on clusters); yes, a redundant switch too: with
"smart switches" on the

> that case you probably want the secondary to take over instead. At least if
> you value service availability. At this point we can still do a safe
> failover too.
>
> My point here is that if HA is important to you, then you may very well want
> to disable synchronous replication on a failure to avoid down time, but this
> has to be integrated with your overall failover / cluster management
> solution. Just having the primary automatically disable synchronous

That's not a trivial matter: you have to monitor the standby and make
changes to the master's configuration.

Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-13 Thread Jose Ildefonso Camargo Tolosa
On Fri, Jul 13, 2012 at 12:25 AM, Amit Kapila  wrote:
>
>> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org]
>> On Behalf Of Jose Ildefonso Camargo Tolosa
>>>On Thu, Jul 12, 2012 at 9:28 AM, Aidan Van Dyk  wrote:
>> On Thu, Jul 12, 2012 at 9:21 AM, Shaun Thomas 
> wrote:
>>
>
>> As currently is, the point of: freezing the master because standby
>> dies is not good for all cases (and I dare say: for most cases), and
>> having to wait for pacemaker or other monitoring to note that, change
>> master config and reload... it will cause a service disruption! (for
>> several seconds, usually, ~30 seconds).
>
> Yes, it is true that it can cause service disruption, but the same will be
> true even if the master detects that internally by having a timeout.
> By keeping this external, the current behavior of PostgreSQL can be
> maintained:
> if there is no standby in sync mode, it will wait, and the purpose is still
> served, since a message can be sent to the master externally.
>

How does PostgreSQL currently detect that its main synchronous
standby went away, and switch to another synchronous standby from the
synchronous_standby_names config parameter?

The same logic could be applied to "no more synchronous standbys: go
into standalone" (optionally).
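
To make the idea concrete, here is a simplified model of it in Python.
It assumes the priority-list reading of synchronous_standby_names (the
first listed standby that is currently connected is the synchronous one);
the "allow_standalone" flag is the hypothetical option discussed in this
thread, and nothing here is taken from the server source.

```python
def pick_sync_standby(synchronous_standby_names, connected, allow_standalone):
    """Return (mode, standby_name) for the current set of connections.

    synchronous_standby_names: names in priority order.
    connected: set of standby names currently connected.
    allow_standalone: hypothetical fallback option from this thread.
    """
    for name in synchronous_standby_names:
        if name in connected:
            # Existing behaviour: move on to the next listed standby.
            return ("sync", name)
    if allow_standalone:
        # Proposed extension: no candidate left, continue without waiting.
        return ("standalone", None)
    # Current behaviour: commits keep waiting for a standby to appear.
    return ("wait", None)


names = ["standby1", "standby2"]
print(pick_sync_standby(names, {"standby2"}, allow_standalone=True))
print(pick_sync_standby(names, set(), allow_standalone=True))
print(pick_sync_standby(names, set(), allow_standalone=False))
```

The only new branch compared to today's behaviour is the one taken when
the list is exhausted and the fallback option is enabled.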

--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Jose Ildefonso Camargo Tolosa
On Thu, Jul 12, 2012 at 4:10 PM, Shaun Thomas  wrote:
> On 07/12/2012 12:02 PM, Bruce Momjian wrote:
>
>> Well, the problem also exists if we add it as an internal database
>> feature --- how long do we wait to consider the standby dead, how do
>> we inform administrators, etc.
>
>
> True. Though if there is no secondary connected, either because it's not
> there yet, or because it disconnected, that's an easy check. It's the
> network lag/stall detection that's tricky.

Well, yes... but how does PostgreSQL currently notice that its "main
synchronous standby" went away and that it has to use another standby
as the synchronous one?  How long does it take to notice that?

>
>
>> I don't think anyone says the feature is useless, but it isn't going
>> to be a simple boolean either.
>
>
> Oh $Deity no. I'd never suggest that. I just tend to be overly verbose, and
> sometimes my intent gets lost in the rambling as I try to explain my
> perspective. I apologize if it somehow came across that anyone could just
> flip a switch and have it work.
>
> My C is way too rusty, or I'd be writing an extension right now to do this,
> or be looking over that patch I linked to originally to make suitable
> adaptations. I know I talk about how relatively handy DRBD is, but it's also
> a gigantic PITA since it has to exist underneath the actual filesystem. :)
>
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> stho...@optionshouse.com
>
>
>
> __
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related
> to this email
>


Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Jose Ildefonso Camargo Tolosa
On Thu, Jul 12, 2012 at 8:29 PM, Aidan Van Dyk  wrote:
> On Thu, Jul 12, 2012 at 8:27 PM, Jose Ildefonso Camargo Tolosa
>
>> Yeah, you need that with PostgreSQL, but no with DRBD, for example
>> (sorry, but DRBD is one of the flagships of HA things in the Linux
>> world).  Also, I'm not convinced about the "2nd standby" thing... I
>> mean, just read this on the docs, which is a little alarming:
>>
>> "If primary restarts while commits are waiting for acknowledgement,
>> those waiting transactions will be marked fully committed once the
>> primary database recovers. There is no way to be certain that all
>> standbys have received all outstanding WAL data at time of the crash
>> of the primary. Some transactions may not show as committed on the
>> standby, even though they show as committed on the primary. The
>> guarantee we offer is that the application will not receive explicit
>> acknowledgement of the successful commit of a transaction until the
>> WAL data is known to be safely received by the standby."
>>
>> So... there is no *real* warranty here either... I don't know how I
>> skipped that paragraph before today... I mean, this implies that it
>> is possible that a transaction could be marked as committed on the
>> master, but the app was not informed of that (and thus could try to
>> send it again), and the transaction was NOT applied on the standby...
>> how can this happen? I mean, when the master comes back, shouldn't the
>> standby get the missing WAL pieces from the master and then apply the
>> transaction? The standby part is the one that I don't really get, on
>> the application side... well, there are several ways in which you can
>> miss the "commit confirmation": connection issues at the worst moment,
>> and the like, so, I guess it is not *so* serious, and the app should
>> have a way of checking its last transaction if it lost connectivity to
>> the server before getting the transaction committed.
>
> But you already have that in a single server situation as well.  There
> is a window between when the commit is "durable" (and visible to
> others, and will be committed after recovery of a crash), when the
> client doesn't yet know it's committed (and might never get the commit
> message due to server crash, network disconnect, client middle-tier
> crash, etc).
>
> So people are already susceptible to that, and defending against it, no? ;-)

Right.  What I'm pointing at is this particular part of the docs:

"If primary restarts while commits are waiting for acknowledgement,
those waiting transactions will be marked fully committed once the
primary database recovers. "(...)"Some transactions may not show as
committed on the standby, even though they show as committed on the
primary."(...)

See? It sounds like, after the primary database recovers, the standby
will still not have the transaction committed, and as far as I thought
I knew, the standby should get that over the WAL stream from the master
once it reconnects to it.

>
> And they are susceptible to that if they are on PostgreSQL, Oracle, MS
> SQL, DB2, etc.

Certainly.  That's why I said:

(...)"The standby part is the one that I don't really get, on
the application side... well, there are several ways in which you can
miss the "commit confirmation": connection issues at the worst moment,
and the like, so, I guess it is not *so* serious, and the app should
have a way of checking its last transaction if it lost connectivity to
the server before getting the transaction committed."



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Jose Ildefonso Camargo Tolosa
On Thu, Jul 12, 2012 at 12:17 PM, Bruce Momjian  wrote:
> On Thu, Jul 12, 2012 at 11:33:26AM +0530, Amit Kapila wrote:
>> > From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org]
>> > On Behalf Of Jose Ildefonso Camargo Tolosa
>>
>> > Please, stop arguing on all of this: I don't think that adding an
>> > option will hurt anybody (especially because the work was already done
>> > by someone), we are not asking to change how the things work, we just
>> > want an option to decide whether we want it to freeze on standby
>> > disconnection, or if we want it to continue automatically... is that
>> > asking so much?
>>
>> I think this kind of decision should be done from outside utility or
>> scripts.
>> It would be better if from outside it can be detected that stand-by is down
>> during sync replication, and send command to master to change its mode or
>> change settings appropriately without stopping master.
>> Putting this kind of more and more logic into replication code will make it
>> more cumbersome.
>
> We certainly would need something external to inform administrators that
> the system is no longer synchronous.

That is *mandatory*, just as you monitor DRBD or disk arrays: if a
disk fails, an alert has to be issued so it gets fixed as soon as possible.

But such alerts can wait 30 seconds to be sent out, so any monitoring
system would be able to handle that; we just need to get the current
system status from the monitoring system and create the corresponding
rules: a simple matter, actually.
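
Such a rule could look roughly like the following sketch.  It assumes the
monitoring system has already fetched the rows of the pg_stat_replication
view (which has "application_name", "state" and "sync_state" columns) as
dictionaries; the function name and the alert wording are made up for the
example.

```python
def replication_alerts(rows, expect_sync=True):
    """Turn pg_stat_replication-style rows into a list of alert strings."""
    alerts = []
    if not rows:
        alerts.append("no standby connected")
    elif expect_sync and not any(r["sync_state"] == "sync" for r in rows):
        alerts.append("no synchronous standby")
    for r in rows:
        if r["state"] != "streaming":
            alerts.append(
                f"standby {r['application_name']} is in state {r['state']}"
            )
    return alerts


rows = [
    {"application_name": "standby1", "state": "catchup", "sync_state": "async"},
]
for alert in replication_alerts(rows):
    print(alert)
```

A check like this can run every few seconds from any monitoring system,
which is well within the 30 second window mentioned above.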



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Jose Ildefonso Camargo Tolosa
On Thu, Jul 12, 2012 at 9:28 AM, Aidan Van Dyk  wrote:
> On Thu, Jul 12, 2012 at 9:21 AM, Shaun Thomas  
> wrote:
>
>> So far as transaction durability is concerned... we have a continuous
>> background rsync over dark fiber for archived transaction logs, DRBD for
>> block-level sync, filesystem snapshots for our backups, a redundant async DR
>> cluster, an offsite backup location, and a tape archival service stretching
>> back for seven years. And none of that will cause the master to stop
>> processing transactions unless the master itself dies and triggers a
>> failover.
>
> Right, so if the dark fiber between New Orleans and Seattle (pick two
> places for your datacenter) happens to be the first thing failing in
> your NO data center.  Disconnect the sync-ness, and continue.  Not a
> problem, unless it happens to be Aug 29, 2005.
>
> You have lost data.  Maybe only a bit.  Maybe it wasn't even
> important.  But that's not for PostgreSQL to decide.

I never asked for that... but you (the one who is configuring the
system) can decide, and should be able to decide.  Right now, we
can't decide.

>
> But because your PG on DRBD "continued" when it couldn't replicate to
> Seattle, it told its clients the data was durable, just minutes
> before the whole DC was under water.

Yeah, well, what is the probability of all of that?  Really tiny.  I
bet it is more likely that you win the lottery than that all of these
events happen within that time frame.  But risking monetary losses
because, for example, the online store stopped accepting orders when
the standby server went down: that's not acceptable for some companies
(and some companies just can't buy 3 x DB servers, or more!).

>
> OK, so a wise admin team would have removed the NO DC from its
> primary role days before that hit.
>
> Change the NO to NYC and the date Sept 11, 2001.
>
> OK, so maybe we can concede that these types of major catastrophes are
> more devastating to us than losing some data.
>
> Now your primary server was in AWS US East last week.  Its sync slave
> was in the affected AZ, but your PG primary continues on, until, since
> it was an EC2 instance, it disappears.  Now where is your data?

Who would *really* trust their PostgreSQL DB to EC2?  I mean, the I/O
is not very good, and the price is not exactly so low that you would
take that risk.

All in all: you are still putting together coincidences that have *such
low* probability.

>
> Or the fire marshall orders the data center (or whole building) EPO,
> and the connection to your backup goes down minutes before your
> servers or other network peers.
>
>> Using PG sync in its current incarnation would introduce an extra failure
>> scenario that wasn't there before. I'm pretty sure we're not the only ones
>> avoiding it for exactly that reason. Our queue discards messages it can't
>> fulfil within ten seconds and then throws an error for each one. We need to
>> decouple the secondary as quickly as possible if it becomes unresponsive,
>> and there's really no way to do that without something in the database, one
>> way or another.
>
> It introduces an "extra failure", because it has introduced an "extra
> data durability guarantee".
>
> Sure, many people don't *really* want that data durability guarantee,
> even though they would like the "maybe guaranteed" version of it.
>
> But that fine line is actually a difficult (impossible?) one to define
> if you don't know, at the moment of decision, what the next few
> moments will/could become.

You *never* know.  And the truth is that you have to make the decision
with what you have.  If you can pay for 10 servers nationwide, good for
you; not all of us can afford that (man, I could barely pay for two,
and only because I *know* I don't want to risk losing the data or the
service because the single server died).

As it currently is, freezing the master because the standby
dies is not good for all cases (and, I dare say, for most cases), and
having to wait for Pacemaker or other monitoring to notice that, change
the master config and reload... that will cause a service disruption (for
several seconds, usually ~30 seconds)!



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Jose Ildefonso Camargo Tolosa
On Thu, Jul 12, 2012 at 8:35 AM, Dimitri Fontaine
 wrote:
> Hi,
>
> Jose Ildefonso Camargo Tolosa  writes:
>> environments.  And no, it doesn't makes synchronous replication
>> meaningless, because it will work synchronous if it have someone to
>> sync to, and work async (or standalone) if it doesn't: that's perfect
>> for HA environment.
>
> You seem to want Service Availability when we are providing Data
> Availability. I'm not saying you shouldn't ask what you're asking, just
> that it is a different need.

Yes, and no: I don't see why we can't have an option to choose which
one we want.  I can see the point of "data availability": it is better
to freeze the service than to risk losing transactions.  However, try to
explain that to some managers: "well, you know, the DB server froze
the whole bank system because, well, the standby server died, and we
didn't want to risk transaction loss, so we just froze the master... you
know, in case the master were to die too before we had a reliable
standby."  I don't think a manager would really understand why you
would block the whole company's system just because *the standby*
server died (and why don't you block it when the master dies?!).
Now, maybe that's a bad example; I know a bank should have at least 3
or 4 servers, with some of them in different geographical areas, but
just think of the typical boss.

In "Service Availability", you have data availability most of the
time, until one of the servers fails (if you have just 2 nodes).  What
if you have more than two?  Well, good for you!  But you can keep
going with a single server, understanding that you are at high risk,
which has to be fixed real soon (emergency).

>
> If you troll the archives, you will see that this debate has received
> much consideration already. The conclusion is that if you care about
> Service Availability you should have 2 standby servers and set them both
> as candidates to being the synchronous one.

That's more cost, and for most applications it isn't worth the extra cost.

Really, I see your point, and I have *never* asked to remove
the data guarantees, only to have an option to relax them if the
particular situation requires it: "enough safety" for a given cost.

>
> That way, when you lose one standby the service is unaffected, the
> second standby is now the synchronous one, and it's possible to
> re-attach the failed standby live, with or without archiving (with is
> preferred so that the master isn't involved in the catch-up phase).
>
>> As synchronous standby currently is, it just doesn't fit the HA usage,
>
> It does actually allow both data high availability and service high
> availability, provided that you feed at least two standbys.

Still, it doesn't fit.  You need to spend on more hardware, and more power
(and money there), and more carbon footprint... you get the point.
Also, having 3 servers for your DB can be necessary (and possible) for
some companies, but for others: no.

>
> What you seem to be asking is both data and service high availability
> with only two nodes. You're right that we can not provide that with
> current releases of PostgreSQL. I'm not sure anyone has a solid plan to
> make that happen.
>
>> and if you really want to keep it that way, it doesn't belong to the
>> HA chapter on the pgsql documentation, and should be moved.  And NO
>> async replication will *not* work for HA, because the master can have
>> more transactions than standby, and if the master crashes, the standby
>> will have no way to recover these transactions, with synchronous
>> replication we have *exactly* what we need: the data in the standby,
>> after all, it will apply it once we promote it.
>
> Exactly. We want data availability first. Service availability is
> important too, and for that you need another standby.

Yeah, you need that with PostgreSQL, but not with DRBD, for example
(sorry, but DRBD is one of the flagships of HA in the Linux
world).  Also, I'm not convinced about the "2nd standby" thing... I
mean, just read this in the docs, which is a little alarming:

"If primary restarts while commits are waiting for acknowledgement,
those waiting transactions will be marked fully committed once the
primary database recovers. There is no way to be certain that all
standbys have received all outstanding WAL data at time of the crash
of the primary. Some transactions may not show as committed on the
standby, even though they show as committed on the primary. The
guarantee we offer is that the application will not receive explicit
acknowledgement of the successful commit of a transaction until the
WAL data is known to be safely re

Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-11 Thread Jose Ildefonso Camargo Tolosa
On Wed, Jul 11, 2012 at 11:48 PM, Josh Berkus  wrote:
>
>> Please, stop arguing on all of this: I don't think that adding an
>> option will hurt anybody (especially because the work was already done
>> by someone), we are not asking to change how the things work, we just
>> want an option to decide whether we want it to freeze on standby
>> disconnection, or if we want it to continue automatically... is that
>> asking so much?
>
> The objection is that, *given the way synchronous replication currently
> works*, having that kind of an option would make the "synchronous"
> setting fairly meaningless.  The only benefit that synchronous
> replication gives you is the guarantee that a write on the master is
> also on the standby.  If you remove that guarantee, you are using
> asynchronous replication, even if the setting says synchronous.

I know how synchronous replication works: I have read about it several
times, I have seen it in real life, and I have seen it in virtual test
environments.  And no, it doesn't make synchronous replication
meaningless, because it will work synchronously if it has someone to
sync to, and work async (or standalone) if it doesn't: that's perfect
for an HA environment.

>
> I think what you really want is a separate "auto-degrade" setting.  That
> is, a setting which says "if no synchronous standby is present,
> auto-degrade to async/standalone, and start writing a bunch of warning
> messages to the logs and whenever anyone runs a synchronous
> transaction".  That's an approach which makes some sense, but AFAICT
> somewhat different from the proposed patch.

Certainly, it is different from the current patch; I believe the one I saw
had all of what you describe there, except the additional warning.

As synchronous standby currently is, it just doesn't fit HA usage,
and if you really want to keep it that way, it doesn't belong in the
HA chapter of the pgsql documentation and should be moved.  And NO,
async replication will *not* work for HA, because the master can have
more transactions than the standby, and if the master crashes, the
standby will have no way to recover those transactions.  With synchronous
replication we have *exactly* what we need: the data is on the standby,
and, after all, it will apply it once we promote it.

Ildefonso.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-11 Thread Jose Ildefonso Camargo Tolosa
Greetings,

On Wed, Jul 11, 2012 at 9:11 AM, Shaun Thomas  wrote:
> On 07/10/2012 06:02 PM, Daniel Farina wrote:
>
>> For example, what if DRBD can only complete one page per second for
>> some reason?  Does it simply have the primary wait at this glacial
>> pace, or drop synchronous replication and go degraded?  Or does it do
>> something more clever than just a timeout?
>
>
> That's a good question, and way beyond what I know about the internals. :)
> In practice though, there are configurable thresholds, and if exceeded, it
> will invalidate the secondary. When using Pacemaker, we've actually had
> instances where the 10G link we had between the servers died, so each node
> thought the other was down. That led to the secondary node self-promoting
> and trying to steal the VIP from the primary. Throw in a gratuitous arp, and
> you get a huge mess.

That's why Pacemaker *recommends* STONITH (Shoot The Other Node In The
Head).  Whenever the standby decides to promote itself, it would just
kill the former master (just in case)... the STONITH mechanism has to use
an independent connection.  Additionally, a redundant link between
cluster nodes is a must.
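
The ordering is the important part: fence first, promote second.  A
minimal sketch in Python, assuming two hypothetical callables,
fence_old_master and promote_standby (these are placeholders for
illustration, not Pacemaker APIs):

```python
from typing import Callable


def failover(fence_old_master: Callable[[], bool],
             promote_standby: Callable[[], None]) -> bool:
    """Promote the standby only after the old master is confirmed fenced.

    Both callables are hypothetical placeholders. In a real cluster the
    fencing call would go over a connection that is independent of the
    replication link.
    """
    if not fence_old_master():
        # Fencing could not be confirmed: promoting now risks split-brain.
        return False
    promote_standby()
    return True
```

If fencing cannot be confirmed, the sketch refuses to promote, which is
the conservative choice when both nodes might still be writing.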

>
> That led to what DRBD calls split-brain, because both nodes were running
> and writing to the block device. Thankfully, you can actually tell one node
> to discard its changes and re-subscribe. Doing that will replay the
> transactions from the "good" node on the "bad" one. And even then, it's a
> good idea to run an online verify to do a block-by-block checksum and
> correct any differences.
>
> Of course, all of that's only possible because it's a block-level
> replication. I can't even imagine PG doing anything like that. It would have
> to know the last good transaction from the primary and do an implied PIT
> recovery to reach that state, then re-attach for sync commits.
>
>
>> Regardless of what DRBD does, I think the problem with the
>> async/sync duality as-is is there is no nice way to manage exposure
>> to transaction loss under various situations and requirements.
>
>
> Which would be handy. With synchronous commits, it's given that the protocol
> is bi-directional. Then again, PG can detect when clients disconnect the
> instant they do so, and having such an event implicitly disable
> synchronous_standby_names until reconnect would be an easy fix. The database
> already keeps transaction logs, so replaying would still happen on
> re-attach. It could easily throw a warning for every sync-required commit so
> long as it's in "degraded" mode. Those alone are very small changes that
> don't really harm the intent of sync commit.
>
> That's basically what a RAID-1 does, and people have been fine with that for
> decades.
>
>

I can't believe how many times I have seen this topic arise on the
mailing list... I was about to start a thread like this myself!
(thanks Shaun!).

I don't really get what people want out of synchronous streaming
replication... DRBD (which is being used as a comparison) in protocol C
is synchronous (it won't confirm a write unless it was written to disk
on both nodes).  PostgreSQL (8.4, 9.0, 9.1, ...) will work just fine
with it, except that you don't have a standby that you can connect
to... also, you need to set up a dedicated volume for the DRBD block
device, set up DRBD, put the filesystem on top of DRBD, and handle
the DRBD promotion, the partition mount (with possible FS error handling),
and then starting PostgreSQL after the FS is correctly mounted.

With synchronous streaming replication you can have about the same:
the standby will have the changes written to disk before the master
confirms the commit.  I don't really care if the standby has already
applied the changes to its DB (although that would certainly be nice);
the point is: the data is on the standby, and if the master were to crash
and I were to "promote" the standby, the standby would have the same
committed data the master had before it crashed.
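
To make that concrete, here is a toy model in Python.  It is only a
sketch of the guarantee as I describe it above, not of PostgreSQL
internals; the Standby and Master classes and their method names are
made up for illustration:

```python
class Standby:
    """Toy standby: flushes received records, applies them on promotion."""

    def __init__(self):
        self.wal_on_disk = []  # received and flushed, maybe not yet applied
        self.applied = []

    def receive_and_flush(self, record):
        self.wal_on_disk.append(record)
        return True  # acknowledgement sent back to the master

    def promote(self):
        # On promotion, replay everything that was flushed so far.
        self.applied = list(self.wal_on_disk)
        return self.applied


class Master:
    """Toy master: confirms a commit only after the standby acknowledges."""

    def __init__(self, standby):
        self.standby = standby
        self.confirmed = []

    def commit(self, record):
        if self.standby.receive_and_flush(record):
            self.confirmed.append(record)
            return True
        return False
```

Every record in master.confirmed is already in standby.wal_on_disk, so
promote() returns at least everything the clients were told was
committed, even if nothing had been applied on the standby before.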

So, why are we HA people bothering you DB people so much?  To simplify
things: it is simpler to set up synchronous streaming replication
than to set up DRBD + Pacemaker rules to make it promote DRBD,
mount the FS, and then start pgsql.

Also, there is a great perk to synchronous replication with Hot
Standby: you have a read-only standby that can be used for some things
(even though it doesn't always have exactly the same data as the
master).

I mean, a lot of people here have a really valid point: 2-safe
reliability is great, but how good is it if, when you lose it, the WHOLE
system just freezes?  I mean, RAID1 gives you 2-safe reliability, but no
one would use it if the machine were to freeze when you lose 1 disk.
The same goes for DRBD: it offers 2-safe reliability too (at block
level), but it doesn't freeze if the secondary goes away!
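
The RAID1-like behaviour being asked for in this thread can be sketched
as a small state machine.  This is Python and purely illustrative: the
DegradableMaster class, the allow_degrade flag and the return values
are my own assumptions, not an existing PostgreSQL setting or patch:

```python
import logging

log = logging.getLogger("sync_rep_sketch")


class DegradableMaster:
    """Toy master that can run degraded while its sync standby is away."""

    def __init__(self, allow_degrade):
        self.allow_degrade = allow_degrade  # hypothetical option
        self.standby_connected = True
        self.backlog = []   # records the standby has not received yet
        self.warnings = 0

    def on_standby_disconnect(self):
        self.standby_connected = False

    def on_standby_reconnect(self):
        # The standby catches up from the backlog, then sync mode resumes.
        replayed = list(self.backlog)
        self.backlog.clear()
        self.standby_connected = True
        return replayed

    def commit(self, record):
        if self.standby_connected:
            return "sync"
        if not self.allow_degrade:
            return "blocked"  # current behaviour: wait for a standby
        self.backlog.append(record)
        self.warnings += 1
        log.warning("commit %r not replicated: running degraded", record)
        return "degraded"
```

With allow_degrade off the sketch keeps today's freeze-until-standby
behaviour, so the relaxed mode stays strictly opt-in.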

Now, I see some people who are arguing because, apparently,
synchronous replication is not an HA feature (those who say that SR
doesn't fit the HA environment)... please, those peop