Re: [HACKERS] Standalone synchronous master

2014-01-27 Thread Josh Berkus
On 01/26/2014 07:56 PM, Rajeev rastogi wrote:
> I shall rework the patch to improve it. Below is a summary of all the
> discussions, which will be used as input for improving the patch:
> 
> 1. Method of degrading the synchronous mode:
>   a. Expose the configuration variable through new SQL-callable functions.
>   b. Use ALTER SYSTEM SET.
>   c. Auto-degrade using some sort of configuration parameter, as done in
> the current patch.
>   d. Or some combination of the above, which DBAs can use depending on
> their use cases.
> 
>   We can discuss further to decide on one of these approaches.
> 
> 2. Synchronous mode should be restored after at least one synchronous
> standby comes up and has caught up with the master.
> 
> 3. Better monitoring/administration interfaces, which could be even better
> if implemented as a generic trap system.
> 
>   I shall propose a better approach for this.
> 
> 4. Send committing clients a WARNING if they have committed a synchronous
> transaction and we are in degraded mode.
> 
> 5. Please add more if I am missing something.

I think we actually need two degrade modes:

A. degrade once: if the sync standby connection is ever lost, degrade
and do not resync.

B. reconnect: if the sync standby catches up again, return it to sync
status.

The reason you'd want "degrade once" is to avoid the "flaky network"
issue where you're constantly degrading then reattaching the sync
standby, resulting in horrible performance.

If we did offer "degrade once" though, we'd need some easy way to
determine that the master was in a state of permanent degrade, and a
command to make it resync.
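For illustration, the two modes could be prototyped in an external monitoring
tool as a small state machine (a Python sketch under stated assumptions: the
class, the mode names, and the connected/caught-up probes are hypothetical,
not an existing API):

```python
# Toy state machine for the two proposed degrade modes.
# "degrade_once": once degraded, stay degraded until an operator resyncs.
# "reconnect":    return to sync as soon as the standby has caught up.

class SyncRepMonitor:
    def __init__(self, mode):
        assert mode in ("degrade_once", "reconnect")
        self.mode = mode
        self.state = "sync"          # "sync" or "degraded"

    def poll(self, standby_connected, standby_caught_up):
        if self.state == "sync" and not standby_connected:
            self.state = "degraded"  # would e.g. issue ALTER SYSTEM SET here
        elif self.state == "degraded" and self.mode == "reconnect":
            if standby_connected and standby_caught_up:
                self.state = "sync"  # standby caught up; re-enable sync rep
        return self.state

    def force_resync(self):
        # explicit operator command, required in "degrade_once" mode
        self.state = "sync"
```

In "degrade_once" mode the only way back to sync status is the explicit
resync command, which makes the degraded state sticky and operator-visible,
addressing the "permanent degrade" detection problem above.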

Discuss?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-27 Thread Robert Haas
On Sun, Jan 26, 2014 at 10:56 PM, Rajeev rastogi
 wrote:
> On 01/25/2014, Josh Berkus wrote:
>> > ISTM the consensus is that we need better monitoring/administration
>> > interfaces so that people can script the behavior they want in
>> > external tools. Also, a new synchronous apply replication mode would
>> > be handy, but that'd be a whole different patch. We don't have a
>> patch
>> > on the table that we could consider committing any time soon, so I'm
>> > going to mark this as rejected in the commitfest app.
>>
>> I don't feel that "we'll never do auto-degrade" is determinative;
>> several hackers were for auto-degrade, and they have a good use-case
>> argument.  However, we do have consensus that we need more scaffolding
>> than this patch supplies in order to make auto-degrade *safe*.
>>
>> I encourage the submitter to resubmit an improved version of this
>> patch (one with more monitorability) for 9.5 CF1.  That'll give us a
>> whole dev cycle to argue about it.
>
> I shall rework the patch to improve it. Below is a summary of all the
> discussions, which will be used as input for improving the patch:
>
> 1. Method of degrading the synchronous mode:
> a. Expose the configuration variable through new SQL-callable functions.
> b. Use ALTER SYSTEM SET.
> c. Auto-degrade using some sort of configuration parameter, as done in
> the current patch.
> d. Or some combination of the above, which DBAs can use depending on
> their use cases.
>
>   We can discuss further to decide on one of these approaches.
>
> 2. Synchronous mode should be restored after at least one synchronous
> standby comes up and has caught up with the master.
>
> 3. Better monitoring/administration interfaces, which could be even better
> if implemented as a generic trap system.
>
>   I shall propose a better approach for this.
>
> 4. Send committing clients a WARNING if they have committed a synchronous
> transaction and we are in degraded mode.
>
> 5. Please add more if I am missing something.

All of those things have been mentioned, but I'm not sure we have
consensus on which of them we actually want to do, or how.  Figuring
that out seems like the next step.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Standalone synchronous master

2014-01-26 Thread Rajeev rastogi
On 01/25/2014, Josh Berkus wrote:
> > ISTM the consensus is that we need better monitoring/administration
> > interfaces so that people can script the behavior they want in
> > external tools. Also, a new synchronous apply replication mode would
> > be handy, but that'd be a whole different patch. We don't have a
> patch
> > on the table that we could consider committing any time soon, so I'm
> > going to mark this as rejected in the commitfest app.
> 
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument.  However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
> 
> I encourage the submitter to resubmit an improved version of this
> patch (one with more monitorability) for 9.5 CF1.  That'll give us a
> whole dev cycle to argue about it.

I shall rework the patch to improve it. Below is a summary of all the
discussions, which will be used as input for improving the patch:

1. Method of degrading the synchronous mode:
a. Expose the configuration variable through new SQL-callable functions.
b. Use ALTER SYSTEM SET.
c. Auto-degrade using some sort of configuration parameter, as done in the
current patch.
d. Or some combination of the above, which DBAs can use depending on their
use cases.

  We can discuss further to decide on one of these approaches.

2. Synchronous mode should be restored after at least one synchronous
standby comes up and has caught up with the master.

3. Better monitoring/administration interfaces, which could be even better
if implemented as a generic trap system.

  I shall propose a better approach for this.

4. Send committing clients a WARNING if they have committed a synchronous
transaction and we are in degraded mode.

5. Please add more if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi
 



Re: [HACKERS] Standalone synchronous master

2014-01-26 Thread Hannu Krosing
On 01/24/2014 10:29 PM, Josh Berkus wrote:
> On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
>> ISTM the consensus is that we need better monitoring/administration
>> interfaces so that people can script the behavior they want in external
>> tools. Also, a new synchronous apply replication mode would be handy,
>> but that'd be a whole different patch. We don't have a patch on the
>> table that we could consider committing any time soon, so I'm going to
>> mark this as rejected in the commitfest app.
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument.  
Auto-degrade may make sense together with the synchronous apply mode
mentioned by Heikki.

I do not see much use for synchronous-(noapply)-if-you-can mode,
though it may make some sense in some scenarios if sync failure
is accompanied by loud screaming ("hey DBA, we are writing checks
with no money in the bank, do something fast!")

Perhaps some kind of sync-with-timeout mode, where timing out
results in a "weak error" (something between the current
warning and error) returned to the client, and/or where it causes an
external command to be run, which could then be used to flood the
admin's mailbox :)
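That sync-with-timeout idea could be sketched roughly like this (Python,
purely illustrative; the outcome strings and the alert hook are assumptions
standing in for whatever a real server-side implementation would return and
execute):

```python
import time

def wait_for_sync_ack(ack_received, timeout_s, alert_cmd=None):
    """Wait for a standby ack; on timeout, degrade the outcome to a
    'weak error' (between WARNING and ERROR) and fire an alert hook."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ack_received():
            return "committed-sync"
        time.sleep(0.01)             # poll interval, arbitrary here
    if alert_cmd is not None:
        alert_cmd()                  # e.g. notify/flood the admin's mailbox
    return "committed-local-only"    # the 'weak error' seen by the client
```

Either way the transaction has committed locally; the return value only
tells the client how durable the commit is, which is the essence of the
"weak error" proposal.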
> However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this patch
> (one with more monitorability) for 9.5 CF1.  That'll give us a whole
> dev cycle to argue about it.
>

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-24 Thread Florian Pflug
On Jan24, 2014, at 22:29 , Josh Berkus  wrote:
> On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
>> ISTM the consensus is that we need better monitoring/administration
>> interfaces so that people can script the behavior they want in external
>> tools. Also, a new synchronous apply replication mode would be handy,
>> but that'd be a whole different patch. We don't have a patch on the
>> table that we could consider committing any time soon, so I'm going to
>> mark this as rejected in the commitfest app.
> 
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument.  However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
> 
> I encourage the submitter to resubmit an improved version of this patch
> (one with more monitorability) for 9.5 CF1.  That'll give us a whole
> dev cycle to argue about it.

There seemed to be at least some support for having a way to manually
degrade from sync rep to async rep via something like

  ALTER SYSTEM SET synchronous_commit='local';

Doing that seems unlikely to meet much resistance on grounds of principle,
so it seems to me that working on that would be the best way forward for
the submitter. I don't know how hard it would be to pull this off,
though.

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-24 Thread Josh Berkus
On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
> ISTM the consensus is that we need better monitoring/administration
> interfaces so that people can script the behavior they want in external
> tools. Also, a new synchronous apply replication mode would be handy,
> but that'd be a whole different patch. We don't have a patch on the
> table that we could consider committing any time soon, so I'm going to
> mark this as rejected in the commitfest app.

I don't feel that "we'll never do auto-degrade" is determinative;
several hackers were for auto-degrade, and they have a good use-case
argument.  However, we do have consensus that we need more scaffolding
than this patch supplies in order to make auto-degrade *safe*.

I encourage the submitter to resubmit an improved version of this patch
(one with more monitorability) for 9.5 CF1.  That'll give us a whole
dev cycle to argue about it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-24 Thread Heikki Linnakangas
ISTM the consensus is that we need better monitoring/administration 
interfaces so that people can script the behavior they want in external 
tools. Also, a new synchronous apply replication mode would be handy, 
but that'd be a whole different patch. We don't have a patch on the 
table that we could consider committing any time soon, so I'm going to 
mark this as rejected in the commitfest app.


- Heikki




Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Florian Pflug
On Jan13, 2014, at 22:30 , "Joshua D. Drake"  wrote:
> On 01/13/2014 01:14 PM, Jim Nasby wrote:
>> 
>> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>>> 
>>> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
>> place.  Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
> +1
> 
> This is also how 2PC works, btw - the database provides the building
> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> to deal with issues that require a whole-cluster perspective.
> 
 
 ++1
>>> 
>>> +1
>> 
>> Josh, what do you think of the upthread idea of being able to recover
>> in-progress transactions that are waiting when we turn off sync rep? I'm
>> thinking that would be a very good feature to have... and it's not
>> something you can easily do externally.
> 
> I think it is extremely valuable, else we have lost those transactions which
> is exactly what we don't want.

We *have* to "recover" waiting transactions upon switching off sync rep.

A transaction that waits for a sync standby to respond has already committed
locally (i.e., updated the clog); it just hasn't updated the proc array yet,
and thus is still seen as in-progress by the rest of the system. But rolling
back the transaction is nevertheless *impossible* at that point (except by
PITR, hence the quotes around "recover"). So the only alternative to
"recovering" them, i.e. having them abort their waiting, is to let them linger
indefinitely, still holding their locks, preventing xmin from advancing, etc.,
until either the client disconnects or the server is restarted.

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Joshua D. Drake


On 01/13/2014 01:14 PM, Jim Nasby wrote:


On 1/13/14, 12:21 PM, Joshua D. Drake wrote:


On 01/13/2014 10:12 AM, Hannu Krosing wrote:

In other words, if we're going to have auto-degrade, the most
intelligent place for it is in
RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
place.  Anything we do *inside* Postgres is going to have a really,
really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building
blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
to deal with issues that require a whole-cluster perspective.



++1


+1


Josh, what do you think of the upthread idea of being able to recover
in-progress transactions that are waiting when we turn off sync rep? I'm
thinking that would be a very good feature to have... and it's not
something you can easily do externally.


I think it is extremely valuable, else we have lost those transactions 
which is exactly what we don't want.


JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Andres Freund
On 2014-01-13 15:14:21 -0600, Jim Nasby wrote:
> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
> >
> >On 01/13/2014 10:12 AM, Hannu Krosing wrote:
> In other words, if we're going to have auto-degrade, the most
> intelligent place for it is in
> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
> place.  Anything we do *inside* Postgres is going to have a really,
> really hard time determining when to degrade.
> >>>+1
> >>>
> >>>This is also how 2PC works, btw - the database provides the building
> >>>blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> >>>to deal with issues that require a whole-cluster perspective.
> >>>
> >>
> >>++1
> >
> >+1
> 
> Josh, what do you think of the upthread idea of being able to recover 
> in-progress transactions that are waiting when we turn off sync rep? I'm 
> thinking that would be a very good feature to have... and it's not something 
> you can easily do externally.

I think it'd be a fairly simple patch to re-check the state of syncrep
config in SyncRepWaitForLsn(). Alternatively you can just write code to
iterate over the procarray and sets Proc->syncRepState to
SYNC_REP_WAIT_CANCELLED or such.
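As a toy model of the second approach (Python standing in for the backend's C;
the dict-based "proc array" and the state names only mirror the idea, they are
not the real data structures):

```python
# Toy model of cancelling all backends waiting for sync rep, roughly
# analogous to walking the proc array and setting each waiter's
# syncRepState to a CANCELLED value so its wait loop returns.

SYNC_REP_WAITING = "waiting"
SYNC_REP_WAIT_CANCELLED = "cancelled"

def cancel_sync_rep_waiters(proc_array):
    """Mark every waiting backend as cancelled; return how many were woken."""
    cancelled = 0
    for proc in proc_array:
        if proc["syncRepState"] == SYNC_REP_WAITING:
            proc["syncRepState"] = SYNC_REP_WAIT_CANCELLED
            cancelled += 1
    return cancelled
```

The real implementation would additionally need the usual locking around the
shared state and a wakeup of each cancelled backend.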

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Jim Nasby

On 1/13/14, 12:21 PM, Joshua D. Drake wrote:


On 01/13/2014 10:12 AM, Hannu Krosing wrote:

In other words, if we're going to have auto-degrade, the most
intelligent place for it is in
RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
place.  Anything we do *inside* Postgres is going to have a really,
really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building
blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
to deal with issues that require a whole-cluster perspective.



++1


+1


Josh, what do you think of the upthread idea of being able to recover 
in-progress transactions that are waiting when we turn off sync rep? I'm 
thinking that would be a very good feature to have... and it's not something 
you can easily do externally.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Joshua D. Drake


On 01/13/2014 10:12 AM, Hannu Krosing wrote:

In other words, if we're going to have auto-degrade, the most
intelligent place for it is in
RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
place.  Anything we do *inside* Postgres is going to have a really,
really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building
blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
to deal with issues that require a whole-cluster perspective.



++1


+1



I like Simon's idea to have a pg_xxx function for switching between
replication modes, which should be enough to support a monitoring
daemon doing the switching.

Maybe we could have a 'syncrep_taking_too_long_command' GUC
which could be used to alert such a monitoring daemon, so it can
immediately check whether to



I would think that would be a column in pg_stat_replication. Basically 
last_ack or something like that.




a) switch master to async rep or standalone mode (in case of sync slave
becoming unavailable)


Yep.



or

b) to failover to slave (in almost equally likely case that it was the
master
which became disconnected from the world and slave is available)

or


I think this should be left to external tools.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Hannu Krosing
On 01/13/2014 04:12 PM, Florian Pflug wrote:
> On Jan12, 2014, at 04:18 , Josh Berkus  wrote:
>> Thing is, when we talk about auto-degrade, we need to determine things
>> like "Is the replica down or is this just a network blip"? and take
>> action according to the user's desired configuration.  This is not
>> something, realistically, that we can do on a single request.  Whereas
>> it would be fairly simple for an external monitoring utility to do:
>>
>> 1. decide replica is offline for the duration (several poll attempts
>> have failed)
>>
>> 2. Send ALTER SYSTEM SET to the master and change/disable the
>> synch_replicas.
>>
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
>> place.  Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
> +1
>
> This is also how 2PC works, btw - the database provides the building
> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> to deal with issues that require a whole-cluster perspective.
>

++1

I like Simon's idea to have a pg_xxx function for switching between
replication modes, which should be enough to support a monitoring
daemon doing the switching.

Maybe we could have a 'syncrep_taking_too_long_command' GUC
which could be used to alert such a monitoring daemon, so it can
immediately check whether to

a) switch master to async rep or standalone mode (in case of sync slave
becoming unavailable)

or

b) to failover to slave (in almost equally likely case that it was the
master
which became disconnected from the world and slave is available)

or

c) do something else depending on circumstances/policy :)


NB! Note that in case of b), 'syncrep_taking_too_long_command' will
very likely also not reach the monitor daemon, so it cannot rely on
this as the main trigger!

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Florian Pflug
On Jan12, 2014, at 04:18 , Josh Berkus  wrote:
> Thing is, when we talk about auto-degrade, we need to determine things
> like "Is the replica down or is this just a network blip"? and take
> action according to the user's desired configuration.  This is not
> something, realistically, that we can do on a single request.  Whereas
> it would be fairly simple for an external monitoring utility to do:
> 
> 1. decide replica is offline for the duration (several poll attempts
> have failed)
> 
> 2. Send ALTER SYSTEM SET to the master and change/disable the
> synch_replicas.
> 
> In other words, if we're going to have auto-degrade, the most
> intelligent place for it is in
> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
> place.  Anything we do *inside* Postgres is going to have a really,
> really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building
blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
to deal with issues that require a whole-cluster perspective.

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Rajeev rastogi
 
> On Sun, Jan 12, Amit Kapila wrote:
> >> How would that work?  Would it be a tool in contrib?  There already
> >> is a timeout, so if a tool checked more frequently than the timeout,
> >> it should work.  The durable notification of the admin would happen
> >> in the tool, right?
> >
> > Well, you know what tool *I'm* planning to use.
> >
> > Thing is, when we talk about auto-degrade, we need to determine
> things
> > like "Is the replica down or is this just a network blip"? and take
> > action according to the user's desired configuration.  This is not
> > something, realistically, that we can do on a single request.
> Whereas
> > it would be fairly simple for an external monitoring utility to do:
> >
> > 1. decide replica is offline for the duration (several poll attempts
> > have failed)
> >
> > 2. Send ALTER SYSTEM SET to the master and change/disable the
> > synch_replicas.
> 
>Will it be possible with the current mechanism, because presently the
>master will not accept any new command when the sync replica is not
>available? Or is there something else which needs to be done along with
>the above 2 points to make it possible?

Since no WAL is written for the ALTER SYSTEM SET command, the master
should be able to handle this command even though the sync replica is
not available.

Thanks and Regards,
Kumar Rajeev Rastogi




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Rajeev rastogi

On 13th January 2014, Josh Berkus wrote:

> I'm leading this off with a review of the features offered by the
> actual patch submitted.  My general discussion of the issues of Sync
> Degrade, which justifies my specific suggestions below, follows that.
> Rajeev, please be aware that other hackers may have different opinions
> than me on what needs to change about the patch, so you should collect
> all opinions before changing code.

Thanks for reviewing and providing the first round of comments. We will
surely collect all feedback to improve this patch.

> 
> > Add a new parameter :
> 
> > synchronous_standalone_master = on | off
> 
> I think this is a TERRIBLE name for any such parameter.  What does
> "synchronous standalone" even mean?  A better name for the parameter
> would be "auto_degrade_sync_replication" or "synchronous_timeout_action
> = error | degrade", or something similar.  It would be even better for
> this to be a mode of synchronous_commit, except that synchronous_commit
> is heavily overloaded already.

Yes, we can change this parameter name. Some of the suggestions for how to
degrade the mode:
1. Auto-degrade using some sort of configuration parameter, as done in the
current patch.
2. Expose the configuration variable through new SQL-callable functions, as
suggested by Heikki.
3. Or use ALTER SYSTEM SET, as suggested by others.

> Some issues raised by this log script:
> 
> LOG:  standby "tx0113" is now the synchronous standby with priority 1
> LOG:  waiting for standby synchronization
>   <-- standby wal receiver on the standby is killed (SIGKILL)
> LOG:  unexpected EOF on standby connection
> LOG:  not waiting for standby synchronization
>   <-- restart standby so that it connects again
> LOG:  standby "tx0113" is now the synchronous standby with priority 1
> LOG:  waiting for standby synchronization
>   <-- standby wal receiver is first stopped (SIGSTOP) to make sure
> 
> The "not waiting for standby synchronization" message should be marked
> something stronger than LOG.  I'd like ERROR.

Yes we can change this to ERROR.

> Second, you have the master resuming sync rep when the standby
> reconnects.  How do you determine when it's safe to do that?  You're
> making the assumption that you have a failing sync standby instead of
> one which simply can't keep up with the master, or a flakey network
> connection (see discussion below).

Yes, this can be further improved so that the master is upgraded to
synchronous mode (by one of the methods discussed above) only once we have
made sure that the synchronous standby has caught up with the master node;
this may require a better design.

> > a.   Master_to_standalone_cmd: To be executed before master
> switches to standalone mode.
> >
> > b.  Master_to_sync_cmd: To be executed before master switches
> from
> sync mode to standalone mode.
> 
> I'm not at all clear what the difference between these two commands is.
>  When would one be excuted, and when would the other be executed?  Also,
> renaming ...

There was a typo in the explanation above; the meanings of the two commands are:
a. Master_to_standalone_cmd: To be executed during degradation of sync mode.

b. Master_to_sync_cmd: To be executed before upgrade or restoration of sync mode.

These two commands are per the TODO item to inform the DBA.

But as per Heikki's suggestion, we should not use this mechanism to inform
the DBA; rather, we should have some sort of generic trap system instead of
adding this one particular extra config option specifically for this feature.
This looks to be the better idea, so we can have further discussion to come
up with a proper design.


> Missing features:
> 
> a) we should at least send committing clients a WARNING if they have
> commited a synchronous transaction and we are in degraded mode.

Yes, it is a great idea.

> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are
> thinking
> *only* of the case that the standby is completely down.  There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case.  For example:
> 
> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
> 
> You don't want to handle all of those issues the same way as far as
> sync rep is concerned.  For example, if the standby is restaring, you
> probably want to wait instead of degrading.

I think if we support some external SQL-callable functions, as Heikki
suggested, to degrade instead of auto-degrading, then users can handle at
least some of the above scenarios, if not all, based on their experience and
observation.


Thanks and Regards,
Kumar Rajeev Rastogi




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Amit Kapila
> On 01/11/2014 08:52 PM, Amit Kapila wrote:
>> It is better than async mode in a way such that in async mode it never
>> waits for commits to be written to standby, but in this new mode it will
>> do so unless it is not possible (all sync standby's goes down).
>> Can't we use existing wal_sender_timeout, or even if user expects a
>> different timeout because for this new mode, he expects master to wait
>> more before it start operating like standalone sync master, we can provide
>> a new parameter.
>
> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are thinking
> *only* of the case that the standby is completely down.  There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case.  For example:
>
> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
>
> You don't want to handle all of those issues the same way as far as sync
> rep is concerned.  For example, if the standby is restaring, you
> probably want to wait instead of degrading.

   I think it might be difficult to differentiate the cases except maybe
   by having a separate timeout for this mode, so that it can wait more
   when the server runs in this mode. OTOH, why can't we define this new
   mode such that it will behave the same for all cases; basically we can tell
   that whenever the sync standby is not available (n/w issue or m/c down), it will
   behave as a master in async mode.
   Here I think the important point would be to gracefully allow resuming
   the sync standby when it tries to reconnect (we can allow it to reconnect
   if it can resolve all WAL differences).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
> Well, then that becomes a reason to want better/more configurability.

I agree with this- the challenge is figuring out what those options
should be and how we should document them.

> In the couple of sync rep sites I admin, I *would* want to wait.

That's certainly an interesting data point.  One of the specific
use-cases that I'm thinking of is to auto-degrade on a graceful shutdown
of the slave for upgrades and/or maintenance.  Perhaps we don't need
*auto* degrade in that case, but then an actual failure of the slave
will also bring down the master.

> > I don't follow this logic at all- why is there no safe way to resume?
> > You wait til the slave is caught up fully and then go back to sync mode.
> > If that turns out to be an extended problem then an alarm needs to be
> > raised, of course.
> 
> So, if you have auto-resume, how do you handle the "flaky network" case?
>  And how would an alarm be raised?

Ideally, every time there is an auto-degrade, messages are logged to log
files which are monitored, and notices are sent to admins, who, upon getting
repeated such emails, would realize there's a problem and work to fix it.
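A minimal sketch of the log-watching alert described above; the message text
matches the "not waiting for standby synchronization" LOG line quoted later
in this thread, but the function name and threshold are purely hypothetical:

```python
import re

def should_alert(log_lines, threshold=3):
    """Page an admin once auto-degrade messages repeat often enough
    to look like a real problem rather than a one-off blip."""
    pattern = re.compile(r"not waiting for standby synchronization")
    hits = sum(1 for line in log_lines if pattern.search(line))
    return hits >= threshold

log = [
    'LOG:  standby "tx0113" is now the synchronous standby with priority 1',
    "LOG:  not waiting for standby synchronization",
] * 3
assert should_alert(log) is True       # repeated degrades -> alert
assert should_alert(log[:2]) is False  # a single degrade -> no alert yet
```

The threshold is the whole point: a single degrade message may be a blip, but
repeated ones indicate the flaky-network case discussed elsewhere in the thread.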

> On 01/12/2014 12:51 PM, Kevin Grittner wrote:
> > Josh Berkus  wrote:
> >> I know others have dismissed this idea as too "talky", but from my
> >> perspective, the agreement with the client for each synchronous
> >> commit is being violated, so each and every synchronous commit
> >> should report failure to sync.  Also, having a warning on every
> >> commit would make it easier to troubleshoot degraded mode for users
> >> who have ignored the other warnings we give them.
> >
> > I agree that every synchronous commit on a master which is configured
> > for synchronous replication which returns without persisting the work
> > of the transaction on both the (local) primary and a synchronous
> > replica should issue a WARNING.  That said, the API for some
> > connectors (like JDBC) puts the burden on the application or its
> > framework to check for warnings each time and do something reasonable
> > if found; I fear that a Venn diagram of those shops which would use
> > this new feature and those shops that don't rigorously look for and
> > reasonably deal with warnings would have significant overlap.
> 
> Oh, no question.  However, having such a WARNING would help with
> interactive troubleshooting once a problem has been identified, and
> that's my main reason for wanting it.

I'm in the camp of this being too 'talky'.

> Imagine the case where you have auto-degrade and a flaky network.  The
> user would experience problems as performance problems; that is, some
> commits take minutes on-again, off-again.  They wouldn't necessarily
> even LOOK at the sync rep settings.  So next step is to try walking
> through a sample transaction on the command line, and then the
> DBA/consultant gets WARNING messages, which gives an idea where the real
> problem lies.

Or they look in the logs which hopefully say that their slave keeps
getting disconnected...

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Josh Berkus
On 01/12/2014 12:35 PM, Stephen Frost wrote:
> * Josh Berkus (j...@agliodbs.com) wrote:
>> You don't want to handle all of those issues the same way as far as sync
>> rep is concerned.  For example, if the standby is restarting, you
>> probably want to wait instead of degrading.
> 
> *What*?!  Certainly not in any kind of OLTP-type system; a system
> restart can easily take minutes.  Clearly, you want to resume once the
> standby is back up, which I feel like the people against an auto-degrade
> mode are missing, but holding up a commit until the standby finishes
> rebooting isn't practical.

Well, then that becomes a reason to want better/more configurability.
In the couple of sync rep sites I admin, I *would* want to wait.

>> There's also the issue that this patch, and necessarily any
>> walsender-level auto-degrade, has IMHO no safe way to resume sync
>> replication.  This means that any user who has a network or storage blip
>> once a day (again, think AWS) would be constantly in degraded mode, even
>> though both the master and the replica are up and running -- and it will
>> come as a complete surprise to them when they lose the master and
>> discover that they've lost data.
> 
> I don't follow this logic at all- why is there no safe way to resume?
> You wait til the slave is caught up fully and then go back to sync mode.
> If that turns out to be an extended problem then an alarm needs to be
> raised, of course.

So, if you have auto-resume, how do you handle the "flaky network" case?
 And how would an alarm be raised?

On 01/12/2014 12:51 PM, Kevin Grittner wrote:
> Josh Berkus  wrote:
>> I know others have dismissed this idea as too "talky", but from my
>> perspective, the agreement with the client for each synchronous
>> commit is being violated, so each and every synchronous commit
>> should report failure to sync.  Also, having a warning on every
>> commit would make it easier to troubleshoot degraded mode for users
>> who have ignored the other warnings we give them.
>
> I agree that every synchronous commit on a master which is configured
> for synchronous replication which returns without persisting the work
> of the transaction on both the (local) primary and a synchronous
> replica should issue a WARNING.  That said, the API for some
> connectors (like JDBC) puts the burden on the application or its
> framework to check for warnings each time and do something reasonable
> if found; I fear that a Venn diagram of those shops which would use
> this new feature and those shops that don't rigorously look for and
> reasonably deal with warnings would have significant overlap.

Oh, no question.  However, having such a WARNING would help with
interactive troubleshooting once a problem has been identified, and
that's my main reason for wanting it.

Imagine the case where you have auto-degrade and a flaky network.  The
user would experience problems as performance problems; that is, some
commits take minutes on-again, off-again.  They wouldn't necessarily
even LOOK at the sync rep settings.  So next step is to try walking
through a sample transaction on the command line, and then the
DBA/consultant gets WARNING messages, which gives an idea where the real
problem lies.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Kevin Grittner
Josh Berkus  wrote:

>>  Add a new parameter :
>>
>>  synchronous_standalone_master = on | off
> I think this is a TERRIBLE name for any such parameter.  What does
> "synchronous standalone" even mean?  A better name for the parameter
> would be "auto_degrade_sync_replication" or "synchronous_timeout_action
> = error | degrade", or something similar.  It would be even better for
> this to be a mode of synchronous_commit, except that synchronous_commit
> is heavily overloaded already.

+1

> a) we should at least send committing clients a WARNING if they have
> committed a synchronous transaction and we are in degraded mode.
> 
> I know others have dismissed this idea as too "talky", but from my
> perspective, the agreement with the client for each synchronous commit
> is being violated, so each and every synchronous commit should report
> failure to sync.  Also, having a warning on every commit would make it
> easier to troubleshoot degraded mode for users who have ignored the
> other warnings we give them.

I agree that every synchronous commit on a master which is configured for 
synchronous replication which returns without persisting the work of the 
transaction on both the (local) primary and a synchronous replica should issue 
a WARNING.  That said, the API for some connectors (like JDBC) puts the burden 
on the application or its framework to check for warnings each time and do 
something reasonable if found; I fear that a Venn diagram of those shops which 
would use this new feature and those shops that don't rigorously look for and 
reasonably deal with warnings would have significant overlap.

> b) pg_stat_replication needs to show degraded mode in some way, or we
> need pg_sync_rep_degraded(), or (ideally) both.

+1

Since this new feature, where enabled, would cause synchronous replication to 
provide no guarantees beyond what asynchronous replication does[1], but would 
tend to cause people to have an *expectation* that they have some additional 
protection, I think proper documentation will be a big challenge.


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]  If I understand correctly, this is what the feature is intended to provide:
- A transaction successfully committed on the primary is guaranteed to be 
visible on the replica?  No, in all modes.
- A transaction successfully committed on the primary is guaranteed *not* to be 
visible on the replica?  No, in all modes.
- The work of a transaction which has not returned from a commit request may 
be visible on the primary and/or the standby?  Yes in all modes.
- A failure of the primary is guaranteed not to lose successfully committed 
transactions when failing over to the replica?  Yes for sync rep without this 
feature, no for async or when this feature is used.  If things are going well 
up to the moment of primary failure, the feature improves the odds (versus 
async) that successfully committed transactions will not be lost, or may reduce 
the number of successfully committed transactions lost.
- A failure of the replica allows transactions on the primary to continue?  
Read-only for sync rep without this feature if the last sync standby has 
failed; read-only for some interval and then read-write with this feature, or 
unaffected if there is still another working sync rep target; all transactions 
continue without interruption with async.




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
> On 01/11/2014 08:52 PM, Amit Kapila wrote:
> > It is better than async mode in a way such that in async mode it never
> > waits for commits to be written to standby, but in this new mode it will
> > do so unless it is not possible (all sync standbys go down).
> > Can't we use existing wal_sender_timeout, or even if user expects a
> > different timeout because for this new mode, he expects master to wait
> > more before it starts operating like standalone sync master, we can provide
> > a new parameter.
> 
> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are thinking
> *only* of the case that the standby is completely down.  There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case.  For example:

Uhh, yea, no, I'm pretty sure those in favor of auto-degrade are very
specifically thinking of cases like "Standby is restarting", which is
not a reason for the master to fall over.

> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
> 
> You don't want to handle all of those issues the same way as far as sync
> rep is concerned.  For example, if the standby is restarting, you
> probably want to wait instead of degrading.

*What*?!  Certainly not in any kind of OLTP-type system; a system
restart can easily take minutes.  Clearly, you want to resume once the
standby is back up, which I feel like the people against an auto-degrade
mode are missing, but holding up a commit until the standby finishes
rebooting isn't practical.

> There's also the issue that this patch, and necessarily any
> walsender-level auto-degrade, has IMHO no safe way to resume sync
> replication.  This means that any user who has a network or storage blip
> once a day (again, think AWS) would be constantly in degraded mode, even
> though both the master and the replica are up and running -- and it will
> come as a complete surprise to them when they lose the master and
> discover that they've lost data.

I don't follow this logic at all- why is there no safe way to resume?
You wait til the slave is caught up fully and then go back to sync mode.
If that turns out to be an extended problem then an alarm needs to be
raised, of course.

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Josh Berkus
All,

I'm leading this off with a review of the features offered by the actual
patch submitted.  My general discussion of the issues of Sync Degrade,
which justifies my specific suggestions below, follows that.  Rajeev,
please be aware that other hackers may have different opinions than me
on what needs to change about the patch, so you should collect all
opinions before changing code.

===

> Add a new parameter :

> synchronous_standalone_master = on | off

I think this is a TERRIBLE name for any such parameter.  What does
"synchronous standalone" even mean?  A better name for the parameter
would be "auto_degrade_sync_replication" or "synchronous_timeout_action
= error | degrade", or something similar.  It would be even better for
this to be a mode of synchronous_commit, except that synchronous_commit
is heavily overloaded already.

Some issues raised by this log script:

LOG:  standby "tx0113" is now the synchronous standby with priority 1
LOG:  waiting for standby synchronization
  <-- standby wal receiver on the standby is killed (SIGKILL)
LOG:  unexpected EOF on standby connection
LOG:  not waiting for standby synchronization
  <-- restart standby so that it connects again
LOG:  standby "tx0113" is now the synchronous standby with priority 1
LOG:  waiting for standby synchronization
  <-- standby wal receiver is first stopped (SIGSTOP) to make sure

The "not waiting for standby synchronization" message should be marked
something stronger than LOG.  I'd like ERROR.

Second, you have the master resuming sync rep when the standby
reconnects.  How do you determine when it's safe to do that?  You're
making the assumption that you have a failing sync standby instead of
one which simply can't keep up with the master, or a flakey network
connection (see discussion below).

> a.   Master_to_standalone_cmd: To be executed before master
> switches to standalone mode.
>
> b.  Master_to_sync_cmd: To be executed before master switches from
> sync mode to standalone mode.

I'm not at all clear what the difference between these two commands is.
 When would one be executed, and when would the other be executed?  Also,
renaming ...

Missing features:

a) we should at least send committing clients a WARNING if they have
committed a synchronous transaction and we are in degraded mode.

I know others have dismissed this idea as too "talky", but from my
perspective, the agreement with the client for each synchronous commit
is being violated, so each and every synchronous commit should report
failure to sync.  Also, having a warning on every commit would make it
easier to troubleshoot degraded mode for users who have ignored the
other warnings we give them.

b) pg_stat_replication needs to show degraded mode in some way, or we
need pg_sync_rep_degraded(), or (ideally) both.

I'm also wondering if we need a more sophisticated approach to
wal_sender_timeout to go with all this.

===

On 01/11/2014 08:33 PM, Bruce Momjian wrote:
> On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
>> place.  Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
> 
> Well, one goal I was considering is that if a commit is hung waiting for
> slave sync confirmation, and the timeout happens, then the mode is
> changed to degraded and the commit returns success.  I am not sure how
> you would do that in an external tool, meaning there is going to be
> a period where commits fail, unless you think there is a way that when the
> external tool changes the mode to degrade that all hung commits
> complete.  That would be nice.

Realistically, though, that's pretty unavoidable.  Any technique which
waits a reasonable interval to determine that the replica isn't going to
respond is liable to go beyond the application's timeout threshold
anyway.  There are undoubtedly exceptions to that, but it will be the
case a lot of the time -- how many applications are willing to wait
*minutes* for a COMMIT?

I also don't see any way to allow the hung transactions to commit
without allowing the walsender to make a decision on degrading.  As I've
outlined elsewhere (and below), the walsender just doesn't have enough
information to make a good decision.

On 01/11/2014 08:52 PM, Amit Kapila wrote:
> It is better than async mode in a way such that in async mode it never
> waits for commits to be written to standby, but in this new mode it will
> do so unless it is not possible (all sync standbys go down).
> Can't we use existing wal_sender_timeout, or even if user expects a
> different timeout because for this new mode, he expects master to wait
> more before it starts operating like standalone sync master, we can provide
> a new parameter.

One of the reasons that there's so much disag

Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Florian Pflug
On Jan11, 2014, at 18:53 , Andres Freund  wrote:
> On 2014-01-11 18:28:31 +0100, Florian Pflug wrote:
>> Hm, I was about to suggest that you can set statement_timeout before
>> doing COMMIT to limit the amount of time you want to wait for the
>> standby to respond. Interestingly, however, that doesn't seem to work,
>> which is weird, since AFAICS statement_timeout simply generates a
>> query cancel request after the timeout has elapsed, and cancelling
>> the COMMIT with Ctrl-C in psql *does* work.
> 
> I think that'd be a pretty bad API since you won't know whether the
> commit failed or succeeded but replication timed out. There very well
> might have been longrunning constraint triggers or such taking a long
> time.

You could still distinguish these cases because the COMMIT would succeed
with a WARNING if the timeout elapses while waiting for the standby, just
as it does for query cancellations already.

I'm not saying that this is a great API, though - I brought it up only
because accepting cancellation requests but ignoring timeouts seems
a bit inconsistent to me.

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Amit Kapila
On Sat, Jan 11, 2014 at 9:41 PM, Bruce Momjian  wrote:
> On Sat, Jan 11, 2014 at 01:29:23PM +0530, Amit Kapila wrote:
>> Okay, this is one way of providing this new mode, others could be:
>>
>> a.
>> Have just one GUC sync_standalone_mode = true|false and make
>> this as PGC_POSTMASTER parameter, so that user is only
>> allowed to set this mode at startup. Even if we don't want it as
>> Postmaster parameter, we can mention to users that they can
>> change this parameter only before server reaches current situation.
>> I understand that without any alarm or some other way, it is difficult
>> for user to know and change it, but I think in that case he should
>> set it before server startup.
>>
>> b.
>> On above lines, instead of boolean parameter, provide a parameter
>> similar to current one such as available_synchronous_standby_names,
>> setting of this should follow what I said in point a. The benefit in this
>> as compare to 'a' is that it appears to be more like what we currently have.
>>
>> I think if we try to solve this problem by providing a way so that user
>> can change it at runtime or when the problem actually occurred, it can
>> make the UI more complex and difficult for us to provide a way so that
>> user can be alerted on such situation. We can keep our options open
>> so that if tomorrow, we can find any reasonable way, then we can
>> provide it to user a mechanism for changing this at runtime, but I don't
>> think it is stopping us from providing a way with which user can get the
>> benefit of this mode by providing start time parameter.
>
> I am not sure how this would work.  Right now we wait for one of the
> synchronous_standby_names servers to verify the writes.   We need some
> way of telling the system how long to wait before continuing in degraded
> mode.  Without a timeout and admin notification, it doesn't seem much
> better than our async mode, which is what many people were complaining
> about.

It is better than async mode in a way such that in async mode it never
waits for commits to be written to standby, but in this new mode it will
do so unless it is not possible (all sync standbys go down).
Can't we use existing wal_sender_timeout, or even if user expects a
different timeout because for this new mode, he expects master to wait
more before it starts operating like standalone sync master, we can provide
a new parameter.

With this, the definition of the new mode is to provide maximum
availability.

We can define the behavior in this new mode as:
a. It will operate like current synchronous master till one of the standby
mentioned in available_synchronous_standby_names is available.
b. If none is available, then it will start operating like current async
master, which means that if any async standby is configured, then
it will start sending WAL to that standby asynchronously, else if none
is configured, it will start operating in a standalone master.
c. We can even provide a new parameter replication_mode here
(non persistent), which will tell to user that master has switched
its mode, this can be made available by view. Update the value of
parameter when server switches to new mode.
d. When one of the standby mentioned in
available_synchronous_standby_names comes back and able to resolve
all WAL difference, then it will again switch back to sync mode, where it
will write to that standby before Commit finishes. After switch, it will
update the replication_mode parameter.

Now I think with above definition and behavior, it can switch to new mode
and will be able to provide information if user wants it by using view.

In the above behaviour, the tricky part would be point 'd', where it has to
switch back to sync mode when one of the sync standbys becomes available, but I
think we can work out the design for that if you are positive about the above
definition and behaviour as defined by the 4 points.
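The four points above can be modelled as a tiny state machine. This is a toy
sketch only: the names (`Mode`, `replication_mode`, the transition methods)
are illustrative, not anything PostgreSQL actually exposes:

```python
from enum import Enum

class Mode(Enum):
    SYNC = "sync"              # point a: normal synchronous operation
    STANDALONE = "standalone"  # point b: no sync standby, behave like async

class Master:
    """Toy model of the proposed mode switching (hypothetical names)."""
    def __init__(self):
        # point c: current mode, which a view could expose to the user
        self.replication_mode = Mode.SYNC

    def all_sync_standbys_lost(self):
        # point b: no sync standby available -> operate standalone/async
        self.replication_mode = Mode.STANDALONE

    def sync_standby_caught_up(self):
        # point d: a standby reconnected and resolved all WAL differences
        self.replication_mode = Mode.SYNC

m = Master()
m.all_sync_standbys_lost()
assert m.replication_mode is Mode.STANDALONE
m.sync_standby_caught_up()
assert m.replication_mode is Mode.SYNC
```

The hard part the email flags, point 'd', is deciding when
`sync_standby_caught_up` may fire; in the simulation it is a single call, but
in reality it requires verifying the standby has replayed all WAL differences.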


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Bruce Momjian
On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
> In other words, if we're going to have auto-degrade, the most
> intelligent place for it is in
> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
> place.  Anything we do *inside* Postgres is going to have a really,
> really hard time determining when to degrade.

Well, one goal I was considering is that if a commit is hung waiting for
slave sync confirmation, and the timeout happens, then the mode is
changed to degraded and the commit returns success.  I am not sure how
you would do that in an external tool, meaning there is going to be a
period where commits fail, unless you think there is a way that when the
external tool changes the mode to degrade that all hung commits
complete.  That would be nice.
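Bruce's goal (a commit blocked on standby confirmation that returns success
once the timeout flips the mode to degraded) can be sketched as a pure
simulation. The event here stands in for the real walsender acknowledgement;
all names are illustrative:

```python
import threading

DEGRADED = {"flag": False}  # stand-in for the server-wide degraded state

def commit(ack: threading.Event, timeout_s: float) -> str:
    """Wait for the standby's confirmation; on timeout, degrade the
    mode and let the commit return success instead of hanging."""
    if ack.wait(timeout_s):
        return "committed (sync confirmed)"
    DEGRADED["flag"] = True
    return "committed (degraded mode)"

# Standby never acknowledges: commit returns after the timeout, degraded.
assert commit(threading.Event(), 0.01) == "committed (degraded mode)"
assert DEGRADED["flag"] is True

# Standby acknowledges promptly: normal synchronous commit.
acked = threading.Event()
acked.set()
assert commit(acked, 0.01) == "committed (sync confirmed)"
```

This illustrates why doing it inside the server is attractive: the waiting
backend itself observes the timeout, so no hung commit has to fail while an
external tool catches up.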

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Amit Kapila
On Sun, Jan 12, 2014 at 8:48 AM, Josh Berkus  wrote:
> On 01/10/2014 06:27 PM, Bruce Momjian wrote:
>> How would that work?  Would it be a tool in contrib?  There already is a
>> timeout, so if a tool checked more frequently than the timeout, it
>> should work.  The durable notification of the admin would happen in the
>> tool, right?
>
> Well, you know what tool *I'm* planning to use.
>
> Thing is, when we talk about auto-degrade, we need to determine things
> like "Is the replica down or is this just a network blip"? and take
> action according to the user's desired configuration.  This is not
> something, realistically, that we can do on a single request.  Whereas
> it would be fairly simple for an external monitoring utility to do:
>
> 1. decide replica is offline for the duration (several poll attempts
> have failed)
>
> 2. Send ALTER SYSTEM SET to the master and change/disable the
> synch_replicas.

   Will it be possible in the current mechanism, because presently the master
   will not accept any new command when the sync replica is not available?
   Or is there something else also which needs to be done along with the
   above 2 points to make it possible?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Josh Berkus
On 01/10/2014 06:27 PM, Bruce Momjian wrote:
> How would that work?  Would it be a tool in contrib?  There already is a
> timeout, so if a tool checked more frequently than the timeout, it
> should work.  The durable notification of the admin would happen in the
> tool, right?

Well, you know what tool *I'm* planning to use.

Thing is, when we talk about auto-degrade, we need to determine things
like "Is the replica down or is this just a network blip"? and take
action according to the user's desired configuration.  This is not
something, realistically, that we can do on a single request.  Whereas
it would be fairly simple for an external monitoring utility to do:

1. decide replica is offline for the duration (several poll attempts
have failed)

2. Send ALTER SYSTEM SET to the master and change/disable the
synch_replicas.

Such a tool would *also* be capable of detecting when the synchronous
replica was back up and operating, and switch back to sync mode,
something we simply can't do inside Postgres.  And it would be a lot
easier to configure an external tool with monitoring system integration
so that it can alert the DBA to degradation in a way which the DBA was
liable to actually see (which is NOT the Postgres log).

In other words, if we're going to have auto-degrade, the most
intelligent place for it is in
RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
place.  Anything we do *inside* Postgres is going to have a really,
really hard time determining when to degrade.
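The two steps above reduce to a small decision loop. In this sketch,
`poll_standby` is a placeholder for a real liveness check (e.g. querying the
master's pg_stat_replication), and the SQL the tool would send in step 2 is
shown only as a string, never executed:

```python
def replica_offline(poll_standby, attempts=3):
    """Step 1: declare the replica down only after several failed polls,
    so a single network blip does not trigger a degrade."""
    return not any(poll_standby() for _ in range(attempts))

# Step 2: what the tool would send to the master (illustrative SQL only).
DEGRADE_SQL = (
    "ALTER SYSTEM SET synchronous_standby_names = ''; "
    "SELECT pg_reload_conf();"
)

# Flaky standby: down, down, then up -> still considered online.
flaky = iter([False, False, True])
assert replica_offline(lambda: next(flaky)) is False

# Dead standby: every poll fails -> degrade.
assert replica_offline(lambda: False) is True
```

The multi-poll rule is what the walsender cannot easily do on its own: an
external monitor can distinguish "several poll attempts have failed" from a
single transient hiccup before touching the master's configuration.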


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Tom Lane
Mark Kirkwood  writes [slightly rearranged]
> My 2c is:

> The current behavior in CAP theorem speak is 'Cap' - i.e focused on 
> consistency at the expense of availability. A reasonable thing to want.

> The other behavior being asked for is 'cAp' - i.e focused on 
> availability. Also a reasonable configuration to want.

> I think an option to control whether we operate 'Cap' or 'cAp' 
> (defaulting to the current 'Cap' I guess) is probably the best solution.

The above is all perfectly reasonable.  The argument that's not been made
to my satisfaction is that the proposed patch is a good implementation of
'cAp'-optimized behavior.  In particular,

> ... Now the desire to 
> use sync rather than async is to achieve as much consistency as 
> possible, which is also reasonable.

I don't think that the existing sync mode is designed to do that, and
simply lobotomizing it as proposed doesn't get you there.  I think we
need a replication mode that's been designed *from the ground up*
with cAp priorities in mind.  There may end up being only a few actual
differences in behavior --- but I fear that some of those differences
will be crucial.

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Mark Kirkwood

On 11/01/14 13:25, Stephen Frost wrote:

Adrian,


* Adrian Klaver (adrian.kla...@gmail.com) wrote:

A) Change the existing sync mode to allow the master and standby
fall out of sync should a standby fall over.


I'm not sure that anyone is arguing for this...


B) Create a new mode that does this without changing the existing sync mode.

My two cents would be to implement B. Sync to me is a contract that
master and standby are in sync at any point in time. Anything else
should be called something else. Then it is up to the documentation
to clearly point out the benefits/pitfalls. If you want to implement
something as important as replication without reading the docs then
the results are on you.


The issue is that there are folks who are arguing, essentially, that
"B" is worthless, wrong, and no one should want it and therefore we
shouldn't have it.



We have some people who clearly do want it (and seemed to have provided 
sensible arguments about why it might be worthwhile), and the others who 
say they should not.


My 2c is:

The current behavior in CAP theorem speak is 'Cap' - i.e focused on 
consistency at the expense of availability. A reasonable thing to want.


The other behavior being asked for is 'cAp' - i.e focused on 
availability. Also a reasonable configuration to want. Now the desire to 
use sync rather than async is to achieve as much consistency as 
possible, which is also reasonable.


I think an option to control whether we operate 'Cap' or 'cAp' 
(defaulting to the current 'Cap' I guess) is probably the best solution.


Regards

Mark





Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Andres Freund
On 2014-01-11 18:28:31 +0100, Florian Pflug wrote:
> Hm, I was about to suggest that you can set statement_timeout before
> doing COMMIT to limit the amount of time you want to wait for the
> standby to respond. Interestingly, however, that doesn't seem to work,
> which is weird, since AFAICS statement_timeout simply generates a
> query cancel request after the timeout has elapsed, and cancelling
> the COMMIT with Ctrl-C in psql *does* work.

I think that'd be a pretty bad API, since you wouldn't know whether the
commit failed, or succeeded but replication timed out. There very well
might have been long-running constraint triggers or such taking a long
time.
So it really would need a separate GUC.
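A sketch of what such a separate GUC might look like; the parameter name below is invented purely for illustration (it does not exist in PostgreSQL, and running this today would fail with "unrecognized configuration parameter"):

```sql
-- Hypothetical: bound only the replication wait, not the whole statement.
SET synchronous_commit_timeout = '5s';  -- invented name, for illustration
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- if the standby hasn't confirmed within 5s, the commit is still
         -- durable locally and the client would get an explicit WARNING
```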

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Tom Lane
Florian Pflug  writes:
> Hm, I was about to suggest that you can set statement_timeout before
> doing COMMIT to limit the amount of time you want to wait for the
> standby to respond. Interestingly, however, that doesn't seem to work,
> which is weird, since AFAICS statement_timeout simply generates a
> query cancel request after the timeout has elapsed, and cancelling
> the COMMIT with Ctrl-C in psql *does* work.

> I'm quite probably missing something, but what?

finish_xact_command() disables statement timeout before committing.

Not sure about the pros and cons of doing that later in the sequence.

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Florian Pflug
On Jan11, 2014, at 01:48 , Joshua D. Drake  wrote:
> On 01/10/2014 04:38 PM, Stephen Frost wrote:
>> Adrian,
>> 
>> * Adrian Klaver (adrian.kla...@gmail.com) wrote:
>>> On 01/10/2014 04:25 PM, Stephen Frost wrote:
 * Adrian Klaver (adrian.kla...@gmail.com) wrote:
> A) Change the existing sync mode to allow the master and standby
> to fall out of sync should a standby fall over.
 
 I'm not sure that anyone is arguing for this..
>>> 
>>> Looks like here, unless I am really missing the point:
>> 
>> Elsewhere in the thread, JD agreed that having it as an independent
>> option was fine.
> 
> Yes. I am fine with an independent option.

Hm, I was about to suggest that you can set statement_timeout before
doing COMMIT to limit the amount of time you want to wait for the
standby to respond. Interestingly, however, that doesn't seem to work,
which is weird, since AFAICS statement_timeout simply generates a
query cancel request after the timeout has elapsed, and cancelling
the COMMIT with Ctrl-C in psql *does* work.

I'm quite probably missing something, but what?
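The non-working attempt described above would look like this (as Tom Lane explains in his reply, finish_xact_command() disables the statement timeout before committing, so the cancel never fires during the wait):

```sql
-- The (non-working) idea: cap the time COMMIT waits for the sync standby.
BEGIN;
INSERT INTO orders (id) VALUES (1);
SET LOCAL statement_timeout = '2s';
COMMIT;  -- one might expect a cancel after 2s if the standby is
         -- unresponsive, but the timeout is disabled before COMMIT waits
```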

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Bruce Momjian
On Sat, Jan 11, 2014 at 01:29:23PM +0530, Amit Kapila wrote:
> Okay, this is one way of providing this new mode, others could be:
> 
> a.
> Have just one GUC, sync_standalone_mode = true|false, and make
> it a PGC_POSTMASTER parameter, so that the user is only
> allowed to set this mode at startup. Even if we don't want it as a
> postmaster parameter, we can tell users that they can
> change this parameter only before the server reaches the degraded state.
> I understand that without an alarm or some other signal, it is difficult
> for the user to know about and change it, but I think in that case he should
> set it before server startup.
> 
> b.
> Along the same lines, instead of a boolean parameter, provide a parameter
> similar to the current one, such as available_synchronous_standby_names;
> setting it should follow what I said in point a. The benefit of this
> as compared to 'a' is that it appears to be more like what we currently have.
> 
> I think that if we try to solve this problem by letting the user
> change it at runtime, or when the problem actually occurs, it can
> make the UI more complex and make it difficult for us to provide a way
> for the user to be alerted in such a situation. We can keep our options open
> so that if tomorrow we find a reasonable way, we can
> provide the user a mechanism for changing this at runtime, but I don't
> think that stops us from providing a way in which the user can get the
> benefit of this mode via a start-time parameter.

I am not sure how this would work.  Right now we wait for one of the
synchronous_standby_names servers to verify the writes.   We need some
way of telling the system how long to wait before continuing in degraded
mode.  Without a timeout and admin notification, it doesn't seem much
better than our async mode, which is what many people were complaining
about.
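The four variables Bruce lists might be exercised like this; every parameter name here is invented for illustration, none exists in PostgreSQL, and running these today would fail with "unrecognized configuration parameter":

```sql
-- Hypothetical GUCs, matching Bruce's list one-for-one:
ALTER SYSTEM SET synchronous_degrade_timeout = '30s';  -- timeout control
ALTER SYSTEM SET synchronous_degrade_command = '/usr/local/bin/alert-dba enter';
ALTER SYSTEM SET synchronous_recover_command = '/usr/local/bin/alert-dba leave';
-- plus a read-only reporting variable:
-- SHOW synchronous_degraded;   -- 'on' while running degraded
```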

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Amit Kapila
On Fri, Jan 10, 2014 at 9:17 PM, Bruce Momjian  wrote:
> On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote:
>> Here I think that if the user is aware from the beginning that this is the
>> behaviour, then maybe the importance of the message is not very high.
>> What I want to say is that we could provide a UI in which the user
>> decides, during setup of the server, the behavior he requires.
>>
>> For example, suppose we provide a new parameter,
>> available_synchronous_standby_names, along with the current parameter,
>> and ask the user to use this new parameter if he wishes to synchronously
>> commit transactions on another server when it is available; otherwise it will
>> operate as a standalone sync master.
>
> I know there was a desire to remove this TODO item, but I think we have
> brought up enough new issues that we can keep it to see if we can come
> up with a solution.

  I am not saying any such thing; rather, I am suggesting another way
  of providing this new mode.

> I have added a link to this discussion on the TODO
> item.
>
> I think we will need at least four new GUC variables:
>
> *  timeout control for degraded mode
> *  command to run during switch to degraded mode
> *  command to run during switch from degraded mode
> *  read-only variable to report degraded mode

Okay, this is one way of providing this new mode, others could be:

a.
Have just one GUC, sync_standalone_mode = true|false, and make
it a PGC_POSTMASTER parameter, so that the user is only
allowed to set this mode at startup. Even if we don't want it as a
postmaster parameter, we can tell users that they can
change this parameter only before the server reaches the degraded state.
I understand that without an alarm or some other signal, it is difficult
for the user to know about and change it, but I think in that case he should
set it before server startup.

b.
Along the same lines, instead of a boolean parameter, provide a parameter
similar to the current one, such as available_synchronous_standby_names;
setting it should follow what I said in point a. The benefit of this
as compared to 'a' is that it appears to be more like what we currently have.

I think that if we try to solve this problem by letting the user
change it at runtime, or when the problem actually occurs, it can
make the UI more complex and make it difficult for us to provide a way
for the user to be alerted in such a situation. We can keep our options open
so that if tomorrow we find a reasonable way, we can
provide the user a mechanism for changing this at runtime, but I don't
think that stops us from providing a way in which the user can get the
benefit of this mode via a start-time parameter.
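Amit's two alternatives might be sketched as follows; both parameter names are hypothetical (neither exists in PostgreSQL today), and as PGC_POSTMASTER parameters they would take effect only at server start:

```sql
-- Hypothetical startup-only parameters, written to postgresql.auto.conf:
-- (a) a simple boolean switch:
ALTER SYSTEM SET sync_standalone_mode = on;
-- (b) a list-valued variant of synchronous_standby_names:
ALTER SYSTEM SET available_synchronous_standby_names = 'standby1, standby2';
```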

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Bruce Momjian
On Fri, Jan 10, 2014 at 03:27:10PM -0800, Josh Berkus wrote:
> On 01/10/2014 01:49 PM, Andres Freund wrote:
> > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> >>
> >> On 01/10/2014 07:47 AM, Bruce Momjian wrote:
> >>
> >>> I know there was a desire to remove this TODO item, but I think we have
> >>> brought up enough new issues that we can keep it to see if we can come
> >>> up with a solution.  I have added a link to this discussion on the TODO
> >>> item.
> >>>
> >>> I think we will need at least four new GUC variables:
> >>>
> >>> *  timeout control for degraded mode
> >>> *  command to run during switch to degraded mode
> >>> *  command to run during switch from degraded mode
> >>> *  read-only variable to report degraded mode
> 
> I would argue that we don't need the first.  We just want a command to
> switch synchronous/degraded, and a variable (or function) to report on
> degraded mode.  If we have those things, then it becomes completely
> possible to have an external monitoring framework (one capable of
> answering questions like "is the replica down or just slow?") control
> the degrade.
> 
> Oh, wait!  We DO have such a command.  It's called ALTER SYSTEM SET!
> Recently committed.  So this is really a solvable issue if one is
> willing to use an external utility.

How would that work?  Would it be a tool in contrib?  There already is a
timeout, so if a tool checked more frequently than the timeout, it
should work.  The durable notification of the admin would happen in the
tool, right?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Bruce Momjian
On Fri, Jan 10, 2014 at 03:17:34PM -0800, Josh Berkus wrote:
> The purpose of sync rep is to know definitively whether or not you
> have lost data when disaster strikes.  If knowing for certain isn't
> important to you, then use async.
> 
> BTW, people are using RAID1 as an analogy to 2-node sync replication.
> That's a very bad analogy, because in RAID1 you have a *single*
> controller which is capable of determining if the disks are in a failed
> state or not, and this is all happening on a single node where things
> like network outages aren't a consideration.  It's really not the same
> situation at all.
> 
> Also, frankly, I absolutely can't count the number of times I've had to
> rescue a customer or family member who had RAID1 but wasn't monitoring
> syslog, and so one of their disks had been down for months without them
> knowing it.  Heck, I've done this myself.
> 
> So ... the Filesystem geeks have already been through this.  Filesystem
> clustering started out with systems like DRBD, which includes an
> auto-degrade option.  However, DRBD with auto-degrade is widely
> considered untrustworthy and is a significant part of why DRBD isn't
> trusted today.
> 
> From here, clustered filesystems went in two directions: RHCS added
> layers of monitoring and management to make auto-degrade a safer option
> than it is with DRBD (and still not the default option).  Scalable
> clustered filesystems added N(M) quorum commit in order to support more
> than 2 nodes.  Either of these courses are reasonable for us to pursue.
> 
> What's a bad idea is adding an auto-degrade option without any tools to
> manage and monitor it, which is what this patch does by my reading.  If
> I'm wrong, then someone can point it out to me.

Yes, my big take-away from the discussion is that informing the admin in
a durable way is a requirement for this degraded mode.  You are right
that many ignore RAID degradation warnings, but with the warnings
heeded, degraded functionality can be useful.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Peter Eisentraut
On Wed, 2014-01-08 at 17:56 -0500, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > That's why you should configure a second standby as another (candidate)
> > synchronous replica, also listed in synchronous_standby_names.
> 
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on?  Where there are
> multiple replicas, because otherwise Drake is correct that you'll just
> end up having both nodes go offline if the slave fails.

It's not unreasonable to run with only two if the writers are consuming
from a reliable message queue (or another system that maintains its own
reliable persistence).  Then you can just continue processing messages
after you have repaired your replication pair.






Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Adrian Klaver

On 01/10/2014 04:48 PM, Joshua D. Drake wrote:


On 01/10/2014 04:38 PM, Stephen Frost wrote:

Adrian,

* Adrian Klaver (adrian.kla...@gmail.com) wrote:

On 01/10/2014 04:25 PM, Stephen Frost wrote:

* Adrian Klaver (adrian.kla...@gmail.com) wrote:

A) Change the existing sync mode to allow the master and standby
to fall out of sync should a standby fall over.


I'm not sure that anyone is arguing for this..


Looks like here, unless I am really missing the point:


Elsewhere in the thread, JD agreed that having it as an independent
option was fine.


Yes. I am fine with an independent option.


I missed that. What confused me and seems to be generally confusing is 
the overloading of the term sync:


"Proposed behavior:

db01->sync->db02 "

In my mind, if that is an independent option it should have a different 
name. I propose Schrödinger. :)




JD






--
Adrian Klaver
adrian.kla...@gmail.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Jim Nasby

On 1/10/14, 6:19 PM, Adrian Klaver wrote:

1) Async. Runs at the speed of the master as it does not have to wait on the 
standby to signal a successful commit. There is some degree of offset between 
master and standby(s) due to latency.

2) Sync. Runs at the speed of the standby + latency between master and standby. 
This is counterbalanced by knowledge that the master and standby are in the 
same state. As Josh Berkus pointed out, there is a loophole in this when 
multiple standbys are involved.

The topic under discussion is an intermediate mode between 1 and 2. There seems 
to be a consensus that this is not unreasonable.


That's not what's actually under debate; allow me to restate as option 3:

3) Sync. Everything you said, plus: "If for ANY reason the master cannot talk to 
the slave, it becomes read-only."

That's the current state.

What many people want is something along the lines of what you said in 2: The 
slave ALWAYS has everything the master does (at least on disk) unless the 
connection between master and slave fails.

The reason people want this is it protects you against a *single* fault. If 
just the master blows up, you have a 100% reliable slave. If the connection (or 
the slave itself) blows up, the master is still working.

I agree that there's a non-obvious gotcha here: in the case of a master failure 
you might also have experienced a connection failure, and without some kind of 
3rd party involved you have no way to know that.

We should make best efforts to make that gotcha as clear to users as we can. 
But just because some users will blindly ignore that doesn't mean we flat-out 
shouldn't support those who understand the gotcha and accept its 
limitations.

BTW, if ALTER SYSTEM SET actually does make it possible to implement automated 
failover without directly adding it to Postgres then I think a good compromise 
would be to have an external project that does just that and have the docs 
reference that project and explain why we haven't built it in.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 04:38 PM, Stephen Frost wrote:

Adrian,

* Adrian Klaver (adrian.kla...@gmail.com) wrote:

On 01/10/2014 04:25 PM, Stephen Frost wrote:

* Adrian Klaver (adrian.kla...@gmail.com) wrote:

A) Change the existing sync mode to allow the master and standby
to fall out of sync should a standby fall over.


I'm not sure that anyone is arguing for this..


Looks like here, unless I am really missing the point:


Elsewhere in the thread, JD agreed that having it as an independent
option was fine.


Yes. I am fine with an independent option.

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
Adrian,

* Adrian Klaver (adrian.kla...@gmail.com) wrote:
> On 01/10/2014 04:25 PM, Stephen Frost wrote:
> >* Adrian Klaver (adrian.kla...@gmail.com) wrote:
> >>A) Change the existing sync mode to allow the master and standby
> >>to fall out of sync should a standby fall over.
> >
> >I'm not sure that anyone is arguing for this..
> 
> Looks like here, unless I am really missing the point:

Elsewhere in the thread, JD agreed that having it as an independent
option was fine.

> Well you will not please everyone, just displease the least.

Well, sure, but we do generally try to reach consensus. :)

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Adrian Klaver

On 01/10/2014 04:25 PM, Stephen Frost wrote:

Adrian,


* Adrian Klaver (adrian.kla...@gmail.com) wrote:

A) Change the existing sync mode to allow the master and standby
to fall out of sync should a standby fall over.


I'm not sure that anyone is arguing for this..


Looks like here, unless I am really missing the point:

http://www.postgresql.org/message-id/52d07466.6070...@commandprompt.com

"Proposed behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders 
are being made.


db02 goes down. It doesn't matter why. It is down. db01 continues to 
accept orders, allow people to log into the website and we can still 
service accounts. The continuity of service continues.


Yes, there are all kinds of things that need to be considered when that 
happens, that isn't the point. The point is, PostgreSQL continues its 
uptime guarantee and allows the business to continue to function as (if) 
nothing has happened.


For many and I dare say the majority of businesses, this is enough. They 
know that if the slave goes down they can continue to operate. They know 
if the master goes down they can fail over. They know that while both 
are up they are using sync rep (with various caveats). They are happy. 
They like that it is simple and just works. They continue to use 
PostgreSQL. "





B) Create a new mode that does this without changing the existing sync mode.

My two cents would be to implement B. Sync to me is a contract that
master and standby are in sync at any point in time. Anything else
should be called something else. Then it is up to the documentation
to clearly point out the benefits/pitfalls. If you want to implement
something as important as replication without reading the docs then
the results are on you.


The issue is that there are folks who are arguing, essentially, that
"B" is worthless, wrong, and no one should want it and therefore we
shouldn't have it.


Well you will not please everyone, just displease the least.



Thanks,

Stephen




--
Adrian Klaver
adrian.kla...@gmail.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
Adrian,


* Adrian Klaver (adrian.kla...@gmail.com) wrote:
> A) Change the existing sync mode to allow the master and standby
> to fall out of sync should a standby fall over.

I'm not sure that anyone is arguing for this..

> B) Create a new mode that does this without changing the existing sync mode.
> 
> My two cents would be to implement B. Sync to me is a contract that
> master and standby are in sync at any point in time. Anything else
> should be called something else. Then it is up to the documentation
> to clearly point out the benefits/pitfalls. If you want to implement
> something as important as replication without reading the docs then
> the results are on you.

The issue is that there are folks who are arguing, essentially, that
"B" is worthless, wrong, and no one should want it and therefore we
shouldn't have it.

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Adrian Klaver

On 01/10/2014 03:38 PM, Joshua D. Drake wrote:


On 01/10/2014 03:17 PM, Josh Berkus wrote:


Any continuous replication should not be a SPOF. The current behavior
guarantees that a two node sync cluster is a SPOF. The proposed behavior
removes that.


Again, if that's your goal, then use async replication.


I think I have gone about this the wrong way. Async does not meet the
technical or business requirements that I have. Sync does, except that it
increases the possibility of an outage. That is the requirement I am
trying to address.



The purpose of sync rep is to know definitively whether or not you
have lost data when disaster strikes.  If knowing for certain isn't
important to you, then use async.


PostgreSQL Sync replication increases the possibility of an outage. That
is incorrect behavior.

I want sync so that, if the master goes down, I have as
much data as possible to fail over to. However, I can't use sync because
it increases the possibility that my business will not be able to
function if the standby goes down.



What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading.  If


This we absolutely agree on.



As I see it the state of replication in Postgres is as follows.

1) Async. Runs at the speed of the master as it does not have to wait on 
the standby to signal a successful commit. There is some degree of 
offset between master and standby(s) due to latency.


2) Sync. Runs at the speed of the standby + latency between master and 
standby. This is counterbalanced by knowledge that the master and 
standby are in the same state. As Josh Berkus pointed out, there is a 
loophole in this when multiple standbys are involved.


The topic under discussion is an intermediate mode between 1 and 2. 
There seems to be a consensus that this is not unreasonable.


The issue seems to be how to achieve this with ideas falling into 
roughly two camps.


A) Change the existing sync mode to allow the master and standby to fall 
out of sync should a standby fall over.


B) Create a new mode that does this without changing the existing sync mode.


My two cents would be to implement B. Sync to me is a contract that 
master and standby are in sync at any point in time. Anything else 
should be called something else. Then it is up to the documentation to 
clearly point out the benefits/pitfalls. If you want to implement 
something as important as replication without reading the docs then the 
results are on you.




JD





--
Adrian Klaver
adrian.kla...@gmail.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 03:17 PM, Josh Berkus wrote:


Any continuous replication should not be a SPOF. The current behavior
guarantees that a two node sync cluster is a SPOF. The proposed behavior
removes that.


Again, if that's your goal, then use async replication.


I think I have gone about this the wrong way. Async does not meet the 
technical or business requirements that I have. Sync does, except that it 
increases the possibility of an outage. That is the requirement I am 
trying to address.




The purpose of sync rep is to know definitively whether or not you
have lost data when disaster strikes.  If knowing for certain isn't
important to you, then use async.


PostgreSQL Sync replication increases the possibility of an outage. That 
is incorrect behavior.


I want sync so that, if the master goes down, I have as 
much data as possible to fail over to. However, I can't use sync because 
it increases the possibility that my business will not be able to 
function if the standby goes down.




What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading.  If


This we absolutely agree on.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Josh Berkus
On 01/10/2014 01:49 PM, Andres Freund wrote:
> On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
>>
>> On 01/10/2014 07:47 AM, Bruce Momjian wrote:
>>
>>> I know there was a desire to remove this TODO item, but I think we have
>>> brought up enough new issues that we can keep it to see if we can come
>>> up with a solution.  I have added a link to this discussion on the TODO
>>> item.
>>>
>>> I think we will need at least four new GUC variables:
>>>
>>> *  timeout control for degraded mode
>>> *  command to run during switch to degraded mode
>>> *  command to run during switch from degraded mode
>>> *  read-only variable to report degraded mode

I would argue that we don't need the first.  We just want a command to
switch synchronous/degraded, and a variable (or function) to report on
degraded mode.  If we have those things, then it becomes completely
possible to have an external monitoring framework (one capable of
answering questions like "is the replica down or just slow?") control
the degrade.

Oh, wait!  We DO have such a command.  It's called ALTER SYSTEM SET!
Recently committed.  So this is really a solvable issue if one is
willing to use an external utility.
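Concretely, an external monitor could flip the master in and out of degraded mode with the newly committed ALTER SYSTEM SET, since synchronous_standby_names is reloadable; 'standby1' is a placeholder name here:

```sql
-- Degrade: stop requiring synchronous confirmation.
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();

-- Resync: once the monitor sees the standby has caught up again.
ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();
```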

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Josh Berkus
On 01/10/2014 02:59 PM, Joshua D. Drake wrote:
> 
> On 01/10/2014 02:47 PM, Andres Freund wrote:
> 
>> Really, the commits themselves are sent to the server at exactly the
>> same speed independent of sync/async. The only thing that's delayed is
>> the *notification* of the client that sent the commit. Not the commit
>> itself.
> 
> Which is irrelevant to the point that if the standby goes down, we are
> now out of business.
> 
> Any continuous replication should not be a SPOF. The current behavior
> guarantees that a two node sync cluster is a SPOF. The proposed behavior
> removes that.

Again, if that's your goal, then use async replication.

I really don't understand the use-case here.

The purpose of sync rep is to know definitively whether or not you
have lost data when disaster strikes.  If knowing for certain isn't
important to you, then use async.

BTW, people are using RAID1 as an analogy to 2-node sync replication.
That's a very bad analogy, because in RAID1 you have a *single*
controller which is capable of determining if the disks are in a failed
state or not, and this is all happening on a single node where things
like network outages aren't a consideration.  It's really not the same
situation at all.

Also, frankly, I absolutely can't count the number of times I've had to
rescue a customer or family member who had RAID1 but wasn't monitoring
syslog, and so one of their disks had been down for months without them
knowing it.  Heck, I've done this myself.

So ... the Filesystem geeks have already been through this.  Filesystem
clustering started out with systems like DRBD, which includes an
auto-degrade option.  However, DRBD with auto-degrade is widely
considered untrustworthy and is a significant part of why DRBD isn't
trusted today.

From here, clustered filesystems went in two directions: RHCS added
layers of monitoring and management to make auto-degrade a safer option
than it is with DRBD (and still not the default option).  Scalable
clustered filesystems added N(M) quorum commit in order to support more
than 2 nodes.  Either of these courses are reasonable for us to pursue.

What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading.  If
I'm wrong, then someone can point it out to me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Hannu Krosing
On 01/10/2014 11:59 PM, Joshua D. Drake wrote:
>
> On 01/10/2014 02:57 PM, Stephen Frost wrote:
>
>> Yes, if you have a BBU that memory is authoritative in most
>> cases. But
>> in that case the argument of having two disks is pretty much
>> pointless,
>> the SPOF suddenly became the battery + ram.
>>
>>
>> If that is a concern then use multiple controllers. Certainly not
>> unheard of- look at SANs...
>>
>
> And in PostgreSQL we obviously have the option of having a third or
> fourth standby but that isn't the problem we are trying to solve.
The problem you are trying to solve is a controller with enough
Battery Backed Cache RAM to cache the entire database but with
write-through mode.

And you want it to degrade to write-back in case of disk failure so that
you can continue while the disk is broken.

People here are telling you that it would not be safe; use at least RAID-1
if you want availability.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 02:57 PM, Stephen Frost wrote:


Yes, if you have a BBU that memory is authoritative in most cases. But
in that case the argument of having two disks is pretty much pointless,
the SPOF suddenly became the battery + ram.


If that is a concern then use multiple controllers. Certainly not
unheard of- look at SANs...



And in PostgreSQL we obviously have the option of having a third or 
fourth standby but that isn't the problem we are trying to solve.


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 02:47 PM, Andres Freund wrote:


Really, the commits themselves are sent to the server at exactly the
same speed independent of sync/async. The only thing that's delayed is
the *notification* of the client that sent the commit. Not the commit
itself.


Which is irrelevant to the point that if the standby goes down, we are 
now out of business.


Any continuous replication should not be a SPOF. The current behavior 
guarantees that a two node sync cluster is a SPOF. The proposed behavior 
removes that.


Sincerely,

Joshua D. Drake



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
Greetings,

On Friday, January 10, 2014, Andres Freund wrote:

> Hi,
>
> On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
> > > Why do you know that you didn't lose any transactions? Trivial network
> > > hiccups, a restart of a standby, IO overload on the standby all can
> > > cause very short interruptions in the walsender connection - leading
> > > to degradation.
>
> > You know that you haven't *lost* any by virtue of the master still being
> > up. The case you describe is a double-failure scenario- the link between
> > the master and slave has to go away AND the master must accept a
> > transaction and then fail independently.
>
> Unfortunately network outages do correlate with other system
> faults. What you're wishing for really is the "I like the world to be
> friendly to me" mode.
> Even if you have only disk problems, quite often if your disks die, you
> can continue to write (especially with a BBU), but uncached reads
> fail. So the walsender connection errors out because a read failed, and
> you're degrading into async mode. *Because* your primary is about to die.


That can happen, sure, but I don't agree that the cases where someone uses a
single drive with a BBU, or where both drives in a RAID-1 die at the same
time, are reasonable arguments against this option. Not to mention that,
today, if the master has an issue then we're SOL anyway. Also, if the network
fails then likely there aren't any new transactions happening.


> > > > As pointed out by someone
> > > > previously, that's how RAID-1 works (which I imagine quite a few of
> us
> > > > use).
> > >
> > > I don't think that argument makes much sense. Raid-1 isn't safe
> > > as-is. It's only safe if you use some sort of journaling or similar
> > > ontop. If you issued a write during a crash you normally will just get
> > > either the version from before or the version after the last write
> back,
> > > depending on the state on the individual disks and which disk is
> treated
> > > as authoritative by the raid software.
>
> > Uh, you need a decent raid controller then and we're talking about after
> a
> > transaction commit/sync.
>
> Yes, if you have a BBU that memory is authoritative in most cases. But
> in that case the argument of having two disks is pretty much pointless,
> the SPOF suddenly became the battery + ram.
>

If that is a concern then use multiple controllers. Certainly not unheard
of- look at SANs...

Thanks,

Stephen


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 14:44:28 -0800, Joshua D. Drake wrote:
> 
> On 01/10/2014 02:33 PM, Andres Freund wrote:
> >
> >On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
> >>db02 goes down. It doesn't matter why. It is down. db01 continues to accept
> >>orders, allow people to log into the website and we can still service
> >>accounts. The continuity of service continues.
> >
> >Why that configuration is advantageous over an async configuration is the
> >question. Why, with those requirements, are you using a synchronous
> >standby at all?
> 
> If the master goes down, I can fail over knowing that as many of my
> transactions as possible have been replicated.

It's not like async replication mode delays sending data to the standby
in any way.

Really, the commits themselves are sent to the server at exactly the
same speed independent of sync/async. The only thing that's delayed is
the *notification* of the client that sent the commit. Not the commit
itself.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Jeff Janes
On Fri, Jan 10, 2014 at 2:33 PM, Andres Freund wrote:

> On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
> > db02 goes down. It doesn't matter why. It is down. db01 continues to
> accept
> > orders, allow people to log into the website and we can still service
> > accounts. The continuity of service continues.
>
> Why that configuration is advantageous over an async configuration is the
> question.


Because it is orders of magnitude less likely to lose transactions that
were reported to have been committed.  A permanent failure of the master is
almost guaranteed to lose transactions with async.  With auto-degrade, a
permanent failure of the master only loses reported-committed transactions
if it co-occurs with a temporary failure of the replica or the network,
lasting longer than the timeout period.
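Jeff's "orders of magnitude" point can be made concrete with rough numbers. The failure rates below are invented purely for illustration:

```python
# Purely illustrative probabilities for some fixed window of time.
p_master_fail = 1e-4    # master permanently fails during the window
p_replica_down = 1e-3   # replica/network is down longer than the timeout

# async: any permanent master failure can lose acknowledged commits
p_loss_async = p_master_fail

# auto-degrading sync: loss requires both failures to overlap in time
p_loss_degraded = p_master_fail * p_replica_down

print(round(p_loss_async / p_loss_degraded))  # 1000: three orders of magnitude
```

The exact ratio depends entirely on the assumed rates, but the multiplicative structure of the double-failure case is what the argument rests on.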


Why, with those requirements, are you using a synchronous
> standby at all?
>

They aren't using a synchronous standby; they are using an asynchronous
standby because we fail to provide the choice they prefer, which is a
compromise between the two.

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
Hi,

On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
> > Why do you know that you didn't lose any transactions? Trivial network
> > hiccups, a restart of a standby, IO overload on the standby all can
> > cause very short interruptions in the walsender connection - leading
> > to degradation.

> You know that you haven't *lost* any by virtue of the master still being
> up. The case you describe is a double-failure scenario- the link between
> the master and slave has to go away AND the master must accept a
> transaction and then fail independently.

Unfortunately network outages do correlate with other system
faults. What you're wishing for really is the "I like the world to be
friendly to me" mode.
Even if you have only disk problems, quite often if your disks die, you
can continue to write (especially with a BBU), but uncached reads
fail. So the walsender connection errors out because a read failed, and
you're degrading into async mode. *Because* your primary is about to die.

> > > As pointed out by someone
> > > previously, that's how RAID-1 works (which I imagine quite a few of us
> > > use).
> >
> > I don't think that argument makes much sense. Raid-1 isn't safe
> > as-is. It's only safe if you use some sort of journaling or similar
> > ontop. If you issued a write during a crash you normally will just get
> > either the version from before or the version after the last write back,
> > depending on the state on the individual disks and which disk is treated
> > as authoritative by the raid software.

> Uh, you need a decent raid controller then and we're talking about after a
> transaction commit/sync.

Yes, if you have a BBU that memory is authoritative in most cases. But
in that case the argument of having two disks is pretty much pointless,
the SPOF suddenly became the battery + ram.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 02:33 PM, Andres Freund wrote:


On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:

db02 goes down. It doesn't matter why. It is down. db01 continues to accept
orders, allow people to log into the website and we can still service
accounts. The continuity of service continues.


Why that configuration is advantageous over an async configuration is the
question. Why, with those requirements, are you using a synchronous
standby at all?


If the master goes down, I can fail over knowing that as many of my 
transactions as possible have been replicated.


JD




--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Adrian Klaver

On 01/10/2014 02:33 PM, Andres Freund wrote:

On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:

db02 goes down. It doesn't matter why. It is down. db01 continues to accept
orders, allow people to log into the website and we can still service
accounts. The continuity of service continues.


Why that configuration is advantageous over an async configuration is the
question. Why, with those requirements, are you using a synchronous
standby at all?


+1



Greetings,

Andres Freund




--
Adrian Klaver
adrian.kla...@gmail.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
> db02 goes down. It doesn't matter why. It is down. db01 continues to accept
> orders, allow people to log into the website and we can still service
> accounts. The continuity of service continues.

Why that configuration is advantageous over an async configuration is the
question. Why, with those requirements, are you using a synchronous
standby at all?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 01:49 PM, Andres Freund wrote:


I know I am the one that instigated all of this so I want to be very clear
on what I and what I am confident that my customers would expect.

If a synchronous slave goes down, the master continues to operate. That is
all. I don't care if it is configurable (I would be fine with that). I don't
care if it is not automatic (e.g; slave goes down and we have to tell the
master to continue).


Would you please explain, as precisely as possible, what the advantages of
using a synchronous standby would be in such a scenario?


Current behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders 
are being made.


db02 goes down. It doesn't matter why. It is down. Because it is down, 
db01 for all intents and purposes is also down because we are using sync 
replication. We have just lost continuity of service, we can no longer 
accept orders, we can no longer allow people to log into the website, we 
can no longer service accounts.


In short, we are out of business.

Proposed behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders 
are being made.


db02 goes down. It doesn't matter why. It is down. db01 continues to 
accept orders, allow people to log into the website and we can still 
service accounts. The continuity of service continues.


Yes, there are all kinds of things that need to be considered when that 
happens, that isn't the point. The point is, PostgreSQL continues its 
uptime guarantee and allows the business to continue to function as if
nothing has happened.


For many and I dare say the majority of businesses, this is enough. They 
know that if the slave goes down they can continue to operate. They know 
if the master goes down they can fail over. They know that while both 
are up they are using sync rep (with various caveats). They are happy. 
They like that it is simple and just works. They continue to use PostgreSQL.



Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
Andres,

On Friday, January 10, 2014, Andres Freund wrote:

> On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@2ndquadrant.com ) wrote:
> > > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > > > If a synchronous slave goes down, the master continues to operate.
> That is
> > > > all. I don't care if it is configurable (I would be fine with that).
> I don't
> > > > care if it is not automatic (e.g; slave goes down and we have to
> tell the
> > > > master to continue).
> > >
> > > Would you please explain, as precisely as possible, what the
> > > advantages of using a synchronous standby would be in such a scenario?
> >
> > In a degraded/failure state, things continue to *work*.  In a
> > non-degraded/failure state, you're able to handle a system failure and
> > know that you didn't lose any transactions.
>
> Why do you know that you didn't lose any transactions? Trivial network
> hiccups, a restart of a standby, IO overload on the standby all can
> cause very short interruptions in the walsender connection - leading
> to degradation.


You know that you haven't *lost* any by virtue of the master still being
up. The case you describe is a double-failure scenario- the link between
the master and slave has to go away AND the master must accept a
transaction and then fail independently.


> > As pointed out by someone
> > previously, that's how RAID-1 works (which I imagine quite a few of us
> > use).
>
> I don't think that argument makes much sense. Raid-1 isn't safe
> as-is. It's only safe if you use some sort of journaling or similar
> ontop. If you issued a write during a crash you normally will just get
> either the version from before or the version after the last write back,
> depending on the state on the individual disks and which disk is treated
> as authoritative by the raid software.


Uh, you need a decent raid controller then and we're talking about after a
transaction commit/sync.

> And even if you disregard that, there's not much outside influence that
> can lead to losing the connection to a disk drive inside a RAID, other
> than an actually broken drive. Any network connection is normally kept
> *outside* the level at which you build RAIDs.


This is a fair point and perhaps we should have the timeout or jitter GUC
which was proposed elsewhere, but the notion that this configuration is
completely unreasonable is not accurate and therefore having it would be a
benefit overall.

Thanks,

Stephen


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > > If a synchronous slave goes down, the master continues to operate. That is
> > > all. I don't care if it is configurable (I would be fine with that). I 
> > > don't
> > > care if it is not automatic (e.g; slave goes down and we have to tell the
> > > master to continue).
> > 
> > Would you please explain, as precisely as possible, what the advantages of
> > using a synchronous standby would be in such a scenario?
> 
> In a degraded/failure state, things continue to *work*.  In a
> non-degraded/failure state, you're able to handle a system failure and
> know that you didn't lose any transactions.

Why do you know that you didn't lose any transactions? Trivial network
hiccups, a restart of a standby, IO overload on the standby all can
cause very short interruptions in the walsender connection - leading
to degradation.

> As pointed out by someone
> previously, that's how RAID-1 works (which I imagine quite a few of us
> use).

I don't think that argument makes much sense. Raid-1 isn't safe
as-is. It's only safe if you use some sort of journaling or similar
ontop. If you issued a write during a crash you normally will just get
either the version from before or the version after the last write back,
depending on the state on the individual disks and which disk is treated
as authoritative by the raid software.

And even if you disregard that, there's not much outside influence that
can lead to losing the connection to a disk drive inside a RAID, other
than an actually broken drive. Any network connection is normally kept
*outside* the level at which you build RAIDs.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > If a synchronous slave goes down, the master continues to operate. That is
> > all. I don't care if it is configurable (I would be fine with that). I don't
> > care if it is not automatic (e.g; slave goes down and we have to tell the
> > master to continue).
> 
> Would you please explain, as precisely as possible, what the advantages of
> using a synchronous standby would be in such a scenario?

In a degraded/failure state, things continue to *work*.  In a
non-degraded/failure state, you're able to handle a system failure and
know that you didn't lose any transactions.

Tom's point is correct, that you will fail on the "have two copies of
everything" guarantee in this mode, but that could certainly be acceptable in the
case where there is a system failure.  As pointed out by someone
previously, that's how RAID-1 works (which I imagine quite a few of us
use).

I've been thinking about this a fair bit and I've come to like the RAID1
analogy.  Stinks that we can't keep things going (automatically) if
either side fails, but perhaps we will one day...

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> 
> On 01/10/2014 07:47 AM, Bruce Momjian wrote:
> 
> >I know there was a desire to remove this TODO item, but I think we have
> >brought up enough new issues that we can keep it to see if we can come
> >up with a solution.  I have added a link to this discussion on the TODO
> >item.
> >
> >I think we will need at least four new GUC variables:
> >
> >*  timeout control for degraded mode
> >*  command to run during switch to degraded mode
> >*  command to run during switch from degraded mode
> >*  read-only variable to report degraded mode
> >
> 
> I know I am the one that instigated all of this so I want to be very clear
> on what I and what I am confident that my customers would expect.
> 
> If a synchronous slave goes down, the master continues to operate. That is
> all. I don't care if it is configurable (I would be fine with that). I don't
> care if it is not automatic (e.g; slave goes down and we have to tell the
> master to continue).

Would you please explain, as precisely as possible, what the advantages of
using a synchronous standby would be in such a scenario?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Jim Nasby

On 1/10/14, 12:59 PM, Joshua D. Drake wrote:

I know I am the one that instigated all of this so I want to be very clear on 
what I and what I am confident that my customers would expect.

If a synchronous slave goes down, the master continues to operate. That is all. 
I don't care if it is configurable (I would be fine with that). I don't care if 
it is not automatic (e.g; slave goes down and we have to tell the master to 
continue).

I have read through this thread more than once, and I have also gone back to 
the docs. I understand why we do it the way we do it. I also understand that 
from a business requirement for 99% of CMD's customers, it's wrong. At least in 
the sense of providing continuity of service.


+1

I understand that this is a degradation of full-on sync rep. But there is 
definite value added with sync rep that can automatically (or at least easily) 
degrade to async; it protects you from single failures. I fully understand 
that it will not protect you from a double failure. That's OK in many cases.

--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 07:47 AM, Bruce Momjian wrote:


I know there was a desire to remove this TODO item, but I think we have
brought up enough new issues that we can keep it to see if we can come
up with a solution.  I have added a link to this discussion on the TODO
item.

I think we will need at least four new GUC variables:

*  timeout control for degraded mode
*  command to run during switch to degraded mode
*  command to run during switch from degraded mode
*  read-only variable to report degraded mode



I know I am the one that instigated all of this so I want to be very 
clear on what I and what I am confident that my customers would expect.


If a synchronous slave goes down, the master continues to operate. That 
is all. I don't care if it is configurable (I would be fine with that). 
I don't care if it is not automatic (e.g; slave goes down and we have to 
tell the master to continue).


I have read through this thread more than once, and I have also gone 
back to the docs. I understand why we do it the way we do it. I also 
understand that from a business requirement for 99% of CMD's customers, 
it's wrong. At least in the sense of providing continuity of service.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Hannu Krosing
On 01/10/2014 05:09 PM, Simon Riggs wrote:
> On 10 January 2014 15:47, Bruce Momjian  wrote:
>
>> I know there was a desire to remove this TODO item, but I think we have
>> brought up enough new issues that we can keep it to see if we can come
>> up with a solution.
> Can you summarise what you think the new issues are? All I see is some
> further rehashing of old discussions.
>
> There is already a solution to the "problem" because the docs are
> already very clear that you need multiple standbys to achieve commit
> guarantees AND high availability. RTFM is usually used as some form of
> put down, but that is what needs to happen here.

If we want to get the guarantees that often come up in "sync rep"
discussions - namely that you can assume your change is applied
on the standby when commit returns - then we could implement this by
returning the LSN from commit at the protocol level and having an option
in queries on the standby to wait for this LSN (again passed on the wire
below the level of the query) to be applied.

This can be mostly hidden in drivers and would need very little effort
from the end user. Basically you tell the driver that one connection
is bound as "the slave" of another, and the driver can manage using the
right LSNs. That is, the last LSN received from the master is always
attached to queries on the slaves.
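The driver-managed scheme could be sketched roughly as follows. `get_replay_lsn` stands in for querying pg_last_xlog_replay_location() on the standby; the function names and polling interval are invented for illustration:

```python
import time

def lsn_to_int(lsn: str) -> int:
    """Parse a PostgreSQL LSN like '0/3000060' into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wait_for_lsn(get_replay_lsn, target_lsn: str, timeout: float = 5.0) -> bool:
    """Poll until the standby has replayed past target_lsn, or time out.

    get_replay_lsn stands in for running
    SELECT pg_last_xlog_replay_location() on the standby connection;
    a driver would call this before routing a read to that standby.
    """
    target = lsn_to_int(target_lsn)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if lsn_to_int(get_replay_lsn()) >= target:
            return True
        time.sleep(0.01)
    return False

# Simulated standby that has already replayed up to 0/4000000
print(wait_for_lsn(lambda: "0/4000000", "0/3000060"))  # True
```

In the real scheme the commit LSN would come back over the protocol rather than from a query, but the comparison and wait logic would look much like this.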

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Simon Riggs
On 10 January 2014 15:47, Bruce Momjian  wrote:

> I know there was a desire to remove this TODO item, but I think we have
> brought up enough new issues that we can keep it to see if we can come
> up with a solution.

Can you summarise what you think the new issues are? All I see is some
further rehashing of old discussions.

There is already a solution to the "problem" because the docs are
already very clear that you need multiple standbys to achieve commit
guarantees AND high availability. RTFM is usually used as some form of
put down, but that is what needs to happen here.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Bruce Momjian
On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote:
> On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian  wrote:
> >
> > I think RAID-1 is a very good comparison because it is successful
> > technology and has similar issues.
> >
> > RAID-1 is like Postgres synchronous_standby_names mode in the sense that
> > the RAID-1 controller will not return success until writes have happened
> > on both mirrors, but it is unlike synchronous_standby_names in that it
> > will degrade and continue writes even when it can't write to both
> > mirrors.  What is being discussed is to allow the RAID-1 behavior in
> > Postgres.
> >
> > One issue that came up in discussions is the insufficiency of writing a
> > degrade notice in a server log file because the log file isn't durable
> > from server failures, meaning you don't know if a fail-over to the slave
> > lost commits.  The degrade message has to be stored durably against a
> > server failure, e.g. on a pager, probably using a command like we do for
> > archive_command, and has to return success before the server continues
> > in degrade mode.  I assume degraded RAID-1 controllers inform
> > administrators in the same way.
> 
> Here I think if the user is aware from the beginning that this is the
> behaviour, then maybe the importance of the message is not very high.
> What I want to say is that we could provide a UI in such a way that the
> user decides, during server setup, the behavior that is required.
> 
> For example, we could provide a new parameter,
> available_synchronous_standby_names, along with the current parameter,
> and ask the user to use this new parameter if he wishes to synchronously
> commit transactions on another server when it is available; otherwise it
> will operate as a standalone sync master.

I know there was a desire to remove this TODO item, but I think we have
brought up enough new issues that we can keep it to see if we can come
up with a solution.  I have added a link to this discussion on the TODO
item.

I think we will need at least four new GUC variables:

*  timeout control for degraded mode
*  command to run during switch to degraded mode
*  command to run during switch from degraded mode
*  read-only variable to report degraded mode
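In postgresql.conf terms, the four variables might look something like this. All parameter names here are hypothetical; none of them exist:

```
# Hypothetical parameters -- illustrative only, not actual GUCs
synchronous_degrade_timeout = 30s      # how long to wait before degrading
degrade_start_command = '/usr/local/bin/pg_notify_degrade start'
degrade_end_command   = '/usr/local/bin/pg_notify_degrade end'

# plus a read-only reporting variable, queryable as e.g.:
#   SHOW synchronous_degraded;
```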

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Michael Paquier
On Fri, Jan 10, 2014 at 3:23 AM, Simon Riggs  wrote:
> On 8 January 2014 21:40, Tom Lane  wrote:
>> Kevin Grittner  writes:
>>> I'm torn on whether we should cave to popular demand on this; but
>>> if we do, we sure need to be very clear in the documentation about
>>> what a successful return from a commit request means.  Sooner or
>>> later, Murphy's Law being what it is, if we do this someone will
>>> lose the primary and blame us because the synchronous replica is
>>> missing gobs of transactions that were successfully committed.
>>
>> I'm for not caving.  I think people who are asking for this don't
>> actually understand what they'd be getting.
>
> Agreed.
>
>
> Just to be clear, I made this mistake initially. Now I realise Heikki
> was right and if you think about it long enough, you will too. If you
> still disagree, think hard, read the archives until you do.
+1. I see far more potential in having an N-sync solution from the
usability viewpoint, and for consistency with the existing mechanisms in
place. A synchronous apply mode would be nice as well.
-- 
Michael




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Amit Kapila
On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian  wrote:
>
> I think RAID-1 is a very good comparison because it is successful
> technology and has similar issues.
>
> RAID-1 is like Postgres synchronous_standby_names mode in the sense that
> the RAID-1 controller will not return success until writes have happened
> on both mirrors, but it is unlike synchronous_standby_names in that it
> will degrade and continue writes even when it can't write to both
> mirrors.  What is being discussed is to allow the RAID-1 behavior in
> Postgres.
>
> One issue that came up in discussions is the insufficiency of writing a
> degrade notice in a server log file because the log file isn't durable
> from server failures, meaning you don't know if a fail-over to the slave
> lost commits.  The degrade message has to be stored durably against a
> server failure, e.g. on a pager, probably using a command like we do for
> archive_command, and has to return success before the server continues
> in degrade mode.  I assume degraded RAID-1 controllers inform
> administrators in the same way.

Here I think that if the user is aware of this behaviour from the
beginning, then maybe the importance of the message is not very high.
What I want to say is that we could provide a UI in such a way that the
user decides, during setup of the server, the behavior that he requires.

For example, if we provide a new parameter
available_synchronous_standby_names along with current parameter
and ask user to use this new parameter, if he wishes to synchronously
commit transactions on another server when it is available, else it will
operate as a standalone sync master.
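
For illustration only, the proposed setup might look like this
postgresql.conf fragment (available_synchronous_standby_names is the
hypothetical parameter from this proposal; it does not exist today):

```
# Hypothetical: wait for 'standby1' only while such a standby is
# connected; otherwise continue as a standalone sync master.
available_synchronous_standby_names = 'standby1'

# Existing parameter, for contrast: commits block until a listed
# standby confirms, waiting indefinitely if none is connected.
# synchronous_standby_names = 'standby1'
```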


> I think RAID-1 controllers operate successfully with this behavior
> because they are seen as durable and authoritative in reporting the
> status of mirrors, while with Postgres, there is no central authority
> that can report that degrade status of master/slaves.
>
> Another concern with degrade mode is that once Postgres enters degrade
> mode, how does it get back to synchronous_standby_names mode?

   It will get back to the mode where it commits transactions to another
   server before the commit completes, once the gap in the WAL is resolved.
   I think in the new mode it will operate as if there were no
   synchronous_standby_names.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Jim Nasby

On 1/9/14, 9:01 AM, Hannu Krosing wrote:
>> Yeah, and I think that the logging command that was suggested allows
>> for that *if configured correctly*.
>
> *But* for relying on this, we would also need to make logging
> *synchronous*,
> which would probably not go down well with many people, as it makes things
> even more fragile from availability viewpoint (and slower as well).

Not really... you only care about monitoring performance when the standby has
gone AWOL *and* you haven't sent a notification yet. Once you've notified once
you're done.

So in this case the master won't go down unless you have a double fault: 
standby goes down AND you can't get to your monitoring.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Simon Riggs
On 8 January 2014 21:40, Tom Lane  wrote:
> Kevin Grittner  writes:
>> I'm torn on whether we should cave to popular demand on this; but
>> if we do, we sure need to be very clear in the documentation about
>> what a successful return from a commit request means.  Sooner or
>> later, Murphy's Law being what it is, if we do this someone will
>> lose the primary and blame us because the synchronous replica is
>> missing gobs of transactions that were successfully committed.
>
> I'm for not caving.  I think people who are asking for this don't
> actually understand what they'd be getting.

Agreed.


Just to be clear, I made this mistake initially. Now I realise Heikki
was right and if you think about it long enough, you will too. If you
still disagree, think hard, read the archives until you do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Josh Berkus
Robert,

> I think the problem here is that we tend to have a limited view of
> "the right way to use synch rep". If I have 5 nodes, and I set 1
> synchronous and the other 3 asynchronous, I've set up a "known
> successor" in the event that the leader fails. In this scenario
> though, if the "successor" fails, you actually probably want to keep
> accepting writes; since you weren't using synchronous for durability
> but for operational simplicity. I suspect there are probably other
> scenarios where users are willing to trade latency for improved and/or
> directed durability but not at the extent of availability, don't you?

That's a workaround for a completely different limitation though; the
inability to designate a specific async replica as "first".  That is, if
there were some way to do so, you would be using that rather than sync
rep.  Extending the capabilities of that workaround is not something I
would gladly do until I had exhausted other options.

The other problem is that *many* users think they can get improved
availability, consistency AND durability on two nodes somehow, and to
heck with the CAP theorem (certain companies are happy to foster this
illusion).  Having a simple, easily-accessible auto-degrade without
treating degrade as a major monitoring event will feed this
self-deception.  I know I already have to explain the difference between
"synchronous" and "simultaneous" to practically every one of my clients
for whom I set up replication.

Realistically, degrade shouldn't be something that happens inside a
single PostgreSQL node, either the master or the replica.  It should be
controlled by some external controller which is capable of deciding on
degrade or not based on a more complex set of circumstances (e.g. "Is
the replica actually down or just slow?").  Certainly this is the case
with Cassandra, VoltDB, Riak, and the other "serious" multinode databases.

> This isn't to say there isn't a lot of confusion around the issue.
> Designing, implementing, and configuring different guarantees in the
> presence of node failures is a non-trivial problem. Still, I'd prefer
> to see Postgres head in the direction of providing more options in
> this area rather than drawing a firm line at being a CP-oriented
> system.

I'm not categorically opposed to having any form of auto-degrade at all;
what I'm opposed to is a patch which adds auto-degrade **without adding
any additional monitoring or management infrastructure at all**.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Bruce Momjian
On Thu, Jan  9, 2014 at 09:36:47AM -0800, Jeff Janes wrote:
> Oh, right.  Because the main reason for a sync replica degrading is that
> it's down.  In which case it isn't going to record anything.  This would
> still be useful for sync rep candidates, though, and I'll document why
> below.  But first, lemme demolish the case for auto-degrade.
> 
> So here's the case that we can't possibly solve for auto-degrade.
> Anyone who wants auto-degrade needs to come up with a solution for this
> case as a first requirement:
> 
> 
> It seems like the only deterministically useful thing to do is to send a 
> NOTICE
> to the *client* that the commit has succeeded, but in degraded mode, so keep
> your receipts and have your lawyer's number handy.  Whether anyone is willing
> to add code to the client to process that message is doubtful, as well as
> whether the client will even ever receive it if we are in the middle of a 
> major
> disruption.

I don't think clients are the right place for notification.  Clients
running on a single server could have fsync=off set by the admin or
lying drives and never know it.  I can't imagine a client only willing to
run if synchronous_standby_names is set.

The synchronous slave is something the administrator has set up and is
responsible for, so the administrator should be notified.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Jeff Janes
On Wed, Jan 8, 2014 at 3:00 PM, Josh Berkus  wrote:

> On 01/08/2014 01:49 PM, Tom Lane wrote:
> > Josh Berkus  writes:
> >> If we really want auto-degrading sync rep, then we'd (at a minimum) need
> >> a way to determine *from the replica* whether or not it was in degraded
> >> mode when the master died.  What good do messages to the master log do
> >> you if the master no longer exists?
> >
> > How would it be possible for a replica to know whether the master had
> > committed more transactions while communication was lost, if the master
> > dies without ever restoring communication?  It sounds like pie in the
> > sky from here ...
>
> Oh, right.  Because the main reason for a sync replica degrading is that
> it's down.  In which case it isn't going to record anything.  This would
> still be useful for sync rep candidates, though, and I'll document why
> below.  But first, lemme demolish the case for auto-degrade.
>
> So here's the case that we can't possibly solve for auto-degrade.
> Anyone who wants auto-degrade needs to come up with a solution for this
> case as a first requirement:
>

It seems like the only deterministically useful thing to do is to send a
NOTICE to the *client* that the commit has succeeded, but in degraded mode,
so keep your receipts and have your lawyer's number handy.  Whether anyone
is willing to add code to the client to process that message is doubtful,
as well as whether the client will even ever receive it if we are in the
middle of a major disruption.

But I think there is a good probabilistic justification for an
auto-degrade mode.  (And really, what else is there?  There are never any
real guarantees of anything.  Maybe none of your replicas ever come back
up.  Maybe none of your customers do, either.)



>
> 1. A data center network/power event starts.
>
> 2. The sync replica goes down.
>
> 3. A short time later, the master goes down.
>
> 4. Data center power is restored.
>
> 5. The master is fried and is a permanent loss.  The replica is ok, though.
>
> Question: how does the DBA know whether data has been lost or not?
>

What if he had a way of knowing that some data *has* been lost?  What can
he do about it?  What is the value in knowing it was lost after the fact,
but without the ability to do anything about it?

But let's say that instead of a permanent loss, the master can be brought
back up in a few days after replacing a few components, or in a few weeks
after sending the drives out to clean-room data recovery specialists.
 Writing has already failed over to the replica, because you couldn't wait
that long to bring things back up.

Once you get your old master back, you can see if transactions have been
lost, and if they have been you can dump the tables out to a human readable
format, use PITR and restore a copy of the replica to the point just before
the failover (although I'm not really sure exactly how to identify that
point) and dump that out, then use 'diff' tools to figure out what changes
to the database were lost, consult with the application specialists to
figure out what the application was doing that led to those changes (if
that is not obvious) and business operations people to figure out how to
apply the analogous changes to the top of the database, and customer
service VP or someone to figure out how to retroactively fix transactions that
were done after the failover which would have been done differently had the lost
transactions not been lost.  Or instead of all that, you could look at the
recovered data and learn that in fact nothing had been lost, so nothing
further needs to be done.

If you were running in asyn replication mode on a busy server, there is a
virtual certainty that some transactions have been lost.  If you were
running in sync mode with possibility of auto-degrade, it is far from
certain.  That depends on how long the power event lasted, compared to how
long you had the timeout set to.

Or rather than a data-center-wide power spike, what if your master just
"done fell over" with no drama to the rest of the neighborhood? Inspection
after the fail-over to the replica shows the RAID controller card failed.
 There is no reason to think that a RAID controller, in the process of
failing, would have caused the replication to kick into degraded mode.  You
know from the surviving logs that the master spent 60 seconds total in
degraded mode over the last 3 months, so there is a 99.999% chance no
confirmed transactions were lost.  To be conservative, let's drop it to
99.99% because maybe some unknown mechanism did allow a failing RAID
controller to blip the network card without leaving any evidence behind.
That's a lot better than the chances of lost transactions while in async
replication mode, which could be 99.9% in the other direction.
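
The back-of-envelope arithmetic above can be checked directly (the
60-second and 3-month figures are Jeff's hypothetical numbers, not
measurements):

```python
# Rough check of the figures above: if a failure strikes at a uniformly
# random moment, the chance it lands inside a degraded window is about
# the degraded fraction of total wall-clock time.
degraded_seconds = 60
period_seconds = 90 * 24 * 3600  # ~3 months

p_during_degrade = degraded_seconds / period_seconds
print(f"P(failure during degrade) ~ {p_during_degrade:.1e}")  # ~7.7e-06
print(f"P(no confirmed commits lost) ~ {1 - p_during_degrade:.4%}")  # ~99.9992%
```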

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Bruce Momjian
On Thu, Jan  9, 2014 at 04:55:22PM +0100, Hannu Krosing wrote:
> On 01/09/2014 04:15 PM, MauMau wrote:
> > From: "Hannu Krosing" 
> >> On 01/09/2014 01:57 PM, MauMau wrote:
> >>> Let me ask a (probably) stupid question.  How is the sync rep
> >>> different from RAID-1?
> >>>
> >>> When I first saw sync rep, I expected that it would provide the same
> >>> guarantees as RAID-1 in terms of durability (data is always mirrored
> >>> on two servers) and availability (if one server goes down, another
> >>> server continues full service).
> >> What you describe is most like A-sync rep.
> >>
> >> Sync rep makes sure that data is always replicated before confirming to
> >> writer.
> >
> > Really?  RAID-1 is a-sync?
> Not exactly, as there is no "master" just controller writing to two
> equal disks.
> 
> But having a "degraded" mode makes it
> more like async - it continues even with single disk and syncs later if
> and when the 2nd disk comes back.

I think RAID-1 is a very good comparison because it is successful
technology and has similar issues.

RAID-1 is like Postgres synchronous_standby_names mode in the sense that
the RAID-1 controller will not return success until writes have happened
on both mirrors, but it is unlike synchronous_standby_names in that it
will degrade and continue writes even when it can't write to both
mirrors.  What is being discussed is to allow the RAID-1 behavior in
Postgres.

One issue that came up in discussions is the insufficiency of writing a
degrade notice in a server log file because the log file isn't durable
from server failures, meaning you don't know if a fail-over to the slave
lost commits.  The degrade message has to be stored durably against a
server failure, e.g. on a pager, probably using a command like we do for
archive_command, and has to return success before the server continues
in degrade mode.  I assume degraded RAID-1 controllers inform
administrators in the same way.
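
As a sketch of what such a command might look like (the hook itself is
hypothetical; nothing like it exists in Postgres today), a shell script
in the spirit of archive_command that returns success only after the
degrade record is durably written:

```shell
#!/bin/sh
# Hypothetical "degrade_command" hook: record the degrade event durably,
# and return success only afterwards, so the server may then continue in
# degraded mode.  The log path is an assumption for illustration; a real
# deployment would page an administrator instead of writing a local file.

record_degrade() {
    log="${DEGRADE_LOG:-./degrade.log}"
    # Append a timestamped record of entering degraded mode.
    printf '%s entering degraded mode: synchronous standby lost\n' \
        "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" >> "$log" || return 1
    # Flush OS buffers so the record survives a crash.
    sync
    return 0
}

record_degrade || exit 1
```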

I think RAID-1 controllers operate successfully with this behavior
because they are seen as durable and authoritative in reporting the
status of mirrors, while with Postgres, there is no central authority
that can report that degrade status of master/slaves.

Another concern with degrade mode is that once Postgres enters degrade
mode, how does it get back to synchronous_standby_names mode?  We could
have each commit wait for the timeout before continuing, but that is
going to make degrade mode unusably slow.  Would there be an admin
command?  With a timeout to force degrade mode, a temporary network
outage could cause degrade mode, while our current behavior would
recover synchronous_standby_names mode once the network was repaired.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Hannu Krosing
On 01/09/2014 04:15 PM, MauMau wrote:
> From: "Hannu Krosing" 
>> On 01/09/2014 01:57 PM, MauMau wrote:
>>> Let me ask a (probably) stupid question.  How is the sync rep
>>> different from RAID-1?
>>>
>>> When I first saw sync rep, I expected that it would provide the same
>>> guarantees as RAID-1 in terms of durability (data is always mirrored
>>> on two servers) and availability (if one server goes down, another
>>> server continues full service).
>> What you describe is most like A-sync rep.
>>
>> Sync rep makes sure that data is always replicated before confirming to
>> writer.
>
> Really?  RAID-1 is a-sync?
Not exactly, as there is no "master" just controller writing to two
equal disks.

But having a "degraded" mode makes it
more like async - it continues even with single disk and syncs later if
and when the 2nd disk comes back.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread MauMau

From: "Hannu Krosing" 

On 01/09/2014 01:57 PM, MauMau wrote:

Let me ask a (probably) stupid question.  How is the sync rep
different from RAID-1?

When I first saw sync rep, I expected that it would provide the same
guarantees as RAID-1 in terms of durability (data is always mirrored
on two servers) and availability (if one server goes down, another
server continues full service).

What you describe is most like A-sync rep.

Sync rep makes sure that data is always replicated before confirming to
writer.


Really?  RAID-1 is a-sync?

Regards
MauMau






Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Hannu Krosing
On 01/09/2014 02:01 AM, Jim Nasby wrote:
> On 1/8/14, 6:05 PM, Tom Lane wrote:
>> Josh Berkus  writes:
>>> >On 01/08/2014 03:27 PM, Tom Lane wrote:
>>> >>What we lack, and should work on, is a way for sync mode to have M larger
>>> >>than one.  AFAICS, right now we'll report commit as soon as there's one
>>> >>up-to-date replica, and some high-reliability cases are going to want
>>> >>more.
>>> >"Sync N times" is really just a guarantee against data loss as long as
>>> >you lose N-1 servers or fewer.  And it becomes an even
>>> >lower-availability solution if you don't have at least N+1 replicas.
>>> >For that reason, I'd like to see some realistic actual user demand
>>> >before we take the idea seriously.
>> Sure.  I wasn't volunteering to implement it, just saying that what
>> we've got now is not designed to guarantee data survival across failure
>> of more than one server.  Changing things around the margins isn't
>> going to improve such scenarios very much.
>>
>> It struck me after re-reading your example scenario that the most
>> likely way to figure out what you had left would be to see if some
>> additional system (think Nagios monitor, or monitors) had records
>> of when the various database servers went down.  This might be
>> what you were getting at when you said "logging", but the key point
>> is it has to be logging done on an external server that could survive
>> failure of the database server.  postmaster.log ain't gonna do it.
>
> Yeah, and I think that the logging command that was suggested allows
> for that *if configured correctly*.
*But* for relying on this, we would also need to make logging
*synchronous*,
which would probably not go down well with many people, as it makes things
even more fragile from availability viewpoint (and slower as well).

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Hannu Krosing
On 01/09/2014 01:57 PM, MauMau wrote:
> From: "Andres Freund" 
>> On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
>>> If we have the following:
>>>
>>> db0->db1:down
>>>
>>> Using the model (as I understand it) that is being discussed we have
>>> increased our failure rate because the moment db1:down we also lose
>>> db0. The
>>> node db0 may be up but if it isn't going to process transactions it is
>>> useless. I can tell you that I have exactly 0 customers that would
>>> want that
>>> model because a single node failure would cause a double node failure.
>>
>> That's why you should configure a second standby as another (candidate)
>> synchronous replica, also listed in synchronous_standby_names.
>
> Let me ask a (probably) stupid question.  How is the sync rep
> different from RAID-1?
>
> When I first saw sync rep, I expected that it would provide the same
> guarantees as RAID-1 in terms of durability (data is always mirrored
> on two servers) and availability (if one server goes down, another
> server continues full service).
What you describe is most like A-sync rep.

Sync rep makes sure that data is always replicated before confirming to
writer.


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Hannu Krosing
On 01/08/2014 11:49 PM, Tom Lane wrote:
> "Joshua D. Drake"  writes:
>> On 01/08/2014 01:55 PM, Tom Lane wrote:
>>> Sync mode is about providing a guarantee that the data exists on more than
>>> one server *before* we tell the client it's committed.  If you don't need
>>> that guarantee, you shouldn't be using sync mode.  If you do need it,
>>> it's not clear to me why you'd suddenly not need it the moment the going
>>> actually gets tough.
>> As I understand it what is being suggested is that if a subscriber or 
>> target goes down, then the master will just sit there and wait. When I 
>> read that, I read that the master will no longer process write 
>> transactions. If I am wrong in that understanding then cool. If I am not 
>> then that is a serious problem with a production scenario. There is an 
>> expectation that a master will continue to function if the target is 
>> down, synchronous or not.
> Then you don't understand the point of sync mode, and you shouldn't be
> using it.  The point is *exactly* to refuse to commit transactions unless
> we can guarantee the data's been replicated.
For single host scenario this would be similar to asking for
a mode which turns fsync=off in case of disk failure :)


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Hannu Krosing
On 01/09/2014 12:05 AM, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
>>> * Andres Freund (and...@2ndquadrant.com) wrote:
 That's why you should configure a second standby as another (candidate)
 synchronous replica, also listed in synchronous_standby_names.
>>> Perhaps we should stress in the docs that this is, in fact, the *only*
>>> reasonable mode in which to run with sync rep on?  Where there are
>>> multiple replicas, because otherwise Drake is correct that you'll just
>>> end up having both nodes go offline if the slave fails.
>> Which, as it happens, is actually documented.
> I'm aware, my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.
>
> Perhaps we should even log a warning if only one value is found in
> synchronous_standby_names...
You can have only one name in synchronous_standby_names and
have multiple slaves connecting with that name.

Also, I can attest that I have had clients who want exactly that - a system
stop until admin intervention in case of a designated sync standby failing.

And they actually run more than one standby, they just want to make
sure that sync rep to 2nd data center always happens.
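
As a sketch of that setup (host names are illustrative): every standby in
the second data center connects with the same application_name, so any one
of them can satisfy the synchronous requirement:

```
# On the master (postgresql.conf): one name covers every standby
# that connects as 'dc2'.
synchronous_standby_names = 'dc2'

# In each dc2 standby's recovery.conf:
primary_conninfo = 'host=master port=5432 application_name=dc2'
```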


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Hannu Krosing
On 01/09/2014 05:09 AM, Robert Treat wrote:
> On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus  wrote:
>> Stephen,
>>
>>
>>> I'm aware, my point was simply that we should state, up-front in
>>> 25.2.7.3 *and* where we document synchronous_standby_names, that it
>>> requires at least three servers to be involved to be a workable
>>> solution.
>> It's a workable solution with 2 servers.  That's a "low-availability,
>> high-integrity" solution; the user has chosen to double their risk of
>> not accepting writes against never losing a write.  That's a perfectly
>> valid configuration, and I believe that NTT runs several applications
>> this way.
>>
>> In fact, that can already be looked at as a kind of "auto-degrade" mode:
>> if there aren't two nodes, then the database goes read-only.
>>
>> Might I also point out that transactions are synchronous or not
>> individually?  The sensible configuration is for only the important
>> writes being synchronous -- in which case auto-degrade makes even less
>> sense.
>>
>> I really think that demand for auto-degrade is coming from users who
>> don't know what sync rep is for in the first place.  The fact that other
>> vendors are offering auto-degrade as a feature instead of the ginormous
>> foot-gun it is adds to the confusion, but we can't help that.
>>
> I think the problem here is that we tend to have a limited view of
> "the right way to use synch rep". If I have 5 nodes, and I set 1
> synchronous and the other 3 asynchronous, I've set up a "known
> successor" in the event that the leader fails. 
But there is no guarantee that the synchronous replica actually
is ahead of async ones.

> In this scenario
> though, if the "successor" fails, you actually probably want to keep
> accepting writes; since you weren't using synchronous for durability
> but for operational simplicity. I suspect there are probably other
> scenarios where users are willing to trade latency for improved and/or
> directed durability but not at the extent of availability, don't you?
>
Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread MauMau

From: "Andres Freund" 

On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:

If we have the following:

db0->db1:down

Using the model (as I understand it) that is being discussed we have
increased our failure rate because the moment db1:down we also lose db0.
The node db0 may be up but if it isn't going to process transactions it is
useless. I can tell you that I have exactly 0 customers that would want that
model because a single node failure would cause a double node failure.


That's why you should configure a second standby as another (candidate)
synchronous replica, also listed in synchronous_standby_names.


Let me ask a (probably) stupid question.  How is the sync rep different from 
RAID-1?


When I first saw sync rep, I expected that it would provide the same 
guarantees as RAID-1 in terms of durability (data is always mirrored on two 
servers) and availability (if one server goes down, another server continues 
full service).


The cost is reasonable with RAID-1.  Sync rep requires a high cost to get 
both durability and availability --- three servers.


Am I expecting too much?


Regards
MauMau





Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Robert Treat
On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus  wrote:
> Stephen,
>
>
>> I'm aware, my point was simply that we should state, up-front in
>> 25.2.7.3 *and* where we document synchronous_standby_names, that it
>> requires at least three servers to be involved to be a workable
>> solution.
>
> It's a workable solution with 2 servers.  That's a "low-availability,
> high-integrity" solution; the user has chosen to double their risk of
> not accepting writes against never losing a write.  That's a perfectly
> valid configuration, and I believe that NTT runs several applications
> this way.
>
> In fact, that can already be looked at as a kind of "auto-degrade" mode:
> if there aren't two nodes, then the database goes read-only.
>
> Might I also point out that transactions are synchronous or not
> individually?  The sensible configuration is for only the important
> writes being synchronous -- in which case auto-degrade makes even less
> sense.
>
> I really think that demand for auto-degrade is coming from users who
> don't know what sync rep is for in the first place.  The fact that other
> vendors are offering auto-degrade as a feature instead of the ginormous
> foot-gun it is adds to the confusion, but we can't help that.
>

I think the problem here is that we tend to have a limited view of
"the right way to use synch rep". If I have 5 nodes, and I set 1
synchronous and the other 3 asynchronous, I've set up a "known
successor" in the event that the leader fails. In this scenario
though, if the "successor" fails, you actually probably want to keep
accepting writes; since you weren't using synchronous for durability
but for operational simplicity. I suspect there are probably other
scenarios where users are willing to trade latency for improved and/or
directed durability but not at the extent of availability, don't you?

In fact there are entire systems that provide that type of thing. I
feel like it's worth mentioning that there's a nice primer on tunable
consistency in the Riak docs; strongly recommended.
http://docs.basho.com/riak/1.1.0/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/.
I'm not entirely sure how well it maps into our problem space, but it
at least gives you a sane working model to think about. If you were
trying to explain the Postgres case, async is like the N value (I want
the data to end up on this many nodes eventually) and sync is like the
W value (it must be written to this many nodes, or it should fail). Of
course, we only offer an R = 1, W = 1 or 2, and N = all. And it's
worse than that, because we have golden nodes.
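[Editor's note: an illustrative sketch, not PostgreSQL internals. It models the Riak-style mapping above: a commit that must be acknowledged by W out of N replicas before it is reported as committed. The `Replica` class and `quorum_commit` function are hypothetical, invented for illustration.]

```python
class Replica:
    """Toy stand-in for a standby; `up` says whether it can take writes."""
    def __init__(self, up=True):
        self.up = up

    def apply(self, payload):
        # A real replica would persist the WAL record; here we just
        # report whether this node acknowledged the write.
        return self.up


def quorum_commit(replicas, payload, w):
    """Return True once at least `w` replicas acknowledge the write.

    N is len(replicas); W is `w`. PostgreSQL sync rep today behaves
    roughly like W = 2 (the master plus exactly one sync standby),
    with N fixed at "all listed standbys, eventually".
    """
    acks = 0
    for replica in replicas:
        if replica.apply(payload):
            acks += 1
        if acks >= w:
            return True   # quorum reached; safe to report commit
    return False          # quorum not reached; the commit must wait/fail
```

With W = 2 a single standby failure still commits, while W = N blocks on any failure, which is the latency/durability/availability trade-off being discussed.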

This isn't to say there isn't a lot of confusion around the issue.
Designing, implementing, and configuring different guarantees in the
presence of node failures is a non-trivial problem. Still, I'd prefer
to see Postgres head in the direction of providing more options in
this area rather than drawing a firm line at being a CP-oriented
system.

Robert Treat
play: xzilla.net
work: omniti.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Jim Nasby

On 1/8/14, 6:05 PM, Tom Lane wrote:

Josh Berkus  writes:

>On 01/08/2014 03:27 PM, Tom Lane wrote:

>>What we lack, and should work on, is a way for sync mode to have M larger
>>than one.  AFAICS, right now we'll report commit as soon as there's one
>>up-to-date replica, and some high-reliability cases are going to want
>>more.

>"Sync N times" is really just a guarantee against data loss as long as
>you lose N-1 servers or fewer.  And it becomes an even
>lower-availability solution if you don't have at least N+1 replicas.
>For that reason, I'd like to see some realistic actual user demand
>before we take the idea seriously.

Sure.  I wasn't volunteering to implement it, just saying that what
we've got now is not designed to guarantee data survival across failure
of more than one server.  Changing things around the margins isn't
going to improve such scenarios very much.

It struck me after re-reading your example scenario that the most
likely way to figure out what you had left would be to see if some
additional system (think Nagios monitor, or monitors) had records
of when the various database servers went down.  This might be
what you were getting at when you said "logging", but the key point
is it has to be logging done on an external server that could survive
failure of the database server.  postmaster.log ain't gonna do it.


Yeah, and I think that the logging command that was suggested allows for that 
*if configured correctly*.

Automatic degradation to async is useful for protecting you against all modes 
of a single failure: Master fails, you've got the replica. Replica fails, 
you've got the master.

But fit hits the shan as soon as you get a double failure, and that double 
failure can be very subtle. Josh's case is not subtle: You lost power AND the 
master died. You KNOW you have two failures.

But what happens if there's a network blip that's not large enough to notice 
(but large enough to degrade your replication) and the master dies? Now you 
have no clue if you've lost data.

Compare this to async: if the master goes down (one failure), you have zero 
clue if you lost data or not. At least with auto-degradation you know you have 
to have 2 failures to suffer data loss.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Josh Berkus  writes:
> On 01/08/2014 03:27 PM, Tom Lane wrote:
>> What we lack, and should work on, is a way for sync mode to have M larger
>> than one.  AFAICS, right now we'll report commit as soon as there's one
>> up-to-date replica, and some high-reliability cases are going to want
>> more.

> "Sync N times" is really just a guarantee against data loss as long as
> you lose N-1 servers or fewer.  And it becomes an even
> lower-availability solution if you don't have at least N+1 replicas.
> For that reason, I'd like to see some realistic actual user demand
> before we take the idea seriously.

Sure.  I wasn't volunteering to implement it, just saying that what
we've got now is not designed to guarantee data survival across failure
of more than one server.  Changing things around the margins isn't
going to improve such scenarios very much.

It struck me after re-reading your example scenario that the most
likely way to figure out what you had left would be to see if some
additional system (think Nagios monitor, or monitors) had records
of when the various database servers went down.  This might be
what you were getting at when you said "logging", but the key point
is it has to be logging done on an external server that could survive
failure of the database server.  postmaster.log ain't gonna do it.

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Jeff Janes
On Wed, Jan 8, 2014 at 2:56 PM, Stephen Frost  wrote:

> * Andres Freund (and...@2ndquadrant.com) wrote:
> > That's why you should configure a second standby as another (candidate)
> > synchronous replica, also listed in synchronous_standby_names.
>
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on?


I don't think it is the only reasonable way to run it.  Most of the time
that the master can't communicate with rep1, it is because of a network
problem.  So, the master probably can't talk to rep2 either, and adding the
second one doesn't really get you all that much in terms of availability.

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 03:27 PM, Tom Lane wrote:
> Good point, but C can't solve this for you just by logging.  If C was the
> first to go down, it has no way to know whether A and B committed more
> transactions before dying; and it's unlikely to have logged its own crash,
> either.

Sure.  But if we *knew* that C was not in synchronous mode when it went
down, then we'd expect some data loss.  As you point out, though, the
converse is not true; even if C was in sync mode, we don't know that
there's been no data loss, since B could come back up as a sync replica
before going down again.

> What we lack, and should work on, is a way for sync mode to have M larger
> than one.  AFAICS, right now we'll report commit as soon as there's one
> up-to-date replica, and some high-reliability cases are going to want
> more.

Yeah, we talked about having this when sync rep originally went in.  It
involves a LOT more bookkeeping on the master though, which is why nobody
has been willing to attempt it -- and why we went with the
single-replica solution in the first place.  Especially since most
people who want "quorum sync" really want MM replication anyway.

"Sync N times" is really just a guarantee against data loss as long as
you lose N-1 servers or fewer.  And it becomes an even
lower-availability solution if you don't have at least N+1 replicas.
For that reason, I'd like to see some realistic actual user demand
before we take the idea seriously.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Jeff Janes
On Wed, Jan 8, 2014 at 2:23 PM, Joshua D. Drake wrote:

>
> On 01/08/2014 01:55 PM, Tom Lane wrote:
>
>  Sync mode is about providing a guarantee that the data exists on more than
>> one server *before* we tell the client it's committed.  If you don't need
>> that guarantee, you shouldn't be using sync mode.  If you do need it,
>> it's not clear to me why you'd suddenly not need it the moment the going
>> actually gets tough.
>>
>
> As I understand it what is being suggested is that if a subscriber or
> target goes down, then the master will just sit there and wait. When I read
> that, I read that the master will no longer process write transactions. If
> I am wrong in that understanding then cool. If I am not then that is a
> serious problem with a production scenario. There is an expectation that a
> master will continue to function if the target is down, synchronous or not.
>

My expectation is that the master stops writing checks when it finds it can
no longer cash them.

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Josh Berkus  writes:
> HOWEVER, we've already kind of set up an indeterminate situation with
> allowing sync rep groups and candidate sync rep servers.  Consider this:

> 1. Master server A is configured with sync replica B and candidate sync
> replica C

> 2. A rolling power/network failure event occurs, which causes B and C to
> go down sometime before A, and all of them to go down before the
> application does.

> 3. On restore, only C is restorable; both A and B are a total loss.

> Again, we have no way to know whether or not C was in sync replication
> when it went down.  If C went down before B, then we've lost data; if B
> went down before C, we haven't.  But we can't find out.  *This* is where
> it would be useful to have C log whenever it went into (or out of)
> synchronous mode.

Good point, but C can't solve this for you just by logging.  If C was the
first to go down, it has no way to know whether A and B committed more
transactions before dying; and it's unlikely to have logged its own crash,
either.

More fundamentally, if you want to survive the failure of M out of N
nodes, you need a sync configuration that guarantees data is on at least
M+1 nodes before reporting commit.  The above example doesn't meet that,
so it's not surprising that you're screwed.

What we lack, and should work on, is a way for sync mode to have M larger
than one.  AFAICS, right now we'll report commit as soon as there's one
up-to-date replica, and some high-reliability cases are going to want
more.
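
[Editor's note: Tom's rule can be checked with a toy model: if a commit is stored on k nodes, then losing any M nodes can destroy the last copy only when M >= k, so surviving M failures requires the commit on at least M+1 nodes. The helper names below are invented for illustration.]

```python
from itertools import combinations

def commit_survives(holders, failed):
    """A commit survives if at least one node holding it did not fail."""
    return bool(set(holders) - set(failed))

def survives_any_m_failures(nodes, holders, m):
    """True iff the commit survives *every* possible choice of m failed nodes."""
    return all(commit_survives(holders, failed)
               for failed in combinations(nodes, m))
```

For example, with the data on 2 nodes, every single-node failure is survivable but some double failure is not, matching the scenario upthread.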

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 03:18 PM, Stephen Frost wrote:
> Do you really feel that a WARNING and increasing the docs to point
> out that three systems are necessary, particularly under the 'high
> availability' documentation and options, is a bad idea?  I fail to see
> how that does anything but clarify the use-case for our users.

I think the warning is dumb, and that the suggested documentation change
is insufficient.  If we're going to clarify things, then we need to have
a full-on several-page doc showing several examples of different sync
rep configurations and explaining their tradeoffs (including the
different sync modes and per-transaction sync).  Anything short of that
is just going to muddy the waters further.

Mind you, someone needs to take a machete to the HA section of the docs
anyway.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Stephen Frost
Josh,

* Josh Berkus (j...@agliodbs.com) wrote:
> > I'm aware, my point was simply that we should state, up-front in
> > 25.2.7.3 *and* where we document synchronous_standby_names, that it
> > requires at least three servers to be involved to be a workable
> > solution.
> 
> It's a workable solution with 2 servers.  That's a "low-availability,
> high-integrity" solution; the user has chosen to double their risk of
> not accepting writes against never losing a write.  That's a perfectly
> valid configuration, and I believe that NTT runs several applications
> this way.

I really don't agree with that when the standby going offline can take
out the master.  Note that I didn't say we shouldn't allow it, but I
don't think we should accept that it's a real-world solution.

> I really think that demand for auto-degrade is coming from users who
> don't know what sync rep is for in the first place.  The fact that other
> vendors are offering auto-degrade as a feature instead of the ginormous
> foot-gun it is adds to the confusion, but we can't help that.

Do you really feel that a WARNING and increasing the docs to point
out that three systems are necessary, particularly under the 'high
availability' documentation and options, is a bad idea?  I fail to see
how that does anything but clarify the use-case for our users.

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Stephen Frost  writes:
> I'm aware, my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.

It only requires that if your requirements include both redundant
data storage and tolerating single-node failure.  Now admittedly,
most people who want replication want it so they can have failure
tolerance, but I don't think it's insane to say that you want to
stop accepting writes if either node of a 2-node server drops out.
If you can only afford two nodes, and you need guaranteed redundancy
for business reasons, then that's where you end up.

Or in short, I'm against throwing warnings for this kind of setup.
I do agree that we need some doc improvements, since this is
evidently not clear enough yet.

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
Stephen,


> I'm aware, my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.

It's a workable solution with 2 servers.  That's a "low-availability,
high-integrity" solution; the user has chosen to double their risk of
not accepting writes against never losing a write.  That's a perfectly
valid configuration, and I believe that NTT runs several applications
this way.

In fact, that can already be looked at as a kind of "auto-degrade" mode:
if there aren't two nodes, then the database goes read-only.

Might I also point out that transactions are synchronous or not
individually?  The sensible configuration is for only the important
writes being synchronous -- in which case auto-degrade makes even less
sense.

I really think that demand for auto-degrade is coming from users who
don't know what sync rep is for in the first place.  The fact that other
vendors are offering auto-degrade as a feature instead of the ginormous
foot-gun it is adds to the confusion, but we can't help that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@2ndquadrant.com) wrote:
> > > That's why you should configure a second standby as another (candidate)
> > > synchronous replica, also listed in synchronous_standby_names.
> > 
> > Perhaps we should stress in the docs that this is, in fact, the *only*
> > reasonable mode in which to run with sync rep on?  Where there are
> > multiple replicas, because otherwise Drake is correct that you'll just
> > end up having both nodes go offline if the slave fails.
> 
> Which, as it happens, is actually documented.

I'm aware, my point was simply that we should state, up-front in
25.2.7.3 *and* where we document synchronous_standby_names, that it
requires at least three servers to be involved to be a workable
solution.

Perhaps we should even log a warning if only one value is found in
synchronous_standby_names...

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 14:52:07 -0800, Joshua D. Drake wrote:
> On 01/08/2014 02:46 PM, Andres Freund wrote:
> >>The idea is that we know that data on db0 is not written until we know for a
> >>fact that db1 also has that data. That is great and a guarantee of data
> >>integrity between the two nodes.
> >
> >That guarantee is never there. The only thing guaranteed is that the
> >client isn't notified of the commit until db1 has received the data.
> 
> Well ugh on that.. but that is for another reply.

You do realize that locally you have the same guarantees? If the client
didn't receive a reply to a COMMIT you won't know whether the tx
committed or not. If that's not sufficient you need to use 2pc and a
transaction manager.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 01:49 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> If we really want auto-degrading sync rep, then we'd (at a minimum) need
>> a way to determine *from the replica* whether or not it was in degraded
>> mode when the master died.  What good do messages to the master log do
>> you if the master no longer exists?
> 
> How would it be possible for a replica to know whether the master had
> committed more transactions while communication was lost, if the master
> dies without ever restoring communication?  It sounds like pie in the
> sky from here ...

Oh, right.  Because the main reason for a sync replica degrading is that
it's down.  In which case it isn't going to record anything.  This would
still be useful for sync rep candidates, though, and I'll document why
below.  But first, lemme demolish the case for auto-degrade.

So here's the case that we can't possibly solve for auto-degrade.
Anyone who wants auto-degrade needs to come up with a solution for this
case as a first requirement:

1. A data center network/power event starts.

2. The sync replica goes down.

3. A short time later, the master goes down.

4. Data center power is restored.

5. The master is fried and is a permanent loss.  The replica is ok, though.

Question: how does the DBA know whether data has been lost or not?

With current sync rep, it's easy: no data was lost, because the master
stopped accepting writes once the replica went down.  If we support
auto-degrade, though, there's no way to know; the replica doesn't have
that information, and anything which was on the master is permanently
lost.  And the point several people have made is: if you can live with
indeterminacy, then you're better off with async rep in the first place.

Now, what we COULD definitely use is a single-command way of degrading
the master when the sync replica is down.  Something like "ALTER SYSTEM
DEGRADE SYNC".  Right now you have to push a change to the conf file and
reload, and there's no way to salvage the transaction which triggered
the sync failure.  This would be a nice 9.5 feature.
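
[Editor's note: with 9.4's ALTER SYSTEM, the manual procedure Josh describes reduces to roughly the following, assuming superuser access; the one-command DEGRADE form remains hypothetical.]

```sql
-- Manual degrade today (PostgreSQL 9.4+): empty the sync standby
-- list and reload, which disables synchronous waits.
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();

-- The proposed single command (hypothetical, not implemented):
-- ALTER SYSTEM DEGRADE SYNC;
```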

HOWEVER, we've already kind of set up an indeterminate situation with
allowing sync rep groups and candidate sync rep servers.  Consider this:

1. Master server A is configured with sync replica B and candidate sync
replica C

2. A rolling power/network failure event occurs, which causes B and C to
go down sometime before A, and all of them to go down before the
application does.

3. On restore, only C is restorable; both A and B are a total loss.

Again, we have no way to know whether or not C was in sync replication
when it went down.  If C went down before B, then we've lost data; if B
went down before C, we haven't.  But we can't find out.  *This* is where
it would be useful to have C log whenever it went into (or out of)
synchronous mode.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > That's why you should configure a second standby as another (candidate)
> > synchronous replica, also listed in synchronous_standby_names.
> 
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on?  Where there are
> multiple replicas, because otherwise Drake is correct that you'll just
> end up having both nodes go offline if the slave fails.

Which, as it happens, is actually documented.

http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION
25.2.7.3. Planning for High Availability

"Commits made when synchronous_commit is set to on or remote_write will
wait until the synchronous standby responds. The response may never
occur if the last, or only, standby should crash.

The best solution for avoiding data loss is to ensure you don't lose
your last remaining synchronous standby. This can be achieved by naming
multiple potential synchronous standbys using
synchronous_standby_names. The first named standby will be used as the
synchronous standby. Standbys listed after this will take over the role
of synchronous standby if the first one should fail."
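
[Editor's note: the mitigation the docs describe is a one-line setting on the master; a sketch, assuming standbys that connect with these (placeholder) application_names.]

```
# postgresql.conf on the master -- names are placeholders matching
# each standby's application_name in its primary_conninfo:
synchronous_standby_names = 'standby1, standby2'
# standby1 is the sync standby; standby2 takes over that role if
# standby1 fails, so a single standby failure does not block commits.
```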


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 02:49 PM, Tom Lane wrote:


Then you don't understand the point of sync mode, and you shouldn't be
using it.  The point is *exactly* to refuse to commit transactions unless
we can guarantee the data's been replicated.


I understand exactly that and I don't disagree, except in the case where 
it is going to bring down the master (see my further reply). I now 
remember arguing about this a few years ago when we started down the 
sync path.


Anyway, perhaps this is just something of a knob that can be turned. We 
don't have to continue the argument. Thank you for considering what I 
was saying.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

Perhaps we should stress in the docs that this is, in fact, the *only*
reasonable mode in which to run with sync rep on?  Where there are
multiple replicas, because otherwise Drake is correct that you'll just
end up having both nodes go offline if the slave fails.

Thanks,

Stephen




  1   2   >