Re: [ClusterLabs] Does CMAN Still Not Support Multiple CoroSync Rings?

2018-02-13 Thread Eric Robinson
Thanks for the suggestion, everyone. I'll give that a try.

> -Original Message-
> From: Jan Friesse [mailto:jfrie...@redhat.com]
> Sent: Monday, February 12, 2018 8:49 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> ; Eric Robinson 
> Subject: Re: [ClusterLabs] Does CMAN Still Not Support Multiple CoroSync
> Rings?
> 
> Eric,
> 
> > General question. I tried to set up a cman + corosync + pacemaker
> > cluster using two corosync rings. When I start the cluster, everything
> > works fine, except when I do a 'corosync-cfgtool -s' it only shows one
> > ring. I tried manually editing the /etc/cluster/cluster.conf file
> > adding two sections, but then cman complained that I didn't have a
> > multicast address specified, even though I did. I tried editing the
> > /etc/corosync/corosync.conf file, and then I could get two rings, but
> > the nodes would not both join the cluster. Bah! I did some reading and
> > saw that cman didn't support multiple rings years ago. Did it never
> > get updated?
> 
> AFAIK cluster.conf should be edited so altname is used. So something like in
> this example:
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/cluster_administration/s1-config-rrp-cli-ca
> 
> I don't think you have to add altmulticast.
> 
> Honza
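
As a rough sketch of the altname approach mentioned in the quoted reply, the snippet below generates the relevant part of a cluster.conf with Python's standard library. The cluster name and all node and interface names (examplecluster, node1, node1-alt, ...) are hypothetical placeholders, and fencing and other sections a real configuration needs are left out; the Red Hat document linked in the thread remains the reference for the full syntax.

```python
import xml.etree.ElementTree as ET


def build_cluster_conf(cluster_name, nodes):
    """Return a cluster.conf skeleton where each node carries an <altname>.

    `nodes` is a list of (ring0_name, ring1_name) pairs. All names used in
    this example are hypothetical placeholders, not taken from the thread.
    """
    cluster = ET.Element("cluster", name=cluster_name, config_version="1")
    clusternodes = ET.SubElement(cluster, "clusternodes")
    for nodeid, (ring0_name, ring1_name) in enumerate(nodes, start=1):
        node = ET.SubElement(clusternodes, "clusternode",
                             name=ring0_name, nodeid=str(nodeid))
        # <altname> names the host/interface to use for the second ring.
        ET.SubElement(node, "altname", name=ring1_name)
    return cluster


conf = build_cluster_conf("examplecluster",
                          [("node1", "node1-alt"), ("node2", "node2-alt")])
print(ET.tostring(conf, encoding="unicode"))
```

After restarting the cluster with such a configuration, 'corosync-cfgtool -s' is the command from the original question to check whether both rings show up.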

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Maxim

13.02.2018 16:41, Klaus Wenninger wrote:
> Let's put that differently. With fencing you can make the
> loss-detection more aggressive and thus more prone to false-positives
> without risking a split-brain situation. (Actually without fencing
> you can never be really sure if the other side is really gone!) But
> to be honest if you are really after sub-second
> detection/switchover I'm not sure if fencing - at least with the
> current implementation in pacemaker and the current selection of
> fencing-devices - will give you satisfactory results.
>
>> [Unfortunately, I have no hardware that implements fencing
>> abilities nearby and can't try it myself]
>
> If you don't have any of the usual fencing-devices available you
> might have some kind of a shared-disk that might be usable with SBD.
> For a 2-node-cluster with a single shared-disk (as in your case if I
> got it correctly) make sure to pick an SBD-version that has
> https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377.
>
> But again I doubt that this will work reliably with sub-second
> requirements.
>
> Not saying I'm not interested in experiences/requirements with
> pacemaker doing failovers in a sub-second or more relaxed
> low-single-digit-second timeframe. Seeing this working reliably would
> open up pacemaker for a completely new class of applications.
>
> Regards, Klaus

I was sceptical too... especially when I saw the monitor
intervals of pacemaker resource agents :D
So sub-second timings for issue detection and resource moves seem
unachievable with the facilities of the current cluster software,
for architectural reasons as well.

Thank you for the support and proposals.



Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Digimer
On 2018-02-13 05:46 AM, Maxim wrote:
> 12.02.2018 19:31, Digimer wrote:
>> Without fencing, all bets are off. Please enable it and see if the
>> issue remains
> It seems I know [in theory] about fencing and its importance
> (although I've never configured it so far).
> But I don't understand how it would help in the case of a hard
> reboot/shutdown.

An availability cluster's job is to keep things running. To do this,
there must be coordination between the nodes (otherwise, just run things
everywhere and be done with it). Thus, when a node stops responding, it
is critical that the lost node be put into a known state.

If you allow assumptions to be made, you will eventually assume wrong.
That could have consequences as "minor" as confusing switches/routers to
as devastating as corrupted data.

Fencing is not meant to speed up recovery; it is critical to ensuring
recovery works at all.

This is a common confusion (and people often mistakenly think that
quorum is how you avoid this, which is incorrect). There is no
replacement for fencing; you need it in any availability system. Running
without it is like driving without a seat belt.

https://www.alteeve.com/w/The_2-Node_Myth
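
To make the "known state" point concrete, here is a toy model in Python. It is only an illustration of the reasoning above, not how pacemaker is implemented; the names PeerState, may_recover_resources, handle_lost_peer and the fence_peer callable are all hypothetical.

```python
from enum import Enum


class PeerState(Enum):
    ONLINE = "online"
    UNKNOWN = "unknown"  # stopped responding; may still be running
    FENCED = "fenced"    # confirmed powered off or isolated


def may_recover_resources(peer_state):
    """Take over resources only once the peer is in a known-off state."""
    return peer_state is PeerState.FENCED


def handle_lost_peer(fence_peer):
    """React to a peer that stopped responding.

    `fence_peer` is a hypothetical callable that returns True only when
    the fencing action was confirmed.
    """
    state = PeerState.UNKNOWN
    if fence_peer():
        state = PeerState.FENCED
    return state, may_recover_resources(state)


# Fencing confirmed: recovery may proceed.
print(handle_lost_peer(lambda: True))
# Fencing failed or not configured: the peer's state is only an assumption,
# so recovery stays blocked.
print(handle_lost_peer(lambda: False))
```

In this model the silence of the peer never changes the answer on its own; only a confirmed fence does, which is why fencing is about correctness rather than speed.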

>> Changing EL6 to corosync 2 pushes further into uncharted waters. EL6
>> should be using the cman plugin with corosync 1. May I ask why you
>> don't use EL7 if you want such a recent stack?
> For historical reasons, let's say. I have other software that was built
> for a RHEL 6-like OS and has to be installed on the cluster node.
> The EL 7 stack is already not so recent, but it's one of the most stable
> and least vulnerable, I suppose. And I understand the risks.
> I will update pcs to the latest version when I find a bit of free time.


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Ken Gaillot
On Tue, 2018-02-13 at 13:46 +0300, Maxim wrote:
> 12.02.2018 19:31, Digimer wrote:
> > should be using the cman plugin with corosync 1. May I ask why you
> > don't use EL7 if you want such a recent stack?
> For historical reasons, let's say. I have other software that was built
> for a RHEL 6-like OS and has to be installed on the cluster node.

Compiling a newer corosync/pacemaker is a perfectly good solution in
this situation, but just to give you more options:

You could instead put the app inside a RHEL 6 container, and run it on
RHEL 7 cluster hosts. The advantage of that approach is that the rest
of your usual system services would be on more modern versions. With
bundles (available in the newer pacemaker on RHEL 7), you can use your
existing resource agent to launch the service inside the bundle, so the
cluster can monitor it (as well as monitoring the container itself).

Similarly, you could create a RHEL 6 VM and run it on RHEL 7 cluster
hosts. You can add the remote-node option to the VM resource, to be
able to launch and monitor the app inside it via its resource agent.
-- 
Ken Gaillot 


Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Klaus Wenninger
On 02/13/2018 01:28 PM, Maxim wrote:
> 13.02.2018 14:03, Klaus Wenninger wrote:
>> - fencing helps you turn the 'maybe the node is down - it doesn't
>> respond within x milli-seconds' into certainty that your node is dead
>> and won't interfere with the rest of the cluster
>>
>> Regards, Klaus
>
> It is clear. But will it force pacemaker to perceive that the node is
> down faster?

Let's put that differently. With fencing you can make the loss-detection
more aggressive and thus more prone to false-positives without risking a
split-brain situation. (Actually without fencing you can never be really
sure if the other side is really gone!)
But to be honest if you are really after sub-second detection/switchover
I'm not sure if fencing - at least with the current implementation in
pacemaker and the current selection of fencing-devices - will
give you satisfactory results.

> [Unfortunately, I have no hardware that implements fencing abilities
> nearby and can't try it myself]

If you don't have any of the usual fencing-devices available you might
have some kind of a shared-disk that might be usable with SBD.
For a 2-node-cluster with a single shared-disk (as in your case if I got
it correctly) make sure to pick an SBD-version that has
https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377.
But again I doubt that this will work reliably with sub-second requirements.

>
> [It seems this is my last question on this topic]
>
> Thank you and Ken for the participation!
>
> Regards,
> Maxim

Not saying I'm not interested in experiences/requirements with
pacemaker doing failovers in a sub-second or more relaxed
low-single-digit-second timeframe.
Seeing this working reliably would open up pacemaker for a
completely new class of applications.

Regards,
Klaus



[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Maxim

13.02.2018 14:03, Klaus Wenninger wrote:
> - fencing helps you turn the 'maybe the node is down - it doesn't
> respond within x milli-seconds' into certainty that your node is dead
> and won't interfere with the rest of the cluster
>
> Regards, Klaus

It is clear. But will it force pacemaker to perceive that the node is
down faster?
[Unfortunately, I have no hardware that implements fencing abilities
nearby and can't try it myself]

[It seems this is my last question on this topic]

Thank you and Ken for the participation!

Regards,
Maxim


Re: [ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Klaus Wenninger
On 02/13/2018 11:46 AM, Maxim wrote:
> 12.02.2018 19:31, Digimer wrote:
>> Without fencing, all bets are off. Please enable it and see if the
>> issue remains
> It seems I know [in theory] about fencing and its importance
> (although I've never configured it so far).
> But I don't understand how it would help in the case of a hard
> reboot/shutdown.

Actually in 2 ways:

- you are strongly advised to use fencing - so the base of users running
  with fencing is much larger, and strange/unexpected behavior is much
  more likely with the less-tested setups without fencing
- fencing helps you turn the 'maybe the node is down - it doesn't respond
  within x milli-seconds' into certainty that your node is dead and won't
  interfere with the rest of the cluster

Regards,
Klaus
>
>> Changing EL6 to corosync 2 pushes further into uncharted waters. EL6
>> should be using the cman plugin with corosync 1. May I ask why you
>> don't use EL7 if you want such a recent stack?
> For historical reasons, let's say. I have other software that was built
> for a RHEL 6-like OS and has to be installed on the cluster node.
> The EL 7 stack is already not so recent, but it's one of the most stable
> and least vulnerable, I suppose. And I understand the risks.
> I will update pcs to the latest version when I find a bit of free time.


[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Maxim

12.02.2018 19:31, Digimer wrote:
> Without fencing, all bets are off. Please enable it and see if the
> issue remains

It seems I know [in theory] about fencing and its importance
(although I've never configured it so far).
But I don't understand how it would help in the case of a hard
reboot/shutdown.

> Changing EL6 to corosync 2 pushes further into uncharted waters. EL6
> should be using the cman plugin with corosync 1. May I ask why you
> don't use EL7 if you want such a recent stack?

For historical reasons, let's say. I have other software that was built
for a RHEL 6-like OS and has to be installed on the cluster node.
The EL 7 stack is already not so recent, but it's one of the most stable
and least vulnerable, I suppose. And I understand the risks.

I will update pcs to the latest version when I find a bit of free time.


[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

2018-02-13 Thread Maxim

12.02.2018 18:46, Klaus Wenninger wrote:
> Maybe a few notes on the other way ;-) In general it is not easy to
> have a reliable answer to the question if the other node is down
> within just let's say 100ms. Think of network hiccups, scheduling
> issues and the like ... But if you are willing to accept
> false-positives you can reduce the token timeout of corosync instead
> of having another script that tries to do the job corosync is (amongst
> other things) made for (At least that is how I understood what you
> are aiming to do.).
>
> Regards, Klaus

Thank you again, Klaus.
Your description helps me understand the situation better (I'm a bit
overworked and couldn't think this not-so-trivial thing through by myself =)).

[
I have a scenario in mind where the ability to mark a corosync ring as
failed would be useful, but it doesn't relate to this topic.
Implementing it on the corosync side would require some additional
functionality: "checking" rings (let's call them that) that are used
only for network checking (not for cluster data synchronization). The
breakage of all "checking" rings (or some more elaborate logic) would then
indicate that the node is down or that there is a split brain. Just an idea.
]

There is no such ability? OK, I'll try to deal with it )
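
The trade-off Klaus describes in the quoted note (a shorter token timeout detects a dead node sooner but produces more false positives) can be illustrated with a small Python sketch. This is a deliberately simplified model, not corosync's actual totem protocol: it just treats any gap between messages that exceeds the timeout as "node lost". The gap values and the function name count_false_positives are made up for the example.

```python
def count_false_positives(gaps_ms, token_timeout_ms):
    """Count gaps from a healthy node that would be declared a node loss.

    Any gap longer than the timeout is treated as "node lost". Because all
    gaps here come from a node that was alive, each such case is a false
    positive.
    """
    return sum(1 for gap in gaps_ms if gap > token_timeout_ms)


# Hypothetical gaps (ms) between messages from a healthy node; the
# occasional spike stands for a network hiccup or a scheduling delay.
healthy_gaps_ms = [20, 25, 22, 180, 21, 24, 450, 23, 90, 26]

for timeout_ms in (100, 250, 1000):
    false_positives = count_false_positives(healthy_gaps_ms, timeout_ms)
    print(f"timeout={timeout_ms}ms -> {false_positives} false positive(s)")
```

With these made-up numbers, the 100 ms timeout reacts fastest but wrongly declares the healthy node lost twice, while the 1000 ms timeout never does so at the price of a detection delay of up to about a second.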