Re: [Pacemaker] sbd fencing race

2014-11-26 Thread Dejan Muhamedagic
Hi,

On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
 Hi list,
 
 Last night I had a cluster in a fencing race using sbd as the stonith

Can you give a bit more detail?

 device. I would like to know what the effect of using start-delay on
 my stonith resource would be, configured like this:
 
 primitive stonith-sbd stonith:external/sbd \
 params sbd_device=/dev/mapper/SBD \
 op start interval=0 start-delay=5

Yes, that could help with a stonith deathmatch. Normally, you
have a stonith resource running on one node. On split brain, the
other node also starts the resource in order to shoot the first
node. That's where start-delay comes into play.
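
For illustration, one commonly suggested variant of this (a sketch only -
pcmk_delay_max needs a pacemaker version that supports it, and the values
are placeholders):

primitive stonith-sbd stonith:external/sbd \
    params sbd_device=/dev/mapper/SBD pcmk_delay_max=30 \
    op start interval=0 start-delay=5

start-delay postpones starting the stonith resource on the peer that does
not yet run it, while pcmk_delay_max adds a random delay before the fencing
action itself, so the two nodes are unlikely to shoot each other at the
same instant.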

Ultimate resource for the issue: http://ourobengr.com/ha/

Cheers,

Dejan

 Thanks
 


Re: [Pacemaker] sbd fencing race

2014-11-26 Thread emmanuel segura
But I would like to know whether pacemaker needs to start the sbd resource
on the node where it isn't running in order to fence the other nodes,
because I don't see any start action on the second node:

:::

message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do
NOT have quorum!
message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker
health check: UNHEALTHY
message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice:
LogActions: Leave   stonith-sbd(Started node01)
message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB:
We do NOT have quorum!

:

message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do
NOT have quorum!
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker
health check: UNHEALTHY
message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN:
custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable
(offline)
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery
process handling /dev/mapper/SBD01B0298700230
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing
reset to node slot node01
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 40



Thanks

2014-11-26 10:26 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
 Hi,

 On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
 Hi list,

 The last night, i had a cluster in fencing race using sbd as stonith

 Can you give a bit more details.

 device, i would like to know what is the effect to use start-delay in
 my stonith resource in this way:

 primitive stonith-sbd stonith:external/sbd \
 params sbd_device=/dev/mapper/SBD \
 op start interval=0 start-delay=5

 Yes, that could help with a stonith deathmatch. Normally, you
 have a stonith resource running on one node. On split brain, the
 other node also starts the resource in order to shoot the first
 node. That's where start-delay comes into play.

 Ultimate resource for the issue: http://ourobengr.com/ha/

 Cheers,

 Dejan

 Thanks




-- 
esta es mi vida e me la vivo hasta que dios quiera



[Pacemaker] Problem with ClusterIP

2014-11-26 Thread Anne Nicolas

Hi !

I've been using clusterip for a while now without any problem in
Active/Passive clusters (2 nodes). On my last install, I'm facing quite
an annoying problem. Despite using the same clusterip configuration I've
always used, the interface is now up on both nodes, which ends in an IP
conflict.


I'm looking for ideas to investigate the causes of such a problem. If
anybody can help me with this, I would be grateful.


Cheers
--
Anne
http://mageia.org



Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-11-26 Thread Daniel Dehennin
Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 I'll try find how to make the change directly in XML.

 Ok, looking at git history this feature seems only available on master
 branch and not yet released.

I do not have that feature in my pacemaker version.

Does it sound normal? I have:

- an asymmetrical opt-in cluster[1]

- a group of resources with an INFINITY location on a specific node

And the excluded nodes are fenced because of many monitor errors about
this resource.

Regards.

Footnotes: 
[1]  
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_asymmetrical_opt_in_clusters.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Michael Schwartzkopff
On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:
 Hi !
 
 I've been using clusterip for a while now without any problem in
 Active/Passive clusters (2 nodes).

Could you please explain how you can use ClusterIP in an active/passive
cluster? ClusterIP is meant for use in an active/active cluster. See
man iptables and look for the CLUSTERIP target.

 On my last install, I'm facing quite
 an annoying probem. Despite the same configuration for clusterip I've
 ever used, , the interface is now up on both nodes which ends with an IP
 conflict.

Please explain in more detail.
What is your config?
What do you expect the cluster to do?
What really happens?
Where is the problem?

 I'm looking for ideas to investigate the causes for such a problem. If
 anybody can help me on this, I would be gratefull

Yes, I think I can help ;-)


Kind regards,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein



Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Anne Nicolas

On 26/11/2014 12:23, Michael Schwartzkopff wrote:

On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:

Hi !

I've been using clusterip for a while now without any problem in
Active/Passive clusters (2 nodes).


Could you please explain, how could you use the ClusterIP in an active/passive
cluster? ClusterIP ist for the use in an active/active cluster. See
man iptables and look for the CLUSTERIP target.




Please explain more detailed.
What is your config?
What do you expect the cluster to do?
What really happens?
Where is  the problem?


Maybe my explanation was not that clear. Here is my configuration

crm configure show

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
params configfile=/etc/httpd/conf/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
meta target-role=Started
...
property $id=cib-bootstrap-options \

dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
cluster-infrastructure=corosync \
stonith-enabled=false \
no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
resource-stickiness=100

So I started the primary node (pogcupsvr). The configuration was checked and
was OK. Then I started the second node (pogcupsvr2). This time the whole
configuration looked OK, no errors, but when I checked the network
configuration, eth0 was up on both nodes with the same IP address,
instead of being up only on the primary node.

What I expected (and in all other tests this was the case) is that eth0
would be up only on the primary node and used by the Apache server.



I'm looking for ideas to investigate the causes for such a problem. If
anybody can help me on this, I would be gratefull


Yes, I think I can help ;-)


Thanks for that :)



Mit freundlichen Grüßen,

Michael Schwartzkopff







--
Anne
http://mageia.org



Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-11-26 Thread Vladislav Bogdanov
26.11.2014 14:21, Daniel Dehennin wrote:
 Daniel Dehennin daniel.dehen...@baby-gnu.org writes:
 
 I'll try find how to make the change directly in XML.

 Ok, looking at git history this feature seems only available on master
 branch and not yet released.
 
 I do not have that feature on my pacemaker version.
 
 Does it sounds normal, I have:
 
 - asymmetrical Opt-in cluster[1]
 
 - a group of resources with INFINITY location on a specific node
 
 And the nodes excluded are fenced because of many monitor errors about
 this resource.

Nodes may be fenced because of a resource _only_ if that resource fails to
stop. I can only guess what exactly happens:
* the cluster probes all resources on all nodes (to prevent that you need
the feature mentioned by David)
* some of the resource probes return something other than not running
* the cluster tries to stop those resources
* the stop fails
* the node is fenced

You need to locate exactly which resource returns an error on its probe and
fix that agent (actually you do not use OCF agents but rather upstart jobs
and LSB scripts); a sketch of how to find it is below.
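
A quick sketch of how one might locate the failing probe (assuming the
standard pacemaker CLI tools and syslog in /var/log/messages - adjust to
your distribution):

# show failed actions and fail counts cluster-wide
crm_mon -1 -f
# probes are logged as <resource>_monitor_0; rc=7 (not running) is the
# expected answer, anything else points at the agent/script to fix
grep _monitor_0 /var/log/messages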

The above is for the case where all nodes have the mysql job and both
scripts installed.

If pacemaker decides to fence because one of them is missing, that would be
a bug.

 
 Regards.
 
 Footnotes: 
 [1]  
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_asymmetrical_opt_in_clusters.html
 
 
 


Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Michael Schwartzkopff
On Wednesday, 26 November 2014 at 12:54:20, Anne Nicolas wrote:
 On 26/11/2014 12:23, Michael Schwartzkopff wrote:
  On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:
  Hi !
  
  I've been using clusterip for a while now without any problem in
  Active/Passive clusters (2 nodes).
  
  Could you please explain, how could you use the ClusterIP in an
  active/passive cluster? ClusterIP ist for the use in an active/active
  cluster. See man iptables and look for the CLUSTERIP target.
  
  
  Please explain more detailed.
  What is your config?
  What do you expect the cluster to do?
  What really happens?
  Where is  the problem?
 
 Maybe my explanation was not that clear.

Yes.

 Here is my configuration

 crm configuration show
 
 node $id=17435146 pogcupsvr
 node $id=34212362 pogcupsvr2
 primitive apache ocf:heartbeat:apache \
  params configfile=/etc/httpd/conf/httpd.conf \
  op start interval=0 timeout=40s \
  op stop interval=0 timeout=60s
 primitive clusterip ocf:heartbeat:IPaddr2 \
  params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
  meta target-role=Started
 ...
 property $id=cib-bootstrap-options \
 
 dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
  cluster-infrastructure=corosync \
  stonith-enabled=false \
  no-quorum-policy=ignore
 rsc_defaults $id=rsc-options \
  resource-stickiness=100
 
 So I've started primary node (pogcupsvr). Configuration was checked and
 ok. Then started the second node (pogcupsvr2). This time all the
 configuration looked ok, no error but when I checked the network
 configuration, eth0 was up on both nodes with same IP address of course,
 instead of having it up only on primary node.

If that is your config, then having the IP address started on BOTH nodes is
really bad. This should not happen and is definitely an error.

BUT: I doubt that this is your complete config, because this would not work
anyway. The cluster would start the IP address on one node and the webserver
on the other node.

Please paste the complete config. Then the community will be able to help.

Kind regards,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein



Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Anne Nicolas

On 26/11/2014 13:07, Michael Schwartzkopff wrote:

On Wednesday, 26 November 2014 at 12:54:20, Anne Nicolas wrote:

On 26/11/2014 12:23, Michael Schwartzkopff wrote:

On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:

Hi !

I've been using clusterip for a while now without any problem in
Active/Passive clusters (2 nodes).


Could you please explain, how could you use the ClusterIP in an
active/passive cluster? ClusterIP ist for the use in an active/active
cluster. See man iptables and look for the CLUSTERIP target.


Please explain more detailed.
What is your config?
What do you expect the cluster to do?
What really happens?
Where is  the problem?


Maybe my explanation was not that clear.


Yes.


Here is my configuration

crm configuration show

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
  params configfile=/etc/httpd/conf/httpd.conf \
  op start interval=0 timeout=40s \
  op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
  params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
  meta target-role=Started
...
property $id=cib-bootstrap-options \

dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
  cluster-infrastructure=corosync \
  stonith-enabled=false \
  no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
  resource-stickiness=100

So I've started primary node (pogcupsvr). Configuration was checked and
ok. Then started the second node (pogcupsvr2). This time all the
configuration looked ok, no error but when I checked the network
configuration, eth0 was up on both nodes with same IP address of course,
instead of having it up only on primary node.


If that is your config than the start of the IP address on BOTH nodes is really
bad. This should not happen and is definitely an error.

BUT: I doubt that this is you complete config, because this would not work
anyway. The cluster would start the IP address on one node and the Webserver
in the other node.

Please paste the complete config. Then the community would be able to help.



Here is the complete configuration:

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
params configfile=/etc/httpd/conf/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
meta target-role=Started
primitive drbdserv ocf:linbit:drbd \
params drbd_resource=server \
op monitor interval=60s
primitive fsserv ocf:heartbeat:Filesystem \
params device=/dev/drbd/by-res/server directory=/clusterfs fstype=ext4

primitive libvirt-guests lsb:libvirt-guests
primitive libvirtd lsb:libvirtd
primitive mysql ocf:heartbeat:mysql \
params binary=/usr/bin/mysqld_safe config=/etc/my.cnf datadir=/clusterfs/mysql \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s \
meta target-role=Started
primitive named lsb:named
primitive samba lsb:smb
group services fsserv clusterip libvirtd samba apache mysql
ms drbdservClone drbdserv \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

colocation fs_on_drbd inf: fsserv drbdservClone:Master
order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
property $id=cib-bootstrap-options \
dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
cluster-infrastructure=corosync \
stonith-enabled=false \
no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
resource-stickiness=100


--
Anne
http://mageia.org



Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Keith Ouellette
Anne,

Are you expecting eth0 to actually be put in the down state, as with the
ifconfig eth0 down command? If so, the IPaddr2 resource does not do that.
What it is used for is to configure a secondary IP address on the NIC that
can be moved between the nodes' eth0 interfaces. Can you clarify that? Also,
can you paste the output of the ip addr command as well? The full output of
crm configure show would also be helpful.
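
For reference, a rough sketch of what IPaddr2 does under the hood
(simplified, not the agent's actual code):

# on start, on the node taking over the service address
ip addr add 172.16.16.11/24 dev eth0
# on stop, on the node giving it up
ip addr del 172.16.16.11/24 dev eth0
# check which addresses are currently assigned
ip -4 addr show dev eth0

The distribution-configured base address on eth0 stays up on both nodes the
whole time; only this secondary, cluster-managed address moves.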

Thanks,
Keith



Keith Ouellette

kei...@fibermountain.com
700 West Johnson Avenue
Cheshire, CT06410
www.fibermountain.com
P. (203) 806-4046
C. (860) 810-4877
F. (845) 358-7882

Disclaimer: The information contained in this communication is confidential, 
may be privileged and is intended for the exclusive use of the above named 
addressee(s). If you are not the intended recipient(s), you are expressly 
prohibited from copying, distributing, disseminating, or in any other way using 
any information contained within this communication. If you have received this 
communication in error, please contact the sender by telephone or by response 
via mail. We have taken precautions to minimize the risk of transmitting 
software viruses, but we advise you to carry out your own virus checks on this 
message, as well as any attachments. We cannot accept liability for any loss or 
damage caused by software viruses.

-Original Message-
From: Anne Nicolas [mailto:enna...@gmail.com]
Sent: Wednesday, November 26, 2014 6:54 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Problem with ClusterIP

On 26/11/2014 12:23, Michael Schwartzkopff wrote:
 On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:
 Hi !

 I've been using clusterip for a while now without any problem in
 Active/Passive clusters (2 nodes).

 Could you please explain, how could you use the ClusterIP in a
 active/passive cluster? ClusterIP ist for the use in an active/active
 cluster. See man iptables and look for the CLUSTERIP target.


 Please explain more detailed.
 What is your config?
 What do you expect the cluster to do?
 What really happens?
 Where is  the problem?

Maybe my explanation was not that clear. Here is my configuration

crm configuration show

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
 params configfile=/etc/httpd/conf/httpd.conf \
 op start interval=0 timeout=40s \
 op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
 params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
 meta target-role=Started
...
property $id=cib-bootstrap-options \

dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
 cluster-infrastructure=corosync \
 stonith-enabled=false \
 no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
 resource-stickiness=100

So I've started primary node (pogcupsvr). Configuration was checked and ok. 
Then started the second node (pogcupsvr2). This time all the configuration 
looked ok, no error but when I checked the network configuration, eth0 was up 
on both nodes with same IP address of course, instead of having it up only on 
primary node.

What I was expected (and in all other tests it was ok ) is that eth0 was up 
only on primary node and used by apache server.

 I'm looking for ideas to investigate the causes for such a problem.
 If anybody can help me on this, I would be gratefull

 Yes, I think I can help ;-)

Thanks for that :)


 Mit freundlichen Grüßen,

 Michael Schwartzkopff






--
Anne
http://mageia.org



Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Anne Nicolas
2014-11-26 13:22 GMT+01:00 Keith Ouellette kei...@fibermountain.com:

 Anne,

 Are you expecting the eth0 to actually put in the down state like using
 the ifconfig eth0 down command? If so the IPaddr2 resource does not do
 that. What that is used for is to configure a second IP address on the NIC
 that can be moved around from eth0 on each node. Can you clearify that?
 Also, can you paste the output of the ip addr command as well? The full
 configuration of crm configure show would also be helpful


Maybe I've misunderstood the documentation, but I just reused
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/#_perform_a_failover
It seemed to work as expected at the beginning.

I sent my full configuration a few minutes ago.


 Thanks,
 Keith




 -Original Message-
 From: Anne Nicolas [mailto:enna...@gmail.com]
 Sent: Wednesday, November 26, 2014 6:54 AM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] Problem with ClusterIP

 On 26/11/2014 12:23, Michael Schwartzkopff wrote:
  On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:
  Hi !
 
  I've been using clusterip for a while now without any problem in
  Active/Passive clusters (2 nodes).
 
  Could you please explain, how could you use the ClusterIP in a
  active/passive cluster? ClusterIP ist for the use in an active/active
  cluster. See man iptables and look for the CLUSTERIP target.

 
  Please explain more detailed.
  What is your config?
  What do you expect the cluster to do?
  What really happens?
  Where is  the problem?

 Maybe my explanation was not that clear. Here is my configuration

 crm configuration show

 node $id=17435146 pogcupsvr
 node $id=34212362 pogcupsvr2
 primitive apache ocf:heartbeat:apache \
  params configfile=/etc/httpd/conf/httpd.conf \
  op start interval=0 timeout=40s \
  op stop interval=0 timeout=60s
 primitive clusterip ocf:heartbeat:IPaddr2 \
  params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
  meta target-role=Started
 ...
 property $id=cib-bootstrap-options \

 dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
  cluster-infrastructure=corosync \
  stonith-enabled=false \
  no-quorum-policy=ignore
 rsc_defaults $id=rsc-options \
  resource-stickiness=100

 So I've started primary node (pogcupsvr). Configuration was checked and
 ok. Then started the second node (pogcupsvr2). This time all the
 configuration looked ok, no error but when I checked the network
 configuration, eth0 was up on both nodes with same IP address of course,
 instead of having it up only on primary node.

 What I was expected (and in all other tests it was ok ) is that eth0 was
 up only on primary node and used by apache server.
 
  I'm looking for ideas to investigate the causes for such a problem.
  If anybody can help me on this, I would be gratefull
 
  Yes, I think I can help ;-)

 Thanks for that :)
 
 
  Mit freundlichen Grüßen,
 
  Michael Schwartzkopff
 
 
 
 


 --
 Anne
 http://mageia.org





-- 
Anne
http://www.mageia.org

Re: [Pacemaker] sbd fencing race

2014-11-26 Thread Dejan Muhamedagic
On Wed, Nov 26, 2014 at 11:13:41AM +0100, emmanuel segura wrote:
 But i would like to know if pacemaker needs to start sbd on the node
 where sbd resource isnt running to fence the other nodes, because i
 don't see any start action in the second node:

That's strange. I'd expect that a stonith resource needs to be
started (enabled) first. Perhaps that changed, as it seems to be
the case judging by the logs below. I cannot offer any more
advice here, but would still like to know the circumstances and
how it happened that the nodes shot each other.
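
One quick way to check this (just a sketch - stonith_admin ships with
pacemaker) is to ask stonithd on each node which fencing devices it has
registered and which of them can fence a given target:

# fencing devices registered on the local node
stonith_admin -L
# fencing devices that can fence node01
stonith_admin -l node01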

Thanks,

Dejan


 :::
 
 message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do
 NOT have quorum!
 message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker
 health check: UNHEALTHY
 message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice:
 LogActions: Leave   stonith-sbd(Started node01)
 message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB:
 We do NOT have quorum!
 
 :
 
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do
 NOT have quorum!
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker
 health check: UNHEALTHY
 message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN:
 custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable
 (offline)
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery
 process handling /dev/mapper/SBD01B0298700230
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing
 reset to node slot node01
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 
 40
 
 
 
 Thanks
 
 2014-11-26 10:26 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
  Hi,
 
  On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
  Hi list,
 
  The last night, i had a cluster in fencing race using sbd as stonith
 
  Can you give a bit more details.
 
  device, i would like to know what is the effect to use start-delay in
  my stonith resource in this way:
 
  primitive stonith-sbd stonith:external/sbd \
  params sbd_device=/dev/mapper/SBD \
  op start interval=0 start-delay=5
 
  Yes, that could help with a stonith deathmatch. Normally, you
  have a stonith resource running on one node. On split brain, the
  other node also starts the resource in order to shoot the first
  node. That's where start-delay comes into play.
 
  Ultimate resource for the issue: http://ourobengr.com/ha/
 
  Cheers,
 
  Dejan
 
  Thanks
 
 
 
 
 -- 
 esta es mi vida e me la vivo hasta que dios quiera
 


Re: [Pacemaker] Problem with ClusterIP

2014-11-26 Thread Anne Nicolas

On 26/11/2014 13:43, Michael Schwartzkopff wrote:

On Wednesday, 26 November 2014 at 13:22:53, you wrote:

On 26/11/2014 13:07, Michael Schwartzkopff wrote:

On Wednesday, 26 November 2014 at 12:54:20, Anne Nicolas wrote:

On 26/11/2014 12:23, Michael Schwartzkopff wrote:

On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:

Hi !

I've been using clusterip for a while now without any problem in
Active/Passive clusters (2 nodes).


Could you please explain, how could you use the ClusterIP in an
active/passive cluster? ClusterIP ist for the use in an active/active
cluster. See man iptables and look for the CLUSTERIP target.


Please explain more detailed.
What is your config?
What do you expect the cluster to do?
What really happens?
Where is  the problem?


Maybe my explanation was not that clear.


Yes.


Here is my configuration

crm configuration show

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \

   params configfile=/etc/httpd/conf/httpd.conf \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s

primitive clusterip ocf:heartbeat:IPaddr2 \

   params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
   meta target-role=Started

...
property $id=cib-bootstrap-options \

dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \

   cluster-infrastructure=corosync \
   stonith-enabled=false \
   no-quorum-policy=ignore

rsc_defaults $id=rsc-options \

   resource-stickiness=100

So I've started primary node (pogcupsvr). Configuration was checked and
ok. Then started the second node (pogcupsvr2). This time all the
configuration looked ok, no error but when I checked the network
configuration, eth0 was up on both nodes with same IP address of course,
instead of having it up only on primary node.


If that is your config than the start of the IP address on BOTH nodes is
really bad. This should not happen and is definitely an error.

BUT: I doubt that this is you complete config, because this would not work
anyway. The cluster would start the IP address on one node and the
Webserver in the other node.

Please paste the complete config. Then the community would be able to
help.


Here is the complete configuration:

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
  params configfile=/etc/httpd/conf/httpd.conf \
  op start interval=0 timeout=40s \
  op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
  params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
  meta target-role=Started
primitive drbdserv ocf:linbit:drbd \
  params drbd_resource=server \
  op monitor interval=60s
primitive fsserv ocf:heartbeat:Filesystem \
  params device=/dev/drbd/by-res/server directory=/clusterfs
fstype=ext4
primitive libvirt-guests lsb:libvirt-guests
primitive libvirtd lsb:libvirtd
primitive mysql ocf:heartbeat:mysql \
  params binary=/usr/bin/mysqld_safe config=/etc/my.cnf
datadir=/clusterfs/mysql \
  op start interval=0 timeout=40s \
  op stop interval=0 timeout=60s \
  meta target-role=Started
primitive named lsb:named
primitive samba lsb:smb
group services fsserv clusterip libvirtd samba apache mysql
ms drbdservClone drbdserv \
  meta master-max=1 master-node-max=1 clone-max=2
clone-node-max=1 notify=true
colocation fs_on_drbd inf: fsserv drbdservClone:Master
order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
property $id=cib-bootstrap-options \

dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
  cluster-infrastructure=corosync \
  stonith-enabled=false \
  no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
  resource-stickiness=100


OK. The config seems to be OK. But I would attach the constraints to the
group, not to fsserv. Since fsserv is the first resource in the group,
though, everything should be OK.

Now:

This time all the
configuration looked ok, no error but when I checked the network
configuration, eth0 was up on both nodes with same IP address of course,
instead of having it up only on primary node.


Please could you paste the output of the command ip addr list dev eth0 on
both nodes?


In fact I read your message and it just turned on a light in some part
of my brain... I checked the interface configuration and discovered a
static IP address configured on both sides...
And for sure that could not work. Removing it did the trick and
everything is OK now.
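
For anyone hitting the same thing, a quick check (assuming the VIP is
172.16.16.11 on eth0 as in the config above):

# list all IPv4 addresses on eth0, on both nodes
ip -4 addr show dev eth0
# the VIP should only appear on the node where the cluster started clusterip;
# if it is also configured statically in the distribution's network scripts,
# remove it there, and to drop it at runtime:
ip addr del 172.16.16.11/24 dev eth0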


Sorry for the noise, but thanks for helping me find this stupid mistake :)



Mit freundlichen Grüßen,

Michael Schwartzkopff




--
Anne
http://mageia.org


Re: [Pacemaker] sbd fencing race

2014-11-26 Thread emmanuel segura
I think pacemaker doesn't care about the sbd resource status when it
needs to make a fencing call - that is what I think, but I hope someone
will give me some more information.

Thanks


2014-11-26 15:11 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
 On Wed, Nov 26, 2014 at 11:13:41AM +0100, emmanuel segura wrote:
 But i would like to know if pacemaker needs to start sbd on the node
 where sbd resource isnt running to fence the other nodes, because i
 don't see any start action in the second node:

 That's strange. I'd expect that a stonith resource needs to be
 started (enabled) first. Perhaps that changed, as it seems to be
 the case judging by the logs below. I cannot offer any more
 advice here, but would still like to know the circumstances and
 how it happened that the nodes shot each other.

 Thanks,

 Dejan


 :::

 message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do
 NOT have quorum!
 message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker
 health check: UNHEALTHY
 message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice:
 LogActions: Leave   stonith-sbd(Started node01)
 message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB:
 We do NOT have quorum!

 :

 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do
 NOT have quorum!
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker
 health check: UNHEALTHY
 message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN:
 custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable
 (offline)
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery
 process handling /dev/mapper/SBD01B0298700230
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing
 reset to node slot node01
 message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 
 40

 

 Thanks

 2014-11-26 10:26 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
  Hi,
 
  On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
  Hi list,
 
  The last night, i had a cluster in fencing race using sbd as stonith
 
  Can you give a bit more details.
 
  device, i would like to know what is the effect to use start-delay in
  my stonith resource in this way:
 
  primitive stonith-sbd stonith:external/sbd \
  params sbd_device=/dev/mapper/SBD \
  op start interval=0 start-delay=5
 
  Yes, that could help with a stonith deathmatch. Normally, you
  have a stonith resource running on one node. On split brain, the
  other node also starts the resource in order to shoot the first
  node. That's where start-delay comes into play.
 
  Ultimate resource for the issue: http://ourobengr.com/ha/
 
  Cheers,
 
  Dejan
 
  Thanks
 



 --
 esta es mi vida e me la vivo hasta que dios quiera




-- 
esta es mi vida e me la vivo hasta que dios quiera



Re: [Pacemaker] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Lars Marowsky-Bree
On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote:

Okay, okay, apparently we have got enough topics to discuss. I'll
grumble a bit more about Brno, but let's get the organisation of that
thing on track ... Sigh. Always so much work!

I'm assuming arrival on the 3rd and departure on the 6th would be the
plan?

  Personally I'm interested in talking about scaling - with pacemaker-remoted
  and/or a new messaging/membership layer.
 If we're going to talk about scaling, we should throw in our new docker 
 support
 in the same discussion. Docker lends itself well to the pet vs cattle 
 analogy.
 I see management of docker with pacemaker making quite a bit of sense now 
 that we
 have the ability to scale into the cattle territory.

While we're on that, I'd like to throw in a heretic thought and suggest
that one might want to look at etcd and fleetd.

  Other design-y topics:
  - SBD

Point taken. I have actually not forgotten this Andrew, and am reading
your development. I probably just need to pull the code over ...

  - degraded mode
  - improved notifications
  - containerisation of services (cgroups, docker, virt)
  - resource-agents (upstream releases, handling of pull requests, testing)
 
 Yep, We definitely need to talk about the resource-agents.

Agreed.

  User-facing topics could include recent features (ie. pacemaker-remoted,
  crm_resource --restart) and common deployment scenarios (eg. NFS) that
  people get wrong.
 Adding to the list, it would be a good idea to talk about Deployment
 integration testing, what's going on with the phd project and why it's
 important regardless if you're interested in what the project functionally
 does.

OK. So QA is within scope as well. It seems the agenda will fill up
quite nicely.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 
(AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde




Re: [Pacemaker] [Cluster-devel] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Fabio M. Di Nitto


On 11/26/2014 4:41 PM, Lars Marowsky-Bree wrote:
 On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote:
 
 Okay, okay, apparently we have got enough topics to discuss. I'll
 grumble a bit more about Brno, but let's get the organisation of that
 thing on track ... Sigh. Always so much work!
 
 I'm assuming arrival on the 3rd and departure on the 6th would be the
 plan?

Yes, that's correct. DevConf starts on the 6th.

Fabio

 
 Personally I'm interested in talking about scaling - with pacemaker-remoted
 and/or a new messaging/membership layer.
 If we're going to talk about scaling, we should throw in our new docker 
 support
 in the same discussion. Docker lends itself well to the pet vs cattle 
 analogy.
 I see management of docker with pacemaker making quite a bit of sense now 
 that we
 have the ability to scale into the cattle territory.
 
 While we're on that, I'd like to throw in a heretic thought and suggest
 that one might want to look at etcd and fleetd.
 
 Other design-y topics:
 - SBD
 
 Point taken. I have actually not forgotten this Andrew, and am reading
 your development. I probably just need to pull the code over ...
 
 - degraded mode
 - improved notifications
 - containerisation of services (cgroups, docker, virt)
 - resource-agents (upstream releases, handling of pull requests, testing)

 Yep, We definitely need to talk about the resource-agents.
 
 Agreed.
 
 User-facing topics could include recent features (ie. pacemaker-remoted,
 crm_resource --restart) and common deployment scenarios (eg. NFS) that
 people get wrong.
 Adding to the list, it would be a good idea to talk about Deployment
 integration testing, what's going on with the phd project and why it's
 important regardless if you're interested in what the project functionally
 does.
 
 OK. So QA is within scope as well. It seems the agenda will fill up
 quite nicely.
 
 
 Regards,
 Lars
 



Re: [Pacemaker] Fencing of bare-metal remote nodes

2014-11-26 Thread Vladislav Bogdanov
26.11.2014 18:36, David Vossel wrote:
 
 
 - Original Message -
 25.11.2014 23:41, David Vossel wrote:


 - Original Message -
 Hi!

 is subj implemented?

 Trying echo c  /proc/sysrq-trigger on remote nodes and no fencing occurs.

 Yes, fencing remote-nodes works. Are you certain your fencing devices can
 handle
 fencing the remote-node? Fencing a remote-node requires a cluster node to
 invoke the agent that actually performs the fencing action on the
 remote-node.

 Yes, if I invoke fencing action manually ('crm node fence rnode' in
 crmsh syntax), node is fenced. So the issue seems to be related to the
 detection of a need fencing.

 Comments in related git commits are a little bit terse in this area. So
 could you please explain what exactly needs to happen on a remote node
 to initiate fencing?

 I tried so far:
 * kill pacemaker_remoted when no resources are running. systemd restarted
 it and crmd reconnected after some time.
 * crash kernel when no resources are running
 * crash kernel during massive start of resources
 
 this last one should definitely cause fencing. What version of pacemaker are
 you using? I've made changes in this area recently. Can you provide a 
 crm_report.

It's c191bf3.
The crm_report is ready, but I'm still waiting for approval from the
customer to send it.


 
 -- David
 

 No fencing happened. In the last case the start actions 'hung' and were
 failed by timeout (which is rather long), and the node was not even listed
 as failed. My customer asked me to stop crashing nodes because one of them
 does not boot anymore (I like that modern UEFI hardware very much.),
 so it is hard for me to play with that any further.

 Best,
 Vladislav



 -- Vossel


 Best,
 Vladislav



Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Vladislav Bogdanov
25.11.2014 12:54, Lars Marowsky-Bree wrote:...

 OK, let's switch tracks a bit. What *topics* do we actually have? Can we
 fill two days? Where would we want to collect them?


Just my 2c.

- It would be interesting to get some bird's-eye-view information
on what C APIs corosync and pacemaker currently provide to application
developers (one immediate use case is in-app monitoring of cluster
events).

- One more (more developer-oriented) topic could be support for a resource
degraded state. From the user perspective it would be nice to have. One
immediate example is an iSCSI connection to several portals: when some
portals are not accessible, the connection may still work, but in a
degraded state.

Best,
Vladislav




Re: [Pacemaker] Fencing of bare-metal remote nodes

2014-11-26 Thread David Vossel


- Original Message -
 26.11.2014 18:36, David Vossel wrote:
  
  
  - Original Message -
  25.11.2014 23:41, David Vossel wrote:
 
 
  - Original Message -
  Hi!
 
  is subj implemented?
 
  Trying echo c  /proc/sysrq-trigger on remote nodes and no fencing
  occurs.
 
  Yes, fencing remote-nodes works. Are you certain your fencing devices can
  handle
  fencing the remote-node? Fencing a remote-node requires a cluster node to
  invoke the agent that actually performs the fencing action on the
  remote-node.
 
  Yes, if I invoke fencing action manually ('crm node fence rnode' in
  crmsh syntax), node is fenced. So the issue seems to be related to the
  detection of a need fencing.
 
  Comments in related git commits are a little bit terse in this area. So
  could you please explain what exactly needs to happen on a remote node
  to initiate fencing?
 
  I tried so far:
  * kill pacemaker_remoted when no resources are running. systemd restarted
  it and crmd reconnected after some time.

This should definitely cause the remote-node to be fenced. I tested this
earlier today after reading you were having problems and my setup fenced
the remote-node correctly.

  * crash kernel when no resources are running

If a remote-node connection is lost and pacemaker was able to verify the
node is clean before the connection is lost, pacemaker will attempt to
reconnect to the remote-node without issuing a fencing request.

I could see why both fencing and not fencing could be desired in this
situation. Maybe I should make it an option.

  * crash kernel during massive start of resources

This should definitely cause the remote node to be fenced.
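
For reference, a minimal sketch of the kind of setup being discussed here
(node name, address and IPMI credentials are hypothetical placeholders, and
fence_ipmilan is just one example agent):

primitive rnode01 ocf:pacemaker:remote \
    params server=192.168.100.50 \
    op monitor interval=30s
primitive fence-rnode01 stonith:fence_ipmilan \
    params pcmk_host_list=rnode01 ipaddr=192.168.100.150 \
        login=admin passwd=secret lanplus=1

The point is that the stonith device is registered on, and executed from, a
full cluster node, which then resets the remote node on the cluster's behalf.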

  
  this last one should definitely cause fencing. What version of pacemaker
  are
  you using? I've made changes in this area recently. Can you provide a
  crm_report.
 
 It's c191bf3.
 crm_report is ready, but I still wait an approval from a customer to
 send it.

Great. I really need to see what you all are doing. Outside of my own setup I
have not seen many setups where pacemaker_remote is deployed on bare-metal
nodes. It is possible something in your configuration exposes an edge case I
haven't encountered yet.

There's a US holiday Thursday and Friday, so I won't be able to look at this
until next week.

-- Vossel

 
  
  -- David
  
 
  No fencing happened. In the last case that start actions 'hung' and were
  failed by timeout (it is rather long), node was not even listed as
  failed. My customer asked me to stop crashing nodes because one of them
  does not boot anymore (I like that modern UEFI hardware very much.),
  so it is hard for me to play more with that.
 
  Best,
  Vladislav
 
 
 
  -- Vossel
 
 
  Best,
  Vladislav
 


Re: [Pacemaker] [ha-wg-technical] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Andrew Beekhof

 On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree l...@suse.com wrote:
 
 On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote:
 
 Okay, okay, apparently we have got enough topics to discuss. I'll
 grumble a bit more about Brno, but let's get the organisation of that
 thing on track ... Sigh. Always so much work!
 
 I'm assuming arrival on the 3rd and departure on the 6th would be the
 plan?
 
 Personally I'm interested in talking about scaling - with pacemaker-remoted
 and/or a new messaging/membership layer.
 If we're going to talk about scaling, we should throw in our new docker 
 support
 in the same discussion. Docker lends itself well to the pet vs cattle 
 analogy.
 I see management of docker with pacemaker making quite a bit of sense now 
 that we
 have the ability to scale into the cattle territory.
 
 While we're on that, I'd like to throw in a heretic thought and suggest
 that one might want to look at etcd and fleetd.

Nod. I suspect the next evolutionary step will be to sit on a NoSQL/Big-data 
kind of table somehow.
I was intending to head down that path last year when I did all that cib work.

 
 Other design-y topics:
 - SBD
 
 Point taken. I have actually not forgotten this Andrew, and am reading
 your development. I probably just need to pull the code over ...

ok

 
 - degraded mode
 - improved notifications
 - containerisation of services (cgroups, docker, virt)
 - resource-agents (upstream releases, handling of pull requests, testing)
 
 Yep, We definitely need to talk about the resource-agents.
 
 Agreed.
 
 User-facing topics could include recent features (ie. pacemaker-remoted,
 crm_resource --restart) and common deployment scenarios (eg. NFS) that
 people get wrong.
 Adding to the list, it would be a good idea to talk about Deployment
 integration testing, what's going on with the phd project and why it's
 important regardless if you're interested in what the project functionally
 does.
 
 OK. So QA is within scope as well. It seems the agenda will fill up
 quite nicely.
 
 
 Regards,
Lars
 
 -- 
 Architect Storage/HA
 SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 
 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 


Re: [Pacemaker] Suicide fencing and watchdog questions

2014-11-26 Thread Andrew Beekhof

 On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 Hi,
 
 Is there any information on how watchdog integration is intended to work?
 What are the currently-evaluated use cases for it?
 It seems to be forcibly disabled if SBD is not detected...

Are you referring to no-quorum-policy=suicide?

 
 Also, is there any way to make a node (in a one-node cluster ;) ) suicide
 if it detects that fencing is required? Technically, that can be done with
 the IPMI 'power cycle' or 'power reset' commands - but the node (and thus
 the whole cluster) will not know whether the fencing succeeded, because if
 it received the answer, then fencing failed. But the node will be hard
 rebooted and thus cleaned up in any case.
 
 Best,
 Vladislav
 


Re: [Pacemaker] Suicide fencing and watchdog questions

2014-11-26 Thread Vladislav Bogdanov
27.11.2014 03:43, Andrew Beekhof wrote:
 
 On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 Hi,

 Is there any information how watchdog integration is intended to work?
 What are currently-evaluated use-cases for that?
 It seems to be forcibly disabled id SBD is not detected...
 
 Are you referring to no-quorum-policy=suicide?

That too.

But my main intention was to understand what value that feature can bring
at all.
I tried to enable it without SBD or no-quorum-policy=suicide and the
watchdog was not fired. Then I looked at the sources and realized that it
is enabled only when SBD is detected, and is not actually controlled by the
cluster option.
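
For context, a minimal sketch of how sbd is usually wired to the watchdog
(variable names as in the SUSE sbd packaging of that time - adjust to your
distribution):

# /etc/sysconfig/sbd
SBD_DEVICE="/dev/<your-shared-sbd-partition>"
SBD_OPTS="-W"    # -W makes sbd use the system watchdog

With that in place it is the sbd daemon, not pacemaker itself, that owns and
tickles the watchdog, which matches what the sources show.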

 

 Also, is there any way to make node (in one-node cluster ;) ) to suicide
 if it detects fencing is required? Technically, that can be done with
 IPMI 'power cycle' or 'power reset' commands - but node (and thus the
 whole cluster) will not know about fencing is succeeded, because if it
 received the answer, then fencing failed. But node will be hard reboot
 and thus cleaned up otherwise.

 Best,
 Vladislav
