[ClusterLabs] Antw: Re: Antw: Delayed first monitoring

2015-08-13 Thread Ulrich Windl
 Miloš Kozák milos.ko...@lejmr.com wrote on 13.08.2015 at 09:56 in message
 55cc4daa.4020...@lejmr.com:

 On 13.8.2015 at 09:26, Andrei Borzenkov wrote:
 On Thu, Aug 13, 2015 at 10:01 AM, Miloš Kozák milos.ko...@lejmr.com wrote:
 However, this does not make sense at all. Presumably, Pacemaker should get
 along with LSB scripts which come from the system repository, right?

 Let's forget about pacemaker for a moment. You have system startup
 where service B needs service A. initscript for service A completes
 and script for service B is started but service A is not yet ready to
 be used.

 This is a bug in the startup script, irrespective of whether you use it
 with pacemaker or not.
 
 I am sorry, but I didn't get the point.
 
 If service A is not ready then service B should not be started.

As you seem to be ignoring advice:
Yes, you are right: Service B should check whether service A is up before
starting itself.
The easy change for the start script of B is to find out which command was run
before it and to verify for itself that this command completed successfully.

[...]


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
 And what exactly is your problem?

Real life example. Database resource depends on storage resource(s).
There are multiple filesystems/volumes with database files. Database
admin needs to increase available space. You add new storage,
configure it in the cluster ... poof, your database is restarted. There is
zero need to restart the database because it does not even use the new
resource yet.

I do the above routinely with another cluster implementation without any
visible impact.

 If you change a resource, it will be restarted, and if a resource is
 restarted, constraints will be followed...

 Despite that: If I understand your configuration correctly, it's very much
 the same as

 resource_group
   ip1
   ip2
   apache1

 Regards,
 Ulrich

 John Gogu ionut.g...@gmail.com wrote on 12.08.2015 at 18:35 in message
 CAMESV9DUj3owj16oT5DSYjxZWeZX1f5wV63=muyta3vv0kk...@mail.gmail.com:
 Hello,
 in my cluster configuration I have the following situation:

 resource_group_A
   ip1
   ip2
 resource_group_B
   apache1

 ordering constraint resource_group_A then resource_group_B symmetrical=true

 When I add a new resource to group_A, the resources from group_B are
 restarted. If I remove the constraint all is OK, but I need to keep this
 ordering constraint.


 John









[ClusterLabs] Antw: Re: Antw: Ordering constraint restart second resource group

2015-08-13 Thread Ulrich Windl
 Andrei Borzenkov arvidj...@gmail.com wrote on 13.08.2015 at 11:33 in message
 CAA91j0WaYcPPNCtMZnwDz4_QDFWgxPrO6DbB=ga3bv+_ooo...@mail.gmail.com:
 On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl
 ulrich.wi...@rz.uni-regensburg.de wrote:
 And what exactly is your problem?
 
 Real life example. Database resource depends on storage resource(s).
 There are multiple filesystems/volumes with database files. Database
 admin needs to increase available space. You add new storage,
 configure it in cluster ... pooh, your database is restarted. There is
 zero need to restart database because it does not even use new
 resource yet.
 
 I do the above routinely with other cluster implementation without any
 visible impact.

Hi!

So maybe that's why planning usually is done before implementation. We had
similar problems with our NFS exports, and we redesigned them to speed up the
resource start and stop times, as well as to allow changes without restarting
the NFS server and, most importantly, the local (dependent) client resources...

What you could do: Add your new IP resource with a location constraint only (it 
will be started on the right node then). Then put the IP resource into the 
group, and the cluster will see that every resource is in the desired state, 
and nothing will be restarted.
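
The sequence described above could be sketched with crmsh roughly as follows. This is only a sketch under assumptions: the resource name ip3, the address 192.0.2.13 and the node name node-0 are made up, removing the temporary constraint at the end is an optional extra step not mentioned above, and "modgroup" may not exist in older crmsh releases (in that case the group can be changed with "crm configure edit"). The helper prints the commands instead of running them unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: print each command; execute only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# add_ip_to_group RESOURCE IP NODE GROUP  (all names are hypothetical)
add_ip_to_group() {
    res=$1 ip=$2 node=$3 group=$4
    # 1. Create the new IP as a standalone resource.
    run crm configure primitive "$res" ocf:heartbeat:IPaddr2 params "ip=$ip"
    # 2. Pin it to the node where the group is currently running.
    run crm configure location "loc-$res" "$res" inf: "$node"
    # 3. Once it is started there, move it into the group.
    run crm configure modgroup "$group" add "$res"
    # 4. Optional: drop the temporary location constraint again.
    run crm configure delete "loc-$res"
}

add_ip_to_group ip3 192.0.2.13 node-0 resource_group_A
```

In an interactive "crm configure" session, steps 1 and 2 can be entered together and committed at once, so the new resource never starts on the wrong node. Whether the dependent group really stays untouched should be verified on a test cluster first.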

Regards,
Ulrich

 


[ClusterLabs] stonithd: stonith_choose_peer: Couldn't find anyone to fence node with any

2015-08-13 Thread Kostiantyn Ponomarenko
Hi,

Brief description of the STONITH problem:

I see two different behaviors with two different STONITH configurations. If
Pacemaker cannot find a device that can STONITH a problematic node, the
node remains up and running, which is bad, because it must be STONITHed.
In contrast, if Pacemaker finds a device that it thinks can STONITH
a problematic node, even if the device actually cannot, Pacemaker goes down
after STONITH returns a false positive: Pacemaker shuts itself down right
after STONITH.
Is this the expected behavior?
Do I need to configure two more STONITH agents just for rebooting the nodes
on which they are running (e.g. with # reboot -f)?



+-
+ Set-up:
+-
- two node cluster (node-0 and node-1);
- two fencing (STONITH) agents are configured (STONITH_node-0 and
STONITH_node-1).
- STONITH_node-0 runs only on node-1 // this fencing agent can only
fence node-0
- STONITH_node-1 runs only on node-0 // this fencing agent can only
fence node-1

+-
+ Environment:
+-
- one node - node-0 - is up and running;
- one STONITH agent - STONITH_node-1 - is up and running

+-
+ Test case:
+-
Simulate error of stopping a resource.
1. start cluster
2. change a RA's script to return $OCF_ERR_GENERIC from Stop function.
3. stop the resource by # crm resource stop resource

+-
+ Actual behavior:
+-

CASE 1:
STONITH is configured with:
# crm configure primitive STONITH_node-1 stonith:fence_sbb_hw \
params pcmk_host_list=node-1 pcmk_host_check=static-list

After issuing a stop command:
- the resource changes its state to FAILED
- Pacemaker remains working

See below LOG_snippet_1 section.


CASE 2:
STONITH is configured with:
# crm configure primitive STONITH_node-1 stonith:fence_sbb_hw

After issuing a stop command:
- the resource changes its state to FAILED
- Pacemaker stops working

See below LOG_snippet_2 section.


+-
+ LOG_snippet_1:
+-
Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd:   notice: handle_request:
Client crmd.39210.fa40430f wants to fence (reboot) 'node-0' with device
'(any)'
Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd:   notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
node-0: 18cc29db-b7e4-4994-85f1-df891f091a0d (0)

Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd:   notice:
can_fence_host_with_device: STONITH_node-1 can not fence (reboot)
node-0: static-list

Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd:   notice:
stonith_choose_peer:Couldn't find anyone to fence node-0 with any
Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd: info:
call_remote_stonith:Total remote op timeout set to 60 for fencing of
node node-0 for crmd.39210.18cc29db
Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd: info:
call_remote_stonith:None of the 1 peers have devices capable of
terminating node-0 for crmd.39210 (0)

Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd:  warning:
get_xpath_object:   No match for //@st_delegate in /st-reply
Aug 12 16:42:47 [39206] A6-4U24-402-T   stonithd:error: remote_op_done:
Operation reboot of node-0 by node-0 for crmd.39210@node-0.18cc29db: No
such device

Aug 12 16:42:47 [39210] A6-4U24-402-T   crmd:   notice:
tengine_stonith_callback:   Stonith operation
3/23:16:0:0856a484-6b69-4280-b93f-1af9a6a542ee: No such device (-19)
Aug 12 16:42:47 [39210] A6-4U24-402-T   crmd:   notice:
tengine_stonith_callback:   Stonith operation 3 for node-0 failed (No such
device): aborting transition.
Aug 12 16:42:47 [39210] A6-4U24-402-T   crmd: info:
abort_transition_graph: Transition aborted: Stonith failed
(source=tengine_stonith_callback:697, 0)
Aug 12 16:42:47 [39210] A6-4U24-402-T   crmd:   notice:
tengine_stonith_notify: Peer node-0 was not terminated (reboot) by
node-0 for node-0: No such device


+-
+ LOG_snippet_2:
+-
Aug 11 16:09:42 [9005] A6-4U24-402-T   stonithd:   notice: handle_request:
 Client crmd.9009.cabd2154 wants to fence (reboot) 'node-0' with device
'(any)'
Aug 11 16:09:42 [9005] A6-4U24-402-T   stonithd:   notice:
initiate_remote_stonith_op:  Initiating remote operation reboot for node-0:
3b06d3ce-b100-46d7-874e-96f10348d9e4 (0)

Aug 11 16:09:42 [9005] A6-4U24-402-T   stonithd:   notice:
can_fence_host_with_device:  STONITH_node-1 can fence (reboot) node-0: none

Aug 11 16:09:42 [9005] A6-4U24-402-T   stonithd: info:
call_remote_stonith: Total remote op timeout set to 60 for fencing of
node node-0 for crmd.9009.3b06d3ce
Aug 11 16:09:42 [9005] A6-4U24-402-T   stonithd: info:
call_remote_stonith: Requesting that node-0 perform op reboot node-0
for crmd.9009 (72s)

Aug 11 16:09:42 [9005] A6-4U24-402-T   stonithd:   

Re: [ClusterLabs] Antw: Re: Antw: Delayed first monitoring

2015-08-13 Thread Jan Pokorný
On 13/08/15 10:38 +0200, Ulrich Windl wrote:
 Miloš Kozák milos.ko...@lejmr.com wrote on 13.08.2015 at 09:56 in
 message 55cc4daa.4020...@lejmr.com:
 
 On 13.8.2015 at 09:26, Andrei Borzenkov wrote:
 On Thu, Aug 13, 2015 at 10:01 AM, Miloš Kozák milos.ko...@lejmr.com
 wrote:
 However, this does not make sense at all. Presumably, Pacemaker
 should get along with LSB scripts which come from the
 system repository, right?
 
 Let's forget about pacemaker for a moment. You have system startup
 where service B needs service A. initscript for service A completes
 and script for service B is started but service A is not yet ready to
 be used.
 
 This is a bug in the startup script, irrespective of whether you use it
 with pacemaker or not.
 
 I am sorry, but I didn't get the point.
 
 If service A is not ready then service B should not be started.
 
 As you seem to be ignoring advice:
 Yes, you are right: Service B should check whether service A is up before
 starting itself.
 The easy change for the start script of B is to find out which command was run
 before it and to verify for itself that this command completed successfully.
 
 [...]

The harder task in the sketched, relaxed (not strictly serialized, at
least per prerequisite ordering) environment is for service B, aware of
its prerequisite-ordered predecessor A, to (also) decide whether A is not
by any chance still proceeding with its startup sequence -- something
that requires very detailed knowledge of A's internals and is
prone to race conditions anyway.

Hence reasonable, high-level init systems require such startup
sequences to be completely finished by the time they acknowledge the
service at hand as started and allow its prerequisite-ordered successor
to join the game too.  Consequently, the responsibility for answering
"is the startup finished (successfully or not)?" is deferred to the
lower-level dedicated startup recipes, which should then signal this
back to the init system credibly (e.g., by finishing only when the
startup is over) to prevent mess-ups.

Going full circle, if this assumption is broken in the httpd initscript,
it should be fixed.
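
The requirement that a start action finishes only when the service is really usable can be sketched as a small polling helper. This is a minimal sketch, not code from any real initscript: wait_until, start_service_a and service_a_is_ready are hypothetical names, and the probe has to be replaced by whatever actually proves that the service answers.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND about once per second until it succeeds.
# Returns 0 as soon as COMMAND succeeds, 1 once TIMEOUT_SECONDS have elapsed.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical use at the end of service A's start action: report success
# only after the readiness probe passes, otherwise fail the start.
#   start_service_a && wait_until 30 service_a_is_ready || exit 1
```

With the wait placed in A's own start action, neither the init system nor the cluster manager needs to know anything about A's internals.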

-- 
Jan (Poki)




Re: [ClusterLabs] implementation of fence and stonith agents for pacemaker

2015-08-13 Thread Kostiantyn Ponomarenko
Digimer,

Thank you. I will try this out.
One more question. What about directories for those agents, what rules are
here?

Thank you,
Kostya

On Tue, Aug 11, 2015 at 6:21 PM, Digimer li...@alteeve.ca wrote:

 On 11/08/15 11:17 AM, Kostiantyn Ponomarenko wrote:
  Hi guys,
 
  Is there any documentation which describes the implementation of fence and
  STONITH agents, like these for Resource Agents?
  http://www.linux-ha.org/wiki/OCF_Resource_Agents
  http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
 
  I am particularly interested in the arguments which are passed to a
  stonith resource by stonithd.
  Are there any guidelines on what arguments it has to handle and where it
  must be put (which directories are allowed)?
 
  So far I found this: http://linux.die.net/man/7/stonithd .
  But, for example, it is not clear to me how
  pcmk_host_check=dynamic-list (query the device) works. Do I
  need to handle some action in my stonith agent for that parameter?
 
 
  Thank you,
  Kostya

 This is the API;

 https://fedorahosted.org/cluster/wiki/FenceAgentAPI

 It needs to be updated to reflect the need for agents to output the XML
 metadata. For now, you should be able to see the format needed by
 looking at the metadata output of existing FAs.
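
As a rough illustration of the agent side, the sketch below reads name=value pairs from standard input and dispatches on the requested action. It is a hypothetical skeleton, not a working agent: the function name fence_skeleton and the agent name fence_example are made up, it assumes the target node arrives in a parameter called "port", and the metadata line is only a placeholder for the full XML that a real agent has to print (copy the structure from the metadata output of an existing agent).

```shell
#!/bin/sh
# Hypothetical skeleton: parse "name=value" lines from stdin, then act.
fence_skeleton() {
    action=""
    port=""
    while IFS='=' read -r key value; do
        case "$key" in
            action) action=$value ;;   # requested operation
            port)   port=$value ;;     # assumed name of the target parameter
        esac
    done
    case "$action" in
        metadata)
            # Placeholder only; a real agent prints its full XML description.
            echo '<resource-agent name="fence_example"/>'
            ;;
        reboot|off|on)
            # A real agent would talk to the fencing hardware here.
            echo "would $action $port"
            ;;
        monitor|status)
            return 0
            ;;
        *)
            echo "unsupported action: $action" >&2
            return 1
            ;;
    esac
}

# Example call:
#   printf 'action=reboot\nport=node-1\n' | fence_skeleton
```

Feeding the parameters by hand like this is also a convenient way to exercise an agent outside the cluster.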

 --
 Digimer
 Papers and Projects: https://alteeve.ca/w/
 What if the cure for cancer is trapped in the mind of a person without
 access to education?



Re: [ClusterLabs] stonithd: stonith_choose_peer: Couldn't find anyone to fence node with any

2015-08-13 Thread Kostiantyn Ponomarenko
 Then make sure it can be stonithd. Add additional stonith agent using
 independent communication channel.

Not possible. Only one node up and running in the cluster and I am
wondering - can it STONITH itself? Because most likely, after reboot, the
problem can be gone.

 I have no idea what fence_sbb_hw is or does

That just reboots the peer. It is our specific STONITH agent.

 What this node does by itself really does not matter.

What if at some point there is only one node in the cluster?
In the solution I am working on, two nodes form the cluster,
and it is possible to use this solution even with only one node.

I am satisfied with CASE 2, where Pacemaker shuts itself down after
calling STONITH, even though the stonith agent didn't reboot the needed node
but returned a false positive.
The only question is why this doesn't happen in CASE 1.


Thank you,
Kostya
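
The visible difference between the two cases is the device check reported in the logs: "can not fence (reboot) node-0: static-list" in CASE 1 versus "can fence (reboot) node-0: none" in CASE 2. The toy model below mirrors only that decision; it is not Pacemaker's code, can_fence is a hypothetical name, and it handles only a whitespace-separated host list and the two check modes that appear in the logs.

```shell
#!/bin/sh
# can_fence TARGET HOST_CHECK [HOST_LIST]
# Returns 0 if a device with these settings is considered able to fence TARGET.
can_fence() {
    target=$1
    host_check=$2
    host_list=$3
    case "$host_check" in
        static-list)
            # Eligible only if the target is named in the configured list.
            for host in $host_list; do
                [ "$host" = "$target" ] && return 0
            done
            return 1
            ;;
        none)
            # No check: the device is assumed able to fence any node.
            return 0
            ;;
        *)
            echo "host check '$host_check' is not modelled here" >&2
            return 2
            ;;
    esac
}

# CASE 1: pcmk_host_list=node-1 pcmk_host_check=static-list
can_fence node-0 static-list "node-1" || echo "CASE 1: no device for node-0"
# CASE 2: the log reports the check as "none"
can_fence node-0 none && echo "CASE 2: STONITH_node-1 is asked to fence node-0"
```

In CASE 1 no device qualifies, so the operation ends with "No such device" before any agent is run; in CASE 2 the agent is actually invoked and its result is taken at face value.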


[ClusterLabs] question on www.gossamer-threads.com/lists/linuxha/users/

2015-08-13 Thread Kostiantyn Ponomarenko
Hi,

I noticed that after moving to the new mailing list there are no more
updates here:
http://www.gossamer-threads.com/lists/linuxha/users/

Can it be fixed or am I missing something? It was a convenient way of
searching/reading/tracking issues.

Thank you,
Kostya


Re: [ClusterLabs] question on www.gossamer-threads.com/lists/linuxha/users/

2015-08-13 Thread Stefan Bauer
DOH! Please ignore my mail - i live in the past ;)

Last mail in archive is from Jul.



Stefan



[ClusterLabs] Antw: stonithd: stonith_choose_peer: Couldn't find anyone to fence node with any

2015-08-13 Thread Ulrich Windl
 Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote on 13.08.2015
 at 13:39 in message
 caenth0fxlzwzw4jmoyk_go0w9o6e2gdd-zfdfohzrahwcgv...@mail.gmail.com:
 Hi,
 
 Brief description of the STONITH problem:
 
 I see two different behaviors with two different STONITH configurations. If
 Pacemaker cannot find a device that can STONITH a problematic node, the
 node remains up and running. Which is bad, because it must be STONITHed.

Correct observation. I wonder whether cloning a STONITH resource would help;
for a symmetric STONITH like SBD any node can fence any other node at the same
time. Still, pacemaker waits until the stonith resource (which is something
different from SBD) is confirmed running on one node (hard to get if one node
with the STONITH resource in a two-node cluster went down unexpectedly).

 As opposite to it, if Pacemaker finds a device that, it thinks, can STONITH
 a problematic node, even if the device actually cannot, Pacemaker goes down
 after STONITH returns false positive. The Pacemaker shutdowns itself right
 after STONITH.
 Is it the expected behavior?

I'd be surprised if it were.

 Do I need to configure a two more STONITH agents for just rebooting nodes
 on which they are running (e.g. with # reboot -f)?

Good question ;-)

[...]






Re: [ClusterLabs] Antw: Re: Antw: Delayed first monitoring

2015-08-13 Thread Digimer
On 13/08/15 04:38 AM, Ulrich Windl wrote:
 Miloš Kozák milos.ko...@lejmr.com wrote on 13.08.2015 at 09:56 in message
 55cc4daa.4020...@lejmr.com:

 On 13.8.2015 at 09:26, Andrei Borzenkov wrote:
 On Thu, Aug 13, 2015 at 10:01 AM, Miloš Kozák milos.ko...@lejmr.com wrote:
 However, this does not make sense at all. Presumably, Pacemaker should get
 along with LSB scripts which come from the system repository, right?

 Let's forget about pacemaker for a moment. You have system startup
 where service B needs service A. initscript for service A completes
 and script for service B is started but service A is not yet ready to
 be used.

 This is a bug in the startup script, irrespective of whether you use it
 with pacemaker or not.

 I am sorry, but I didn't get the point.

 If service A is not ready then service B should not be started.
 
 As you seem to be ignoring advice:

Ok, I'm starting to get annoyed now. You need to be more polite and
respectful on this list.

 Yes, you are right: Service B should check whether service A is up before
 starting itself.
 The easy change for the start script of B is to find out which command was run
 before it and to verify for itself that this command completed successfully.
 
 [...]
 
 
 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] implementation of fence and stonith agents for pacemaker

2015-08-13 Thread Digimer
On 13/08/15 07:54 AM, Kostiantyn Ponomarenko wrote:
 Digimer,
 
 Thank you. I will try this out.
 One more question. What about directories for those agents, what rules
 are here?
 
 Thank you,
 Kostya

I'm not entirely sure I understand the question, sorry. What do you mean
by directories for those agents? If you're asking about implementation
details like language to use, etc, there are no rules. Python and bash
are the most common languages, I think, but I write my fence agents in
perl just fine. I think a couple are even in C.

I suspect that python is the language the upstream maintainers are happier
with, but as beekhof said in the RA script: the person doing the work
gets to make the decisions. :)

If I didn't answer your question, please clarify.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] implementation of fence and stonith agents for pacemaker

2015-08-13 Thread Kostiantyn Ponomarenko
Sorry, I should be more clear.
I mean the place where I must put my agent so it is visible to the cluster.
For example I know that you need to put your agent into /usr/sbin/ and
start its name with fence_ in order to get it visible to the cluster.
So I want to know the rules, are there other places which I also can put my
agent in and get it visible to the cluster?

Thank you,
Kostya



Re: [ClusterLabs] implementation of fence and stonith agents for pacemaker

2015-08-13 Thread Kostiantyn Ponomarenko
Thank you for the help :-)
On Aug 13, 2015 20:19, Digimer li...@alteeve.ca wrote:

 Ah, yes. If it's a RHEL/CentOS machine, put it in /usr/sbin/. If it's
 another OS, locate fence_ipmilan and put your agent in the same directory.

 digimer
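
The "locate fence_ipmilan" advice can be scripted along these lines. A minimal sketch under assumptions: agent_dir is a hypothetical helper, fence_example stands for your own agent file, and fence_ipmilan must be installed and found via PATH (as a non-root user /usr/sbin may be missing from PATH).

```shell
#!/bin/sh
# agent_dir REFERENCE_AGENT
# Print the directory that holds an already installed agent, or fail if
# the reference agent cannot be found on PATH.
agent_dir() {
    path=$(command -v "$1") || return 1
    dirname "$path"
}

# Hypothetical install of your own agent next to the reference one:
#   install -m 0755 fence_example "$(agent_dir fence_ipmilan)/"
```

Keeping the fence_ prefix in the file name, as mentioned earlier in the thread, still applies.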



Re: [ClusterLabs] Antw: stonithd: stonith_choose_peer: Couldn't find anyone to fence node with any

2015-08-13 Thread Andrew Beekhof

 On 13 Aug 2015, at 11:36 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de 
 wrote:
 
 Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote on 13.08.2015
 at 13:39 in message
 caenth0fxlzwzw4jmoyk_go0w9o6e2gdd-zfdfohzrahwcgv...@mail.gmail.com:
 Hi,
 
 Brief description of the STONITH problem:
 
 I see two different behaviors with two different STONITH configurations. If
 Pacemaker cannot find a device that can STONITH a problematic node, the
 node remains up and running. Which is bad, because it must be STONITHed.
 
 Correct observation. I wonder whether cloning a STONITH resource would help;

no

 for a symmetric STONITH like SBD any node can fence any other node at the
 same time. Still, pacemaker waits until the stonith resource (which is something
 different from SBD) is confirmed running on one node (hard to get if one node
 with the STONITH resource in a two-node cluster went down unexpectedly).
 


Re: [ClusterLabs] Memory leak in crm_mon ?

2015-08-13 Thread Attila Megyeri


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Tuesday, August 11, 2015 2:49 AM
To: Cluster Labs - All topics related to open-source clustering welcomed 
users@clusterlabs.org
Subject: Re: [ClusterLabs] Memory leak in crm_mon ?


 On 10 Aug 2015, at 5:33 pm, Attila Megyeri amegy...@minerva-soft.com wrote:
 
 Hi!
  
 We are building a new cluster on top of pacemaker/corosync, and several times
 during the past days we noticed that „crm_mon -Af” used up all the
 memory+swap and caused high CPU usage. Killing the process solves the issue.
 
 We are using the binary package versions available in the latest Ubuntu
 Trusty, namely:
 
 crmsh                1.2.5+hg1034-1ubuntu4
 pacemaker            1.1.10+git20130802-1ubuntu2.3
 pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
 corosync             2.3.3-1ubuntu1
 
 Kernel is 3.13.0-46-generic
 
 Looking back at some „atop” data, the CPU went to 100% many times during the
 last couple of days, at various times, more often around midnight exactly
 (strange).
 
 08.05 14:00
 08.06 21:41
 08.07 00:00
 08.07 00:00
 08.08 00:00
 08.09 06:27
 
 Checked the corosync log and syslog, but did not find any correlation between
 the entries in the logs around the specific times.
 For most of the time, the node running crm_mon was the DC as well, not
 running any resources (e.g. a pairless node for quorum).
 
 We have another running system, where everything works perfectly, although it
 is almost the same:
 
 crmsh                1.2.5+hg1034-1ubuntu4
 pacemaker            1.1.10+git20130802-1ubuntu2.1
 pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
 corosync             2.3.3-1ubuntu1
 
 Kernel is 3.13.0-8-generic
 
 Is this perhaps a known issue?

Possibly, that version is over 2 years old.

 Any hints?

Getting something a little more recent would be the best place to start

Thanks Andrew,

I tried to upgrade to 1.1.12 using the packages available at
https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a
single node to see how it works out, but I ended up with errors like

Could not establish cib_rw connection: Connection refused (111)

I have disabled the firewall, no change. The node appears to be running but
does not see any of the other nodes. On the other nodes I see this node as
UNCLEAN. (I assume corosync is fine, but pacemaker is not.)
I use udpu for the transport.

Am I doing something wrong? I tried to look for some howtos on upgrading, but the
only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade

Could you please direct me to some howto/guide on how to perform the upgrade?

Or am I facing some compatibility issue, so that I should extract the whole CIB,
upgrade all nodes and reconfigure the cluster from scratch? (The cluster is
meant to go live in 2 days... :) )

Thanks a lot in advance




  
 Thanks!