[ClusterLabs] Node attributes

2016-05-18 Thread H Yavari
Hi,
How can I define a constraint for two resources based on a single node attribute?

For example, resources X and Y are co-located based on node attribute Z.


Regards,
H.Yavari
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-18 Thread Ken Gaillot
On 05/18/2016 01:15 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 17.05.2016 at 16:53 in message
> <573b3074.1040...@redhat.com>:
>> On 05/17/2016 04:07 AM, Nikhil Utane wrote:
>>> What I would like to understand is how much total shared memory
>>> (approximately) would Pacemaker need so that accordingly I can define
>>> the partition size. Currently it is 300 MB in our system. I recently ran
>>> into insufficient shared memory issue because of improper clean-up. So
>>> would like to understand how much Pacemaker would need for a 6-node
>>> cluster so that accordingly I can increase it.
>>
>> I have no idea :-)
> 
> A related question would be: What's in those segments? "strings" indicates 
> that there is a lot of XML in those segments, and as a programmer whose first 
> computer had 400 bytes of RAM, I wonder whether that is really needed... Aren't 
> there more efficient representations for information exchange?

That design choice was way before my time, so I can't speak to the
reasons. I'm guessing it was an easy way to ensure compatibility across
nodes with different OSes, software versions and machine endianness.

>>
>> I don't think there's any way to pre-calculate it. The libqb library is
>> the part of the software stack that actually manages the shared memory,
>> but it's used by everything -- corosync (including its cpg and
>> votequorum components) and each pacemaker daemon.
>>
>> The size depends directly on the amount of communication activity in the
>> cluster, which is only indirectly related to the number of
>> nodes/resources/etc., the size of the CIB, etc. A cluster with nodes
>> joining/leaving frequently and resources moving around a lot will use
>> more shared memory than a cluster of the same size that's quiet. Cluster
>> options such as cluster-recheck-interval would also matter.
>>
>> Practically, I think all you can do is simulate expected cluster
>> configurations and loads, and see what it comes out to be.
> [...]
> 
> 
> Regards,
> Ulrich
> 
> 
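
One way to run that kind of measurement is to sample how much space the libqb-backed files occupy while a test load is running. The following is a minimal Python sketch, not part of Pacemaker or libqb. It assumes the files live under /dev/shm and that their names start with "qb-"; both are assumptions to verify on your own system. The name shm_usage is hypothetical.

```python
import os


def shm_usage(directory="/dev/shm", prefix="qb-"):
    """Return (apparent_bytes, allocated_bytes) for regular files named prefix*.

    The directory and the prefix are assumptions; adjust them to what your
    cluster nodes actually show.
    """
    apparent = 0
    allocated = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            if not entry.is_file(follow_symlinks=False):
                continue
            info = entry.stat(follow_symlinks=False)
            # st_size is the apparent file size in bytes.
            apparent += info.st_size
            # st_blocks counts 512-byte units on POSIX systems; it is missing
            # on some platforms, so fall back to 0 there.
            allocated += getattr(info, "st_blocks", 0) * 512
    return apparent, allocated
```

Calling shm_usage() periodically during a simulated load (nodes joining and leaving, resources moving) and recording the peak gives a rough figure for sizing the partition.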




Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Ken Gaillot
On 05/18/2016 05:21 AM, Marco A. Carcano wrote:
> Hi Ken,
> 
> by the way I’ve just also tried with pacemaker 1.1.14 (I built it from 
> source into a new RPM) but it doesn’t work
> 
> 
>> On 18 May 2016, at 11:29, Marco A. Carcano  wrote:
>>
>> Hi Ken,
>>
>> thank you for the reply
>>
>> I tried as you suggested, and now the stonith device tries to start but 
>> fails.
>>
>> I tried this
>>
>> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
>> apache-up002.ring0 apache-up003.ring0" 
>> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
>> apache-up002.ring1=apache-up002.ring0; 
>> apache-up003.ring1=apache-up003.ring0" pcmk_reboot_action="off" 
>> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
>> provides="unfencing"  op monitor interval=60s
>>
>> and even this, adding pcmk_monitor_action="metadata" as suggested in a post 
>> on the RH knowledge base (even if the error was quite different)

Avoid that -- it's a last resort for a fence agent with a missing or
broken monitor action. If the fence agent is properly written, you're
just glossing over real errors.

>> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
>> apache-up002.ring0 apache-up003.ring0" 
>> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
>> apache-up002.ring1=apache-up002.ring0; 
>> apache-up003.ring1=apache-up003.ring0" pcmk_reboot_action="off" 
>> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
>> provides="unfencing" pcmk_monitor_action="metadata"  op monitor interval=60s
>>
>> I’m using CentOS 7.2, pacemaker-1.1.13-10  resource-agents-3.9.5-54 and 
>> fence-agents-scsi-4.0.11-27
>>
>> the error messages are "Couldn't find anyone to fence (on) apache-up003.ring0 
>> with any device" and "error: Operation on of apache-up003.ring0 by  
>> for crmd.15918@apache-up001.ring0.0599387e: No such device"

I'm not sure why that would happen. You can try:

* fence_scsi -o metadata

Make sure "on" is in the list of supported actions. The stock agent has it,
but just to be sure you don't have a modified version ...

* stonith_admin -L

Make sure "scsi" is in the output (list of configured fence devices).

* stonith_admin -l apache-up003.ring0

to see what devices the cluster thinks can fence that node

* Does the cluster status show the fence device running on some node?
Does it list any failed actions?
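
The first check can also be scripted. Here is a minimal Python sketch that extracts the action names from fence-agent metadata XML. The SAMPLE_METADATA string is a trimmed, illustrative stand-in and not real fence_scsi output, and supported_actions is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample only; a real agent prints many more elements.
SAMPLE_METADATA = """
<resource-agent name="fence_scsi">
  <parameters/>
  <actions>
    <action name="on" on_target="1" automatic="1"/>
    <action name="off"/>
    <action name="status"/>
    <action name="monitor"/>
    <action name="metadata"/>
  </actions>
</resource-agent>
"""


def supported_actions(metadata_xml):
    """Return the set of action names declared in fence-agent metadata XML."""
    root = ET.fromstring(metadata_xml.strip())
    return {
        action.get("name")
        for action in root.iter("action")
        if action.get("name")
    }
```

On a cluster node, the real input would be the output of "fence_scsi -o metadata", captured for example with subprocess.run, and the check is then simply whether "on" is in the returned set.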


>> Thanks
>>
>> Marco
>>
>>
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: State transition S_IDLE 
>> -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
>> origin=abort_transition_graph ]
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: On loss of CCM Quorum: 
>> Ignore
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
>> apache-up001.ring0: node discovery
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
>> apache-up002.ring0: node discovery
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
>> apache-up003.ring0: node discovery
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Start   
>> scsia#011(apache-up001.ring0)
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Calculated Transition 
>> 11: /var/lib/pacemaker/pengine/pe-input-95.bz2
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
>> operation (11) on apache-up003.ring0 (timeout=6)
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 9: 
>> probe_complete probe_complete-apache-up003.ring0 on apache-up003.ring0 - no 
>> waiting
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
>> operation (8) on apache-up002.ring0 (timeout=6)
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 6: 
>> probe_complete probe_complete-apache-up002.ring0 on apache-up002.ring0 - no 
>> waiting
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
>> operation (5) on apache-up001.ring0 (timeout=6)
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
>> crmd.15918.697c495e wants to fence (on) 'apache-up003.ring0' with device 
>> '(any)'
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
>> operation on for apache-up003.ring0: 0599387e-0a30-4e1b-b641-adea5ba2a4ad (0)
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
>> crmd.15918.697c495e wants to fence (on) 'apache-up002.ring0' with device 
>> '(any)'
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
>> operation on for apache-up002.ring0: 76aba815-280e-491a-bd17-40776c8169e9 (0)
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 3: 
>> probe_complete probe_complete-apache-up001.ring0 on apache-up001.ring0 
>> (local) - no waiting
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
>> crmd.15918.697c495e wants to fence (on) 'apache-up001.ring0' with device 
>> '(any)'
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
>> op

Re: [ClusterLabs] Two related Cluster

2016-05-18 Thread H Yavari
Hi again,
I'm confused by attributes and rules. I was searching for dummy agents and 
saw the remote solution. You are right, it is not advisable for 
production. Now I understand that the physical servers should be mapped to 
services, and that I should then constrain those services to the other resources.
I couldn't find my way.
Regards,
H.Yavari


  From: Klaus Wenninger 
 To: users@clusterlabs.org 
 Sent: Wednesday, 18 May 2016, 16:55:17
 Subject: Re: [ClusterLabs] Two related Cluster
   
On 05/18/2016 01:57 PM, H Yavari wrote:
> Hi,
>
> Thank you for the reply.
> I tested the first method, "multi-site cluster". It worked with manual
> ticket assignment, but I had issues running Booth.
> I tested the second method, "constraints and attributes". I made a cluster
> with 4 nodes and defined some constraints for the nodes, but now I have
> problems with the node relations.
> While searching the docs I found
> "http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Remote/ch05.html".
> I think this is very close to my answer. Do you recommend this solution?

Going with pacemaker-remote would be a 3rd option.
I could imagine having the 2 nodes with the master/slave resource be the
full-fledged pacemaker nodes, with the 2 other nodes as remote nodes.
On the other hand, this would end up as a 2-node cluster, which is bad for
quorum and thus should be avoided whenever possible, especially if you have
enough nodes anyway.

I'm thinking in the direction of tying each remote-node resource to one of
the full nodes and colocating it with the master role. Your service would
then be tied to the remote nodes.
Sounds like something fun to play with, although I've never set up
anything like this ;-)
But as pacemaker-remote is not broadly used this way, it is probably not
advisable to use that in a production environment.

Or how did you have in mind to leverage pacemaker-remote for your scenario?

>
> Regards,
> H.Yavari
>
>
> 
> *From:* Kristoffer Grönlund 
> *To:* H Yavari; Cluster Labs - All topics
> related to open-source clustering welcomed
> *Sent:* Wednesday, 18 May 2016, 10:36:39
> *Subject:* Re: [ClusterLabs] Two related Cluster
>
> H Yavari <hyav...@rocketmail.com> writes:
>
> > Hi,
> > So do you think Booth or the attribute method is better for this solution?
> > I'm not familiar with them, so can you share your experiences with
> > them? Many thanks.
> >
>
> I think a single cluster using node attributes should be a lot easier to
> understand and maintain, so I'd recommend that solution if it works out
> for you.
>
>
> Cheers,
> Kristoffer
>
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com 
>
>
>
>


[ClusterLabs] fence_sanlock and pacemaker

2016-05-18 Thread Da Shi Cao
Hello everybody,
After some trial and error, fence_sanlock can be used as a stonith resource in 
pacemaker+corosync, with the following changes:
1. Add a "monitor" action, which does exactly the same as "status".
2. Make the "status" action return "false" if a resource belonging to a host has 
been acquired and is owned by another host. In version 3.3.0 it erroneously 
returned "true" because it did not check the owner id of the resource.
3. Make fence_sanlockd retry several times before failing if the resource 
for a host is owned by another host. This gives a time window for the resource 
to be released manually at the other host.
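
The retry behaviour in step 3 can be sketched as follows. This is a generic Python illustration of a bounded retry loop, not the actual fence_sanlockd change; the function name, its parameters, and the try_acquire callback are all hypothetical.

```python
import time


def acquire_with_retries(try_acquire, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call try_acquire() up to `attempts` times and return True on first success.

    try_acquire is a hypothetical callback that returns True once the lease
    could be taken, and False while another host still owns it.
    """
    for attempt in range(1, attempts + 1):
        if try_acquire():
            return True
        # Wait between attempts, but not after the final one, so the other
        # host has a window in which the resource can be released.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

The attempts and delay_seconds values together define how long an operator has to release the resource on the other host before the operation is reported as failed.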

Sometimes a resource of a host gets locked permanently by another host if the 
"off" action fails, often because of a timeout.

Best Regards
Dashi Cao



Re: [ClusterLabs] Two related Cluster

2016-05-18 Thread Klaus Wenninger
On 05/18/2016 01:57 PM, H Yavari wrote:
> Hi,
>
> Thank you for the reply.
> I tested the first method, "multi-site cluster". It worked with manual
> ticket assignment, but I had issues running Booth.
> I tested the second method, "constraints and attributes". I made a cluster
> with 4 nodes and defined some constraints for the nodes, but now I have
> problems with the node relations.
> While searching the docs I found
> "http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Remote/ch05.html".
> I think this is very close to my answer. Do you recommend this solution?

Going with pacemaker-remote would be a 3rd option.
I could imagine having the 2 nodes with the master/slave resource be the
full-fledged pacemaker nodes, with the 2 other nodes as remote nodes.
On the other hand, this would end up as a 2-node cluster, which is bad for
quorum and thus should be avoided whenever possible, especially if you have
enough nodes anyway.

I'm thinking in the direction of tying each remote-node resource to one of
the full nodes and colocating it with the master role. Your service would
then be tied to the remote nodes.
Sounds like something fun to play with, although I've never set up
anything like this ;-)
But as pacemaker-remote is not broadly used this way, it is probably not
advisable to use that in a production environment.

Or how did you have in mind to leverage pacemaker-remote for your scenario?

>
> Regards,
> H.Yavari
>
>
> 
> *From:* Kristoffer Grönlund 
> *To:* H Yavari; Cluster Labs - All topics
> related to open-source clustering welcomed
> *Sent:* Wednesday, 18 May 2016, 10:36:39
> *Subject:* Re: [ClusterLabs] Two related Cluster
>
> H Yavari <hyav...@rocketmail.com> writes:
>
> > Hi,
> > So do you think Booth or the attribute method is better for this solution?
> > I'm not familiar with them, so can you share your experiences with
> > them? Many thanks.
> >
>
> I think a single cluster using node attributes should be a lot easier to
> understand and maintain, so I'd recommend that solution if it works out
> for you.
>
>
> Cheers,
> Kristoffer
>
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com 
>
>
>
>


Re: [ClusterLabs] Two related Cluster

2016-05-18 Thread H Yavari
Hi,
Thank you for the reply.
I tested the first method, "multi-site cluster". It worked with manual ticket
assignment, but I had issues running Booth.
I tested the second method, "constraints and attributes". I made a cluster with
4 nodes and defined some constraints for the nodes, but now I have problems
with the node relations.
While searching the docs I found
"http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Remote/ch05.html".
I think this is very close to my answer. Do you recommend this solution?
Regards,
H.Yavari


  From: Kristoffer Grönlund 
 To: H Yavari; Cluster Labs - All topics related 
to open-source clustering welcomed
 Sent: Wednesday, 18 May 2016, 10:36:39
 Subject: Re: [ClusterLabs] Two related Cluster
   
H Yavari writes:

> Hi,
> So do you think Booth or the attribute method is better for this solution? I'm
> not familiar with them, so can you share your experiences with them? Many thanks.
>

I think a single cluster using node attributes should be a lot easier to
understand and maintain, so I'd recommend that solution if it works out
for you.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Marco A. Carcano
Hi Ken,

by the way I’ve just also tried with pacemaker 1.1.14 (I built it from 
source into a new RPM) but it doesn’t work


> On 18 May 2016, at 11:29, Marco A. Carcano  wrote:
> 
> Hi Ken,
> 
> thank you for the reply
> 
> I tried as you suggested, and now the stonith device tries to start but 
> fails.
> 
> I tried this
> 
> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
> apache-up002.ring0 apache-up003.ring0" 
> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
> apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
> pcmk_reboot_action="off" 
> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
> provides="unfencing"  op monitor interval=60s
> 
> and even this, adding pcmk_monitor_action="metadata" as suggested in a post 
> on the RH knowledge base (even if the error was quite different)
> 
> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
> apache-up002.ring0 apache-up003.ring0" 
> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
> apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
> pcmk_reboot_action="off" 
> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
> provides="unfencing" pcmk_monitor_action="metadata"  op monitor interval=60s
> 
> I’m using CentOS 7.2, pacemaker-1.1.13-10  resource-agents-3.9.5-54 and 
> fence-agents-scsi-4.0.11-27
> 
> the error messages are "Couldn't find anyone to fence (on) apache-up003.ring0 
> with any device" and "error: Operation on of apache-up003.ring0 by  
> for crmd.15918@apache-up001.ring0.0599387e: No such device"
> 
> Thanks
> 
> Marco
> 
> 
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: On loss of CCM Quorum: 
> Ignore
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
> apache-up001.ring0: node discovery
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
> apache-up002.ring0: node discovery
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
> apache-up003.ring0: node discovery
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Start   
> scsia#011(apache-up001.ring0)
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Calculated Transition 
> 11: /var/lib/pacemaker/pengine/pe-input-95.bz2
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
> operation (11) on apache-up003.ring0 (timeout=6)
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 9: 
> probe_complete probe_complete-apache-up003.ring0 on apache-up003.ring0 - no 
> waiting
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
> operation (8) on apache-up002.ring0 (timeout=6)
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 6: 
> probe_complete probe_complete-apache-up002.ring0 on apache-up002.ring0 - no 
> waiting
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
> operation (5) on apache-up001.ring0 (timeout=6)
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
> crmd.15918.697c495e wants to fence (on) 'apache-up003.ring0' with device 
> '(any)'
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
> operation on for apache-up003.ring0: 0599387e-0a30-4e1b-b641-adea5ba2a4ad (0)
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
> crmd.15918.697c495e wants to fence (on) 'apache-up002.ring0' with device 
> '(any)'
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
> operation on for apache-up002.ring0: 76aba815-280e-491a-bd17-40776c8169e9 (0)
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 3: 
> probe_complete probe_complete-apache-up001.ring0 on apache-up001.ring0 
> (local) - no waiting
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
> crmd.15918.697c495e wants to fence (on) 'apache-up001.ring0' with device 
> '(any)'
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
> operation on for apache-up001.ring0: e50d7e16-9578-4964-96a3-7b36bdcfba46 (0)
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
> to fence (on) apache-up003.ring0 with any device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
> to fence (on) apache-up002.ring0 with any device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
> apache-up003.ring0 by  for crmd.15918@apache-up001.ring0.0599387e: No 
> such device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
> apache-up002.ring0 by  for crmd.15918@apache-up001.ring0.76aba815: No 
> such device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
> to fence (on) apache-up001.ring0 with any device
> May 

Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Marco A. Carcano
Hi Ken,

thank you for the reply

I tried as you suggested, and now the stonith device tries to start but fails.

I tried this

pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
apache-up002.ring0 apache-up003.ring0" 
pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
pcmk_reboot_action="off" 
devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
provides="unfencing"  op monitor interval=60s

and even this, adding pcmk_monitor_action="metadata" as suggested in a post on 
the RH knowledge base (even if the error was quite different)

pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
apache-up002.ring0 apache-up003.ring0" 
pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
pcmk_reboot_action="off" 
devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
provides="unfencing" pcmk_monitor_action="metadata"  op monitor interval=60s

I’m using CentOS 7.2, pacemaker-1.1.13-10  resource-agents-3.9.5-54 and 
fence-agents-scsi-4.0.11-27

the error messages are "Couldn't find anyone to fence (on) apache-up003.ring0 
with any device" and "error: Operation on of apache-up003.ring0 by  
for crmd.15918@apache-up001.ring0.0599387e: No such device"

Thanks

Marco


May 18 10:37:03 apache-up001 crmd[15918]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
May 18 10:37:03 apache-up001 pengine[15917]:  notice: On loss of CCM Quorum: 
Ignore
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
apache-up001.ring0: node discovery
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
apache-up002.ring0: node discovery
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
apache-up003.ring0: node discovery
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Start   
scsia#011(apache-up001.ring0)
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Calculated Transition 11: 
/var/lib/pacemaker/pengine/pe-input-95.bz2
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
operation (11) on apache-up003.ring0 (timeout=6)
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 9: 
probe_complete probe_complete-apache-up003.ring0 on apache-up003.ring0 - no 
waiting
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
operation (8) on apache-up002.ring0 (timeout=6)
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 6: 
probe_complete probe_complete-apache-up002.ring0 on apache-up002.ring0 - no 
waiting
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
operation (5) on apache-up001.ring0 (timeout=6)
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
crmd.15918.697c495e wants to fence (on) 'apache-up003.ring0' with device '(any)'
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
operation on for apache-up003.ring0: 0599387e-0a30-4e1b-b641-adea5ba2a4ad (0)
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
crmd.15918.697c495e wants to fence (on) 'apache-up002.ring0' with device '(any)'
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
operation on for apache-up002.ring0: 76aba815-280e-491a-bd17-40776c8169e9 (0)
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 3: 
probe_complete probe_complete-apache-up001.ring0 on apache-up001.ring0 (local) 
- no waiting
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
crmd.15918.697c495e wants to fence (on) 'apache-up001.ring0' with device '(any)'
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
operation on for apache-up001.ring0: e50d7e16-9578-4964-96a3-7b36bdcfba46 (0)
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
to fence (on) apache-up003.ring0 with any device
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
to fence (on) apache-up002.ring0 with any device
May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
apache-up003.ring0 by  for crmd.15918@apache-up001.ring0.0599387e: No 
such device
May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
apache-up002.ring0 by  for crmd.15918@apache-up001.ring0.76aba815: No 
such device
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
to fence (on) apache-up001.ring0 with any device
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Stonith operation 
5/11:11:0:8248cebf-c198-4ff2-bd43-7415533ce50f: No such device (-19)
May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
apache-up001.ring0 by  for crmd.15918@apache-up001.ring0.e50d7e16: No 
such device
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Stonith operation 5 for 
apache-up003.ring

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-05-18 Thread Jan Friesse

Ken Gaillot wrote:

On 05/17/2016 09:54 AM, Digimer wrote:

On 16/05/16 04:35 AM, Bogdan Dobrelya wrote:

On 05/16/2016 09:23 AM, Jan Friesse wrote:

Hi,

I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is
it possible?
Is there any examination about that?


Indeed, would be *great* to have a Pacemaker based control plane on top
of other "pluggable" distributed KVS & messaging systems, for example
etcd as well :)
I'm looking forward to joining any dev efforts around that, although I'm
not a Java or Go developer.


Part of open source is the freedom to do whatever you want, of course.

Let me ask though; What problems would zookeeper, etcd or other systems
solve that can't be solved in corosync?

I ask because the HA community just finished a multi-year effort to
merge different projects into one common HA stack. This has a lot of
benefits to the user base, not least of which is lack of confusion.

Strikes me that the significant time investment in supporting a new
comms layer would be much more beneficially spent on improving the
existing stack.

Again, anyone is free to do whatever they want... I just don't see the
motivator personally.

digimer


I see one big difference that is both a strength and a weakness: these
other packages have a much wider user base beyond the HA cluster use
case. The strength is that there will be many more developers working to
fix bugs, add features, etc. The weakness is that most of those


This is exactly what I was thinking about during 2.x development: whether 
replacing Corosync wouldn't make more sense than continuing to develop 
it. I was prepared to accept having to implement some features. 
Sadly, there was exactly ONE project that would have been able to replace 
Corosync (the Spread toolkit), and it is even less widespread than Corosync.


From my point of view, a replacement for Corosync must (at least) be able to:
- Work without quorum
- Support 2-node clusters
- Allow multiple links (something like RRP)
- Avoid any SPOF (so nothing like configuration stored on one node 
only and/or on a different machine on the network)
- Provide EVS/VS
- Provide something like qdevice

Both ZooKeeper and etcd build on top of a quite simple-to-understand 
membership mechanism (ZooKeeper = elected master, something like Amoeba; 
etcd = Raft), which is nice, because it means more contributors. Sadly, 
bare-metal HA must work even in situations where "simple" quorum is not 
enough.




developers are ignorant of HA clustering and could easily cause more
problems for the HA use case than they fix.

Another potential benefit is the familiarity factor -- people are more
comfortable with things they recognize from somewhere else. So it might
help Pacemaker adoption, especially in the communities that already use
these packages.

I'm not aware of any technical advantages, and I wouldn't expect any,
given corosync's long HA focus.


From my point of view (and yes, I'm biased), the biggest problem with ZooKeeper
is the need to have quorum
(https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_designing).
A direct consequence is the inability to tolerate one node failure in a 2-node
cluster -> no 2-node clusters (and such deployments are extremely
popular). Also Corosync can operate completely without quorum.
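
The two-node consequence follows from simple arithmetic. Here is a minimal Python sketch, assuming a plain majority rule (more than half of the configured nodes must be present). The function names are hypothetical, and the sketch does not model any ZooKeeper, etcd, or Corosync specifics.

```python
def majority_quorum(nodes):
    """Smallest number of nodes that is strictly more than half of `nodes`."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes may fail while the remainder still forms a majority."""
    return nodes - majority_quorum(nodes)
```

For 2 nodes the majority is 2, so tolerated_failures(2) is 0. For 3 nodes the majority is also 2, so one failure is tolerated.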

Regards,
   Honza



Thanks for your help!
Hai Nguyen

