Re: [ClusterLabs] two node cluster not behaving right

2015-11-06 Thread user . clusterlabs . org
>> Nov 06 01:30:54 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:56 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:57 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:59 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:31:01 corosync [TOTEM ] Retransmit List: 96 97
> 
> This means something is blocking successful delivery of packets. Make sure to:
> - Configure the firewall properly (for testing you can disable it completely).
> - Configure multicast properly. As an alternative, you can try udpu. Udpu is
>   usually more compatible with switches, and for the two-node use case
>   performance is the same.
Thanks, but there is no firewall between the VM nodes, and multicast should be working too:

[root@afnA ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source   destination 

Chain FORWARD (policy ACCEPT)
target prot opt source   destination 

Chain OUTPUT (policy ACCEPT)
target prot opt source   destination 
[root@afnA ~]# omping afnB afnA
afnB : waiting for response msg
afnB : joined (S,G) = (*, 232.43.211.234), pinging
afnB :   unicast, seq=1, size=69 bytes, dist=0, time=0.265ms
afnB :   unicast, seq=2, size=69 bytes, dist=0, time=0.342ms
afnB : multicast, seq=2, size=69 bytes, dist=0, time=0.443ms
afnB :   unicast, seq=3, size=69 bytes, dist=0, time=0.517ms
afnB : multicast, seq=3, size=69 bytes, dist=0, time=0.590ms
afnB :   unicast, seq=4, size=69 bytes, dist=0, time=0.349ms
afnB : multicast, seq=4, size=69 bytes, dist=0, time=0.435ms
afnB :   unicast, seq=5, size=69 bytes, dist=0, time=0.361ms
afnB : multicast, seq=5, size=69 bytes, dist=0, time=0.448ms
afnB :   unicast, seq=6, size=69 bytes, dist=0, time=0.277ms
afnB : multicast, seq=6, size=69 bytes, dist=0, time=0.343ms
afnB :   unicast, seq=7, size=69 bytes, dist=0, time=0.302ms
afnB : multicast, seq=7, size=69 bytes, dist=0, time=0.402ms
^C
afnB :   unicast, xmt/rcv/%loss = 7/7/0%, min/avg/max/std-dev = 0.265/0.345/0.517/0.084
afnB : multicast, xmt/rcv/%loss = 7/6/14% (seq>=2 0%), min/avg/max/std-dev = 0.343/0.444/0.590/0.082
[root@afnA ~]# 

Corosync should also be okay, IMHO:

[root@afnA ~]# corosync-quorumtool  -l
Nodeid Name
   1   afnA.mxi.tdcfoo
   2   afnB.mxi.tdcfoo
[root@afnA ~]# corosync-quorumtool  -s
Version:  1.4.7
Nodes:2
Ring ID:  208
Quorum type:  quorum_cman
Quorate:  Yes

[root@afnA ~]# pcs status
Cluster name: afn-cluster
Last updated: Fri Nov  6 08:57:57 2015
Last change: Fri Nov  6 02:47:33 2015
Stack: cman
Current DC: afna - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured


Online: [ afna ]
OFFLINE: [ afnb ]

Full list of resources:


[root@afnA ~]#

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] two node cluster not behaving right

2015-11-06 Thread user . clusterlabs . org

> On 6. nov. 2015, at 08.42, Jan Friesse wrote:
> 
> user.clusterlabs@siimnet.dk wrote:
>> Being new to Pacemaker, I’m trying to create my first two-node cluster,
>> but it seems to behave a little strangely.
>> I'm following this guide: http://clusterlabs.org/quickstart-redhat-6.html
>> 
>> but I am unable to run, for example:
>> 
>> [root@afnA ~]# pcs property set stonith-enabled=false
>> Error: Unable to update cib
>> Call cib_replace failed (-62): Timer expired
>> 
>> 
>> The only thing I find in the logs is a stream of repeated corosync events:
>> 
>> Nov 06 01:30:54 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:56 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:57 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:59 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:31:01 corosync [TOTEM ] Retransmit List: 96 97
> 
> This means something is blocking successful delivery of packets. Make sure to:
> - Configure the firewall properly (for testing you can disable it completely).
> - Configure multicast properly. As an alternative, you can try udpu. Udpu is
>   usually more compatible with switches, and for the two-node use case
>   performance is the same.
Found this thread: http://www.gossamer-threads.com/lists/linuxha/pacemaker/90203

It seems that multicast between my two KVM nodes stops after 180s:

afnA :   unicast, seq=178, size=69 bytes, dist=0, time=0.238ms
afnA : multicast, seq=178, size=69 bytes, dist=0, time=0.324ms
afnA :   unicast, seq=179, size=69 bytes, dist=0, time=0.243ms
afnA : multicast, seq=179, size=69 bytes, dist=0, time=0.313ms
afnA :   unicast, seq=180, size=69 bytes, dist=0, time=0.273ms
afnA :   unicast, seq=181, size=69 bytes, dist=0, time=0.449ms
afnA :   unicast, seq=182, size=69 bytes, dist=0, time=0.266ms
afnA :   unicast, seq=183, size=69 bytes, dist=0, time=0.367ms

I can then restart omping and get another 180s of multicast. Hmm, might this 
have something to do with the Open vSwitch used between the nodes? I seem to 
remember reading about issues with Open vSwitch and multicast; I will dig 
more.
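
To pin down exactly where the multicast replies stop, the omping output can be reduced to the last answered sequence number per traffic type. A minimal sketch, assuming the per-packet output format shown above; last_seq is a hypothetical helper name, not part of omping:

```shell
# Print the last sequence number that received a reply of the given kind.
# $1 is "unicast" or "multicast"; omping output is read from stdin.
# (last_seq is a hypothetical helper name.)
last_seq() {
    awk -v kind="$1" '
        # Per-packet lines look like: "afnA : multicast, seq=179, size=..."
        index($0, kind ", seq=") {
            match($0, /seq=[0-9]+/)
            last = substr($0, RSTART + 4, RLENGTH - 4)
        }
        END { print last + 0 }
    '
}
```

Fed only the excerpt above, `last_seq multicast` prints 179 and `last_seq unicast` prints 183, so the gap between the two shows how long multicast has been silent.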

Meanwhile, since I only have a two-node cluster, how do I configure it to use 
unicast in /etc/cluster/cluster.conf? The cman stack doesn’t use 
/etc/corosync/corosync.conf (I tested with a deliberately broken corosync.conf, 
and cman still formed quorum initially).

TIA



Re: [ClusterLabs] two node cluster not behaving right

2015-11-06 Thread user . clusterlabs . org

> On 6. nov. 2015, at 08.42, Jan Friesse wrote:
> 
> This means something is blocking successful delivery of packets. Make sure to:
> - Configure the firewall properly (for testing you can disable it completely).
> - Configure multicast properly. As an alternative, you can try udpu. Udpu is
>   usually more compatible with switches, and for the two-node use case
>   performance is the same.
Seems unicast fixed the issue ;)
I did this by changing the cman configuration to the udpu transport in 
cluster.conf, like this:
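
The snippet itself did not survive in the list archive. As a minimal sketch, assuming the usual cman convention of a transport attribute on the <cman> element (for example `<cman transport="udpu" two_node="1" expected_votes="1"/>`, where the two_node and expected_votes values are illustrative and not taken from this thread), the active setting can be checked like this; cman_transport is a hypothetical helper name:

```shell
# Print the transport attribute of the <cman> element in cluster.conf,
# or "default" when the attribute is not set.
# (cman_transport is a hypothetical helper; it assumes the <cman ...> tag
# sits on a single line.)
cman_transport() {
    conf="${1:-/etc/cluster/cluster.conf}"
    value=$(sed -n 's/.*<cman[^>]*[[:space:]]transport="\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    echo "${value:-default}"
}
```

Called without an argument, it reads /etc/cluster/cluster.conf; a result of "udpu" would indicate that cman is configured for UDP unicast.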



Re: [ClusterLabs] two node cluster not behaving right

2015-11-06 Thread Jan Friesse

Steffen,

user.clusterlabs@siimnet.dk wrote:
>> On 6. nov. 2015, at 08.42, Jan Friesse wrote:
>>
>> This means something is blocking successful delivery of packets. Make sure to:
>> - Configure the firewall properly (for testing you can disable it completely).
>> - Configure multicast properly. As an alternative, you can try udpu. Udpu is
>>   usually more compatible with switches, and for the two-node use case
>>   performance is the same.
>
> Seems unicast fixed the issue ;)


Good.

Multicast is always a source of problems. Also, if you decide one day to 
scale up and try multicast again, you can try 
https://alteeve.ca/w/Fencing_KVM_Virtual_Servers#Fedora_18_Host.3B_Bridge_Configuration_Issue


Regards,
  Honza






Re: [ClusterLabs] restarting resources

2015-11-06 Thread zulucloud



On 11/02/2015 05:59 PM, - - wrote:

> Hi,
> I need to be able to restart a resource (e.g. apache) whenever a
> configuration file is updated. I have been using the 'crm resource restart '
> command to do it, which does restart the resource BUT also restarts my
> other resources.
> Is this normal behaviour? Is there a way to force a restart of only the
> resource whose config file has changed?
>
> I have the following resources configured in a group (ew)


Hi,

I had the same problem with Apache and a virtual IP together in a group. 
I wasn't able to restart / reload just Apache. The only way was to bring 
down the whole group including IP, and then start the group again.


> Is there a way to just start Website8086 or just reload it, without
> affecting the other resources?


I removed the group configuration and instead just inserted a colocation 
rule to make sure that Apache and IP are running on the same node. This 
works fine for me: it is now possible to run crm resource restart res_apache 
without affecting the IP, and it works fast and smoothly.
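
Such a colocation rule could look roughly like the following in crm shell syntax. This is a sketch only: the resource names are the ones that appear later in this thread, and the constraint ID col_apache_with_ip is a made-up name.

```shell
# Keep res_apache on the same node as res_haweb_ip (mandatory, score inf).
# The constraint ID col_apache_with_ip is hypothetical.
crm configure colocation col_apache_with_ip inf: res_apache res_haweb_ip
```

Unlike a group, a colocation constraint on its own implies no start order between the two resources.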


best regards..



[ClusterLabs] Antw: Re: restarting resources

2015-11-06 Thread Ulrich Windl
>>> zulucloud wrote on 06.11.2015 at 12:48 in message
<563c9374.5000...@mailbox.org>:

> 
> On 11/02/2015 05:59 PM, - - wrote:
>> Hi,
>> I need to be able to restart a resource (e.g. apache) whenever a
>> configuration file is updated. I have been using the 'crm resource restart '
>> command to do it, which does restart the resource BUT also restarts my
>> other resources.
>> Is this normal behaviour? Is there a way to force a restart of only the
>> resource whose config file has changed?
>>
>> I have the following resources configured in a group (ew)
> 
> Hi,
> 
> I had the same problem with Apache and a virtual IP together in a group. 
> I wasn't able to restart / reload just Apache. The only way was to bring 
> down the whole group including IP, and then start the group again.

I wonder: If the IP comes before Apache, and apache is the last in the group, 
you should be able to restart apache leaving the IP unaffected. Maybe you 
should provide more details...

> 
>> Is there a way to just start Website8086 or just reload it, without
>> affecting the other resources?
> 
> I removed the group configuration and instead just inserted a colocation 
> rule to make sure that Apache and IP are running on the same node. This 
> works fine for me: it is now possible to run crm resource restart res_apache 
> without affecting the IP, and it works fast and smoothly.
> 
> best regards..
> 







Re: [ClusterLabs] Antw: Re: restarting resources

2015-11-06 Thread zulucloud



On 11/06/2015 12:55 PM, Ulrich Windl wrote:

>> zulucloud wrote on 06.11.2015 at 12:48 in message
>>
>> I had the same problem with Apache and a virtual IP together in a group.
>> I wasn't able to restart / reload just Apache. The only way was to bring
>> down the whole group including IP, and then start the group again.
>
> I wonder: If the IP comes before Apache, and apache is the last in the group,
> you should be able to restart apache leaving the IP unaffected. Maybe you
> should provide more details...



Hello Ulrich,

thanks, interesting hint. I took a look at the backup of my old config, and 
indeed, my group was configured the other way round:


group gr_apache res_apache res_haweb_ip \
        meta target-role="Started"

I changed that to your suggestion:

group gr_apache res_haweb_ip res_apache \
        meta target-role="Started"

I removed the colocation, enabled the group configured that way, and 
indeed, it is now possible to run


crm resource restart res_apache

without affecting the IP. Learned something new again ;)
Thank you :)
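
The reason the order matters: a group starts its members in the listed order and stops them in reverse, so restarting one member also restarts every member listed after it. A toy illustration of that rule in plain shell (it does not talk to the cluster; restart_affects is a hypothetical helper):

```shell
# List the group members that restarting $1 would also restart.
# Usage: restart_affects <resource> <member1> <member2> ...
# with the members given in group order. Purely illustrative;
# restart_affects is a hypothetical helper, not a crm command.
restart_affects() {
    target=$1
    shift
    seen=0
    for member in "$@"; do
        if [ "$member" = "$target" ]; then
            seen=1
        fi
        # The target and everything listed after it get restarted.
        if [ "$seen" -eq 1 ]; then
            echo "$member"
        fi
    done
    return 0
}
```

With the old order, `restart_affects res_apache res_apache res_haweb_ip` lists both resources; with the new order, `restart_affects res_apache res_haweb_ip res_apache` lists only res_apache.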
