Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.

2015-09-14 Thread Dejan Muhamedagic
Hi Hideo-san,

On Tue, Sep 08, 2015 at 05:28:05PM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> We intend to change some patches.
> We withdraw this patch.

I suppose that you'll send another one? I can vaguely recall
a problem with non-lower case node names, but not the specifics.
Is that supposed to be handled within a stonith agent?

Cheers,

Dejan
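
For readers following the thread, the case mismatch described in the quoted patch note can be illustrated with a small Python sketch. This is not the external/libvirt agent's code. The function name host_in_hostlist and the sample hostlist are made up for illustration, and the sketch assumes hostlist entries are separated by whitespace or commas and look like "hostname" or "hostname:domain_id".

```python
def host_in_hostlist(target, hostlist):
    """Return True if target matches a hostlist entry, ignoring case."""
    wanted = target.lower()
    for entry in hostlist.replace(",", " ").split():
        # Assumed entry format: "hostname" or "hostname:domain_id".
        hostname = entry.split(":", 1)[0]
        if hostname.lower() == wanted:
            return True
    return False


# Hypothetical hostlist written in mixed case by the user.
print(host_in_hostlist("node1", "NODE1 Node2:vm2"))
```

Lower-casing both sides before comparing is the whole point: the requested name arrives in lower case, so a hostlist written in capitals would otherwise never match.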

> Best Regards,
> Hideo Yamauchi.
> 
> 
> - Original Message -
> > From: "renayama19661...@ybb.ne.jp" 
> > To: ClusterLabs-ML 
> > Cc: 
> > Date: 2015/9/7, Mon 09:06
> > Subject: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a 
> > lower case of hostlist.
> > 
> > Hi All,
> > 
> > When a cluster carries out stonith, Pacemaker handles the host name in
> > lower case.
> > When a user sets the host name of the OS and the host name in the hostlist
> > of external/libvirt in capital letters, stonith is not carried out.
> > 
> > If external/libvirt converts the host names in hostlist to lower case
> > before comparing, it can compensate for this setting error by the user.
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > ___
> > Users mailing list: Users@clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> > 
> 



Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Noel Kuntze


Hello Christine,

> I think it's worth mentioning here that corosync already sets its
> packets to TC_INTERACTIVE (which DLM does not), so they should not need
> too much messing around with in iptables/qdisc

If that is the case, then why do the totem messages timeout?
Corosync is already running with realtime priority and its packets are 
delivered instantly,
because they get put into band 0. So what's the problem here then, if it's not
too much traffic? Do you have an idea?
- -- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658





Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Christine Caulfield
On 14/09/15 12:45, Noel Kuntze wrote:
> 
> Hello Christine,
> 
>> I think it's worth mentioning here that corosync already sets its
>> packets to TC_INTERACTIVE (which DLM does not), so they should not need
>> too much messing around with in iptables/qdisc
> 
> If that is the case, then why do the totem messages timeout?
> Corosync is already running with realtime priority and its packets are 
> delivered instantly,
> because they get put into band 0. So what's the problem here then, if it's not
> too much traffic? Do you have an idea?
> 

TBH I don't know for sure. I don't think TC_INTERACTIVE is any
sort of guarantee of 'instant' delivery, just a hint to the kernel as to
relative priorities. If there is a huge amount of DLM traffic then that
still needs to happen somehow. But really we ought to get input from
someone who knows more about the kernel networking internals than me :)

Chrissie



[ClusterLabs] Antw: Manage rate of resource migration

2015-09-14 Thread Ulrich Windl
>>> Matthew Vernon  wrote on 14.09.2015 at 13:31 in message
<55f6b017.7070...@cam.ac.uk>:
> Hi,
> 
> I have a pacemaker/corosync cluster where the resources are all Xen
> guests. Pacemaker automatically distributes these between my two Xen
> hosts, and this is great.
> 
> When I move a node into or out of the standby state, though, pacemaker
> tries to bulk-migrate resources at once, and this can cause problems
> (migration timeouts, a serious peak in I/O load). Is there any way to
> specify a maximum number of resources that should be "in flight" between
> 2 nodes at once, if you see what I mean?

Try property migration-limit="2"...
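
As a rough illustration of what such a cap means, here is a Python sketch that splits pending migrations into waves of at most `limit` at a time. It is only a model of the idea, not how Pacemaker schedules actions internally, and the function name and guest names are hypothetical.

```python
def migration_waves(resources, limit):
    """Group resources into consecutive waves of at most `limit` items."""
    if limit < 1:
        # Treat a non-positive limit as "no limit": everything in one wave.
        return [list(resources)] if resources else []
    return [list(resources[i:i + limit])
            for i in range(0, len(resources), limit)]


# Hypothetical Xen guests that need to leave a node going into standby.
guests = ["vm1", "vm2", "vm3", "vm4", "vm5"]
for number, wave in enumerate(migration_waves(guests, 2), start=1):
    print(number, wave)
```

With a limit of 2, five guests move in three waves instead of all at once, which is the effect on I/O load and migration timeouts asked about in the quoted question.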

Ulrich

> 
> Thanks,
> 
> Matthew
> 







Re: [ClusterLabs] Antw: Re: EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Noel Kuntze


Hello Ulrich,


> Then that's not FIFO, but priority scheduling. Everybody knows the starvation
> problem of priority scheduling.
It's mixed. The individual bands behave like a FIFO. The bands are prioritized
over each other.

> Imagine some cluster filesystem has its own timeout / fencing mechanism.
> Then TOTEM when going wild can cause starvation of other services. It's the
> wrong design IMHO.
> I wonder whether this can explain the mysterious cLVM retransmit list growing
> under some loads.
>
> [...]
It can, if the total bytes in the queue is higher than the transmission rate of
the network interface.
Packets in bands 1 and 2 are only held back if band 0 always has packets.
But again, for that, the transmission rate must be lower than
the rate at which the queues grow.
Does totem generate a runaway stream of packets when they are delivered in time?
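
The "FIFO bands with strict priority between them" behaviour can be sketched in Python as follows. This is a toy model, not kernel code: the class name, the one-packet-per-tick link and the traffic pattern are all assumptions chosen to make the starvation condition visible.

```python
from collections import deque


class ThreeBandQueue:
    """Three FIFO bands; band 0 is always served before 1, and 1 before 2."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, band, packet):
        self.bands[band].append(packet)

    def dequeue(self):
        # Scan bands in priority order and take the oldest packet found.
        for band in self.bands:
            if band:
                return band.popleft()
        return None


def simulate(ticks, band0_per_tick):
    """Count band-2 packets sent when the link carries one packet per tick."""
    queue = ThreeBandQueue()
    sent_low = 0
    for tick in range(ticks):
        for i in range(band0_per_tick):
            queue.enqueue(0, ("high", tick, i))
        queue.enqueue(2, ("low", tick))
        packet = queue.dequeue()
        if packet is not None and packet[0] == "low":
            sent_low += 1
    return sent_low


print(simulate(10, 0), simulate(10, 1))
```

In this model band 2 is only starved when band 0 alone arrives at least as fast as the link can send, which is the condition stated above.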

 

- -- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658






Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Noel Kuntze


Hello Christine,

I googled a bit and some doc[1] says that TC_PRIO_INTERACTIVE maps to value 6, 
whatever that is.
Assuming that value of 6 is the same as the "priority value", Corosync traffic 
should go into band 0, because
TOS values of 0x10 and 0x14 have "priority value" 6, too. The page[2] on 
lartc.org says that, too.

That means that at least when pfifo_fast is used, there's no need for iptables 
rules or tc filters to prioritize Corosync traffic manually.

[1] http://linux-tc-notes.sourceforge.net/tc/doc/priority.txt
[2] http://lartc.org/howto/lartc.qdisc.classless.html#AEN658
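
The mapping can be written down as a small Python lookup. This is a sketch assuming the stock pfifo_fast priomap as listed on the lartc.org page and the TC_PRIO_* values from linux/pkt_sched.h; the helper name band_for_priority is made up here.

```python
# Stock pfifo_fast priomap: index is the Linux packet priority (0..15),
# value is the band the packet is queued in (band 0 is served first).
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# Priority constants as defined in linux/pkt_sched.h.
TC_PRIO_BESTEFFORT = 0
TC_PRIO_BULK = 2
TC_PRIO_INTERACTIVE = 6


def band_for_priority(priority):
    """Return the pfifo_fast band for a packet priority value."""
    return PRIOMAP[priority & 0x0F]


print(band_for_priority(TC_PRIO_INTERACTIVE))
```

Best-effort traffic lands in band 1 and bulk traffic in band 2, so under these assumptions interactive-priority packets are dequeued ahead of both.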
- -- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658






Re: [ClusterLabs] node dead timeout

2015-09-14 Thread Nikola Ciprich
Hello Dejan,

thanks for your answer! OK, I see.. I'm using cman, so I guess it's
 option.

I'm not sure whether I understand RH docs on this correctly,
do I need to stop/start whole cluster for this option to apply?

Or even better, is there a way how can I check currently set value?

with best regards

nik



> No, it's an event delivered by the underlying messaging layer. I
> suppose that you're using corosync, in which case see about token
> timeout in corosync.conf(5).
> 
> Thanks,
> 
> Dejan
> 
> > thanks a lot in advance!
> > 
> > BR
> > 
> > nik
> > 
> > -- 
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> > 
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> > 
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> 
> 
> 
> 
> 
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-




[ClusterLabs] Antw: Re: Antw: Re: EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Ulrich Windl
>>> Noel Kuntze  wrote on 14.09.2015 at 13:55 in
message <55f6b5b5.8020...@familie-kuntze.de>:

> 
> Hello Ullrich,
> 
>> Actually I don't understand that claim: If packets are delivered in order
>>(mostly), any TOTEM packet has the same change to arrive than any other 

s/change/chance/ # typing error, maybe lack of caffeine in the morning

>> packet.
>> While all other communication protocols can actually deal with Ethernet and
>> the Internet, TOTEM is the only protocol that can fail even in a switched
>> LAN. I haven't been convinced yet that it's not an implementation issue of
>> TOTEM. Instead of telling people to fiddle with their network
>> configuration, I'd prefer putting more efforts into fixing TOTEM.
> 
> I assume with "change to arrive", you mean delay? Or do you mean the 
> ordering of
> the packets?
> Totem behaves like it does because it needs to detect a failed node, afaik.

What totem does is detect network problems when there are none:

# grep ringid.*FAULTY /var/log/messages |wc -l
1981
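
The same count can be reproduced in Python, which is handy when the log needs further filtering. This is a minimal sketch: the regular expression is the one from the grep above, while the function name and the sample lines are invented for illustration and are not taken from a real log.

```python
import re

# Same pattern as the grep: "ringid", anything, then "FAULTY" on one line.
FAULTY = re.compile(r"ringid.*FAULTY")


def count_faulty(lines):
    """Count lines that report a ring being marked FAULTY."""
    return sum(1 for line in lines if FAULTY.search(line))


# Made-up sample lines; a real run would iterate over /var/log/messages.
sample = [
    "corosync[123]: [TOTEM ] Marking ringid 1 interface 10.0.0.1 FAULTY",
    "corosync[123]: [TOTEM ] Automatically recovered ring 1",
    "corosync[123]: [TOTEM ] Marking ringid 0 interface 10.0.1.1 FAULTY",
]
print(count_faulty(sample))
```

To use it on a real log, pass an open file object instead of the sample list, since iterating a file yields its lines.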

> This is something that no other protocol you encounter on the internet/LAN 
> is supposed to do.

Definitely not: 0 interface errors on any interface, no communication
problems.

> All of those protocols are either for error reporting (ICMP) or for 
> transceiving of
> data (udp/tcp). UDP obviously has no congestion algorithm, but TCP does.

Even NFS over UDP is much smarter than TOTEM is.

> 
>> The main problem with priorities is who decides what is most important,
>> especially if a medium is shered by many different software stacks and

s/shered/shared/  # see above

>> applications.
> 
> Obviously some type of prioritization has to be done, or at least should be
> done, because some things *are* more important than others. The only thing
> that can control congestion centrally in a computer system is the interface
> that controls access to it, so it's either the NIC or the software that
> controls access to it, so the network stack of the operating system. The
> problem is a different one when the LAN is bridged, rather than switched,
> because then the transmission of other hosts affects the transmission of
> one host.

If you have a central authority that can decide on each and every priority
you are right. I was talking from practical experience...

> - -- 
> 
> Mit freundlichen Grüßen/Kind Regards,
> Noel Kuntze
> 
> GPG Key ID: 0x63EC6658
> Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658
> 
> 
> 
> 






Re: [ClusterLabs] node dead timeout

2015-09-14 Thread Dejan Muhamedagic
Hi,

On Mon, Sep 14, 2015 at 09:15:59AM +0200, Nikola Ciprich wrote:
> Hello Andrew and all pacemaker users and developers,
> 
> I'd like to ask for advice - reading the docs, I'm still not sure - how
> can I set timeout telling when is node considered dead (and fenced)?
> 
> Is it dc-deadtime ?

No, it's an event delivered by the underlying messaging layer. I
suppose that you're using corosync, in which case see about token
timeout in corosync.conf(5).
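
To see which token value a given corosync.conf would set, a small Python sketch like the following can help. It assumes the plain "section { key: value }" layout of corosync.conf, and the default of 1000 ms is the one documented in corosync.conf(5); the function name and the sample text are hypothetical. It does not read cluster.conf, so it does not cover cman setups.

```python
def totem_token_ms(conf_text, default=1000):
    """Return the totem token timeout (ms) from corosync.conf-style text."""
    sections = []  # stack of currently open section names
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            sections.append(line[:-1].strip())
        elif line == "}":
            if sections:
                sections.pop()
        elif sections == ["totem"] and ":" in line:
            key, value = line.split(":", 1)
            if key.strip() == "token":
                return int(value.strip())
    return default


# Hypothetical configuration fragment.
sample = """
totem {
    version: 2
    token: 5000
    interface {
        ringnumber: 0
    }
}
"""
print(totem_token_ms(sample))
```

Only keys directly inside the totem section are considered, so a "token" key in a nested or unrelated section is ignored.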

Thanks,

Dejan

> thanks a lot in advance!
> 
> BR
> 
> nik
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -







Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Noel Kuntze


Hello Christine,

Do you have a pointer for me where to look in the source?
Searching for TC_INTERACTIVE in the Corosync sources on Github yielded no 
results.
How the scheduler handles the packets depends on its settings and type,
so yes, it's no guarantee. Whatever comes in front of the scheduler is
important, too,
but I do not know if there is something in front of it and, if there is, what it is.
- -- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658






Re: [ClusterLabs] Adding 'virsh migrate migrate-setspeed' support to the vm RA

2015-09-14 Thread Digimer
On 14/09/15 07:19 AM, Dejan Muhamedagic wrote:
> Hi Digimer,
> 
> On Fri, Sep 04, 2015 at 03:36:09AM -0400, Digimer wrote:
>> Hi all,
>>
>>   I hit an issue a little while ago where live-migrating a VM (on the
>> same management network normally used for corosync and a few other
>> monitoring tasks) ate up all the bandwidth, causing corosync to declare
>> a failure and triggering a fence.
>>
>>   I've dealt with this, in part, by adding redundant ring support on a
>> different network. However, I'd like to also limit the migration
>> bandwidth so that I don't need to fail over the ring in the first place.
> 
> As you certainly know, heavy traffic networks are not well suited
> for the latency sensitive cluster communication. Best to have the
> two separated.

Oh for sure, but I've already hit 6 NICs per node (back-channel,
normally just corosync, storage dedicated to DRBD and internet-facing
dedicated to the user app). Adding two more NICs just for live
migration, which happens very rarely, is hard to justify, despite this
issue. If I can't reliably solve it with this (and/or tc and/or rrp...)
then I will revisit this option.

>>   Is it reasonable/feasible to add 'virsh migrate-setspeed' support to
>> the vm.sh RA? Something like; 'setspeed="75MiB"'?
> 
> Yes, sounds like a good idea. Care to open an issue in github?
> 
> Thanks,
> 
> Dejan

Done.

https://github.com/ClusterLabs/resource-agents/issues/679
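
For illustration only, here is how a 'setspeed' value such as "75MiB" could be turned into a 'virsh migrate-setspeed' call. The RA parameter does not exist yet; the function names and the domain name below are hypothetical, and the sketch assumes the value is always given in whole MiB (virsh takes the bandwidth as a number in MiB/s).

```python
import re


def parse_mib(value):
    """Parse a string such as '75MiB' into an integer number of MiB."""
    match = re.fullmatch(r"(\d+)\s*MiB", value.strip())
    if match is None:
        raise ValueError("expected a value like '75MiB', got %r" % value)
    return int(match.group(1))


def setspeed_command(domain, setspeed):
    """Build the argv list for 'virsh migrate-setspeed DOMAIN BANDWIDTH'."""
    return ["virsh", "migrate-setspeed", domain, str(parse_mib(setspeed))]


# Hypothetical domain name; the list could be passed to subprocess.run().
print(setspeed_command("vm01", "75MiB"))
```

Building an argv list rather than a shell string avoids quoting problems if a domain name contains unusual characters.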

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Digimer
On 14/09/15 10:46 AM, Noel Kuntze wrote:
> 
> Hello Christine,
> 
> I googled a bit and some doc[1] says that TC_PRIO_INTERACTIVE maps to value 
> 6, whatever that is.
> Assuming that value of 6 is the same as the "priority value", Corosync 
> traffic should go into band 0, because
> TOS values of 0x10 and 0x14 have "priority value" 6, too. The page[2] on 
> lartc.org says that, too.
> 
> That means that at least when pfifo_fast is used, there's no need for 
> iptables rules or tc filters to prioritize Corosync traffic manually.
> 
> [1] http://linux-tc-notes.sourceforge.net/tc/doc/priority.txt
> [2] http://lartc.org/howto/lartc.qdisc.classless.html#AEN658

So what's the final verdict on this? I followed your back and forth, and
it sounds like corosync uses 0, so nothing else is to be done?

I'm also fully willing to admit that something else triggered the fault
detection. It happened during a long live migration (actually, several
servers back to back), so I *assumed* that was the cause. Given it was a
cut-over weekend though, I made a mental note and went back to work. Bad
choice... I should have snagged the logs for later investigation. =/

digimer

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Digimer
On 14/09/15 04:20 AM, Jan Friesse wrote:
> Digimer wrote:
>> Hi all,
>>
>>Starting a new thread from the "Clustered LVM with iptables issue"
>> thread...
>>
>>I've decided to review how I do networking entirely in my cluster. I
>> make zero claims to being great at networks, so I would love some
>> feedback.
>>
>>I've got three active/passive bonded interfaces; Back-Channel, Storage
>> and Internet-Facing networks. The IFN is "off limits" to the cluster as
>> it is dedicated to hosted server traffic only.
>>
>>So before, I used only the BCN for cluster traffic for cman/corosync
>> multicast traffic, no rrp. A couple months ago, I had a cluster
>> partition when VM live migration (also on the BCN) congested the
>> network. So I decided to enable RRP using the SN as backup, which has
>> been marginally successful.
>>
>>Now, I want to switch to unicast (> the SN as the backup and BCN as the primary ring and do a proper
>> IPTables firewall. Is this sane?
>>
>>When I stopped iptables entirely and started cman with unicast + RRP,
>> I saw this:
>>
>> ] Node 1
>> Sep 11 17:31:24 node1 kernel: DLM (built Aug 10 2015 09:45:36) installed
>> Sep 11 17:31:24 node1 corosync[2523]:   [MAIN  ] Corosync Cluster Engine
>> ('1.4.7'): started and ready to provide service.
>> Sep 11 17:31:24 node1 corosync[2523]:   [MAIN  ] Corosync built-in
>> features: nss dbus rdma snmp
>> Sep 11 17:31:24 node1 corosync[2523]:   [MAIN  ] Successfully read
>> config from /etc/cluster/cluster.conf
>> Sep 11 17:31:24 node1 corosync[2523]:   [MAIN  ] Successfully parsed
>> cman config
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] Initializing
>> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] Initializing
>> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] The network interface
>> [10.20.10.1] is now up.
>> Sep 11 17:31:24 node1 corosync[2523]:   [QUORUM] Using quorum provider
>> quorum_cman
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1
>> Sep 11 17:31:24 node1 corosync[2523]:   [CMAN  ] CMAN 3.0.12.1 (built
>> Jul  6 2015 05:30:35) started
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync CMAN membership service 2.90
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> openais checkpoint service B.01.01
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync extended virtual synchrony service
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync configuration service
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync cluster config database access v1.01
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync profile loading service
>> Sep 11 17:31:24 node1 corosync[2523]:   [QUORUM] Using quorum provider
>> quorum_cman
>> Sep 11 17:31:24 node1 corosync[2523]:   [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1
>> Sep 11 17:31:24 node1 corosync[2523]:   [MAIN  ] Compatibility mode set
>> to whitetank.  Using V1 and V2 of the synchronization engine.
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] adding new UDPU member
>> {10.20.10.1}
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] adding new UDPU member
>> {10.20.10.2}
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] The network interface
>> [10.10.10.1] is now up.
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] adding new UDPU member
>> {10.10.10.1}
>> Sep 11 17:31:24 node1 corosync[2523]:   [TOTEM ] adding new UDPU member
>> {10.10.10.2}
>> Sep 11 17:31:27 node1 corosync[2523]:   [TOTEM ] Incrementing problem
>> counter for seqid 1 iface 10.10.10.1 to [1 of 3]
>> Sep 11 17:31:27 node1 corosync[2523]:   [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Sep 11 17:31:27 node1 corosync[2523]:   [CMAN  ] quorum regained,
>> resuming activity
>> Sep 11 17:31:27 node1 corosync[2523]:   [QUORUM] This node is within the
>> primary component and will provide service.
>> Sep 11 17:31:27 node1 corosync[2523]:   [QUORUM] Members[1]: 1
>> Sep 11 17:31:27 node1 corosync[2523]:   [QUORUM] Members[1]: 1
>> Sep 11 17:31:27 node1 corosync[2523]:   [CPG   ] chosen downlist: sender
>> r(0) ip(10.20.10.1) r(1) ip(10.10.10.1) ; members(old:0 left:0)
>> Sep 11 17:31:27 node1 corosync[2523]:   [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 11 17:31:27 node1 

Re: [ClusterLabs] Parser error with fence_ipmilan

2015-09-14 Thread Andrew Beekhof

> On 14 Sep 2015, at 7:48 pm, dan  wrote:
> 
> Mon 2015-09-14 at 10:02 +0200, dan wrote:
>> Hi
>> 
>> To see if my cluster problem go away with a newer version of pacemaker I
>> have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1 and I had to get
>> 4.0.19-1 (ubuntu) of fence-agents to get a working fence-ipmilan.
>> 
>> But now when the cluster wants to stonith a node I get:
>> 
>> fence_ipmilan: Parser error: option -n/--plug is not recognize
>> fence_ipmilan: Please use '-h' for usage
>> 
>> Is the problem in fence-agents or in pacemaker?
> 
> Looking at the code producing this, I got it working by adding to my
> cluster config for my stonith devices
> port_as_ip=1 port=192.168.xx.xx
> 
> before I had:
> lanplus=1 ipaddr=192.168.xx.xx
> which worked fine before the new version of pacemaker.
> Now I have:
> lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx
> which works.

I’m glad it works, looks like a regression to me though.
You shouldn’t need to override the value pacemaker supplies for port if ipaddr 
is being set.

Can you comment on this Marek?


Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Jan Friesse

Digimer wrote:

Hi all,

   Starting a new thread from the "Clustered LVM with iptables issue"
thread...

   I've decided to review how I do networking entirely in my cluster. I
make zero claims to being great at networks, so I would love some feedback.

   I've got three active/passive bonded interfaces; Back-Channel, Storage
and Internet-Facing networks. The IFN is "off limits" to the cluster as
it is dedicated to hosted server traffic only.

   So before, I used only the BCN for cluster traffic for cman/corosync
multicast traffic, no rrp. A couple months ago, I had a cluster
partition when VM live migration (also on the BCN) congested the
network. So I decided to enable RRP using the SN as backup, which has
been marginally successful.

   Now, I want to switch to unicast (

Don't do ifdown. Corosync reacts on ifdown very badly (long time known 
issue, also it's one of the reason for knet in future version).


Also rrp active is not so well tested as passive, so give a try to passive.

Honza



Thanks!






Re: [ClusterLabs] Parser error with fence_ipmilan

2015-09-14 Thread dan
Mon 2015-09-14 at 10:02 +0200, dan wrote:
> Hi
> 
> To see if my cluster problem go away with a newer version of pacemaker I
> have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1 and I had to get
> 4.0.19-1 (ubuntu) of fence-agents to get a working fence-ipmilan.
> 
> But now when the cluster wants to stonith a node I get:
> 
> fence_ipmilan: Parser error: option -n/--plug is not recognize
> fence_ipmilan: Please use '-h' for usage
> 
> Is the problem in fence-agents or in pacemaker?

Looking at the code producing this, I got it working by adding to my
cluster config for my stonith devices
port_as_ip=1 port=192.168.xx.xx

before I had:
lanplus=1 ipaddr=192.168.xx.xx
which worked fine before the new version of pacemaker.
Now I have:
lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx
which works.

   Dan

