Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
Hi Hideo-san,

On Tue, Sep 08, 2015 at 05:28:05PM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi All,
>
> We intend to change some patches.
> We withdraw this patch.

I suppose that you'll send another one? I can vaguely recall a problem with non-lower-case node names, but not the specifics. Is that supposed to be handled within a stonith agent?

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.
>
> ----- Original Message -----
> > From: "renayama19661...@ybb.ne.jp"
> > To: ClusterLabs-ML
> > Date: 2015/9/7, Mon 09:06
> > Subject: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a
> > lower case of hostlist.
> >
> > Hi All,
> >
> > When a cluster carries out stonith, Pacemaker handles host names in lower case.
> > When a user sets the host name of the OS and the host name in the hostlist of
> > external/libvirt in capital letters, stonith is not carried out.
> >
> > Having external/libvirt convert the host names in hostlist to lower case
> > before comparing them can protect the user from this configuration error.
> >
> > Best Regards,
> > Hideo Yamauchi.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
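The normalization the patch describes can be sketched in shell. This is only an illustration of the idea, not the actual external/libvirt agent code; the variable names are made up:

```shell
# Pacemaker passes node names in lower case, so normalize the
# configured hostlist the same way before comparing.
hostlist="Node1 NODE2 node3"   # as the user configured it
target="node2"                 # as Pacemaker passes it

hostlist_lc=$(echo "$hostlist" | tr '[:upper:]' '[:lower:]')

for h in $hostlist_lc; do
    if [ "$h" = "$target" ]; then
        echo "match: $h"   # prints "match: node2"
    fi
done
```

With this, a hostlist of `Node1 NODE2` still matches the lower-case names Pacemaker hands to the stonith agent.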
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
Hello Christine,

> I think it's worth mentioning here that corosync already sets its
> packets to TC_INTERACTIVE (which DLM does not), so they should not need
> too much messing around with in iptables/qdisc

If that is the case, then why do the totem messages time out? Corosync is already running with realtime priority and its packets are delivered instantly, because they get put into band 0. So what's the problem here then, if it's not too much traffic? Do you have an idea?

--
Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
On 14/09/15 12:45, Noel Kuntze wrote:
>
> Hello Christine,
>
>> I think it's worth mentioning here that corosync already sets its
>> packets to TC_INTERACTIVE (which DLM does not), so they should not need
>> too much messing around with in iptables/qdisc
>
> If that is the case, then why do the totem messages time out?
> Corosync is already running with realtime priority and its packets are
> delivered instantly, because they get put into band 0. So what's the
> problem here then, if it's not too much traffic? Do you have an idea?

TBH I don't honestly know for sure. I don't think TC_INTERACTIVE is any sort of guarantee of 'instant' delivery, just a hint to the kernel as to relative priorities. If there is a huge amount of DLM traffic then that still needs to happen somehow. But really we ought to get input from someone who knows more about the kernel networking internals than me :)

Chrissie
[ClusterLabs] Antw: Manage rate of resource migration
>>> Matthew Vernon wrote on 14.09.2015 at 13:31 in message <55f6b017.7070...@cam.ac.uk>:
> Hi,
>
> I have a pacemaker/corosync cluster where the resources are all Xen
> guests. Pacemaker automatically distributes these between my two Xen
> hosts, and this is great.
>
> When I move a node into or out of the standby state, though, pacemaker
> tries to bulk-migrate the resources at once, and this can cause problems
> (migration timeouts, a serious peak in I/O load). Is there any way to
> specify a maximum number of resources that should be "in flight" between
> 2 nodes at once, if you see what I mean?

Try the property migration-limit="2"...

Ulrich

> Thanks,
>
> Matthew
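For reference, setting that cluster property looks roughly like this; a sketch only, using whichever cluster shell the distribution ships (the value 2 is Ulrich's example, not a recommendation):

```
# With crmsh:
crm configure property migration-limit=2

# Or with pcs:
pcs property set migration-limit=2
```

migration-limit caps the number of live migrations the transition engine will run in parallel on a node; the default of -1 means unlimited, which is what causes the bulk-migration burst described above.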
Re: [ClusterLabs] Antw: Re: EL6, cman, rrp, unicast and iptables
Hello Ulrich,

> Then that's not FIFO, but priority scheduling. Everybody knows the starvation
> problem of priority scheduling.

It's mixed. The individual bands behave like a FIFO. The bands are prioritized over each other.

> Imagine some cluster filesystem has its own timeout / fencing mechanism.
> Then TOTEM, when going wild, can cause starvation of other services. It's the
> wrong design IMHO.
> I wonder whether this can explain the mysterious cLVM retransmit list growing
> under some loads.
>
> [...]

It can, if the total number of bytes in the queue is higher than the transmission rate of the network interface. Packets in bands 1 and 2 are only not delivered if band 0 always has packets. But again, for that, the transmission rate must be lower than the rate at which the queue grows. Does totem generate a runaway stream of packets even when they are delivered in time?

--
Noel Kuntze
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
Hello Christine,

I googled a bit and some doc[1] says that TC_PRIO_INTERACTIVE maps to value 6, whatever that is. Assuming that value of 6 is the same as the "priority value", Corosync traffic should go into band 0, because TOS values of 0x10 and 0x14 have "priority value" 6, too. The page[2] on lartc.org says that, too.

That means that at least when pfifo_fast is used, there's no need for iptables rules or tc filters to prioritize Corosync traffic manually.

[1] http://linux-tc-notes.sourceforge.net/tc/doc/priority.txt
[2] http://lartc.org/howto/lartc.qdisc.classless.html#AEN658

--
Noel Kuntze
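A way to check this on a live node, sketched below. The interface name eth0 and port 5405 (corosync's usual mcastport) are assumptions about the local setup; tcpdump needs root:

```shell
# Show the qdisc on the cluster interface. pfifo_fast prints its
# priomap, which maps TOS "priority values" to bands 0-2.
tc -s qdisc show dev eth0

# Watch the TOS byte corosync actually sets on its packets.
# 0x10 (minimize-delay) lands in band 0 with the default priomap.
tcpdump -v -n -i eth0 udp port 5405 | grep -o 'tos 0x[0-9a-f]*'
```

If the grep consistently shows `tos 0x10`, the traffic is in band 0 and no extra iptables/tc prioritization should be needed, as concluded above.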
Re: [ClusterLabs] node dead timeout
Hello Dejan,

thanks for your answer! OK, I see.. I'm using cman, so I guess it's a cman option. I'm not sure whether I understand the RH docs on this correctly; do I need to stop/start the whole cluster for this option to apply? Or even better, is there a way I can check the currently set value?

with best regards

nik

> No, it's an event delivered by the underlying messaging layer. I
> suppose that you're using corosync, in which case see about token
> timeout in corosync.conf(5).
>
> Thanks,
>
> Dejan
>
> > thanks a lot in advance!
> >
> > BR
> >
> > nik

--
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
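On a cman stack the token timeout is usually set in cluster.conf rather than corosync.conf, and the running value can be inspected from the corosync object database. A sketch; corosync-objctl is the corosync 1.x tool, corosync-cmapctl its 2.x replacement, and whether the key appears depends on the version and what was configured:

```
# cluster.conf (cman): token timeout in milliseconds
<totem token="10000"/>

# Inspect the running configuration:
corosync-objctl totem.token          # corosync 1.x
corosync-cmapctl runtime.config.totem.token   # corosync 2.x
```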
[ClusterLabs] Antw: Re: Antw: Re: EL6, cman, rrp, unicast and iptables
>>> Noel Kuntze wrote on 14.09.2015 at 13:55 in message <55f6b5b5.8020...@familie-kuntze.de>:
> Hello Ulrich,
>
>> Actually I don't understand that claim: If packets are delivered in order
>> (mostly), any TOTEM packet has the same change to arrive than any other

s/change/chance/ # typing error, maybe lack of caffeine in the morning

> packet.
>> While all other communication protocols can actually deal with Ethernet and the
>> Internet, TOTEM is the only protocol that can fail even in a switched LAN. I
>> haven't been convinced yet that it's not an implementation issue of TOTEM.
>> Instead of telling people to fiddle with their network configuration, I'd
>> prefer putting more efforts into fixing TOTEM.
>
> I assume with "change to arrive", you mean delay? Or do you mean the
> ordering of the packets?
> Totem behaves like it does because it needs to detect a failed node, afaik.

What totem does is detect network problems when there are none:

# grep ringid.*FAULTY /var/log/messages | wc -l
1981

> This is something that no other protocol you encounter on the internet/LAN
> is supposed to do.

Definitely not: 0 interface errors on any interface, no communication problems.

> All of those protocols are either for error reporting (ICMP) or for
> transceiving of data (udp/tcp).

UDP obviously has no congestion algorithm, but TCP does. Even NFS over UDP is much smarter than TOTEM is.

>> The main problem with priorities is who decides what is most important,
>> especially if a medium is shered by many different software stacks and

s/shered/shared/ # see above

>> applications.
>
> Obviously some type of prioritization has to be done, or at least should be
> done, because some things *are* more important than others. The only thing
> that can control congestion centrally in a computer system is the interface
> that controls access to it, so it's either the NIC or the software that
> controls access to it, so the network stack of the operating system. The
> problem is a different one when the LAN is bridged, rather than switched,
> because then the transmission of other hosts affects the transmission of
> one host.

If you have a central authority that can decide on each and every priority you are right. I was talking from practical experience...
Re: [ClusterLabs] node dead timeout
Hi,

On Mon, Sep 14, 2015 at 09:15:59AM +0200, Nikola Ciprich wrote:
> Hello Andrew and all pacemaker users and developers,
>
> I'd like to ask for advice - reading the docs, I'm still not sure - how
> can I set the timeout telling when a node is considered dead (and fenced)?
>
> Is it dc-deadtime ?

No, it's an event delivered by the underlying messaging layer. I suppose that you're using corosync, in which case see about token timeout in corosync.conf(5).

Thanks,

Dejan
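The knob Dejan points at lives in the totem section of corosync.conf. A minimal fragment for illustration; 5000 is an example value, not a recommendation (the stock default is much lower):

```
totem {
    version: 2
    # Time, in milliseconds, to wait for the token before the node
    # is declared dead and a new membership is formed (which is what
    # triggers fencing at the Pacemaker level).
    token: 5000
}
```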
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
Hello Christine,

Do you have a pointer for me where to look in the source? Searching for TC_INTERACTIVE in the Corosync sources on GitHub yielded no results.

How the scheduler handles the packets depends on its settings and type, so yes, it's no guarantee. Whatever comes in front of the scheduler is important, too, but I do not know if there is something in front of it and, if there is, what it is.

--
Noel Kuntze
Re: [ClusterLabs] Adding 'virsh migrate migrate-setspeed' support to the vm RA
On 14/09/15 07:19 AM, Dejan Muhamedagic wrote:
> Hi Digimer,
>
> On Fri, Sep 04, 2015 at 03:36:09AM -0400, Digimer wrote:
>> Hi all,
>>
>> I hit an issue a little while ago where live-migrating a VM (on the
>> same management network normally used for corosync and a few other
>> monitoring tasks) ate up all the bandwidth, causing corosync to declare
>> a failure and triggering a fence.
>>
>> I've dealt with this, in part, by adding redundant ring support on a
>> different network. However, I'd like to also limit the migration
>> bandwidth so that I don't need to fail over the ring in the first place.
>
> As you certainly know, heavy traffic networks are not well suited
> for the latency sensitive cluster communication. Best to have the
> two separated.

Oh for sure, but I've already hit 6 NICs per node (back-channel, normally just corosync, storage dedicated to DRBD and internet-facing dedicated to the user app). Adding two more NICs just for live migration, which happens very rarely, is hard to justify, despite this issue. If I can't reliably solve it with this (and/or tc and/or rrp...) then I will revisit this option.

>> Is it reasonable/feasible to add 'virsh migrate-setspeed' support to
>> the vm.sh RA? Something like; 'setspeed="75MiB"'?
>
> Yes, sounds like a good idea. Care to open an issue in github?
>
> Thanks,
>
> Dejan

Done.

https://github.com/ClusterLabs/resource-agents/issues/679

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
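A sketch of what such an RA option would run under the hood; the domain name vm01 is a placeholder, and virsh expects the bandwidth figure in MiB/s:

```
# Cap live-migration bandwidth for this guest to 75 MiB/s
virsh migrate-setspeed vm01 --bandwidth 75

# Verify the current setting
virsh migrate-getspeed vm01
```

Setting this before a migration keeps the transfer from saturating the link corosync shares, which is exactly the failure mode described above.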
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
On 14/09/15 10:46 AM, Noel Kuntze wrote:
>
> Hello Christine,
>
> I googled a bit and some doc[1] says that TC_PRIO_INTERACTIVE maps to
> value 6, whatever that is. Assuming that value of 6 is the same as the
> "priority value", Corosync traffic should go into band 0, because TOS
> values of 0x10 and 0x14 have "priority value" 6, too. The page[2] on
> lartc.org says that, too.
>
> That means that at least when pfifo_fast is used, there's no need for
> iptables rules or tc filters to prioritize Corosync traffic manually.
>
> [1] http://linux-tc-notes.sourceforge.net/tc/doc/priority.txt
> [2] http://lartc.org/howto/lartc.qdisc.classless.html#AEN658

So what's the final verdict on this? I followed your back and forth, and it sounds like corosync uses band 0, so nothing else is to be done?

I'm also fully willing to admit that something else triggered the fault detection. It happened during a long live migration (actually, several servers back to back), so I *assumed* that was the cause. Given it was a cut-over weekend though, I made a mental note and went back to work. Bad choice... I should have snagged the logs for later investigation. =/

digimer
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
On 14/09/15 04:20 AM, Jan Friesse wrote:
> Digimer napsal(a):
>> Hi all,
>>
>> Starting a new thread from the "Clustered LVM with iptables issue"
>> thread...
>>
>> I've decided to review how I do networking entirely in my cluster. I
>> make zero claims to being great at networks, so I would love some
>> feedback.
>>
>> I've got three active/passive bonded interfaces; Back-Channel, Storage
>> and Internet-Facing networks. The IFN is "off limits" to the cluster as
>> it is dedicated to hosted server traffic only.
>>
>> So before, I used only the BCN for cluster traffic for cman/corosync
>> multicast traffic, no rrp. A couple months ago, I had a cluster
>> partition when VM live migration (also on the BCN) congested the
>> network. So I decided to enable RRP using the SN as backup, which has
>> been marginally successful.
>>
>> Now, I want to switch to unicast (...), with the SN as the backup and
>> BCN as the primary ring, and do a proper IPTables firewall. Is this sane?
>>
>> When I stopped iptables entirely and started cman with unicast + RRP,
>> I saw this:
>>
>> ] Node 1
>> Sep 11 17:31:24 node1 kernel: DLM (built Aug 10 2015 09:45:36) installed
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Successfully parsed cman config
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] The network interface [10.20.10.1] is now up.
>> Sep 11 17:31:24 node1 corosync[2523]: [QUORUM] Using quorum provider quorum_cman
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>> Sep 11 17:31:24 node1 corosync[2523]: [CMAN ] CMAN 3.0.12.1 (built Jul 6 2015 05:30:35) started
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync configuration service
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync profile loading service
>> Sep 11 17:31:24 node1 corosync[2523]: [QUORUM] Using quorum provider quorum_cman
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.20.10.1}
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.20.10.2}
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] The network interface [10.10.10.1] is now up.
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.10.10.1}
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.10.10.2}
>> Sep 11 17:31:27 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 1 iface 10.10.10.1 to [1 of 3]
>> Sep 11 17:31:27 node1 corosync[2523]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 11 17:31:27 node1 corosync[2523]: [CMAN ] quorum regained, resuming activity
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] This node is within the primary component and will provide service.
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] Members[1]: 1
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] Members[1]: 1
>> Sep 11 17:31:27 node1 corosync[2523]: [CPG ] chosen downlist: sender r(0) ip(10.20.10.1) r(1) ip(10.10.10.1) ; members(old:0 left:0)
>> Sep 11 17:31:27 node1 corosync[2523]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 11 17:31:27 node1
Re: [ClusterLabs] Parser error with fence_ipmilan
> On 14 Sep 2015, at 7:48 pm, dan wrote:
>
> Mon 2015-09-14 at 10:02 +0200, dan wrote:
>> Hi
>>
>> To see if my cluster problem goes away with a newer version of pacemaker I
>> have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1 and I had to get
>> 4.0.19-1 (ubuntu) of fence-agents to get a working fence_ipmilan.
>>
>> But now when the cluster wants to stonith a node I get:
>>
>> fence_ipmilan: Parser error: option -n/--plug is not recognize
>> fence_ipmilan: Please use '-h' for usage
>>
>> Is the problem in fence-agents or in pacemaker?
>
> Looking at the code producing this, I got it working by adding to my
> cluster config for my stonith devices:
>
> port_as_ip=1 port=192.168.xx.xx
>
> Before I had:
>
> lanplus=1 ipaddr=192.168.xx.xx
>
> which worked fine before the new version of pacemaker. Now I have:
>
> lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx
>
> which works.

I'm glad it works, looks like a regression to me though. You shouldn't need to override the value pacemaker supplies for port if ipaddr is being set.

Can you comment on this Marek?
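For anyone hitting the same error, the workaround translates into a stonith device definition roughly like this. A sketch with pcs; the device name and credentials are placeholders, and the `192.168.xx.xx` octets are left masked as in the thread:

```
pcs stonith create fence_node1 fence_ipmilan \
    lanplus=1 ipaddr=192.168.xx.xx \
    port_as_ip=1 port=192.168.xx.xx \
    login=admin passwd=secret
```

port_as_ip=1 tells the agent to treat the port parameter as an IP address instead of a plug number, which sidesteps the `-n/--plug` parser error.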
Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables
Digimer napsal(a):
> Hi all,
>
> Starting a new thread from the "Clustered LVM with iptables issue"
> thread...
>
> I've decided to review how I do networking entirely in my cluster. I
> make zero claims to being great at networks, so I would love some
> feedback.
>
> I've got three active/passive bonded interfaces; Back-Channel, Storage
> and Internet-Facing networks. The IFN is "off limits" to the cluster as
> it is dedicated to hosted server traffic only.
>
> So before, I used only the BCN for cluster traffic for cman/corosync
> multicast traffic, no rrp. A couple months ago, I had a cluster
> partition when VM live migration (also on the BCN) congested the
> network. So I decided to enable RRP using the SN as backup, which has
> been marginally successful.
>
> Now, I want to switch to unicast (...)

Don't do ifdown. Corosync reacts to ifdown very badly (a long-known issue; it's also one of the reasons for knet in a future version). Also, rrp active is not as well tested as passive, so give passive a try.

Honza

> Thanks!
Re: [ClusterLabs] Parser error with fence_ipmilan
Mon 2015-09-14 at 10:02 +0200, dan wrote:
> Hi
>
> To see if my cluster problem goes away with a newer version of pacemaker I
> have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1 and I had to get
> 4.0.19-1 (ubuntu) of fence-agents to get a working fence_ipmilan.
>
> But now when the cluster wants to stonith a node I get:
>
> fence_ipmilan: Parser error: option -n/--plug is not recognize
> fence_ipmilan: Please use '-h' for usage
>
> Is the problem in fence-agents or in pacemaker?

Looking at the code producing this, I got it working by adding to my cluster config for my stonith devices:

port_as_ip=1 port=192.168.xx.xx

Before I had:

lanplus=1 ipaddr=192.168.xx.xx

which worked fine before the new version of pacemaker. Now I have:

lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx

which works.

Dan