Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On Wed, Dec 04, 2019 at 02:44:49PM +0100, Jan Pokorný wrote:
> For the record, based on my feedback, iptables-extensions man page is
> headed to (finally) align with the actual in-kernel deprecation
> message:
> https://lore.kernel.org/netfilter-devel/20191204130921.2914-1-p...@nwl.cc/

From a quick run of xt_cluster, it seems to be working as expected for
IPv4. It requires iptables rules and an ARP reply rewrite like:

  arptables -A OUTPUT -o eth1 --h-length 6 -j mangle --mangle-mac-s 01:00:5e:00:01:01

However, for IPv6 I could not find an equivalent command to rewrite
Neighbour Advertisement packets. Does anyone have an idea how this
could be done?

--
Valentin
___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
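For readers who want to see the whole IPv4 setup in one place, here is a minimal dry-run sketch. It is modelled on the example in the iptables-extensions(8) man page rather than on anything an existing agent does. The interface eth1 and the multicast MAC come from the message above; the node's real MAC, the hash seed, the mark value and the function name xt_cluster_setup are illustrative assumptions.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
RUN=${RUN-echo}

# xt_cluster_setup DEV TOTAL_NODES LOCAL_NODE CLUSTER_MAC REAL_MAC
# (hypothetical helper; seed 0xdeadbeef and mark 0xffff are example values)
xt_cluster_setup() {
    dev=$1; total=$2; node=$3; cluster_mac=$4; real_mac=$5
    # Mark the packets this node is responsible for, drop the rest.
    $RUN iptables -t mangle -A PREROUTING -i "$dev" -m cluster \
        --cluster-total-nodes "$total" --cluster-local-node "$node" \
        --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
    $RUN iptables -t mangle -A PREROUTING -i "$dev" \
        -m mark ! --mark 0xffff -j DROP
    # Let the interface accept frames sent to the shared multicast MAC.
    $RUN ip maddr add "$cluster_mac" dev "$dev"
    # Outgoing ARP advertises the multicast MAC (the rule from the thread).
    $RUN arptables -A OUTPUT -o "$dev" --h-length 6 \
        -j mangle --mangle-mac-s "$cluster_mac"
    # Incoming ARP addressed to the multicast MAC is mapped to the real MAC.
    $RUN arptables -A INPUT -i "$dev" --h-length 6 \
        --destination-mac "$cluster_mac" -j mangle --mangle-mac-d "$real_mac"
}

xt_cluster_setup eth1 2 1 01:00:5e:00:01:01 52:54:00:12:34:56
```

Each node would run this with its own LOCAL_NODE and REAL_MAC. Nothing here addresses the IPv6 Neighbour Advertisement question, which remains open.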
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On 03/12/19 23:38 +0100, Valentin Vidić wrote:
> On Tue, Dec 03, 2019 at 11:14:41PM +0100, Jan Pokorný wrote:
>> The conclusion is hence that even with bleeding edge software
>> collection, there's no real problem in using ipt_CLUSTERIP
>> (when compiled in or alongside kernel) when a proper interface
>> is used, which may boil down to using an appropriate version of
>> iptables command. The respective logic to select the proper one
>> could be easily extended in the IPaddr2 agent (sorry, I mis-cased
>> this name previously; in a nutshell: if there's iptables-legacy
>> command, prefer that instead), which looks far more attainable
>> than porting to xt_cluster any time soon unless there are
>> volunteers.
>
> Indeed, I have tested with 2 nodes and TCP connections work as
> expected: packets arrive at both nodes but only one of them
> responds - sometimes the first node and sometimes the second.
>
> For ARP both nodes respond with the same multicast MAC:
>
> 22:33:14.231779 ARP, Request who-has 192.168.122.101 tell 192.168.122.1, length 28
> 22:33:14.231833 ARP, Reply 192.168.122.101 is-at 21:53:69:51:3e:b1, length 28
> 22:33:14.231833 ARP, Reply 192.168.122.101 is-at 21:53:69:51:3e:b1, length 28
>
>> Is there any iptables-legacy command equivalent in Debian?
>
> Yes, iptables package in Debian installs both:
>
> /usr/sbin/iptables-legacy
> /usr/sbin/iptables-nft
>
> so the agent can be modified to prefer iptables-legacy over
> iptables.

Perfect, thanks for the affirmation, Valentin.

-> https://github.com/ClusterLabs/resource-agents/pull/1439

For the record, based on my feedback, iptables-extensions man page is
headed to (finally) align with the actual in-kernel deprecation
message:
https://lore.kernel.org/netfilter-devel/20191204130921.2914-1-p...@nwl.cc/

Woot!

--
Jan (Poki)
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On Tue, Dec 03, 2019 at 11:14:41PM +0100, Jan Pokorný wrote:
> The conclusion is hence that even with bleeding edge software
> collection, there's no real problem in using ipt_CLUSTERIP
> (when compiled in or alongside kernel) when a proper interface
> is used, which may boil down to using an appropriate version of
> iptables command. The respective logic to select the proper one
> could be easily extended in the IPaddr2 agent (sorry, I mis-cased
> this name previously; in a nutshell: if there's iptables-legacy
> command, prefer that instead), which looks far more attainable
> than porting to xt_cluster any time soon unless there are
> volunteers.

Indeed, I have tested with 2 nodes and TCP connections work as
expected: packets arrive at both nodes but only one of them
responds - sometimes the first node and sometimes the second.

For ARP both nodes respond with the same multicast MAC:

  22:33:14.231779 ARP, Request who-has 192.168.122.101 tell 192.168.122.1, length 28
  22:33:14.231833 ARP, Reply 192.168.122.101 is-at 21:53:69:51:3e:b1, length 28
  22:33:14.231833 ARP, Reply 192.168.122.101 is-at 21:53:69:51:3e:b1, length 28

> Is there any iptables-legacy command equivalent in Debian?

Yes, iptables package in Debian installs both:

  /usr/sbin/iptables-legacy
  /usr/sbin/iptables-nft

so the agent can be modified to prefer iptables-legacy over
iptables.

--
Valentin
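The "prefer iptables-legacy when it exists" rule discussed above can be sketched in a few lines of POSIX shell. This is only an illustration of the idea, not necessarily what the referenced pull request implements; the function name pick_iptables and the variable IPTABLES are assumptions made for the example.

```shell
#!/bin/sh
# Print the command name to use for CLUSTERIP rules:
# iptables-legacy when it is found in PATH, plain iptables otherwise.
pick_iptables() {
    if command -v iptables-legacy >/dev/null 2>&1; then
        echo iptables-legacy
    else
        echo iptables
    fi
}

IPTABLES=$(pick_iptables)
echo "CLUSTERIP rules would be managed with: $IPTABLES"
```

An agent would then call "$IPTABLES" wherever it previously called iptables directly, so systems without the legacy binary keep their current behaviour.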
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On 03/12/19 23:19 +0100, Valentin Vidić wrote:
> Interesting enough, ipt_CLUSTERIP still seems to work when using
> iptables-legacy :)

5 minutes ago, in another part of this thread :)

--
Jan (Poki)
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On Tue, Dec 03, 2019 at 08:38:06PM +0100, Valentin Vidić wrote:
> The module might still work but the iptables command from the agent fails:
>
> [ 842.536916] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
> [ 842.539215] ipt_CLUSTERIP: cannot use CLUSTERIP target from nftables compat
>
> # uname -a
> Linux sid 5.3.0-2-amd64 #1 SMP Debian 5.3.9-3 (2019-11-19) x86_64 GNU/Linux
>
> # iptables --version
> iptables v1.8.3 (nf_tables)

Interestingly enough, ipt_CLUSTERIP still seems to work when using
iptables-legacy :)

OTOH, xt_cluster is a much simpler module, so the same functionality
requires more setup. It is probably better to create a new ClusterIP
agent than to pollute IPaddr2 even more...

--
Valentin
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On 03/12/19 20:38 +0100, Valentin Vidić wrote:
> On Tue, Dec 03, 2019 at 03:06:14PM +0100, Jan Pokorný wrote:
>> You likely refer to
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43270b1bc5f1e33522dacf3d3b9175c29404c36c
>>
>> however, this extension is actively maintained to this day, so I
>> don't see any immediate risks other than something related to
>> containers as referred to from said commit -- that is good to know
>> about in such scenarios nonetheless.
>>
>> My up2date Fedora Rawhide iptables installation, or rather its
>> iptables-extensions(8) man page does not mention any deprecation
>> at all (unlike with ULOG extension).
>>
>> OTOH, what may be a true show stopper is support for IPv4 only,
>> which xt_cluster seems to rectify.
>
> The module might still work but the iptables command from the agent fails:
>
> [ 842.536916] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
> [ 842.539215] ipt_CLUSTERIP: cannot use CLUSTERIP target from nftables compat
>
> # uname -a
> Linux sid 5.3.0-2-amd64 #1 SMP Debian 5.3.9-3 (2019-11-19) x86_64 GNU/Linux
>
> # iptables --version
> iptables v1.8.3 (nf_tables)

Hm, you made me feel so much behind in this area :-/
Actually thank you, since this is going to bite the meat, a different
story than looming proclamations :-)

TBH, there's so much going on in Fedora in the firewall area that
I momentarily thought I was completely drowned -- not covered by
a firewall at all (per casual "iptables -nvL", haha):
https://fedoraproject.org/wiki/Changes/firewalld_default_to_nftables

So, in order to reproduce your situation, I had to install something
that comes as "iptables-nft" in Fedora Rawhide, with the command
being canonically named iptables-nft:

  # uname -srvmp
  > Linux 5.4.0-0.rc6.git2.1.fc32.x86_64 #1 SMP Thu Nov 7 16:31:36 UTC
  > 2019 x86_64 x86_64

  # modprobe ipt_CLUSTERIP
  dmesg <- "ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully"

  # iptables-nft -I INPUT -i lo -d 127.0.0.200 -j CLUSTERIP --new \
      --hashmode sourceip-sourceport --clustermac 01:00:5E:00:00:c8 \
      --total-nodes 2 --local-node 1
  > iptables v1.8.3 (nf_tables): RULE_INSERT failed (Operation not
  > supported): rule in chain INPUT
  dmesg <- "ipt_CLUSTERIP: cannot use CLUSTERIP target from nftables compat"

where --version matches your standard "iptables"

  # iptables-nft --version
  > iptables v1.8.3 (nf_tables)

However!

  # readlink -f /usr/sbin/iptables{,-legacy}
  > /usr/sbin/xtables-legacy-multi
  > /usr/sbin/xtables-legacy-multi

  # iptables --version
  > iptables v1.8.3 (legacy)

  # iptables -I INPUT -i lo -d 127.0.0.200 -j CLUSTERIP --new \
      --hashmode sourceip-sourceport --clustermac 01:00:5E:00:00:20 \
      --total-nodes 2 --local-node 1
  # echo $?
  > 0
  dmesg <- "ipt_CLUSTERIP: ipt_CLUSTERIP is deprecated and it will removed soon, use xt_cluster instead"

And it even works (hypothesis: there will be about 50% probability
the "virtual IP" 127.0.0.200 will be unavailable as long as source
port randomization of the client side is fair)!

  # mkdir /tmp/hello; touch hello.world; python3 -m http.server&
  CMD1> Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/)

  # curl -m1 127.0.0.200:8000
  CMD1> 127.0.0.1 - - [03/Dec/2019 22:55:46] "GET / HTTP/1.1" 200 -
  CMD2> [HTML page "Directory listing for /", listing hello.world]

  # curl -m1 127.0.0.200:8000
  CMD2> curl: (28) Connection timed out after 1001 milliseconds

  # fg
  > python3 -m http.server
  ^C
  > Keyboard interrupt received, exiting

Ok, it is relatively fair, so the hypothesis holds based on these
empirical data.

The conclusion is hence that even with bleeding edge software
collection, there's no real problem in using ipt_CLUSTERIP
(when compiled in or alongside kernel) when a proper interface
is used, which may boil down to using an appropriate version of
iptables command. The respective logic to select the proper one
could be easily extended in the IPaddr2 agent (sorry, I mis-cased
this name previously; in a nutshell: if there's iptables-legacy
command, prefer that instead), which looks far more attainable
than porting to xt_cluster any time soon unless there are
volunteers.

Is there any iptables-legacy command equivalent in Debian?

What I want to demonstrate with this is that no bridge appears to
be burnt yet, regardless of any declarations and worries.

Once again, thanks for pushing for this in-depth analysis, so we
could gain more knowledge on the matter.

--
Jan (Poki)
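The 50% hypothesis in the message above follows from how a hash over the source address and port is split across --total-nodes: with two nodes configured and only node 1 present, every connection hashed to node 2 goes unanswered. The sketch below imitates that split. It is an assumption-laden illustration: cksum (a CRC) stands in for the kernel's hash function, and share_for_node is a hypothetical helper, so the exact counts will differ from what ipt_CLUSTERIP computes.

```shell
#!/bin/sh
# share_for_node SRC_IP FIRST_PORT COUNT TOTAL_NODES LOCAL_NODE
# Prints how many of COUNT consecutive source ports map to LOCAL_NODE.
# NOTE: cksum is only a stand-in for the kernel's real hash.
share_for_node() {
    src_ip=$1; first_port=$2; count=$3; total_nodes=$4; local_node=$5
    hits=0
    port=$first_port
    while [ "$port" -lt $((first_port + count)) ]; do
        crc=$(printf '%s:%s' "$src_ip" "$port" | cksum | cut -d ' ' -f 1)
        # Nodes are numbered from 1, as with --local-node.
        if [ $((crc % total_nodes + 1)) -eq "$local_node" ]; then
            hits=$((hits + 1))
        fi
        port=$((port + 1))
    done
    echo "$hits"
}

echo "node 1 would answer $(share_for_node 127.0.0.1 40000 200 2 1) of 200 source ports"
```

Whatever the hash, the shares of all nodes add up to the number of ports tried, so a client with fairly randomized source ports reaches a lone node 1 only for that node's share of connections.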
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On Tue, Dec 03, 2019 at 03:06:14PM +0100, Jan Pokorný wrote:
> You likely refer to
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43270b1bc5f1e33522dacf3d3b9175c29404c36c
>
> however, this extension is actively maintained to this day, so I
> don't see any immediate risks other than something related to
> containers as referred to from said commit -- that is good to know
> about in such scenarios nonetheless.
>
> My up2date Fedora Rawhide iptables installation, or rather its
> iptables-extensions(8) man page does not mention any deprecation
> at all (unlike with ULOG extension).
>
> OTOH, what may be a true show stopper is support for IPv4 only,
> which xt_cluster seems to rectify.

The module might still work but the iptables command from the agent fails:

  [ 842.536916] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
  [ 842.539215] ipt_CLUSTERIP: cannot use CLUSTERIP target from nftables compat

  # uname -a
  Linux sid 5.3.0-2-amd64 #1 SMP Debian 5.3.9-3 (2019-11-19) x86_64 GNU/Linux

  # iptables --version
  iptables v1.8.3 (nf_tables)

--
Valentin
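Since the failure depends on which backend the iptables command uses, a script can check the suffix that "iptables --version" prints, as seen in the output above. The following is a minimal sketch under that assumption; iptables_backend is a hypothetical helper name, and version strings without a parenthesised suffix are simply reported as unknown.

```shell
#!/bin/sh
# Classify the backend from the text printed by "iptables --version",
# e.g. "iptables v1.8.3 (nf_tables)" or "iptables v1.8.3 (legacy)".
iptables_backend() {
    case $1 in
        *"(nf_tables)"*) echo nf_tables ;;
        *"(legacy)"*)    echo legacy ;;
        *)               echo unknown ;;
    esac
}

# On a live system (skipped when no iptables command is installed):
if command -v iptables >/dev/null 2>&1; then
    iptables_backend "$(iptables --version)"
fi
```

A result of nf_tables would indicate that the CLUSTERIP target is going to be rejected, as in the dmesg lines quoted in this message.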
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On 02/12/19 09:50 -0600, Ken Gaillot wrote:
> On Sat, 2019-11-30 at 18:58 +0300, Andrei Borzenkov wrote:
>> 29.11.2019 17:46, Jan Pokorný wrote:
>>> "Clone" feature for IPAddr2 is actually sort of an overloading that
>>> agent with an alternative functionality -- trivial low-level load
>>> balancing. You can ignore that if you don't need any such.
>>>
>>
>> I would say IPaddr2 in clone mode does something similar to
>> SharedAddress.
>
> Just a side note about something that came up recently:
>
> IPaddr2 cloning utilizes the iptables "clusterip" feature, which has
> been deprecated in the Linux kernel since 2015. IPaddr2 cloning
> therefore must be considered deprecated as well. (Using it for a single
> floating IP is still fully supported.)
>
> IPaddr2 could be modified to use a newer iptables capability called
> "xt_cluster", but someone would have to volunteer to do that as it's
> not a priority.

You likely refer to

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43270b1bc5f1e33522dacf3d3b9175c29404c36c

however, this extension is actively maintained to this day, so I
don't see any immediate risks other than something related to
containers as referred to from said commit -- that is good to know
about in such scenarios nonetheless.

My up2date Fedora Rawhide iptables installation, or rather its
iptables-extensions(8) man page does not mention any deprecation
at all (unlike with ULOG extension).

OTOH, what may be a true show stopper is support for IPv4 only,
which xt_cluster seems to rectify.

--
Jan (Poki)
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On Sat, 2019-11-30 at 18:58 +0300, Andrei Borzenkov wrote:
> 29.11.2019 17:46, Jan Pokorný wrote:
> > "Clone" feature for IPAddr2 is actually sort of an overloading that
> > agent with an alternative functionality -- trivial low-level load
> > balancing. You can ignore that if you don't need any such.
> >
>
> I would say IPaddr2 in clone mode does something similar to
> SharedAddress.

Just a side note about something that came up recently:

IPaddr2 cloning utilizes the iptables "clusterip" feature, which has
been deprecated in the Linux kernel since 2015. IPaddr2 cloning
therefore must be considered deprecated as well. (Using it for a single
floating IP is still fully supported.)

IPaddr2 could be modified to use a newer iptables capability called
"xt_cluster", but someone would have to volunteer to do that as it's
not a priority.

--
Ken Gaillot
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
29.11.2019 17:46, Jan Pokorný wrote:
> On 27/11/19 20:13 +, matt_murd...@amat.com wrote:
>> I finally understand that there is a Apache Resource for Pacemaker
>> that assigns a single virtual ipaddress that "floats" between two
>> nodes as in webservers.
>> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/ch-service-haaa
>>
>> Can generic applications use this same resource type or is there an
>> API to use to create a floating ip or a generic resource to use?
>
> Be assured generic applications can use the same "floating IP
> address", effectively IPAddr2 resource (particular instances
> thereof) to their own benefit. The only crux that can be seen here
> stems from loose started/stopped kind of inter-resource (putting
> resource-node relations aside) integration facilitated by pacemaker,
> imposing these constraints for a generic/convenient use case:
>
> 1. the resource relying on the particular floating IP instance
>    needs to start at later moment than this IP instance
>    (it could miss listening on the newly/at later point appearing
>    IP otherwise)
>
> 2. the resource relying on the particular floating IP instance,
>    in order to retain enough configuration flexibility, must _not_
>    be restricted regarding where to listen (bind the server side
>    socket) at, for several reasons:
>
>    - different names of the interfaces across the nodes to
>      appoint the "externally provided service" network
>
>    - fragile, possibly downright cluster-management hidden
>      interdependencies whereby two parts of the overall
>      configuration must exactly agree on the address to bind at,
>      for given point in time
>
>    apparently, this approach is suboptimal for constraining the
>    allowed data paths (think, information security)
>
> Btw. As a slight paradox, rgmanager used for HA resource clustering
> in RHEL in the past allowed for natural resource "composability"
> (combinability, stackability) with a straightforward propagation
> of configuration values in the hierarchical composition of the
> resources -- it would then be enough if a floating IP dependent
> resource explicitly referred to IP to be borrowed from the
> cluster-tracked configuration of its hierarchically preceding
> "virtual IP" resource instance, avoiding thus hindrance stated
> in item 2. (item 1. remains rather universal). (Similar thing
> can, of course, be emulated with explicit pacemaker configuration
> introspection in the agent itself, however it feels rather dirty
> [resource agents would preferably avoid back-channel introspection
> of their own runners as much as possible, it'd break the rule
> of loose one-way coupling] in comparison to said built-in
> mechanism.) [*]
>
>> In other HA system, Oracle Solaris Cluster, HPUX Service Guard, IBM
>> PowerHA, they provide a "SharedAddress" resource type for
>> applications to use.
>
> I suppose our ocf:heartbeat:IPAddr2 resource agent is a direct
> equivalent, but don't have the knowledge of these other products.

At least in Oracle Solaris Cluster, Shared Address is used in a
"scalable resource group" and effectively implements load balancing
across multiple nodes, where each node answers requests on a single
virtual address. The RH document mentioned earlier actually describes
a typical failover cluster (with a single IP address floating between
two nodes), so I find the reference to SharedAddress rather confusing
here.

>> I am getting confused by the Clone feature, the virtualized feature,
>> and now the Apache resource as to how they all differ.
>
> "Clone" feature for IPAddr2 is actually sort of an overloading that
> agent with an alternative functionality -- trivial low-level load
> balancing. You can ignore that if you don't need any such.

I would say IPaddr2 in clone mode does something similar to
SharedAddress.

> Regarding "virtualized", virtual and floating are being used
> interchangeably to refer to said "IP address" resource agent
> instances.
>
> Of course, you then have various other contexts of "virtualized",
> you can have virtual machines (VMs) as resources managed by
> pacemaker, your cluster can consist of a set of VMs rather than
> set of bare metal machines, "remote" instances of pacemaker can
> be detached in VMs, and so forth.
>
>> If this isn't right group, let me know, and be kind, im just trying
>> to get something working and make recommendations to my developers.
>
> This venue is a spot-on, welcome :-)
>
> [*] I've touched this topic slightly in the past:
> https://github.com/ClusterLabs/resource-agents/issues/1304#issuecomment-473525495
> https://github.com/ClusterLabs/OCF-spec/issues/22
Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons
On 27/11/19 20:13 +, matt_murd...@amat.com wrote:
> I finally understand that there is a Apache Resource for Pacemaker
> that assigns a single virtual ipaddress that "floats" between two
> nodes as in webservers.
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/ch-service-haaa
>
> Can generic applications use this same resource type or is there an
> API to use to create a floating ip or a generic resource to use?

Be assured generic applications can use the same "floating IP
address", effectively the IPAddr2 resource (particular instances
thereof), to their own benefit. The only crux that can be seen here
stems from the loose started/stopped kind of inter-resource (putting
resource-node relations aside) integration facilitated by pacemaker,
imposing these constraints for a generic/convenient use case:

1. the resource relying on the particular floating IP instance
   needs to start at a later moment than this IP instance
   (otherwise it could miss listening on an IP that only appears
   later)

2. the resource relying on the particular floating IP instance,
   in order to retain enough configuration flexibility, must _not_
   be restricted regarding where to listen (bind the server side
   socket) at, for several reasons:

   - different names of the interfaces across the nodes to
     appoint the "externally provided service" network

   - fragile, possibly downright cluster-management hidden
     interdependencies whereby two parts of the overall
     configuration must exactly agree on the address to bind at,
     for a given point in time

   apparently, this approach is suboptimal for constraining the
   allowed data paths (think, information security)

Btw. As a slight paradox, rgmanager, used for HA resource clustering
in RHEL in the past, allowed for natural resource "composability"
(combinability, stackability) with a straightforward propagation
of configuration values in the hierarchical composition of the
resources -- it would then be enough if a floating IP dependent
resource explicitly referred to the IP to be borrowed from the
cluster-tracked configuration of its hierarchically preceding
"virtual IP" resource instance, thus avoiding the hindrance stated
in item 2. (item 1. remains rather universal). (A similar thing
can, of course, be emulated with explicit pacemaker configuration
introspection in the agent itself, however it feels rather dirty
[resource agents would preferably avoid back-channel introspection
of their own runners as much as possible, it'd break the rule
of loose one-way coupling] in comparison to said built-in
mechanism.) [*]

> In other HA system, Oracle Solaris Cluster, HPUX Service Guard, IBM
> PowerHA, they provide a "SharedAddress" resource type for
> applications to use.

I suppose our ocf:heartbeat:IPAddr2 resource agent is a direct
equivalent, but I don't have the knowledge of these other products.

> I am getting confused by the Clone feature, the virtualized feature,
> and now the Apache resource as to how they all differ.

The "Clone" feature for IPAddr2 is actually sort of an overloading of
that agent with an alternative functionality -- trivial low-level load
balancing. You can ignore that if you don't need any such.

Regarding "virtualized", virtual and floating are being used
interchangeably to refer to said "IP address" resource agent
instances.

Of course, you then have various other contexts of "virtualized":
you can have virtual machines (VMs) as resources managed by
pacemaker, your cluster can consist of a set of VMs rather than
a set of bare metal machines, "remote" instances of pacemaker can
be detached in VMs, and so forth.

> If this isn't right group, let me know, and be kind, im just trying
> to get something working and make recommendations to my developers.

This venue is spot-on, welcome :-)

[*] I've touched this topic slightly in the past:
https://github.com/ClusterLabs/resource-agents/issues/1304#issuecomment-473525495
https://github.com/ClusterLabs/OCF-spec/issues/22

--
Jan (Poki)
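Item 1 in the message above (start the application only after its floating IP) is usually expressed with ordering and colocation constraints. The following dry-run sketch assumes the cluster is managed with the pcs shell; the resource names vip and myapp, the systemd unit, the address and the helper name floating_ip_setup are all hypothetical. Per item 2, the application itself would listen on the wildcard address instead of being configured with the floating IP.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
RUN=${RUN-echo}

# floating_ip_setup IP_RESOURCE APP_RESOURCE ADDRESS  (hypothetical helper)
floating_ip_setup() {
    vip=$1; app=$2; addr=$3
    # The floating address, managed by the IPaddr2 agent.
    $RUN pcs resource create "$vip" ocf:heartbeat:IPaddr2 \
        ip="$addr" cidr_netmask=24
    # A generic application, here assumed to be a systemd service.
    $RUN pcs resource create "$app" "systemd:$app"
    # Keep the application on the node that holds the address ...
    $RUN pcs constraint colocation add "$app" with "$vip" INFINITY
    # ... and start it only after the address is up.
    $RUN pcs constraint order start "$vip" then start "$app"
}

floating_ip_setup vip myapp 192.168.122.101
```

Putting both resources into one resource group would have a similar effect, since group members are colocated and started in the listed order.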