Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed
On 20.02.20 at 15:14, Rafael David Tinoco wrote:
>> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
>> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
>> to Ubuntu 16.04 but it failed:
>
> Quick question, perhaps unimportant to this forum, but, since this is
> a samba HA setup, why update to 16.04 and not to 18.04? I know Xenial
> is still supported until 2024 but, as you're taking the chance to
> migrate, why not the latest LTS version (bionic)?

The idea was to take a small step first while the cluster is still
running - updating node by node (just 2) from corosync 2.3.3 to 2.3.5
and pacemaker 1.1.10 to 1.1.14. These minor updates looked possible to
me - but I overlooked the upstart/systemd problem. If we have to shut
down the cluster anyway, I think we will wait until April and then do
an upgrade to 20.04 LTS.

> And you will understand their relation in the following wiki:
> https://wiki.debian.org/MaintainerScripts

Thanks for the notes and links. But during the upgrade I did not see
any errors.

> OR the upgrade was not smooth in regards to config options (you were
> using) and its compatibility.

After manually adding the nodelist, corosync at least runs 'normally':

corosync-quorumtool -sli

Membership information
----------------------
    Nodeid      Votes Name
1084766053          1 192.168.55.101
1084766054          1 192.168.55.102 (local)

This I see on both servers; only the "(local)" tag is on the other IP,
which is what I would expect. Does this mean all these errors and the
wrong state information are the responsibility of pacemaker and not
corosync?
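[Editor's note: for comparison, a minimal nodelist stanza matching the membership output above might look like the sketch below. Only the node IDs and ring addresses come from the quorumtool output; the layout is the usual corosync 2.x corosync.conf convention.]

```
nodelist {
    node {
        nodeid: 1084766053
        ring0_addr: 192.168.55.101
    }
    node {
        nodeid: 1084766054
        ring0_addr: 192.168.55.102
    }
}
```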
>> srv2
>> Last updated: Wed Feb 19 17:25:14 2020
>> Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
>> Stack: corosync
>> Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
>> 2 nodes and 9 resources configured
>>
>> Node srv2: standby
>> OFFLINE: [ srv1 ]
>>
>> Full list of resources:
>>
>>  Resource Group: samba_daemons
>>      samba-nmbd (upstart:nmbd): Stopped
>> [..]
>>
>> Failed Actions:
>> * samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5,
>>   status=Not installed, exitreason='none',
>>   last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
>> [..]
>>
>> According to the logs it looks like the service (e.g. nmbd) is not
>> available (maybe because of (upstart:nmbd)) - how do I change this
>> configuration in pacemaker? I want to change it to "service" instead
>> of "upstart". I hope this will fix at least the service problems.
>>
>> crm configure primitive smbd ..
>> gives me:
>> ERROR: smbd: id is already in use.

I figured out to use "crm configure edit" to change the setup.

> A bunch of things to notice from these messages:
>
> - Trusty used "upstart" as its init system
> - Xenial uses systemd as its init system
> - It looks to me you're using the "upstart" resource agent
> - In Xenial you would have to use the systemd resource agent

According to the pacemaker 1.1 manual, chapter 5.2.5, I can use the
'service' resource agent instead of systemd or upstart. But if I try
with "configure edit" I see an error like "no such resource agent". I
did this in a VirtualBox test environment with Ubuntu 14.04.

Any suggestions, ideas? Is there a nice HowTo for this upgrade
situation?

> Yes
>
> 1) stop what you are doing, do it from the ground up.
>
> 2) Take 1 of the servers and configure it appropriately using the
> proper resource agent. Before configuring the resources, make sure
> the rings are in good shape and the cluster has the proper votes.

Thanks for all these notes and cents, but I'm not experienced enough
to do this while the production node is online.
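[Editor's note: if the resource class really has to change from upstart to systemd, one low-risk approach is to dump the configuration, rewrite the class offline, and load it back. The sketch below only demonstrates the rewrite step on a throwaway file - the file name "samba.crm" and the exact primitive lines are assumptions, not taken from the thread; on a real node the dump would come from "crm configure show".]

```shell
# Create a stand-in for a "crm configure show" dump (assumed content)
cat > samba.crm <<'EOF'
primitive samba-nmbd upstart:nmbd op monitor interval=30s
primitive samba-smbd upstart:smbd op monitor interval=30s
EOF

# Rewrite the resource class from upstart to systemd in the dump
sed -i 's/upstart:/systemd:/g' samba.crm

cat samba.crm
# On the cluster node one would then load the edited file back,
# e.g. with "crm configure load replace samba.crm" (not run here).
```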
I have to do more tests with my two VirtualBox VMs to become more
familiar with the corosync/pacemaker setup.

Regards,
 Rasca

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed
Rasca Gmelch wrote:
> On 19.02.20 at 19:20, Strahil Nikolov wrote:
>> On February 19, 2020 6:31:19 PM GMT+02:00, Rasca wrote:
>>> Hi,
>>>
>>> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
>>> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
>>> to Ubuntu 16.04 but it failed:
>>>
>>> I checked the versions before, and because of just minor updates
>>> of corosync and pacemaker I thought it should be possible to
>>> update node by node.
>>>
>>> * Put srv2 into standby
>>> * Upgraded srv2 to Ubuntu 16.04 with reboot and so on
>>> * Added a nodelist to corosync.conf because it looked
>>>   like corosync on srv2 didn't know the names of the
>>>   node ids anymore
>>>
>>> But still it does not work on srv2. srv1 (the active
>>> server with Ubuntu 14.04) is fine. It looks like
>>> it's an upstart/systemd issue, but maybe even more.
>>> Why does srv1 say UNCLEAN about srv2? On srv2 I see
>>> corosync sees both systems. But srv2 says srv1 is
>>> OFFLINE!?
>>>
>>> crm status
>>>
>>> srv1
>>> Last updated: Wed Feb 19 17:22:03 2020
>>> Last change: Tue Feb 18 11:05:47 2020 via crm_attribute on srv2
>>> Stack: corosync
>>> Current DC: srv1 (1084766053) - partition with quorum
>>> Version: 1.1.10-42f2063
>>> 2 Nodes configured
>>> 9 Resources configured
>>>
>>> Node srv2 (1084766054): UNCLEAN (offline)
>>> Online: [ srv1 ]
>>>
>>>  Resource Group: samba_daemons
>>>      samba-nmbd (upstart:nmbd): Started srv1
>>> [..]
>>>
>>> srv2
>>> Last updated: Wed Feb 19 17:25:14 2020
>>> Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
>>> Stack: corosync
>>> Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
>>> 2 nodes and 9 resources configured
>>>
>>> Node srv2: standby
>>> OFFLINE: [ srv1 ]
>
> Still don't understand the concept of corosync/pacemaker. Which part
> is responsible for this "OFFLINE" statement? I don't know where to
> look deeper about this mismatch (see some lines above, where it says
> "Online" about srv1).
>>> Full list of resources:
>>>
>>>  Resource Group: samba_daemons
>>>      samba-nmbd (upstart:nmbd): Stopped
>>> [..]
>>>
>>> Failed Actions:
>>> * samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5,
>>>   status=Not installed, exitreason='none',
>>>   last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
>>> [..]
>
> According to the logs it looks like the service (e.g. nmbd) is not
> available (maybe because of (upstart:nmbd)) - how do I change this
> configuration in pacemaker? I want to change it to "service" instead
> of "upstart". I hope this will fix at least the service problems.
>
> crm configure primitive smbd ..
> gives me:
> ERROR: smbd: id is already in use.
>
>>> Any suggestions, ideas? Is there a nice HowTo for this upgrade
>>> situation?
>>>
>>> Regards,
>>>  Rasca
>
>> Are you sure that there is no cluster protocol mismatch?
>>
>> A major OS upgrade (even if supported by the vendor) must be done
>> offline (with proper testing in advance).
>>
>> What happens when you upgrade the other node, or when you roll back
>> the upgrade?
>>
>> Best Regards,
>> Strahil Nikolov
>
> Protocol mismatch of corosync or pacemaker? corosync-cmapctl shows
> that srv1 and srv2 are members.
>
> In the corosync config I have:
>
> service {
>     ver: 0
>     name: pacemaker
> }
>
> What about this "ver: 0"? Maybe that's wrong - even for Ubuntu 14.04?
> The configuration itself was designed under Ubuntu 12.04. Maybe we
> forgot to change this parameter when we upgraded from 12.04 to Ubuntu
> 14.04 some years before?

This is not used at all (it was used for the Pacemaker plugin for
OpenAIS/Corosync 1.x).

Honza

> Thx+Regards,
>  Rasca
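[Editor's note: since the stanza is unused with corosync 2.x, it can simply be deleted from corosync.conf. The sketch below demonstrates the edit on a throwaway sample file - the file name and the totem content are assumptions, not the real config from the thread.]

```shell
# Stand-in for /etc/corosync/corosync.conf (assumed minimal content)
cat > corosync.conf.sample <<'EOF'
totem {
    version: 2
}
service {
    ver: 0
    name: pacemaker
}
EOF

# Delete the obsolete one-level "service { ... }" block
sed -i '/^service {/,/^}/d' corosync.conf.sample

cat corosync.conf.sample
# On a real node, restart corosync afterwards so the change takes effect.
```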
Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed
>>> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
>>> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
>>> to Ubuntu 16.04 but it failed:

Quick question, perhaps unimportant to this forum, but, since this is
a samba HA setup, why update to 16.04 and not to 18.04? I know Xenial
is still supported until 2024 but, as you're taking the chance to
migrate, why not the latest LTS version (bionic)?

>>> I checked the versions before, and because of just minor updates
>>> of corosync and pacemaker I thought it should be possible to
>>> update node by node.

You can check all versions with a tool called "rmadison":

 corosync|1.4.2-2|precise
 corosync|1.4.2-2ubuntu0.2|precise-updates
 corosync|2.3.3-1ubuntu1|trusty
-corosync|2.3.3-1ubuntu4|trusty-updates
 corosync|2.3.5-3ubuntu1|xenial
 corosync|2.3.5-3ubuntu2.3|xenial-security
+corosync|2.3.5-3ubuntu2.3|xenial-updates
 corosync|2.4.3-0ubuntu1|bionic
 corosync|2.4.3-0ubuntu1.1|bionic-security
 corosync|2.4.3-0ubuntu1.1|bionic-updates
 corosync|2.4.4-3|disco
 corosync|3.0.1-2ubuntu1|eoan
 corosync|3.0.2-1ubuntu2|focal

 pacemaker|1.1.6-2ubuntu3|precise
 pacemaker|1.1.6-2ubuntu3.3|precise-updates
 pacemaker|1.1.10+git20130802-1ubuntu2|trusty
 pacemaker|1.1.10+git20130802-1ubuntu2.4|trusty-security
-pacemaker|1.1.10+git20130802-1ubuntu2.5|trusty-updates
 pacemaker|1.1.14-2ubuntu1|xenial
 pacemaker|1.1.14-2ubuntu1.6|xenial-security
+pacemaker|1.1.14-2ubuntu1.6|xenial-updates
 pacemaker|1.1.18-0ubuntu1|bionic
 pacemaker|1.1.18-0ubuntu1.1|bionic-security
 pacemaker|1.1.18-0ubuntu1.1|bionic-updates
 pacemaker|1.1.18-2ubuntu1|disco
 pacemaker|1.1.18-2ubuntu1.19.04.1|disco-security
 pacemaker|1.1.18-2ubuntu1.19.04.1|disco-updates
 pacemaker|2.0.1-4ubuntu2|eoan
 pacemaker|2.0.1-5ubuntu5|focal

>>> * Put srv2 into standby
>>> * Upgraded srv2 to Ubuntu 16.04 with reboot and so on
>>> * Added a nodelist to corosync.conf because it looked
>>>   like corosync on srv2 didn't know the names of the
>>>   node ids anymore

The Debian packaging upgrade execution path is likely not a topic for
this list (which targets the cluster software itself), but, since we
are here...

You can check the packaging scripts under the "/var/lib/dpkg/info/"
directory. Those are the files run when a package is uninstalled,
purged, reinstalled, etc. In my current environment, the important
files would be:

/var/lib/dpkg/info/pacemaker.conffiles
/var/lib/dpkg/info/pacemaker-common.conffiles
/var/lib/dpkg/info/pacemaker.postrm
/var/lib/dpkg/info/pacemaker-common.postrm
/var/lib/dpkg/info/pacemaker.prerm
/var/lib/dpkg/info/pacemaker-common.postinst
/var/lib/dpkg/info/pacemaker-cli-utils.postinst
/var/lib/dpkg/info/pacemaker.postinst
/var/lib/dpkg/info/corosync.prerm
/var/lib/dpkg/info/corosync.conffiles
/var/lib/dpkg/info/corosync.preinst
/var/lib/dpkg/info/corosync.postrm
/var/lib/dpkg/info/corosync.postinst

And you will understand their relation in the following wiki:
https://wiki.debian.org/MaintainerScripts

Section "Upgrading":
https://wiki.debian.org/MaintainerScripts?action=AttachFile&do=get&target=upgrade.png

I haven't explored your upgrade execution path deeply, but it sounds
like your theory is that either important configuration files got
purged during the package upgrade, OR the jump from

-corosync|2.3.3-1ubuntu4|trusty-updates
to
+corosync|2.3.5-3ubuntu2.3|xenial-updates

and

-pacemaker|1.1.10+git20130802-1ubuntu2.5|trusty-updates
to
+pacemaker|1.1.14-2ubuntu1.6|xenial-updates

was not smooth in regards to the config options (you were using) and
their compatibility.
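[Editor's note: as a quick plausibility check of those jumps, Debian-style version strings can be ordered locally with sort -V - a coreutils-only sketch; rmadison itself queries the Ubuntu archive over the network.]

```shell
# sort -V orders the version strings; the newer version sorts last
printf '%s\n' 'corosync 2.3.3-1ubuntu4' 'corosync 2.3.5-3ubuntu2.3' | sort -V
printf '%s\n' 'pacemaker 1.1.10+git20130802-1ubuntu2.5' \
              'pacemaker 1.1.14-2ubuntu1.6' | sort -V
```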
Checking corosync only, there were 26 commits related to config (at
least in a simple grep attempt):

$ git log v2.3.3..v2.3.5 --pretty=oneline --grep config

aabbace6 Log: Add logrotate configuration file
b9f5c290 votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
997074cc totemconfig: Check for duplicate nodeids
d77cec24 Handle adding and removing UDPU members atomically
8f284b26 Reset timer_problem_decrementer on fault
6449bea8 config: Ensure mcast address/port differs for rrp
70bd35fc config: Process broadcast option consistently
6c028d4d config: Make sure user doesn't mix IPv6 and IPv4
57539d1a man page: Improve description of token timeout
bb52fc27 Store configuration values used by totem to cmap
17488909 votequorum: Make qdev timeout in sync configurable
88dbb9f7 totemconfig: Make sure join timeout is less than consensus
3b8365e8 config: Fix typos
63bf0977 totemconfig: refactor nodelist_to_interface func
10c80f45 totemconfig: totem_config_get_ip_version
dc35bfae totemconfig: Free ifaddrs list
e3ffd4fe Implement config file testing mode
72cf15af votequorum: Do not process events during reload
c8e3f14f Make config.reload_in_progress key read only
d23ee6a3 upstart: Make job conf file configurable
7557fdec config: Allow dynamic change of token_coefficient
1f7e78ab init: Make init script configurable
9a8de87c totemconfig: Log errors on key change an
Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed
On 19.02.20 at 19:20, Strahil Nikolov wrote:
> On February 19, 2020 6:31:19 PM GMT+02:00, Rasca wrote:
>> Hi,
>>
>> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
>> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
>> to Ubuntu 16.04 but it failed:
>>
>> I checked the versions before, and because of just minor updates
>> of corosync and pacemaker I thought it should be possible to
>> update node by node.
>>
>> * Put srv2 into standby
>> * Upgraded srv2 to Ubuntu 16.04 with reboot and so on
>> * Added a nodelist to corosync.conf because it looked
>>   like corosync on srv2 didn't know the names of the
>>   node ids anymore
>>
>> But still it does not work on srv2. srv1 (the active
>> server with Ubuntu 14.04) is fine. It looks like
>> it's an upstart/systemd issue, but maybe even more.
>> Why does srv1 say UNCLEAN about srv2? On srv2 I see
>> corosync sees both systems. But srv2 says srv1 is
>> OFFLINE!?
>>
>> crm status
>>
>> srv1
>> Last updated: Wed Feb 19 17:22:03 2020
>> Last change: Tue Feb 18 11:05:47 2020 via crm_attribute on srv2
>> Stack: corosync
>> Current DC: srv1 (1084766053) - partition with quorum
>> Version: 1.1.10-42f2063
>> 2 Nodes configured
>> 9 Resources configured
>>
>> Node srv2 (1084766054): UNCLEAN (offline)
>> Online: [ srv1 ]
>>
>>  Resource Group: samba_daemons
>>      samba-nmbd (upstart:nmbd): Started srv1
>> [..]
>>
>> srv2
>> Last updated: Wed Feb 19 17:25:14 2020
>> Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
>> Stack: corosync
>> Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
>> 2 nodes and 9 resources configured
>>
>> Node srv2: standby
>> OFFLINE: [ srv1 ]

Still don't understand the concept of corosync/pacemaker. Which part
is responsible for this "OFFLINE" statement? I don't know where to
look deeper about this mismatch (see some lines above, where it says
"Online" about srv1).
>> Full list of resources:
>>
>>  Resource Group: samba_daemons
>>      samba-nmbd (upstart:nmbd): Stopped
>> [..]
>>
>> Failed Actions:
>> * samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5,
>>   status=Not installed, exitreason='none',
>>   last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
>> [..]

According to the logs it looks like the service (e.g. nmbd) is not
available (maybe because of (upstart:nmbd)) - how do I change this
configuration in pacemaker? I want to change it to "service" instead
of "upstart". I hope this will fix at least the service problems.

crm configure primitive smbd ..
gives me:
ERROR: smbd: id is already in use.

>> Any suggestions, ideas? Is there a nice HowTo for this upgrade
>> situation?
>>
>> Regards,
>>  Rasca

> Are you sure that there is no cluster protocol mismatch?
>
> A major OS upgrade (even if supported by the vendor) must be done
> offline (with proper testing in advance).
>
> What happens when you upgrade the other node, or when you roll back
> the upgrade?
>
> Best Regards,
> Strahil Nikolov

Protocol mismatch of corosync or pacemaker? corosync-cmapctl shows
that srv1 and srv2 are members.

In the corosync config I have:

service {
    ver: 0
    name: pacemaker
}

What about this "ver: 0"? Maybe that's wrong - even for Ubuntu 14.04?
The configuration itself was designed under Ubuntu 12.04. Maybe we
forgot to change this parameter when we upgraded from 12.04 to Ubuntu
14.04 some years before?

Thx+Regards,
 Rasca
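[Editor's note: the "(5)" in the failed monitor action above is the resource agent's exit code. A tiny sketch of the conventional OCF code-to-name mapping, to my knowledge of the OCF resource agent API (worth double-checking against the spec); code 5 is exactly the "not installed" case shown in the status output.]

```shell
# Map an OCF resource-agent exit code to its conventional name
ocf_code_name() {
  case "$1" in
    0) echo "OCF_SUCCESS" ;;
    1) echo "OCF_ERR_GENERIC" ;;
    2) echo "OCF_ERR_ARGS" ;;
    5) echo "OCF_ERR_INSTALLED" ;;  # agent or its binary "not installed"
    7) echo "OCF_NOT_RUNNING" ;;
    *) echo "unknown" ;;
  esac
}

ocf_code_name 5   # the code from the failed monitor above
```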
Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed
On February 19, 2020 6:31:19 PM GMT+02:00, Rasca wrote:
> Hi,
>
> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
> to Ubuntu 16.04 but it failed:
>
> I checked the versions before, and because of just minor updates
> of corosync and pacemaker I thought it should be possible to
> update node by node.
>
> * Put srv2 into standby
> * Upgraded srv2 to Ubuntu 16.04 with reboot and so on
> * Added a nodelist to corosync.conf because it looked
>   like corosync on srv2 didn't know the names of the
>   node ids anymore
>
> But still it does not work on srv2. srv1 (the active
> server with Ubuntu 14.04) is fine. It looks like
> it's an upstart/systemd issue, but maybe even more.
> Why does srv1 say UNCLEAN about srv2? On srv2 I see
> corosync sees both systems. But srv2 says srv1 is
> OFFLINE!?
>
> crm status
>
> srv1
> Last updated: Wed Feb 19 17:22:03 2020
> Last change: Tue Feb 18 11:05:47 2020 via crm_attribute on srv2
> Stack: corosync
> Current DC: srv1 (1084766053) - partition with quorum
> Version: 1.1.10-42f2063
> 2 Nodes configured
> 9 Resources configured
>
> Node srv2 (1084766054): UNCLEAN (offline)
> Online: [ srv1 ]
>
>  Resource Group: samba_daemons
>      samba-nmbd (upstart:nmbd): Started srv1
> [..]
>
> srv2
> Last updated: Wed Feb 19 17:25:14 2020
> Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
> Stack: corosync
> Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
> 2 nodes and 9 resources configured
>
> Node srv2: standby
> OFFLINE: [ srv1 ]
>
> Full list of resources:
>
>  Resource Group: samba_daemons
>      samba-nmbd (upstart:nmbd): Stopped
> [..]
>
> Failed Actions:
> * samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5,
>   status=Not installed, exitreason='none',
>   last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
> [..]
>
> Any suggestions, ideas? Is there a nice HowTo for this upgrade
> situation?
>
> Regards,
>  Rasca

Are you sure that there is no cluster protocol mismatch?

A major OS upgrade (even if supported by the vendor) must be done
offline (with proper testing in advance).

What happens when you upgrade the other node, or when you roll back
the upgrade?

Best Regards,
Strahil Nikolov
[ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed
Hi,

we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
Corosync and Pacemaker from the Ubuntu repos. We wanted to update to
Ubuntu 16.04 but it failed:

I checked the versions before, and because of just minor updates of
corosync and pacemaker I thought it should be possible to update node
by node.

* Put srv2 into standby
* Upgraded srv2 to Ubuntu 16.04 with reboot and so on
* Added a nodelist to corosync.conf because it looked
  like corosync on srv2 didn't know the names of the
  node ids anymore

But still it does not work on srv2. srv1 (the active server with
Ubuntu 14.04) is fine. It looks like it's an upstart/systemd issue,
but maybe even more. Why does srv1 say UNCLEAN about srv2? On srv2 I
see corosync sees both systems. But srv2 says srv1 is OFFLINE!?

crm status

srv1
Last updated: Wed Feb 19 17:22:03 2020
Last change: Tue Feb 18 11:05:47 2020 via crm_attribute on srv2
Stack: corosync
Current DC: srv1 (1084766053) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
9 Resources configured

Node srv2 (1084766054): UNCLEAN (offline)
Online: [ srv1 ]

 Resource Group: samba_daemons
     samba-nmbd (upstart:nmbd): Started srv1
[..]

srv2
Last updated: Wed Feb 19 17:25:14 2020
Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
Stack: corosync
Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 9 resources configured

Node srv2: standby
OFFLINE: [ srv1 ]

Full list of resources:

 Resource Group: samba_daemons
     samba-nmbd (upstart:nmbd): Stopped
[..]

Failed Actions:
* samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5,
  status=Not installed, exitreason='none',
  last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
[..]

Any suggestions, ideas? Is there a nice HowTo for this upgrade
situation?

Regards,
 Rasca