Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed

2020-02-20 Thread Rasca Gmelch
On 20.02.20 at 15:14, Rafael David Tinoco wrote:
> 
>> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
>> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
>> to Ubuntu 16.04 but it failed:
> 
> Quick question, perhaps unimportant to this forum, but since this is a
> Samba HA setup, why update to 16.04 and not to 18.04? I know Xenial
> is still supported until 2024 but, as you're taking the chance to
> migrate, why not the latest LTS version (Bionic)?

The idea was to take a small step first while the cluster is
still running, updating node by node (just 2) from
corosync 2.3.3 to 2.3.5 and pacemaker 1.1.10 to 1.1.14.
These minor updates looked feasible to me, but I overlooked
the upstart/systemd problem.

If we have to shut down the cluster anyway, I think
we will wait until April and then do an upgrade
to 20.04 LTS.


> And you will understand their relation in the following wiki:
> https://wiki.debian.org/MaintainerScripts

Thanks for the notes and links. But during the upgrade I did
not see any errors.

> OR the upgrade was not smooth with regard to the config options (you were
> using) and their compatibility.
> 

After manually adding the nodelist, corosync at least
runs 'normally':

corosync-quorumtool -sli

Membership information
--
Nodeid  Votes Name
1084766053  1 192.168.55.101
1084766054  1 192.168.55.102 (local)

I see the same on both servers; only the "(local)" tag
is on the other IP, which is what I would expect.
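The node IDs in this output look auto-generated rather than set by hand. A minimal sketch, assuming corosync derived each ID from the node's ring0 IPv4 address with the top bit cleared (the behaviour behind corosync's `clear_node_high_bit` option); `ip_to_nodeid` is a hypothetical helper, not a corosync tool. For 192.168.55.101 it yields 1084766053, matching srv1 above, so it can be used to check that any `nodeid:` written into a hand-made nodelist agrees with the IDs the old node already uses.

```shell
#!/bin/sh
# Hypothetical helper: reproduce an auto-generated corosync node ID from an
# IPv4 address, assuming the top bit is cleared (clear_node_high_bit).
ip_to_nodeid() {
  old_ifs=$IFS
  IFS=.
  # Split the dotted quad into the function's positional parameters.
  set -- $1
  IFS=$old_ifs
  # Pack the four octets into one 32-bit value, then clear bit 31.
  echo $(( (($1 << 24) | ($2 << 16) | ($3 << 8) | $4) & 0x7fffffff ))
}

ip_to_nodeid 192.168.55.101   # prints 1084766053
ip_to_nodeid 192.168.55.102   # prints 1084766054
```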

Does that mean all these errors and the wrong state information
come from pacemaker and not from corosync?

>> srv2
>> Last updated: Wed Feb 19 17:25:14 2020   Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
>> Stack: corosync
>> Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
>> 2 nodes and 9 resources configured
>>
>> Node srv2: standby
>> OFFLINE: [ srv1 ]
>>
>> Full list of resources:
>>
>> Resource Group: samba_daemons
>> samba-nmbd (upstart:nmbd): Stopped
>> [..]
>>
>> Failed Actions:
>> * samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5, status=Not installed, exitreason='none',
>>     last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
>> [..]
>>
>> According to the logs it looks like the service (e.g. nmbd) is not
>> available (maybe because of "upstart:nmbd"). How do I change this
>> configuration in pacemaker? I want to change it to "service" instead
>> of "upstart". I hope this will fix at least the service problems.
>>
>>   crm configure primitive smbd ..
>> gives me:
>>   ERROR: smbd: id is already in use.

I figured out that I can use "crm configure edit" to change
the setup.

> A bunch of things to notice from these messages:
> 
> - Trusty used "upstart" as its init system
> - Xenial uses systemd as its init system
> - It looks to me you're using "upstart" resource agent
> - In Xenial you would have to use systemd resource agent
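Once both nodes run 16.04, the class change described above can be made resource by resource with `crm configure edit`, or in one pass on a saved copy of the configuration. A minimal sketch, assuming the Xenial systemd units keep the old upstart job names (`nmbd`, `smbd`); `upstart_to_systemd` is a hypothetical helper. Review its output and try it on the VirtualBox pair before loading it with `crm configure load update`. While srv1 is still on 14.04 it presumably cannot run systemd-class resources, so this only fits after both nodes are upgraded.

```shell
#!/bin/sh
# Hypothetical helper: read `crm configure show` output on stdin and switch
# every primitive from the upstart class to the systemd class.
# Assumes the systemd unit names match the old upstart job names.
upstart_to_systemd() {
  sed 's/\([[:space:]]\)upstart:/\1systemd:/g'
}

# Typical use on a cluster node (review cib-new.crm before loading it):
#   crm configure show > cib.crm
#   upstart_to_systemd < cib.crm > cib-new.crm
#   crm configure load update cib-new.crm
```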

According to the Pacemaker 1.1 manual, chapter 5.2.5,
I can use the 'service' resource agent class instead of systemd
or upstart. But when I try that with "configure edit", I get
an error like "no such resource agent". I did this in
a VirtualBox test environment with Ubuntu 14.04.
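Whether a class such as `service` is usable depends on what the installed Pacemaker (and crmsh) build supports, which may explain the "no such resource agent" error on 14.04. Pacemaker's `crm_resource --list-standards` lists the classes the local node supports, and `crm ra classes` shows crmsh's view. A minimal sketch, assuming that output has one class per line; `report_init_classes` is a hypothetical helper that reads it on stdin so it can be checked without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: given `crm_resource --list-standards` output on stdin
# (assumed one class per line), report which init-related classes exist.
report_init_classes() {
  classes=$(cat)
  for c in upstart systemd service lsb; do
    # -x: match whole lines only, so a class name is not matched by a substring.
    if printf '%s\n' "$classes" | grep -qx -- "$c"; then
      echo "$c: available"
    else
      echo "$c: missing"
    fi
  done
}

# On a node:  crm_resource --list-standards | report_init_classes
```

Running this on both VMs before and after the upgrade shows which class names a resource definition may use on each side.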

>> Any suggestions, ideas? Is there a nice HowTo for this upgrade situation?
> 
> Yes
> 
> 1) Stop what you are doing and do it from the ground up.
> 
> 2) Take 1 of the servers and configure it appropriately using the proper
> resource agent. Before configuring the resources, make sure the rings
> are in good shape and the cluster has the proper votes.
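For the "rings in good shape, proper votes" check, corosync ships `corosync-cfgtool -s` (ring status) and `corosync-quorumtool -s` (quorum state and vote counts). A minimal sketch, assuming `corosync-quorumtool -s` prints a line of the form `Quorate: Yes`; `is_quorate` is a hypothetical helper reading that output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if `corosync-quorumtool -s` output on
# stdin reports the partition as quorate (assumed "Quorate: Yes" line).
is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes'
}

# On a node:
#   corosync-cfgtool -s      # status of each ring
#   corosync-quorumtool -s | is_quorate && echo "quorate" || echo "NOT quorate"
```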

Thanks for all these notes and your two cents, but I'm not experienced
enough to do this while the production node is online.
I have to do more tests with my two VirtualBox VMs to become more
familiar with the corosync/pacemaker setup.

Regards,
 Rasca
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Upgrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed

2020-02-20 Thread Rasca Gmelch
On 19.02.20 at 19:20, Strahil Nikolov wrote:
> On February 19, 2020 6:31:19 PM GMT+02:00, Rasca wrote:
>> Hi,
>>
>> we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba,
>> Corosync and Pacemaker from the Ubuntu repos. We wanted to update
>> to Ubuntu 16.04 but it failed:
>>
>> I checked the versions before and because of just minor updates
>> of corosync and pacemaker I thought it should be possible to
>> update node by node.
>>
>> * Put srv2 into standby
>> * Upgraded srv2 to Ubuntu 16.04 with reboot and so on
>> * Added a nodelist to corosync.conf because it looked
>>  like corosync on srv2 didn't know the names of the
>>  node ids anymore
>>
>> But it still does not work on srv2. srv1 (the active
>> server with Ubuntu 14.04) is fine. It looks like
>> it's an upstart/systemd issue, but maybe even more.
>> Why does srv1 say UNCLEAN about srv2? On srv2 I see that
>> corosync sees both systems, but srv2 says srv1 is
>> OFFLINE!?
>>
>> crm status
>>
>>
>> srv1
>> Last updated: Wed Feb 19 17:22:03 2020
>> Last change: Tue Feb 18 11:05:47 2020 via crm_attribute on srv2
>> Stack: corosync
>> Current DC: srv1 (1084766053) - partition with quorum
>> Version: 1.1.10-42f2063
>> 2 Nodes configured
>> 9 Resources configured
>>
>>
>> Node srv2 (1084766054): UNCLEAN (offline)
>> Online: [ srv1 ]
>>
>> Resource Group: samba_daemons
>> samba-nmbd   (upstart:nmbd): Started srv1
>> [..]
>>
>>
>> srv2
>> Last updated: Wed Feb 19 17:25:14 2020   Last change: Tue Feb 18 18:29:29 2020 by hacluster via crmd on srv2
>> Stack: corosync
>> Current DC: srv2 (version 1.1.14-70404b0) - partition with quorum
>> 2 nodes and 9 resources configured
>>
>> Node srv2: standby
>> OFFLINE: [ srv1 ]

I still don't understand how corosync and pacemaker divide the work. Which part is
responsible for this "OFFLINE" status? I don't know where to
dig deeper into this mismatch (see the lines above, where srv1
is listed as "Online").

>>
>> Full list of resources:
>>
>> Resource Group: samba_daemons
>> samba-nmbd   (upstart:nmbd): Stopped
>> [..]
>>
>> Failed Actions:
>> * samba-nmbd_monitor_0 on srv2 'not installed' (5): call=5, status=Not installed, exitreason='none',
>>     last-rc-change='Wed Feb 19 14:13:20 2020', queued=0ms, exec=1ms
>> [..]

According to the logs it looks like the service (e.g. nmbd) is not
available (maybe because of "upstart:nmbd"). How do I change this
configuration in pacemaker? I want to change it to "service" instead
of "upstart". I hope this will fix at least the service problems.

  crm configure primitive smbd ..
gives me:
  ERROR: smbd: id is already in use.

>>
>> Any suggestions, ideas? Is there a nice HowTo for this upgrade situation?
>>
>> Regards,
>> Rasca

> Are you sure that there is no cluster protocol mismatch?
> 
> A major OS upgrade (even if supported by the vendor) must be done offline
> (with proper testing in advance).
> 
> What happens when you upgrade the other node, or when you roll back the
> upgrade?
> 
> Best Regards,
> Strahil Nikolov

A protocol mismatch in corosync or in pacemaker? corosync-cmapctl shows that
srv1 and srv2 are both members. In the corosync config I have:

service {
   ver: 0
   name: pacemaker
}

What about this "ver: 0"? Maybe that's wrong, even for Ubuntu
14.04? The configuration itself was designed under Ubuntu 12.04. Maybe
we forgot to change this parameter when we upgraded from 12.04 to
Ubuntu 14.04 some years ago?
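As far as I know, the `service { name: pacemaker ... }` block quoted above only matters to corosync 1.x, where Pacemaker ran as a corosync plugin and `ver: 0` meant corosync itself started Pacemaker. Corosync 2.x (14.04 already ships 2.3.3) dropped that plugin interface and Pacemaker runs as its own daemon, consistent with "Stack: corosync" in the status output, so the block is most likely a leftover from 12.04 rather than the cause. A minimal sketch for spotting such leftovers, assuming the block is written as `service {` on one line; `find_service_blocks` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers where a "service {" block
# starts in a corosync.conf read on stdin (a corosync 1.x plugin-era
# construct that corosync 2.x is assumed not to use).
find_service_blocks() {
  awk '/^[ \t]*service[ \t]*[{]/ { print NR }'
}

# On a node:  find_service_blocks < /etc/corosync/corosync.conf
```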


Thx+Regards,
 Rasca