After a successful upgrade of a Ceph cluster from 16.2.7 to 16.2.11, I needed 
to downgrade it back to 16.2.7 as I found an issue with the new version. 

I expected that running the downgrade with `ceph orch upgrade start 
--ceph-version 16.2.7` would work fine. However, it got stuck right after 
the downgrade of the first MGR daemon: the downgraded daemon is no longer 
able to load the cephadm module, and any `ceph orch` command fails with the 
following error:

$ ceph orch ps
Error ENOENT: Module not found
The downgrade process is therefore blocked. 

These are the logs of the MGR when issuing the command:

Mar 28 12:13:15 astano03 
ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 
2023-03-28T10:13:15.557+0000 7f828fe8c700  0 log_channel(audit) log [DBG] : 
from='client.3136173 -' entity='client.admin' cmd=[{"prefix": "orch ps", 
"target": ["mon-mgr", ""]}]: dispatch
Mar 28 12:13:15 astano03 
ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 
2023-03-28T10:13:15.558+0000 7f829068d700  0 [orchestrator DEBUG root] _oremote 
orchestrator -> cephadm.list_daemons(*(None, None), **{'daemon_id': None, 
'host': None, 'refresh': False})
Mar 28 12:13:15 astano03 
ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 
2023-03-28T10:13:15.558+0000 7f829068d700 -1 no module 'cephadm'
Mar 28 12:13:15 astano03 
ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 
2023-03-28T10:13:15.558+0000 7f829068d700  0 [orchestrator DEBUG root] _oremote 
orchestrator -> cephadm.get_feature_set(*(), **{})
Mar 28 12:13:15 astano03 
ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 
2023-03-28T10:13:15.558+0000 7f829068d700 -1 no module 'cephadm'
Mar 28 12:13:15 astano03 
ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 
2023-03-28T10:13:15.558+0000 7f829068d700 -1 mgr.server reply reply (2) No such 
file or directory Module not found

Other interesting MGR logs are:
 2023-03-28T11:05:59.519+0000 7fcd16314700  4 mgr get_store get_store key: 
 2023-03-28T11:05:59.519+0000 7fcd16314700 -1 mgr load Failed to construct 
class in 'cephadm'
 2023-03-28T11:05:59.519+0000 7fcd16314700 -1 mgr load Traceback (most recent 
call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 450, in __init__
    self.upgrade = CephadmUpgrade(self)
  File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 111, in __init__
    self.upgrade_state: Optional[UpgradeState] = 
  File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 92, in from_json
    return cls(**c)
TypeError: __init__() got an unexpected keyword argument 'daemon_types'

 2023-03-28T11:05:59.521+0000 7fcd16314700 -1 mgr operator() Failed to run 
module in active mode ('cephadm')
These errors seem to be related to the new staggered upgrade feature.
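
To illustrate what I think the traceback shows, here is a minimal sketch of the failure mode. It is not the actual cephadm code: the class `OldUpgradeState`, its fields, and the stored values are hypothetical stand-ins. The assumption is that the 16.2.11 MGR persisted an upgrade state containing a `daemon_types` key, and that the 16.2.7 class rebuilds its state with `cls(**c)` without knowing that key.

```python
import json
from typing import Any, Dict


class OldUpgradeState:
    """Hypothetical stand-in for a state class that predates 'daemon_types'."""

    def __init__(self, target_name: str, progress_id: str) -> None:
        self.target_name = target_name
        self.progress_id = progress_id

    @classmethod
    def from_json(cls, c: Dict[str, Any]) -> "OldUpgradeState":
        # Same pattern as in the traceback: every stored key becomes a keyword argument.
        return cls(**c)


# Hypothetical state as a newer manager might have persisted it (values are made up).
stored = json.dumps({
    "target_name": "example-image:16.2.7",
    "progress_id": "example-id",
    "daemon_types": None,  # key the older class does not know about
})


def try_load(raw: str) -> str:
    """Try to rebuild the state and report whether it worked."""
    try:
        OldUpgradeState.from_json(json.loads(raw))
        return "loaded"
    except TypeError as exc:
        return f"failed: {exc}"


result = try_load(stored)
print(result)
```

If that assumption holds, the older module would fail in its constructor every time it reads the stored state, which would match the "Failed to construct class in 'cephadm'" message above.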

Please note that everything was working fine with version 16.2.7 before the upgrade.

I am currently stuck in this situation, with only one working MGR daemon, 
the one still on version 16.2.11:

[root@astano01 ~]# ceph orch ps | grep mgr
mgr.astano02.mzmewn  astano02  *:8443,9283  running (5d)  43s ago   2y  455M  -  16.2.11  7a63bce27215  e2d7806acf16
mgr.astano03.qtzccn  astano03  *:8443,9283  running (3m)  22s ago  95m  383M  -  16.2.7   463ec4b1fdc0  cc0d88864fa1

Has anyone already faced this issue, or does anyone know how I can make the 
16.2.7 MGR load the cephadm module correctly?

Thanks in advance for any help!
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
