I was able to determine that the mon. key was not removed. My mon nodes were 
stuck in a peering state because the new mon node was trying to use the 15.2.8 
image instead of the 16.2.4 image. This caused a problem because during a 
recent Octopus upgrade I had set auth_allow_insecure_global_id_reclaim to 
false, so the new mon node couldn't authenticate and join. Once I stopped the 
new mon node, the cluster was able to recover.
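
For anyone hitting something similar, this is roughly how I'd confirm the 
image mismatch and the auth setting, and stop the offending daemon (a sketch 
only; the mon name "ether" is from my cluster, and yours will differ):

```shell
# Show the container image each mon daemon is actually running:
ceph orch ps --daemon-type mon

# Confirm whether insecure global_id reclaim is still allowed:
ceph config get mon auth_allow_insecure_global_id_reclaim

# Stop the misbehaving mon so the remaining mons can form quorum:
ceph orch daemon stop mon.ether
```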

I believe the reason it was trying to use the 15.2.8 image is that that's the 
version the cluster was on when I added my first arm64 nodes to my all-x86_64 
cluster. I wasn't able to properly complete any upgrades after that, which 
means the global container image name was never updated.
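
Checking and fixing that looks roughly like this, assuming the cluster-wide 
`container_image` config option is what cephadm consults for new daemons (the 
image URL below is an example; substitute the one for your target release):

```shell
# Show the image cephadm will deploy new daemons with:
ceph config get global container_image

# Point it at the Pacific image so new daemons match the rest of the cluster:
ceph config set global container_image quay.io/ceph/ceph:v16.2.4
```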

Bryan

On Jun 1, 2021, at 9:38 AM, Bryan Stillwell 
<bstillw...@godaddy.com> wrote:

This morning I tried adding a mon node to my home Ceph cluster with the 
following command:

ceph orch daemon add mon ether


This seemed to work at first, but then it decided to remove it fairly quickly, 
which broke the cluster because the mon. keyring was also removed:

2021-06-01T14:16:11.523210+0000 mgr.paris.glbvov [INF] Deploying daemon 
mon.ether on ether
2021-06-01T14:16:43.621759+0000 mgr.paris.glbvov [INF] Safe to remove 
mon.ether: not in monmap (['paris', 'excalibur'])
2021-06-01T14:16:43.622135+0000 mgr.paris.glbvov [INF] Removing monitor ether 
from monmap...
2021-06-01T14:16:43.641365+0000 mgr.paris.glbvov [INF] Removing daemon 
mon.ether from ether
2021-06-01T14:16:46.610283+0000 mgr.paris.glbvov [INF] Removing key for mon.


Digging into this, it seems like this line might need to check for 'mon.' and 
not 'mon':

https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/services/cephadmservice.py#L486
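
To illustrate what I think is going on (a toy sketch, not the real cephadm 
code): every mon daemon authenticates with the single shared "mon." entity, 
so if daemon removal derives an auth entity from the daemon and removes it, 
tearing down one mon strips the key every monitor depends on.

```shell
# Toy sketch: map a daemon name to the auth entity that would be removed.
auth_entity_for() {
  daemon="$1"                # e.g. "mon.ether"
  case "$daemon" in
    mon.*) echo "mon." ;;    # every mon maps to the one shared key
    *)     echo "$daemon" ;; # other daemon types have their own entity
  esac
}

auth_entity_for "mon.ether"  # prints "mon." -- removing this entity
                             # deletes the shared monitor keyring
auth_entity_for "osd.3"      # prints "osd.3" -- removal is harmless
```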


Anyway, does anyone know how to import the mon. keyring again once it has been 
removed?

Thanks,
Bryan

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
