Hey Adam,

thanks for your fast reply.

That's a bit more invasive and risky than I was hoping for.
But if this is the only way, I guess we need to do this.

Would it be advisable to set some maintenance flags like noout, nobackfill,
and norebalance?
And maybe stop the ceph target on the host I'm re-adding to pause all daemons?

Best, Mathias

On 5/19/2022 8:14 PM, Adam King wrote:
cephadm just takes the hostname given in the "ceph orch host add" commands and 
assumes it won't change. The FQDN names (or whatever "ceph orch host ls" shows 
in any scenario) are from whatever input was given in those commands. Cephadm 
will even try to verify the hostname matches what is given when adding the 
host. As for where it is stored, we keep that info in the mon key store and it 
isn't meant to be manually updated (ceph config-key get mgr/cephadm/inventory). 
Although, there have occasionally been people running into issues related to a 
mismatch between an FQDN and a shortname. There's no built-in command for 
changing a hostname because of the expectation that it won't change. However, 
you should be able to fix this by removing and re-adding the host. E.g. "ceph 
orch host rm osd-mirror-1.our.domain.org" 
followed by "ceph orch host add osd-mirror-1 172.16.62.22 --labels rgw --labels 
osd". If you're on a late enough version that it requests you drain the host 
before we'll remove it (it was some pacific dot release, don't remember which 
one) you can pass --force to the host rm command. Generally it's not a good 
idea to remove hosts from cephadm's control while there are still cephadm 
deployed daemons on it like that but this is a special case. Anyway, removing 
and re-adding the host is the only (reasonable) way to change what it has 
stored for the hostname that I can remember.
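
A minimal sketch of the remove-and-re-add sequence described above, written as a small Python helper that only prints the two commands rather than running them. The function name `readd_commands` and the rule of taking the bare name as everything before the first dot are assumptions for illustration; the commands, the `--force` flag, and the repeated `--labels` option are the ones quoted in this message.

```python
import shlex


def readd_commands(fqdn, addr, labels, force=False):
    """Return the two 'ceph orch' command lines for re-registering a host.

    Hypothetical helper: it builds strings only and never calls ceph.
    """
    # Assumption: the bare host name is the part before the first dot.
    short = fqdn.split(".", 1)[0]

    rm = ["ceph", "orch", "host", "rm", fqdn]
    if force:
        # Needed on versions that ask for the host to be drained first.
        rm.append("--force")

    add = ["ceph", "orch", "host", "add", short, addr]
    for label in labels:
        add += ["--labels", label]

    return [shlex.join(rm), shlex.join(add)]


for line in readd_commands(
    "osd-mirror-1.our.domain.org", "172.16.62.22", ["rgw", "osd"], force=True
):
    print(line)
```

Printing the commands first leaves room to review them host by host before anything is actually removed.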

Let me know if that doesn't work,
 - Adam King

On Thu, May 19, 2022 at 1:41 PM Kuhring, Mathias 
<mathias.kuhr...@bih-charite.de> wrote:
Dear ceph users,

one of our clusters is complaining about plenty of stray hosts and
daemons. Pretty much all of them.

[WRN] CEPHADM_STRAY_HOST: 6 stray host(s) with 280 daemon(s) not managed by cephadm
     stray host osd-mirror-1 has 47 stray daemons: ['mgr.osd-mirror-1.ltmyyh', 'mon.osd-mirror-1', 'osd.1', ...]
     stray host osd-mirror-2 has 46 stray daemons: ['mon.osd-mirror-2', 'osd.0', ...]
     stray host osd-mirror-3 has 48 stray daemons: ['cephfs-mirror.osd-mirror-3.qzcuvv', 'mgr.osd-mirror-3', 'mon.osd-mirror-3', 'osd.101', ...]
     stray host osd-mirror-4 has 47 stray daemons: ['mds.cephfs.osd-mirror-4.omjlxu', 'mgr.osd-mirror-4', 'osd.103', ...]
     stray host osd-mirror-5 has 46 stray daemons: ['mgr.osd-mirror-5', 'osd.139', ...]
     stray host osd-mirror-6 has 46 stray daemons: ['mds.cephfs.osd-mirror-6.hobjsy', 'osd.141', ...]

It all seems to boil down to host names from `ceph orch host ls` not
matching with other configurations.

ceph orch host ls
HOST                         ADDR          LABELS       STATUS
osd-mirror-1.our.domain.org  172.16.62.22  rgw osd
osd-mirror-2.our.domain.org  172.16.62.23  rgw osd
osd-mirror-3.our.domain.org  172.16.62.24  rgw osd
osd-mirror-4.our.domain.org  172.16.62.25  rgw mds osd
osd-mirror-5.our.domain.org  172.16.62.32  rgw osd
osd-mirror-6.our.domain.org  172.16.62.33  rgw mds osd

hostname
osd-mirror-6

hostname -f
osd-mirror-6.our.domain.org

0|0[root@osd-mirror-6 ~]# ceph mon metadata | grep "\"hostname\""
         "hostname": "osd-mirror-1",
         "hostname": "osd-mirror-3",
         "hostname": "osd-mirror-2",

0|1[root@osd-mirror-6 ~]# ceph mgr metadata | grep "\"hostname\""
         "hostname": "osd-mirror-1",
         "hostname": "osd-mirror-3",
         "hostname": "osd-mirror-4",
         "hostname": "osd-mirror-5",
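
The mismatch visible in the outputs above can be checked mechanically. The following is a rough sketch, assuming the orchestrator host names and the `hostname` values from the daemon metadata have already been collected into plain Python lists; the function name `find_name_mismatches` is made up for illustration and the lists are copied by hand from the output in this message, not queried from a cluster.

```python
def find_name_mismatches(orch_hosts, daemon_hostnames):
    """Map each orchestrator host name to the bare name the daemons report,
    for hosts where only the bare name shows up in the daemon metadata."""
    reported = set(daemon_hostnames)
    mismatches = {}
    for name in orch_hosts:
        # Assumption: the bare host name is the part before the first dot.
        short = name.split(".", 1)[0]
        if name not in reported and short in reported:
            mismatches[name] = short
    return mismatches


# Host names as listed by "ceph orch host ls" above.
orch_hosts = [f"osd-mirror-{i}.our.domain.org" for i in range(1, 7)]

# "hostname" values seen in the mon and mgr metadata above (duplicates dropped).
daemon_hostnames = [
    "osd-mirror-1",
    "osd-mirror-3",
    "osd-mirror-2",
    "osd-mirror-4",
    "osd-mirror-5",
]

for fqdn, short in find_name_mismatches(orch_hosts, daemon_hostnames).items():
    print(f"{fqdn} is known to the daemons as {short}")
```

A host that runs neither a mon nor a mgr would not appear in this particular sample, so a fuller check would also need the metadata of the other daemon types.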


The documentation states that "cephadm demands that the name of host
given via `ceph orch host add` equals the output of `hostname` on remote
hosts.".

https://docs.ceph.com/en/latest/cephadm/host-management/#fully-qualified-domain-names-vs-bare-host-names

https://docs.ceph.com/en/octopus/cephadm/concepts/?#fully-qualified-domain-names-vs-bare-host-names

But it seems our cluster wasn't set up like this.

How can I now change the host names which were assigned when adding the
hosts with `ceph orch host add HOSTNAME`?

I can't seem to find any documentation on changing the host names which
are listed by `ceph orch host ls`.
All I can find is related to changing the actual name of the host in the
system.
The crush map also just contains the bare host names.
So, where are these FQDN names actually registered?

Thank you for your help.

Best regards,
Mathias
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail:  mathias.kuhr...@bih-charite.de
Mobile: +49 172 3475576