[ceph-users] [cephadm] Found duplicate OSDs
Folks, I have deployed a cluster with 15 OSD nodes using cephadm and have run into a duplicate OSD on one of the nodes, and I am not sure how to clean that up.

root@datastorn1:~# ceph health
HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas configured

osd.3 is duplicated on two nodes. I would like to remove it from datastorn4, but I'm not sure how to remove it. In the ceph osd tree I am not seeing any duplicate.

root@datastorn1:~# ceph orch ps | grep osd.3
osd.3  datastorn4  stopped       7m ago  3w      -  42.6G
osd.3  datastorn5  running (3w)  7m ago  3w  2584M  42.6G  17.2.3  0912465dcea5  d139f8a1234b

I am getting the following errors in the logs:

2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188) 1098186 : cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4, osd.3 in status running on datastorn5
2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188) 1098221 : cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4, osd.3 in status running on datastorn5
2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188) 1098256 : cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4, osd.3 in status running on datastorn5
2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188) 1098293 : cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4, osd.3 in status running on datastorn5

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
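A possible cleanup path, sketched on the assumption that the osd.3 entry on datastorn4 is only a leftover daemon record (the data-bearing osd.3 being the one running on datastorn5), is to remove the stale daemon directly on datastorn4 with cephadm and then let the orchestrator refresh its inventory. The <fsid> placeholder below is the cluster fsid as printed by "ceph fsid"; these commands are untested here and worth double-checking against the cephadm documentation first.

# Confirm the stale record exists on datastorn4 (cephadm ls prints JSON):
root@datastorn4:~# cephadm ls | grep '"name": "osd.3"'

# Remove only the local daemon / systemd unit for osd.3 on this host. --force is
# needed because rm-daemon is cautious about OSDs; do NOT pass
# --force-delete-data, since the real osd.3 lives on datastorn5.
root@datastorn4:~# cephadm rm-daemon --name osd.3 --fsid <fsid> --force

# Back on an admin node, refresh and verify osd.3 is now listed only once:
root@datastorn1:~# ceph orch ps --refresh | grep osd.3

If the stale entry still shows up afterwards, failing over the active mgr (ceph mgr fail) usually forces cephadm to rebuild its cached daemon list.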
[ceph-users] [cephadm] Found duplicate OSDs
Folks, I am playing with cephadm and life was good until I started upgrading from Octopus to Pacific. My upgrade process got stuck after upgrading the mgr daemons, and in the logs I now see the following errors:

root@ceph1:~# ceph log last cephadm
2022-09-01T14:40:45.739804+ mgr.ceph2.hmbdla (mgr.265806) 8 : cephadm [INF] Deploying daemon grafana.ceph1 on ceph1
2022-09-01T14:40:56.115693+ mgr.ceph2.hmbdla (mgr.265806) 14 : cephadm [INF] Deploying daemon prometheus.ceph1 on ceph1
2022-09-01T14:41:11.856725+ mgr.ceph2.hmbdla (mgr.265806) 25 : cephadm [INF] Reconfiguring alertmanager.ceph1 (dependencies changed)...
2022-09-01T14:41:11.861535+ mgr.ceph2.hmbdla (mgr.265806) 26 : cephadm [INF] Reconfiguring daemon alertmanager.ceph1 on ceph1
2022-09-01T14:41:12.927852+ mgr.ceph2.hmbdla (mgr.265806) 27 : cephadm [INF] Reconfiguring grafana.ceph1 (dependencies changed)...
2022-09-01T14:41:12.940615+ mgr.ceph2.hmbdla (mgr.265806) 28 : cephadm [INF] Reconfiguring daemon grafana.ceph1 on ceph1
2022-09-01T14:41:14.056113+ mgr.ceph2.hmbdla (mgr.265806) 33 : cephadm [INF] Found duplicate OSDs: osd.2 in status running on ceph1, osd.2 in status running on ceph2
2022-09-01T14:41:14.056437+ mgr.ceph2.hmbdla (mgr.265806) 34 : cephadm [INF] Found duplicate OSDs: osd.5 in status running on ceph1, osd.5 in status running on ceph2
2022-09-01T14:41:14.056630+ mgr.ceph2.hmbdla (mgr.265806) 35 : cephadm [INF] Found duplicate OSDs: osd.3 in status running on ceph1, osd.3 in status running on ceph2

I am not sure where the duplicate names came from or how that happened. In the following output I can't see any duplication:

root@ceph1:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         0.97656  root default
-3         0.48828      host ceph1
 4    hdd  0.09769          osd.4       up       1.0      1.0
 0    ssd  0.19530          osd.0       up       1.0      1.0
 1    ssd  0.19530          osd.1       up       1.0      1.0
-5         0.48828      host ceph2
 5    hdd  0.09769          osd.5       up       1.0      1.0
 2    ssd  0.19530          osd.2       up       1.0      1.0
 3    ssd  0.19530          osd.3       up       1.0      1.0

But at the same time I can see duplicate OSD numbers on ceph1 and ceph2:

root@ceph1:~# ceph orch ps
NAME                 HOST   PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
alertmanager.ceph1   ceph1  *:9093,9094  running (20s)  2s ago     20s  17.1M    -                 ba2b418f427c  856a4fe641f1
alertmanager.ceph1   ceph2  *:9093,9094  running (20s)  3s ago     20s  17.1M    -                 ba2b418f427c  856a4fe641f1
crash.ceph2          ceph1               running (12d)  2s ago     12d  10.0M    -        15.2.17  93146564743f  0a009254afb0
crash.ceph2          ceph2               running (12d)  3s ago     12d  10.0M    -        15.2.17  93146564743f  0a009254afb0
grafana.ceph1        ceph1  *:3000       running (18s)  2s ago     19s  47.9M    -        8.3.5    dad864ee21e9  7d7a70b8ab7f
grafana.ceph1        ceph2  *:3000       running (18s)  3s ago     19s  47.9M    -        8.3.5    dad864ee21e9  7d7a70b8ab7f
mgr.ceph2.hmbdla     ceph1               running (13h)  2s ago     12d  506M     -        16.2.10  0d668911f040  6274723c35f7
mgr.ceph2.hmbdla     ceph2               running (13h)  3s ago     12d  506M     -        16.2.10  0d668911f040  6274723c35f7
node-exporter.ceph2  ceph1               running (91m)  2s ago     12d  60.7M    -        0.18.1   e5a616e4b9cf  d0ba04bb977c
node-exporter.ceph2  ceph2               running (91m)  3s ago     12d  60.7M    -        0.18.1   e5a616e4b9cf  d0ba04bb977c
osd.2                ceph1               running (12h)  2s ago     12d  867M     4096M    15.2.17  93146564743f  e286fb1c6302
osd.2                ceph2               running (12h)  3s ago     12d  867M     4096M    15.2.17  93146564743f  e286fb1c6302
osd.3                ceph1               running (12h)  2s ago     12d  978M     4096M    15.2.17  93146564743f  d3ae5d9f694f
osd.3                ceph2               running (12h)  3s ago     12d  978M     4096M    15.2.17  93146564743f  d3ae5d9f694f
osd.5                ceph1               running (12h)  2s ago     8d   225M     4096M    15.2.17  93146564743f  405068fb474e
osd.5                ceph2               running (12h)  3s ago     8d   225M     4096M    15.2.17  93146564743f  405068fb474e
prometheus.ceph1     ceph1  *:9095       running (8s)   2s ago     8s   30.4M    -                 514e6a882f6e  9031dbe30cae
prometheus.ceph1     ceph2  *:9095       running (8s)   3s ago     8s   30.4M    -                 514e6a882f6e  9031dbe30cae

Is this a bug, or did I do something wrong? Is there any workaround to get out of this condition?

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
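A diagnostic sketch for this situation, under the assumption (not a confirmed diagnosis) that the duplicated rows in "ceph orch ps" come from the mgr's cached per-host daemon inventory rather than from daemons actually running twice: compare what each host really runs, as reported locally by cephadm, with what the orchestrator believes, then force that cache to be rebuilt.

# List the daemons each host actually runs; cephadm ls prints JSON, and a given
# daemon should normally appear in the output of exactly one host:
root@ceph1:~# cephadm ls | grep '"name"'
root@ceph2:~# cephadm ls | grep '"name"'

# Ask the orchestrator to re-scan the hosts:
root@ceph1:~# ceph orch ps --refresh

# If the duplicates persist and a standby mgr is available, fail over the active
# mgr (mgr.ceph2.hmbdla in the output above) so its cephadm inventory is rebuilt:
root@ceph1:~# ceph mgr fail ceph2.hmbdla

If the per-host cephadm ls output already disagrees with ceph orch ps, that points at stale orchestrator state rather than a real double deployment; if both hosts genuinely report the same OSD daemons, that would be a different and more serious problem.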