[ceph-users] [cephadm] Found duplicate OSDs

2022-10-21 Thread Satish Patel
Folks,

I have deployed a cluster with 15 OSD nodes using cephadm and encountered a
duplicate OSD on one of the nodes, and I am not sure how to clean it up.

root@datastorn1:~# ceph health
HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas configured
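
A quick way to pin down that warning (my suggestion, not something from the
original post): ceph health detail expands the summary and names the failed
cephadm daemon, and filtering ceph orch ps shows any daemon record that is
not actually running.

# Expand the health summary to see which cephadm daemon is flagged as failed
ceph health detail

# List daemon records that are not in 'running' state (this also drops the header line)
ceph orch ps | grep -v running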

osd.3 is duplicated across two nodes. I would like to remove it from
datastorn4, but I'm not sure how. In the output of ceph osd tree I am not
seeing any duplicate.

root@datastorn1:~# ceph orch ps | grep osd.3
osd.3  datastorn4  stopped       7m ago  3w      -  42.6G
osd.3  datastorn5  running (3w)  7m ago  3w  2584M  42.6G  17.2.3  0912465dcea5  d139f8a1234b
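
If the datastorn4 record really is just a leftover (worth double-checking
before deleting anything), one possible cleanup, offered as a sketch rather
than a confirmed fix, is to remove the stale daemon record directly on
datastorn4 with cephadm rm-daemon; going through the host avoids the
ambiguity of two daemons sharing the name osd.3.

# Caution: two hosts report a daemon named osd.3, so removal by name through
# the orchestrator could be ambiguous. Removing the stale record on the host
# itself leaves the healthy copy on datastorn5 untouched.
ssh datastorn4 cephadm rm-daemon --name osd.3 --fsid "$(ceph fsid)" --force

# Afterwards, confirm that only the datastorn5 record remains
ceph orch ps | grep osd.3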


I am getting the following messages repeating in the logs:

2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188) 1098186 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188) 1098221 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188) 1098256 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188) 1098293 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5


[ceph-users] [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Folks,

I am playing with cephadm and life was good until I started upgrading from
Octopus to Pacific. My upgrade process got stuck after upgrading the mgr
daemons, and in the logs I can now see the following:

root@ceph1:~# ceph log last cephadm
2022-09-01T14:40:45.739804+ mgr.ceph2.hmbdla (mgr.265806) 8 :
cephadm [INF] Deploying daemon grafana.ceph1 on ceph1
2022-09-01T14:40:56.115693+ mgr.ceph2.hmbdla (mgr.265806) 14 :
cephadm [INF] Deploying daemon prometheus.ceph1 on ceph1
2022-09-01T14:41:11.856725+ mgr.ceph2.hmbdla (mgr.265806) 25 :
cephadm [INF] Reconfiguring alertmanager.ceph1 (dependencies
changed)...
2022-09-01T14:41:11.861535+ mgr.ceph2.hmbdla (mgr.265806) 26 :
cephadm [INF] Reconfiguring daemon alertmanager.ceph1 on ceph1
2022-09-01T14:41:12.927852+ mgr.ceph2.hmbdla (mgr.265806) 27 :
cephadm [INF] Reconfiguring grafana.ceph1 (dependencies changed)...
2022-09-01T14:41:12.940615+ mgr.ceph2.hmbdla (mgr.265806) 28 :
cephadm [INF] Reconfiguring daemon grafana.ceph1 on ceph1
2022-09-01T14:41:14.056113+ mgr.ceph2.hmbdla (mgr.265806) 33 :
cephadm [INF] Found duplicate OSDs: osd.2 in status running on ceph1,
osd.2 in status running on ceph2
2022-09-01T14:41:14.056437+ mgr.ceph2.hmbdla (mgr.265806) 34 :
cephadm [INF] Found duplicate OSDs: osd.5 in status running on ceph1,
osd.5 in status running on ceph2
2022-09-01T14:41:14.056630+ mgr.ceph2.hmbdla (mgr.265806) 35 :
cephadm [INF] Found duplicate OSDs: osd.3 in status running on ceph1,
osd.3 in status running on ceph2
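
When an upgrade stalls like this, two read-only checks (not shown in the
original post) help narrow down where it stopped: the orchestrator's own
upgrade status, and the per-daemon version breakdown, which should show only
the mgrs on 16.2.x if the upgrade really halted right after them.

# Report the target image and the orchestrator's view of upgrade progress
ceph orch upgrade status

# Show how many daemons are still on 15.2.x (Octopus) vs 16.2.x (Pacific)
ceph versions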


I am not sure where the duplicate names came from or how that happened. In
the following output I can't see any duplication:

root@ceph1:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME   STATUS  REWEIGHT  PRI-AFF
-1         0.97656  root default
-3         0.48828      host ceph1
 4    hdd  0.09769          osd.4       up   1.0  1.0
 0    ssd  0.19530          osd.0       up   1.0  1.0
 1    ssd  0.19530          osd.1       up   1.0  1.0
-5         0.48828      host ceph2
 5    hdd  0.09769          osd.5       up   1.0  1.0
 2    ssd  0.19530          osd.2       up   1.0  1.0
 3    ssd  0.19530          osd.3       up   1.0  1.0
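
One reason the two views can disagree: ceph osd tree is built from the CRUSH
map, which knows each OSD ID only once, while ceph orch ps reflects cephadm's
per-host daemon inventory as cached by the mgr. Comparing that cache against
what is actually deployed on each host, for example with cephadm ls (a
verification step I'm suggesting, not something from the thread), shows
whether the duplication is real or only in the orchestrator's bookkeeping.

# cephadm ls prints the daemons deployed on the local host as JSON;
# each host should list only its own OSDs.
ssh ceph1 cephadm ls | grep '"name"'
ssh ceph2 cephadm ls | grep '"name"'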


But at the same time I can see duplicate OSD entries on ceph1 and ceph2:


root@ceph1:~# ceph orch ps
NAME                 HOST   PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
alertmanager.ceph1   ceph1  *:9093,9094  running (20s)  2s ago     20s    17.1M        -           ba2b418f427c  856a4fe641f1
alertmanager.ceph1   ceph2  *:9093,9094  running (20s)  3s ago     20s    17.1M        -           ba2b418f427c  856a4fe641f1
crash.ceph2          ceph1               running (12d)  2s ago     12d    10.0M        -  15.2.17  93146564743f  0a009254afb0
crash.ceph2          ceph2               running (12d)  3s ago     12d    10.0M        -  15.2.17  93146564743f  0a009254afb0
grafana.ceph1        ceph1  *:3000       running (18s)  2s ago     19s    47.9M        -  8.3.5    dad864ee21e9  7d7a70b8ab7f
grafana.ceph1        ceph2  *:3000       running (18s)  3s ago     19s    47.9M        -  8.3.5    dad864ee21e9  7d7a70b8ab7f
mgr.ceph2.hmbdla     ceph1               running (13h)  2s ago     12d     506M        -  16.2.10  0d668911f040  6274723c35f7
mgr.ceph2.hmbdla     ceph2               running (13h)  3s ago     12d     506M        -  16.2.10  0d668911f040  6274723c35f7
node-exporter.ceph2  ceph1               running (91m)  2s ago     12d    60.7M        -  0.18.1   e5a616e4b9cf  d0ba04bb977c
node-exporter.ceph2  ceph2               running (91m)  3s ago     12d    60.7M        -  0.18.1   e5a616e4b9cf  d0ba04bb977c
osd.2                ceph1               running (12h)  2s ago     12d     867M    4096M  15.2.17  93146564743f  e286fb1c6302
osd.2                ceph2               running (12h)  3s ago     12d     867M    4096M  15.2.17  93146564743f  e286fb1c6302
osd.3                ceph1               running (12h)  2s ago     12d     978M    4096M  15.2.17  93146564743f  d3ae5d9f694f
osd.3                ceph2               running (12h)  3s ago     12d     978M    4096M  15.2.17  93146564743f  d3ae5d9f694f
osd.5                ceph1               running (12h)  2s ago     8d      225M    4096M  15.2.17  93146564743f  405068fb474e
osd.5                ceph2               running (12h)  3s ago     8d      225M    4096M  15.2.17  93146564743f  405068fb474e
prometheus.ceph1     ceph1  *:9095       running (8s)   2s ago     8s     30.4M        -           514e6a882f6e  9031dbe30cae
prometheus.ceph1     ceph2  *:9095       running (8s)   3s ago     8s     30.4M        -           514e6a882f6e  9031dbe30cae
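
A detail worth noting in the listing above: every duplicated pair shares the
same CONTAINER ID, which two genuinely separate containers on different hosts
would not, so this looks like one host's daemon list being reported for both
hosts rather than real duplicate daemons. Under that assumption (not
confirmed in the thread), a low-risk workaround is to fail over to a standby
mgr and force the orchestrator to rebuild its inventory.

# Hand control to a standby mgr so cephadm rebuilds its host/daemon cache
ceph mgr fail

# Ask the orchestrator to re-scan hosts instead of serving cached data
ceph orch ps --refresh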


Is this a bug, or did I do something wrong? Is there any workaround to get
out of this condition?