Hi,
Two of my seven OSD nodes failed.
The failure caused a server reboot, and unfortunately the cluster network
did not come up afterwards.

As a result, many OSDs were marked as down.

I decided to stop all services (OSD, MGR, MON) and then start them again
sequentially.

Now I have multiple OSDs marked as down although their services are running.
None of these down OSDs belongs to the two nodes that failed.

In the OSD logs I can see multiple entries like this:
2019-12-09 11:13:10.378 7f9a372fb700  1 osd.374 pg_epoch: 493189
pg[11.1992( v 457986'92619 (303558'88266,457986'92619]
local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724
les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1
lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown NOTIFY
mbc={}] state<Start>: transitioning to Stray
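
To make the entry above easier to read, here is a minimal Python sketch that
pulls out the fields I am looking at. The pattern and the function name
parse_pg_line are my own and are derived only from this one sample line, not
from any documented log format. My reading that the bracketed list is the
PG's up/acting set, and that r=-1 means osd.374 has no role in it, is an
assumption.

```python
import re

# The wrapped log entry from above, joined back into a single line.
SAMPLE = (
    "2019-12-09 11:13:10.378 7f9a372fb700  1 osd.374 pg_epoch: 493189 "
    "pg[11.1992( v 457986'92619 (303558'88266,457986'92619] "
    "local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724 "
    "les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1 "
    "lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown NOTIFY "
    "mbc={}] state<Start>: transitioning to Stray"
)

# Hypothetical pattern, written against the sample line only.
PATTERN = re.compile(
    r"osd\.(?P<osd>\d+) pg_epoch: (?P<epoch>\d+) "
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*?"
    r"\) \[(?P<osds>\d+(?:,\d+)*)\] r=(?P<role>-?\d+).*?"
    r"transitioning to (?P<state>\w+)"
)


def parse_pg_line(line):
    """Return the fields of one 'transitioning to ...' entry, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "osd": int(m.group("osd")),        # OSD that wrote the entry
        "epoch": int(m.group("epoch")),    # pg_epoch value
        "pgid": m.group("pgid"),           # placement group id
        # Bracketed OSD list (assumed to be the up/acting set)
        "osds": [int(x) for x in m.group("osds").split(",")],
        "role": int(m.group("role")),      # r= value, -1 in the sample
        "state": m.group("state"),         # state being entered
    }


if __name__ == "__main__":
    print(parse_pg_line(SAMPLE))
```

For the entry above this gives osd 374, PG 11.1992, OSD list [203, 412],
role -1 and target state Stray.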

I tried restarting the affected OSDs without success; they are still
marked as down.

Is there a procedure to overcome this issue, i.e. to get all OSDs up again?

Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io