According to ceph -s, the cluster is in recovery, backfill, etc.:

  data:
    pools:   7 pools, 19656 pgs
    objects: 65.02M objects, 248 TiB
    usage:   761 TiB used, 580 TiB / 1.3 PiB avail
    pgs:     16.173% pgs unknown
             0.493% pgs not active
             890328/195069177 objects degraded (0.456%)
             828080/195069177 objects misplaced (0.425%)
             15733 active+clean
             3179  unknown
             215   active+undersized+degraded+remapped+backfilling
             152   active+undersized+degraded+remapped+backfill_wait
             135   active+remapped+backfill_wait
             107   active+remapped+backfilling
             65    down
             31    undersized+degraded+peered
             18    active+recovering
             7     active+recovery_wait
             6     active+recovery_wait+degraded
             4     active+recovering+degraded
             1     active+recovery_wait+remapped
             1     peering
             1     active+remapped+backfill_toofull
             1     active+undersized+degraded+remapped+backfill_wait+backfill_toofull

  io:
    client:   607 B/s rd, 134 MiB/s wr, 0 op/s rd, 34 op/s wr
    recovery: 1.9 GiB/s, 511 objects/s
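For reference, the two recovery throttles asked about further down in this thread can be inspected and changed at runtime. A minimal sketch, assuming a Mimic/Nautilus-era cluster with the centralized config database; the values shown are placeholders, not a recommendation:

    # Show the values a running OSD is actually using (osd.0 as an example)
    ceph config show osd.0 osd_max_backfills
    ceph config show osd.0 osd_recovery_max_active

    # Raise the throttles for all OSDs via the monitors' config database
    ceph config set osd osd_max_backfills 2
    ceph config set osd osd_recovery_max_active 4

    # On older releases, inject the options into the running daemons instead
    ceph tell 'osd.*' injectargs '--osd_max_backfills 2 --osd_recovery_max_active 4'

As Paul points out below, though, none of this brings a down OSD back up; it only changes how quickly the OSDs that are up move data.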
On 09.12.2019 13:44, Paul Emmerich wrote:
> An OSD that is down does not recover or backfill. Faster recovery or
> backfill will not resolve down OSDs.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Mon, Dec 9, 2019 at 1:42 PM Thomas Schneider <74cmo...@gmail.com> wrote:
>
>     Hi,
>
>     I think I can speed up the recovery / backfill.
>
>     What is the recommended setting for
>     osd_max_backfills
>     osd_recovery_max_active
>     ?
>
>     THX
>
>     On 09.12.2019 13:36, Paul Emmerich wrote:
>     > This message is expected.
>     >
>     > But your current situation is a great example of why having a separate
>     > cluster network is a bad idea in most situations.
>     > First thing I'd do in this scenario is to get rid of the cluster
>     > network and see if that helps.
>     >
>     >
>     > Paul
>     >
>     > --
>     > Paul Emmerich
>     >
>     > Looking for help with your Ceph cluster? Contact us at https://croit.io
>     >
>     > croit GmbH
>     > Freseniusstr. 31h
>     > 81247 München
>     > www.croit.io
>     > Tel: +49 89 1896585 90
>     >
>     >
>     > On Mon, Dec 9, 2019 at 11:22 AM Thomas Schneider <74cmo...@gmail.com> wrote:
>     >
>     >     Hi,
>     >     I had a failure on 2 of 7 OSD nodes.
>     >     This caused a server reboot, and unfortunately the cluster network
>     >     failed to come up.
>     >
>     >     This resulted in many OSDs being down.
>     >
>     >     I decided to stop all services (OSD, MGR, MON) and to start them
>     >     sequentially.
>     >
>     >     Now I have multiple OSDs marked as down although the service is
>     >     running. None of these down OSDs is connected to the 2 failed nodes.
>     >
>     >     In the OSD logs I can see multiple entries like this:
>     >     2019-12-09 11:13:10.378 7f9a372fb700  1 osd.374 pg_epoch: 493189
>     >     pg[11.1992( v 457986'92619 (303558'88266,457986'92619]
>     >     local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724
>     >     les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1
>     >     lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown NOTIFY
>     >     mbc={}] state<Start>: transitioning to Stray
>     >
>     >     I tried to restart the impacted OSDs without success, i.e. the
>     >     relevant OSDs are still marked as down.
>     >
>     >     Is there a procedure to overcome this issue, i.e. getting all OSDs up?
>     >
>     >     THX
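A note on the "daemon running but marked down" symptom from the quoted message: the monitors mark an OSD down when its peers stop getting heartbeat replies over the configured public/cluster networks, so a daemon can be alive locally while the cluster still reports it down. A rough sketch for cross-checking both views, assuming systemd-managed OSDs and using osd.374 from the log excerpt above purely as an example ID:

    # Cluster view: which OSDs do the monitors consider down?
    ceph osd tree down
    ceph health detail

    # Local view on the OSD's host: is the daemon running, and what does it report?
    systemctl status ceph-osd@374
    ceph daemon osd.374 status      # query the daemon via its local admin socket

    # Addresses the OSD registered with the cluster (public and cluster network)
    ceph osd find 374

If the daemon is up but its cluster-network address is unreachable from the other OSD hosts, heartbeats keep failing and the OSD keeps getting marked down, which is consistent with Paul's suggestion above to drop the separate cluster network first.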