> On 3 July 2016 at 11:34, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>
>
> Actually, I tried everything I could find in the Ceph docs and on the
> mailing lists, but none of it had any effect. As a last resort I changed
> pg/pgp.
>
> Anyway… what is the best way to solve this problem?
>

Did you try to restart some of the OSDs on which recovery is hanging? Does
that help anything?
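How you restart an OSD depends on how it was deployed. On a Hammer cluster
on CentOS 7 it is usually something like the following sketch (osd.27 is
just an example, taken from the daemon mentioned further down):

  # sysvinit-managed OSDs (common with Hammer packages)
  service ceph restart osd.27

  # systemd-managed OSDs
  systemctl restart ceph-osd@27

Keep 'ceph -w' open afterwards to see whether recovery traffic picks up
again.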
Wido

> Thanks
>
>
> On Jul 3, 2016, at 1:43 PM, Wido den Hollander <w...@42on.com> wrote:
>
>
>> On 3 July 2016 at 11:02, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>
>>
>> Yes, you're right, but recovery was at 0 objects/s last night. When I
>> changed pg/pgp from 1400 to 2048, rebalancing sped up, but the
>> rebalancing percentage went back to 53%.
>>
>
> Why did you change that? I would not change that value while a cluster is
> still in recovery.
>
>> I have run into this situation again and again since I dropped the failed
>> OSD: each time I increase pg/pgp, rebalancing ends up stopped at 0
>> objects/s with a low transfer speed.
>>
>
> Hard to judge at this point. You might want to try and restart osd.27 and
> see if that gets things going again. It seems to be involved in many PGs
> which are in 'backfilling' state.
>
> Wido
>
>> Thanks
>>
>>> On Jul 3, 2016, at 1:25 PM, Wido den Hollander <w...@42on.com> wrote:
>>>
>>>
>>>> On 3 July 2016 at 10:50, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>>>
>>>>
>>>> Thanks for the quick response, Wido.
>>>>
>>>> The "ceph -s" output is pasted here:
>>>> http://pastie.org/10897747
>>>>
>>>> and this is the output of "ceph health detail":
>>>> http://pastebin.com/vMeURWC9
>>>>
>>>
>>> It seems the cluster is still backfilling PGs, and your 'ceph -s' output
>>> shows as much:
>>> 'recovery io 62375 kB/s, 15 objects/s'
>>>
>>> It will just take some time before it finishes.
>>>
>>> Wido
>>>
>>>> Thank you
>>>>
>>>>> On Jul 3, 2016, at 1:10 PM, Wido den Hollander <w...@42on.com> wrote:
>>>>>
>>>>>
>>>>>> On 3 July 2016 at 10:34, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> A few days ago one of my OSDs failed and I dropped it from the
>>>>>> cluster, but I have been in HEALTH_WARN ever since. After turning off
>>>>>> the OSD, the self-healing system started to rebalance data onto the
>>>>>> other OSDs.
>>>>>>
>>>>>> My question is: the rebalancing never completes, and I get this
>>>>>> message at the end of the "ceph -s" output:
>>>>>>
>>>>>> recovery io 1456 KB/s, 0 object/s
>>>>>>
>>>>>
>>>>> Could you post the exact output of 'ceph -s'?
>>>>>
>>>>> There is something more which needs to be shown.
>>>>>
>>>>> 'ceph health detail' also might tell you more.
>>>>>
>>>>> Wido
>>>>>
>>>>>> How can I get back to HEALTH_OK again?
>>>>>>
>>>>>> My cluster details are:
>>>>>>
>>>>>> - 27 OSDs
>>>>>> - 3 MONs
>>>>>> - 2048 pg/pgp
>>>>>> - Each OSD has 4 TB of space
>>>>>> - CentOS 7.2 with the 3.10 Linux kernel
>>>>>> - Ceph Hammer
>>>>>>
>>>>>> Thank you,
>>>>>> Roozbeh
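For reference, a rough sketch of the commands to see where recovery is
stuck; the PG id below is only a placeholder:

  # list PGs that are not active+clean, with the OSDs they map to
  ceph pg dump_stuck unclean

  # show which PGs are degraded or backfilling
  ceph health detail

  # query a single PG in detail (replace 0.1a with a real PG id)
  ceph pg 0.1a query

If one OSD turns up in most of the stuck PGs, as osd.27 does here,
restarting that daemon is the first thing to try.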