> On 3 July 2016 at 11:34, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>
>
> Actually, I tried everything I could find in the Ceph docs and on the
> mailing lists, but none of it had any effect. As a last resort I changed
> pg/pgp.
>
> Anyway… what is the best way to solve this problem?
>

Did you try to restart some of the OSDs on which recovery is hanging? Does
that help anything?
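How you restart an OSD depends on how it was deployed. On a Hammer cluster
on CentOS 7 it is usually something like the following sketch (osd.27 is
just an example, taken from the daemon mentioned further down):

  # sysvinit-managed OSDs (common with Hammer packages)
  service ceph restart osd.27

  # systemd-managed OSDs
  systemctl restart ceph-osd@27

Keep 'ceph -w' open afterwards to see whether recovery traffic picks up
again.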
Wido

> Thanks
>
>
> On Jul 3, 2016, at 1:43 PM, Wido den Hollander <w...@42on.com> wrote:
>
>
>> On 3 July 2016 at 11:02, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>
>>
>> Yes, you're right, but recovery was at 0 objects/s last night. When I
>> changed pg/pgp from 1400 to 2048, rebalancing sped up, but the
>> rebalancing percentage went back to 53%.
>>
>
> Why did you change that? I would not change that value while a cluster is
> still in recovery.
>
>> I have run into this situation again and again since I dropped the failed
>> OSD: each time I increase pg/pgp, rebalancing ends up stopped at 0
>> objects/s with a low transfer speed.
>>
>
> Hard to judge at this point. You might want to try and restart osd.27 and
> see if that gets things going again. It seems to be involved in many PGs
> which are in 'backfilling' state.
>
> Wido
>
>> Thanks
>>
>>> On Jul 3, 2016, at 1:25 PM, Wido den Hollander <w...@42on.com> wrote:
>>>
>>>
>>>> On 3 July 2016 at 10:50, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>>>
>>>>
>>>> Thanks for the quick response, Wido.
>>>>
>>>> The "ceph -s" output is pasted here:
>>>> http://pastie.org/10897747
>>>>
>>>> and this is the output of "ceph health detail":
>>>> http://pastebin.com/vMeURWC9
>>>>
>>>
>>> It seems the cluster is still backfilling PGs, and your 'ceph -s' output
>>> shows as much:
>>> 'recovery io 62375 kB/s, 15 objects/s'
>>>
>>> It will just take some time before it finishes.
>>>
>>> Wido
>>>
>>>> Thank you
>>>>
>>>>> On Jul 3, 2016, at 1:10 PM, Wido den Hollander <w...@42on.com> wrote:
>>>>>
>>>>>
>>>>>> On 3 July 2016 at 10:34, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> A few days ago one of my OSDs failed and I dropped it from the
>>>>>> cluster, but I have been in HEALTH_WARN ever since. After turning off
>>>>>> the OSD, the self-healing system started to rebalance data onto the
>>>>>> other OSDs.
>>>>>>
>>>>>> My question is: the rebalancing never completes, and I get this
>>>>>> message at the end of the "ceph -s" output:
>>>>>>
>>>>>> recovery io 1456 KB/s, 0 object/s
>>>>>>
>>>>>
>>>>> Could you post the exact output of 'ceph -s'?
>>>>>
>>>>> There is something more which needs to be shown.
>>>>>
>>>>> 'ceph health detail' also might tell you more.
>>>>>
>>>>> Wido
>>>>>
>>>>>> How can I get back to HEALTH_OK again?
>>>>>>
>>>>>> My cluster details are:
>>>>>>
>>>>>> - 27 OSDs
>>>>>> - 3 MONs
>>>>>> - 2048 pg/pgp
>>>>>> - Each OSD has 4 TB of space
>>>>>> - CentOS 7.2 with the 3.10 Linux kernel
>>>>>> - Ceph Hammer
>>>>>>
>>>>>> Thank you,
>>>>>> Roozbeh
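For reference, a rough sketch of the commands to see where recovery is
stuck; the PG id below is only a placeholder:

  # list PGs that are not active+clean, with the OSDs they map to
  ceph pg dump_stuck unclean

  # show which PGs are degraded or backfilling
  ceph health detail

  # query a single PG in detail (replace 0.1a with a real PG id)
  ceph pg 0.1a query

If one OSD turns up in most of the stuck PGs, as osd.27 does here,
restarting that daemon is the first thing to try.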