Hi Robert, I already have this stuff set. Ceph is 0.87.0 now...
Thanks, will schedule this for the weekend. 10G network and 36 OSDs, so it
should move the data in less than 8h; per my last experience it took around
8h, but some 1G OSDs were included... Thx!

On 4 March 2015 at 17:49, Robert LeBlanc <rob...@leblancnet.us> wrote:
> You will most likely have a very high relocation percentage. Backfills
> are always more impactful on smaller clusters, but "osd max backfills"
> should be what you need to help reduce the impact. The default is 10;
> you will want to use 1.
>
> I didn't catch which version of Ceph you are running, but I think
> there was some priority work done in Firefly to make backfills
> lower priority. I think it has gotten better in later versions.
>
> On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> > Thank you, Robert. I'm wondering, when I remove a total of 7 OSDs from
> > the crush map, whether that will cause more than 37% of the data to be
> > moved (80% or whatever).
> >
> > I'm also wondering whether the throttling that I applied is fine or not;
> > I will introduce the osd_recovery_delay_start of 10 sec as Irek said.
> >
> > I'm just wondering how much the performance impact will be, because:
> > - when stopping an OSD, the impact while backfilling was fine, more or
> >   less; I can live with this
> > - when I removed the OSD from the crush map, the impact was tremendous
> >   for the first 1h or so, and later on during the recovery process the
> >   impact was much less, but still noticeable...
> >
> > Thanks for the tip, of course!
> > Andrija
> >
> > On 3 March 2015 at 18:34, Robert LeBlanc <rob...@leblancnet.us> wrote:
> >>
> >> I would be inclined to shut down both OSDs in a node and let the
> >> cluster recover. Once it is recovered, shut down the next two and let
> >> it recover. Repeat until all the OSDs are taken out of the cluster.
> >> Then I would set nobackfill and norecover, remove the hosts/disks from
> >> the CRUSH map, and then unset nobackfill and norecover.
> >>
> >> That should give you a few small changes (when you shut down OSDs) and
> >> then one big one to get everything into its final place. If you are
> >> still adding new nodes, you can add them in while nobackfill and
> >> norecover are set, so that the one big relocation fills the new drives
> >> too.
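For reference, a minimal sketch of the drain procedure Robert describes,
assuming a sysvinit-era setup (Ceph 0.87); the OSD IDs (osd.0/osd.1) and the
host bucket name (node01) are placeholders to adapt to your cluster:

    # Per retiring node: stop its OSDs and let the cluster recover
    service ceph stop osd.0
    service ceph stop osd.1
    ceph -w                        # watch until PGs are active+clean again

    # Once every retiring OSD is down and the cluster is healthy, freeze
    # data movement so all the CRUSH edits collapse into a single remap
    ceph osd set nobackfill
    ceph osd set norecover

    # Remove the drained disks and hosts from the CRUSH map
    ceph osd crush rm osd.0
    ceph osd crush rm osd.1
    ceph osd crush rm node01       # hypothetical host bucket name
    ceph auth del osd.0
    ceph osd rm 0

    # Unset the flags: one big relocation instead of many small ones
    ceph osd unset nobackfill
    ceph osd unset norecover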
> >>
> >> On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> >> > Thx Irek. The number of replicas is 3.
> >> >
> >> > I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already
> >> > decommissioned), which is further connected to a new 10G
> >> > switch/network with 3 servers on it with 12 OSDs each.
> >> > I'm decommissioning the old 3 nodes on the 1G network...
> >> >
> >> > So you suggest removing the whole node with 2 OSDs manually from the
> >> > crush map?
> >> > To my knowledge, Ceph never places 2 replicas on 1 node; all 3
> >> > replicas were originally distributed over all 3 nodes. So anyway, it
> >> > should be safe to remove 2 OSDs at once together with the node
> >> > itself, since the replica count is 3...?
> >> >
> >> > Thx again for your time
> >> >
> >> > On Mar 3, 2015 1:35 PM, "Irek Fasikhov" <malm...@gmail.com> wrote:
> >> >>
> >> >> Since you have only three nodes in the cluster,
> >> >> I recommend you add the new nodes to the cluster first, and then
> >> >> delete the old ones.
> >> >>
> >> >> 2015-03-03 15:28 GMT+03:00 Irek Fasikhov <malm...@gmail.com>:
> >> >>>
> >> >>> What replication count do you have?
> >> >>>
> >> >>> 2015-03-03 15:14 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
> >> >>>>
> >> >>>> Hi Irek,
> >> >>>>
> >> >>>> yes, stopping the OSD (or setting it to OUT) resulted in only 3%
> >> >>>> of the data being degraded and moved/recovered.
> >> >>>> It was when I afterwards removed it from the crush map with "ceph
> >> >>>> osd crush rm id" that the stuff with 37% happened.
> >> >>>>
> >> >>>> And thanks, Irek, for the help; could you kindly just let me know
> >> >>>> the preferred steps when removing a whole node?
> >> >>>> Do you mean I first stop all OSDs again, or just remove each OSD
> >> >>>> from the crush map, or perhaps just decompile the crush map,
> >> >>>> delete the node completely, compile it back in, and let it
> >> >>>> heal/recover?
> >> >>>>
> >> >>>> Do you think this would result in less data being misplaced and
> >> >>>> moved around?
> >> >>>>
> >> >>>> Sorry for bugging you, I really appreciate your help.
> >> >>>>
> >> >>>> Thanks
> >> >>>>
> >> >>>> On 3 March 2015 at 12:58, Irek Fasikhov <malm...@gmail.com> wrote:
> >> >>>>>
> >> >>>>> A large percentage of the cluster map gets rebuilt (but the
> >> >>>>> degradation percentage stays low). If you had not run "ceph osd
> >> >>>>> crush rm id", the percentage would have been low.
> >> >>>>> In your case, the correct option is to remove the entire node,
> >> >>>>> rather than each disk individually.
> >> >>>>>
> >> >>>>> 2015-03-03 14:27 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
> >> >>>>>>
> >> >>>>>> Another question: I mentioned here 37% of objects being moved
> >> >>>>>> around; these are MISPLACED objects (degraded objects were
> >> >>>>>> 0.001%) after I removed 1 OSD from the crush map (out of 44 OSDs
> >> >>>>>> or so).
> >> >>>>>>
> >> >>>>>> Can anybody confirm this is normal behaviour, and are there any
> >> >>>>>> workarounds?
> >> >>>>>>
> >> >>>>>> I understand this is because of CEPH's object placement
> >> >>>>>> algorithm, but still, 37% of objects misplaced just by removing
> >> >>>>>> 1 OSD out of 44 from the crush map makes me wonder why this
> >> >>>>>> percentage is so large.
> >> >>>>>>
> >> >>>>>> It seems not good to me, and I have to remove another 7 OSDs (we
> >> >>>>>> are demoting some old hardware nodes). This means I could
> >> >>>>>> potentially end up with 7 x the same number of misplaced
> >> >>>>>> objects...?
> >> >>>>>>
> >> >>>>>> Any thoughts?
> >> >>>>>>
> >> >>>>>> Thanks
> >> >>>>>>
> >> >>>>>> On 3 March 2015 at 12:14, Andrija Panic <andrija.pa...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>> Thanks Irek.
> >> >>>>>>>
> >> >>>>>>> Does this mean that after peering for each PG there will be a
> >> >>>>>>> delay of 10 sec, meaning that every once in a while I will have
> >> >>>>>>> 10 sec of the cluster NOT being stressed/overloaded, then
> >> >>>>>>> recovery takes place for that PG, then for another 10 sec the
> >> >>>>>>> cluster is fine, and then it is stressed again?
> >> >>>>>>>
> >> >>>>>>> I'm trying to understand the process before actually doing this
> >> >>>>>>> stuff (the config reference is there on ceph.com, but I don't
> >> >>>>>>> fully understand the process).
> >> >>>>>>>
> >> >>>>>>> Thanks,
> >> >>>>>>> Andrija
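For the decompile-and-edit route Andrija asks about, the usual sequence
looks roughly like this (a sketch; the file names are arbitrary and the host
bucket you delete must match the one in your own CRUSH map):

    # Export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # Edit crushmap.txt: delete the retiring host bucket and the
    # "item <host> weight ..." line that references it under its root/rack

    # Recompile and inject the edited map; one big remap follows
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new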
> >> >>>>>>>
> >> >>>>>>> On 3 March 2015 at 11:32, Irek Fasikhov <malm...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Hi.
> >> >>>>>>>>
> >> >>>>>>>> Use the "osd_recovery_delay_start" value, for example:
> >> >>>>>>>> [root@ceph08 ceph]# ceph --admin-daemon
> >> >>>>>>>> /var/run/ceph/ceph-osd.94.asok config show | grep
> >> >>>>>>>> osd_recovery_delay_start
> >> >>>>>>>>   "osd_recovery_delay_start": "10"
> >> >>>>>>>>
> >> >>>>>>>> 2015-03-03 13:13 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
> >> >>>>>>>>>
> >> >>>>>>>>> Hi guys,
> >> >>>>>>>>>
> >> >>>>>>>>> Yesterday I removed 1 OSD from the cluster (out of 42 OSDs),
> >> >>>>>>>>> and it caused over 37% of the data to rebalance; let's say
> >> >>>>>>>>> this is fine (this was when I removed it from the crush map).
> >> >>>>>>>>>
> >> >>>>>>>>> I'm wondering: I had previously set some throttling
> >> >>>>>>>>> mechanisms, but during the first 1h of rebalancing my
> >> >>>>>>>>> recovery rate was going up to 1500 MB/s, and the VMs were
> >> >>>>>>>>> completely unusable; then for the last 4h of the recovery the
> >> >>>>>>>>> rate went down to, say, 100-200 MB/s, during which VM
> >> >>>>>>>>> performance was still pretty impacted, but at least I could
> >> >>>>>>>>> work more or less.
> >> >>>>>>>>>
> >> >>>>>>>>> So my question: is this behaviour expected, and is the
> >> >>>>>>>>> throttling here working as expected? During the first 1h
> >> >>>>>>>>> almost no throttling seemed to be applied, judging by the
> >> >>>>>>>>> 1500 MB/s recovery rate and the impact on VMs, while the last
> >> >>>>>>>>> 4h seemed pretty fine (although still with a lot of impact in
> >> >>>>>>>>> general).
> >> >>>>>>>>>
> >> >>>>>>>>> I changed these throttles on the fly with:
> >> >>>>>>>>>
> >> >>>>>>>>> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> >> >>>>>>>>> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
> >> >>>>>>>>> ceph tell osd.* injectargs '--osd_max_backfills 1'
> >> >>>>>>>>>
> >> >>>>>>>>> My journals are on SSDs (12 OSDs per server, with 6 journals
> >> >>>>>>>>> on one SSD and 6 journals on the other); I have 3 of these
> >> >>>>>>>>> hosts.
> >> >>>>>>>>>
> >> >>>>>>>>> Any thoughts are welcome.
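Putting Irek's suggestion together with the throttles already listed above,
the full set of runtime settings would look something like this (a sketch
using the values discussed in the thread; note that injectargs changes are
not persistent, so mirror them under [osd] in ceph.conf if they should
survive restarts):

    # Throttle recovery/backfill on all OSDs at runtime
    ceph tell osd.* injectargs '--osd_max_backfills 1'
    ceph tell osd.* injectargs '--osd_recovery_max_active 1'
    ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
    ceph tell osd.* injectargs '--osd_recovery_delay_start 10'

    # Verify the values on one OSD through its admin socket
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show \
        | grep -E 'osd_max_backfills|osd_recovery'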
--
Andrija Panić
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com