Hi Robert,

it seems I have not listened well to your advice - I set the OSD to "out" instead of stopping it - and now, instead of the ~3% of degraded objects from before, there are 0.000% degraded and around 6% misplaced objects, and rebalancing is happening again, though only for a small percentage...

Do you know whether, later, when I remove this OSD from the CRUSH map, no more data will be rebalanced (as per the official Ceph documentation), since the misplaced objects are already being redistributed to all the other nodes? (After "service ceph stop osd.0" there was 2.45% degraded data, but for some reason no backfilling was happening... it just stayed degraded... which is why I started the OSD back up and then set it to out.)
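Just so I'm sure I have the remaining steps right, this is roughly what I plan to run once the rebalance settles (the usual manual OSD removal sequence from the docs, with osd.0 as the example):

    ceph osd out 0                  # already done - data is being remapped away
    service ceph stop osd.0         # stop the daemon once the cluster is back to active+clean
    ceph osd crush remove osd.0     # remove it from the CRUSH map
    ceph auth del osd.0             # remove its authentication key
    ceph osd rm 0                   # remove the OSD from the cluster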
Thanks

On 4 March 2015 at 17:54, Andrija Panic <andrija.pa...@gmail.com> wrote:
> Hi Robert,
>
> I already have this stuff set. Ceph is 0.87.0 now...
>
> Thanks, will schedule this for the weekend. With the 10G network and 36 OSDs it should move the data in less than 8h - per my last experience it was around 8h, but some 1G OSDs were included...
>
> Thx!
>
> On 4 March 2015 at 17:49, Robert LeBlanc <rob...@leblancnet.us> wrote:
>> You will most likely have a very high relocation percentage. Backfills are always more impactful on smaller clusters, but "osd max backfills" should be what you need to help reduce the impact. The default is 10; you will want to use 1.
>>
>> I didn't catch which version of Ceph you are running, but I think there was some priority work done in Firefly to help make backfills lower priority. I think it has gotten better in later versions.
>>
>> On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
>> > Thank you Robert - I'm wondering, when I remove the total of 7 OSDs from the CRUSH map, whether that will cause more than 37% of the data to be moved (80% or whatever).
>> >
>> > I'm also wondering whether the throttling that I applied is fine or not - I will introduce osd_recovery_delay_start = 10 sec as Irek said.
>> >
>> > I'm just wondering how much the performance impact will be, because:
>> > - when stopping the OSD, the impact while backfilling was fine, more or less - I can live with this
>> > - when I removed the OSD from the CRUSH map, the impact was tremendous for the first 1h or so, and later on during the recovery process the impact was much smaller, but still noticeable...
>> >
>> > Thanks for the tip of course!
>> > Andrija
>> >
>> > On 3 March 2015 at 18:34, Robert LeBlanc <rob...@leblancnet.us> wrote:
>> >> I would be inclined to shut down both OSDs in a node and let the cluster recover. Once it is recovered, shut down the next two and let it recover. Repeat until all the OSDs are taken out of the cluster. Then I would set nobackfill and norecover, remove the hosts/disks from the CRUSH map, and then unset nobackfill and norecover.
>> >>
>> >> That should give you a few small changes (when you shut down OSDs) and then one big one to get everything into its final place. If you are still adding new nodes, you can add them in while nobackfill and norecover are set, so that the one big relocation fills the new drives too.
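>> >> In command form that is roughly (same flags, set and unset cluster-wide):
>> >>
>> >>   ceph osd set nobackfill      # pause backfill while the CRUSH map is reshaped
>> >>   ceph osd set norecover       # pause recovery as well
>> >>   # ... remove the hosts/disks from the CRUSH map here ...
>> >>   ceph osd unset nobackfill    # then let the one big relocation start
>> >>   ceph osd unset norecover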
>> >> On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
>> >> > Thx Irek. The number of replicas is 3.
>> >> >
>> >> > I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already decommissioned), which is further connected to a new 10G switch/network with 3 servers on it with 12 OSDs each. I'm decommissioning the old 3 nodes on the 1G network...
>> >> >
>> >> > So you suggest removing the whole node with its 2 OSDs manually from the CRUSH map? To my knowledge, Ceph never places 2 replicas on 1 node - all 3 replicas were originally distributed over all 3 nodes. So anyway, it should be safe to remove 2 OSDs at once together with the node itself, since the replica count is 3...?
>> >> >
>> >> > Thx again for your time
>> >> >
>> >> > On Mar 3, 2015 1:35 PM, "Irek Fasikhov" <malm...@gmail.com> wrote:
>> >> >> Since you have only three nodes in the cluster, I recommend you add the new nodes to the cluster first, and then delete the old ones.
>> >> >>
>> >> >> 2015-03-03 15:28 GMT+03:00 Irek Fasikhov <malm...@gmail.com>:
>> >> >>> What replication count do you have?
>> >> >>>
>> >> >>> 2015-03-03 15:14 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>> >> >>>> Hi Irek,
>> >> >>>>
>> >> >>>> yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data degraded and moved/recovered. It was only when I afterwards removed it from the CRUSH map with "ceph osd crush rm id" that the 37% happened.
>> >> >>>>
>> >> >>>> And thanks, Irek, for the help - could you kindly just let me know the preferred steps when removing a whole node? Do you mean I first stop all OSDs again, or just remove each OSD from the CRUSH map, or perhaps just decompile the CRUSH map, delete the node completely, compile it back in, and let it heal/recover?
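>> >> >>>> (If it's the decompile route, I assume it would look roughly like the following, with "cephnode1" just a placeholder for the host bucket:
>> >> >>>>
>> >> >>>>   ceph osd getcrushmap -o crushmap.bin        # export the current CRUSH map
>> >> >>>>   crushtool -d crushmap.bin -o crushmap.txt   # decompile it to plain text
>> >> >>>>   # edit crushmap.txt: delete the "cephnode1" host bucket and its reference under the root
>> >> >>>>   crushtool -c crushmap.txt -o crushmap.new   # compile the edited map
>> >> >>>>   ceph osd setcrushmap -i crushmap.new        # inject it back into the cluster
>> >> >>>>
>> >> >>>> ...or something along those lines?)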
>> >> >>>> Do you think this would result in less data being misplaced and moved around?
>> >> >>>>
>> >> >>>> Sorry for bugging you, I really appreciate your help.
>> >> >>>>
>> >> >>>> Thanks
>> >> >>>>
>> >> >>>> On 3 March 2015 at 12:58, Irek Fasikhov <malm...@gmail.com> wrote:
>> >> >>>>> A large percentage of the cluster map gets rebuilt (but with a low percentage of degradation). If you had not run "ceph osd crush rm id", the percentage would have been low. In your case, the correct option is to remove the entire node, rather than each disk individually.
>> >> >>>>>
>> >> >>>>> 2015-03-03 14:27 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>> >> >>>>>> Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so).
>> >> >>>>>>
>> >> >>>>>> Can anybody confirm this is normal behaviour - and are there any workarounds?
>> >> >>>>>>
>> >> >>>>>> I understand this is because of CEPH's object placement algorithm, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large.
>> >> >>>>>>
>> >> >>>>>> It does not seem good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially end up with 7 x the same number of misplaced objects...?
>> >> >>>>>>
>> >> >>>>>> Any thoughts?
>> >> >>>>>>
>> >> >>>>>> Thanks
>> >> >>>>>>
>> >> >>>>>> On 3 March 2015 at 12:14, Andrija Panic <andrija.pa...@gmail.com> wrote:
>> >> >>>>>>> Thanks Irek.
>> >> >>>>>>>
>> >> >>>>>>> Does this mean that after peering, for each PG there will be a delay of 10 sec - meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then recovery takes place for that PG, then for another 10 sec the cluster is fine, and then it is stressed again?
>> >> >>>>>>>
>> >> >>>>>>> I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com, but I don't fully understand the process).
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>> Andrija
>> >> >>>>>>>
>> >> >>>>>>> On 3 March 2015 at 11:32, Irek Fasikhov <malm...@gmail.com> wrote:
>> >> >>>>>>>> Hi.
>> >> >>>>>>>>
>> >> >>>>>>>> Use the value "osd_recovery_delay_start", for example:
>> >> >>>>>>>>
>> >> >>>>>>>>   [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
>> >> >>>>>>>>     "osd_recovery_delay_start": "10"
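>> >> >>>>>>>> It can be changed at runtime the same way as your other recovery settings, e.g.:
>> >> >>>>>>>>
>> >> >>>>>>>>   ceph tell osd.* injectargs '--osd_recovery_delay_start 10'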
>> >> >>>>>>>> 2015-03-03 13:13 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>> >> >>>>>>>>> HI Guys,
>> >> >>>>>>>>>
>> >> >>>>>>>>> Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map).
>> >> >>>>>>>>>
>> >> >>>>>>>>> I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then during the last 4h of the recovery this rate went down to, say, 100-200 MB/s. During that time VM performance was still pretty impacted, but at least I could work more or less.
>> >> >>>>>>>>>
>> >> >>>>>>>>> So my question: is this behaviour expected, and is the throttling working as expected here? During the first 1h there was almost no throttling applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general).
>> >> >>>>>>>>>
>> >> >>>>>>>>> I changed these throttling settings on the fly with:
>> >> >>>>>>>>>
>> >> >>>>>>>>>   ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>> >> >>>>>>>>>   ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>> >> >>>>>>>>>   ceph tell osd.* injectargs '--osd_max_backfills 1'
>> >> >>>>>>>>>
>> >> >>>>>>>>> My journals are on SSDs (12 OSDs per server, of which 6 journals are on one SSD and 6 journals on another SSD) - I have 3 of these hosts.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Any thoughts are welcome.

-- 

Andrija Panić
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com