Hi Robert,

it seems I have not listened well to your advice - I set the OSD to "out" instead of stopping it - and now, instead of the ~3% of degraded objects from before, there are 0.000% degraded and around 6% misplaced objects, and rebalancing is happening again, though only for a small percentage...

Do you know whether, later, when I remove this OSD from the CRUSH map, no more data will be rebalanced (as per the official Ceph documentation), since the misplaced objects are already being redistributed to all the other nodes? (After "service ceph stop osd.0" there was 2.45% degraded data, but for some reason no backfilling was happening... it just stayed degraded... which is why I started the OSD back up and then set it to out.)
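Just so I'm sure I have the remaining steps right, this is roughly what I plan to run once the rebalance settles (the usual manual OSD removal sequence from the docs, with osd.0 as the example):

    ceph osd out 0                  # already done - data is being remapped away
    service ceph stop osd.0         # stop the daemon once the cluster is back to active+clean
    ceph osd crush remove osd.0     # remove it from the CRUSH map
    ceph auth del osd.0             # remove its authentication key
    ceph osd rm 0                   # remove the OSD from the cluster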
Thanks

On 4 March 2015 at 17:54, Andrija Panic <andrija.pa...@gmail.com> wrote:
> Hi Robert,
>
> I already have this stuff set. Ceph is 0.87.0 now...
>
> Thanks, will schedule this for the weekend. With the 10G network and 36 OSDs it should move the data in less than 8h - per my last experience it was around 8h, but some 1G OSDs were included...
>
> Thx!
>
> On 4 March 2015 at 17:49, Robert LeBlanc <rob...@leblancnet.us> wrote:
>> You will most likely have a very high relocation percentage. Backfills are always more impactful on smaller clusters, but "osd max backfills" should be what you need to help reduce the impact. The default is 10; you will want to use 1.
>>
>> I didn't catch which version of Ceph you are running, but I think there was some priority work done in Firefly to help make backfills lower priority. I think it has gotten better in later versions.
>>
>> On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
>> > Thank you Robert - I'm wondering, when I remove the total of 7 OSDs from the CRUSH map, whether that will cause more than 37% of the data to be moved (80% or whatever).
>> >
>> > I'm also wondering whether the throttling that I applied is fine or not - I will introduce osd_recovery_delay_start = 10 sec as Irek said.
>> >
>> > I'm just wondering how much the performance impact will be, because:
>> > - when stopping the OSD, the impact while backfilling was fine, more or less - I can live with this
>> > - when I removed the OSD from the CRUSH map, the impact was tremendous for the first 1h or so, and later on during the recovery process the impact was much smaller, but still noticeable...
>> >
>> > Thanks for the tip of course!
>> > Andrija
>> >
>> > On 3 March 2015 at 18:34, Robert LeBlanc <rob...@leblancnet.us> wrote:
>> >> I would be inclined to shut down both OSDs in a node and let the cluster recover. Once it is recovered, shut down the next two and let it recover. Repeat until all the OSDs are taken out of the cluster. Then I would set nobackfill and norecover, remove the hosts/disks from the CRUSH map, and then unset nobackfill and norecover.
>> >>
>> >> That should give you a few small changes (when you shut down OSDs) and then one big one to get everything into its final place. If you are still adding new nodes, you can add them in while nobackfill and norecover are set, so that the one big relocation fills the new drives too.
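>> >> In command form that is roughly (same flags, set and unset cluster-wide):
>> >>
>> >>   ceph osd set nobackfill      # pause backfill while the CRUSH map is reshaped
>> >>   ceph osd set norecover       # pause recovery as well
>> >>   # ... remove the hosts/disks from the CRUSH map here ...
>> >>   ceph osd unset nobackfill    # then let the one big relocation start
>> >>   ceph osd unset norecover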
>> >> On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
>> >> > Thx Irek. The number of replicas is 3.
>> >> >
>> >> > I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already decommissioned), which is further connected to a new 10G switch/network with 3 servers on it with 12 OSDs each. I'm decommissioning the old 3 nodes on the 1G network...
>> >> >
>> >> > So you suggest removing the whole node with its 2 OSDs manually from the CRUSH map? To my knowledge, Ceph never places 2 replicas on 1 node - all 3 replicas were originally distributed over all 3 nodes. So anyway, it should be safe to remove 2 OSDs at once together with the node itself, since the replica count is 3...?
>> >> >
>> >> > Thx again for your time
>> >> >
>> >> > On Mar 3, 2015 1:35 PM, "Irek Fasikhov" <malm...@gmail.com> wrote:
>> >> >> Since you have only three nodes in the cluster, I recommend you add the new nodes to the cluster first, and then delete the old ones.
>> >> >>
>> >> >> 2015-03-03 15:28 GMT+03:00 Irek Fasikhov <malm...@gmail.com>:
>> >> >>> What replication count do you have?
>> >> >>>
>> >> >>> 2015-03-03 15:14 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>> >> >>>> Hi Irek,
>> >> >>>>
>> >> >>>> yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data degraded and moved/recovered. It was only when I afterwards removed it from the CRUSH map with "ceph osd crush rm id" that the 37% happened.
>> >> >>>>
>> >> >>>> And thanks, Irek, for the help - could you kindly just let me know the preferred steps when removing a whole node? Do you mean I first stop all OSDs again, or just remove each OSD from the CRUSH map, or perhaps just decompile the CRUSH map, delete the node completely, compile it back in, and let it heal/recover?
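>> >> >>>> (If it's the decompile route, I assume it would look roughly like the following, with "cephnode1" just a placeholder for the host bucket:
>> >> >>>>
>> >> >>>>   ceph osd getcrushmap -o crushmap.bin        # export the current CRUSH map
>> >> >>>>   crushtool -d crushmap.bin -o crushmap.txt   # decompile it to plain text
>> >> >>>>   # edit crushmap.txt: delete the "cephnode1" host bucket and its reference under the root
>> >> >>>>   crushtool -c crushmap.txt -o crushmap.new   # compile the edited map
>> >> >>>>   ceph osd setcrushmap -i crushmap.new        # inject it back into the cluster
>> >> >>>>
>> >> >>>> ...or something along those lines?)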
>> >> >>>> Do you think this would result in less data being misplaced and moved around?
>> >> >>>>
>> >> >>>> Sorry for bugging you, I really appreciate your help.
>> >> >>>>
>> >> >>>> Thanks
>> >> >>>>
>> >> >>>> On 3 March 2015 at 12:58, Irek Fasikhov <malm...@gmail.com> wrote:
>> >> >>>>> A large percentage of the cluster map gets rebuilt (but with a low percentage of degradation). If you had not run "ceph osd crush rm id", the percentage would have been low. In your case, the correct option is to remove the entire node, rather than each disk individually.
>> >> >>>>>
>> >> >>>>> 2015-03-03 14:27 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>> >> >>>>>> Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so).
>> >> >>>>>>
>> >> >>>>>> Can anybody confirm this is normal behaviour - and are there any workarounds?
>> >> >>>>>>
>> >> >>>>>> I understand this is because of CEPH's object placement algorithm, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large.
>> >> >>>>>>
>> >> >>>>>> It does not seem good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially end up with 7 x the same number of misplaced objects...?
>> >> >>>>>>
>> >> >>>>>> Any thoughts?
>> >> >>>>>>
>> >> >>>>>> Thanks
>> >> >>>>>>
>> >> >>>>>> On 3 March 2015 at 12:14, Andrija Panic <andrija.pa...@gmail.com> wrote:
>> >> >>>>>>> Thanks Irek.
>> >> >>>>>>>
>> >> >>>>>>> Does this mean that after peering, for each PG there will be a delay of 10 sec - meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then recovery takes place for that PG, then for another 10 sec the cluster is fine, and then it is stressed again?
>> >> >>>>>>>
>> >> >>>>>>> I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com, but I don't fully understand the process).
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>> Andrija
>> >> >>>>>>>
>> >> >>>>>>> On 3 March 2015 at 11:32, Irek Fasikhov <malm...@gmail.com> wrote:
>> >> >>>>>>>> Hi.
>> >> >>>>>>>>
>> >> >>>>>>>> Use the value "osd_recovery_delay_start", for example:
>> >> >>>>>>>>
>> >> >>>>>>>>   [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
>> >> >>>>>>>>     "osd_recovery_delay_start": "10"
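>> >> >>>>>>>> It can be changed at runtime the same way as your other recovery settings, e.g.:
>> >> >>>>>>>>
>> >> >>>>>>>>   ceph tell osd.* injectargs '--osd_recovery_delay_start 10'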
>> >> >>>>>>>> 2015-03-03 13:13 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>> >> >>>>>>>>> HI Guys,
>> >> >>>>>>>>>
>> >> >>>>>>>>> Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map).
>> >> >>>>>>>>>
>> >> >>>>>>>>> I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then during the last 4h of the recovery this rate went down to, say, 100-200 MB/s. During that time VM performance was still pretty impacted, but at least I could work more or less.
>> >> >>>>>>>>>
>> >> >>>>>>>>> So my question: is this behaviour expected, and is the throttling working as expected here? During the first 1h there was almost no throttling applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general).
>> >> >>>>>>>>>
>> >> >>>>>>>>> I changed these throttling settings on the fly with:
>> >> >>>>>>>>>
>> >> >>>>>>>>>   ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>> >> >>>>>>>>>   ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>> >> >>>>>>>>>   ceph tell osd.* injectargs '--osd_max_backfills 1'
>> >> >>>>>>>>>
>> >> >>>>>>>>> My journals are on SSDs (12 OSDs per server, of which 6 journals are on one SSD and 6 journals on another SSD) - I have 3 of these hosts.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Any thoughts are welcome.

-- 

Andrija Panić
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com