Thanks Paul. Coming back to my question: is it a good idea to add SSD journals for the HDD OSDs on a new node, in an existing cluster where the current OSDs have their journals on the same HDDs?
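For reference, a minimal sketch of how a new FileStore OSD with its journal on a separate SSD could be created on Jewel with the ceph-disk tooling of that era; the device names and the 50 GB journal size are placeholders based on the plan quoted below, not commands taken from this cluster:

    # ceph.conf on the new node, so the journal partition carved out of
    # the SSD is 50 GB (osd journal size is in MB)
    [osd]
    osd journal size = 51200

    # prepare and activate one OSD: data on the 7 TB HDD, journal on the SSD
    # (/dev/sdX and /dev/sdY are placeholder device names)
    ceph-disk prepare /dev/sdX /dev/sdY
    ceph-disk activate /dev/sdX1

ceph-disk creates a fresh journal partition on the SSD for every OSD prepared this way, so one SSD would typically be shared by several HDD OSDs.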
On Sun, Apr 28, 2019 at 2:49 PM Paul Emmerich <paul.emmer...@croit.io> wrote:
> Looks like you got lots of tiny objects. By default the recovery speed
> on HDDs is limited to 10 objects per second (40 with DB on an SSD) per
> thread.
>
> Decrease osd_recovery_sleep_hdd (default 0.1) to increase
> recovery/backfill speed.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Sun, Apr 28, 2019 at 6:57 AM Nikhil R <nikh.ravin...@gmail.com> wrote:
> >
> > Hi,
> > I have set noout, noscrub and nodeep-scrub, and the last time we added
> > osd's we added a few at a time.
> > The main issue here is with IOPS: the existing osd's are not able to
> > backfill at a higher rate - not even 1 thread during peak hours and a
> > max of 2 threads during off-peak. We are getting more client i/o, and
> > the documents ingested are more than the space being freed up by
> > backfilling pg's to the newly added osd's.
> > Below is our cluster health:
> >
> >   health HEALTH_WARN
> >          5221 pgs backfill_wait
> >          31 pgs backfilling
> >          1453 pgs degraded
> >          4 pgs recovering
> >          1054 pgs recovery_wait
> >          1453 pgs stuck degraded
> >          6310 pgs stuck unclean
> >          384 pgs stuck undersized
> >          384 pgs undersized
> >          recovery 130823732/9142530156 objects degraded (1.431%)
> >          recovery 2446840943/9142530156 objects misplaced (26.763%)
> >          noout,nobackfill,noscrub,nodeep-scrub flag(s) set
> >          mon.mon_1 store is getting too big! 26562 MB >= 15360 MB
> >          mon.mon_2 store is getting too big! 26828 MB >= 15360 MB
> >          mon.mon_3 store is getting too big! 26504 MB >= 15360 MB
> >   monmap e1: 3 mons at {mon_1=x.x.x.x:x.yyyy/0,mon_2=x.x.x.x:yyyy/0,mon_3=x.x.x.x:yyyy/0}
> >          election epoch 7996, quorum 0,1,2 mon_1,mon_2,mon_3
> >   osdmap e194833: 105 osds: 105 up, 105 in; 5931 remapped pgs
> >          flags noout,nobackfill,noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
> >   pgmap v48390703: 10536 pgs, 18 pools, 144 TB data, 2906 Mobjects
> >          475 TB used, 287 TB / 763 TB avail
> >          130823732/9142530156 objects degraded (1.431%)
> >          2446840943/9142530156 objects misplaced (26.763%)
> >              4851 active+remapped+wait_backfill
> >              4226 active+clean
> >               659 active+recovery_wait+degraded+remapped
> >               377 active+recovery_wait+degraded
> >               357 active+undersized+degraded+remapped+wait_backfill
> >                18 active+recovery_wait+undersized+degraded+remapped
> >                16 active+degraded+remapped+backfilling
> >                13 active+degraded+remapped+wait_backfill
> >                 9 active+undersized+degraded+remapped+backfilling
> >                 6 active+remapped+backfilling
> >                 2 active+recovering+degraded
> >                 2 active+recovering+degraded+remapped
> >   client io 11894 kB/s rd, 105 kB/s wr, 981 op/s rd, 72 op/s wr
> >
> > So, is it a good option to add new osd's on a new node with ssd's as
> > journals?
> >
> > in.linkedin.com/in/nikhilravindra
> >
> > On Sun, Apr 28, 2019 at 6:05 AM Erik McCormick <emccorm...@cirrusseven.com> wrote:
> >>
> >> On Sat, Apr 27, 2019, 3:49 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
> >>>
> >>> We have baremetal nodes with 256GB RAM and 36-core CPUs.
> >>> We are on ceph jewel 10.2.9 with leveldb.
> >>> The osd’s and journals are on the same hdd.
> >>> We have 1 backfill_max_active, 1 recovery_max_active and 1 recovery_op_priority.
> >>> The osd crashes and restarts once a pg is backfilled and the next pg
> >>> tries to backfill. This is when we see in iostat that the disk is
> >>> utilised up to 100%.
> >>
> >> I would set noout to prevent excess movement in the event of OSD
> >> flapping, and disable scrubbing and deep scrubbing until your
> >> backfilling has completed. I would also bring the new OSDs online a
> >> few at a time rather than all 25 at once if you add more servers.
> >>
> >>> Appreciate your help David
> >>>
> >>> On Sun, 28 Apr 2019 at 00:46, David C <dcsysengin...@gmail.com> wrote:
> >>>>
> >>>> On Sat, 27 Apr 2019, 18:50 Nikhil R, <nikh.ravin...@gmail.com> wrote:
> >>>>>
> >>>>> Guys,
> >>>>> We now have a total of 105 osd's on 5 baremetal nodes, each hosting
> >>>>> 21 osd's on 7TB HDDs with journals on HDD too. Each journal is
> >>>>> about 5GB.
> >>>>
> >>>> This would imply you've got a separate hdd partition for journals. I
> >>>> don't think there's any value in that, and it would probably be
> >>>> detrimental to performance.
> >>>>>
> >>>>> We expanded our cluster last week and added 1 more node with 21 HDDs
> >>>>> and journals on the same disks.
> >>>>> Our client i/o is too heavy and we are not able to backfill even 1
> >>>>> thread during peak hours - in case we backfill during peak hours,
> >>>>> osd's crash, causing undersized pg's, and if we have another osd
> >>>>> crash we won't be able to use our cluster due to undersized and
> >>>>> recovery pg's. During non-peak we can only backfill 8-10 pgs.
> >>>>> Due to this our MAX AVAIL is draining out very fast.
> >>>>
> >>>> How much RAM have you got in your nodes? In my experience that's a
> >>>> common reason for crashing OSDs during recovery ops.
> >>>>
> >>>> What does your recovery and backfill tuning look like?
> >>>>
> >>>>> We are thinking of adding 2 more baremetal nodes with 21 * 7TB osd's
> >>>>> on HDD and adding 50GB SSD journals for these.
> >>>>> We aim to backfill from the 105 osd's a bit faster and expect writes
> >>>>> of backfills coming to these osd's faster.
> >>>>
> >>>> SSD journals would certainly help, just be sure it's a model that
> >>>> performs well with Ceph.
> >>>>>
> >>>>> Is this a good, viable idea?
> >>>>> Thoughts please?
> >>>>
> >>>> I'd recommend sharing more detail, e.g. full spec of the nodes, Ceph
> >>>> version etc.
> >>>>>
> >>>>> -Nikhil
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list
> >>>>> ceph-users@lists.ceph.com
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>> --
> >>> Sent from my iPhone
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
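A rough sketch of how Paul's osd_recovery_sleep_hdd suggestion above could be applied, first at runtime and then persisted in ceph.conf. This is an untested example: the hdd-specific variant of the option may not exist on Jewel 10.2.9, where the generic osd_recovery_sleep may be the knob to check instead, and 0.05 is only an illustrative value.

    # apply at runtime to all OSDs (lower sleep = faster recovery, more client impact)
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.05'

    # persist the change across OSD restarts
    [osd]
    osd_recovery_sleep_hdd = 0.05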
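For completeness, the flags and throttles discussed in the thread map to commands roughly like the following (standard Ceph CLI; "backfill_max_active" above presumably refers to osd_max_backfills):

    # keep flapping OSDs from being marked out, and pause scrubbing
    # while backfilling catches up
    ceph osd set noout
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # the conservative backfill/recovery throttles quoted above, applied at runtime
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'

    # remember to unset the scrub flags once backfilling has completed
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub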
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com