Hi David,

We have a lot of space in our Ceph cluster. Two OSDs (266 and 500) went down
earlier due to a hardware issue, and we never got a chance to fix them.
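For reference, a rough way to confirm which OSDs are down and what CRUSH
weight they carry (assuming an admin node with the client.admin keyring):

    ceph osd tree | grep -i down    # list down OSDs with their CRUSH weights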
Here is the output of ceph df:

GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    1101T     701T      400T        36.37
POOLS:
    NAME                   ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd                    0      0          0         159T          0
    .rgw.root              3      780        0         159T          3
    .rgw.control           4      0          0         159T          8
    .rgw.gc                5      0          0         159T          35
    .users.uid             6      6037       0         159T          32
    images                 7      16462G     4.38      159T          2660844
    .rgw                   10     820        0         159T          4
    volumes                11     106T       28.91     159T          28011837
    compute                12     11327G     3.01      159T          1467722
    backups                15     0          0         159T          0
    .rgw.buckets.index     16     0          0         159T          2
    .rgw.buckets           17     0          0         159T          0

Thanks,
Pardhiv K

On Fri, May 11, 2018 at 7:14 PM, David Turner <drakonst...@gmail.com> wrote:

> What's your `ceph osd tree`, `ceph df`, `ceph osd df`? It sounds like you
> just have a fairly full cluster whose crush weights you haven't balanced.
>
> On Fri, May 11, 2018, 10:06 PM Pardhiv Karri <meher4in...@gmail.com> wrote:
>
>> Hi David,
>>
>> Thanks for the reply. Yes, we are seeing that 0.0001 usage on pretty
>> much all OSDs. But this node is different: whether the weight is full
>> or just 0.2, the utilization of OSD 611 starts increasing.
>>
>> --Pardhiv K
>>
>> On Fri, May 11, 2018 at 10:50 AM, David Turner <drakonst...@gmail.com> wrote:
>>
>>> There was a time in the history of Ceph when a weight of 0.0 was not
>>> always what you thought it was. People had better experiences with
>>> crush weights of something like 0.0001 instead. This is just a memory
>>> tickling in the back of my mind from things I've read on the ML years
>>> back.
>>>
>>> On Fri, May 11, 2018 at 1:26 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>>>
>>>> > We have a large 1 PB Ceph cluster. We recently added six nodes with
>>>> > sixteen 2 TB disks each. Five of the nodes rebalanced without any
>>>> > issues, but the OSDs on the sixth/last node started acting weird:
>>>> > as I increase the weight of one OSD, its utilization doesn't
>>>> > change, but the utilization of a different OSD on the same node
>>>> > increases instead. Rebalancing completes fine, but the utilization
>>>> > is not right.
>>>> >
>>>> > I increased the weight of OSD 610 from 0.0 to 0.2, but the
>>>> > utilization of OSD 611 started increasing even though its weight is
>>>> > still 0.0. If I increase the weight of OSD 611 to 0.2, its
>>>> > utilization grows to what it would be at a weight of 0.4. And if I
>>>> > increase the weights of OSDs 610 and 615 to their full value,
>>>> > utilization on OSD 610 stays at 1% while OSD 611 inches toward
>>>> > 100%, at which point I had to stop and set the OSDs' crush weights
>>>> > back to 0.0 to avoid any impact on the cluster. It's not just one
>>>> > OSD; it happens with different OSDs on that one node. The only
>>>> > correlation I found is that the journal partitions of OSDs 610 and
>>>> > 611 are on the same SSD, while all the OSDs are SAS drives. Any
>>>> > help on how to debug or resolve this would be appreciated.
>>>>
>>>> You didn't say which version of Ceph you are using, but based on the
>>>> output of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?)
>>>> cluster.
>>>>
>>>> I've found that data placement can be a little weird when one node
>>>> has really low CRUSH weights (0.2) while the other nodes have large
>>>> CRUSH weights (2.0). I've had a single OSD in a node receive almost
>>>> all the data, and it wasn't until I increased the weights to be more
>>>> in line with the rest of the cluster that it evened back out.
>>>>
>>>> I believe this can also be caused by not having enough PGs in your
>>>> cluster, or by the PGs you do have not being distributed correctly
>>>> based on the data usage in each pool.
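As a quick way to check that, the per-pool PG counts can be listed with
something like the following (pool names taken from the ceph df output
above):

    ceph osd pool get volumes pg_num    # pg_num for the busiest pool
    ceph osd dump | grep pg_num         # pg_num/pgp_num for every pool at once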
>>>> Have you used https://ceph.com/pgcalc/ to determine the correct
>>>> number of PGs you should have per pool?
>>>>
>>>> Since you are likely running a pre-Jewel cluster, it could also be
>>>> that you haven't switched your tunables to use the straw2 data
>>>> placement algorithm:
>>>>
>>>> http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4
>>>>
>>>> That should help as well. Once that's enabled you can convert your
>>>> existing buckets to straw2. Just be careful you don't have any old
>>>> clients connecting to your cluster that don't support that feature
>>>> yet.
>>>>
>>>> Bryan

--
Pardhiv Karri
"Rise and Rise again until LAMBS become LIONS"
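For what it's worth, a minimal sketch of the straw-to-straw2 conversion Bryan
describes, assuming an admin node with crushtool installed (on a pre-Luminous
cluster the buckets have to be edited in the decompiled map; this triggers
data movement, so try it on a test cluster first):

    ceph osd crush show-tunables                    # check the current tunables profile
    ceph osd crush tunables hammer                  # hammer profile enables straw2 support
    ceph osd getcrushmap -o crushmap.bin            # dump the binary CRUSH map
    crushtool -d crushmap.bin -o crushmap.txt       # decompile to editable text
    sed -i 's/alg straw$/alg straw2/' crushmap.txt  # switch every straw bucket to straw2
    crushtool -c crushmap.txt -o crushmap.new       # recompile
    ceph osd setcrushmap -i crushmap.new            # inject; expect some rebalancing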
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com