What's your `ceph osd tree`, `ceph df`, and `ceph osd df` output? It sounds like you just have a fairly full cluster whose crush weights you haven't balanced.
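As a starting point, something like this (run from any mon/admin node; 'osd.610' and the weight below are placeholders, substitute your own OSD IDs and per-disk weights):

    # CRUSH hierarchy with per-OSD crush weights
    ceph osd tree

    # cluster- and pool-level usage
    ceph df

    # per-OSD weight, reweight, and utilization
    ceph osd df

    # adjust a single OSD's crush weight (placeholder ID and weight;
    # ~1.819 is a typical weight for a 2 TB disk, i.e. its size in TiB)
    ceph osd crush reweight osd.610 1.819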
On Fri, May 11, 2018, 10:06 PM Pardhiv Karri <meher4in...@gmail.com> wrote:

> Hi David,
>
> Thanks for the reply. Yeah, we are seeing that 0.0001 usage on pretty much
> all OSDs. But this node is different: whether I set OSD 611 to full weight
> or just 0.2, its utilization starts increasing.
>
> --Pardhiv K
>
> On Fri, May 11, 2018 at 10:50 AM, David Turner <drakonst...@gmail.com> wrote:
>
>> There was a time in the history of Ceph when a weight of 0.0 was not
>> always what you thought. People had better experiences with crush weights
>> of something like 0.0001 instead. This is just a memory tickling in the
>> back of my mind from things I've read on the ML years back.
>>
>> On Fri, May 11, 2018 at 1:26 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>>
>>> > We have a large 1 PB Ceph cluster. We recently added 6 nodes with 16
>>> > 2 TB disks each to the cluster. The first 5 nodes rebalanced well
>>> > without any issues, but the OSDs on the sixth/last node started acting
>>> > weird: as I increase the weight of one OSD, its utilization doesn't
>>> > change, but the utilization of a different OSD on the same node
>>> > increases. The rebalance completes fine, but the utilization is not
>>> > right.
>>> >
>>> > I increased the weight of OSD 610 from 0.0 to 0.2, but the utilization
>>> > of OSD 611 started increasing even though its weight is 0.0. If I
>>> > increase the weight of OSD 611 to 0.2, its overall utilization grows to
>>> > what it would be at a weight of 0.4. When I increased the weights of
>>> > OSD 610 and 615 to their full values, utilization on OSD 610 stayed at
>>> > 1% while OSD 611 inched toward 100%, and I had to stop and set the
>>> > OSDs' crush weights back to 0.0 to avoid any implications for the
>>> > cluster. It's not just one OSD but different OSDs on that one node.
>>> > The only correlation I found is that the journal partitions of OSDs
>>> > 610 and 611 are on the same SSD drive; all the OSDs are SAS drives.
>>> > Any help on how to debug or resolve this would be appreciated.
>>>
>>> You didn't say which version of Ceph you're using, but based on the
>>> output of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?)
>>> cluster.
>>>
>>> I've found that data placement can be a little weird when you have
>>> really low CRUSH weights (0.2) on one of the nodes while the other nodes
>>> have large CRUSH weights (2.0). I've had a single OSD in a node get
>>> almost all the data. It wasn't until I increased the weights to be more
>>> in line with the rest of the cluster that it evened back out.
>>>
>>> I believe this can also be caused by not having enough PGs in your
>>> cluster, or by the PGs you do have not being distributed correctly based
>>> on the data usage in each pool. Have you used https://ceph.com/pgcalc/
>>> to determine the correct number of PGs you should have per pool?
>>>
>>> Since you are likely running a pre-Jewel cluster, it could also be that
>>> you haven't switched your tunables to use the straw2 data placement
>>> algorithm:
>>>
>>> http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4
>>>
>>> That should help as well. Once that's enabled you can convert your
>>> existing buckets to straw2. Just be careful that you don't have any old
>>> clients connecting to your cluster that don't support that feature yet.
>>>
>>> Bryan
>
> --
> *Pardhiv Karri*
> "Rise and Rise again until LAMBS become LIONS"
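For reference, Bryan's point about PG counts above can be checked with
something like this ('rbd' is just a placeholder pool name):

    # list every pool with its pg_num / pgp_num settings
    ceph osd dump | grep '^pool'

    # or query one pool directly (replace 'rbd' with your pool)
    ceph osd pool get rbd pg_num

Compare what you see against the per-pool numbers pgcalc suggests before
touching anything else.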
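And the straw2 switch he mentions would look roughly like this on a
Hammer-era cluster. This is only a sketch: the filenames are placeholders,
the tunables change will trigger data movement, and (per Bryan's warning)
every client needs to be new enough to understand straw2 first:

    # enable the hammer tunables profile (adds straw2 support)
    ceph osd crush tunables hammer

    # decompile the CRUSH map, flip buckets from straw to straw2, reinject
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    sed -i 's/alg straw$/alg straw2/' crushmap.txt
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin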
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com