> We have a large 1PB Ceph cluster. We recently added 6 nodes, each with 16
> 2TB disks, to the cluster. The first 5 nodes rebalanced without any issues,
> but the OSDs on the sixth/last node started acting weird: as I increase the
> weight of one OSD, its utilization doesn't change, but the utilization of a
> different OSD on the same node increases instead. The rebalance completes
> fine, but the utilization is not right.
>
> I increased the weight of OSD 610 from 0.0 to 0.2, but the utilization of
> OSD 611 started increasing even though its weight was still 0.0. If I then
> increase the weight of OSD 611 to 0.2, its utilization grows to what it
> would be at a weight of 0.4. So when I increased the weights of OSDs 610
> and 615 to their full values, utilization on OSD 610 sat at 1% while OSD
> 611 was inching toward 100%, at which point I had to drop the OSDs' CRUSH
> weights back to 0.0 to avoid any impact on the cluster. It's not just one
> OSD; it happens with different OSDs on that one node. The only correlation
> I've found is that the journal partitions for OSDs 610 and 611 are on the
> same SSD drive, and all the OSDs are SAS drives. Any help on how to debug
> or resolve this would be appreciated.

You didn't say which version of Ceph you're running, but based on the output
of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?) cluster.
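
If you want to confirm, something like this should work even on older
releases (osd.610 is just the ID from your example):

    # version of the ceph CLI/package on the host you run it from
    ceph --version

    # version a specific running daemon reports
    ceph tell osd.610 version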

I've found that data placement can be a little weird when you have really
low CRUSH weights (0.2) on one node while the other nodes have large CRUSH
weights (2.0).  I've seen a single OSD in a node get almost all the data.
It wasn't until I increased the weights to be in line with the rest of the
cluster that things evened back out.
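
Something along these lines, as a sketch -- the 1.8 here is an assumption
based on your 2TB drives (size in TiB is the usual convention), so match
whatever your other nodes show:

    # bring the OSD straight up to a weight in line with the other nodes
    ceph osd crush reweight osd.610 1.8

    # then watch where the data actually lands
    ceph osd df tree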

I believe this can also be caused by not having enough PGs in your cluster,
or by the PGs you do have not being distributed correctly based on the data
usage in each pool.  Have you used https://ceph.com/pgcalc/ to determine the
correct number of PGs you should have per pool?
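
Roughly, pgcalc's rule of thumb is (target PGs per OSD, usually 100) * (OSD
count) * (%data in the pool) / (replica count), rounded to a power of two.
To check and raise a pool's PG count (pool name and value here are
placeholders -- and note pg_num can only ever be increased):

    ceph osd pool get <pool> pg_num
    ceph osd pool set <pool> pg_num <new_value>
    ceph osd pool set <pool> pgp_num <new_value>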

Since you're likely running a pre-Jewel cluster, it could also be that you
haven't switched your tunables to use the straw2 data placement algorithm:

http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4

That should help as well.  Once that's enabled you can convert your existing
buckets to straw2.  Just be careful that you don't have any old clients
connecting to your cluster that don't support that feature yet.
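
The conversion itself is done by editing the CRUSH map; a sketch (expect
some data movement when you inject the new map):

    # enable the hammer tunables profile, which allows straw2 buckets
    ceph osd crush tunables hammer

    # pull down the CRUSH map, switch every bucket from straw to straw2,
    # recompile, and inject it back
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    sed -i 's/alg straw$/alg straw2/' crushmap.txt
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin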

Bryan
