Hi,

We have six storage nodes and added three new SSD-only storage nodes. I started increasing the CRUSH weight to fill the freshly added OSDs on the new storage nodes; the command was:

ceph osd crush reweight osd.126 0.2

The cluster started rebalancing:

2019-05-22 11:00:00.000253 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4607699 : cluster [INF] overall HEALTH_OK
2019-05-22 12:00:00.000175 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4608927 : cluster [INF] overall HEALTH_OK
2019-05-22 13:00:00.000216 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4610174 : cluster [INF] overall HEALTH_OK
2019-05-22 13:44:57.353665 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611095 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
2019-05-22 13:44:58.642328 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611097 : cluster [WRN] Health check failed: 68628/33693246 objects misplaced (0.204%) (OBJECT_MISPLACED)
2019-05-22 13:45:02.696121 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611098 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 5 pgs peering)
2019-05-22 13:45:04.733172 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611099 : cluster [WRN] Health check update: 694611/33693423 objects misplaced (2.062%) (OBJECT_MISPLACED)
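For reference, the way I was watching the rebalance was roughly the following standard Luminous commands (a sketch, not an exact transcript of what I ran):

# per-OSD utilization together with the CRUSH weights in one view
ceph osd df tree

# overall rebalance progress (misplaced object count, recovery rate)
ceph -s

# optionally throttle the data movement while it runs
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'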
To my knowledge this should have placed roughly 200 GB on osd.126 and then stopped rebalancing, but in this case the disk on ssdstor-a01 filled to over 85% and the filling of this disk/OSD didn't stop:

[root@ssdstor-a01 ~]# df -h | grep ceph
/dev/sdc1       1.8T  1.6T  237G  88% /var/lib/ceph/osd/ceph-126
/dev/sdd1       1.8T  136G  1.7T   8% /var/lib/ceph/osd/ceph-127
/dev/sde1       1.8T   99G  1.8T   6% /var/lib/ceph/osd/ceph-128
/dev/sdf1       1.8T  121G  1.7T   7% /var/lib/ceph/osd/ceph-129
/dev/sdg1       1.8T   98G  1.8T   6% /var/lib/ceph/osd/ceph-130
/dev/sdh1       1.8T   38G  1.8T   3% /var/lib/ceph/osd/ceph-131

Then I changed the weight back to 0.1, but the cluster seemed to behave unstably, so I also set the weight of the other new OSDs to 0.1 (to spread the load and the available disk space among the new disks). Then the situation repeated on the other OSDs.

Weights for osds 132-137:

-55        0.59995     host ssdstor-b01
132   ssd  0.09999         osd.132      up  1.00000 1.00000
133   ssd  0.09999         osd.133      up  1.00000 1.00000
134   ssd  0.09999         osd.134      up  1.00000 1.00000
135   ssd  0.09999         osd.135      up  1.00000 1.00000
136   ssd  0.09999         osd.136      up  1.00000 1.00000
137   ssd  0.09999         osd.137      up  1.00000 1.00000

and on the physical server:

root@ssdstor-b01:~# df -h | grep ceph
/dev/sdc1       1.8T  642G  1.2T  35% /var/lib/ceph/osd/ceph-132
/dev/sdd1       1.8T  342G  1.5T  19% /var/lib/ceph/osd/ceph-133
/dev/sde1       1.8T  285G  1.6T  16% /var/lib/ceph/osd/ceph-134
/dev/sdf1       1.8T  114G  1.7T   7% /var/lib/ceph/osd/ceph-135
/dev/sdg1       1.8T  215G  1.6T  12% /var/lib/ceph/osd/ceph-136
/dev/sdh1       1.8T  101G  1.8T   6% /var/lib/ceph/osd/ceph-137

I kept changing the OSD weights all evening and into the night, and in the end I found weights that stabilized the replication:

-54        0.11993     host ssdstor-a01
126   ssd  0.01999         osd.126      up  1.00000 1.00000
127   ssd  0.01999         osd.127      up  0.96999 1.00000
128   ssd  0.01999         osd.128      up  1.00000 1.00000
129   ssd  0.01999         osd.129      up  1.00000 1.00000
130   ssd  0.01999         osd.130      up  1.00000 1.00000
131   ssd  0.01999         osd.131      up  1.00000 1.00000
--
-55        0.26993     host ssdstor-b01
132   ssd  0.01999         osd.132      up  1.00000 1.00000
133   ssd  0.04999         osd.133      up  1.00000 1.00000
134   ssd  0.04999         osd.134      up  1.00000 1.00000
135   ssd  0.04999         osd.135      up  1.00000 1.00000
136   ssd  0.04999         osd.136      up  1.00000 1.00000
137   ssd  0.04999         osd.137      up  1.00000 1.00000
--
-56        0.29993     host ssdstor-c01
138   ssd  0.04999         osd.138      up  1.00000 1.00000
139   ssd  0.04999         osd.139      up  1.00000 1.00000
140   ssd  0.04999         osd.140      up  1.00000 1.00000
141   ssd  0.04999         osd.141      up  1.00000 1.00000
142   ssd  0.04999         osd.142      up  1.00000 1.00000
143   ssd  0.04999         osd.143      up  1.00000 1.00000

I also changed the reweight on osd.127, to spread the data among the OSDs on the same storage node, as you can see in the output above:

ceph osd reweight osd.127 0.97

Ceph version:

# ceph versions
{
    "mon": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
    },
    "rgw": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
    },
    "overall": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
    }
}

The question is: is there some problem/bug with CRUSH balancing, or am I missing some setting?

--
Regards,
Lukasz