I may be wrong, but I always thought that a weight of 0 means don't put 
anything there, while all weights > 0 are considered proportionally.
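A toy sketch of that behaviour (this is NOT CRUSH's actual algorithm, just a 
weighted-pick illustration using the weights from the tree further down):

```python
import random

# Hypothetical illustration: pick a device with probability proportional
# to its weight. A device with weight 0 is never selected, which is why
# data flees the zero-weight OSDs.
weights = {"osd.0": 0.0, "osd.1": 0.0, "osd.2": 0.009995, "osd.3": 0.009995}

def place(obj_id: int) -> str:
    """Choose an OSD for an object, proportionally to weight."""
    rng = random.Random(obj_id)  # deterministic per object, hash-like
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

counts = {n: 0 for n in weights}
for i in range(10_000):
    counts[place(i)] += 1

print(counts)  # osd.0 and osd.1 stay at 0; data splits across osd.2/osd.3
```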

See http://ceph.com/docs/master/rados/operations/crush-map/ which recommends 
higher weights anyway:

Weighting Bucket Items

Ceph expresses bucket weights as double integers, which allows for fine 
weighting. A weight is the relative difference between device capacities. We 
recommend using 1.00 as the relative weight for a 1TB storage device. In such a 
scenario, a weight of 0.5 would represent approximately 500GB, and a weight of 
3.00 would represent approximately 3TB. Higher level buckets have a weight that 
is the sum total of the leaf items aggregated by the bucket.

A bucket item weight is one dimensional, but you may also calculate your item 
weights to reflect the performance of the storage drive. For example, if you 
have many 1TB drives where some have relatively low data transfer rate and the 
others have a relatively high data transfer rate, you may weight them 
differently, even though they have the same capacity (e.g., a weight of 0.80 
for the first set of drives with lower total throughput, and 1.20 for the 
second set of drives with higher total throughput).


David Zafman
Senior Developer
http://www.inktank.com




On Oct 16, 2013, at 8:15 PM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> 
wrote:

> I stumbled across this today:
> 
> 4 osds on 4 hosts (names ceph1 -> ceph4). They are KVM guests (this is a play 
> setup).
> 
> - ceph1 and ceph2 each have a 5G volume for osd data (+ 2G vol for journal)
> - ceph3 and ceph4 each have a 10G volume for osd data (+ 2G vol for journal)
> 
> I do a standard installation via ceph-deploy (1.2.7) of ceph (0.67.4) on each 
> one [1]. The topology looks like:
> 
> $ ceph osd tree
> # id    weight    type name    up/down    reweight
> -1    0.01999    root default
> -2    0        host ceph1
> 0    0            osd.0    up    1
> -3    0        host ceph2
> 1    0            osd.1    up    1
> -4    0.009995        host ceph3
> 2    0.009995            osd.2    up    1
> -5    0.009995        host ceph4
> 3    0.009995            osd.3    up    1
> 
> So osd.0 and osd.1 (on ceph1,2) have weight 0, and osd.2 and osd.3 (on 
> ceph3,4) have weight 0.009995. This suggests that data will flee osd.0,1 and 
> live only on osd.2,3. Sure enough, putting in a few objects via rados put 
> results in:
> 
> ceph1 $ df -m
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/vda1           5038  2508      2275  53% /
> udev                 994     1       994   1% /dev
> tmpfs                401     1       401   1% /run
> none                   5     0         5   0% /run/lock
> none                1002     0      1002   0% /run/shm
> /dev/vdb1           5109    40      5070   1% /var/lib/ceph/osd/ceph-0
> 
> (similarly for ceph2), whereas:
> 
> ceph3 $df -m
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/vda1           5038  2405      2377  51% /
> udev                 994     1       994   1% /dev
> tmpfs                401     1       401   1% /run
> none                   5     0         5   0% /run/lock
> none                1002     0      1002   0% /run/shm
> /dev/vdb1          10229  1315      8915  13% /var/lib/ceph/osd/ceph-2
> 
> (similarly for ceph4). Obviously I can fix this by reweighting the first 
> two osds to something like 0.005, but I'm wondering if there is something 
> I've missed - clearly some kind of auto weighting has been performed on the 
> basis of the size difference in the data volumes, but it looks to be skewing 
> data far too much towards the bigger ones. Is there perhaps a bug in the 
> smarts for this? Or is it just because I'm using small volumes (5G = 0 
> weight)?
> 
> Cheers
> 
> Mark
> 
> [1] i.e:
> 
> $ ceph-deploy new ceph1
> $ ceph-deploy mon create ceph1
> $ ceph-deploy gatherkeys ceph1
> $ ceph-deploy osd create ceph1:/dev/vdb:/dev/vdc
> ...
> $ ceph-deploy osd create ceph4:/dev/vdb:/dev/vdc
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
