Hi Félix,

Changing the failure domain to OSD is probably the easiest option if this is a test cluster. I think the commands would go like:

- ceph osd getcrushmap -o map.bin
- crushtool -d map.bin -o map.txt
- sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 0 type osd/' map.txt
- crushtool -c map.txt -o map.bin
- ceph osd setcrushmap -i map.bin
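Before injecting the edited map, it may be worth sanity-checking it: crushtool can simulate placements against the compiled map. This is only a rough sketch; the rule id 0 and replica count 3 are assumptions here, so check them first with "ceph osd crush rule dump" and "ceph osd pool get <pool> size":

- crushtool -i map.bin --test --rule 0 --num-rep 3 --show-statistics
- crushtool -i map.bin --test --rule 0 --num-rep 3 --show-bad-mappings

If --show-bad-mappings prints nothing, the simulation found 3 OSDs for every input with the new failure domain.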
Moving HDDs around so that each server ends up with ~8TB would be a good option if this is a capacity-focused use case. It will allow you to reboot one server at a time without radosgw downtime. You would target 26/3 ≈ 8.66TB per node, so:

- node1: 1x8TB
- node2: 1x8TB + 1x2TB
- node3: 2x3TB + 1x2TB

If you are more concerned about performance then set the weights to 1 on all HDDs and forget about the wasted capacity (a rough command sketch follows at the end of this message).

Cheers,
Maxime

On Tue, 6 Jun 2017 at 00:44 Christian Wuerdig <christian.wuer...@gmail.com> wrote:

> Yet another option is to change the failure domain to OSD instead of host
> (this avoids having to move disks around and will probably meet your
> initial expectations).
> It means your cluster will become unavailable when you lose a host until
> you fix it, though. OTOH you probably don't have too much leeway anyway
> with just 3 hosts, so it might be an acceptable trade-off. It also means
> you can just add new OSDs to the servers wherever they fit.
>
> On Tue, Jun 6, 2017 at 1:51 AM, David Turner <drakonst...@gmail.com> wrote:
>
>> If you want to resolve your issue without purchasing another node, you
>> should move one disk of each size into each server. This process will be
>> quite painful as you'll need to actually move the disks in the crush map
>> to be under a different host, and then all of your data will move around,
>> but afterwards the cluster will be able to use the weights and distribute
>> the data between the 2TB, 3TB, and 8TB drives much more evenly.
>>
>> On Mon, Jun 5, 2017 at 9:21 AM Loic Dachary <l...@dachary.org> wrote:
>>
>>>
>>> On 06/05/2017 02:48 PM, Christian Balzer wrote:
>>> >
>>> > Hello,
>>> >
>>> > On Mon, 5 Jun 2017 13:54:02 +0200 Félix Barbeira wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> We have a small cluster for radosgw use only. It has three nodes, with 3
>>> >                                                        ^^^^^        ^^^^^^
>>> >> osds each. Each node has different disk sizes:
>>> >>
>>> >
>>> > There's your answer, staring you right in the face.
>>> >
>>> > Your default replication size is 3, your default failure domain is host.
>>> >
>>> > Ceph cannot distribute data according to the weight, since it needs to
>>> > be on a different node (one replica per node) to comply with the
>>> > replica size.
>>>
>>> Another way to look at it is to imagine a situation where 10TB worth of
>>> data is stored on node01, which has 3x8TB = 24TB. Since you asked for 3
>>> replicas, this data must be replicated to node02 but ... there is only
>>> 3x2TB = 6TB available. So the maximum you can store is 6TB, and the
>>> remaining disk space on node01 and node03 will never be used.
>>>
>>> python-crush analyze will display a message about that situation and
>>> show which buckets are overweighted.
>>>
>>> Cheers
>>>
>>> >
>>> > If your cluster had 4 or more nodes, you'd see what you expected.
>>> > And most likely wouldn't be happy about the performance with your 8TB
>>> > HDDs seeing 4 times more I/Os than the 2TB ones and thus becoming the
>>> > bottleneck of your cluster.
>>> >
>>> > Christian
>>> >
>>> >> node01 : 3x8TB
>>> >> node02 : 3x2TB
>>> >> node03 : 3x3TB
>>> >>
>>> >> I thought that the weight handles the amount of data that every osd
>>> >> receives. In this case, for example, the node with the 8TB disks should
>>> >> receive more than the rest, right? Instead, all of them receive the same
>>> >> amount of data and the smaller disks (2TB) reach 100% before the bigger
>>> >> ones. Am I doing something wrong?
>>> >>
>>> >> The cluster is jewel LTS 10.2.7.
>>> >>
>>> >> # ceph osd df
>>> >> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>>> >>  0 7.27060  1.00000  7445G  1012G  6432G 13.60 0.57 133
>>> >>  3 7.27060  1.00000  7445G  1081G  6363G 14.52 0.61 163
>>> >>  4 7.27060  1.00000  7445G   787G  6657G 10.58 0.44 120
>>> >>  1 1.81310  1.00000  1856G  1047G   809G 56.41 2.37 143
>>> >>  5 1.81310  1.00000  1856G   956G   899G 51.53 2.16 143
>>> >>  6 1.81310  1.00000  1856G   877G   979G 47.24 1.98 130
>>> >>  2 2.72229  1.00000  2787G  1010G  1776G 36.25 1.52 140
>>> >>  7 2.72229  1.00000  2787G   831G  1955G 29.83 1.25 130
>>> >>  8 2.72229  1.00000  2787G  1038G  1748G 37.27 1.56 146
>>> >>               TOTAL 36267G  8643G 27624G 23.83
>>> >> MIN/MAX VAR: 0.44/2.37  STDDEV: 18.60
>>> >> #
>>> >>
>>> >> # ceph osd tree
>>> >> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> >> -1 35.41795 root default
>>> >> -2 21.81180     host node01
>>> >>  0  7.27060         osd.0        up  1.00000          1.00000
>>> >>  3  7.27060         osd.3        up  1.00000          1.00000
>>> >>  4  7.27060         osd.4        up  1.00000          1.00000
>>> >> -3  5.43929     host node02
>>> >>  1  1.81310         osd.1        up  1.00000          1.00000
>>> >>  5  1.81310         osd.5        up  1.00000          1.00000
>>> >>  6  1.81310         osd.6        up  1.00000          1.00000
>>> >> -4  8.16687     host node03
>>> >>  2  2.72229         osd.2        up  1.00000          1.00000
>>> >>  7  2.72229         osd.7        up  1.00000          1.00000
>>> >>  8  2.72229         osd.8        up  1.00000          1.00000
>>> >> #
>>> >>
>>> >> # ceph -s
>>> >>     cluster 49ba9695-7199-4c21-9199-ac321e60065e
>>> >>      health HEALTH_OK
>>> >>      monmap e1: 3 mons at {ceph-mon01=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon02=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon03=[x:x:x:x:x:x:x:x]:6789/0}
>>> >>             election epoch 48, quorum 0,1,2 ceph-mon01,ceph-mon03,ceph-mon02
>>> >>      osdmap e265: 9 osds: 9 up, 9 in
>>> >>             flags sortbitwise,require_jewel_osds
>>> >>       pgmap v95701: 416 pgs, 11 pools, 2879 GB data, 729 kobjects
>>> >>             8643 GB used, 27624 GB / 36267 GB avail
>>> >>                  416 active+clean
>>> >> #
>>> >>
>>> >> # ceph osd pool ls
>>> >> .rgw.root
>>> >> default.rgw.control
>>> >> default.rgw.data.root
>>> >> default.rgw.gc
>>> >> default.rgw.log
>>> >> default.rgw.users.uid
>>> >> default.rgw.users.keys
>>> >> default.rgw.buckets.index
>>> >> default.rgw.buckets.non-ec
>>> >> default.rgw.buckets.data
>>> >> default.rgw.users.email
>>> >> #
>>> >>
>>> >> # ceph df
>>> >> GLOBAL:
>>> >>     SIZE       AVAIL      RAW USED     %RAW USED
>>> >>     36267G     27624G     8643G        23.83
>>> >> POOLS:
>>> >>     NAME                           ID     USED      %USED     MAX AVAIL     OBJECTS
>>> >>     .rgw.root                      1      1588      0         5269G         4
>>> >>     default.rgw.control            2      0         0         5269G         8
>>> >>     default.rgw.data.root          3      8761      0         5269G         28
>>> >>     default.rgw.gc                 4      0         0         5269G         32
>>> >>     default.rgw.log                5      0         0         5269G         127
>>> >>     default.rgw.users.uid          6      4887      0         5269G         28
>>> >>     default.rgw.users.keys         7      144       0         5269G         16
>>> >>     default.rgw.buckets.index      9      0         0         5269G         14
>>> >>     default.rgw.buckets.non-ec     10     0         0         5269G         3
>>> >>     default.rgw.buckets.data       11     2879G     35.34     5269G         746848
>>> >>     default.rgw.users.email        12     13        0         5269G         1
>>> >> #
>>> >>
>>> >
>>> >
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
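For reference, rough command sketches for two of the suggestions above: the crush reweighting Maxime mentions and the disk move David Turner describes in the quoted thread. The osd ids, weights and host names are taken from the "ceph osd tree" output; treat these as examples to adapt, not exact recipes, and double-check the syntax against the Jewel docs before running anything.

To set the crush weight to 1 on an HDD (repeat per OSD):

- ceph osd crush reweight osd.0 1.0

To move a disk under a different host in the crush map, for example moving osd.3 (an 8TB disk) under node02 after physically re-installing it there:

- ceph osd crush create-or-move osd.3 7.27060 root=default host=node02

Both operations trigger data movement, so expect rebalancing traffic while the cluster converges.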
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com