Actually, neither of our solutions works very well. Frequently the same OSD is chosen for multiple chunks:

8.72  9751  0 0 0 40895512576 0 0 1302  active+clean  2h  224790'12801  225410:49810  [13,1,14,11,18,2,19,13]p13  [13,1,14,11,18,2,19,13]p13  2021-05-11T22:41:11.332885+0000  2021-05-11T22:41:11.332885+0000
8.7f  9695  0 0 0 40661680128 0 0 2184  active+clean  5h  224790'12850  225409:57529  [8,17,4,1,14,0,19,8]p8      [8,17,4,1,14,0,19,8]p8      2021-05-11T22:41:11.332885+0000  2021-05-11T22:41:11.332885+0000

(Note osd.13 appearing twice in 8.72's up set, and osd.8 twice in 8.7f's.)
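Here's a quick way to list every PG in this state (a sketch, assuming the pg_stats JSON from 'ceph pg ls-by-pool' includes each PG's up set, which matches what the loops quoted below pull out):

# print any PG whose up set contains the same OSD more than once
ceph pg ls-by-pool cephfs_data_ec62 -f json \
  | jq -r '.pg_stats[] | select((.up | length) != (.up | unique | length)) | .pgid'

Both 8.72 and 8.7f above should show up in that list.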
I'm now considering using device classes and assigning the OSDs to either hdd1 or hdd2... Unless someone has another idea? A rough sketch of what I have in mind is below.
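Something like this (untested; the rule name and id are placeholders, and it assumes roughly half of each host's OSDs get reassigned to hdd1 and the rest to hdd2):

# reassign an OSD to one of the new classes (repeat per OSD;
# the existing class has to be removed first)
ceph osd crush rm-device-class osd.13
ceph osd crush set-device-class hdd1 osd.13

# then a rule that picks 4 hosts from each class tree (k+m=8):
rule cephfs_ec62_split {
    id 8                         # any unused rule id
    type erasure
    step set_chooseleaf_tries 5
    step take default class hdd1
    step chooseleaf indep 4 type host
    step emit
    step take default class hdd2
    step chooseleaf indep 4 type host
    step emit
}

Since the two take steps walk disjoint class trees, the same OSD can't be picked for two chunks, though reclassifying the OSDs would shuffle a fair amount of data around.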
Thanks,
Bryan

> On May 14, 2021, at 12:35 PM, Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> This works better than my solution. It allows the cluster to put more PGs
> on the systems with more space on them:
>
> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
> > echo $pg
> > for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> > ceph osd find $osd | jq -r '.host'
> > done | sort | uniq -c | sort -n -k1
> > done
> 8.0
>       1 excalibur
>       1 mandalaybay
>       2 aladdin
>       2 harrahs
>       2 paris
> 8.1
>       1 aladdin
>       1 excalibur
>       1 harrahs
>       1 mirage
>       2 mandalaybay
>       2 paris
> 8.2
>       1 aladdin
>       1 mandalaybay
>       2 harrahs
>       2 mirage
>       2 paris
> ...
>
> Thanks!
> Bryan
>
>> On May 13, 2021, at 2:58 AM, Ján Senko <ja...@protonmail.ch> wrote:
>>
>> Would something like this work?
>>
>> step take default
>> step choose indep 4 type host
>> step chooseleaf indep 1 type osd
>> step emit
>> step take default
>> step choose indep 0 type host
>> step chooseleaf indep 1 type osd
>> step emit
>>
>> J.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>
>> On Wednesday, May 12th, 2021 at 17:58, Bryan Stillwell <bstillw...@godaddy.com> wrote:
>>
>>> I'm trying to figure out a CRUSH rule that will spread data out across my
>>> cluster as much as possible, but not more than 2 chunks per host.
>>>
>>> If I use the default rule with an osd failure domain like this:
>>>
>>> step take default
>>> step choose indep 0 type osd
>>> step emit
>>>
>>> I get clustering of 3-4 chunks on some of the hosts:
>>>
>>> for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>> > echo $pg
>>> > for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>> > ceph osd find $osd | jq -r '.host'
>>> > done | sort | uniq -c | sort -n -k1
>>> > done
>>> 8.0
>>>       1 harrahs
>>>       3 paris
>>>       4 aladdin
>>> 8.1
>>>       1 aladdin
>>>       1 excalibur
>>>       2 mandalaybay
>>>       4 paris
>>> 8.2
>>>       1 harrahs
>>>       2 aladdin
>>>       2 mirage
>>>       3 paris
>>> ...
>>>
>>> However, if I change the rule to use:
>>>
>>> step take default
>>> step choose indep 0 type host
>>> step chooseleaf indep 2 type osd
>>> step emit
>>>
>>> I get the data spread across 4 hosts with 2 chunks per host:
>>>
>>> for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>> > echo $pg
>>> > for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>> > ceph osd find $osd | jq -r '.host'
>>> > done | sort | uniq -c | sort -n -k1
>>> > done
>>> 8.0
>>>       2 aladdin
>>>       2 harrahs
>>>       2 mandalaybay
>>>       2 paris
>>> 8.1
>>>       2 aladdin
>>>       2 harrahs
>>>       2 mandalaybay
>>>       2 paris
>>> 8.2
>>>       2 harrahs
>>>       2 mandalaybay
>>>       2 mirage
>>>       2 paris
>>> ...
>>>
>>> Is it possible to get the data to spread out over more hosts? I plan on
>>> expanding the cluster in the near future and would like to see more hosts
>>> get 1 chunk instead of 2.
>>>
>>> Also, before you recommend adding two more hosts and switching to a
>>> host-based failure domain, the cluster is on a variety of hardware with
>>> between 2-6 drives per host and drives that are 4TB-12TB in size (it's
>>> part of my home lab).
>>>
>>> Thanks,
>>> Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io