Hi Bryan,

I had to do something similar and never found a rule that places "up to"
2 chunks per host, so I settled on placing *exactly* 2 chunks per host.

But I did this slightly differently from what you wrote earlier: my rule
chooses exactly 4 hosts, then exactly 2 OSDs on each:

        type erasure
        min_size 3
        max_size 10
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 4 type host
        step choose indep 2 type osd
        step emit
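
As a side note, a rule like that can be dry-run with crushtool before it
goes live. Roughly (the rule id 1 and the file names below are just
placeholders):

        # grab and decompile the current CRUSH map
        ceph osd getcrushmap -o crushmap.bin
        crushtool -d crushmap.bin -o crushmap.txt

        # add or edit the rule in crushmap.txt, then recompile it
        crushtool -c crushmap.txt -o crushmap.new

        # show which OSDs the rule would pick for 8 chunks, and flag bad mappings
        crushtool -i crushmap.new --test --rule 1 --num-rep 8 --show-mappings
        crushtool -i crushmap.new --test --rule 1 --num-rep 8 --show-bad-mappings

        # only inject the new map once the mappings look sane
        ceph osd setcrushmap -i crushmap.new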

If you really need the "up to 2" approach, then maybe you can split
each host into two "host" CRUSH buckets, with half the OSDs in each.
A normal host-based rule should then work.
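
A rough sketch of that split (the sub-bucket names and OSD IDs here are
made up; repeat for each host):

        # create two "host" buckets for one physical host, attached to the root
        ceph osd crush add-bucket aladdin-a host
        ceph osd crush add-bucket aladdin-b host
        ceph osd crush move aladdin-a root=default
        ceph osd crush move aladdin-b root=default

        # move half of that host's OSDs into each sub-bucket
        # (ceph osd crush set osd.N <weight> host=... also works)
        ceph osd crush move osd.0 host=aladdin-a
        ceph osd crush move osd.1 host=aladdin-b

One caveat: unless osd_crush_update_on_start is disabled (or a custom crush
location hook is used), the OSDs may move themselves back under their real
hostname bucket the next time they restart.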

Cheers, Dan


On Sun, May 16, 2021 at 2:34 AM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> Actually, neither of our solutions works very well.  Frequently the same OSD
> was chosen for multiple chunks:
>
>
> 8.72     9751         0          0        0  40895512576            0         
>   0  1302                   active+clean     2h  224790'12801   225410:49810  
>   [13,1,14,11,18,2,19,13]p13    [13,1,14,11,18,2,19,13]p13  
> 2021-05-11T22:41:11.332885+0000  2021-05-11T22:41:11.332885+0000
> 8.7f     9695         0          0        0  40661680128            0         
>   0  2184                   active+clean     5h  224790'12850   225409:57529  
>       [8,17,4,1,14,0,19,8]p8        [8,17,4,1,14,0,19,8]p8  
> 2021-05-11T22:41:11.332885+0000  2021-05-11T22:41:11.332885+0000
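
A quick way to spot those is to list the PGs whose "up" set repeats an OSD,
reusing the same json that the loops further down parse (pool name as in
this thread):

# print the PGs whose up set contains the same OSD more than once
ceph pg ls-by-pool cephfs_data_ec62 -f json | \
  jq -r '.pg_stats[] | select((.up | length) != (.up | unique | length)) | .pgid'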
>
> I'm now considering using device classes and assigning the OSDs to either 
> hdd1 or hdd2...  Unless someone has another idea?
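
For what it's worth, a minimal sketch of that device-class idea (the OSD IDs
are placeholders; tag half of each host's OSDs as hdd1 and the other half as
hdd2):

# the old class has to be removed before a new one can be set
ceph osd crush rm-device-class osd.0 osd.1
ceph osd crush set-device-class hdd1 osd.0
ceph osd crush set-device-class hdd2 osd.1

An EC rule could then draw half of its chunks from each class, something
like this, which should cap every physical host at 2 chunks:

        step take default class hdd1
        step chooseleaf indep 4 type host
        step emit
        step take default class hdd2
        step chooseleaf indep 4 type host
        step emit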
>
> Thanks,
> Bryan
>
> > On May 14, 2021, at 12:35 PM, Bryan Stillwell <bstillw...@godaddy.com> 
> > wrote:
> >
> > This works better than my solution.  It allows the cluster to put more PGs 
> > on the systems with more space on them:
> >
> > # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r 
> > '.pg_stats[].pgid'); do
> >>  echo $pg
> >>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> >>    ceph osd find $osd | jq -r '.host'
> >>  done | sort | uniq -c | sort -n -k1
> >> done
> > 8.0
> >      1 excalibur
> >      1 mandalaybay
> >      2 aladdin
> >      2 harrahs
> >      2 paris
> > 8.1
> >      1 aladdin
> >      1 excalibur
> >      1 harrahs
> >      1 mirage
> >      2 mandalaybay
> >      2 paris
> > 8.2
> >      1 aladdin
> >      1 mandalaybay
> >      2 harrahs
> >      2 mirage
> >      2 paris
> > ...
> >
> > Thanks!
> > Bryan
> >
> >> On May 13, 2021, at 2:58 AM, Ján Senko <ja...@protonmail.ch> wrote:
> >>
> >> Would something like this work?
> >>
> >> step take default
> >> step choose indep 4 type host
> >> step chooseleaf indep 1 type osd
> >> step emit
> >> step take default
> >> step choose indep 0 type host
> >> step chooseleaf indep 1 type osd
> >> step emit
> >>
> >> J.
> >>
> >> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >>
> >> On Wednesday, May 12th, 2021 at 17:58, Bryan Stillwell 
> >> <bstillw...@godaddy.com> wrote:
> >>
> >>> I'm trying to figure out a CRUSH rule that will spread data out across my 
> >>> cluster as much as possible, but not more than 2 chunks per host.
> >>>
> >>> If I use the default rule with an osd failure domain like this:
> >>>
> >>> step take default
> >>> step choose indep 0 type osd
> >>> step emit
> >>>
> >>> I get clustering of 3-4 chunks on some of the hosts:
> >>>
> >>> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
> >>>>  echo $pg
> >>>>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> >>>>    ceph osd find $osd | jq -r '.host'
> >>>>  done | sort | uniq -c | sort -n -k1
> >>>> done
> >>>
> >>> 8.0
> >>>      1 harrahs
> >>>      3 paris
> >>>      4 aladdin
> >>> 8.1
> >>>      1 aladdin
> >>>      1 excalibur
> >>>      2 mandalaybay
> >>>      4 paris
> >>> 8.2
> >>>      1 harrahs
> >>>      2 aladdin
> >>>      2 mirage
> >>>      3 paris
> >>> ...
> >>>
> >>> However, if I change the rule to use:
> >>>
> >>> step take default
> >>> step choose indep 0 type host
> >>> step chooseleaf indep 2 type osd
> >>> step emit
> >>>
> >>> I get the data spread across 4 hosts with 2 chunks per host:
> >>>
> >>> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
> >>>>  echo $pg
> >>>>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> >>>>    ceph osd find $osd | jq -r '.host'
> >>>>  done | sort | uniq -c | sort -n -k1
> >>>> done
> >>>
> >>> 8.0
> >>>      2 aladdin
> >>>      2 harrahs
> >>>      2 mandalaybay
> >>>      2 paris
> >>> 8.1
> >>>      2 aladdin
> >>>      2 harrahs
> >>>      2 mandalaybay
> >>>      2 paris
> >>> 8.2
> >>>      2 harrahs
> >>>      2 mandalaybay
> >>>      2 mirage
> >>>      2 paris
> >>> ...
> >>>
> >>> Is it possible to get the data to spread out over more hosts? I plan on 
> >>> expanding the cluster in the near future and would like to see more hosts 
> >>> get 1 chunk instead of 2.
> >>>
> >>> Also, before you recommend adding two more hosts and switching to a
> >>> host-based failure domain: the cluster is on a variety of hardware with
> >>> 2-6 drives per host and drive sizes ranging from 4TB to 12TB (it's part
> >>> of my home lab).
> >>>
> >>> Thanks,
> >>>
> >>> Bryan
> >>>
> >
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
