The larger the value of K relative to M, the more favorable the raw :: usable 
ratio ends up.
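
To put a number on it: usable capacity is K/(K+M) of raw, so the raw :: usable 
ratio works out to (K+M)/K.  A quick back-of-the-envelope check in Python (plain 
arithmetic, nothing Ceph-specific):

    # Raw :: usable ratio for a few common K,M combinations.
    # Ignores min_alloc padding and other per-object overhead.
    for k, m in [(2, 2), (4, 2), (8, 3), (8, 4), (17, 3)]:
        print(f"{k},{m}: raw :: usable = {(k + m) / k:.2f}")
    # 2,2 -> 2.00   4,2 -> 1.50   8,3 -> 1.38   8,4 -> 1.50   17,3 -> 1.18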

There are tradeoffs and caveats.  Here are some of my thoughts; if I’m off-base 
here, I welcome enlightenment. 



When possible, it’s ideal to have at least K+M failure domains — often racks, 
sometimes hosts, chassis, etc.  Thus smaller clusters, say with 5-6 nodes, 
aren’t good fits for larger sums of K+M if your data is valuable.
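
For example, something along these lines creates a 4,2 profile with rack as the 
failure domain and a pool that uses it (profile name, pool name and PG count are 
just placeholders; double-check the syntax against the docs for your release):

    ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=rack
    ceph osd pool create ecpool 64 64 erasure ec-4-2

With rack as the failure domain you'd then want at least 4+2=6 racks, so each 
chunk can land in a different rack.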

Larger sums of K+M also mean that more drives will be touched by each read or 
write, especially during recovery.  This could be a factor if one is 
IOPS-limited.  Same with scrubs.

When using a pool for, e.g., RGW buckets, larger sums of K+M may result in 
greater overhead when storing small objects, since AIUI Ceph / RGW writes only 
full stripes.  Say you have a 17,3 EC pool on drives with the default 4kB 
bluestore_min_alloc_size.  A 1kB S3 object would thus allocate 17+3=20 x 4kB == 
80kB of storage, which is 7900% overhead.  This is an extreme example to 
illustrate the point.
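
A sketch of that arithmetic, assuming each of the K+M chunks is padded up to the 
allocation unit (my understanding of the behavior, not a precise model of 
BlueStore internals):

    # Small-object overhead on a 17,3 EC pool with a 4 KiB allocation unit.
    k, m = 17, 3
    min_alloc = 4 * 1024             # default bluestore_min_alloc_size
    obj = 1 * 1024                   # a 1 KiB S3 object
    raw = (k + m) * min_alloc        # every chunk allocates at least min_alloc
    print(raw // 1024, "KiB raw")                  # 80 KiB
    print(100 * (raw - obj) // obj, "% overhead")  # 7900 %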

Larger sums of K+M may present more IOPS to each storage drive, depending on 
the workload and the EC plugin selected.

With larger objects (including RBD) that modulo / padding factor is a 
dramatically smaller fraction of the data.  One’s use-case and per-pool dataset 
may thus inform the EC profiles that make sense; workloads that are 
predominantly smaller objects might opt for replication instead.
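
Running the same toy model with a 4 MiB (RBD-sized) object shows why: the 
padding becomes noise.  Again this ignores how Ceph actually stripes objects 
internally; it only illustrates the rounding effect:

    # 4 MiB object on the same 17,3 profile, 4 KiB allocation unit.
    k, m = 17, 3
    min_alloc = 4 * 1024
    obj = 4 * 1024 * 1024
    chunk = -(-obj // k)                        # ceil: data bytes per chunk
    alloc = -(-chunk // min_alloc) * min_alloc  # each chunk rounded up
    print((k + m) * alloc / obj)                # ~1.19, vs the ideal 20/17 ~ 1.18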

There was a post ….. a year ago? suggesting that values with small prime 
factors are advantageous, but I never saw a discussion of why that might be.

In some cases where one might be pressured to use replication with only 2 
copies of data, a 2,2 EC profile might achieve the same efficiency with greater 
safety.
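
Rough comparison, counting only the raw multiplier and how many simultaneous 
failures each layout can lose without losing data:

    # Replica-2 vs a 2,2 EC profile.
    print("replica 2:", 2 / 1, "x raw, tolerates", 1, "failure")
    k, m = 2, 2
    print("EC 2,2:   ", (k + m) / k, "x raw, tolerates", m, "failures")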

Geo / stretch clusters or ones in challenging environments are a special case; 
they might choose values of M equal to or even larger than K.

That said, I think 4,2 is a reasonable place to *start*, adjusted for one’s 
specific needs.  You get a raw :: usable ratio of 1.5 without getting too 
complicated.

ymmv






> 
> Hi,
> 
> It depends on hardware, failure domain, use case, overhead.
> 
> I don’t see an easy way to choose k and m values.
> 
> -
> Etienne Menguy
> etienne.men...@croit.io
> 
> 
>> On 4 Oct 2021, at 16:57, Golasowski Martin <martin.golasow...@vsb.cz> wrote:
>> 
>> Hello guys,
>> how does one estimate number of chunks for erasure coded pool ( k = ? ) ? I 
>> see that number of m chunks determines the pool’s resiliency, however I did 
>> not find clear guideline how to determine k.
>> 
>> Red Hat states that they support only the following combinations:
>> 
>> k=8, m=3
>> k=8, m=4
>> k=4, m=2
>> 
>> without any rationale behind them.
>> The table is taken from 
>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/storage_strategies_guide/erasure_code_pools.
>> 
>> Thanks!
>> 
>> Regards,
>> Martin
>> 
>> 
> 

