Hi,

thanks for the information - it seems that with pinning of 
subvolumes/directories you can distribute the load across different MDS 
daemons. But in that case, what would be the difference compared to setting up 
different top-level volumes and attaching them to different MDS daemons? What 
I am not clear about is whether setting up one fs volume and pinning 
subvolumes to different MDS daemons is basically equivalent to using multiple 
fs volumes and attaching them to different MDS daemons. Quotas/auth caps etc. 
can be set both for volumes and for subvolumes.
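
To make the comparison concrete, here is roughly what I mean (the volume, 
subvolume and pool-of-ranks numbers below are just placeholders, not an 
actual setup):

  # Option A: one volume, subvolumes pinned to different MDS ranks
  ceph fs subvolume create projects proj-a
  ceph fs subvolume create projects proj-b
  ceph fs subvolume pin projects proj-a export 0   # serve proj-a from rank 0
  ceph fs subvolume pin projects proj-b export 1   # serve proj-b from rank 1

  # Option B: separate top-level volumes, each served by its own MDS daemons
  ceph fs volume create proj-a
  ceph fs volume create proj-b

In both cases I can set quotas and per-client caps, so my question is really 
about MDS behaviour and failure domains.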

The only recommendation I have found on [0] says

"...it is recommended to consolidate file system workloads onto a single CephFS 
file system, when possible. Consolidate the workloads to avoid over-allocating 
resources to MDS servers that can be underutilized."

Is there a workload difference when using multiple fs volumes vs. a single one 
with subvolumes? Intuitively I would think that multiple fs volumes might 
provide better fault isolation: if something goes wrong, only one fs (of 
several) fails, whereas with a single volume and subvolumes a failure would 
affect all shares at once.

Any insights? Thanks,

Sophonet

[0] 
https://www.ibm.com/docs/en/storage-ceph/8.1.0?topic=systems-cephfs-volumes-subvolumes-subvolume-groups

> On 23.09.2025 at 15:45, Eugen Block <[email protected]> wrote:
> 
> Hi,
> 
> with multiple active MDS daemons you can use pinning. This allows you to pin 
> specific directories (or subvolumes) to a specific rank to spread the load. 
> You can find the relevant docs here [0].
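> 
> For example (the mount path, volume/group names and ranks are just 
> placeholders, not a recommendation):
> 
>   # pin a directory tree to MDS rank 1 via an extended attribute
>   setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/project-a
> 
>   # or pin a whole subvolume group, e.g. distributing its subvolumes
>   # across the available ranks
>   ceph fs subvolumegroup pin cephfs mygroup distributed 1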
> 
> Note that during an upgrade, max_mds is reduced to 1 (automatically if you 
> use the orchestrator) [1], which can have a significant impact because all 
> the load previously spread across multiple daemons is now shuffled onto a 
> single node. This can crash a file system, just so you're aware.
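> 
> Roughly the manual equivalent of what happens there (the file system name 
> is just an example):
> 
>   ceph fs set cephfs max_mds 1    # single active MDS before upgrading
>   # ... upgrade the MDS daemons ...
>   ceph fs set cephfs max_mds 2    # restore the previous active count afterwards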
> 
> So there are several options: two or three "fat" MDS nodes in active/standby 
> mode which can handle all the load; or more "fat" nodes which could handle 
> all the load during an upgrade, spreading the load again after the upgrade 
> is finished; or multiple "not so fat" nodes to spread the workload, but with 
> a higher risk of an issue during an upgrade.
> 
> Regards,
> Eugen
> 
> [0] https://docs.ceph.com/en/latest/cephfs/multimds/
> [1] 
> https://docs.ceph.com/en/latest/cephfs/upgrading/#upgrading-the-mds-cluster
> 
> Quoting Sophonet <[email protected]>:
> 
>> Hi list,
>> 
>> for multiple project-level file shares (with individual access rights) I am 
>> planning to use CephFS.
>> 
>> Technically this can be implemented either with multiple top-level cephfs 
>> file systems or with a single cephfs in the cluster and subvolumes.
>> 
>> What is the preferred choice? I have not found any guidance on 
>> docs.ceph.com. The only location that suggests using subvolumes is 
>> https://www.ibm.com/docs/en/storage-ceph/8.1.0?topic=systems-cephfs-volumes-subvolumes-subvolume-groups.
>> However, how can I avoid having only one MDS responsible for serving all 
>> subvolumes? Is there some current literature (books or web docs) that 
>> contains recommendations and examples? A couple of Ceph-related books are 
>> available in well-known online book stores, but many of them are rather 
>> old (six years or even more).
>> 
>> Thanks a lot,
>> 
>> Sophonet
>> 

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
