[ceph-users] Re: Cannot create CephFS subvolume
Hi Daniel,

On Wed, Dec 28, 2022 at 3:17 AM Daniel Kovacs wrote:
> Hello!
>
> I'd like to create a CephFS subvolume with this command: ceph fs subvolume create cephfs_ssd subvol_1
> I got this error: Error EINVAL: invalid value specified for ceph.dir.subvolume
> If I use another cephfs volume, there was no error reported.

Was `subvol_1` created earlier, deleted, and now being recreated (with the same name)?

> What did I do wrong?
>
> Best regards,
> Daniel

--
Cheers, Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
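For anyone hitting the same EINVAL, a few hedged diagnostic commands. The volume and subvolume names are taken from the post; the mount path is an illustrative assumption, and whether the vxattr is readable depends on your release:

```
# Does a previous incarnation of the subvolume still exist, or linger
# in the trash pending purge?
ceph fs subvolume ls cephfs_ssd
ceph fs subvolume info cephfs_ssd subvol_1

# On a mounted client, the subvolume flag is a virtual xattr; if a
# stale directory tree is still flagged, recreating over it can fail.
# (The path below is illustrative.)
getfattr -n ceph.dir.subvolume /mnt/cephfs/volumes/_nogroup/subvol_1
```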
[ceph-users] Re: CephFS: Isolating folders for different users
Hi Jonas,

On Mon, Jan 2, 2023 at 10:52 PM Jonas Schwab wrote:
> Thank you very much! Works like a charm, except for one thing: I gave my
> clients the MDS caps 'allow rws path=' to also be able
> to create snapshots from the client, but `mkdir .snap/test` still returns
> mkdir: cannot create directory '.snap/test': Operation not permitted
>
> Do you have an idea what might be the issue here?

If you are using a cephfs subvolume, it's a good idea to take snapshots via

    ceph fs subvolume snapshot create ...

since there is some subvolume juggling done internally which may deny taking snapshots at arbitrary levels.

> Best regards,
> Jonas
>
> PS: A happy new year to everyone!
>
> On 23.12.22 10:05, Kai Stian Olstad wrote:
> > On 22.12.2022 15:47, Jonas Schwab wrote:
> >> Now the question: Since I established this setup more or less through
> >> trial and error, I was wondering if there is a more elegant/better
> >> approach than what is outlined above?
> >
> > You can use namespaces so you don't need separate pools.
> > Unfortunately the documentation is sparse on the subject; I use it
> > with subvolumes like this:
> >
> > # Create a subvolume
> > ceph fs subvolume create --pool_layout --namespace-isolated
> >
> > The subvolume is created with namespace fsvolumens_
> > You can also find the name with
> > ceph fs subvolume info | jq -r .pool_namespace
> >
> > # Create a user with access to the subvolume and the namespace
> > ## First find the path to the subvolume
> > ceph fs subvolume getpath
> > ## Create the user
> > ceph auth get-or-create client. mon 'allow r' osd 'allow rw pool= namespace=fsvolumens_'
> >
> > I have found this by looking at how Openstack does it and some trial
> > and error.

--
Cheers, Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
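Venky's recommendation, spelled out as commands. The volume, subvolume, and snapshot names here are placeholders, not names from this cluster:

```
# Take and list snapshots through the subvolume interface rather than
# mkdir in a .snap directory:
ceph fs subvolume snapshot create <vol_name> <subvol_name> <snap_name>
ceph fs subvolume snapshot ls <vol_name> <subvol_name>
```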
[ceph-users] Re: pg deep scrubbing issue
Look closely at your output: the PGs with 0 objects are only "every other" one because of how the command happened to order the output. Note that the empty PGs all have IDs matching "3.*". The numeric prefix of a PG ID is the cardinal ID of the pool to which it belongs, so I strongly suspect that you have a pool with no data.

>> Strangely, ceph pg dump shows every other PG with 0 objects. An
>> attempt to perform a deep scrub (or scrub) on one of these PGs does nothing.
>> The cluster appears to be running fine, but obviously there's an issue.
>> What should my next steps be to troubleshoot?
>>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>>> 3.e9b 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 22:49:07.629579 0'0 23686:19820 [28,79] 28 [28,79] 28 0'0 2022-12-31 22:49:07.629508 0'0 2022-12-31 22:49:07.629508 0
>>> 1.e99 60594 0 0 0 0 177433523272 0 0 3046 3046 active+clean 2022-12-21 14:35:08.175858 23686'268137 23686:1732399 [178,115] 178 [178,115] 178 23675'267613 2022-12-21 11:01:10.403525 23675'267613 2022-12-21 11:01:10.403525 0
>>> 3.e9a 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 09:16:48.644619 0'0 23686:22855 [51,140] 51 [51,140] 51 0'0 2022-12-31 09:16:48.644568 0'0 2022-12-30 02:35:23.367344 0
>>> 1.e98 59962 0 0 0 0 177218669411 0 0 3035 3035 active+clean 2022-12-28 14:14:49.908560 23686'265576 23686:1357499 [92,86] 92 [92,86] 92 23686'265445 2022-12-28 14:14:49.908522 23686'265445 2022-12-28 14:14:49.908522 0
>>> 3.e95 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 06:09:39.442932 0'0 23686:22757 [48,83] 48 [48,83] 48 0'0 2022-12-31 06:09:39.442879 0'0 2022-12-18 09:33:47.892142 0

As to your PGs not scrubbed in time, what sort of hardware are your OSDs?
Here are some thoughts, especially if they're HDDs.

* If you don't need that empty pool, delete it, then evaluate how many PGs on average your OSDs hold (e.g. `ceph osd df`). If you have an unusually high number of PGs per OSD, maybe, just maybe, you're running afoul of osd_scrub_extended_sleep / osd_scrub_sleep. In other words, individual scrubs on empty PGs may naturally be very fast, but their sheer number may be DoSing your scrub schedule because of the efforts Ceph makes to spread out the impact of scrubs.

* Do you limit scrubs to certain times via osd_scrub_begin_hour, osd_scrub_end_hour, osd_scrub_begin_week_day, osd_scrub_end_week_day? I've seen operators who constrain scrubs to only a few overnight / weekend hours, but doing so can hobble Ceph's ability to get through them all in time.

* Similarly, a value of osd_scrub_load_threshold that's too low can also result in starvation. The load average statistic can be misleading on modern SMP systems with lots of cores. I've witnessed 32c/64t OSD nodes report a load average of around 40, but with tools like htop one could see that they were barely breaking a sweat.

* If you have osd_scrub_during_recovery disabled and experience a lot of backfill / recovery / rebalance traffic, that can starve scrubs too. IMHO with recent releases this should almost always be enabled, YMMV.

* Back when I ran busy (read: underspent) HDD clusters I had to bump osd_deep_scrub_interval by a factor of 4x due to how slow and seek-bound the LFF spinners were. Of course, the longer one spaces out scrubs, the less effective they are at detecting problems before they're impactful.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
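The "can scrubs even keep up" question above can be sanity-checked with back-of-envelope arithmetic. All the numbers below are illustrative assumptions, not measurements from the poster's cluster:

```python
# Rough deep-scrub budget check: can every PG on an OSD be deep-scrubbed
# within osd_deep_scrub_interval, given a restricted scrub window?
pgs_per_osd      = 200   # assumed, from `ceph osd df`
avg_pg_size_gib  = 30    # assumed average PG size
scrub_rate_mib_s = 60    # assumed effective HDD deep-scrub throughput
window_hours_day = 8     # e.g. scrubs only allowed overnight

hours_per_pg = (avg_pg_size_gib * 1024) / scrub_rate_mib_s / 3600
days_needed  = pgs_per_osd * hours_per_pg / window_hours_day
print(f"~{days_needed:.1f} days to deep-scrub one OSD's PGs")
```

If `days_needed` exceeds osd_deep_scrub_interval (default 7 days), "pgs not deep-scrubbed in time" warnings are the expected outcome, and either the window or the interval has to give.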
[ceph-users] Re: pg deep scrubbing issue
Thanks for the reply. I'll give that a try; I wasn't using the balancer.

> On Jan 2, 2023, at 1:55 AM, Pavin Joseph wrote:
>
> Hi Jeff,
>
> Might be worth checking the balancer [0] status; also you probably want to
> use upmap mode [1] if possible.
>
> [0]: https://docs.ceph.com/en/latest/rados/operations/balancer/#status
> [1]: https://docs.ceph.com/en/latest/rados/operations/balancer/#modes
>
> Kind regards,
> Pavin Joseph.
>
> On 02-Jan-23 12:04 AM, Jeffrey Turmelle wrote:
>> Hi Everyone,
>> My Nautilus cluster of 6 nodes, 180 OSDs, is having a weird issue I don't
>> know how to troubleshoot.
>> I started receiving health warnings, and the number of PGs not
>> deep-scrubbed in time has been increasing.
>> # ceph health detail
>> HEALTH_WARN 3013 pgs not scrubbed in time
>> PG_NOT_SCRUBBED 3013 pgs not scrubbed in time
>> pg 1.e99 not scrubbed since 2022-12-21 11:01:10.403525
>> pg 1.e94 not scrubbed since 2022-12-18 06:26:14.086410
>> pg 3.e91 not scrubbed since 2022-12-17 03:00:25.104174
>> pg 1.e90 not scrubbed since 2022-12-18 03:31:44.747218
>> pg 1.e8e not scrubbed since 2022-12-21 12:04:17.111762
>> pg 1.e89 not scrubbed since 2022-12-18 07:20:13.328540
>> ...
>> 2963 more pgs...
>> Strangely, ceph pg dump shows every other PG with 0 objects. An
>> attempt to perform a deep scrub (or scrub) on one of these PGs does nothing.
>> The cluster appears to be running fine, but obviously there's an issue.
>> What should my next steps be to troubleshoot?
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 3.e9b 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 22:49:07.629579 0'0 23686:19820 [28,79] 28 [28,79] 28 0'0 2022-12-31 22:49:07.629508 0'0 2022-12-31 22:49:07.629508 0
>> 1.e99 60594 0 0 0 0 177433523272 0 0 3046 3046 active+clean 2022-12-21 14:35:08.175858 23686'268137 23686:1732399 [178,115] 178 [178,115] 178 23675'267613 2022-12-21 11:01:10.403525 23675'267613 2022-12-21 11:01:10.403525 0
>> 3.e9a 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 09:16:48.644619 0'0 23686:22855 [51,140] 51 [51,140] 51 0'0 2022-12-31 09:16:48.644568 0'0 2022-12-30 02:35:23.367344 0
>> 1.e98 59962 0 0 0 0 177218669411 0 0 3035 3035 active+clean 2022-12-28 14:14:49.908560 23686'265576 23686:1357499 [92,86] 92 [92,86] 92 23686'265445 2022-12-28 14:14:49.908522 23686'265445 2022-12-28 14:14:49.908522 0
>> 3.e95 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 06:09:39.442932 0'0 23686:22757 [48,83] 48 [48,83] 48 0'0 2022-12-31 06:09:39.442879 0'0 2022-12-18 09:33:47.892142 0
>> 1.e97 60062 0 0 0 0 176721095235 0 0 3050 3050 active+clean 2022-12-31 21:19:33.758473 23686'267934 23686:1514273 [137,123] 137 [137,123] 137 23686'267916 2022-12-31 21:19:33.758417 23686'267713 2022-12-27 19:16:27.025326 0
>> 3.e94 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 10:00:38.864773 0'0 23686:18478 [101,1] 101 [101,1] 101 0'0 2022-12-31 10:00:38.864730 0'0 2022-12-28 22:28:13.790168 0
>> 1.e96 59753 0 0 0 0 175411602155 0 0 3083 3083 active+clean 2022-12-28 14:13:32.186265 23686'264255 23686:1676359 [54,170] 54 [54,170] 54 23686'264120 2022-12-28 14:13:32.186220 23686'264120 2022-12-28 14:13:32.186220 0
>> 3.e97 0 0 0 0 0 0 0 0 0 0 active+clean
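The pool-ID-prefix observation from the other reply in this thread can be checked mechanically. A toy sketch, using a hand-copied subset of the rows above as (pg_id, object_count) pairs:

```python
# Group PG dump rows by pool ID (the numeric prefix of PG_STAT) and
# count how many PGs in each pool are empty.
from collections import defaultdict

rows = [
    ("3.e9b", 0), ("1.e99", 60594), ("3.e9a", 0),
    ("1.e98", 59962), ("3.e95", 0),
]

empty_by_pool = defaultdict(int)
total_by_pool = defaultdict(int)
for pg_id, objects in rows:
    pool = int(pg_id.split(".")[0])   # cardinal ID of the owning pool
    total_by_pool[pool] += 1
    if objects == 0:
        empty_by_pool[pool] += 1

for pool in sorted(total_by_pool):
    print(f"pool {pool}: {empty_by_pool[pool]}/{total_by_pool[pool]} PGs empty")
```

On this subset, every pool-3 PG is empty and no pool-1 PG is, which matches the "you have a pool with no data" diagnosis.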
[ceph-users] Re: CephFS: Isolating folders for different users
One side effect of using subvolumes is that you can then only take a snapshot at the subvolume level, nothing further down the tree. I find you can use the same path in the auth caps without the subvolume, unless I'm missing something in this thread.

On Mon, Jan 2, 2023 at 10:21 AM Jonas Schwab <jonas.sch...@physik.uni-wuerzburg.de> wrote:
> Thank you very much! Works like a charm, except for one thing: I gave my
> clients the MDS caps 'allow rws path=' to also be able
> to create snapshots from the client, but `mkdir .snap/test` still returns
> mkdir: cannot create directory '.snap/test': Operation not permitted
>
> Do you have an idea what might be the issue here?
>
> Best regards,
> Jonas
>
> PS: A happy new year to everyone!
>
> On 23.12.22 10:05, Kai Stian Olstad wrote:
> > On 22.12.2022 15:47, Jonas Schwab wrote:
> >> Now the question: Since I established this setup more or less through
> >> trial and error, I was wondering if there is a more elegant/better
> >> approach than what is outlined above?
> >
> > You can use namespaces so you don't need separate pools.
> > Unfortunately the documentation is sparse on the subject; I use it
> > with subvolumes like this:
> >
> > # Create a subvolume
> > ceph fs subvolume create --pool_layout --namespace-isolated
> >
> > The subvolume is created with namespace fsvolumens_
> > You can also find the name with
> > ceph fs subvolume info | jq -r .pool_namespace
> >
> > # Create a user with access to the subvolume and the namespace
> > ## First find the path to the subvolume
> > ceph fs subvolume getpath
> > ## Create the user
> > ceph auth get-or-create client. mon 'allow r' osd 'allow rw pool= namespace=fsvolumens_'
> >
> > I have found this by looking at how Openstack does it and some trial
> > and error.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: CephFS: Isolating folders for different users
Thank you very much! Works like a charm, except for one thing: I gave my clients the MDS caps 'allow rws path=' to also be able to create snapshots from the client, but `mkdir .snap/test` still returns

    mkdir: cannot create directory '.snap/test': Operation not permitted

Do you have an idea what might be the issue here?

Best regards,
Jonas

PS: A happy new year to everyone!

On 23.12.22 10:05, Kai Stian Olstad wrote:
> On 22.12.2022 15:47, Jonas Schwab wrote:
>> Now the question: Since I established this setup more or less through
>> trial and error, I was wondering if there is a more elegant/better
>> approach than what is outlined above?
>
> You can use namespaces so you don't need separate pools.
> Unfortunately the documentation is sparse on the subject; I use it
> with subvolumes like this:
>
> # Create a subvolume
> ceph fs subvolume create --pool_layout --namespace-isolated
>
> The subvolume is created with namespace fsvolumens_
> You can also find the name with
> ceph fs subvolume info | jq -r .pool_namespace
>
> # Create a user with access to the subvolume and the namespace
> ## First find the path to the subvolume
> ceph fs subvolume getpath
> ## Create the user
> ceph auth get-or-create client. mon 'allow r' osd 'allow rw pool= namespace=fsvolumens_'
>
> I have found this by looking at how Openstack does it and some trial
> and error.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
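Olstad's recipe, spelled out end to end. Everything concrete here is a placeholder for illustration: client.usera, subvol_a, the pool name cephfs_data, and the /volumes path (use whatever `getpath` actually returns):

```
# 1. Create a namespace-isolated subvolume
ceph fs subvolume create cephfs subvol_a --namespace-isolated

# 2. Find its path and the RADOS namespace it was given
ceph fs subvolume getpath cephfs subvol_a
ceph fs subvolume info cephfs subvol_a | jq -r .pool_namespace
#   -> fsvolumens_subvol_a

# 3. Create a client restricted to that path and namespace
#    (use the exact path returned by getpath)
ceph auth get-or-create client.usera \
    mon 'allow r' \
    mds 'allow rw path=/volumes/_nogroup/subvol_a' \
    osd 'allow rw pool=cephfs_data namespace=fsvolumens_subvol_a'
```

The mds cap is implied by the thread rather than quoted from Olstad's command; add the 's' flag ('allow rws path=...') if the client should also manage snapshots.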
[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation
Sent prematurely. I meant to add that after ~3 years of service, the 1 DWPD drives in the clusters I mentioned mostly reported <10% of endurance burned. Required endurance is in part a function of how long you expect the drives to last.

>> Having said that, for a storage cluster where write performance is expected
>> to be the main bottleneck, I would be hesitant to use drives that only have
>> 1 DWPD endurance, since Ceph has fairly high write amplification factors. If
>> you use 3-fold replication, this cluster might only be able to handle a few
>> TB of writes per day without wearing out the drives prematurely.
>
>>> Hi Experts, I am trying to find out whether there are significant write
>>> performance improvements achievable by separating the WAL/DB in a Ceph
>>> cluster with all-SSD OSDs. I have a cluster with 40 SSDs (Samsung PM1643
>>> 1.8 TB enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to
>>> know whether I can get better write IOPS and throughput if I add one NVMe
>>> device per node and put the WAL/DB on it. Would this separation yield a
>>> meaningful performance improvement or not?
>>> My Ceph cluster is the block storage back-end of OpenStack Cinder in a
>>> public cloud service.
>
> My zwei Pfennig:
>
> * IMHO the performance delta with external WAL+DB is going to be limited.
> NVMe WAL+DB would deliver lower write latency up to a point, but throughput
> is still going to be limited by the SAS HBA / bulk OSD drives. You also have
> the hassle of managing OSDs that span devices: when replacing a failed OSD,
> properly handling the shared device can be tricky. With your very small
> number of nodes and drives, the blast radius of one failing would be really
> large.
>
> * Do you have the libvirt / librbd client-side cache disabled?
>
> * I've run 3R clusters in a similar role, backing libvirt / librbd clients
> and using SATA SSDs. We mostly were able to sustain an average write latency
> <= 5 ms, though a couple of times we had to expand a cluster for IOPS before
> capacity. The crappy HBAs in use were part of the bottleneck. This sort of
> thing is one of the inputs to the SNIA TCO calculator.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation
> Having said that, for a storage cluster where write performance is expected
> to be the main bottleneck, I would be hesitant to use drives that only have
> 1 DWPD endurance, since Ceph has fairly high write amplification factors. If
> you use 3-fold replication, this cluster might only be able to handle a few
> TB of writes per day without wearing out the drives prematurely.
>
>> Hi Experts, I am trying to find out whether there are significant write
>> performance improvements achievable by separating the WAL/DB in a Ceph
>> cluster with all-SSD OSDs. I have a cluster with 40 SSDs (Samsung PM1643
>> 1.8 TB enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to
>> know whether I can get better write IOPS and throughput if I add one NVMe
>> device per node and put the WAL/DB on it. Would this separation yield a
>> meaningful performance improvement or not?
>> My Ceph cluster is the block storage back-end of OpenStack Cinder in a
>> public cloud service.

My zwei Pfennig:

* IMHO the performance delta with external WAL+DB is going to be limited. NVMe WAL+DB would deliver lower write latency up to a point, but throughput is still going to be limited by the SAS HBA / bulk OSD drives. You also have the hassle of managing OSDs that span devices: when replacing a failed OSD, properly handling the shared device can be tricky. With your very small number of nodes and drives, the blast radius of one failing would be really large.

* Do you have the libvirt / librbd client-side cache disabled?

* I've run 3R clusters in a similar role, backing libvirt / librbd clients and using SATA SSDs. We mostly were able to sustain an average write latency <= 5 ms, though a couple of times we had to expand a cluster for IOPS before capacity. The crappy HBAs in use were part of the bottleneck. This sort of thing is one of the inputs to the SNIA TCO calculator.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
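Erik's endurance arithmetic can be made explicit. The drive count, capacity, and DWPD come from the thread; the overall write-amplification factor is an assumption for illustration:

```python
# Back-of-envelope endurance budget for the cluster described above:
# 40 x 1.8 TB drives rated 1 DWPD, 3-way replication, and an assumed
# additional write-amplification factor (WAL/DB + compaction) of ~3.
drives, drive_tb, dwpd = 40, 1.8, 1.0
replication, waf = 3, 3.0

raw_budget_tb_per_day = drives * drive_tb * dwpd   # total rated raw writes/day
client_tb_per_day = raw_budget_tb_per_day / (replication * waf)
print(f"sustainable client writes: ~{client_tb_per_day:.0f} TB/day")
```

Under these assumptions the cluster sustains on the order of single-digit TB of client writes per day, which is consistent with "a few TB of writes per day" above; a different assumed WAF scales the answer directly.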
[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation
Hi all,

I have a similar question regarding a cluster configuration consisting of HDDs, SSDs and NVMes. Let's say I set up an OSD configuration in a YAML file like this:

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
spec:
  data_devices:
    model: HDD-Model-XY
  db_devices:
    model: NVME-Model-A
  wal_devices:
    model: NVME-Model-A

AFAIK that would result in having the DB and WAL on the NVMes, whereas all data is put on the HDDs. I assume I would get a significant IOPS improvement when using block devices in OpenStack (or Proxmox), but I have to put up with wearing out the NVMes, right?

In the Ceph documentation, I saw that it would also be sufficient to define the 2.5" SSDs as db_devices. Nevertheless, I need some SSDs as OSDs, since the CephFS metadata pool and NFS pool need to be put on SSD OSDs. I might use 3 out of 4 SSDs per node as db_devices, but I need some for the metadata pools. Any suggestions?

Regards,
Mevludin

Am 02.01.2023 um 15:03 schrieb Erik Lindahl:

Depends. In theory, each OSD will have access to 1/4 of the separate WAL/DB device, so to get better performance you need to find an NVMe device that delivers significantly more than 4x the IOPS rate of the PM1643 drives, which is not common.

That assumes the PM1643 devices are connected to a high-quality, well-configured 12Gb SAS controller that really can deliver the full IOPS rate of 4 drives combined. The only way to find that out is likely to benchmark.

Having said that, for a storage cluster where write performance is expected to be the main bottleneck, I would be hesitant to use drives that only have 1 DWPD endurance, since Ceph has fairly high write amplification factors. If you use 3-fold replication, this cluster might only be able to handle a few TB of writes per day without wearing out the drives prematurely.
In practice we've been quite happy with Samsung drives that have often far exceeded their warranty endurance, but that's not something I would like to rely on when providing a commercial service.

Cheers,
Erik

--
Erik Lindahl

On 2 Jan 2023 at 10:25 +0100, hosseinz8...@yahoo.com wrote:

Hi Experts, I am trying to find out whether there are significant write performance improvements achievable by separating the WAL/DB in a Ceph cluster with all-SSD OSDs. I have a cluster with 40 SSDs (Samsung PM1643 1.8 TB enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether I can get better write IOPS and throughput if I add one NVMe device per node and put the WAL/DB on it. Would this separation yield a meaningful performance improvement or not?

My Ceph cluster is the block storage back-end of OpenStack Cinder in a public cloud service.

Thanks in advance.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
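For reference, Mevludin's drive-group spec reads more cleanly as a standalone file. A sketch, where the model strings are placeholders (verify against `ceph orch device ls`); the separate wal_devices entry can likely be dropped, since when only db_devices are given the WAL co-locates on the DB device:

```yaml
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
spec:
  data_devices:
    model: HDD-Model-XY
  db_devices:
    model: NVME-Model-A
```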
[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation
Depends. In theory, each OSD will have access to 1/4 of the separate WAL/DB device, so to get better performance you need to find an NVMe device that delivers significantly more than 4x the IOPS rate of the PM1643 drives, which is not common.

That assumes the PM1643 devices are connected to a high-quality, well-configured 12Gb SAS controller that really can deliver the full IOPS rate of 4 drives combined. The only way to find that out is likely to benchmark.

Having said that, for a storage cluster where write performance is expected to be the main bottleneck, I would be hesitant to use drives that only have 1 DWPD endurance, since Ceph has fairly high write amplification factors. If you use 3-fold replication, this cluster might only be able to handle a few TB of writes per day without wearing out the drives prematurely.

In practice we've been quite happy with Samsung drives that have often far exceeded their warranty endurance, but that's not something I would like to rely on when providing a commercial service.

Cheers,
Erik

--
Erik Lindahl

On 2 Jan 2023 at 10:25 +0100, hosseinz8...@yahoo.com wrote:
> Hi Experts, I am trying to find out whether there are significant write
> performance improvements achievable by separating the WAL/DB in a Ceph
> cluster with all-SSD OSDs. I have a cluster with 40 SSDs (Samsung PM1643
> 1.8 TB enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to
> know whether I can get better write IOPS and throughput if I add one NVMe
> device per node and put the WAL/DB on it. Would this separation yield a
> meaningful performance improvement or not?
> My Ceph cluster is the block storage back-end of OpenStack Cinder in a
> public cloud service.
>
> Thanks in advance.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
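Erik's 1/4-share argument in numbers. The IOPS figures below are illustrative assumptions, not benchmarked values for these specific models:

```python
# With 4 OSDs per node sharing one WAL/DB device, the shared NVMe must
# comfortably out-run 4x the write IOPS of a single SAS SSD to help.
osds_per_node      = 4
sas_ssd_write_iops = 30_000    # assumed PM1643-class small-write IOPS
nvme_write_iops    = 100_000   # assumed candidate NVMe

needed = osds_per_node * sas_ssd_write_iops
print(f"shared device should exceed {needed} write IOPS; "
      f"candidate offers {nvme_write_iops}")
```

With these assumed numbers the candidate NVMe falls short of the 4x target, which is Erik's point: only benchmarking the actual devices settles it.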
[ceph-users] Re: ceph failing to write data - MDSs read only
Hi Kotresh,

The issue is fixed for now; I followed the steps below. I unmounted the kernel client and restarted the MDS service, which brought the MDS back to normal. But even after this, the "1 MDSs behind on trimming" issue didn't clear. I waited for about 20-30 minutes, after which the trimming issue resolved on its own, and ceph status is healthy now. I didn't modify the settings related to the MDS cache; they are at their defaults.

On Mon, Jan 2, 2023 at 10:54 AM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
> The MDS requests the clients to release caps to trim caches when there is
> cache pressure, or it might proactively request the client to release caps
> in some cases. But the client is failing to release the caps soon enough in
> your case.
>
> A few questions:
>
> 1. Have you tuned MDS cache configurations? If so, please share.
> 2. Is this a kernel client or fuse client?
> 3. Could you please share 'session ls' output?
> 4. Also share the MDS/client logs.
>
> Sometimes dropping the caches (echo 3 > /proc/sys/vm/drop_caches if it's a
> kclient) or unmounting and mounting the problematic client could fix the
> issue, if that's acceptable.
>
> Thanks and Regards,
> Kotresh H R
>
> On Thu, Dec 29, 2022 at 4:35 PM Amudhan P wrote:
>> Hi,
>>
>> Suddenly facing an issue with the Ceph cluster. I am using ceph version
>> 16.2.6. I couldn't find any solution for the issue below.
>> Any suggestions?
>> health: HEALTH_WARN
>>         1 clients failing to respond to capability release
>>         1 clients failing to advance oldest client/flush tid
>>         1 MDSs are read only
>>         1 MDSs report slow requests
>>         1 MDSs behind on trimming
>>
>> services:
>>     mon: 3 daemons, quorum strg-node1,strg-node2,strg-node3 (age 9w)
>>     mgr: strg-node1.ivkfid(active, since 9w), standbys: strg-node2.unyimy
>>     mds: 1/1 daemons up, 1 standby
>>     osd: 32 osds: 32 up (since 9w), 32 in (since 5M)
>>
>> data:
>>     volumes: 1/1 healthy
>>     pools:   3 pools, 321 pgs
>>     objects: 13.19M objects, 45 TiB
>>     usage:   90 TiB used, 85 TiB / 175 TiB avail
>>     pgs:     319 active+clean
>>              2 active+clean+scrubbing+deep

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
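The diagnostic steps Kotresh lists above can be sketched as commands. The MDS name is a placeholder; adapt it to the daemon shown in `ceph status`:

```
# Inspect per-client caps held; look for one session with a huge num_caps
ceph tell mds.<name> session ls

# Check for stuck or slow requests on the read-only MDS
ceph tell mds.<name> dump_ops_in_flight

# On the problematic kernel client, try releasing caps before resorting
# to unmount + MDS restart:
echo 3 > /proc/sys/vm/drop_caches
```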
[ceph-users] Re: max pool size (amount of data/number of OSDs)
Hi Chris,

The actual limits are not software limits. Usually Ceph teams at cloud providers or universities run out of physical resources first: racks, rack power, or network (ports, or EOL switches that can't be upgraded), or hardware lifetime. (There is no point in buying old hardware, and the new generation is often too new to mix with the old one; at the same time, replacing everything at once is very expensive: millions of dollars, depending on the region where the equipment is purchased and where it will be operated.)

k

Sent from my iPhone

> On 30 Dec 2022, at 19:52, Christopher Durham wrote:
>
> Hi,
> Is there any information on this issue? Max number of OSDs per pool, or
> max pool size (data) as opposed to cluster size? Thanks!
> -Chris
>
> -----Original Message-----
> From: Christopher Durham
> To: ceph-users@ceph.io
> Sent: Thu, Dec 15, 2022 5:36 pm
> Subject: max pool size (amount of data/number of OSDs)
>
> Hi,
> There are various articles, case studies, etc. about large ceph clusters
> storing 10s of PiB, with CERN being the largest cluster as far as I know.
> Is there a largest pool capacity limit? In other words, while you may have
> a 30 PiB cluster, is there a limit or recommendation as to max pool
> capacity? For example, in the 30 PiB case, is there a limit or
> recommendation that says do not have a pool capacity higher than 5 PiB,
> for 6 pools in that cluster at a total of 30 PiB?
>
> I know this would be contingent upon a variety of things, including, but
> not limited to, network throughput and individual server size (disk size
> and number, memory, compute). I am specifically talking about s3/rgw
> storage.
>
> But is there a technical limit, or just a tested size, of a pool? Should I
> create different pools when a given pool would otherwise reach a size
> capacity of X, or have N OSDs or PGs in it, when considering adding
> additional OSDs?
> Thanks for any info
> -Chris

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Ceph All-SSD Cluster & Wal/DB Separation
Hi Experts,

I am trying to find out whether there are significant write performance improvements achievable by separating the WAL/DB in a Ceph cluster with all-SSD OSDs. I have a cluster with 40 SSDs (Samsung PM1643 1.8 TB enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether I can get better write IOPS and throughput if I add one NVMe device per node and put the WAL/DB on it. Would this separation yield a meaningful performance improvement or not?

My Ceph cluster is the block storage back-end of OpenStack Cinder in a public cloud service.

Thanks in advance.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io