[ceph-users] Re: Cannot create CephFS subvolume

2023-01-02 Thread Venky Shankar
Hi Daniel,

On Wed, Dec 28, 2022 at 3:17 AM Daniel Kovacs  wrote:
>
> Hello!
>
> I'd like to create a CephFS subvolume with this command: ceph fs
> subvolume create cephfs_ssd subvol_1
> I got this error: Error EINVAL: invalid value specified for
> ceph.dir.subvolume
> If I use another CephFS volume, no error is reported.

Was `subvol_1` created earlier, deleted and now being recreated again
(with the same name)?
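
If it helps narrow this down, a rough way to check for a leftover subvolume or
a directory still carrying the subvolume flag (the mount point below is purely
illustrative) is:

    ceph fs subvolume ls cephfs_ssd
    # on a kernel mount of the volume, inspect the flag on the target directory:
    getfattr -n ceph.dir.subvolume /mnt/cephfs_ssd/volumes/_nogroup/subvol_1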

>
> What did I do wrong?
>
> Best regards,
>
> Daniel
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: Isolating folders for different users

2023-01-02 Thread Venky Shankar
Hi Jonas,

On Mon, Jan 2, 2023 at 10:52 PM Jonas Schwab
 wrote:
>
> Thank you very much! Works like a charm, except for one thing: I gave my
> clients the MDS caps 'allow rws path=<path>' to also be able
> to create snapshots from the client, but `mkdir .snap/test` still returns
>  mkdir: cannot create directory ‘.snap/test’: Operation not permitted
>
> Do you have an idea what might be the issue here?

If you are using CephFS subvolumes, it's a good idea to take snapshots via

ceph fs subvolume snapshot create ...

since there is some subvolume jugglery done which might deny taking
snapshots at arbitrary levels.
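
For reference, the full form of that command is roughly (volume, subvolume and
snapshot names are placeholders; --group_name is only needed for subvolumes in
a non-default group):

    ceph fs subvolume snapshot create <vol_name> <subvol_name> <snap_name> [--group_name <group_name>]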

>
> Best regards,
> Jonas
>
> PS: A happy new year to everyone!
>
> On 23.12.22 10:05, Kai Stian Olstad wrote:
> > On 22.12.2022 15:47, Jonas Schwab wrote:
> >> Now the question: Since I established this setup more or less through
> >> trial and error, I was wondering if there is a more elegant/better
> >> approach than what is outlined above?
> >
> > You can use namespace so you don't need separate pools.
> > Unfortunately the documentation is sparse on the subject, I use it
> > with subvolume like this
> >
> >
> > # Create a subvolume
> >
> > ceph fs subvolume create <volume name> <subvolume name>
> > --pool_layout <pool name> --namespace-isolated
> >
> > The subvolume is created with namespace fsvolumens_<subvolume name>
> > You can also find the name with
> >
> > ceph fs subvolume info <volume name> <subvolume name> | jq -r
> > .pool_namespace
> >
> >
> > # Create a user with access to the subvolume and the namespace
> >
> > ## First find the path to the subvolume
> >
> > ceph fs subvolume getpath <volume name> <subvolume name>
> >
> > ## Create the user
> >
> > ceph auth get-or-create client.<client name> mon 'allow r' osd 'allow
> > rw pool=<pool name> namespace=fsvolumens_<subvolume name>'
> >
> >
> > I have found this by looking at how Openstack does it and some trial
> > and error.
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg deep scrubbing issue

2023-01-02 Thread Anthony D'Atri
Look closely at your output: the PGs with 0 objects only appear to be “every
other” one because of how the command happened to order the output.

Note that the empty PGs all have IDs matching “3.*”. The numeric prefix of a PG
ID is the ID of the pool to which it belongs. I strongly suspect that you have
a pool with no data.
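
If you want to confirm that, a quick check (standard commands; the pool name
below is a placeholder) is:

    ceph df                    # per-pool object and byte counts
    ceph osd pool ls detail    # maps pool IDs (the "3." prefix) to pool names
    # only if the pool is genuinely unused; also requires mon_allow_pool_delete=true:
    ceph osd pool rm <pool name> <pool name> --yes-i-really-really-mean-it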



>> Strangely, ceph pg dump shows every other PG with 0 objects.  An
>> attempt to perform a deep scrub (or scrub) on one of these PGs does nothing.
>> The cluster appears to be running fine, but obviously there’s an issue.
>> What should my next steps be to troubleshoot?
>>> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
>>> 3.e9b  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 22:49:07.629579  0'0  23686:19820  [28,79]  28  [28,79]  28  0'0  2022-12-31 22:49:07.629508  0'0  2022-12-31 22:49:07.629508  0
>>> 1.e99  60594  0  0  0  0  177433523272  0  0  3046  3046  active+clean  2022-12-21 14:35:08.175858  23686'268137  23686:1732399  [178,115]  178  [178,115]  178  23675'267613  2022-12-21 11:01:10.403525  23675'267613  2022-12-21 11:01:10.403525  0
>>> 3.e9a  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 09:16:48.644619  0'0  23686:22855  [51,140]  51  [51,140]  51  0'0  2022-12-31 09:16:48.644568  0'0  2022-12-30 02:35:23.367344  0
>>> 1.e98  59962  0  0  0  0  177218669411  0  0  3035  3035  active+clean  2022-12-28 14:14:49.908560  23686'265576  23686:1357499  [92,86]  92  [92,86]  92  23686'265445  2022-12-28 14:14:49.908522  23686'265445  2022-12-28 14:14:49.908522  0
>>> 3.e95  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 06:09:39.442932  0'0  23686:22757  [48,83]  48  [48,83]  48  0'0  2022-12-31 06:09:39.442879  0'0  2022-12-18 09:33:47.892142  0


As to your PGs not scrubbed in time, what sort of hardware are your OSDs?  Here 
are some thoughts, especially if they’re HDDs.

* If you don’t need that empty pool, delete it, then evaluate how many PGs your
OSDs hold on average (e.g. `ceph osd df`).  If you have an unusually high
number of PGs per OSD, maybe, just maybe, you’re running afoul of
osd_scrub_extended_sleep / osd_scrub_sleep.  In other words, individual scrubs
of empty PGs may naturally be very fast, but they may be DoSing the scrub
scheduler because of the efforts Ceph makes to spread out the impact of scrubs.

* Do you limit scrubs to certain times via osd_scrub_begin_hour,
osd_scrub_end_hour, osd_scrub_begin_week_day, osd_scrub_end_week_day?  I’ve
seen operators who constrain scrubs to only a few overnight / weekend hours,
but doing so can hobble Ceph’s ability to get through them all in time.

* Similarly, a value of osd_scrub_load_threshold that’s too low can also result 
in starvation.  The load average statistic can be misleading on modern SMP 
systems with lots of cores.  I’ve witnessed 32c/64t OSD nodes report a load 
average of like 40, but with tools like htop one could see that they were 
barely breaking a sweat.

* If you have osd_scrub_during_recovery disabled and experience a lot of 
backfill / recovery / rebalance traffic, that can starve scrubs too.  IMHO with 
recent releases this should almost always be enabled, ymmv.

* Back when I ran busy (read: under-specced) HDD clusters I had to bump
osd_deep_scrub_interval by a factor of 4x due to how slow and seek-bound the
LFF spinners were.  Of course, the longer one spaces out scrubs, the less
effective they are at detecting problems before they’re impactful.  (Example
commands for inspecting and adjusting these knobs are sketched below.)
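
A rough sketch of checking and adjusting the options mentioned above via the
config subsystem; the values are only illustrative, not recommendations:

    ceph config get osd osd_scrub_load_threshold
    ceph config get osd osd_scrub_begin_hour
    ceph config set osd osd_scrub_during_recovery true
    # roughly 4x the default one-week deep-scrub interval, in seconds:
    ceph config set osd osd_deep_scrub_interval 2419200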




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg deep scrubbing issue

2023-01-02 Thread Jeffrey Turmelle
Thanks for the reply.  I’ll give that a try; I wasn’t using the balancer.
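
For anyone following along, enabling the balancer in upmap mode looks roughly
like this (see the docs linked below; upmap requires all clients to be at
least Luminous):

    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status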

> On Jan 2, 2023, at 1:55 AM, Pavin Joseph  wrote:
> 
> Hi Jeff,
> 
> Might be worth checking the balancer [0] status, also you probably want to 
> use upmap mode [1] if possible.
> 
> [0]: https://docs.ceph.com/en/latest/rados/operations/balancer/#status
> [1]: https://docs.ceph.com/en/latest/rados/operations/balancer/#modes
> 
> Kind regards,
> Pavin Joseph.
> 
> On 02-Jan-23 12:04 AM, Jeffrey Turmelle wrote:
>> Hi Everyone,
>> My Nautilus cluster of 6 nodes and 180 OSDs is having a weird issue I don’t
>> know how to troubleshoot.
>> I started receiving health warnings, and the number of PGs not
>> deep-scrubbed in time has been increasing.
>> # ceph health detail
>> HEALTH_WARN 3013 pgs not scrubbed in time
>> PG_NOT_SCRUBBED 3013 pgs not scrubbed in time
>> pg 1.e99 not scrubbed since 2022-12-21 11:01:10.403525
>> pg 1.e94 not scrubbed since 2022-12-18 06:26:14.086410
>> pg 3.e91 not scrubbed since 2022-12-17 03:00:25.104174
>> pg 1.e90 not scrubbed since 2022-12-18 03:31:44.747218
>> pg 1.e8e not scrubbed since 2022-12-21 12:04:17.111762
>> pg 1.e89 not scrubbed since 2022-12-18 07:20:13.328540
>> ...
>> 2963 more pgs…
>> Strangely, ceph pg dump shows every other PG with 0 objects.  An
>> attempt to perform a deep scrub (or scrub) on one of these PGs does nothing.
>> The cluster appears to be running fine, but obviously there’s an issue.
>> What should my next steps be to troubleshoot?
>> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
>> 3.e9b  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 22:49:07.629579  0'0  23686:19820  [28,79]  28  [28,79]  28  0'0  2022-12-31 22:49:07.629508  0'0  2022-12-31 22:49:07.629508  0
>> 1.e99  60594  0  0  0  0  177433523272  0  0  3046  3046  active+clean  2022-12-21 14:35:08.175858  23686'268137  23686:1732399  [178,115]  178  [178,115]  178  23675'267613  2022-12-21 11:01:10.403525  23675'267613  2022-12-21 11:01:10.403525  0
>> 3.e9a  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 09:16:48.644619  0'0  23686:22855  [51,140]  51  [51,140]  51  0'0  2022-12-31 09:16:48.644568  0'0  2022-12-30 02:35:23.367344  0
>> 1.e98  59962  0  0  0  0  177218669411  0  0  3035  3035  active+clean  2022-12-28 14:14:49.908560  23686'265576  23686:1357499  [92,86]  92  [92,86]  92  23686'265445  2022-12-28 14:14:49.908522  23686'265445  2022-12-28 14:14:49.908522  0
>> 3.e95  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 06:09:39.442932  0'0  23686:22757  [48,83]  48  [48,83]  48  0'0  2022-12-31 06:09:39.442879  0'0  2022-12-18 09:33:47.892142  0
>> 1.e97  60062  0  0  0  0  176721095235  0  0  3050  3050  active+clean  2022-12-31 21:19:33.758473  23686'267934  23686:1514273  [137,123]  137  [137,123]  137  23686'267916  2022-12-31 21:19:33.758417  23686'267713  2022-12-27 19:16:27.025326  0
>> 3.e94  0  0  0  0  0  0  0  0  0  0  active+clean  2022-12-31 10:00:38.864773  0'0  23686:18478  [101,1]  101  [101,1]  101  0'0  2022-12-31 10:00:38.864730  0'0  2022-12-28 22:28:13.790168  0
>> 1.e96  59753  0  0  0  0  175411602155  0  0  3083  3083  active+clean  2022-12-28 14:13:32.186265  23686'264255  23686:1676359  [54,170]  54  [54,170]  54  23686'264120  2022-12-28 14:13:32.186220  23686'264120  2022-12-28 14:13:32.186220  0
>> 3.e97  0  0  0  0  0  0  0  0  0  0  active+clean

[ceph-users] Re: CephFS: Isolating folders for different users

2023-01-02 Thread Robert Gallop
One side effect of using subvolumes is that you can then only take a snapshot
at the subvolume level, nothing further down the tree.

I find you can use the same path in the auth caps without using a subvolume,
unless I’m missing something in this thread.
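
A minimal sketch of that approach, assuming a plain directory rather than a
subvolume (filesystem, client and path names are placeholders; the "s" flag is
what permits snapshot creation):

    ceph fs authorize cephfs client.jonas /projects/jonas rws

This generates roughly the caps: mds 'allow rws path=/projects/jonas', mon
'allow r', osd 'allow rw tag cephfs data=cephfs'.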

On Mon, Jan 2, 2023 at 10:21 AM Jonas Schwab <
jonas.sch...@physik.uni-wuerzburg.de> wrote:

> Thank you very much! Works like a charm, except for one thing: I gave my
> clients the MDS caps 'allow rws path=<path>' to also be able
> to create snapshots from the client, but `mkdir .snap/test` still returns
>  mkdir: cannot create directory ‘.snap/test’: Operation not permitted
>
> Do you have an idea what might be the issue here?
>
> Best regards,
> Jonas
>
> PS: A happy new year to everyone!
>
> On 23.12.22 10:05, Kai Stian Olstad wrote:
> > On 22.12.2022 15:47, Jonas Schwab wrote:
> >> Now the question: Since I established this setup more or less through
> >> trial and error, I was wondering if there is a more elegant/better
> >> approach than what is outlined above?
> >
> > You can use namespace so you don't need separate pools.
> > Unfortunately the documentation is sparse on the subject, I use it
> > with subvolume like this
> >
> >
> > # Create a subvolume
> >
> > ceph fs subvolume create <volume name> <subvolume name>
> > --pool_layout <pool name> --namespace-isolated
> >
> > The subvolume is created with namespace fsvolumens_<subvolume name>
> > You can also find the name with
> >
> > ceph fs subvolume info <volume name> <subvolume name> | jq -r
> > .pool_namespace
> >
> >
> > # Create a user with access to the subvolume and the namespace
> >
> > ## First find the path to the subvolume
> >
> > ceph fs subvolume getpath <volume name> <subvolume name>
> >
> > ## Create the user
> >
> > ceph auth get-or-create client.<client name> mon 'allow r' osd 'allow
> > rw pool=<pool name> namespace=fsvolumens_<subvolume name>'
> >
> >
> > I have found this by looking at how Openstack does it and some trial
> > and error.
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: Isolating folders for different users

2023-01-02 Thread Jonas Schwab
Thank you very much! Works like a charm, except for one thing: I gave my 
clients the MDS caps 'allow rws path=<path>' to also be able
to create snapshots from the client, but `mkdir .snap/test` still returns

    mkdir: cannot create directory ‘.snap/test’: Operation not permitted

Do you have an idea what might be the issue here?

Best regards,
Jonas

PS: A happy new year to everyone!

On 23.12.22 10:05, Kai Stian Olstad wrote:

On 22.12.2022 15:47, Jonas Schwab wrote:

Now the question: Since I established this setup more or less through
trial and error, I was wondering if there is a more elegant/better
approach than what is outlined above?


You can use namespace so you don't need separate pools.
Unfortunately the documentation is sparse on the subject, I use it 
with subvolume like this



# Create a subvolume

    ceph fs subvolume create <volume name> <subvolume name>
--pool_layout <pool name> --namespace-isolated


The subvolume is created with namespace fsvolumens_<subvolume name>
You can also find the name with

    ceph fs subvolume info <volume name> <subvolume name> | jq -r
.pool_namespace



# Create a user with access to the subvolume and the namespace

## First find the path to the subvolume

    ceph fs subvolume getpath <volume name> <subvolume name>

## Create the user

    ceph auth get-or-create client.<client name> mon 'allow r' osd 'allow
rw pool=<pool name> namespace=fsvolumens_<subvolume name>'



I have found this by looking at how Openstack does it and some trial 
and error.
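
To tie the steps above together, a hypothetical end-to-end run might look like
this (volume, pool and client names are placeholders; the mds cap is an
addition beyond the steps above, since a client normally also needs an MDS cap
scoped to the subvolume path in order to mount it):

    ceph fs subvolume create cephfs subvol_1 --pool_layout cephfs_data --namespace-isolated
    ceph fs subvolume info cephfs subvol_1 | jq -r .pool_namespace   # e.g. fsvolumens_subvol_1
    SUBVOL_PATH=$(ceph fs subvolume getpath cephfs subvol_1)
    ceph auth get-or-create client.subvol_1 \
        mon 'allow r' \
        mds "allow rw path=${SUBVOL_PATH}" \
        osd 'allow rw pool=cephfs_data namespace=fsvolumens_subvol_1'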




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation

2023-01-02 Thread Anthony D'Atri
 Sent prematurely.

I meant to add that after ~3 years of service, the 1 DWPD drives in the 
clusters I mentioned mostly reported <10% of endurance burned.

Required endurance is in part a function of how long you expect the drives to 
last.

>> Having said that, for a storage cluster where write performance is expected 
>> to be the main bottleneck, I would be hesitant to use drives that only have 
>> 1DWPD endurance since Ceph has fairly high write amplification factors. If 
>> you use 3-fold replication, this cluster might only be able to handle a few 
>> TB of writes per day without wearing out the drives prematurely.
> 
>> 
>>> Hi Experts, I am trying to find out whether there are significant write
>>> performance improvements to be had by separating the WAL/DB in a Ceph cluster
>>> with all-SSD OSDs. I have a cluster with 40 SSDs (PM1643 1.8 TB Samsung
>>> enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether
>>> I can get better write IOPS and throughput if I add one NVMe OSD per node and
>>> separate the WAL/DB onto it. Would this separation give a meaningful
>>> performance improvement or not?
>>> My Ceph cluster is the block storage back-end of OpenStack Cinder in a public
>>> cloud service.
> 
> 
> My zwei pfennig:
> 
> * IMHO the performance delta with external WAL+DB is going to be limited.  
> NVMe WAL+DB would deliver lower write latency up to a point, but throughput 
> is still going to be limited by the SAS HBA / bulk OSD drives.  You also have 
> the hassle of managing OSDs that span devices: when replacing a failed OSD 
> properly handling the shared device can be tricky.  With your very small 
> number of nodes and drives, the blast radius of one failing would be really 
> large.
> 
> * Do you have the libvirt / librbd client-side cache disabled?
> 
> * I’ve run 3R clusters in a similar role, backing libvirt / librbd clients 
> and using SATA SSDs.  We mostly were able to sustain an average write latency 
> <= 5ms, though a couple of times we had to expand a cluster for IOPs before 
> capacity.  The crappy HBAs in use were part of the bottleneck.  This sort of 
> thing is one of the inputs to the SNIA TCO calculator.
> 
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation

2023-01-02 Thread Anthony D'Atri


> Having said that, for a storage cluster where write performance is expected 
> to be the main bottleneck, I would be hesitant to use drives that only have 
> 1DWPD endurance since Ceph has fairly high write amplification factors. If 
> you use 3-fold replication, this cluster might only be able to handle a few 
> TB of writes per day without wearing out the drives prematurely.

> 
>> Hi Experts, I am trying to find out whether there are significant write
>> performance improvements to be had by separating the WAL/DB in a Ceph cluster
>> with all-SSD OSDs. I have a cluster with 40 SSDs (PM1643 1.8 TB Samsung
>> enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether
>> I can get better write IOPS and throughput if I add one NVMe OSD per node and
>> separate the WAL/DB onto it. Would this separation give a meaningful
>> performance improvement or not?
>> My Ceph cluster is the block storage back-end of OpenStack Cinder in a public
>> cloud service.


My zwei pfennig:

* IMHO the performance delta with external WAL+DB is going to be limited.  NVMe 
WAL+DB would deliver lower write latency up to a point, but throughput is still 
going to be limited by the SAS HBA / bulk OSD drives.  You also have the hassle 
of managing OSDs that span devices: when replacing a failed OSD properly 
handling the shared device can be tricky.  With your very small number of nodes 
and drives, the blast radius of one failing would be really large.

* Do you have the libvirt / librbd client-side cache disabled?  (See the config sketch below.)

* I’ve run 3R clusters in a similar role, backing libvirt / librbd clients and 
using SATA SSDs.  We mostly were able to sustain an average write latency <= 
5ms, though a couple of times we had to expand a cluster for IOPs before 
capacity.  The crappy HBAs in use were part of the bottleneck.  This sort of 
thing is one of the inputs to the SNIA TCO calculator.
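
Regarding the client-side cache question above, a minimal sketch for disabling
it on the libvirt / librbd side (standard option names; whether to disable it
depends on your workload):

    [client]
    rbd cache = false

    # value stored in the central config db, if any (a local ceph.conf may override):
    ceph config get client rbd_cache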


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation

2023-01-02 Thread Mevludin Blazevic

Hi all,

I have a similar question regarding a cluster configuration consisting 
of HDDs, SSDs and NVMes. Let's say I would set up an OSD configuration in
a YAML file like this:


service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
spec:
  data_devices:
    model: HDD-Model-XY
  db_devices:
    model: NVME-Model-A
  wal_devices:
    model: NVME-Model-A

Afaik that would result in having the DB and WAL on the NVMes, whereas all
data is put on the HDDs. I assume I would get a significant IOPS improvement
when using block devices in OpenStack (or Proxmox), but I have to put up with
wearing out the NVMes, right?

In the Ceph documentation, I saw that it would also be sufficient to define
the 2.5" SSDs as db_devices. Nevertheless, I need some SSDs as OSDs, since the
CephFS metadata pool and the NFS pool need to be put on SSD OSDs. I might use
3 out of 4 SSDs per node as db_devices, but I need some for the metadata
pools. Any suggestions?

Regards,
Mevludin


On 02.01.2023 at 15:03, Erik Lindahl wrote:

Depends.

In theory, each OSD will have access to 1/4 of the separate WAL/DB device, so 
to get better performance you need to find an NVMe device that delivers 
significantly more than 4x the IOPS rate of the pm1643 drives, which is not 
common.

That assumes the pm1643 devices are connected to a high-quality well-configured 
12Gb SAS controller that really can deliver the full IOPS rate of 4 drives 
combined. The only way to find that out is likely to benchmark.

Having said that, for a storage cluster where write performance is expected to 
be the main bottleneck, I would be hesitant to use drives that only have 1DWPD 
endurance since Ceph has fairly high write amplification factors. If you use 
3-fold replication, this cluster might only be able to handle a few TB of 
writes per day without wearing out the drives prematurely.

In practice we've been quite happy with Samsung drives that have often far 
exceeded their warranty endurance, but that's not something I would like to 
rely on when providing a commercial service.

Cheers,

Erik




--
Erik Lindahl
On 2 Jan 2023 at 10:25 +0100, hosseinz8...@yahoo.com wrote:

Hi Experts, I am trying to find out whether there are significant write
performance improvements to be had by separating the WAL/DB in a Ceph cluster
with all-SSD OSDs. I have a cluster with 40 SSDs (PM1643 1.8 TB Samsung
enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether I
can get better write IOPS and throughput if I add one NVMe OSD per node and
separate the WAL/DB onto it. Would this separation give a meaningful
performance improvement or not?
My Ceph cluster is the block storage back-end of OpenStack Cinder in a public
cloud service.

Thanks in advance.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph All-SSD Cluster & Wal/DB Separation

2023-01-02 Thread Erik Lindahl
Depends.

In theory, each OSD will have access to 1/4 of the separate WAL/DB device, so 
to get better performance you need to find an NVMe device that delivers 
significantly more than 4x the IOPS rate of the pm1643 drives, which is not 
common.

That assumes the pm1643 devices are connected to a high-quality well-configured 
12Gb SAS controller that really can deliver the full IOPS rate of 4 drives 
combined. The only way to find that out is likely to benchmark.
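
For the benchmarking step, a hypothetical fio invocation that is commonly used
to gauge a device's sync-write IOPS before dedicating it to WAL/DB duty (the
device path is a placeholder, and the test destroys data on it):

    fio --name=sync-write-iops --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based --group_reporting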

Having said that, for a storage cluster where write performance is expected to 
be the main bottleneck, I would be hesitant to use drives that only have 1DWPD 
endurance since Ceph has fairly high write amplification factors. If you use 
3-fold replication, this cluster might only be able to handle a few TB of 
writes per day without wearing out the drives prematurely.

In practice we've been quite happy with Samsung drives that have often far 
exceeded their warranty endurance, but that's not something I would like to 
rely on when providing a commercial service.

Cheers,

Erik




--
Erik Lindahl 
On 2 Jan 2023 at 10:25 +0100, hosseinz8...@yahoo.com wrote:
> Hi Experts, I am trying to find out whether there are significant write
> performance improvements to be had by separating the WAL/DB in a Ceph cluster
> with all-SSD OSDs. I have a cluster with 40 SSDs (PM1643 1.8 TB Samsung
> enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether
> I can get better write IOPS and throughput if I add one NVMe OSD per node and
> separate the WAL/DB onto it. Would this separation give a meaningful
> performance improvement or not?
> My Ceph cluster is the block storage back-end of OpenStack Cinder in a public
> cloud service.
>
> Thanks in advance.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph failing to write data - MDSs read only

2023-01-02 Thread Amudhan P
Hi Kotresh,

The issue is fixed for now; I followed the steps below.

I unmounted the kernel client and restarted the MDS service, which brought
the MDS back to normal. Even after this, the "1 MDSs behind on trimming"
warning didn't clear immediately; after waiting about 20-30 minutes it
resolved on its own, and the Ceph status is healthy now.

I didn't modify any settings related to the MDS cache; they are at their
default values.
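
For anyone gathering the information Kotresh asks for below, the usual
commands look roughly like this (the MDS name is a placeholder; the daemon
command runs on the MDS host):

    ceph tell mds.<mds_name> session ls              # client sessions and the caps they hold
    ceph daemon mds.<mds_name> dump_ops_in_flight    # inspect the reported slow requests
    # on the kernel client, dropping caches can release caps, as noted below:
    echo 3 > /proc/sys/vm/drop_caches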


On Mon, Jan 2, 2023 at 10:54 AM Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> The MDS requests the clients to release caps to trim caches when there is
> cache pressure or it
> might proactively request the client to release caps in some cases. But
> the client is failing to release the
> caps soon enough in your case.
>
> Few questions:
>
> 1. Have you tuned MDS cache configurations? If so please share.
> 2. Is this kernel client or fuse client?
> 3. Could you please share 'session ls' output?
> 4. Also share the MDS/Client logs.
>
> Sometimes dropping the caches (echo 3 > /proc/sys/vm/drop_caches if it's a
> kclient) or unmounting and remounting
> the problematic client could fix the issue, if that's acceptable.
>
> Thanks and Regards,
> Kotresh H R
>
> On Thu, Dec 29, 2022 at 4:35 PM Amudhan P  wrote:
>
>> Hi,
>>
>> Suddenly facing an issue with Ceph cluster I am using ceph version 16.2.6.
>> I couldn't find any solution for the issue below.
>> Any suggestions?
>>
>>
>> health: HEALTH_WARN
>> 1 clients failing to respond to capability release
>> 1 clients failing to advance oldest client/flush tid
>> 1 MDSs are read only
>> 1 MDSs report slow requests
>> 1 MDSs behind on trimming
>>
>>   services:
>> mon: 3 daemons, quorum strg-node1,strg-node2,strg-node3 (age 9w)
>> mgr: strg-node1.ivkfid(active, since 9w), standbys: strg-node2.unyimy
>> mds: 1/1 daemons up, 1 standby
>> osd: 32 osds: 32 up (since 9w), 32 in (since 5M)
>>
>>   data:
>> volumes: 1/1 healthy
>> pools:   3 pools, 321 pgs
>> objects: 13.19M objects, 45 TiB
>> usage:   90 TiB used, 85 TiB / 175 TiB avail
>> pgs: 319 active+clean
>>  2   active+clean+scrubbing+deep
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: max pool size (amount of data/number of OSDs)

2023-01-02 Thread Konstantin Shalygin
Hi Chris,

The actual limits are not in the software. Ceph teams at cloud providers or
universities usually run out of physical resources first: racks, rack power, or
network (ports, EOL switches that can't be upgraded), or hardware lifetime.
There is no point in buying old hardware, and new hardware is often too new to
mix well with the old; at the same time, replacing everything at once is very
expensive (millions of dollars, depending on the region where the equipment is
purchased and where it will be operated).


k
Sent from my iPhone

> On 30 Dec 2022, at 19:52, Christopher Durham  wrote:
> 
> 
> Hi,
> Is there any information on this issue? Max number of OSDs per pool, or 
> maxpool size (data) as opposed to cluster size? Thanks!
> -Chris
> 
> 
> -Original Message-
> From: Christopher Durham 
> To: ceph-users@ceph.io 
> Sent: Thu, Dec 15, 2022 5:36 pm
> Subject: max pool size (amount of data/number of OSDs)
> 
> 
> Hi,
> There are various articles, case studies, etc. about large Ceph clusters
> storing tens of PiB, with CERN being the largest cluster as far as I know.
> Is there a largest pool capacity limit?  In other words, while you may have a
> 30 PiB cluster, is there a limit or recommendation as to max pool capacity? For
> example, in the 30 PiB case, is there a limit or recommendation that says do
> not let a single pool grow beyond 5 PiB, i.e. 6 pools in that cluster at a
> total of 30 PiB?
> 
> I know this would be contingent upon a variety of things, including, but not
> limited to, network throughput and individual server size (disk size and
> number, memory, compute). I am specifically talking about S3/RGW storage.
> 
> But is there a technical limit, or just a tested size, of a pool? Should I
> create different pools when a given pool would otherwise reach a capacity
> of X, or have N OSDs or PGs in it, when considering adding additional OSDs?
> Thanks for any info
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph All-SSD Cluster & Wal/DB Separation

2023-01-02 Thread hosseinz8...@yahoo.com
Hi Experts, I am trying to find out whether there are significant write
performance improvements to be had by separating the WAL/DB in a Ceph cluster
with all-SSD OSDs. I have a cluster with 40 SSDs (PM1643 1.8 TB Samsung
enterprise SSDs): 10 storage nodes, each with 4 OSDs. I want to know whether I
can get better write IOPS and throughput if I add one NVMe OSD per node and
separate the WAL/DB onto it. Would this separation give a meaningful
performance improvement or not?
My Ceph cluster is the block storage back-end of OpenStack Cinder in a public
cloud service.

Thanks in advance.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io