[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread Frédéric Nass

Hello, 
  
We've had a similar situation recently where OSDs would use way more memory 
than osd_memory_target and get OOM killed by the kernel. 
It was due to a kernel bug related to cgroups [1]. 
  
If num_cgroups below keeps increasing then you may hit this bug.
 
  
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t 
   #subsys_name  hierarchy  num_cgroups  enabled 
   blkio         4          1099         1 
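A quick way to see whether the counter keeps growing is to sample it a few 
times over a few minutes, for example (a simple sketch; any interval will do): 

$ for i in 1 2 3; do grep -e subsys -e blkio /proc/cgroups; sleep 60; done 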
  
If you hit this bug, upgrading the OSD nodes' kernels should get you through. If 
you can't access the Red Hat KB [1], let me know your current nodes' kernel 
version and I'll check for you. 
  Regards,
Frédéric.  
 
  
[1] https://access.redhat.com/solutions/7014337 

-Message original-

De: huxiaoyu 
à: ceph-users 
Envoyé: mercredi 10 janvier 2024 19:21 CET
Sujet : [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Dear Ceph folks, 

I am responsible for two Ceph clusters running Nautilus 14.2.22, one with 
replication 3 and the other with EC 4+2. After around 400 days of running 
quietly and smoothly, the two clusters recently ran into similar problems: 
some OSDs consume ca. 18 GB while the memory target is set to 2 GB. 

What could be going wrong in the background? Does this point to a slow OSD 
memory leak in 14.2.22 that I am not aware of? 

It would be highly appreciated if someone could provide any clues, ideas or 
comments. 

best regards, 

Samuel 



huxia...@horebdata.cn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread huxia...@horebdata.cn
Dear Frederic,

Thanks a lot for the suggestions. We are using the vanilla Linux 4.19 LTS 
kernel. Do you think we may be suffering from the same bug?

best regards,

Samuel



huxia...@horebdata.cn
 
From: Frédéric Nass
Date: 2024-01-12 09:19
To: huxiaoyu
CC: ceph-users
Subject: Re: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?
Hello,
 
We've had a similar situation recently where OSDs would use way more memory 
than osd_memory_target and get OOM killed by the kernel.
It was due to a kernel bug related to cgroups [1].
 
If num_cgroups below keeps increasing then you may hit this bug.
 
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t
   #subsys_name  hierarchy  num_cgroups  enabled
   blkio         4          1099         1
 
If you hit this bug, upgrading the OSD nodes' kernels should get you through. If 
you can't access the Red Hat KB [1], let me know your current nodes' kernel 
version and I'll check for you.
 
Regards,
Frédéric.
 
[1] https://access.redhat.com/solutions/7014337


De: huxiaoyu 
à: ceph-users 
Envoyé: mercredi 10 janvier 2024 19:21 CET
Sujet : [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Dear Ceph folks, 

I am responsible for two Ceph clusters running Nautilus 14.2.22, one with 
replication 3 and the other with EC 4+2. After around 400 days of running 
quietly and smoothly, the two clusters recently ran into similar problems: 
some OSDs consume ca. 18 GB while the memory target is set to 2 GB. 

What could be going wrong in the background? Does this point to a slow OSD 
memory leak in 14.2.22 that I am not aware of? 

It would be highly appreciated if someone could provide any clues, ideas or 
comments. 

best regards, 

Samuel 



huxia...@horebdata.cn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Hi Yang and Anthony,

I found a solution for this problem on 7200 rpm HDDs.

When the cluster is recovering from one or multiple disk failures, slow ops
appear and then affect the cluster; we can change these configurations to
reduce the recovery IOPS.
osd_mclock_profile=custom
osd_mclock_scheduler_background_recovery_lim=0.2
osd_mclock_scheduler_background_recovery_res=0.2
osd_mclock_scheduler_client_wgt
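For reference, these can be applied at runtime with "ceph config set" (a
sketch; the weight value of 6 comes from a later message in this thread, and
on 17.2.5 the fractional res/lim values may be rejected, as also noted further
down in the thread):

ceph config set osd osd_mclock_profile custom
ceph config set osd osd_mclock_scheduler_background_recovery_lim 0.2
ceph config set osd osd_mclock_scheduler_background_recovery_res 0.2
ceph config set osd osd_mclock_scheduler_client_wgt 6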


On Wed, Jan 10, 2024 at 11:22 David Yang wrote:

> The 2*10Gbps shared network seems to be full (1.9GB/s).
> Is it possible to reduce part of the workload and wait for the cluster
> to return to a healthy state?
> Tip: Erasure coding needs to collect all data blocks when recovering
> data, so it takes up a lot of network card bandwidth and processor
> resources.
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
I update the config
osd_mclock_profile=custom
osd_mclock_scheduler_background_recovery_lim=0.2
osd_mclock_scheduler_background_recovery_res=0.2
osd_mclock_scheduler_client_wgt=6

On Fri, Jan 12, 2024 at 15:31 Phong Tran Thanh <tranphong...@gmail.com> wrote:

> Hi Yang and Anthony,
>
> I found a solution for this problem on 7200 rpm HDDs.
>
> When the cluster is recovering from one or multiple disk failures, slow ops
> appear and then affect the cluster; we can change these configurations to
> reduce the recovery IOPS.
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt
>
>
> On Wed, Jan 10, 2024 at 11:22 David Yang wrote:
>
>> The 2*10Gbps shared network seems to be full (1.9GB/s).
>> Is it possible to reduce part of the workload and wait for the cluster
>> to return to a healthy state?
>> Tip: Erasure coding needs to collect all data blocks when recovering
>> data, so it takes up a lot of network card bandwidth and processor
>> resources.
>>
>
>
> --
> Best regards,
>
> 
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Frédéric Nass

Hello Torkil, 
  
We're using the same EC scheme as yours, with k=5 and m=4 over 3 DCs, with the 
below rule: 
  
 
rule ec54 { 
        id 3 
        type erasure 
        min_size 3 
        max_size 9 
        step set_chooseleaf_tries 5 
        step set_choose_tries 100 
        step take default class hdd 
        step choose indep 0 type datacenter 
        step chooseleaf indep 3 type host 
        step emit 
} 
  
Works fine. The only difference I see with your EC rule is the fact that we set 
min_size and max_size but I doubt this has anything to do with your situation. 
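For reference, a hand-edited rule like this can be compiled and injected with
the usual crushtool round trip (a sketch; the file names are arbitrary):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit the rule in crush.txt, then recompile and inject it
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new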
  
Since the cluster still complains about "Pool cephfs.hdd.data has 1024 
placement groups, should have 2048", did you run "ceph osd pool set 
cephfs.hdd.data pgp_num 2048" right after running "ceph osd pool set 
cephfs.hdd.data pg_num 2048"? [1] 
  
Might be that the pool still has 1024 PGs. 
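A quick way to verify is to read both values back from the pool (a sketch):

ceph osd pool get cephfs.hdd.data pg_num
ceph osd pool get cephfs.hdd.data pgp_num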
    
 
Regards,
Frédéric. 
  
[1] 
https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups
  

   

-Message original-

De: Torkil 
à: ceph-users 
Cc: Ruben 
Envoyé: vendredi 12 janvier 2024 09:00 CET
Sujet : [ceph-users] 3 DC with 4+5 EC not quite working

We are looking to create a 3 datacenter 4+5 erasure coded pool but can't 
quite get it to work. Ceph version 17.2.7. These are the hosts (there 
will eventually be 6 hdd hosts in each datacenter): 

-33 886.00842 datacenter 714 
-7 209.93135 host ceph-hdd1 

-69 69.86389 host ceph-flash1 
-6 188.09579 host ceph-hdd2 

-3 233.57649 host ceph-hdd3 

-12 184.54091 host ceph-hdd4 
-34 824.47168 datacenter DCN 
-73 69.86389 host ceph-flash2 
-2 201.78067 host ceph-hdd5 

-81 288.26501 host ceph-hdd6 

-31 264.56207 host ceph-hdd7 

-36 1284.48621 datacenter TBA 
-77 69.86389 host ceph-flash3 
-21 190.83224 host ceph-hdd8 

-29 199.08838 host ceph-hdd9 

-11 193.85382 host ceph-hdd10 

-9 237.28154 host ceph-hdd11 

-26 187.19536 host ceph-hdd12 

-4 206.37102 host ceph-hdd13 

We did this: 

ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd 
plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default 
crush-failure-domain=datacenter crush-device-class=hdd 

ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd 
ceph osd pool set cephfs.hdd.data allow_ec_overwrites true 
ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn 

Didn't quite work: 

" 
[WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg 
incomplete 
pg 33.0 is creating+incomplete, acting 
[104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool 
cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for 
'incomplete') 
" 

I then manually changed the crush rule from this: 

" 
rule cephfs.hdd.data { 
id 7 
type erasure 
step set_chooseleaf_tries 5 
step set_choose_tries 100 
step take default class hdd 
step chooseleaf indep 0 type datacenter 
step emit 
} 
" 

To this: 

" 
rule cephfs.hdd.data { 
id 7 
type erasure 
step set_chooseleaf_tries 5 
step set_choose_tries 100 
step take default class hdd 
step choose indep 0 type datacenter 
step chooseleaf indep 3 type host 
step emit 
} 
" 

This was based on some testing and dialogue I had with Red Hat support last 
year when we were on RHCS, and it seemed to work. Then: 

ceph fs add_data_pool cephfs cephfs.hdd.data 
ceph fs subvolumegroup create hdd --pool_layout cephfs.hdd.data 

I started copying data to the subvolume and increased pg_num a couple of 
times: 

ceph osd pool set cephfs.hdd.data pg_num 256 
ceph osd pool set cephfs.hdd.data pg_num 2048 

But at some point it failed to activate new PGs eventually leading to this: 

" 
[WARN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs 
mds.cephfs.ceph-flash1.agdajf(mds.0): 64 slow metadata IOs are 
blocked > 30 secs, oldest blocked for 25455 secs 
[WARN] MDS_TRIM: 1 MDSs behind on trimming 
mds.cephfs.ceph-flash1.agdajf(mds.0): Behind on trimming 
(997/128) max_segments: 128, num_segments: 997 
[WARN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive 
pg 33.6f6 is stuck inactive for 8h, current state 
activating+remapped, last acting [50,79,116,299,98,219,164,124,421] 
pg 33.6fa is stuck inactive for 11h, current state 
activating+undersized+degraded+remapped, last acting 
[17,408,NONE,196,223,290,73,39,11] 
pg 33.705 is stuck inactive for 11h, current state 
activating+undersized+degraded+remapped, last acting 
[33,273,71,NONE,411,96,28,7,161] 
pg 33.721 is stuck inactive for 7h, current state 
activating+remapped, last acting [283,150,209,423,103,325,118,142,87] 
pg 33.726 is stuck inactive for 11h, current state 
activating+undersized+degraded+remapped, last acting 
[234,NONE,416,121,54,141,277,265,19] 
[WARN] PG_DEGRADED: Degraded data redundancy: 1818/1282640036 objects 
degraded (0.000%), 3 pgs degraded, 3 pgs undersized 
pg 33.6fa is stuck undersized for 7h, current state 
activating+undersized+degraded+remapped, last acting 
[17,408,NONE,196,223,290,73,39,11] 
pg 33.705 is stuck undersized for 7

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Szabo, Istvan (Agoda)
Is it better?


Istvan Szabo
Staff Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---




From: Phong Tran Thanh 
Sent: Friday, January 12, 2024 3:32 PM
To: David Yang 
Cc: ceph-users@ceph.io 
Subject: [ceph-users] Re: About ceph disk slowops effect to cluster

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


I update the config
osd_mclock_profile=custom
osd_mclock_scheduler_background_recovery_lim=0.2
osd_mclock_scheduler_background_recovery_res=0.2
osd_mclock_scheduler_client_wgt=6

On Fri, Jan 12, 2024 at 15:31 Phong Tran Thanh <tranphong...@gmail.com> wrote:

> Hi Yang and Anthony,
>
> I found a solution for this problem on 7200 rpm HDDs.
>
> When the cluster is recovering from one or multiple disk failures, slow ops
> appear and then affect the cluster; we can change these configurations to
> reduce the recovery IOPS.
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt
>
>
> On Wed, Jan 10, 2024 at 11:22 David Yang wrote:
>
>> The 2*10Gbps shared network seems to be full (1.9GB/s).
>> Is it possible to reduce part of the workload and wait for the cluster
>> to return to a healthy state?
>> Tip: Erasure coding needs to collect all data blocks when recovering
>> data, so it takes up a lot of network card bandwidth and processor
>> resources.
>>
>
>
> --
> Best regards,
>
> 
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
>


--
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread Frédéric Nass

Samuel, 
  
Hard to tell for sure since this bug hit different major versions of the 
kernel, at least RHEL's from what I know. The only way to tell is to check for 
num_cgroups in /proc/cgroups:

 
 
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t 
   #subsys_name  hierarchy  num_cgroups  enabled 
   blkio         4          1099         1  
Otherwise, you'd have to check the sources of the kernel you're using against 
the patch that fixed this bug. Unfortunately, I can't spot the upstream patch 
that fixed this issue since RH BZs related to this bug are private. Maybe 
someone here can spot it. 
   
 
Regards, 
Frédéric.  

  

-Message original-

De: huxiaoyu 
à: Frédéric 
Cc: ceph-users 
Envoyé: vendredi 12 janvier 2024 09:25 CET
Sujet : Re: Re: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

 
Dear Frederic, 
  
Thanks a lot for the suggestions. We are using the vanilla Linux 4.19 LTS 
kernel. Do you think we may be suffering from the same bug? 
  
best regards, 
  
Samuel 
  
huxia...@horebdata.cn

From: Frédéric Nass
Date: 2024-01-12 09:19
To: huxiaoyu
CC: ceph-users
Subject: Re: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Hello,

We've had a similar situation recently where OSDs would use way more memory
than osd_memory_target and get OOM killed by the kernel. It was due to a
kernel bug related to cgroups [1].

If num_cgroups below keeps increasing then you may hit this bug.
 
  
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t 
   #subsys_name  hierarchy  num_cgroups  enabled 
   blkio         4          1099         1 
  
If you hit this bug, upgrading the OSD nodes' kernels should get you through. If 
you can't access the Red Hat KB [1], let me know your current nodes' kernel 
version and I'll check for you. 
  Regards,
Frédéric. 
 
 
  
[1] https://access.redhat.com/solutions/7014337 
De: huxiaoyu 
à: ceph-users 
Envoyé: mercredi 10 janvier 2024 19:21 CET
Sujet : [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Dear Ceph folks, 

I am responsible for two Ceph clusters running Nautilus 14.2.22, one with 
replication 3 and the other with EC 4+2. After around 400 days of running 
quietly and smoothly, the two clusters recently ran into similar problems: 
some OSDs consume ca. 18 GB while the memory target is set to 2 GB. 

What could be going wrong in the background? Does this point to a slow OSD 
memory leak in 14.2.22 that I am not aware of? 

It would be highly appreciated if someone could provide any clues, ideas or 
comments. 

best regards, 

Samuel 



huxia...@horebdata.cn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Yes, it works well for me; it reduced the recovery rate from 4 GB/s to 200 MB/s.

On Fri, Jan 12, 2024 at 15:52 Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

> Is it better?
>
> Istvan Szabo
> Staff Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
> --
> *From:* Phong Tran Thanh 
> *Sent:* Friday, January 12, 2024 3:32 PM
> *To:* David Yang 
> *Cc:* ceph-users@ceph.io 
> *Subject:* [ceph-users] Re: About ceph disk slowops effect to cluster
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> 
>
> I update the config
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt=6
>
> On Fri, Jan 12, 2024 at 15:31 Phong Tran Thanh <tranphong...@gmail.com> wrote:
>
> > Hi Yang and Anthony,
> >
> > I found a solution for this problem on 7200 rpm HDDs.
> >
> > When the cluster is recovering from one or multiple disk failures, slow ops
> > appear and then affect the cluster; we can change these configurations to
> > reduce the recovery IOPS.
> > osd_mclock_profile=custom
> > osd_mclock_scheduler_background_recovery_lim=0.2
> > osd_mclock_scheduler_background_recovery_res=0.2
> > osd_mclock_scheduler_client_wgt
> >
> >
> > On Wed, Jan 10, 2024 at 11:22 David Yang wrote:
> >
> >> The 2*10Gbps shared network seems to be full (1.9GB/s).
> >> Is it possible to reduce part of the workload and wait for the cluster
> >> to return to a healthy state?
> >> Tip: Erasure coding needs to collect all data blocks when recovering
> >> data, so it takes up a lot of network card bandwidth and processor
> >> resources.
> >>
> >
> >
> > --
> > Best regards,
> >
> >
> 
> >
> > *Tran Thanh Phong*
> >
> > Email: tranphong...@gmail.com
> > Skype: tranphong079
> >
>
>
> --
> Best regards,
>
> 
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Torkil Svensgaard



On 12-01-2024 09:35, Frédéric Nass wrote:


Hello Torkil,


Hi Frédéric


We're using the same EC scheme as yours, with k=5 and m=4 over 3 DCs, with the 
below rule:
   
  
rule ec54 {

         id 3
         type erasure
         min_size 3
         max_size 9
         step set_chooseleaf_tries 5
         step set_choose_tries 100
         step take default class hdd
         step choose indep 0 type datacenter
         step chooseleaf indep 3 type host
         step emit
}
   
Works fine. The only difference I see with your EC rule is the fact that we set min_size and max_size but I doubt this has anything to do with your situation.


Great, thanks. I wonder if we might need to tweak the min_size. I think 
I tried lowering it to no avail and then set it back to 5 after editing 
the crush rule.



Since the cluster still complains about "Pool cephfs.hdd.data has 1024 placement groups, should have 
2048", did you run "ceph osd pool set cephfs.hdd.data pgp_num 2048" right after running 
"ceph osd pool set cephfs.hdd.data pg_num 2048"? [1]
   
Might be that the pool still has 1024 PGs.


Hmm coming from RHCS we didn't do this as:

"
RHCS 4.x and 5.x does not require the pgp_num value to be set. This will 
be done by ceph-mgr automatically. For RHCS 4.x and 5.x, only the pg_num 
is required to be incremented for the necessary pools.

"

So I only did "ceph osd pool set cephfs.hdd.data pg_num 2048" and let 
the mgr handle the rest. I had a watch running to see how it went and 
the pool was up to something like 1922 PGs when it got stuck.


As I read the documentation[1] this shouldn't get us stuck like we did, 
but we would have to set the pgp_num eventually to get it to rebalance?
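For what it's worth, the split progress can be followed while the mgr ramps 
things up with something like the below (a sketch; depending on the version 
the output also lists pg_num_target/pgp_num_target while a change is in 
flight):

ceph osd pool ls detail | grep cephfs.hdd.data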


Best regards,

Torkil

[1] 
https://docs.ceph.com/en/quincy/rados/operations/placement-groups/#setting-the-number-of-pgs


 
  
Regards,

Frédéric.
   
[1] https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups





-Message original-

De: Torkil 
à: ceph-users 
Cc: Ruben 
Envoyé: vendredi 12 janvier 2024 09:00 CET
Sujet : [ceph-users] 3 DC with 4+5 EC not quite working

We are looking to create a 3 datacenter 4+5 erasure coded pool but can't
quite get it to work. Ceph version 17.2.7. These are the hosts (there
will eventually be 6 hdd hosts in each datacenter):

-33 886.00842 datacenter 714
-7 209.93135 host ceph-hdd1

-69 69.86389 host ceph-flash1
-6 188.09579 host ceph-hdd2

-3 233.57649 host ceph-hdd3

-12 184.54091 host ceph-hdd4
-34 824.47168 datacenter DCN
-73 69.86389 host ceph-flash2
-2 201.78067 host ceph-hdd5

-81 288.26501 host ceph-hdd6

-31 264.56207 host ceph-hdd7

-36 1284.48621 datacenter TBA
-77 69.86389 host ceph-flash3
-21 190.83224 host ceph-hdd8

-29 199.08838 host ceph-hdd9

-11 193.85382 host ceph-hdd10

-9 237.28154 host ceph-hdd11

-26 187.19536 host ceph-hdd12

-4 206.37102 host ceph-hdd13

We did this:

ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd
plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default
crush-failure-domain=datacenter crush-device-class=hdd

ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd
ceph osd pool set cephfs.hdd.data allow_ec_overwrites true
ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn

Didn't quite work:

"
[WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg
incomplete
pg 33.0 is creating+incomplete, acting
[104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool
cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for
'incomplete')
"

I then manually changed the crush rule from this:

"
rule cephfs.hdd.data {
id 7
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type datacenter
step emit
}
"

To this:

"
rule cephfs.hdd.data {
id 7
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step choose indep 0 type datacenter
step chooseleaf indep 3 type host
step emit
}
"

This was based on some testing and dialogue I had with Red Hat support last
year when we were on RHCS, and it seemed to work. Then:

ceph fs add_data_pool cephfs cephfs.hdd.data
ceph fs subvolumegroup create hdd --pool_layout cephfs.hdd.data

I started copying data to the subvolume and increased pg_num a couple of
times:

ceph osd pool set cephfs.hdd.data pg_num 256
ceph osd pool set cephfs.hdd.data pg_num 2048

But at some point it failed to activate new PGs eventually leading to this:

"
[WARN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.cephfs.ceph-flash1.agdajf(mds.0): 64 slow metadata IOs are
blocked > 30 secs, oldest blocked for 25455 secs
[WARN] MDS_TRIM: 1 MDSs behind on trimming
mds.cephfs.ceph-flash1.agdajf(mds.0): Behind on trimming
(997/128) max_segments: 128, num_segments: 997
[WARN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive
pg 33.6f6 is stuck inactive for 8h, current state
activating+remapped, last acting [50,79,116,299,98,219,1

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Frank Schilder
Is it maybe this here: 
https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

I always have to tweak the num-tries parameters.
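For reference, the linked page boils down to testing the compiled map offline
and raising set_choose_tries until the bad mappings disappear, roughly (a
sketch; the rule id 7 and the 9 shards are taken from this thread):

ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 7 --num-rep 9 --show-bad-mappings
# if bad mappings are reported: decompile with crushtool -d, raise
# "step set_choose_tries" (e.g. to 200), recompile and re-test before injecting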

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Torkil Svensgaard 
Sent: Friday, January 12, 2024 10:17 AM
To: Frédéric Nass
Cc: ceph-users@ceph.io; Ruben Vestergaard
Subject: [ceph-users] Re: 3 DC with 4+5 EC not quite working



On 12-01-2024 09:35, Frédéric Nass wrote:
>
> Hello Torkil,

Hi Frédéric

> We're using the same EC scheme as yours, with k=5 and m=4 over 3 DCs, with 
> the below rule:
>
>
> rule ec54 {
>  id 3
>  type erasure
>  min_size 3
>  max_size 9
>  step set_chooseleaf_tries 5
>  step set_choose_tries 100
>  step take default class hdd
>  step choose indep 0 type datacenter
>  step chooseleaf indep 3 type host
>  step emit
> }
>
> Works fine. The only difference I see with your EC rule is the fact that we 
> set min_size and max_size but I doubt this has anything to do with your 
> situation.

Great, thanks. I wonder if we might need to tweak the min_size. I think
I tried lowering it to no avail and then set it back to 5 after editing
the crush rule.

> Since the cluster still complains about "Pool cephfs.hdd.data has 1024 
> placement groups, should have 2048", did you run "ceph osd pool set 
> cephfs.hdd.data pgp_num 2048" right after running "ceph osd pool set 
> cephfs.hdd.data pg_num 2048"? [1]
>
> Might be that the pool still has 1024 PGs.

Hmm coming from RHCS we didn't do this as:

"
RHCS 4.x and 5.x does not require the pgp_num value to be set. This will
be done by ceph-mgr automatically. For RHCS 4.x and 5.x, only the pg_num
is required to be incremented for the necessary pools.
"

So I only did "ceph osd pool set cephfs.hdd.data pg_num 2048" and let
the mgr handle the rest. I had a watch running to see how it went and
the pool was up to something like 1922 PGs when it got stuck.

As I read the documentation[1] this shouldn't get us stuck like we did,
but we would have to set the pgp_num eventually to get it to rebalance?

Best regards,

Torkil

[1]
https://docs.ceph.com/en/quincy/rados/operations/placement-groups/#setting-the-number-of-pgs

>
>
> Regards,
> Frédéric.
>
> [1] 
> https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups
>
>
>
> -Message original-
>
> De: Torkil 
> à: ceph-users 
> Cc: Ruben 
> Envoyé: vendredi 12 janvier 2024 09:00 CET
> Sujet : [ceph-users] 3 DC with 4+5 EC not quite working
>
> We are looking to create a 3 datacenter 4+5 erasure coded pool but can't
> quite get it to work. Ceph version 17.2.7. These are the hosts (there
> will eventually be 6 hdd hosts in each datacenter):
>
> -33 886.00842 datacenter 714
> -7 209.93135 host ceph-hdd1
>
> -69 69.86389 host ceph-flash1
> -6 188.09579 host ceph-hdd2
>
> -3 233.57649 host ceph-hdd3
>
> -12 184.54091 host ceph-hdd4
> -34 824.47168 datacenter DCN
> -73 69.86389 host ceph-flash2
> -2 201.78067 host ceph-hdd5
>
> -81 288.26501 host ceph-hdd6
>
> -31 264.56207 host ceph-hdd7
>
> -36 1284.48621 datacenter TBA
> -77 69.86389 host ceph-flash3
> -21 190.83224 host ceph-hdd8
>
> -29 199.08838 host ceph-hdd9
>
> -11 193.85382 host ceph-hdd10
>
> -9 237.28154 host ceph-hdd11
>
> -26 187.19536 host ceph-hdd12
>
> -4 206.37102 host ceph-hdd13
>
> We did this:
>
> ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd
> plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default
> crush-failure-domain=datacenter crush-device-class=hdd
>
> ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd
> ceph osd pool set cephfs.hdd.data allow_ec_overwrites true
> ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn
>
> Didn't quite work:
>
> "
> [WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg
> incomplete
> pg 33.0 is creating+incomplete, acting
> [104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool
> cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for
> 'incomplete')
> "
>
> I then manually changed the crush rule from this:
>
> "
> rule cephfs.hdd.data {
> id 7
> type erasure
> step set_chooseleaf_tries 5
> step set_choose_tries 100
> step take default class hdd
> step chooseleaf indep 0 type datacenter
> step emit
> }
> "
>
> To this:
>
> "
> rule cephfs.hdd.data {
> id 7
> type erasure
> step set_chooseleaf_tries 5
> step set_choose_tries 100
> step take default class hdd
> step choose indep 0 type datacenter
> step chooseleaf indep 3 type host
> step emit
> }
> "
>
> This was based on some testing and dialogue I had with Red Hat support last
> year when we were on RHCS, and it seemed to work. Then:
>
> ceph fs add_data_pool cephfs cephfs.hdd.data
> ceph fs subvolumegroup create hdd --pool_layout cephfs.hdd.data
>
> I started copying data to the subvolume and increased pg_num a cou

[ceph-users] Re: RGW rate-limiting or anti-hammering for (external) auth requests // Anti-DoS measures

2024-01-12 Thread Christian Rohmann

Hey Istvan,

On 10.01.24 03:27, Szabo, Istvan (Agoda) wrote:
I'm using this in the frontend HTTPS config on haproxy; it has worked 
well so far:


stick-table type ip size 1m expire 10s store http_req_rate(10s)

tcp-request inspect-delay 10s
tcp-request content track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 1 }



But this serves as a basic rate limit for all requests coming from a 
single IP address, right?



My question was rather about limiting clients in regards to 
authentication requests / unauthorized requests,

which end up hammering the auth system (Keystone in my case) at full rate.



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW - user created bucket with name of already created bucket

2024-01-12 Thread Ondřej Kukla
Thanks Jayanth,

I’ve tried this but unfortunately the unlink fails, as it checks against the 
bucket owner ID, which is not the user I’m trying to unlink.

So I’m still stuck here with two users with the same bucket name :(
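In case it helps anyone debugging the same thing, the current linkage and
ownership can at least be inspected with (a sketch; bucket and uid are
placeholders):

radosgw-admin bucket stats --bucket=<bucket>
radosgw-admin metadata get bucket:<bucket>
radosgw-admin bucket list --uid=<uid>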

Ondrej

> On 24. 12. 2023, at 17:14, Jayanth Reddy  wrote:
> 
> Hi Ondřej,
> I've not tried it myself, but see if you can use # radosgw-admin bucket 
> unlink [1] command to achieve it. It is strange that the user was somehow 
> able to create the bucket with the same name. We've also got v17.2.6 and have 
> not encountered this so far. Maybe devs from RGW can answer this.
> 
> [1] https://docs.ceph.com/en/quincy/man/8/radosgw-admin/#commands
> 
> Thanks,
> Jayanth
> 
> On Fri, Dec 22, 2023 at 7:29 PM Ondřej Kukla  > wrote:
>> Hello,
>> 
>> I would like to share a quite worrying experience I’ve just found on one of 
>> my production clusters.
>> 
>> User successfully created a bucket with the name of a bucket that already exists!
>> 
>> He is not the bucket owner - the original user is - but he is able to see it when 
>> he does ListBuckets over the S3 API. (Both accounts are able to do it now - only 
>> the original owner is able to interact with it.)
>> 
>> This bucket is also counted towards the new user's usage stats.
>> 
>> Has anyone noticed this before? This cluster is running on Quincy - 17.2.6.
>> 
>> Is there a way to detach the bucket from the new owner so he doesn’t have a 
>> bucket that doesn’t belong to him?
>> 
>> Regards,
>> 
>> Ondrej
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io 
>> To unsubscribe send an email to ceph-users-le...@ceph.io 
>> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Torkil Svensgaard



On 12-01-2024 10:30, Frank Schilder wrote:

Is it maybe this here: 
https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

I always have to tweak the num-tries parameters.


Oh, that seems plausible. Kinda scary, and odd, that hitting this 
doesn't generate a ceph health warning when the problem is fairly well 
documented?


Anyhoo, we decided to redo the pool etc going straight to 2048 PGs 
before copying any data and it worked just fine this time. Thanks a lot, 
both of you.


Best regards,

Torkil


Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Torkil Svensgaard 
Sent: Friday, January 12, 2024 10:17 AM
To: Frédéric Nass
Cc: ceph-users@ceph.io; Ruben Vestergaard
Subject: [ceph-users] Re: 3 DC with 4+5 EC not quite working



On 12-01-2024 09:35, Frédéric Nass wrote:


Hello Torkil,


Hi Frédéric


We're using the same EC scheme as yours, with k=5 and m=4 over 3 DCs, with the 
below rule:


rule ec54 {
  id 3
  type erasure
  min_size 3
  max_size 9
  step set_chooseleaf_tries 5
  step set_choose_tries 100
  step take default class hdd
  step choose indep 0 type datacenter
  step chooseleaf indep 3 type host
  step emit
}

Works fine. The only difference I see with your EC rule is the fact that we set 
min_size and max_size but I doubt this has anything to do with your situation.


Great, thanks. I wonder if we might need to tweak the min_size. I think
I tried lowering it to no avail and then set it back to 5 after editing
the crush rule.


Since the cluster still complains about "Pool cephfs.hdd.data has 1024 placement groups, should have 
2048", did you run "ceph osd pool set cephfs.hdd.data pgp_num 2048" right after running 
"ceph osd pool set cephfs.hdd.data pg_num 2048"? [1]

Might be that the pool still has 1024 PGs.


Hmm coming from RHCS we didn't do this as:

"
RHCS 4.x and 5.x does not require the pgp_num value to be set. This will
be done by ceph-mgr automatically. For RHCS 4.x and 5.x, only the pg_num
is required to be incremented for the necessary pools.
"

So I only did "ceph osd pool set cephfs.hdd.data pg_num 2048" and let
the mgr handle the rest. I had a watch running to see how it went and
the pool was up to something like 1922 PGs when it got stuck.

As I read the documentation[1] this shouldn't get us stuck like we did,
but we would have to set the pgp_num eventually to get it to rebalance?

Best regards,

Torkil

[1]
https://docs.ceph.com/en/quincy/rados/operations/placement-groups/#setting-the-number-of-pgs




Regards,
Frédéric.

[1] 
https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups



-Message original-

De: Torkil 
à: ceph-users 
Cc: Ruben 
Envoyé: vendredi 12 janvier 2024 09:00 CET
Sujet : [ceph-users] 3 DC with 4+5 EC not quite working

We are looking to create a 3 datacenter 4+5 erasure coded pool but can't
quite get it to work. Ceph version 17.2.7. These are the hosts (there
will eventually be 6 hdd hosts in each datacenter):

-33 886.00842 datacenter 714
-7 209.93135 host ceph-hdd1

-69 69.86389 host ceph-flash1
-6 188.09579 host ceph-hdd2

-3 233.57649 host ceph-hdd3

-12 184.54091 host ceph-hdd4
-34 824.47168 datacenter DCN
-73 69.86389 host ceph-flash2
-2 201.78067 host ceph-hdd5

-81 288.26501 host ceph-hdd6

-31 264.56207 host ceph-hdd7

-36 1284.48621 datacenter TBA
-77 69.86389 host ceph-flash3
-21 190.83224 host ceph-hdd8

-29 199.08838 host ceph-hdd9

-11 193.85382 host ceph-hdd10

-9 237.28154 host ceph-hdd11

-26 187.19536 host ceph-hdd12

-4 206.37102 host ceph-hdd13

We did this:

ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd
plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default
crush-failure-domain=datacenter crush-device-class=hdd

ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd
ceph osd pool set cephfs.hdd.data allow_ec_overwrites true
ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn

Didn't quite work:

"
[WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg
incomplete
pg 33.0 is creating+incomplete, acting
[104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool
cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for
'incomplete')
"

I then manually changed the crush rule from this:

"
rule cephfs.hdd.data {
id 7
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type datacenter
step emit
}
"

To this:

"
rule cephfs.hdd.data {
id 7
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step choose indep 0 type datacenter
step chooseleaf indep 3 type host
step emit
}
"

This was based on some testing and dialogue I had with Red Hat support last
year when we were on RHCS, and it seemed to work. Then:

ceph fs add_data_pool cephf

[ceph-users] Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-12 Thread Chris Palmer
I was delighted to see the native Debian 12 (bookworm) packages turn up 
in Reef 18.2.1.


We currently run a number of ceph clusters on Debian11 (bullseye) / 
Quincy 17.2.7. These are not cephadm-managed.


I have attempted to upgrade a test cluster, and it is not going well. 
Since Quincy only supports bullseye and Reef only supports bookworm, we are 
reinstalling from bare metal. However, I don't think either of the two 
problems below is related to that.


Problem 1
--

A simple "apt install ceph" goes most of the way, then errors with

Setting up cephadm (18.2.1-1~bpo12+1) ...
usermod: unlocking the user's password would result in a passwordless 
account.

You should set a password with usermod -p to unlock this user's password.
mkdir: cannot create directory ‘/home/cephadm/.ssh’: No such file or 
directory

dpkg: error processing package cephadm (--configure):
 installed cephadm package post-installation script subprocess returned 
error exit status 1

dpkg: dependency problems prevent configuration of ceph-mgr-cephadm:
 ceph-mgr-cephadm depends on cephadm; however:
  Package cephadm is not configured yet.

dpkg: error processing package ceph-mgr-cephadm (--configure):
 dependency problems - leaving unconfigured


The two cephadm-related packages are then left in an error state, which 
apt tries to continue each time it is run.


The cephadm user has a login directory of /nonexistent; however, the 
cephadm package's post-installation (--configure) script is trying to use 
/home/cephadm (as it was on Quincy/bullseye).
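One possible workaround (untested, and assuming the postinst only needs the
directory to exist) is to create the expected home directory and then let dpkg
retry the configuration step:

mkdir -p /home/cephadm/.ssh
chown -R cephadm: /home/cephadm
dpkg --configure -a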


So, since we aren't using cephadm, we decided to keep going, as the other 
packages were actually installed, and to deal with the package state later.


Problem 2
---

I upgraded 2/3 monitor nodes without any other problems, and (for the 
moment) removed the other Quincy monitor prior to rebuild.


I then shutdown the remaining Quincy manager, and attempted to start the 
Reef manager. Although the manager is running, "ceph mgr services" shows 
it is only providing the restful and not the dashboard service. The log 
file has lots of the following error:


ImportError: PyO3 modules may only be initialized once per interpreter 
process


and ceph -s reports "Module 'dashboard' has failed dependency: PyO3 
modules may only be initialized once per interpreter process".



Questions
---

1. Have the Reef/bookworm packages ever been tested in a non-cephadm 
environment?
2. I want to revert this cluster back to a fully functional state. I 
cannot bring back up the remaining Quincy monitor though ("require 
release 18 > 17"). Would I have to go through the procedure of starting 
over, and trying to rescue the monmap from the OSDs? (OSDs and an active 
MDS are still up and running Quincy). I'm aware that process exists but 
have never had to delve into it.
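For the record, the documented recovery path rebuilds the mon store from the 
OSDs roughly along these lines (a sketch only; paths and the keyring are 
placeholders, see the troubleshooting-mon docs before attempting anything):

# run against every OSD, accumulating into one store directory
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
  --op update-mon-db --mon-store-path /tmp/mon-store
# then rebuild a monitor store from the accumulated data
ceph-monstore-tool /tmp/mon-store rebuild -- --keyring <admin-keyring>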



Thanks, Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-12 Thread Chris Palmer

More info on problem 2:

When starting the dashboard, the mgr seems to try to initialise cephadm, 
which in turn uses python crypto libraries that lead to the python error:


$ ceph crash info 
2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52

{
    "backtrace": [
    "  File \"/usr/share/ceph/mgr/cephadm/__init__.py\", line 1, in 
\n    from .module import CephadmOrchestrator",
    "  File \"/usr/share/ceph/mgr/cephadm/module.py\", line 15, in 
\n    from cephadm.service_discovery import ServiceDiscovery",
    "  File \"/usr/share/ceph/mgr/cephadm/service_discovery.py\", 
line 20, in \n    from cephadm.ssl_cert_utils import SSLCerts",
    "  File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 
8, in \n    from cryptography import x509",
    "  File 
\"/lib/python3/dist-packages/cryptography/x509/__init__.py\", line 6, in 
\n    from cryptography.x509 import certificate_transparency",
    "  File 
\"/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py\", 
line 10, in \n    from cryptography.hazmat.bindings._rust import 
x509 as rust_x509",
    "ImportError: PyO3 modules may only be initialized once per 
interpreter process"

    ],
    "ceph_version": "18.2.1",
    "crash_id": 
"2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52",

    "entity_name": "mgr.x01",
    "mgr_module": "cephadm",
    "mgr_module_caller": "PyModule::load_subclass_of",
    "mgr_python_exception": "ImportError",
    "os_id": "12",
    "os_name": "Debian GNU/Linux 12 (bookworm)",
    "os_version": "12 (bookworm)",
    "os_version_id": "12",
    "process_name": "ceph-mgr",
    "stack_sig": 
"7815ad73ced094695056319d1241bf7847da19b4b0dfee7a216407b59a7e3d84",

    "timestamp": "2024-01-12T11:10:03.938478Z",
    "utsname_hostname": "x01.xxx.xxx",
    "utsname_machine": "x86_64",
    "utsname_release": "6.1.0-17-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 
(2023-12-30)"

}


On 12/01/2024 12:39, Chris Palmer wrote:
I was delighted to see the native Debian 12 (bookworm) packages turn 
up in Reef 18.2.1.


We currently run a number of ceph clusters on Debian11 (bullseye) / 
Quincy 17.2.7. These are not cephadm-managed.


I have attempted to upgrade a test cluster, and it is not going well. 
Since Quincy only supports bullseye and Reef only supports bookworm, we are 
reinstalling from bare metal. However, I don't think either of the two 
problems below is related to that.


Problem 1
--

A simple "apt install ceph" goes most of the way, then errors with

Setting up cephadm (18.2.1-1~bpo12+1) ...
usermod: unlocking the user's password would result in a passwordless 
account.

You should set a password with usermod -p to unlock this user's password.
mkdir: cannot create directory ‘/home/cephadm/.ssh’: No such file or 
directory

dpkg: error processing package cephadm (--configure):
 installed cephadm package post-installation script subprocess 
returned error exit status 1

dpkg: dependency problems prevent configuration of ceph-mgr-cephadm:
 ceph-mgr-cephadm depends on cephadm; however:
  Package cephadm is not configured yet.

dpkg: error processing package ceph-mgr-cephadm (--configure):
 dependency problems - leaving unconfigured


The two cephadm-related packages are then left in an error state, 
which apt tries to continue each time it is run.


The cephadm user has a login directory of /nonexistent; however, the 
cephadm package's post-installation (--configure) script is trying to use 
/home/cephadm (as it was on Quincy/bullseye).


So, since we aren't using cephadm, we decided to keep going, as the other 
packages were actually installed, and to deal with the package state later.


Problem 2
---

I upgraded 2/3 monitor nodes without any other problems, and (for the 
moment) removed the other Quincy monitor prior to rebuild.


I then shutdown the remaining Quincy manager, and attempted to start 
the Reef manager. Although the manager is running, "ceph mgr services" 
shows it is only providing the restful and not the dashboard service. 
The log file has lots of the following error:


ImportError: PyO3 modules may only be initialized once per interpreter 
process


and ceph -s reports "Module 'dashboard' has failed dependency: PyO3 
modules may only be initialized once per interpreter process".



Questions
---

1. Have the Reef/bookworm packages ever been tested in a non-cephadm 
environment?
2. I want to revert this cluster back to a fully functional state. I 
cannot bring back up the remaining Quincy monitor though ("require 
release 18 > 17"). Would I have to go through the procedure of 
starting over, and trying to rescue the monmap from the OSDs? (OSDs 
and an active MDS are still up and running Quincy). I'm aware that 
process exists but have never had to delve into it.



Thanks, Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe

[ceph-users] recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-12 Thread Drew Weaver
Hello,

So we were going to replace a Ceph cluster with some hardware we had lying 
around using SATA HBAs, but I was told that the only right way to build Ceph in 
2023 is with direct-attach NVMe.

Does anyone have any recommendation for a 1U barebones server (we just drop in 
RAM, disks and CPUs) with 8-10 2.5" NVMe bays that are direct-attached to the 
motherboard without a bridge or HBA, for Ceph specifically?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Unable to locate "bluestore_compressed_allocated" & "bluestore_compressed_original" parameters while executing "ceph daemon osd.X perf dump" command.

2024-01-12 Thread Alam Mohammad
Hi,

We are considering a BlueStore compression test in our cluster. For this we have 
created an RBD image on our EC pool.

While executing "ceph daemon osd.X perf dump | grep -E 
'(compress_.*_count|bluestore_compressed_)'", we are not able to locate the below 
parameters, even when we tried with the ceph tell command.
"bluestore_compressed_allocated"
"bluestore_compressed_original"
As a result, we are unable to determine the extent of compression for specific 
RBD images. 
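For reference, the bluestore_compressed_* counters typically only move once 
compression is actually enabled on the pool and data has been (re)written, and 
per-pool compression totals can also be checked without the per-OSD counters, 
e.g. (a sketch; the pool name and algorithm choice are placeholders):

ceph osd pool set <pool> compression_algorithm snappy
ceph osd pool set <pool> compression_mode aggressive
ceph df detail   # USED COMPR / UNDER COMPR columns show per-pool compression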

Are there any specific configurations required to expose these parameters, 
or is there an alternate method to assess BlueStore compression?
Any guidance or insight would be greatly appreciated.

Thanks
Mohammad Saif
Ceph Enthusiast
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Anthony D'Atri


> On Jan 12, 2024, at 03:31, Phong Tran Thanh  wrote:
> 
> Hi Yang and Anthony,
> 
> I found a solution for this problem on 7200 rpm HDDs.
> 
> When the cluster is recovering from one or multiple disk failures, slow ops
> appear and then affect the cluster; we can change these configurations to
> reduce the recovery IOPS.
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt

This got cut off.  What value are you using for wgt?

And how are you setting these?

With 17.2.5 I get

[rook@rook-ceph-tools-5ff8d58445-gkl5w /]$ ceph config set osd 
osd_mclock_scheduler_background_recovery_res 0.2
Error EINVAL: error parsing value: strict_si_cast: unit prefix not recognized

but with 17.2.6 it works.

The wording isn't clear but I suspect this is a function of 
https://tracker.ceph.com/issues/57533

> 
> 
> On Wed, Jan 10, 2024 at 11:22 David Yang wrote:
> 
>> The 2*10Gbps shared network seems to be full (1.9GB/s).
>> Is it possible to reduce part of the workload and wait for the cluster
>> to return to a healthy state?
>> Tip: Erasure coding needs to collect all data blocks when recovering
>> data, so it takes up a lot of network card bandwidth and processor
>> resources.
>> 
> 
> 
> -- 
> Best regards,
> 
> 
> *Tran Thanh Phong*
> 
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-12 Thread Özkan Göksu
Hello.

I have a 5-node Ceph cluster and I'm constantly getting the "clients failing to
respond to cache pressure" warning.

I have 84 CephFS kernel clients (servers) and my users are accessing their
personal subvolumes located on one pool.

My users are software developers and the data is home and user data (Git repos,
Python projects, sample data and newly generated data).

-
--- RAW STORAGE ---
CLASS SIZEAVAILUSED  RAW USED  %RAW USED
ssd146 TiB  101 TiB  45 TiB45 TiB  30.71
TOTAL  146 TiB  101 TiB  45 TiB45 TiB  30.71

--- POOLS ---
POOL ID   PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
.mgr  1 1  356 MiB   90  1.0 GiB  0 30 TiB
cephfs.ud-data.meta   9   256   69 GiB3.09M  137 GiB   0.15 45 TiB
cephfs.ud-data.data  10  2048   26 TiB  100.83M   44 TiB  32.97 45 TiB
-
root@ud-01:~# ceph fs status
ud-data - 84 clients
===
RANK  STATE   MDS  ACTIVITY DNSINOS   DIRS
CAPS
 0active  ud-data.ud-04.seggyv  Reqs:  142 /s  2844k  2798k   303k
720k
POOL   TYPE USED  AVAIL
cephfs.ud-data.meta  metadata   137G  44.9T
cephfs.ud-data.datadata44.2T  44.9T
STANDBY MDS
ud-data.ud-02.xcoojt
ud-data.ud-05.rnhcfe
ud-data.ud-03.lhwkml
ud-data.ud-01.uatjle
MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
quincy (stable)

---
My MDS settings are below:

mds_cache_memory_limit| 8589934592
mds_cache_trim_threshold  | 524288
mds_recall_global_max_decay_threshold | 131072
mds_recall_max_caps   | 3
mds_recall_max_decay_rate | 1.50
mds_recall_max_decay_threshold| 131072
mds_recall_warning_threshold  | 262144
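For reference, these can be read back and changed at runtime, and the per-client
caps counts (to see which clients hold the most) can be listed from the active
MDS, e.g. (a sketch; the 16 GiB value is only an example):

ceph config set mds mds_cache_memory_limit 17179869184
ceph config show mds.ud-data.ud-04.seggyv mds_cache_memory_limit
ceph tell mds.ud-data.ud-04.seggyv session ls   # shows num_caps per client session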


I have 2 questions:
1- What should I do to prevent the cache pressure warning?
2- What can I do to increase speed?

- Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
You can only change it with a custom profile, not with the built-in profiles; I am
configuring it from the Ceph dashboard.

osd_mclock_scheduler_client_wgt=6 -> this is my setting

On Sat, Jan 13, 2024 at 02:19 Anthony D'Atri wrote:

>
>
> > On Jan 12, 2024, at 03:31, Phong Tran Thanh 
> wrote:
> >
> > Hi Yang and Anthony,
> >
> > I found a solution for this problem on 7200 rpm HDDs.
> >
> > When the cluster is recovering from one or multiple disk failures, slow ops
> > appear and then affect the cluster; we can change these configurations to
> > reduce the recovery IOPS.
> > osd_mclock_profile=custom
> > osd_mclock_scheduler_background_recovery_lim=0.2
> > osd_mclock_scheduler_background_recovery_res=0.2
> > osd_mclock_scheduler_client_wgt
>
> This got cut off.  What value are you using for wgt?
>
> And how are you setting these?
>
> With 17.2.5 I get
>
> [rook@rook-ceph-tools-5ff8d58445-gkl5w /]$ ceph config set osd
> osd_mclock_scheduler_background_recovery_res 0.2
> Error EINVAL: error parsing value: strict_si_cast: unit prefix not
> recognized
>
> but with 17.2.6 it works.
>
> The wording isn't clear but I suspect this is a function of
> https://tracker.ceph.com/issues/57533
>
> >
> >
> > On Wed, Jan 10, 2024 at 11:22 David Yang wrote:
> >
> >> The 2*10Gbps shared network seems to be full (1.9GB/s).
> >> Is it possible to reduce part of the workload and wait for the cluster
> >> to return to a healthy state?
> >> Tip: Erasure coding needs to collect all data blocks when recovering
> >> data, so it takes up a lot of network card bandwidth and processor
> >> resources.
> >>
> >
> >
> > --
> > Best regards,
> >
> 
> >
> > *Tran Thanh Phong*
> >
> > Email: tranphong...@gmail.com
> > Skype: tranphong079
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io