[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Venky Shankar
+Patrick Donnelly

On Tue, Mar 5, 2024 at 9:18 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/64721#note-1
> Release Notes - TBD
> LRC upgrade - TBD
>
> Seeking approvals/reviews for:
>
> smoke - in progress
> rados - Radek, Laura?
> quincy-x - in progress

I think

https://github.com/ceph/ceph/pull/55669

was supposed to be included in this hotfix (I recall Patrick
mentioning this in last week's CLT). The change was merged into reef
head last week.

>
> Also need approval from Travis, Redouane for Prometheus fix testing.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Venky Shankar
Hi Laura,

On Wed, Mar 6, 2024 at 4:53 AM Laura Flores  wrote:

> Here are the rados and smoke suite summaries.
>
> @Radoslaw Zarzynski , @Adam King 
> , @Nizamudeen A , mind having a look to ensure the
> results from the rados suite look good to you?
>
>  @Venky Shankar  mind having a look at the smoke
> suite? There was a resurgence of https://tracker.ceph.com/issues/57206. I
> don't see this as a blocker to the hotfix release, but LMK your thoughts.
>

Yeh, that's not a blocker. The backport fix can be included for the next
release.


>
> *rados*
> -
> https://pulpito.ceph.com/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi
> -
> https://pulpito.ceph.com/yuriw-2024-03-05_01:29:36-rados-reef-release-distro-default-smithi
>
> Failures, unrelated:
> 1. https://tracker.ceph.com/issues/64725 -- new tracker, but known
> issue
> 2. https://tracker.ceph.com/issues/61774
> 3. https://tracker.ceph.com/issues/49287
> 4. https://tracker.ceph.com/issues/55141
> 5. https://tracker.ceph.com/issues/59142
> 6. https://tracker.ceph.com/issues/64726 -- new tracker
> 7. https://tracker.ceph.com/issues/62992
> 8. https://tracker.ceph.com/issues/64208
>
> Details:
> 1. rados/singleton: application not enabled on pool 'rbd' - Ceph -
> RADOS
> 2. centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak
> in mons - Ceph - RADOS
> 3. podman: setting cgroup config for procHooks process caused: Unit
> libpod-$hash.scope not found - Ceph - Orchestrator
> 4. thrashers/fastread: assertion failure: rollback_info_trimmed_to ==
> head - Ceph - RADOS
> 5. mgr/dashboard: fix e2e for dashboard v3 - Ceph - Mgr - Dashboard
> 6. LibRadosAioEC.MultiWritePP hang and pkill - Ceph - RADOS
> 7. Heartbeat crash in reset_timeout and clear_timeout - Ceph - RADOS
> 8. test_cephadm.sh: Container version mismatch causes job to fail. -
> Ceph - Orchestrator
>
> *smoke*
> -
> https://pulpito.ceph.com/yuriw-2024-03-05_15:31:54-smoke-reef-release-distro-default-smithi/
>
> Failures, unrelated:
> 1. https://tracker.ceph.com/issues/52624
> 2. https://tracker.ceph.com/issues/57206
> 3. https://tracker.ceph.com/issues/64727 -- new tracker
>
> Details:
> 1. qa: "Health check failed: Reduced data availability: 1 pg peering
> (PG_AVAILABILITY)" - Ceph
> 2. ceph_test_libcephfs_reclaim crashes during test - Ceph - CephFS
> 3. suites/dbench.sh: Socket exception: No route to host (113) -
> Infrastructure
>
> On Tue, Mar 5, 2024 at 3:05 PM Yuri Weinstein  wrote:
>
>> Only suites below need approval:
>>
>> smoke - Radek, Laura?
>> rados - Radek, Laura?
>>
>> We are also in the process of upgrading gibba and then LRC into 18.2.2 RC
>>
>> On Tue, Mar 5, 2024 at 7:47 AM Yuri Weinstein 
>> wrote:
>> >
>> > Details of this release are summarized here:
>> >
>> > https://tracker.ceph.com/issues/64721#note-1
>> > Release Notes - TBD
>> > LRC upgrade - TBD
>> >
>> > Seeking approvals/reviews for:
>> >
>> > smoke - in progress
>> > rados - Radek, Laura?
>> > quincy-x - in progress
>> >
>> > Also need approval from Travis, Redouane for Prometheus fix testing.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>

-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Cluster Config File Locations?

2024-03-05 Thread Eugen Block

Hi,

I've checked, checked, and checked again that the individual config  
files all point towards the correct ip subnet for the monitors, and  
I cannot find any trace of the old subnet's ip address in any config  
file (that I can find).


What are those "individual config files"? The ones underneath  
/var/lib/ceph/{FSID}/mgr.{MGR}/config? Did you also look in the config  
store? I'd try something like:


ceph config dump | grep "192\.168\."  (or whatever your IP range was)
ceph config get mgr public_network  (just in case you accidentally used that)
ceph config get mon public_network  (does it match your actual setup?)

Could it be that the MGRs you're trying to start are the wrong ones?  
Maybe leftovers from earlier failed attempts or something? Does  
'cephadm ls --no-detail | grep mgr' on all hosts reveal more MGRs  
than you expect?
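
You could also grep the deployed daemon configs on each host directly for the old range (a rough check; this assumes the default cephadm data path and that 'ceph fsid' still answers on that host):

grep -r "192\.168\." /var/lib/ceph/$(ceph fsid)/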


One possible and relatively quick manual workaround would be to set up  
a MGR the legacy way [1] which basically is to add a keyring (that  
should work if the MONs have a quorum) and start the daemon:


ceph-mgr -i $name

Note that you'll need the respective package ceph-mgr on that host.  
You could then convert it with cephadm. But maybe it's not necessary  
if you get the existing containers up.


[1] https://docs.ceph.com/en/nautilus/mgr/administrator/#manual-setup
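
For reference, the manual setup in [1] boils down to roughly this (a sketch assuming the default cluster name "ceph"; substitute your own mgr name for $name):

mkdir -p /var/lib/ceph/mgr/ceph-$name
ceph auth get-or-create mgr.$name mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
  -o /var/lib/ceph/mgr/ceph-$name/keyring
ceph-mgr -i $name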

Zitat von duluxoz :


Hi All,

I don't know how it's happened (bad backup/restore, bad config file  
somewhere, I don't know) but my (DEV) Ceph cluster is in a very bad  
state, and I'm looking for pointers/help in getting it back running  
(unfortunately, a complete rebuild/restore is *not* an option).


This is on Ceph Reef (on Rocky 9) which was converted to CephAdm  
from a manual install a few weeks ago (which worked). Five days ago  
everything went "t!ts-up" (an Aussie technical ICT term meaning  
nothing works :-)   )


So, my (first?) issue is that I can't get any Managers to come up  
clean. Each one tries to connect on an IP subnet which no longer  
exists and hasn't for a couple of years.


The second issue is that (possibly because of the first) every `ceph  
orch` command just hangs. Cephadm commands work fine.


I've checked, checked, and checked again that the individual config  
files all point towards the correct ip subnet for the monitors, and  
I cannot find any trace of the old subnet's ip address in any config  
file (that I can find).


For the record I am *not* a "podman guy" so there may be something  
there that's causing my issue(s?) but I don't know where to even  
begin to look.


Any/all logs simply state that the Manager(s) try to come up, can't  
find an address in the "old" subnet, and so fail - nothing else  
helpful (at least to me).


I've even pulled a copy of the monmap and it's showing the correct IP  
subnet addresses for the monitors.


The firewalls are all set correctly and an audit2allow shows nothing  
is out of place, as does disabling SELinux (ie no change).


A `ceph -s` shows I've got no active managers (and that a monitor is  
down - that's my third issue), plus a whole bunch of osds and pgs  
aren't happy either. I have, though, got a monitor quorum.


So, what should I be looking at / where should I be looking? Any  
help is greatly *greatly* appreciated.


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitoring Ceph Bucket and overall ceph cluster remaining space

2024-03-05 Thread Konstantin Shalygin
Hi,

I'm not aware of what SW is, but if this software works with the Prometheus 
metrics format, why not. Anyway, the exporters are open source; you can modify 
the existing code for your environment.


k

Sent from my iPhone

> On 6 Mar 2024, at 07:58, Michael Worsham  wrote:
> 
> This looks interesting, but instead of Prometheus, could the data be exported 
> for SolarWinds?
> 
> The intent is to have SW watch the available storage space allocated and then 
> to alert when a certain threshold is reached (75% remaining for a warning; 
> 95% remaining for a critical).
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to build ceph without QAT?

2024-03-05 Thread Feng, Hualong
Hi Dongchuan

Could I know which version or commit you are building, and your 
environment (system, CPU, kernel)?

The command ./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo should be OK 
without QAT.

Thanks
-Hualong

> -Original Message-
> From: 张东川 
> Sent: Wednesday, March 6, 2024 9:51 AM
> To: ceph-users 
> Subject: [ceph-users] How to build ceph without QAT?
> 
> Hi guys,
> 
> 
> I tried both following commands.
> Neither of them worked.
> 
> 
> "./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_QAT=OFF
> -DWITH_QATDRV=OFF -DWITH_QATZIP=OFF"
> "ARGS="-DWITH_QAT=OFF -DWITH_QATDRV=OFF -
> DWITH_QATZIP=OFF" ./do_cmake.sh -
> DCMAKE_BUILD_TYPE=RelWithDebInfo"
> 
> 
> I still see errors like:
> make[1]: *** [Makefile:4762:
> quickassist/lookaside/access_layer/src/sample_code/performance/framew
> ork/linux/user_space/cpa_sample_code-cpa_sample_code_utils.o] Error 1
> 
> 
> 
> 
> So what's the proper way to configure build commands?
> Thanks a lot.
> 
> 
> Best Regards,
> Dongchuan
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email
> to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Nizamudeen A
For dashboard, I see 1 failure, 2 dead, and 2 passed jobs. The failed e2e is
something we fixed a while ago; not sure why it's broken again,
but if it's recurring, we'll have a look. In any case it's not a blocker.

On Wed, Mar 6, 2024 at 4:53 AM Laura Flores  wrote:

> Here are the rados and smoke suite summaries.
>
> @Radoslaw Zarzynski , @Adam King 
> , @Nizamudeen A , mind having a look to ensure the
> results from the rados suite look good to you?
>
>  @Venky Shankar  mind having a look at the smoke
> suite? There was a resurgence of https://tracker.ceph.com/issues/57206. I
> don't see this as a blocker to the hotfix release, but LMK your thoughts.
>
> *rados*
> -
> https://pulpito.ceph.com/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi
> -
> https://pulpito.ceph.com/yuriw-2024-03-05_01:29:36-rados-reef-release-distro-default-smithi
>
> Failures, unrelated:
> 1. https://tracker.ceph.com/issues/64725 -- new tracker, but known
> issue
> 2. https://tracker.ceph.com/issues/61774
> 3. https://tracker.ceph.com/issues/49287
> 4. https://tracker.ceph.com/issues/55141
> 5. https://tracker.ceph.com/issues/59142
> 6. https://tracker.ceph.com/issues/64726 -- new tracker
> 7. https://tracker.ceph.com/issues/62992
> 8. https://tracker.ceph.com/issues/64208
>
> Details:
> 1. rados/singleton: application not enabled on pool 'rbd' - Ceph -
> RADOS
> 2. centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak
> in mons - Ceph - RADOS
> 3. podman: setting cgroup config for procHooks process caused: Unit
> libpod-$hash.scope not found - Ceph - Orchestrator
> 4. thrashers/fastread: assertion failure: rollback_info_trimmed_to ==
> head - Ceph - RADOS
> 5. mgr/dashboard: fix e2e for dashboard v3 - Ceph - Mgr - Dashboard
> 6. LibRadosAioEC.MultiWritePP hang and pkill - Ceph - RADOS
> 7. Heartbeat crash in reset_timeout and clear_timeout - Ceph - RADOS
> 8. test_cephadm.sh: Container version mismatch causes job to fail. -
> Ceph - Orchestrator
>
> *smoke*
> -
> https://pulpito.ceph.com/yuriw-2024-03-05_15:31:54-smoke-reef-release-distro-default-smithi/
>
> Failures, unrelated:
> 1. https://tracker.ceph.com/issues/52624
> 2. https://tracker.ceph.com/issues/57206
> 3. https://tracker.ceph.com/issues/64727 -- new tracker
>
> Details:
> 1. qa: "Health check failed: Reduced data availability: 1 pg peering
> (PG_AVAILABILITY)" - Ceph
> 2. ceph_test_libcephfs_reclaim crashes during test - Ceph - CephFS
> 3. suites/dbench.sh: Socket exception: No route to host (113) -
> Infrastructure
>
> On Tue, Mar 5, 2024 at 3:05 PM Yuri Weinstein  wrote:
>
>> Only suites below need approval:
>>
>> smoke - Radek, Laura?
>> rados - Radek, Laura?
>>
>> We are also in the process of upgrading gibba and then LRC into 18.2.2 RC
>>
>> On Tue, Mar 5, 2024 at 7:47 AM Yuri Weinstein 
>> wrote:
>> >
>> > Details of this release are summarized here:
>> >
>> > https://tracker.ceph.com/issues/64721#note-1
>> > Release Notes - TBD
>> > LRC upgrade - TBD
>> >
>> > Seeking approvals/reviews for:
>> >
>> > smoke - in progress
>> > rados - Radek, Laura?
>> > quincy-x - in progress
>> >
>> > Also need approval from Travis, Redouane for Prometheus fix testing.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitoring Ceph Bucket and overall ceph cluster remaining space

2024-03-05 Thread Michael Worsham
This looks interesting, but instead of Prometheus, could the data be exported 
for SolarWinds?

The intent is to have SW watch the available storage space allocated and then 
to alert when a certain threshold is reached (75% remaining for a warning; 95% 
remaining for a critical).

-- Michael

From: Konstantin Shalygin 
Sent: Tuesday, March 5, 2024 11:17:10 PM
To: Michael Worsham 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Monitoring Ceph Bucket and overall ceph cluster 
remaining space


Hi,

For RGW usage statistics you can use radosgw_usage_exporter [1]


k
[1] https://github.com/blemmenes/radosgw_usage_exporter

Sent from my iPhone

On 6 Mar 2024, at 00:21, Michael Worsham  wrote:

Is there an easy way to poll the ceph cluster buckets in a way to see how much 
space is remaining? And is it possible to see how much ceph cluster space is 
remaining overall? I am trying to extract the data from our  Ceph cluster and 
put it into a format that our SolarWinds can understand in whole number 
integers, so we can monitor bucket allocated space and overall cluster space in 
the cluster as a whole.

Via Canonical support, they said I can do something like "sudo ceph df -f 
json-pretty" to pull the information, but what is it I need to look at from the 
output (see below) to display over to SolarWinds?

{
    "stats": {
        "total_bytes": 960027263238144,
        "total_avail_bytes": 403965214187520,
        "total_used_bytes": 556062049050624,
        "total_used_raw_bytes": 556062049050624,
        "total_used_raw_ratio": 0.57921481132507324,
        "num_osds": 48,
        "num_per_pool_osds": 48,
        "num_per_pool_omap_osds": 48
    },
    "stats_by_class": {
        "ssd": {
            "total_bytes": 960027263238144,
            "total_avail_bytes": 403965214187520,
            "total_used_bytes": 556062049050624,
            "total_used_raw_bytes": 556062049050624,
            "total_used_raw_ratio": 0.57921481132507324
        }
    },

And a couple of data pools...
{
    "name": "default.rgw.jv-va-pool.data",
    "id": 65,
    "stats": {
        "stored": 4343441915904,
        "objects": 17466616,
        "kb_used": 12774490932,
        "bytes_used": 13081078714368,
        "percent_used": 0.053900588303804398,
        "max_avail": 76535973281792
    }
},
{
    "name": "default.rgw.jv-va-pool.index",
    "id": 66,
    "stats": {
        "stored": 42533675008,
        "objects": 401,
        "kb_used": 124610380,
        "bytes_used": 127601028363,
        "percent_used": 0.00055542576592415571,
        "max_avail": 76535973281792
    }
},
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitoring Ceph Bucket and overall ceph cluster remaining space

2024-03-05 Thread Konstantin Shalygin
Hi, 

For RGW usage statistics you can use radosgw_usage_exporter [1]


k
[1] https://github.com/blemmenes/radosgw_usage_exporter

Sent from my iPhone

> On 6 Mar 2024, at 00:21, Michael Worsham  wrote:
> Is there an easy way to poll the ceph cluster buckets in a way to see how 
> much space is remaining? And is it possible to see how much ceph cluster 
> space is remaining overall? I am trying to extract the data from our  Ceph 
> cluster and put it into a format that our SolarWinds can understand in whole 
> number integers, so we can monitor bucket allocated space and overall cluster 
> space in the cluster as a whole.
> 
> Via Canonical support, they said I can do something like "sudo ceph df -f 
> json-pretty" to pull the information, but what is it I need to look at from 
> the output (see below) to display over to SolarWinds?
> 
> {
> "stats": {
> "total_bytes": 960027263238144,
> "total_avail_bytes": 403965214187520,
> "total_used_bytes": 556062049050624,
> "total_used_raw_bytes": 556062049050624,
> "total_used_raw_ratio": 0.57921481132507324,
> "num_osds": 48,
> "num_per_pool_osds": 48,
> "num_per_pool_omap_osds": 48
> },
> "stats_by_class": {
> "ssd": {
> "total_bytes": 960027263238144,
> "total_avail_bytes": 403965214187520,
> "total_used_bytes": 556062049050624,
> "total_used_raw_bytes": 556062049050624,
> "total_used_raw_ratio": 0.57921481132507324
> }
> },
> 
> And a couple of data pools...
> {
> "name": "default.rgw.jv-va-pool.data",
> "id": 65,
> "stats": {
> "stored": 4343441915904,
> "objects": 17466616,
> "kb_used": 12774490932,
> "bytes_used": 13081078714368,
> "percent_used": 0.053900588303804398,
> "max_avail": 76535973281792
> }
> },
> {
> "name": "default.rgw.jv-va-pool.index",
> "id": 66,
> "stats": {
> "stored": 42533675008,
> "objects": 401,
> "kb_used": 124610380,
> "bytes_used": 127601028363,
> "percent_used": 0.00055542576592415571,
> "max_avail": 76535973281792
> }
> },
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to build ceph without QAT?

2024-03-05 Thread 张东川
Hi guys,


I tried both following commands.
Neither of them worked.


"./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_QAT=OFF 
-DWITH_QATDRV=OFF -DWITH_QATZIP=OFF"
"ARGS="-DWITH_QAT=OFF -DWITH_QATDRV=OFF -DWITH_QATZIP=OFF" ./do_cmake.sh 
-DCMAKE_BUILD_TYPE=RelWithDebInfo"


I still see errors like:
make[1]: *** [Makefile:4762: 
quickassist/lookaside/access_layer/src/sample_code/performance/framework/linux/user_space/cpa_sample_code-cpa_sample_code_utils.o]
 Error 1




So what's the proper way to configure build commands?
Thanks a lot.


Best Regards,
Dongchuan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Laura Flores
Here are the rados and smoke suite summaries.

@Radoslaw Zarzynski , @Adam King
, @Nizamudeen
A , mind having a look to ensure the results from the rados
suite look good to you?

 @Venky Shankar  mind having a look at the smoke
suite? There was a resurgence of https://tracker.ceph.com/issues/57206. I
don't see this as a blocker to the hotfix release, but LMK your thoughts.

*rados*
-
https://pulpito.ceph.com/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi
-
https://pulpito.ceph.com/yuriw-2024-03-05_01:29:36-rados-reef-release-distro-default-smithi

Failures, unrelated:
1. https://tracker.ceph.com/issues/64725 -- new tracker, but known issue
2. https://tracker.ceph.com/issues/61774
3. https://tracker.ceph.com/issues/49287
4. https://tracker.ceph.com/issues/55141
5. https://tracker.ceph.com/issues/59142
6. https://tracker.ceph.com/issues/64726 -- new tracker
7. https://tracker.ceph.com/issues/62992
8. https://tracker.ceph.com/issues/64208

Details:
1. rados/singleton: application not enabled on pool 'rbd' - Ceph - RADOS
2. centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak
in mons - Ceph - RADOS
3. podman: setting cgroup config for procHooks process caused: Unit
libpod-$hash.scope not found - Ceph - Orchestrator
4. thrashers/fastread: assertion failure: rollback_info_trimmed_to ==
head - Ceph - RADOS
5. mgr/dashboard: fix e2e for dashboard v3 - Ceph - Mgr - Dashboard
6. LibRadosAioEC.MultiWritePP hang and pkill - Ceph - RADOS
7. Heartbeat crash in reset_timeout and clear_timeout - Ceph - RADOS
8. test_cephadm.sh: Container version mismatch causes job to fail. -
Ceph - Orchestrator

*smoke*
-
https://pulpito.ceph.com/yuriw-2024-03-05_15:31:54-smoke-reef-release-distro-default-smithi/

Failures, unrelated:
1. https://tracker.ceph.com/issues/52624
2. https://tracker.ceph.com/issues/57206
3. https://tracker.ceph.com/issues/64727 -- new tracker

Details:
1. qa: "Health check failed: Reduced data availability: 1 pg peering
(PG_AVAILABILITY)" - Ceph
2. ceph_test_libcephfs_reclaim crashes during test - Ceph - CephFS
3. suites/dbench.sh: Socket exception: No route to host (113) -
Infrastructure

On Tue, Mar 5, 2024 at 3:05 PM Yuri Weinstein  wrote:

> Only suites below need approval:
>
> smoke - Radek, Laura?
> rados - Radek, Laura?
>
> We are also in the process of upgrading gibba and then LRC into 18.2.2 RC
>
> On Tue, Mar 5, 2024 at 7:47 AM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/64721#note-1
> > Release Notes - TBD
> > LRC upgrade - TBD
> >
> > Seeking approvals/reviews for:
> >
> > smoke - in progress
> > rados - Radek, Laura?
> > quincy-x - in progress
> >
> > Also need approval from Travis, Redouane for Prometheus fix testing.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debian-reef_OLD?

2024-03-05 Thread Daniel Brown

Thank you! 



> On Mar 5, 2024, at 3:55 PM, Laura Flores  wrote:
> 
> Hi all,
> 
> The issue should be fixed, but please let us know if anything is still
> amiss.
> 
> Thanks,
> Laura
> 
> On Tue, Mar 5, 2024 at 9:59 AM Reed Dier  wrote:
> 
>> Given that both the debian and rpm paths have been appended with _OLD, and
>> this more recent post about 18.2.2 (hot-fix), it sounds like there is some
>> sort of issue with 18.2.1?
>> 
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LEYDHWAPZW7KOGH2OH4TOPVGAFMZPYYP/
>> 
>> Reed
>> 
>> On Mar 5, 2024, at 3:07 AM, Christian Rohmann 
>> wrote:
>> 
>> On 04.03.24 22:24, Daniel Brown wrote:
>> 
>> debian-reef/
>> 
>> Now appears to be:
>> 
>> debian-reef_OLD/
>> 
>> 
>> Could this have been  some sort of "release script" just messing up the
>> renaming / symlinking to the most recent stable?
>> 
>> 
>> 
>> Regards
>> 
>> 
>> Christian
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> 
> 
> -- 
> 
> Laura Flores
> 
> She/Her/Hers
> 
> Software Engineer, Ceph Storage 
> 
> Chicago, IL
> 
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Monitoring Ceph Bucket and overall ceph cluster remaining space

2024-03-05 Thread Michael Worsham
Is there an easy way to poll the ceph cluster buckets in a way to see how much 
space is remaining? And is it possible to see how much ceph cluster space is 
remaining overall? I am trying to extract the data from our  Ceph cluster and 
put it into a format that our SolarWinds can understand in whole number 
integers, so we can monitor bucket allocated space and overall cluster space in 
the cluster as a whole.

Via Canonical support, they said I can do something like "sudo ceph df -f 
json-pretty" to pull the information, but what is it I need to look at from the 
output (see below) to display over to SolarWinds?

{
    "stats": {
        "total_bytes": 960027263238144,
        "total_avail_bytes": 403965214187520,
        "total_used_bytes": 556062049050624,
        "total_used_raw_bytes": 556062049050624,
        "total_used_raw_ratio": 0.57921481132507324,
        "num_osds": 48,
        "num_per_pool_osds": 48,
        "num_per_pool_omap_osds": 48
    },
    "stats_by_class": {
        "ssd": {
            "total_bytes": 960027263238144,
            "total_avail_bytes": 403965214187520,
            "total_used_bytes": 556062049050624,
            "total_used_raw_bytes": 556062049050624,
            "total_used_raw_ratio": 0.57921481132507324
        }
    },

And a couple of data pools...
{
    "name": "default.rgw.jv-va-pool.data",
    "id": 65,
    "stats": {
        "stored": 4343441915904,
        "objects": 17466616,
        "kb_used": 12774490932,
        "bytes_used": 13081078714368,
        "percent_used": 0.053900588303804398,
        "max_avail": 76535973281792
    }
},
{
    "name": "default.rgw.jv-va-pool.index",
    "id": 66,
    "stats": {
        "stored": 42533675008,
        "objects": 401,
        "kb_used": 124610380,
        "bytes_used": 127601028363,
        "percent_used": 0.00055542576592415571,
        "max_avail": 76535973281792
    }
},
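
One way to turn the output above into whole-number integers for an external poller would be something along these lines (a rough jq sketch against the fields shown above, assuming the top-level keys are "stats" and "pools"; not an official tool, adjust to taste):

# overall cluster: total bytes, available bytes, percent used (integer)
ceph df -f json | jq -r '.stats | "\(.total_bytes) \(.total_avail_bytes) \((.total_used_raw_ratio*100)|floor)"'

# per pool: name, percent used (integer), max_avail in bytes
ceph df -f json | jq -r '.pools[] | "\(.name) \((.stats.percent_used*100)|floor) \(.stats.max_avail)"'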
This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Yuri Weinstein
Only suites below need approval:

smoke - Radek, Laura?
rados - Radek, Laura?

We are also in the process of upgrading gibba and then LRC into 18.2.2 RC

On Tue, Mar 5, 2024 at 7:47 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/64721#note-1
> Release Notes - TBD
> LRC upgrade - TBD
>
> Seeking approvals/reviews for:
>
> smoke - in progress
> rados - Radek, Laura?
> quincy-x - in progress
>
> Also need approval from Travis, Redouane for Prometheus fix testing.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debian-reef_OLD?

2024-03-05 Thread Laura Flores
Hi all,

The issue should be fixed, but please let us know if anything is still
amiss.

Thanks,
Laura

On Tue, Mar 5, 2024 at 9:59 AM Reed Dier  wrote:

> Given that both the debian and rpm paths have been appended with _OLD, and
> this more recent post about 18.2.2 (hot-fix), it sounds like there is some
> sort of issue with 18.2.1?
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LEYDHWAPZW7KOGH2OH4TOPVGAFMZPYYP/
>
> Reed
>
> On Mar 5, 2024, at 3:07 AM, Christian Rohmann 
> wrote:
>
> On 04.03.24 22:24, Daniel Brown wrote:
>
> debian-reef/
>
> Now appears to be:
>
> debian-reef_OLD/
>
>
> Could this have been  some sort of "release script" just messing up the
> renaming / symlinking to the most recent stable?
>
>
>
> Regards
>
>
> Christian
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Number of pgs

2024-03-05 Thread Mark Nelson
There are both pros and cons to having more PGs.  Here are a couple of 
considerations:


Pros:
1) Better data distribution prior to balancing (and maybe after)
2) Fewer objects/data per PG
3) Lower per-PG lock contention

Cons:
1) Higher PG log memory usage until you hit the osd target unless you 
shorten the per-PG log length.*

2) More work for the mons
3) More work for the mgr per collection interval

* Counter-intuitively increasing the number of PGs in a replicated pool 
with simple objects may lower aggregate PG log memory consumption if 
there are other EC pools with complex objects being written to 
concurrently.  This is due to the global limit on PG log entries.  See: 
osd_target_pg_log_entries_per_osd
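
(As a quick way to see what those limits currently are on a given cluster, something like this should work; a sketch using standard OSD config options:

ceph config get osd osd_target_pg_log_entries_per_osd
ceph config get osd osd_min_pg_log_entries
ceph config get osd osd_max_pg_log_entries
)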


Mark

On 3/5/24 14:11, Nikolaos Dandoulakis wrote:

Hi Anthony,

Thank you very much for your input.

It is a mixture of HDDs and a few NVMe drives. The sizes of the HDDs vary 
between 8-18 TB, and `ceph osd df` reports 23-25 PGs for the small drives and 
50-55 for the bigger ones.

Considering that the cluster is working fine, what would be the benefit of more 
pgs?

Best,
Nick

From: Anthony D'Atri 
Sent: 05 March 2024 19:54
To: Nikolaos Dandoulakis 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Number of pgs

If you only have one pool of significant size, then your PG ratio is around 40. 
IMHO too low.

If you're using HDDs I personally might set it to 8192; if using NVMe SSDs, 
arguably 16384 -- assuming that your OSD sizes are more or less close to each 
other.


`ceph osd df` will show toward the right how many PG replicas are on each OSD.

On Mar 5, 2024, at 14:50, Nikolaos Dandoulakis  wrote:

Hi Anthony,

I should have said, it’s replicated (3)

Best,
Nick

Sent from my phone, apologies for any typos!

From: Anthony D'Atri 
Sent: Tuesday, March 5, 2024 7:22:42 PM
To: Nikolaos Dandoulakis 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Number of pgs


Replicated or EC?


On Mar 5, 2024, at 14:09, Nikolaos Dandoulakis  wrote:

Hi all,

Pretty sure not the first time you see a thread like this.

Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB avail

The data pool is 2048 pgs big exactly the same number as when the cluster 
started. We have no issues with the cluster, everything runs as expected and 
very efficiently. We support about 1000 clients. The question is should we 
increase the number of pgs? If you think so, what is the sensible number to go 
to? 4096? More?

I will eagerly await for your response.

Best,
Nick

P.S. Yes, autoscaler is off :)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Number of pgs

2024-03-05 Thread Nikolaos Dandoulakis
Hi Anthony,

Thank you very much for your input.

It is a mixture of HDDs and a few NVMe drives. The sizes of the HDDs vary 
between 8-18 TB, and `ceph osd df` reports 23-25 PGs for the small drives and 
50-55 for the bigger ones.

Considering that the cluster is working fine, what would be the benefit of more 
pgs?

Best,
Nick

From: Anthony D'Atri 
Sent: 05 March 2024 19:54
To: Nikolaos Dandoulakis 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Number of pgs

If you only have one pool of significant size, then your PG ratio is around 40. 
IMHO too low.

If you're using HDDs I personally might set it to 8192; if using NVMe SSDs, 
arguably 16384 -- assuming that your OSD sizes are more or less close to each 
other.


`ceph osd df` will show toward the right how many PG replicas are on each OSD.

On Mar 5, 2024, at 14:50, Nikolaos Dandoulakis  wrote:

Hi Anthony,

I should have said, it’s replicated (3)

Best,
Nick

Sent from my phone, apologies for any typos!

From: Anthony D'Atri 
Sent: Tuesday, March 5, 2024 7:22:42 PM
To: Nikolaos Dandoulakis 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Number of pgs


Replicated or EC?

> On Mar 5, 2024, at 14:09, Nikolaos Dandoulakis  wrote:
>
> Hi all,
>
> Pretty sure not the first time you see a thread like this.
>
> Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB avail
>
> The data pool is 2048 pgs big exactly the same number as when the cluster 
> started. We have no issues with the cluster, everything runs as expected and 
> very efficiently. We support about 1000 clients. The question is should we 
> increase the number of pgs? If you think so, what is the sensible number to 
> go to? 4096? More?
>
> I will eagerly await for your response.
>
> Best,
> Nick
>
> P.S. Yes, autoscaler is off :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Number of pgs

2024-03-05 Thread Anthony D'Atri
If you only have one pool of significant size, then your PG ratio is around 40. 
IMHO too low.

If you're using HDDs I personally might set it to 8192; if using NVMe SSDs, 
arguably 16384 -- assuming that your OSD sizes are more or less close to each 
other.


`ceph osd df` will show toward the right how many PG replicas are on each OSD.
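
If you do decide to go higher, the change itself is a single pool setting (a sketch; substitute your data pool's name and chosen power-of-two target, and since the autoscaler is off nothing will fight the change):

ceph osd pool set <data-pool> pg_num 8192

On releases before Nautilus you would also raise pgp_num to match; newer releases ramp pgp_num for you.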

> On Mar 5, 2024, at 14:50, Nikolaos Dandoulakis  wrote:
> 
> Hi Anthony,
> 
> I should have said, it’s replicated (3)
> 
> Best,
> Nick
> 
> Sent from my phone, apologies for any typos!
> From: Anthony D'Atri 
> Sent: Tuesday, March 5, 2024 7:22:42 PM
> To: Nikolaos Dandoulakis 
> Cc: ceph-users@ceph.io 
> Subject: Re: [ceph-users] Number of pgs
>  
> 
> Replicated or EC?
> 
> > On Mar 5, 2024, at 14:09, Nikolaos Dandoulakis  wrote:
> >
> > Hi all,
> >
> > Pretty sure not the first time you see a thread like this.
> >
> > Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB 
> > avail
> >
> > The data pool is 2048 pgs big exactly the same number as when the cluster 
> > started. We have no issues with the cluster, everything runs as expected 
> > and very efficiently. We support about 1000 clients. The question is should 
> > we increase the number of pgs? If you think so, what is the sensible number 
> > to go to? 4096? More?
> >
> > I will eagerly await for your response.
> >
> > Best,
> > Nick
> >
> > P.S. Yes, autoscaler is off :)
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Number of pgs

2024-03-05 Thread Nikolaos Dandoulakis
Hi Anthony,

I should have said, it’s replicated (3)

Best,
Nick

Sent from my phone, apologies for any typos!

From: Anthony D'Atri 
Sent: Tuesday, March 5, 2024 7:22:42 PM
To: Nikolaos Dandoulakis 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Number of pgs


Replicated or EC?

> On Mar 5, 2024, at 14:09, Nikolaos Dandoulakis  wrote:
>
> Hi all,
>
> Pretty sure not the first time you see a thread like this.
>
> Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB avail
>
> The data pool is 2048 pgs big exactly the same number as when the cluster 
> started. We have no issues with the cluster, everything runs as expected and 
> very efficiently. We support about 1000 clients. The question is should we 
> increase the number of pgs? If you think so, what is the sensible number to 
> go to? 4096? More?
>
> I will eagerly await for your response.
>
> Best,
> Nick
>
> P.S. Yes, autoscaler is off :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Number of pgs

2024-03-05 Thread Anthony D'Atri
Replicated or EC?

> On Mar 5, 2024, at 14:09, Nikolaos Dandoulakis  wrote:
> 
> Hi all,
> 
> Pretty sure not the first time you see a thread like this.
> 
> Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB avail
> 
> The data pool is 2048 pgs big exactly the same number as when the cluster 
> started. We have no issues with the cluster, everything runs as expected and 
> very efficiently. We support about 1000 clients. The question is should we 
> increase the number of pgs? If you think so, what is the sensible number to 
> go to? 4096? More?
> 
> I will eagerly await for your response.
> 
> Best,
> Nick
> 
> P.S. Yes, autoscaler is off :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Number of pgs

2024-03-05 Thread Nikolaos Dandoulakis
Hi all,

Pretty sure not the first time you see a thread like this.

Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB avail

The data pool is 2048 pgs big exactly the same number as when the cluster 
started. We have no issues with the cluster, everything runs as expected and 
very efficiently. We support about 1000 clients. The question is should we 
increase the number of pgs? If you think so, what is the sensible number to go 
to? 4096? More?

I will eagerly await for your response.

Best,
Nick

P.S. Yes, autoscaler is off :)
The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. Is e buidheann carthannais a th' ann an Oilthigh 
Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Travis Nielsen
Looks great to me, Redo has tested this thoroughly.

Thanks!
Travis

On Tue, Mar 5, 2024 at 8:48 AM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/64721#note-1
> Release Notes - TBD
> LRC upgrade - TBD
>
> Seeking approvals/reviews for:
>
> smoke - in progress
> rados - Radek, Laura?
> quincy-x - in progress
>
> Also need approval from Travis, Redouane for Prometheus fix testing.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debian-reef_OLD?

2024-03-05 Thread Reed Dier
Given that both the debian and rpm paths have been appended with _OLD, and this 
more recent post about 18.2.2 (hot-fix), it sounds like there is some sort of 
issue with 18.2.1?
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LEYDHWAPZW7KOGH2OH4TOPVGAFMZPYYP/
 


Reed

> On Mar 5, 2024, at 3:07 AM, Christian Rohmann  
> wrote:
> 
> On 04.03.24 22:24, Daniel Brown wrote:
>> debian-reef/
>> 
>> Now appears to be:
>> 
>> debian-reef_OLD/
> 
> Could this have been  some sort of "release script" just messing up the 
> renaming / symlinking to the most recent stable?
> 
> 
> 
> Regards
> 
> 
> Christian
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] reef 18.2.2 (hot-fix) QE validation status

2024-03-05 Thread Yuri Weinstein
Details of this release are summarized here:

https://tracker.ceph.com/issues/64721#note-1
Release Notes - TBD
LRC upgrade - TBD

Seeking approvals/reviews for:

smoke - in progress
rados - Radek, Laura?
quincy-x - in progress

Also need approval from Travis, Redouane for Prometheus fix testing.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Eugen Block

Thanks for chiming in, Adam.

Zitat von Adam King :


There was a bug with this that was fixed by
https://github.com/ceph/ceph/pull/52122 (which also specifically added an
integration test for this case). It looks like it's missing a reef and
quincy backport though unfortunately. I'll try to open one for both.

On Tue, Mar 5, 2024 at 8:26 AM Eugen Block  wrote:


It seems to be an issue with the service type (in this case "mon"),
it's not entirely "broken", with the node-exporter it works:

quincy-1:~ # cat node-exporter.yaml
service_type: node-exporter
service_name: node-exporter
placement:
   host_pattern: '*'
extra_entrypoint_args:
   -
"--collector.textfile.directory=/var/lib/node_exporter/textfile_collector2"

quincy-1:~ # ceph orch apply -i node-exporter.yaml
Scheduled node-exporter update...

I'll keep looking... unless one of the devs is reading this thread and
finds it quicker.


Zitat von Eugen Block :

> Oh, you're right. I just checked on Quincy as well and it failed with
> the same error message. For pacific it still works. I'll check for
> existing tracker issues.
>
> Zitat von Robert Sander :
>
>> Hi,
>>
>> On 3/5/24 08:57, Eugen Block wrote:
>>
>>> extra_entrypoint_args:
>>>   -
>>>
'--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
>>
>> When I try this on my test cluster with Reef 18.2.1 the
>> orchestrator tells me:
>>
>> # ceph orch apply -i mon.yml
>> Error EINVAL: ServiceSpec: __init__() got an unexpected keyword
>> argument 'extra_entrypoint_args'
>>
>> It's a documented feature:
>>
>>
https://docs.ceph.com/en/reef/cephadm/services/#cephadm-extra-entrypoint-args
>>
>> Regards
>> --
>> Robert Sander
>> Heinlein Consulting GmbH
>> Schwedter Str. 8/9b, 10119 Berlin
>>
>> https://www.heinlein-support.de
>>
>> Tel: 030 / 405051-43
>> Fax: 030 / 405051-19
>>
>> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
>> Geschäftsführer: Peer Heinlein - Sitz: Berlin
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Adam King
There was a bug with this that was fixed by
https://github.com/ceph/ceph/pull/52122 (which also specifically added an
integration test for this case). It looks like it's missing a reef and
quincy backport though unfortunately. I'll try to open one for both.

On Tue, Mar 5, 2024 at 8:26 AM Eugen Block  wrote:

> It seems to be an issue with the service type (in this case "mon"),
> it's not entirely "broken", with the node-exporter it works:
>
> quincy-1:~ # cat node-exporter.yaml
> service_type: node-exporter
> service_name: node-exporter
> placement:
>host_pattern: '*'
> extra_entrypoint_args:
>-
> "--collector.textfile.directory=/var/lib/node_exporter/textfile_collector2"
>
> quincy-1:~ # ceph orch apply -i node-exporter.yaml
> Scheduled node-exporter update...
>
> I'll keep looking... unless one of the devs is reading this thread and
> finds it quicker.
>
>
> Zitat von Eugen Block :
>
> > Oh, you're right. I just checked on Quincy as well and it failed with
> > the same error message. For pacific it still works. I'll check for
> > existing tracker issues.
> >
> > Zitat von Robert Sander :
> >
> >> Hi,
> >>
> >> On 3/5/24 08:57, Eugen Block wrote:
> >>
> >>> extra_entrypoint_args:
> >>>   -
> >>>
> '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
> >>
> >> When I try this on my test cluster with Reef 18.2.1 the
> >> orchestrator tells me:
> >>
> >> # ceph orch apply -i mon.yml
> >> Error EINVAL: ServiceSpec: __init__() got an unexpected keyword
> >> argument 'extra_entrypoint_args'
> >>
> >> It's a documented feature:
> >>
> >>
> https://docs.ceph.com/en/reef/cephadm/services/#cephadm-extra-entrypoint-args
> >>
> >> Regards
> >> --
> >> Robert Sander
> >> Heinlein Consulting GmbH
> >> Schwedter Str. 8/9b, 10119 Berlin
> >>
> >> https://www.heinlein-support.de
> >>
> >> Tel: 030 / 405051-43
> >> Fax: 030 / 405051-19
> >>
> >> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> >> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help with deep scrub warnings (probably a bug ... set on pool for effect)

2024-03-05 Thread Peter Maloney

I had the same problem as you

The only solution that worked for me is to set it on the pools:
    for pool in $(ceph osd pool ls); do
    ceph osd pool set "$pool" scrub_max_interval "$smaxi"
    ceph osd pool set "$pool" scrub_min_interval "$smini"
    ceph osd pool set "$pool" deep_scrub_interval "$dsi"
    done
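
(The $smini/$smaxi/$dsi variables above are just the desired intervals in seconds; for example, hypothetical values, pick your own:

    smini=$((1*24*3600))    # scrub_min_interval: 1 day
    smaxi=$((7*24*3600))    # scrub_max_interval: 7 days
    dsi=$((14*24*3600))     # deep_scrub_interval: 14 days
)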

That is insane... so it has to be a bug, but I didn't report it because such a 
thing should be so obvious that it has to be just me, right?
By the way, this happened once the clock rolled around to 2024, without any other 
scrub-related changes, so maybe that's somehow the cause. I looked in the 
source code to try to solve this and couldn't find any bug like that, though. 
(And that's how I found out that those per-pool settings even exist.)

I also find the randomize ratio settings don't work that well, and I made them 
smaller too. They would constantly scrub things way too early, blocking the 
ones that are late or about to be late (after recovery, for example, some become 
late soon). Note that the two ratios are different...

osd_scrub_interval_randomize_ratio multiplies the interval to spread out the scheduling, which 
is OK but not that great... I think what makes sense is to do the *next* one early once 
idle, not a *random* one early at *random* timing, but it's good enough to eventually 
spread things out. Scrubs are fast and low impact, so I don't think you have to worry 
about this at all. You may be "just bunching them up", so you want to verify 
that (pg dump, sort by timestamp), but I found it doesn't matter in practice... as long 
as it stays below osd_max_scrubs at any given time, I don't really care.

But I think the other one is terrible, especially if you run deep scrubs a 
lot less often than scrubs: osd_deep_scrub_randomize_ratio will randomly 
upgrade scrubs into deep scrubs, which are far more IO-intensive. Again, I 
don't think it makes sense to do this on a random PG at a mostly 
random time instead of, e.g., doing the next deep scrub a bit early when load is 
low, and in this case it matters because of how much more IO-intensive it is. 
Any time I see late scrubs (frequent after recovery), I also see it scrubbing 
things like that way too early while blocking the late ones, so it takes very 
long to finally get all scrubs done. I changed that to 0.01 so it doesn't 
bother me now.

Peter

On 2024-03-05 07:58, Anthony D'Atri wrote:

* Try applying the settings to global so that mons/mgrs get them.

* Set your shallow scrub settings back to the default.  Shallow scrubs take 
very few resources

* Set your randomize_ratio back to the default, you’re just bunching them up

* Set the load threshold back to the default, I can’t imagine any OSD node ever 
having a load < 0.3, you’re basically keeping scrubs from ever running

* osd_deep_scrub_interval is the only thing you should need to change.


On Mar 5, 2024, at 2:42 AM, Nicola Mori  wrote:

Dear Ceph users,

in order to reduce the deep scrub load on my cluster I set the deep scrub 
interval to 2 weeks, and tuned other parameters as follows:

# ceph config get osd osd_deep_scrub_interval
1209600.00
# ceph config get osd osd_scrub_sleep
0.10
# ceph config get osd osd_scrub_load_threshold
0.30
# ceph config get osd osd_deep_scrub_randomize_ratio
0.10
# ceph config get osd osd_scrub_min_interval
259200.00
# ceph config get osd osd_scrub_max_interval
1209600.00

In my admittedly poor knowledge of Ceph's deep scrub procedures, these settings 
should spread the deep scrub operations in two weeks instead of the default one 
week, lowering the scrub frequency and the related load. But I'm currently 
getting warnings like:

[WRN] PG_NOT_DEEP_SCRUBBED: 56 pgs not deep-scrubbed in time
pg 3.1e1 not deep-scrubbed since 2024-02-22T00:22:55.296213+
pg 3.1d9 not deep-scrubbed since 2024-02-20T03:41:25.461002+
pg 3.1d5 not deep-scrubbed since 2024-02-20T09:52:57.334058+
pg 3.1cb not deep-scrubbed since 2024-02-20T03:30:40.510979+
. . .

I don't understand the first one, since the deep scrub interval should be two 
weeks, so I don't expect warnings for PGs which have been deep-scrubbed less 
than 14 days ago (at the moment I'm writing it's Tue Mar  5 07:39:07 UTC 2024).

Moreover, I don't understand why the deep scrub for so many PGs is lagging 
behind. Is there something wrong in my settings?

Thanks in advance for any help,

Nicola
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--

Peter Maloney
Brockmann Consult GmbH
www.brockmann-consult.de
Chrysanderstr. 1
D-21029 Hamburg, Germany
Tel: +49 (0)40 69 63 89 - 320
E-mail: 

[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Nicola Mori

Hi Anthony,

thanks for the tips. I reset all the values but osd_deep_scrub_interval 
to their defaults as reported at 
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ :


# ceph config set osd osd_scrub_sleep 0.0
# ceph config set osd osd_scrub_load_threshold 0.5
# ceph config set osd osd_deep_scrub_randomize_ratio 0.15
# ceph config set osd osd_scrub_min_interval 86400
# ceph config set osd osd_scrub_max_interval 604800

and kept osd_deep_scrub_interval at 1209600 (14 days). Is this what you 
meant or did I miss something?

Let's see if the situation improves in the next days.

Nicola


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Eugen Block
It seems to be an issue with the service type (in this case "mon");  
it's not entirely "broken", since with the node-exporter it works:


quincy-1:~ # cat node-exporter.yaml
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
extra_entrypoint_args:
  -  
"--collector.textfile.directory=/var/lib/node_exporter/textfile_collector2"


quincy-1:~ # ceph orch apply -i node-exporter.yaml  
Scheduled node-exporter update...

I'll keep looking... unless one of the devs is reading this thread and  
finds it quicker.
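
(A quick way to double-check what the orchestrator actually stored for the service, assuming the spec above was applied:)

  ceph orch ls node-exporter --export   # the stored spec should include the extra_entrypoint_args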



Quoting Eugen Block:

Oh, you're right. I just checked on Quincy as well and it failed with  
the same error message. For Pacific it still works. I'll check for  
existing tracker issues.


Quoting Robert Sander:


Hi,

On 3/5/24 08:57, Eugen Block wrote:


extra_entrypoint_args:
  - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'


When I try this on my test cluster with Reef 18.2.1 the  
orchestrator tells me:


# ceph orch apply -i mon.yml
Error EINVAL: ServiceSpec: __init__() got an unexpected keyword  
argument 'extra_entrypoint_args'


It's a documented feature:

https://docs.ceph.com/en/reef/cephadm/services/#cephadm-extra-entrypoint-args

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Uninstall ceph rgw

2024-03-05 Thread Albert Shih
On 05/03/2024 at 11:54:34+0100, Robert Sander wrote:

Hi, 

> On 3/5/24 11:05, Albert Shih wrote:
> 
> > But I'd like to clean up and «erase» everything about rgw, not only to try
> > to understand but also because I think I mixed up realm and
> > zonegroup...
> 
> Remove the service with "ceph orch rm …" and then remove all the pools the
> rgw service has created. They usually have "rgw" in their name.

Yess...that's the point I was missing. 

So now I can create 

  https://docs.ceph.com/en/latest/mgr/rgw/#mgr-rgw-module

I eventually ran the command 

  ceph rgw realm bootstrap --realm-name fr --zonegroup-name obspm --zone-name 
Meudon --port 5500 --placement 'label:services' --zone_endpoints 
'http://cthulhu1.*:5500, http://cthulhu2.*:5500, 
http://cthulhu3.*:5500, http://cthulhu4.*:5500, 
http://cthulhu5.*:5500' --start-radosgw
 
which worked (meaning it ended without error).

But I don't know why (maybe related to the zone_endpoints?) the radosgw
daemons didn't start. And I'm not able to find out how to use

  ceph rgw

to start radosgw.
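
(For reference, a rough sketch of orchestrator commands that can show whether the bootstrap actually scheduled any rgw daemons; the service name used with "ceph orch redeploy" is whatever "ceph orch ls" reports, not a value from this thread:)

  ceph orch ls rgw                       # was an rgw service spec created?
  ceph orch ps | grep rgw                # are any radosgw daemons running, and where?
  ceph orch redeploy <rgw service name>  # ask the orchestrator to redeploy an existing service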

Thanks.

Regards
-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
mar. 05 mars 2024 14:15:40 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Anthony D'Atri
* Try applying the settings to global so that mons/mgrs get them.

* Set your shallow scrub settings back to the default. Shallow scrubs take 
very few resources.

* Set your randomize_ratio back to the default; you're just bunching them up.

* Set the load threshold back to the default. I can't imagine any OSD node ever 
having a load < 0.3, so you're basically keeping scrubs from ever running.

* osd_deep_scrub_interval is the only thing you should need to change.
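
For reference, a minimal sketch of that on the CLI, assuming the values were set centrally with "ceph config" ("ceph config rm" reverts an option to its default):

  ceph config rm osd osd_scrub_sleep
  ceph config rm osd osd_scrub_load_threshold
  ceph config rm osd osd_deep_scrub_randomize_ratio
  ceph config rm osd osd_scrub_min_interval
  ceph config rm osd osd_scrub_max_interval
  # keep only the longer deep scrub interval, applied to global so mons/mgrs see it too
  ceph config set global osd_deep_scrub_interval 1209600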

> On Mar 5, 2024, at 2:42 AM, Nicola Mori  wrote:
> 
> Dear Ceph users,
> 
> in order to reduce the deep scrub load on my cluster I set the deep scrub 
> interval to 2 weeks, and tuned other parameters as follows:
> 
> # ceph config get osd osd_deep_scrub_interval
> 1209600.00
> # ceph config get osd osd_scrub_sleep
> 0.10
> # ceph config get osd osd_scrub_load_threshold
> 0.30
> # ceph config get osd osd_deep_scrub_randomize_ratio
> 0.10
> # ceph config get osd osd_scrub_min_interval
> 259200.00
> # ceph config get osd osd_scrub_max_interval
> 1209600.00
> 
> To the best of my admittedly poor knowledge of Ceph's deep scrub procedures, these 
> settings should spread the deep scrub operations over two weeks instead of the 
> default one week, lowering the scrub frequency and the related load. But I'm 
> currently getting warnings like:
> 
> [WRN] PG_NOT_DEEP_SCRUBBED: 56 pgs not deep-scrubbed in time
>pg 3.1e1 not deep-scrubbed since 2024-02-22T00:22:55.296213+
>pg 3.1d9 not deep-scrubbed since 2024-02-20T03:41:25.461002+
>pg 3.1d5 not deep-scrubbed since 2024-02-20T09:52:57.334058+
>pg 3.1cb not deep-scrubbed since 2024-02-20T03:30:40.510979+
>. . .
> 
> I don't understand the first one: since the deep scrub interval should be two 
> weeks, I don't expect warnings for PGs which have been deep-scrubbed less 
> than 14 days ago (at the moment I'm writing it's Tue Mar  5 07:39:07 UTC 
> 2024).
> 
> Moreover, I don't understand why the deep scrub for so many PGs is lagging 
> behind. Is there something wrong in my settings?
> 
> Thanks in advance for any help,
> 
> Nicola
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs with status active+clean+laggy

2024-03-05 Thread Robert Sander

Hi,

On 3/5/24 13:05, ricardom...@soujmv.com wrote:

I have a ceph quincy cluster with 5 nodes currently. But only 3 with 
SSDs.


Do not mix HDDs and SSDs in the same pool.
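
A brief sketch of how that separation is usually done with CRUSH device classes (the rule and pool names here are only examples):

  ceph osd crush rule create-replicated replicated-ssd default host ssd
  ceph osd crush rule create-replicated replicated-hdd default host hdd
  ceph osd pool set mypool crush_rule replicated-ssd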

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] PGs with status active+clean+laggy

2024-03-05 Thread ricardomori

Dear community,

I currently have a Ceph Quincy cluster with 5 nodes, but only 3 with 
SSDs. I have had many alerts from PGs with active+clean+laggy status. 
This has caused problems with slow writes. I wanted to know how to 
troubleshoot this properly. I checked several things related to the network; 
I have 10 GbE cards on all nodes and everything seems to be correct.
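
(A few commands that are often used as a starting point to narrow down laggy PGs; osd.0 below is just an example daemon:)

  ceph health detail                    # which PGs/OSDs are currently reported as laggy
  ceph osd perf                         # per-OSD commit/apply latency
  ceph daemon osd.0 dump_osd_network    # heartbeat ping times, run on the host where osd.0 lives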



Many thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [RGW] Restrict a subuser to access only one specific bucket

2024-03-05 Thread Ondřej Kukla
Hello,

One solution is to create a bucket policy that gives the “subuser” 
permissions to access the bucket. Just keep in mind that the second user is not 
the bucket owner, so he will not be able to see the bucket in his bucket list, 
but when he accesses the bucket directly it will work as intended.
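
For example, a minimal policy granting a second RGW user access to a single bucket might look like the following. This is only a sketch: the bucket name, the uid and the use of the aws CLI are assumptions, not details from this thread.

  # policy.json
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam:::user/seconduser"]},
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
    }]
  }

  aws --endpoint-url https://rgw.example.com s3api put-bucket-policy \
    --bucket mybucket --policy file://policy.json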

Ondrej

> On 5. 3. 2024, at 3:47, Huy Nguyen  wrote:
> 
> Hi community,
> I have a user that owns some buckets. I want to create a subuser that has 
> permission to access only one bucket. What can I do to achieve this?
> 
> Thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph storage project for virtualization

2024-03-05 Thread Eneko Lacunza

Hi Egoitz,

I don't think it is a good idea, but I can't comment on whether it's 
possible because I don't know Ceph's inner workings well enough; maybe 
others can comment.


This is what worries me:
"

Each NFS redundant service of each datacenter will be composed of two
NFS gateways accessing the OSDs of the placement groups located in that
same datacenter. I planned to achieve this with OSD weights, so that
CRUSH builds the map in such a way that each datacenter's accesses end
up using, as the primary, the OSD of its own datacenter in each
placement group. Obviously, replica OSDs will exist in the other
datacenters, and I don't rule out using erasure coding in some manner.

"
First, I don't think you got OSD weights right. Also, any write will be 
synchronous to the replicas so that's why I asked about latencies first. 
You may be able to read from DC-local "master" pgs (I recall someone 
doing this with host-local pgs...)


In the best case you'll have your data in a corner-case configuration, 
which may trigger strange bugs and/or behaviour not seen elsewhere.


I wouldn't like to be in such a position, but I don't know how valuable 
your data is...


I think it would be best to determine inter-DC network latency first; if 
you can choose DCs, then choose wisely with low enough latency ;) Then 
see if a regular Ceph storage configuration will give you good enough 
performance.


Another option would be to run DC-local Ceph clusters and mirror to 
the other DCs.
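
(If that route is taken, RBD mirroring between per-DC clusters is the usual building block; a minimal sketch with pool and site names as placeholders, and an rbd-mirror daemon must be running on the receiving side:)

  rbd mirror pool enable vmpool image                                     # on both clusters
  rbd mirror pool peer bootstrap create --site-name dc-a vmpool > token   # on the first cluster
  rbd mirror pool peer bootstrap import --site-name dc-b --direction rx-tx vmpool token   # on the second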


Cheers

On 5/3/24 at 11:50, ego...@ramattack.net wrote:

Hi Eneko!

I don't really have that data, but I was planning to have as primary OSDs
only the ones in the same datacenter as the hypervisor using the
storage. The other datacenters would just hold replicas. I assume you ask
because replication is fully synchronous.

Well, to go step by step: imagine for the moment that the failure domain
is a rack, and all the replicas are in the same datacenter in
different racks and rows. In this case the latency should be acceptable
and low.

My question was more about the redundant NFS and whether you have
experience with similar setups. I was first trying to find out whether
what I'm planning is feasible.

Thank you so much :)

Cheers!

On 2024-03-05 11:43, Eneko Lacunza wrote:


Hi Egoitz,

What network latency between datacenters?

Cheers

On 5/3/24 at 11:31, ego...@ramattack.net wrote:


Hi!

I have been reading some ebooks and docs about Ceph and learning about
it. The goal of all this is to create a rock-solid storage for
virtual machines. After all the learning I have not been able to answer
this question by myself, so I was wondering if perhaps you could
clarify my doubt.

Let's imagine three datacenters, each one with, for instance, 4
virtualization hosts. As I was planning to build a solution for different
hypervisors I have been thinking of the following environment.

- I planned to have my Ceph storage (with different pools inside) with
OSDs in three different datacenters (as the failure domain).

- Each datacenter's hosts will access a redundant NFS service
in their own datacenter.

- Each NFS redundant service of each datacenter will be composed of two
NFS gateways accessing the OSDs of the placement groups located in that
same datacenter. I planned to achieve this with OSD weights, so that
CRUSH builds the map in such a way that each datacenter's accesses end
up using, as the primary, the OSD of its own datacenter in each
placement group. Obviously, replica OSDs will exist in the other
datacenters, and I don't rule out using erasure coding in some manner.

- The NFS gateways could be a redundant NFS gateway service from Ceph (I
have seen they have now developed something for this purpose:
https://docs.ceph.com/en/quincy/mgr/nfs/) or perhaps two different
Debian machines accessing Ceph with rados and sharing that
information to the hypervisors over NFS. In the case of Debian machines I
have heard of good results using pacemaker/corosync for providing HA to
that NFS (between 0.5 and 3 seconds for failover and the service being
up again).

What do you think about this plan? Do you see it as feasible? We will
also work with KVM, where we could access Ceph directly, but I would
also need to provide storage for Xen and VMware.

Thank you so much in advance,

Cheers!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Uninstall ceph rgw

2024-03-05 Thread Robert Sander

On 3/5/24 11:05, Albert Shih wrote:


But I'd like to clean up and «erase» everything about rgw, not only to try
to understand but also because I think I mixed up realm and
zonegroup...


Remove the service with "ceph orch rm …" and then remove all the pools 
the rgw service has created. They usually have "rgw" in their name.
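
A rough sketch of that cleanup (the service and pool names below are examples, not taken from this cluster; deleting pools also requires mon_allow_pool_delete=true):

  ceph orch ls rgw                       # find the rgw service name(s)
  ceph orch rm rgw.myrealm               # remove the rgw service
  ceph osd pool ls | grep rgw            # list the pools rgw created (.rgw.root, default.rgw.*, ...)
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool rm .rgw.root .rgw.root --yes-i-really-really-mean-it   # repeat per pool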


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph storage project for virtualization

2024-03-05 Thread egoitz
Hi Eneko! 

I don't really have that data, but I was planning to have as primary OSDs
only the ones in the same datacenter as the hypervisor using the
storage. The other datacenters would just hold replicas. I assume you ask
because replication is fully synchronous.

Well, to go step by step: imagine for the moment that the failure domain
is a rack, and all the replicas are in the same datacenter in
different racks and rows. In this case the latency should be acceptable
and low.

My question was more about the redundant NFS and whether you have
experience with similar setups. I was first trying to find out whether
what I'm planning is feasible.

Thank you so much :) 

Cheers! 

On 2024-03-05 11:43, Eneko Lacunza wrote:

> Hi Egoitz,
> 
> What network latency between datacenters?
> 
> Cheers
> 
> On 5/3/24 at 11:31, ego...@ramattack.net wrote: 
> 
>> Hi!
>> 
>> I have been reading some ebooks and docs about Ceph and learning about
>> it. The goal of all this is to create a rock-solid storage for
>> virtual machines. After all the learning I have not been able to answer
>> this question by myself, so I was wondering if perhaps you could
>> clarify my doubt.
>> 
>> Let's imagine three datacenters, each one with, for instance, 4
>> virtualization hosts. As I was planning to build a solution for different
>> hypervisors I have been thinking of the following environment.
>> 
>> - I planned to have my Ceph storage (with different pools inside) with
>> OSDs in three different datacenters (as the failure domain).
>> 
>> - Each datacenter's hosts will access a redundant NFS service
>> in their own datacenter.
>> 
>> - Each NFS redundant service of each datacenter will be composed of two
>> NFS gateways accessing the OSDs of the placement groups located in that
>> same datacenter. I planned to achieve this with OSD weights, so that
>> CRUSH builds the map in such a way that each datacenter's accesses end
>> up using, as the primary, the OSD of its own datacenter in each
>> placement group. Obviously, replica OSDs will exist in the other
>> datacenters, and I don't rule out using erasure coding in some manner.
>> 
>> - The NFS gateways could be a redundant NFS gateway service from Ceph (I
>> have seen they have now developed something for this purpose:
>> https://docs.ceph.com/en/quincy/mgr/nfs/) or perhaps two different
>> Debian machines accessing Ceph with rados and sharing that
>> information to the hypervisors over NFS. In the case of Debian machines I
>> have heard of good results using pacemaker/corosync for providing HA to
>> that NFS (between 0.5 and 3 seconds for failover and the service being
>> up again).
>> 
>> What do you think about this plan? Do you see it as feasible? We will
>> also work with KVM, where we could access Ceph directly, but I would
>> also need to provide storage for Xen and VMware.
>> 
>> Thank you so much in advance,
>> 
>> Cheers!
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
> 
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> 
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph storage project for virtualization

2024-03-05 Thread Eneko Lacunza

Hi Egoitz,

What network latency between datacenters?

Cheers

On 5/3/24 at 11:31, ego...@ramattack.net wrote:

Hi!

I have been reading some ebooks and docs about Ceph and learning about
it. The goal of all this is to create a rock-solid storage for
virtual machines. After all the learning I have not been able to answer
this question by myself, so I was wondering if perhaps you could
clarify my doubt.

Let's imagine three datacenters, each one with, for instance, 4
virtualization hosts. As I was planning to build a solution for different
hypervisors I have been thinking of the following environment.

- I planned to have my Ceph storage (with different pools inside) with
OSDs in three different datacenters (as the failure domain).

- Each datacenter's hosts will access a redundant NFS service
in their own datacenter.

- Each NFS redundant service of each datacenter will be composed of two
NFS gateways accessing the OSDs of the placement groups located in that
same datacenter. I planned to achieve this with OSD weights, so that
CRUSH builds the map in such a way that each datacenter's accesses end
up using, as the primary, the OSD of its own datacenter in each
placement group. Obviously, replica OSDs will exist in the other
datacenters, and I don't rule out using erasure coding in some manner.

- The NFS gateways could be a redundant NFS gateway service from Ceph (I
have seen they have now developed something for this purpose:
https://docs.ceph.com/en/quincy/mgr/nfs/) or perhaps two different
Debian machines accessing Ceph with rados and sharing that
information to the hypervisors over NFS. In the case of Debian machines I
have heard of good results using pacemaker/corosync for providing HA to
that NFS (between 0.5 and 3 seconds for failover and the service being
up again).

What do you think about this plan? Do you see it as feasible? We will
also work with KVM, where we could access Ceph directly, but I would
also need to provide storage for Xen and VMware.

Thank you so much in advance,

Cheers!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph storage project for virtualization

2024-03-05 Thread egoitz
Hi! 

I have been reading some ebooks and docs about Ceph and learning about
it. The goal of all this is to create a rock-solid storage for
virtual machines. After all the learning I have not been able to answer
this question by myself, so I was wondering if perhaps you could
clarify my doubt.

Let's imagine three datacenters, each one with, for instance, 4
virtualization hosts. As I was planning to build a solution for different
hypervisors I have been thinking of the following environment.

- I planned to have my Ceph storage (with different pools inside) with
OSDs in three different datacenters (as the failure domain).

- Each datacenter's hosts will access a redundant NFS service
in their own datacenter.

- Each NFS redundant service of each datacenter will be composed of two
NFS gateways accessing the OSDs of the placement groups located in that
same datacenter. I planned to achieve this with OSD weights, so that
CRUSH builds the map in such a way that each datacenter's accesses end
up using, as the primary, the OSD of its own datacenter in each
placement group. Obviously, replica OSDs will exist in the other
datacenters, and I don't rule out using erasure coding in some manner.

- The NFS gateways could be a redundant NFS gateway service from Ceph (I
have seen they have now developed something for this purpose:
https://docs.ceph.com/en/quincy/mgr/nfs/) or perhaps two different
Debian machines accessing Ceph with rados and sharing that
information to the hypervisors over NFS. In the case of Debian machines I
have heard of good results using pacemaker/corosync for providing HA to
that NFS (between 0.5 and 3 seconds for failover and the service being
up again).
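
(For the managed NFS option in the last bullet above, the mgr nfs module workflow is roughly the following; the cluster name, placement and export paths are placeholders, and the exact flags vary between releases, so check the docs for your version:)

  ceph nfs cluster create vmnfs "label:nfs"     # deploys nfs-ganesha daemons via cephadm
  ceph nfs cluster info vmnfs                   # shows where the gateways listen
  ceph nfs export create cephfs --cluster-id vmnfs --pseudo-path /vmstore --fsname cephfs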

What do you think about this plan? Do you see it as feasible? We will
also work with KVM, where we could access Ceph directly, but I would
also need to provide storage for Xen and VMware.

Thank you so much in advance, 

Cheers!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Uninstall ceph rgw

2024-03-05 Thread Albert Shih
Hi everyone, 

I'm currently trying to understand how to deploy rgw, so I tested a few things,
but now I'm not sure what is installed and what is not. 

First I tried to install according to 

  https://docs.ceph.com/en/quincy/cephadm/services/rgw/

then I saw on that page that there is also 

  https://docs.ceph.com/en/quincy/mgr/rgw/#mgr-rgw-module

so now I have some rgw daemons running. 

But I'd like to clean up and «erase» everything about rgw, not only to try
to understand but also because I think I mixed up realm and
zonegroup...

Regards



-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
mar. 05 mars 2024 11:01:30 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Cluster Config File Locations?

2024-03-05 Thread duluxoz

Hi All,

I don't know how it's happened (bad backup/restore, bad config file 
somewhere, I don't know) but my (DEV) Ceph Cluster is in a very bad 
state, and I'm looking for pointers/help in getting it back running 
(unfortunately, a complete rebuild/restore is *not* an option).


This is on Ceph Reef (on Rocky 9) which was converted to CephAdm from a 
manual install a few weeks ago (which worked). Five days ago everything 
went "t!ts-up" (an Ozzie technical ICT term meaning nothing works :-)   )


So, my (first?) issue is that I can't get any Managers to come up clean. 
Each one tries to connect on an ip subnet which doesn't exist any longer 
and hasn't for a couple of years.


The second issue is that (possibly because of the first) every `ceph 
orch` command just hangs. Cephadm commands work fine.


I've checked, checked, and checked again that the individual config 
files all point towards the correct ip subnet for the monitors, and I 
cannot find any trace of the old subnet's ip address in any config file 
(that I can find).


For the record I am *not* a "podman guy" so there may be something there 
that's causing my issue(s?) but I don't know where to even begin to look.


Any/all logs simply state that the Manager(s) try to come up, can't find 
an address in the "old" subnet, and so fail; nothing else helpful (at 
least to me).


I've even pulled a copy of the monmap and it's showing the correct IP 
subnet addresses for the monitors.


The firewalls are all set correctly and an audit2allow shows nothing is 
out of place, as does disabling SELinux (ie no change).


A `ceph -s` shows I've got no active managers (and that a monitor is 
down - that's my third issue), plus a whole bunch of osds and pgs aren't 
happy either. I have, though, got a monitor quorum.
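
(For reference, a few places where a stale subnet can still be configured in a cephadm cluster; this is a rough checklist rather than a diagnosis of this particular setup, with 10.0.0.0/24 standing in for the old subnet and 192.168.1.0/24 for the correct one:)

  ceph config dump | grep -E 'public_network|cluster_network|public_addr'
  grep -r '10.0.0.' /var/lib/ceph/*/mgr.*/config /var/lib/ceph/*/mon.*/config
  ceph config set global public_network 192.168.1.0/24   # example value only
  ceph mgr fail                                           # then fail over / restart the mgr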


So, what should I be looking at / where should I be looking? Any help is 
greatly *greatly* appreciated.


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debian-reef_OLD?

2024-03-05 Thread Christian Rohmann

On 04.03.24 22:24, Daniel Brown wrote:

debian-reef/

Now appears to be:

debian-reef_OLD/


Could this have been  some sort of "release script" just messing up the 
renaming / symlinking to the most recent stable?




Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Eugen Block
Oh, you're right. I just checked on Quincy as well and it failed with  
the same error message. For Pacific it still works. I'll check for  
existing tracker issues.


Quoting Robert Sander:


Hi,

On 3/5/24 08:57, Eugen Block wrote:


extra_entrypoint_args:
  - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'


When I try this on my test cluster with Reef 18.2.1 the orchestrator  
tells me:


# ceph orch apply -i mon.yml
Error EINVAL: ServiceSpec: __init__() got an unexpected keyword  
argument 'extra_entrypoint_args'


It's a documented feature:

https://docs.ceph.com/en/reef/cephadm/services/#cephadm-extra-entrypoint-args

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Robert Sander

Hi,

On 3/5/24 08:57, Eugen Block wrote:


extra_entrypoint_args:
  - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'


When I try this on my test cluster with Reef 18.2.1 the orchestrator tells me:

# ceph orch apply -i mon.yml
Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 
'extra_entrypoint_args'

It's a documented feature:

https://docs.ceph.com/en/reef/cephadm/services/#cephadm-extra-entrypoint-args

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Zakhar Kirpichenko
Well, that option could be included in new mon configs generated during mon
upgrades. But it isn't being used; a minimal config is written instead.
I.e. it seems that the configuration option is useless for all intents and
purposes, as it doesn't seem to be taken into account at any stage of a
mon's lifecycle.

/Z

On Tue, 5 Mar 2024 at 10:09, Eugen Block  wrote:

> Hi,
>
> > I also added it to the cluster config
> > with "ceph config set mon mon_rocksdb_options", but it seems that this
> > option doesn't have any effect at all.
>
> that's because it's an option that has to be present *during* mon
> startup, not *after* the startup when it can read the config store.
>
> Quoting Zakhar Kirpichenko:
>
> > Hi Eugen,
> >
> > It is correct that I manually added the configuration, but not to the
> > unit.run but rather to each mon's config (i.e.
> > /var/lib/ceph/FSID/mon.*/config). I also added it to the cluster config
> > with "ceph config set mon mon_rocksdb_options", but it seems that this
> > option doesn't have any effect at all.
> >
> > /Z
> >
> > On Tue, 5 Mar 2024 at 09:58, Eugen Block  wrote:
> >
> >> Hi,
> >>
> >> > 1. RocksDB options, which I provided to each mon via their
> configuration
> >> > files, got overwritten during mon redeployment and I had to re-add
> >> > mon_rocksdb_options back.
> >>
> >> IIRC, you didn't use the extra_entrypoint_args for that option but
> >> added it directly to the container unit.run file. So it's expected
> >> that it's removed after an update. If you want it to persist a
> >> container update you should consider using the extra_entrypoint_args:
> >>
> >> cat mon.yaml
> >> service_type: mon
> >> service_name: mon
> >> placement:
> >>   hosts:
> >>   - host1
> >>   - host2
> >>   - host3
> >> extra_entrypoint_args:
> >>   - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
> >>
> >> Regards,
> >> Eugen
> >>
> >> Quoting Zakhar Kirpichenko:
> >>
> >> > Hi,
> >> >
> >> > I have upgraded my test and production cephadm-managed clusters from
> >> > 16.2.14 to 16.2.15. The upgrade was smooth and completed without
> issues.
> >> > There were a few things which I noticed after each upgrade:
> >> >
> >> > 1. RocksDB options, which I provided to each mon via their
> configuration
> >> > files, got overwritten during mon redeployment and I had to re-add
> >> > mon_rocksdb_options back.
> >> >
> >> > 2. Monitor debug_rocksdb option got silently reset back to the default
> >> 4/5,
> >> > I had to set it back to 1/5.
> >> >
> >> > 3. For roughly 2 hours after the upgrade, despite the clusters being
> >> > healthy and operating normally, all monitors would run manual
> compactions
> >> > very often and write to disks at very high rates. For example,
> production
> >> > monitors had their rocksdb:low0 thread write to store.db:
> >> >
> >> > monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
> >> > monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.
> >> >
> >> > After roughly 2 hours with no changes to the cluster the write rates
> >> > dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min respectively. The
> reason
> >> for
> >> > frequent manual compactions and high write rates wasn't immediately
> >> > apparent.
> >> >
> >> > 4. Crash deployment broke ownership of /var/lib/ceph/FSID/crash and
> >> > /var/lib/ceph/FSID/crash/posted, even though I had already fixed it
> >> > manually after the upgrade to 16.2.14, which had broken it as well.
> >> >
> >> > 5. Mgr RAM usage appears to be increasing at a slower rate than it did
> >> with
> >> > 16.2.14, although it's too early to tell whether the issue with mgrs
> >> > randomly consuming all RAM and getting OOM-killed has been fixed -
> with
> >> > 16.2.14 this would normally take several days.
> >> >
> >> > Overall, things look good. Thanks to the Ceph team for this release!
> >> >
> >> > Zakhar
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Eugen Block

Hi,


I also added it to the cluster config
with "ceph config set mon mon_rocksdb_options", but it seems that this
option doesn't have any effect at all.


that's because it's an option that has to be present *during* mon  
startup, not *after* the startup when it can read the config store.


Quoting Zakhar Kirpichenko:


Hi Eugen,

It is correct that I manually added the configuration, but not to the
unit.run but rather to each mon's config (i.e.
/var/lib/ceph/FSID/mon.*/config). I also added it to the cluster config
with "ceph config set mon mon_rocksdb_options", but it seems that this
option doesn't have any effect at all.

/Z

On Tue, 5 Mar 2024 at 09:58, Eugen Block  wrote:


Hi,

> 1. RocksDB options, which I provided to each mon via their configuration
> files, got overwritten during mon redeployment and I had to re-add
> mon_rocksdb_options back.

IIRC, you didn't use the extra_entrypoint_args for that option but
added it directly to the container unit.run file. So it's expected
that it's removed after an update. If you want it to persist a
container update you should consider using the extra_entrypoint_args:

cat mon.yaml
service_type: mon
service_name: mon
placement:
   hosts:
   - host1
   - host2
   - host3
extra_entrypoint_args:
   - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'

Regards,
Eugen

Quoting Zakhar Kirpichenko:

> Hi,
>
> I have upgraded my test and production cephadm-managed clusters from
> 16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
> There were a few things which I noticed after each upgrade:
>
> 1. RocksDB options, which I provided to each mon via their configuration
> files, got overwritten during mon redeployment and I had to re-add
> mon_rocksdb_options back.
>
> 2. Monitor debug_rocksdb option got silently reset back to the default
4/5,
> I had to set it back to 1/5.
>
> 3. For roughly 2 hours after the upgrade, despite the clusters being
> healthy and operating normally, all monitors would run manual compactions
> very often and write to disks at very high rates. For example, production
> monitors had their rocksdb:low0 thread write to store.db:
>
> monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
> monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.
>
> After roughly 2 hours with no changes to the cluster the write rates
> dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min respectively. The reason
for
> frequent manual compactions and high write rates wasn't immediately
> apparent.
>
> 4. Crash deployment broke ownership of /var/lib/ceph/FSID/crash and
> /var/lib/ceph/FSID/crash/posted, even though I had already fixed it
> manually after the upgrade to 16.2.14, which had broken it as well.
>
> 5. Mgr RAM usage appears to be increasing at a slower rate than it did
with
> 16.2.14, although it's too early to tell whether the issue with mgrs
> randomly consuming all RAM and getting OOM-killed has been fixed - with
> 16.2.14 this would normally take several days.
>
> Overall, things look good. Thanks to the Ceph team for this release!
>
> Zakhar
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Zakhar Kirpichenko
Hi Eugen,

It is correct that I manually added the configuration, but not to the
unit.run but rather to each mon's config (i.e.
/var/lib/ceph/FSID/mon.*/config). I also added it to the cluster config
with "ceph config set mon mon_rocksdb_options", but it seems that this
option doesn't have any effect at all.

/Z

On Tue, 5 Mar 2024 at 09:58, Eugen Block  wrote:

> Hi,
>
> > 1. RocksDB options, which I provided to each mon via their configuration
> > files, got overwritten during mon redeployment and I had to re-add
> > mon_rocksdb_options back.
>
> IIRC, you didn't use the extra_entrypoint_args for that option but
> added it directly to the container unit.run file. So it's expected
> that it's removed after an update. If you want it to persist a
> container update you should consider using the extra_entrypoint_args:
>
> cat mon.yaml
> service_type: mon
> service_name: mon
> placement:
>   hosts:
>   - host1
>   - host2
>   - host3
> extra_entrypoint_args:
>   - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
>
> Regards,
> Eugen
>
> Quoting Zakhar Kirpichenko:
>
> > Hi,
> >
> > I have upgraded my test and production cephadm-managed clusters from
> > 16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
> > There were a few things which I noticed after each upgrade:
> >
> > 1. RocksDB options, which I provided to each mon via their configuration
> > files, got overwritten during mon redeployment and I had to re-add
> > mon_rocksdb_options back.
> >
> > 2. Monitor debug_rocksdb option got silently reset back to the default
> 4/5,
> > I had to set it back to 1/5.
> >
> > 3. For roughly 2 hours after the upgrade, despite the clusters being
> > healthy and operating normally, all monitors would run manual compactions
> > very often and write to disks at very high rates. For example, production
> > monitors had their rocksdb:low0 thread write to store.db:
> >
> > monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
> > monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.
> >
> > After roughly 2 hours with no changes to the cluster the write rates
> > dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min respectively. The reason
> for
> > frequent manual compactions and high write rates wasn't immediately
> > apparent.
> >
> > 4. Crash deployment broke ownership of /var/lib/ceph/FSID/crash and
> > /var/lib/ceph/FSID/crash/posted, even though I had already fixed it
> > manually after the upgrade to 16.2.14, which had broken it as well.
> >
> > 5. Mgr RAM usage appears to be increasing at a slower rate than it did
> with
> > 16.2.14, although it's too early to tell whether the issue with mgrs
> > randomly consuming all RAM and getting OOM-killed has been fixed - with
> > 16.2.14 this would normally take several days.
> >
> > Overall, things look good. Thanks to the Ceph team for this release!
> >
> > Zakhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io