[ceph-users] Re: cache pressure?

2024-04-26 Thread Erich Weiler
As Dietmar said, VS Code may cause this. Quite funny to read, actually, because we've been dealing with this issue for over a year, and yesterday was the very first time Ceph complained about a client and we saw VS Code's remote stuff running. Coincidence. I'm holding my breath that the

[ceph-users] Re: cache pressure?

2024-04-26 Thread William Edwards
Hi Erich, Erich Weiler wrote on 2024-04-23 15:47: So I'm trying to figure out ways to reduce the number of warnings I'm getting, and I'm thinking about the one "client failing to respond to cache pressure". Is there maybe a way to tell a client (or all clients) to reduce the amount of
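(Not part of the quoted message, but for context: a minimal, hedged sketch of the knobs usually discussed for this warning; the values are purely illustrative placeholders, not recommendations.)

  # Cap how many caps the MDS lets a single client hold (illustrative value).
  ceph config set mds mds_max_caps_per_client 524288
  # Userspace clients (ceph-fuse/libcephfs) can also keep a smaller metadata
  # cache; kernel clients do not honor this option.
  ceph config set client client_cache_size 8192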

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Wesley for the clear explanation of the difference between the 2 methods! The tracker issue you mentioned, https://tracker.ceph.com/issues/44400, talks about primary-affinity. Could primary-affinity help remove an OSD with a hardware issue from the cluster gracefully? Thanks, Mary On Fri, Apr 26, 2024 at
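(Editorial aside, not from the thread: primary affinity is normally adjusted as below, with osd.12 as a placeholder. It only stops the OSD from being chosen as primary; it does not drain or remove it, so on its own it is not a removal mechanism.)

  # Lower primary affinity so PGs prefer other replicas as the primary.
  ceph osd primary-affinity osd.12 0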

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Wesley Dillingham
What you want to do is stop the OSD (and all the copies of data it contains) by stopping the OSD service immediately. The downside of this approach is that it causes the PGs on that OSD to become degraded. But the upside is that the OSD with the bad hardware is immediately no longer participating in any
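(A minimal sketch of what "stop the OSD immediately" can look like on a cephadm-managed cluster; osd.12 is a placeholder and the exact steps in the thread may differ.)

  # Stop the failing OSD right away; its PGs go degraded but stop touching the bad hardware.
  ceph orch daemon stop osd.12      # or: systemctl stop ceph-osd@12 on non-cephadm hosts
  # Watch recovery restore redundancy from the surviving replicas.
  ceph -s
  ceph pg stat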

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Eugen for your warm help! I'm trying to understand the difference between the 2 methods. For method 1, "ceph orch osd rm osd_id", the OSD Service page in the Ceph documentation says it involves 2 steps: 1. evacuating all
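(For reference, not a quote from the documentation: the drain-based method 1 is usually driven like this, with the OSD id as a placeholder.)

  # Schedule removal; cephadm evacuates (drains) the PGs first, then removes the daemon.
  ceph orch osd rm 12
  ceph orch osd rm status   # follow the drain progress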

[ceph-users] Re: Add node-exporter using ceph orch

2024-04-26 Thread Robert Sander
On 4/26/24 15:47, Vahideh Alinouri wrote: The result of this command shows one of the servers in the cluster, but I have node-exporter daemons on all servers. The default service specification looks like this: service_type: node-exporter service_name: node-exporter placement: host_pattern:
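(The spec in the preview is cut off; as a hedged reconstruction, the stock cephadm node-exporter spec places a daemon on every host via a wildcard host_pattern, and could be re-applied like this.)

  cat > node-exporter.yaml <<'EOF'
  service_type: node-exporter
  service_name: node-exporter
  placement:
    host_pattern: '*'
  EOF
  ceph orch apply -i node-exporter.yaml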

[ceph-users] Re: [EXTERN] cache pressure?

2024-04-26 Thread Erich Weiler
Hi Dietmar, We do in fact have a bunch of users running vscode on our HPC head node as well (in addition to a few of our general purpose interactive compute servers). I'll suggest they make the mods you referenced! Thanks for the tip. cheers, erich On 4/24/24 12:58 PM, Dietmar Rieder

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Eugen Block
Hi, if you remove the OSD this way, it will be drained, which means the cluster will try to recover the PGs from this OSD, and in the case of a hardware failure this might lead to slow requests. It might make sense to forcefully remove the OSD without draining: - stop the osd daemon - mark it as out -
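(A rough sketch of that forceful path; the last steps are truncated in the preview, so the purge step is an assumption, and osd.12 is a placeholder.)

  ceph orch daemon stop osd.12                 # or: systemctl stop ceph-osd@12
  ceph osd out 12
  ceph osd purge 12 --yes-i-really-mean-it     # removes the CRUSH entry, auth key and OSD id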

[ceph-users] Public Swift bucket with Openstack Keystone integration - not working in quincy/reef

2024-04-26 Thread Bartosz Bezak
Hi, Similar case to the previously fixed https://tracker.ceph.com/issues/48382 - https://github.com/ceph/ceph/pull/47308. Confirmed on cephadm-deployed Ceph 18.2.2/17.2.7 with OpenStack Antelope/Yoga. I'm getting a "404 NoSuchBucket" error with public buckets. Enabled with Swift/Keystone
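(Background sketch, not taken from the report: a "public" Swift bucket is typically a container with a world-readable ACL, accessed anonymously through RGW's Swift API; the names and endpoint below are placeholders.)

  # Make a container publicly readable via Swift ACLs.
  swift post -r '.r:*,.rlistings' mycontainer
  # Anonymous read through RGW's Swift API; this is the kind of request that
  # reportedly returns "404 NoSuchBucket" on 17.2.7/18.2.2 with Keystone enabled.
  curl -i https://rgw.example.com/swift/v1/AUTH_<project-id>/mycontainer/some-object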

[ceph-users] Re: Add node-exporter using ceph orch

2024-04-26 Thread Robert Sander
On 4/26/24 12:15, Vahideh Alinouri wrote: Hi guys, I have tried to add node-exporter to a new host in the Ceph cluster with the command mentioned in the documentation: ceph orch apply node-exporter hostname. Usually a node-exporter daemon is deployed on all cluster hosts by the node-exporter service
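(A quick, hedged way to check what the service currently covers; the output obviously differs per cluster.)

  # Show the node-exporter service spec and where its daemons actually run.
  ceph orch ls node-exporter --export
  ceph orch ps --daemon-type node-exporter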

[ceph-users] Re: Setup Ceph over RDMA

2024-04-26 Thread Vahideh Alinouri
Hi guys, There is just ms_type = async+rdma in the documentation, but there are options that are not mentioned. I got them from the OSD config: ceph config show-with-defaults osd.0 | grep rdma ms_async_rdma_buffer_size 131072 ms_async_rdma_cm false ms_async_rdma_device_name ms_async_rdma_dscp 96
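(Not from the documentation, just a hedged sketch of where these options usually end up; "mlx5_0" is a placeholder device name and the other values simply mirror the defaults shown above. RDMA messenger settings generally need to be in place before the daemons start.)

  cat >> /etc/ceph/ceph.conf <<'EOF'
  [global]
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx5_0
  ms_async_rdma_cm = false
  ms_async_rdma_buffer_size = 131072
  ms_async_rdma_dscp = 96
  EOF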

[ceph-users] Add node-exporter using ceph orch

2024-04-26 Thread Vahideh Alinouri
Hi guys, I have tried to add node-exporter to a new host in the Ceph cluster with the command mentioned in the documentation: ceph orch apply node-exporter hostname. I think there is a functionality issue, because the cephadm log prints that node-exporter was applied successfully, but it didn't work! I tried the
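(If the intent is a single specific host, the placement is normally spelled out explicitly; this is a hedged sketch with a placeholder hostname, not a confirmed fix for the issue described.)

  # Pin node-exporter to one host instead of the default wildcard placement.
  ceph orch apply node-exporter --placement="host1"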

[ceph-users] Re: MDS crash

2024-04-26 Thread Frédéric Nass
Hello, 'almost all diagnostic ceph subcommands hang!' -> this rang a bell. We've had a similar issue, with many ceph commands hanging due to a missing L3 ACL between the MGRs and a new MDS machine that we had added to the cluster. I second Eugen's analysis: a network issue, whatever the OSI layer.
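(Hedged sketch of how such an L3/L4 gap can be spotted from the new MDS host; mon1/mgr1 are placeholders, the ports are the standard Ceph ranges.)

  # From the MDS host: can we reach the MONs and the MGR/OSD/MDS port range?
  nc -zv -w2 mon1.example.com 3300        # msgr2
  nc -zv -w2 mon1.example.com 6789        # msgr1
  nc -zv -w2 mgr1.example.com 6800-7300   # OpenBSD nc accepts port ranges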

[ceph-users] Re: MDS crash

2024-04-26 Thread Eugen Block
Hi, it's unlikely that all OSDs fail at the same time; it seems like a network issue. Do you have an active MGR? Just a couple of days ago someone reported incorrect OSD stats because no MGR was up. Although your 'ceph health detail' output doesn't mention that, there are still issues when
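(Hedged sketch of the quick checks implied here.)

  ceph mgr stat   # shows whether an MGR is active and which one
  ceph -s         # the "mgr:" line should list an active daemon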