[ceph-users] Re: cache pressure?

2024-04-26 Thread Erich Weiler
As Dietmar said, VS Code may cause this. Quite funny to read, actually, 
because we've been dealing with this issue for over a year, and 
yesterday was the very first time Ceph complained about a client and we 
saw VS Code's remote stuff running. Coincidence.


I'm holding my breath that the vscode issue is the one affecting us - I 
got my users to tweak their vscode configs and the problem seemed to go 
away, but I guess I won't consider it 'solved' until a few days pass 
without it coming back...  :)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cache pressure?

2024-04-26 Thread William Edwards

Hi Erich,

Erich Weiler wrote on 2024-04-23 15:47:
So I'm trying to figure out ways to reduce the number of warnings I'm 
getting and I'm thinking about the one "client failing to respond to 
cache pressure".


Is there maybe a way to tell a client (or all clients) to reduce the 
amount of cache it uses or to release caches quickly?  Like, all the 
time?


I know the linux kernel (and maybe ceph) likes to cache everything for 
a while, and rightfully so, but I suspect in my use case it may be more 
efficient to more quickly purge the cache or to in general just cache 
way less overall...?


We have many thousands of threads all doing different things that are 
hitting our filesystem, so I suspect the caching isn't really doing me 
much good anyway due to the churn, and is probably causing more 
problems than it's helping...


We are seeing "client failing to respond to cache pressure" on a daily 
basis.


Remounting on the client usually 'fixes' the issue. Sometimes, 
remounting on all clients that have the same directory mounted is 
needed. Also, a larger MDS cache seems to help.
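
For reference, raising the MDS cache limit is a single config option; a 
minimal sketch (the 8 GiB value is just an example, size it to the RAM of 
your MDS hosts):

ceph config set mds mds_cache_memory_limit 8589934592   # 8 GiB; the default is 4 GiB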


As Dietmar said, VS Code may cause this. Quite funny to read, actually, 
because we've been dealing with this issue for over a year, and 
yesterday was the very first time Ceph complained about a client and we 
saw VS Code's remote stuff running. Coincidence.




-erich
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


Kind regards,

William Edwards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Wesley for the clear explanation of the difference between the 2 methods!
The tracker issue you mentioned, https://tracker.ceph.com/issues/44400, talks
about primary-affinity. Could primary-affinity help remove an OSD with a
hardware issue from the cluster gracefully?
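
If I understand the docs correctly, that would be something like the
following, so that the failing OSD stops acting as primary for its PGs
(osd.12 is just a placeholder, and please correct me if this is not how it
is meant to be used):

ceph osd primary-affinity osd.12 0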

Thanks,
Mary


On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham 
wrote:

> What you want to do is stop the OSD (and all the copies of data it
> contains) by stopping the OSD service immediately. The downside of this
> approach is that it causes the PGs on that OSD to become degraded. But the
> upside is that the OSD with bad hardware immediately stops participating in
> any client IO (the source of your RGW 503s). In this situation the PGs go
> into degraded+backfilling.
>
> The alternative method is to keep the failing OSD up and in the cluster
> but slowly migrate the data off of it. This would be a long, drawn-out
> period of time in which the failing disk would continue to serve client
> reads and also facilitate backfill, but you wouldn't take a copy of the
> data out of the cluster and cause degraded PGs. In this scenario the PGs
> would be remapped+backfilling.
>
> I tried to find a way to have your cake and eat it too in relation to this
> "predicament" in this tracker issue: https://tracker.ceph.com/issues/44400
> but it was deemed "won't fix".
>
> Respectfully,
>
> *Wes Dillingham*
> LinkedIn 
> w...@wesdillingham.com
>
>
>
>
> On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang 
> wrote:
>
>> Thank you Eugen for your warm help!
>>
>> I'm trying to understand the difference between 2 methods.
>> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
>> Documentation
>> 
>> says
>> it involves 2 steps:
>>
>>1. evacuating all placement groups (PGs) from the OSD
>>2. removing the PG-free OSD from the cluster
>>
>> For method 2, or the procedure you recommended, Adding/Removing OSDs —
>> Ceph
>> Documentation
>> <
>> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual
>> >
>> says
>> "After the OSD has been taken out of the cluster, Ceph begins rebalancing
>> the cluster by migrating placement groups out of the OSD that was removed.
>> "
>>
>> What's the difference between "evacuating PGs" in method 1 and "migrating
>> PGs" in method 2? I think method 1 must read the OSD to be removed.
>> Otherwise, we would not see the slow ops warning. Does method 2 not involve
>> reading this OSD?
>>
>> Thanks,
>> Mary
>>
>> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block  wrote:
>>
>> > Hi,
>> >
>> > if you remove the OSD this way, it will be drained. Which means that
>> > it will try to recover PGs from this OSD, and in case of hardware
>> > failure it might lead to slow requests. It might make sense to
>> > forcefully remove the OSD without draining:
>> >
>> > - stop the osd daemon
>> > - mark it as out
>> > - ceph osd purge <osd_id> [--force] [--yes-i-really-mean-it]
>> >
>> > Regards,
>> > Eugen
>> >
>> > Zitat von Mary Zhang :
>> >
>> > > Hi,
>> > >
>> > > We recently removed an OSD from our Ceph cluster. Its underlying disk
>> > > has a hardware issue.
>> > >
>> > > We used the command: ceph orch osd rm osd_id --zap
>> > >
>> > > During the process, the ceph cluster sometimes enters a warning state
>> > > with slow ops on this OSD. Our rgw also failed to respond to requests
>> > > and returned 503.
>> > >
>> > > We restarted the rgw daemon to make it work again. But the same failure
>> > > occurred from time to time. Eventually we noticed that the rgw 503
>> > > error is a result of OSD slow ops.
>> > >
>> > > Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD
>> > > with a hardware issue won't impact cluster performance & rgw
>> > > availability. Is our expectation reasonable? What's the best way to
>> > > handle OSDs with hardware failures?
>> > >
>> > > Thank you in advance for any comments or suggestions.
>> > >
>> > > Best Regards,
>> > > Mary Zhang
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Wesley Dillingham
What you want to do is stop the OSD (and all the copies of data it
contains) by stopping the OSD service immediately. The downside of this
approach is that it causes the PGs on that OSD to become degraded. But the
upside is that the OSD with bad hardware immediately stops participating in
any client IO (the source of your RGW 503s). In this situation the PGs go
into degraded+backfilling.

The alternative method is to keep the failing OSD up and in the cluster but
slowly migrate the data off of it. This would be a long, drawn-out period of
time in which the failing disk would continue to serve client reads and
also facilitate backfill, but you wouldn't take a copy of the data out of
the cluster and cause degraded PGs. In this scenario the PGs would be
remapped+backfilling.
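
If you want to check which of the two situations you are in, plain status
commands are enough; roughly:

ceph pg stat          # summary incl. degraded / remapped / backfilling counts
ceph pg ls degraded   # or: ceph pg ls remapped, to list the affected PGs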

I tried to find a way to have your cake and eat it too in relation to this
"predicament" in this tracker issue: https://tracker.ceph.com/issues/44400
but it was deemed "won't fix".

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang  wrote:

> Thank you Eugen for your warm help!
>
> I'm trying to understand the difference between 2 methods.
> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
> Documentation
>  says
> it involves 2 steps:
>
>1. evacuating all placement groups (PGs) from the OSD
>2. removing the PG-free OSD from the cluster
>
> For method 2, or the procedure you recommended, Adding/Removing OSDs — Ceph
> Documentation
> <
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual
> >
> says
> "After the OSD has been taken out of the cluster, Ceph begins rebalancing
> the cluster by migrating placement groups out of the OSD that was removed.
> "
>
> What's the difference between "evacuating PGs" in method 1 and "migrating
> PGs" in method 2? I think method 1 must read the OSD to be removed.
> Otherwise, we would not see the slow ops warning. Does method 2 not involve
> reading this OSD?
>
> Thanks,
> Mary
>
> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block  wrote:
>
> > Hi,
> >
> > if you remove the OSD this way, it will be drained. Which means that
> > it will try to recover PGs from this OSD, and in case of hardware
> > failure it might lead to slow requests. It might make sense to
> > forcefully remove the OSD without draining:
> >
> > - stop the osd daemon
> > - mark it as out
> > - ceph osd purge <osd_id> [--force] [--yes-i-really-mean-it]
> >
> > Regards,
> > Eugen
> >
> > Zitat von Mary Zhang :
> >
> > > Hi,
> > >
> > > We recently removed an OSD from our Ceph cluster. Its underlying disk
> > > has a hardware issue.
> > >
> > > We used the command: ceph orch osd rm osd_id --zap
> > >
> > > During the process, the ceph cluster sometimes enters a warning state
> > > with slow ops on this OSD. Our rgw also failed to respond to requests
> > > and returned 503.
> > >
> > > We restarted the rgw daemon to make it work again. But the same failure
> > > occurred from time to time. Eventually we noticed that the rgw 503 error
> > > is a result of OSD slow ops.
> > >
> > > Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD
> > > with a hardware issue won't impact cluster performance & rgw
> > > availability. Is our expectation reasonable? What's the best way to
> > > handle OSDs with hardware failures?
> > >
> > > Thank you in advance for any comments or suggestions.
> > >
> > > Best Regards,
> > > Mary Zhang
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Eugen for your warm help!

I'm trying to understand the difference between 2 methods.
For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph Documentation
 says
it involves 2 steps:

   1. evacuating all placement groups (PGs) from the OSD
   2. removing the PG-free OSD from the cluster

For method 2, or the procedure you recommended, Adding/Removing OSDs — Ceph
Documentation

says
"After the OSD has been taken out of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups out of the OSD that was removed.
"

What's the difference between "evacuating PGs" in method 1 and "migrating
PGs" in method 2? I think method 1 must read the OSD to be removed.
Otherwise, we would not see the slow ops warning. Does method 2 not involve
reading this OSD?

Thanks,
Mary

On Fri, Apr 26, 2024 at 5:15 AM Eugen Block  wrote:

> Hi,
>
> if you remove the OSD this way, it will be drained. Which means that
> it will try to recover PGs from this OSD, and in case of hardware
> failure it might lead to slow requests. It might make sense to
> forcefully remove the OSD without draining:
>
> - stop the osd daemon
> - mark it as out
> - ceph osd purge <osd_id> [--force] [--yes-i-really-mean-it]
>
> Regards,
> Eugen
>
> Zitat von Mary Zhang :
>
> > Hi,
> >
> > > We recently removed an OSD from our Ceph cluster. Its underlying disk
> > > has a hardware issue.
> > >
> > > We used the command: ceph orch osd rm osd_id --zap
> > >
> > > During the process, the ceph cluster sometimes enters a warning state
> > > with slow ops on this OSD. Our rgw also failed to respond to requests
> > > and returned 503.
> > >
> > > We restarted the rgw daemon to make it work again. But the same failure
> > > occurred from time to time. Eventually we noticed that the rgw 503 error
> > > is a result of OSD slow ops.
> > >
> > > Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD
> > > with a hardware issue won't impact cluster performance & rgw
> > > availability. Is our expectation reasonable? What's the best way to
> > > handle OSDs with hardware failures?
> >
> > Thank you in advance for any comments or suggestions.
> >
> > Best Regards,
> > Mary Zhang
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Add node-exporter using ceph orch

2024-04-26 Thread Robert Sander

On 4/26/24 15:47, Vahideh Alinouri wrote:

The result of this command shows one of the servers in the cluster,
but I have node-exporter daemons on all servers.


The default service specification looks like this:

service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'

If you apply this YAML code the orchestrator should deploy one 
node-exporter daemon to each host of the cluster.
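
Assuming you save it as node-exporter.yaml, you can (re)apply it with:

ceph orch apply -i node-exporter.yaml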


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERN] cache pressure?

2024-04-26 Thread Erich Weiler

Hi Dietmar,

We do in fact have a bunch of users running vscode on our HPC head node 
as well (in addition to a few of our general purpose interactive compute 
servers).  I'll suggest they make the mods you referenced!  Thanks for 
the tip.


cheers,
erich

On 4/24/24 12:58 PM, Dietmar Rieder wrote:

Hi Erich,

in our case the "client failing to respond to cache pressure" situation 
is/was often caused by users who have vscode connecting via ssh to our 
HPC head node. vscode makes heavy use of file watchers and we have seen 
users with > 400k watchers. All these watched files must be held in the 
MDS cache, and if you have multiple users running vscode at the same time, 
it gets problematic.


Unfortunately there is no global setting - at least none that we are 
aware of - for vscode to exclude certain files or directories from being 
watched. We asked the users to configure their vscode (Remote Settings 
-> Watcher Exclude) as follows:


{
   "files.watcherExclude": {
      "**/.git/objects/**": true,
      "**/.git/subtree-cache/**": true,
      "**/node_modules/*/**": true,
      "**/.cache/**": true,
      "**/.conda/**": true,
      "**/.local/**": true,
      "**/.nextflow/**": true,
      "**/work/**": true
   }
}

~/.vscode-server/data/Machine/settings.json

To monitor and find processes with watchers you may use inotify-info.
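
If you don't have inotify-info at hand, a rough per-process count of inotify
instances (not individual watches) can also be pulled straight from /proc,
for example:

find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null \
  | cut -d/ -f3 | sort | uniq -c | sort -rn | head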


HTH
   Dietmar

On 4/23/24 15:47, Erich Weiler wrote:
So I'm trying to figure out ways to reduce the number of warnings I'm 
getting and I'm thinking about the one "client failing to respond to 
cache pressure".


Is there maybe a way to tell a client (or all clients) to reduce the 
amount of cache it uses or to release caches quickly?  Like, all the 
time?


I know the linux kernel (and maybe ceph) likes to cache everything for 
a while, and rightfully so, but I suspect in my use case it may be 
more efficient to more quickly purge the cache or to in general just 
cache way less overall...?


We have many thousands of threads all doing different things that are 
hitting our filesystem, so I suspect the caching isn't really doing me 
much good anyway due to the churn, and is probably causing more 
problems than it's helping...


-erich
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Eugen Block

Hi,

if you remove the OSD this way, it will be drained. Which means that  
it will try to recover PGs from this OSD, and in case of hardware  
failure it might lead to slow requests. It might make sense to  
forcefully remove the OSD without draining:


- stop the osd daemon
- mark it as out
- ceph osd purge <osd_id> [--force] [--yes-i-really-mean-it]
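
With cephadm that could look roughly like this (osd.12 and the device path
are placeholders for the failing OSD):

ceph orch daemon stop osd.12                  # stop it immediately, its PGs become degraded
ceph osd out 12                               # recovery then uses the surviving copies
ceph osd purge 12 --yes-i-really-mean-it      # remove it from the CRUSH/OSD maps
ceph orch device zap <host> /dev/sdX --force  # optionally wipe the disk afterwards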

Regards,
Eugen

Zitat von Mary Zhang :


Hi,

We recently removed an OSD from our Ceph cluster. Its underlying disk has
a hardware issue.

We used the command: ceph orch osd rm osd_id --zap

During the process, the ceph cluster sometimes enters a warning state with
slow ops on this OSD. Our rgw also failed to respond to requests and
returned 503.

We restarted the rgw daemon to make it work again. But the same failure
occurred from time to time. Eventually we noticed that the rgw 503 error is
a result of OSD slow ops.

Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD with a
hardware issue won't impact cluster performance & rgw availability. Is our
expectation reasonable? What's the best way to handle OSDs with hardware
failures?

Thank you in advance for any comments or suggestions.

Best Regards,
Mary Zhang
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Public Swift bucket with Openstack Keystone integration - not working in quincy/reef

2024-04-26 Thread Bartosz Bezak
Hi, 

This looks like a similar case to the previously fixed https://tracker.ceph.com/issues/48382 - 
https://github.com/ceph/ceph/pull/47308.

Confirmed on cephadm-deployed Ceph 18.2.2/17.2.7 with OpenStack Antelope/Yoga. 

I'm getting a "404 NoSuchBucket" error with public buckets when Swift/Keystone 
integration is enabled - everything else works fine.

With rgw_swift_account_in_url = true and proper endpoints: 
"https://rgw.test/swift/v1/AUTH_%(project_id)s"

Ticking public access in Horizon properly sets the ACL on the bucket, according 
to the swift client:

swift -v stat test-bucket
URL: https://rgw.test/swift/v1/AUTH_daksjhdkajdshda/testbucket
Auth Token:
Account: AUTH_daksjhdkajdshda
Container: testbucket
Objects: 1
Bytes: 1021036
Read ACL: .r:*,.rlistings
Write ACL:
Sync To:
Sync Key:
X-Timestamp: 1710947159.41219
X-Container-Bytes-Used-Actual: 1024000
X-Storage-Policy: default-placement
X-Storage-Class: STANDARD
Last-Modified: Thu, 21 Mar 2024 10:30:05 GMT
X-Trans-Id: tx0092ac12312312312-1231231231-1701e5-default
X-Openstack-Request-Id: tx0092ac12312312312-1231231231-1701e5-default
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8

However, I am still getting the 404 NoSuchBucket error.
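
For reference, the read ACL above is the same as what the swift client would
set with something like the following, and the failing anonymous request is
just an unauthenticated GET (object name is a placeholder):

swift post --read-acl ".r:*,.rlistings" testbucket
curl -i https://rgw.test/swift/v1/AUTH_daksjhdkajdshda/testbucket/<object>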


Could someone using the latest version of Ceph with Swift/Keystone integration 
please test public buckets? Thank you.


Best regards, 
Bartosz Bezak




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Add node-exporter using ceph orch

2024-04-26 Thread Robert Sander

On 4/26/24 12:15, Vahideh Alinouri wrote:

Hi guys,

I have tried to add node-exporter to a new host in the Ceph cluster using
the command mentioned in the documentation:
ceph orch apply node-exporter hostname


Usually a node-exporter daemon is deployed on all cluster hosts by the 
node-exporter service and its placement strategy.


What does your node-exporter service look like?

ceph orch ls node-exporter --export

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setup Ceph over RDMA

2024-04-26 Thread Vahideh Alinouri
Hi guys,

There is just ms_type = async+rdma in the documentation, but there are more
options that are not mentioned. I got them using the OSD config show command:
ceph config show-with-defaults osd.0 | grep rdma

ms_async_rdma_buffer_size 131072
ms_async_rdma_cm false
ms_async_rdma_device_name
ms_async_rdma_dscp 96
ms_async_rdma_enable_hugepage false
ms_async_rdma_gid_idx 0
ms_async_rdma_local_gid
ms_async_rdma_polling_us 1000
ms_async_rdma_port_num 1
ms_async_rdma_receive_buffers 32768
ms_async_rdma_receive_queue_len 4096
ms_async_rdma_roce_ver 1
ms_async_rdma_send_buffers 1024
ms_async_rdma_sl 3
ms_async_rdma_support_srq true
ms_async_rdma_type ib

When I checked the Ceph GitHub repository, I found these options marked
with_legacy: true:
https://github.com/ceph/ceph/blob/main/src/common/options/global.yaml.in

- name: ms_async_rdma_device_name
  type: str
  level: advanced
  with_legacy: true
- name: ms_async_rdma_enable_hugepage
  type: bool
  level: advanced
  default: false
  with_legacy: true
- name: ms_async_rdma_buffer_size
  type: size
  level: advanced
  default: 128_K
  with_legacy: true
- name: ms_async_rdma_send_buffers
  type: uint
  level: advanced
  default: 1_K
  with_legacy: true
# size of the receive buffer pool, 0 is unlimited
- name: ms_async_rdma_receive_buffers
  type: uint
  level: advanced
  default: 32_K
  with_legacy: true
# max number of wr in srq
- name: ms_async_rdma_receive_queue_len
  type: uint
  level: advanced
  default: 4_K
  with_legacy: true
# support srq
- name: ms_async_rdma_support_srq
  type: bool
  level: advanced
  default: true
  with_legacy: true
- name: ms_async_rdma_port_num
  type: uint
  level: advanced
  default: 1
  with_legacy: true
- name: ms_async_rdma_polling_us
  type: uint
  level: advanced
  default: 1000
  with_legacy: true
- name: ms_async_rdma_gid_idx
  type: int
  level: advanced
  desc: use gid_idx to select GID for choosing RoCEv1 or RoCEv2
  default: 0
  with_legacy: true
# GID format: "fe80::::7efe:90ff:fe72:6efe", no zero folding
- name: ms_async_rdma_local_gid
  type: str
  level: advanced
  with_legacy: true
# 0=RoCEv1, 1=RoCEv2, 2=RoCEv1.5
- name: ms_async_rdma_roce_ver
  type: int
  level: advanced
  default: 1
  with_legacy: true
# in RoCE, this means PCP
- name: ms_async_rdma_sl
  type: int
  level: advanced
  default: 3
  with_legacy: true
# in RoCE, this means DSCP
- name: ms_async_rdma_dscp
  type: int
  level: advanced
  default: 96
  with_legacy: true
# when there are enough accept failures, indicating there are unrecoverable
# failures, just do ceph_abort(). Here we make it configurable.
- name: ms_max_accept_failures
  type: int
  level: advanced
  desc: The maximum number of consecutive failed accept() calls before considering
    the daemon is misconfigured and abort it.
  default: 4
  with_legacy: true
# rdma connection management
- name: ms_async_rdma_cm
  type: bool
  level: advanced
  default: false
  with_legacy: true
- name: ms_async_rdma_type
  type: str
  level: advanced
  default: ib
  with_legacy: true

This causes confusion, and the RDMA setup needs more detail in the documentation.
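
For reference, what I would have expected a minimal RDMA setup to look like,
based only on the option names above (device name and GID index are
placeholders for our Broadcom NICs, and this is not verified on our cluster):

ceph config set global ms_type async+rdma
ceph config set global ms_async_rdma_device_name <ibdev name, e.g. from ibv_devices>
ceph config set global ms_async_rdma_gid_idx <GID index selecting RoCEv2>
ceph config set global ms_async_rdma_local_gid <GID of that index, no zero folding>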

Regards

On Mon, Apr 8, 2024 at 10:06 AM Vahideh Alinouri
 wrote:
>
> Hi guys,
>
> I need to set up Ceph over RDMA, but I have faced many issues!
> The info regarding my cluster:
> Ceph version is Reef.
> Network cards are Broadcom RDMA.
> The RDMA connection between OSD nodes is OK.
>
> I just found the ms_type = async+rdma config in the documentation and applied it using
> ceph config set global ms_type async+rdma
> After this action the cluster crashed. To bring the cluster back, I did:
> - put ms_type = async+posix in ceph.conf
> - restart all MON services
>
> The cluster is back, but I don't have any active mgr. All OSDs are down too.
> Is there a particular order of steps to follow for setting up Ceph over RDMA?
> Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Add node-exporter using ceph orch

2024-04-26 Thread Vahideh Alinouri
Hi guys,

I have tried to add node-exporter to a new host in the Ceph cluster using
the command mentioned in the documentation:
ceph orch apply node-exporter hostname

I think there is a functionality issue, because the cephadm log printed that
node-exporter was applied successfully, but it didn't work!

I tried the below command and it worked!
ceph orch daemon add node-exporter hostname

Which way is the correct way?
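
In case it helps, commands like these should show what the orchestrator
actually deployed (service spec vs. running daemons):

ceph orch ls node-exporter --export
ceph orch ps | grep node-exporter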
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crash

2024-04-26 Thread Frédéric Nass
Hello,

'almost all diagnostic ceph subcommands hang!' -> this rang a bell. We've 
had a similar issue with many ceph commands hanging due to a missing L3 ACL 
between the MGRs and a new MDS machine that we had added to the cluster.

I second Eugen's analysis: a network issue, whatever the OSI layer.

Regards,
Frédéric.

- On 26 Apr 24, at 9:31, Eugen Block ebl...@nde.ag wrote:

> Hi, it's unlikely that all OSDs fail at the same time; it seems like a
> network issue. Do you have an active MGR? Just a couple of days ago
> someone reported incorrect OSD stats because no MGR was up. Although
> your 'ceph health detail' output doesn't mention that, there can still be
> issues when MGR processes are active according to ceph but don't
> respond anymore.
> I would probably start with basic network debugging, e.g. iperf,
> pings on public and cluster networks (if present), and so on.
> 
> Regards,
> Eugen
> 
> Zitat von Alexey GERASIMOV :
> 
>> Colleagues, I have the update.
>>
>> Starting from yesterday, the situation with ceph health is much
>> worse than it was previously.
>> We found that
>> - ceph -s informs us that some PGs are in a stale state
>> - almost all diagnostic ceph subcommands hang! For example, "ceph
>> osd ls", "ceph osd dump", "ceph osd tree" and "ceph health detail"
>> provide output - but "ceph osd status", all the "ceph pg ..."
>> commands and others hang.
>>
>> So, it looks like the crashes of the MDS daemons were only the first
>> signs of problems.
>> I read that the "stale" state for PGs means that all nodes storing this
>> placement group may be down - but that's wrong, all OSD daemons are up
>> on all three nodes:
>>
>> --- ceph osd tree
>> ID  CLASS  WEIGHTTYPE NAME STATUS  REWEIGHT  PRI-AFF
>> -1 68.05609  root default
>> -3 22.68536  host asrv-dev-stor-1
>>  0hdd   5.45799  osd.0 up   1.0  1.0
>>  1hdd   5.45799  osd.1 up   1.0  1.0
>>  2hdd   5.45799  osd.2 up   1.0  1.0
>>  3hdd   5.45799  osd.3 up   1.0  1.0
>> 12ssd   0.42670  osd.12up   1.0  1.0
>> 13ssd   0.42670  osd.13up   1.0  1.0
>> -5 22.68536  host asrv-dev-stor-2
>>  4hdd   5.45799  osd.4 up   1.0  1.0
>>  5hdd   5.45799  osd.5 up   1.0  1.0
>>  6hdd   5.45799  osd.6 up   1.0  1.0
>>  7hdd   5.45799  osd.7 up   1.0  1.0
>> 14ssd   0.42670  osd.14up   1.0  1.0
>> 15ssd   0.42670  osd.15up   1.0  1.0
>> -7 22.68536  host asrv-dev-stor-3
>>  8hdd   5.45799  osd.8 up   1.0  1.0
>> 10hdd   5.45799  osd.10up   1.0  1.0
>> 11hdd   5.45799  osd.11up   1.0  1.0
>> 18hdd   5.45799  osd.18up   1.0  1.0
>> 16ssd   0.42670  osd.16up   1.0  1.0
>> 17ssd   0.42670  osd.17up   1.0  1.0
>>
>> Could it be a physical problem with our drives? "smartctl -a"
>> reports nothing wrong. We also started a surface check using the dd
>> command, but it will take at least 7 hours per drive...
>>
>> What else should we do?
>>
>> The output of  "ceph health detail":
>>
>> ceph health detail
>> HEALTH_ERR 1 MDSs report damaged metadata; insufficient standby MDS
>> daemons available; Reduced data availability: 50 pgs stale; 90
>> daemons have recently crashed; 3 mgr modules have recently crashed
>> [ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
>> mds.asrv-dev-stor-2(mds.0): Metadata damage detected
>> [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
>> have 0; want 1 more
>> [WRN] PG_AVAILABILITY: Reduced data availability: 50 pgs stale
>> pg 5.0 is stuck stale for 67m, current state stale+active+clean,
>> last acting [4,1,11]
>> pg 5.13 is stuck stale for 67m, current state
>> stale+active+clean, last acting [4,0,10]
>> pg 5.18 is stuck stale for 67m, current state
>> stale+active+clean, last acting [4,11,2]
>> pg 5.19 is stuck stale for 67m, current state
>> stale+active+clean, last acting [4,3,10]
>> pg 5.1e is stuck stale for 10h, current state
>> stale+active+clean, last acting [0,7,11]
>> pg 5.22 is stuck stale for 10h, current state
>> stale+active+clean, last acting [0,6,18]
>> pg 5.26 is stuck stale for 67m, current state
>> stale+active+clean, last acting [4,1,18]
>> pg 5.29 is stuck stale for 10h, current state
>> stale+active+clean, last acting [0,11,6]
>> pg 5.2b is stuck stale for 10h, current state
>> stale+active+clean, last acting [0,18,6]
>> pg 5.30 is stuck stale 

[ceph-users] Re: MDS crash

2024-04-26 Thread Eugen Block
Hi, it's unlikely that all OSDs fail at the same time; it seems like a  
network issue. Do you have an active MGR? Just a couple of days ago  
someone reported incorrect OSD stats because no MGR was up. Although  
your 'ceph health detail' output doesn't mention that, there can still be  
issues when MGR processes are active according to ceph but don't  
respond anymore.
I would probably start with basic network debugging, e.g. iperf,  
pings on public and cluster networks (if present), and so on.
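
A minimal sketch of what I mean (host names are placeholders):

iperf3 -s                        # on one node
iperf3 -c <peer-node> -t 30      # from the other nodes, on public and cluster networks
ping -c 100 -i 0.2 <peer-node>   # look for packet loss or latency spikes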


Regards,
Eugen

Zitat von Alexey GERASIMOV :


Colleagues, I have the update.

Starting from yesterday, the situation with ceph health is much worse
than it was previously.

We found that
- ceph -s informs us that some PGs are in a stale state
- almost all diagnostic ceph subcommands hang! For example, "ceph osd
ls", "ceph osd dump", "ceph osd tree" and "ceph health detail" provide
output - but "ceph osd status", all the "ceph pg ..." commands and
others hang.

So, it looks like the crashes of the MDS daemons were only the first
signs of problems.
I read that the "stale" state for PGs means that all nodes storing this
placement group may be down - but that's wrong, all OSD daemons are up
on all three nodes:


--- ceph osd tree
ID  CLASS  WEIGHTTYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 68.05609  root default
-3 22.68536  host asrv-dev-stor-1
 0hdd   5.45799  osd.0 up   1.0  1.0
 1hdd   5.45799  osd.1 up   1.0  1.0
 2hdd   5.45799  osd.2 up   1.0  1.0
 3hdd   5.45799  osd.3 up   1.0  1.0
12ssd   0.42670  osd.12up   1.0  1.0
13ssd   0.42670  osd.13up   1.0  1.0
-5 22.68536  host asrv-dev-stor-2
 4hdd   5.45799  osd.4 up   1.0  1.0
 5hdd   5.45799  osd.5 up   1.0  1.0
 6hdd   5.45799  osd.6 up   1.0  1.0
 7hdd   5.45799  osd.7 up   1.0  1.0
14ssd   0.42670  osd.14up   1.0  1.0
15ssd   0.42670  osd.15up   1.0  1.0
-7 22.68536  host asrv-dev-stor-3
 8hdd   5.45799  osd.8 up   1.0  1.0
10hdd   5.45799  osd.10up   1.0  1.0
11hdd   5.45799  osd.11up   1.0  1.0
18hdd   5.45799  osd.18up   1.0  1.0
16ssd   0.42670  osd.16up   1.0  1.0
17ssd   0.42670  osd.17up   1.0  1.0

Could it be a physical problem with our drives? "smartctl -a" reports
nothing wrong. We also started a surface check using the dd command, but
it will take at least 7 hours per drive...

What else should we do?

The output of  "ceph health detail":

ceph health detail
HEALTH_ERR 1 MDSs report damaged metadata; insufficient standby MDS  
daemons available; Reduced data availability: 50 pgs stale; 90  
daemons have recently crashed; 3 mgr modules have recently crashed

[ERR] MDS_DAMAGE: 1 MDSs report damaged metadata
mds.asrv-dev-stor-2(mds.0): Metadata damage detected
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
have 0; want 1 more
[WRN] PG_AVAILABILITY: Reduced data availability: 50 pgs stale
pg 5.0 is stuck stale for 67m, current state stale+active+clean,  
last acting [4,1,11]
pg 5.13 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,0,10]
pg 5.18 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,11,2]
pg 5.19 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,3,10]
pg 5.1e is stuck stale for 10h, current state  
stale+active+clean, last acting [0,7,11]
pg 5.22 is stuck stale for 10h, current state  
stale+active+clean, last acting [0,6,18]
pg 5.26 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,1,18]
pg 5.29 is stuck stale for 10h, current state  
stale+active+clean, last acting [0,11,6]
pg 5.2b is stuck stale for 10h, current state  
stale+active+clean, last acting [0,18,6]
pg 5.30 is stuck stale for 10h, current state  
stale+active+clean, last acting [0,8,7]
pg 5.37 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,10,0]
pg 5.3c is stuck stale for 67m, current state  
stale+active+clean, last acting [4,10,3]
pg 5.43 is stuck stale for 10h, current state  
stale+active+clean, last acting [0,6,18]
pg 5.44 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,2,11]
pg 5.45 is stuck stale for 67m, current state  
stale+active+clean, last acting [4,11,3]
pg 5.47 is stuck stale for 67m, current state