[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
If any rook-ceph users hit the situation where an MDS is stuck in replay, look at
the logs of the MDS pod.

When it runs and then terminates repeatedly, check whether there is a "liveness
probe terminated" error message by running "kubectl describe pod -n (namespace)
(mds pod name)".

If that error message is present, it helps to increase the failure threshold of
the liveness probe.

In my case, that resolved the issue.
Thanks


[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
This issue has been closed.
If any rook-ceph users see this: when MDS replay takes a long time, look at the
logs in the MDS pod.
If it is progressing and then abruptly terminates, try describing the MDS pod,
and if the liveness probe terminated, try increasing the failure threshold of the
liveness probe.


[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
I think the MDS is not responding to the liveness probe (confirmed by describing
the MDS pod with kubectl). I don't think it's memory, as I don't set a memory
limit, and I have the CPU set to 500m per MDS. What direction should I go from
here?
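
For reference, the MDS resources are set on the CephFilesystem CRD roughly like
this (the myfs name is a placeholder; spec.metadataServer.resources is the field
Rook applies to the MDS pods):

    # Current per-MDS CPU setting, applied through the CRD
    kubectl -n rook-ceph patch cephfilesystem myfs --type merge \
      -p '{"spec":{"metadataServer":{"resources":{"limits":{"cpu":"500m"}}}}}'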


[ceph-users] mds terminated

2023-07-18 Thread dxodnd
Hello.
I am using Rook Ceph and have 20 MDSs in use: 10 hold ranks 0-9 and 10 are on
standby.
I have one Ceph filesystem, and 2 of its MDSs are trimming.
Under that filesystem there are 6 MDSs in RESOLVE, 1 MDS in REPLAY, and 3 in
ACTIVE.
For some reason, for the last 36 hours the MDSs in RESOLVE have been stuck
trimming, and so has the MDS in REPLAY.
I've also tried failing each MDS, but to no avail; the commands I ran are below.
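
Roughly what I run from the Rook toolbox (the rook-ceph-tools deployment name is
the Rook default; rank 3 is just an example):

    # Show filesystem and MDS rank states (RESOLVE, REPLAY, ACTIVE, ...)
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs status
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mds stat

    # Fail one rank so a standby takes over
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mds fail 3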
I think something should change when the MDS in REPLAY moves on to RESOLVE, but I
don't know what.
Even looking at the logs of the REPLAY MDS, it's hard to see any message other
than that it is terminated every 11 minutes.
I'm desperate for someone's help.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io