[ceph-users] 16.2.14: [progress WARNING root] complete: ev {UUID} does not exist

2023-09-28 Thread Zakhar Kirpichenko
Hi, the mgr of my cluster logs this every few seconds: [progress WARNING root] complete: ev 7de5bb74-790b-4fda-8838-e4af4af18c62 does not exist [progress WARNING root] complete: ev fff93fce-b630-4141-81ee-19e7a3e61483 does not exist [progress WARNING root] complete: ev a02f6966-5b9f-49e8-89c4-b4fb8e6
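For context (an editorial aside, not part of the quoted message): these warnings come from the mgr progress module, whose tracked events can be inspected and reset from the CLI. A minimal sketch, assuming a 16.2.x mgr where the commands below are available:

    # List the progress events the mgr currently tracks
    ceph progress
    # Dump the same events as JSON, useful for matching the UUIDs from the log
    ceph progress json
    # Clear the tracked events (drops progress history only, not cluster data)
    ceph progress clear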

[ceph-users] CEPH complete cluster failure: unknown PGS

2023-09-28 Thread v1tnam
I have an 8-node cluster with old hardware. A week ago 4 nodes went down and the Ceph cluster went nuts. All PGs became unknown and monitors took too long to be in sync, so I reduced the number of mons to one and mgrs to one as well. Now the recovery starts with 100% unknown PGs and then PGs start
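For reference (an editorial aside, not from the thread): shrinking a broken quorum down to a single monitor is normally done by editing the monmap offline. A rough sketch of that procedure, assuming a package-based (non-cephadm) deployment where mon "a" is the survivor and "b", "c", "d" are the dead mons:

    # Stop the surviving mon, then extract its current monmap
    systemctl stop ceph-mon@a
    ceph-mon -i a --extract-monmap /tmp/monmap
    # Drop the unreachable monitors from the map (names are examples)
    monmaptool --rm b --rm c --rm d /tmp/monmap
    # Inject the trimmed map and bring the mon back up
    ceph-mon -i a --inject-monmap /tmp/monmap
    systemctl start ceph-mon@a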

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Mark Nelson
There is some pretty strange compaction behavior happening in these logs.  For instance, in osd0, we see an O-1 CF L1 compaction that's taking ~204 seconds: 2023-09-21T20:03:59.378+ 7f16a286c700  4 rocksdb: (Original Log Time 2023/09/21-20:03:59.381808) EVENT_LOG_v1 {"time_micros": 169532
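As an aside (not from the thread): a quick way to survey compaction durations in a log like this is to pull the RocksDB event-log lines out of the OSD log. A minimal sketch, assuming the default log path and that rocksdb logging is at level 4 so the EVENT_LOG_v1 lines are present; the field name follows RocksDB's compaction_finished event:

    # Report the duration of each finished compaction, in microseconds
    grep '"event": "compaction_finished"' /var/log/ceph/ceph-osd.0.log \
      | grep -o '"compaction_time_micros": [0-9]*'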

[ceph-users] Re: Snap_schedule does not always work.

2023-09-28 Thread Kushagr Gupta
Hi Milind, Team, Thank you for the response, @Milind. >> Snap-schedule no longer accepts a --subvol argument Thank you for the information. Currently, we are using the following commands to create the snap-schedules: Syntax: *"ceph fs snap-schedule add /// "* *"ceph fs snap-schedule retention add
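For anyone following along (an editorial aside, not from the thread): with the --subvol argument gone, the usual pattern is to resolve the subvolume's real path first and hand that path to snap-schedule. A hedged sketch with placeholder volume/subvolume names:

    # Resolve the subvolume's path (volume "cephfs" and subvolume "subvol1" are placeholders)
    ceph fs subvolume getpath cephfs subvol1
    # Suppose it returns /volumes/_nogroup/subvol1/<uuid>; schedule hourly snapshots there
    ceph fs snap-schedule add /volumes/_nogroup/subvol1/<uuid> 1h
    # Keep 24 hourly snapshots
    ceph fs snap-schedule retention add /volumes/_nogroup/subvol1/<uuid> h 24
    # Verify what the scheduler recorded
    ceph fs snap-schedule status /volumes/_nogroup/subvol1/<uuid>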

[ceph-users] Re: cephfs health warn

2023-09-28 Thread Ben
Hi Venky and cephers, Thanks for the reply. No config changes had been made before the issues occurred. It is suspected to be a client bug. Please see the following message about the log segment accumulation to be trimmed. For the moment the problematic client nodes cannot be rebooted. Evicting the client will definite
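For context (an editorial aside, not part of the quoted message): when a particular client is blocking MDS log/segment trimming and cannot be rebooted, the usual (disruptive) fallback is a manual eviction. A rough sketch, assuming an active MDS reachable as mds.a and a session id taken from session ls:

    # List client sessions and note the id of the problematic client
    ceph tell mds.a session ls
    # Evict that client (12345 is an example id); by default this also blocklists it
    ceph tell mds.a client evict id=12345
    # Inspect the blocklist afterwards if the client is expected to reconnect
    ceph osd blocklist ls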

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Igor Fedotov
Hi Sudhin, It looks like manual DB compactions are (periodically?) issued via the admin socket for your OSDs, which (my working hypothesis) triggers DB access stalls. Here are the log lines indicating such calls: debug 2023-09-22T11:24:55.234+ 7fc4efa20700  1 osd.1 1192508 triggering manual
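For reference (an editorial aside): the kind of call those log lines point at can be issued either through the OSD's local admin socket or remotely with ceph tell; a brief sketch using osd.1 as in the quoted log:

    # On the OSD's host, via the admin socket
    ceph daemon osd.1 compact
    # Or remotely
    ceph tell osd.1 compact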