OK, now we are talking. It is quite possible that trimming will not start until this operation is completed.
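If you want to keep an eye on whether trimming eventually starts, something like the two commands below should be enough (the mds name is just the one from your earlier mails, purely as an example); num_segments should start to drop once the blocked operation clears:

    ceph health detail | grep -A1 MDS_TRIM
    ceph daemon mds.lpnceph-mds02.in2p3.fr perf dump mds_log    # raw segment/event counters, if the admin socket responds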
If there are enough shards/copies to recover the lost objects, you should try a pg repair first. If you did lose too many replicas, there are ways to flush this PG out of the system. You will lose data this way. I don't know how to repair or flush only the broken objects out of a PG, but I would hope that this is possible. Before you do anything destructive, open a new thread on this list specifically for how to repair/remove this PG with the least possible damage.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Francois Legrand <f...@lpnhe.in2p3.fr>
Sent: 08 June 2020 16:00:28
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory exhausted

There is no recovery going on, but indeed we have a damaged pg (with some lost objects due to a major crash a few weeks ago)... and there are some shards of this pg on osd 27! That's also why we are migrating all the data out of this FS!
It's certainly related, and I guess that it's trying to remove some data that is already lost and gets stuck! I don't know if there is a way to tell ceph to forget about these ops! I guess not.
I thus think that there is not that much to do apart from reading as much data as we can to save as much as possible.
F.

On 08/06/2020 at 15:48, Frank Schilder wrote:
> That's strange. Maybe there is another problem. Do you have any other health warnings that might be related? Is there some recovery/rebalancing going on?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Francois Legrand <f...@lpnhe.in2p3.fr>
> Sent: 08 June 2020 15:27:59
> To: Frank Schilder; ceph-users
> Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory exhausted
>
> Thanks again for the hint!
> Indeed, I did a
> ceph daemon mds.lpnceph-mds02.in2p3.fr objecter_requests
> and it seems that osd 27 is more or less stuck with an op of age 34987.5 (while the other osds have ages < 1).
> I tried a ceph osd down 27, which reset the age, but I can see that the age of osd.27's ops is rising again.
> I think I will restart it (btw our osd servers and mds are different machines).
> F.
>
> On 08/06/2020 at 15:01, Frank Schilder wrote:
>> Hi Francois,
>>
>> this sounds great. At least it's operational. I guess it is still using a lot of swap while trying to replay operations.
>>
>> I would cleanly disconnect all clients if you haven't done so already, even any read-only clients. Any extra load will just slow down recovery. My best guess is that the MDS is replaying some operations, which is very slow due to swap. While doing so, the segments to trim will probably keep increasing for a while until it can start trimming.
>>
>> The slow meta-data IO is an operation hanging in some OSD. You should check which OSD it is (ceph health detail) and check if you can see the operation in the OSD's ops queue. I would expect this OSD to have a really long ops queue. I have seen meta-data operations hang for a long time. In case this OSD runs on the same server as your MDS, you will probably have to sit it out.
>>
>> If the meta-data operation is the only operation in the queue, the OSD might need a restart. But be careful; if in doubt, ask the list first.
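>>
>> To see what that OSD is actually chewing on, something along these lines should do (the osd id is a placeholder, and the ceph daemon commands have to run on the host that carries that OSD):
>>
>>    ceph daemon osd.<id> dump_ops_in_flight    # what is sitting in the queue right now
>>    ceph daemon osd.<id> dump_historic_ops     # recently completed slow ops, with durations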
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Francois Legrand <f...@lpnhe.in2p3.fr>
>> Sent: 08 June 2020 14:45:13
>> To: Frank Schilder; ceph-users
>> Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory exhausted
>>
>> Hi Frank,
>> Finally I did:
>> ceph config set global mds_beacon_grace 600000
>> and created /etc/sysctl.d/sysctl-ceph.conf with
>> vm.min_free_kbytes=4194303
>> and then
>> sysctl --system
>>
>> After that, the mds went to rejoin for a very long time (almost 24 hours) with errors like:
>> 2020-06-07 04:10:36.802 7ff866e2e700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
>> 2020-06-07 04:10:36.802 7ff866e2e700 0 mds.beacon.lpnceph-mds02.in2p3.fr Skipping beacon heartbeat to monitors (last acked 14653.8s ago); MDS internal heartbeat is not healthy!
>> 2020-06-07 04:10:37.021 7ff868e32700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2020-06-07 03:10:37.022271)
>> and also
>> 2020-06-07 04:10:44.942 7ff86d63b700 0 auth: could not find secret_id=10363
>> 2020-06-07 04:10:44.942 7ff86d63b700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=10363
>>
>> but in the end the mds went active! :-)
>> I left it at rest from Sunday afternoon until this morning.
>> Indeed I was able to connect clients (in read-only for now) and read the data.
>> I checked the connected clients with ceph tell mds.lpnceph-mds02.in2p3.fr client ls, disconnected the few clients still there (with umount) and checked with the same command that they were not connected anymore.
>> But I still have the following warnings:
>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>     mdslpnceph-mds02.in2p3.fr(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 75372 secs
>> MDS_TRIM 1 MDSs behind on trimming
>>     mdslpnceph-mds02.in2p3.fr(mds.0): Behind on trimming (122836/128) max_segments: 128, num_segments: 122836
>> and the number of segments is still rising (slowly).
>> F.
>>
>> On 08/06/2020 at 12:00, Frank Schilder wrote:
>>> Hi Francois,
>>>
>>> did you manage to get any further with this?
>>>
>>> Best regards,
>>> =================
>>> Frank Schilder
>>> AIT Risø Campus
>>> Bygning 109, rum S14
>>>
>>> ________________________________________
>>> From: Frank Schilder <fr...@dtu.dk>
>>> Sent: 06 June 2020 15:21:59
>>> To: ceph-users; f...@lpnhe.in2p3.fr
>>> Subject: [ceph-users] Re: mds behind on trimming - replay until memory exhausted
>>>
>>> I think you have a problem similar to one I have. The priority of beacons seems very low. As soon as something gets busy, beacons are ignored or not sent. This was part of your log messages from the MDS. It stopped reporting to the MONs due to a laggy connection. This lagginess is a result of swapping:
>>>
>>>> 2020-06-05 21:39:06.015 7f251bfe6700 1 mds.0.322900 skipping upkeep work because connection to Monitors appears laggy
>>>
>>> Hence, during the (entire) time you are trying to get the MDS back using swap, it will almost certainly stop sending beacons. Therefore, you need to disable the time-out temporarily, otherwise the MON will always kill it for no real reason. The time-out should be long enough to cover the entire recovery period.
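>>>
>>> Something along these lines should do it; the value is only an example, pick one that comfortably covers the whole replay, and remember to undo it afterwards:
>>>
>>>    ceph config set global mds_beacon_grace 3600
>>>    # once the MDS is active and stable again, go back to the default (15):
>>>    ceph config rm global mds_beacon_grace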
>>>
>>> Best regards,
>>> =================
>>> Frank Schilder
>>> AIT Risø Campus
>>> Bygning 109, rum S14
>>>
>>> ________________________________________
>>> From: Francois Legrand <f...@lpnhe.in2p3.fr>
>>> Sent: 06 June 2020 11:11
>>> To: Frank Schilder; ceph-users
>>> Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory exhausted
>>>
>>> Thanks for the tip,
>>> I will try that. For now vm.min_free_kbytes = 90112
>>> Indeed, yesterday after your last mail I set mds_beacon_grace to 240.0, but this didn't change anything...
>>> -27> 2020-06-06 06:15:07.373 7f83e3626700 1 mds.beacon.lpnceph-mds04.in2p3.fr MDS connection to Monitors appears to be laggy; 332.044s since last acked beacon
>>> That is the same time since the last acked beacon that I had before changing the parameter.
>>> As the mds beacon interval is 4 s, setting mds_beacon_grace to 240 should lead to 960 s (16 min). Thus I think that the bottleneck is elsewhere.
>>> F.
>>>
>>> On 06/06/2020 at 09:47, Frank Schilder wrote:
>>>> Hi Francois,
>>>>
>>>> there is actually one more parameter you might consider changing in case the MDS gets kicked out again. For a system under such high memory pressure, the value of the kernel parameter vm.min_free_kbytes might need adjusting. You can check the current value with
>>>>
>>>> sysctl vm.min_free_kbytes
>>>>
>>>> In your case, with heavy swap usage, this value should probably be somewhere between 2 and 4 GB.
>>>>
>>>> Careful, do not change this value while memory is in high demand. If not enough memory is available, setting this will immediately OOM kill your machine. Make sure that plenty of pages are unused. Drop the page cache if necessary or reboot the machine before setting this value.
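>>>>
>>>> Purely as a sketch of what I mean (the value and the file name are only examples, and again: only apply this while plenty of memory is free):
>>>>
>>>>    sysctl vm.min_free_kbytes                     # check the current value first
>>>>    sysctl -w vm.min_free_kbytes=4194304          # apply ~4 GB immediately
>>>>    echo 'vm.min_free_kbytes=4194304' > /etc/sysctl.d/90-mds-recovery.conf   # make it survive a reboot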
>>>>
>>>> Best regards,
>>>> =================
>>>> Frank Schilder
>>>> AIT Risø Campus
>>>> Bygning 109, rum S14
>>>>
>>>> ________________________________________
>>>> From: Frank Schilder <fr...@dtu.dk>
>>>> Sent: 06 June 2020 00:36:13
>>>> To: ceph-users; f...@lpnhe.in2p3.fr
>>>> Subject: [ceph-users] Re: mds behind on trimming - replay until memory exhausted
>>>>
>>>> Hi Francois,
>>>>
>>>> yes, the beacon grace needs to be higher due to the latency of swap. Not sure if 60 s will do. For this particular recovery operation, you might want to go much higher (1 h) and watch the cluster health closely.
>>>>
>>>> Good luck and best regards,
>>>> =================
>>>> Frank Schilder
>>>> AIT Risø Campus
>>>> Bygning 109, rum S14
>>>>
>>>> ________________________________________
>>>> From: Francois Legrand <f...@lpnhe.in2p3.fr>
>>>> Sent: 05 June 2020 23:51:04
>>>> To: Frank Schilder; ceph-users
>>>> Subject: Re: [ceph-users] mds behind on trimming - replay until memory exhausted
>>>>
>>>> Hi,
>>>> Unfortunately adding swap did not solve the problem!
>>>> I added 400 GB of swap. It used about 18 GB of swap after consuming all the RAM and stopped with the following logs:
>>>>
>>>> 2020-06-05 21:33:31.967 7f251e7eb700 1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 324691 from mon.1
>>>> 2020-06-05 21:33:40.355 7f251e7eb700 1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 324692 from mon.1
>>>> 2020-06-05 21:33:59.787 7f251b7e5700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
>>>> 2020-06-05 21:33:59.787 7f251b7e5700 0 mds.beacon.lpnceph-mds04.in2p3.fr Skipping beacon heartbeat to monitors (last acked 3.99979s ago); MDS internal heartbeat is not healthy!
>>>> 2020-06-05 21:34:00.287 7f251b7e5700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
>>>> 2020-06-05 21:34:00.287 7f251b7e5700 0 mds.beacon.lpnceph-mds04.in2p3.fr Skipping beacon heartbeat to monitors (last acked 4.49976s ago); MDS internal heartbeat is not healthy!
>>>> ....
>>>> 2020-06-05 21:39:05.991 7f251bfe6700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
>>>> 2020-06-05 21:39:06.015 7f251bfe6700 1 mds.beacon.lpnceph-mds04.in2p3.fr MDS connection to Monitors appears to be laggy; 310.228s since last acked beacon
>>>> 2020-06-05 21:39:06.015 7f251bfe6700 1 mds.0.322900 skipping upkeep work because connection to Monitors appears laggy
>>>> 2020-06-05 21:39:19.838 7f251bfe6700 1 mds.0.322900 skipping upkeep work because connection to Monitors appears laggy
>>>> 2020-06-05 21:39:19.869 7f251e7eb700 1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 324694 from mon.1
>>>> 2020-06-05 21:39:19.869 7f251e7eb700 1 mds.lpnceph-mds04.in2p3.fr Map removed me (mds.-1 gid:210070681) from cluster due to lost contact; respawning
>>>> 2020-06-05 21:39:19.870 7f251e7eb700 1 mds.lpnceph-mds04.in2p3.fr respawn!
>>>> --- begin dump of recent events ---
>>>> -9999> 2020-06-05 19:28:07.982 7f25217f1700 5 mds.beacon.lpnceph-mds04.in2p3.fr received beacon reply up:replay seq 2131 rtt 0.930951
>>>> -9998> 2020-06-05 19:28:11.053 7f251b7e5700 5 mds.beacon.lpnceph-mds04.in2p3.fr Sending beacon up:replay seq 2132
>>>> -9997> 2020-06-05 19:28:11.053 7f251b7e5700 10 monclient: _send_mon_message to mon.lpnceph-mon02 at v2:134.158.152.210:3300/0
>>>> -9996> 2020-06-05 19:28:12.176 7f25217f1700 5 mds.beacon.lpnceph-mds04.in2p3.fr received beacon reply up:replay seq 2132 rtt 1.12294
>>>> -9995> 2020-06-05 19:28:12.176 7f251e7eb700 1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 323302 from mon.1
>>>> -9994> 2020-06-05 19:28:14.290 7f251d7e9700 10 monclient: tick
>>>> -9993> 2020-06-05 19:28:14.290 7f251d7e9700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2020-06-05 19:27:44.290995)
>>>> ...
>>>> 2020-06-05 21:39:31.092 7f3c4d5e3700 1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 324749 from mon.1
>>>> 2020-06-05 21:39:35.257 7f3c4d5e3700 1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 324750 from mon.1
>>>> 2020-06-05 21:39:35.257 7f3c4d5e3700 1 mds.lpnceph-mds04.in2p3.fr Map has assigned me to become a standby
>>>>
>>>> However, the mons don't seem particularly loaded!
>>>> So I am trying to set mds_beacon_grace to 60.0 to see if it helps (I did it for both the mds and mon daemons because it seems to be present in both configs).
>>>> I will tell you if it works.
>>>>
>>>> Any other clue?
>>>> F.
>>>>
>>>> On 05/06/2020 at 14:44, Frank Schilder wrote:
>>>>> Hi Francois,
>>>>>
>>>>> thanks for the link. The option "mds dump cache after rejoin" is for debugging purposes only. It will write the cache after rejoin to a file, but not drop the cache. This will not help you. I think this was implemented recently to make it possible to send a cache dump file to developers after an MDS crash, before the restarting MDS changes the cache.
>>>>>
>>>>> In your case, I would set osd_op_queue_cut_off during the next regular cluster service or upgrade.
>>>>>
>>>>> My best bet right now is to try to add swap.
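>>>>>
>>>>> If you go down that route, a plain swap file on a local SSD is the simplest thing; just as a sketch (size and path are only examples, and use dd instead of fallocate if your filesystem does not support preallocated swap files):
>>>>>
>>>>>    fallocate -l 100G /var/swapfile && chmod 600 /var/swapfile
>>>>>    mkswap /var/swapfile && swapon /var/swapfile
>>>>>    swapon --show     # verify it is active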
>>>>> Maybe someone else reading this has a better idea, or you find a hint in one of the other threads.
>>>>>
>>>>> Good luck!
>>>>> =================
>>>>> Frank Schilder
>>>>> AIT Risø Campus
>>>>> Bygning 109, rum S14
>>>>>
>>>>> ________________________________________
>>>>> From: Francois Legrand <f...@lpnhe.in2p3.fr>
>>>>> Sent: 05 June 2020 14:34:06
>>>>> To: Frank Schilder; ceph-users
>>>>> Subject: Re: [ceph-users] mds behind on trimming - replay until memory exhausted
>>>>>
>>>>> On 05/06/2020 at 14:18, Frank Schilder wrote:
>>>>>> Hi Francois,
>>>>>>
>>>>>>> I was also wondering if setting mds dump cache after rejoin could help?
>>>>>> Haven't heard of that option. Is there some documentation?
>>>>> I found it on:
>>>>> https://docs.ceph.com/docs/nautilus/cephfs/mds-config-ref/
>>>>> mds dump cache after rejoin
>>>>> Description: Ceph will dump MDS cache contents to a file after rejoining the cache (during recovery).
>>>>> Type: Boolean
>>>>> Default: false
>>>>>
>>>>> but I don't think it can help in my case, because rejoin occurs after replay, and in my case replay never ends!
>>>>>
>>>>>>> I have:
>>>>>>> osd_op_queue=wpq
>>>>>>> osd_op_queue_cut_off=low
>>>>>>> I can try to set osd_op_queue_cut_off to high, but it will be useful only if the mds gets active, true?
>>>>>> I think so. If you have no clients connected, there should not be queue priority issues. Maybe it is best to wait until your cluster is healthy again, as you will need to restart all daemons. Make sure you set this in [global]. When I applied that change and after restarting all OSDs, my MDSes had reconnect issues until I set it on them too. I think all daemons use that option (the prefix osd_ is misleading).
>>>>> For sure I would prefer not to restart all daemons, because the second filesystem is up and running (with production clients).
>>>>>
>>>>>>> For now, the mds_cache_memory_limit is set to 8 589 934 592 (so 8GB, which seems reasonable for an mds server with 32/48GB).
>>>>>> This sounds bad. 8GB should not cause any issues. Maybe you are hitting a bug; I believe there is a regression in Nautilus. There were recent threads on absurdly high memory use by MDSes. Maybe it's worth searching for these in the list.
>>>>> I will have a look.
>>>>>
>>>>>>> I already forced the clients to unmount (and even rebooted the ones from which the rsync and the rmdir .snaps were launched).
>>>>>> I don't know when the MDS acknowledges this. If it was a clean unmount (i.e. without -f or forced by reboot), the MDS should have dropped the clients already. If it was an unclean unmount, it might not be that easy to get the stale client session out. However, I don't know about that.
>>>>> Moreover, when I did that the mds was already not active but in replay, so for sure the unmount was not acknowledged by any mds!
>>>>>
>>>>>>> I think that providing more swap may be the solution! I will try that if I cannot find another way to fix it.
>>>>>> If the memory overrun is somewhat limited, this should allow the MDS to trim the logs. It will take a while, but it will do it eventually.
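>>>>>>
>>>>>> Once it does make it back to active, something along these lines should tell you whether the stale sessions are really gone and how far the cache is over its limit (the mds name is just an example, use whichever daemon holds rank 0; the ceph daemon commands have to run on that mds host):
>>>>>>
>>>>>>    ceph tell mds.lpnceph-mds04.in2p3.fr client ls
>>>>>>    ceph daemon mds.lpnceph-mds04.in2p3.fr cache status
>>>>>>    ceph daemon mds.lpnceph-mds04.in2p3.fr config get mds_cache_memory_limit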
>>>>>>
>>>>>> Best regards,
>>>>>> =================
>>>>>> Frank Schilder
>>>>>> AIT Risø Campus
>>>>>> Bygning 109, rum S14
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Francois Legrand <f...@lpnhe.in2p3.fr>
>>>>>> Sent: 05 June 2020 13:46:03
>>>>>> To: Frank Schilder; ceph-users
>>>>>> Subject: Re: [ceph-users] mds behind on trimming - replay until memory exhausted
>>>>>>
>>>>>> I was also wondering if setting mds dump cache after rejoin could help?
>>>>>>
>>>>>> On 05/06/2020 at 12:49, Frank Schilder wrote:
>>>>>>> Out of interest, I did the same on a mimic cluster a few months ago, running up to 5 parallel rsync sessions without any problems. I moved about 120TB. Each rsync was running on a separate client with its own cache. I made sure that the sync dirs were all disjoint (no overlap of files/directories).
>>>>>>>
>>>>>>> How many rsync processes are you running in parallel?
>>>>>>> Do you have these settings enabled:
>>>>>>>
>>>>>>> osd_op_queue=wpq
>>>>>>> osd_op_queue_cut_off=high
>>>>>>>
>>>>>>> WPQ should be the default, but osd_op_queue_cut_off=high might not be. Setting the latter removed any behind-on-trimming problems we have seen before.
>>>>>>>
>>>>>>> You are in a somewhat peculiar situation. I think you need to trim client caches, which requires an active MDS. If your MDS becomes active for at least some time, you could try the following (I'm not an expert here, so take it with a grain of scepticism):
>>>>>>>
>>>>>>> - reduce the MDS cache memory limit to force recall of caps much earlier than now
>>>>>>> - reduce the client cache size
>>>>>>> - set "osd_op_queue_cut_off=high" if you have not already done so; I think this requires a restart of the OSDs, so be careful
>>>>>>>
>>>>>>> At this point, you could watch your restart cycle to see if things improve already. Maybe nothing more is required.
>>>>>>>
>>>>>>> If you have good SSDs, you could try to temporarily provide some swap space. It saved me once. This will be very slow, but at least it might allow you to move forward.
>>>>>>>
>>>>>>> Harder measures:
>>>>>>>
>>>>>>> - stop all I/O from the FS clients, throw users out if necessary
>>>>>>> - ideally, try to cleanly (!) shut down clients or force trimming of the cache by either
>>>>>>>   * umount, or
>>>>>>>   * sync; echo 3 > /proc/sys/vm/drop_caches
>>>>>>>   Either of these might hang for a long time. Do not interrupt, and do not do this on more than one client at a time.
>>>>>>>
>>>>>>> At some point, your active MDS should be able to hold a full session. You should then tune the cache and other parameters such that the MDSes can handle your rsync sessions.
>>>>>>>
>>>>>>> My experience is that MDSes overrun their cache limits quite a lot. Since I reduced mds_cache_memory_limit to not more than half of what is physically available, I have not had any problems again.
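>>>>>>>
>>>>>>> For what it's worth, this is roughly how I would double-check what the daemons are actually running with before and after such a change (the daemon ids are placeholders, and the ceph daemon commands go on the respective hosts):
>>>>>>>
>>>>>>>    ceph daemon osd.<id> config get osd_op_queue
>>>>>>>    ceph daemon osd.<id> config get osd_op_queue_cut_off
>>>>>>>    ceph daemon mds.<name> config get mds_cache_memory_limit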
>>>>>>> Hope that helps.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> =================
>>>>>>> Frank Schilder
>>>>>>> AIT Risø Campus
>>>>>>> Bygning 109, rum S14
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Francois Legrand <f...@lpnhe.in2p3.fr>
>>>>>>> Sent: 05 June 2020 11:42:42
>>>>>>> To: ceph-users
>>>>>>> Subject: [ceph-users] mds behind on trimming - replay until memory exhausted
>>>>>>>
>>>>>>> Hi all,
>>>>>>> We have a ceph nautilus cluster (14.2.8) with two cephfs filesystems and 3 mds (1 active for each fs + one standby).
>>>>>>> We are transferring all the data (~600M files) from one FS (which was in EC 3+2) to the other FS (in R3).
>>>>>>> On the old FS we first removed the snapshots (to avoid stray problems when removing files) and then ran some rsyncs, deleting the files after the transfer.
>>>>>>> The operation should take a few more weeks to complete.
>>>>>>> But a few days ago, we started to get "mds behind on trimming" warnings from the mds managing the old FS.
>>>>>>> Yesterday, I restarted the active mds service to force a takeover by the standby mds (basically because the standby is more powerful and has more memory, i.e. 48GB instead of 32).
>>>>>>> The standby mds took rank 0 and started to replay... the "mds behind on trimming" warning came back and the number of segments rose, as did the memory usage of the server. Finally, it exhausted the memory of the mds, the service stopped, and the previous mds took rank 0 and started to replay... until memory exhaustion and a new mds switch, etc.
>>>>>>> It thus seems that we are in a never-ending loop! And of course, as the mds is always in replay, the data is not accessible and the transfers are blocked.
>>>>>>> I stopped all the rsyncs and unmounted the clients.
>>>>>>>
>>>>>>> My questions are:
>>>>>>> - Does the mds trim during replay, so that we could hope that after a while it will purge everything and the mds will eventually become active?
>>>>>>> - Is there a way to accelerate the operation or to fix this situation?
>>>>>>>
>>>>>>> Thanks for your help.
>>>>>>> F.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io