This did the trick! THANK YOU!

After starting the MDS with mds_wipe_sessions set and with the
mds*_openfiles.0 entries removed from the metadata pool, the MDS came
up almost immediately and went active. I verified that the filesystem
could mount again, shut the MDS down, removed the mds_wipe_sessions
setting, and restarted all four MDS daemons. The cluster is back to
healthy.
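
In case it helps anyone else writing up a similar runbook, here is a
minimal sketch of the openfiles-object cleanup using the rados Python
bindings. The pool name "cephfs_metadata" is a placeholder on my part;
check ceph fs ls for your filesystem's actual metadata pool, and only
do this with the MDS daemons stopped.

import rados

# Placeholder pool name; substitute your filesystem's metadata pool
# (ceph fs ls shows it).
METADATA_POOL = "cephfs_metadata"

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(METADATA_POOL)
    try:
        # Remove the open-file-table hint objects (mds0_openfiles.0,
        # mds1_openfiles.0, ...) for every MDS rank. Per Zheng's note
        # below, these records are only hints and are safe to delete.
        for obj in ioctx.list_objects():
            if obj.key.startswith("mds") and "_openfiles." in obj.key:
                print("removing", obj.key)
                ioctx.remove_object(obj.key)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()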

I've got more stuff to write up on our end for recovery procedures now, and
that's a good thing! Thanks again!

jonathan

On Wed, Aug 15, 2018 at 11:12 PM, Jonathan Woytek <woy...@dryrose.com>
wrote:

>
> On Wed, Aug 15, 2018 at 11:02 PM Yan, Zheng <uker...@gmail.com> wrote:
>
>> On Thu, Aug 16, 2018 at 10:55 AM Jonathan Woytek <woy...@dryrose.com>
>> wrote:
>> >
>> > ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic
>> (stable)
>> >
>> >
>>
>> Try deleting mds0_openfiles.0 (mds1_openfiles.0 and so on if you have
>> multiple active MDS daemons) from the metadata pool of your filesystem.
>> The records in these files are open-file hints; it's safe to delete
>> them.
>
>
> I will try that in the morning. I had to bail for the night here (UTC-4).
> Thank you!
>
> Jonathan
>



-- 
Jonathan Woytek
http://www.dryrose.com
KB3HOZ
PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC