So I think I can reliably reproduce this crash from a Ceph client.

```
root@kh08-8:~# ceph -s
  cluster:
    id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
    mgr: kh08-8(active)
    mds: cephfs-1/1/1 up  {0=kh09-8=up:active}, 1 up:standby
    osd: 570 osds: 570 up, 570 in
```


Then, from a client, try to mount aufs over CephFS:
```
mount -vvv -t aufs -o br=/cephfs=rw:/mnt/aufs=rw -o udba=reval none /aufs
```
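
For context, /cephfs on that client is an existing kernel-client CephFS
mount, created roughly like this (the mon address and secretfile path below
are placeholders rather than the exact options from my setup):

```
mount -t ceph kh08-8:6789:/ /cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```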

Now watch as your Ceph MDS servers fail:

```
root@kh08-8:~# ceph -s
  cluster:
    id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
    health: HEALTH_WARN
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
    mgr: kh08-8(active)
    mds: cephfs-1/1/1 up  {0=kh10-8=up:active(laggy or crashed)}
```


I am now stuck in a degraded state, and I can't seem to get the MDS daemons to start again.
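
For reference, here is roughly what I have been trying on the MDS hosts to
bring them back, in case someone spots a mistake. This is only a sketch: the
systemd unit name assumes the MDS id matches the hostname, and the final
step mirrors what seemed to work in the earlier incident quoted below.

```
# restart the MDS daemon on its host and follow its log
systemctl restart ceph-mds@kh10-8
journalctl -u ceph-mds@kh10-8 -f

# if the monitors still show the daemon as "active (laggy or crashed)",
# mark rank 0 as failed so a restarted or standby daemon can take over
ceph mds fail 0
```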

On Mon, Apr 30, 2018 at 5:06 PM, Sean Sullivan <lookcr...@gmail.com> wrote:

> I had 2 MDS servers (one active, one standby) and both were down. I took a
> dumb chance and marked the active one as down (it said it was up but laggy),
> then started the primary again, and now both are back up. I have never seen
> this before, and I am also not sure what I just did.
>
> On Mon, Apr 30, 2018 at 4:32 PM, Sean Sullivan <lookcr...@gmail.com>
> wrote:
>
>> I was creating a new user and mount point. On another hardware node I
>> mounted CephFS as admin to mount as root. I created /aufstest and then
>> unmounted it. From there it seems that both of my MDS nodes crashed for
>> some reason, and I can't start them any more.
>>
>> https://pastebin.com/1ZgkL9fa -- my MDS log
>>
>> I have never had this happen in my tests, and now there is live data
>> involved. If anyone can lend a hand or point me in the right direction
>> while troubleshooting, that would be a godsend!
>>
>> I tried cephfs-journal-tool journal inspect and it reports that the
>> journal should be fine, so I am not sure why the MDS is crashing:
>>
>> /home/lacadmin# cephfs-journal-tool journal inspect
>> Overall journal integrity: OK
>>
>>
>>
>>
>
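
For completeness: if the journal really is intact but the MDS daemons still
refuse to replay it, the fallback I am holding in reserve is the standard
disaster-recovery sequence from the CephFS docs, roughly as below. I have
not run this yet, so treat it as a sketch; the backup filename is just an
example.

```
# back up the journal before touching anything
cephfs-journal-tool journal export backup.bin

# write recoverable dentries from the journal back into the metadata store,
# then reset the journal
cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset
```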