Very possible.  They were off by about 5 minutes.  I was able to connect back to
the MDS after I installed NTP on all of them.  The following was logged on the
client at wake (the period includes the installation of NTP on all nodes).

[137537.630016] libceph: mds0 192.168.1.20:6803 socket closed (con state NEGOTIATING)
[137707.372620] ceph: mds0 caps stale
[137707.373504] ceph: mds0 caps renewed
[137807.489200] ceph: mds0 caps stale
[137827.512563] ceph: mds0 caps stale
[137984.482575] ceph: mds0 caps renewed
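
To see how long the caps stayed stale in a log like the one above, a small parser can pair each first "caps stale" line with the next "caps renewed" line for the same MDS. This is only a sketch: the function name and regular expression are my own, and it assumes the kernel log format shown here (timestamps in seconds since boot).

```python
import re

# Matches kernel log lines such as "[137707.372620] ceph: mds0 caps stale".
CAPS_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+ceph:\s+(mds\d+)\s+caps\s+(stale|renewed)"
)


def stale_intervals(lines):
    """Return (mds, stale_at, renewed_at, seconds) for each stale period.

    Repeated "stale" lines before a renewal are treated as one period
    that starts at the first of them.
    """
    first_stale = {}  # mds name -> timestamp of the first unrenewed "stale"
    intervals = []
    for line in lines:
        m = CAPS_RE.search(line)
        if not m:
            continue  # e.g. libceph socket messages
        ts, mds, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "stale":
            first_stale.setdefault(mds, ts)
        elif mds in first_stale:
            start = first_stale.pop(mds)
            intervals.append((mds, start, ts, round(ts - start, 6)))
    return intervals


# The log excerpt from this message.
LOG = """\
[137537.630016] libceph: mds0 192.168.1.20:6803 socket closed (con state NEGOTIATING)
[137707.372620] ceph: mds0 caps stale
[137707.373504] ceph: mds0 caps renewed
[137807.489200] ceph: mds0 caps stale
[137827.512563] ceph: mds0 caps stale
[137984.482575] ceph: mds0 caps renewed
"""

if __name__ == "__main__":
    for mds, start, end, secs in stale_intervals(LOG.splitlines()):
        print(f"{mds}: stale at {start:.6f}, renewed at {end:.6f} ({secs:.6f} s)")
```

On this excerpt the first stale period is renewed almost immediately, while the second lasts roughly 177 seconds.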

The OSD also had some heartbeat_map log messages, but that seems to have cleared itself up.

And yes, those cephx messages seem to have been going on well before the wake event.

I'll monitor sleep and wake a few more times and see if it holds up.

Thanks.

Regards,
Hong


________________________________
 From: Gregory Farnum <g...@inktank.com>
To: hjcho616 <hjcho...@yahoo.com> 
Cc: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>; "Yan, Zheng" 
<uker...@gmail.com>; Sage Weil <s...@inktank.com>; "ceph-users@lists.ceph.com" 
<ceph-users@lists.ceph.com> 
Sent: Tuesday, March 25, 2014 12:59 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 

On Tue, Mar 25, 2014 at 9:56 AM, hjcho616 <hjcho...@yahoo.com> wrote:
> I am merely putting the client to sleep and waking it up.  When it is up,
> running ls on the mounted directory.  As far as I am concerned at very high
> level I am doing the same thing.  All are running 3.13 kernel Debian
> provided.
>
> When that infinite loop of decrypt error happened, I waited about 10 minutes
> before I restarted MDS.

Okay, but a sleeping client isn't going to impact the OSD/MDS
sessions. Are some of them also hosted on the client and going to
sleep at the same time?
Otherwise, how far out of sync are their clocks? They don't need to be
very close but if they're off by hours it will cause these sorts of
problems with CephX.
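
A rough way to quantify "how far out of sync" is to read the clock on every node at about the same moment and compare each reading with the median. The sketch below does only that comparison. The host names, the sample values, and the tolerance are all hypothetical, and the tolerance is not a CephX setting; how the timestamps are collected (for example by running `date +%s` on each node) is left out.

```python
from statistics import median


def clock_offsets(samples):
    """Map each host to its offset in seconds from the median clock.

    `samples` maps host name -> Unix time read from that host, with all
    readings taken at roughly the same moment.
    """
    ref = median(samples.values())
    return {host: t - ref for host, t in samples.items()}


def hosts_out_of_sync(samples, tolerance_s):
    """Return the hosts whose clock differs from the median by more than
    `tolerance_s` seconds (an arbitrary threshold chosen by the caller)."""
    offsets = clock_offsets(samples)
    return sorted(h for h, off in offsets.items() if abs(off) > tolerance_s)


if __name__ == "__main__":
    # Hypothetical readings: the client is about 5 minutes ahead.
    samples = {
        "mon": 1395770000.0,
        "osd1": 1395770001.5,
        "client": 1395770300.0,
    }
    for host, off in sorted(clock_offsets(samples).items()):
        print(f"{host}: {off:+.1f} s")
    print("out of sync:", hosts_out_of_sync(samples, tolerance_s=60))
```

Using the median as the reference keeps one badly skewed node from dragging the baseline, which fits the case where a single machine has drifted.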

> Last time MDS crashed restarting MDS didn't get out of that degraded mode
> without me restarting the OSD for hours.  That's why I started restarting
> OSDs shortly after MDS restarts.  Next time I'll try to wait a bit more.
Okay, it certainly shouldn't take hours, but that makes me think it
was having the same authentication trouble as your log snippet
previously.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com