I am merely putting the client to sleep and waking it up.  Once it is up, I 
run ls on the mounted directory.  As far as I am concerned, at a very high 
level I am doing the same thing each time.  All machines are running the 
Debian-provided 3.13 kernel.

When that endless loop of decrypt errors happened, I waited about 10 minutes 
before I restarted the MDS.

The last time the MDS crashed, restarting the MDS alone did not get it out of 
that degraded mode for hours; it only recovered after I restarted the OSDs.  
That's why I started restarting the OSDs shortly after each MDS restart.  Next 
time I'll try waiting a bit longer.

Regards,
Hong


________________________________
 From: Gregory Farnum <g...@inktank.com>
To: hjcho616 <hjcho...@yahoo.com> 
Cc: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>; "Yan, Zheng" 
<uker...@gmail.com>; Sage Weil <s...@inktank.com>; "ceph-users@lists.ceph.com" 
<ceph-users@lists.ceph.com> 
Sent: Tuesday, March 25, 2014 11:05 AM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 

On Mon, Mar 24, 2014 at 6:26 PM, hjcho616 <hjcho...@yahoo.com> wrote:
> I tried the patch twice.  The first time, it worked.  There was no issue.
> The client connected back to the MDS and was happily running.  All three MDS
> daemons were running OK.
>
> The second time, though... all three daemons were alive.  Health was reported
> OK.  However, the client does not connect to the MDS.  The MDS daemon gets the
> following messages over and over again.  192.168.1.30 is one of the OSDs.
> 2014-03-24 20:20:51.722367 7f400c735700  0 cephx: verify_reply couldn't
> decrypt with error: error decoding block for decryption
> 2014-03-24 20:20:51.722392 7f400c735700  0 -- 192.168.1.20:6803/21678 >>
> 192.168.1.30:6806/3796 pipe(0x2be3b80 sd=20 :56656 s=1 pgs=0 cs=0 l=1
> c=0x2bd6840).failed verifying authorize reply

This sounds different than the scenario you initially described, with
a client going to sleep. Exactly what are you doing?

>
> When I restart the MDS (not the OSDs) and run ceph health detail, I do see an
> mds degraded message with a replay.  I restarted the OSDs again and then it
> was OK.  Is there something I can do to prevent this?

That sounds normal -- the MDS has to replay its journal when it
restarts. It shouldn't take too long, but restarting OSDs definitely
won't help since the MDS is trying to read data off of them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
