Re: [ceph-users] MDS crash when client goes to sleep

2014-04-03 Thread Florent B
I'm not sure I will re-test and tell you ;)

On 04/02/2014 04:14 PM, Gregory Farnum wrote:
> A *clean* shutdown? That sounds like a different issue; hjcho616's
> issue only happens when a client wakes back up again.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Wed, Apr 2, 2014 at 6:34 AM, Florent B  wrote:
>> Can someone confirm that this issue is also in the Emperor release (0.72)?
>>
>> I think I have the same problem as hjcho616: Debian Wheezy with 3.13
>> backports, and MDS dying when a client shuts down.
>>
>> On 03/31/2014 11:46 PM, Gregory Farnum wrote:
>>> Yes, Zheng's fix for the MDS crash is in current mainline and will be
>>> in the next Firefly RC release.
>>>
>>> Sage, is there something else we can/should be doing when a client
>>> goes to sleep that we aren't already? (ie, flushing out all dirty data
>>> or something and disconnecting?)
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Thu, Mar 27, 2014 at 4:45 PM, hjcho616  wrote:
>>>> Looks like client is waking up ok now.  Thanks.
>>>>
>>>> Will those fixes be included in next release? Firefly?
>>>>
>>>> Regards,
>>>> Hong


Re: [ceph-users] MDS crash when client goes to sleep

2014-04-02 Thread Gregory Farnum
A *clean* shutdown? That sounds like a different issue; hjcho616's
issue only happens when a client wakes back up again.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 6:34 AM, Florent B  wrote:
> Can someone confirm that this issue is also in the Emperor release (0.72)?
>
> I think I have the same problem as hjcho616: Debian Wheezy with 3.13
> backports, and MDS dying when a client shuts down.
>
> On 03/31/2014 11:46 PM, Gregory Farnum wrote:
>> Yes, Zheng's fix for the MDS crash is in current mainline and will be
>> in the next Firefly RC release.
>>
>> Sage, is there something else we can/should be doing when a client
>> goes to sleep that we aren't already? (ie, flushing out all dirty data
>> or something and disconnecting?)
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Thu, Mar 27, 2014 at 4:45 PM, hjcho616  wrote:
>>> Looks like client is waking up ok now.  Thanks.
>>>
>>> Will those fixes be included in next release? Firefly?
>>>
>>> Regards,
>>> Hong


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-31 Thread Gregory Farnum
Yes, Zheng's fix for the MDS crash is in current mainline and will be
in the next Firefly RC release.

Sage, is there something else we can/should be doing when a client
goes to sleep that we aren't already? (ie, flushing out all dirty data
or something and disconnecting?)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Mar 27, 2014 at 4:45 PM, hjcho616  wrote:
> Looks like client is waking up ok now.  Thanks.
>
> Will those fixes be included in next release? Firefly?
>
> Regards,
> Hong


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-27 Thread hjcho616
Looks like client is waking up ok now.  Thanks.

Will those fixes be included in next release? Firefly?

Regards,
Hong



 From: hjcho616 
To: Gregory Farnum  
Cc: "ceph-users@lists.ceph.com"  
Sent: Tuesday, March 25, 2014 11:56 AM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 


I am merely putting the client to sleep and waking it up.  When it is up, 
running ls on the mounted directory.  As far as I am concerned at very high 
level I am doing the same thing.  All are running 3.13 kernel Debian provided.

When that infinite loop of decrypt error happened, I waited about 10 minutes 
before I restarted MDS.

Last time the MDS crashed, restarting the MDS didn't get it out of that degraded 
mode for hours, until I restarted the OSDs.  That's why I started restarting OSDs 
shortly after the MDS restarts.  Next time I'll try to wait a bit more.

Regards,
Hong



 From: Gregory Farnum 
To: hjcho616  
Cc: Mohd Bazli Ab Karim ; "Yan, Zheng" 
; Sage Weil ; "ceph-users@lists.ceph.com" 
 
Sent: Tuesday, March 25, 2014 11:05 AM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 

On Mon, Mar 24, 2014 at 6:26 PM, hjcho616  wrote:
> I tried the patch twice.  First time, it worked.  There was no issue.
> Connected back to MDS and was happily running.  All three MDS daemons were
> running ok.
>
> Second time though... all three daemons were alive.  Health was reported OK.
> However client does not connect to MDS.  MDS daemon gets following messages
> over and over and over again.  192.168.1.30 is one of the OSD.
> 2014-03-24 20:20:51.722367 7f400c735700  0 cephx: verify_reply couldn't
> decrypt with error: error decoding block for decryption
> 2014-03-24 20:20:51.722392 7f400c735700  0 -- 192.168.1.20:6803/21678 >>
> 192.168.1.30:6806/3796 pipe(0x2be3b80 sd=20 :56656 s=1 pgs=0 cs=0 l=1
> c=0x2bd6840).failed verifying authorize reply

This sounds different than the scenario you initially described, with
a client going to sleep. Exactly what are you doing?

>
> When I restarted the MDS (not OSDs) and ran ceph health detail, I did see an
> mds degraded message with a replay.  I restarted the OSDs again and it was ok.
> Is there something I can do to prevent this?

That sounds normal -- the MDS has to replay its journal when it
restarts. It shouldn't take too long, but restarting OSDs definitely
won't help since the MDS is trying to read data off of them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com





Re: [ceph-users] MDS crash when client goes to sleep

2014-03-25 Thread hjcho616
Very possible.  They were off by about 5 minutes.  I was able to connect back to 
the MDS after I installed NTP on all of them.  The following happened at wake on 
the client (after installing NTP on all nodes).

[137537.630016] libceph: mds0 192.168.1.20:6803 socket closed (con state 
NEGOTIATING)
[137707.372620] ceph: mds0 caps stale
[137707.373504] ceph: mds0 caps renewed
[137807.489200] ceph: mds0 caps stale
[137827.512563] ceph: mds0 caps stale
[137984.482575] ceph: mds0 caps renewed

and the OSD had some heartbeat_map logs, but those seemed to clear themselves up.

And yes, those cephx messages seem to have continued from way before the wake event.

I'll monitor the sleep and wake a few more times and see if it is good.

Thanks.

Regards,
Hong



 From: Gregory Farnum 
To: hjcho616  
Cc: Mohd Bazli Ab Karim ; "Yan, Zheng" 
; Sage Weil ; "ceph-users@lists.ceph.com" 
 
Sent: Tuesday, March 25, 2014 12:59 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 

On Tue, Mar 25, 2014 at 9:56 AM, hjcho616  wrote:
> I am merely putting the client to sleep and waking it up.  When it is up,
> running ls on the mounted directory.  As far as I am concerned at very high
> level I am doing the same thing.  All are running 3.13 kernel Debian
> provided.
>
> When that infinite loop of decrypt error happened, I waited about 10 minutes
> before I restarted MDS.

Okay, but a sleeping client isn't going to impact the OSD/MDS
sessions. Are some of them also hosted on the client and going to
sleep at the same time?
Otherwise, how far out of sync are their clocks? They don't need to be
very close but if they're off by hours it will cause these sorts of
problems with CephX.

> Last time the MDS crashed, restarting the MDS didn't get it out of that degraded
> mode for hours, until I restarted the OSDs.  That's why I started restarting
> OSDs shortly after the MDS restarts.  Next time I'll try to wait a bit more.
Okay, it certainly shouldn't take hours, but that makes me think it
was having the same authentication trouble as your log snippet
previously.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-25 Thread Gregory Farnum
On Tue, Mar 25, 2014 at 9:56 AM, hjcho616  wrote:
> I am merely putting the client to sleep and waking it up.  When it is up,
> running ls on the mounted directory.  As far as I am concerned at very high
> level I am doing the same thing.  All are running 3.13 kernel Debian
> provided.
>
> When that infinite loop of decrypt error happened, I waited about 10 minutes
> before I restarted MDS.

Okay, but a sleeping client isn't going to impact the OSD/MDS
sessions. Are some of them also hosted on the client and going to
sleep at the same time?
Otherwise, how far out of sync are their clocks? They don't need to be
very close but if they're off by hours it will cause these sorts of
problems with CephX.

> Last time the MDS crashed, restarting the MDS didn't get it out of that degraded
> mode for hours, until I restarted the OSDs.  That's why I started restarting
> OSDs shortly after the MDS restarts.  Next time I'll try to wait a bit more.
Okay, it certainly shouldn't take hours, but that makes me think it
was having the same authentication trouble as your log snippet
previously.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
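
To make the clock point concrete: cephx authentication is time-sensitive because
tickets are stamped against one host's clock and checked against another's, so
skew shifts the freshness check. Below is a schematic C++ illustration of that
kind of check only; it is not cephx's actual code, and the Ticket type and the
one-hour window are invented for the sketch.

// Why clock skew breaks time-stamped authentication (schematic only).
#include <cstdio>
#include <ctime>

struct Ticket {
    time_t issued;     // stamped with the issuer's clock
    time_t valid_for;  // validity window in seconds (value invented)
};

// The verifier checks freshness against its *own* clock.
bool ticket_fresh(const Ticket& t, time_t verifier_now) {
    return verifier_now >= t.issued && verifier_now < t.issued + t.valid_for;
}

int main() {
    Ticket t{std::time(nullptr), 3600};
    // Small skew can still land inside the window; large skew pushes the
    // check outside it, and everything downstream (verify_reply, decrypt)
    // starts failing even though the keys themselves are fine.
    std::printf("skew +5m: %s\n", ticket_fresh(t, t.issued + 300)  ? "ok" : "rejected");
    std::printf("skew -2h: %s\n", ticket_fresh(t, t.issued - 7200) ? "ok" : "rejected");
    return 0;
}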


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-25 Thread hjcho616
I am merely putting the client to sleep and waking it up.  When it is up, 
running ls on the mounted directory.  As far as I am concerned at very high 
level I am doing the same thing.  All are running 3.13 kernel Debian provided.

When that infinite loop of decrypt error happened, I waited about 10 minutes 
before I restarted MDS.

Last time the MDS crashed, restarting the MDS didn't get it out of that degraded 
mode for hours, until I restarted the OSDs.  That's why I started restarting OSDs 
shortly after the MDS restarts.  Next time I'll try to wait a bit more.

Regards,
Hong



 From: Gregory Farnum 
To: hjcho616  
Cc: Mohd Bazli Ab Karim ; "Yan, Zheng" 
; Sage Weil ; "ceph-users@lists.ceph.com" 
 
Sent: Tuesday, March 25, 2014 11:05 AM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 

On Mon, Mar 24, 2014 at 6:26 PM, hjcho616  wrote:
> I tried the patch twice.  First time, it worked.  There was no issue.
> Connected back to MDS and was happily running.  All three MDS daemons were
> running ok.
>
> Second time though... all three daemons were alive.  Health was reported OK.
> However client does not connect to MDS.  MDS daemon gets following messages
> over and over and over again.  192.168.1.30 is one of the OSD.
> 2014-03-24 20:20:51.722367 7f400c735700  0 cephx: verify_reply couldn't
> decrypt with error: error decoding block for decryption
> 2014-03-24 20:20:51.722392 7f400c735700  0 -- 192.168.1.20:6803/21678 >>
> 192.168.1.30:6806/3796 pipe(0x2be3b80 sd=20 :56656 s=1 pgs=0 cs=0 l=1
> c=0x2bd6840).failed verifying authorize reply

This sounds different than the scenario you initially described, with
a client going to sleep. Exactly what are you doing?

>
> When I restarted the MDS (not OSDs) and ran ceph health detail, I did see an
> mds degraded message with a replay.  I restarted the OSDs again and it was ok.
> Is there something I can do to prevent this?

That sounds normal -- the MDS has to replay its journal when it
restarts. It shouldn't take too long, but restarting OSDs definitely
won't help since the MDS is trying to read data off of them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-25 Thread Gregory Farnum
On Mon, Mar 24, 2014 at 6:26 PM, hjcho616  wrote:
> I tried the patch twice.  First time, it worked.  There was no issue.
> Connected back to MDS and was happily running.  All three MDS daemons were
> running ok.
>
> Second time though... all three daemons were alive.  Health was reported OK.
> However client does not connect to MDS.  MDS daemon gets following messages
> over and over and over again.  192.168.1.30 is one of the OSD.
> 2014-03-24 20:20:51.722367 7f400c735700  0 cephx: verify_reply couldn't
> decrypt with error: error decoding block for decryption
> 2014-03-24 20:20:51.722392 7f400c735700  0 -- 192.168.1.20:6803/21678 >>
> 192.168.1.30:6806/3796 pipe(0x2be3b80 sd=20 :56656 s=1 pgs=0 cs=0 l=1
> c=0x2bd6840).failed verifying authorize reply

This sounds different than the scenario you initially described, with
a client going to sleep. Exactly what are you doing?

>
> When I restarted the MDS (not OSDs) and ran ceph health detail, I did see an
> mds degraded message with a replay.  I restarted the OSDs again and it was ok.
> Is there something I can do to prevent this?

That sounds normal -- the MDS has to replay its journal when it
restarts. It shouldn't take too long, but restarting OSDs definitely
won't help since the MDS is trying to read data off of them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
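
For context on why restarting OSDs can't speed this up: up:replay means the MDS
is reading its metadata journal, which is stored as objects on the OSDs, and
reapplying each logged operation before going active. A highly simplified sketch
of that idea, with invented types, not the MDS's actual replay code:

// Journal replay in caricature: read the log off the OSDs, reapply it in
// order, then go active. If the OSDs are restarting underneath this, the
// reads stall and replay looks hung. All names here are invented.
#include <cstdio>
#include <string>
#include <vector>

struct JournalEntry { std::string op; };

// Stand-in for fetching journal objects from the OSDs.
std::vector<JournalEntry> read_journal_from_osds() {
    return { {"mkdir /a"}, {"create /a/f"}, {"rename /a/f /a/g"} };
}

int main() {
    for (const auto& e : read_journal_from_osds())
        std::printf("replay: %s\n", e.op.c_str());   // reapply each logged op
    std::puts("mds: replay complete, entering up:active");
    return 0;
}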


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-23 Thread Mohd Bazli Ab Karim
Hi Hong,

Could you apply the patch and see if it crashes after sleep?
This could lead us to find the correct fix for the MDS/client too.

From what I can see here, this patch should fix the crash, but how do we fix the 
MDS if the crash happens?
It happened to us: when it crashed, it crashed completely, and even restarting 
the ceph-mds service with --reset-journal did not help.
Can anyone shed some light on this matter?

p/s: Are there any steps/tools to back up the MDS metadata? Say the MDS crashes 
and refuses to run normally, can we restore the backed-up metadata? I'm thinking 
of it as a preventive step, just in case it happens again in the future.

Many thanks.
Bazli

-Original Message-
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Sunday, March 23, 2014 2:53 PM
To: Sage Weil
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil  wrote:
> Hi,
>
> I looked at this a bit earlier and wasn't sure why we would be getting
> a remote_reset event after a sleep/wake cycle.  The patch should fix
> the crash, but I'm a bit worried something is not quite right on the
> client side, too...
>

When client wakes up, it first tries reconnecting the old session. MDS refuses 
the reconnect request and sends a session close message to the client. After 
receiving the session close message, client closes the old session, then sends 
a session open message to the MDS.  The MDS receives the open request and 
triggers a remote reset
(Pipe.cc:466)

> sage
>
> On Sun, 23 Mar 2014, Yan, Zheng wrote:
>
>> thank you for reporting this. Below patch should fix this issue
>>
>> ---
>> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc index 57c7f4a..6b53c14
>> 100644
>> --- a/src/mds/MDS.cc
>> +++ b/src/mds/MDS.cc
>> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>>if (session->is_closed()) {
>>   dout(3) << "ms_handle_reset closing connection for session " <<
>> session->info.inst << dendl;
>>   messenger->mark_down(con);
>> + con->set_priv(NULL);
>>   sessionmap.remove_session(session);
>>}
>>session->put();
>> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>>if (session->is_closed()) {
>>   dout(3) << "ms_handle_remote_reset closing connection for session "
>> << session->info.inst << dendl;
>>   messenger->mark_down(con);
>> + con->set_priv(NULL);
>>   sessionmap.remove_session(session);
>>}
>>session->put();
>>
>> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
>>  wrote:
>> > Hi Hong,
>> >
>> >
>> >
>> > How's the client now? Would it be able to mount the filesystem now?
>> > It looks similar to our case,
>> > http://www.spinics.net/lists/ceph-devel/msg18395.html
>> >
>> > However, you need to collect some logs to confirm this.
>> >
>> >
>> >
>> > Thanks.
>> >
>> >
>> >
>> >
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, March 21, 2014 2:30 PM
>> >
>> >
>> > To: Luke Jing Yuan
>> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
>> >
>> >
>> >
>> > Luke,
>> >
>> >
>> >
>> > Not sure what flapping ceph-mds daemon mean, but when I connected
>> > to MDS when this happened there no longer was any process with
>> > ceph-mds when I ran one daemon.  When I ran three there were one
>> > left but wasn't doing much.  I didn't record the logs but behavior
>> > was very similar in 0.72 emperor.  I am using debian packages.
>> >
>> >
>> >
>> > Client went to sleep for a while (like 8+ hours).  There was no I/O
>> > prior to the sleep other than the fact that cephfs was still mounted.
>> >
>> >
>> >
>> > Regards,
>> >
>> > Hong
>> >
>> >
>> >
>> > 
>> >
>> > From: Luke Jing Yuan 
>> >
>> >
>> > To: hjcho616 
>> > Cc: Mohd Bazli Ab Karim ;
>> > "ceph-users@lists.ceph.com" 
>> > Sent: Friday, March 21, 2014 1:17 AM
>> >
>> > Subject: RE: [ceph-users] MDS crash when client goes to sleep
>> >
>> >
>> > Hi Hong,
>> >

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-22 Thread Yan, Zheng
On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil  wrote:
> Hi,
>
> I looked at this a bit earlier and wasn't sure why we would be getting a
> remote_reset event after a sleep/wake cycle.  The patch should fix the
> crash, but I'm a bit worried something is not quite right on the client
> side, too...
>

When client wakes up, it first tries reconnecting the old session. MDS
refuses the reconnect request and sends a session close message to the
client. After receiving the session close message, client closes the
old session, then sends a session open message to the MDS.  The MDS
receives the open request and triggers a remote reset
(Pipe.cc:466)
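
To make the sequence concrete, here is a minimal C++ sketch of the handshake
described above; the message names and functions are invented for illustration
and are not Ceph's actual client or MDS code:

// Schematic model of the wake-up handshake: reconnect -> close -> open,
// ending in a remote reset on the MDS side. All names are invented.
#include <cstdio>

enum class Msg { Reconnect, Close, Open, None };

// MDS side: an MDS that is already up:active refuses a late reconnect
// and asks the client to close its stale session.
Msg mds_receive(Msg m) {
    switch (m) {
    case Msg::Reconnect:
        std::puts("mds: no longer in reconnect state, sending close");
        return Msg::Close;
    case Msg::Open:
        // The open arrives while the MDS messenger still holds state for
        // the old connection, so it surfaces as a remote reset (the path
        // that ends in ms_handle_remote_reset).
        std::puts("mds: ms_handle_remote_reset on existing session state");
        return Msg::None;
    default:
        return Msg::None;
    }
}

int main() {
    std::puts("client: waking up, trying to reconnect the old session");
    if (mds_receive(Msg::Reconnect) == Msg::Close) {
        std::puts("client: closing old session, sending open");
        mds_receive(Msg::Open);
    }
    return 0;
}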

> sage
>
> On Sun, 23 Mar 2014, Yan, Zheng wrote:
>
>> thank you for reporting this. Below patch should fix this issue
>>
>> ---
>> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
>> index 57c7f4a..6b53c14 100644
>> --- a/src/mds/MDS.cc
>> +++ b/src/mds/MDS.cc
>> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>>if (session->is_closed()) {
>>   dout(3) << "ms_handle_reset closing connection for session " <<
>> session->info.inst << dendl;
>>   messenger->mark_down(con);
>> + con->set_priv(NULL);
>>   sessionmap.remove_session(session);
>>}
>>session->put();
>> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>>if (session->is_closed()) {
>>   dout(3) << "ms_handle_remote_reset closing connection for session "
>> << session->info.inst << dendl;
>>   messenger->mark_down(con);
>> + con->set_priv(NULL);
>>   sessionmap.remove_session(session);
>>}
>>session->put();
>>
>> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
>>  wrote:
>> > Hi Hong,
>> >
>> >
>> >
>> > How's the client now? Would it be able to mount the filesystem now? It looks
>> > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
>> >
>> > However, you need to collect some logs to confirm this.
>> >
>> >
>> >
>> > Thanks.
>> >
>> >
>> >
>> >
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, March 21, 2014 2:30 PM
>> >
>> >
>> > To: Luke Jing Yuan
>> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
>> >
>> >
>> >
>> > Luke,
>> >
>> >
>> >
>> > Not sure what flapping ceph-mds daemon mean, but when I connected to MDS
>> > when this happened there no longer was any process with ceph-mds when I ran
>> > one daemon.  When I ran three there were one left but wasn't doing much.  I
>> > didn't record the logs but behavior was very similar in 0.72 emperor.  I am
>> > using debian packages.
>> >
>> >
>> >
>> > Client went to sleep for a while (like 8+ hours).  There was no I/O prior 
>> > to
>> > the sleep other than the fact that cephfs was still mounted.
>> >
>> >
>> >
>> > Regards,
>> >
>> > Hong
>> >
>> >
>> >
>> > 
>> >
>> > From: Luke Jing Yuan 
>> >
>> >
>> > To: hjcho616 
>> > Cc: Mohd Bazli Ab Karim ;
>> > "ceph-users@lists.ceph.com" 
>> > Sent: Friday, March 21, 2014 1:17 AM
>> >
>> > Subject: RE: [ceph-users] MDS crash when client goes to sleep
>> >
>> >
>> > Hi Hong,
>> >
>> > That's interesting; for Mr. Bazli and me, we ended up with the MDS stuck in
>> > (up:replay) and a flapping ceph-mds daemon, but then again we are using
>> > version 0.72.2. Having said so, the triggering point seems similar for us as
>> > well, which is the following line:
>> >
>> >   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 
>> > 192.168.1.20:6801/17079
>> >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0
>> > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>> >
>> > So how long did your client go into sleep? Was there any I/O prior to the
>> > sleep?
>> >
>> > Regards,
>> > Luke
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, 21 Ma

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-22 Thread Sage Weil
Hi,

I looked at this a bit earlier and wasn't sure why we would be getting a 
remote_reset event after a sleep/wake cycle.  The patch should fix the 
crash, but I'm a bit worried something is not quite right on the client 
side, too...

sage

On Sun, 23 Mar 2014, Yan, Zheng wrote:

> thank you for reporting this. Below patch should fix this issue
> 
> ---
> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
> index 57c7f4a..6b53c14 100644
> --- a/src/mds/MDS.cc
> +++ b/src/mds/MDS.cc
> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>if (session->is_closed()) {
>   dout(3) << "ms_handle_reset closing connection for session " <<
> session->info.inst << dendl;
>   messenger->mark_down(con);
> + con->set_priv(NULL);
>   sessionmap.remove_session(session);
>}
>session->put();
> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>if (session->is_closed()) {
>   dout(3) << "ms_handle_remote_reset closing connection for session "
> << session->info.inst << dendl;
>   messenger->mark_down(con);
> + con->set_priv(NULL);
>   sessionmap.remove_session(session);
>}
>session->put();
> 
> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
>  wrote:
> > Hi Hong,
> >
> >
> >
> > How's the client now? Would it be able to mount the filesystem now? It looks
> > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
> >
> > However, you need to collect some logs to confirm this.
> >
> >
> >
> > Thanks.
> >
> >
> >
> >
> >
> > From: hjcho616 [mailto:hjcho...@yahoo.com]
> > Sent: Friday, March 21, 2014 2:30 PM
> >
> >
> > To: Luke Jing Yuan
> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
> >
> >
> >
> > Luke,
> >
> >
> >
> > Not sure what flapping ceph-mds daemon mean, but when I connected to MDS
> > when this happened there no longer was any process with ceph-mds when I ran
> > one daemon.  When I ran three there were one left but wasn't doing much.  I
> > didn't record the logs but behavior was very similar in 0.72 emperor.  I am
> > using debian packages.
> >
> >
> >
> > Client went to sleep for a while (like 8+ hours).  There was no I/O prior to
> > the sleep other than the fact that cephfs was still mounted.
> >
> >
> >
> > Regards,
> >
> > Hong
> >
> >
> >
> > 
> >
> > From: Luke Jing Yuan 
> >
> >
> > To: hjcho616 
> > Cc: Mohd Bazli Ab Karim ;
> > "ceph-users@lists.ceph.com" 
> > Sent: Friday, March 21, 2014 1:17 AM
> >
> > Subject: RE: [ceph-users] MDS crash when client goes to sleep
> >
> >
> > Hi Hong,
> >
> > That's interesting; for Mr. Bazli and me, we ended up with the MDS stuck in
> > (up:replay) and a flapping ceph-mds daemon, but then again we are using
> > version 0.72.2. Having said so, the triggering point seems similar for us as
> > well, which is the following line:
> >
> >   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079
> >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0
> > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
> >
> > So how long did your client go into sleep? Was there any I/O prior to the
> > sleep?
> >
> > Regards,
> > Luke
> >
> > From: hjcho616 [mailto:hjcho...@yahoo.com]
> > Sent: Friday, 21 March, 2014 12:09 PM
> > To: Luke Jing Yuan
> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
> >
> > Nope just these segfaults.
> >
> > [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp
> > 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
> > [211263.265402] ceph-mds[17135]: segfault at 200 ip 00007f59eec280b8 sp
> > 00007f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
> > [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp
> > 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
> > [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp
> > 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
> > [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp
>

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-22 Thread Yan, Zheng
thank you for reporting this. Below patch should fix this issue

---
diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
index 57c7f4a..6b53c14 100644
--- a/src/mds/MDS.cc
+++ b/src/mds/MDS.cc
@@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
   if (session->is_closed()) {
  dout(3) << "ms_handle_reset closing connection for session " <<
session->info.inst << dendl;
  messenger->mark_down(con);
+ con->set_priv(NULL);
  sessionmap.remove_session(session);
   }
   session->put();
@@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
   if (session->is_closed()) {
  dout(3) << "ms_handle_remote_reset closing connection for session "
<< session->info.inst << dendl;
  messenger->mark_down(con);
+ con->set_priv(NULL);
  sessionmap.remove_session(session);
   }
   session->put();

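For readers wondering what the added con->set_priv(NULL) lines buy: the
connection holds a private pointer to the session, and clearing it while the
session is being torn down means a later event on the same connection cannot
hand back a stale pointer. Below is a minimal sketch of that pattern, with
invented types standing in for Ceph's refcounted Connection and Session (the
real code uses a session map and reference counts, not a raw delete):

// Minimal sketch of the pattern the patch above enforces (invented types).
#include <cstdio>

struct Session { bool closed = true; };

struct Connection {
    Session* priv = nullptr;                 // like Connection::priv above
    void set_priv(Session* s) { priv = s; }
    Session* get_priv() const { return priv; }
};

void handle_reset(Connection* con) {
    Session* s = con->get_priv();
    if (s && s->closed) {
        std::puts("closing connection for closed session");
        con->set_priv(nullptr);              // the one-line fix
        delete s;                            // ~ sessionmap.remove_session()
    }
}

int main() {
    Connection con;
    con.set_priv(new Session);
    handle_reset(&con);   // tears the session down and clears priv
    handle_reset(&con);   // a second reset is now harmless: priv is null;
                          // without the fix it would touch freed memory
    return 0;
}
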
On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
 wrote:
> Hi Hong,
>
>
>
> How's the client now? Would it be able to mount the filesystem now? It looks
> similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
>
> However, you need to collect some logs to confirm this.
>
>
>
> Thanks.
>
>
>
>
>
> From: hjcho616 [mailto:hjcho...@yahoo.com]
> Sent: Friday, March 21, 2014 2:30 PM
>
>
> To: Luke Jing Yuan
> Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
>
>
> Luke,
>
>
>
> Not sure what flapping ceph-mds daemon mean, but when I connected to MDS
> when this happened there no longer was any process with ceph-mds when I ran
> one daemon.  When I ran three there were one left but wasn't doing much.  I
> didn't record the logs but behavior was very similar in 0.72 emperor.  I am
> using debian packages.
>
>
>
> Client went to sleep for a while (like 8+ hours).  There was no I/O prior to
> the sleep other than the fact that cephfs was still mounted.
>
>
>
> Regards,
>
> Hong
>
>
>
> 
>
> From: Luke Jing Yuan 
>
>
> To: hjcho616 
> Cc: Mohd Bazli Ab Karim ;
> "ceph-users@lists.ceph.com" 
> Sent: Friday, March 21, 2014 1:17 AM
>
> Subject: RE: [ceph-users] MDS crash when client goes to sleep
>
>
> Hi Hong,
>
> That's interesting; for Mr. Bazli and me, we ended up with the MDS stuck in
> (up:replay) and a flapping ceph-mds daemon, but then again we are using
> version 0.72.2. Having said so, the triggering point seems similar for us as
> well, which is the following line:
>
>   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0
> c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>
> So how long did your client go into sleep? Was there any I/O prior to the
> sleep?
>
> Regards,
> Luke
>
> From: hjcho616 [mailto:hjcho...@yahoo.com]
> Sent: Friday, 21 March, 2014 12:09 PM
> To: Luke Jing Yuan
> Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
> Nope just these segfaults.
>
> [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp
> 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
> [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp
> 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
> [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp
> 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
> [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp
> 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
> [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp
> 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]
>
> Regards,
> Hong
>
> 
> From: Luke Jing Yuan 
> To: hjcho616 
> Cc: Mohd Bazli Ab Karim ;
> "ceph-users@lists.ceph.com" 
> Sent: Thursday, March 20, 2014 10:53 PM
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
> Did you see any messages in dmesg saying ceph-mds respawning or stuff like
> that?
>
> Regards,
> Luke
>
> On Mar 21, 2014, at 11:09 AM, "hjcho616"  wrote:
> On client, I was no longer able to access the filesystem.  It would hang.
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same
> machine.  Two crashes and one appears to be hung up(?). ceph health says MDS
> is in degraded state when that happened.
>
> I was able to recover by restarting every node.  I

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-21 Thread Mohd Bazli Ab Karim
Hi Hong,

How's the client now? Would it be able to mount the filesystem now? It looks 
similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
However, you need to collect some logs to confirm this.

Thanks.


From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, March 21, 2014 2:30 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Luke,

Not sure what flapping ceph-mds daemon mean, but when I connected to MDS when 
this happened there no longer was any process with ceph-mds when I ran one 
daemon.  When I ran three there were one left but wasn't doing much.  I didn't 
record the logs but behavior was very similar in 0.72 emperor.  I am using 
debian packages.

Client went to sleep for a while (like 8+ hours).  There was no I/O prior to 
the sleep other than the fact that cephfs was still mounted.

Regards,
Hong


From: Luke Jing Yuan <jyl...@mimos.my>
To: hjcho616 <hjcho...@yahoo.com>
Cc: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>; "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Sent: Friday, March 21, 2014 1:17 AM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,

That's interesting; for Mr. Bazli and me, we ended up with the MDS stuck in 
(up:replay) and a flapping ceph-mds daemon, but then again we are using version 
0.72.2. Having said so, the triggering point seems similar for us as well, which 
is the following line:

  -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 >> 
192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION

So how long did your client go into sleep? Was there any I/O prior to the sleep?

Regards,
Luke

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Nope just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 
7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 
7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
[214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp 
7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
[289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp 
7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
[373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp 
7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]

Regards,
Hong


From: Luke Jing Yuan <jyl...@mimos.my>
To: hjcho616 <hjcho...@yahoo.com>
Cc: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>; "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Sent: Thursday, March 20, 2014 10:53 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Did you see any messages in dmesg saying ceph-mds respawnning or stuffs like 
that?

Regards,
Luke

On Mar 21, 2014, at 11:09 AM, "hjcho616" <hjcho...@yahoo.com> wrote:
On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.

I was able to recover by restarting every node.  I currently have three 
machine, one with MDS and MON, and two with OSDs.

It is failing every time my client machine goes to sleep.  If you need me to run 
something let me know what and how.

Regards,
Hong


From: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>
To: hjcho616 <hjcho...@yahoo.com>; "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to 
recover from replay?
We are also facing this issue and I am interested to know how to reproduce it.

Thanks.
Bazli

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread hjcho616
Luke,

Not sure what flapping ceph-mds daemon mean, but when I connected to MDS when 
this happened there no longer was any process with ceph-mds when I ran one 
daemon.  When I ran three there were one left but wasn't doing much.  I didn't 
record the logs but behavior was very similar in 0.72 emperor.  I am using 
debian packages.

Client went to sleep for a while (like 8+ hours).  There was no I/O prior to 
the sleep other than the fact that cephfs was still mounted.

Regards,
Hong



 From: Luke Jing Yuan 
To: hjcho616  
Cc: Mohd Bazli Ab Karim ; "ceph-users@lists.ceph.com" 
 
Sent: Friday, March 21, 2014 1:17 AM
Subject: RE: [ceph-users] MDS crash when client goes to sleep
 

Hi Hong,

That's interesting; for Mr. Bazli and me, we ended up with the MDS stuck in 
(up:replay) and a flapping ceph-mds daemon, but then again we are using version 
0.72.2. Having said so, the triggering point seems similar for us as well, which 
is the following line:

   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION

So how long did your client go into sleep? Was there any I/O prior to the sleep?

Regards,
Luke

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Nope just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 
7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 
7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
[214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp 
7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
[289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp 
7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
[373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp 
7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]

Regards,
Hong


From: Luke Jing Yuan 
To: hjcho616 
Cc: Mohd Bazli Ab Karim ; "ceph-users@lists.ceph.com" 

Sent: Thursday, March 20, 2014 10:53 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Did you see any messages in dmesg saying ceph-mds respawning or stuff like 
that?

Regards,
Luke

On Mar 21, 2014, at 11:09 AM, "hjcho616"  wrote:
On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.

I was able to recover by restarting every node.  I currently have three 
machine, one with MDS and MON, and two with OSDs.

It is failing every time my client machine goes to sleep.  If you need me to run 
something let me know what and how.

Regards,
Hong


From: Mohd Bazli Ab Karim 
To: hjcho616 ; "ceph-users@lists.ceph.com" 

Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to 
recover from replay?
We are also facing this issue and I am interested to know how to reproduce it.

Thanks.
Bazli

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] MDS crash when client goes to sleep

When CephFS is mounted on a client and when client decides to go to sleep, MDS 
segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened 
in emperor and recent 0.77 release.  I am running Debian Wheezy with testing 
kernels 3.13.  What can I do to not crash the whole system if a client goes to 
sleep (and looks like disconnect may do the same)? Let me know if you need any 
more info.

Regards,
Hong

   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
-- ?+0 0x1ee9f080 con 0x2e56580
   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== mon.0 192.168.1.20:6789/0 21764  mdsbeacon(6798/MDS1.2 up:active seq 
21120 v6970) v2  108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
caps per inode
   -40> 2014-03-20 20:08:44.4949

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread Luke Jing Yuan
Hi Hong,

That's interesting; for Mr. Bazli and me, we ended up with the MDS stuck in 
(up:replay) and a flapping ceph-mds daemon, but then again we are using version 
0.72.2. Having said so, the triggering point seems similar for us as well, which 
is the following line:

   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION

So how long did your client go into sleep? Was there any I/O prior to the sleep?

Regards,
Luke
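
Mechanically, that log line is the accept side seeing a peer whose connect_seq
claims an old session while it has no matching state of its own, so it answers
RESETSESSION. A schematic sketch of the decision follows, with invented names
rather than the actual Pipe.cc code:

// The accept-side decision behind "we reset (peer sent cseq N),
// sending RESETSESSION" (schematic; names invented).
#include <cstdio>

enum class Reply { Accept, ResetSession };

Reply accept_connection(unsigned peer_connect_seq, bool have_existing_state) {
    // A nonzero connect_seq means the peer thinks an old session exists.
    if (peer_connect_seq > 0 && !have_existing_state) {
        std::printf("accept we reset (peer sent cseq %u), sending RESETSESSION\n",
                    peer_connect_seq);
        return Reply::ResetSession;
    }
    return Reply::Accept;
}

int main() {
    // A client that slept through its session wakes with cseq 2 while the
    // MDS side has already dropped the old pipe.
    accept_connection(2, /*have_existing_state=*/false);
    return 0;
}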

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Nope just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 
7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 
7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
[214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp 
7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
[289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp 
7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
[373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp 
7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]

Regards,
Hong


From: Luke Jing Yuan 
To: hjcho616 
Cc: Mohd Bazli Ab Karim ; "ceph-users@lists.ceph.com" 

Sent: Thursday, March 20, 2014 10:53 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Did you see any messages in dmesg saying ceph-mds respawning or stuff like 
that?

Regards,
Luke

On Mar 21, 2014, at 11:09 AM, "hjcho616"  wrote:
On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.

I was able to recover by restarting every node.  I currently have three 
machine, one with MDS and MON, and two with OSDs.

It is failing every time my client machine goes to sleep.  If you need me to run 
something let me know what and how.

Regards,
Hong


From: Mohd Bazli Ab Karim 
To: hjcho616 ; "ceph-users@lists.ceph.com" 

Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to 
recover from replay?
We are also facing this issue and I am interested to know how to reproduce it.

Thanks.
Bazli

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] MDS crash when client goes to sleep

When CephFS is mounted on a client and when client decides to go to sleep, MDS 
segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened 
in emperor and recent 0.77 release.  I am running Debian Wheezy with testing 
kernels 3.13.  What can I do to not crash the whole system if a client goes to 
sleep (and looks like disconnect may do the same)? Let me know if you need any 
more info.

Regards,
Hong

   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
-- ?+0 0x1ee9f080 con 0x2e56580
   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== mon.0 192.168.1.20:6789/0 21764  mdsbeacon(6798/MDS1.2 up:active seq 
21120 v6970) v2  108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
caps per inode
   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept sd=18 
192.168.1.101:52026/0
   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52026/0)
   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread hjcho616
Nope just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 
7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 
7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
[214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp 
7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
[289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp 
7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
[373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp 
7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]

Regards,
Hong



 From: Luke Jing Yuan 
To: hjcho616  
Cc: Mohd Bazli Ab Karim ; "ceph-users@lists.ceph.com" 
 
Sent: Thursday, March 20, 2014 10:53 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 


Did you see any messages in dmesg saying ceph-mds respawning or stuff like 
that?

Regards, 
Luke

On Mar 21, 2014, at 11:09 AM, "hjcho616"  wrote:


On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.
>
>
>I was able to recover by restarting every node.  I currently have three 
>machine, one with MDS and MON, and two with OSDs.
>
>
>It is failing every time my client machine goes to sleep.  If you need me to 
>run something let me know what and how.
>
>
>Regards,
>Hong
>
>
>
>
> From: Mohd Bazli Ab Karim 
>To: hjcho616 ; "ceph-users@lists.ceph.com" 
> 
>Sent: Thursday, March 20, 2014 9:40 PM
>Subject: RE: [ceph-users] MDS crash when client goes to sleep
>
>
>
> 
>Hi Hong,
>May I know what has happened to your MDS once it crashed? Was it able to 
>recover from replay?
>We are also facing this issue and I am interested to know how to reproduce it.
> 
>Thanks.
>Bazli
> 
>From: ceph-users-boun...@lists.ceph.com 
>[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
>Sent: Friday, March 21, 2014 10:29 AM
>To: ceph-users@lists.ceph.com
>Subject: [ceph-users] MDS crash when client goes to sleep
> 
>When CephFS is mounted on a client and when client decides to go to sleep, MDS 
>segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened 
>in emperor and recent 0.77 release.  I am running Debian Wheezy with testing 
>kernels 3.13.  What can I do to not crash the whole system if a client goes to 
>sleep (and looks like disconnect may do the same)? Let me know if you need any 
>more info.
> 
>Regards,
>Hong
> 
>   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
>--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
>-- ?+0 0x1ee9f080 con 0x2e56580
>   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
><== mon.0 192.168.1.20:6789/0 21764  mdsbeacon(6798/MDS1.2 up:active seq 
>21120 v6970) v2  108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
>   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
>check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
>baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
>caps per inode
>   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>>> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept 
>sd=18 192.168.1.101:52026/0
>   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
>192.168.1.101:52026/0)
>   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 
>c=0x1f0e2160).fault 0: Success
>   -36> 2014-03-20 20:08:44.496099 7fee411d4700  5 mds.0.35 ms_handle_reset on 
>192.168.1.101:0/2113152127
>   -35> 2014-03-20 20:08:44.496120 7fee411d4700  3 mds.0.35 ms_handle_reset 
>closing connection for session client.6019 192.168.1.101:0/2113152127
>   -34> 2014-03-20 20:08:44.496207 7fee411d4700  1 -- 192.168.1.20:6801/17079 
>mark_down 0x1f0e2160 -- pipe dne
>  

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread Luke Jing Yuan
Did you see any messages in dmesg saying ceph-mds respawning or stuff like 
that?

Regards,
Luke

On Mar 21, 2014, at 11:09 AM, "hjcho616" <hjcho...@yahoo.com> wrote:

On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.

I was able to recover by restarting every node.  I currently have three 
machine, one with MDS and MON, and two with OSDs.

It is failing every time my client machine goes to sleep.  If you need me to run 
something let me know what and how.

Regards,
Hong


From: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>
To: hjcho616 <hjcho...@yahoo.com>; "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to 
recover from replay?
We are also facing this issue and I am interested to know how to reproduce it.

Thanks.
Bazli

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] MDS crash when client goes to sleep

When CephFS is mounted on a client and when client decides to go to sleep, MDS 
segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened 
in emperor and recent 0.77 release.  I am running Debian Wheezy with testing 
kernels 3.13.  What can I do to not crash the whole system if a client goes to 
sleep (and looks like disconnect may do the same)? Let me know if you need any 
more info.

Regards,
Hong

   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
-- ?+0 0x1ee9f080 con 0x2e56580
   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== mon.0 192.168.1.20:6789/0 21764  mdsbeacon(6798/MDS1.2 up:active seq 
21120 v6970) v2  108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
caps per inode
   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept sd=18 
192.168.1.101:52026/0
   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52026/0)
   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 
c=0x1f0e2160).fault 0: Success
   -36> 2014-03-20 20:08:44.496099 7fee411d4700  5 mds.0.35 ms_handle_reset on 
192.168.1.101:0/2113152127
   -35> 2014-03-20 20:08:44.496120 7fee411d4700  3 mds.0.35 ms_handle_reset 
closing connection for session client.6019 192.168.1.101:0/2113152127
   -34> 2014-03-20 20:08:44.496207 7fee411d4700  1 -- 192.168.1.20:6801/17079 
mark_down 0x1f0e2160 -- pipe dne
   -33> 2014-03-20 20:08:44.653628 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept sd=18 
192.168.1.101:52027/0
   -32> 2014-03-20 20:08:44.653677 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e22c0).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52027/0)
   -31> 2014-03-20 20:08:44.925618 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== client.6019 192.168.1.101:0/2113152127 1  client_reconnect(77349 caps) 
v2  0+0+11032578 (0 0 3293767716) 0x2e92780 con 0x1f0e22c0
   -30> 2014-03-20 20:08:44.925682 7fee411d4700  1 mds.0.server  no longer in 
reconnect state, ignoring reconnect, sending close
   -29> 2014-03-20 20:08:44.925735 7fee411d4700  0 log [INF] : denied reconnect 
attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 
2014-03-20 20:08:44.925679 (allowed inter

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread hjcho616
On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS daemons on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.

I was able to recover by restarting every node.  I currently have three 
machine, one with MDS and MON, and two with OSDs.

It is failing every time my client machine goes to sleep.  If you need me to run 
something let me know what and how.

Regards,
Hong



 From: Mohd Bazli Ab Karim 
To: hjcho616 ; "ceph-users@lists.ceph.com" 
 
Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep
 


 
Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to 
recover from replay?
We also facing this issue and I am interested to know on how to reproduce it.
 
Thanks.
Bazli
 
From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] MDS crash when client goes to sleep
 
When CephFS is mounted on a client and when client decides to go to sleep, MDS 
segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened 
in emperor and recent 0.77 release.  I am running Debian Wheezy with testing 
kernels 3.13.  What can I do to not crash the whole system if a client goes to 
sleep (and looks like disconnect may do the same)? Let me know if you need any 
more info.
 
Regards,
Hong
 
   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
-- ?+0 0x1ee9f080 con 0x2e56580
   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== mon.0 192.168.1.20:6789/0 21764  mdsbeacon(6798/MDS1.2 up:active seq 
21120 v6970) v2  108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
caps per inode
   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept sd=18 
192.168.1.101:52026/0
   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52026/0)
   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 
c=0x1f0e2160).fault 0: Success
   -36> 2014-03-20 20:08:44.496099 7fee411d4700  5 mds.0.35 ms_handle_reset on 
192.168.1.101:0/2113152127
   -35> 2014-03-20 20:08:44.496120 7fee411d4700  3 mds.0.35 ms_handle_reset 
closing connection for session client.6019 192.168.1.101:0/2113152127
   -34> 2014-03-20 20:08:44.496207 7fee411d4700  1 -- 192.168.1.20:6801/17079 
mark_down 0x1f0e2160 -- pipe dne
   -33> 2014-03-20 20:08:44.653628 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept sd=18 
192.168.1.101:52027/0
   -32> 2014-03-20 20:08:44.653677 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e22c0).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52027/0)
   -31> 2014-03-20 20:08:44.925618 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== client.6019 192.168.1.101:0/2113152127 1  client_reconnect(77349 caps) 
v2  0+0+11032578 (0 0 3293767716) 0x2e92780 con 0x1f0e22c0
   -30> 2014-03-20 20:08:44.925682 7fee411d4700  1 mds.0.server  no longer in 
reconnect state, ignoring reconnect, sending close
   -29> 2014-03-20 20:08:44.925735 7fee411d4700  0 log [INF] : denied reconnect 
attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 
2014-03-20 20:08:44.925679 (allowed interval 45)
   -28> 2014-03-20 20:08:44.925748 7fee411d4700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.101:0/2113152127 -- client_session(close) v1 -- ?+0 0x3ea6540 con 
0x1f0e22c0
   -27> 2014-03-20 20:08:44.927727 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
c=0x1f0e22c0).reader couldn't read tag, Success
   -26> 2014-03-

Re: [ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread Mohd Bazli Ab Karim
Hi Hong,
May I know what happened to your MDS once it crashed? Was it able to recover
from replay?
We are also facing this issue, and I am interested to know how to reproduce it.

Thanks.
Bazli
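
For reproducing it, the reports in this thread boil down to: mount CephFS on
a client, do some I/O, suspend the client, wake it, and touch the mount again.
A rough Python sketch of that sequence (hypothetical mount point; assumes
pm-utils' pm-suspend, as on Wheezy -- on systemd machines "systemctl suspend"
would replace it; run as root):

    #!/usr/bin/env python
    # Rough reproduction sketch for the sleep/wake MDS crash -- untested.
    import subprocess

    MOUNT = "/mnt/cephfs"  # assumption: adjust to your CephFS mount point

    # Generate some dirty state so the client holds caps on files.
    subprocess.check_call(
        ["dd", "if=/dev/zero", "of=%s/sleep-test" % MOUNT, "bs=1M", "count=64"])
    subprocess.check_call(["sync"])

    # Suspend; pm-suspend returns only after the machine resumes,
    # so wake the box manually (lid, power button, WoL).
    subprocess.check_call(["pm-suspend"])

    # Any access after resume triggers the client reconnect that the
    # MDS logs in this thread show being denied.
    subprocess.check_call(["ls", "-l", MOUNT])

Watch the MDS log on the server while the last command runs.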


[ceph-users] MDS crash when client goes to sleep

2014-03-20 Thread hjcho616
When CephFS is mounted on a client and the client goes to sleep, the MDS
segfaults. Has anyone seen this? Below is part of the MDS log. This happens
on both the Emperor release and the recent 0.77 release. I am running Debian
Wheezy with the 3.13 testing kernel. What can I do to keep the whole system
from crashing when a client goes to sleep (it looks like a disconnect may do
the same)? Let me know if you need any more info.

Regards,
Hong

   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
-- ?+0 0x1ee9f080 con 0x2e56580
   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== mon.0 192.168.1.20:6789/0 21764  mdsbeacon(6798/MDS1.2 up:active seq 
21120 v6970) v2  108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
caps per inode
   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept sd=18 
192.168.1.101:52026/0
   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52026/0)
   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 
c=0x1f0e2160).fault 0: Success
   -36> 2014-03-20 20:08:44.496099 7fee411d4700  5 mds.0.35 ms_handle_reset on 
192.168.1.101:0/2113152127
   -35> 2014-03-20 20:08:44.496120 7fee411d4700  3 mds.0.35 ms_handle_reset 
closing connection for session client.6019 192.168.1.101:0/2113152127
   -34> 2014-03-20 20:08:44.496207 7fee411d4700  1 -- 192.168.1.20:6801/17079 
mark_down 0x1f0e2160 -- pipe dne
   -33> 2014-03-20 20:08:44.653628 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>> :/0 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept sd=18 
192.168.1.101:52027/0
   -32> 2014-03-20 20:08:44.653677 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
c=0x1f0e22c0).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
192.168.1.101:52027/0)
   -31> 2014-03-20 20:08:44.925618 7fee411d4700  1 -- 192.168.1.20:6801/17079 
<== client.6019 192.168.1.101:0/2113152127 1  client_reconnect(77349 caps) 
v2  0+0+11032578 (0 0 3293767716) 0x2e92780 con 0x1f0e22c0
   -30> 2014-03-20 20:08:44.925682 7fee411d4700  1 mds.0.server  no longer in 
reconnect state, ignoring reconnect, sending close
   -29> 2014-03-20 20:08:44.925735 7fee411d4700  0 log [INF] : denied reconnect 
attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 
2014-03-20 20:08:44.925679 (allowed interval 45)
   -28> 2014-03-20 20:08:44.925748 7fee411d4700  1 -- 192.168.1.20:6801/17079 
--> 192.168.1.101:0/2113152127 -- client_session(close) v1 -- ?+0 0x3ea6540 con 
0x1f0e22c0
   -27> 2014-03-20 20:08:44.927727 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
c=0x1f0e22c0).reader couldn't read tag, Success
   -26> 2014-03-20 20:08:44.927797 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
c=0x1f0e22c0).fault 0: Success
   -25> 2014-03-20 20:08:44.927849 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
c=0x1f0e22c0).fault, server, going to standby
   -24> 2014-03-20 20:08:46.372279 7fee401d2700 10 monclient: tick
   -23> 2014-03-20 20:08:46.372339 7fee401d2700 10 monclient: 
_check_auth_rotating have uptodate secrets (they expire after 2014-03-20 
20:08:16.372333)
   -22> 2014-03-20 20:08:46.372373 7fee401d2700 10 monclient: renew subs? (now: 
2014-03-20 20:08:46.372372; renew after: 2014-03-20 20:09:56.370811) -- no
   -21> 2014-03-20 20:08:46.372403 7fee401d2700 10  log_queue is 1 last_log 2 
sent 1 num 1 unsent 1 sending 1
   -20> 2014-03-20 20:08:46.372421 7fee401d2700 10  will send 2014-03-20 
20:08:44.925741 mds.0 192.168.1.20:6801/17079 2 : [INF] denied reconnect 
attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 
2014-03-20 20:08:44.925679 (allowed interval 45)
   -19> 2014-03-20 20:08:46.372466 7fee401d2700 10 monclient: _send_mon_message 
to mon.MDS1 at 192.168.1.20:6789/0
   -18> 2014-03-20 20:08:46.372483 7fee401d2700