Nope just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 00007f09de9d60b8 sp 
00007f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 00007f59eec280b8 sp 
00007f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
[214638.927759] ceph-mds[16896]: segfault at 200 ip 00007fcb2c89e0b8 sp 
00007fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
[289338.461271] ceph-mds[20878]: segfault at 200 ip 00007f4b7211c0b8 sp 
00007f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
[373738.961475] ceph-mds[21341]: segfault at 200 ip 00007f36c3d480b8 sp 
00007f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]

Regards,
Hong


________________________________
 From: Luke Jing Yuan <jyl...@mimos.my>
To: hjcho616 <hjcho...@yahoo.com> 
Cc: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>; "ceph-users@lists.ceph.com" 
<ceph-users@lists.ceph.com> 
Sent: Thursday, March 20, 2014 10:53 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
 


Did you see any messages in dmesg saying ceph-mds respawnning or stuffs like 
that?

Regards, 
Luke

On Mar 21, 2014, at 11:09 AM, "hjcho616" <hjcho...@yahoo.com> wrote:


On client, I was no longer able to access the filesystem.  It would hang.  
Makes sense since MDS has crashed.  I tried running 3 MDS demon on the same 
machine.  Two crashes and one appears to be hung up(?). ceph health says MDS is 
in degraded state when that happened.
>
>
>I was able to recover by restarting every node.  I currently have three 
>machine, one with MDS and MON, and two with OSDs.
>
>
>It is failing everytime my client machine goes to sleep.  If you need me to 
>run something let me know what and how.
>
>
>Regards,
>Hong
>
>
>
>________________________________
> From: Mohd Bazli Ab Karim <bazli.abka...@mimos.my>
>To: hjcho616 <hjcho...@yahoo.com>; "ceph-users@lists.ceph.com" 
><ceph-users@lists.ceph.com> 
>Sent: Thursday, March 20, 2014 9:40 PM
>Subject: RE: [ceph-users] MDS crash when client goes to sleep
>
>
>
> 
>Hi Hong,
>May I know what has happened to your MDS once it crashed? Was it able to 
>recover from replay?
>We also facing this issue and I am interested to know on how to reproduce it.
> 
>Thanks.
>Bazli
> 
>From:ceph-users-boun...@lists.ceph.com 
>[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
>Sent: Friday, March 21, 2014 10:29 AM
>To: ceph-users@lists.ceph.com
>Subject: [ceph-users] MDS crash when client goes to sleep
> 
>When CephFS is mounted on a client and when client decides to go to sleep, MDS 
>segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened 
>in emperor and recent 0.77 release.  I am running Debian Wheezy with testing 
>kernels 3.13.  What can I do to not crash the whole system if a client goes to 
>sleep (and looks like disconnect may do the same)? Let me know if you need any 
>more info.
> 
>Regards,
>Hong
> 
>   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
>--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 
>-- ?+0 0x1ee9f080 con 0x2e56580
>   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 
><== mon.0 192.168.1.20:6789/0 21764 ==== mdsbeacon(6798/MDS1.2 up:active seq 
>21120 v6970) v2 ==== 108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
>   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache 
>check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, 
>baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 
>caps per inode
>   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>>> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept 
>sd=18 192.168.1.101:52026/0
>   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
>192.168.1.101:52026/0)
>   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 
>c=0x1f0e2160).fault 0: Success
>   -36> 2014-03-20 20:08:44.496099 7fee411d4700  5 mds.0.35 ms_handle_reset on 
>192.168.1.101:0/2113152127
>   -35> 2014-03-20 20:08:44.496120 7fee411d4700  3 mds.0.35 ms_handle_reset 
>closing connection for session client.6019 192.168.1.101:0/2113152127
>   -34> 2014-03-20 20:08:44.496207 7fee411d4700  1 -- 192.168.1.20:6801/17079 
>mark_down 0x1f0e2160 -- pipe dne
>   -33> 2014-03-20 20:08:44.653628 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 
>>> :/0 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept 
>sd=18 192.168.1.101:52027/0
>   -32> 2014-03-20 20:08:44.653677 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e22c0).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
>192.168.1.101:52027/0)
>   -31> 2014-03-20 20:08:44.925618 7fee411d4700  1 -- 192.168.1.20:6801/17079 
><== client.6019 192.168.1.101:0/2113152127 1 ==== client_reconnect(77349 caps) 
>v2 ==== 0+0+11032578 (0 0 3293767716) 0x2e92780 con 0x1f0e22c0
>   -30> 2014-03-20 20:08:44.925682 7fee411d4700  1 mds.0.server  no longer in 
>reconnect state, ignoring reconnect, sending close
>   -29> 2014-03-20 20:08:44.925735 7fee411d4700  0 log [INF] : denied 
>reconnect attempt (mds is up:active) from client.6019 
>192.168.1.101:0/2113152127 after 2014-03-20 20:08:44.925679 (allowed interval 
>45)
>   -28> 2014-03-20 20:08:44.925748 7fee411d4700  1 -- 192.168.1.20:6801/17079 
>--> 192.168.1.101:0/2113152127 -- client_session(close) v1 -- ?+0 0x3ea6540 
>con 0x1f0e22c0
>   -27> 2014-03-20 20:08:44.927727 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
>c=0x1f0e22c0).reader couldn't read tag, Success
>   -26> 2014-03-20 20:08:44.927797 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
>c=0x1f0e22c0).fault 0: Success
>   -25> 2014-03-20 20:08:44.927849 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 
>c=0x1f0e22c0).fault, server, going to standby
>   -24> 2014-03-20 20:08:46.372279 7fee401d2700 10 monclient: tick
>   -23> 2014-03-20 20:08:46.372339 7fee401d2700 10 monclient: 
>_check_auth_rotating have uptodate secrets (they expire after 2014-03-20 
>20:08:16.372333)
>   -22> 2014-03-20 20:08:46.372373 7fee401d2700 10 monclient: renew subs? 
>(now: 2014-03-20 20:08:46.372372; renew after: 2014-03-20 20:09:56.370811) -- 
>no
>   -21> 2014-03-20 20:08:46.372403 7fee401d2700 10  log_queue is 1 last_log 2 
>sent 1 num 1 unsent 1 sending 1
>   -20> 2014-03-20 20:08:46.372421 7fee401d2700 10  will send 2014-03-20 
>20:08:44.925741 mds.0 192.168.1.20:6801/17079 2 : [INF] denied reconnect 
>attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 
>2014-03-20 20:08:44.925679 (allowed interval 45)
>   -19> 2014-03-20 20:08:46.372466 7fee401d2700 10 monclient: 
>_send_mon_message to mon.MDS1 at 192.168.1.20:6789/0
>   -18> 2014-03-20 20:08:46.372483 7fee401d2700  1 -- 192.168.1.20:6801/17079 
>--> 192.168.1.20:6789/0 -- log(1 entries) v1 -- ?+0 0x71d8d80 con 0x2e56580
>   -17> 2014-03-20 20:08:46.463496 7fee3f0cf700 10 monclient: 
>_send_mon_message to mon.MDS1 at 192.168.1.20:6789/0
>   -16> 2014-03-20 20:08:46.463524 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 
>--> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21121 v6970) v2 
>-- ?+0 0x1ee91340 con 0x2e56580
>   -15> 2014-03-20 20:08:46.499688 7fee411d4700  1 -- 192.168.1.20:6801/17079 
><== mon.0 192.168.1.20:6789/0 21765 ==== log(last 2) v1 ==== 24+0+0 
>(1174756693 0 0) 0x3ea7a40 con 0x2e56580
>   -14> 2014-03-20 20:08:46.499751 7fee411d4700 10 handle_log_ack log(last 2) 
>v1
>   -13> 2014-03-20 20:08:46.499757 7fee411d4700 10  logged 2014-03-20 
>20:08:44.925741 mds.0 192.168.1.20:6801/17079 2 : [INF] denied reconnect 
>attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 
>2014-03-20 20:08:44.925679 (allowed interval 45)
>   -12> 2014-03-20 20:08:46.499825 7fee411d4700  1 -- 192.168.1.20:6801/17079 
><== mon.0 192.168.1.20:6789/0 21766 ==== mdsbeacon(6798/MDS1.2 up:active seq 
>21121 v6970) v2 ==== 108+0+0 (51773011 0 0) 0x1ee25600 con 0x2e56580
>   -11> 2014-03-20 20:08:47.303323 7fee3ddca700  1 -- 192.168.1.20:6801/17079 
>>> :/0 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e3080).accept 
>sd=21 192.168.1.101:52029/0
>   -10> 2014-03-20 20:08:47.308830 7fee3ddca700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e3080).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 
>192.168.1.101:52029/0)
>    -9> 2014-03-20 20:08:47.309221 7fee3ddca700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e3080).accept connect_seq 0 vs existing 1 state standby
>    -8> 2014-03-20 20:08:47.309251 7fee3ddca700  0 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 
>c=0x1f0e3080).accept peer reset, then tried to connect to us, replacing
>    -7> 2014-03-20 20:08:47.309337 7fee411d4700  5 mds.0.35 
>ms_handle_remote_reset on 192.168.1.101:0/2113152127
>    -6> 2014-03-20 20:08:47.309357 7fee411d4700  3 mds.0.35 
>ms_handle_remote_reset closing connection for session client.6019 
>192.168.1.101:0/2113152127
>    -5> 2014-03-20 20:08:47.309385 7fee411d4700  1 -- 192.168.1.20:6801/17079 
>mark_down 0x1f0e22c0 -- 0x3e92280
>    -4> 2014-03-20 20:08:47.309478 7fee411d4700  5 mds.0.35 ms_handle_reset on 
>192.168.1.101:0/2113152127
>    -3> 2014-03-20 20:08:47.309490 7fee411d4700  3 mds.0.35 ms_handle_reset 
>closing connection for session client.6019 192.168.1.101:0/2113152127
>    -2> 2014-03-20 20:08:47.309501 7fee411d4700  1 -- 192.168.1.20:6801/17079 
>mark_down 0x1f0e3080 -- 0x3e92280
>    -1> 2014-03-20 20:08:47.309485 7fee3ddca700  2 -- 192.168.1.20:6801/17079 
>>> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=4 pgs=136 cs=1 l=0 
>c=0x1f0e22c0).accept read error on newly_acked_seq
>     0> 2014-03-20 20:08:47.310724 7fee411d4700 -1 *** Caught signal 
>(Segmentation fault) **
> in thread 7fee411d4700
> 
> ceph version 0.77 (1bca9c5c412b3af722d5250f07fd562a23cf35ff)
> 1: /usr/bin/ceph-mds() [0x8baf22]
> 2: (()+0xf880) [0x7fee458ef880]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
>interpret this.
> 
>--- logging levels ---
>   0/ 5 none
>   0/ 1 lockdep
>   0/ 1 context
>   1/ 1 crush
>   1/ 5 mds
>   1/ 5 mds_balancer
>   1/ 5 mds_locker
>   1/ 5 mds_log
>   1/ 5 mds_log_expire
>   1/ 5 mds_migrator
>   0/ 1 buffer
>   0/ 1 timer
>   0/ 1 filer
>   0/ 1 striper
>   0/ 1 objecter
>   0/ 5 rados
>   0/ 5 rbd
>   0/ 5 journaler
>   0/ 5 objectcacher
>   0/ 5 client
>   0/ 5 osd
>   0/ 5 optracker
>   0/ 5 objclass
>   1/ 3 filestore
>   1/ 3 keyvaluestore
>   1/ 3 journal
>   0/ 5 ms
>   1/ 5 mon
>   0/10 monc
>   1/ 5 paxos
>   0/ 5 tp
>   1/ 5 auth
>   1/ 5 crypto
>   1/ 1 finisher
>   1/ 5 heartbeatmap
>   1/ 5 perfcounter
>   1/ 5 rgw
>   1/ 5 javaclient
>   1/ 5 asok
>   1/ 1 throttle
>  -2/-2 (syslog threshold)
>  -1/-1 (stderr threshold)
>  max_recent     10000
>  max_new         1000
>  log_file /var/log/ceph/ceph-mds.MDS1.2.log
>--- end dump of recent events ---
> 
> 
>>________________________________
> 
>DISCLAIMER:
>
>This e-mail (including any attachments) is for the addressee(s) only and may 
>be confidential, especially as regards personal data. If you are not the 
>intended recipient, please note that any dealing, review, distribution, 
>printing, copying or use of this e-mail is strictly prohibited. If you have 
>received this email in error, please notify the sender immediately and delete 
>the original message (including any attachments).
>
>MIMOS Berhad is a research and development institution under the purview of 
>the Malaysian Ministry of Science, Technology and Innovation. Opinions, 
>conclusions and other information in this e-mail that do not relate to the 
>official business of MIMOS Berhad and/or its subsidiaries shall be understood 
>as neither given nor endorsed by MIMOS Berhad and/or its subsidiaries and 
>neither MIMOS Berhad nor its subsidiaries accepts responsibility for the same. 
>All liability arising from or in connection with computer viruses and/or 
>corrupted e-mails is excluded to the fullest extent permitted by law.
>
>
_______________________________________________
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
________________________________
 
DISCLAIMER:

This e-mail (including any attachments) is for the addressee(s) only and may be 
confidential, especially as regards personal data. If you are not the intended 
recipient, please note that any dealing, review, distribution, printing, 
copying or use of this e-mail is strictly prohibited. If you have received this 
email in error, please notify the sender immediately and delete the original 
message (including any attachments).

MIMOS Berhad is a research and development institution under the purview of the 
Malaysian Ministry of Science, Technology and Innovation. Opinions, conclusions 
and other information in this e-mail that do not relate to the official 
business of MIMOS Berhad and/or its subsidiaries shall be understood as neither 
given nor endorsed by MIMOS Berhad and/or its subsidiaries and neither MIMOS 
Berhad nor its subsidiaries accepts responsibility for the same. All liability 
arising from or in connection with computer viruses and/or corrupted e-mails is 
excluded to the fullest extent permitted by law.
------------------------------------------------------------------
-
-
DISCLAIMER: 

This e-mail (including any attachments) is for the addressee(s) 
only and may contain confidential information. If you are not the 
intended recipient, please note that any dealing, review, 
distribution, printing, copying or use of this e-mail is strictly 
prohibited. If you have received this email in error, please notify 
the sender  immediately and delete the original message. 
MIMOS Berhad is a research and development institution under 
the purview of the Malaysian Ministry of Science, Technology and 
Innovation. Opinions, conclusions and other information in this e-
mail that do not relate to the official business of MIMOS Berhad 
and/or its subsidiaries shall be understood as neither given nor 
endorsed by MIMOS Berhad and/or its subsidiaries and neither 
MIMOS Berhad nor its subsidiaries accepts responsibility for the 
same. All liability arising from or in connection with computer 
viruses and/or corrupted e-mails is excluded to the fullest extent 
permitted by law.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to