Re: [ceph-users] Ceph Status - Segmentation Fault
Hey,

I opened an issue at tracker.ceph.com -> http://tracker.ceph.com/issues/16266

-----Original Message-----
From: Brad Hubbard
To: Mathias Buresch
Cc: jsp...@redhat.com, ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Thu, 2 Jun 2016 09:50:20 +1000

Could this be the call in RotatingKeyRing::get_secret() failing?

Mathias, I'd suggest opening a tracker for this with the information in your last post and let us know the number here.

Cheers,
Brad

On Wed, Jun 1, 2016 at 3:15 PM, Mathias Buresch wrote:
> Hi,
>
> here is the output including --debug-auth=20. Does this help?
>
> (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-rados=20 --debug-auth=20
> Starting program: /usr/bin/python /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-rados=20 --debug-auth=20
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x710f5700 (LWP 2210)]
> [New Thread 0x708f4700 (LWP 2211)]
> [Thread 0x710f5700 (LWP 2210) exited]
> [New Thread 0x710f5700 (LWP 2212)]
> [Thread 0x710f5700 (LWP 2212) exited]
> [New Thread 0x710f5700 (LWP 2213)]
> [Thread 0x710f5700 (LWP 2213) exited]
> [New Thread 0x710f5700 (LWP 2233)]
> [Thread 0x710f5700 (LWP 2233) exited]
> [New Thread 0x710f5700 (LWP 2236)]
> [Thread 0x710f5700 (LWP 2236) exited]
> [New Thread 0x710f5700 (LWP 2237)]
> [Thread 0x710f5700 (LWP 2237) exited]
> [New Thread 0x710f5700 (LWP 2238)]
> [New Thread 0x7fffeb885700 (LWP 2240)]
> 2016-06-01 07:12:55.656336 710f5700 10 monclient(hunting): build_initial_monmap
> 2016-06-01 07:12:55.656440 710f5700 1 librados: starting msgr at :/0
> 2016-06-01 07:12:55.656446 710f5700 1 librados: starting objecter
> [New Thread 0x7fffeb084700 (LWP 2241)]
> 2016-06-01 07:12:55.657552 710f5700 10 -- :/0 ready :/0
> [New Thread 0x7fffea883700 (LWP 2242)]
> [New Thread 0x7fffea082700 (LWP 2245)]
> 2016-06-01 07:12:55.659548 710f5700 1 -- :/0 messenger.start
> [New Thread 0x7fffe9881700 (LWP 2248)]
> 2016-06-01 07:12:55.660530 710f5700 1 librados: setting wanted keys
> 2016-06-01 07:12:55.660539 710f5700 1 librados: calling monclient init
> 2016-06-01 07:12:55.660540 710f5700 10 monclient(hunting): init
> 2016-06-01 07:12:55.660550 710f5700 5 adding auth protocol: cephx
> 2016-06-01 07:12:55.660552 710f5700 10 monclient(hunting): auth_supported 2 method cephx
> 2016-06-01 07:12:55.660532 7fffe9881700 10 -- :/1337675866 reaper_entry start
> 2016-06-01 07:12:55.660570 7fffe9881700 10 -- :/1337675866 reaper
> 2016-06-01 07:12:55.660572 7fffe9881700 10 -- :/1337675866 reaper done
> 2016-06-01 07:12:55.660733 710f5700 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.admin.keyring
> [New Thread 0x7fffe9080700 (LWP 2251)]
> [New Thread 0x7fffe887f700 (LWP 2252)]
> 2016-06-01 07:12:55.662754 710f5700 10 monclient(hunting): _reopen_session rank -1 name
> 2016-06-01 07:12:55.662764 710f5700 10 -- :/1337675866 connect_rank to 62.176.141.181:6789/0, creating pipe and registering
> [New Thread 0x7fffe3fff700 (LWP 2255)]
> 2016-06-01 07:12:55.663789 710f5700 10 -- :/1337675866 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7fffec05aa30).register_pipe
> 2016-06-01 07:12:55.663819 710f5700 10 -- :/1337675866 get_connection mon.0 62.176.141.181:6789/0 new 0x7fffec064010
> 2016-06-01 07:12:55.663790 7fffe3fff700 10 -- :/1337675866 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7fffec05aa30).writer: state = connecting policy.server=0
> 2016-06-01 07:12:55.663830 7fffe3fff700 10 -- :/1337675866 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7fffec05aa30).connect 0
> 2016-06-01 07:12:55.663841 710f5700 10 monclient(hunting): picked mon.pix01 con 0x7fffec05aa30 addr 62.176.141.181:6789/0
> 2016-06-01 07:12:55.663847 710f5700 20 -- :/1337675866 send_keepalive con 0x7fffec05aa30, have pipe.
> 2016-06-01 07:12:55.663850 7fffe3fff700 10 -- :/1337675866 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fffec05aa30).connecting to 62.176.141.181:6789/0
> 2016-06-01 07:12:55.663863 710f5700 10 monclient(hunting): _send_mon_message to mon.pix01 at 62.176.141.181:6789/0
> 2016-06-01 07:12:55.663866 710f5700 1 -- :/1337675866 --> 62.176.141.181:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7fffec060450 con 0x7fffec05aa30
> 2016-06-01 07:12:55.663870 710f5700 20 -- :
d=3 :41128 s=2 pgs=339278 cs=1 l=1 c=0x7fffec05aa30).writer sleeping
2016-06-01 07:12:55.665972 7fffea883700 10 monclient(hunting): dump:
epoch 1
fsid 28af67eb-4060-4770-ac1d-d2be493877af
last_changed 2014-11-12 15:44:27.182395
created 2014-11-12 15:44:27.182395
0: 62.176.141.181:6789/0 mon.pix01
1: 62.176.141.182:6789/0 mon.pix02

2016-06-01 07:12:55.665988 7fffea883700 10 -- 62.176.141.181:0/1337675866 dispatch_throttle_release 340 to dispatch throttler 373/104857600
2016-06-01 07:12:55.665992 7fffea883700 20 -- 62.176.141.181:0/1337675866 done calling dispatch on 0x7fffd0001cb0
2016-06-01 07:12:55.665997 7fffea883700 1 -- 62.176.141.181:0/1337675866 <== mon.0 62.176.141.181:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (3918039325 0 0) 0x7fffd0002f20 con 0x7fffec05aa30
2016-06-01 07:12:55.666015 7fffea883700 10 cephx: set_have_need_key no handler for service mon
2016-06-01 07:12:55.666016 7fffea883700 10 cephx: set_have_need_key no handler for service osd
2016-06-01 07:12:55.666017 7fffea883700 10 cephx: set_have_need_key no handler for service auth
2016-06-01 07:12:55.666018 7fffea883700 10 cephx: validate_tickets want 37 have 0 need 37
2016-06-01 07:12:55.666020 7fffea883700 10 monclient(hunting): my global_id is 3511432
2016-06-01 07:12:55.666022 7fffea883700 10 cephx client: handle_response ret = 0
2016-06-01 07:12:55.666023 7fffea883700 10 cephx client: got initial server challenge 3112857369079243605
2016-06-01 07:12:55.666025 7fffea883700 10 cephx client: validate_tickets: want=37 need=37 have=0
2016-06-01 07:12:55.666026 7fffea883700 10 cephx: set_have_need_key no handler for service mon
2016-06-01 07:12:55.666027 7fffea883700 10 cephx: set_have_need_key no handler for service osd
2016-06-01 07:12:55.666030 7fffea883700 10 cephx: set_have_need_key no handler for service auth
2016-06-01 07:12:55.666030 7fffea883700 10 cephx: validate_tickets want 37 have 0 need 37
2016-06-01 07:12:55.666031 7fffea883700 10 cephx client: want=37 need=37 have=0
2016-06-01 07:12:55.666034 7fffea883700 10 cephx client: build_request

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea883700 (LWP 2242)]
0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110
110     auth/cephx/../Crypto.h: No such file or directory.
(gdb) bt
#0  0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110
#1  encode_encrypt_enc_bl (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:464
#2  encode_encrypt (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:489
#3  cephx_calc_client_server_challenge (cct=, secret=..., server_challenge=3112857369079243605, client_challenge=12899511428024786235, key=key@entry=0x7fffea8824a8, ret="") at auth/cephx/CephxProtocol.cc:36
#4  0x7313aff4 in CephxClientHandler::build_request (this=0x7fffd4001520, bl=...) at auth/cephx/CephxClientHandler.cc:53
#5  0x72fe4a79 in MonClient::handle_auth (this=this@entry=0x7fffec006b70, m=m@entry=0x7fffd0002f20) at mon/MonClient.cc:510
#6  0x72fe6507 in MonClient::ms_dispatch (this=0x7fffec006b70, m=0x7fffd0002f20) at mon/MonClient.cc:277
#7  0x730d5dc9 in ms_deliver_dispatch (m=0x7fffd0002f20, this=0x7fffec055410) at ./msg/Messenger.h:582
#8  DispatchQueue::entry (this=0x7fffec0555d8) at msg/simple/DispatchQueue.cc:185
#9  0x731023bd in DispatchQueue::DispatchThread::entry (this=) at msg/simple/DispatchQueue.h:103
#10 0x77bc4182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Best regards
Mathias

-----Original Message-----
From: Brad Hubbard
To: jsp...@redhat.com
Cc: ceph-us...@ceph.com, Mathias Buresch
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Wed, 25 May 2016 19:22:03 -0400

Hi John,

This looks a lot like http://tracker.ceph.com/issues/12417 which is, of course, fixed.

Worth gathering debug-auth=20?
Maybe on the MON end as well?

Cheers,
Brad

----- Original Message -----
> From: "Mathias Buresch"
> To: jsp...@redhat.com
> Cc: ceph-us...@ceph.com
> Sent: Thursday, 26 May, 2016 12:57:47 AM
> Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
>
> There wasn't a package ceph-debuginfo available (maybe because I am
> running Ubuntu). I have installed these:
>
> * ceph-dbg
> * librados2-dbg
>
> There would also be ceph-mds-dbg and ceph-fs-common-dbg and so on.
>
> But now there is more information provided by the gdb output :)
>
> (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-rados=20
> Starting program: /usr/bin/python /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-rados=20
> [Thread debugging using libthread_
cs=1 l=1 c=0x7fffec05aa30).aborted = 0
2016-05-25 16:55:30.938413 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got 340 + 0 + 0 byte message
2016-05-25 16:55:30.938427 7fffe3efe700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).No session security set
2016-05-25 16:55:30.938434 7fffe3efe700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got message 1 0x7fffd0001cb0 mon_map magic: 0 v1
2016-05-25 16:55:30.938442 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 queue 0x7fffd0001cb0 prio 196
2016-05-25 16:55:30.938450 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader reading tag...
2016-05-25 16:55:30.938453 7fffe3fff700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).writer: state = open policy.server=0
2016-05-25 16:55:30.938464 7fffe3fff700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).write_ack 1
2016-05-25 16:55:30.938467 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got MSG
2016-05-25 16:55:30.938471 7fffe3fff700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).writer: state = open policy.server=0
2016-05-25 16:55:30.938472 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got envelope type=18 src mon.0 front=33 data=0 off 0
2016-05-25 16:55:30.938475 7fffe3fff700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).writer sleeping
2016-05-25 16:55:30.938476 7fffe3efe700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader wants 33 from dispatch throttler 340/104857600
2016-05-25 16:55:30.938456 7fffea883700 1 -- 62.176.141.181:0/3663984981 <== mon.0 62.176.141.181:6789/0 1 mon_map magic: 0 v1 340+0+0 (3213884171 0 0) 0x7fffd0001cb0 con 0x7fffec05aa30
2016-05-25 16:55:30.938481 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got front 33
2016-05-25 16:55:30.938484 7fffea883700 10 monclient(hunting): handle_monmap mon_map magic: 0 v1
2016-05-25 16:55:30.938485 7fffe3efe700 10

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea883700 (LWP 26749)]
0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110
110     auth/cephx/../Crypto.h: No such file or directory.
(gdb) bt
#0  0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110
#1  encode_encrypt_enc_bl (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:464
#2  encode_encrypt (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:489
#3  cephx_calc_client_server_challenge (cct=, secret=..., server_challenge=9622349603176979543, client_challenge=7732813711656640623, key=key@entry=0x7fffea8824a8, ret="") at auth/cephx/CephxProtocol.cc:36
#4  0x7313aff4 in CephxClientHandler::build_request (this=0x7fffd4001520, bl=...) at auth/cephx/CephxClientHandler.cc:53
#5  0x72fe4a79 in MonClient::handle_auth (this=this@entry=0x7fffec006b70, m=m@entry=0x7fffd0002ee0) at mon/MonClient.cc:510
#6  0x72fe6507 in MonClient::ms_dispatch (this=0x7fffec006b70, m=0x7fffd0002ee0) at mon/MonClient.cc:277
#7  0x730d5dc9 in ms_deliver_dispatch (m=0x7fffd0002ee0, this=0x7fffec055410) at ./msg/Messenger.h:582
#8  DispatchQueue::entry (this=0x7fffec0555d8) at msg/simple/DispatchQueue.cc:185
#9  0x731023bd in DispatchQueue::DispatchThread::entry (this=) at msg/simple/DispatchQueue.h:103
#10 0x77bc4182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x7ffff78f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6

-----Original Message-----
From: John Spray
To: Mathias Buresch
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Wed, 25 May 2016 15:41:51 +0100

On Wed, May 25, 2016 at 3:00 PM, Mathias Buresch wrote:
>
> I don't know what exac
[Thread 0x710f5700 (LWP 23403) exited]
[New Thread 0x710f5700 (LWP 23404)]
[Thread 0x710f5700 (LWP 23404) exited]
[New Thread 0x710f5700 (LWP 23405)]
[Thread 0x710f5700 (LWP 23405) exited]
[New Thread 0x710f5700 (LWP 23406)]
[Thread 0x710f5700 (LWP 23406) exited]
[New Thread 0x710f5700 (LWP 23407)]
[Thread 0x710f5700 (LWP 23407) exited]
[New Thread 0x710f5700 (LWP 23408)]
[New Thread 0x7fffeb885700 (LWP 23409)]
[New Thread 0x7fffeb084700 (LWP 23410)]
[New Thread 0x7fffea883700 (LWP 23411)]
[New Thread 0x7fffea082700 (LWP 23412)]
[New Thread 0x7fffe9881700 (LWP 23413)]
[New Thread 0x7fffe9080700 (LWP 23414)]
[New Thread 0x7fffe887f700 (LWP 23415)]
[New Thread 0x7fffe807e700 (LWP 23416)]
[New Thread 0x7fffe7f7d700 (LWP 23419)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea883700 (LWP 23411)]
0x73141a57 in ?? () from /usr/lib/librados.so.2
(gdb) bt
#0  0x73141a57 in ?? () from /usr/lib/librados.so.2
#1  0x7313aff4 in ?? () from /usr/lib/librados.so.2
#2  0x72fe4a79 in ?? () from /usr/lib/librados.so.2
#3  0x72fe6507 in ?? () from /usr/lib/librados.so.2
#4  0x730d5dc9 in ?? () from /usr/lib/librados.so.2
#5  0x731023bd in ?? () from /usr/lib/librados.so.2
#6  0x77bc4182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Does that help? I can't really see where the error is. :)

-----Original Message-----
From: John Spray
To: Mathias Buresch
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Wed, 25 May 2016 10:16:55 +0100

On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch wrote:
>
> Please find the logs with higher debug level attached to this email.

You've attached the log from your mon, but it's not your mon that's segfaulting, right?

You can use normal ceph command line flags to crank up the verbosity on the CLI too (--debug-monc=20 --debug-ms=20 spring to mind).

You can also run the ceph CLI in gdb like this:

gdb python
(gdb) run /usr/bin/ceph status
... hopefully it crashes and then ...
(gdb) bt

Cheers,
John

>
> Kind regards
> Mathias
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph Status - Segmentation Fault
Hi there,

I was updating Ceph to 0.94.7 and now I am getting segmentation faults. When getting the status via "ceph -s" or "ceph health detail" I get a "Segmentation fault" error.

I have only two monitor daemons, but haven't had any problems with that yet.. maybe the maintenance window was too long this time..?!

When getting the status via the admin socket I get the following for both:

ceph daemon mon.pix01 mon_status
{
    "name": "pix01",
    "rank": 0,
    "state": "leader",
    "election_epoch": 226,
    "quorum": [ 0, 1 ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 1,
        "fsid": "28af67eb-4060-4770-ac1d-d2be493877af",
        "modified": "2014-11-12 15:44:27.182395",
        "created": "2014-11-12 15:44:27.182395",
        "mons": [
            { "rank": 0, "name": "pix01", "addr": "x.x.x.x:6789\/0" },
            { "rank": 1, "name": "pix02", "addr": "x.x.x.x:6789\/0" }
        ]
    }
}

ceph daemon mon.pix02 mon_status
{
    "name": "pix02",
    "rank": 1,
    "state": "peon",
    "election_epoch": 226,
    "quorum": [ 0, 1 ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 1,
        "fsid": "28af67eb-4060-4770-ac1d-d2be493877af",
        "modified": "2014-11-12 15:44:27.182395",
        "created": "2014-11-12 15:44:27.182395",
        "mons": [
            { "rank": 0, "name": "pix01", "addr": "x.x.x.x:6789\/0" },
            { "rank": 1, "name": "pix02", "addr": "x.x.x.x:6789\/0" }
        ]
    }
}

Kind regards
Mathias
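[Editorial aside: the two mon_status outputs above agree on election epoch, quorum set and monmap epoch, which points at a client-side fault rather than a mon problem. As a sketch only, the comparison the poster did by eye can be expressed programmatically; the trimmed JSON below is an assumption reduced to the fields being compared, not real command output.]

```python
import json

# Trimmed mon_status documents, with the values reported above
# (hypothetical minimal subset of `ceph daemon mon.<id> mon_status`).
pix01 = json.loads('{"name": "pix01", "rank": 0, "state": "leader", '
                   '"election_epoch": 226, "quorum": [0, 1], '
                   '"monmap": {"epoch": 1}}')
pix02 = json.loads('{"name": "pix02", "rank": 1, "state": "peon", '
                   '"election_epoch": 226, "quorum": [0, 1], '
                   '"monmap": {"epoch": 1}}')

def mons_agree(a, b):
    """Both mons should report the same election epoch, quorum set and
    monmap epoch, and exactly one of the pair should be the leader."""
    same_view = (a["election_epoch"] == b["election_epoch"]
                 and a["quorum"] == b["quorum"]
                 and a["monmap"]["epoch"] == b["monmap"]["epoch"])
    one_leader = sorted((a["state"], b["state"])) == ["leader", "peon"]
    return same_view and one_leader

print(mons_agree(pix01, pix02))  # → True
```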
Re: [ceph-users] CephFS and Ubuntu Backport Kernel Problem
Thank you so much Ilya! This is exactly what I was searching for!!

-----Original Message-----
From: Ilya Dryomov
To: Mathias Buresch
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] CephFS and Ubuntu Backport Kernel Problem
Date: Tue, 12 Apr 2016 16:21:04 +0200

On Tue, Apr 12, 2016 at 4:08 PM, Mathias Buresch wrote:
>
> Hi there,
>
> I have an issue with using Ceph and Ubuntu backport kernels newer than
> 3.19.0-43.
>
> My setup is the following:
>
> Ubuntu 14.04
> Kernel 3.19.0-43 (backport kernel)
> Ceph 0.94.6
>
> I am using CephFS! Kernel 3.19.0-43 was the last working kernel.
> Every newer kernel fails with a kernel panic or similar. When starting
> the server the processes themselves start normally, but when mounting
> CephFS (the kernel client - not FUSE!) it hangs and I can only restart
> the server.
>
> Does anyone know about that issue, or whether it would be fixed if I
> upgraded to one of the newer Ceph versions?!

See

http://www.spinics.net/lists/ceph-devel/msg29504.html
http://tracker.ceph.com/issues/15302

and search for the "[ceph-users] cephfs Kernel panic" thread from yesterday here on ceph-users - archives haven't caught up yet.

Thanks,

Ilya
[ceph-users] CephFS and Ubuntu Backport Kernel Problem
Hi there,

I have an issue with using Ceph and Ubuntu backport kernels newer than 3.19.0-43.

My setup is the following:

Ubuntu 14.04
Kernel 3.19.0-43 (backport kernel)
Ceph 0.94.6

I am using CephFS! Kernel 3.19.0-43 was the last working kernel. Every newer kernel fails with a kernel panic or similar. When starting the server the processes themselves start normally, but when mounting CephFS (the kernel client - not FUSE!) it hangs and I can only restart the server.

Does anyone know about that issue, or whether it would be fixed if I upgrade to one of the newer Ceph versions?!

Greetz
Mathias
Re: [ceph-users] Ceph FS - MDS problem
Hi Dan,

thanks for the quick reply! I haven't read it in detail yet, but here are my first comments:

3.b
> BTW, our old friend updatedb seems to trigger the same problem,
> grabbing caps very quickly as it indexes CephFS. updatedb.conf is
> configured with PRUNEFS="... fuse ...", but CephFS has type
> fuse.ceph-fuse. We'll need to add "ceph" to that list too.

This was my first thought, and I added the Ceph paths to PRUNEPATHS. But as you said, maybe I have to add the FS type too..?!

> PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/ceph/osd /mnt/ceph"

Nevertheless, since then I haven't seen the mlocate process in 'D' state anymore. Plus, given that the high load comes up together with the MDS/client problem, I don't really think that updatedb is still the problem... but that's just an assumption.

4.
> "mds cache size = 500" is going to use a lot of memory! We have an
> MDS with just 8GB of RAM and it goes OOM after delegating around 1
> million caps. (this is with mds cache size = 10, btw)

At least I never saw that much memory used in 'htop', or any other resources running high... just the load rises.

Best regards,
Mathias
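[Editorial aside: the updatedb.conf change being discussed can be sketched as follows. The variable names PRUNEFS/PRUNEPATHS are mlocate's; the particular filesystem list shown is an assumed Ubuntu-style default plus the additions mentioned above, not a verbatim config from the poster's machine.]

```shell
# /etc/updatedb.conf sketch: keep updatedb/mlocate from crawling CephFS.
# "fuse.ceph-fuse" is the type a FUSE-mounted CephFS reports; "ceph"
# covers the kernel client. Both added at the end of an assumed default list.
PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre tmpfs usbfs udf fuse fuse.ceph-fuse ceph"
# Path-based pruning, per the paths mentioned in the reply above.
PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/ceph/osd /mnt/ceph"
```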
[ceph-users] Ceph FS - MDS problem
Hi there,

maybe you could be so kind and help me with the following issue: We are running CephFS, but there's repeatedly a problem with the MDS. Sometimes the following error occurs:

"mds0: Client 701782 failing to respond to capability release"

Listing the session information shows that "num_caps" on that client is much higher than on the other clients (see also the attachment). The problem is that the load on one of the servers increases to a really high value (80 to 100), independent of which client is complaining. I guess my problem is also that I don't really understand the meaning of those "capabilities".

Facts (let me know if you need more):

* CephFS client, MDS, MON, OSD all on the same servers
* kernel client (kernel: 3.14.16-031416-generic)
* MDS config:
  o only raised "mds cache size = 500" (because before there was the error "failing to respond to cache pressure")

Best regards
Mathias

# CEPH FS ERROR
09:33:30 PROD root@ceph01:~# ceph -s
    cluster xxx
     health HEALTH_WARN
            mds0: Client 701782 failing to respond to capability release
     monmap e1: 3 mons at {ceph01=xx.xx.xx.114:6789/0,ceph02=xx.xx.xx.115:6789/0,ceph03=xx.xx.xx.116:6789/0}
            election epoch 106, quorum 0,1,2 ceph01,ceph02,ceph03
     mdsmap e260: 1/1/1 up {0=ceph01=up:active}, 2 up:standby
.
-> Load rises immediately

09:33:32 PROD root@ceph01:~# ceph daemon mds.ceph01 session ls
[
    {
        "id": 701782,
        "num_leases": 16,
        "num_caps": 221397,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.701782 xx.xx.xx.114:0\/1344307356",
        "client_metadata": {}
    },
    {
        "id": 692103,
        "num_leases": 1,
        "num_caps": 50115,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.692103 xx.xx.xx.117:0\/3600471798",
        "client_metadata": {}
    },
    {
        "id": 691995,
        "num_leases": 2,
        "num_caps": 53227,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.691995 xx.xx.xx.115:0\/1220606159",
        "client_metadata": {}
    },
    {
        "id": 692058,
        "num_leases": 8,
        "num_caps": 49722,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.692058 xx.xx.xx.116:0\/4048537076",
        "client_metadata": {}
    }
]

09:38:18 PROD root@ceph01:~# ceph daemon mds.ceph01 perf dump
{
    "mds": {
        "request": 1387754,
        "reply": 1387696,
        "reply_latency": { "avgcount": 1387696, "sum": 6439.991891758 },
        "forward": 0,
        "dir_fetch": 57946,
        "dir_commit": 35053,
        "dir_split": 0,
        "inode_max": 500,
        "inodes": 1116643,
        "inodes_top": 837156,
        "inodes_bottom": 279487,
        "inodes_pin_tail": 0,
        "inodes_pinned": 292936,
        "inodes_expired": 0,
        "inodes_with_caps": 269668,
        "caps": 374718,
        "subtrees": 2,
        "traverse": 2591500,
        "traverse_hit": 2492810,
        "traverse_forward": 0,
        "traverse_discover": 0,
        "traverse_dir_fetch": 19330,
        "traverse_remote_ino": 0,
        "traverse_lock": 2350,
        "load_cent": 138774897,
        "q": 0,
        "exported": 0,
        "exported_inodes": 0,
        "imported": 0,
        "imported_inodes": 0
    },
    "mds_cache": {
        "num_strays": 56,
        "num_strays_purging": 0,
        "num_strays_delayed": 0,
        "strays_created": 2835,
        "strays_purged": 2802,
        "num_recovering_processing": 0,
        "num_recovering_enqueued": 0,
        "num_recovering_prioritized": 0,
        "recovery_started": 0,
        "recovery_completed": 0
    },
    "mds_log": {
        "evadd": 376174,
        "evex": 377829,
        "evtrm": 377829,
        "ev": 13815,
        "evexg": 0,
        "evexd": 1024,
        "segadd": 738,
        "segex": 738,
        "segtrm": 738,
        "seg": 31,
        "segexg": 0,
        "segexd": 1,
        "expos": 6882857746,
        "wrpos": 6991387600,
        "rdpos": 4859818564,
        "jlat": 0
    },
    "mds_mem": {
        "ino": 1112733,
        "ino+": 1115537,
        "ino-": 2804,
        "dir": 66813,
        "dir+": 67017,
        "dir-": 204,
        "dn": 1116643,
        "dn+": 1121224,
        "dn-": 4581,
        "cap": 374718,
        "cap+": 1005845,
        "cap-": 631127,
        "rss": 6992420,
        "heap": 49060,
        "malloc": 18446744073708021059,
        "buf": 0
    },
    "mds_server": {
        "handle_client_request": 1387754,
        "handle_slave_request": 0,
        "handle_client_session": 80950,
        "dispatch_client_request": 2526245,
        "dispatch_server_request": 0
    },
    "objecter": {
        "op_active": 0,
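[Editorial aside: the `session ls` listing earlier in this message is plain JSON, so the cap-hogging client the poster spotted by eye can be picked out programmatically. A minimal sketch; the 4x-the-median threshold is an arbitrary assumption for illustration, not a Ceph default.]

```python
import json

# Abbreviated `ceph daemon mds.<id> session ls` output, using the
# num_caps values from the listing above.
sessions = json.loads("""[
  {"id": 701782, "num_caps": 221397, "inst": "client.701782 xx.xx.xx.114:0/1344307356"},
  {"id": 692103, "num_caps": 50115,  "inst": "client.692103 xx.xx.xx.117:0/3600471798"},
  {"id": 691995, "num_caps": 53227,  "inst": "client.691995 xx.xx.xx.115:0/1220606159"},
  {"id": 692058, "num_caps": 49722,  "inst": "client.692058 xx.xx.xx.116:0/4048537076"}
]""")

# Flag any session holding far more caps than the median session;
# the 4x factor is a heuristic chosen for this sketch.
caps = sorted(s["num_caps"] for s in sessions)
median = caps[len(caps) // 2]
hogs = [s for s in sessions if s["num_caps"] > 4 * median]

for s in hogs:
    print(f'client {s["id"]} holds {s["num_caps"]} caps (median {median})')
# → client 701782 holds 221397 caps (median 53227)
```

Run against live output piped from the admin socket, this would single out client 701782, matching the "failing to respond to capability release" warning.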