Re: [ceph-users] Ceph Status - Segmentation Fault

2016-06-13 Thread Mathias Buresch
Hey,

I opened an issue at tracker.ceph.com -> http://tracker.ceph.com/issues/16266

-Original Message-
From: Brad Hubbard 
To: Mathias Buresch 
Cc: jsp...@redhat.com , ceph-us...@ceph.com 
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Thu, 2 Jun 2016 09:50:20 +1000

Could this be the call in RotatingKeyRing::get_secret() failing?

Mathias, I'd suggest opening a tracker for this with the information in
your last post and let us know the number here.
Cheers,
Brad
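
As a sanity check (a sketch only, not something discussed in the thread), one could confirm that the keyring librados loads, whose path appears in the debug log quoted below, contains a parseable client.admin secret:

ceph-authtool -l /etc/ceph/ceph.client.admin.keyring

If that lists the client.admin entry with a key, the secret at least parses on disk.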

On Wed, Jun 1, 2016 at 3:15 PM, Mathias Buresch  wrote:
> Hi,
> 
> here is the output including --debug-auth=20. Does this help?
> 
> (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-
> rados=20 --debug-auth=20
> Starting program: /usr/bin/python /usr/bin/ceph status --debug-
> monc=20
> --debug-ms=20 --debug-rados=20 --debug-auth=20
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-
> gnu/libthread_db.so.1".
> [New Thread 0x710f5700 (LWP 2210)]
> [New Thread 0x708f4700 (LWP 2211)]
> [Thread 0x710f5700 (LWP 2210) exited]
> [New Thread 0x710f5700 (LWP 2212)]
> [Thread 0x710f5700 (LWP 2212) exited]
> [New Thread 0x710f5700 (LWP 2213)]
> [Thread 0x710f5700 (LWP 2213) exited]
> [New Thread 0x710f5700 (LWP 2233)]
> [Thread 0x710f5700 (LWP 2233) exited]
> [New Thread 0x710f5700 (LWP 2236)]
> [Thread 0x710f5700 (LWP 2236) exited]
> [New Thread 0x710f5700 (LWP 2237)]
> [Thread 0x710f5700 (LWP 2237) exited]
> [New Thread 0x710f5700 (LWP 2238)]
> [New Thread 0x7fffeb885700 (LWP 2240)]
> 2016-06-01 07:12:55.656336 710f5700 10 monclient(hunting):
> build_initial_monmap
> 2016-06-01 07:12:55.656440 710f5700  1 librados: starting msgr at
> :/0
> 2016-06-01 07:12:55.656446 710f5700  1 librados: starting
> objecter
> [New Thread 0x7fffeb084700 (LWP 2241)]
> 2016-06-01 07:12:55.657552 710f5700 10 -- :/0 ready :/0
> [New Thread 0x7fffea883700 (LWP 2242)]
> [New Thread 0x7fffea082700 (LWP 2245)]
> 2016-06-01 07:12:55.659548 710f5700  1 -- :/0 messenger.start
> [New Thread 0x7fffe9881700 (LWP 2248)]
> 2016-06-01 07:12:55.660530 710f5700  1 librados: setting wanted
> keys
> 2016-06-01 07:12:55.660539 710f5700  1 librados: calling
> monclient
> init
> 2016-06-01 07:12:55.660540 710f5700 10 monclient(hunting): init
> 2016-06-01 07:12:55.660550 710f5700  5 adding auth protocol:
> cephx
> 2016-06-01 07:12:55.660552 710f5700 10 monclient(hunting):
> auth_supported 2 method cephx
> 2016-06-01 07:12:55.660532 7fffe9881700 10 -- :/1337675866
> reaper_entry
> start
> 2016-06-01 07:12:55.660570 7fffe9881700 10 -- :/1337675866 reaper
> 2016-06-01 07:12:55.660572 7fffe9881700 10 -- :/1337675866 reaper
> done
> 2016-06-01 07:12:55.660733 710f5700  2 auth: KeyRing::load:
> loaded
> key file /etc/ceph/ceph.client.admin.keyring
> [New Thread 0x7fffe9080700 (LWP 2251)]
> [New Thread 0x7fffe887f700 (LWP 2252)]
> 2016-06-01 07:12:55.662754 710f5700 10 monclient(hunting):
> _reopen_session rank -1 name 
> 2016-06-01 07:12:55.662764 710f5700 10 -- :/1337675866
> connect_rank
> to 62.176.141.181:6789/0, creating pipe and registering
> [New Thread 0x7fffe3fff700 (LWP 2255)]
> 2016-06-01 07:12:55.663789 710f5700 10 -- :/1337675866 >>
> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1
> c=0x7fffec05aa30).register_pipe
> 2016-06-01 07:12:55.663819 710f5700 10 -- :/1337675866
> get_connection mon.0 62.176.141.181:6789/0 new 0x7fffec064010
> 2016-06-01 07:12:55.663790 7fffe3fff700 10 -- :/1337675866 >>
> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1
> c=0x7fffec05aa30).writer: state = connecting policy.server=0
> 2016-06-01 07:12:55.663830 7fffe3fff700 10 -- :/1337675866 >>
> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1
> c=0x7fffec05aa30).connect 0
> 2016-06-01 07:12:55.663841 710f5700 10 monclient(hunting): picked
> mon.pix01 con 0x7fffec05aa30 addr 62.176.141.181:6789/0
> 2016-06-01 07:12:55.663847 710f5700 20 -- :/1337675866
> send_keepalive con 0x7fffec05aa30, have pipe.
> 2016-06-01 07:12:55.663850 7fffe3fff700 10 -- :/1337675866 >>
> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7fffec05aa30).connecting to 62.176.141.181:6789/0
> 2016-06-01 07:12:55.663863 710f5700 10 monclient(hunting):
> _send_mon_message to mon.pix01 at 62.176.141.181:6789/0
> 2016-06-01 07:12:55.663866 710f5700  1 -- :/1337675866 -->
> 62.176.141.181:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0
> 0x7fffec060450 con 0x7fffec05aa30
> 2016-06-01 07:12:55.663870 710f5700 20 -- :

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-31 Thread Mathias Buresch
d=3 :41128 s=2 pgs=339278 cs=1 l=1
c=0x7fffec05aa30).writer sleeping
2016-06-01 07:12:55.665972 7fffea883700 10 monclient(hunting): dump:
epoch 1
fsid 28af67eb-4060-4770-ac1d-d2be493877af
last_changed 2014-11-12 15:44:27.182395
created 2014-11-12 15:44:27.182395
0: 62.176.141.181:6789/0 mon.pix01
1: 62.176.141.182:6789/0 mon.pix02

2016-06-01 07:12:55.665988 7fffea883700 10 --
62.176.141.181:0/1337675866 dispatch_throttle_release 340 to dispatch
throttler 373/104857600
2016-06-01 07:12:55.665992 7fffea883700 20 --
62.176.141.181:0/1337675866 done calling dispatch on 0x7fffd0001cb0
2016-06-01 07:12:55.665997 7fffea883700  1 --
62.176.141.181:0/1337675866 <== mon.0 62.176.141.181:6789/0 2 
auth_reply(proto 2 0 (0) Success) v1  33+0+0 (3918039325 0 0)
0x7fffd0002f20 con 0x7fffec05aa30
2016-06-01 07:12:55.666015 7fffea883700 10 cephx: set_have_need_key no
handler for service mon
2016-06-01 07:12:55.666016 7fffea883700 10 cephx: set_have_need_key no
handler for service osd
2016-06-01 07:12:55.666017 7fffea883700 10 cephx: set_have_need_key no
handler for service auth
2016-06-01 07:12:55.666018 7fffea883700 10 cephx: validate_tickets want
37 have 0 need 37
2016-06-01 07:12:55.666020 7fffea883700 10 monclient(hunting): my
global_id is 3511432
2016-06-01 07:12:55.666022 7fffea883700 10 cephx client:
handle_response ret = 0
2016-06-01 07:12:55.666023 7fffea883700 10 cephx client:  got initial
server challenge 3112857369079243605
2016-06-01 07:12:55.666025 7fffea883700 10 cephx client:
validate_tickets: want=37 need=37 have=0
2016-06-01 07:12:55.666026 7fffea883700 10 cephx: set_have_need_key no
handler for service mon
2016-06-01 07:12:55.666027 7fffea883700 10 cephx: set_have_need_key no
handler for service osd
2016-06-01 07:12:55.666030 7fffea883700 10 cephx: set_have_need_key no
handler for service auth
2016-06-01 07:12:55.666030 7fffea883700 10 cephx: validate_tickets want
37 have 0 need 37
2016-06-01 07:12:55.666031 7fffea883700 10 cephx client: want=37
need=37 have=0
2016-06-01 07:12:55.666034 7fffea883700 10 cephx client: build_request

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea883700 (LWP 2242)]
0x73141a57 in encrypt (cct=,
error=0x7fffea882280, out=..., in=..., this=0x7fffea882470)
at auth/cephx/../Crypto.h:110
110 auth/cephx/../Crypto.h: No such file or directory.
(gdb) bt
#0  0x73141a57 in encrypt (cct=,
error=0x7fffea882280, out=..., in=..., this=0x7fffea882470)
at auth/cephx/../Crypto.h:110
#1  encode_encrypt_enc_bl (cct=,
error="", out=..., key=..., t=)
at auth/cephx/CephxProtocol.h:464
#2  encode_encrypt (cct=, error="",
out=..., key=..., t=)
at auth/cephx/CephxProtocol.h:489
#3  cephx_calc_client_server_challenge (cct=,
secret=..., server_challenge=3112857369079243605, 
client_challenge=12899511428024786235, key=key@entry=0x7fffea8824a8
, ret="") at auth/cephx/CephxProtocol.cc:36
#4  0x7313aff4 in CephxClientHandler::build_request
(this=0x7fffd4001520, bl=...) at auth/cephx/CephxClientHandler.cc:53
#5  0x72fe4a79 in MonClient::handle_auth (this=this@entry=0x7ff
fec006b70, m=m@entry=0x7fffd0002f20) at mon/MonClient.cc:510
#6  0x72fe6507 in MonClient::ms_dispatch (this=0x7fffec006b70,
m=0x7fffd0002f20) at mon/MonClient.cc:277
#7  0x730d5dc9 in ms_deliver_dispatch (m=0x7fffd0002f20,
this=0x7fffec055410) at ./msg/Messenger.h:582
#8  DispatchQueue::entry (this=0x7fffec0555d8) at
msg/simple/DispatchQueue.cc:185
#9  0x731023bd in DispatchQueue::DispatchThread::entry
(this=) at msg/simple/DispatchQueue.h:103
#10 0x77bc4182 in start_thread () from /lib/x86_64-linux-
gnu/libpthread.so.0
#11 0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6


Best regards
Mathias

-Original Message-
From: Brad Hubbard 
To: jsp...@redhat.com
Cc: ceph-us...@ceph.com, Mathias Buresch 
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Wed, 25 May 2016 19:22:03 -0400

Hi John,

This looks a lot like http://tracker.ceph.com/issues/12417 which is, of
course, fixed.

Worth gathering debug-auth=20 ? Maybe on the MON end as well?

Cheers,
Brad
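
A minimal sketch of gathering that extra verbosity on both ends (mon.pix01 is the monitor name from the monmap in this thread; the admin socket is used on the MON side since the ceph CLI is the part that crashes, while "ceph daemon" only talks to the local socket):

# client side, as before but with auth debugging added
gdb python
(gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-rados=20 --debug-auth=20

# MON side, raise and later reset debug_auth via the admin socket
ceph daemon mon.pix01 config set debug_auth 20/20
ceph daemon mon.pix01 config set debug_auth 1/5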


- Original Message -
> 
> From: "Mathias Buresch" 
> To: jsp...@redhat.com
> Cc: ceph-us...@ceph.com
> Sent: Thursday, 26 May, 2016 12:57:47 AM
> Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
> 
> There wasn't a package ceph-debuginfo available (maybe because I am running
> Ubuntu). I have installed these:
> 
>  * ceph-dbg
>  * librados2-dbg
> 
> There would also be ceph-mds-dbg, ceph-fs-common-dbg, and so on..
> 
> But now there is more information in the gdb output :)
> 
> (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug-
> rados=20
> Starting program: /usr/bin/python /usr/bin/ceph status --debug-
> monc=20
> --debug-ms=20 --debug-rados=20
> [Thread debugging using libthread_

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread Mathias Buresch
cs=1 l=1
c=0x7fffec05aa30).aborted = 0
2016-05-25 16:55:30.938413 7fffe3efe700 20 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader got 340 + 0 + 0 byte message
2016-05-25 16:55:30.938427 7fffe3efe700 10 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).No session security set
2016-05-25 16:55:30.938434 7fffe3efe700 10 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader got message 1 0x7fffd0001cb0 mon_map magic: 0
v1
2016-05-25 16:55:30.938442 7fffe3efe700 20 --
62.176.141.181:0/3663984981 queue 0x7fffd0001cb0 prio 196
2016-05-25 16:55:30.938450 7fffe3efe700 20 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader reading tag...
2016-05-25 16:55:30.938453 7fffe3fff700 10 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).writer: state = open policy.server=0
2016-05-25 16:55:30.938464 7fffe3fff700 10 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).write_ack 1
2016-05-25 16:55:30.938467 7fffe3efe700 20 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader got MSG
2016-05-25 16:55:30.938471 7fffe3fff700 10 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).writer: state = open policy.server=0
2016-05-25 16:55:30.938472 7fffe3efe700 20 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader got envelope type=18 src mon.0 front=33 data=0
off 0
2016-05-25 16:55:30.938475 7fffe3fff700 20 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).writer sleeping
2016-05-25 16:55:30.938476 7fffe3efe700 10 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader wants 33 from dispatch throttler 340/104857600
2016-05-25 16:55:30.938456 7fffea883700  1 --
62.176.141.181:0/3663984981 <== mon.0 62.176.141.181:6789/0 1 
mon_map magic: 0 v1  340+0+0 (3213884171 0 0) 0x7fffd0001cb0 con
0x7fffec05aa30
2016-05-25 16:55:30.938481 7fffe3efe700 20 --
62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0
pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1
c=0x7fffec05aa30).reader got front 33
2016-05-25 16:55:30.938484 7fffea883700 10 monclient(hunting):
handle_monmap mon_map magic: 0 v1
2016-05-25 16:55:30.938485 7fffe3efe700 10 
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea883700 (LWP 26749)]
0x73141a57 in encrypt (cct=,
error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at
auth/cephx/../Crypto.h:110
110 auth/cephx/../Crypto.h: No such file or directory.
(gdb) bt
#0  0x73141a57 in encrypt (cct=,
error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at
auth/cephx/../Crypto.h:110
#1  encode_encrypt_enc_bl (cct=,
error="", out=..., key=..., t=) at
auth/cephx/CephxProtocol.h:464
#2  encode_encrypt (cct=, error="",
out=..., key=..., t=) at
auth/cephx/CephxProtocol.h:489
#3  cephx_calc_client_server_challenge (cct=,
secret=..., server_challenge=9622349603176979543,
client_challenge=7732813711656640623, key=key@entry=0x7fffea8824a8,
ret="")
at auth/cephx/CephxProtocol.cc:36
#4  0x7313aff4 in CephxClientHandler::build_request
(this=0x7fffd4001520, bl=...) at auth/cephx/CephxClientHandler.cc:53
#5  0x72fe4a79 in MonClient::handle_auth (this=this@entry=0x7ff
fec006b70, m=m@entry=0x7fffd0002ee0) at mon/MonClient.cc:510
#6  0x72fe6507 in MonClient::ms_dispatch (this=0x7fffec006b70,
m=0x7fffd0002ee0) at mon/MonClient.cc:277
#7  0x730d5dc9 in ms_deliver_dispatch (m=0x7fffd0002ee0,
this=0x7fffec055410) at ./msg/Messenger.h:582
#8  DispatchQueue::entry (this=0x7fffec0555d8) at
msg/simple/DispatchQueue.cc:185
#9  0x731023bd in DispatchQueue::DispatchThread::entry
(this=) at msg/simple/DispatchQueue.h:103
#10 0x77bc4182 in start_thread () from /lib/x86_64-linux-
gnu/libpthread.so.0
#11 0x7ffff78f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6

-Original Message-
From: John Spray 
To: Mathias Buresch 
Cc: ceph-us...@ceph.com 
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Wed, 25 May 2016 15:41:51 +0100

On Wed, May 25, 2016 at 3:00 PM, Mathias Buresch
 wrote:
> 
> I don't know what exac

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread Mathias Buresch
hread 0x710f5700 (LWP 23403) exited]
[New Thread 0x710f5700 (LWP 23404)]
[Thread 0x710f5700 (LWP 23404) exited]
[New Thread 0x710f5700 (LWP 23405)]
[Thread 0x710f5700 (LWP 23405) exited]
[New Thread 0x710f5700 (LWP 23406)]
[Thread 0x710f5700 (LWP 23406) exited]
[New Thread 0x710f5700 (LWP 23407)]
[Thread 0x710f5700 (LWP 23407) exited]
[New Thread 0x710f5700 (LWP 23408)]
[New Thread 0x7fffeb885700 (LWP 23409)]
[New Thread 0x7fffeb084700 (LWP 23410)]
[New Thread 0x7fffea883700 (LWP 23411)]
[New Thread 0x7fffea082700 (LWP 23412)]
[New Thread 0x7fffe9881700 (LWP 23413)]
[New Thread 0x7fffe9080700 (LWP 23414)]
[New Thread 0x7fffe887f700 (LWP 23415)]
[New Thread 0x7fffe807e700 (LWP 23416)]
[New Thread 0x7fffe7f7d700 (LWP 23419)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea883700 (LWP 23411)]
0x73141a57 in ?? () from /usr/lib/librados.so.2
(gdb) bt
#0  0x73141a57 in ?? () from /usr/lib/librados.so.2
#1  0x7313aff4 in ?? () from /usr/lib/librados.so.2
#2  0x72fe4a79 in ?? () from /usr/lib/librados.so.2
#3  0x72fe6507 in ?? () from /usr/lib/librados.so.2
#4  0x730d5dc9 in ?? () from /usr/lib/librados.so.2
#5  0x731023bd in ?? () from /usr/lib/librados.so.2
#6  0x77bc4182 in start_thread () from /lib/x86_64-linux-
gnu/libpthread.so.0
#7  0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6
 

Does that help? I can't really see where the error is. :)

-Original Message-
From: John Spray 
To: Mathias Buresch 
Cc: ceph-us...@ceph.com 
Subject: Re: [ceph-users] Ceph Status - Segmentation Fault
Date: Wed, 25 May 2016 10:16:55 +0100

On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch
 wrote:
> 
> Please found the logs with higher debug level attached to this email.
You've attached the log from your mon, but it's not your mon that's
segfaulting, right?

You can use normal ceph command line flags to crank up the verbosity
on the CLI too (--debug-monc=20 --debug-ms=20 spring to mind).

You can also run the ceph CLI in gdb like this:
gdb python
(gdb) run /usr/bin/ceph status
... hopefully it crashes and then ...
(gdb) bt

Cheers,
John
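
If the resulting backtrace only shows "??" frames, as in the backtrace earlier in this message, the debug symbol packages mentioned elsewhere in this thread help; on Ubuntu that would presumably be something like:

apt-get install ceph-dbg librados2-dbg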

> 
> 
> 
> Kind regards
> Mathias
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



[ceph-users] Ceph Status - Segmentation Fault

2016-05-23 Thread Mathias Buresch
Hi there,
I updated Ceph to 0.94.7 and now I am getting segmentation faults.

When getting status via "ceph -s" or "ceph health detail" I am getting
an error "Segmentation fault".

I have only two monitor daemons, but I haven't had any problems with that yet.
Maybe the maintenance time was too long this time..?!

When getting the status via the admin socket I get the following for both:

ceph daemon mon.pix01 mon_status
{
"name": "pix01",
"rank": 0,
"state": "leader",
"election_epoch": 226,
"quorum": [
0,
1
],
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": {
"epoch": 1,
"fsid": "28af67eb-4060-4770-ac1d-d2be493877af",
"modified": "2014-11-12 15:44:27.182395",
"created": "2014-11-12 15:44:27.182395",
"mons": [
{
"rank": 0,
"name": "pix01",
"addr": "x.x.x.x:6789\/0"
},
{
"rank": 1,
"name": "pix02",
"addr": "x.x.x.x:6789\/0"
}
]
}
}

ceph daemon mon.pix02 mon_status
{
"name": "pix02",
"rank": 1,
"state": "peon",
"election_epoch": 226,
"quorum": [
0,
1
],
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": {
"epoch": 1,
"fsid": "28af67eb-4060-4770-ac1d-d2be493877af",
"modified": "2014-11-12 15:44:27.182395",
"created": "2014-11-12 15:44:27.182395",
"mons": [
{
"rank": 0,
"name": "pix01",
"addr": "x.x.x.x:6789\/0"
},
{
"rank": 1,
"name": "pix02",
"addr": "x.x.x.x:6789\/0"
}
]
}
}



Kind regards
Mathias



Re: [ceph-users] CephFS and Ubuntu Backport Kernel Problem

2016-04-12 Thread Mathias Buresch
Thank you so much Ilya!

This is exactly what I was searching for!!

-Original Message-
From: Ilya Dryomov 
To: Mathias Buresch 
Cc: ceph-us...@ceph.com 
Subject: Re: [ceph-users] CephFS and Ubuntu Backport Kernel Problem
Date: Tue, 12 Apr 2016 16:21:04 +0200

On Tue, Apr 12, 2016 at 4:08 PM, Mathias Buresch
 wrote:
> 
> Hi there,
> 
> I have an issue with using Ceph and Ubuntu Backport Kernel newer than
> 3.19.0-43.
> 
> Following setup I have:
> 
> Ubuntu 14.04
> Kernel 3.19.0-43 (Backport Kernel)
> Ceph 0.94.6
> 
> I am using CephFS! Kernel 3.19.0-43 was the last working kernel.
> Every newer kernel fails with a kernel panic or something similar.
> When starting the server, the processes themselves start normally, but
> when mounting CephFS (the kernel client - not FUSE!) it hangs and I
> can only restart the server.
> 
> Does anyone know about that issue, or whether it would be fixed if I
> upgraded to one of the newer Ceph versions?!

See

http://www.spinics.net/lists/ceph-devel/msg29504.html
http://tracker.ceph.com/issues/15302

and search for "[ceph-users] cephfs Kernel panic" thread from yesterday
here on ceph-users - archives haven't caught up yet.

Thanks,

Ilya



[ceph-users] CephFS and Ubuntu Backport Kernel Problem

2016-04-12 Thread Mathias Buresch

Hi there,

I have an issue with using Ceph and Ubuntu Backport Kernel newer than
3.19.0-43.

Following setup I have:

Ubuntu 14.04
Kernel 3.19.0-43 (Backport Kernel)
Ceph 0.94.6

I am using CephFS! Kernel 3.19.0-43 was the last working kernel.
Every newer kernel fails with a kernel panic or something similar.
When starting the server, the processes themselves start normally, but when
mounting CephFS (the kernel client - not FUSE!) it hangs and I can only
restart the server.
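
For reference, a sketch of the two client types being distinguished here (monitor host, paths, and credentials are placeholders, not values from this setup):

# kernel CephFS client (the mount type referred to above)
mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# FUSE client, for comparison
ceph-fuse -m mon-host:6789 /mnt/cephfs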

Does anyone know about that issue, or whether it would be fixed if I
upgraded to one of the newer Ceph versions?!


Greetz
Mathias



Re: [ceph-users] Ceph FS - MDS problem

2015-07-03 Thread Mathias Buresch

Hi Dan,

thanks for the quick reply!
I didn't read it in detail yet, but here are my first comments:


3.b BTW, our old friend updatedb seems to trigger the same problem..
grabbing caps very quickly as it indexes CephFS. updatedb.conf is
configured to PRUNEFS="... fuse ...", but CephFS has type
fuse.ceph-fuse. We'll need to add "ceph" to that list too.
This was my first thought, and I added the Ceph paths to PRUNEPATHS. But as you
said, maybe I have to add the FS type too..?!
>> PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/ceph/osd /mnt/ceph"
NEVERTHELESS, since then I haven't seen the mlocate process in 'D' state
anymore. Plus, given that the high load comes up together with the MDS / client
problem, I don't really think that updatedb is still the problem... but that's
just an assumption.
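
A minimal sketch of what that could look like in /etc/updatedb.conf (the ceph and fuse.ceph-fuse entries are the ones discussed above; the rest of PRUNEFS is the stock list, abbreviated):

PRUNEFS="NFS nfs nfs4 ... fuse fuse.ceph-fuse ceph"
PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/ceph/osd /mnt/ceph"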

4. "mds cache size = 500" is going to use a lot of memory! We have
an MDS with just 8GB of RAM and it goes OOM after delegating  around 1
million caps. (this is with mds cache size = 10, btw)
At least I never saw in 'htop' that much memory being used, or any other
resource being high... just the load rises.
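
A small sketch for correlating the configured cache size with what the MDS is actually holding, using the same admin socket commands that appear elsewhere in this thread (mds.ceph01 is the daemon name from this setup):

ceph daemon mds.ceph01 config show | grep mds_cache_size
ceph daemon mds.ceph01 perf dump

In the perf dump output, mds_mem -> rss and the caps counters are the fields to watch against the cache size.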



Best regards,

Mathias

 







[ceph-users] Ceph FS - MDS problem

2015-07-03 Thread Mathias Buresch

Hi there,

maybe you could be so kind as to help me with the following issue:

We are running Ceph FS, but there is a recurring problem with the MDS.

Sometimes the following error occurs: "mds0: Client 701782 failing to
respond to capability release"
Listing the session information shows that "num_caps" on that
client is much higher than on the other clients. (see also -> attachment)


The problem is that the load on one of the servers increases to a
really high value (80 to 100), independent of which client is complaining.


I guess my problem is also that I don't really understand the meaning of
those "capabilities".


A few facts (let me know if you need more):

 * CEPH-FS-Client, MDS, MON, OSD all on same server
 * Kernel-Client (Kernel: 3.14.16-031416-generic)
 * MDS config
 o only raised "mds cache size = 500" (because before there
   was the error "failing to respond to cache pressure")


Best regards
Mathias



 




# CEPH FS ERROR

09:33:30 PROD root@ceph01:~# ceph -s
cluster xxx
 health HEALTH_WARN
mds0: Client 701782 failing to respond to capability release
 monmap e1: 3 mons at 
{ceph01=xx.xx.xx.114:6789/0,ceph02=xx.xx.xx.115:6789/0,ceph03=xx.xx.xx.116:6789/0}
election epoch 106, quorum 0,1,2 ceph01,ceph02,ceph03
 mdsmap e260: 1/1/1 up {0=ceph01=up:active}, 2 up:standby
 .


-> Load rises immediately


09:33:32 PROD root@ceph01:~# ceph daemon mds.ceph01 session ls
[
{
"id": 701782,
"num_leases": 16,
"num_caps": 221397,
"state": "open",
"replay_requests": 0,
"reconnecting": false,
"inst": "client.701782 xx.xx.xx.114:0\/1344307356",
"client_metadata": {}
},
{
"id": 692103,
"num_leases": 1,
"num_caps": 50115,
"state": "open",
"replay_requests": 0,
"reconnecting": false,
"inst": "client.692103 xx.xx.xx.117:0\/3600471798",
"client_metadata": {}
},
{
"id": 691995,
"num_leases": 2,
"num_caps": 53227,
"state": "open",
"replay_requests": 0,
"reconnecting": false,
"inst": "client.691995 xx.xx.xx.115:0\/1220606159",
"client_metadata": {}
},
{
"id": 692058,
"num_leases": 8,
"num_caps": 49722,
"state": "open",
"replay_requests": 0,
"reconnecting": false,
"inst": "client.692058 xx.xx.xx.116:0\/4048537076",
"client_metadata": {}
}
]


09:38:18 PROD root@ceph01:~# ceph daemon mds.ceph01 perf dump
{
"mds": {
"request": 1387754,
"reply": 1387696,
"reply_latency": {
"avgcount": 1387696,
"sum": 6439.991891758
},
"forward": 0,
"dir_fetch": 57946,
"dir_commit": 35053,
"dir_split": 0,
"inode_max": 500,
"inodes": 1116643,
"inodes_top": 837156,
"inodes_bottom": 279487,
"inodes_pin_tail": 0,
"inodes_pinned": 292936,
"inodes_expired": 0,
"inodes_with_caps": 269668,
"caps": 374718,
"subtrees": 2,
"traverse": 2591500,
"traverse_hit": 2492810,
"traverse_forward": 0,
"traverse_discover": 0,
"traverse_dir_fetch": 19330,
"traverse_remote_ino": 0,
"traverse_lock": 2350,
"load_cent": 138774897,
"q": 0,
"exported": 0,
"exported_inodes": 0,
"imported": 0,
"imported_inodes": 0
},
"mds_cache": {
"num_strays": 56,
"num_strays_purging": 0,
"num_strays_delayed": 0,
"strays_created": 2835,
"strays_purged": 2802,
"num_recovering_processing": 0,
"num_recovering_enqueued": 0,
"num_recovering_prioritized": 0,
"recovery_started": 0,
"recovery_completed": 0
},
"mds_log": {
"evadd": 376174,
"evex": 377829,
"evtrm": 377829,
"ev": 13815,
"evexg": 0,
"evexd": 1024,
"segadd": 738,
"segex": 738,
"segtrm": 738,
"seg": 31,
"segexg": 0,
"segexd": 1,
"expos": 6882857746,
"wrpos": 6991387600,
"rdpos": 4859818564,
"jlat": 0
},
"mds_mem": {
"ino": 1112733,
"ino+": 1115537,
"ino-": 2804,
"dir": 66813,
"dir+": 67017,
"dir-": 204,
"dn": 1116643,
"dn+": 1121224,
"dn-": 4581,
"cap": 374718,
"cap+": 1005845,
"cap-": 631127,
"rss": 6992420,
"heap": 49060,
"malloc": 18446744073708021059,
"buf": 0
},
"mds_server": {
"handle_client_request": 1387754,
"handle_slave_request": 0,
"handle_client_session": 80950,
"dispatch_client_request": 2526245,
"dispatch_server_request": 0
},
"objecter": {
"op_active": 0,