Hi Greg,

Thank you for your concern.

It seems that the problem was caused by ceph-mds. While the rest of the Ceph modules had been upgraded to 0.61.8, ceph-mds was still at 0.56.7.

I've updated ceph-mds and the cluster stabilised within a few hours.
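
For anyone hitting the same mismatch after a partial upgrade, here is a rough sketch of the kind of check that would have caught it. It assumes the version of each daemon has already been collected by hand into a dictionary; the `find_version_mismatches` helper and the inventory below are hypothetical and only illustrate the idea. The two version numbers are the ones from this cluster.

```python
from collections import defaultdict


def find_version_mismatches(daemon_versions):
    """Group daemons by reported version.

    Returns an empty dict when every daemon reports the same version,
    otherwise a dict mapping each version to the sorted daemon names.
    """
    by_version = defaultdict(list)
    for daemon, version in daemon_versions.items():
        by_version[version].append(daemon)
    if len(by_version) <= 1:
        return {}
    return {version: sorted(names) for version, names in by_version.items()}


# Hypothetical inventory, filled in by hand for illustration.
versions = {
    "ceph-mon": "0.61.8",
    "ceph-osd": "0.61.8",
    "ceph-mds": "0.56.7",
}

mismatches = find_version_mismatches(versions)
for version, daemons in sorted(mismatches.items()):
    print(version, ", ".join(daemons))
```

Any non-empty result means at least one daemon was left behind by the upgrade and is worth checking before digging into the logs.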

Kind regards, Serge

On 08/30/2013 08:22 PM, Gregory Farnum wrote:
Can you start up your mds with "debug mds = 20" and "debug ms = 20"?
The "failed to decode message" line is suspicious but there's not
enough context here for me to be sure, and my pattern-matching isn't
reminding me of any serious bugs.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Aug 29, 2013 at 3:10 AM, Serge Slipchenko
<serge.slipche...@zoral.com.ua> wrote:
Hi,

I upgraded Ceph from Bobtail to Cuttlefish and everything seemed good.
Then I started writing to cephfs, but at some point the writes stalled.
Since then I have not been able to mount it, either with the kernel
driver or with a custom utility.

ceph -s shows that everything is good.

health HEALTH_OK
monmap e2: 2 mons at {m01=5.9.118.83:6789/0,m02=5.9.122.115:6789/0},
election epoch 1320, quorum 0,1 m01,m02
osdmap e3967: 16 osds: 16 up, 16 in
pgmap v1315932: 256 pgs: 255 active+clean, 1 active+clean+scrubbing; 215
GB data, 448 GB used, 38441 GB / 40971 GB avail; 37585KB/s rd, 1op/s
mdsmap e774: 1/1/1 up {0=m02=up:active}, 1 up:standby

But in the mds.a log I see the following messages:

2013-08-29 10:06:34.371166 7f49e68aa700  0 -- 5.9.122.115:6807/1077 >>
91.193.166.194:0/2272475298 pipe(0x8de3780 sd=74 :6807 s=0 pgs=0 cs=0
l=0).accept peer addr is really 91.193.166.194:0/2272475298 (socket is
91.193.166.194:56649/0)
2013-08-29 10:07:38.454659 7f49e68aa700  0 -- 5.9.122.115:6807/1077 >>
91.193.166.194:0/2272475298 pipe(0x8de3780 sd=74 :6807 s=2 pgs=2 cs=1
l=0).fault, server, going to standby
2013-08-29 10:23:06.898089 7f49e60a2700  0 -- 5.9.122.115:6807/1077 >>
91.193.166.194:0/3930317661 pipe(0x7442c000 sd=78 :6807 s=0 pgs=0 cs=0
l=0).accept peer addr is really 91.193.166.194:0/3930317661 (socket is
91.193.166.194:56272/0)
2013-08-29 10:24:07.384136 7f49e60a2700  0 -- 5.9.122.115:6807/1077 >>
91.193.166.194:0/3930317661 pipe(0x7442c000 sd=78 :6807 s=2 pgs=2 cs=1
l=0).fault, server, going to standby
2013-08-29 10:30:21.177807 7f49e5c9e700  0 -- 5.9.122.115:6807/1077 >>
91.193.166.194:0/1838286378 pipe(0x73bd8a00 sd=80 :6807 s=0 pgs=0 cs=0
l=0).accept peer addr is really 91.193.166.194:0/1838286378 (socket is
91.193.166.194:59069/0)
2013-08-29 10:31:21.300004 7f49e5c9e700  0 -- 5.9.122.115:6807/1077 >>
91.193.166.194:0/1838286378 pipe(0x73bd8a00 sd=80 :6807 s=2 pgs=2 cs=1
l=0).fault, server, going to standby
2013-08-29 11:17:17.331613 7f040de6b700  0 -- 5.9.122.115:6807/7622 >>
91.193.166.194:0/2689145238 pipe(0x13ea780 sd=34 :6807 s=2 pgs=2 cs=1
l=0).fault with nothing to send, going to standby
2013-08-29 11:22:08.137711 7f0411897700  0 log [INF] : closing stale
session client.76201 91.193.166.194:0/2689145238 after 304.270364

And mds.b outputs a lot of:

2013-08-29 12:04:58.743938 7fa75604d700 -1 failed to decode message of
type 23 v2: buffer::end_of_buffer
2013-08-29 12:04:58.743969 7fa75604d700  0 -- 5.9.122.115:6800/977 >>
144.76.13.103:0/925435369 pipe(0x524e780 sd=39 :6800 s=2 pgs=130763
cs=12829 l=0).fault with nothing to send, going to standby
2013-08-29 12:04:58.744236 7fa755f4c700  0 -- 5.9.122.115:6800/977 >>
144.76.13.102:0/2955281877 pipe(0x524e500 sd=37 :6800 s=0 pgs=0 cs=0
l=0).accept connect_seq 12834 vs existing 12833 state standby
2013-08-29 12:04:58.744607 7fa756754700  0 -- 5.9.122.115:6800/977 >>
144.76.13.105:0/347604456 pipe(0x52c5a00 sd=38 :6800 s=0 pgs=0 cs=0
l=0).accept connect_seq 12538 vs existing 12537 state standby
2013-08-29 12:04:58.744627 7fa755f4c700 -1 failed to decode message of
type 23 v2: buffer::end_of_buffer
2013-08-29 12:04:58.744671 7fa755f4c700  0 -- 5.9.122.115:6800/977 >>
144.76.13.102:0/2955281877 pipe(0x524e500 sd=37 :6800 s=2 pgs=292532
cs=12835 l=0).fault with nothing to send, going to standby
2013-08-29 12:04:58.745006 7fa75614e700  0 -- 5.9.122.115:6800/977 >>
144.76.13.103:0/925435369 pipe(0x52c5780 sd=31 :6800 s=0 pgs=0 cs=0
l=0).accept connect_seq 12830 vs existing 12829 state standby
2013-08-29 12:04:58.745102 7fa756754700 -1 failed to decode message of
type 23 v2: buffer::end_of_buffer
2013-08-29 12:04:58.745146 7fa756754700  0 -- 5.9.122.115:6800/977 >>
144.76.13.105:0/347604456 pipe(0x52c5a00 sd=38 :6800 s=2 pgs=131368
cs=12539 l=0).fault with nothing to send, going to standby


--
Kind regards, Serge Slipchenko
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Serge Slipchenko
E-mail: serge.slipche...@zoral.com.ua
Skype: serge.slipchenko