Thank you for your reply

I had read the 'mds crashing' thread and I don't think I'm seeing that bug (http://tracker.ceph.com/issues/10449).

I have enabled "debug objecter = 10" and here is the full log from starting the MDS: http://pastebin.com/dbk0uLYy
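
(For anyone following along, that setting can go in the [mds] section of ceph.conf, e.g.:

[mds]
    debug objecter = 10
)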

Here is the last part of the log:


-35> 2015-05-29 09:28:23.104098 7f78cdcde700 10 mds.0.objecter ms_handle_connect 0x3f43440
-34> 2015-05-29 09:28:23.104555 7f78cdcde700 10 mds.0.objecter ms_handle_connect 0x3f43860
-33> 2015-05-29 09:28:23.105016 7f78cdcde700 10 mds.0.objecter ms_handle_connect 0x3f43de0
-32> 2015-05-29 09:28:23.105350 7f78c57ad700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(25 10000000064.00000002 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-31> 2015-05-29 09:28:23.105375 7f78c57ad700 10 mds.0.objecter in handle_osd_op_reply
-30> 2015-05-29 09:28:23.105378 7f78c57ad700 7 mds.0.objecter handle_osd_op_reply 25 ondisk v 0'0 uv 0 in 11.2a2643ed attempt 1
-29> 2015-05-29 09:28:23.105381 7f78c57ad700 10 mds.0.objecter op 0 rval -95 len 0
-28> 2015-05-29 09:28:23.105387 7f78c57ad700 5 mds.0.objecter 1 unacked, 4 uncommitted
-27> 2015-05-29 09:28:23.105678 7f78c55ab700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(26 10000000064.00000003 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-26> 2015-05-29 09:28:23.105696 7f78c55ab700 10 mds.0.objecter in handle_osd_op_reply
-25> 2015-05-29 09:28:23.105699 7f78c55ab700 7 mds.0.objecter handle_osd_op_reply 26 ondisk v 0'0 uv 0 in 11.beb48626 attempt 1
-24> 2015-05-29 09:28:23.105702 7f78c55ab700 10 mds.0.objecter op 0 rval -95 len 0
-23> 2015-05-29 09:28:23.105708 7f78c55ab700 5 mds.0.objecter 1 unacked, 3 uncommitted
-22> 2015-05-29 09:28:23.106134 7f78c54aa700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(27 10000000064.00000001 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-21> 2015-05-29 09:28:23.106152 7f78c54aa700 10 mds.0.objecter in handle_osd_op_reply
-20> 2015-05-29 09:28:23.106155 7f78c54aa700 7 mds.0.objecter handle_osd_op_reply 27 ondisk v 0'0 uv 0 in 11.4a09fd98 attempt 1
-19> 2015-05-29 09:28:23.106158 7f78c54aa700 10 mds.0.objecter op 0 rval -95 len 0
-18> 2015-05-29 09:28:23.106163 7f78c54aa700 5 mds.0.objecter 1 unacked, 2 uncommitted
-17> 2015-05-29 09:28:23.106524 7f78c53a9700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(28 10000000064.00000000 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-16> 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in handle_osd_op_reply
-15> 2015-05-29 09:28:23.106543 7f78c53a9700 7 mds.0.objecter handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
-14> 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter op 0 rval -95 len 0
-13> 2015-05-29 09:28:23.106552 7f78c53a9700 5 mds.0.objecter 1 unacked, 1 uncommitted
-12> 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(29 10000000064.00000004 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-11> 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in handle_osd_op_reply
-10> 2015-05-29 09:28:23.106973 7f78c52a8700 7 mds.0.objecter handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9> 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter op 0 rval -95 len 0
-8> 2015-05-29 09:28:23.106980 7f78c52a8700 5 mds.0.objecter 1 unacked, 0 uncommitted
-7> 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(30 1.00000000 [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6> 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in handle_osd_op_reply
-5> 2015-05-29 09:28:23.107309 7f78c69bf700 7 mds.0.objecter handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4> 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter op 0 rval 0 len 222
-3> 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter op 1 rval 0 len 4
-2> 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter op 1 handler 0x3e316b0
-1> 2015-05-29 09:28:23.107321 7f78c69bf700 5 mds.0.objecter 0 unacked, 0 uncommitted
0> 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f78cb4d9700 time 2015-05-29 09:28:23.107027
mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)
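
So the trimtrunc replies are coming back with -95 (EOPNOTSUPP) while the truncate-finish callback only tolerates 0 or -2 (ENOENT). Just to illustrate my reading of the assert (a rough sketch of the quoted check, not the actual Ceph source in mds/MDCache.cc):

#include <cassert>

// Rough stand-in for C_IO_MDC_TruncateFinish::finish(int r); r is the
// return value the objecter hands back for the trimtrunc op.
void truncate_finish_sketch(int r) {
    // Only success (0) or ENOENT (-2) are expected here, so the -95
    // ((95) Operation not supported) seen in the log trips the assert.
    assert(r == 0 || r == -2);
}

int main() {
    truncate_finish_sketch(-95);  // reproduces "FAILED assert(r == 0 || r == -2)"
    return 0;
}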



On 28/05/15 17:43, John Spray wrote:

(This came up as in-reply-to to the previous "mds crashing" thread -- it's better to start threads with a fresh message)



On 28/05/2015 16:58, Peter Tiernan wrote:
Hi all,

I have been testing CephFS with an erasure-coded pool and a cache tier. I have 3 MDSs running on the same physical server as the 3 mons. The cluster is otherwise in an OK state: RBD is working and all PGs are active+clean. I'm running v0.87.2 (Giant) on all nodes with Ubuntu 14.04.2.

The cluster was working fine, but while copying a large file from a client to CephFS it froze, and now the MDSs keep crashing with:

0> 2015-05-28 16:50:58.267112 7f0282946700 -1 mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f0282946700 time 2015-05-28 16:50:58.243904
mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)

Any ideas?

You're getting some kind of IO error from RADOS, and the CephFS code doesn't have clean handling for that in many cases, so it's asserting out.

Enable "debug objecter = 10" on the MDS to see what the operation is that's failing, and please provide the whole section of the log leading up to the crash rather than just the last line.

Cheers,
John



--
Peter Tiernan, Storage Engineer, Digital Repository of Ireland (DRI)
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/  | ptier...@tchpc.tcd.ie
Tel: +353-1-896-4466

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
