Ceph mon crash
2012/3/20 Greg Farnum <gregory.far...@dreamhost.com>:
> On Monday, March 19, 2012 at 11:44 AM, ruslan usifov wrote:
>> Sorry, but no: I use the precompiled binaries from http://ceph.newdream.net/debian.
>> Perhaps this helps: initially I configured all the Ceph services (mon, mds, osd), but then I tested only RBD and removed all the MDSes from the cluster (3 VMware machines) with the following command:
>>
>>   ceph mds rm 1
>>
>> (I am writing these lines from memory, so I may have the syntax wrong.)
>
> Oh. That's a fun command! Where on earth did you find it documented? Unfortunately, it's only supposed to be used when things get weird. (And really, I'm not sure when it would be appropriate.) If you run it on a healthy cluster, it will break things. I created a bug to make it not do that: http://tracker.newdream.net/issues/2188

I found it in the source. I wanted to get rid of the warning messages that appear when I monitor the cluster with:

  ceph -w

There were messages saying that I have one MDS down (actually I don't have any).

> If necessary I can figure out how to create a good MDSMap and inject it into your monitors, but I'd rather not if you don't have any data in there. (In which case, reformat the cluster.)

This is a test environment, so I reformatted the cluster manually. I should also say that the mons did not die immediately when I ran "ceph mds rm 1", but only after some time.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
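The corruption Greg describes can be pictured with a toy model. This is a hedged sketch in Python with made-up field names (`up`, `mds_info`, the gid `4100`), not Ceph's actual data structures: removing a daemon's info entry while its rank is still marked up leaves a dangling reference that nothing notices until the map is walked.

```python
# Toy model (assumed names, NOT Ceph's real data structures) of how
# removing an MDS entry can leave the map internally inconsistent:
# the daemon's info entry is dropped while the "up" table still
# references its gid.

mdsmap = {
    "up": {0: 4100},                    # rank 0 -> gid 4100
    "mds_info": {4100: {"name": "a"}},  # gid 4100 -> daemon info
}

def mds_rm(m, gid):
    # Drops only the info entry; the rank in "up" keeps pointing at gid.
    m["mds_info"].pop(gid, None)

mds_rm(mdsmap, 4100)

# Nothing crashes yet: the inconsistency lies dormant until something
# (like a health check) actually walks the map.
dangling = [gid for gid in mdsmap["up"].values()
            if gid not in mdsmap["mds_info"]]
print(dangling)  # [4100]
```

This matches the symptom reported above: the mons survive the `ceph mds rm` itself and only die later, once the inconsistent map is consulted.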
Re: Ceph mon crash
On Tuesday, March 20, 2012 at 1:02 AM, ruslan usifov wrote:
> I found it in the source. I wanted to get rid of the warning messages that appear when I monitor the cluster with:
>
>   ceph -w
>
> There were messages saying that I have one MDS down (actually I don't have any).

Ah. I'm not sure we actually support removal of all the MDSes once you start one up. Given the prevalence of RBD users we probably should, though! Bug filed: http://tracker.newdream.net/issues/2195
I *think* that if you don't ever create an MDS that line won't show up; somebody who runs an RBD cluster could tell you for sure. :)

>> If necessary I can figure out how to create a good MDSMap and inject it into your monitors, but I'd rather not if you don't have any data in there. (In which case, reformat the cluster.)
>
> This is a test environment, so I reformatted the cluster manually. I should also say that the mons did not die immediately when I ran "ceph mds rm 1", but only after some time.

Yes: when you ran the mds rm command you corrupted your MDSMap, but the system doesn't notice right away, since the map is not being accessed by clients or your servers. But eventually you (or, more likely, a monitoring service) ran the "ceph health" command, which made the mon look at the MDSMap, which caused an assert. Then the ceph tool tried again on a different monitor, and so on.
-Greg
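The cascade Greg describes can be sketched as a small simulation. This is plain illustrative Python, not Ceph code, and all names (`Monitor`, `ceph_health`, `mon.a`) are invented: the health command aborts whichever monitor handles it, and the client's retry logic then takes down the rest of the quorum one by one.

```python
# Illustrative sketch (NOT Ceph code) of the failure cascade: a health
# check makes whichever monitor handles it evaluate the corrupted map,
# the assert aborts that monitor, and the client retries the next one
# until no monitors are left alive.

class Monitor:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle_health(self):
        # Model of the assert firing: evaluating the corrupted MDSMap
        # aborts the daemon instead of returning a health report.
        self.alive = False
        raise ConnectionError(f"{self.name} crashed")

def ceph_health(monitors):
    # The client walks the monitor list, retrying on each failure.
    for mon in monitors:
        if not mon.alive:
            continue
        try:
            return mon.handle_health()
        except ConnectionError:
            continue  # fall through to the next monitor
    return "no monitors reachable"

mons = [Monitor(n) for n in ("mon.a", "mon.b", "mon.c")]
print(ceph_health(mons))        # no monitors reachable
print([m.alive for m in mons])  # [False, False, False]
```

One `ceph health` invocation is enough to kill every monitor, which is exactly the "one time in cluster not any alive mons" symptom from earlier in the thread.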
Re: Ceph mon crash
On Monday, March 19, 2012 at 7:33 AM, ruslan usifov wrote:
> Hello
>
> I have the following stack trace:
>
> #0  0xb77fa424 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xb77fa424 in __kernel_vsyscall ()
> #1  0xb77e98a0 in raise () from /lib/i386-linux-gnu/libpthread.so.0
> #2  0x08230f8b in ?? ()
> #3  <signal handler called>
> #4  0xb77fa424 in __kernel_vsyscall ()
> #5  0xb70eae71 in raise () from /lib/i386-linux-gnu/libc.so.6
> #6  0xb70ee34e in abort () from /lib/i386-linux-gnu/libc.so.6
> #7  0xb73130b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/i386-linux-gnu/libstdc++.so.6
> #8  0xb7310fa5 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
> #9  0xb7310fe2 in std::terminate() () from /usr/lib/i386-linux-gnu/libstdc++.so.6
> #10 0xb731114e in __cxa_throw () from /usr/lib/i386-linux-gnu/libstdc++.so.6
> #11 0x0822f8c7 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
> #12 0x081cf8a4 in MDSMap::get_health(std::basic_ostream<char, std::char_traits<char> >&) const ()
> #13 0x0811e8a7 in MDSMonitor::get_health(std::basic_ostream<char, std::char_traits<char> >&) const ()
> #14 0x080c4977 in Monitor::handle_command(MMonCommand*) ()
> #15 0x080cf244 in Monitor::_ms_dispatch(Message*) ()
> #16 0x080df1a4 in Monitor::ms_dispatch(Message*) ()
> #17 0x081f706d in SimpleMessenger::dispatch_entry() ()
> #18 0x080b27d2 in SimpleMessenger::DispatchThread::entry() ()
> #19 0x081b5d81 in Thread::_entry_func(void*) ()
> #20 0xb77e0e99 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
> #21 0xb71919ee in clone () from /lib/i386-linux-gnu/libc.so.6

Can you get the line number from frame 12? ("f 12", Enter, then just paste the output.) Also the output of "ceph -s", if things are still running. The only assert I see in get_health() is that each up MDS be in mds_info, which really ought to be true….

> And when one mon crashes, all the other monitors in the cluster crash too :-(( So at some point there are no live mons left in the cluster.

Yeah, this is because the crash is being triggered by a get_health command, and it's trying it out on each monitor in turn as they fail.
-Greg
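The invariant Greg points at (each up MDS must be in mds_info, per frames 11-12 of the backtrace) can be mimicked in a few lines. This is a hedged Python sketch with illustrative names and values (`get_health`, gid `4100`), not the actual C++ in MDSMap::get_health():

```python
# Sketch (illustrative, NOT Ceph's C++) of the invariant asserted in
# MDSMap::get_health(): every MDS listed as "up" must have a matching
# entry in mds_info. On a map where the entry was removed but the rank
# is still "up", the assert fires and (in the real daemon) aborts.

def get_health(up, mds_info):
    """Mimic the assert: every up rank's gid must appear in mds_info."""
    for rank, gid in up.items():
        assert gid in mds_info, f"up rank {rank} has no mds_info entry"
    return "HEALTH_OK"

# Consistent map: the check passes.
print(get_health({0: 4100}, {4100: {"name": "a", "state": "active"}}))

# Corrupted map: the info entry is gone but rank 0 is still "up".
try:
    get_health({0: 4100}, {})
except AssertionError as e:
    print("assert fired:", e)
```

In the monitor daemon the failed assert ends in abort() (frames 6 and 11 above), which is why the process dies rather than just reporting bad health.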
Re: Ceph mon crash
2012/3/19 Greg Farnum <gregory.far...@dreamhost.com>:
> On Monday, March 19, 2012 at 7:33 AM, ruslan usifov wrote:
>> Hello
>>
>> I have the following stack trace:
>>
>> [stack trace snipped; see the earlier message in this thread]
>
> Can you get the line number from frame 12? ("f 12", Enter, then just paste the output.) Also the output of "ceph -s", if things are still running. The only assert I see in get_health() is that each up MDS be in mds_info, which really ought to be true….

Sorry, but no: I use the precompiled binaries from http://ceph.newdream.net/debian.

Perhaps this helps: initially I configured all the Ceph services (mon, mds, osd), but then I tested only RBD and removed all the MDSes from the cluster (3 VMware machines) with the following command:

  ceph mds rm 1

(I am writing these lines from memory, so I may have the syntax wrong.)

>> And when one mon crashes, all the other monitors in the cluster crash too :-(( So at some point there are no live mons left in the cluster.
>
> Yeah, this is because the crash is being triggered by a get_health command, and it's trying it out on each monitor in turn as they fail.
> -Greg
Re: Ceph mon crash
On Monday, March 19, 2012 at 11:44 AM, ruslan usifov wrote:
> Sorry, but no: I use the precompiled binaries from http://ceph.newdream.net/debian.
> Perhaps this helps: initially I configured all the Ceph services (mon, mds, osd), but then I tested only RBD and removed all the MDSes from the cluster (3 VMware machines) with the following command:
>
>   ceph mds rm 1
>
> (I am writing these lines from memory, so I may have the syntax wrong.)

Oh. That's a fun command! Where on earth did you find it documented? Unfortunately, it's only supposed to be used when things get weird. (And really, I'm not sure when it would be appropriate.) If you run it on a healthy cluster, it will break things. I created a bug to make it not do that: http://tracker.newdream.net/issues/2188

If necessary I can figure out how to create a good MDSMap and inject it into your monitors, but I'd rather not if you don't have any data in there. (In which case, reformat the cluster.)
-Greg