Ceph mon crash

2012-03-20 Thread ruslan usifov
2012/3/20 Greg Farnum gregory.far...@dreamhost.com:
 On Monday, March 19, 2012 at 11:44 AM, ruslan usifov wrote:
 Sorry but no, I use precompiled binaries from
 http://ceph.newdream.net/debian. Perhaps this helps: initially I
 configured all the ceph services (mon, mds, osd), but then I tested only
 rbd and removed all MDSes from the cluster (3 VMware machines) through
 the following command:

 ceph mds rm 1 (I wrote this line from memory, so the syntax may be wrong)

 Oh. That's a fun command! Where on earth did you find it documented?
 Unfortunately, it's only supposed to be used when things get weird. (And 
 really, I'm not sure when it would be appropriate.) If you run it on a 
 healthy cluster, it will break things.
 I created a bug to make it not do that: 
 http://tracker.newdream.net/issues/2188


I found it in the source. I want to get rid of the warning messages which
appear when I monitor the cluster with the following:

ceph -w

There were messages saying that I have one down MDS (actually I don't have any).
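
For reference, the MDS map state behind that warning can be inspected
directly; a sketch, assuming the mds stat and mds dump subcommands are
available in this build:

ceph mds stat   # one-line summary of the MDS map (how many MDSes are up/in)
ceph mds dump   # full MDS map: epoch, ranks, and per-daemon state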


 If necessary I can figure out how to create a good MDSMap and inject it into 
 your monitors, but I'd rather not if you don't have any data in there. (In 
 which case, reformat the cluster.)


This is a test environment, so I will just reformat the cluster manually. Also I
must say that the mons did not die immediately when I ran (ceph mds rm 1), but
only after some time.


Re: Ceph mon crash

2012-03-20 Thread Greg Farnum
On Tuesday, March 20, 2012 at 1:02 AM, ruslan usifov wrote:
 I found it in the source. I want to get rid of the warning messages which
 appear when I monitor the cluster with the following:
 
 ceph -w
 
 There were messages saying that I have one down MDS (actually I don't have any).
Ah. I'm not sure we actually support removal of all the MDSes once you start 
one up. Given the prevalence of RBD users we probably should, though! Bug 
filed: http://tracker.newdream.net/issues/2195


I *think* that if you don't ever create an MDS that line won't show up; 
somebody who runs an RBD cluster could tell you for sure. :)
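
A minimal sketch of what an RBD-only ceph.conf might look like, with no
[mds] sections at all (hostnames, addresses, and paths here are
placeholders, not taken from any real cluster):

[global]
        auth supported = cephx
[mon.a]
        host = node-a
        mon addr = 10.0.0.1:6789
[osd.0]
        host = node-a
        osd data = /srv/ceph/osd.0
; note: no [mds.N] section, so no MDS is ever created and no down-MDS
; warning should appear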
 
  If necessary I can figure out how to create a good MDSMap and inject it 
  into your monitors, but I'd rather not if you don't have any data in there. 
  (In which case, reformat the cluster.)
 
 This is a test environment, so I will just reformat the cluster manually. Also I
 must say that the mons did not die immediately when I ran (ceph mds rm 1), but
 only after some time.

Yes, when you ran the mds rm command you corrupted your MDSMap, but the system 
doesn't notice right away since the map is not being accessed by clients or 
your servers. But eventually you (or a monitoring service, more likely) ran the 
ceph health command, which made the mon look at the MDSMap, which caused an 
assert. Then the ceph tool tried again on a different monitor, etc.
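
So the failure sequence, as a sketch:

ceph mds rm 1   # corrupts the MDSMap; nothing reads the map yet, so all looks fine
# ...some time passes...
ceph health     # the first mon runs get_health(), hits the assert, and dies;
                # the tool retries on the next mon, which dies the same way,
                # until no monitors are left alive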



-Greg 



Re: Ceph mon crash

2012-03-19 Thread Greg Farnum
On Monday, March 19, 2012 at 7:33 AM, ruslan usifov wrote:
 Hello
  
 I have the following stack trace:
  
 #0 0xb77fa424 in __kernel_vsyscall ()
 (gdb) bt
 #0 0xb77fa424 in __kernel_vsyscall ()
 #1 0xb77e98a0 in raise () from /lib/i386-linux-gnu/libpthread.so.0
 #2 0x08230f8b in ?? ()
 #3 signal handler called
 #4 0xb77fa424 in __kernel_vsyscall ()
 #5 0xb70eae71 in raise () from /lib/i386-linux-gnu/libc.so.6
 #6 0xb70ee34e in abort () from /lib/i386-linux-gnu/libc.so.6
 #7 0xb73130b5 in __gnu_cxx::__verbose_terminate_handler() () from
 /usr/lib/i386-linux-gnu/libstdc++.so.6
 #8 0xb7310fa5 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
 #9 0xb7310fe2 in std::terminate() () from
 /usr/lib/i386-linux-gnu/libstdc++.so.6
 #10 0xb731114e in __cxa_throw () from /usr/lib/i386-linux-gnu/libstdc++.so.6
 #11 0x0822f8c7 in ceph::__ceph_assert_fail(char const*, char const*,
 int, char const*) ()
 #12 0x081cf8a4 in MDSMap::get_health(std::basic_ostream<char, std::char_traits<char> >&) const ()
 #13 0x0811e8a7 in MDSMonitor::get_health(std::basic_ostream<char, std::char_traits<char> >&) const ()
 #14 0x080c4977 in Monitor::handle_command(MMonCommand*) ()
 #15 0x080cf244 in Monitor::_ms_dispatch(Message*) ()
 #16 0x080df1a4 in Monitor::ms_dispatch(Message*) ()
 #17 0x081f706d in SimpleMessenger::dispatch_entry() ()
 #18 0x080b27d2 in SimpleMessenger::DispatchThread::entry() ()
 #19 0x081b5d81 in Thread::_entry_func(void*) ()
 #20 0xb77e0e99 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
 #21 0xb71919ee in clone () from /lib/i386-linux-gnu/libc.so.6

Can you get the line number from frame 12? (f 12 enter, then just paste the 
output) Also the output of ceph -s if things are still running. The only 
assert I see in get_health() is that each up MDS be in mds_info, which really 
ought to be true….
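
For reference, a sketch of the gdb session (assuming the core file is
still around and debug symbols are installed, so file and line can be
shown):

(gdb) f 12      # select frame 12; gdb prints its source file and line
(gdb) bt full   # optionally: the full backtrace with local variables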
  
 And when one mon crashes, all the other monitors in the cluster crash
 too :-((. So at some point there are no alive mons in the cluster.

Yeah, this is because the crash is being triggered by a get_health command and 
it's trying it out on each monitor in turn as they fail.
-Greg



Re: Ceph mon crash

2012-03-19 Thread ruslan usifov
2012/3/19 Greg Farnum gregory.far...@dreamhost.com:
 On Monday, March 19, 2012 at 7:33 AM, ruslan usifov wrote:
 Hello

 I have the following stack trace:

 #0 0xb77fa424 in __kernel_vsyscall ()
 (gdb) bt
 #0 0xb77fa424 in __kernel_vsyscall ()
 #1 0xb77e98a0 in raise () from /lib/i386-linux-gnu/libpthread.so.0
 #2 0x08230f8b in ?? ()
 #3 signal handler called
 #4 0xb77fa424 in __kernel_vsyscall ()
 #5 0xb70eae71 in raise () from /lib/i386-linux-gnu/libc.so.6
 #6 0xb70ee34e in abort () from /lib/i386-linux-gnu/libc.so.6
 #7 0xb73130b5 in __gnu_cxx::__verbose_terminate_handler() () from
 /usr/lib/i386-linux-gnu/libstdc++.so.6
 #8 0xb7310fa5 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
 #9 0xb7310fe2 in std::terminate() () from
 /usr/lib/i386-linux-gnu/libstdc++.so.6
 #10 0xb731114e in __cxa_throw () from /usr/lib/i386-linux-gnu/libstdc++.so.6
 #11 0x0822f8c7 in ceph::__ceph_assert_fail(char const*, char const*,
 int, char const*) ()
 #12 0x081cf8a4 in MDSMap::get_health(std::basic_ostream<char, std::char_traits<char> >&) const ()
 #13 0x0811e8a7 in MDSMonitor::get_health(std::basic_ostream<char, std::char_traits<char> >&) const ()
 #14 0x080c4977 in Monitor::handle_command(MMonCommand*) ()
 #15 0x080cf244 in Monitor::_ms_dispatch(Message*) ()
 #16 0x080df1a4 in Monitor::ms_dispatch(Message*) ()
 #17 0x081f706d in SimpleMessenger::dispatch_entry() ()
 #18 0x080b27d2 in SimpleMessenger::DispatchThread::entry() ()
 #19 0x081b5d81 in Thread::_entry_func(void*) ()
 #20 0xb77e0e99 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
 #21 0xb71919ee in clone () from /lib/i386-linux-gnu/libc.so.6

 Can you get the line number from frame 12? (f 12 enter, then just paste the 
 output) Also the output of ceph -s if things are still running. The only 
 assert I see in get_health() is that each up MDS be in mds_info, which 
 really ought to be true….


Sorry but no, I use precompiled binaries from
http://ceph.newdream.net/debian. Perhaps this helps: initially I
configured all the ceph services (mon, mds, osd), but then I tested only
rbd and removed all MDSes from the cluster (3 VMware machines) through
the following command:

ceph mds rm 1 (I wrote this line from memory, so the syntax may be wrong)


 And when one mon crashes, all the other monitors in the cluster crash
 too :-((. So at some point there are no alive mons in the cluster.

 Yeah, this is because the crash is being triggered by a get_health command 
 and it's trying it out on each monitor in turn as they fail.
 -Greg



Re: Ceph mon crash

2012-03-19 Thread Greg Farnum
On Monday, March 19, 2012 at 11:44 AM, ruslan usifov wrote:
 Sorry but no, I use precompiled binaries from
 http://ceph.newdream.net/debian. Perhaps this helps: initially I
 configured all the ceph services (mon, mds, osd), but then I tested only
 rbd and removed all MDSes from the cluster (3 VMware machines) through
 the following command:
 
 ceph mds rm 1 (I wrote this line from memory, so the syntax may be wrong)

Oh. That's a fun command! Where on earth did you find it documented?
Unfortunately, it's only supposed to be used when things get weird. (And 
really, I'm not sure when it would be appropriate.) If you run it on a healthy 
cluster, it will break things.
I created a bug to make it not do that: http://tracker.newdream.net/issues/2188

If necessary I can figure out how to create a good MDSMap and inject it into 
your monitors, but I'd rather not if you don't have any data in there. (In 
which case, reformat the cluster.)
-Greg
