Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2016-03-13 Thread Richard Bade
Hi Everyone,
Thanks for your input on this. I know it's been a long time but I just
wanted to report back that this issue has been resolved.
We added two more monitors which happened to be on Ubuntu 14.04
(rather than 12.04) and these had no issues. So we upgraded every host
to 14.04.
Since the OS update we have not had any monitor crashes. It's now been
over two months and the mons have been stable.
Thanks again,
Richard

On 17 October 2015 at 07:26, Richard Bade  wrote:
> Ok, debugging increased
> ceph tell mon.[abc] injectargs --debug-mon 20
> ceph tell mon.[abc] injectargs --debug-ms 1
>
> Regards,
> Richard
>
> On 17 October 2015 at 01:38, Sage Weil  wrote:
>>
>> This doesn't look familiar.  Are you able to enable a higher log level so
>> that if it happens again we'll have more info?
>>
>> debug mon = 20
>> debug ms = 1
>>
>> Thanks!
>> sage
>>
>> On Fri, 16 Oct 2015, Dan van der Ster wrote:
>>
>> > Hmm, that's strange. I didn't see anything in the tracker that looks
>> > related. Hopefully an expert can chime in...
>> >
>> > Cheers, Dan
>> >
>> > On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade  wrote:
>> > > Thanks for your quick response Dan, but no. All the ceph-mon.*.log
>> > > files are empty.
>> > > I did track this down in syslog though, in case it helps:
>> > > ceph-mon: 2015-10-16 21:25:00.117115 7f4c9f458700 -1 *** Caught signal
>> > > (Segmentation fault) **#012 in thread 7f4c9f458700#012#012 ceph version
>> > > 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)#012 1:
>> > > /usr/bin/ceph-mon() [0x928b05]#012 2: (()+0xfcb0) [0x7f4ca50e0cb0]#012 3:
>> > > (get_str_map_key(std::map<std::string, std::string,
>> > > std::less<std::string>, std::allocator<std::pair<std::string const,
>> > > std::string> > > const&, std::string const&, std::string const*)+0x37)
>> > > [0x87d8e7]#012 4:
>> > > (LogMonitor::update_from_paxos(bool*)+0x801) [0x6846e1]#012 5:
>> > > (PaxosService::refresh(bool*)+0x3c6) [0x5dc326]#012 6:
>> > > (Monitor::refresh_from_paxos(bool*)+0x36b) [0x588aab]#012 7:
>> > > (Paxos::do_refresh()+0x4c) [0x5c465c]#012 8:
>> > > (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5cb2d3]#012 9:
>> > > (Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5d3fbb]#012 10:
>> > > (Monitor::dispatch(MonSession*, Message*, bool)+0x864) [0x5ab0d4]#012 11:
>> > > (Monitor::_ms_dispatch(Message*)+0x2c9) [0x5a8a19]#012 12:
>> > > (Monitor::ms_dispatch(Message*)+0x32) [0x5c3952]#012 13:
>> > > (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x8ac987]#012 14:
>> > > (DispatchQueue::entry()+0x44a) [0x8a9b2a]#012 15:
>> > > (DispatchQueue::DispatchThread::entry()+0xd) [0x79e4ad]#012 16: (()+0x7e9a)
>> > > [0x7f4ca50d8e9a]#012 17: (clone()+0x6d) [0x7f4ca3dca38d]#012 NOTE: a copy
>> > > of the executable, or `objdump -rdS <executable>` is needed to interpret
>> > > this.
>> > >
>> > > Regards,
>> > > Richard
>> > >
>> > > On 17 October 2015 at 00:33, Dan van der Ster wrote:
>> > >>
>> > >> Hi,
>> > >> Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
>> > >> Cheers, Dan
>> > >>
>> > >> On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade wrote:
>> > >> > Hi Everyone,
>> > >> > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and
>> > >> > today we've had one monitor crash twice and another one once. We
>> > >> > have 3 monitors total and have been running Firefly 0.80.10 for
>> > >> > quite some time without any monitor issues.
>> > >> > When the monitor crashes it leaves a core file and a crash file in
>> > >> > /var/crash
>> > >> > I can't find anything obviously similar by googling.
>> > >> > Has anyone seen anything like this?
>> > >> > Any suggestions? What other info would be useful to help track down
>> > >> > the issue?
>> > >> >
>> > >> > Regards,
>> > >> > Richard
>> > >> >
>> > >> > ___
>> > >> > ceph-users mailing list
>> > >> > ceph-users@lists.ceph.com
>> > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >> >
>> > >
>> > >
>> >
>> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Ok, debugging increased
ceph tell mon.[abc] injectargs --debug-mon 20
ceph tell mon.[abc] injectargs --debug-ms 1
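
For anyone repeating this, here is a minimal sketch of applying the same two settings to each monitor in turn. It assumes the monitor IDs are a, b and c (read from the `mon.[abc]` shorthand above) and that both options can be passed to injectargs as one string. The helper names are hypothetical, and the script only prints the commands unless dry_run is set to False.

```python
import shlex
import subprocess


def build_injectargs_commands(mon_ids, options):
    """Return one `ceph tell mon.<id> injectargs ...` argv list per monitor."""
    # All options are joined into one string, the usual way injectargs is invoked.
    args = " ".join(f"--{name} {value}" for name, value in options.items())
    return [["ceph", "tell", f"mon.{mon_id}", "injectargs", args]
            for mon_id in mon_ids]


def apply_commands(commands, dry_run=True):
    """Print each command, or run it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


# Monitor IDs a, b, c are an assumption taken from the mon.[abc] shorthand.
commands = build_injectargs_commands(
    ["a", "b", "c"], {"debug-mon": 20, "debug-ms": 1}
)
apply_commands(commands)
```

Settings injected this way only last until the daemon restarts. To keep them across restarts, the `debug mon = 20` and `debug ms = 1` lines from Sage's reply would also need to go into ceph.conf.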

Regards,
Richard

On 17 October 2015 at 01:38, Sage Weil  wrote:

> This doesn't look familiar.  Are you able to enable a higher log level so
> that if it happens again we'll have more info?
>
> debug mon = 20
> debug ms = 1
>
> Thanks!
> sage
>
> On Fri, 16 Oct 2015, Dan van der Ster wrote:
>
> > Hmm, that's strange. I didn't see anything in the tracker that looks
> > related. Hopefully an expert can chime in...
> >
> > Cheers, Dan
> >
> > On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade  wrote:
> > > Thanks for your quick response Dan, but no. All the ceph-mon.*.log
> > > files are empty.
> > > I did track this down in syslog though, in case it helps:
> > > ceph-mon: 2015-10-16 21:25:00.117115 7f4c9f458700 -1 *** Caught signal
> > > (Segmentation fault) **#012 in thread 7f4c9f458700#012#012 ceph version
> > > 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)#012 1:
> > > /usr/bin/ceph-mon() [0x928b05]#012 2: (()+0xfcb0) [0x7f4ca50e0cb0]#012 3:
> > > (get_str_map_key(std::map<std::string, std::string,
> > > std::less<std::string>, std::allocator<std::pair<std::string const,
> > > std::string> > > const&, std::string const&, std::string const*)+0x37)
> > > [0x87d8e7]#012 4:
> > > (LogMonitor::update_from_paxos(bool*)+0x801) [0x6846e1]#012 5:
> > > (PaxosService::refresh(bool*)+0x3c6) [0x5dc326]#012 6:
> > > (Monitor::refresh_from_paxos(bool*)+0x36b) [0x588aab]#012 7:
> > > (Paxos::do_refresh()+0x4c) [0x5c465c]#012 8:
> > > (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5cb2d3]#012 9:
> > > (Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5d3fbb]#012 10:
> > > (Monitor::dispatch(MonSession*, Message*, bool)+0x864) [0x5ab0d4]#012 11:
> > > (Monitor::_ms_dispatch(Message*)+0x2c9) [0x5a8a19]#012 12:
> > > (Monitor::ms_dispatch(Message*)+0x32) [0x5c3952]#012 13:
> > > (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x8ac987]#012 14:
> > > (DispatchQueue::entry()+0x44a) [0x8a9b2a]#012 15:
> > > (DispatchQueue::DispatchThread::entry()+0xd) [0x79e4ad]#012 16: (()+0x7e9a)
> > > [0x7f4ca50d8e9a]#012 17: (clone()+0x6d) [0x7f4ca3dca38d]#012 NOTE: a copy
> > > of the executable, or `objdump -rdS <executable>` is needed to interpret
> > > this.
> > >
> > > Regards,
> > > Richard
> > >
> > > On 17 October 2015 at 00:33, Dan van der Ster wrote:
> > >>
> > >> Hi,
> > >> Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
> > >> Cheers, Dan
> > >>
> > >> On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade wrote:
> > >> > Hi Everyone,
> > >> > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today
> > >> > we've had one monitor crash twice and another one once. We have 3
> > >> > monitors total and have been running Firefly 0.80.10 for quite some
> > >> > time without any monitor issues.
> > >> > When the monitor crashes it leaves a core file and a crash file in
> > >> > /var/crash
> > >> > I can't find anything obviously similar by googling.
> > >> > Has anyone seen anything like this?
> > >> > Any suggestions? What other info would be useful to help track down
> > >> > the issue?
> > >> >
> > >> > Regards,
> > >> > Richard
> > >> >
> > >> >
> > >
> > >
> >
> >
>


[ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Hi Everyone,
I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today
we've had one monitor crash twice and another one once. We have 3 monitors
total and have been running Firefly 0.80.10 for quite some time without any
monitor issues.
When the monitor crashes it leaves a core file and a crash file in
/var/crash
I can't find anything obviously similar by googling.
Has anyone seen anything like this?
Any suggestions? What other info would be useful to help track down the
issue?

Regards,
Richard


Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Sage Weil
This doesn't look familiar.  Are you able to enable a higher log level so 
that if it happens again we'll have more info?

debug mon = 20
debug ms = 1

Thanks!
sage

On Fri, 16 Oct 2015, Dan van der Ster wrote:

> Hmm, that's strange. I didn't see anything in the tracker that looks
> related. Hopefully an expert can chime in...
> 
> Cheers, Dan
> 
> On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade  wrote:
> > Thanks for your quick response Dan, but no. All the ceph-mon.*.log files are
> > empty.
> > I did track this down in syslog though, in case it helps:
> > ceph-mon: 2015-10-16 21:25:00.117115 7f4c9f458700 -1 *** Caught signal
> > (Segmentation fault) **#012 in thread 7f4c9f458700#012#012 ceph version
> > 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)#012 1: /usr/bin/ceph-mon()
> > [0x928b05]#012 2: (()+0xfcb0) [0x7f4ca50e0cb0]#012 3:
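> > (get_str_map_key(std::map<std::string, std::string,
> > std::less<std::string>, std::allocator<std::pair<std::string const,
> > std::string> > > const&, std::string const&, std::string const*)+0x37)
> > [0x87d8e7]#012 4: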
> > (LogMonitor::update_from_paxos(bool*)+0x801) [0x6846e1]#012 5:
> > (PaxosService::refresh(bool*)+0x3c6) [0x5dc326]#012 6:
> > (Monitor::refresh_from_paxos(bool*)+0x36b) [0x588aab]#012 7:
> > (Paxos::do_refresh()+0x4c) [0x5c465c]#012 8:
> > (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5cb2d3]#012 9:
> > (Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5d3fbb]#012 10:
> > (Monitor::dispatch(MonSession*, Message*, bool)+0x864) [0x5ab0d4]#012 11:
> > (Monitor::_ms_dispatch(Message*)+0x2c9) [0x5a8a19]#012 12:
> > (Monitor::ms_dispatch(Message*)+0x32) [0x5c3952]#012 13:
> > (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x8ac987]#012 14:
> > (DispatchQueue::entry()+0x44a) [0x8a9b2a]#012 15:
> > (DispatchQueue::DispatchThread::entry()+0xd) [0x79e4ad]#012 16: (()+0x7e9a)
> > [0x7f4ca50d8e9a]#012 17: (clone()+0x6d) [0x7f4ca3dca38d]#012 NOTE: a copy of
> > the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > Regards,
> > Richard
> >
> > On 17 October 2015 at 00:33, Dan van der Ster  wrote:
> >>
> >> Hi,
> >> Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
> >> Cheers, Dan
> >>
> >> On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade  wrote:
> >> > Hi Everyone,
> >> > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today
> >> > we've had one monitor crash twice and another one once. We have 3
> >> > monitors total and have been running Firefly 0.80.10 for quite some
> >> > time without any monitor issues.
> >> > When the monitor crashes it leaves a core file and a crash file in
> >> > /var/crash
> >> > I can't find anything obviously similar by googling.
> >> > Has anyone seen anything like this?
> >> > Any suggestions? What other info would be useful to help track down the
> >> > issue?
> >> >
> >> > Regards,
> >> > Richard
> >> >
> >> >
> >
> >
> 
> 


Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Dan van der Ster
Hmm, that's strange. I didn't see anything in the tracker that looks
related. Hopefully an expert can chime in...

Cheers, Dan

On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade  wrote:
> Thanks for your quick response Dan, but no. All the ceph-mon.*.log files are
> empty.
> I did track this down in syslog though, in case it helps:
> ceph-mon: 2015-10-16 21:25:00.117115 7f4c9f458700 -1 *** Caught signal
> (Segmentation fault) **#012 in thread 7f4c9f458700#012#012 ceph version
> 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)#012 1: /usr/bin/ceph-mon()
> [0x928b05]#012 2: (()+0xfcb0) [0x7f4ca50e0cb0]#012 3:
> (get_str_map_key(std::map<std::string, std::string,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> std::string> > > const&, std::string const&, std::string const*)+0x37)
> [0x87d8e7]#012 4:
> (LogMonitor::update_from_paxos(bool*)+0x801) [0x6846e1]#012 5:
> (PaxosService::refresh(bool*)+0x3c6) [0x5dc326]#012 6:
> (Monitor::refresh_from_paxos(bool*)+0x36b) [0x588aab]#012 7:
> (Paxos::do_refresh()+0x4c) [0x5c465c]#012 8:
> (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5cb2d3]#012 9:
> (Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5d3fbb]#012 10:
> (Monitor::dispatch(MonSession*, Message*, bool)+0x864) [0x5ab0d4]#012 11:
> (Monitor::_ms_dispatch(Message*)+0x2c9) [0x5a8a19]#012 12:
> (Monitor::ms_dispatch(Message*)+0x32) [0x5c3952]#012 13:
> (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x8ac987]#012 14:
> (DispatchQueue::entry()+0x44a) [0x8a9b2a]#012 15:
> (DispatchQueue::DispatchThread::entry()+0xd) [0x79e4ad]#012 16: (()+0x7e9a)
> [0x7f4ca50d8e9a]#012 17: (clone()+0x6d) [0x7f4ca3dca38d]#012 NOTE: a copy of
> the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Regards,
> Richard
>
> On 17 October 2015 at 00:33, Dan van der Ster  wrote:
>>
>> Hi,
>> Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
>> Cheers, Dan
>>
>> On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade  wrote:
>> > Hi Everyone,
>> > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today
>> > we've had one monitor crash twice and another one once. We have 3
>> > monitors total and have been running Firefly 0.80.10 for quite some
>> > time without any monitor issues.
>> > When the monitor crashes it leaves a core file and a crash file in
>> > /var/crash
>> > I can't find anything obviously similar by googling.
>> > Has anyone seen anything like this?
>> > Any suggestions? What other info would be useful to help track down the
>> > issue?
>> >
>> > Regards,
>> > Richard
>> >
>> >
>
>


Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Thanks for your quick response Dan, but no. All the ceph-mon.*.log files
are empty.
I did track this down in syslog though, in case it helps:
ceph-mon: 2015-10-16 21:25:00.117115 7f4c9f458700 -1 *** Caught signal
(Segmentation fault) **#012 in thread 7f4c9f458700#012#012 ceph version
0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)#012 1:
/usr/bin/ceph-mon() [0x928b05]#012 2: (()+0xfcb0) [0x7f4ca50e0cb0]#012 3:
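(get_str_map_key(std::map<std::string, std::string,
std::less<std::string>, std::allocator<std::pair<std::string const,
std::string> > > const&, std::string const&, std::string const*)+0x37)
[0x87d8e7]#012 4: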
(LogMonitor::update_from_paxos(bool*)+0x801) [0x6846e1]#012 5:
(PaxosService::refresh(bool*)+0x3c6) [0x5dc326]#012 6:
(Monitor::refresh_from_paxos(bool*)+0x36b) [0x588aab]#012 7:
(Paxos::do_refresh()+0x4c) [0x5c465c]#012 8:
(Paxos::handle_commit(MMonPaxos*)+0x243) [0x5cb2d3]#012 9:
(Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5d3fbb]#012 10:
(Monitor::dispatch(MonSession*, Message*, bool)+0x864) [0x5ab0d4]#012 11:
(Monitor::_ms_dispatch(Message*)+0x2c9) [0x5a8a19]#012 12:
(Monitor::ms_dispatch(Message*)+0x32) [0x5c3952]#012 13:
(Messenger::ms_deliver_dispatch(Message*)+0x77) [0x8ac987]#012 14:
(DispatchQueue::entry()+0x44a) [0x8a9b2a]#012 15:
(DispatchQueue::DispatchThread::entry()+0xd) [0x79e4ad]#012 16: (()+0x7e9a)
[0x7f4ca50d8e9a]#012 17: (clone()+0x6d) [0x7f4ca3dca38d]#012 NOTE: a copy
of the executable, or `objdump -rdS <executable>` is needed to interpret
this.
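
The `#012` sequences in the syslog line are octal escapes for a newline (octal 012 is ASCII 10). A minimal sketch for turning such a line back into a readable multi-line backtrace follows. It assumes the escaping is the `#` plus three octal digits form that rsyslog uses by default for control characters. The function name is hypothetical, and the sample string is a shortened excerpt of the line above.

```python
import re


def decode_syslog_escapes(line: str) -> str:
    """Replace #NNN octal escapes (e.g. #012) with the characters they encode."""
    # Caveat: any literal '#' followed by three octal digits is rewritten too.
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)


# Shortened excerpt of the syslog entry quoted in this thread.
sample = (
    "*** Caught signal (Segmentation fault) **#012 in thread 7f4c9f458700"
    "#012#012 ceph version 0.94.3#012 1: /usr/bin/ceph-mon() [0x928b05]"
)

decoded = decode_syslog_escapes(sample)
print(decoded)
```

Piping the full entry through the same function puts each numbered frame on its own line, which makes the stack easier to compare against tracker issues.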

Regards,
Richard

On 17 October 2015 at 00:33, Dan van der Ster  wrote:

> Hi,
> Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
> Cheers, Dan
>
> On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade  wrote:
> > Hi Everyone,
> > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today
> > we've had one monitor crash twice and another one once. We have 3
> > monitors total and have been running Firefly 0.80.10 for quite some
> > time without any monitor issues.
> > When the monitor crashes it leaves a core file and a crash file in
> > /var/crash
> > I can't find anything obviously similar by googling.
> > Has anyone seen anything like this?
> > Any suggestions? What other info would be useful to help track down the
> > issue?
> >
> > Regards,
> > Richard
> >
> >
>


Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Dan van der Ster
Hi,
Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
Cheers, Dan

On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade  wrote:
> Hi Everyone,
> I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today we've
> had one monitor crash twice and another one once. We have 3 monitors total
> and have been running Firefly 0.80.10 for quite some time without any
> monitor issues.
> When the monitor crashes it leaves a core file and a crash file in
> /var/crash
> I can't find anything obviously similar by googling.
> Has anyone seen anything like this?
> Any suggestions? What other info would be useful to help track down the
> issue?
>
> Regards,
> Richard
>
>