0.55 crashed during upgrade to bobtail

2013-01-01 Thread Andrey Korolyov
Hi,

All OSDs in the dev cluster died shortly after the upgrade (package-only,
i.e. a binary upgrade, without even restarting the running processes);
please see the attached file.

Was: 0.55.1-356-g850d1d5
Upgraded to: 0.56 tag

The only difference is the gcc (and corresponding libstdc++) version:
4.6 on the build host and 4.7 on the cluster. Of course I can roll back
and the problem will most likely go away, but it seems there should be a
proper fix. I also saw something similar in the testing environment a few
days ago: a package upgrade within 0.55 killed all _Windows_ guests and
one out of tens of Linux guests running on top of rbd. Unfortunately I
have no debug session from that time, only the tail of the log from qemu:

terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
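
For context: that message is the standard libstdc++ behaviour when an
exception escapes every handler and std::terminate() is reached, so a
single uncaught throw inside librbd is enough to take down the whole qemu
process. A minimal standalone sketch (not Ceph or QEMU code) that produces
the same kind of output:

// Minimal standalone sketch, not Ceph/QEMU code: any C++ exception that
// propagates out uncaught reaches std::terminate(), and the default
// libstdc++ terminate handler prints this style of message before
// aborting the process.
#include <stdexcept>

int main() {
    // ceph::buffer::end_of_buffer is, as far as I know, a std::exception
    // subclass; std::runtime_error stands in for it here.
    throw std::runtime_error("buffer::end_of_buffer");
    // Never reached: the runtime prints
    //   terminate called after throwing an instance of 'std::runtime_error'
    //     what():  buffer::end_of_buffer
    // and then aborts.
}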

I suspect the ldconfig run from the librbd package, because nothing else
would cause that kind of destruction of running processes; maybe I am wrong.
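
If anyone wants to check the same theory, here is a rough sketch of mine
(nothing from the Ceph tree; the "librbd"/"libstdc++" substrings are just
the ones I care about here) that prints which of those images a given pid
actually has mapped; a "(deleted)" suffix in /proc/<pid>/maps means the
on-disk file was replaced under the still-running process:

// Rough helper sketch: list librbd / libstdc++ mappings of a running
// process, e.g. a qemu or ceph-osd pid, to see which library images it
// is really using.
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <pid>\n";
        return 1;
    }
    std::ifstream maps("/proc/" + std::string(argv[1]) + "/maps");
    if (!maps) {
        std::cerr << "cannot open maps for pid " << argv[1] << "\n";
        return 1;
    }
    std::string line;
    while (std::getline(maps, line)) {
        // Keep only the mappings of interest; "(deleted)" at the end of a
        // line means the file on disk was replaced after the mapping was made.
        if (line.find("librbd") != std::string::npos ||
            line.find("libstdc++") != std::string::npos)
            std::cout << line << "\n";
    }
    return 0;
}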

thanks!

WBR, Andrey


Attachment: crashes-2013-01-01.tgz (GNU Zip compressed data)


Re: 0.55 crashed during upgrade to bobtail

2013-01-01 Thread Andrey Korolyov
On Tue, Jan 1, 2013 at 9:49 PM, Andrey Korolyov  wrote:
> Hi,
>
> All OSDs in the dev cluster died shortly after the upgrade (package-only,
> i.e. a binary upgrade, without even restarting the running processes);
> please see the attached file.
> [...]

Sorry, I am not able to reproduce the crash after the rollback, and the
traces were incomplete due to lack of disk space at the specified core
location, so please disregard it.
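
For what it is worth, here is a quick sketch (my own, assuming Linux and a
plain-path core_pattern without a "|" pipe handler) that prints where core
dumps go and how much space is free there, so truncated dumps like these
can be spotted earlier:

// Rough sketch: show the kernel core_pattern and the free space on the
// filesystem it points to. Assumes the pattern is a plain path, not a
// "|" pipe handler; adjust for your setup.
#include <fstream>
#include <iostream>
#include <string>
#include <sys/statvfs.h>

int main() {
    std::ifstream f("/proc/sys/kernel/core_pattern");
    std::string pattern;
    std::getline(f, pattern);
    std::cout << "core_pattern: " << pattern << "\n";

    // Take the directory part of the pattern; fall back to the current
    // directory, which is where cores land when there is no directory part.
    std::string dir = ".";
    std::string::size_type slash = pattern.rfind('/');
    if (slash != std::string::npos)
        dir = pattern.substr(0, slash + 1);

    struct statvfs vfs;
    if (statvfs(dir.c_str(), &vfs) == 0) {
        unsigned long long free_mb =
            (unsigned long long)vfs.f_bavail * vfs.f_frsize / (1024 * 1024);
        std::cout << "free space in " << dir << ": " << free_mb << " MB\n";
    } else {
        std::cerr << "statvfs(" << dir << ") failed\n";
        return 1;
    }
    return 0;
}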


Re: 0.55 crashed during upgrade to bobtail

2013-01-01 Thread Andrey Korolyov
On Wed, Jan 2, 2013 at 12:16 AM, Andrey Korolyov  wrote:
> On Tue, Jan 1, 2013 at 9:49 PM, Andrey Korolyov  wrote:
>> [...]
>
> Sorry, I am not able to reproduce the crash after the rollback, and the
> traces were incomplete due to lack of disk space at the specified core
> location, so please disregard it.

Ahem, in the end it seems that the OSD process is stumbling over something
in the filestore: my other environments were also able to reproduce the
crash once, but it can no longer be reproduced once a new OSD process has
started over the existing filestore (an offline version rollback and a
second attempt at the online upgrade both go fine). Also, the backtraces
in the first message are complete, at least 1 and 2, despite the lack of
disk space the first time; I have since received a couple of core dumps
whose traces look exactly the same as 2.