[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-09-24 Thread Denis Krienbühl
I’m interested in the following as well. Any chance you could point us to a
specific commit, Jason?

> On 14 Sep 2020, at 13:55, Jason Dillaman  wrote:
> 
> Can you try the latest development release of Octopus [1]? A librbd
> crash fix has been sitting in that branch for about a month now to be
> included in the next point release.

> On 22 Sep 2020, at 11:48, Michael Bisig  wrote:
> 
> We are also facing this problem and would like to upgrade the clients to
> the specific release.
> @Jason, can you point us to the respective commit and the point release
> that contains the fix?


Cheers, Denis
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-09-22 Thread Michael Bisig
Hello all,

We are also facing this problem and would like to upgrade the clients to the
specific release.
@Jason, can you point us to the respective commit and the point release that
contains the fix?

Thanks in advance for your help.
Best regards,
Michael


On 18.09.20, 15:12, "Lomayani S. Laizer"  wrote:

Hello Jason,

I can confirm this release fixes the crashes. There has not been a single
crash for the past 4 days.




[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-09-18 Thread Lomayani S. Laizer
Hello Jason,

I can confirm this release fixes the crashes. There has not been a single
crash for the past 4 days.




[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-09-14 Thread Jason Dillaman
On Mon, Sep 14, 2020 at 5:13 AM Lomayani S. Laizer  wrote:
>
> Hello,
> Last week I got time to try to debug the crashes of these VMs.
>
> The log below includes the rados debug output, which I left out last time:
>
> https://storage.habari.co.tz/index.php/s/AQEJ7tQS7epC4Zn
>
> I have observed the following with these settings in OpenStack and Ceph:
>
> disk_cachemodes="network=writeback" in OpenStack and
> rbd_cache_policy = writearound in Ceph --- crashes occur
>
> disk_cachemodes="network=writeback" in OpenStack and
> rbd_cache_policy = writeback in Ceph --- no crashes
>
> disk_cachemodes="none" in OpenStack and
> rbd_cache_policy = writearound in Ceph --- no crashes
>
> disk_cachemodes="none" in OpenStack and
> rbd_cache_policy = writeback in Ceph --- crashes occur
>
> Is disk_cachemodes="network=writeback" no longer recommended in
> Octopus? I see it is left out of the new documentation for Octopus:
>
> https://ceph.readthedocs.io/en/latest/rbd/rbd-openstack/

Can you try the latest development release of Octopus [1]? A librbd
crash fix has been sitting in that branch for about a month now to be
included in the next point release.
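
If it helps, here is one way to check which librbd a running VM has
actually loaded, since a long-running qemu process keeps the old library
mapped until it is restarted or live-migrated. A sketch, assuming
Debian/Ubuntu and the usual qemu process name:

    # the library actually mapped by each running VM
    for pid in $(pidof qemu-system-x86_64); do
        grep -m1 -o 'librbd\.so[^ ]*$' /proc/$pid/maps
    done
    # the package version installed on the hypervisor
    dpkg -l librbd1 | tail -n1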


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-14 Thread Jason Dillaman
On Thu, May 14, 2020 at 3:12 AM Brad Hubbard  wrote:

> On Wed, May 13, 2020 at 6:00 PM Lomayani S. Laizer 
> wrote:
> >
> > Hello,
> >
> > Below is the full debug log of the 2 minutes before the virtual machine
> crash. Download from the URL below:
> >
> > https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU
>
> This log has rbd debug output, but not rados :(
>
> I guess you'll need to try and capture a coredump if you can't get a
> backtrace.
>
> I'd also suggest opening a tracker in case one of the rbd devs has any
> ideas on this, or has seen something similar. Without a backtrace or
> core it will be impossible to definitively identify the issue though.
>

+1 to needing the backtrace. I don't see any indications of a problem in
that log.



[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-14 Thread Brad Hubbard
On Wed, May 13, 2020 at 6:00 PM Lomayani S. Laizer  wrote:
>
> Hello,
>
> Below is the full debug log of the 2 minutes before the virtual machine
> crash. Download from the URL below:
>
> https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU

This log has rbd debug output, but not rados :(

I guess you'll need to try and capture a coredump if you can't get a backtrace.

I'd also suggest opening a tracker in case one of the rbd devs has any
ideas on this, or has seen something similar. Without a backtrace or
core it will be impossible to definitively identify the issue though.

>
>
> apport.log
>
> Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount 
> namespace, ignoring
>
> kernel.log
> May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]: 
> segfault at 0 ip 7f4c8c85d7ed sp 7f4c66ffc470 error 4 in 
> librbd.so.1.12.0[7f4c8c65a000+5cb000]
> May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 
> d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03 48 89 
> 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08 48 8b 0b 
> 48 21 f8 48 39 0c
> May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
> May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left 
> promiscuous mode
> May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
>
> syslog
> compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip 
> 7f4c8c85d7ed sp 7f4c66ffc470 error 4 i
> n librbd.so.1.12.0[7f4c8c65a000+5cb000]
> May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 
> d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 8
> 9 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 
> 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
> May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file from 
> qemu monitor
> May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link DOWN
> May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost carrier
> May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
> May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left 
> promiscuous mode
> May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
> May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to request link: 
> No such device

[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Brad Hubbard
On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer  wrote:
>
> Hello,
> On my side, these are the logs at the point of the VM crash. At the moment
> my debug level is at 10; I will raise it to 20 for full debug. These crashes
> are random and so far happen on very busy VMs. After downgrading the clients
> on the host to Nautilus, these crashes disappear.

You could try adding debug_rados as well but you may get a very large
log so keep an eye on things.
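
For reference, those debug settings live in the [client] section of
ceph.conf on the hypervisor. A sketch, assuming the default config path;
the log file location is only an example and must be writable by the
qemu user:

    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
    log file = /var/log/ceph/qemu-guest-$pid.log
    debug rbd = 20
    debug rados = 20
    EOF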

>
> QEMU is not shutting down in general, because other VMs on the same host
> continue working.

A process cannot reliably continue after encountering a segfault, so
the qemu-kvm process must be ending, and it should therefore be
possible to capture a coredump with the right configuration.

In the following example, if you were to search for pid 6060 you would
find it is no longer running.
>> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 7f8165cc0a50 
>> > sp 7f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]

Without at least a backtrace it may be very difficult to work out
what's going on with certainty. If you open a tracker for the issue,
though, one of the devs specialising in rbd may have some
feedback.
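
If it helps, enabling cores for the qemu processes might look like the
sketch below, assuming systemd and qemu launched by libvirtd. On Ubuntu
apport may intercept or ignore the core (as in the apport.log line
quoted earlier in this thread), so pointing core_pattern at a plain
file avoids that:

    # let qemu processes spawned by libvirtd write cores; the new limit
    # only applies to VMs started after the restart
    mkdir -p /etc/systemd/system/libvirtd.service.d
    cat > /etc/systemd/system/libvirtd.service.d/coredump.conf <<'EOF'
    [Service]
    LimitCORE=infinity
    EOF
    systemctl daemon-reload && systemctl restart libvirtd
    # write cores to a known location instead of piping them to apport
    mkdir -p /var/crash
    sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
    # after the next crash, pull a backtrace out of the core
    gdb -batch -ex 'thread apply all bt' \
        /usr/bin/qemu-system-x86_64 /var/crash/core.<name>.<pid>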


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Lomayani S. Laizer
Hello,
On my side, these are the logs at the point of the VM crash. At the moment my
debug level is at 10; I will raise it to 20 for full debug. These crashes are
random and so far happen on very busy VMs. After downgrading the clients on
the host to Nautilus, these crashes disappear.

QEMU is not shutting down in general, because other VMs on the same host
continue working.

2020-05-07T13:02:12.121+0300 7f88d57fa700 10 librbd::io::ReadResult:
0x7f88c80bfbf0 finish:  got {} for [0,24576] bl 24576
2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult:
0x7f88c80f9330 finish: C_ObjectReadRequest: r=0
2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult:
0x7f88c80f9330 finish:  got {} for [0,16384] bl 16384
2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState:
0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState:
0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest:
0x7f88c8175fd0 send_block_image_watcher
2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageWatcher:
0x7f88c400dfe0 block_notifies
2020-05-07T13:02:28.694+0300 7f890ba90500  5 librbd::Watcher:
0x7f88c400dfe0 block_notifies: blocked_count=1
2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest:
0x7f88c8175fd0 handle_block_image_watcher: r=0
2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest:
0x7f88c8175fd0 send_shut_down_update_watchers
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest:
0x7f88c8175fd0 handle_shut_down_update_watchers: r=0
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest:
0x7f88c8175fd0 send_shut_down_io_queue
2020-05-07T13:02:28.694+0300 7f88d4ff9700  5 librbd::io::ImageRequestWQ:
0x7f88e8001570 shut_down: shut_down: in_flight=0
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest:
0x7f88c8175fd0 handle_shut_down_io_queue: r=0
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest:
0x7f88c8175fd0 send_shut_down_exclusive_lock
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock:
0x7f88c4011ba0 shut_down
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock:
0x7f88c4011bb8 shut_down:
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock:
0x7f88c4011bb8 send_shutdown:
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock:
0x7f88c4011bb8 send_shutdown_release:
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock:
0x7f88c4011ba0 pre_release_lock_handler
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
send_cancel_op_requests:
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
handle_cancel_op_requests: r=0
2020-05-07T13:02:28.694+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
send_block_writes:
2020-05-07T13:02:28.698+0300 7f88d4ff9700  5 librbd::io::ImageRequestWQ:
0x7f88e8001570 block_writes: 0x5569b5e1ffd0, num=1
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
handle_block_writes: r=0
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
send_wait_for_ops:
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
handle_wait_for_ops:
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
send_invalidate_cache:
2020-05-07T13:02:28.698+0300 7f88d4ff9700  5 librbd::io::ObjectDispatcher:
0x5569b5dab700 invalidate_cache:
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
handle_invalidate_cache: r=0
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
send_flush_notifies:
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
handle_flush_notifies:
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
send_close_object_map:
2020-05-07T13:02:28.698+0300 7f88d4ff9700 10
librbd::object_map::UnlockRequest: 0x7f88c807a450 send_unlock:
oid=rbd_object_map.2f18f2a67fad72
2020-05-07T13:02:28.702+0300 7f88d57fa700 10
librbd::object_map::UnlockRequest: 0x7f88c807a450 handle_unlock: r=0
2020-05-07T13:02:28.702+0300 7f88d57fa700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020
handle_close_object_map: r=0
2020-05-07T13:02:28.702+0300 7f88d57fa700 10
librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_unlock:
2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock:
0x7f88c4011bb8 handle_shutdown_pre_release: r=0
2020-05-07T13:02:28.702+0300 7f88d4ff9700 10
librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 send_unlock:
entity=client.58292796, cookie=auto 

[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Brad Hubbard
On Fri, May 8, 2020 at 3:42 AM Erwin Lubbers  wrote:
>
> Hi,
>
> Did anyone find a way to resolve the problem? I'm seeing the same on a clean
> Octopus Ceph installation on Ubuntu 18, with an Octopus-compiled KVM server
> running on CentOS 7.8. The KVM machine shows:
>
> [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 7f8165cc0a50 sp 
> 7f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]

Are you able to either capture a backtrace from a coredump or set up
logging and hopefully capture a backtrace that way?
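
Even without a core, the kernel segfault line can often be resolved to
a symbol if the librbd debug symbols are installed. A sketch using the
addresses from your log; the library path depends on the distribution:

    # offset into librbd.so = faulting ip - mapping base address
    printf '%#x\n' $((0x7f8165cc0a50 - 0x7f8165ab4000))   # -> 0x20ca50
    addr2line -Cfe /usr/lib64/librbd.so.1.12.0 0x20ca50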



-- 
Cheers,
Brad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Erwin Lubbers
Hi,

Did anyone find a way to resolve the problem? I'm seeing the same on a clean
Octopus Ceph installation on Ubuntu 18, with an Octopus-compiled KVM server
running on CentOS 7.8. The KVM machine shows:

[ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 7f8165cc0a50 sp 
7f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]

Ceph has been healthy and stable for a few weeks, and I did not get these
messages while running on KVM compiled with the Luminous libraries.

Regards,
Erwin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-04-06 Thread Lomayani S. Laizer
Hello Eugen,

All OSDs are up. The only issue I have is that when a server is rebooted I
have to start the OSDs manually.

Now, when I delete the OSD startup configs in
/run/systemd/system/ceph-osd.target.wants/ and reboot the server, the OSDs
come up automatically.

So the systemd config for starting the OSDs is broken somewhere, but I have
not figured out where.
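
For reference, the workaround can be scripted roughly like this (a
sketch, assuming ceph-volume LVM OSDs; the OSD id at the end is only an
example):

    # stale runtime units here can shadow the persistent ones
    rm -f /run/systemd/system/ceph-osd.target.wants/ceph-osd@*.service
    systemctl daemon-reload
    # re-activate all OSDs and recreate their systemd units
    ceph-volume lvm activate --all
    # verify an OSD is enabled for the next boot
    systemctl is-enabled ceph-osd@0.service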

On Mon, Apr 6, 2020 at 5:08 PM Eugen Block  wrote:

> Hi,
>
> did you manage to get all OSDs up (you reported issues some days ago)?
> Is the cluster in a healthy state?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-04-06 Thread Lomayani S. Laizer
Hello,

Clients and cluster are running Octopus. The only config changed after
upgrading to Octopus is rbd_read_from_replica_policy, set to balance.

Is this a risky configuration? The performance of the VMs in my HDD-based
cluster is really good now, though.
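
For reference, this is the setting in question, applied centrally via
the monitors (a sketch; it requires Octopus on both the clients and the
OSDs):

    ceph config set client rbd_read_from_replica_policy balance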

On Mon, Apr 6, 2020 at 5:17 PM Jason Dillaman  wrote:

> On Mon, Apr 6, 2020 at 3:55 AM Lomayani S. Laizer 
> wrote:
> >
> > Hello,
> >
> > After upgrade our ceph cluster to octopus few days ago we are seeing vms
> > crashes with below error. We are using ceph with openstack(rocky).
> >  Everything running ubuntu 18.04 with kernel 5.3. We seeing this crashes
> in
> > busy vms. this is cluster was upgraded from nautilus.
>
> Just for clarity, have the hypervisor hosts been upgraded to Octopus
> clients or was this just a cluster upgrade to Octopus and the clients
> are still running an older version?
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-04-06 Thread Jason Dillaman
On Mon, Apr 6, 2020 at 3:55 AM Lomayani S. Laizer  wrote:
>
> Hello,
>
> After upgrading our Ceph cluster to Octopus a few days ago, we are seeing VM
> crashes with the error below. We are using Ceph with OpenStack (Rocky).
> Everything runs Ubuntu 18.04 with kernel 5.3. We are seeing these crashes in
> busy VMs. This cluster was upgraded from Nautilus.

Just for clarity, have the hypervisor hosts been upgraded to Octopus
clients or was this just a cluster upgrade to Octopus and the clients
are still running an older version?



-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-04-06 Thread Eugen Block

Hi,

did you manage to get all OSDs up (you reported issues some days ago)?  
Is the cluster in a healthy state?
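
For a quick check, something like this would show both:

    ceph -s               # overall health and OSD up/in counts
    ceph osd tree down    # list any OSDs that are currently down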



Zitat von "Lomayani S. Laizer" :


Hello,

After upgrading our Ceph cluster to Octopus a few days ago, we are seeing VM
crashes with the error below. We are using Ceph with OpenStack (Rocky).
Everything runs Ubuntu 18.04 with kernel 5.3. We are seeing these crashes in
busy VMs. This cluster was upgraded from Nautilus.

kernel: [430751.176904] fn-radosclient[3905]: segfault at da0801 ip
7fe78e076686 sp 7fe7697f9470 erro
r 4 in librbd.so.1.12.0[7fe78de73000+5cb000]
Apr  6 03:26:00 compute6 kernel: [430751.176922] Code: 00 64 48 8b 04
25 28 00 00 00 48 89 44 24 18 31 c0 48 85 db 0f 84 fa 00 00 00 8
0 bf 38 01 00 00 00 48 89 fd 0f 84 ea 00 00 00 <83> bb 20 3f 00 00 ff
0f 84 dd 00 00 00 48 8b 83 18 3f 00 00 48 8d
Apr  6 03:26:11 compute6 libvirtd[1671]: 2020-04-06 03:26:11.955+:
1671: error : qemuMonitorIO:719 : internal error: End of file f
rom qemu monitor



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io