[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
Problem with signed / unsigned sounds about right.

Not having much luck with addr2line though.

I just manually migrated the VM to cause the problem again. Not sure if this partial NUMA config warning could be contributing:

2020-04-09T23:54:05.537028Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-04-11 07:25:04.146+: initiating migration
tcmalloc: large alloc 562949953421312 bytes == (nil) @ 0x7f4a0bb464ef 0x7f4a0bb66367 0x7f4a2364b736 0x561527438ac8 0x5615274398e5 0x5615273e9bae 0x5615273f07b6 0x5615275b8de5 0x5615275b4bdf 0x7f4a0aaeee65 0x7f4a0a81788d

(process:32202): GLib-ERROR **: 08:25:04.151: gmem.c:135: failed to allocate 562949953421312 bytes
2020-04-11 07:25:08.408+: shutting down, reason=crashed

Attempt to use addr2line:

# addr2line -e /usr/libexec/qemu-kvm 0x7f4a0bb464ef 0x7f4a0bb66367 0x7f4a2364b736 0x561527438ac8 0x5615274398e5 0x5615273e9bae 0x5615273f07b6 0x5615275b8de5 0x5615275b4bdf 0x7f4a0aaeee65 0x7f4a0a81788d
??:0
??:0

Single addresses give the same:

0x7f4a0bb464ef
??:0
0x7f4a0a81788d
??:0

Maybe I need the debug packages?

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/D74C7MSRJEQOTPNDB55XTRDSNT2WK6ST/
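A possible reason for the ??:0 output, beyond missing debug packages: the addresses in the tcmalloc trace are runtime virtual addresses, and they differ between the two crashes reported in this thread, which is what address space randomization would produce. addr2line -e generally wants addresses relative to the file it is given, and several of the frames (the 0x7f4a... ones) probably belong to shared libraries rather than to qemu-kvm itself. The sketch below shows the rebasing arithmetic only. It assumes the module load addresses are available in /proc/<pid>/maps format (captured from the running qemu process before the crash, or from a core dump); the mapping lines and the libtcmalloc path in EXAMPLE_MAPS are made up for illustration and are not taken from the affected host.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int
    end: int
    path: str


# Hypothetical mappings in /proc/<pid>/maps format (NOT from the real host).
EXAMPLE_MAPS = """\
561527000000-561527300000 r--p 00000000 fd:00 1234 /usr/libexec/qemu-kvm
561527300000-561527900000 r-xp 00300000 fd:00 1234 /usr/libexec/qemu-kvm
7f4a0bb30000-7f4a0bb80000 r-xp 00000000 fd:00 5678 /usr/lib64/libtcmalloc.so.4
"""


def parse_maps(text: str) -> List[Mapping]:
    """Parse maps-format lines, keeping only file-backed mappings."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        start_s, end_s = fields[0].split("-")
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[5]))
    return mappings


def rebase(addr: int, mappings: List[Mapping]) -> Optional[Tuple[str, int]]:
    """Return (module path, address relative to that module's lowest mapping).

    Assumes the module's first load segment starts at virtual address 0,
    which is typical for shared libraries and position-independent executables.
    """
    for m in mappings:
        if m.start <= addr < m.end:
            base = min(x.start for x in mappings if x.path == m.path)
            return m.path, addr - base
    return None


maps = parse_maps(EXAMPLE_MAPS)
for frame in (0x7F4A0BB464EF, 0x561527438AC8):
    hit = rebase(frame, maps)
    if hit is not None:
        path, offset = hit
        # The rebased value is what would be passed to: addr2line -f -e <path> <offset>
        print(path, hex(offset))
```

With real mappings, each frame would be resolved against the module that contains it, and the matching debuginfo package for that module would still be needed to get file and line output.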
[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
I found this thread on Stack Overflow:

https://stackoverflow.com/questions/9077457/how-to-trace-tcmalloc-large-alloc

See http://code.google.com/p/gperftools/source/browse/trunk/src/tcmalloc.cc?r=80&redir=1 line 843

Depending on your application - the large allocation may or may not be a bug.

In any case - the part after the @ mark is a stack trace and can be used to locate the source of the message

The repeating number (4294488064 which seems to be equal to 4G-479232 or 0x100000000-0x75000) makes me suspect the original allocation call got a negative signed value and used it as an unsigned value.

It also had this to trace the memory leak:

to trace the mem address to a line in your code, use addr2line commandline tool.. use it as addr2line -e then press enter and then paste an address and press enter

I’m not sure if this is helpful but it does sound like a memory leak.

In a related Microsoft doc it stated:

1073741824: Allocations larger than this value cause a stack trace to be dumped to stderr. The threshold for dumping stack traces is increased by a factor of 1.125 every time we print a message so that the threshold automatically goes up by a factor of ~1000 every 60 messages. This bounds the amount of extra logging generated by this flag. Default value of this flag is very large and therefore you should see no extra logging unless the flag is overridden.

The default in Windows is 1 GB. I’m not sure about Linux.

I hope this is helpful.

Eric Evans
Digital Data Services LLC.
304.660.9080

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BQCP4V32U3EX2UCCSSJEBX55EGTQMH3V/
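Two pieces of arithmetic in the message above can be checked directly. The sketch below is illustrative only: as_unsigned and threshold_growth are hypothetical helper names, not functions from tcmalloc or qemu. The first reproduces the Stack Overflow example, where a negative 32-bit value read as unsigned gives 4294488064; the second checks the "~1000 every 60 messages" claim for a factor of 1.125.

```python
def as_unsigned(value: int, bits: int) -> int:
    """Reinterpret a two's-complement integer as an unsigned value of the given width."""
    return value & ((1 << bits) - 1)


def threshold_growth(messages: int, factor: float = 1.125) -> float:
    """Total growth of a threshold that is multiplied by `factor` per message."""
    return factor ** messages


# -479232 (that is, -0x75000) seen through a 32-bit unsigned type
print(as_unsigned(-479232, 32))        # 4294488064
print(hex(0x100000000 - 0x75000))      # 0xfff8b000

# Growth of the reporting threshold after 60 messages
print(round(threshold_growth(60)))
```

The growth after 60 messages comes out somewhat above 1000, in line with the "~1000" in the quoted documentation.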
[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
The hosts are identical, and yes I'm sure about the 563 terabytes, which is obviously wrong, and why I mentioned it. Possibly an overflow?

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/YRJ2QS4NXAKISWRMPOFHDO74V63ARPBN/
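For reference, the size in the log can be broken down with a few lines of Python. This is only a check of the number itself, not a diagnosis; the 4 GiB figure is an assumption based on the VM being described as having 4GB of RAM.

```python
SIZE = 562949953421312  # bytes, from the tcmalloc / GLib-ERROR log lines


def describe(n: int) -> dict:
    """Express a byte count in decimal TB, binary TiB, and as a power of two if it is one."""
    is_pow2 = n > 0 and (n & (n - 1)) == 0
    return {
        "terabytes": n / 10**12,
        "tebibytes": n / 2**40,
        "is_power_of_two": is_pow2,
        "log2": n.bit_length() - 1 if is_pow2 else None,
    }


info = describe(SIZE)
print(info)

# Ratio to the guest's RAM, assuming 4 GiB
print(SIZE // (4 * 2**30))
```

The value is exactly 2**49 bytes, which is 512 TiB or roughly 563 TB, and about 131072 times the assumed guest RAM. An exact power of two looks more like a computed or shifted value than a random corruption, though that is only an observation.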
[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
I have a Windows 10 guest and a Server 2016 guest that migrate without an issue.

Are your CPU architectures comparable between the hosts?

BTW, 562949953421312 bytes is 562 terabytes. Are you sure that's correct?

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7JDAC6SVJIPJRMLDHHZIREUGC3EDR6FP/
[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
Any other suggestions?

I'm already running a later version of qemu (qemu-kvm-ev-2.12.0-33.1.el7_7.4) than the one referenced (qemu-kvm-rhev-2.9.0-16) in https://access.redhat.com/solutions/3423481 (from what I can see on that page without a subscription).

Regards,
Brett

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JNNOKV3N6E3BZ2FGLYCO2UULDLF6WENN/
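The "later version" comparison above can be sketched as follows. This is a deliberate simplification: it compares only the upstream version field as a tuple of integers, ignores epoch and release, and does not implement RPM's real comparison rules. The two package names also differ (qemu-kvm-ev and qemu-kvm-rhev), so this compares versions across two builds, not two releases of one package. upstream_version is a hypothetical helper name.

```python
import re
from typing import Tuple

# name-version-release: capture the dotted numeric version and the trailing release
NVR = re.compile(r"-(\d+(?:\.\d+)*)-(\S+)$")


def upstream_version(package: str) -> Tuple[int, ...]:
    """Extract the upstream version of a name-version-release string as integers."""
    match = NVR.search(package)
    if match is None:
        raise ValueError(f"cannot parse package string: {package!r}")
    return tuple(int(part) for part in match.group(1).split("."))


running = upstream_version("qemu-kvm-ev-2.12.0-33.1.el7_7.4")
referenced = upstream_version("qemu-kvm-rhev-2.9.0-16")

print(running, referenced, running > referenced)

# A plain string comparison would get this wrong, because "1" sorts before "9".
print("2.12.0" > "2.9.0")
```

Comparing the parsed tuples agrees with the statement in the message: 2.12.0 is newer than 2.9.0.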
[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
I haven't got an active RHEL subscription so I can't view that solution unfortunately.

Thanks for the log pointers though, looking in the qemu log I'm not surprised it's crashing...

tcmalloc: large alloc 562949953421312 bytes == (nil) @ 0x7f93c080b4ef 0x7f93c082b367 0x7f93d8310736 0x55efa0670ac8 0x55efa06718e5 0x55efa0621bae 0x55efa06287b6 0x55efa07f0de5 0x55efa07ecbdf 0x7f93bf7b3e65 0x7f93bf4dc88d

(process:1374): GLib-ERROR **: 09:26:39.525: gmem.c:135: failed to allocate 562949953421312 bytes
2020-04-06 08:26:43.036+: shutting down, reason=crashed
...
libvirt version: 4.5.0, package: 23.el7_7.6 (CentOS BuildSystem <http://bugs.centos.org>, 2020-03-17-23:39:10, x86-01.bsys.centos.org), qemu version: 2.12.0qemu-kvm-ev-2.12.0-33.1.el7_7.4, kernel: 3.10.0-1062.18.1.el7.x86_64

562949953421312 bytes is mighty big, nigh on 563 TB!
The VM in question is allocated 4GB RAM and has a 60GB disk...

Couldn't see any errors in the VDSM log at the time that qemu failed.

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XAYHLSBLONRGROONCQPWXSBBYVFFK6KK/
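A small sketch for pulling the requested size and the stack addresses out of the tcmalloc line, so the two crashes reported in this thread can be compared. The regular expression is written against the two log lines quoted in the thread and is an assumption about the format, not something taken from the tcmalloc sources; parse_large_alloc is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

LARGE_ALLOC = re.compile(
    r"tcmalloc: large alloc (\d+) bytes == \S+ @((?: 0x[0-9a-f]+)+)"
)


def parse_large_alloc(line: str) -> Optional[Tuple[int, List[int]]]:
    """Return (requested size in bytes, list of stack addresses), or None if no match."""
    match = LARGE_ALLOC.search(line)
    if match is None:
        return None
    size = int(match.group(1))
    frames = [int(token, 16) for token in match.group(2).split()]
    return size, frames


# The two log lines as quoted in this thread (6 April and 11 April crashes).
CRASH_APR_06 = (
    "tcmalloc: large alloc 562949953421312 bytes == (nil) @ 0x7f93c080b4ef "
    "0x7f93c082b367 0x7f93d8310736 0x55efa0670ac8 0x55efa06718e5 0x55efa0621bae "
    "0x55efa06287b6 0x55efa07f0de5 0x55efa07ecbdf 0x7f93bf7b3e65 0x7f93bf4dc88d"
)
CRASH_APR_11 = (
    "tcmalloc: large alloc 562949953421312 bytes == (nil) @ 0x7f4a0bb464ef "
    "0x7f4a0bb66367 0x7f4a2364b736 0x561527438ac8 0x5615274398e5 0x5615273e9bae "
    "0x5615273f07b6 0x5615275b8de5 0x5615275b4bdf 0x7f4a0aaeee65 0x7f4a0a81788d"
)

size_a, frames_a = parse_large_alloc(CRASH_APR_06)
size_b, frames_b = parse_large_alloc(CRASH_APR_11)

# Compare the offset within a 4 KiB page (low 12 bits) frame by frame.
same_page_offsets = [a & 0xFFF for a in frames_a] == [b & 0xFFF for b in frames_b]
print(size_a == size_b, len(frames_a), same_page_offsets)
```

Both crashes request the same size, and the frames have identical page offsets even though the absolute addresses differ. That is consistent with the same code path being hit each time, with the binaries loaded at different randomized base addresses.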
[ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating
Hi Brett,

According to [1], you can try updating the qemu-kvm-rhev package (or run yum update if other related packages also need to be upgraded).

You may also find more information about that error in the vdsm log (/var/log/vdsm/vdsm.log) and the qemu log (/var/log/libvirt/qemu/vm_name.log).

[1] https://access.redhat.com/solutions/3423481

Regards,
Shani Leviim

On Mon, Apr 6, 2020 at 12:09 PM Maton, Brett wrote:

> I recently added a Windows 10 Pro 64 bit (release 1909) VM, and I'm seeing
> a lot of failures when oVirt tries to move the VM to another host
> (triggered by load balancing).
>
> These errors are showing up in the UI event log:
>
> Migration failed (VM: , Source: , Destination: 2>).
>
> Followed by:
>
> VM is down with error. Exit message: Lost connection with qemu process.
>
> Google returned some references to 'options kvm ignore_msrs=1' which I've
> added to /etc/modprobe.d/kvm.conf and restarted the hosts but that doesn't
> appear to have made a difference.
>
> Is this a known issue with Windows 10 guests?
>
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/QNJ7GFDXKBVREHJY4FBIORLBVEBO353R/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/6H735KZWX7DMB6ONUYRSNZ3R5IUBQ4WY/
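Before ruling out the 'options kvm ignore_msrs=1' setting, it may be worth confirming that the option actually took effect after the reboot. A minimal sketch, assuming the kvm module exposes the parameter under /sys/module/kvm/parameters/ as Linux modules usually do for readable parameters; read_kvm_param is a hypothetical helper name, and the root argument exists only so the function can be pointed at another directory.

```python
from pathlib import Path
from typing import Optional

KVM_PARAMS = Path("/sys/module/kvm/parameters")


def read_kvm_param(name: str, root: Path = KVM_PARAMS) -> Optional[str]:
    """Return the current value of a kvm module parameter, or None if it cannot be read."""
    try:
        return (root / name).read_text().strip()
    except OSError:
        # Module not loaded, parameter not exposed, or not running on Linux.
        return None


value = read_kvm_param("ignore_msrs")
if value is None:
    print("ignore_msrs: not readable on this machine")
else:
    # Boolean module parameters are normally reported as Y or N.
    print(f"ignore_msrs = {value}")
```

If the value is still N on a host, the modprobe configuration file was probably not picked up, for example because of a wrong path or file name.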