Hi Vladimir,

The packages are available on launchpad here:
https://launchpad.net/ubuntu/+source/qemu/1:2.5+dfsg-5ubuntu10.5/+build/10938755
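For anyone following along, downgrading to a build that is no longer in the archive usually means fetching the .debs from the launchpad build page and then holding them so apt doesn't upgrade them back. A sketch only — the exact set of package files depends on what is installed on the node, and the file names below are illustrative:

```shell
# Download the .deb files from the launchpad build page first, then
# install them directly (file names here are examples, not exact):
sudo dpkg -i qemu-system-x86_2.5+dfsg-5ubuntu10.5_amd64.deb \
            qemu-system-common_2.5+dfsg-5ubuntu10.5_amd64.deb \
            qemu-utils_2.5+dfsg-5ubuntu10.5_amd64.deb

# Prevent apt from pulling 10.6 back in on the next dist-upgrade:
sudo apt-mark hold qemu-system-x86 qemu-system-common qemu-utils

# Confirm the running version:
dpkg-query -W qemu-system-x86
```

Note that `apt-mark hold` only pins the package state; a later `apt-mark unhold` reverses it once a fixed version lands in the archive.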
On Tue, 20 Dec 2016 at 12:49 Vladimir Prokofev <v...@prokofev.me> wrote:
> Using
> compute1:~$ dpkg-query -W qemu-system-x86
> qemu-system-x86 1:2.5+dfsg-5ubuntu10.6
> compute1:~$ qemu-system-x86_64 --version
> QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.6), Copyright (c)
> 2003-2008 Fabrice Bellard
>
> So I guess you may be right. Now I have to confirm this.
> Did you build your package from source? 1:2.5+dfsg-5ubuntu10.5 is no
> longer available in the official repository, and I can't find it anywhere
> except for the source code.
>
> 2016-12-20 12:17 GMT+03:00 Sean Redmond <sean.redmo...@gmail.com>:
>
> What version of qemu are you running? I was hit by this bug[1] that seems
> to give off the same faults you are reporting, and had to downgrade the
> qemu packages to version '1:2.5+dfsg-5ubuntu10.5'
>
> [1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1647389
>
> On Tue, Dec 20, 2016 at 12:10 AM, Vladimir Prokofev <v...@prokofev.me> wrote:
>
> Hello Ops.
>
> I want to pick your brains on a live migration issue, cause I'm kinda
> stuck atm.
>
> I'm running a small HA Ubuntu OpenStack cloud: 3 controllers (VMs on
> Ubuntu/KVM) with an ha-proxy LB, corosync/pacemaker managing the VIP, a
> Galera DB, 3 compute nodes (using the KVM hypervisor), 3 network nodes
> (VMs on Ubuntu/KVM), and a CEPH cluster (4 OSD nodes, 3 MON nodes (VMs on
> Ubuntu/KVM)).
> Nova, cinder, and glance use CEPH as the backend. Neutron uses linux-bridge.
>
> We were running Ubuntu Trusty 14.04/Mitaka and decided to upgrade to
> Ubuntu Xenial 16.04/Newton. The issue appeared sometime after the upgrade
> of all Mitaka packages to their latest versions prior to the Xenial
> upgrade, and stayed even after the upgrade to Xenial/Newton. It's hard to
> tell for sure, as we didn't use LM for a while. It worked fine under
> Trusty/Mitaka and broke under Xenial/latest-Mitaka.
>
> With a chance of 20-30%, live migration will fail. The instance will pause
> on the source node but will not be resumed on the target node.
> The target node will destroy the instance, assuming LM failed, and on the
> source node the instance will stay paused. On the source node no new
> messages appear in nova-compute.log, and commands such as "virsh list"
> won't produce any output, or even exit for that matter. nova-compute can
> be restarted, but after the normal startup messages it once again doesn't
> produce any new log entries. Nova considers the compute node up and
> running.
> In the dashboard you can see that the instance now resides on the new
> host and is in shutoff state.
>
> If I restart libvirtd on the target or source node, the whole system
> "unfreezes". The source host unpauses the instance, and it is live again.
> But it now resides on the source node, while the database thinks it's on
> the target node and is shut down. Warning messages will appear on the
> source host:
>
> 2016-12-20 01:13:37.095 20025 WARNING nova.compute.manager
> [req-b3879539-989f-4075-9ef0-d23ef8868102 - - - - -] While synchronizing
> instance power states, found 3 instances in the database and 4 instances on
> the hypervisor.
>
> Currently I'm stopping/destroying/undefining the instance on the source
> node and launching it via standard OpenStack means, but this leads to an
> instance reboot.
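The manual recovery described above (restart libvirtd, clean up the stale domain on the source node, relaunch through OpenStack) can be sketched roughly as follows — the domain name and UUID below are placeholders taken from this thread, and the exact steps on your deployment may differ:

```shell
# On the source node: unfreeze libvirt first.
sudo systemctl restart libvirtd

# Find the Nova-assigned libvirt domain name for the stuck instance
# (something like instance-0000004a):
virsh list --all

# Stop and remove the stale domain definition (name is a placeholder):
virsh destroy instance-0000004a
virsh undefine instance-0000004a

# Then start the instance again through the OpenStack API, which is
# what causes the reboot mentioned above:
openstack server start cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53
```

`virsh destroy` is a hard power-off of the domain, not a deletion of its storage, and `virsh undefine` only removes the libvirt XML definition — with a CEPH backend the disk itself is untouched.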
>
> Last messages in nova-compute.log on the source node after LM start:
>
> 2016-12-20 00:09:35.961 16834 INFO nova.virt.libvirt.migration
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Increasing downtime to 1251 ms after
> 0 sec elapsed time
> 2016-12-20 00:09:36.127 16834 INFO nova.virt.libvirt.driver
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Migration running for 0 secs, memory
> 100% remaining; (bytes processed=0, remaining=0, total=0)
> 2016-12-20 00:09:36.894 16834 INFO nova.compute.manager
> [req-3a2a828f-8b76-4a65-b49b-ea8d232a3de5 - - - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] VM Paused (Lifecycle Event)
> 2016-12-20 00:09:37.046 16834 INFO nova.compute.manager
> [req-3a2a828f-8b76-4a65-b49b-ea8d232a3de5 - - - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] During sync_power_state the instance
> has a pending task (migrating). Skip.
> 2016-12-20 00:09:37.300 16834 INFO nova.virt.libvirt.driver
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Migration operation has completed
> 2016-12-20 00:09:37.301 16834 INFO nova.compute.manager
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] _post_live_migration() is started..
> 2016-12-20 00:09:39.718 16834 INFO os_vif
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] Successfully unplugged vif
> VIFBridge(active=False,address=fa:16:3e:e3:c4:c7,bridge_name='brq0b7e75ee-b6',has_traffic_filtering=True,id=2a141727-77a7-4b43-b4cd-21fb415a6a55,network=Network(0b7e75ee-b691-4851-9b29-08c3c62a96e0),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap2a141727-77')
> 2016-12-20 00:09:39.777 16834 INFO nova.virt.libvirt.driver
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Deleting instance files
> /var/lib/nova/instances/cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53_del
> 2016-12-20 00:09:39.778 16834 INFO nova.virt.libvirt.driver
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Deletion of
> /var/lib/nova/instances/cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53_del complete
> 2016-12-20 00:10:33.644 16834 INFO nova.compute.resource_tracker
> [req-00c4c403-d039-4347-b7ad-3098834a891c - - - - -] Auditing locally
> available compute resources for node compute1.cloud1.local
>
> No new messages will appear until libvirtd restart.
>
> Messages during LM in nova-compute.log on the target node:
>
> 2016-12-20 00:09:32.234 29438 INFO nova.virt.libvirt.driver
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] Instance launched has CPU info:
> {"vendor": "Intel", "model": "Nehalem", "arch": "x86_64", "features":
> ["pge", "clflush", "sep", "syscall", "vme", "dtes64", "msr", "vmx", "xtpr",
> "cmov", "ssse3", "est", "pat", "monitor", "smx", "pbe", "lm", "tsc", "nx",
> "fxsr", "tm", "sse4.1", "pae", "sse4.2", "acpi", "mmx", "cx8", "mce", "de",
> "rdtscp", "ht", "dca", "lahf_lm", "pdcm", "mca", "pdpe1gb", "apic", "sse",
> "pse", "ds", "invtsc", "pni", "tm2", "sse2", "ss", "ds_cpl", "pcid", "fpu",
> "cx16", "pse36", "mtrr", "popcnt"], "topology": {"cores": 6, "cells": 2,
> "threads": 2, "sockets": 1}}
> 2016-12-20 00:09:34.476 29438 INFO os_vif
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] Successfully plugged vif
> VIFBridge(active=True,address=fa:16:3e:e3:c4:c7,bridge_name='brq0b7e75ee-b6',has_traffic_filtering=True,id=2a141727-77a7-4b43-b4cd-21fb415a6a55,network=Network(0b7e75ee-b691-4851-9b29-08c3c62a96e0),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap2a141727-77')
> 2016-12-20 00:09:36.711 29438 INFO nova.compute.manager
> [req-75abcda4-701a-4b84-8f28-c11f6ad6f933 - - - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] VM Started (Lifecycle Event)
> 2016-12-20 00:09:39.777 29438 INFO nova.compute.manager
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Post operation of migration started
> 2016-12-20 00:09:40.381 29438 INFO nova.network.neutronv2.api
> [req-51801b5e-b77a-4d76-ad87-176326ac910e 84498aa7e26443c4908d973f3e86d530
> ecee1197f46e453dba25669554226ce5 - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Updating port
> 2a141727-77a7-4b43-b4cd-21fb415a6a55 with attributes {'binding:profile':
> {}, 'binding:host_id': 'compute3'}
> 2016-12-20 00:09:52.589 29438 INFO nova.compute.manager [-] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] VM Stopped (Lifecycle Event)
> 2016-12-20 00:09:52.747 29438 WARNING nova.compute.manager
> [req-e945fe0d-dea0-47ee-8b87-b88991b9796e - - - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Instance shutdown by itself. Calling
> the stop API. Current vm_state: active, current task_state: None, original
> DB power_state: 4, current VM power_state: 4
> 2016-12-20 00:09:52.964 29438 INFO nova.compute.manager
> [req-e945fe0d-dea0-47ee-8b87-b88991b9796e - - - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Instance is already powered off in
> the hypervisor when stop is called.
> 2016-12-20 00:09:53.039 29438 INFO nova.virt.libvirt.driver
> [req-e945fe0d-dea0-47ee-8b87-b88991b9796e - - - - -] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Instance already shutdown.
> 2016-12-20 00:09:53.043 29438 INFO nova.virt.libvirt.driver [-] [instance:
> cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53] Instance destroyed successfully.
> 2016-12-20 00:09:56.696 29438 INFO nova.compute.resource_tracker
> [req-53e5b7a1-aba9-4b26-a3fb-b9b293bf2f17 - - - - -] Auditing locally
> available compute resources for node compute3.cloud1.local
> 2016-12-20 00:09:57.295 29438 INFO nova.compute.resource_tracker
> [req-53e5b7a1-aba9-4b26-a3fb-b9b293bf2f17 - - - - -] Total usable vcpus:
> 24, total allocated vcpus: 39
> 2016-12-20 00:09:57.295 29438 INFO nova.compute.resource_tracker
> [req-53e5b7a1-aba9-4b26-a3fb-b9b293bf2f17 - - - - -] Final resource view:
> name=compute3.cloud1.local phys_ram=145060MB used_ram=94720MB
> phys_disk=14710GB used_disk=45GB total_vcpus=24 used_vcpus=39 pci_stats=[]
>
> The target node then proceeds with normal operations.
>
> Nothing interesting appears in the libvirtd log or syslog on any compute
> node. I've tried running a debug log for libvirt, but it's unreadable for
> me - all I can figure out is that libvirt is still running and other
> instances are unaffected.
>
> So, my questions:
> - Is there any workaround to sync the state of the DB and the compute
> node? I want at least to avoid an instance reboot, and for that I need
> some way to tell the DB that the instance is on the source node and
> running, without service interruption. Is it OK to update the database
> directly via SQL?
> - How do I troubleshoot this issue further? As I said, I'm kinda stuck
> and have no idea where to look next. If the issue appeared 100% of the
> time it would be a lot easier, but there's no pattern I can figure out -
> it happens randomly, with heavily loaded and idle instances alike.
>
> Any suggestions appreciated. I need a kick in the right direction.
> I will provide any logs and/or config files on request.
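On the question of updating the database directly: hand-editing the nova DB is generally discouraged, but for the specific "instance is actually running on the source node, DB says otherwise" state described above, the relevant columns in `nova.instances` are `host`/`node` and `vm_state`/`power_state`/`task_state`. A heavily hedged sketch — take a DB backup first, and note that the host names and UUID below are taken from this thread's examples, not a recipe:

```shell
# Run against the Galera/MySQL nova database. power_state 1 is RUNNING
# in Nova's conventions (4, seen in the logs above, is SHUTDOWN).
mysql nova <<'SQL'
UPDATE instances
   SET host = 'compute1',
       node = 'compute1.cloud1.local',
       vm_state = 'active',
       power_state = 1,
       task_state = NULL
 WHERE uuid = 'cd8cb1db-dca3-4b0f-a03e-c0befbbd7b53';
SQL
```

After such an update, nova-compute's periodic `sync_power_state` task on the source node should find the DB and hypervisor in agreement again; verify with `openstack server show` before touching anything else.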
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators