The version we are using is:
1.10.2-0ubuntu2~cloud0

The version that was not working for us is:
2.0.1+git20140120-0ubuntu2~cloud1

Network:
Intel Corporation I350 Gigabit Network Connection (igb module)
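
(For anyone wanting to check their own setup: the driver bound to an interface can be read straight out of sysfs. A quick Python snippet, where eth0 is just a placeholder interface name:

    import os
    iface = 'eth0'  # placeholder; substitute your real interface name
    drv_link = '/sys/class/net/%s/device/driver' % iface
    print(os.path.basename(os.readlink(drv_link)))  # prints e.g. 'igb'

)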

We were seeing the problem, strangely enough, at the application level, inside the VMs: Hadoop was reporting corrupted data on TCP connections. There were no other messages on the hypervisor or in the VM kernel. Hadoop makes lots of connections to lots of different VMs, moving lots (terabytes) of data as fast as possible. It was also non-deterministic: Hadoop would retry a transfer several times, sometimes succeeding, sometimes giving up. I tried some quick iperf tests, but they worked fine.
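
In hindsight, a plain iperf run is probably too forgiving a test: iperf measures throughput but does not verify payload integrity end to end, and with checksum offload on the NIC (and virtio inside the guests) the 16-bit TCP checksum may never be recomputed in software, so corruption inside the host datapath can reach the application silently. Below is a minimal sketch of the kind of checksummed bulk transfer that might reproduce what Hadoop was seeing; the script name, port, and sizes are made up, and you would run many copies in parallel between VMs to approximate Hadoop's load:

    import hashlib, os, socket, sys

    PORT = 5002        # arbitrary
    CHUNK = 1 << 20    # 1 MiB per send
    CHUNKS = 1024      # 1 GiB per run

    def serve():
        # Receiver: hash everything that arrives, then print the digest.
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(('', PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        h = hashlib.sha256()
        while True:
            data = conn.recv(65536)
            if not data:
                break
            h.update(data)
        print('received %s' % h.hexdigest())

    def send(host):
        # Sender: stream random data, hashing it as it goes out.
        conn = socket.create_connection((host, PORT))
        h = hashlib.sha256()
        for _ in range(CHUNKS):
            chunk = os.urandom(CHUNK)
            h.update(chunk)
            conn.sendall(chunk)
        conn.close()
        print('sent     %s' % h.hexdigest())

    if __name__ == '__main__':
        serve() if sys.argv[1] == 'serve' else send(sys.argv[1])

Run "python xfer_check.py serve" in one VM and "python xfer_check.py <server-ip>" in another; if the two digests ever disagree while TCP itself reports no errors, that is the silent corruption.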

Daniele

On 10/20/14 18:46, Manish Godara wrote:
> We had to do the same downgrade with openvswitch: the newest version corrupts packets in transit under heavy load, but we do not have time to investigate the issue further.

Daniele, what were the openvswitch versions before and after the upgrade? And which Ethernet drivers do you have? The corruption may be related to the drivers you have (the issue may be triggered by the way openvswitch flows are configured in Icehouse vs. Havana).

Thanks.

From: Daniele Venzano <daniele.venz...@eurecom.fr>
Organization: Eurecom
Date: Sunday, October 19, 2014 11:46 PM
To: "openstack-operators@lists.openstack.org <mailto:openstack-operators@lists.openstack.org>" <openstack-operators@lists.openstack.org <mailto:openstack-operators@lists.openstack.org>>
Subject: Re: [Openstack-operators] qemu 1.x to 2.0

We have the same setup (Icehouse on Ubuntu 12.04) and had similar issues. We downgraded qemu from 2.x to 1.x, since we cannot terminate all VMs for all users. We also hit non-resumable VMs in the middle of the 1.x series, and nothing was documented in the changelog. We had to do the same downgrade with openvswitch: the newest version corrupts packets in transit under heavy load, but we do not have time to investigate the issue further.

We plan to warn our users in time for the next major upgrade, to Juno, that all VMs will need to be terminated, probably during the Christmas holidays. I do not think they will be happy. Given all the problems we had upgrading Neutron from OVS to ML2, terminating all VMs is probably the best policy during an OpenStack upgrade anyway. Alternatively, you do lots of live migrations and upgrade qemu one compute host at a time (sketched below), but if something goes wrong you end up with an angry user and a stuck VM.
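
For the host-at-a-time route, something like the following python-novaclient sketch could drain a compute node before its qemu upgrade. The credentials, host name, and the assumption that the scheduler picks a target when no host is given all need verifying against your deployment; this is an outline, not a tested tool:

    import os
    from novaclient import client

    # Admin credentials from the usual OS_* environment variables.
    nova = client.Client('2',
                         os.environ['OS_USERNAME'],
                         os.environ['OS_PASSWORD'],
                         os.environ['OS_TENANT_NAME'],
                         os.environ['OS_AUTH_URL'])

    # Every instance on the compute node we want to upgrade next.
    servers = nova.servers.list(search_opts={'host': 'compute-01',
                                             'all_tenants': 1})

    for server in servers:
        # Let the scheduler pick a destination; pass block_migration=True
        # if the deployment has no shared storage.
        server.live_migrate()
        print('migrating %s (%s)' % (server.name, server.id))

You would still want to poll each instance's status and stop at the first failure, which is exactly the stuck-VM risk mentioned above.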

It certainly is a big deal.

On 10/20/14 00:59, Joe Topjian wrote:
Hello,

We recently upgraded an OpenStack Grizzly environment to Icehouse (doing a quick stop-over at Havana). This environment is still running Ubuntu 12.04.

The Ubuntu 14.04 release notes <https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes#Ubuntu_Server> mention incompatibilities when moving from 12.04 to 14.04 and qemu 2.0. I didn't think this would apply to upgrades that stay on 12.04, but it indeed does.

We found that existing instances could not be live migrated (as per the release notes). Additionally, instances that were hard rebooted, which rebuilds the libvirt XML file, could no longer start either.

The exact error message we saw was:

"Length mismatch: vga.vram: 1000000 in != 800000"

I found a few bugs that look related, but I don't think any of them fully matches the issue I ran into:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1308756
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1291321
https://bugs.launchpad.net/nova/+bug/1312133

We ended up downgrading to the stock Ubuntu 12.04 qemu 1.0 packages and everything is working nicely.

I'm wondering if anyone else has run into this issue and how they dealt with it or plan to deal with it.

Also, I'm curious as to why exactly qemu 1.x and 2.0 are incompatible with each other. Is this just an Ubuntu issue, or is it inherent to qemu?

Unless I'm missing something, this seems like a big deal. If we continue to use Ubuntu's OpenStack packages, we're basically stuck at 12.04 and Icehouse unless we have all users snapshot their instances and re-launch in a new cloud.

Thanks,
Joe



_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
