The version we are using is:
1.10.2-0ubuntu2~cloud0
The version that was not working for us is:
2.0.1+git20140120-0ubuntu2~cloud1
Network:
Intel Corporation I350 Gigabit Network Connection (igb module)
We were seeing the problem, strangely enough, at the application level,
inside the VMs, where Hadoop was reporting corrupted data on TCP
connections. No other messages on the hypervisor or in the VM kernel.
Hadoop makes lots of connections to lots of different VMs, moving lots
(terabytes) of data as fast as possible. The problem was also
non-deterministic: Hadoop would retry the transfer several times,
sometimes succeeding, sometimes giving up. I tried some quick iperf
tests, but they worked fine.
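For what it's worth, the quick tests I ran were single-stream; something closer to Hadoop's traffic pattern would be many parallel streams sustained for minutes. A sketch of what I would try next (vm-a/vm-b are placeholder hostnames, and the flag values are just a starting point):

```shell
#!/bin/sh
# Sketch: stress a VM-to-VM path more like Hadoop does, rather than a
# short single-stream test. On the receiver (vm-b), run: iperf -s

# Build the sender command: 16 parallel TCP streams (-P) for 10 minutes
# (-t), reporting every 10 seconds (-i).
build_iperf_cmd() {
    target="$1"
    echo "iperf -c $target -P 16 -t 600 -i 10"
}

build_iperf_cmd vm-b
```

Running several of these between different VM pairs at once would get closer to the aggregate load under which we saw the corruption.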
Daniele
On 10/20/14 18:46, Manish Godara wrote:
> We had to do the same downgrade with openvswitch: the newest
> version, under heavy load, corrupts packets in transit, but we do
> not have the time to investigate the issue further.
Daniele, what was the openvswitch version before and after the
upgrade? And which ethernet drivers do you have? The corruption
may be related to the drivers you have (the issues may be triggered by
the way openvswitch flows are configured in Icehouse vs Havana).
Thanks.
From: Daniele Venzano <daniele.venz...@eurecom.fr
<mailto:daniele.venz...@eurecom.fr>>
Organization: Eurecom
Date: Sunday, October 19, 2014 11:46 PM
To: "openstack-operators@lists.openstack.org
<mailto:openstack-operators@lists.openstack.org>"
<openstack-operators@lists.openstack.org
<mailto:openstack-operators@lists.openstack.org>>
Subject: Re: [Openstack-operators] qemu 1.x to 2.0
We have the same setup (Icehouse on Ubuntu 12.04) and had similar
issues. We downgraded qemu from 2.x to 1.x, as we cannot terminate all
VMs for all users. We also hit non-resumable VMs in the middle of the
1.x series, and nothing was documented in the changelog.
We had to do the same downgrade with openvswitch: the newest
version, under heavy load, corrupts packets in transit, but we do
not have the time to investigate the issue further.
We plan to warn our users in time for the next major upgrade to Juno
that all VMs need to be terminated, probably during the Christmas
holidays. I do not think they will be happy.
Seeing also all the problems we had upgrading Neutron from OVS to ML2,
terminating all VMs is probably the best policy anyway during an
OpenStack upgrade. Or you do lots of migrations and upgrade qemu one
compute host at a time, but if something goes wrong you end up with
an angry user and a stuck VM.
It certainly is a big deal.
On 10/20/14 00:59, Joe Topjian wrote:
Hello,
We recently upgraded an OpenStack Grizzly environment to Icehouse
(doing a quick stop-over at Havana). This environment is still
running Ubuntu 12.04.
The Ubuntu 14.04 release notes
<https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes#Ubuntu_Server> make
mention of incompatibilities when moving from 12.04 to 14.04 and qemu
2.0. I didn't think that this would apply to upgrades staying on
12.04, but it indeed does.
We found that existing instances could not be live migrated (as per
the release notes). Additionally, instances that were hard-rebooted
and had the libvirt xml file rebuilt could no longer start, either.
The exact error message we saw was:
"Length mismatch: vga.vram: 1000000 in != 800000"
I found a few bugs that are related to this, but I don't think
they're fully relevant to the issue I ran into:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1308756
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1291321
https://bugs.launchpad.net/nova/+bug/1312133
We ended up downgrading to the stock Ubuntu 12.04 qemu 1.0 packages
and everything is working nicely.
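To keep routine updates from pulling the 2.0 packages back in after the downgrade, one option is an apt pin along these lines. The package list and version pattern below are what I'd expect on stock 12.04, so double-check them against `dpkg -l | grep qemu` first:

```
# /etc/apt/preferences.d/qemu-pin
# Keep the stock 12.04 qemu 1.0 packages in place; package names and
# version pattern are guesses -- verify against `dpkg -l` before use.
Package: qemu-kvm qemu-utils qemu-common
Pin: version 1.0*
Pin-Priority: 1001
```

A priority above 1000 lets apt keep (or even downgrade to) the pinned version rather than upgrading past it.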
I'm wondering if anyone else has run into this issue and how they
dealt with it or plan to deal with it.
Also, I'm curious as to why exactly qemu 1.x and 2.0 are incompatible
with each other. Is this just an Ubuntu issue? Or is it inherent to qemu?
Unless I'm missing something, this seems like a big deal. If we
continue to use Ubuntu's OpenStack packages, we're basically stuck at
12.04 and Icehouse unless we have all users snapshot their instances
and re-launch in a new cloud.
Thanks,
Joe
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators