On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > Hi all,
> >
> > For a long time I have been seeing data corruption in guests when using KVM,
> > but only today have I become convinced that the problem comes from KVM.
> >
> > The symptoms are a few bytes mangled to 0x00 in a file that has
> > been written. So far I have only seen 2 or 4 consecutive bytes mangled,
> > but that may be due to statistics given the limited number of samples.
> >
> > The problem appears very rarely. I only see it when doing huge
> > compilations (for example gcc or glibc), and not on every build. Note
> > that I am only detecting build failures, so I may miss some corruptions.
> >
> > Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> > plain FreeBSD, for both 32- and 64-bit guests. I have always used 64-bit
> > hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
> > (always multi-core).
> >
> > I have never seen such corruption using QEMU, so I would say the
> > problem does not come from the disk emulation, though that may be due to
> > statistics. Note that I have done a lot of compilation in a MIPS QEMU
> > guest (a few hundred hours) without any problem. This platform uses
> > the same IDE controller as the one in KVM.
> >
> > Has anybody seen the same kind of problem? Without a way to
> > reproduce the corruption, I think it will be very difficult to debug
> > the problem.
>
> Did you observe anything about the corruption? For example, are the
> offsets at page boundaries? Can you provide a corrupted file and the
> same, non-corrupted file as a reference?
For now I am still trying to find an easy way to reproduce it.

You will find below a sample of a bad and a good file. I have gzipped them
to make sure they will not be mangled once more by a MUA or an MTA. What is
strange about this sample is that the two files do not have the same size.
I will try to get more corrupted files.

I have been able to reproduce the bug with one or multiple guests running,
so it does not depend on the number of guests running.

> For the 32-bit case, were the guests pae, nonpae, or both?

I am using nonpae guests (I only give 1GB of memory to the guests).

> How would I go about reproducing this? Is a single ./configure; make
> clean; make in a loop compiling gcc sufficient?

Yes, that's basically what I am doing, but on the glibc sources, as I have
more "success" reproducing the bug with them. Note that you should run
configure in a different directory from the sources. I generally observe
the bug every 10 to 15 builds. One build takes about 45 minutes here.

-- 
  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   [EMAIL PROTECTED]          | [EMAIL PROTECTED]
   `-    people.debian.org/~aurel32 | www.aurel32.net
sem_close.o.d.bad.gz
Description: Binary data
sem_close.o.d.good.gz
Description: Binary data
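Avi asked whether the corrupted offsets fall on page boundaries, and the two samples above differ in size, so a plain byte-by-byte comparison would misalign after the first difference. A minimal Python sketch for locating the differing regions is shown below. It assumes 4 KiB guest pages (`PAGE_SIZE`) and that file offsets line up with page-cache pages; the names `diff_regions` and `compare_gz` are hypothetical helpers, not existing tools.

```python
import difflib
import gzip

# Assumption: x86 guests using 4 KiB pages; adjust if needed.
PAGE_SIZE = 4096


def diff_regions(good, bad, page_size=PAGE_SIZE):
    """Locate every region where `bad` differs from `good`.

    Uses difflib.SequenceMatcher so that insertions or deletions (the two
    sample files differ in size) do not shift every later comparison.
    """
    # autojunk=False: otherwise bytes occurring in more than 1% of an input
    # of 200+ bytes are ignored as "junk", which distorts the alignment.
    matcher = difflib.SequenceMatcher(None, good, bad, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        bad_chunk = bad[j1:j2]
        regions.append({
            "tag": tag,              # "replace", "delete" or "insert"
            "good_range": (i1, i2),  # half-open byte range in the good file
            "bad_range": (j1, j2),   # half-open byte range in the bad file
            # True when the bad file holds only 0x00 bytes in this region
            # (False for pure deletions, where the bad range is empty).
            "all_zero": len(bad_chunk) > 0 and all(byte == 0 for byte in bad_chunk),
            # Offset of the region start within its page in the bad file.
            "page_offset": j1 % page_size,
        })
    return regions


def compare_gz(good_path, bad_path, page_size=PAGE_SIZE):
    """Decompress two gzipped samples and return their differing regions."""
    with gzip.open(good_path, "rb") as f:
        good = f.read()
    with gzip.open(bad_path, "rb") as f:
        bad = f.read()
    return diff_regions(good, bad, page_size)
```

For example, `compare_gz("sem_close.o.d.good.gz", "sem_close.o.d.bad.gz")` would list each differing region with its range in both files, whether the bad bytes are all 0x00, and where the region starts within a page. SequenceMatcher is roughly quadratic in the worst case, which is fine for small dependency files like these but may be slow on large object files.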
_______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel