On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > Hi all,
> >
> > For a long time I have been seeing data corruption in guests when
> > using KVM, but only today did I become convinced that the problem
> > comes from KVM.
> >
> > The symptoms are a few bytes mangled to 0x00 in a file that has been
> > written. So far I have only seen 2 or 4 consecutive bytes mangled, but
> > that may be a statistical artifact given the limited number of samples.
> >
> > The problem appears very rarely. I only see it when doing huge
> > compilations (for example gcc or glibc), and not on every build. Note
> > that I only detect corruptions that cause build failures, so I may be
> > missing some.
> >
> > Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> > plain FreeBSD, for both 32-bit and 64-bit guests. I have always used
> > 64-bit hosts, and I have seen the problem on both Core 2 and Athlon 64
> > CPUs (always multi-core).
> >
> > I have never seen such corruptions using QEMU, so I would say the
> > problem does not come from the disk emulation, though that may also be
> > a matter of statistics. Note that I have done a lot of compilation in a
> > MIPS QEMU guest (a few hundred hours) without any problem. That
> > platform uses the same IDE controller as the one in KVM.
> >
> > Has anybody seen the same kind of problem? Without a way to reproduce
> > the corruption, I think it will be very difficult to debug.
> 
> Did you observe anything about the corruption?  For example, are the
> offsets at a page boundary?  Can you provide a corrupted file and the
> same, non-corrupted file as a reference?

For now I am still trying to find an easy way to reproduce it. You will
find below a sample of a bad file and the corresponding good file. I have
gzipped them to make sure they will not be mangled once more by an MUA
or an MTA.

What is strange about this sample is that the two files do not have the
same size. I will try to get more corrupted files.
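A minimal Python sketch for checking the page-boundary question mechanically on a good/bad pair such as sem_close.o.d.good.gz and sem_close.o.d.bad.gz. It assumes 4 KiB pages and reads the gzipped files directly. Because the two files in this sample differ in size, it aligns them with `difflib.SequenceMatcher` instead of comparing offset by offset. The names `corrupted_regions` and `report` are hypothetical helpers, not part of any existing tool.

```python
import gzip
from difflib import SequenceMatcher
from typing import List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86 hosts and guests

# (offset in bad file, original bytes, corrupted bytes,
#  offset within the page, corrupted bytes are all 0x00)
Region = Tuple[int, bytes, bytes, int, bool]


def corrupted_regions(good: bytes, bad: bytes,
                      page_size: int = PAGE_SIZE) -> List[Region]:
    """List every region where `bad` differs from `good`."""
    # autojunk=False: the default heuristic would treat frequent byte
    # values in a long input as junk and refuse to match them.
    matcher = SequenceMatcher(None, good, bad, autojunk=False)
    regions: List[Region] = []
    for tag, g1, g2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        chunk = bad[b1:b2]
        zeroed = len(chunk) > 0 and chunk.count(0) == len(chunk)
        regions.append((b1, good[g1:g2], chunk, b1 % page_size, zeroed))
    return regions


def report(good_gz: str, bad_gz: str) -> None:
    """Print the differing regions of two gzipped files."""
    with gzip.open(good_gz, "rb") as f:
        good = f.read()
    with gzip.open(bad_gz, "rb") as f:
        bad = f.read()
    for off, old, new, page_off, zeroed in corrupted_regions(good, bad):
        flag = " [zeroed]" if zeroed else ""
        print(f"offset {off:#x} (page offset {page_off}): "
              f"{old!r} -> {new!r}{flag}")
```

For example, `report("sem_close.o.d.good.gz", "sem_close.o.d.bad.gz")` prints one line per differing region; a page offset of 0 would point at a page boundary. The alignment is roughly quadratic in the worst case, which is fine for small dependency files like these but slow for large objects.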

I have been able to reproduce the bug with one or several guests
running, so it does not depend on the number of running guests.


> For the 32-bit case, were the guests pae, nonpae, or both?

I am using nonpae guests (I only give 1GB of memory to the guests).


> How would I go about reproducing this?   Is a single ./configure; make 
> clean; make in a loop compiling gcc sufficient?

Yes, that is basically what I am doing, but on the glibc sources, as I
have more "success" reproducing the bug with them. Note that you should
run configure in a directory separate from the sources.

I generally observe the bug once every 10 to 15 builds. One build takes
about 45 minutes here.
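A hedged Python sketch of the configure / make clean / make loop discussed above. The paths `SRC` and `BUILD` are hypothetical, and `--prefix=/usr` is an assumed configure flag; adjust both for your tree or for gcc. The loop stops at the first failed build so the broken tree stays available for inspection. At one failure per 10 to 15 builds of about 45 minutes each, a hit would be expected roughly every 7.5 to 11 hours of building.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional

SRC = Path("/srv/glibc")          # hypothetical: glibc source tree
BUILD = Path("/srv/glibc-build")  # hypothetical: separate build directory


def build_once(src: Path = SRC, build_dir: Path = BUILD) -> bool:
    """Run configure, make clean and make out of tree; True on success."""
    build_dir.mkdir(parents=True, exist_ok=True)
    steps = [
        [str(src / "configure"), "--prefix=/usr"],  # assumption: flags
        ["make", "clean"],
        ["make"],
    ]
    for cmd in steps:
        # Run inside the build directory so configure never writes to `src`.
        if subprocess.run(cmd, cwd=build_dir).returncode != 0:
            return False
    return True


def run_until_failure(build: Callable[[], bool],
                      max_runs: int) -> Optional[int]:
    """Repeat `build` until it fails; return the failing run number or None."""
    for run in range(1, max_runs + 1):
        if not build():
            return run
    return None
```

Calling `run_until_failure(build_once, max_runs=50)` returns the number of the first failing build, or `None` if all 50 succeeded. Passing the build step as a callable keeps the loop independent of the package being compiled.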

-- 
  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   [EMAIL PROTECTED]         | [EMAIL PROTECTED]
   `-    people.debian.org/~aurel32 | www.aurel32.net

Attachment: sem_close.o.d.bad.gz
Description: Binary data

Attachment: sem_close.o.d.good.gz
Description: Binary data

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
