Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
>
>> Aurelien Jarno wrote:
>>
>>> Hi all,
>>>
>>> For a long time I have been seeing data corruption in guests when
>>> using KVM, but only today did I become convinced that the problem
>>> comes from KVM.
>>>
>>> The symptoms are a few bytes that are mangled to 0x00 in a file that has
>>> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
>>> but that may be due to statistics given the limited sample.
>>>
>>> The problem appears very rarely. I am only seeing it when doing huge
>>> compilations (for example gcc or glibc), and not for every build. Note
>>> that I am only detecting build failures, so I can miss some corruptions.
>>>
>>> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
>>> plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit
>>> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
>>> (always multi-core).
>>>
>>> I have never seen such corruptions using QEMU, so I would say the
>>> problem does not come from the disk emulation, though that may be due
>>> to statistics. Note that I have done a lot of compilations in a MIPS
>>> QEMU guest (a few hundred hours), without any problem. This platform
>>> uses the same IDE controller as the one in KVM.
>>>
>>> Has anybody seen the same kind of problem? Without a way to
>>> reproduce the corruption, I think it will be very difficult to debug
>>> the problem.
>>>
>> Did you observe anything about the corruption? For example, are the
>> offsets at a page boundary? Can you provide a corrupted file and the
>> same, non-corrupted file as a reference?
>>
>
> For now I am still trying to find an easy way to reproduce it. You will
> find below a sample of a bad and a good file. I have gzipped them to
> make sure they will not be mangled once more by a MUA or a MTA.
>
> What is strange with this sample is that the size of the file is not the
> same. I will try to get more corrupted files.
>

I guess that this is because the corruption is in some userspace data
structure, not pagecache, so there is not a 1:1 correspondence between
the area corrupted and the output file. If you do happen to get a
same-size corruption, that may tell us more.

>
>> How would I go about reproducing this? Is a single ./configure; make
>> clean; make in a loop compiling gcc sufficient?
>>
>
> Yes, basically that's what I am doing, but on the glibc sources, as I
> have more "success" reproducing the bug there. Note that you should run
> the configure in a different directory from the sources.
>
> I generally observed the bug every 10 to 15 builds. One build takes
> about 45 minutes here.
>
>

Okay, I am running a glibc build on a 384 MB x86-64 guest, in a loop.
We'll see how it goes.

-- 
error compiling committee.c: too many arguments to function

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
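
For the same-size case mentioned in this thread, a byte-wise comparison of the good and bad file could show whether the zeroed runs sit near a page boundary. The following is only a minimal sketch and is not taken from the thread: the function names `diff_runs` and `describe` are hypothetical, and the 4096-byte page size is an assumption about the host.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is usual on x86 hosts


def diff_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each run of consecutive differing bytes.

    Only meaningful when both files have the same size. A byte that was
    already 0x00 in the good file does not count as a difference, so a
    zeroed region can show up shorter than it really is.
    """
    if len(good) != len(bad):
        raise ValueError("files differ in size; byte-wise comparison is not meaningful")
    runs = []
    start = None
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # the last run extends to the end of the file
        runs.append((start, len(good) - start))
    return runs


def describe(good: bytes, bad: bytes, page_size: int = PAGE_SIZE):
    """Summarise each differing run: position, length, page offset, zeroed or not."""
    report = []
    for offset, length in diff_runs(good, bad):
        report.append({
            "offset": offset,
            "length": length,
            "page_offset": offset % page_size,
            "all_zero": all(x == 0 for x in bad[offset:offset + length]),
        })
    return report
```

To use it, read both files in binary mode (for example `open(path, "rb").read()`) and pass the two byte strings to `describe`. A `page_offset` close to 0 or close to the page size would suggest that the damage sits at a page boundary.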