Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
>   
>> Aurelien Jarno wrote:
>>     
>>> Hi all,
>>>
>>> I have been seeing data corruption in guests under KVM for a long time,
>>> but only today did I become convinced that the problem comes from KVM.
>>>
>>> The symptom is a few bytes mangled to 0x00 in a file that has been
>>> written. So far I have only seen 2 or 4 consecutive bytes mangled,
>>> but that may be due to statistics, given the limited number of samples.
>>>
>>> The problem appears very rarely. I am only seeing it when doing huge 
>>> compilations (for example gcc or glibc), and not for every build. Note
>>> that I am only detecting build failures, so I can miss some corruptions.
>>>
>>> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
>>> plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
>>> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
>>> (always multi-core).
>>>
>>> I have never seen such corruptions using QEMU, so I would say the
>>> problem does not come from the disk emulation, though that may be due to
>>> statistics. Note that I have done a lot of compilations in a MIPS QEMU
>>> guest (a few hundred hours), without any problem. This platform uses
>>> the same IDE controller as the one in KVM.
>>>
>>> Has anybody seen the same kind of problem? Without a way to 
>>> reproduce the corruption, I think it will be very difficult to debug 
>>> the problem.
>>>       
>> Did you observe anything about the corruption?  For example, are the 
>> offsets at page boundary?  Can you provide a corrupted file and the 
>> same, non-corrupted file as a reference?
>>     
>
> For now I am still trying to find an easy way to reproduce it. You will
> find below a sample of a bad and a good file. I have gzipped them to
> make sure they will not be mangled once more by a MUA or a MTA.
>
> What is strange with this sample is that the two files do not have the
> same size. I will try to get more corrupted files.
>   

I guess that this is because the corruption is in some userspace data 
structure, not pagecache, so there is not a 1:1 correspondence between 
the area corrupted and the output file.

If you do happen to get a same-size corruption, that may tell us more.
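
For a same-size pair, a comparison along these lines would answer the
questions above (where the damaged bytes sit relative to a page boundary,
and whether they were all zeroed). This is only a minimal sketch: the
function names are made up for illustration, and the 4096-byte page size
is an assumption about the x86 guest.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages in the guest


def diff_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each run of consecutive differing bytes."""
    if len(good) != len(bad):
        raise ValueError("files differ in size; offsets would not line up")
    runs = []
    start = None
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # the files differ right up to the last byte
        runs.append((start, len(good) - start))
    return runs


def describe(good: bytes, bad: bytes):
    """Annotate each differing run with its page offset and whether it is all 0x00."""
    report = []
    for offset, length in diff_runs(good, bad):
        chunk = bad[offset:offset + length]
        report.append({
            "offset": offset,
            "length": length,
            "page_offset": offset % PAGE_SIZE,
            "all_zero": chunk.count(0) == length,
        })
    return report
```

Both functions take the raw contents of the two files, read in binary
mode. Note that a run is split wherever the good file already contained
the same byte as the bad one (for example a genuine 0x00), so adjacent
runs may belong to a single corrupted region.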

>
>> How would I go about reproducing this?   Is a single ./configure; make 
>> clean; make in a loop compiling gcc sufficient?
>>     
>
> Yes, basically that's what I am doing, but on the glibc sources, as I have
> more "success" reproducing the bug there. Note that you should run
> configure in a different directory from the sources.
>
> I generally observed the bug every 10 to 15 builds. One build takes
> about 45 minutes here.
>
>   

Okay, I am running a glibc build on a 384 MB x86-64 guest, in a loop.  
We'll see how it goes.
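
Roughly, the loop looks like the sketch below. It is not the exact
script in use: build_loop and its parameters are hypothetical names, the
build directory is wiped on each pass instead of running "make clean",
and any configure options your glibc tree needs (a prefix, for example)
would have to be passed in configure_args. As discussed above, a failed
build is the only corruption signal it records.

```python
import pathlib
import shutil
import subprocess


def build_loop(src_dir, build_dir, iterations,
               configure_args=(), make_cmd=("make",)):
    """Configure and build out of tree repeatedly.

    Returns the 1-based numbers of the iterations whose build failed.
    """
    src = pathlib.Path(src_dir).resolve()
    build = pathlib.Path(build_dir).resolve()
    failures = []
    for n in range(1, iterations + 1):
        # Start from an empty build directory, separate from the sources.
        shutil.rmtree(build, ignore_errors=True)
        build.mkdir(parents=True)
        steps = (
            [str(src / "configure"), *configure_args],
            list(make_cmd),
        )
        for cmd in steps:
            # Run each step inside the build directory; stop at the first failure.
            if subprocess.run(cmd, cwd=build).returncode != 0:
                failures.append(n)
                break
    return failures
```

With a failure expected every 10 to 15 builds, something like
build_loop("glibc", "glibc-build", 30) should hit it a couple of times,
at roughly 45 minutes per iteration.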

-- 
error compiling committee.c: too many arguments to function

