Re: qcow2 corruption observed, fixed by reverting old change

2009-02-15 Thread Marc Bevand
On Sun, Feb 15, 2009 at 3:46 AM, Marc Bevand  wrote:
> Other factors you might consider when trying to reproduce: [...]

And the probability of that bug occurring seems to be less than 1% (I only
witnessed 6 or 7 occurrences out of about a thousand shutdown events).

Also, contrary to what I said I am *not* sure whether the "quit"
monitor command was used or not. Instead the guests may have been
using ACPI to shut themselves down.

-marc
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qcow2 corruption observed, fixed by reverting old change

2009-02-15 Thread Marc Bevand
On Sun, Feb 15, 2009 at 2:57 AM, Gleb Natapov  wrote:
>
> I am not able to reproduce this. After more than a hundred "boot linux;
> generate disk io; quit" loops, all I got was an image with 7 leaked blocks
> and a couple of filesystem corruptions that were fixed by fsck.

The type of activity occurring in the guest is most likely an important
factor in the probability of the bug occurring. So you should try
running a guest OS I remember being affected by it: Windows 2003 SP2
x64.

And now that I think about it, I don't recall any other guest OS
having been a victim of that bug... coincidence?

Other factors you might consider when trying to reproduce: the qcow2
images that ended up being corrupted had a backing file (a read-only
qcow2 image); NPT was in use; the host filesystem was xfs; my command
line was:

$ qemu-system-x86_64 -name xxx -monitor stdio -vnc xxx:xxx -hda hda \
    -net nic,macaddr=xx:xx:xx:xx:xx:xx,model=rtl8139 -net tap -boot c \
    -cdrom "" -cpu qemu64 -m 1024 -usbdevice tablet

-marc


Re: qcow2 corruption observed, fixed by reverting old change

2009-02-15 Thread Gleb Natapov
> > I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because of
> > the qcow2 performance regression caused by the default writethrough
> > caching policy) but it randomly triggers an even worse bug: the moment I
> > shut down a guest by typing "quit" in the monitor, it sometimes overwrites
> > the first 4kB of the disk image with mostly NUL bytes (!) which completely
> > destroys it. I am familiar with the qcow2 format and apparently this 4kB
> > block seems to be an L2 table with most entries set to zero. I have had to
> > restore at least 6 or 7 disk images from backup after occurrences of that
> > bug. My intuition tells me this may be the qcow2 code trying to allocate a
> > cluster to write a new L2 table, but not noticing the allocation failed
> > (represented by a 0 offset), and writing the L2 table at that 0 offset,
> > overwriting the qcow2 header.
> > 
> > Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c
> > reverted to its kvm-72 version.
> > 
> > Basically qcow2 in kvm-73 or newer is completely unreliable.
> > 
> > -marc
> 
> I think the corruption is a completely unrelated bug. I would suspect it
> was introduced in one of Gleb's patches in December. Adding him to CC.
> 
I am not able to reproduce this. After more than a hundred "boot linux;
generate disk io; quit" loops, all I got was an image with 7 leaked blocks
and a couple of filesystem corruptions that were fixed by fsck.

--
Gleb.


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-14 Thread Marc Bevand
On Sat, Feb 14, 2009 at 2:28 PM, Dor Laor  wrote:
>
> Both qcow2 and vmdk have the ability to keep 'external' snapshots.

I know but they don't implement one feature I cited: clones, or
"writable snapshots", which I would like implemented with support for
deduplication. Base images / backing files are too limited because
they have to be managed by the end user, and there is no deduplication
done between multiple images based on the same backing file.

> We might use vmdk format or VHD as a base for the future high performing,
> safe image format for qemu

Neither vmdk nor vhd satisfies my requirements: not always consistent on
disk, no possibility of detecting/correcting errors, susceptible to
fragmentation (affects vmdk, not sure about vhd), and possibly others.

Jamie: yes in an ideal world, the storage virtualization layer could
make use of the host's filesystem or block layer snapshotting/cloning
features, but in the real world too few OSes implement these.

-marc


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-14 Thread Jamie Lokier
Marc Bevand wrote:
> On Fri, Feb 13, 2009 at 8:23 AM, Jamie Lokier  wrote:
> >
> > Marc..  this is quite a serious bug you've reported.  Is there a
> > reason you didn't report it earlier?
> 
> Because I only started hitting that bug a couple weeks ago after
> having upgraded to a buggy kvm version.
> 
> > Is there a way to restructure the code and/or how it works so it's
> > more clearly correct?
> 
> I am seriously concerned about the general design of qcow2. The code
> base is more complex than it needs to be, the format itself is
> susceptible to race conditions causing cluster leaks when updating
> some internal data structures, it gets easily fragmented, etc.

When I read it, I thought the code was remarkably compact for what it
does, although I agree that the leaks, fragmentation and inconsistency
on crashes are serious.  From elsewhere it sounds like the refcount
update cost is significant too.

> I am considering implementing a new disk image format that supports
> base images, snapshots (of the guest state), clones (of the disk
> content); that has a radically simpler design & code base; that is
> always consistent "on disk"; that is friendly to delta diffing (i.e.
> space-efficient when used with ZFS snapshots or rsync); and that makes
> use of checksumming & replication to detect & fix corruption of
> critical data structures (ideally this should be implemented by the
> filesystem, unfortunately ZFS is not available everywhere :D).

You have just described a high quality modern filesystem or database
engine; both would certainly be far more complex than qcow2's code.
Especially with checksumming and replication :)

ZFS isn't everywhere, but it looks like everyone wants to clone ZFS's
best features everywhere (but not its worst feature: lots of memory
required).

I've had similar thoughts myself, by the way :-)

> I believe the key to achieve these (seemingly utopian) goals is to
> represent a disk "image" as a set of sparse files, 1 per
> snapshot/clone.

You can already do this, if your filesystem supports snapshotting.  On
Linux hosts, any filesystem can snapshot by using LVM underneath it
(although it's not pretty to do).  A few experimental Linux
filesystems let you snapshot at the filesystem level.

A feature you missed in the utopian vision is sharing backing store
for equal parts of files between different snapshots _after_ they've
been written in separate branches (with the same data), and also among
different VMs.  It's becoming stylish to put similarity detection in
the filesystem somewhere too :-)

-- Jamie


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-14 Thread Jamie Lokier
Dor Laor wrote:
> Both qcow2 and vmdk have the ability to keep 'external' snapshots.

I didn't see any mention of this in QEMU's documentation.  One of the
most annoying features of qcow2 is "savevm" storing all VM snapshots
in the same qcow2 file.  Is this not true?

> In addition to what you wrote, qcow2 is missing a journal for its
> metadata and also performs poorly because of complex metadata and sync
> calls.
> 
> We might use the vmdk format or VHD as a base for the future
> high-performing, safe image format for qemu

You'll want to validate VHD carefully.  I tested it just yesterday
(with kvm-83), and "qemu-img convert" does not correctly unpack my VHD
image (from Microsoft Virtual PC) to raw, compared with the unpacked
version from MSVPC's own conversion tool.  There are some patches which
greatly improve the VHD support; I'm not sure if they're in kvm-83.

-- Jamie


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-14 Thread Dor Laor

Marc Bevand wrote:
> On Fri, Feb 13, 2009 at 8:23 AM, Jamie Lokier  wrote:
>> Marc..  this is quite a serious bug you've reported.  Is there a
>> reason you didn't report it earlier?
>
> Because I only started hitting that bug a couple weeks ago after
> having upgraded to a buggy kvm version.
>
>> Is there a way to restructure the code and/or how it works so it's
>> more clearly correct?
>
> I am seriously concerned about the general design of qcow2. The code
> base is more complex than it needs to be, the format itself is
> susceptible to race conditions causing cluster leaks when updating
> some internal data structures, it gets easily fragmented, etc.
>
> I am considering implementing a new disk image format that supports
> base images, snapshots (of the guest state), clones (of the disk
> content); that has a radically simpler design & code base; that is
> always consistent "on disk"; that is friendly to delta diffing (i.e.
> space-efficient when used with ZFS snapshots or rsync); and that makes
> use of checksumming & replication to detect & fix corruption of
> critical data structures (ideally this should be implemented by the
> filesystem, unfortunately ZFS is not available everywhere :D).
>
> I believe the key to achieve these (seemingly utopian) goals is to
> represent a disk "image" as a set of sparse files, 1 per
> snapshot/clone.

Both qcow2 and vmdk have the ability to keep 'external' snapshots.
In addition to what you wrote, qcow2 is missing a journal for its
metadata and also performs poorly because of complex metadata and sync
calls.

We might use the vmdk format or VHD as a base for the future
high-performing, safe image format for qemu
-dor

> -marc


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-13 Thread Marc Bevand
On Fri, Feb 13, 2009 at 8:23 AM, Jamie Lokier  wrote:
>
> Marc..  this is quite a serious bug you've reported.  Is there a
> reason you didn't report it earlier?

Because I only started hitting that bug a couple weeks ago after
having upgraded to a buggy kvm version.

> Is there a way to restructure the code and/or how it works so it's
> more clearly correct?

I am seriously concerned about the general design of qcow2. The code
base is more complex than it needs to be, the format itself is
susceptible to race conditions causing cluster leaks when updating
some internal data structures, it gets easily fragmented, etc.

I am considering implementing a new disk image format that supports
base images, snapshots (of the guest state), clones (of the disk
content); that has a radically simpler design & code base; that is
always consistent "on disk"; that is friendly to delta diffing (i.e.
space-efficient when used with ZFS snapshots or rsync); and that makes
use of checksumming & replication to detect & fix corruption of
critical data structures (ideally this should be implemented by the
filesystem, unfortunately ZFS is not available everywhere :D).

I believe the key to achieve these (seemingly utopian) goals is to
represent a disk "image" as a set of sparse files, 1 per
snapshot/clone.

-marc


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-13 Thread Chris Wright
* Jamie Lokier (ja...@shareable.org) wrote:
> no reason to believe kvm-83 is "stable", but there's no reason to
> believe any other version of KVM is especially stable either - there's
> no stabilising bug fix only branch that I'm aware of.

There's an ad-hoc one without formal releases.  But... never been closer ;-)

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/28179

thanks,
-chris


Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change

2009-02-13 Thread Jamie Lokier
Marc Bevand wrote:
> I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older
> because of the qcow2 performance regression caused by the default
> writethrough caching policy) but it randomly triggers an even worse
> bug: the moment I shut down a guest by typing "quit" in the monitor,
> it sometimes overwrites the first 4kB of the disk image with mostly
> NUL bytes (!) which completely destroys it. I am familiar with the
> qcow2 format and apparently this 4kB block seems to be an L2 table
> with most entries set to zero. I have had to restore at least 6 or 7
> disk images from backup after occurrences of that bug.

Ow!  That's a really serious bug.  How many of us have regular hourly
backups of our disk images?  And how many of us are running databases
or mail servers on our VMs, where even restoring from a recent backup
is a harmful event?

I've not hit this bug Marc reported, probably because I nearly
always finish a KVM session by killing it, either because I'm testing
or because KVM locks up occasionally and needs kill -9 :-(

And because I've not used any KVM since kvm-72 in production until
recently, only for testing my personal VMs.

I must say, _thank goodness_ that the bug I reported occurs at boot
time, and caused me to revert the qcow2 code.  I'm now running a
critical VM on kvm-83 with reverted qcow2.  Sure it's risky as there's
no reason to believe kvm-83 is "stable", but there's no reason to
believe any other version of KVM is especially stable either - there's
no stabilising bug fix only branch that I'm aware of.

If I hadn't had the boot time bug which I reported, I could have
unrecoverable corruption instead from Marc's bug.

For the time being, I'm going to _strongly_ advise my VM using
professional clients to never, *ever* use qcow2 except for snapshot
testing.

Unfortunately the other delta/growable formats seem to be even less
reliable, because they're not used much, so they should be avoided too.

This corruption plus the data integrity/durability issues on host
failure are a big deal.  Even with kvm-72, I'm nervous about qcow2 now.
Just because a bug hasn't caused obvious guest failures, doesn't mean
it's not happening.

Is there a way to restructure the code and/or how it works so it's
more clearly correct?

> My intuition tells me this may be the qcow2 code trying to allocate
> a cluster to write a new L2 table, but not noticing the allocation
> failed (represented by a 0 offset), and writing the L2 table at that
> 0 offset, overwriting the qcow2 header.

My intuition says it's important to identify the cause of this, as it
might not be qcow2 but the AIO code going awry with a random offset
when closing down, e.g. if there's a use-after-free bug.

Marc..  this is quite a serious bug you've reported.  Is there a
reason you didn't report it earlier?

-- Jamie 


Re: qcow2 corruption observed, fixed by reverting old change

2009-02-13 Thread Kevin Wolf
Hi Marc,

You should not take qemu-devel out of the CC list. This is where the
bugs need to be fixed; they aren't KVM specific. I'm quoting your
complete mail to forward it to where it belongs.

Marc Bevand wrote:
> Jamie Lokier  shareable.org> writes:
>> As you see from the subject, I'm getting qcow2 corruption.
>>
>> I have a Windows 2000 guest which boots and runs fine in kvm-72, fails
>> with a blue-screen indicating file corruption errors in kvm-73 through
>> to kvm-83 (the latest), and succeeds if I replace block-qcow2.c with
>> the version from kvm-72.
>>
>> The blue screen appears towards the end of the boot sequence, and
>> shows only briefly before rebooting.  It says:
>>
>> STOP: c218 (Registry File Failure)
>> The registry cannot load the hive (file):
>> \SystemRoot\System32\Config\SOFTWARE
>> or its log or alternate.
>> It is corrupt, absent, or not writable.
>>
>> Beginning dump of physical memory
>> Physical memory dump complete. Contact your system administrator or
>> technical support [...?]
> 
> I have got a massive KVM installation with hundreds of guests running
> dozens of different OSes, and have also noticed multiple qcow2
> corruption bugs. All my guests are using the qcow2 format, and my hosts
> are running vanilla linux 2.6.28 x86_64 kernels and use NPT (Opteron
> 'Barcelona' 23xx processors).
> 
> My Windows 2000 guests BSOD just like yours with kvm-73 or newer. I
> have to run kvm-75 (I need the NPT fixes it contains) with
> block-qcow2.c reverted to the version from kvm-72 to fix the BSOD.
> 
> kvm-73+ also causes some of my Windows 2003 guests to exhibit this exact
> registry corruption error:
> http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
> This bug is also fixed by reverting block-qcow2.c to the version from kvm-72.
> 
> I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
> of the qcow2 performance regression caused by the default writethrough
> caching policy) but it randomly triggers an even worse bug: the moment
> I shut down a guest by typing "quit" in the monitor, it sometimes
> overwrites the first 4kB of the disk image with mostly NUL bytes (!)
> which completely destroys it. I am familiar with the qcow2 format and
> apparently this 4kB block seems to be an L2 table with most entries set
> to zero. I have had to restore at least 6 or 7 disk images from backup
> after occurrences of that bug. My intuition tells me this may be the
> qcow2 code trying to allocate a cluster to write a new L2 table, but
> not noticing the allocation failed (represented by a 0 offset), and
> writing the L2 table at that 0 offset, overwriting the qcow2 header.
> 
> Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c
> reverted to its kvm-72 version.
> 
> Basically qcow2 in kvm-73 or newer is completely unreliable.
> 
> -marc

I think the corruption is a completely unrelated bug. I would suspect it
was introduced in one of Gleb's patches in December. Adding him to CC.

Kevin


Re: qcow2 corruption observed, fixed by reverting old change

2009-02-12 Thread Marc Bevand
Jamie Lokier  shareable.org> writes:
> 
> As you see from the subject, I'm getting qcow2 corruption.
> 
> I have a Windows 2000 guest which boots and runs fine in kvm-72, fails
> with a blue-screen indicating file corruption errors in kvm-73 through
> to kvm-83 (the latest), and succeeds if I replace block-qcow2.c with
> the version from kvm-72.
> 
> The blue screen appears towards the end of the boot sequence, and
> shows only briefly before rebooting.  It says:
> 
> STOP: c218 (Registry File Failure)
> The registry cannot load the hive (file):
> \SystemRoot\System32\Config\SOFTWARE
> or its log or alternate.
> It is corrupt, absent, or not writable.
> 
> Beginning dump of physical memory
> Physical memory dump complete. Contact your system administrator or
> technical support [...?]

I have got a massive KVM installation with hundreds of guests running dozens of
different OSes, and have also noticed multiple qcow2 corruption bugs. All my
guests are using the qcow2 format, and my hosts are running vanilla linux 2.6.28
x86_64 kernels and use NPT (Opteron 'Barcelona' 23xx processors).

My Windows 2000 guests BSOD just like yours with kvm-73 or newer. I have to run
kvm-75 (I need the NPT fixes it contains) with block-qcow2.c reverted to the
version from kvm-72 to fix the BSOD.

kvm-73+ also causes some of my Windows 2003 guests to exhibit this exact
registry corruption error:
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
This bug is also fixed by reverting block-qcow2.c to the version from kvm-72.

I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because of the
qcow2 performance regression caused by the default writethrough caching policy)
but it randomly triggers an even worse bug: the moment I shut down a guest by
typing "quit" in the monitor, it sometimes overwrites the first 4kB of the disk
image with mostly NUL bytes (!) which completely destroys it. I am familiar with
the qcow2 format and apparently this 4kB block seems to be an L2 table with most
entries set to zero. I have had to restore at least 6 or 7 disk images from
backup after occurrences of that bug. My intuition tells me this may be the
qcow2 code trying to allocate a cluster to write a new L2 table, but not
noticing the allocation failed (represented by a 0 offset), and writing the L2
table at that 0 offset, overwriting the qcow2 header.

Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c reverted
to its kvm-72 version.

Basically qcow2 in kvm-73 or newer is completely unreliable.

-marc



Re: qcow2 corruption?

2009-01-09 Thread Ryan Harper
* John Morrissey  [2009-01-08 21:44]:
> On Thu, Jan 08, 2009 at 02:33:28PM -0600, Anthony Liguori wrote:
> > John Morrissey wrote:
> > >I'm encountering what seems like disk corruption when using qcow2 images,
> > >created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> > >
> > >A simple test case is to use the Debian installer (I'm using the lenny
> > >rc1 images from http://www.debian.org/devel/debian-installer/) to install
> > >a new domain. The qcow2 file on disk grows due to the mkfs(8) activity,
> > >then the installer faults while trying to mount the root filesystem
> > >(Invalid argument). 'fdisk -l' shows that the partition table just
> > >created by the installer is gone.
> > 
> > There are patches that touch the block layer.  Please try to reproduce 
> > on vanilla kvm.  I don't trust the debian patches.
> 
> Couldn't reproduce this with Debian packaging minus its patch for
> CVE-2008-0928 (taken from Fedora FWIW), which is the only one touching the
> block layer.
> 
> Upon further scrutiny, I realized I pooched updating the patch for KVM 82.
> The value for the BDRV_O_AUTOGROW constant introduced in that patch collides
> with a new BDRV_ constant introduced between KVM 79 and 82. Changing the
> constant's value (Fedora project has an updated patch, too) fixes this.
> 
> Ryan, this seems to fix the SCSI BUGging, too. I figure you won't want to
> pursue that further?

excellent!  I had seen the error before but only while developing some
new code for the scsi device, so it was a little surprising to see.  If
you can't recreate now, I think we're done. =)

> 
> Sorry for the bother, guys.

np, thanks for testing.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


Re: qcow2 corruption?

2009-01-08 Thread John Morrissey
On Thu, Jan 08, 2009 at 02:33:28PM -0600, Anthony Liguori wrote:
> John Morrissey wrote:
> >I'm encountering what seems like disk corruption when using qcow2 images,
> >created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> >
> >A simple test case is to use the Debian installer (I'm using the lenny
> >rc1 images from http://www.debian.org/devel/debian-installer/) to install
> >a new domain. The qcow2 file on disk grows due to the mkfs(8) activity,
> >then the installer faults while trying to mount the root filesystem
> >(Invalid argument). 'fdisk -l' shows that the partition table just
> >created by the installer is gone.
> 
> There are patches that touch the block layer.  Please try to reproduce 
> on vanilla kvm.  I don't trust the debian patches.

Couldn't reproduce this with Debian packaging minus its patch for
CVE-2008-0928 (taken from Fedora FWIW), which is the only one touching the
block layer.

Upon further scrutiny, I realized I pooched updating the patch for KVM 82.
The value for the BDRV_O_AUTOGROW constant introduced in that patch collides
with a new BDRV_ constant introduced between KVM 79 and 82. Changing the
constant's value (Fedora project has an updated patch, too) fixes this.

Ryan, this seems to fix the SCSI BUGging, too. I figure you won't want to
pursue that further?

Sorry for the bother, guys.

john
-- 
John Morrissey  _o/\   __o
j...@horde.net_-< \_  /  \     <  \,
www.horde.net/__(_)/_(_)/\___(_) /_(_)__


Re: qcow2 corruption?

2009-01-08 Thread Anthony Liguori

John Morrissey wrote:
> I'm encountering what seems like disk corruption when using qcow2 images,
> created with 'kvm-img create -f qcow2 image.qcow2 15G'.
>
> A simple test case is to use the Debian installer (I'm using the lenny rc1
> images from http://www.debian.org/devel/debian-installer/) to install a new
> domain. The qcow2 file on disk grows due to the mkfs(8) activity, then the
> installer faults while trying to mount the root filesystem (Invalid
> argument). 'fdisk -l' shows that the partition table just created by the
> installer is gone.
>
> In a few cases, I've managed to finish an installation, but the resulting
> filesystem is strangely corrupt:
>
> # file /usr/bin/w.procps
> /usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last
> modified: Wed Mar 14 14:11:18 2007, max compression
>
> I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same
> behavior (disclaimer: Debian has about a dozen patches in their kvm
> packaging, but they all seem to be changes to the build/install process or
> security-related). Host running KVM is up-to-date Debian lenny
> (64-bit/amd64) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64
> 2.6.26-12).

There are patches that touch the block layer.  Please try to reproduce 
on vanilla kvm.  I don't trust the debian patches.

Regards,

Anthony Liguori

> john


Re: qcow2 corruption?

2009-01-08 Thread John Morrissey
On Thu, Jan 08, 2009 at 02:10:31PM -0600, Ryan Harper wrote:
> * John Morrissey  [2009-01-08 13:28]:
> > I'm encountering what seems like disk corruption when using qcow2 images,
> > created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> 
> using ide or scsi as your block device?

IDE.

> > A simple test case is to use the Debian installer (I'm using the lenny
> > rc1 images from http://www.debian.org/devel/debian-installer/) to
> > install a new domain. The qcow2 file on disk grows due to the mkfs(8)
> > activity, then the installer faults while trying to mount the root
> > filesystem (Invalid argument). 'fdisk -l' shows that the partition table
> > just created by the installer is gone.
> > 
> > In a few cases, I've managed to finish an installation, but the resulting
> > filesystem is strangely corrupt:
> > 
> > # file /usr/bin/w.procps
> > /usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last 
> > modified: Wed Mar 14 14:11:18 2007, max compression
> 
> If you are using ide and getting corruption, try again but with creating
> a disk image with the raw format: 
> 
> qemu-img create -f raw  
> 
> That should help track down where the corruption is coming from.

raw images are fine. (Sorry, should've mentioned that in the first place.)

john
-- 
John Morrissey  _o/\   __o
j...@horde.net_-< \_  /  \     <  \,
www.horde.net/__(_)/_(_)/\___(_) /_(_)__


Re: qcow2 corruption?

2009-01-08 Thread Ryan Harper
* John Morrissey  [2009-01-08 13:28]:
> I'm encountering what seems like disk corruption when using qcow2 images,
> created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> 

using ide or scsi as your block device?

> A simple test case is to use the Debian installer (I'm using the lenny rc1
> images from http://www.debian.org/devel/debian-installer/) to install a new
> domain. The qcow2 file on disk grows due to the mkfs(8) activity, then the
> installer faults while trying to mount the root filesystem (Invalid
> argument). 'fdisk -l' shows that the partition table just created by the
> installer is gone.
> 
> In a few cases, I've managed to finish an installation, but the resulting
> filesystem is strangely corrupt:
> 
> # file /usr/bin/w.procps
> /usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last 
> modified: Wed Mar 14 14:11:18 2007, max compression

If you are using ide and getting corruption, try again but with creating
a disk image with the raw format: 

qemu-img create -f raw  

That should help track down where the corruption is coming from.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com