Hello,

We are using NFS storage: native NFS mounts on a NetApp storage system. We
haven't seen those log entries, but we also don't always know when a VM gets
corrupted. By the time we get a call that a VM is having issues, we usually
find that it was corrupted some time earlier.
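
One way to notice corruption sooner might be to run "qemu-img check" against
the disk images on a schedule instead of waiting for a call. Below is a
minimal sketch, not something we run today. It assumes the installed qemu-img
accepts --output=json for "check" and that the image is not in use by a
running VM at the time (checking a copy, or a stopped VM's disk), since
checking an image that is being written to can give misleading results. The
names classify_exit and check_image are made up for this illustration; the
exit codes are the ones listed in the qemu-img man page.

```python
import json
import subprocess

# Exit codes documented for "qemu-img check" in the qemu-img man page.
EXIT_MEANINGS = {
    0: "ok",            # check completed, image is consistent
    1: "check-failed",  # check could not complete (internal error)
    2: "corrupt",       # check completed, image is corrupted
    3: "leaks",         # leaked clusters found, image is not corrupted
    63: "unsupported",  # the image format does not support checks
}


def classify_exit(code):
    """Map a qemu-img check exit code to a short status string."""
    return EXIT_MEANINGS.get(code, "unknown")


def check_image(path, qemu_img="qemu-img"):
    """Run 'qemu-img check' on one image and return a small summary dict.

    Assumption: the image is not being written to while the check runs.
    """
    proc = subprocess.run(
        [qemu_img, "check", "--output=json", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    report = {}
    if proc.stdout.strip():
        try:
            report = json.loads(proc.stdout)
        except ValueError:
            # Output was not valid JSON (for example, an older qemu-img).
            report = {}
    return {
        "path": path,
        "status": classify_exit(proc.returncode),
        "report": report,
        "stderr": proc.stderr.strip(),
    }
```

Calling check_image() on each image path from a cron job and alerting on any
status other than "ok" would at least narrow down when an image went bad.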


-----Original Message-----
From: cloudstack-fan [mailto:cloudstack-...@protonmail.com.INVALID] 
Sent: Sunday, January 27, 2019 1:45 PM
To: us...@cloudstack.apache.org
Cc: dev@cloudstack.apache.org
Subject: Re: Snapshots on KVM corrupting disk images

Hello Sean,

It seems that you've encountered the same issue that I've been facing during 
the last 5-6 years of using ACS with KVM hosts (see this thread, if you're 
interested in additional details: 
https://mail-archives.apache.org/mod_mbox/cloudstack-users/201807.mbox/browser).

I'd like to state that creating snapshots of a running virtual machine is a bit 
risky. I've implemented some workarounds in my environment, but I'm still not 
sure that they are 100% effective.

I have a couple of questions, if you don't mind. What kind of storage do you
use, if it's not a secret? Does your storage use XFS as a filesystem? Did you
see something like this in your log files?

[***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
[***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
[***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)

Did you see any unusual messages in your log files when the disaster happened?

I hope things will be well. I wish you good luck and all the best!


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, 22 January 2019 18:30, Sean Lair <sl...@ippathways.com> wrote:

> Hi all,
>
> We have had some instances where VM disks became corrupted when using KVM
> snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
>
> The first time was when someone mass-enabled scheduled snapshots on a large
> number of VMs and secondary storage filled up. We had to restore all those
> VM disks, but we believed it was our own fault for letting secondary storage
> fill up.
>
> Today we had an instance where a snapshot failed and now the disk image is
> corrupted and the VM can't boot. Here is the output of some commands:
>
> ----------------------------------------------------------------------
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>
> ----------------------------------------------------------------------
>
> We tried restoring to before the snapshot failure, but still have strange 
> errors:
>
> ----------------------------------------------------------------------
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> file format: qcow2
> virtual size: 50G (53687091200 bytes)
> disk size: 73G
> cluster_size: 65536
> Snapshot list:
> ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> Format specific information:
>     compat: 1.1
>     lazy refcounts: false
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> No errors were found on the image.
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> Snapshot list:
> ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
>
> ----------------------------------------------------------------------
>
> Everyone is now extremely hesitant to use snapshots in KVM. We tried
> deleting the snapshots in the restored disk image, but that errors out.
>
> Does anyone else have issues with KVM snapshots? We are considering just 
> disabling this functionality now...
>
> Thanks
> Sean

