Sounds good, I think something needs to be done. It's very scary that users can corrupt their VMs when doing volume snapshots.
-----Original Message-----
From: Ivan Kudryavtsev [mailto:kudryavtsev...@bw-sw.com]
Sent: Sunday, January 27, 2019 7:29 PM
To: users <us...@cloudstack.apache.org>; cloudstack-fan <cloudstack-...@protonmail.com>
Cc: dev <dev@cloudstack.apache.org>
Subject: Re: Snapshots on KVM corrupting disk images

Well, guys, I dived into the CS agent scripts that create volume snapshots and found there is no code for suspend/resume, and no code to call fsfreeze/fsthaw through the qemu guest agent. I don't see any blockers to adding that code, so I'll try to add it in the next few days. If the tests go well, I'll publish the PR, which I suppose could be integrated into 4.11.3.

Mon, 28 Jan 2019, 2:45 cloudstack-fan <cloudstack-...@protonmail.com.invalid>:

> Hello Sean,
>
> It seems that you've encountered the same issue that I've been facing
> during the last 5-6 years of using ACS with KVM hosts (see this
> thread, if you're interested in additional details:
> https://mail-archives.apache.org/mod_mbox/cloudstack-users/201807.mbox/browser
> ).
>
> I'd like to state that creating snapshots of a running virtual machine
> is a bit risky. I've implemented some workarounds in my environment,
> but I'm still not sure that they are 100% effective.
>
> I have a couple of questions, if you don't mind. What kind of storage
> do you use, if it's not a secret? Does your storage use XFS as a filesystem?
> Did you see something like this in your log files?
> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
> Did you see any unusual messages in your log files when the disaster
> happened?
>
> I hope things will be well. Wish you good luck and all the best!
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, 22 January 2019 18:30, Sean Lair <sl...@ippathways.com> wrote:
>
> > Hi all,
> >
> > We had some instances where VM disks became corrupted when using
> > KVM snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
> >
> > The first time was when someone mass-enabled scheduled snapshots on
> > a large number of VMs and secondary storage filled up. We had to
> > restore all of those VM disks, but we believed it was just our fault
> > for letting secondary storage fill up.
> >
> > Today we had an instance where a snapshot failed and now the disk
> > image is corrupted and the VM can't boot.
> > Here is the output of some commands:
> >
> > ----------------------------------------------------------------------
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> >
> > ----------------------------------------------------------------------
> >
> > We tried restoring to before the snapshot failure, but still have strange errors:
> >
> > ----------------------------------------------------------------------
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > file format: qcow2
> > virtual size: 50G (53687091200 bytes)
> > disk size: 73G
> > cluster_size: 65536
> > Snapshot list:
> > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95     3.7G  2018-12-23 11:01:43  3099:35:55.242
> > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd     3.8G  2019-01-06 11:03:16  3431:52:23.942
> > Format specific information:
> >     compat: 1.1
> >     lazy refcounts: false
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> > No errors were found on the image.
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > Snapshot list:
> > ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> > 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95     3.7G  2018-12-23 11:01:43  3099:35:55.242
> > 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd     3.8G  2019-01-06 11:03:16  3431:52:23.942
> >
> > ----------------------------------------------------------------------
> >
> > Everyone is now extremely hesitant to use snapshots in KVM. We tried
> > deleting the snapshots in the restored disk image, but it errors out...
> >
> > Does anyone else have issues with KVM snapshots? We are considering
> > just disabling this functionality now...
> >
> > Thanks
> > Sean
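A minimal sketch of the freeze/snapshot/thaw sequence Ivan describes adding to the agent scripts, assuming the qemu guest agent is installed and running inside the guest. VM_NAME and SNAP_TAG are illustrative placeholders, not CloudStack's actual parameters, and this is not the agent's real code:

    #!/bin/bash
    # Sketch only: quiesce guest filesystems around an internal snapshot.
    # Requires qemu-guest-agent running inside the guest.
    VM_NAME="$1"
    SNAP_TAG="snap-$(date +%Y%m%d-%H%M%S)"   # placeholder tag

    # Flush and freeze guest filesystems via the qemu guest agent.
    virsh domfsfreeze "$VM_NAME" || exit 1

    # Take the snapshot through libvirt while the guest is quiesced.
    # (Pointing qemu-img at an image a live VM holds open is unsafe,
    # which is one suspected cause of the corruption in this thread.)
    virsh snapshot-create-as "$VM_NAME" "$SNAP_TAG"
    rc=$?

    # Always thaw, even if the snapshot failed, or guest I/O stays blocked.
    virsh domfsthaw "$VM_NAME"
    exit $rc

For external disk-only snapshots, libvirt can drive the freeze/thaw pair itself via snapshot-create-as --disk-only --quiesce; for the internal qcow2 snapshots discussed here, an explicit domfsfreeze/domfsthaw wrapper like the above is the relevant approach.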
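For reference, deleting leftover internal snapshots from a qcow2 image is normally done offline with qemu-img, using the tags from the snapshot list above; on a healthy image this succeeds, so the "errors out" Sean reports is likely another sign that the image's snapshot table itself is damaged:

    # With the VM stopped, list and drop the internal snapshots by tag,
    # then re-check the image.
    qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
    qemu-img snapshot -d a8fdf99f-8219-4032-a9c8-87a6e09e7f95 \
        ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
    qemu-img snapshot -d b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd \
        ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
    qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80

If the snapshot table is unreadable, a qemu-img convert -O qcow2 of the image to a new file may still salvage the current disk state, since convert rewrites the metadata and drops internal snapshots.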