Thanks Wei! We really appreciate the response and the link. Shouldn't we be doing something to disable snapshots (scheduled snapshots and other snapshot operations) in CloudStack?

-----Original Message-----
From: Wei ZHOU [mailto:[email protected]]
Sent: Tuesday, January 22, 2019 4:06 PM
To: [email protected]
Subject: Re: Snapshots on KVM corrupting disk images

Hi Sean,

(Recurring) volume snapshots on running VMs should be disabled in CloudStack.

According to some discussions (for example https://bugzilla.redhat.com/show_bug.cgi?id=920020), the image might be corrupted by the concurrent read/write operations during a volume snapshot (performed by qemu-img snapshot).

```
qcow2 images must not be used in read-write mode from two processes at the
same time. You can either have them opened either by one read-write process
or by many read-only processes. Having one (paused) read-write process (the
running VM) and additional read-only processes (copying out a snapshot with
qemu-img) may happen to work in practice, but you're on your own and we
won't give support for such attempts.
```

The safe way to take a volume snapshot of a running VM is:
(1) take a VM snapshot (the VM will be paused)
(2) then create a volume snapshot from the VM snapshot

-Wei

On Tue, Jan 22, 2019 at 5:30 PM, Sean Lair <[email protected]> wrote:

> Hi all,
>
> We have had some instances where VM disks became corrupted when using
> KVM snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
>
> The first time was when someone mass-enabled scheduled snapshots on a
> large number of VMs and secondary storage filled up. We had to
> restore all those VM disks... But we believed it was just our fault for
> letting secondary storage fill up.
>
> Today we had an instance where a snapshot failed and now the disk
> image is corrupted and the VM can't boot. Here is the output of some
> commands:
>
> -----------------------
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -----------------------
>
> We tried restoring to before the snapshot failure, but still have
> strange errors:
>
> ----------------------
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> file format: qcow2
> virtual size: 50G (53687091200 bytes)
> disk size: 73G
> cluster_size: 65536
> Snapshot list:
> ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> Format specific information:
>     compat: 1.1
>     lazy refcounts: false
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3
> 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc
> 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db
> 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> No errors were found on the image.
>
> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> Snapshot list:
> ID  TAG                                   VM SIZE  DATE                 VM CLOCK
> 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95  3.7G     2018-12-23 11:01:43  3099:35:55.242
> 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd  3.8G     2019-01-06 11:03:16  3431:52:23.942
> --------------------------
>
> Everyone is now extremely hesitant to use snapshots in KVM... We
> tried deleting the snapshots in the restored disk image, but it errors out...
>
> Does anyone else have issues with KVM snapshots? We are considering
> just disabling this functionality now...
>
> Thanks
> Sean
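
The qcow2 rule quoted in Wei's reply (one read-write process, or many read-only ones) suggests a cheap pre-check before running any qemu-img command against an image by hand. The following is only a rough sketch of such a check, not anything CloudStack or QEMU ships: the function names `pids_with_file_open` and `image_is_idle` are made up for illustration, and it assumes a Linux host where `/proc/<pid>/fd` is readable (typically as root).

```python
import os


def pids_with_file_open(path):
    """Return the sorted PIDs on this host that hold `path` open, per /proc."""
    target = os.path.realpath(path)
    holders = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.add(int(entry))
                    break  # one matching descriptor is enough for this PID
            except OSError:
                continue  # descriptor was closed while we were scanning
    return sorted(holders)


def image_is_idle(path):
    """True if no process visible to us on this host has the image open."""
    return not pids_with_file_open(path)


if __name__ == "__main__":
    import sys

    for image in sys.argv[1:]:
        holders = pids_with_file_open(image)
        if holders:
            print(f"{image}: open by PIDs {holders}; do not run qemu-img on it")
        else:
            print(f"{image}: no local process has it open")
```

Two caveats on this sketch: it only sees processes on the local host, so with shared primary storage a VM running on another KVM host would not be detected; and processes it cannot inspect are skipped silently, so a non-root run can wrongly report an image as idle.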
