Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
On 04/24/2016 10:15 AM, Matt Riedemann wrote:
> To clarify, live snapshots aren't disabled by default because they don't
> work; it's because at least with libvirt 1.2.2 and QEMU 2.0 (which is
> what we test against in the gate), we'd hit a lot of failures (about a 25%
> failure rate in the devstack/tempest jobs) when running live snapshots,
> so we suspect there are concurrency issues when running a live snapshot on
> a compute node alongside other operations (the CI jobs run 4 tests
> concurrently on a single-node devstack).
>
> This might not be an issue on newer libvirt/QEMU; we'll see when we
> start testing with Ubuntu 16.04 nodes.

Correct. I tried to add a clarifying comment to the config option here
(as I realized it wasn't described all that well in the docs):
https://review.openstack.org/#/c/309629/

	-Sean

--
Sean Dague
http://dague.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
> I actually have a card in my trello board to implement live snapshots,
> pointing to this link:
> http://www.sebastien-han.fr/blog/2015/02/09/openstack-perform-consistent-snapshots-with-qemu-guest-agent/
>
> However, I haven't tested it yet. If you test it let me know how it goes.

Hello Antonio,

I tried to add the following to my nova.conf:

[workarounds]
disable_libvirt_livesnapshot=False

However it has no effect, because I am running Kilo and I have an rbd
backend. I found that live snapshots with the rbd backend have only become
possible recently. The following patch is in master (and was cherry-picked
a few days ago into Mitaka):

commit 231832354932e26f0d76af1cf1711e701375672b
Author: Nicolas Simonds
Date:   Mon Mar 7 14:46:32 2016 -0800

    libvirt: Allow use of live snapshots with RBD snapshot/clone

    The recently merged functionality for making use of RBD snapshot/clone
    when available is very valuable for the Ceph/RBD users out there.

    The new method also makes it possible to do live instance snapshots
    with Ceph/RBD. However, the current code explicitly forbids it.

    This patch allows the use of live instance snapshots when an RBD
    snapshot/clone is performed directly, and reverts back to cold
    instance snapshot when the old method is used.

    Co-Authored-By: Nicolas Simonds
    Change-Id: Ic3a3c73898aa868d6c510639ab12d2401dcb5001
    Closes-Bug: #1539179

Antonio, what kind of storage backend are you using?

Saverio
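The behavior the patch above describes can be summarized as a small decision: take a live snapshot when the RBD backend can do a direct snapshot/clone, otherwise fall back to the cold path. A simplified sketch, with invented names purely for illustration (the real logic lives in nova's libvirt driver):

```python
# Hypothetical sketch of the strategy choice described in the commit
# message above. Function and parameter names are invented; nova's
# actual implementation is in the libvirt driver.

def pick_snapshot_strategy(backend_supports_direct_clone: bool,
                           livesnapshot_disabled: bool) -> str:
    """Return 'live' or 'cold' for an ACTIVE instance."""
    if livesnapshot_disabled:
        # [workarounds] disable_libvirt_livesnapshot=True forces cold snapshots.
        return "cold"
    if backend_supports_direct_clone:
        # RBD snapshot/clone path: no managedSave, no RAM dump to local disk.
        return "live"
    # Old path: managedSave + qemu-img convert (cold snapshot).
    return "cold"

print(pick_snapshot_strategy(True, False))   # rbd direct clone available -> live
print(pick_snapshot_strategy(True, True))    # workaround still enabled -> cold
print(pick_snapshot_strategy(False, False))  # no direct clone path -> cold
```

Note how the patch only reverts to the cold snapshot when the old upload method is used, which is why the workaround flag alone had no effect on a Kilo rbd deployment.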
Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
On 4/24/2016 2:09 AM, Saverio Proto wrote:
>> We are in an even worse situation: we have flavors with 256 GB of RAM
>> but only 100 GB on the local hard disk, which means that we cannot
>> snapshot VMs with this flavor.
>>
>> If there is any way to avoid saving the content of the RAM to disk (or
>> maybe there is a way to snapshot the RAM to, e.g., Ceph), we would be
>> very happy.
>
> Hello Antonio,
>
> I received new feedback in the OpenStack patch review
> (https://review.openstack.org/#/c/295865/) pointing me to this:
>
> https://github.com/openstack/nova/blob/82a684fb1ae1dd1bd49e2a8792a2456b4d3ab037/nova/conf/workarounds.py#L72
>
> So it looks like live snapshots are disabled because of an old, buggy
> libvirt. This process of dumping all the RAM to disk is not bad design,
> but a necessary workaround for libvirt not being stable. It makes
> sense now.
>
> At the moment I am running libvirt version 1.2.12-0ubuntu14.4~cloud0.
> Maybe I can disable the workaround and try to do faster snapshots?
>
> Does any other operator have feedback about this?
>
> Thank you
>
> Saverio

To clarify, live snapshots aren't disabled by default because they don't
work; it's because at least with libvirt 1.2.2 and QEMU 2.0 (which is
what we test against in the gate), we'd hit a lot of failures (about a 25%
failure rate in the devstack/tempest jobs) when running live snapshots,
so we suspect there are concurrency issues when running a live snapshot on
a compute node alongside other operations (the CI jobs run 4 tests
concurrently on a single-node devstack).

This might not be an issue on newer libvirt/QEMU; we'll see when we start
testing with Ubuntu 16.04 nodes.

--
Thanks,

Matt Riedemann
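Besides the workaround flag, nova also gates live snapshots on minimum libvirt/QEMU versions, which is why the gate's libvirt 1.2.2 / QEMU 2.0 combination matters. A hedged sketch of that kind of check; the minimum-version constants below are assumptions for illustration, not nova's literal values:

```python
# Sketch of a version gate like the one nova applies before attempting a
# live snapshot. The thresholds are assumed for illustration only; the
# real constants and checks live in nova's libvirt driver.

MIN_LIBVIRT_LIVESNAPSHOT = (1, 3, 0)  # assumed threshold
MIN_QEMU_LIVESNAPSHOT = (1, 3, 0)     # assumed threshold

def parse_version(text: str) -> tuple:
    """Turn '1.2.12' into (1, 2, 12) for tuple comparison."""
    return tuple(int(part) for part in text.split("."))

def live_snapshot_allowed(libvirt_version: str, qemu_version: str,
                          disable_libvirt_livesnapshot: bool = True) -> bool:
    if disable_libvirt_livesnapshot:
        return False  # the workaround forces the cold, RAM-dumping path
    return (parse_version(libvirt_version) >= MIN_LIBVIRT_LIVESNAPSHOT and
            parse_version(qemu_version) >= MIN_QEMU_LIVESNAPSHOT)

# The gate's libvirt 1.2.2 fails the version check even with the
# workaround disabled:
print(live_snapshot_allowed("1.2.2", "2.0.0",
                            disable_libvirt_livesnapshot=False))  # False
```

The point of the double gate is that flipping the config flag alone is not enough on an old hypervisor stack; both conditions have to pass.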
Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
I actually have a card in my trello board to implement live snapshots,
pointing to this link:
http://www.sebastien-han.fr/blog/2015/02/09/openstack-perform-consistent-snapshots-with-qemu-guest-agent/

However, I haven't tested it yet. If you test it, let me know how it goes.

.a.

On Sun, Apr 24, 2016 at 9:09 AM, Saverio Proto wrote:
>> We are in an even worse situation: we have flavors with 256 GB of RAM
>> but only 100 GB on the local hard disk, which means that we cannot
>> snapshot VMs with this flavor.
>>
>> If there is any way to avoid saving the content of the RAM to disk (or
>> maybe there is a way to snapshot the RAM to, e.g., Ceph), we would be
>> very happy.
>
> Hello Antonio,
>
> I received new feedback in the OpenStack patch review
> (https://review.openstack.org/#/c/295865/) pointing me to this:
>
> https://github.com/openstack/nova/blob/82a684fb1ae1dd1bd49e2a8792a2456b4d3ab037/nova/conf/workarounds.py#L72
>
> So it looks like live snapshots are disabled because of an old, buggy
> libvirt. This process of dumping all the RAM to disk is not bad design,
> but a necessary workaround for libvirt not being stable. It makes
> sense now.
>
> At the moment I am running libvirt version 1.2.12-0ubuntu14.4~cloud0.
> Maybe I can disable the workaround and try to do faster snapshots?
>
> Does any other operator have feedback about this?
>
> Thank you
>
> Saverio

--
antonio.s.mess...@gmail.com
antonio.mess...@uzh.ch
+41 (0)44 635 42 22

S3IT: Service and Support for Science IT
http://www.s3it.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich, Switzerland
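The blog post linked above takes consistent snapshots by freezing guest filesystems through the QEMU guest agent before the snapshot and thawing them afterwards. A minimal sketch of that freeze/snapshot/thaw pattern as a context manager; `domain` here is anything exposing `fsFreeze()`/`fsThaw()` (e.g. a libvirt `virDomain` when the guest agent is installed in the VM — using it against real libvirt is untested here):

```python
# Sketch of the freeze -> snapshot -> thaw pattern from the blog post
# above. FakeDomain is a stand-in for demonstration; with python-libvirt
# you would pass a virDomain whose guest runs the qemu-guest-agent.

from contextlib import contextmanager

@contextmanager
def quiesced(domain):
    """Freeze guest filesystems for the duration of the block."""
    domain.fsFreeze()
    try:
        yield domain
    finally:
        # Always thaw, even if the snapshot step fails.
        domain.fsThaw()

class FakeDomain:
    def __init__(self):
        self.events = []
    def fsFreeze(self):
        self.events.append("freeze")
    def fsThaw(self):
        self.events.append("thaw")

dom = FakeDomain()
with quiesced(dom):
    dom.events.append("snapshot")
print(dom.events)  # ['freeze', 'snapshot', 'thaw']
```

The `finally` clause matters: a guest left frozen after a failed snapshot will stall all writes until something thaws it.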
Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
> We are in an even worse situation: we have flavors with 256 GB of RAM
> but only 100 GB on the local hard disk, which means that we cannot
> snapshot VMs with this flavor.
>
> If there is any way to avoid saving the content of the RAM to disk (or
> maybe there is a way to snapshot the RAM to, e.g., Ceph), we would be
> very happy.

Hello Antonio,

I received new feedback in the OpenStack patch review
(https://review.openstack.org/#/c/295865/) pointing me to this:

https://github.com/openstack/nova/blob/82a684fb1ae1dd1bd49e2a8792a2456b4d3ab037/nova/conf/workarounds.py#L72

So it looks like live snapshots are disabled because of an old, buggy
libvirt. This process of dumping all the RAM to disk is not bad design,
but a necessary workaround for libvirt not being stable. It makes sense
now.

At the moment I am running libvirt version 1.2.12-0ubuntu14.4~cloud0.
Maybe I can disable the workaround and try to do faster snapshots?

Does any other operator have feedback about this?

Thank you

Saverio
Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
We are in an even worse situation: we have flavors with 256 GB of RAM but
only 100 GB on the local hard disk, which means that we cannot snapshot
VMs with this flavor.

If there is any way to avoid saving the content of the RAM to disk (or
maybe there is a way to snapshot the RAM to, e.g., Ceph), we would be
very happy.

.a.

On Sat, Apr 23, 2016 at 12:31 AM, Saverio Proto wrote:
> Hello Operators,
>
> One of the users of our cluster opened a ticket about a snapshot corner
> case: it is not possible to snapshot an instance that is booted from
> volume while the instance is paused. So I wrote this patch, and from
> the discussion you can see that I learnt a lot about snapshots:
> https://review.openstack.org/#/c/295865/
>
> While discussing the patch I found something that seemed totally
> strange, so I want to check with the community whether this is the
> expected behavior.
>
> Scenario:
> Openstack Kilo
> libvirt
> rbd storage for the images
> instance booted from image
>
> The developers pointed out that when I snapshot an active instance,
> nova performs a "managedSave":
> https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainManagedSave
>
> I thought there was a misunderstanding, because I did not see the point
> of dumping the whole content of the RAM to disk.
>
> I was surprised to find, on the hypervisor where the instance is
> scheduled, two temporary files created during the snapshotting process.
>
> As soon as you click "snapshot" you will see this file:
>
> /var/lib/libvirt/qemu/save/instance-0001cf8c.save
>
> This file will have the size of the RAM of the instance. In my case I
> had to wait for 32 GB of RAM to be written to disk.
>
> Once that is finished, this second process starts:
>
> qemu-img convert -O raw
> rbd:volumes/ee3a84c3-b870-4669-8847-6b9ac93a8eac_disk:id=cinder:conf=/etc/ceph/ceph.conf
> /var/lib/nova/instances/snapshots/
>
> OK, this convert is also slow, but that is already fixed in Mitaka:
> Problem description -
> http://www.sebastien-han.fr/blog/2015/10/05/openstack-nova-snapshots-on-ceph-rbd/
> Patches that should solve the problem:
> https://review.openstack.org/#/c/205282/
> https://review.openstack.org/#/c/188244/
> Merged for Mitaka -
> https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots
>
> As a result you get a file with a name that looks like a UUID in this
> other folder:
>
> ls /var/lib/nova/instances/snapshots/tmpWsKqvl/
> 51574e9140204c0f89c7d86fcf741579
>
> So this means that when we take a snapshot of an active instance, we
> dump all the RAM into a temp file.
>
> This has an impact for us because we have flavors with 32 GB of RAM.
> Because our instances are completely rbd-backed, we have small disks on
> the compute nodes. Also, it takes time to dump 32 GB of RAM to disk for
> nothing!
>
> So, is calling managedSave the intended behavior? Or should nova just
> make a call to libvirt to make sure that filesystem caches are written
> to disk before snapshotting?
>
> I tracked this call in git, and it looks like nova has been implemented
> this way since 2012.
>
> Please, operators, tell me that I configured something wrong and this
> is not really how snapshots are implemented :) Or explain why the dump
> of all the RAM is needed :)
>
> Any feedback is appreciated!
>
> Saverio

--
antonio.s.mess...@gmail.com
antonio.mess...@uzh.ch
+41 (0)44 635 42 22

S3IT: Service and Support for Science IT
http://www.s3it.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich, Switzerland
[Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?
Hello Operators,

One of the users of our cluster opened a ticket about a snapshot corner
case: it is not possible to snapshot an instance that is booted from
volume while the instance is paused. So I wrote this patch, and from the
discussion you can see that I learnt a lot about snapshots:
https://review.openstack.org/#/c/295865/

While discussing the patch I found something that seemed totally strange,
so I want to check with the community whether this is the expected
behavior.

Scenario:
Openstack Kilo
libvirt
rbd storage for the images
instance booted from image

The developers pointed out that when I snapshot an active instance, nova
performs a "managedSave":
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainManagedSave

I thought there was a misunderstanding, because I did not see the point
of dumping the whole content of the RAM to disk.

I was surprised to find, on the hypervisor where the instance is
scheduled, two temporary files created during the snapshotting process.

As soon as you click "snapshot" you will see this file:

/var/lib/libvirt/qemu/save/instance-0001cf8c.save

This file will have the size of the RAM of the instance. In my case I had
to wait for 32 GB of RAM to be written to disk.

Once that is finished, this second process starts:

qemu-img convert -O raw
rbd:volumes/ee3a84c3-b870-4669-8847-6b9ac93a8eac_disk:id=cinder:conf=/etc/ceph/ceph.conf
/var/lib/nova/instances/snapshots/

OK, this convert is also slow, but that is already fixed in Mitaka:
Problem description -
http://www.sebastien-han.fr/blog/2015/10/05/openstack-nova-snapshots-on-ceph-rbd/
Patches that should solve the problem:
https://review.openstack.org/#/c/205282/
https://review.openstack.org/#/c/188244/
Merged for Mitaka -
https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots

As a result you get a file with a name that looks like a UUID in this
other folder:

ls /var/lib/nova/instances/snapshots/tmpWsKqvl/
51574e9140204c0f89c7d86fcf741579

So this means that when we take a snapshot of an active instance, we dump
all the RAM into a temp file.

This has an impact for us because we have flavors with 32 GB of RAM.
Because our instances are completely rbd-backed, we have small disks on
the compute nodes. Also, it takes time to dump 32 GB of RAM to disk for
nothing!

So, is calling managedSave the intended behavior? Or should nova just
make a call to libvirt to make sure that filesystem caches are written to
disk before snapshotting?

I tracked this call in git, and it looks like nova has been implemented
this way since 2012.

Please, operators, tell me that I configured something wrong and this is
not really how snapshots are implemented :) Or explain why the dump of
all the RAM is needed :)

Any feedback is appreciated!

Saverio
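The managedSave step described above writes a state file roughly the size of the guest's RAM under /var/lib/libvirt/qemu/save/. A trivial sketch of the pre-flight arithmetic an operator might do before snapshotting; the helper name and interface are invented for illustration (in practice the free-space figure would come from something like `shutil.disk_usage` on the save directory):

```python
# Hypothetical pre-flight check: will the RAM state file produced by
# managedSave fit on the hypervisor's local disk?

GiB = 1024 ** 3

def managed_save_fits(ram_bytes: int, free_disk_bytes: int) -> bool:
    """True if the RAM-sized save file can fit in the free space given."""
    return ram_bytes <= free_disk_bytes

# A 32 GB flavor fits on a 100 GB local disk (though it is slow to write);
# a 256 GB flavor, as described elsewhere in this thread, can never fit:
print(managed_save_fits(32 * GiB, 100 * GiB))   # True
print(managed_save_fits(256 * GiB, 100 * GiB))  # False
```

This is why rbd-backed deployments with small compute-node disks and large-RAM flavors are hit hardest by the cold snapshot path.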