[Openstack-operators] [neutron] IPv6 Status
Hi,

is there a central place where I can find a matrix (or something similar) that shows what is currently supposed to work in the sense of IPv6 networking? I also had a look at a couple of blueprints out there, but I'm looking for a simple overview of what's supported, which features people are working on, and what's planned for the future.

I mean all the good stuff for tenant networks, like:
- SNAT
- FloatingIP
- External provider networks
- DVR
- fwaas, vpnaas, ...

and also about the host network, e.g. vxlan/gre tunneling via an IPv6 host network...

--
Andreas (irc: scheuran)

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] How to handle updates of public images?
We're using image members to share images instead of public images, because that way we can share different images under the same name with different projects. When updating an image, we move the members of the existing image over to the new one, and then delete the old image once all VMs using it have been deleted on the hypervisors. This operation doesn't grow the public image list.

Masa

On 2015/02/06 6:42, Belmiro Moreira wrote:

We don't delete public images from Glance because it breaks migrate/resize and block live migration. Not tested with upstream Kilo, though. As a consequence, our public image list has been growing over time... In order to manage image releases we use glance image properties to tag them. Some relevant reviews:
https://review.openstack.org/#/c/150337/
https://review.openstack.org/#/c/90321/

Belmiro
CERN

On Thu, Feb 5, 2015 at 8:16 PM, Kris G. Lindgren klindg...@godaddy.com wrote:

In the case of a raw-backed qcow2 image (pretty sure that's the default), the instance's root disk as seen inside the VM is made up of the changes made on the instance disk (qcow2 layer) plus the base image (raw). Also, remember that as currently coded, a resize will almost always be a migrate. However, since the VM is successfully running on the old compute node, it *should* be a trivial change that if the backing image is no longer available via glance, it is copied over to the new host as well.

Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.

On 2/5/15, 11:55 AM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:

Hello everyone. We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'.
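The member-replacement workflow described above can be sketched as follows — a minimal simulation with plain dicts standing in for Glance images; `replace_members()` and `can_delete()` are illustrative helpers, not Glance API calls:

```python
# Hypothetical sketch of the member-replacement workflow: move the shared
# projects from the old image to its replacement, then delete the old image
# once no VM on any hypervisor still uses it.

def replace_members(old_image, new_image):
    """Move all members (shared projects) from the old image to the new one."""
    new_image["members"] = list(old_image["members"])
    old_image["members"] = []
    return new_image

def can_delete(old_image, running_vms):
    """The old image may be deleted once no running VM still references it."""
    return all(vm["image_id"] != old_image["id"] for vm in running_vms)

old = {"id": "img-v1", "name": "Ubuntu 14.04", "members": ["proj-a", "proj-b"]}
new = {"id": "img-v2", "name": "Ubuntu 14.04", "members": []}

replace_members(old, new)
vms = [{"id": "vm-1", "image_id": "img-v2"}]
print(new["members"])        # ['proj-a', 'proj-b']
print(can_delete(old, vms))  # True
```

The point of the scheme is visible in the last line: once every VM references the new image ID, the old image can go away without the public image list ever growing.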
That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

Some thoughts:

* All of the operations described should be operating on an image ID. So the other suggestions of renaming seem the right way to go. Ubuntu 14.04 becomes Ubuntu 14.04 02052015 and the ID remains in the system for a while. If something inside Nova doesn't work with IDs, it seems like a bug.

* rebuild, revert, rescue, and resize are all very _not_ cloud things that increase the complexity of Nova. Perhaps we should all reconsider their usefulness and encourage our users to spin up new resources, use volumes and/or backup/restore methods, and then tear down old instances. One way to encourage them is to make it clear that these operations will only work for X amount of time before old image versions are removed. So if you spin up Ubuntu 14.04 today, reverts, resizes and rescues are only guaranteed to work for 6 months. Then aggressively clean up 6-month-old image IDs. To make this practical, you might even require a role, something like reverter, rescuer, resizer, and only allow those roles to do these operations; then, before purging images, notify the users in those roles about instances they won't be able to resize/rescue/revert anymore.

It also makes no sense to me why migrating an instance requires its original image. The instance root disk is all that should matter.
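The retention policy proposed above could be sketched like this — the 6-month window and the function names are illustrative assumptions, not anything Nova or Glance provides:

```python
# A minimal sketch of a retention policy: given an image's upload date,
# decide whether revert/rescue/resize are still guaranteed, and whether
# the image ID is due for aggressive cleanup.
from datetime import datetime, timedelta

RETENTION = timedelta(days=182)  # roughly six months, an assumed value

def operations_guaranteed(image_created_at: datetime, now: datetime) -> bool:
    """Revert/rescue/resize are only guaranteed within the retention window."""
    return now - image_created_at <= RETENTION

def due_for_purge(image_created_at: datetime, now: datetime) -> bool:
    """Once the window has passed, the old image ID may be removed."""
    return not operations_guaranteed(image_created_at, now)

now = datetime(2015, 2, 5)
fresh = datetime(2015, 1, 1)
stale = datetime(2014, 6, 1)
print(operations_guaranteed(fresh, now))  # True
print(due_for_purge(stale, now))          # True
```

In practice the purge step would also need the notification pass Clint mentions (warning users holding the reverter/rescuer/resizer roles before their instances lose those operations).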
--
室井 雅仁 (Masahito MUROI)
Software Innovation Center, NTT
Tel: +81-422-59-4539, FAX: +81-422-59-2699
Re: [Openstack-operators] How to handle updates of public images?
I always recommend the following:

* All public images are named generically enough that they can be replaced with a new version of the same name. This helps new instances booting.
* The prior image is renamed with -OLD-$date. This lets users know that their image has been deprecated.
* That image is made private so no new instances can be launched from it.
* All images include an updated motd that indicates available security updates.

We're discussing baking the images with automatic updates, but still haven't reached an agreement.

On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

-----Original Message-----
From: George Shuklin [mailto:george.shuk...@gmail.com]
Sent: 05 February 2015 14:10
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] How to handle updates of public images?

Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, nova can't download _base.

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

We have a similar problem.
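The rename-and-deprecate convention above can be sketched as a small helper — `deprecate()` and `deprecated_name()` are hypothetical names for illustration, not Glance calls:

```python
# Sketch of the deprecation convention: the current image keeps a generic
# name, the outdated one gets an -OLD-$date suffix and is made private so
# no new instances can boot from it.
from datetime import date

def deprecated_name(name: str, on: date) -> str:
    """'Ubuntu 14.04' -> 'Ubuntu 14.04-OLD-20150205'."""
    return f"{name}-OLD-{on.strftime('%Y%m%d')}"

def deprecate(image: dict, on: date) -> dict:
    image["name"] = deprecated_name(image["name"], on)
    image["is_public"] = False  # blocks new launches from the old image
    return image

img = {"name": "Ubuntu 14.04", "is_public": True}
deprecate(img, date(2015, 2, 5))
print(img["name"])  # Ubuntu 14.04-OLD-20150205
```

The new replacement image is then uploaded under the original generic name, so anyone booting by name transparently gets the updated version.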
For the Horizon-based end users, we've defined a panel using image metadata. Details are at http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html . For the CLI users, we propose to use the sort options from Glance to find the latest image of a particular OS.

It would be good if there was a way of marking an image as hidden, so that it could still be used for snapshots/migration but would not be shown in image list operations.

Thanks!
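The CLI approach Tim describes — sort by creation time and take the newest image of a given OS — comes down to logic like the following; `latest_image()` is an illustrative helper mimicking what a created_at-descending sort would return, not part of any client library:

```python
# Pick the newest image matching an OS name, the way a Glance listing
# sorted on created_at (descending) would surface it.

def latest_image(images: list, os_name: str):
    candidates = [i for i in images if i["name"] == os_name]
    if not candidates:
        return None
    # ISO-8601 timestamps sort lexicographically in chronological order
    return max(candidates, key=lambda i: i["created_at"])

images = [
    {"name": "Ubuntu 14.04", "id": "a", "created_at": "2015-01-01T00:00:00"},
    {"name": "Ubuntu 14.04", "id": "b", "created_at": "2015-02-01T00:00:00"},
    {"name": "CentOS 7", "id": "c", "created_at": "2015-01-15T00:00:00"},
]
print(latest_image(images, "Ubuntu 14.04")["id"])  # b
```

This only works, of course, if all releases of an OS share one generic name — which is the naming convention argued for elsewhere in this thread.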
[Openstack-operators] How to handle updates of public images?
Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, it causes problems with nova (nova can't download _base).

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

Thanks!
Re: [Openstack-operators] [neutron] IPv6 Status
Hi Andreas,

What about https://wiki.openstack.org/wiki/Neutron/IPv6 ? Or are you looking for a Working Group specialized in this area? In that case, maybe the NFV / Telco Working Group can better drive your questions (they are very concerned about IPv6 too): https://wiki.openstack.org/wiki/TelcoWorkingGroup#Technical_Team_Meetings . They meet every Wednesday.

Regards

On 2015-02-05 3:14 AM, Andreas Scheuring wrote:

Hi, is there a central place where I can find a matrix (or something similar) that shows what is currently supposed to work in the sense of IPv6 networking? I also had a look at a couple of blueprints out there, but I'm looking for a simple overview of what's supported, which features people are working on, and what's planned for the future. I mean all the good stuff for tenant networks like SNAT, FloatingIP, external provider networks, DVR, fwaas, vpnaas, ... and also about the host network, e.g. vxlan/gre tunneling via an IPv6 host network...

--
Marcos Garcia
Technical Sales Engineer - eNovance (a Red Hat company); RHCE, RHCVA, ITIL
PHONE: (514) 907-0068
EMAIL: mgarc...@redhat.com - SKYPE: enovance-marcos.garcia
ADDRESS: 127 St-Pierre, Montréal (QC) H2Y 2L6, Canada
WEB: www.enovance.com
Re: [Openstack-operators] How to handle updates of public images?
On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:

I always recommend the following: All public images are named generically enough that they can be replaced with a new version of the same name. This helps new instances booting. The prior image is renamed with -OLD-$date. This lets users know that their image has been deprecated. This image is made private so no new instances can be launched. All images include an updated motd that indicates available security updates.

I like this approach, but I have the following caveat: what if users are using the UUID of the image instead of the name in some automation scripts that they might have? If we make the -OLD-$date images private, then we just broke their scripts.

We're discussing baking the images with automatic updates, but still haven't reached an agreement.

On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

-----Original Message-----
From: George Shuklin [mailto:george.shuk...@gmail.com]
Sent: 05 February 2015 14:10
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] How to handle updates of public images?

Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, nova can't download _base.

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one.
This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

We have a similar problem.

For the Horizon-based end users, we've defined a panel using image metadata. Details are at http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html . For the CLI users, we propose to use the sort options from Glance to find the latest image of a particular OS. It would be good if there was a way of marking an image as hidden, so that it could still be used for snapshots/migration but would not be shown in image list operations.

Thanks!
Re: [Openstack-operators] How to handle updates of public images?
Updated report on the 'no image' (deleted '_base') behaviour in Juno (my previous comment was about Havana):

1. If a snapshot is removed, the original image is used (the image from which the first instance, and later the snapshot, was produced). Rather strange and unexpected, but nice (minus one headache).
2. If all images in the chain are removed, the behaviour changes:
* hard reboot works fine (raw disks)
* reinstallation asks for a new image, seems to be no problem
* rescue causes an ugly problem, rendering the instance completely broken (it does not work, but shows no ERROR state): https://bugs.launchpad.net/nova/+bug/1418590

I haven't tested migrations yet.

On 02/05/2015 03:09 PM, George Shuklin wrote:

Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, it causes problems with nova (nova can't download _base).

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

Thanks!
Re: [Openstack-operators] How to handle updates of public images?
In the case of a raw-backed qcow2 image (pretty sure that's the default), the instance's root disk as seen inside the VM is made up of the changes made on the instance disk (qcow2 layer) plus the base image (raw). Also, remember that as currently coded, a resize will almost always be a migrate. However, since the VM is successfully running on the old compute node, it *should* be a trivial change that if the backing image is no longer available via glance, it is copied over to the new host as well.

Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.

On 2/5/15, 11:55 AM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:

Hello everyone. We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

Some thoughts:

* All of the operations described should be operating on an image ID. So the other suggestions of renaming seem the right way to go. Ubuntu 14.04 becomes Ubuntu 14.04 02052015 and the ID remains in the system for a while. If something inside Nova doesn't work with IDs, it seems like a bug.

* rebuild, revert, rescue, and resize are all very _not_ cloud things that increase the complexity of Nova. Perhaps we should all reconsider their usefulness and encourage our users to spin up new resources, use volumes and/or backup/restore methods, and then tear down old instances. One way to encourage them is to make it clear that these operations will only work for X amount of time before old image versions are removed. So if you spin up Ubuntu 14.04 today, reverts, resizes and rescues are only guaranteed to work for 6 months.
Then aggressively clean up 6-month-old image IDs. To make this practical, you might even require a role, something like reverter, rescuer, resizer, and only allow those roles to do these operations; then, before purging images, notify the users in those roles about instances they won't be able to resize/rescue/revert anymore.

It also makes no sense to me why migrating an instance requires its original image. The instance root disk is all that should matter.
Re: [Openstack-operators] How to handle updates of public images?
We do exactly this. Public images are named very generically, like Ubuntu 14.04. Not even 14.04.1 or something like that. Old images are renamed and made private. Existing instances continue to run, but, as others have mentioned, if a user is using a UUID to launch instances, that will break for them. This is an acceptable trade-off for us. Our documentation mentions this and tells users to use the names. The OpenStack CLI tools as well as Vagrant (the two most used non-Dashboard tools) both support image names, so we haven't run into a UUID-only issue.

We have a modified MOTD that lists some different scripts that the user can run, such as:
* Using our local apt-cache server (Ubuntu only)
* Enabling automatic updates
* Installing the openstack command-line tools

We had a few debates about turning on automatic updates in the images we provide. Ultimately we chose not to enable them and instead go with the MOTD message. There are several reasons why having automatic updates enabled is a benefit, but the single reason that made us not do it is simply that if an automatic update breaks the user's instance, it's our fault. It's a very debatable argument.

Also, we use Packer to bundle all of this. We have most of it available here: https://github.com/cybera/openstack-images

In addition to all of this, we allow users to upload their own images. So if the core set of images we provide doesn't meet their needs, they're free to create their own solution.

On Thu, Feb 5, 2015 at 7:02 AM, Abel Lopez alopg...@gmail.com wrote:

I always recommend the following: All public images are named generically enough that they can be replaced with a new version of the same name. This helps new instances booting. The prior image is renamed with -OLD-$date. This lets users know that their image has been deprecated. This image is made private so no new instances can be launched. All images include an updated motd that indicates available security updates.
We're discussing baking the images with automatic updates, but still haven't reached an agreement.

On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

-----Original Message-----
From: George Shuklin [mailto:george.shuk...@gmail.com]
Sent: 05 February 2015 14:10
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] How to handle updates of public images?

Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, nova can't download _base.

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

We have a similar problem.

For the Horizon-based end users, we've defined a panel using image metadata. Details are at http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html . For the CLI users, we propose to use the sort options from Glance to find the latest image of a particular OS. It would be good if there was a way of marking an image as hidden, so that it could still be used for snapshots/migration but would not be shown in image list operations.

Thanks!
Re: [Openstack-operators] How to handle updates of public images?
That is a very real concern. This stems from images being named very uniquely, with versions, dates, etc. To the end user this is ALMOST as hard as a UUID. Easy/generic names encourage users to use them, but there is an aspect of documentation and user training/education on the proper use of name-based automation.

On Thursday, February 5, 2015, Marc Heckmann marc.heckm...@ubisoft.com wrote:

On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:

I always recommend the following: All public images are named generically enough that they can be replaced with a new version of the same name. This helps new instances booting. The prior image is renamed with -OLD-$date. This lets users know that their image has been deprecated. This image is made private so no new instances can be launched. All images include an updated motd that indicates available security updates.

I like this approach, but I have the following caveat: what if users are using the UUID of the image instead of the name in some automation scripts that they might have? If we make the -OLD-$date images private, then we just broke their scripts.

We're discussing baking the images with automatic updates, but still haven't reached an agreement.

On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

-----Original Message-----
From: George Shuklin [mailto:george.shuk...@gmail.com]
Sent: 05 February 2015 14:10
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] How to handle updates of public images?

Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'.
That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, nova can't download _base.

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

We have a similar problem.

For the Horizon-based end users, we've defined a panel using image metadata. Details are at http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html . For the CLI users, we propose to use the sort options from Glance to find the latest image of a particular OS. It would be good if there was a way of marking an image as hidden, so that it could still be used for snapshots/migration but would not be shown in image list operations.

Thanks!
Re: [Openstack-operators] How to handle updates of public images?
On Thu, 2015-02-05 at 06:39 -0800, Abel Lopez wrote:

That is a very real concern. This stems from images being named very uniquely, with versions, dates, etc. To the end user this is ALMOST as hard as a UUID. Easy/generic names encourage users to use them, but there is an aspect of documentation and user training/education on the proper use of name-based automation.

Yup, I agree with that. The best approach is probably a tweaked version of what you proposed: use a generic name for the latest image and rename the outdated ones to something like "image name-OLD-$date", but don't make them private. The documentation that you provide to your end users should clearly tell them to use image names rather than UUIDs, and should discourage them from using OLD images. For those that don't read the doc, the naming alone will discourage bad practices. Some sort of automated motd with a big fat warning if the image is older than a certain date would help as well.

On Thursday, February 5, 2015, Marc Heckmann marc.heckm...@ubisoft.com wrote:

On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:

I always recommend the following: All public images are named generically enough that they can be replaced with a new version of the same name. This helps new instances booting. The prior image is renamed with -OLD-$date. This lets users know that their image has been deprecated. This image is made private so no new instances can be launched. All images include an updated motd that indicates available security updates.

I like this approach, but I have the following caveat: what if users are using the UUID of the image instead of the name in some automation scripts that they might have? If we make the -OLD-$date images private, then we just broke their scripts.

We're discussing baking the images with automatic updates, but still haven't reached an agreement.
On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

-----Original Message-----
From: George Shuklin [mailto:george.shuk...@gmail.com]
Sent: 05 February 2015 14:10
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] How to handle updates of public images?

Hello everyone.

We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

One more (specific to us): we're using raw disks with _base on slow SATA drives (compared to fast SSDs for the disks), and if that SATA drive fails, we replace it (and nova re-downloads everything in _base). If the image has been deleted, nova can't download _base.

The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we're forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part.

How do you handle public image updates in your case?

We have a similar problem.

For the Horizon-based end users, we've defined a panel using image metadata. Details are at http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html . For the CLI users, we propose to use the sort options from Glance to find the latest image of a particular OS. It would
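The automated motd age warning Marc suggests earlier in this message could look something like the sketch below; the 90-day threshold and the `motd_warning()` helper are illustrative assumptions, not an existing tool:

```python
# If the image an instance was built from is older than some threshold,
# emit a "big fat warning" into the message of the day.
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # assumed threshold

def motd_warning(image_name: str, image_built: date, today: date) -> str:
    """Return a warning string for outdated images, or '' if still fresh."""
    age = today - image_built
    if age <= MAX_AGE:
        return ""
    return (f"WARNING: this instance was built from '{image_name}', "
            f"which is {age.days} days old. Please rebuild from the "
            f"current image and install security updates.")

msg = motd_warning("Ubuntu 14.04-OLD-20141101", date(2014, 11, 1), date(2015, 2, 5))
print(bool(msg))  # True
```

A cloud-init or first-boot script could drop the returned string into /etc/motd, so users of stale OLD images see the nudge at every login.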
Re: [Openstack-operators] How to handle updates of public images?
I'm curious: are you using _base files? We're not, and we're able to block-migrate instances based on deleted images, or on images that were public but are now private.

On Thu, Feb 5, 2015 at 2:42 PM, Belmiro Moreira moreira.belmiro.email.li...@gmail.com wrote:

We don't delete public images from Glance because it breaks migrate/resize and block live migration. Not tested with upstream Kilo, though. As a consequence, our public image list has been growing over time... In order to manage image releases we use glance image properties to tag them. Some relevant reviews:
https://review.openstack.org/#/c/150337/
https://review.openstack.org/#/c/90321/

Belmiro
CERN

On Thu, Feb 5, 2015 at 8:16 PM, Kris G. Lindgren klindg...@godaddy.com wrote:

In the case of a raw-backed qcow2 image (pretty sure that's the default), the instance's root disk as seen inside the VM is made up of the changes made on the instance disk (qcow2 layer) plus the base image (raw). Also, remember that as currently coded, a resize will almost always be a migrate. However, since the VM is successfully running on the old compute node, it *should* be a trivial change that if the backing image is no longer available via glance, it is copied over to the new host as well.

Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.

On 2/5/15, 11:55 AM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:

Hello everyone. We are updating our public images regularly (to keep them up to date for customers). But there is a problem: once an instance starts from an image, that image becomes 'used'. That means:

* The image is used as _base for nova
* If an instance is reverted, the image is used to recreate the instance's disk
* If an instance is rescued, the image is used as the rescue base
* It is re-downloaded during resize/migration (on a new compute node)

Some thoughts:

* All of the operations described should be operating on an image ID.
So the other suggestions of renaming seem the right way to go. Ubuntu 14.04 becomes Ubuntu 14.04 02052015 and the ID remains in the system for a while. If something inside Nova doesn't work with IDs, that seems like a bug.
* rebuild, revert, rescue, and resize are all very _not_ cloud things that increase the complexity of Nova. Perhaps we should all reconsider their usefulness and encourage our users to spin up new resources, use volumes and/or backup/restore methods, and then tear down old instances. One way to encourage them is to make it clear that these operations will only work for X amount of time before old versions of images are removed. So if you spin up Ubuntu 14.04 today, reverts and resizes and rescues are only guaranteed to work for 6 months. Then aggressively clean up 6-month-old image IDs. To make this practical, you might even require a role, something like reverter, rescuer, or resizer, only allow those roles to do these operations, and then, before purging images, notify the users in those roles of instances they won't be able to resize/rescue/revert anymore.
It also makes no sense to me why migrating an instance requires its original image. The instance root disk is all that should matter.
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
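The retention policy suggested above (guarantee revert/resize/rescue only for a fixed window, then purge old image IDs) can be sketched in a few lines. This is a hypothetical helper, not anything that exists in Nova or Glance; in a real deployment the `(image_id, created_at)` pairs would come from paging the Glance API:

```python
from datetime import datetime, timedelta

# "6 months" retention window from the thread; an assumption, tune as needed.
RETENTION = timedelta(days=180)

def purgeable_images(images, now=None):
    """Return IDs of images that have aged out of the guarantee window.

    `images` is an iterable of (image_id, created_at) pairs.
    """
    now = now or datetime.utcnow()
    return [image_id for image_id, created_at in images
            if now - created_at > RETENTION]

# Using the dated-name convention from the thread (Ubuntu 14.04 02052015):
images = [
    ("ubuntu-14.04-02052015", datetime(2015, 2, 5)),
    ("ubuntu-14.04-08012014", datetime(2014, 8, 1)),
]
print(purgeable_images(images, now=datetime(2015, 2, 6)))
# -> ['ubuntu-14.04-08012014']
```

Before actually deleting, an operator script would cross-check the returned IDs against instances still referencing them and notify the affected users, as suggested above.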
[Openstack-operators] Fwd: [Openstack] OpenStack L naming poll
I wanted to make sure everyone on these lists saw that the "L" release naming poll is live. The four options are Lizard, Love, Liberty and London, and the deadline to vote is Tuesday at 19:59 UTC. It only takes a second to cast your vote: https://www.surveymonkey.com/r/openstack-l-naming Cheers, Lauren Begin forwarded message: Date: February 4, 2015 at 11:10:51 PM GMT+9 From: Thierry Carrez thie...@openstack.org To: openst...@lists.openstack.org Openstack openst...@lists.openstack.org Subject: [Openstack] OpenStack L naming poll Hi everyone, As you may know, OpenStack development cycles and releases are named after cities or landmarks near where the corresponding design summit will happen. We'd like your help again in selecting the right name for the development cycle and release coming after Kilo. Our next summit will happen in Vancouver, BC (Canada) in May. L candidate names were proposed, selected and checked for various issues... leaving 4 candidates on the final public poll. Please take a moment to participate in our poll: https://www.surveymonkey.com/r/openstack-l-naming and order the 4 candidates in your personal order of preference! You can find a quick rationale behind each name at: https://wiki.openstack.org/wiki/Release_Naming The poll closes Tuesday, February 10th at 19:59 UTC (just before the TC IRC meeting where the results will be proclaimed). Thanks! -- Thierry Carrez (ttx) ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openst...@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: [Openstack-operators] Way to check compute - rabbitmq connectivity
Is Mirantis going to have someone at the ops mid-cycle? We were talking about this in the operators channel today and it seemed like pretty much everyone who was active has problems with rabbitmq: either from maintenance, failovers, or transient issues, and having to restart the world after rabbitmq hiccups to get things to recover. I am thinking, if the issue is this prevalent, it would be nice for people who have either figured it out or have something that is working to discuss their setup. We noticed that Mirantis has a number of patches to oslo.messaging to fix rabbitmq-specific stuff. So I was hoping that someone could come and talk about what Mirantis has done there to make it better, whether it's there yet, and if not what still needs to be done. We use clustered rabbitmq + LB, and honestly this config on paper is better but in practice it's nothing short of a nightmare. Any maintenance done on rabbitmq (restarts, patching, etc.) or the load balancer seems to cause clients to not notice that they are no longer correctly connected to the rabbitmq server, and they will sit happily, doing nothing, until they are restarted. We had similar problems listing all of the rabbitmq servers out in the configuration as well. So far my experience has been that any maintenance that touches rabbitmq is going to require a restart of all services that communicate over RPC to avoid hard-to-troubleshoot (i.e., silent) RPC issues. In my experience rabbitmq is pretty much the #1 cause of issues in our environment, and I think other operators would agree with that as well. Anything that would make rabbit + openstack more stable would be very welcome. Kris Lindgren Senior Linux Systems Engineer GoDaddy, LLC. On 1/20/15, 8:33 AM, Andrew Woodward xar...@gmail.com wrote: So this is exactly what we (@mirantis) ran into while working on the HA story in Fuel / Mirantis OpenStack.
The short message is: without heartbeat keepalives, rabbit is unable to properly keep track of partially open connections, resulting in consumers (not senders) believing that they have a live connection to rabbit when in fact they don't.
Summary of the parts needed for rabbit HA:
* rabbit heartbeats (https://review.openstack.org/#/c/146047/) -- the oslo.messaging team is working to merge this and is well aware it's a critical need for rabbit HA.
* rabbit_hosts with a list of all rabbit nodes (haproxy should be avoided except for services that don't support rabbit_hosts [list of servers]; there are further needs to make haproxy behave properly in HA)
* consumer_cancel_notify (CCN)
* rabbit greater than 3.3.0
Optional:
* rip failed nodes out of the mnesia db. We found that rabbit node-down discovery was slower than we wanted (minutes), and we can force an election sooner by ripping the failed node out of mnesia (in this case Pacemaker tells us this); we have a master/slave type mechanism in our pacemaker script to perform this.
The long message on rabbit connections: through a quite long process we found that, due to the way rabbit uses connections from erlang, it won't close connections; instead rabbit (can) send a consumer cancel notification. The consumer, upon receiving this message, is supposed to hang up and reconnect. Otherwise the connection is reaped by the linux kernel when the TCP connection timeout is reached (~2 hours). Publishers pick this up the next time they attempt to send a message to the queue (because it's not acknowledged) and tend to hang up and reconnect on their own. What you will observe after removing a rabbit node is that, on a compute node, ~1/3 of the rabbit connections re-establish to the remaining rabbit node(s) while the others leave sockets open to the down server (visible with netstat, strace, lsof).
Fixes that don't work well:
* turning down TCP timeouts (LD_PRELOAD or system-wide). While it will shorten the 2-hour recovery, turning it lower than 15 minutes leads to frequent false disconnects and tends towards bad behavior.
* rabbit in haproxy. This further masks the partial-connection problem. Although we stopped using it, it might be better now with heartbeats enabled.
* a script to check for partial connections on the rabbit server and forcibly close them. A partial solution that actually gets the job done the best besides heartbeats, but it sometimes killed innocent connections for us.
Heartbeats fix this by running a ping/ack in a separate channel thread. This allows the consumer to get a response from rabbit that ensures the connections have not gone away via stale sockets. When combined with CCN, it works in multiple failure conditions as expected, and the rabbit consumers can be healthy within 1 minute. On Mon, Jan 19, 2015 at 2:55 PM, Gustavo Randich gustavo.rand...@gmail.com wrote: In the meantime, I'm using this horrendous script inside compute nodes to check for rabbitmq connectivity. It uses the 'set_host_enabled' rpc call, which in my case is innocuous. This will still result in
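Gustavo's workaround probes rabbit by pushing an innocuous RPC call through it. A much cruder, dependency-free sketch of a connectivity check is a plain TCP probe of the broker port; note that, per the discussion above, a *fresh* TCP connect cannot detect the half-open-connection failure mode (only a heartbeat on the existing connection can), so this only catches a fully unreachable broker. Host and port below are placeholders:

```python
import socket

def amqp_port_reachable(host, port=5672, timeout=2.0):
    """Crude liveness probe: can we open a TCP connection to the broker?

    Weaker than a real AMQP heartbeat -- a stale, half-open consumer
    connection (the failure mode discussed in this thread) still passes
    this check, since a brand-new connect can succeed while old sockets
    point at a dead node.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False

# Port 1 on localhost is almost certainly closed, so this reports False.
print(amqp_port_reachable("127.0.0.1", port=1, timeout=0.5))
```

Something like this is only useful as a first-pass monitoring alarm; detecting the stale-socket case still requires heartbeats or inspecting connection state on the rabbit side.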
Re: [Openstack-operators] Way to check compute - rabbitmq connectivity
It's certainly a pain to diagnose. On Thu, Feb 5, 2015 at 3:19 PM, Kris G. Lindgren klindg...@godaddy.com wrote: [quoted message snipped; see the full text earlier in the thread]
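The heartbeat support discussed in this thread surfaces as configuration options in oslo.messaging once the linked review merges. A minimal sketch of what enabling it looks like in nova.conf/neutron.conf; the option names below are taken from later oslo.messaging releases, so treat them as assumptions and check them against your installed version:

```ini
# Enable AMQP heartbeats so half-open connections are detected
# instead of waiting for the ~2 hour kernel TCP timeout.
[oslo_messaging_rabbit]
rabbit_hosts = rabbit1:5672,rabbit2:5672,rabbit3:5672
heartbeat_timeout_threshold = 60  ; seconds of silence before the peer is considered dead
heartbeat_rate = 2                ; how many times per timeout window to check the heartbeat
```

Per the thread, listing all nodes in rabbit_hosts (rather than fronting rabbit with haproxy) is the preferred layout for services that support it.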