[Openstack-operators] [neutron] IPv6 Status

2015-02-05 Thread Andreas Scheuring
Hi, 

is there a central place where I can find a matrix (or something
similar) that shows what is currently supposed to work in the sense of
IPv6 Networking?

I also had a look at a couple of blueprints out there, but I'm looking
for a simple overview of what's supported, which features people are
working on, and what's planned for the future. I mean all the good stuff
for Tenant Networks, like

- SNAT
- FloatingIP
- External Provider Networks
- DVR
- fwaas, vpnaas,...

and also about the Host Network
- e.g. vxlan/gre tunneling via ipv6 host network...


-- 
Andreas 
(irc: scheuran)




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Masahito MUROI
We're using image members to share images instead of public images,
because we can share different images under the same name with others.
When updating an image, we move the members of the existing image over to
the new one. Then we delete the old image once all VMs using it are
deleted from the hypervisors.


This way, the public image list doesn't keep growing.
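A sketch of this member-based rotation with the glance CLI (the IDs are placeholders and the member commands are from the glance v2 client; exact flags may differ by client version, so treat this as an illustration rather than a recipe):

```shell
# Placeholders for the IDs involved
NEW_IMAGE="UUID-OF-NEW-IMAGE"
OLD_IMAGE="UUID-OF-OLD-IMAGE"
TENANT="TENANT-ID"

# Share the new image with the consuming tenant
glance member-create "$NEW_IMAGE" "$TENANT"

# Stop sharing the old image so no new VMs boot from it
glance member-delete "$OLD_IMAGE" "$TENANT"

# Once no running VM still uses the old image, delete it
glance image-delete "$OLD_IMAGE"
```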

Masa

On 2015/02/06 6:42, Belmiro Moreira wrote:

We don't delete public images from Glance because it breaks
migrate/resize and block live migration. Not tested with upstream Kilo,
though.
As consequence, our public image list has been growing over time...

In order to manage image releases we use glance image properties to
tag them.
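A sketch of that property-based tagging with the glance CLI (the property names here are an illustrative convention of our own, not a glance standard; the UUID is a placeholder):

```shell
IMAGE="UUID-OF-IMAGE"   # placeholder

# Tag a release with custom properties
glance image-update \
    --property os_distro=ubuntu \
    --property os_version=14.04 \
    --property release_state=current \
    "$IMAGE"

# Consumers can then select on those properties instead of the name
glance image-list --property-filter os_distro=ubuntu \
    --property-filter release_state=current
```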

Some relevant reviews:
https://review.openstack.org/#/c/150337/
https://review.openstack.org/#/c/90321/

Belmiro
CERN

On Thu, Feb 5, 2015 at 8:16 PM, Kris G. Lindgren klindg...@godaddy.com wrote:

In the case of a raw-backed qcow2 image (pretty sure that's the default)
the instance's root disk as seen inside the vm is made up of changes made
on the instance disk (qcow2 layer) + the base image (raw).  Also,
remember
that as currently coded a resize migration will almost always be a
migrate.  However, since the vm is successfully running on the old
compute
node it *should* be a trivial change that if the backing image is no
longer available via glance - copy that over to the new host as well.


Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.




On 2/5/15, 11:55 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:
  Hello everyone.
 
  We are updating our public images regularly (to provide them to
  customers in up-to-date state). But there is a problem: If some
 instance
  starts from image it becomes 'used'. That means:
  * That image is used as _base for nova
  * If instance is reverted this image is used to recreate
instance's disk
  * If instance is rescued this image is used as rescue base
  * It is redownloaded during resize/migration (on a new compute node)
 
 
 Some thoughts:
 
 * All of the operations described should be operating on an image ID. So
 the other suggestions of renaming seem the right way to go. Ubuntu
 14.04 becomes Ubuntu 14.04 02052015 and the ID remains in the system
 for a while. If something inside Nova doesn't work with IDs, it seems
 like a bug.
 
 * rebuild, revert, rescue, and resize, are all very _not_ cloud things
 that increase the complexity of Nova. Perhaps we should all reconsider
 their usefulness and encourage our users to spin up new resources, use
 volumes and/or backup/restore methods, and then tear down old
instances.
 
 One way to encourage them is to make it clear that these operations will
 only work for X amount of time before old image versions are removed.
 So if you spin up Ubuntu 14.04 today, reverts and resizes and rescues
 are only guaranteed to work for 6 months. Then aggressively clean up
 6-month-old image IDs. To make this practical, you might even require
 a role, something like reverter, rescuer, resizer and only allow
 those roles to do these operations, and then before purging images,
 notify those users in those roles of instances they won't be able to
 resize/rescue/revert anymore.
 
 It also makes no sense to me why migrating an instance requires its
 original image. The instance root disk is all that should matter.
 










--
室井 雅仁(Masahito MUROI)
Software Innovation Center, NTT
Tel: +81-422-59-4539,FAX: +81-422-59-2699




Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Abel Lopez
I always recommend the following:
All public images are named generically enough that they can be replaced
with a new version of the same name, so new instances always boot from the
latest version.
The prior image is renamed with -OLD-$date. This lets users know that their
image has been deprecated. This image is made private so no new instances
can be launched from it.
All images include an updated motd that indicates available security
updates.
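That rotation might look like the following with the glance CLI (the name, UUID, and file path are placeholders, and the v1-style --is-public flag is illustrative; newer clients use --visibility instead):

```shell
NAME="Ubuntu 14.04"
OLD_ID="UUID-OF-CURRENT-IMAGE"   # placeholder

# Demote the outgoing image: rename with -OLD-$date and make it private
glance image-update --name "${NAME}-OLD-$(date +%Y%m%d)" \
    --is-public False "$OLD_ID"

# Upload the refreshed build under the same generic name
glance image-create --name "$NAME" --is-public True \
    --disk-format qcow2 --container-format bare \
    --file ubuntu-14.04-latest.qcow2
```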

We're discussing baking the images with automatic updates, but still
haven't reached an agreement.

On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

  -Original Message-
  From: George Shuklin [mailto:george.shuk...@gmail.com]
  Sent: 05 February 2015 14:10
  To: openstack-operators@lists.openstack.org
  Subject: [Openstack-operators] How to handle updates of public images?
 
  Hello everyone.
 
  We are updating our public images regularly (to provide them to
 customers in
  up-to-date state). But there is a problem: If some instance starts from
 image it
  becomes 'used'. That means:
  * That image is used as _base for nova
  * If instance is reverted this image is used to recreate instance's disk
  * If instance is rescued this image is used as rescue base
  * It is redownloaded during resize/migration (on a new compute node)
 
  One more (our specific):
  We're using raw disks with _base on slow SATA drives (in comparison to
 fast SSD
  for disks), and if that SATA fails, we replace it (and nova redownloads
 stuff in
  _base).
 
  If image is deleted, it causes problems with nova (nova can't download
 _base).
 
  The second part of the problem: glance disallows to update image (upload
 new
  image with same ID), so we're forced to upload updated image with new ID
 and
  to remove the old one. This causes problems described above.
  And if tenant boots from own snapshot and removes snapshot without
 removing
  instance, it causes same problem even without our activity.
 
  How do you handle public image updates in your case?
 

 We have a similar problem. For the Horizon based end users, we've defined
 a panel using image meta data. Details are at
 http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html
 .

 For the CLI users, we propose to use the sort options from Glance to find
 the latest image of a particular OS.
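For example, the glance CLI exposes server-side sorting alongside property filters; a sketch (flag names as in the glance client of the period, worth verifying against your version):

```shell
# Newest Ubuntu image first (server-side sort on creation time)
glance image-list --property-filter os_distro=ubuntu \
    --sort-key created_at --sort-dir desc
```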

 It would be good if there was a way of marking an image as hidden so that
 it can still be used for snapshots/migration but would not be shown in
 image list operations.

  Thanks!
 




[Openstack-operators] How to handle updates of public images?

2015-02-05 Thread George Shuklin

Hello everyone.

We are updating our public images regularly (to provide them to
customers in an up-to-date state). But there is a problem: as soon as some
instance starts from an image, that image becomes 'used'. That means:

* That image is used as _base for nova
* If instance is reverted this image is used to recreate instance's disk
* If instance is rescued this image is used as rescue base
* It is redownloaded during resize/migration (on a new compute node)

One more issue (specific to our setup):
We're using raw disks with _base on slow SATA drives (compared to fast
SSDs for the disks), and if that SATA drive fails, we replace it (and nova
redownloads everything into _base).


If the image is deleted, this causes problems with nova (nova can't
download _base).


The second part of the problem: glance doesn't allow updating an image
(uploading a new image with the same ID), so we're forced to upload the
updated image under a new ID and remove the old one. This causes the
problems described above. And if a tenant boots from their own snapshot and
removes the snapshot without removing the instance, it causes the same
problem even without any action on our part.


How do you handle public image updates in your case?

Thanks!



Re: [Openstack-operators] [neutron] IPv6 Status

2015-02-05 Thread Marcos Garcia
Hi Andreas

What about https://wiki.openstack.org/wiki/Neutron/IPv6 ?
Or are you looking for a Working Group specialized in this area? In that
case, maybe the NFV - Telco Working group can better drive your
questions (they are very concerned about IPv6 too):
https://wiki.openstack.org/wiki/TelcoWorkingGroup#Technical_Team_Meetings
They meet every Wednesday.

Regards

On 2015-02-05 3:14 AM, Andreas Scheuring wrote:
 Hi, 

 is there a central place where I can find a matrix (or something
 similar) that shows what is currently supposed to work in the sense of
 IPv6 Networking?

 I also had a look at a couple of blueprints out there, but I'm looking
 for a simple overview of what's supported, which features people are
 working on, and what's planned for the future. I mean all the good stuff
 for Tenant Networks, like

 - SNAT
 - FloatingIP
 - External Provider Networks
 - DVR
 - fwaas, vpnaas,...

 and also about the Host Network
 - e.g. vxlan/gre tunneling via ipv6 host network...



-- 

Marcos Garcia
Technical Sales Engineer - eNovance (a Red Hat company); RHCE, RHCVA, ITIL
PHONE: (514) 907-0068  EMAIL: mgarc...@redhat.com  SKYPE: enovance-marcos.garcia
ADDRESS: 127 St-Pierre, Montréal (QC) H2Y 2L6, Canada  WEB: www.enovance.com





Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Marc Heckmann
On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:
 I always recommend the following:
 All public images are named generically enough that they can be
 replaced with a new version of the same name. This helps new instances
 booting. 
 The prior image is renamed with -OLD-$date. This lets users know that
 their image has been deprecated. This image is made private so no new
 instances can be launched. 
 All images include an updated motd that indicates available security
 updates. 

I like this approach, but I have the following caveat: What if users are
using the uuid of the image instead of the name in some automation
scripts that they might have? If we make the -OLD-$date images
private, then we just broke their scripts.

 
 
 We're discussing baking the images with automatic updates, but still
 haven't reached an agreement. 
 
 On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:
  -Original Message-
  From: George Shuklin [mailto:george.shuk...@gmail.com]
  Sent: 05 February 2015 14:10
  To: openstack-operators@lists.openstack.org
  Subject: [Openstack-operators] How to handle updates of
 public images?
 
  Hello everyone.
 
  We are updating our public images regularly (to provide them
 to customers in
  up-to-date state). But there is a problem: If some instance
 starts from image it
  becomes 'used'. That means:
  * That image is used as _base for nova
  * If instance is reverted this image is used to recreate
 instance's disk
  * If instance is rescued this image is used as rescue base
  * It is redownloaded during resize/migration (on a new
 compute node)
 
  One more (our specific):
  We're using raw disks with _base on slow SATA drives (in
 comparison to fast SSD
  for disks), and if that SATA fails, we replace it (and nova
 redownloads stuff in
  _base).
 
  If image is deleted, it causes problems with nova (nova
 can't download _base).
 
  The second part of the problem: glance disallows to update
 image (upload new
  image with same ID), so we're forced to upload updated image
 with new ID and
  to remove the old one. This causes problems described above.
  And if tenant boots from own snapshot and removes snapshot
 without removing
  instance, it causes same problem even without our activity.
 
  How do you handle public image updates in your case?
 
 
 We have a similar problem. For the Horizon based end users,
 we've defined a panel using image meta data. Details are at
 
 http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html.
 
 For the CLI users, we propose to use the sort options from
 Glance to find the latest image of a particular OS.
 
 It would be good if there was a way of marking an image as
 hidden so that it can still be used for snapshots/migration
 but would not be shown in image list operations.
 
  Thanks!
 
 




Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread George Shuklin
Updated report on the 'no image' with deleted '_base' behaviour in Juno (my 
previous comment was about Havana):


1. If the snapshot is removed, the original image is used (the image the 
first instance was booted from before producing the snapshot). Rather 
strange and unexpected, but nice (minus one headache).

2. If all images in the chain are removed, the behaviour changes:
* hard reboot works fine (raw disks)
* reinstallation asks for a new image, seems to be no problem
* rescue causes an ugly problem, rendering the instance completely broken 
(it does not work, but shows no ERROR state). 
https://bugs.launchpad.net/nova/+bug/1418590


I didn't test migrations yet.

On 02/05/2015 03:09 PM, George Shuklin wrote:

Hello everyone.

We are updating our public images regularly (to provide them to 
customers in up-to-date state). But there is a problem: If some 
instance starts from image it becomes 'used'. That means:

* That image is used as _base for nova
* If instance is reverted this image is used to recreate instance's disk
* If instance is rescued this image is used as rescue base
* It is redownloaded during resize/migration (on a new compute node)

One more (our specific):
We're using raw disks with _base on slow SATA drives (in comparison to 
fast SSD for disks), and if that SATA fails, we replace it (and nova 
redownloads stuff in _base).


If image is deleted, it causes problems with nova (nova can't download 
_base).


The second part of the problem: glance disallows to update image 
(upload new image with same ID), so we're forced to upload updated 
image with new ID and to remove the old one. This causes problems 
described above. And if tenant boots from own snapshot and removes 
snapshot without removing instance, it causes same problem even 
without our activity.


How do you handle public image updates in your case?

Thanks!





Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Kris G. Lindgren
In the case of a raw-backed qcow2 image (pretty sure that's the default)
the instance's root disk as seen inside the vm is made up of changes made
on the instance disk (qcow2 layer) + the base image (raw).  Also, remember
that as currently coded a resize migration will almost always be a
migrate.  However, since the vm is successfully running on the old compute
node it *should* be a trivial change that if the backing image is no
longer available via glance - copy that over to the new host as well.
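The layering described above can be inspected or reproduced with qemu-img; a sketch, with nova-style paths as illustrative placeholders:

```shell
# The _base file is the flat backing image; each instance disk is a qcow2
# overlay recording only the writes made on top of it.
BASE=/var/lib/nova/instances/_base/IMAGE_HASH        # placeholder path
DISK=/var/lib/nova/instances/INSTANCE_UUID/disk      # placeholder path

# Create an overlay on top of the base image (roughly what nova does at spawn)
qemu-img create -f qcow2 -b "$BASE" "$DISK"

# 'backing file:' in the output shows the dependency on _base -- if that
# file disappears, the overlay alone is not a complete disk
qemu-img info "$DISK"
```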

 
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.




On 2/5/15, 11:55 AM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:
 Hello everyone.
 
 We are updating our public images regularly (to provide them to
 customers in up-to-date state). But there is a problem: If some
instance 
 starts from image it becomes 'used'. That means:
 * That image is used as _base for nova
 * If instance is reverted this image is used to recreate instance's disk
 * If instance is rescued this image is used as rescue base
 * It is redownloaded during resize/migration (on a new compute node)
 

Some thoughts:

* All of the operations described should be operating on an image ID. So
the other suggestions of renaming seem the right way to go. Ubuntu
14.04 becomes Ubuntu 14.04 02052015 and the ID remains in the system
for a while. If something inside Nova doesn't work with IDs, it seems
like a bug.

* rebuild, revert, rescue, and resize, are all very _not_ cloud things
that increase the complexity of Nova. Perhaps we should all reconsider
their usefulness and encourage our users to spin up new resources, use
volumes and/or backup/restore methods, and then tear down old instances.

One way to encourage them is to make it clear that these operations will
only work for X amount of time before old image versions are removed.
So if you spin up Ubuntu 14.04 today, reverts and resizes and rescues
are only guaranteed to work for 6 months. Then aggressively clean up
6-month-old image IDs. To make this practical, you might even require
a role, something like reverter, rescuer, resizer and only allow
those roles to do these operations, and then before purging images,
notify those users in those roles of instances they won't be able to
resize/rescue/revert anymore.

It also makes no sense to me why migrating an instance requires its
original image. The instance root disk is all that should matter.





Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Joe Topjian
We do exactly this.

Public images are named very generically, like Ubuntu 14.04. Not even
14.04.1 or something like that. Old images are renamed and made private.
Existing instances continue to run, but, as others have mentioned, if a
user is using a UUID to launch instances, that will break for them. This
is an acceptable trade-off for us. Our documentation mentions this and
tells users to use the names.

The OpenStack CLI tools and Vagrant (the two most-used non-Dashboard
tools) both support image names, so we haven't run into a UUID-only issue.

We have a modified MOTD that lists some different scripts that the user can
run, such as:

* Using our local apt-cache server (Ubuntu only)
* Enabling automatic updates
* Installing the openstack command-line tools

We had a few debates about turning on automatic updates in the images we
provide. Ultimately we chose to not enable them and instead go with the
MOTD message. There are several reasons why having automatic updates
enabled is a benefit, but the single reason we decided against it is
simple: if an automatic update breaks the user's instance, it's our fault.
It's a very debatable argument.

Also, we use Packer to bundle all of this. We have most of it available
here:

https://github.com/cybera/openstack-images

In addition to all of this, we allow users to upload their own images. So
if the core set of images we provide doesn't meet their needs, they're free
to create their own solution.

On Thu, Feb 5, 2015 at 7:02 AM, Abel Lopez alopg...@gmail.com wrote:

 I always recommend the following:
 All public images are named generically enough that they can be replaced
 with a new version of the same name. This helps new instances booting.
 The prior image is renamed with -OLD-$date. This lets users know that
 their image has been deprecated. This image is made private so no new
 instances can be launched.
 All images include an updated motd that indicates available security
 updates.

 We're discussing baking the images with automatic updates, but still
 haven't reached an agreement.


 On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:

  -Original Message-
  From: George Shuklin [mailto:george.shuk...@gmail.com]
  Sent: 05 February 2015 14:10
  To: openstack-operators@lists.openstack.org
  Subject: [Openstack-operators] How to handle updates of public images?
 
  Hello everyone.
 
  We are updating our public images regularly (to provide them to
 customers in
  up-to-date state). But there is a problem: If some instance starts from
 image it
  becomes 'used'. That means:
  * That image is used as _base for nova
  * If instance is reverted this image is used to recreate instance's disk
  * If instance is rescued this image is used as rescue base
  * It is redownloaded during resize/migration (on a new compute node)
 
  One more (our specific):
  We're using raw disks with _base on slow SATA drives (in comparison to
 fast SSD
  for disks), and if that SATA fails, we replace it (and nova redownloads
 stuff in
  _base).
 
  If image is deleted, it causes problems with nova (nova can't download
 _base).
 
  The second part of the problem: glance disallows to update image
 (upload new
  image with same ID), so we're forced to upload updated image with new
 ID and
  to remove the old one. This causes problems described above.
  And if tenant boots from own snapshot and removes snapshot without
 removing
  instance, it causes same problem even without our activity.
 
  How do you handle public image updates in your case?
 

 We have a similar problem. For the Horizon based end users, we've defined
 a panel using image meta data. Details are at
 http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html
 .

 For the CLI users, we propose to use the sort options from Glance to find
 the latest image of a particular OS.

 It would be good if there was a way of marking an image as hidden so that
 it can still be used for snapshots/migration but would not be shown in
 image list operations.

  Thanks!
 







Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Abel Lopez
That is a very real concern. This stems from images being named very
uniquely, with versions, dates, etc. To the end user this is ALMOST as hard
as a UUID.
Easy/generic names encourage users to use them, but there is an aspect of
documentation and user training/education on the proper use of name-based
automation.

On Thursday, February 5, 2015, Marc Heckmann marc.heckm...@ubisoft.com
wrote:

 On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:
  I always recommend the following:
  All public images are named generically enough that they can be
  replaced with a new version of the same name. This helps new instances
  booting.
  The prior image is renamed with -OLD-$date. This lets users know that
  their image has been deprecated. This image is made private so no new
  instances can be launched.
  All images include an updated motd that indicates available security
  updates.

 I like this approach, but I have the following caveat: What if users are
 using the uuid of the image instead of the name in some automation
 scripts that they might have? If we make the -OLD-$date images
 private, then we just broke their scripts.

 
 
  We're discussing baking the images with automatic updates, but still
  haven't reached an agreement.
 
  On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:
   -Original Message-
   From: George Shuklin [mailto:george.shuk...@gmail.com]
   Sent: 05 February 2015 14:10
   To: openstack-operators@lists.openstack.org
   Subject: [Openstack-operators] How to handle updates of
  public images?
  
   Hello everyone.
  
   We are updating our public images regularly (to provide them
  to customers in
   up-to-date state). But there is a problem: If some instance
  starts from image it
   becomes 'used'. That means:
   * That image is used as _base for nova
   * If instance is reverted this image is used to recreate
  instance's disk
   * If instance is rescued this image is used as rescue base
   * It is redownloaded during resize/migration (on a new
  compute node)
  
   One more (our specific):
   We're using raw disks with _base on slow SATA drives (in
  comparison to fast SSD
   for disks), and if that SATA fails, we replace it (and nova
  redownloads stuff in
   _base).
  
   If image is deleted, it causes problems with nova (nova
  can't download _base).
  
   The second part of the problem: glance disallows to update
  image (upload new
   image with same ID), so we're forced to upload updated image
  with new ID and
   to remove the old one. This causes problems described above.
   And if tenant boots from own snapshot and removes snapshot
  without removing
   instance, it causes same problem even without our activity.
  
   How do you handle public image updates in your case?
  
 
  We have a similar problem. For the Horizon based end users,
  we've defined a panel using image meta data. Details are at
 
 http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html
 .
 
  For the CLI users, we propose to use the sort options from
  Glance to find the latest image of a particular OS.
 
  It would be good if there was a way of marking an image as
  hidden so that it can still be used for snapshots/migration
  but would not be shown in image list operations.
 
   Thanks!
  
 





Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Marc Heckmann
On Thu, 2015-02-05 at 06:39 -0800, Abel Lopez wrote:
 That is a very real concern. This stems from images being named
 very uniquely, with versions, dates, etc. To the end user this is
 ALMOST as hard as a UUID. 
 Easy/generic names encourage users to use them, but there is an aspect
 of documentation and user training/education on the proper use of
 name-based automation. 

Yup, I agree with that. The best approach is probably a tweaked version
of what you proposed: Use a generic name for the latest image and rename
the outdated ones to something like image name-OLD-$date but don't
make them private. The documentation that you provide to your end users
should clearly tell them to use image names vs UUIDs and to discourage
them from using OLD images. For those that don't read the doc, the
naming alone will discourage bad practices. Some sort of automated motd
with a big fat warning if the image is older than a certain date would
help as well.
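Such an automated MOTD check might look like the sketch below, assuming the image's build date is baked in at image-build time (the date source and the 180-day threshold are hypothetical; the date arithmetic requires GNU date):

```shell
#!/bin/bash
# Sketch of an update-motd style check: warn when the image this instance
# was built from is older than a threshold. In practice build_date would be
# read from a file baked into the image, e.g. /etc/image-build-date
# (hypothetical location).

warn_if_stale() {
    build_date=$1       # e.g. 2015-02-05
    max_age_days=$2     # e.g. 180
    now=$(date +%s)
    built=$(date -d "$build_date" +%s)   # GNU date
    age_days=$(( (now - built) / 86400 ))
    if [ "$age_days" -gt "$max_age_days" ]; then
        echo "WARNING: this image is ${age_days} days old;" \
             "consider rebuilding from the current public image."
    fi
}

warn_if_stale "2015-02-05" 180
```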

 
 On Thursday, February 5, 2015, Marc Heckmann
 marc.heckm...@ubisoft.com wrote:
 On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:
  I always recommend the following:
  All public images are named generically enough that they can
 be
  replaced with a new version of the same name. This helps new
 instances
  booting.
  The prior image is renamed with -OLD-$date. This lets users know that
  their image has been deprecated. This image is made private so no new
  instances can be launched.
  All images include an updated motd that indicates available security
  updates.

 I like this approach, but I have the following caveat: What if users are
 using the uuid of the image instead of the name in some automation
 scripts that they might have? If we make the -OLD-$date images private,
 then we just broke their scripts.

  We're discussing baking the images with automatic updates, but still
  haven't reached an agreement.

  On Thursday, February 5, 2015, Tim Bell tim.b...@cern.ch wrote:
    -Original Message-
    From: George Shuklin [mailto:george.shuk...@gmail.com]
    Sent: 05 February 2015 14:10
    To: openstack-operators@lists.openstack.org
    Subject: [Openstack-operators] How to handle updates of public images?

    Hello everyone.

    We are updating our public images regularly (to provide them to
    customers in up-to-date state). But there is a problem: If some
    instance starts from image it becomes 'used'. That means:
    * That image is used as _base for nova
    * If instance is reverted this image is used to recreate instance's
    disk
    * If instance is rescued this image is used as rescue base
    * It is redownloaded during resize/migration (on a new compute node)

    One more (our specific):
    We're using raw disks with _base on slow SATA drives (in comparison
    to fast SSD for disks), and if that SATA fails, we replace it (and
    nova redownloads stuff in _base).

    If image is deleted, it causes problems with nova (nova can't
    download _base).

    The second part of the problem: glance disallows to update image
    (upload new image with same ID), so we're forced to upload updated
    image with new ID and to remove the old one. This causes problems
    described above. And if tenant boots from own snapshot and removes
    snapshot without removing instance, it causes same problem even
    without our activity.

    How do you handle public image updates in your case?

   We have a similar problem. For the Horizon based end users, we've
   defined a panel using image meta data. Details are at
   http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html.

   For the CLI users, we propose to use the sort options from Glance to
   find the latest image of a particular OS.

   It would 
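The rename-and-make-private step described at the top of this message could be scripted along these lines. This is only a sketch: the image name is hypothetical, and the printed `glance image-update` invocation assumes the 2015-era glance v1 CLI, so check the flags against your client version before running it for real.

```python
from datetime import date

# Hypothetical sketch of the deprecation step: compute the -OLD-$date
# name for the outgoing public image.  The glance command is printed
# rather than executed, and its flags assume the 2015-era v1 CLI.
name = "Ubuntu 14.04"  # hypothetical image name
old_name = "{0}-OLD-{1}".format(name, date.today().strftime("%Y%m%d"))

print('glance image-update "{0}" --name "{1}" --is-public False'.format(
    name, old_name))
```

A follow-up pass can then delete -OLD images once no running instance still references them.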

Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Joe Topjian
I'm curious: are you using _base files? We're not and we're able to block
migrate instances based on deleted images or images that were public but
are now private.

On Thu, Feb 5, 2015 at 2:42 PM, Belmiro Moreira 
moreira.belmiro.email.li...@gmail.com wrote:

 We don't delete public images from Glance because it breaks migrate/resize
 and block live migration. Not tested with upstream Kilo, though.
 As consequence, our public image list has been growing over time...

 In order to manage image releases we use glance image properties to tag
 them.

 Some relevant reviews:
 https://review.openstack.org/#/c/150337/
 https://review.openstack.org/#/c/90321/

 Belmiro
 CERN

 On Thu, Feb 5, 2015 at 8:16 PM, Kris G. Lindgren klindg...@godaddy.com
 wrote:

 In the case of a raw backed qcow2 image (pretty sure that's the default)
 the instance's root disk as seen inside the vm is made up of changes made
 on the instance disk (qcow2 layer) + the base image (raw).  Also, remember
 that as currently coded a resize migration will almost always be a
 migrate.  However, since the vm is successfully running on the old compute
 node it *should* be a trivial change that if the backing image is no
 longer available via glance - copy that over to the new host as well.
 

 Kris Lindgren
 Senior Linux Systems Engineer
 GoDaddy, LLC.




 On 2/5/15, 11:55 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:
  Hello everyone.
 
  We are updating our public images regularly (to provide them to
  customers in up-to-date state). But there is a problem: If some
 instance
  starts from image it becomes 'used'. That means:
  * That image is used as _base for nova
  * If instance is reverted this image is used to recreate instance's
 disk
  * If instance is rescued this image is used as rescue base
  * It is redownloaded during resize/migration (on a new compute node)
 
 
 Some thoughts:
 
 * All of the operations described should be operating on an image ID. So
 the other suggestions of renaming seem the right way to go. "Ubuntu
 14.04" becomes "Ubuntu 14.04 02052015" and the ID remains in the system
 for a while. If something inside Nova doesn't work with IDs, it seems
 like a bug.
 
 * rebuild, revert, rescue, and resize, are all very _not_ cloud things
 that increase the complexity of Nova. Perhaps we should all reconsider
 their usefulness and encourage our users to spin up new resources, use
 volumes and/or backup/restore methods, and then tear down old instances.
 
 One way to encourage them is to make it clear that these operations will
 only work for X amount of time before old versions of images will be
 removed. So if you spin up Ubuntu 14.04 today, reverts and resizes and
 rescues are only guaranteed to work for 6 months. Then aggressively
 clean up 6-month-old image IDs. To make this practical, you might even
 require a role, something like "reverter", "rescuer", "resizer", and
 only allow those roles to do these operations, and then before purging
 images, notify those users in those roles of instances they won't be
 able to resize/rescue/revert anymore.
 
 It also makes no sense to me why migrating an instance requires its
 original image. The instance root disk is all that should matter.
 
 ___
 OpenStack-operators mailing list
 OpenStack-operators@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Clint Byrum
Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:
 Hello everyone.
 
 We are updating our public images regularly (to provide them to 
 customers in up-to-date state). But there is a problem: If some instance 
 starts from image it becomes 'used'. That means:
 * That image is used as _base for nova
 * If instance is reverted this image is used to recreate instance's disk
 * If instance is rescued this image is used as rescue base
 * It is redownloaded during resize/migration (on a new compute node)
 

Some thoughts:

* All of the operations described should be operating on an image ID. So
the other suggestions of renaming seem the right way to go. "Ubuntu
14.04" becomes "Ubuntu 14.04 02052015" and the ID remains in the system
for a while. If something inside Nova doesn't work with IDs, it seems
like a bug.
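Name-based resolution is what makes the renaming scheme workable for automation: scripts track a stable name and resolve the newest matching ID at launch time. A minimal sketch — the record shape loosely mirrors a Glance v2 image listing, and the data here is made up:

```python
from datetime import datetime

def latest_image(images, name):
    """Return the newest image record with the given name, or None.

    Lets automation track a stable name while operators rotate the
    underlying IDs ("Ubuntu 14.04" -> "Ubuntu 14.04 02052015", etc.).
    """
    matches = [img for img in images if img["name"] == name]
    if not matches:
        return None
    return max(matches,
               key=lambda img: datetime.strptime(img["created_at"],
                                                 "%Y-%m-%dT%H:%M:%SZ"))

# Made-up records; the fields mirror a Glance v2 image listing.
images = [
    {"id": "aaa", "name": "Ubuntu 14.04", "created_at": "2014-11-01T12:00:00Z"},
    {"id": "bbb", "name": "Ubuntu 14.04", "created_at": "2015-02-05T09:30:00Z"},
    {"id": "ccc", "name": "CentOS 7",     "created_at": "2015-01-10T08:00:00Z"},
]
print(latest_image(images, "Ubuntu 14.04")["id"])  # -> bbb
```

The same lookup can be done server-side with the Glance v2 listing's sort parameters, as Tim suggests for CLI users.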

* rebuild, revert, rescue, and resize, are all very _not_ cloud things
that increase the complexity of Nova. Perhaps we should all reconsider
their usefulness and encourage our users to spin up new resources, use
volumes and/or backup/restore methods, and then tear down old instances.

One way to encourage them is to make it clear that these operations will
only work for X amount of time before old versions of images will be
removed. So if you spin up Ubuntu 14.04 today, reverts and resizes and
rescues are only guaranteed to work for 6 months. Then aggressively
clean up 6-month-old image IDs. To make this practical, you might even
require a role, something like "reverter", "rescuer", "resizer", and
only allow those roles to do these operations, and then before purging
images, notify those users in those roles of instances they won't be
able to resize/rescue/revert anymore.

It also makes no sense to me why migrating an instance requires its
original image. The instance root disk is all that should matter.



[Openstack-operators] Fwd: [Openstack] OpenStack L naming poll

2015-02-05 Thread Lauren Sell
I wanted to make sure everyone on these lists saw the “L” release naming poll 
is live. The four options are Lizard, Love, Liberty and London, and the 
deadline to vote is Tuesday at 19:59 UTC.

It only takes a second to cast your vote: 
https://www.surveymonkey.com/r/openstack-l-naming

Cheers,
Lauren


 Begin forwarded message:
 
 Date: February 4, 2015 at 11:10:51 PM GMT+9
 From: Thierry Carrez thie...@openstack.org
 To: openst...@lists.openstack.org Openstack openst...@lists.openstack.org
 Subject: [Openstack] OpenStack L naming poll
 
 Hi everyone,
 
 As you may know, OpenStack development cycles and releases are named
 after cities or landmarks placed near where the corresponding design
 summit will happen.
 
 We'd like your help again in selecting the right name for the
 development cycle and release coming after Kilo. Our next summit will
 happen in Vancouver, BC (Canada) in May. L candidate names were
 proposed, selected and checked for various issues... leaving 4
 candidates on the final public poll.
 
 Please take a moment to participate in our poll:
 https://www.surveymonkey.com/r/openstack-l-naming
 
 and order the 4 candidates in your personal order of preference!
 
 You can find a quick rationale behind each name at:
 https://wiki.openstack.org/wiki/Release_Naming
 
 The poll closes Tuesday, February 10th at 19:59 UTC (just before the TC
 IRC meeting where the results will be proclaimed).
 
 Thanks!
 
 -- 
 Thierry Carrez (ttx)
 
 ___
 Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
 Post to : openst...@lists.openstack.org
 Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack



Re: [Openstack-operators] Way to check compute - rabbitmq connectivity

2015-02-05 Thread Kris G. Lindgren
Is Mirantis going to have someone at the ops mid-cycle?  We were talking
about this in the operators channel today and it seemed like pretty much
everyone who was active has problems with rabbitmq.  Either from
maintenance, failovers, or transient issues and having to restart the
world after rabbitmq hiccups to get things to recover.  I am thinking if
the issue is relatively prevalent it would be nice for people who have
either figured it out or have something that is working to discuss their
setup.  We noticed that Mirantis has a number of patches to
oslo.messaging to fix rabbitmq specific stuff.  So I was hoping that
someone could come and talk about what Mirantis has done there to make
it better, and if it's there yet, and if not, what still needs to be done.

We use clustered rabbitmq + LB and honestly this config on paper is
better, but in practice it is nothing short of a nightmare.  Any
maintenance done on rabbitmq (restart/patching etc.) or the load
balancer seems to cause clients to not notice that they are no longer
correctly connected to the rabbitmq server, and they will sit happily,
doing nothing, until they are restarted. We had similar problems listing
all of the rabbitmq servers out in the configuration as well.  So far my
experience has been any maintenance that touches rabbitmq is going to
require a restart of all services that communicate over rpc to avoid
hard to troubleshoot (IE silent errors) rpc issues.

In my experience rabbitmq is pretty much the #1 cause of issues in our
environment and I think other operators would agree with that as well.
Anything that would make rabbit + openstack more stable would be very
welcome.


 
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.


On 1/20/15, 8:33 AM, Andrew Woodward xar...@gmail.com wrote:

So this is exactly what we (@mirantis) ran into while working on the
HA story in Fuel / Mirantis OpenStack.

The short message is without heartbeat keepalive, rabbit is unable to
properly keep track of partially open connections, resulting in
consumers (not senders) believing that they have a live connection to
rabbit when in fact they don't.

Summary of the parts needed for rabbit HA:
* rabbit heartbeats (https://review.openstack.org/#/c/146047/) the
oslo.messaging team is working to merge this and is well aware it's a
critical need for rabbit HA.
* rabbit_hosts with a list of all rabbit nodes (haproxy should be
avoided except for services that don't support rabbit_hosts [list of
servers]; there are further needs to make haproxy behave properly in
HA)
* consumer_cancel_notify (CCN)
* rabbit greater than 3.3.0

Optional:
* rip failed nodes out of the mnesia db. We found that rabbit node down
discovery was slower than we wanted (minutes) and we can force an
election sooner by ripping the failed node out of mnesia. (in this case
Pacemaker tells us this) we have a master/slave type mechanism in our
pacemaker script to perform this.

The long message on rabbit connections.

Through a quite long process we found that due to the way rabbit uses
connections from erlang, it won't close connections; instead rabbit
(can) send a consumer cancel notification. The consumer, upon receiving
this message, is supposed to hang up and reconnect. Otherwise the
connection is reaped by the linux kernel when the TCP connection
timeout is reached (~2 hours). Publishers pick it up the next
time they attempt to send a message to the queue (because it's not
acknowledged) and tend to hang up and reconnect on their own.

What you will observe after removing a rabbit node is that on a compute
node ~1/3 of the rabbit connections re-establish to the remaining rabbit
node(s) while the others leave sockets open to the down server (visible
with netstat, strace, lsof).

Fixes that don't work well:
* turning down TCP timeouts (LD_PRELOAD or system-wide). While it will
shorten the 2 hour recovery, turning lower than 15 minutes leads
to frequent false disconnects and tends towards bad behavior.
* rabbit in haproxy. This further masks the partial connection
problem. Although we stopped using it, it might be better now with
heartbeats enabled.
* a script to check for partial connections in the rabbit server and
forcibly close them. A partial solution that actually gets the job
done the best besides heartbeats. It sometimes killed innocent
connections for us.

Heartbeats fix this by running a ping/ack in a separate channel
thread. This allows the consumer to have a response from rabbit
ensuring that the connections have not gone away via stale
sockets. When combined with CCN, it works in multiple failure
conditions as expected and the rabbit consumers can be healthy within 1
minute.
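To put a number on the "reaped by the linux kernel" path above: with no AMQP-level heartbeat, a dead peer is only noticed when kernel keepalive gives up, and the Linux default idle time (net.ipv4.tcp_keepalive_time) is 7200 seconds. A stdlib sketch of tightening that per socket — the TCP_KEEP* constants are Linux-specific and the values are purely illustrative:

```python
import socket

# Enable keepalive on a socket with much shorter timings than the
# kernel default (7200s idle before the first probe is sent).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-only constants
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle secs before probing
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # secs between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before reset
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # -> True
```

AMQP heartbeats remain preferable, since they also detect a stalled broker process, not just a dead TCP peer.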


On Mon, Jan 19, 2015 at 2:55 PM, Gustavo Randich
gustavo.rand...@gmail.com wrote:
 In the meantime, I'm using this horrendous script inside compute nodes
 to check for rabbitmq connectivity. It uses the 'set_host_enabled' rpc
 call, which in my case is innocuous.

This will still result in 
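A lighter-weight alternative to an rpc-call probe like the one quoted above is a plain TCP check against the broker port — with the caveat, per this whole thread, that an answering listener says nothing about the health of already-established connections. A hypothetical stdlib sketch:

```python
import socket

def amqp_port_reachable(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to the broker port succeeds.

    This only proves the listener answers; a consumer can still hold an
    established-but-dead connection, so pair this with heartbeats.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Port 1 on localhost is almost certainly refused, so this prints False.
print(amqp_port_reachable("127.0.0.1", port=1, timeout=0.5))
```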

Re: [Openstack-operators] Way to check compute - rabbitmq connectivity

2015-02-05 Thread matt
It's certainly a pain to diagnose.

On Thu, Feb 5, 2015 at 3:19 PM, Kris G. Lindgren klindg...@godaddy.com
wrote:

 Is Mirantis going to have someone at the ops mid-cycle?  We were talking
 about this in the operators channel today and it seemed like pretty much
 everyone who was active has problems with rabbitmq.
 [...]