Re: [Openstack-operators] [openstack-dev] [all] Bringing the community together (combine the lists!)

2018-08-30 Thread Chris Friesen

On 08/30/2018 11:03 AM, Jeremy Stanley wrote:


The proposal is simple: create a new openstack-discuss mailing list
to cover all the above sorts of discussion and stop using the other
four.


Do we want to merge usage and development onto one list?  That could be a busy 
list for someone who's just asking a simple usage question.


Alternately, if we are going to merge everything then why not just use the 
"openstack" mailing list since it already exists and there are references to it 
on the web.


(Or do you want to force people to move to something new to make them recognize 
that something has changed?)


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Chris Friesen

On 08/29/2018 10:02 AM, Jay Pipes wrote:


Also, I'd love to hear from anyone in the real world who has successfully
migrated (live or otherwise) an instance that "owns" expensive hardware
(accelerators, SR-IOV PFs, GPUs or otherwise).


I thought cold migration of instances with such devices was supported upstream?

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [puppet] migrating to storyboard

2018-08-15 Thread Chris Friesen

On 08/14/2018 10:33 AM, Tobias Urdin wrote:


My goal is that we will be able to swap to Storyboard during the Stein cycle, but
considering that we have low activity on bugs, my opinion is that we could do this
swap very easily any time soon, as long as everybody is in favor of it.

Please let me know what you think about moving to Storyboard?


Not a puppet dev, but am currently using Storyboard.

One of the things we've run into is that there is no way to attach files (such as 
log files for bug reports) to a story.  There's an open story on this[1] but it's 
not assigned to anyone.


Chris


[1] https://storyboard.openstack.org/#!/story/2003071

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] large high-performance ephemeral storage

2018-06-13 Thread Chris Friesen

On 06/13/2018 07:58 AM, Blair Bethwaite wrote:


Is the collective wisdom to use LVM based instances for these use-cases? Putting
a host filesystem with qcow2 based disk images on it can't help
performance-wise... Though we have not used LVM based instance storage before,
are there any significant gotchas? And furthermore, is it possible to set IO
QoS limits on these?


LVM has the drawback that deleting instances results in significant disk traffic 
while the volume is scrubbed with zeros.  If you don't care about security you 
can set a config option to turn this off.  Also, while this is happening I think 
your disk resource tracking will be wrong because nova assumes the space is 
available. (At least it used to be this way, I haven't checked that code recently.)
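
Something along these lines in nova.conf on the compute node should do it (an 
untested sketch; double-check the option names against your release's config 
reference, and "nova-vg" is just an example volume group name):

[libvirt]
images_type = lvm
images_volume_group = nova-vg
# skip zeroing out LVM-backed disks on delete (trades security for speed)
volume_clear = none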


Also, migration and resize are not supported for LVM-backed instances.  I 
proposed a patch to support them (https://review.openstack.org/#/c/337334/) but 
hit issues and never got around to fixing them up.


Chris



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Chris Friesen

On 06/04/2018 05:43 AM, Tobias Urdin wrote:

Hello,

I have received a question about a more specialized use case where we need to
isolate several hypervisors to a specific project. My first thinking was
using nova flavors for only that project and add extra specs properties to
use a specific host aggregate but this means I need to assign values to all
other flavors to not use those which seems weird.

How could I go about solving this the easiest/best way or, from the
history of the mailing lists, the most supported way, since there are a
lot of changes to the scheduler/placement part right now?


There was a "Strict isolation of group of hosts for images" spec that was 
proposed for a number of releases but never got accepted:


https://review.openstack.org/#/c/381912/

The idea was to have special metadata on a host aggregate and a new scheduler 
filter such that only instances with images having a property matching the 
metadata would be allowed to land on that host aggregate.


In the end the spec was abandoned (see the final comment in the review) because 
it was expected that a combination of other accepted features would enable the 
desired behaviour.


It might be worth checking out the links in the final comment.
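
In the meantime, a rough sketch of one existing approach (untested here; the 
filter and property names are from memory, and the aggregate/project names are 
made up) using the AggregateMultiTenancyIsolation scheduler filter:

# add AggregateMultiTenancyIsolation to the scheduler's enabled filter list, then:
openstack aggregate create project-x-hosts
openstack aggregate add host project-x-hosts compute-17
openstack aggregate set --property filter_tenant_id=<project-uuid> project-x-hosts

Note that this keeps other projects off those hosts, but by itself it doesn't 
stop project X from landing on other hosts.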

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment

2018-05-18 Thread Chris Friesen
Are you talking about downtime of instances (and the dataplane), or of the 
OpenStack API and control plane?


And when you say "zero downtime" are you really talking about "five nines" or 
similar?  Because nothing is truly zero downtime.


If you care about HA then you'll need additional components outside of OpenStack 
proper.  You'll need health checks on your physical nodes, health checks on your 
network links, possibly end-to-end health checks up into the applications 
running in your guests, redundant network paths, redundant controller nodes, HA 
storage, etc.  You'll have to think about how to ensure your database and 
messaging service are HA.  You may want to look at ensuring that your OpenStack 
services do not interfere with the VMs running on that node and vice versa.


We ended up rolling our own install mechanisms because we weren't satisfied with 
any of the existing projects.  That was a while ago now so I don't know how far 
they've come.


Chris

On 05/18/2018 02:07 PM, Fox, Kevin M wrote:

I don't think OpenStack itself can meet full zero-downtime requirements. But even
if it can, I don't think any of the deployment tools try to support that use
case either.

Thanks,
Kevin

*From:* Amit Kumar [ebiib...@gmail.com]
*Sent:* Friday, May 18, 2018 3:46 AM
*To:* OpenStack Operators; Openstack
*Subject:* [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding
production grade OpenStack deployment

Hi All,

We want to deploy our private cloud using OpenStack as highly available (zero
downtime (ZDT) - in normal course of action and during upgrades as well)
production grade environment. We came across following tools.

  * We thought of using Kolla-Kubernetes as the deployment tool, but we got
feedback from the Kolla IRC channel that this project is being retired.
Moreover, we couldn't find up-to-date documents with multi-node deployment
steps, and High Availability support was not mentioned anywhere
in the documentation.
  * Another option for a Kubernetes-based deployment is to use OpenStack-Helm,
but it seems the OSH community has not made OSH 1.0 officially available yet.
  * The last option is to use Kolla-Ansible; although it is not a Kubernetes
deployment, it seems to have good community support around it. Also, its
documentation talks a little about production grade deployment, so probably it
is being used in production grade environments.


If you folks have used any of these tools for deploying OpenStack to fulfill
these requirements: HA and ZDT, then please provide your inputs specifically
about HA and ZDT support of the deployment tool, based on your experience. And
please share if you have any reference links that you have used for achieving HA
and ZDT for the respective tools.

Lastly, if you think we have missed other more viable and stable deployment
tool options which can serve our requirements (HA and ZDT), then please do
suggest the same.

Regards,
Amit




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] [keystone] [oslo] new unified limit library

2018-03-07 Thread Chris Friesen

On 03/07/2018 10:44 AM, Tim Bell wrote:

I think nested quotas would give the same thing, i.e. you have a parent project
for the group and child projects for the users. This would not need user/group
quotas but continue with the ‘project owns resources’ approach.


Agreed, I think that if we support nested quotas with a suitable depth of 
nesting it could be used to handle the existing nova user/project quotas.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata

2018-01-29 Thread Chris Friesen

On 01/29/2018 07:47 AM, Jay Pipes wrote:


What I believe we can do is change the behaviour so that if a 0.0 value is found
in the nova.conf file on the nova-compute worker, then instead of defaulting to
16.0, the resource tracker would first look to see if the compute node was
associated with a host aggregate that had "cpu_allocation_ratio" as a metadata
item. If one was found, then the host aggregate's cpu_allocation_ratio would be
used. If not, then the 16.0 default would be used.


Presumably you'd need to handle the case where the host is in multiple host 
aggregates that have "cpu_allocation_ratio" as a metadata item.  I think the 
AggregateCoreFilter uses the smallest ratio in this case.
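
For reference, the aggregate metadata in question is set something like this
(a sketch; "my-aggregate" is just a placeholder name):

openstack aggregate set --property cpu_allocation_ratio=4.0 my-aggregate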


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Openstack release compatibility

2017-11-02 Thread Chris Friesen

On 10/31/2017 01:13 AM, haad wrote:

Hi,

We have an OSA installation with 10-12 compute nodes running Mitaka on Ubuntu
16.04. As initially we have not prepared any long term update strategy we would
like to create one now. Plan would be to upgrade it to new OSA
release(Ocata/Pike/Queens) in near future.

Our original plan was to update management/networking/backend at once by using
rolling updates to the newer release and then upgrade compute nodes one by one to
the new release. I think that [2] provides a general upgrade manual. Is there any
document describing how compatible the different OSA releases are? Is there any
policy in place about backward compatibility?


As a general rule, OpenStack only supports an online upgrade of one version at a 
time.  That is, controller nodes running version N can talk to compute nodes 
running version N-1.
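
For nova specifically, the usual way to make that work during a rolling upgrade
is to pin the compute RPC version until all nodes are upgraded -- something like
this in nova.conf (a sketch; check the option against your release):

[upgrade_levels]
compute = auto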


If you can tolerate downtime of the API layer, there has been some discussion 
around "skip-level" upgrades.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Live migration failures

2017-11-02 Thread Chris Friesen

On 11/02/2017 08:48 AM, Mike Lowe wrote:

After moving from CentOS 7.3 to 7.4, I’ve had trouble getting live migration to 
work when a volume is attached.  As it turns out when a live migration takes 
place the libvirt driver rewrites portions of the xml definition for the 
destination hypervisor and gets it wrong.  Here is an example.


Did you change versions of OpenStack as well?

Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [ironic][nova][libvirt] Adding ironic to already-existing kvm deployment

2017-10-18 Thread Chris Friesen

On 10/18/2017 11:37 AM, Chris Apsey wrote:

All,

I'm working to add baremetal provisioning to an already-existing libvirt (kvm)
deployment.  I was under the impression that our currently-existing endpoints
that already run nova-conductor/nova-scheduler/etc. can be modified to support
both kvm and ironic, but after looking at the ironic installation guide
(https://docs.openstack.org/ironic/latest/install/configure-compute.html), this
doesn't appear to be the case.  Changes are made in the [default] section that
you obviously wouldn't want to apply to your virtual instances.

Given that information, it would appear that ironic requires that you create an
additional host to run nova-compute separately from your already-existing
compute nodes purely for the purpose of managing the ironic-nova integration,
which makes sense.


I think you could run nova-compute with a config file specified as part of the 
commandline.  From what I understand if you run it on the same host as the 
libvirt nova-compute you'd need to use a separate hostname for running the 
ironic nova-compute since nova uses the binary/hostname tuple to uniquely 
identify services in the DB.
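
Something like the following, as a rough sketch (untested; the file name and host
value are made up, and you'd still need the usual [ironic] auth settings):

# /etc/nova/nova-ironic.conf, layered on top of the normal nova.conf
[DEFAULT]
host = ironic-compute-1
compute_driver = ironic.IronicDriver

nova-compute --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-ironic.conf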


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-04 Thread Chris Friesen

On 10/03/2017 11:12 AM, Clint Byrum wrote:


My personal opinion is that rebuild is an anti-pattern for cloud, and
should be frozen and deprecated. It does nothing but complicate Nova
and present challenges for scaling.

That said, if it must stay as a feature, I don't think updating the
user_data should be a priority. At that point, you've basically created an
entirely new server, and you can already do that by creating an entirely
new server.


If you've got a whole heat stack with multiple resources, and you realize that 
you messed up one thing in the template and one of your servers has the wrong 
personality/user_data, it can be useful to be able to rebuild that one server 
without affecting anything else in the stack.  That's just a convenience though.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Experience with Cinder volumes as root disks?

2017-08-01 Thread Chris Friesen

On 08/01/2017 02:32 PM, Mike Lowe wrote:

Two things: first, "info" does not show how much disk is used; "du" does.  Second,
the semantics count: copy is different than clone and flatten.  Clone and flatten,
which should happen if you have things working correctly, are much faster than
copy.  If you are using copy then you may be limited by the number of management
ops in flight; this is a setting for more recent versions of Ceph.  I don't know
if copy skips zero byte objects, but clone and flatten certainly do.  You need to
be sure that you have the proper settings in nova.conf for discard/unmap as well
as using hw_scsi_model=virtio-scsi and hw_disk_bus=scsi in the image properties.
Once discard is working and you have the qemu guest agent running in your
instances you can force them to do an fstrim to reclaim space as an additional
benefit.



Just a heads-up...with virtio-scsi there is a bug where you cannot boot from 
volume and then attach another volume.


(The bug is 1702999, though it's possible the fix for 1686116 will address it in 
which case it'd be fixed in pike.)
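
For anyone wanting to try the settings Mike describes above, a rough sketch
(the image name is a placeholder, and the option names are from memory, so
double-check them):

# image properties for virtio-scsi
openstack image set --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi my-image

# nova.conf on the compute node, to pass discards through to the backend
[libvirt]
hw_disk_discard = unmap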


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Experience with Cinder volumes as root disks?

2017-08-01 Thread Chris Friesen

On 08/01/2017 08:50 AM, Kimball, Conrad wrote:


·Are other operators routinely booting onto Cinder volumes instead of ephemeral
storage?


It's up to the end-user, but yes.


·What has been your experience with this; any advice?


It works fine.  With Horizon you can do it in one step (select the image but 
tell it to boot from volume) but with the CLI I think you need two steps (make 
the volume from the image, then boot from the volume).  The extra steps are a 
moot point if you are booting programmatically (from a custom script or 
something like heat).
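
Roughly the two CLI steps (a sketch; flag names vary a bit between client
versions, and the names/sizes here are made up):

openstack volume create --image my-image --size 20 --bootable boot-vol
openstack server create --volume boot-vol --flavor m1.small my-server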


I think that generally speaking the default is to use ephemeral storage because 
it's:


a) cheaper
b) "cloudy" in that if anything goes wrong you just spin up another instance

On the other hand, booting from volume does allow for faster migrations since it 
avoids the need to transfer the boot disk contents as part of the migration.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] cannot attach new volume to an instance

2017-07-27 Thread Chris Friesen
For what it's worth, there's a patch series that went into master and is under 
review for backport to Ocata that might help:


https://review.openstack.org/#/q/topic:bug/1686116

Chris

On 07/27/2017 12:30 PM, Ignazio Cassano wrote:

The instance boots from a volume.
Attaching a second volume fails.
A guy suggested using hw_disk_bus=virtio and it works.
Regards

On 27 Jul 2017, 05:33 PM, "Chris Friesen" <chris.frie...@windriver.com> wrote:

On 07/27/2017 08:44 AM, Ignazio Cassano wrote:

Hello All,
instances created from images with the following metadata:

hw_disk_bus=scsi
hw_scsi_model=virtio-scsi

do not allow attaching a new volume.

Deleting the above metadata from the image and creating a new instance
allows new volume attachment.

I got the same behaviour on all openstack releases from liberty to 
ocata.
Regards
Ignazio


How does it fail?  Does it behave differently if you try to boot an instance
from such a volume?  (As opposed to attaching a volume to a running 
instance.)

Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] cannot attach new volume to an instance

2017-07-27 Thread Chris Friesen

On 07/27/2017 12:30 PM, Ignazio Cassano wrote:

The instance boots from a volume.
Attaching a second volume fails.
A guy suggested using hw_disk_bus=virtio and it works.


Okay...I think this is a known issue with virtio-scsi:

https://bugs.launchpad.net/nova/+bug/1702999

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] cannot attach new volume to an instance

2017-07-27 Thread Chris Friesen

On 07/27/2017 08:44 AM, Ignazio Cassano wrote:

Hello All,
instances created from images with the following metadata:

hw_disk_bus=scsi
hw_scsi_model=virtio-scsi

do not allow attaching a new volume.

Deleting the above metadata from the image and creating a new instance allows
new volume attachment.

I got the same behaviour on all openstack releases from liberty to ocata.
Regards
Ignazio


How does it fail?  Does it behave differently if you try to boot an instance 
from such a volume?  (As opposed to attaching a volume to a running instance.)


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [masakari][nova] Allow evacuation of instances in resized state

2017-07-19 Thread Chris Friesen

On 07/12/2017 06:57 PM, Jay Pipes wrote:

On 07/04/2017 05:21 AM, Kekane, Abhishek wrote:

Hi operators,

I want to know how evacuation of resized instances is handled in real
environments.
For example, if a VM is in the resized state and the compute host on which the
VM was resized goes down, how will the operator evacuate the VM?

One possible way is to reset that VM's state to error and then evacuate it to a
new compute host.
Please refer to the scenario below for reference:

Scenario:
=

Pre-conditions:

1. Config option allow_resize_to_same_host is False.
2. Instance path is not mounted on shared storage.
3. Three compute nodes: "compute node A", "compute node B" and "compute node C"

Steps:

1. Boot an instance on "compute node A".
2. User tries to resize the newly created instance and nova-scheduler selects
"compute node B" as the destination node for the resize.
In this case nova creates an instance directory on the destination "compute
node B" and marks the instance directory which is present on the source
"compute node A" as "*_resize".

Note that the resize operation is not yet confirmed and "compute node B" goes
down.

3. Reset instance state to ERROR as nova allows evacuation only if instance
state is 'ACTIVE', 'STOPPED' or 'ERROR'.


I don't understand why you would do this, Abhishek. Why not REVERT the resize
operation (which would clean up the _resize directory and files on the original
host A) and then try the resize again?


Sorry for the delayed reply (just got back from vacation).  If the dest compute 
node is dead, is it even possible to revert the resize?


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Obtaining nova settings at runtime

2017-06-14 Thread Chris Friesen

On 06/14/2017 10:31 AM, Matt Riedemann wrote:

On 6/14/2017 10:57 AM, Carlos Konstanski wrote:

Is there a way to obtain nova configuration settings at runtime without
resorting to SSHing onto the compute host and greping nova.conf? For
instance a CLI call? At the moment I'm looking at cpu_allocation_ratio
and ram_allocation_ratio. There may be more along this vein.

Alles Gute,
Carlos Konstanski

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



No there isn't.

For the specific case you're looking for though, you could use the Placement
REST API:

http://docs-draft.openstack.org/32/470932/5/check/gate-placement-api-ref-nv/771f604//placement-api-ref/build/html/?expanded=show-resource-provider-detail,show-resource-provider-inventory-detail


Using ^ you could get the VCPU resource class inventory allocation ratio for a
given resource provider, where an example of a resource provider is a compute 
node.

For a compute node resource provider, the uuid is the same as the
compute_nodes.uuid field and the name is the compute_nodes.hypervisor_hostname
field.


There's a small complication here...if AggregateRamFilter/AggregateCoreFilter 
scheduler filters are enabled the cpu_allocation_ratio and ram_allocation_ratio 
set in nova.conf on the compute nodes can be overridden by ratios set in the 
aggregates.
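
For reference, the placement lookup Matt describes would be something like this
(a sketch; the endpoint URL, token, and resource provider UUID are placeholders):

curl -s -H "X-Auth-Token: $TOKEN" \
  http://<placement-endpoint>/resource_providers/<rp-uuid>/inventories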


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] How to move on from live_migration_uri?

2017-06-07 Thread Chris Friesen

On 06/07/2017 09:20 AM, Matt Riedemann wrote:


What I'm trying to do is continue to use ssh as the scheme since that's what
devstack sets up. So I set live_migration_scheme=ssh.

Within the libvirt driver, it starts with a URL like this for qemu:

qemu+%s://%s/system

And does a string replace on that URL with (scheme, destination), which would
give us:

qemu+ssh://<dest>/system

The problem lies in the dest part. Devstack is trying to specify the username
for the ssh URI, so it wants "stack@%s" for the dest part. I tried setting
live_migration_inbound_addr="stack@%s" but that doesn't work because the driver
doesn't replace the dest on top of that again, so we just end up with this:

qemu+ssh://stack@%s/system

Is there some other way to be doing this? We could try to use tunneling but the
config option help text for live_migration_tunnelled makes that sound scary,
e.g. "Enable this option will definitely
impact performance massively." Holy crap Scoobs, let's scram!


In my testing tunneling really only makes a big difference if you've got 10G 
links, and even then it'll hit about 70% of the line rate for an unencrypted 
tunnel (with libvirtd running at ~70% cpu util).



Should we check if the scheme is ssh and try a final string replacement with the
destination host after we've already applied (live_migration_scheme,
live_migration_inbound_addr)?


Something like that seems reasonable.  It'd be nice if we didn't need an 
explicit username config option.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] preferred option to fix long-standing user-visible bug in nova?

2017-05-25 Thread Chris Friesen

On 05/25/2017 01:53 PM, Marc Heckmann wrote:

On Mon, 2017-05-15 at 11:46 -0600, Chris Friesen wrote:



What do operators think we should do?  I see two options, neither of
which is
really ideal:

1) Decide that the "new" behaviour has been out in the wild long
enough to
become the defacto standard and update the docs to reflect
this.  This breaks
the "None and 'prefer' are equivalent" model that was originally
intended.

2) Fix the bug to revert back to the original behaviour and backport
the fix to
Ocata.  Backporting to Newton might not happen since it's in phase
II
maintenance.  This could potentially break anyone that has come to
rely on the
"new" behaviour.


Whatever will or has been chosen should match the documentation.
Personally, we would never do anything other than specifying the policy
in the flavor as our flavors are associated w/ HW  profiles but I could
see how other operators might manage things differently. That being
said, that sort of thing should not necessarily be user controlled and
I haven't really explored Glance property protections..

So from my point of view "cpu_thread_policy" set in the flavor should
take precedence over anything else.


So a vote to keep the status quo and change the documentation to match?  (Since 
the current behaviour doesn't match the original documentation.)


Incidentally, it's allowed to be specified in an image because whether or not HT 
is desirable depends entirely on the application code.  It may be faster with 
"isolate", or it may be faster with "require" and double the vCPUs in the guest. 
 If the software in the guest is licensed per vCPU then "isolate" might make 
sense to maximize performance per licensing dollar.
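
(For reference, the flavor side of this looks roughly like the following; the
flavor name is a placeholder:

openstack flavor set --property hw:cpu_policy=dedicated --property hw:cpu_thread_policy=isolate my-flavor

and the image-side equivalent is the hw_cpu_thread_policy image property.)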


"prefer" is almost never a sensible choice for anything that cares about 
performance--it was always intended to be a way to represent "the behaviour that 
you get if you don't specify a cpu thread policy".


Oh, and I'd assume that a customer would be billed for the number of host cores 
actually used...so "isolate" with N vCPUs and "require" with 2*N vCPUs would end 
up costing the same.


Chris




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-24 Thread Chris Friesen

On 05/22/2017 01:55 PM, Jay Pipes wrote:

On 05/22/2017 03:53 PM, Jonathan Proulx wrote:

To be clear on my view of the whole proposal

most of the rescheduling that I've seen and want is of type "A", where the
claim exceeds resources.  At least I think they are type "A" and not
"C" unknown.

The exact case is that I oversubscribe RAM (1.5x); my users typically overclaim
so this is OK (my worst case is a hypervisor using only 10% of
claimed RAM).  But there are some hotspots where proportional
utilization is high, so libvirt won't start more VMs because it really
doesn't have the memory.

If that's solved (or will be at the time reschedule goes away), the
cases I've actually experienced would be solved.

The anti-affinity use cases are currently the most important to me of the
affinity scheduling, and I haven't (to my knowledge) seen collisions in
that direction.  So I could live with that race because for me it is
uncommon (though I imagine for others where positive affinity is
important the race may get lost more frequently).


Thanks for the feedback, Jon.

For the record, affinity really doesn't have much of a race condition at all.
It's really only anti-affinity that has much of a chance of last-minute 
violation.


Don't they have the same race on instance boot?  Two instances being started in 
the same (initially empty) affinity group could be scheduled in parallel and end 
up on different compute nodes.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] preferred option to fix long-standing user-visible bug in nova?

2017-05-15 Thread Chris Friesen

Hi,

In Mitaka nova introduced the "cpu_thread_policy" which can be specified in 
flavor extra-specs.  In the original spec, and in the original implementation, 
not specifying the thread policy in the flavor was supposed to be equivalent to 
specifying a policy of "prefer", and in both cases if the image set a policy 
then nova would use the image policy.


In Newton, the code was changed to fix a bug but there was an unforeseen side 
effect.  Now the behaviour is different depending on whether the flavor 
specifies no policy at all or specifies a policy of "prefer".   Specifically, if 
the flavor doesn't specify a policy at all and the image does then we'll use the 
image policy.  However, if the flavor specifies a policy of "prefer" and the 
image specifies a different policy then we'll use the flavor policy.


This is clearly a bug (tracked as part of bug #1687077), but it's now been out 
in the wild for two releases (Newton and Ocata).


What do operators think we should do?  I see two options, neither of which is 
really ideal:


1) Decide that the "new" behaviour has been out in the wild long enough to 
become the defacto standard and update the docs to reflect this.  This breaks 
the "None and 'prefer' are equivalent" model that was originally intended.


2) Fix the bug to revert back to the original behaviour and backport the fix to 
Ocata.  Backporting to Newton might not happen since it's in phase II 
maintenance.  This could potentially break anyone that has come to rely on the 
"new" behaviour.


Either change is trivial from a dev standpoint, so it's really an operator 
issue--what makes the most sense for operators/users?


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Expected behavior of disabled nova service

2017-05-08 Thread Chris Friesen
As I understand it, the behaviour *should* be that any active nova-conductor or 
nova-scheduler could possibly process any outstanding work item pulled from the 
RPC queue.  I don't think that nova-conductor and nova-scheduler need to be 
co-located.


I think you might have found a bug though...I'm not an expert in this area of 
the code, but I didn't see any checks for components other than nova-compute 
being disabled.


Chris


On 05/08/2017 04:54 AM, Masha Atakova wrote:

Hi everyone,

I have a setup with 2 controllers and 2 compute nodes. For testing purposes, I
want to make sure that when I send a request for launching a new instance, it's
being processed by a particular scheduler. For this I have several options:

1) ssh to controller with scheduler I don't want to use and power it down

2) disable scheduler using nova api

Option #2 seems much cleaner and effective to me (less time needed to get
service up and running again), but looks like I'm missing something very
important in how nova disables service.

My `nova service-list` gives me the following:

+----+------------------+-------------+----------+----------+-------+----------------------------+-----------------+
| Id | Binary           | Host        | Zone     | Status   | State | Updated_at                 | Disabled Reason |
+----+------------------+-------------+----------+----------+-------+----------------------------+-----------------+
| 25 | nova-consoleauth | controller1 | internal | enabled  | up    | 2017-05-08T08:46:09.00     | -               |
| 28 | nova-consoleauth | controller2 | internal | disabled | up    | 2017-05-08T08:46:10.00     | -               |
| 31 | nova-scheduler   | controller1 | internal | enabled  | up    | 2017-05-08T08:46:14.00     | -               |
| 34 | nova-scheduler   | controller2 | internal | disabled | up    | 2017-05-08T08:46:17.00     | Test            |
| 37 | nova-conductor   | controller1 | internal | enabled  | up    | 2017-05-08T08:46:13.00     | -               |
| 46 | nova-conductor   | controller2 | internal | disabled | up    | 2017-05-08T08:46:13.00     | Test            |
| 55 | nova-compute     | compute1    | nova     | enabled  | up    | 2017-05-08T08:46:10.00     | -               |
| 58 | nova-compute     | compute2    | nova     | enabled  | up    | 2017-05-08T08:46:16.00     | -               |
+----+------------------+-------------+----------+----------+-------+----------------------------+-----------------+


But when I run a request for a new instance (either from the python client, the
command line, or Horizon), I see in the logs that the nova-scheduler on
controller2 is working to process that request half of the time. The same
behavior as if I didn't disable it at all.

I've read some code and noticed that each nova-conductor is always paired up
with the scheduler on the same controller node, so I've disabled nova-conductor
as well on controller2. Which didn't change anything.

While I'm going through the code of nova-api, could you please help me to
understand if it's a correct behavior or a bug?

Thanks in advance for your time.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova

2017-03-23 Thread Chris Friesen

On 03/23/2017 11:01 AM, Jean-Philippe Methot wrote:


So basically, my question is, how does openstack actually manage ram allocation?
Will it ever take back the unused ram of a guest process? Can I force it to take
back that ram?


I don't think nova will automatically reclaim memory.

I'm pretty sure that if you have CONF.libvirt.mem_stats_period_seconds set 
(which it is by default) then you can manually tell libvirt to reclaim some 
memory via the "virsh setmem" command.
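
Something like this, as a sketch (the domain name is a placeholder; the size is
in KiB by default, so this shrinks the balloon to 4 GiB):

virsh setmem instance-0000abcd 4194304 --live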


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Flavors

2017-03-21 Thread Chris Friesen

On 03/20/2017 04:24 PM, Blair Bethwaite wrote:


For me an interesting question to know the answer to here would be at what point
you have to stop resource sharing to guarantee your performance promises/SLAs
(disregarding memory over-provisioning). My gut says that unless you are also
doing all the other high-end performance tuning (CPU & memory pinning​, NUMA
topology, hugepages, optimised networking such as macvtap or SRIOV, plus all the
regular host-side system/BIOS and power settings) you'll see very little
benefit, i.e., under-provisioning on its own is not a performance win.


Yes, that's a great question.  Someone should do a paper on it. :)

My suspicion would be that for a CPU-intensive workload you could probably gain 
quite a lot just by not overprovisioning your CPU.  The host scheduler should 
end up balancing the busy virtual CPU threads across the available host CPUs.


This will still result in periodic interruptions due to scheduler interruptions, 
RCU callbacks, etc.  To get to the next level where the guest has near-hardware 
performance requires a bunch more work with host CPU isolation, IRQ affinity, 
RCU offload, etc.


Once you get into a memory-bandwidth or network-bandwidth intensive workload 
then I agree you need to start caring about memory pinning, hugepages, NUMA 
topology, etc.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Flavors

2017-03-16 Thread Chris Friesen

On 03/16/2017 07:06 PM, Blair Bethwaite wrote:


Statement: breaks bin packing / have to match flavor dimensions to hardware
dimensions.
Comment: neither of these ring true to me given that most operators tend to
agree that memory is their first constraining resource dimension and it is
difficult to achieve high CPU utilisation before memory is exhausted. Plus
virtualisation is inherently about resource sharing and over-provisioning,
unless you have very detailed knowledge of your workloads a priori (or some
cycle-stealing/back-filling mechanism) you will always have under-utilisation
(possibly quite high on average) in some resource dimensions.


I think this would be highly dependent on the workload.  A virtual router is 
going to run out of CPU/network bandwidth far before memory is exhausted.


For similar reasons I'd disagree that virtualization is inherently about 
over-provisioning and suggest that (in some cases at least) it's more about 
flexibility over time.  Our customers generally care about maximizing 
performance and so nothing is over-provisioned...disk, NICs, CPUs, RAM are 
generally all exclusively allocated.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] query NUMA topology via API

2017-02-01 Thread Chris Friesen

On 02/01/2017 09:49 AM, Gustavo Randich wrote:

Hi, is there any way to query via Compute API the NUMA topology of a compute
node, and free ram/cpu of each NUMA cell?


Not that I know of, but might be a useful thing for the admin to have.

Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Removal of support for sparse LVM volumes from Nova

2017-01-31 Thread Chris Friesen

On 01/30/2017 09:48 AM, Matthew Booth wrote:

As noted here https://bugs.launchpad.net/mos/+bug/1591084 (a MOS bug, but also
confirmed upstream and in RHOS), this is broken. By the looks of it, it has
never worked and would require architectural changes to make it work.

I'm assuming this means that nobody could possibly be using it, and that nobody
will be sad if I remove all traces of it in its current form. If anybody is
using it, or knows of anybody using it, could you let me know? What workarounds
are you using?

Secondly, if it did work is anybody interested in this feature?


We don't care about sparse LVM, but we do use thinly-provisioned LVM in nova. 
This avoids the "zero out the volume all at once" cost in exchange for some 
additional overhead when allocating disk blocks.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Live-migration CPU doesn't have compatibility

2016-10-31 Thread Chris Friesen

On 10/27/2016 11:09 PM, William Josefsson wrote:

hi, I did 'virsh capabilities' on the Haswell, which turned out to
list model: Haswell-noTSX. So I set in nova.conf
cpu_model=Haswell-noTSX on both Haswell and Broadwell hosts and it
seems to work. I believe this is my smallest common denominator.


Almost certainly, yes.

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Live-migration CPU doesn't have compatibility

2016-10-27 Thread Chris Friesen

On 10/26/2016 06:07 AM, William Josefsson wrote:

Hi list,

I'm facing issues on Liberty/CentOS7 doing live migrations between two
hosts. The hosts are Haswell and Broadwell. However, there is nothing
feature-specific running on my VMs:

Haswell -> Broadwell works
Broadwell -> Haswell fails with the error below.


I have on both hosts configured
[libvirt]
cpu_mode=none

and restarted openstack-nova-compute on the hosts, however that didn't
help; I get the same error. Is there a way of ignoring this
check? Please advise. Thanks, Will


If you are using kvm/qemu and set cpu_mode=none, then it will use 'host-model', 
and any instances started on Broadwell can't be live-migrated onto Haswell.


In your case you probably want to set both computes to have:

[libvirt]
cpu_mode = custom
cpu_model = Haswell

This will cause nova to start guests with the "Haswell" model on both nodes.

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova resize on shared storage

2016-07-28 Thread Chris Friesen

On 07/28/2016 02:34 AM, Marcus Furlong wrote:

Hi,

I've been trying to find some information about using nova resize on
shared storage, without giving each compute node ssh access to every
other compute node. As the VM is on shared storage, the compute node
shouldn't need ssh access to another compute node?

Is this something that anyone has succeeded in doing?

I've found the following documentation:

http://docs.openstack.org/mitaka/config-reference/compute/resize.html
http://docs.openstack.org/user-guide/cli_change_the_size_of_your_server.html

but it does not say what to do in the case of shared storage.



If you have shared storage for all compute nodes you could modify the code to 
just hard-code the routines to always return that storage is shared.


With the stock code, it still wants to use ssh to determine whether or not the 
storage is actually shared.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [nova] Is verification of images in the image cache necessary?

2016-05-24 Thread Chris Friesen

On 05/24/2016 09:54 AM, Dan Smith wrote:

I like the idea of checking the md5 matches before each boot, as it
mirrors the check we do after downloading from glance. Its possible
thats very unlikely to spot anything that shouldn't already be worried
about by something else. It may just be my love of symmetry that makes
me like that idea?


IMHO, checking this at boot after we've already checked it on download
is not very useful. It supposes that the attacker was kind enough to
visit our system before an instance was booted and not after. If I have
rooted the system, it's far easier for me to show up after a bunch of
instances are booted and modify the base images (or even better, the
instance images themselves which are hard to validate from the host side).

I would also point out that if I'm going to root a compute node, the
first thing I'm going to do is disable the feature in nova-compute or in
some other way cripple it so it can't do its thing.


It was my impression we were trying to prevent bitrot, not defend against an 
attacker that has gained control over the compute node.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-24 Thread Chris Friesen

On 05/23/2016 08:46 PM, John Griffith wrote:



On Mon, May 23, 2016 at 8:32 AM, Ivan Kolodyazhny mailto:e...@e0ne.info>> wrote:

Hi developers and operators,
I would like to get any feedback from you about my idea before I'll start
work on spec.

In Nova, we've got max_concurrent_builds option [1] to set 'Maximum number
of instance builds to run concurrently' per each compute. There is no
equivalent Cinder.

Why do we need it for Cinder? IMO, it could help us to address following 
issues:

  * Creation of N volumes at the same time increases resource usage a lot
by the cinder-volume service. The image caching feature [2] could help us a bit
in the case when we create a volume from an image. But we still have to upload N
images to the volume backend at the same time.
  * Deletion of N volumes in parallel. Usually, it's not a very hard task for
Cinder, but if you have to delete 100+ volumes at once, you can hit
different issues with DB connections, CPU and memory usage. In the case of
LVM, it also could use the 'dd' command to clean up volumes.
  * It will be some kind of load balancing in HA mode: if a cinder-volume
process is busy with current operations, it will not catch the message from
RabbitMQ and another cinder-volume service will do it.
  * From the user's perspective, it seems that a better way is to create/delete N
volumes a bit slower than to fail after X volumes were created/deleted.


[1]

https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
[2]

https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html

Regards,
Ivan Kolodyazhny,
http://blog.e0ne.info/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

​Just curious about a couple things:  Is this attempting to solve a problem in
the actual Cinder Volume Service or is this trying to solve problems with
backends that can't keep up and deliver resources under heavy load?  I get the
copy-image to volume, that's a special case that certainly does impact Cinder
services and the Cinder node itself, but there's already throttling going on
there, at least in terms of IO allowed.

Also, I'm curious... would the existing API Rate Limit configuration achieve the
same sort of thing you want to do here?  Granted it's not selective but maybe
it's worth mentioning.

If we did do something like this I would like to see it implemented as a driver
config; but that wouldn't help if the problem lies in the Rabbit or RPC space.
That brings me back to wondering about exactly where we want to solve problems
and exactly which.  If delete is causing problems like you describe I'd suspect
we have an issue in our DB code (too many calls to start with) and that we've
got some overhead elsewhere that should be eradicated.  Delete is a super simple
operation on the Cinder side of things (and most back ends) so I'm a bit freaked
out thinking that it's taxing resources heavily.


For what it's worth, with the LVM backend under heavy load we've run into cases 
where cinder-volume ends up being blocked by disk I/O for over a minute.


Now this was pretty much a worst-case, with cinder volumes on a single spinning 
disk.  But the fact that IO cgroups don't work with LVM (this is a linux kernel 
limitation) means that it's difficult to ensure that the cinder process doesn't 
block indefinitely on disk IO.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] User Survey usage of QEMU (as opposed to KVM) ?

2016-05-11 Thread Chris Friesen

On 05/11/2016 01:29 PM, Robert Starmer wrote:

I don't disagree, what we're really getting at is that any lookup (ask the
system what it's using on a particular instance, look at the config, look at the
output of a nova CLI request, query via Horizon) should all return the same
answer.  So one is a bug (Horizon), the other requires looking up information in
the system itself.  As I suggested, the config is one path, and I still believe
will provide the current correct answer for the hypervisor node (Linux QEMU/KVM
or QEMU/QEMU) regardless of other issues, and the Horizon path is a bug that
should be fixed.


I think the problem is poor modeling.  We specify "virt_type", but export the 
hypervisor.


The "virt_type" option does not have a 1:1 mapping to hypervisor.  Both kvm and 
qemu will use the "qemu" hypervisor but kvm will enable hardware acceleration.


Perhaps we should change it to export "virt_type" instead via a microversion.

Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] User Survey usage of QEMU (as opposed to KVM) ?

2016-05-11 Thread Chris Friesen

On 05/11/2016 11:46 AM, Ronald Bradford wrote:

I have been curious as to why as mentioned in the thread virt_type=kvm, but
os-hypervisors API call states QEMU.


Arguably in both cases the hypervisor is qemu.  When virt_type=kvm we simply 
enable some additional acceleration.


So rather than asking "Are you using qemu or kvm?", it would be more accurate to 
ask "Are you using hardware-accelerated qemu or just software emulation?".


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][scheduler] please review nova sched logging proposal

2016-05-03 Thread Chris Friesen

Hi all,

There's a proposal for improving the nova scheduler logs up at 
https://review.openstack.org/#/c/306647/


If you would like to be able to more easily determine why no valid host was 
found, please review the proposal and leave feedback.


Thanks,
Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Upstream linux kernel bug if CONFIG_VIRT_CPU_ACCOUNTING_GEN enabled in guest

2016-03-06 Thread Chris Friesen

Hi all,

Just thought I'd mention that if anyone has been seeing odd idle/system/user 
results in /proc/stat or "top" in a guest with CONFIG_VIRT_CPU_ACCOUNTING_GEN 
enabled (it's automatically selected by CONFIG_NO_HZ_FULL) that it's not your 
imagination or anything you did wrong.


I recently discovered a bug in the kernel which triggers in the above scenario. 
 It appears to have been introduced in kernel version 3.15.  The proposed patch 
is available at:


http://www.spinics.net/lists/kernel/msg2205350.html

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] libvirt cpu type per instance?

2016-03-03 Thread Chris Friesen

On 03/03/2016 03:20 PM, Kris G. Lindgren wrote:

I would be curious if specifying the cpu type would actually restrict
performance.  As far as I know, this only restricts the cpu features presented
to a vm.  You can present a vm that has the cpu instruction sets of a Pentium 3
– but runs and is as performant as a single core on a 2.8ghz hexcore cpu.


This is my understanding as well.  We're not simulating the performance of a CPU 
model, just the instruction set and exposed features.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] cinder volume_clear=zero makes sense with rbd ?

2015-11-04 Thread Chris Friesen

On 11/04/2015 08:46 AM, Saverio Proto wrote:

Hello there,

I am using cinder with rbd, and most volumes are created from glance
images on rbd as well.
Because of ceph features, these volumes are CoW and only blocks
different from the original parent image are really written.

Today I am debugging why in my production system deleting cinder
volumes gets very slow. Looks like the problem happens only at scale,
I can't reproduce it on my small test cluster.

I read all the cinder.conf reference, and I found this default value
=> volume_clear=zero.

Is this parameter evaluated when cinder works with rbd ?


I don't think that's actually used with rbd, since as you say Ceph uses CoW 
internally.


I believe it's also ignored if you use LVM with thin provisioning.
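
For the LVM case, that's controlled by something like this in cinder.conf (a
sketch; "lvm-1" is a placeholder backend section name):

[lvm-1]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
lvm_type = thin
# volume_clear only matters for thick LVM; 'none' skips the dd-with-zeros pass
volume_clear = none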

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] how to deal with always-growing notifications.info queue?

2015-09-30 Thread Chris Friesen

Hi,

We've recently run into an issue where the notifications.info rabbitmq queue is 
perpetually growing, ultimately consuming significant amounts of memory.


How do others deal with this?  Do you always have a consumer draining the queue?
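
One option we're considering (a sketch; the vhost, TTL, and length values are
made up) is to cap the queue with a RabbitMQ policy, or alternatively to point
the notification driver at a no-op backend:

rabbitmqctl set_policy -p / --apply-to queues notifications-ttl "^notifications\." \
  '{"message-ttl":600000,"max-length":10000}'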

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Migrating an instance to a host with less cores fails

2015-09-25 Thread Chris Friesen

On 09/25/2015 12:34 PM, Steve Gordon wrote:


Nikola's reply got bounced because he isn't subscribed, but:

"""
Thanks Steve!

So the below is likely the same root cause as this bug:

https://launchpad.net/bugs/1461777

Which has been fixed in Liberty and backported to stable/kilo (see
https://review.openstack.org/#/c/191594/)

Updating your lab to the latest stable/kilo release (2015.1.1) will
likely fix the problem for you.

Let me know if this helps!


My concern with that patch is that it only applies if there is no numa topology 
for the instance.  I think it's possible to specify hugepages but not CPU 
pinning, in which case there would still be a numa topology for the instance, it 
just wouldn't have pinned CPUs in it.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Migrating an instance to a host with less cores fails

2015-09-25 Thread Chris Friesen
This is a long-standing issue.  Nikola has been working on it in Liberty for the 
CPU pinning case, not sure about the non-pinned case.  And of course patching 
back to Kilo hasn't been done yet.


Aubrey, what you're seeing is definitely a bug.  There is an existing bug 
https://bugs.launchpad.net/nova/+bug/1417667 but that is specifically for 
dedicated CPUs which doesn't apply in your case.  Please feel free to open a new 
bug.


Chris

On 09/25/2015 12:16 PM, Kris G. Lindgren wrote:

I believe TWC - (medberry on irc) was lamenting to me about cpusets, different 
hypervisors HW configs, and unassigned vcpu's in numa nodes.

The problem is the migration does not re-define the domain.xml, specifically, 
the vcpu mapping to match what makes sense on the new host.  I believe the 
issue is more pronounced when you go from a compute node with more cores to a 
compute node with less cores. I believe the opposite migration works, just the 
vcpu/numa nodes are all wrong.

CC'ing him as well.
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy







On 9/25/15, 11:53 AM, "Steve Gordon"  wrote:


Adding Nikola as he has been working on this.

- Original Message -

From: "Aubrey Wells" 
To: openstack-operators@lists.openstack.org

Greetings,
Trying to decide if this is a bug or just a config option that I can't
find. The setup I'm currently testing in my lab with is two compute nodes
running Kilo, one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c
+ HT). I don't have any CPU pinning enabled in my nova config, which seems
to have the effect of setting in libvirt.xml a vcpu cpuset element like (if
created on the 40c node):

<vcpu cpuset="0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38">1</vcpu>

And then if I migrate that instance to the 16c node, it will bomb out with
an exception:

Live Migration failure: Invalid value
'0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus':
Invalid argument

Which makes sense, since that node doesn't have any vcpus after 15 (0-15).

I can fix the symptom by commenting out a line in
nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
cpuset and thus doesn't write that line to libvirt.xml:
# vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))

And the instance will happily migrate to the host with fewer CPUs, but this
loses some of the benefit of openstack trying to evenly spread out core usage
on the host (at least, that's what I think the purpose of that cpuset is).

I'd rather fix it the right way if there's a config option I'm not seeing, or
file a bug if it's a bug.

What I think should be happening is that when nova creates the libvirt
definition on the destination compute node, it writes out a cpuset that is
correct for the hardware the instance is landing on.
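
For example, if the instance lands on the 16c node I'd expect the element to be
rewritten to stay within that host's CPUs, something in the spirit of the
following (illustrative only, not necessarily the exact spread nova would pick):

  <vcpu cpuset="0-15">1</vcpu>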

If it matters, in my nova-compute.conf file, I also have cpu mode and model
defined to allow me to migrate between the two different architectures to
begin with (the 40c is Sandybridge and the 16c is Westmere so I set it to
the lowest common denominator of Westmere):

cpu_mode=custom
cpu_model=Westmere

Any help is appreciated.

-
Aubrey

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



--
Steve Gordon, RHCE
Sr. Technical Product Manager,
Red Hat Enterprise Linux OpenStack Platform

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Rate limit an max_count

2015-09-10 Thread Chris Friesen

On 09/10/2015 08:11 AM, Matt Fischer wrote:

While I think there is probably some value in rate limiting API calls, I think
your "user wants to launch x000 instances" is extremely limited. There's maybe 1
or 2 (or 0) operators that have that amount of spare capacity just sitting
around that they can allow a user to have a quota of 2000 instances without
doing an infrastructure build-out.


At the default CPU overcommit of 16x, 2000 instances at, say, 2 cores/instance 
works out to only 250 physical cores.
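
(Spelled out: 2000 instances x 2 vCPUs = 4000 vCPUs, and 4000 / 16 = 250 
physical cores.)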


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Rate limit an max_count

2015-09-09 Thread Chris Friesen

On 09/09/2015 10:36 AM, David Medberry wrote:

Your users should also have reasonable quotas set. If they can boot thousands of
instances, you may have a quota issue to address. (No problem with the blueprint
or need to set an overall limit though--just that you should be able to address
this without waiting for that to land.)


I'm pretty sure if I paid for a suitable Rackspace account they would let me 
boot up thousands of instances as long as I could pay for them...


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Puzzling issue: Unacceptable CPU info: CPU doesn't have compatibility

2015-07-17 Thread Chris Friesen

On 07/17/2015 07:28 AM, Daniel P. Berrange wrote:

On Fri, Jul 17, 2015 at 06:58:46AM -0600, David Medberry wrote:

Hi Daniel,

Yep, found all that out.

Now I'm struggling through the NUMA mismatch (NUMA because there are two CPU
sockets). The old CPUs were 10-core/20-thread, so with two sockets that's 40
"cpus": {0-9,20-29} on one cell and {10-19,30-39} on the other. The new CPUs
are 12-core/24-thread. Apparently even in kilo this results in a mismatch if
I'm running a 2-VCPU guest and trying to migrate from new to old. I suspect I
have to disable NUMA somehow (filter, etc.) but it is entirely non-obvious. And
of course I'm doing this again in OpenStack nova (not direct libvirt), so I'm
going to do a bit more research and then file a new bug. This may also be
fixed in Kilo but I'm not finding it (and it may be fixed in Liberty already
and just needs a backport).

My apologies for not following up to the list once I found the Kilo
solution to the original problem.


The fact that Nova doesn't rewrite the NUMA topology on migrate is a known
bug which Nikola is working on fixing in Liberty. IIRC, you ought to be
able to avoid it by just disabling the NUMA scheduler filter.


I wonder if this is due to the "isolate an instance on a numa node" work 
combined with the fact that numa topology doesn't get rewritten.  If so, it 
might be "fixed" by commit 41ba203 on stable/kilo.


If the guest is using dedicated CPUs then there's no fix, it's just broken for 
live/cold migration, resize, and evacuate.


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] What to do when a compute node dies?

2015-03-30 Thread Chris Friesen

On 03/30/2015 09:53 PM, Jay Pipes wrote:

On 03/30/2015 07:30 PM, Chris Friesen wrote:

On 03/30/2015 04:57 PM, Jay Pipes wrote:

On 03/30/2015 06:42 PM, Chris Friesen wrote:

On 03/30/2015 02:47 PM, Jay Pipes wrote:

On 03/30/2015 10:42 AM, Chris Friesen wrote:

On 03/29/2015 09:26 PM, Mike Dorman wrote:

Hi all,

I’m curious about how people deal with failures of compute
 nodes, as in total failure when the box is gone for good.
 (Mainly care about KVM HV, but also interested in more
general cases as well.)

The particular situation we’re looking at: how end users
could identify or be notified of VMs that no longer exist,
because their hypervisor is dead.  As I understand it, Nova
will still believe VMs are running, and really has no way
to know anything has changed (other than the nova-compute
instance has dropped off.)

I understand failure detection is a tricky thing.  But it
seems like there must be something a little better than
this.


This is a timely question...I was wondering if it might make
 sense to upstream one of the changes we've made locally.

We have an external entity monitoring the health of compute
nodes. When one of them goes down we automatically take
action regarding the instances that had been running on it.

Normally nova won't let you evacuate an instance until the
compute node is detected as "down", but that takes 60 sec
typically and our software knows the compute node is gone
within a few seconds.


Any external monitoring solution that detects the compute node
is "down" could issue a call to `nova evacuate $HOST`.

The question I have for you is what does your software
consider as a "downed" node? Is it some heartbeat-type stuff in
network connectivity? A watchdog in KVM? Some proactive
monitoring of disk or memory faults? Some combination?
Something entirely different? :)


Combination of the above.  A local entity monitors "critical
stuff" on the compute node, and heartbeats with a control node
via one or more network links.


OK.


The change we made was to patch nova to allow the health
monitor to explicitly tell nova that the node is to be
considered "down" (so that instances can be evacuated
without delay).


Why was it necessary to modify Nova for this? The external
monitoring script could easily do: `nova service-disable $HOST
 nova-compute` and that immediately takes the compute node out
 of service and enables evacuation.


Disabling the service is not sufficient.
compute.api.API.evacuate() throws an exception if
servicegroup.api.API.service_is_up(service) is true.


servicegroup.api.service_is_up() returns whether the service has
been disabled in the database (when using the DB servicegroup
driver). Which is what `nova service-disable $HOST nova-compute`
does.


I must be missing something.

It seems to me that servicegroup.drivers.db.DbDriver.is_up() returns
whether the database row for the service has been updated for any
reason within the last 60 seconds. (Assuming the default
CONF.service_down_time.)

Incidentally, I've proposed https://review.openstack.org/163060 to
change that logic so that it returns whether the service has sent in
a status report in the last 60 seconds.  (As it stands currently if
you disable/enable a "down" service it'll report that the service is
"up" for the next 60 seconds.)


What servicegroup driver are you using?


The DB driver.


You've hit upon a bug. In no way should a disabled service be considered
"up". Apologies. I checked the code and indeed, there is no test for
whether the service record from the DB is disabled or not.


I don't think it's a bug.  It makes sense to have the administrative state 
(enabled/disabled) tracked separately from the operational state (up/down).


If we administratively disable a compute node, that just means that the 
scheduler won't put new instances on it.  It doesn't do anything to the 
instances already there.  It's up to something outside of nova (the admin user, 
or some orchestration software) to move them elsewhere if appropriate.


It actually makes sense to only allow evacuating from an operationally down 
compute node, because if the compute node is operationally up (even if 
administratively disabled) then you could do a migration (live or cold) which 
would be cleaner than an evacuate.  The evacuate code assumes the instance isn't 
currently running, and that assumption is only true if the compute node is 
operationally down.
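
In CLI terms the two situations look roughly like this (host and instance names
are placeholders):

  # administratively disabled, node still up: instances keep running,
  # so move them off cleanly
  nova service-disable compute-1 nova-compute
  nova live-migration <instance> compute-2

  # operationally down: the instance isn't running any more, rebuild it elsewhere
  nova evacuate <instance> compute-2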


The only issue I see with the current code is that it's possible for some other 
code (the external monitor) to know more quickly than nova that a compute node 
should be considered "down".


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] What to do when a compute node dies?

2015-03-30 Thread Chris Friesen

On 03/30/2015 04:57 PM, Jay Pipes wrote:

On 03/30/2015 06:42 PM, Chris Friesen wrote:

On 03/30/2015 02:47 PM, Jay Pipes wrote:

On 03/30/2015 10:42 AM, Chris Friesen wrote:

On 03/29/2015 09:26 PM, Mike Dorman wrote:

Hi all,

I’m curious about how people deal with failures of compute
nodes, as in total failure when the box is gone for good.
(Mainly care about KVM HV, but also interested in more general
cases as well.)

The particular situation we’re looking at: how end users could
identify or be notified of VMs that no longer exist, because
their hypervisor is dead.  As I understand it, Nova will still
believe VMs are running, and really has no way to know anything
has changed (other than the nova-compute instance has dropped
off.)

I understand failure detection is a tricky thing.  But it
seems like there must be something a little better than this.


This is a timely question...I was wondering if it might make
sense to upstream one of the changes we've made locally.

We have an external entity monitoring the health of compute
nodes. When one of them goes down we automatically take action
regarding the instances that had been running on it.

Normally nova won't let you evacuate an instance until the
compute node is detected as "down", but that takes 60 sec
typically and our software knows the compute node is gone within
a few seconds.


Any external monitoring solution that detects the compute node is
"down" could issue a call to `nova evacuate $HOST`.

The question I have for you is what does your software consider as
a "downed" node? Is it some heartbeat-type stuff in network
connectivity? A watchdog in KVM? Some proactive monitoring of disk
or memory faults? Some combination? Something entirely different?
:)


Combination of the above.  A local entity monitors "critical stuff"
on the compute node, and heartbeats with a control node via one or
more network links.


OK.


The change we made was to patch nova to allow the health monitor
to explicitly tell nova that the node is to be considered "down"
(so that instances can be evacuated without delay).


Why was it necessary to modify Nova for this? The external
monitoring script could easily do: `nova service-disable $HOST
nova-compute` and that immediately takes the compute node out of
service and enables evacuation.


Disabling the service is not sufficient.  compute.api.API.evacuate()
 throws an exception if servicegroup.api.API.service_is_up(service)
is true.


servicegroup.api.service_is_up() returns whether the service has been disabled
in the database (when using the DB servicegroup driver). Which is what `nova
service-disable $HOST nova-compute` does.


I must be missing something.

It seems to me that servicegroup.drivers.db.DbDriver.is_up() returns whether the 
database row for the service has been updated for any reason within the last 60 
seconds. (Assuming the default CONF.service_down_time.)


Incidentally, I've proposed https://review.openstack.org/163060 to change that 
logic so that it returns whether the service has sent in a status report in the 
last 60 seconds.  (As it stands currently if you disable/enable a "down" service 
it'll report that the service is "up" for the next 60 seconds.)



What servicegroup driver are you using?


The DB driver.

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] What to do when a compute node dies?

2015-03-30 Thread Chris Friesen

On 03/30/2015 02:47 PM, Jay Pipes wrote:

On 03/30/2015 10:42 AM, Chris Friesen wrote:

On 03/29/2015 09:26 PM, Mike Dorman wrote:

Hi all,

I’m curious about how people deal with failures of compute nodes,
as in total failure when the box is gone for good.  (Mainly care
about KVM HV, but also interested in more general cases as well.)

The particular situation we’re looking at: how end users could
identify or be notified of VMs that no longer exist, because their
hypervisor is dead.  As I understand it, Nova will still believe
VMs are running, and really has no way to know anything has changed
(other than the nova-compute instance has dropped off.)

I understand failure detection is a tricky thing.  But it seems
like there must be something a little better than this.


This is a timely question...I was wondering if it might make sense to
upstream one of the changes we've made locally.

We have an external entity monitoring the health of compute nodes.
When one of them goes down we automatically take action regarding the
instances that had been running on it.

Normally nova won't let you evacuate an instance until the compute
node is detected as "down", but that takes 60 sec typically and our
software knows the compute node is gone within a few seconds.


Any external monitoring solution that detects the compute node is "down" could
issue a call to `nova evacuate $HOST`.

The question I have for you is what does your software consider as a "downed"
node? Is it some heartbeat-type stuff in network connectivity? A watchdog in
KVM? Some proactive monitoring of disk or memory faults? Some combination?
Something entirely different? :)


Combination of the above.  A local entity monitors "critical stuff" on the 
compute node, and heartbeats with a control node via one or more network links.



The change we made was to patch nova to allow the health monitor to
explicitly tell nova that the node is to be considered "down" (so
that instances can be evacuated without delay).


Why was it necessary to modify Nova for this? The external monitoring script
could easily do: `nova service-disable $HOST nova-compute` and that immediately
takes the compute node out of service and enables evacuation.


Disabling the service is not sufficient.  compute.api.API.evacuate() throws an 
exception if servicegroup.api.API.service_is_up(service) is true.



When the external monitoring entity detects that the compute node is back, it
tells nova the node may be considered "up" (if nova agrees that it's "up").


You mean `nova service-disable $HOST nova-compute`?


Is this ability to tell nova that a compute node is "down" something
 that would be of interest to others?


Unless I'm mistaken, `nova service-disable $HOST nova-compute` already exists
and does this?


No, what we have is basically a way to cause 
servicegroup.api.API.service_is_up() to return false.  That causes the correct 
status to be displayed in the "State" column in the output of "nova 
service-list" and allows evacuation to proceed.


Chris


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] What to do when a compute node dies?

2015-03-30 Thread Chris Friesen

On 03/29/2015 09:26 PM, Mike Dorman wrote:

Hi all,

I’m curious about how people deal with failures of compute nodes, as in total
failure when the box is gone for good.  (Mainly care about KVM HV, but also
interested in more general cases as well.)

The particular situation we’re looking at: how end users could identify or be
notified of VMs that no longer exist, because their hypervisor is dead.  As I
understand it, Nova will still believe VMs are running, and really has no way to
know anything has changed (other than the nova-compute instance has dropped 
off.)

I understand failure detection is a tricky thing.  But it seems like there must
be something a little better than this.


This is a timely question...I was wondering if it might make sense to upstream 
one of the changes we've made locally.


We have an external entity monitoring the health of compute nodes.  When one of 
them goes down we automatically take action regarding the instances that had 
been running on it.


Normally nova won't let you evacuate an instance until the compute node is 
detected as "down", but that takes 60 sec typically and our software knows the 
compute node is gone within a few seconds.


The change we made was to patch nova to allow the health monitor to explicitly 
tell nova that the node is to be considered "down" (so that instances can be 
evacuated without delay).  When the external monitoring entity detects that the 
compute node is back, it tells nova the node may be considered "up" (if nova 
agrees that it's "up").


Is this ability to tell nova that a compute node is "down" something that would 
be of interest to others?


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Hyper-converged OpenStack with Ceph

2015-03-19 Thread Chris Friesen

On 03/19/2015 10:33 AM, Fox, Kevin M wrote:

We're running it both ways. We have clouds with dedicated storage nodes, and
clouds sharing storage/compute.

The storage/compute solution with ceph is working OK for us, but that
particular cloud is 1-gigabit only and seems very slow compared to our other
clouds, which are 40-gigabit. Because of the slower interconnect, it's not
clear if it's slow due to running storage and compute together or simply
because of the 1-gigabit network. Could be some of both.


Are you doing anything to isolate the ceph work from the nova compute work? 
(Separate NICs, separate CPUs, different disks, etc.)


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] rbd ephemeral storage, very slow deleting...

2014-09-24 Thread Chris Friesen

On 09/24/2014 02:43 PM, Scott Kohler wrote:

On 09/24/2014 04:29 PM, Abel Lopez wrote:

This is expected behavior, unfortunately.
I spoke to the ceph guys about this last year. When you delete an ‘image’ from 
a pool, the monitors (IIRC) don’t instantly know where all the segments are 
across all the OSDs, so it takes a while to find/delete each one.



We also saw this, and were especially alarmed at the high CPU load on
deletion. So we came up with a work-around that also makes for very fast
instance creation as well as deletion, if you are using Ceph:

Create a volume from the image first, then create an instance from that
bootable volume. Volume creation/deletion is quite fast, with the downside of
being a two-step process. We have a CLI script that has an instance up and
ssh-able in less than 4 minutes, including volume creation, booting, and
SELinux recalc on first boot. Deletes take about 30 seconds. FWIW.
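
In CLI terms that two-step workflow is something like the following (names,
sizes and flavors are placeholders):

  cinder create --image-id <glance-image-uuid> --display-name bootvol 20
  nova boot --flavor m1.small \
    --block-device source=volume,id=<volume-uuid>,dest=volume,bootindex=0 \
    myinstance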


Presumably the resource reclamation time is affected by the volume_clear 
setting and volume size?  I'm guessing it'd take longer than 30 seconds 
to zero out a large volume.
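
The knobs I have in mind are the LVM-backend ones, i.e. something like the
following in cinder.conf (the values are just examples):

  [DEFAULT]
  volume_clear = zero        # or 'none' / 'shred'
  volume_clear_size = 0      # 0 means wipe the whole volume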


Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] anyone using RabbitMQ with active/active mirrored queues?

2014-09-11 Thread Chris Friesen

On 09/11/2014 01:50 PM, James Dempsey wrote:

On 12/09/14 04:15, Chris Friesen wrote:

Hi,

The OpenStack high availability guide seems to be a bit ambiguous about
whether RabbitMQ should be configured active/standby or
active/active...both methods are described.

Has anyone tried using active/active with mirrored queues as recommended
by the RabbitMQ developers?  If so, what problems did you run into?

Thanks,
Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Hi Chris,

We do Active/Active RabbitMQ with mirrored queues and Precise/Havana.
We faced a lot of failover problems where clients weren't figuring out
that their connections were dead.  The TCP keepalives mentioned in the
following bug seemed to help a lot.
https://bugs.launchpad.net/nova/+bug/856764  The moral of our story is
to make sure you are monitoring the sanity of your agents.


We had client-side issues on failovers as well.  Keepalives seemed to 
help, but we also ported the following patch to all the applicable 
clients (since they're not using oslo.messaging yet).


https://github.com/openstack/oslo.messaging/commit/0400cbf4f83cf8d58076c7e65e08a156ec3508a8
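
For reference, the TCP keepalive tuning mentioned above is typically done via
the kernel sysctls on the client hosts; these values are only illustrative, not
recommendations:

  net.ipv4.tcp_keepalive_time = 30
  net.ipv4.tcp_keepalive_intvl = 5
  net.ipv4.tcp_keepalive_probes = 5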

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] anyone using RabbitMQ with active/active mirrored queues?

2014-09-11 Thread Chris Friesen

Hi,

The OpenStack high availability guide seems to be a bit ambiguous about 
whether RabbitMQ should be configured active/standby or 
active/active...both methods are described.


Has anyone tried using active/active with mirrored queues as recommended 
by the RabbitMQ developers?  If so, what problems did you run into?
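
For concreteness, by "mirrored queues" I mean the classic ha-mode policy setup,
something like the following (the policy name and pattern are just examples):

  rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'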


Thanks,
Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators