[Openstack-operators] Live-migration experiences?

2018-08-06 Thread Clint Byrum
Hello! At GoDaddy, we're about to start experimenting with live 
migration. While setting it up, we've found a number of options that 
seem attractive/useful, but we're wondering if anyone has data/anecdotes 
about specific configurations of live migration. Your time in reading 
them is appreciated!


First a few facts about our installation:

* We're using kolla-ansible and basically leaving most nova settings at 
the default, meaning libvirt+kvm
* We will be using block migration, as we have no shared storage of any 
kind.
* We use routed networks to set up L2 segments per-rack. Each rack is 
basically an island unto itself. The VMs on one rack cannot be migrated 
to another rack because of this.
* Our main resource limitation is disk, followed closely by RAM. As 
such, our main motivation for wanting to do live migration is to be able 
to move VMs off of machines where over-subscribed disk users start to 
threaten the free space of the others.


Now, some things we'd love your help with:

* TLS for libvirt - We do not want to transfer the contents of VMs' RAM 
over unencrypted sockets. We want to set up TLS with an internal CA and 
tls_allowed_dn_list controlling access. Has anyone reading this used 
this setup? Do you have suggestions, reservations, or encouragement for 
us wanting to do it this way?
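
For concreteness, here is roughly the shape of the config we think this
takes; the key/cert paths are libvirt's defaults and the DNs are
placeholders, so please treat this as a sketch rather than a tested recipe:

    # /etc/libvirt/libvirtd.conf on each compute node (libvirtd must be
    # started in listening mode for listen_tls to take effect)
    listen_tls = 1
    key_file  = "/etc/pki/libvirt/private/serverkey.pem"
    cert_file = "/etc/pki/libvirt/servercert.pem"
    ca_file   = "/etc/pki/CA/cacert.pem"
    tls_allowed_dn_list = ["C=US,O=Example,OU=Compute,CN=compute-01.example.com",
                           "C=US,O=Example,OU=Compute,CN=compute-02.example.com"]

    # nova.conf, [libvirt] section, so nova builds a qemu+tls:// migration
    # URI (option name as we understand it for recent releases)
    live_migration_scheme = tls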


* Raw backed qcow2 files - Our instances use qcow2, and our images are 
uploaded as a raw-backed qcow2. As a result we get maximum disk savings 
with excellent read performance. When live migrating these around, have 
you found that they continue to use the same space on the target node as 
they did on the source? If not, did you find a workaround?
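
For reference, the way we have been checking actual allocation on a compute
node looks roughly like this; the instance path is the libvirt driver's
default and may sit elsewhere under kolla:

    # "disk size" is the real allocation, "virtual size" the full size,
    # and "backing file" should point at the raw base image
    qemu-img info /var/lib/nova/instances/<instance-uuid>/disk

    # comparing apparent vs. actual size shows whether sparseness survived
    du -h --apparent-size /var/lib/nova/instances/<instance-uuid>/disk
    du -h /var/lib/nova/instances/<instance-uuid>/disk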


* Do people have feedback on live_migration_permit_auto_converge? It 
seems like a reasonable trade-off, but since it defaults to false, I 
wonder if there are some hidden gotchas there.
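
For reference, the knobs we think we would be touching live in the [libvirt]
section of nova.conf; a sketch with the option names as we understand them
(corrections welcome if they have moved in newer releases):

    [libvirt]
    # defaults to false; lets QEMU throttle busy guests so migration converges
    live_migration_permit_auto_converge = true
    # related option we are leaving alone for now
    live_migration_permit_post_copy = false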


* General pointers to excellent guides, white papers, etc, that might 
help us avoid doing all of our learning via trial/error.


Thanks very much for your time!



Re: [Openstack-operators] Mixed service version CI testing

2017-12-28 Thread Clint Byrum
Excerpts from Matt Riedemann's message of 2017-12-19 09:58:34 -0600:
> During discussion in the TC channel today [1], we got talking about how 
> there is a perception that you must upgrade all of the services together 
> for anything to work, at least the 'core' services like 
> keystone/nova/cinder/neutron/glance - although maybe that's really just 
> nova/cinder/neutron?
> 
> Anyway, I posit that the services are not as tightly coupled as some 
> people assume they are, at least not since kilo era when microversions 
> started happening in nova.
> 
> However, with the way we do CI testing, and release everything together, 
> the perception is there that all things must go together to work.
> 
> In our current upgrade job, we upgrade everything to N except the 
> nova-compute service, that remains at N-1 to test rolling upgrades of 
> your computes and to make sure guests are unaffected by the upgrade of 
> the control plane.
> 
> I asked if it would be valuable to our users (mostly ops for this 
> right?) if we had an upgrade job where everything *except* nova were 
> upgraded. If that's how the majority of people are doing upgrades anyway 
> it seems we should make sure that works.
> 
> I figure leaving nova at N-1 makes more sense because nova depends on 
> the other services (keystone/glance/cinder/neutron) and is likely the 
> harder / slower upgrade if you're going to do rolling upgrades of your 
> compute nodes.
> 
> This type of job would not run on nova changes on the master branch, 
> since those changes would not be exercised in this type of environment. 
> So we'd run this on master branch changes to 
> keystone/cinder/glance/neutron/trove/designate/etc.
> 
> Does that make sense? Would this be valuable at all? Or should the 
> opposite be tested where we upgrade nova to N and leave all of the 
> dependent services at N-1?
> 

It makes sense completely. What would really be awesome would be to test
the matrix of single upgrades:

upgrade only keystone
upgrade only glance
upgrade only neutron
upgrade only cinder
upgrade only nova

That would have a good chance at catching any co-dependencies that crop
up.
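
To make that concrete, here is a rough sketch of the loop such a job matrix
could run; deploy_at, upgrade_service and run_tempest are hypothetical
placeholders for whatever devstack/grenade plumbing would actually drive it:

    #!/bin/bash
    # For each core service: deploy everything at the previous stable
    # release, upgrade only that one service to master, then run tests.
    set -e
    for svc in keystone glance neutron cinder nova; do
        deploy_at stable/previous         # hypothetical helper
        upgrade_service "$svc" master     # hypothetical helper
        run_tempest smoke                 # hypothetical helper
    done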



Re: [Openstack-operators] Upstream LTS Releases

2017-11-12 Thread Clint Byrum
Excerpts from Doug Hellmann's message of 2017-11-11 12:19:56 -0500:
> Excerpts from John Dickinson's message of 2017-11-10 14:51:08 -0800:
> > On 7 Nov 2017, at 15:28, Erik McCormick wrote:
> > 
> > > Hello Ops folks,
> > >
> > > This morning at the Sydney Summit we had a very well attended and very
> > > productive session about how to go about keeping a selection of past
> > > releases available and maintained for a longer period of time (LTS).
> > >
> > > There was agreement in the room that this could be accomplished by
> > > moving the responsibility for those releases from the Stable Branch
> > > team down to those who are already creating and testing patches for
> > > old releases: The distros, deployers, and operators.
> > >
> > > The concept, in general, is to create a new set of cores from these
> > > groups, and use 3rd party CI to validate patches. There are lots of
> > > details to be worked out yet, but our amazing UC (User Committee) will
> > > begin working out the details.
> > >
> > > Please take a look at the Etherpad from the session if you'd like to
> > > see the details. More importantly, if you would like to contribute to
> > > this effort, please add your name to the list starting on line 133.
> > >
> > > https://etherpad.openstack.org/p/SYD-forum-upstream-lts-releases
> > >
> > > Thanks to everyone who participated!
> > >
> > > Cheers,
> > > Erik
> > >
> > > ___
> > > OpenStack-operators mailing list
> > > OpenStack-operators@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> > 
> > I'm not a fan of the current proposal. I feel like the discussion jumped 
> > into a policy/procedure solution without getting much more feedback from 
> > operators. The room heard "ops want LTS" and we now have a new governance 
> > model to work out.
> > 
> > What I heard from ops in the room is that they want (to start) one release 
> > a year whose branch isn't deleted after a year. What if that's exactly what 
> > we did? I propose that OpenStack only do one release a year instead of two. 
> > We still keep N-2 stable releases around. We still do backports to all open 
> > stable branches. We still do all the things we're doing now, we just do it 
> > once a year instead of twice.
> 
> We have so far only been able to find people to maintain stable
> branches for 12-18 months. Keeping N-2 branches for annual releases
> open would mean extending that support period to 2+ years. So, if
> we're going to do that, we need to address the fact that we haven't
> been able to retain anyone's attention that long up to this point.
> Do you think keeping the branches open longer will be sufficient
> to attract contributors to actually work on them?
> 

I don't think that, no. However, I do think if you also relieved the
current stable release pressure by cutting stable releases less often,
you'd be tasking the same backporters and maintainers with the same
amount of work. The difference would be that the work done early in a
cycle would mean more, as it would last longer, and thus might get
a bump in priority and overall community efficiency.

It's not all free though. There's a balancing act here. If you
force users to wait longer for things that aren't being backported,
they're likely to diverge more downstream.



Re: [Openstack-operators] [openstack-dev] Upstream LTS Releases

2017-11-11 Thread Clint Byrum
Excerpts from Doug Hellmann's message of 2017-11-10 13:11:45 -0500:
> Excerpts from Clint Byrum's message of 2017-11-08 23:15:15 -0800:
> > Excerpts from Samuel Cassiba's message of 2017-11-08 08:27:12 -0800:
> > > On Tue, Nov 7, 2017 at 3:28 PM, Erik McCormick
> > >  wrote:
> > > > Hello Ops folks,
> > > >
> > > > This morning at the Sydney Summit we had a very well attended and very
> > > > productive session about how to go about keeping a selection of past
> > > > releases available and maintained for a longer period of time (LTS).
> > > >
> > > > There was agreement in the room that this could be accomplished by
> > > > moving the responsibility for those releases from the Stable Branch
> > > > team down to those who are already creating and testing patches for
> > > > old releases: The distros, deployers, and operators.
> > > >
> > > > The concept, in general, is to create a new set of cores from these
> > > > groups, and use 3rd party CI to validate patches. There are lots of
> > > > details to be worked out yet, but our amazing UC (User Committee) will
> > > > begin working out the details.
> > > >
> > > > Please take a look at the Etherpad from the session if you'd like to
> > > > see the details. More importantly, if you would like to contribute to
> > > > this effort, please add your name to the list starting on line 133.
> > > >
> > > > https://etherpad.openstack.org/p/SYD-forum-upstream-lts-releases
> > > >
> > > > Thanks to everyone who participated!
> > > >
> > > > Cheers,
> > > > Erik
> > > >
> > > > __
> > > > OpenStack Development Mailing List (not for usage questions)
> > > > Unsubscribe: 
> > > > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > > 
> > > In advance, pardon the defensive tone. I was not in a position to
> > > attend, or even be in Sydney. However, as this comes across the ML, I
> > > can't help but get the impression this effort would be forcing more
> > > work on already stretched teams, ie. deployment-focused development
> > > teams already under a crunch as contributor count continues to decline
> > > in favor of other projects inside and out of OpenStack.
> > > 
> > 
> > I suspect if LTS's become a normal part of OpenStack, most deployment
> > projects will decline to support the interim releases. We can infer this
> > from the way Ubuntu is used. This might actually be a good thing for the
> > chef OpenStack community. 3 out of 3.5 of you can focus on the LTS bits,
> > and the 0.5 person can do some best effort to cover the weird corner
> > case of "previous stable release to master".
> > 
> > The biggest challenge will be ensuring that the skip-level upgrades
> > work. The current grenade based upgrade jobs are already quite a bear to
> > keep working IIRC. I've not seen if chef or any of the deployment projects
> > test upgrades like that.
> > 
> > However, if people can stop caring much about the interim releases and
> > just keep "previous LTS to master" upgrades working, then that might be
> > good for casual adoption.
> > 
> > Personally I'd rather we make it easier to run "rolling release"
> > OpenStack. Maybe we can do that if we stop cutting stable releases every
> > 6 months.
> > 
> 
> We should stop calling what we're talking about "LTS". It isn't
> going to match the expectations of anyone receiving LTS releases
> for other products, at least at first. Perhaps "Deployer Supported"
> or "User Supported" are better terms for what we're talking about.
> 

I believe this state we're in is a stop-gap on the way to the full
LTS. People are already getting stuck. We're going to help them stay stuck
by upstreaming bug fixes.  We should be mindful of that and provide a way
to get less-stuck. The LTS model from other projects has proven quite
popular, and it would make sense for us to embrace it if our operators
are hurting with the current model, which I believe they are.

> In the "LTS" room we did not agree to stop cutting stable releases
> or to start supporting upgrades directly from N-2 (or older) to N.
> Both of those changes would require modifying the support the
> existing contributor base has committed to provide.
> 

Thanks, I am just inferring those things from what was agreed on. However,
it would make a lot of sense to discuss the plans for the future, even
if we don't have data from the present proposal.

> Fast-forward upgrades will still need to run the migration steps
> of each release in order, one at a time. The team working on that
> is going to produce a document describing what works today so we
> can analyze it for ways to improve the upgrade experience, for both
> fast-forward and "regular" upgrades.  That was all discussed in a
> separate session.
> 

We are what we test. If we're going to test fast-forwards, how far into
the past do we test? It 

Re: [Openstack-operators] [openstack-dev] Upstream LTS Releases

2017-11-08 Thread Clint Byrum
Excerpts from Samuel Cassiba's message of 2017-11-08 08:27:12 -0800:
> On Tue, Nov 7, 2017 at 3:28 PM, Erik McCormick
>  wrote:
> > Hello Ops folks,
> >
> > This morning at the Sydney Summit we had a very well attended and very
> > productive session about how to go about keeping a selection of past
> > releases available and maintained for a longer period of time (LTS).
> >
> > There was agreement in the room that this could be accomplished by
> > moving the responsibility for those releases from the Stable Branch
> > team down to those who are already creating and testing patches for
> > old releases: The distros, deployers, and operators.
> >
> > The concept, in general, is to create a new set of cores from these
> > groups, and use 3rd party CI to validate patches. There are lots of
> > details to be worked out yet, but our amazing UC (User Committee) will
> > begin working out the details.
> >
> > Please take a look at the Etherpad from the session if you'd like to
> > see the details. More importantly, if you would like to contribute to
> > this effort, please add your name to the list starting on line 133.
> >
> > https://etherpad.openstack.org/p/SYD-forum-upstream-lts-releases
> >
> > Thanks to everyone who participated!
> >
> > Cheers,
> > Erik
> >
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> In advance, pardon the defensive tone. I was not in a position to
> attend, or even be in Sydney. However, as this comes across the ML, I
> can't help but get the impression this effort would be forcing more
> work on already stretched teams, ie. deployment-focused development
> teams already under a crunch as contributor count continues to decline
> in favor of other projects inside and out of OpenStack.
> 

I suspect if LTS's become a normal part of OpenStack, most deployment
projects will decline to support the interim releases. We can infer this
from the way Ubuntu is used. This might actually be a good thing for the
chef OpenStack community. 3 out of 3.5 of you can focus on the LTS bits,
and the 0.5 person can do some best effort to cover the weird corner
case of "previous stable release to master".

The biggest challenge will be ensuring that the skip-level upgrades
work. The current grenade based upgrade jobs are already quite a bear to
keep working IIRC. I've not seen if chef or any of the deployment projects
test upgrades like that.

However, if people can stop caring much about the interim releases and
just keep "previous LTS to master" upgrades working, then that might be
good for casual adoption.

Personally I'd rather we make it easier to run "rolling release"
OpenStack. Maybe we can do that if we stop cutting stable releases every
6 months.



Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-06 Thread Clint Byrum
Thanks Tomas, I did understand you, I just didn't make my point perfectly.

The point is that OpenStack has two very different missions today, and
that has been causing my frustration, which I have let go for now. There
is a hosting mission, where we try to keep computing pets alive, and a
cloud mission, where we try to give people flexible access to computing
resources at scale to use as cattle.

I've done a poor job of acknowledging those who use OpenStack for hosting,
and I'm trying to get better. Thanks for being a user!

Excerpts from Tomáš Vondra's message of 2017-10-06 12:06:45 +0200:
> Dear Clint,
> maybe you misunderstood a little, or I didn't write it explicitly. We use 
> OpenStack for providing a VPS service, yes. But the VPS users do not get 
> access to OpenStack directly, but instead, they use our Customer Portal which 
> does the orchestration. The whole point is to make the service as easy as 
> possible to use for them and not expose them to the complexity of the Cloud. 
> As I said, we couldn't use Rebuild because VPS's have Volumes. We do use 
> Resize because it is there. But we could as well use more low-level cloud 
> primitives. The user does not care in this case. How does, e.g., WHMCS do it? 
> That is a stock software that you can use to provide VPS over OpenStack.
> Tomas from Homeatcloud
> 
> -----Original Message-----
> From: Clint Byrum [mailto:cl...@fewbar.com] 
> Sent: Thursday, October 05, 2017 6:50 PM
> To: openstack-operators
> Subject: Re: [Openstack-operators] [nova] Should we allow passing new 
> user_data during rebuild?
> 
> No offense is intended, so please forgive me for the possibly incendiary 
> nature of what I'm about to write:
> 
> VPS is the predecessor of cloud (and something I love very much, and rely on 
> every day!), and encourages all the bad habits that a cloud disallows. At 
> small scale, it's the right thing, and that's why I use it for my small scale 
> needs. Get a VM, put your stuff on it, and keep it running forever.
> 
> But at scale, VMs in clouds go away. They get migrated, rebooted, turned off, 
> and discarded, often. Most clouds are terrible for VPS compared to VPS 
> hosting environments.
> 
> I'm glad it's working for you. And I think rebuild and resize will stay and 
> improve to serve VPS style users in interesting ways. I'm learning now who 
> our users are today, and I'm confident we should make sure everyone who has 
> taken the time to deploy and care for OpenStack should be served by expanding 
> rebuild to meet their needs.
> 
> You can all consider this my white flag. :)
> 
> Excerpts from Tomáš Vondra's message of 2017-10-05 10:22:14 +0200:
> > In our cloud, we offer the possibility to reinstall the same or another OS 
> > on a VPS (Virtual Private Server). Unfortunately, we couldn’t use the 
> > rebuild function because of the VPS‘s use of Cinder for root disk. We 
> > create a new instance and inject the same User Data so that the new 
> > instance has the same password and key as the last one. It also has the 
> > same name, and the same floating IP is attached. I believe it even has the 
> > same IPv6 through some Neutron port magic.
> > 
> > BTW, you wouldn’t believe how often people use the Reinstall feature.
> > 
> > Tomas from Homeatcloud
> > 
> >  
> > 
> > From: Belmiro Moreira [mailto:moreira.belmiro.email.li...@gmail.com]
> > Sent: Wednesday, October 04, 2017 5:34 PM
> > To: Chris Friesen
> > Cc: openstack-operators@lists.openstack.org
> > Subject: Re: [Openstack-operators] [nova] Should we allow passing new 
> > user_data during rebuild?
> > 
> >  
> > 
> > In our cloud rebuild is the only way for a user to keep the same IP. 
> > Unfortunately, we don't offer floating IPs, yet.
> > 
> > Also, we use the user_data to bootstrap some actions in new instances 
> > (puppet, ...).
> > 
> > Considering all the use-cases for rebuild it would be great if the 
> > user_data can be updated at rebuild time.
> > 
> >  
> > 
> > On Wed, Oct 4, 2017 at 5:15 PM, Chris Friesen <chris.frie...@windriver.com> 
> > wrote:
> > 
> > On 10/03/2017 11:12 AM, Clint Byrum wrote:
> > 
> > My personal opinion is that rebuild is an anti-pattern for cloud, and 
> > should be frozen and deprecated. It does nothing but complicate Nova 
> > and present challenges for scaling.
> > 
> > That said, if it must stay as a feature, I don't think updating the 
> > user_data should be a priority. At that point, you've basically 
> > created an entirely new server, and you can already do that by 
> > creating an entirely new server.
> > 
> > 
>

Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-05 Thread Clint Byrum
No offense is intended, so please forgive me for the possibly incendiary
nature of what I'm about to write:

VPS is the predecessor of cloud (and something I love very much, and
rely on every day!), and encourages all the bad habits that a cloud
disallows. At small scale, it's the right thing, and that's why I use
it for my small scale needs. Get a VM, put your stuff on it, and keep
it running forever.

But at scale, VMs in clouds go away. They get migrated, rebooted, turned
off, and discarded, often. Most clouds are terrible for VPS compared to
VPS hosting environments.

I'm glad it's working for you. And I think rebuild and resize will stay
and improve to serve VPS style users in interesting ways. I'm learning now
who our users are today, and I'm confident we should make sure everyone
who has taken the time to deploy and care for OpenStack should be served
by expanding rebuild to meet their needs.

You can all consider this my white flag. :)

Excerpts from Tomáš Vondra's message of 2017-10-05 10:22:14 +0200:
> In our cloud, we offer the possibility to reinstall the same or another OS on 
> a VPS (Virtual Private Server). Unfortunately, we couldn’t use the rebuild 
> function because of the VPS‘s use of Cinder for root disk. We create a new 
> instance and inject the same User Data so that the new instance has the same 
> password and key as the last one. It also has the same name, and the same 
> floating IP is attached. I believe it even has the same IPv6 through some 
> Neutron port magic.
> 
> BTW, you wouldn’t believe how often people use the Reinstall feature.
> 
> Tomas from Homeatcloud
> 
>  
> 
> From: Belmiro Moreira [mailto:moreira.belmiro.email.li...@gmail.com] 
> Sent: Wednesday, October 04, 2017 5:34 PM
> To: Chris Friesen
> Cc: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] [nova] Should we allow passing new 
> user_data during rebuild?
> 
>  
> 
> In our cloud rebuild is the only way for a user to keep the same IP. 
> Unfortunately, we don't offer floating IPs, yet.
> 
> Also, we use the user_data to bootstrap some actions in new instances 
> (puppet, ...).
> 
> Considering all the use-cases for rebuild it would be great if the user_data 
> can be updated at rebuild time.
> 
>  
> 
> On Wed, Oct 4, 2017 at 5:15 PM, Chris Friesen <chris.frie...@windriver.com> 
> wrote:
> 
> On 10/03/2017 11:12 AM, Clint Byrum wrote:
> 
> My personal opinion is that rebuild is an anti-pattern for cloud, and
> should be frozen and deprecated. It does nothing but complicate Nova
> and present challenges for scaling.
> 
> That said, if it must stay as a feature, I don't think updating the
> user_data should be a priority. At that point, you've basically created an
> entirely new server, and you can already do that by creating an entirely
> new server.
> 
> 
> If you've got a whole heat stack with multiple resources, and you realize 
> that you messed up one thing in the template and one of your servers has the 
> wrong personality/user_data, it can be useful to be able to rebuild that one 
> server without affecting anything else in the stack.  That's just a 
> convenience though.
> 
> Chris
> 



Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-05 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2017-10-04 09:15:28 -0600:
> On 10/03/2017 11:12 AM, Clint Byrum wrote:
> 
> > My personal opinion is that rebuild is an anti-pattern for cloud, and
> > should be frozen and deprecated. It does nothing but complicate Nova
> > and present challenges for scaling.
> >
> > That said, if it must stay as a feature, I don't think updating the
> > user_data should be a priority. At that point, you've basically created an
> > entirely new server, and you can already do that by creating an entirely
> > new server.
> 
> If you've got a whole heat stack with multiple resources, and you realize 
> that 
> you messed up one thing in the template and one of your servers has the wrong 
> personality/user_data, it can be useful to be able to rebuild that one server 
> without affecting anything else in the stack.  That's just a convenience 
> though.
> 

If you just changed that personality/user_data in the template, Heat
would spin up a new one, change all the references to it, wait for any
wait conditions to fire, allowing dependent servers to reconfigure with
the new one and acknowledge that, and then delete the old one for you.

Making your app work like this means being able to replace failed or
undersized servers with less downtime. You can do other things too,
like spin up a replacement in a different AZ to deal with maintenance
issues on your side or the cloud's side. Or you can deploy a new image,
without any downtime.
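
A minimal sketch of what that looks like in a template; the image, flavor
and network names are placeholders, and with user_data_update_policy left
at its REPLACE default, a change to user_data is what triggers Heat to build
the replacement server:

    heat_template_version: 2016-10-14

    resources:
      app_server:
        type: OS::Nova::Server
        properties:
          image: my-image                     # placeholder
          flavor: m1.small                    # placeholder
          networks: [{network: my-net}]       # placeholder
          user_data_format: RAW
          user_data_update_policy: REPLACE    # the default
          user_data: |
            #!/bin/bash
            echo "hello from a replaceable server" > /etc/motd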

My point remains: rebuild (and resize) train users to see a server as
precious, instead of training users to write automation that expects
cloud servers to come and go often.

This, btw, is one reason I like that EC2 calls them _instances_ and not
_servers_. They're not servers. We call them servers, but they're just
little regions of memory on actual servers, and as such, they're not
precious, and should be discarded as necessary.



Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-05 Thread Clint Byrum
Excerpts from Belmiro Moreira's message of 2017-10-04 17:33:40 +0200:
> In our cloud rebuild is the only way for a user to keep the same IP.
> Unfortunately, we don't offer floating IPs, yet.
> Also, we use the user_data to bootstrap some actions in new instances
> (puppet, ...).
> Considering all the use-cases for rebuild it would be great if the
> user_data can be updated at rebuild time.
> 

Indeed, it sounds like we're too far down the rabbit hole with rebuild to
stop digging.



Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-03 Thread Clint Byrum
I fully appreciate that there are users of it today, and that it is
a thing that will likely live for years.

Long lived VMs can use all sorts of features to make VMs work more like
precious long lived servers. However, supporting these cases directly
doesn't make OpenStack scalable or simple. Quite the opposite.

It's worth noting that AD and Kerberos were definitely not designed
for clouds that have short lived VMs, so it does not surprise me that
treating VMs as cattle and then putting them in AD would confuse it.

Excerpts from Tim Bell's message of 2017-10-03 18:46:31 +:
> We use rebuild when reverting with snapshots. Keeping the same IP and 
> hostname avoids some issues with Active Directory and Kerberos.
> 
> Tim
> 
> -----Original Message-----
> From: Clint Byrum <cl...@fewbar.com>
> Date: Tuesday, 3 October 2017 at 19:17
> To: openstack-operators <openstack-operators@lists.openstack.org>
> Subject: Re: [Openstack-operators] [nova] Should we allow passing new
> user_data during rebuild?
> 
> 
> Excerpts from Matt Riedemann's message of 2017-10-03 10:53:44 -0500:
> > We plan on deprecating personality files from the compute API in a new 
> > microversion. The spec for that is here:
> > 
> > https://review.openstack.org/#/c/509013/
> > 
> > Today you can pass new personality files to inject during rebuild, and 
> > at the PTG we said we'd allow passing new user_data to rebuild as a 
> > replacement for the personality files.
> > 
> > However, if the only reason one would need to pass personality files 
> > during rebuild is because we don't persist them during the initial 
> > server create, do we really need to also allow passing user_data for 
> > rebuild? The initial user_data is stored with the instance during 
> > create, and re-used during rebuild, so do we need to allow updating it 
> > during rebuild?
> > 
> 
> My personal opinion is that rebuild is an anti-pattern for cloud, and
> should be frozen and deprecated. It does nothing but complicate Nova
> and present challenges for scaling.
> 
> That said, if it must stay as a feature, I don't think updating the
> user_data should be a priority. At that point, you've basically created an
> entirely new server, and you can already do that by creating an entirely
> new server.
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> 



Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-03 Thread Clint Byrum

Excerpts from Matt Riedemann's message of 2017-10-03 10:53:44 -0500:
> We plan on deprecating personality files from the compute API in a new 
> microversion. The spec for that is here:
> 
> https://review.openstack.org/#/c/509013/
> 
> Today you can pass new personality files to inject during rebuild, and 
> at the PTG we said we'd allow passing new user_data to rebuild as a 
> replacement for the personality files.
> 
> However, if the only reason one would need to pass personality files 
> during rebuild is because we don't persist them during the initial 
> server create, do we really need to also allow passing user_data for 
> rebuild? The initial user_data is stored with the instance during 
> create, and re-used during rebuild, so do we need to allow updating it 
> during rebuild?
> 

My personal opinion is that rebuild is an anti-pattern for cloud, and
should be frozen and deprecated. It does nothing but complicate Nova
and present challenges for scaling.

That said, if it must stay as a feature, I don't think updating the
user_data should be a priority. At that point, you've basically created an
entirely new server, and you can already do that by creating an entirely
new server.



Re: [Openstack-operators] [nova] Metadata service over virtio-vsock

2017-02-21 Thread Clint Byrum
Excerpts from Daniel P. Berrange's message of 2017-02-21 10:40:02 +:
> On Mon, Feb 20, 2017 at 02:36:15PM -0500, Clint Byrum wrote:
> > What exactly is the security concern of the metadata service? Perhaps
> > those concerns can be addressed directly?
> > 
> > I ask because anything that requires special software on the guest is
> > a non-starter IMO. virtio is a Linux thing, so what does this do for
> > users of Windows?  FreeBSD? etc.
> 
> Red Hat is investing in creating virtio vsock drivers for Windows
> but I don't have an ETA for that yet. There's no work in *BSD in
> this area that I know of, but BSD does have support for virtio
> in general, so if virtio-vsock becomes used in any important
> places I would not be surprised if some BSD developers implemented
> vsock too.
> 

> In any case, I don't think it necessarily needs to be supported
> in every single possible scenario. The config drive provides the
> same data in a highly portable manner, albeit with the caveat
> about it being read-only. The use of metadata service (whether
> TCP or vsock based) is useful for cases needing the info from
> config drive to be dynamically updated - e.g. the role device
> tagging metadata. Only a very small subset of guests running on
> openstack actually use that data today. So it would not be the
> end of the world if some guests don't support vsock in the short
> to medium term - if the facility proves to be critically important
> to a wider range of guests that'll motivate developers of those
> OS to support it.
> 

Cool, so there's a chance it gets to near ubiquitous usability.

However, I wonder, there's no need for performance here. Why not just
make it a virtual USB drive that ejects and re-attaches on changes? That
way you don't need Windows/BSD drivers.



Re: [Openstack-operators] [nova] Metadata service over virtio-vsock

2017-02-20 Thread Clint Byrum
Excerpts from Jeremy Stanley's message of 2017-02-20 20:08:00 +:
> On 2017-02-20 14:36:15 -0500 (-0500), Clint Byrum wrote:
> > What exactly is the security concern of the metadata service? Perhaps
> > those concerns can be addressed directly?
> [...]
> 
> A few I'm aware of:
> 

Thanks!

> 1. It's something that runs in the control plane but needs to be
> reachable from untrusted server instances (which may themselves even
> want to be on completely non-routed networks).
> 

As is DHCP.

> 2. If you put a Web proxy between your server instances and the
> metadata service and also make it reachable without going through
> that proxy then instances may be able to spoof one another
> (OSSN-0074).
> 

That's assuming the link-local approach used by the EC2 style service.

If you have DHCP hand out a metadata URL with a nonce in it, that's no
longer an issue.

> 3. Lots of things, for example facter, like to beat on it heavily
> which makes for a fun DDoS and so is a bit of a scaling challenge in
> large deployments.
> 

These are fully mitigated by caching.

> There are probably plenty more I don't know since I'm not steeped in
> operating OpenStack deployments.

Thanks. I don't mean to combat the suggestions, but rather just see
what it is exactly that makes us dislike the metadata service.



Re: [Openstack-operators] [nova] Metadata service over virtio-vsock

2017-02-20 Thread Clint Byrum
What exactly is the security concern of the metadata service? Perhaps
those concerns can be addressed directly?

I ask because anything that requires special software on the guest is
a non-starter IMO. virtio is a Linux thing, so what does this do for
users of Windows?  FreeBSD? etc.

Excerpts from Artom Lifshitz's message of 2017-02-20 13:22:36 -0500:
> We've been having a discussion [1] in openstack-dev about how to best
> expose dynamic metadata that changes over a server's lifetime to the
> server. The specific use case is device role tagging with hotplugged
> devices, where a network interface or volume is attached with a role
> tag, and the guest would like to know what that role tag is right
> away.
> 
> The metadata API currently fulfills this function, but my
> understanding is that it's not hugely popular amongst operators and is
> therefore not universally deployed.
> 
> Dan Berrange came up with an idea [2] to add virtio-vsock support to
> Nova. To quote his explanation, " think of this as UNIX domain sockets
> between the host and guest. [...] It'd likely address at least some
> people's security concerns wrt metadata service. It would also fix the
> ability to use the metadata service in IPv6-only environments, as we
> would not be using IP at all."
> 
> So to those operators who are not deploying the metadata service -
> what are your reasons for doing so, and would those concerns be
> addressed by Dan's idea?
> 
> Cheers!
> 
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2017-February/112490.html
> [2] 
> http://lists.openstack.org/pipermail/openstack-dev/2017-February/112602.html
> 



Re: [Openstack-operators] Ironic with top-rack switches management

2017-01-04 Thread Clint Byrum
Excerpts from George Shuklin's message of 2016-12-26 00:22:38 +0200:
> Hello everyone.
> 
> 
> Has anyone actually run Ironic with ToR (top-of-rack) switches 
> under neutron in production? Which switch vendor/plugin (and OS version) 
> do you use? Do you have some switch configuration with parts outside of 
> Neutron's reach? Is it worth spending effort on integration, etc.?
> 

We had an experimental setup with Ironic and the OVN Neutron driver and
VTEP-capable switches (Juniper, I forget the model #, but Arista also has
models that fully support VTEP). It was able to boot baremetal nodes on
isolated L2's (including an isolated provisioning network). In theory this
would also allow VM<->baremetal L2 networking (and with kuryr, you could
get VM<->baremetal<->container working too). But we never proved this
definitively as we got tripped up on scheduling and hostmanager issues
running with ironic in one host agg and libvirt in another. I believe
these are solved, though I've not seen the documentation to prove it.

> And one more question: Does Ironic support snapshotting of baremetal 
> servers? With some kind of agent/etc?
> 

I think that's asking too much really. The point of baremetal is that
you _don't_ have any special agents between your workload and hardware.
Consider traditional backup strategies.



Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python

2016-11-17 Thread Clint Byrum
You may find the 'shade' library a straight forward choice:

http://docs.openstack.org/infra/shade/

Excerpts from George Shuklin's message of 2016-11-17 20:17:08 +0200:
> Hello.
> 
> I can't find proper documentation about how to use the openstack clients 
> from inside a python application. All I can find are examples and a 
> rather abstract (autogenerated) reference. Is there any proper 
> documentation about the right way to use the openstack clients from python 
> applications?
> 
> 
> Thanks.
> 



Re: [Openstack-operators] How do you even test for that?

2016-10-18 Thread Clint Byrum
Excerpts from Jonathan Proulx's message of 2016-10-17 14:49:13 -0400:
> Hi All,
> 
> Just on the other side of a Kilo->Mitaka upgrade (with a very brief
> transit through Liberty in the middle).
> 
> As usual I've caught a few problems in production that I have no idea
> how I could possibly have tested for because they relate to older
> running instances and some remnants of older package versions on the
> production side which wouldn't have existed in test unless I'd
> installed the test server with Havana and done incremental upgrades
> starting a fairly wide suite of test instances along the way.
> 

In general, modifying _anything_ in place is hard to test.

You're much better off with as much immutable content as possible on all
of your nodes. If you've been wondering what this whole Docker nonsense
is about, well, that's what it's about. You docker build once per software
release attempt, and then mount data read/write, and configs readonly.
Both openstack-ansible and kolla are deployment projects that try to do
some of this via lxc or docker, IIRC.

This way when you test your container image in test, you copy it out to
prod, start up the new containers, stop the old ones, and you know that
_at least_ you don't have older stuff running anymore. Data and config
are still likely to be the source of issues, but there are other ways
to help test that.
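
A sketch of the idea with plain docker; the image name and mount paths are
illustrative, not what kolla or openstack-ansible actually ship:

    # build one immutable image per release attempt...
    docker build -t mycloud/nova-compute:mitaka-20161018 .

    # ...then run it with config read-only and only data/log paths writable
    docker run -d --name nova_compute \
        -v /etc/nova:/etc/nova:ro \
        -v /var/lib/nova:/var/lib/nova \
        -v /var/log/nova:/var/log/nova \
        mycloud/nova-compute:mitaka-20161018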

> First thing that bit me was neutron-db-manage being confused because
> my production system still had migrations from Havana hanging around.
> I'm calling this a packaging bug
> https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1633576 but I
> also feel like remembering release names forever might be a good
> thing.
> 

Ouch, indeed one of the first things to do _before_ an upgrade is to run
the migrations of the current version to make sure your schema is up to
date. Also it's best to make sure you have _all_ of the stable updates
before you do that, since it's possible fixes have landed in the
migrations that are meant to smooth the upgrade process.
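
Concretely, that means something like this while still on the old release,
with the latest stable point releases installed (exact sub-commands vary a
bit by project and release):

    nova-manage db sync
    neutron-db-manage upgrade heads
    cinder-manage db sync
    glance-manage db_sync
    keystone-manage db_sync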

> Later I discovered during the Juno release (maybe earlier ones too)
> making snapshot of running instances populated the snapshot's meta
> data with "instance_type_vcpu_weight: none".  Currently (Mitaka) this
> value must be an integer if it is set or boot fails.  This has the
> interesting side effect of putting your instance into shutdown/error
> state if you try a hard reboot of a formerly working instance.  I
> 'fixed' this by manually frobbing the DB to delete the rows where
> instance_type_vcpu_weight was set to none.
> 

This one is tough because it is clearly data and state related. It's
hard to say how you got the 'none' values in there instead of ints.
Somebody else suggested making db snapshots and loading them into a test
control plane. That seems like an easy-ish way to do some surface-level
checking, but the fact is it could also be super dangerous if not isolated
well, and the more isolation, the less of a real simulation it is.

> Does anyone have strategies on how to actually test for problems with
> "old" artifacts like these?
> 
> Yes having things running from 18-24month old snapshots is "bad" and
> yes it would be cleaner to install a fresh control plane at each
> upgrade and cut over rather than doing an actual in place upgrade.  But
> neither of these sub-optimal patterns are going all the way away
> anytime soon.
>

In-place upgrades must work. If they don't, please file bugs and
complain loudly. :)



Re: [Openstack-operators] Ops@Barcelona - Updates

2016-10-17 Thread Clint Byrum
Excerpts from Tom Fifield's message of 2016-10-17 12:58:27 +0800:
> Hi all,
> 
> There have been some schedule updates for the Ops design summit sessions:
> 
> https://www.openstack.org/summit/barcelona-2016/summit-schedule/global-search?t=Ops+Summit%3A
>  
> 
> 
> New Sessions added:
> * Ops Meetups Team
> * Some working groups not previously listed
> * Horizon: Operator and Plugin Author Feedback
> * Neutron: End user and operator feedback
> * Barbican: User and Operator Feedback Session
> 
> and some minor room and time changes too - please doublecheck your schedule!
> 
> 
> ** Call for Moderators **
> 
> We really need a moderator for:
>  >> * HAProxy, MySQL, Rabbit Tuning
> 
> since it looks like it will be one of the most popular sessions, but we 
> don't have a moderator yet.
> 

I've added my name to this one, but

a) I'm not a RabbitMQ expert (the other two are my focus)
b) I may have trouble getting to the session on time as I might be in a
different location until 11:00.

It would be great if somebody else would co-moderate. I'm sure there
are plenty of willing and able folks who can help.



Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

2016-10-12 Thread Clint Byrum
Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +:
> > 
> > From: Xav Paice 
> > Sent: Monday, October 10, 2016 8:41 PM
> > To: openstack-operators@lists.openstack.org
> > Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do 
> > you handle Nova on Ceph?
> > 
> > On Mon, 2016-10-10 at 13:29 +, Adam Kijak wrote:
> > > Hello,
> > >
> > > We use a Ceph cluster for Nova (Glance and Cinder as well) and over
> > > time,
> > > more and more data is stored there. We can't keep the cluster so big
> > > because of
> > > Ceph's limitations. Sooner or later it needs to be closed for adding
> > > new
> > > instances, images and volumes. Not to mention it's a big failure
> > > domain.
> > 
> > I'm really keen to hear more about those limitations.
> 
> Basically it's all related to the failure domain ("blast radius") and risk 
> management.
> Bigger Ceph cluster means more users.

Are these risks well documented? Since Ceph is specifically designed
_not_ to have the kind of large blast radius that one might see with
say, a centralized SAN, I'm curious to hear what events trigger
cluster-wide blasts.

> Growing the Ceph cluster temporarily slows it down, so many users will be 
> affected.

One might say that a Ceph cluster that can't be grown without the users
noticing is an over-subscribed Ceph cluster. My understanding is that
one is always advised to provision a certain amount of cluster capacity
for growing and replicating to replaced drives.

> There are bugs in Ceph which can cause data corruption. It's rare, but when 
> it happens 
> it can affect many (maybe all) users of the Ceph cluster.
> 

:(



Re: [Openstack-operators] [EXTERNAL] Re: Tenant/Project naming restrictions

2016-10-07 Thread Clint Byrum
Sounds like a bug in the API documentation:

http://developer.openstack.org/api-ref/identity/v3/?expanded=create-project-detail

"name   bodystring  The name of the project, which must be unique
within the owning domain. A project can have the same name as its
domain."

Unfortunately, IMO these APIs are woefully under-documented. There's no
documented limit on length. One can assume anything valid in the JSON body
as a "string" is valid, so any utf-8 character should work. In reality,
there are limits in the backend storage schema, and likely problems with
the wider UTF-8 characters in most people's clouds, because MySQL's default
utf8 charset doesn't support 4-byte UTF-8 characters (that requires utf8mb4).

I suggest opening a bug against any of the projects that fail to
document the limitations in their api-ref.

However, we can at least refer to the API tests as what is tested to
work:

https://github.com/openstack/tempest/blob/master/tempest/api/identity/admin/v3/test_projects_negative.py#L60-L64

Some tests that verify that one cannot save invalid utf-8 chars would be
useful there.
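
If you want to probe the limits empirically, here is a quick sketch with
shade; the cloud name and domain are placeholders, and this is nowhere near
a tempest-grade test:

    import shade

    cloud = shade.operator_cloud(cloud='mycloud')  # entry from clouds.yaml

    # the last two names probe '@' and a 4-byte utf-8 character
    for name in ['x' * 64, 'x' * 65, u'team@example', u'proj-\U0001F600']:
        try:
            project = cloud.create_project(name=name, domain_id='default')
            print('accepted: %r' % name)
            cloud.delete_project(project.id)
        except shade.OpenStackCloudException as exc:
            print('rejected: %r (%s)' % (name, exc))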

Excerpts from Vigil, David Gabriel's message of 2016-10-07 15:38:13 +:
> So, no one knows of official documents on tenant naming restrictions? 
> 
> 
> Dave G Vigil Sr
> Systems Integration Analyst Sr/SAIC Lead 09321
> Common Engineering Environment
> dgv...@sandia.gov
> 
> -----Original Message-----
> From: Saverio Proto [mailto:ziopr...@gmail.com] 
> Sent: Thursday, October 6, 2016 1:21 AM
> To: Steve Martinelli 
> Cc: Vigil, David Gabriel ; 
> openstack-operators@lists.openstack.org
> Subject: [EXTERNAL] Re: [Openstack-operators] Tenant/Project naming 
> restrictions
> 
> Is the '@' character allowed in the tenant/project names ?
> 
> Saverio
> 
> 2016-10-05 23:36 GMT+02:00 Steve Martinelli :
> > There are some restrictions.
> >
> > 1. The project name cannot be longer than 64 characters.
> > 2. Within a domain, the project name is unique. So you can have 
> > project "foo" in the "default" domain, and in any other domain.
> >
> > On Wed, Oct 5, 2016 at 5:16 PM, Vigil, David Gabriel 
> > 
> > wrote:
> >>
> >> What, if any, are the official tenant/project naming 
> >> requirements/restrictions? I can’t find any documentation that speaks 
> >> to any limitations. Is this documented somewhere?
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Dave G Vigil Sr
> >>
> >> Systems Integration Analyst Sr/SAIC Lead 09321
> >>
> >> Common Engineering Environment
> >>
> >> dgv...@sandia.gov
> >>
> >> 505-284-0157 (office)
> >>
> >> SAIC
> >>
> >>
> >>
> >>
> >> ___
> >> OpenStack-operators mailing list
> >> OpenStack-operators@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operato
> >> rs
> >>
> >
> >
> > ___
> > OpenStack-operators mailing list
> > OpenStack-operators@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operator
> > s
> >
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



Re: [Openstack-operators] SDN for hybridcloud, does it *really* exist?

2016-10-03 Thread Clint Byrum
Excerpts from Jonathan Proulx's message of 2016-10-03 13:52:42 -0400:
> 
> So my sense from responses so far:
> 
> No one is doing unified SDN solutions across clouds and no one really
> wants to.
> 
> Consensus is just treat each network island like another remote DC and
> use normal VPN type stuff to glue them together.
> 
> ( nod to http://romana.io an interesting looking network and security
> automation project as a network agnostic alternative to SDN for
> managing cross cloud policy on whatever networks are available. )
> 

Oh sorry, there are people taking the complex route to what you want..
sort of:

https://wiki.openstack.org/wiki/Tricircle



Re: [Openstack-operators] SDN for hybridcloud, does it *really* exist?

2016-10-03 Thread Clint Byrum
Excerpts from Jonathan Proulx's message of 2016-10-03 11:16:03 -0400:
> On Sat, Oct 01, 2016 at 02:39:38PM -0700, Clint Byrum wrote:
> 
> :I know it's hard to believe, but this world was foretold long ago and
> :what you want requires no special equipment or changes to OpenStack,
> :just will-power.  You can achieve it now if you can use operating system
> :versions published in the last 5 or so years.
> :
> :The steps to do this:
> :
> :1) Fix your apps to work via IPv6
> :2) Fix your internal users to have v6 native
> :3) Attach your VMs and containers to a provider network with v6 subnets
> :4) Use IPSec and firewalls for critical isolation. (What we use L2
> :   separation for now)
> 
> That *is* hard to believe :) IPv6 has been coming soon since I started
> in tech a very long time ago ... 
> 
> I will consider that but I have a diverse set of users I don't
> control.  I *may* be able to apply pressure in the "if you really need
> this then do the right thing" sense, but I probably still want a v4 solution
> in my pocket.
> 

Treat v4 as an internet-only, insecure, extra service that one must ask
for. It's extremely easy, with OpenStack, to provide both if people want
it, and just let them choose. Those who choose v4 only will find they
can't do some things, and have a clear incentive to change.

It's not that v6 is coming. It's here, knocking on your door. But,
like a vampire, you still have to invite it in.



Re: [Openstack-operators] SDN for hybridcloud, does it *really* exist?

2016-10-01 Thread Clint Byrum
Excerpts from Jonathan Proulx's message of 2016-09-30 10:15:26 -0400:
> 
> Starting to think about refactoring my SDN world (currently just neutron
> ml2/ovs inside OpenStack) in preparation for maybe finally lighting up
> that second Region I've been threatening for the past year...
> 
> Networking is always the hardest design challenge.  Has anyone seen my
> unicorn?  I dream of something that first works with neutron of course
> but can also extend the same network features to hardware outside
> openstack and into random public cloud infrastructures through VM and/or
> containerised gateways.  Also I don't want to hire a whole networking
> team to run it.
> 
> I'm fairly certain this is still fantasy though I've heard various
> vendors promise the earth and stars but I'd love to hear if anyone is
> actually getting close to this in production systems and if so what
> your experience has been like.
> 

I know it's hard to believe, but this world was foretold long ago and
what you want requires no special equipment or changes to OpenStack,
just will-power.  You can achieve it now if you can use operating system
versions published in the last 5 or so years.

The steps to do this:

1) Fix your apps to work via IPv6
2) Fix your internal users to have v6 native
3) Attach your VMs and containers to a provider network with v6 subnets
4) Use IPSec and firewalls for critical isolation. (What we use L2
   separation for now)

This is not complicated, but your SDN vendor probably doesn't want you
to know that. You can still attach v4 addresses to your edge endpoints
so they can talk to legacy stuff while you migrate. But the idea here
is, if you control both ends of a connection, there is no reason you
should still be using v4 except tradition.
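
For step 3, a sketch with the openstack CLI; the network name, physnet,
VLAN segment and prefix are placeholders:

    openstack network create \
        --provider-network-type vlan \
        --provider-physical-network physnet1 \
        --provider-segment 100 \
        --share prov-v6

    openstack subnet create \
        --network prov-v6 \
        --ip-version 6 \
        --ipv6-ra-mode slaac \
        --ipv6-address-mode slaac \
        --subnet-range 2001:db8:100::/64 \
        prov-v6-subnet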



Re: [Openstack-operators] Openstack team size vs's deployment size

2016-09-12 Thread Clint Byrum
Excerpts from Mathieu Gagné's message of 2016-09-12 17:55:34 -0400:
> On Fri, Sep 9, 2016 at 6:41 PM, Mathieu Gagné  wrote:
> > On Wed, Sep 7, 2016 at 6:59 PM, Kris G. Lindgren  
> > wrote:
> >>
> >> I was hoping to poll other operators to see what their average team size
> >> vs’s deployment size is, as I am trying to use this in an internal company
> >> discussion.
> >
> > It is difficult to come up with numbers without context as not all
> > teams are created equal. But I think we are currently facing the same
> > situation where you feel the team can't keep up with the amount of
> > work.
> 
> I also think that small team with small deployments has little
> incentive to invest in *heavy* automation (to help themselves) and/or
> tools to delegate its operation to a 3rd party or team. Your
> deployment isn't "big enough" so you feel it's not worth the
> investment because "I can manage those just fine" or "There isn't
> enough compute nodes to automate their installation", etc.
> 
> Once you hit a certain size, you need to have those in place. Without
> those tools, you will feel overwhelmed by the task, feel like you are
> the only ones capable of managing/operating the infra and can't
> delegate to anyone.
> 

^^ This, so very this.



Re: [Openstack-operators] Openstack team size vs's deployment size

2016-09-12 Thread Clint Byrum
Excerpts from gustavo panizzo (gfa)'s message of 2016-09-09 22:07:49 +0800:
> On Thu, Sep 08, 2016 at 03:52:42PM +, Kris G. Lindgren wrote:
> > I completely agree about the general rule of thumb.  I am only looking at 
> > the team that specifically supports openstack.  For us frontend support for 
> > public clouds is handled by another team/org altogether.
> 
> in my previous job the ratio was 1 openstack guy / 300 prod hv and
> ~ 50 non prod hypervisors (non prod clouds).
> 
> we had 5 different clouds, 2 biggest clouds shared keystone and
> glance (same dc, different buildings, almost lan latency). the biggest cloud 
> had 2 regions
> (different connectivity on same dc building)
> 
> a different team took care of the underlying hw, live migrations (when
> necessary but usually escalated to the openstack team) and install the
> computes running a single salt stack command. another team developed a
> in-house horizon replacement
> 
> that job definitively burned me, i'd say that the sweet spot is about
> 1 person every 200 hv, but if your regions are very similar and you have
> control of the whole stack (i didn't) i think 1 person every 300 hv is
> doable
> 
> we only used nova, neutron (provider networks), glance and keystone.
> 

This ratio is entirely dependent on the point of the cloud, and where
one's priorities lie.

If you have a moderately sized private cloud (< 200 hv's) with no AZ's,
and 1 region, and an uptime SLA of 99.99%, I'd expect 1 FTE would be
plenty. But larger clouds that expect to continuously grow should try to
drive it down as low as possible so that value can be extracted from the
equipment in as small a batch size as possible. The higher that ratio,
the more value we're getting from the servers themselves.

Basically, I'd expect this to scale logarithmically, with clouds in the
1 - 500 hv range being similar in total cost, but everything above that
leveling off in total cost growth, but continuing a more or less linear
path up in value proposition. The only way we get there is to attack
complexity with a vengeance.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Liberty RabbitMQ and ZeroMQ

2016-08-14 Thread Clint Byrum
Excerpts from William Josefsson's message of 2016-08-14 15:39:06 +0800:
> Hi everyone,
> 
> I see advice in replacing RabbitMQ with ZeroMQ. I've been running 2
> clusters Liberty/CentOS7 with RabbitMQ now for while. The larger
> cluster consists of 3x Controllers and 4x Compute nodes. RabbitMQ is
> running is HA mode as per:
> http://docs.openstack.org/ha-guide/shared-messaging.html#configure-rabbitmq-for-ha-queues.
> 

For 7 real computers, RabbitMQ is actually a better choice. You get
centralized management and the most battle-tested driver of all.

ZeroMQ is meant to remove the bottleneck and SPOF of a RabbitMQ cluster
from much larger systems by making the data path for messaging directly
peer-to-peer, but it still needs a central matchmaker database. So at
that scale, you're not really winning much by using it.

I can't really speak to the answers for your problems that you've seen,
but in general I'd expect Liberty and Mitaka on RabbitMQ to handle your
cluster size without breaking a sweat. Have you reported the errors as
bugs in oslo.messaging? That might be where to start:

https://bugs.launchpad.net/oslo.messaging/+filebug

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Too many connections

2016-05-30 Thread Clint Byrum
Excerpts from Kris G. Lindgren's message of 2016-05-30 15:33:49 +:
> You are most likely running db pools with a number of worker processes.  If 
> you look at the MySQL connections most of them will be idle.  If that's the 
> case set the db pool timeout lower.  Lower the pool size down.  Each worker 
> thread opens a connection pool to the database.  If you are running 10 
> workers with a min db pool size of 5 and a max of 10.  You will have a 
> minimum number of 50 db connections, max 100, per server running that service.
> 
> 
> I would be looking at: pool_timeout, min_pool_size, max_pool_size
> 
> http://docs.openstack.org/developer/oslo.db/opts.html
> 

This is great information Kris.

It's also worth noting that MySQL connections that are idle eat up very
little RAM and so you can probably bump it up a bit.

The setting for that is max_connections:

https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_max_connections

The default of 151 is pretty conservative. You can probably safely raise
it to 400 on anything but the most memory constrained MySQL server.
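
To make those knobs concrete, a hedged sketch (numbers purely illustrative,
not recommendations; the [database] section lives in each service's .conf,
e.g. nova.conf or neutron.conf):

  [database]
  min_pool_size = 1
  max_pool_size = 5
  max_overflow = 10
  pool_timeout = 30

and on the MySQL side, in my.cnf or live:

  SET GLOBAL max_connections = 400;

Multiply (max_pool_size + max_overflow) by the number of API/worker processes
across all services to sanity-check that max_connections actually covers the
worst case.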

> 
> On May 30, 2016, at 9:24 AM, Fran Barrera wrote:
> 
> Hi,
> 
> I'm using Mitaka on ubuntu 16.04 and I have many problems in horizon. I can 
> see this in the logs of all components: "OperationalError: 
> (pymysql.err.OperationalError) (1040, u'Too many connections')" If I increase 
> the max_connections on mysql works well a few minutes but the same error. 
> Maybe Openstack don't close connections with Mysql. The version of Mysql is 
> 5.7.
> 
> Any suggestions?
> 
> Regards,
> Fran

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] New networking solution for Cloud Native apps....

2016-02-03 Thread Clint Byrum
Excerpts from Chris Marino's message of 2016-02-01 06:08:34 -0800:
> Hello everyone, just wanted to let you know that today we opened up the
> repos for the new open source networking project we’ve been working on.
> It’s called Romana and the project site is romana.io.
> 
> Thought you would be interested because it enables multi-tenant networking
> without a virtual network overlay. It's targeted for use with applications
> that only need L3 networks so we’ve been able to eliminate and simplify
> many things to make the network faster, and easier to build and operate.
> 
> If you run these kind of Cloud Native apps on OpenStack (or even directly
> on bare metal with Docker or Kubernetes), we’d love to hear what you think.
> We’re still working on the container CNM/CNI integration. Any and all
> feedback is welcome.
> 
> The code is on Github at github.com/romana and you can see how it all works
> with a demo we’ve set up that lets you install and run OpenStack on EC2.
> 
> You can read about how Romana works on the project site, here. In summary,
> it extends the physical network hierarchy of a layer 3 routed access design
> from spine and leaf switches on to hosts, VMs and containers.
> 
> This enables a very simple and intuitive tenancy model: For every tenant
> (and each of their network segments) there is an actual physical network
> CIDR on each host, with all tenants sharing the host-specific address
> prefix.  The advantage of this is that route aggregation makes route
> distribution unnecessary and collapses the number of iptables rules
> required for segment isolation. In addition, traffic policies, such as
> security rules, can easily be applied to those tenant or segment specific
> CIDRs across all hosts.
> 
> Any/all comments welcome.

Really interesting, thanks Chris. For baremetal, which is a very real
thing for users of OpenStack right now, this presents some challenges.

The agents that sit on compute nodes in Romana are not going to be able
to enforce any isolation themselves, since baremetal nodes will end
up on the same L2. The agents would either have to get back into the
business Neutron ML2 is in, of configuring switches through a mechanism
driver, or servers would have to self-isolate, which may not be obvious
or acceptible for some. I wonder if you've thought through any other
solution to that particular problem.

I also think you should share this on openstack-dev, as the developers
may also be aware of other efforts that may conflict with or complement
Romana.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] I have an installation question and possible bug

2016-01-25 Thread Clint Byrum
Excerpts from Christopher Hull's message of 2016-01-25 09:11:59 -0800:
> Hello all;
> 
> I'm an experienced developer and I work at Cisco.  Chances are I've covered
> the basics here,but just in case, check me.
> I've followed the Kilo install instructions to the letter so far as I can
> tell.   I have not installed Swift, but I think everything else, and my
> installation almost works.   I'm having a little trouble with Glance.
> 
> It seems that when I attempt to create a large image (that may or may not
> be the issue), the checksum that Glance records in its DB is incorrect.
> Cirros image runs just fine.  CentOS cloud works.  But when I offload and
> create an image from a big CentOS install (say 100gb), nova says the
> checksum is wrong when I try to boot it.
> 

Did you check the file that glance saved to disk to make sure it was
the same one you uploaded? I kind of wonder if something timed out and
did not properly report the error, leading to a partially written file.

Also, is there some reason you aren't deploying Liberty?
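
For the on-disk check, a rough sketch assuming the default filesystem store
(the image ID here is a placeholder):

  # what glance thinks the checksum is
  glance image-show IMAGE_ID | grep checksum

  # what actually landed in the store
  md5sum /var/lib/glance/images/IMAGE_ID

If those two don't match, the upload was truncated or corrupted; if they do
match but nova still complains, the problem is on the download/boot side.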

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Keystone token HA

2015-12-18 Thread Clint Byrum
Excerpts from Ajay Kalambur (akalambu)'s message of 2015-12-17 22:48:24 -0800:
> Hi
> If we deploy Keystone using memcached as token backend we see that bringing 
> down 1 of 3 memcache servers results in some tokens getting invalidated. Does 
> memcached not support replication of tokens
> So if we wanted HA w.r.t keystone tokens should we use SQL backend for tokens?
> 

I'd recommend using Fernet + SQL (for revocation events). Not having to
store all of the tokens is worth the extra CPU to validate/generate.

If you do use SQL as the backend for UUID, make sure you're cleaning up
expired tokens aggressively.
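
For reference, a minimal sketch of both options (exact values vary by
release; on Kilo the provider is the full class path
keystone.token.providers.fernet.Provider, on Liberty the short name works):

  # keystone.conf
  [token]
  provider = fernet

  # one-time Fernet key setup
  keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone

  # if you stay on UUID+SQL, flush expired tokens from cron
  keystone-manage token_flush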

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Service Catalog TNG urls

2015-12-06 Thread Clint Byrum
Excerpts from Xav Paice's message of 2015-12-05 13:26:23 -0800:
> >
> >
> > >> SecurityAuditing/Accounting:
> > >> Having a separate internal API (for the OpenStack services) from the
> > >> Public API (for humans and remote automation), allows the operator to
> > >> apply a strict firewall in front of the public API to restrict access
> > >> from outside the cloud. Such a device may also help deflect/absorb a
> > >> DOS attack against the API. This firewall can be an encryption
> > >> endpoint, so the traffic can be unencrypted and examined or logged. I
> > >> wouldn't want the extra latency of such a firewall in front of all my
> > >> OpenStack internal service calls.
> > >>
> > >
> > > This one is rough. One way to do it is to simply host the firewall in
> > > a DMZ segment, setting up your routes for that IP to go through the
> > > firewall. This means multi-homing the real load balancer/app servers to
> > > have an IP that the firewall can proxy to directly.
> > >
> > > But I also have to point out that not making your internal servers pass
> > > through this is another example of a squishy center, trading security
> > > for performance.
> > >
> >
> 
> It's not just the firewall issue though - the Keystone adminurl and
> publicurl is important to us because it allows us to open the publicurl to
> customers, and know that admin actions are protected by another layer.  We
> don't want admin to be available to the public for any service - but we do
> want our customers to have API access.  With fancy url redirects and
> proxies we can achieve this with two sets of API servers, each with their
> own policy.json, but that's larger and more complex than necessary and
> would be better if the individual services were to reject 'admin' calls
> from the publicurl.

I respect that this is what works for you and we shouldn't require you to
change your ways without good reason. However, I just want to point out
that if you don't trust Keystone's own ACL's to prevent administrative
access by users who haven't been granted access, then you also don't
trust Keystone to keep users out of each-others accounts!

That said, if there really is a desire to keep admin functions separate
from user functions, why not formalize that and make it an entirely
separate service in the catalog? So far, Keystone is the only service
to make use of "adminurl". So a valid path forward is to simply make it
a different entry.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Service Catalog TNG urls

2015-12-06 Thread Clint Byrum
Excerpts from Xav Paice's message of 2015-12-06 14:09:58 -0800:
> On 7 December 2015 at 05:38, Clint Byrum <cl...@fewbar.com> wrote:
> 
> > Excerpts from Xav Paice's message of 2015-12-05 13:26:23 -0800:
> > > >
> >
> > I respect that this is what works for you and we shouldn't require you to
> > change your ways without good reason. However, I just want to point out
> > that if you don't trust Keystone's own ACL's to prevent administrative
> > access by users who haven't been granted access, then you also don't
> > trust Keystone to keep users out of each-others accounts!
> >
> >
> That's an excellent point, and one which scares me quite a lot.  But that's
> the sad reason we need two lots of API servers - so even if someone were to
> get hold of an admin userid/password, they still can't go deleting the
> entire cloud.  It does at least limit the damage.
> 

Indeed, this is one paradigm where it actually creates a hardened core,
not a squishy one, so I think it's not a terrible idea.

> > That said, if there really is a desire to keep admin functions separate
> > from user functions, why not formalize that and make it an entirely
> > separate service in the catalog? So far, Keystone is the only service
> > to make use of "adminurl". So a valid path forward is to simply make it
> > a different entry.
> >
> 
> Keystone is indeed the only one that does this - I hesitate to say "right"
> because it might not be.
> 
> I'm not sure I follow when you say separate service - you mean a completely
> different service, with a full set of endpoints?  Makes sense if the
> projects that use the catalogue also honour that, but I don't know I see
> the difference between having a different service for admin requests, and a
> split admin url and public url.  Maybe I'm just being thick here, but I had
> thought that was the original intention despite it never being used by
> anyone other than Keystone.
> 

What I mean is, you'd just have two completely separate _services_,
instead of one service, with two separate endpoint "interface" entries:

now:

[
  {
    "name": "keystone",
    "id": "bd73972c0e14fb69bae8ff76e112a90",
    "type": "identity",
    "endpoints": [
      {
        "id": "29beb2f1567642eb810b042b6719ea88",
        "interface": "admin",
        "region": "RegionOne",
        "url": "http://your.network.internal:35357/v2.0"
      },
      {
        "id": "8707e3735d4415c97ae231b4841eb1c",
        "interface": "internal",
        "region": "RegionOne",
        "url": "http://your.network.internal:5000/v2.0"
      },
      {
        "id": "ef303187fc8d41668f25199c298396a5",
        "interface": "public",
        "region": "RegionOne",
        "url": "http://your.cloud.com:5000/v2.0"
      }
    ]
  }
]

Proposed:

[
  {
    "name": "keystone",
    "id": "bd73972c0e14fb69bae8ff76e112a90",
    "type": "identity",
    "endpoints": [
      {
        "id": "29beb2f1567642eb810b042b6719ea88",
        "region": "RegionOne",
        "url": "http://your.cloud.com:35357/v2.0"
      }
    ]
  },
  {
    "name": "keystone-admin",
    "id": "94e572e15aad443a937b100be977b26c",
    "type": "identity-admin",
    "endpoints": [
      {
        "id": "8707e3735d4415c97ae231b4841eb1c",
        "region": "RegionOne",
        "url": "http://your.network.internal:5000/v2.0"
      }
    ]
  }
]

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Performance][Proposal] Moving IRC meeting from 15:00 UTC to 16:00 UTC

2015-12-04 Thread Clint Byrum
Excerpts from Dina Belova's message of 2015-12-04 01:46:06 -0800:
> Dear performance folks,
> 
> There is a suggestion to move our meeting time from 15:00 UTC (Tuesdays)
> to 16:00 UTC (also Tuesdays) to make them more comfortable for US guys.
> 
> Please leave your +1 / -1 here in the email thread.
> 
> Btw +1 from me :)

+1, this makes it actually possible for me to participate. :)

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Service Catalog TNG urls

2015-12-03 Thread Clint Byrum
Excerpts from Dan Sneddon's message of 2015-12-03 09:43:59 -0800:
> On 12/03/2015 06:14 AM, Sean Dague wrote:
> > For folks that don't know, we've got an effort under way to look at some
> > of what's happened with the service catalog, how it's organically grown,
> > and do some pruning and tuning to make sure it's going to support what
> > we want to do with OpenStack for the next 5 years (wiki page to dive
> > deeper here - https://wiki.openstack.org/wiki/ServiceCatalogTNG).
> > 
> > One of the early Open Questions is about urls. Today there is a
> > completely free form field to specify urls, and there are conventions
> > about having publicURL, internalURL, adminURL. These are, however, only
> > conventions.
> > 
> > The only project that's ever really used adminURL has been Keystone, so
> > that's something we feel we can phase out in new representations.
> > 
> > The real question / concern is around public vs. internal. And something
> > we'd love feedback from people on.
> > 
> > When this was brought up in Tokyo the answer we got was that internal
> > URL was important because:
> > 
> > * users trusted it to mean "I won't get charged for bandwidth"
> > * it is often http instead of https, which provides a 20% performance
> > gain for transferring large amounts of data (i.e. glance images)
> > 
> > The question is, how hard would it be for sites to be configured so that
> > internal routing is used whenever possible? Or is this a concept we need
> > to formalize and make user applications always need to make the decision
> > about which interface they should access?
> > 
> > -Sean
> > 
> 
> I think the real question is whether we need to bind APIs to multiple
> IP addresses, or whether we need to use a proxy to provide external
> access to a single API endpoint. It seems unacceptable to me to have
> the API only hosted externally, then use routing tricks for the
> services to access the APIs.
> 

I'm not sure I agree that using the lowest cost route is a "trick".

> While I am not an operator myself, I design OpenStack networks for
> large (and very large) operators on a regular basis. I can tell you
> that there is a strong desire from the customers and partners I deal
> with for separate public/internal endpoints for the following reasons:
> 
> Performance:
> There is a LOT of API traffic in a busy OpenStack deployment. Having
> the internal OpenStack processes use the Internal API via HTTP is a
> performance advantage. I strongly recommend a separate Internal API
> VLAN that is non-routable, to ensure that no traffic is unencrypted
> accidentally.
> 

I'd be interested in some real metrics on the performance advantage.
It's pretty important to weigh that vs. the loss of security inside
a network. Because this argument leads to the "hard shell, squishy
center" security model, and that leads to rapid cascading failure
(leads.. to..  suffering..). I wonder how much of that
performance loss would be regained by using persistent sessions.

Anyway, this one can be kept by specifying schemeless URLs, and simply
configuring your internal services to default to http, but have the
default for schemeless URLs be https.
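
Purely as a sketch of that idea (this is not an existing Keystone feature,
just an illustration), a schemeless catalog entry would look something like:

  {
    "interface": "public",
    "region": "RegionOne",
    "url": "//your.cloud.com:5000/v2.0"
  }

with clients defaulting the missing scheme to https, and internal services
explicitly configured to fill in http for the low-cost path.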

> SecurityAuditing/Accounting:
> Having a separate internal API (for the OpenStack services) from the
> Public API (for humans and remote automation), allows the operator to
> apply a strict firewall in front of the public API to restrict access
> from outside the cloud. Such a device may also help deflect/absorb a
> DOS attack against the API. This firewall can be an encryption
> endpoint, so the traffic can be unencrypted and examined or logged. I
> wouldn't want the extra latency of such a firewall in front of all my
> OpenStack internal service calls.
> 

This one is rough. One way to do it is to simply host the firewall in
a DMZ segment, setting up your routes for that IP to go through the
firewall. This means multi-homing the real load balancer/app servers to
have an IP that the firewall can proxy to directly.

But I also have to point out that not making your internal servers pass
through this is another example of a squishy center, trading security
for performance.

> Routing:
> If there is only one API, then it has to be externally accessible. This
> means that a node without an external connection (like a Ceph node, for
> instance) would have to either have its API traffic routed, or it would
> have to be placed on an external segment. Either choice is not optimal.
> Routers can be a chokepoint. Ceph nodes should be back-end only.
> 
> Uniform connection path:
> If there is only one API, and it is externally accessible, then it is
> almost certainly on a different network segment than the database, AMQP
> bus, redis (if applicable), etc. If there is an Internal API it can
> share a segment with these other services while the Public API is on an
> external segment.
> 

It seems a little contrary to me that it's preferable to have a
software-specific solution to security 

Re: [Openstack-operators] DIB in container vs VM

2015-12-02 Thread Clint Byrum
Excerpts from Abel Lopez's message of 2015-12-01 16:16:08 -0800:
> Hey everyone,
> I've been running diskimage-builder just fine for over a year inside various 
> VMs, but now, I'm having to run it inside a docker container.
> I'm curious if anyone has experience with making the 'ubuntu latest' docker 
> hub image more like a VM.
> 
> For example, I can `vagrant up` the trusty image from vagrant cloud, and 
> every image I attempt to make "JUST WORKS"
> When I try the same things with `docker run` things fail left right and 
> center.
> 
> I figure there is some combination of packages that need to be installed, or 
> services that need to be running, or mount points that need to exist to make 
> this happen.

I can't really comment without seeing specifics, but I can say that
requirements _should_ be listed here:

http://docs.openstack.org/developer/diskimage-builder/user_guide/installation.html

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] FYI, local conductor mode is deprecated, pending removal in N

2015-11-12 Thread Clint Byrum
Excerpts from Joshua Harlow's message of 2015-11-12 10:35:21 -0800:
> Mike Dorman wrote:
> > We do have a backlog story to investigate this more deeply, we just have 
> > not had the time to do it yet.  For us, it’s been easier/faster to add more 
> > hardware to conductor to get over the hump temporarily.
> >
> > We kind of have that work earmarked for after the Liberty upgrade, in hopes 
> > that maybe it’ll be fixed there.
> >
> > If anybody else has done even some trivial troubleshooting already, it’d be 
> > great to get that info as a starting point.  I.e. which specific calls to 
> > conductor are causing the load, etc.
> >
> > Mike
> >
> 
> +1 I think we in the #openstack-performance channel really need to 
> investigate this, because it really worries me personally from hearing 
> many many rumors about how the remote conductor falls over. Please join 
> there and we can try to work through a plan to figure out what to do 
> about this situation. It would be great if the nova people also joined 
> there (because in the end, likely something in nova will need to be 
> fixed/changed/something else to resolve what appears to be a problem for 
> many operators).
> 

Falling over is definitely a bad sign. ;)

The concept of pushing messages over a bus instead of just making local
calls shouldn't result in much extra load. Perhaps we just have too many
layers of unoptimized encapsulation. I have to wonder if something like
protobuf would help.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [stable][all] Keeping Juno "alive" for longer.

2015-11-09 Thread Clint Byrum
Excerpts from James King's message of 2015-11-09 08:47:08 -0800:
> disclaimer: I’ve never worked in a software auditing department or in a 
> company with one
> 
> What about risk-averse organizations with strict policy compliance 
> guidelines? Can we expect them to audit a new distribution of Openstack every 
> 6 months? Some sort of community-supported LTS system would at least give 
> these consulting firms a base on which to build such a compliant Openstack 
> distribution for industry X.
> 

Nobody has said that the idea of an LTS is bad. The _realities_ are
simply challenging. Upgrades from release to release are already
painful, which I suspect is at least part of the reason many are still
on older releases. Upgrading across 2 years of development would
possibly be a herculean effort that so far nobody has even tried to
tackle without stepping through all intermediary releases.

> If we’re only talking about patches to support minor updates to system 
> packages what’s the cost to the community?
> 

The two biggest are the testing infrastructure and maintaining upgrade
support. The first is already stretched thin, and the latter has
managed to scare away anyone who brings up LTS releases without them
even attempting it.

> I’m not against Tom’s idea and would be satisfied with it but it would be 
> better, I think, to at least give the community an option of a solid base on 
> which to build a compliant Openstack distribution that isn’t going to move 
> out from underneath them in six months.
> 
> Unless of course that should be the job of some distribution maintainer… in 
> which case how to we work with them?
> 

We do work with distro maintainers. They do a ton of work in the stable
branches.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [stable][all] Keeping Juno "alive" for longer.

2015-11-06 Thread Clint Byrum
Excerpts from Tony Breeds's message of 2015-11-05 22:15:18 -0800:
> Hello all,
> 
> I'll start by acknowledging that this is a big and complex issue and I
> do not claim to be across all the view points, nor do I claim to be
> particularly persuasive ;P
> 

 In the future, also consider not making it even more complex by
cross posting!

Indeed, it is not cut and dry.

> Having stated that, I'd like to seek constructive feedback on the idea of
> keeping Juno around for a little longer.  During the summit I spoke to a
> number of operators, vendors and developers on this topic.  There was some
> support and some "That's crazy pants!" responses.  I clearly didn't make it
> around to everyone, hence this email.
> 

While I wish everybody would get on the Continuous Delivery train, I
understand that it's still comforting to many to use stable releases with
the hope that this will somehow keep them more available (it won't, there
are more scalability and resilience bugs fixed than "big risky changes"
landed by quite a large factor, in trunk) or that it will be less change
(it isn't, you can eat the elephant one spoon-full at a time, or prepare
a feast every 6 months).

Until we all embrace the chaos-drip in favor of the chaos-deluge, we
have to keep supporting stable releases.

> Acknowledging my affiliation/bias:  I work for Rackspace in the private
> cloud team.  We support a number of customers currently running Juno that are,
> for a variety of reasons, challenged by the Kilo upgrade.
> 
> Here is a summary of the main points that have come up in my conversations,
> both for and against.
> 
> Keep Juno:
>  * According to the current user survey[1] Icehouse still has the
>biggest install base in production clouds.  Juno is second, which makes
>sense. If we EOL Juno this month that means ~75% of production clouds
>will be running an EOL'd release.  Clearly many of these operators have
>support contracts from their vendor, so those operators won't be left 
>completely adrift, but I believe it's the vendors that benefit from keeping
>Juno around. By working together *in the community* we'll see the best
>results.
> 
>  * We only recently EOL'd Icehouse[2].  Sure it was well communicated, but we
>still have a huge Icehouse/Juno install base.
> 

Wow, yeah, I had not seen these numbers yet. It sounds to me like the
number of operators who can keep up with the 6 month cadence is very low,
and LTS cycles need to be considered.

> For me this is pretty compelling but for balance  
> 
> Keep the current plan and EOL Juno Real Soon Now:
>  * There is also no ignoring the elephant in the room that with HP stepping
>back from public cloud there are questions about our CI capacity, and
>keeping Juno will have an impact on that critical resource.
> 
>  * Juno (and other stable/*) resources have a non-zero impact on *every*
>project, esp. @infra and release management.  We need to ensure this
>isn't too much of a burden.  This mostly means we need enough trustworthy
>volunteers.
> 
>  * Juno is also tied up with Python 2.6 support. When
>Juno goes, so will Python 2.6 which is a happy feeling for a number of
>people, and more importantly reduces complexity in our project
>infrastructure.
> 
>  * Even if we keep Juno for 6 months or 1 year, that doesn't help vendors
>that are "on the hook" for multiple years of support, so for that case
>we're really only delaying the inevitable.
> 
>  * Some number of the production clouds may never migrate from $version, in
>which case longer support for Juno isn't going to help them.
> 
> 
> I'm sure these question were well discussed at the VYR summit where we set
> the EOL date for Juno, but I was new then :) What I'm asking is:
> 
> 1) Is it even possible to keep Juno alive (is the impact on the project as
>a whole acceptable)?
> 

Sure, but...

> Assuming a positive answer:
> 
> 2) Who's going to do the work?
> - Me, who else?

That's really the rub. Historically barely anybody has maintained the
stable branches, though lately that has gotten a bit better.

> 3) What do we do if people don't actually do the work but we as a community
>have made a commitment?

There's certainly an argument to be made for downstream entities to take
this over if we, upstream, don't want it.

> 4) If we keep Juno alive for $some_time, does that imply we also bump the
>life cycle on Kilo and liberty and Mitaka etc?
> 

Probably. Doesn't seem like this problem can go away without making
concessions for longer term support though. One concession might be that
6 months is too short, and extend the cycle out. But that would have
gigantic ripple effects.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Keystone performance issue

2015-10-26 Thread Clint Byrum
Excerpts from Reza Bakhshayeshi's message of 2015-10-27 05:11:28 +0900:
> Hi all,
> 
> I've installed OpenStack Kilo (with help of official document) on a
> physical HP server with following specs:
> 
> 2 Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz each 12 physical core (totally
> 48 threads)
> and 128 GB of Ram
> 
> I'm going to benchmark keystone performance (with Apache JMeter) in order
> to deploy OpenStack in production, but unfortunately I'm facing extremely
> low performance.
> 
> 1000 simultaneously token creation requests took around 45 seconds. (WOW!)
> By using memcached in keystone.conf (following configuration) and threading
> Keystone processes to 48, response time decreased to 18 seconds, which is
> still too high.
> 

I'd agree that 56 tokens per second isn't very high. However, it
also isn't all that terrible given that keystone is meant to be load
balanced, and so you can at least just throw more boxes at it without
any complicated solution at all.

Of course, that's assuming you're running with Fernet tokens. With UUID,
which is the default if you haven't changed it, then you're pounding those
tokens into the database, and that means you need to tune your database
service quite a bit and provide high performance I/O (you didn't mention
the I/O system).

So, first thing I'd recommend is to switch to Liberty, as it has had some
performance fixes for sure. But I'd also recommend evaluating the Fernet
token provider. You will see much higher CPU usage on token validations,
because the caching bonuses you get with UUID tokens aren't as mature in
Fernet even in Liberty, but you should still see an overall scalability
win by not needing to scale out your database server for heavy writes.
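
As a sketch of what the Fernet switch looks like (paths are the defaults,
not anything specific to this deployment):

  # keystone.conf (Kilo syntax; on newer releases 'provider = fernet' works)
  [token]
  provider = keystone.token.providers.fernet.Provider

  [fernet_tokens]
  key_repository = /etc/keystone/fernet-keys/

  # initialize the key repository once on one node, then distribute the keys
  keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone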

> [cache]
> enabled = True
> config_prefix = cache.keystone
> expiration_time = 300
> backend = dogpile.cache.memcached
> backend_argument = url:localhost:11211
> use_key_mangler = True
> debug_cache_backend = False
> 
> I also increased Mariadb, "max_connections" and Apache allowed open files
> to 4096, but they didn't help much (2 seconds!)
> 
> Is it natural behavior? or we can optimize keystone performance more?
> What are your suggestions?

I'm pretty focused on doing exactly that right now, but we will need to
establish some baselines and try to make sure we have tools to maintain
the performance long-term.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] Scheduler proposal

2015-10-08 Thread Clint Byrum
Excerpts from Maish Saidel-Keesing's message of 2015-10-08 00:14:55 -0700:
> Forgive the top-post.
> 
> Cross-posting to openstack-operators for their feedback as well.
> 
> Ed the work seems very promising, and I am interested to see how this 
> evolves.
> 
> With my operator hat on I have one piece of feedback.
> 
> By adding in a new Database solution (Cassandra) we are now up to three 
> different database solutions in use in OpenStack
> 
> MySQL (practically everything)
> MongoDB (Ceilometer)
> Cassandra.
> 
> Not to mention two different message queues
> Kafka (Monasca)
> RabbitMQ (everything else)
> 
> Operational overhead has a cost - maintaining 3 different database 
> tools, backing them up, providing HA, etc. has operational cost.
> 
> This is not to say that this cannot be overseen, but it should be taken 
> into consideration.
> 
> And *if* they can be consolidated into an agreed solution across the 
> whole of OpenStack - that would be highly beneficial (IMHO).
> 

Just because they both say they're databases, doesn't mean they're even
remotely similar.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [Large Deployments Team][Performance Team] New informal working group suggestion

2015-09-24 Thread Clint Byrum
Excerpts from Dina Belova's message of 2015-09-22 05:57:19 -0700:
> Hey, OpenStackers!
> 
> I'm writing to propose to organise new informal team to work specifically
> on the OpenStack performance issues. This will be a sub team in already
> existing Large Deployments Team, and I suppose it will be a good idea to
> gather people interested in OpenStack performance in one room and identify
> what issues are worrying contributors, what can be done and share results
> of performance researches :)
> 
> So please volunteer to take part in this initiative. I hope it will be many
> people interested and we'll be able to use cross-projects session slot
> to meet in Tokyo and hold a kick-off meeting.
> 
> I would like to apologise I'm writing to two mailing lists at the same
> time, but I want to make sure that all possibly interested people will
> notice the email.
> 

Dina, this is great. Count me in, and see you in Tokyo!

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] CMDB

2015-05-07 Thread Clint Byrum
I added information about Assimilation Monitoring to the etherpad. It's
not strictly a CMDB, but it can be used for what CMDB's often are used
for, and is worth noting in the discussion.

Excerpts from Shane Gibson's message of 2015-05-07 08:27:01 -0700:
 
> At Symantec we use YiDB for CMDB data - however - we find it to be lacking as 
> it is NOT a GraphDB, but emulates GraphDB like behavior.  In addition, it 
> requires MongoDB as a backing store - and that will eventually corrupt and 
> lose your data...often silently (please direct your flame mail to /dev/null, 
> thank you).  We also have a project (the lead author) which is a 
> complete rewrite of the Yahoo libCrange tool; called Range++.
> 
> Range++ is a full GraphDB solution designed specifically to act as a CMS 
> (config mgmt service - I dislike the term cmdb).  In addition it can easily 
> also support ENC (external node classifier) duties as well.  Range++ is 
> designed to allow you to describe your topology and environment with the idea 
> of environments and with clusters and clusters within clusters.  
> Range++ is in operation at LinkedIN, Yahoo, and Mozilla.  Range++ is also 
> integrated within the Saltstack tool for targeting via the -R (range 
> cluster) syntax - as it's completely libCrange compatible. Conceptually - 
> you can describe your environments and physical assets, then describe your 
> application via a configuration file (say YAML), then via your config mgmt 
> tooling, prescriptively build that cluster by populating your CM tools 
> metadata, and assigning resources from physical/virtual assets.
> 
> Our existing deployment (bare metal) framework dynamically auto discovers 
> assets as they come online, and populates our CMS with the asset data and 
> information.
> 
> Note that Range++ is in active use and development, and is located on GITHUB: 
> https://github.com/jbeverly/rangexx
> 
> I have updated the Etherpad with this info.
> 
> ~~shane
> 
> 
> On Wed, May 6, 2015 at 10:36 PM, Allamaraju, Subbu su...@subbu.org wrote:
> Hi Tom,
> 
> Thanks for adding this slot.
> 
> We do have a fairly full-fledged CMDB in house that keeps track of all our 
> infra and apps. Unfortunately none of that team is going to be able to make 
> it to the Summit, but I’m trying to have someone do a demo remotely and 
> participate on EtherPad.
> 
> Subbu
> 
> > On May 6, 2015, at 3:04 AM, Tom Fifield t...@openstack.org wrote:
> > 
> > Hi,
> > 
> > Is anyone interested enough in CMDB to run a working session on it at
> > the design summit?
> > 
> > https://libertydesignsummit.sched.org/event/553947ceb7c1c223fa689da188abb9a9
> > 
> > It was suggested on the planning etherpad, but so far we've found no-one
> > interested in running it.
> > 
> > 
> > Regards,
> > 
> > 
> > Tom
> 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [all] QPID incompatible with python 3 and untested in gate -- what to do?

2015-04-14 Thread Clint Byrum
Hello! There's been some recent progress on python3 compatibility for
core libraries that OpenStack depends on[1], and this is likely to open
the flood gates for even more python3 problems to be found and fixed.

Recently a proposal was made to make oslo.messaging start to run python3
tests[2], and it was found that qpid-python is not python3 compatible yet.

This presents us with questions: Is anyone using QPID, and if so, should
we add gate testing for it? If not, can we deprecate the driver? In the
most recent survey results I could find [3] I don't even see message
broker mentioned, whereas Databases in use do vary somewhat.

Currently it would appear that only oslo.messaging runs functional tests
against QPID. I was unable to locate integration testing for it, but I
may not know all of the places to dig around to find that.

So, please let us know if QPID is important to you. Otherwise it may be
time to unburden ourselves of its maintenance.

[1] https://pypi.python.org/pypi/eventlet/0.17.3
[2] https://review.openstack.org/#/c/172135/
[3] 
http://superuser.openstack.org/articles/openstack-user-survey-insights-november-2014

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Packaging with fpm

2015-03-04 Thread Clint Byrum
Excerpts from Mathieu Gagné's message of 2015-03-04 08:31:45 -0800:
> Hi,
> 
> I'm currently experimenting with fpm.
> 
> I learned that fpm does not generate the files needed to upload your new 
> package to an APT repository. Since the package type built by fpm is 
> binary, that file would be the .changes control file.
> 
> This bothers me a lot because my current workflow looks like this:
> 1) Fork Ubuntu Cloud Archive OpenStack source packages
> 2) Apply custom patches using quilt [1]
> 3) Build source and binary packages using standard dpkg tools
> 4) Upload source and binary packages to private APT repository with dput
> 5) Install new packages
> 
> (repeat steps 2-4 until a new upstream release is available)
> 
> While I didn't test fpm against OpenStack packages, I did test it with 
> other internal projects. I faced the same challenges and came to similar 
> conclusions:
> 
> If I used fpm instead, step 4 would fail because there is no .changes 
> control file required by dput to upload to APT.
> 
> This raises the question:
> 
> How are people (using fpm) managing and uploading their deb packages for 
> distribution? APT? Maven? Pulp? Black magic?
> 
> I really like APT repositories and would like to continue using them for 
> the time being.

I'm impressed you took the time to setup dput!

You really only need to run apt-ftparchive on a directory full of debs:

apt-ftparchive packages path/to/your/debs | gzip > Packages.gz

You can also use reprepro, which is somewhat handy for combining a
remote repo with locally built debs:

http://mirrorer.alioth.debian.org/
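
A minimal reprepro sketch, in case it helps (basedir, codename and package
name here are made up):

  # /srv/repo/conf/distributions
  Codename: trusty
  Components: main
  Architectures: amd64 source

  # then add packages with:
  reprepro -b /srv/repo includedeb trusty mypackage_1.0-1_amd64.deb

reprepro then maintains the Packages/Release files for you, which is handy
when you don't want to hand-run apt-ftparchive.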

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Packaging with fpm

2015-03-04 Thread Clint Byrum
Excerpts from Mathieu Gagné's message of 2015-03-04 09:39:34 -0800:
> On 2015-03-04 12:18 PM, Clint Byrum wrote:
> > Excerpts from Mathieu Gagné's message of 2015-03-04 08:31:45 -0800:
> > 
> > > I really like APT repositories and would like to continue using them for
> > > the time being.
> > 
> > I'm impressed you took the time to setup dput!
> 
> It's super simple to setup and use. Create a basic dput.cf and you are 
> good to go.
> 
> > You can also use reprepro, which is somewhat handy for combining a
> > remote repo with locally built debs:
> 
> I use reprepro too. Super simple to setup and use, would recommend.
> 
> > You really only need to run apt-ftparchive on a directory full of debs:
> > 
> > apt-ftparchive packages path/to/your/debs | gzip > Packages.gz
> 
> This is something I would like to avoid as I might not always have full 
> shell access to the repository from where the package is built.
> 
> Furthermore, I don't have access to all the packages in the repository 
> in the same folder to manually generate Packages.gz. (reprepro can do it 
> for me)
> 
> Ideally, I would like to upload a signed .changes control file to ensure 
> the package wasn't tampered with or got corrupted during the transfer. 
> (since .changes contains checksums)
> 

So I guess I didn't realize that dput was that simple to make work for a
private repo. That's pretty interesting.

I do think that fpm not producing a .changes file is probably just a
matter of teaching fpm how to run the step that produces the changes
file, which probably wouldn't be as hard as changing all of your
workflow at this point.
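
For anyone following along, the .changes file normally falls out of
dpkg-genchanges at the end of a dpkg-buildpackage run, which is the step fpm
skips. And since dput came up: a minimal dput.cf for a private repo is really
just (host and paths made up):

  [internal]
  fqdn = apt.internal.example.com
  method = scp
  incoming = /srv/repo/incoming
  login = aptupload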

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Clint Byrum
Excerpts from George Shuklin's message of 2015-02-11 17:59:02 -0800:
> Ceilometer is in a sad state.
> 
> 1. Collector leaks memory. We ran it on the same host with mongo, and it 
> grabbed 29Gb out of 32, leaving mongo with less than a gig of memory available.

I wonder how hard it would be to push Ceilometer down the road of being
an OpenStack shim for collectd instead of a full implementation. This
would make the problem above go away, as collectd is written in C and is
well known to be highly optimized for exactly this type of workload.

You would need a more advanced AMQP plugin that understands how to turn
the notifications in OpenStack into collectd values, and then make some
decisions on whether to keep Ceilometer's SQL/MongoDB backend or just
teach Ceilometer to read from the various collectd output formats. I
think the latter will be a bigger win, but the former would be easier
for a more incremental migration.

Anyway, if people are interested in saving Ceilometer from being a
bit sluggish, that seems like a good first step in the investigation.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Clint Byrum
Excerpts from George Shuklin's message of 2015-02-05 05:09:51 -0800:
> Hello everyone.
> 
> We are updating our public images regularly (to provide them to 
> customers in up-to-date state). But there is a problem: If some instance 
> starts from image it becomes 'used'. That means:
> * That image is used as _base for nova
> * If instance is reverted this image is used to recreate instance's disk
> * If instance is rescued this image is used as rescue base
> * It is redownloaded during resize/migration (on a new compute node)
> 

Some thoughts:

* All of the operations described should be operating on an image ID. So
the other suggestions of renaming seems the right way to go. Ubuntu
14.04 becomes Ubuntu 14.04 02052015 and the ID remains in the system
for a while. If something inside Nova doesn't work with IDs, it seems
like a bug.

* rebuild, revert, rescue, and resize, are all very _not_ cloud things
that increase the complexity of Nova. Perhaps we should all reconsider
their usefulness and encourage our users to spin up new resources, use
volumes and/or backup/restore methods, and then tear down old instances.

One way to encourage them is to make it clear that these operations will
only work for X amount of time before old versions images will be removed.
So if you spin up Ubuntu 14.04 today, reverts and resizes and rescues
are only guaranteed to work for 6 months. Then aggressively clean up 
6 month old image ids. To make this practical, you might even require
a role, something like reverter, rescuer, resizer and only allow
those roles to do these operations, and then before purging images,
notify those users in those roles of instances they won't be able to
resize/rescue/revert anymore.

It also makes no sense to me why migrating an instance requires its
original image. The instance root disk is all that should matter.
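
A sketch of the rename-then-replace flow with the glance CLI of that era
(names and file are made up):

  # keep the old image, and its ID, around under a dated name
  glance image-update --name "Ubuntu 14.04 20150205" OLD_IMAGE_ID

  # publish the refreshed image under the stable name
  glance image-create --name "Ubuntu 14.04" --disk-format qcow2 \
    --container-format bare --is-public True --file ubuntu-14.04.qcow2

Instances already booted keep referencing the old ID, so _base, revert and
rescue keep working until you decide to purge it.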

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Heat in production?

2014-12-15 Thread Clint Byrum
We (the TripleO developers) use Heat as part of TripleO to manage small
(40 nodes) clouds for TripleO. This means that we're using Heat to manage
our bare metal nodes which are in turn managed by nova+ironic.

For the most part it is only used to bring up new nodes or change
parameters (not something we do often). For that purpose, it works fine.

Icehouse Heat lacks some really nice features that came out in Juno,
not the least of which being that Juno's Heat can resume from failed
operations where Icehouse cannot.

Also, my employer, HP, is shipping Heat as part of the Helion distribution
of OpenStack. I'm not certain if anybody is in full production using
the Heat in Helion just yet, as it only recently released.

I also know that Rackspace has been running some kind of Heat public
beta/alpha/something for a while, so it might be worth asking them about
their Heat.

Excerpts from Ari Rubenstein's message of 2014-12-15 15:10:50 -0800:
> Hi there,
> My name is Ari Rubenstein, and I'm new to the list.  I've been researching 
> Heat.  I was wondering if anyone has direct experience or links to how 
> customers are using Heat orchestration in the real world.
> Use cases?  Experiences?  Best practices?  Gotchas?
> I'm working with a customer on an Icehouse based cluster.  They're wondering 
> if Heat is ready for production or should be considered beta.  I've heard 
> good things about Heat, but the customer is cautious.
> Thanks in advance,
> - Ari

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Operations project: Packaging

2014-11-19 Thread Clint Byrum
Excerpts from Michael Chapman's message of 2014-11-17 22:16:28 -0800:
> Hi all,
> 
> Packaging was one of the biggest points of interest in the Friday Paris
> meeting, and I'd like to use this thread to have a centralised discussion
> and/or argument regarding whether there is a packaging system that is
> flexible enough that we can adopt it as a community and reduce the
> fragmentation. This conversation began in Paris, but will likely continue
> for some time.
> 
> The Friday session indicates that as operators we have common requirements:
> 
> A system that takes the sources from upstream projects and produces
> artifacts (packages or images).
> 
> There are numerous projects that have attempted to solve this problem. Some
> are on stackforge, some live outside. If you are an author or a user of one
> of these systems, please give your opinion.
> 
> Once it becomes clear who is interested in this topic, we can create a
> working group that will move towards standardising on a single system that
> meets the needs of the community. Once the key individuals for this project
> are clear, we can schedule an appropriate meeting time on irc for further
> discussion that will probably be needed.

The responses to this thread have been pretty interesting.

Count me in as interested in pulling packages into the image building
that we do for TripleO. A number of operators have expressed confusion
and concern about updating an app with whole image updates, so it would
be quite useful if we could instead just help people build and distribute
their own packages on top of the image based deploys that we already do.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Proposal for an 'Operations' project

2014-11-09 Thread Clint Byrum
Excerpts from Jeremy Stanley's message of 2014-11-08 16:23:02 -0800:
> On 2014-11-08 04:08:08 -0500 (-0500), Fischer, Matt wrote:
> [...]
> > Perhaps some of the code fits in some places as previously
> > mentioned on the list, but the issue is that none of those
> > projects really focus on operations. The projects are inevitably
> > developer focused, despite the best attempts to get operator
> > feedback.
> [...]
> 
> I would counter that we have lots of operator-focused projects
> already underway... the Infra and QA teams, for example, have plenty
> of projects which are entirely shell scripts and configuration
> management. If you were in any of the Deployment Program's design
> sessions, there was a fairly consistent message that Triple-O is
> encouraging direct involvement from the various config management
> teams to bring more officialness to the diverse tools with which
> OpenStack is deployed and managed at production sites.
> 
> If the projects you want to start aren't focused on deployment and
> lifecycle management, nor on community infrastructure tools, nor on
> documentation, then I would buy that there's some potential project
> use cases for which we haven't made suitable homes yet. But I'd hate
> to see that used to further what I see as an unnecessary separation
> between developers and operators.

There's this crazy movement underway called DevOps where we stop
treating these two groups as independent victims of each-other's
conflicting responsibilities. Instead we need to see each as simply _more_
focused on one responsibility or another, but all on the same team with
the same end goal of a stable, agile deployment.

Given that most of us seem to believe that, I am really confused
why anybody sees the Deployment Program as anything other than an
operations focused project. It is entirely focused on deploying and
operating OpenStack. The fact that we've stated we will use components
of OpenStack whenever that is possible is secondary to the first charge
of actually deploying a managable OpenStack cloud.

Now I understand, in the past we have been entirely prescriptive and
opinionated on levels that have made large portions of the operators
feel excluded. That may have been necessary to make some early progress,
but I don't believe it will continue.

I sat with several people who attended the gathering of operators that
this thread references, and at least the few of us there agreed that most
of what they discussed wasn't specifically about deploying OpenStack,
but about deploying it with all of the supporting tools that surround
OpenStack. This sounds like the beginnings of a complimentary program.

I am quite excited to see some discussion around coalescing these
efforts. Having diverse deployments is useful for finding efficient ways
to solve real problems. How you get logs into LogStash and what Kibana
queries you use to find issues is interesting. Whether you express that
you want your logs in LogStash as Chef, Puppet, or diskimage-builder
elements with os-apply-config templates and os-refresh-config scripts
is pretty uninteresting in comparison.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Proposal for an 'Operations' project

2014-11-06 Thread Clint Byrum
Excerpts from Michael Chapman's message of 2014-11-06 12:20:52 +0100:
> Hi Operators!
> 
> I felt like OpenStack didn't have enough projects, so here's another one.
> 
> During the summit I feel like I'm repeatedly having the same conversations
> with different groups about bespoke approaches to operational tasks, and
> wrapping these in a project might be a way to promote collaboration on them.
> 
> Off the top of my head there's half a dozen things that might belong here:
> 
>  - Packaging tooling (spec files/fpm script/whatever)
>  - (Ansible/other) playbooks for common tasks
>  - Log aggregation (Logstash/Splunk) filters and tagging
>  - Monitoring (Nagios) checks
>  - Ops dashboards
> 
> There's also things that *might* belong here but maybe not:
> 
>  - Chef/Puppet/Ansible/Salt OpenStack config management modules
> 
> Today these are generally either wrapped up in products from various
> companies, or in each company's github repo.
> 

Except in the case of TripleO which is available as an official
OpenStack project.

I think we're at a point where we're far more open than we've ever been
before to the idea of collaborating with many different tools and
approaches. So, would you entertain the idea of participating in the
Deployment program rather than starting something new?

> For those of you who are still around at the design summit and don't have
> plans for tomorrow, how about we meet on Friday morning at the large white
> couch in the meridian foyer at 9am? Let's see what we can sort out.
> 

You're more than welcome in the TripleO meetup tomorrow in Gaughin.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators