[Openstack-operators] osops-tools-monitoring Dependency problems

2018-10-19 Thread Tomáš Vondra
Hi!
I'm a long-time user of monitoring-for-openstack, also known as oschecks.
Specifically, I used a version from 2015 with the OpenStack Python client
libraries from Kilo. Now I have upgraded them to Mitaka and it broke.
Even the latest oschecks don't work. I didn't quite expect that, given that
there are several commits from this year, e.g. by Nagasai Vinaykumar
Kapalavai and paramite. Can one of them, or some other user, step up and say
which version of the OpenStack clients oschecks works with? Ideally, write
it down in requirements.txt so that it is reproducible. Also, some
documentation of the minimal set of parameters would come in handy.
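For illustration, even something as small as this in requirements.txt would help; the bounds below are placeholders only, since the real tested versions are exactly the missing information:

python-keystoneclient>=1.3,<2.0   # placeholder bounds, for illustration only
python-glanceclient>=0.17,<1.0    # placeholder bounds, for illustration only
python-novaclient>=2.23,<3.0      # placeholder bounds, for illustration only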
Thanks a lot, Tomas from Homeatcloud

The error messages are as absurd as:
oschecks-check_glance_api --os_auth_url='http://10.1.101.30:5000/v2.0'
--os_username=monitoring --os_password=XXX --os_tenant_name=monitoring

CRITICAL: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oschecks/utils.py", line 121, in
safe_run
method()
  File "/usr/lib/python2.7/dist-packages/oschecks/glance.py", line 29, in
_check_glance_api
glance = utils.Glance()
  File "/usr/lib/python2.7/dist-packages/oschecks/utils.py", line 177, in
__init__
self.glance.parser = self.glance.get_base_parser(sys.argv)
TypeError: get_base_parser() takes exactly 1 argument (2 given)

(I can see 4 parameters on the command line.)


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey

2018-04-27 Thread Tomáš Vondra
Hi!

What we've got in our small public cloud:

 

scheduler_default_filters=AggregateInstanceExtraSpecsFilter,
AggregateImagePropertiesIsolation,
RetryFilter,
AvailabilityZoneFilter,
AggregateRamFilter,
AggregateDiskFilter,
AggregateCoreFilter,
ComputeFilter,
ImagePropertiesFilter,
ServerGroupAntiAffinityFilter,
ServerGroupAffinityFilter

# ComputeCapabilitiesFilter is off because of a conflict with
# AggregateInstanceExtraSpecsFilter: https://bugs.launchpad.net/nova/+bug/1279719

 

I really like to set resource limits using Aggregate metadata.

Also, Windows host isolation is done using image metadata. I have filed a bug 
somewhere that it does not work correctly with Boot from Volume. I believe it 
got pretty much ignored. That's why we also use flavor metadata.
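For illustration, the kind of setup I mean looks roughly like this (just a sketch; the aggregate, host, image, and flavor names are made up):

openstack aggregate create windows-hosts
openstack aggregate add host windows-hosts compute-01
openstack aggregate set --property os_distro=windows windows-hosts
# image-based isolation: AggregateImagePropertiesIsolation compares image
# properties against the aggregate metadata
openstack image set --property os_distro=windows win2016-image
# flavor-based fallback: AggregateInstanceExtraSpecsFilter compares flavor
# extra specs against the aggregate metadata
openstack flavor set --property aggregate_instance_extra_specs:os_distro=windows win.large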

 

Tomas from Homeatcloud

 

From: Massimo Sgaravatto [mailto:massimo.sgarava...@gmail.com] 
Sent: Saturday, April 21, 2018 7:49 AM
To: Simon Leinen
Cc: OpenStack Development Mailing List (not for usage questions); OpenStack 
Operators
Subject: Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler 
filters survey

 

enabled_filters = 
AggregateInstanceExtraSpecsFilter,AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,AggregateRamFilter,AggregateCoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

 

Cheers, Massimo

 

On Wed, Apr 18, 2018 at 10:20 PM, Simon Leinen  wrote:

Artom Lifshitz writes:
> To that end, we'd like to know what filters operators are enabling in
> their deployment. If you can, please reply to this email with your
> [filter_scheduler]/enabled_filters (or
> [DEFAULT]/scheduler_default_filters if you're using an older version)
> option from nova.conf. Any other comments are welcome as well :)

We have the following enabled on our semi-public (academic community)
cloud, which runs on Newton:

AggregateInstanceExtraSpecsFilter
AvailabilityZoneFilter
ComputeCapabilitiesFilter
ComputeFilter
ImagePropertiesFilter
PciPassthroughFilter
RamFilter
RetryFilter
ServerGroupAffinityFilter
ServerGroupAntiAffinityFilter

(sorted alphabetically) Recently we've also been trying

AggregateImagePropertiesIsolation

...but it looks like we'll replace it with our own because it's a bit
awkward to use for our purpose (scheduling Windows instances to licensed
compute nodes).
-- 
Simon.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

 



Re: [Openstack-operators] [openstack-dev] [nova] about rebuild instance booted from volume

2018-03-14 Thread Tomáš Vondra
Hi!
I say delete! Delete them all!
Really, it's called delete_on_termination and should be ignored on Rebuild.
We have a VPS service implemented on top of OpenStack, and we do throw the old 
contents away on Rebuild. When the user has paid for the Backup service, they 
can restore a snapshot. Backup is implemented as a volume snapshot, then a 
volume clone, then an upload to an image (Glance is on a different disk array).

I also sometimes multi-attach a volume manually to a service node and just dd 
an image onto it. If rebuild were implemented this way, there would be no 
deleting of a volume with delete_on_termination, just overwriting. But the 
effect is the same.
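For the record, the manual overwrite is roughly this (device and image names are placeholders; the attach itself is done out of band on the service node):

# after manually (multi-)attaching the volume to the service node:
dd if=os-image.raw of=/dev/mapper/<volume-wwid> bs=4M conv=fsync
sync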

IMHO you can have snapshots of volumes that have been deleted. It's just that 
some backends, like our 3PAR, don't allow it, but it's not disallowed in the 
API contract.
Tomas from Homeatcloud

-Original Message-
From: Saverio Proto [mailto:ziopr...@gmail.com] 
Sent: Wednesday, March 14, 2018 3:19 PM
To: Tim Bell; Matt Riedemann
Cc: OpenStack Development Mailing List (not for usage questions); 
openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [openstack-dev] [nova] about rebuild 
instance booted from volume

My idea is that if the delete_on_termination flag is set to False, the Volume 
should never be deleted by Nova.

my 2 cents

Saverio

2018-03-14 15:10 GMT+01:00 Tim Bell :
> Matt,
>
> To add another scenario and make things even more difficult (sorry :-) ), if the 
> original volume has snapshots, I don't think you can delete it.
>
> Tim
>
>
> -Original Message-
> From: Matt Riedemann 
> Reply-To: "OpenStack Development Mailing List (not for usage 
> questions)" 
> Date: Wednesday, 14 March 2018 at 14:55
> To: "openstack-...@lists.openstack.org" 
> , openstack-operators 
> 
> Subject: Re: [openstack-dev] [nova] about rebuild instance booted from 
> volume
>
> On 3/14/2018 3:42 AM, 李杰 wrote:
> >
> >      This is the spec about rebuilding an instance booted from volume.
> >      In the spec, there is a question about whether we should delete the
> >      old root_volume. Anyone who is interested in boot from volume can
> >      help to review this. Any suggestion is welcome. Thank you!
> >      The link is here.
> >      Re: the rebuild spec: https://review.openstack.org/#/c/532407/
>
> Copying the operators list and giving some more context.
>
> This spec is proposing to add support for rebuild with a new image for
> volume-backed servers, which today is just a 400 failure in the API
> since the compute doesn't support that scenario.
>
> With the proposed solution, the backing root volume would be deleted and
> a new volume would be created from the new image, similar to how boot
> from volume works.
>
> The question raised in the spec is whether or not nova should delete the
> root volume even if its delete_on_termination flag is set to False. The
> semantics get a bit weird here since that flag was not meant for this
> scenario, it's meant to be used when deleting the server to which the
> volume is attached. Rebuilding a server is not deleting it, but we would
> need to replace the root volume, so what do we do with the volume we're
> replacing?
>
> Do we say that delete_on_termination only applies to deleting a server
> and not rebuild and therefore nova can delete the root volume during a
> rebuild?
>
> If we don't delete the volume during rebuild, we could end up leaving a
> lot of volumes lying around that the user then has to clean up,
> otherwise they'll eventually go over quota.
>
> We need user (and operator) feedback on this issue and what they would
> expect to happen.
>
> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




Re: [Openstack-operators] How are you handling billing/chargeback?

2018-03-13 Thread Tomáš Vondra
Hi!

We at Homeatcloud have rolled our own engine taking data from Ceilometer 
events. However, CloudKitty didn't exist back then. Now we would probably use 
it to calculate the rating AND roll our own engine for billing and invoice 
printing.

Tomas

 

From: Flint WALRUS [mailto:gael.ther...@gmail.com] 
Sent: Monday, March 12, 2018 9:41 PM
To: Lars Kellogg-Stedman
Cc: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] How are you handling billing/chargeback?

 

Hi Lars, personally I am using an internally crafted service.

It's one of my main regrets with OpenStack: the lack of a decent billing system.

On Mon, 12 Mar 2018 at 20:22, Lars Kellogg-Stedman wrote:

Hey folks,

I'm curious what folks out there are using for chargeback/billing in
your OpenStack environment.

Are you doing any sort of chargeback (or showback)?  Are you using (or
have you tried) CloudKitty?  Or some other existing project?  Have you
rolled your own instead?

I ask because I am helping out some folks get a handle on the
operational side of their existing OpenStack environment, and they are
interested in but have not yet deployed some sort of reporting
mechanism.

Thanks,

--
Lars Kellogg-Stedman  | larsks @ {irc,twitter,github}
http://blog.oddbit.com/|

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



Re: [Openstack-operators] thierry's longer dev cycle proposal

2017-12-15 Thread Tomáš Vondra
The thread on the dev list is already too long for my liking. I hope there will 
be a TL;DR in the dev mailing list digest.
Tomas

-Original Message-
From: arkady.kanev...@dell.com [mailto:arkady.kanev...@dell.com] 
Sent: Thursday, December 14, 2017 3:40 AM
To: mrhills...@gmail.com; fu...@yuggoth.org; 
openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] thierry's longer dev cycle proposal

It is a sign of the maturity of OpenStack. With lots of deployments, most of 
them in production, the emphasis is shifting from rapid functionality additions 
to stability, manageability, and long-term operability.

-Original Message-
From: Melvin Hillsman [mailto:mrhills...@gmail.com] 
Sent: Wednesday, December 13, 2017 5:29 PM
To: Jeremy Stanley ; openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] thierry's longer dev cycle proposal

I think this is a good opportunity to allow some stress relief for the developer 
community and to offer space for more discussions with operators, where some 
operators do not feel like they are bothering/bugging developers. I believe 
this is the main gain for operators; my personal opinion. In general, I think 
the opportunity costs/gains are worth it, and it is the responsibility of the 
community to make the change useful, as you mentioned in your original thread, 
Thierry. It is not a silver bullet for all of the issues folks have with the 
way things are done, but I believe that if it does not hurt things and offers 
even a slight gain in some area, it makes sense.

Any change is not going to satisfy/dis-satisfy 100% of the constituents.

-- 
Kind regards,

Melvin Hillsman
mrhills...@gmail.com
mobile: +1 (832) 264-2646
irc: mrhillsman

On 12/13/17, 4:39 PM, "Jeremy Stanley"  wrote:

On 2017-12-13 22:35:41 +0100 (+0100), Thierry Carrez wrote:
[...]
> It's not really fait accompli, it's just a proposal up for discussion at
> this stage. Which is the reason why I started the thread on -dev -- to
> check the sanity of the change from a dev perspective first. If it makes
> things harder and not simpler on that side, I don't expect the TC to
> proceed.
[...]

With my TC hat on, regardless of what impression the developer
community has on this, I plan to take subsequent operator and
end-user/app-dev feedback into account as well before making any
binding decisions (and expect other TC members feel the same).
-- 
Jeremy Stanley
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators






___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-06 Thread Tomáš Vondra
Dear Clint,
maybe you misunderstood a little, or I didn't write it explicitly. We use 
OpenStack to provide a VPS service, yes. But the VPS users do not get access 
to OpenStack directly; instead, they use our Customer Portal, which does the 
orchestration. The whole point is to make the service as easy as possible for 
them to use and not expose them to the complexity of the cloud. As I said, we 
couldn't use Rebuild because the VPSes have Volumes. We do use Resize because 
it is there, but we could just as well use more low-level cloud primitives. The 
user does not care in this case. How does, e.g., WHMCS do it? That is stock 
software that you can use to provide VPS over OpenStack.
Tomas from Homeatcloud

-Original Message-
From: Clint Byrum [mailto:cl...@fewbar.com] 
Sent: Thursday, October 05, 2017 6:50 PM
To: openstack-operators
Subject: Re: [Openstack-operators] [nova] Should we allow passing new user_data 
during rebuild?

No offense is intended, so please forgive me for the possibly incendiary nature 
of what I'm about to write:

VPS is the predecessor of cloud (and something I love very much, and rely on 
every day!), and encourages all the bad habits that a cloud disallows. At small 
scale, it's the right thing, and that's why I use it for my small scale needs. 
Get a VM, put your stuff on it, and keep it running forever.

But at scale, VMs in clouds go away. They get migrated, rebooted, turned off, 
and discarded, often. Most clouds are terrible for VPS compared to VPS hosting 
environments.

I'm glad it's working for you. And I think rebuild and resize will stay and 
improve to serve VPS style users in interesting ways. I'm learning now who our 
users are today, and I'm confident we should make sure everyone who has taken 
the time to deploy and care for OpenStack should be served by expanding rebuild 
to meet their needs.

You can all consider this my white flag. :)

Excerpts from Tomáš Vondra's message of 2017-10-05 10:22:14 +0200:
> In our cloud, we offer the possibility to reinstall the same or another OS on 
> a VPS (Virtual Private Server). Unfortunately, we couldn’t use the rebuild 
> function because of the VPS‘s use of Cinder for root disk. We create a new 
> instance and inject the same User Data so that the new instance has the same 
> password and key as the last one. It also has the same name, and the same 
> floating IP is attached. I believe it even has the same IPv6 through some 
> Neutron port magic.
> 
> BTW, you wouldn’t believe how often people use the Reinstall feature.
> 
> Tomas from Homeatcloud
> 
>  
> 
> From: Belmiro Moreira [mailto:moreira.belmiro.email.li...@gmail.com]
> Sent: Wednesday, October 04, 2017 5:34 PM
> To: Chris Friesen
> Cc: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] [nova] Should we allow passing new 
> user_data during rebuild?
> 
>  
> 
> In our cloud rebuild is the only way for a user to keep the same IP. 
> Unfortunately, we don't offer floating IPs, yet.
> 
> Also, we use the user_data to bootstrap some actions in new instances 
> (puppet, ...).
> 
> Considering all the use-cases for rebuild it would be great if the user_data 
> can be updated at rebuild time.
> 
>  
> 
> On Wed, Oct 4, 2017 at 5:15 PM, Chris Friesen  
> wrote:
> 
> On 10/03/2017 11:12 AM, Clint Byrum wrote:
> 
> My personal opinion is that rebuild is an anti-pattern for cloud, and 
> should be frozen and deprecated. It does nothing but complicate Nova 
> and present challenges for scaling.
> 
> That said, if it must stay as a feature, I don't think updating the 
> user_data should be a priority. At that point, you've basically 
> created an entirely new server, and you can already do that by 
> creating an entirely new server.
> 
> 
> If you've got a whole heat stack with multiple resources, and you realize 
> that you messed up one thing in the template and one of your servers has the 
> wrong personality/user_data, it can be useful to be able to rebuild that one 
> server without affecting anything else in the stack.  That's just a 
> convenience though.
> 
> Chris
> 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?

2017-10-05 Thread Tomáš Vondra
In our cloud, we offer the possibility to reinstall the same or another OS on a 
VPS (Virtual Private Server). Unfortunately, we couldn't use the rebuild 
function because of the VPS's use of Cinder for the root disk. We create a new 
instance and inject the same User Data so that the new instance has the same 
password and key as the last one. It also has the same name, and the same 
floating IP is attached. I believe it even has the same IPv6 through some 
Neutron port magic.
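The flow is roughly the following (a sketch with made-up names and IDs; in reality our Customer Portal drives it through the API):

openstack volume create --image debian-9 --size 20 vps123-root
openstack server create --flavor vps.small --volume vps123-root \
  --user-data userdata.txt --nic net-id=<tenant-net> vps123
openstack server remove floating ip <old-server> 203.0.113.10
openstack server add floating ip vps123 203.0.113.10
openstack server delete <old-server>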

BTW, you wouldn’t believe how often people use the Reinstall feature.

Tomas from Homeatcloud

 

From: Belmiro Moreira [mailto:moreira.belmiro.email.li...@gmail.com] 
Sent: Wednesday, October 04, 2017 5:34 PM
To: Chris Friesen
Cc: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [nova] Should we allow passing new user_data 
during rebuild?

 

In our cloud rebuild is the only way for a user to keep the same IP. 
Unfortunately, we don't offer floating IPs, yet.

Also, we use the user_data to bootstrap some actions in new instances (puppet, 
...).

Considering all the use-cases for rebuild it would be great if the user_data 
can be updated at rebuild time.

 

On Wed, Oct 4, 2017 at 5:15 PM, Chris Friesen  
wrote:

On 10/03/2017 11:12 AM, Clint Byrum wrote:

My personal opinion is that rebuild is an anti-pattern for cloud, and
should be frozen and deprecated. It does nothing but complicate Nova
and present challenges for scaling.

That said, if it must stay as a feature, I don't think updating the
user_data should be a priority. At that point, you've basically created an
entirely new server, and you can already do that by creating an entirely
new server.


If you've got a whole heat stack with multiple resources, and you realize that 
you messed up one thing in the template and one of your servers has the wrong 
personality/user_data, it can be useful to be able to rebuild that one server 
without affecting anything else in the stack.  That's just a convenience though.

Chris




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

 



Re: [Openstack-operators] cinder/nova issues

2017-08-24 Thread Tomáš Vondra
Hi!

If this were OpenStack Kilo and HPE 3PAR over Fibre Channel, I would tell you 
that the volume extend operation is designed to work with detached volumes 
only; hence you need cinder reset-state. At least in our case, extending does 
not update the SCSI devices and multipath setup, and the volume continues to 
work with the old size. We do a live-migrate operation afterwards to disconnect 
the storage from one node and connect it to another. Even a resize to the same 
node works. However, os-brick was introduced in Liberty, so the case may be 
different.
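For completeness, the sequence we use looks roughly like this (IDs are placeholders; the live migration is what makes the compute node pick up the new size):

cinder reset-state --state available <volume-id>
cinder extend <volume-id> 100
cinder reset-state --state in-use <volume-id>
nova live-migration <instance-id>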

Tomas

 

From: Adam Dibiase [mailto:adibi...@digiumcloud.com] 
Sent: Wednesday, August 23, 2017 9:06 PM
To: Sean McGinnis
Cc: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] cinder/nova issues

 

Thanks Sean. I filed a bug report to track this: Bug #1712651. I would agree 
with you on connectivity issues with the NetApp if it happened on all volume 
extensions, but this happens in one scenario only.




Thanks, 

 

Adam

 

 

 

 

On Wed, Aug 23, 2017 at 2:04 PM, Sean McGinnis  wrote:

Hey Adam,

There have been some updates since Liberty to improve handling in the os-brick
library that handles the local device management. But with this showing the
paths down, I wonder if there's something else going on there between the
NetApp box and the Nova compute host.

Could you file a bug to track this? I think you could just copy and paste the
content of your original email since it captures a lot of great info.

https://bugs.launchpad.net/cinder/+filebug

We can tag it with netapp so maybe it will get some attention there.

Thanks,
Sean

On Wed, Aug 23, 2017 at 01:01:24PM -0400, Adam Dibiase wrote:
> Greetings,
>
> I am having an issue with nova starting an instance that is using a root
> volume that cinder has extended. More specifically, a volume that has been
> extended past the max resize limit of our Netapp filer. I am running
> Liberty and upgraded cinder packages to 7.0.3 from 7.0.0 to take advantage
> of this functionality. From what I can gather, it uses sub-lun cloning to
> get past the hard limit set by Netapp when cloning past 64G (starting from
> a 4G volume).
>
> *Environment*:
>
>- Release: Liberty
>- Filer:   Netapp
>- Protocol: Fiberchannel
>- Multipath: yes
>
>
>
> *Steps to reproduce: *
>
>- Create new instance
>- stop instance
>- extend the volume by running the following commands:
>   - cinder reset-state --state available (volume-ID or name)
>   - cinder extend (volume-ID or name) 100
>   - cinder reset-state --state in-use (volume-ID or name)
>- start instance with either nova start or nova reboot --hard  --same
>result
>
>
> I can see that the instance's multipath status is good before the resize...
>
> *360a98000417643556a2b496d58665473 dm-17 NETAPP  ,LUN *
>
> size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
>
> |-+- policy='round-robin 0' prio=-1 status=active
>
> | |- 6:0:1:5 sdy   65:128 active undef  running
>
> | `- 7:0:0:5 sdz   65:144 active undef  running
>
> `-+- policy='round-robin 0' prio=-1 status=enabled
>
>   |- 6:0:0:5 sdx   65:112 active undef  running
>
>   `- 7:0:1:5 sdaa  65:160 active undef  running
>
>
> Once the volume is resized, the lun goes to a failed state and it does not
> show the new size:
>
>
> *360a98000417643556a2b496d58665473 dm-17 NETAPP  ,LUN *
>
> size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
>
> |-+- policy='round-robin 0' prio=-1 status=enabled
>
> | |- 6:0:1:5 sdy   65:128 failed undef  running
>
> | `- 7:0:0:5 sdz   65:144 failed undef  running
>
> `-+- policy='round-robin 0' prio=-1 status=enabled
>
>   |- 6:0:0:5 sdx   65:112 failed undef  running
>
>   `- 7:0:1:5 sdaa  65:160 failed undef  running
>
>
> Like I said, this only happens on volumes that have been extended past 64G.
> Smaller sizes do not have this issue. I can only assume that the original
> lun is getting destroyed after the clone process and that is cause of the
> failed state. Why is it not picking up the new one and attaching it to the
> compute node?  Is there something I am missing?
>
> Thanks in advance,
>
> Adam

> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] custom build image is slow

2017-08-01 Thread Tomáš Vondra
Hi!

How big are the actual image files? Because qcow2 is a sparse format, it does 
not store zeroes. If the free space in the image is zeroed out, it will convert 
much faster. If that is the problem, use "dd if=/dev/zero of=temp; sync; rm temp" 
or zerofree.
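To illustrate (file names are placeholders):

# inside the guest: fill the free space with zeroes, then remove the file
dd if=/dev/zero of=/tmp/zerofill bs=1M; sync; rm -f /tmp/zerofill
# on the host: rewrite the image so the zeroed clusters are dropped again
qemu-img convert -O qcow2 custom.qcow2 custom-compacted.qcow2
qemu-img info custom-compacted.qcow2   # compare "disk size" to the virtual size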

Tomas

 

From: Paras pradhan [mailto:pradhanpa...@gmail.com] 
Sent: Monday, July 31, 2017 11:54 PM
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] custom build image is slow

 

Hello

 

I have two qcow2 images uploaded to glance. One is the CentOS 7 cloud image 
downloaded from centos.org. The other one is custom built from the CentOS 7 DVD. 
When I create cinder volumes from them, volume creation from the custom-built 
image is very, very slow.

 

 

CentOS qcow2:

 

2017-07-31 21:42:44.287 881609 INFO cinder.image.image_utils 
[req-ea2d7b12-ae9e-45b2-8b4b-ea8465497d5a 
e090e605170a778610438bfabad7aa7764d0a77ef520ae392e2b59074c9f88cf 
490910c1d4e1486d8e3a62d7c0ae698e - d67a18e70dd9467db25b74d33feaad6d default] 
Converted 8192.00 MB image at 253.19 MB/s

 

Custom built qcow2:

INFO cinder.image.image_utils [req-032292d8-1500-474d-95c7-2e8424e2b864 
e090e605170a778610438bfabad7aa7764d0a77ef520ae392e2b59074c9f88cf 
490910c1d4e1486d8e3a62d7c0ae698e - d67a18e70dd9467db25b74d33feaad6d default] 
Converted 10240.00 MB image at 32.22 MB/s

 

I used the following command to create the qcow2 file

qemu-img create -f qcow2 custom.qcow2 10G

 

What am I missing ?

 

Thanks
Paras.

 

 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] instance partition size discrepancy

2017-04-26 Thread Tomáš Vondra
Have you tried
# resize2fs /dev/vda
?
Alternatively, if you use images with cloud-init and cloud-initramfs-growroot 
installed, it should work out of the box.
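If the root filesystem sits in a partition rather than directly on the disk, the manual steps are roughly these (assuming /dev/vda1 with ext4):

growpart /dev/vda 1   # from cloud-utils: grow partition 1 to fill the disk
resize2fs /dev/vda1   # grow the ext4 filesystem online
df -h /               # verify the new size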
Tomas

-Original Message-
From: Carlos Konstanski [mailto:ckonstan...@pippiandcarlos.com] 
Sent: Wednesday, April 26, 2017 12:02 AM
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] [nova] instance partition size discrepancy

I'm having an issue where the instance thinks its root filesystem is much 
smaller than the size of the volume that I used to create it. Not only that, 
the OS cannot decide on whether it thinks the size is right or wrong.

See the following pastebin:
https://paste.pound-python.org/show/eNt8nLNLhHAL5OYICqbs/

Notice that everything shows the size as 20 GB except df, which shows it as 2.8 
GB. I ran the previous instance out of space before spinning up this new one, 
so 2.8 seems to be the winner (though wrong).

Figured I'd check to see if this is a known issue while I dig deeper.

Carlos Konstanski

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




Re: [Openstack-operators] [scientific] Resource reservation requirements (Blazar) - Forum session

2017-04-04 Thread Tomáš Vondra
Hi!
Did someone mention automation changing the spot instance capacity? I wrote an 
article in 2013 that proposes exactly that. The model forecasts the workload 
curve of the majority traffic, which is presumed to be interactive, and the 
rest may be used for batch traffic. The forecast used is SARIMA and is usable 
up to a few days in advance. Would anybody be interested in trying the forecast 
on data from their cloud?
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.671.7397&rep=rep1&type=pdf
Tomas Vondra, dept. of Cybernetics, CTU FEE

-Original Message-
From: Blair Bethwaite [mailto:blair.bethwa...@gmail.com] 
Sent: Tuesday, April 04, 2017 12:08 AM
To: Jay Pipes
Cc: openstack-oper.
Subject: Re: [Openstack-operators] [scientific] Resource reservation 
requirements (Blazar) - Forum session

Hi Jay,

On 4 April 2017 at 00:20, Jay Pipes  wrote:
> However, implementing the above in any useful fashion requires that 
> Blazar be placed *above* Nova and essentially that the cloud operator 
> turns off access to Nova's  POST /servers API call for regular users. 
> Because if not, the information that Blazar acts upon can be simply 
> circumvented by any user at any time.

That's something of an oversimplification. A reservation system outside of Nova 
could manipulate Nova host-aggregates to "cordon off"
infrastructure from on-demand access (I believe Blazar already uses this 
approach), and it's not much of a jump to imagine operators being able to 
twiddle the available reserved capacity in a finite cloud so that reserved 
capacity can be offered to the subset of users/projects that need (or perhaps 
have paid for) it. Such a reservation system would even be able to backfill 
capacity between reservations. At the end of the reservation the system 
cleans-up any remaining instances and preps for the next reservation.

The are a couple of problems with putting this outside of Nova though.
The main issue is that pre-emptible/spot type instances can't be accommodated 
within the on-demand cloud capacity. You could have the reservation system 
implementing this feature, but that would then put other scheduling constraints 
on the cloud in order to be effective (e.g., there would need to be automation 
changing the size of the on-demand capacity so that the maximum pre-emptible 
capacity was always available). The other issue (admittedly minor, but still a
consideration) is that it's another service - personally I'd love to see Nova 
support these advanced use-cases directly.

--
Cheers,
~Blairo

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




Re: [Openstack-operators] Flavors

2017-03-16 Thread Tomáš Vondra
We at Homeatcloud.com do exactly this in our VPS service. The user can 
configure the VPS with any combination of CPU, RAM, and disk. However, a) the 
configurations are all about 10% of the size of the physical machines, and b) 
the disks are in a SAN array, provisioned as volumes. So I give the users some 
flexibility, I can better see what configurations they actually want, and I can 
build new hypervisors with that in mind. They mostly want up to 4 GB RAM 
anyway, so it's not a big deal.

Tomas Vondra

 

From: Adam Lawson [mailto:alaw...@aqorn.com] 
Sent: Thursday, March 16, 2017 5:57 PM
To: Jonathan D. Proulx
Cc: OpenStack Operators
Subject: Re: [Openstack-operators] Flavors

 

One way I know some providers work around this when using OpenStack is by 
fronting the VM request with some code in the web server that checks whether 
the requested spec has an existing flavor: if so, use the flavor; if not, use 
an admin account that creates a new flavor, use it for that user's request, and 
then remove it when the build is complete. This naturally impacts your control 
over hardware efficiency, but it makes your scenario possible (for better or 
for worse). I also hate being forced to do what someone else decided was going 
to be best for me. That's my decision, and thankfully with OpenStack this kind 
of thing is rather easy to do.
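Something along these lines, just as a sketch (names and sizes are made up; the real logic would sit behind the provider's web front end):

openstack flavor show custom-4c-8g-40g >/dev/null 2>&1 || \
  openstack flavor create --vcpus 4 --ram 8192 --disk 40 --private custom-4c-8g-40g
openstack flavor set --project <tenant-id> custom-4c-8g-40g
# ... boot the server with it, and once the build is done and nothing else uses it:
openstack flavor delete custom-4c-8g-40g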

 

//adam





Adam Lawson

 

Principal Architect, CEO

Office: +1-916-794-5706

 

On Thu, Mar 16, 2017 at 7:52 AM, Jonathan D. Proulx  wrote:


I have always hated flavors and so do many of my users.

On Wed, Mar 15, 2017 at 03:22:48PM -0700, James Downs wrote:
:On Wed, Mar 15, 2017 at 10:10:00PM +, Fox, Kevin M wrote:
:> I think the really short answer is something like: It greatly simplifies 
scheduling and billing.
:
:The real answer is that once you buy hardware, it's in a fixed ratio of 
CPU/Ram/Disk/IOPS, etc.

This, while apparently reasonable, is BS (at least in private cloud
space).  What users request and what they actually use are wildly
divergent.

*IF* usage of claimed resources were at or near optimal, then this might
be true.  But if people are claiming 32G of RAM because that's how much
you assigned to a 16-vCPU instance type, but they really just need 16
threads with 2G or 4G, then your packing still sucks.

I'm mostly bound on memory, so I mostly have my users select on that
basis, and I over-provide and over-provision CPU since that can be
effectively shared between VMs, whereas memory needs to be dedicated
(well, mostly).

I'm sure I've ranted about this before, but as you see from other
responses we seem to be in the minority position, so mostly I rant at
the walls while my office mates look on perplexed (actually they're
pretty used to it by now and ignore me :) )

-Jon


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

 



Re: [Openstack-operators] [nova] Do you use os-instance-usage-audit-log?

2017-01-13 Thread Tomáš Vondra
Hi Matt,
I've looked at my Nova config and yes, I have it on. We do billing using
Ceilometer data and I think compute.instance.exists is consumed as well. The
Ceilometer event retention is set to 6 months and the database size is in
single gigabytes. The Nova database table task_log only contains the fact that
the audit job ran successfully and is 6 MB in size. It has not been pruned for
more than a year.
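For reference, a sketch of the relevant settings (values are illustrative; 15552000 seconds is roughly 6 months, and I'm assuming crudini is available for editing the config files):

crudini --set /etc/nova/nova.conf DEFAULT instance_usage_audit True
crudini --set /etc/nova/nova.conf DEFAULT instance_usage_audit_period month
crudini --set /etc/ceilometer/ceilometer.conf database event_time_to_live 15552000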
Tomas

-Original Message-
From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com] 
Sent: Thursday, January 12, 2017 12:09 AM
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] [nova] Do you use
os-instance-usage-audit-log?

Nova's got this REST API [1] which pulls task_log data from the nova
database if the 'instance_usage_audit' config option value is True on any
compute host.

That table is populated in a periodic task from all computes that have it
enabled and by default it 'audits' instances created in the last month (the
time window is adjustable via the 'instance_get_active_by_window_joined'
config option).

The periodic task also emits a 'compute.instance.exists' notification for
each instance on that compute host which falls into the audit period. I'm
fairly certain that notification is meant to be consumed by Ceilometer which
is going to store it in it's own time-series database.

It just so happens that Nova is also storing this audit data in it's own
database, and never cleaning it up - the only way in-tree to move that data
out of the nova.task_log table is to archive it into shadow tables, but that
doesn't cut down on the bloat in your database. That
os-instance-usage-audit-log REST API is relying on the nova database though.

So my question is, is anyone using this in any shape or form, either via the
Nova REST API or Ceilometer? Or are you using it in one form but not the
other (maybe only via Ceilometer)? If you're using it, how are you
controlling the table growth, i.e. are you deleting records over a certain
age from the nova database using a cron job?

Mike Bayer was going to try and find some large production data sets to see
how many of these records are in a big and busy production DB that's using
this feature, but I'm also simply interested in how people use this, if it's
useful at all, and if there is interest in somehow putting a limit on the
data, i.e. we could add a config option to nova to only store records in the
task_log table under a certain max age.

[1] 
http://developer.openstack.org/api-ref/compute/#server-usage-audit-log-os-in
stance-usage-audit-log

-- 

Thanks,

Matt Riedemann


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




Re: [Openstack-operators] RabbitMQ 3.6.x experience?

2017-01-09 Thread Tomáš Vondra
We have upgraded to RabbitMQ 3.6, and it resulted in one node crashing about 
every week with out-of-memory errors. To avoid this, we had to turn off the 
message rate collection, so there are no throughput graphs until it gets fixed. 
Avoid this version if you can.
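Concretely, what we turned off is the management plugin's rate collection, which matches the rates_mode: 'none' in Matt's hiera below; a sketch of the rabbitmq.config fragment (assuming the stock config location):

%% /etc/rabbitmq/rabbitmq.config (fragment)
[
  {rabbitmq_management, [
    {rates_mode, none}
  ]}
].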

Tomas

 

From: Sam Morrison [mailto:sorri...@gmail.com] 
Sent: Monday, January 09, 2017 3:55 AM
To: Matt Fischer
Cc: OpenStack Operators
Subject: Re: [Openstack-operators] RabbitMQ 3.6.x experience?

 

We’ve been running 3.6.5 for sometime now and it’s working well.

 

3.6.1 - 3.6.3 are unusable; we had lots of issues with the stats DB and other 
weirdness. 

 

Our setup is a 3 physical node cluster with around 9k connections, average 
around the 300 messages/sec delivery. We have the stats sample rate set to 
default and it is working fine.

 

Yes we did have to restart the cluster to upgrade.

 

Cheers,

Sam

 

 

 

On 6 Jan 2017, at 5:26 am, Matt Fischer  wrote:

 

MIke,

 

I did a bunch of research and experiments on this last fall. We are running 
Rabbit 3.5.6 on our main cluster and 3.6.5 on our Trove cluster which has 
significantly less load (and criticality). We were going to upgrade to 3.6.5 
everywhere but in the end decided not to, mainly because there was little 
perceived benefit at the time. Our main issue is unchecked memory growth at 
random times. I ended up making several config changes to the stats collector 
and then we also restart it after every deploy and that solved it (so far). 

 

I'd say these were my main reasons for not going to 3.6 for our control nodes:

*   In 3.6.x they re-wrote the stats processor to make it parallel. In 
every 3.6 release since then, Pivotal has fixed bugs in this code. Then finally 
they threw up their hands and said "we're going to make a complete rewrite in 
3.7/4.x" (you need to look through issues on Github to find this discussion)
*   Out of the box with the same configs 3.6.5 used more memory than 3.5.6, 
since this was our main issue, I consider this a negative.
*   Another issue is the ancient version of erlang we have with Ubuntu 
Trusty (which we are working on) which made upgrades more complex/impossible 
depending on the version.

Given those negatives, the main one being that I didn't think there would be 
too many more fixes to the parallel statsdb collector in 3.6, we decided to 
stick with 3.5.6. In the end the devil we know is better than the devil we 
don't and I had no evidence that 3.6.5 would be an improvement.

 

I did decide to leave Trove on 3.6.5 because this would give us some bake-in 
time if 3.5.x became untenable we'd at least have had it up and running in 
production and some data on it.

 

If statsdb is not a concern for you, I think this changes the math and maybe 
you should use 3.6.x. I would however recommend at least going to 3.5.6, it's 
been better than 3.3/3.4 was.

 

No matter what you do definitely read all the release notes. There are some 
upgrades which require an entire cluster shutdown. The upgrade to 3.5.6 did not 
require this IIRC.

 

Here's the hiera for our rabbit settings which I assume you can translate:

 

rabbitmq::cluster_partition_handling: 'autoheal'

rabbitmq::config_variables:

  'vm_memory_high_watermark': '0.6'

  'collect_statistics_interval': 3

rabbitmq::config_management_variables:

  'rates_mode': 'none'

rabbitmq::file_limit: '65535'

 

Finally, if you do upgrade to 3.6.x please report back here with your results 
at scale!

 

 

On Thu, Jan 5, 2017 at 8:49 AM, Mike Dorman  wrote:

We are looking at upgrading to the latest RabbitMQ in an effort to ease some 
cluster failover issues we’ve been seeing.  (Currently on 3.4.0)

 

Anyone been running 3.6.x?  And what has been your experience?  Any gottchas to 
watch out for?

 

Thanks,

Mike

 



 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators