[Openstack-operators] osops-tools-monitoring Dependency problems
Hi! I'm a long-time user of monitoring-for-openstack, also known as oschecks. Specifically, I used a version from 2015 with the OpenStack Python client libraries from Kilo. Now I have upgraded the clients to Mitaka and it broke. Even the latest oschecks don't work. I didn't quite expect that, given that there are several commits from this year, e.g. by Nagasai Vinaykumar Kapalavai and paramite. Can one of them, or some other user, step up and say which version of the OpenStack clients oschecks works with? Ideally, write it down in requirements.txt so that the setup is reproducible. Some documentation of the minimal set of parameters would also come in handy. Thanks a lot, Tomas from Homeatcloud

The error messages are as absurd as:

oschecks-check_glance_api --os_auth_url='http://10.1.101.30:5000/v2.0' --os_username=monitoring --os_password=XXX --os_tenant_name=monitoring
CRITICAL: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oschecks/utils.py", line 121, in safe_run
    method()
  File "/usr/lib/python2.7/dist-packages/oschecks/glance.py", line 29, in _check_glance_api
    glance = utils.Glance()
  File "/usr/lib/python2.7/dist-packages/oschecks/utils.py", line 177, in __init__
    self.glance.parser = self.glance.get_base_parser(sys.argv)
TypeError: get_base_parser() takes exactly 1 argument (2 given)

(I can see 4 parameters on the command line.)
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
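[One way to make the dependency set reproducible, as requested above, is simply to freeze the client versions of a known-good installation and ship that as requirements.txt. A minimal sketch; the package names in the grep are illustrative, list whatever clients your oschecks build actually imports:

  pip freeze | grep -iE 'python-(keystone|glance|nova|cinder|neutron)client' > requirements.txt
  # later, on a fresh box, reinstall the same known-good set:
  pip install -r requirements.txt
]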
Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey
Hi! What we've got in our small public cloud:

scheduler_default_filters=AggregateInstanceExtraSpecsFilter, AggregateImagePropertiesIsolation, RetryFilter, AvailabilityZoneFilter, AggregateRamFilter, AggregateDiskFilter, AggregateCoreFilter, ComputeFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter
#ComputeCapabilitiesFilter off because of conflict with AggregateInstanceExtraSpecsFilter https://bugs.launchpad.net/nova/+bug/1279719

I really like to set resource limits using aggregate metadata. Also, Windows host isolation is done using image metadata. I have filed a bug somewhere that it does not work correctly with Boot from Volume; I believe it got pretty much ignored. That's why we also use flavor metadata. Tomas from Homeatcloud

From: Massimo Sgaravatto [mailto:massimo.sgarava...@gmail.com] Sent: Saturday, April 21, 2018 7:49 AM To: Simon Leinen Cc: OpenStack Development Mailing List (not for usage questions); OpenStack Operators Subject: Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey

enabled_filters = AggregateInstanceExtraSpecsFilter,AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,AggregateRamFilter,AggregateCoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

Cheers, Massimo

On Wed, Apr 18, 2018 at 10:20 PM, Simon Leinen wrote: Artom Lifshitz writes: > To that end, we'd like to know what filters operators are enabling in > their deployment. If you can, please reply to this email with your > [filter_scheduler]/enabled_filters (or > [DEFAULT]/scheduler_default_filters if you're using an older version) > option from nova.conf. Any other comments are welcome as well :)

We have the following enabled on our semi-public (academic community) cloud, which runs on Newton:

AggregateInstanceExtraSpecsFilter
AvailabilityZoneFilter
ComputeCapabilitiesFilter
ComputeFilter
ImagePropertiesFilter
PciPassthroughFilter
RamFilter
RetryFilter
ServerGroupAffinityFilter
ServerGroupAntiAffinityFilter

(sorted alphabetically) Recently we've also been trying AggregateImagePropertiesIsolation ...but it looks like we'll replace it with our own because it's a bit awkward to use for our purpose (scheduling Windows instances to licensed compute nodes). -- Simon.
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
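[For reference, a minimal sketch of where these lists live in nova.conf. The option name depends on the release, as noted in Artom's survey request; the filter selection below is only an example, not a recommendation:

  # Newton and later:
  [filter_scheduler]
  enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

  # Older releases:
  [DEFAULT]
  scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
]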
Re: [Openstack-operators] [openstack-dev] [nova] about rebuild instance booted from volume
Hi! I say delete! Delete them all! Really, it's called delete_on_termination and it should be ignored on rebuild. We have a VPS service implemented on top of OpenStack and we do throw the old contents away on rebuild. When the user has paid for the Backup service, they can restore a snapshot. Backup is implemented as a volume snapshot, then a clone of the volume, then an upload to an image (Glance is on a different disk array); a rough CLI sketch of that flow is appended at the end of this thread. I also sometimes multi-attach a volume manually to a service node and just dd an image onto it. If rebuild were implemented this way, there would be no deleting of a volume with delete_on_termination, just overwriting. But the effect is the same. IMHO you can have snapshots of volumes that have been deleted. It's just that some backends, like our 3PAR, don't allow it, but it's not disallowed in the API contract. Tomas from Homeatcloud

-Original Message- From: Saverio Proto [mailto:ziopr...@gmail.com] Sent: Wednesday, March 14, 2018 3:19 PM To: Tim Bell; Matt Riedemann Cc: OpenStack Development Mailing List (not for usage questions); openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] [openstack-dev] [nova] about rebuild instance booted from volume

My idea is that if the delete_on_termination flag is set to False the volume should never be deleted by Nova. my 2 cents Saverio

2018-03-14 15:10 GMT+01:00 Tim Bell : > Matt, > > To add another scenario and make things even more difficult (sorry), if the > original volume has snapshots, I don't think you can delete it. > > Tim > > > -Original Message- > From: Matt Riedemann > Reply-To: "OpenStack Development Mailing List (not for usage > questions)" > Date: Wednesday, 14 March 2018 at 14:55 > To: "openstack-...@lists.openstack.org" > , openstack-operators > > Subject: Re: [openstack-dev] [nova] about rebuild instance booted from > volume > > On 3/14/2018 3:42 AM, 李杰 wrote: > > > > This is the spec about rebuilding an instance booted from > > volume. In the spec, there is a > > question about whether we should delete the old root_volume. Anyone who > > is interested in > > boot from volume can help to review this. Any suggestion is > > welcome. Thank you! > > The link is here. > > Re: the rebuild spec: https://review.openstack.org/#/c/532407/ > > Copying the operators list and giving some more context. > > This spec is proposing to add support for rebuild with a new image for > volume-backed servers, which today is just a 400 failure in the API > since the compute doesn't support that scenario. > > With the proposed solution, the backing root volume would be deleted and > a new volume would be created from the new image, similar to how boot > from volume works. > > The question raised in the spec is whether or not nova should delete the > root volume even if its delete_on_termination flag is set to False. The > semantics get a bit weird here since that flag was not meant for this > scenario, it's meant to be used when deleting the server to which the > volume is attached. Rebuilding a server is not deleting it, but we would > need to replace the root volume, so what do we do with the volume we're > replacing? > > Do we say that delete_on_termination only applies to deleting a server > and not rebuild and therefore nova can delete the root volume during a > rebuild? > > If we don't delete the volume during rebuild, we could end up leaving a > lot of volumes lying around that the user then has to clean up, > otherwise they'll eventually go over quota. > > We need user (and operator) feedback on this issue and what they would > expect to happen.
> > -- > > Thanks, > > Matt > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > ___ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operator > s ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
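[The backup flow Tomas describes above (snapshot the in-use root volume, clone a new volume from the snapshot, upload the clone to Glance) looks roughly like this with the cinder CLI. Names and sizes are placeholders, and option spellings vary a bit between cinderclient releases, so treat it as a sketch rather than a recipe:

  cinder snapshot-create --force True --name vps42-backup <root-volume-id>
  cinder create --snapshot-id <snapshot-id> --name vps42-backup-clone <size-in-gb>
  cinder upload-to-image <clone-volume-id> vps42-backup-image
]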
Re: [Openstack-operators] How are you handling billing/chargeback?
Hi! We at Homeatcloud have rolled our own engine taking data from Ceilometer events. However, CloudKitty didn‘t exist back then. Now we would probably use it to calculate the rating AND roll our own engine for billing and invoice printing. Tomas From: Flint WALRUS [mailto:gael.ther...@gmail.com] Sent: Monday, March 12, 2018 9:41 PM To: Lars Kellogg-Stedman Cc: openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] How are you handling billing/chargeback? Hi lars, personally using an internally crafted service. It’s one of my main regret with Openstack, lack of a decent billing system. Le lun. 12 mars 2018 à 20:22, Lars Kellogg-Stedman a écrit : Hey folks, I'm curious what folks out there are using for chargeback/billing in your OpenStack environment. Are you doing any sort of chargeback (or showback)? Are you using (or have you tried) CloudKitty? Or some other existing project? Have you rolled your own instead? I ask because I am helping out some folks get a handle on the operational side of their existing OpenStack environment, and they are interested in but have not yet deployed some sort of reporting mechanism. Thanks, -- Lars Kellogg-Stedman | larsks @ {irc,twitter,github} http://blog.oddbit.com/| ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] thierry's longer dev cycle proposal
The thread on the dev list is already too long for my liking. I hope there will be a TL;DR in the dev mailing list digest. Tomas -Original Message- From: arkady.kanev...@dell.com [mailto:arkady.kanev...@dell.com] Sent: Thursday, December 14, 2017 3:40 AM To: mrhills...@gmail.com; fu...@yuggoth.org; openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] thierry's longer dev cycle proposal It is a sign of the maturity of OpenStack. With lots of deployment and most of them in production, the emphasis is shifting from rapid functionality additions to stability, manageability, and long term operability. -Original Message- From: Melvin Hillsman [mailto:mrhills...@gmail.com] Sent: Wednesday, December 13, 2017 5:29 PM To: Jeremy Stanley ; openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] thierry's longer dev cycle proposal I think this is a good opportunity to allow some stress relief to the developer community and offer space for more discussions with operators where some operators do not feel like they are bothering/bugging developers. I believe this is the main gain for operators; my personal opinion. In general I think the opportunity costs/gains are worth it for this and it is the responsibility of the community to make the change be useful as you mentioned in your original thread Thierry. It is not a silver bullet for all of the issues folks have with the way things are done but I believe that if it does not hurt things and offers even a slight gain in some area it makes sense. Any change is not going to satisfy/dis-satisfy 100% of the constituents. -- Kind regards, Melvin Hillsman mrhills...@gmail.com mobile: +1 (832) 264-2646 irc: mrhillsman On 12/13/17, 4:39 PM, "Jeremy Stanley" wrote: On 2017-12-13 22:35:41 +0100 (+0100), Thierry Carrez wrote: [...] > It's not really fait accompli, it's just a proposal up for discussion at > this stage. Which is the reason why I started the thread on -dev -- to > check the sanity of the change from a dev perspective first. If it makes > things harder and not simpler on that side, I don't expect the TC to > proceed. [...] With my TC hat on, regardless of what impression the developer community has on this, I plan to take subsequent operator and end-user/app-dev feedback into account as well before making any binding decisions (and expect other TC members feel the same). -- Jeremy Stanley ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?
Dear Clint, maybe you misunderstood a little, or I didn't write it explicitly. We use OpenStack for providing a VPS service, yes. But the VPS users do not get access to OpenStack directly, but instead, they use our Customer Portal which does the orchestration. The whole point is to make the service as easy as possible to use for them and not expose them to the complexity of the Cloud. As I said, we couldn't use Rebuild because VPS's have Volumes. We do use Resize because it is there. But we could as well use more low-level cloud primitives. The user does not care in this case. How does, e.g., WHMCS do it? That is a stock software that you can use to provide VPS over OpenStack. Tomas from Homeatcloud -Original Message- From: Clint Byrum [mailto:cl...@fewbar.com] Sent: Thursday, October 05, 2017 6:50 PM To: openstack-operators Subject: Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild? No offense is intended, so please forgive me for the possibly incendiary nature of what I'm about to write: VPS is the predecessor of cloud (and something I love very much, and rely on every day!), and encourages all the bad habits that a cloud disallows. At small scale, it's the right thing, and that's why I use it for my small scale needs. Get a VM, put your stuff on it, and keep it running forever. But at scale, VMs in clouds go away. They get migrated, rebooted, turned off, and discarded, often. Most clouds are terrible for VPS compared to VPS hosting environments. I'm glad it's working for you. And I think rebuild and resize will stay and improve to serve VPS style users in interesting ways. I'm learning now who our users are today, and I'm confident we should make sure everyone who has taken the time to deploy and care for OpenStack should be served by expanding rebuild to meet their needs. You can all consider this my white flag. :) Excerpts from Tomáš Vondra's message of 2017-10-05 10:22:14 +0200: > In our cloud, we offer the possibility to reinstall the same or another OS on > a VPS (Virtual Private Server). Unfortunately, we couldn’t use the rebuild > function because of the VPS‘s use of Cinder for root disk. We create a new > instance and inject the same User Data so that the new instance has the same > password and key as the last one. It also has the same name, and the same > floating IP is attached. I believe it even has the same IPv6 through some > Neutron port magic. > > BTW, you wouldn’t believe how often people use the Reinstall feature. > > Tomas from Homeatcloud > > > > From: Belmiro Moreira [mailto:moreira.belmiro.email.li...@gmail.com] > Sent: Wednesday, October 04, 2017 5:34 PM > To: Chris Friesen > Cc: openstack-operators@lists.openstack.org > Subject: Re: [Openstack-operators] [nova] Should we allow passing new > user_data during rebuild? > > > > In our cloud rebuild is the only way for a user to keep the same IP. > Unfortunately, we don't offer floating IPs, yet. > > Also, we use the user_data to bootstrap some actions in new instances > (puppet, ...). > > Considering all the use-cases for rebuild it would be great if the user_data > can be updated at rebuild time. > > > > On Wed, Oct 4, 2017 at 5:15 PM, Chris Friesen > wrote: > > On 10/03/2017 11:12 AM, Clint Byrum wrote: > > My personal opinion is that rebuild is an anti-pattern for cloud, and > should be frozen and deprecated. It does nothing but complicate Nova > and present challenges for scaling. 
> > That said, if it must stay as a feature, I don't think updating the > user_data should be a priority. At that point, you've basically > created an entirely new server, and you can already do that by > creating an entirely new server. > > > If you've got a whole heat stack with multiple resources, and you realize > that you messed up one thing in the template and one of your servers has the > wrong personality/user_data, it can be useful to be able to rebuild that one > server without affecting anything else in the stack. That's just a > convenience though. > > Chris > ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild?
In our cloud, we offer the possibility to reinstall the same or another OS on a VPS (Virtual Private Server). Unfortunately, we couldn’t use the rebuild function because of the VPS‘s use of Cinder for root disk. We create a new instance and inject the same User Data so that the new instance has the same password and key as the last one. It also has the same name, and the same floating IP is attached. I believe it even has the same IPv6 through some Neutron port magic. BTW, you wouldn’t believe how often people use the Reinstall feature. Tomas from Homeatcloud From: Belmiro Moreira [mailto:moreira.belmiro.email.li...@gmail.com] Sent: Wednesday, October 04, 2017 5:34 PM To: Chris Friesen Cc: openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] [nova] Should we allow passing new user_data during rebuild? In our cloud rebuild is the only way for a user to keep the same IP. Unfortunately, we don't offer floating IPs, yet. Also, we use the user_data to bootstrap some actions in new instances (puppet, ...). Considering all the use-cases for rebuild it would be great if the user_data can be updated at rebuild time. On Wed, Oct 4, 2017 at 5:15 PM, Chris Friesen wrote: On 10/03/2017 11:12 AM, Clint Byrum wrote: My personal opinion is that rebuild is an anti-pattern for cloud, and should be frozen and deprecated. It does nothing but complicate Nova and present challenges for scaling. That said, if it must stay as a feature, I don't think updating the user_data should be a priority. At that point, you've basically created an entirely new server, and you can already do that by creating an entirely new server. If you've got a whole heat stack with multiple resources, and you realize that you messed up one thing in the template and one of your servers has the wrong personality/user_data, it can be useful to be able to rebuild that one server without affecting anything else in the stack. That's just a convenience though. Chris ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
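[A rough manual equivalent of the reinstall flow described above, using the openstack CLI; all names, sizes and addresses are placeholders, and an orchestration layer would drive the same steps through the APIs rather than the CLI:

  openstack server delete vps42
  openstack volume create --image <new-os-image> --size 40 vps42-root
  openstack server create --volume vps42-root --flavor m1.small --user-data userdata.txt --nic net-id=<net-id> vps42
  openstack server add floating ip vps42 <existing-floating-ip>
]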
Re: [Openstack-operators] cinder/nova issues
Hi! If this was OpenStack Kilo and HPE 3PAR over Fibre Channel, I would tell you that the volume extend operation is designed to work with detached volumes only. Hence you need cinder reset-state. At least in our case, it does not update the SCSI devices and multipath setup. The volume continues to work with the old size. We do a live migrate operation afterwards to disconnect the storage from one node and connect to another. Even resize to the same node works. However, os-brick was introduced in Liberty, so the case may be different. Tomas From: Adam Dibiase [mailto:adibi...@digiumcloud.com] Sent: Wednesday, August 23, 2017 9:06 PM To: Sean McGinnis Cc: openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] cinder/nova issues Thanks Sean. I filed a bug report to track this. Bug #1712651. I would agree with you on connectivity issues with the Netapp if it happened on all volume extensions, but this only happens in one scenario only. Thanks, Adam On Wed, Aug 23, 2017 at 2:04 PM, Sean McGinnis wrote: Hey Adam, There have been some updates since Liberty to improve handling in the os-brick library that handles the local device management. But with this showing the paths down, I wonder if there's something else going on there between the NetApp box and the Nova compute host. Could you file a bug to track this? I think you could just copy and paste the content of your original email since it captures a lot of great info. https://bugs.launchpad.net/cinder/+filebug We can tag it with netapp so maybe it will get some attention there. Thanks, Sean On Wed, Aug 23, 2017 at 01:01:24PM -0400, Adam Dibiase wrote: > Greetings, > > I am having an issue with nova starting an instance that is using a root > volume that cinder has extended. More specifically, a volume that has been > extended past the max resize limit of our Netapp filer. I am running > Liberty and upgraded cinder packages to 7.0.3 from 7.0.0 to take advantage > of this functionality. From what I can gather, it uses sub-lun cloning to > get past the hard limit set by Netapp when cloning past 64G (starting from > a 4G volume). > > *Environment*: > >- Release: Liberty >- Filer: Netapp >- Protocol: Fiberchannel >- Multipath: yes > > > > *Steps to reproduce: * > >- Create new instance >- stop instance >- extend the volume by running the following commands: > - cinder reset-state --state available (volume-ID or name) > - cinder extend (volume-ID or name) 100 > - cinder reset-state --state in-use (volume-ID or name) >- start instance with either nova start or nova reboot --hard --same >result > > > I can see that the instance's multipath status is good before the resize... 
> > *360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN * > > size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw > > |-+- policy='round-robin 0' prio=-1 status=active > > | |- 6:0:1:5 sdy 65:128 active undef running > > | `- 7:0:0:5 sdz 65:144 active undef running > > `-+- policy='round-robin 0' prio=-1 status=enabled > > |- 6:0:0:5 sdx 65:112 active undef running > > `- 7:0:1:5 sdaa 65:160 active undef running > > > Once the volume is resized, the lun goes to a failed state and it does not > show the new size: > > > *360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN * > > size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw > > |-+- policy='round-robin 0' prio=-1 status=enabled > > | |- 6:0:1:5 sdy 65:128 failed undef running > > | `- 7:0:0:5 sdz 65:144 failed undef running > > `-+- policy='round-robin 0' prio=-1 status=enabled > > |- 6:0:0:5 sdx 65:112 failed undef running > > `- 7:0:1:5 sdaa 65:160 failed undef running > > > Like I said, this only happens on volumes that have been extended past 64G. > Smaller sizes to not have this issue. I can only assume that the original > lun is getting destroyed after the clone process and that is cause of the > failed state. Why is it not picking up the new one and attaching it to the > compute node? Is there something I am missing? > > Thanks in advance, > > Adam > ___ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
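[For reference, the Kilo-era workaround Tomas describes earlier in this thread (reset the state around the extend, because the operation expects a detached volume, then live-migrate so the hypervisor rediscovers the LUN at its new size) is roughly the following sketch; volume and instance IDs are placeholders:

  cinder reset-state --state available <volume-id>
  cinder extend <volume-id> 100
  cinder reset-state --state in-use <volume-id>
  nova live-migration <instance-id>   # forces a fresh volume attachment on the target host
]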
Re: [Openstack-operators] custom build image is slow
Hi! How big are the actual image files? Because qcow2 is a sparse format, it does not store zeroes. If the free space inside the image is zeroed out, it will convert much faster. If that is the problem, use "dd if=/dev/zero of=temp; sync; rm temp" or zerofree. Tomas

From: Paras pradhan [mailto:pradhanpa...@gmail.com] Sent: Monday, July 31, 2017 11:54 PM To: openstack-operators@lists.openstack.org Subject: [Openstack-operators] custom build image is slow

Hello I have two qcow2 images uploaded to glance. One is the CentOS 7 cloud image downloaded from centos.org. The other one is custom built from the CentOS 7 DVD. When I create cinder volumes from them, volume creation from the custom-built image is very, very slow.

CentOS qcow2:
2017-07-31 21:42:44.287 881609 INFO cinder.image.image_utils [req-ea2d7b12-ae9e-45b2-8b4b-ea8465497d5a e090e605170a778610438bfabad7aa7764d0a77ef520ae392e2b59074c9f88cf 490910c1d4e1486d8e3a62d7c0ae698e - d67a18e70dd9467db25b74d33feaad6d default] Converted 8192.00 MB image at 253.19 MB/s

Custom-built qcow2:
INFO cinder.image.image_utils [req-032292d8-1500-474d-95c7-2e8424e2b864 e090e605170a778610438bfabad7aa7764d0a77ef520ae392e2b59074c9f88cf 490910c1d4e1486d8e3a62d7c0ae698e - d67a18e70dd9467db25b74d33feaad6d default] Converted 10240.00 MB image at 32.22 MB/s

I used the following command to create the qcow2 file:
qemu-img create -f qcow2 custom.qcow2 10G

What am I missing? Thanks Paras.
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
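[If zeroing is indeed the issue, the full round trip looks something like this; file paths are placeholders, and zerofree on an unmounted or read-only ext filesystem achieves the same thing as the dd trick:

  # inside the guest, before capturing the image:
  dd if=/dev/zero of=/tmp/zerofill bs=1M; sync; rm /tmp/zerofill
  # on the host, repack the image and compare allocated sizes:
  qemu-img convert -O qcow2 custom.qcow2 custom-compact.qcow2
  qemu-img info custom-compact.qcow2
  du -h custom.qcow2 custom-compact.qcow2
]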
Re: [Openstack-operators] [nova] instance partition size discrepancy
Have you tried # resize2fs /dev/vda ? Alternatively, if you use images with cloud-init and initramfs-growroot installed, it should work out of the box. Tomas -Original Message- From: Carlos Konstanski [mailto:ckonstan...@pippiandcarlos.com] Sent: Wednesday, April 26, 2017 12:02 AM To: openstack-operators@lists.openstack.org Subject: [Openstack-operators] [nova] instance partition size discrepancy I'm having an issue where the instance thinks its root filesystem is much smaller than the size of the volume that I used to create it. Not only that, the OS cannot decide on whether it thinks the size is right or wrong. See the following pastebin: https://paste.pound-python.org/show/eNt8nLNLhHAL5OYICqbs/ Notice that everything shows the size as 20 GB except df, which shows it as 2.8 GB. I ran the previous instance out of space before spinning up this new one, so 2.8 seems to be the winner (though wrong). Figured I'd check to see if this is a known issue while I dig deeper. Carlos Konstanski ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
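[In case it helps, a rough manual equivalent of what cloud-init with growroot does on first boot; this sketch assumes an ext4 root filesystem on the first partition of vda, so adjust device names to your actual layout:

  growpart /dev/vda 1     # from cloud-utils-growpart: extend the partition to fill the disk
  resize2fs /dev/vda1     # grow the ext filesystem to fill the partition
  df -h /
]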
Re: [Openstack-operators] [scientific] Resource reservation requirements (Blazar) - Forum session
Hi! Did someone mention automation changing the spot instance capacity? I wrote an article in 2013 that proposes exactly that. The model forecasts the workload curve of the majority traffic, which is presumed to be interactive, and the rest may be used for batch traffic. The forecast used is SARIMA and is usable up to a few days in advance. Would anybody be interested in trying the forecast on data from their cloud? http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.671.7397&rep=rep1&type=pdf Tomas Vondra, dept. of Cybernetics, CTU FEE

-Original Message- From: Blair Bethwaite [mailto:blair.bethwa...@gmail.com] Sent: Tuesday, April 04, 2017 12:08 AM To: Jay Pipes Cc: openstack-oper. Subject: Re: [Openstack-operators] [scientific] Resource reservation requirements (Blazar) - Forum session

Hi Jay, On 4 April 2017 at 00:20, Jay Pipes wrote: > However, implementing the above in any useful fashion requires that > Blazar be placed *above* Nova and essentially that the cloud operator > turns off access to Nova's POST /servers API call for regular users. > Because if not, the information that Blazar acts upon can be simply > circumvented by any user at any time.

That's something of an oversimplification. A reservation system outside of Nova could manipulate Nova host-aggregates to "cordon off" infrastructure from on-demand access (I believe Blazar already uses this approach), and it's not much of a jump to imagine operators being able to twiddle the available reserved capacity in a finite cloud so that reserved capacity can be offered to the subset of users/projects that need (or perhaps have paid for) it. Such a reservation system would even be able to backfill capacity between reservations. At the end of the reservation the system cleans up any remaining instances and preps for the next reservation. There are a couple of problems with putting this outside of Nova though. The main issue is that pre-emptible/spot type instances can't be accommodated within the on-demand cloud capacity. You could have the reservation system implementing this feature, but that would then put other scheduling constraints on the cloud in order to be effective (e.g., there would need to be automation changing the size of the on-demand capacity so that the maximum pre-emptible capacity was always available). The other issue (admittedly minor, but still a consideration) is that it's another service - personally I'd love to see Nova support these advanced use-cases directly. -- Cheers, ~Blairo
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] Flavors
We at Homeatcloud.com do exactly this in our VPS service. The user can configure the VPS with any combination of CPU, RAM, and disk. However, a) the configurations are all about 10% of the size of the physical machines, and b) the disks are in a SAN array, provisioned as volumes. So I give the users some flexibility and can better see what configurations they actually want and build new hypervisors with that in mind. They mostly want up to 4 GB RAM anyway, so it's not a big deal. Tomas Vondra

From: Adam Lawson [mailto:alaw...@aqorn.com] Sent: Thursday, March 16, 2017 5:57 PM To: Jonathan D. Proulx Cc: OpenStack Operators Subject: Re: [Openstack-operators] Flavors

One way I know some providers work around this when using OpenStack is by fronting the VM request with some code in the web server that checks if the requested spec has an existing flavor. If so, use the flavor; if not, use an admin account that creates a new flavor and assigns it to that user request, then removes it when the build is complete. This naturally impacts your control over hardware efficiency but it makes your scenario possible (for better or for worse). I also hate being forced to do what someone else decided was going to be best for me. That's my decision and thankfully with OpenStack, this kind of thing is rather easy to do. //adam Adam Lawson Principal Architect, CEO Office: +1-916-794-5706

On Thu, Mar 16, 2017 at 7:52 AM, Jonathan D. Proulx wrote: I have always hated flavors and so do many of my users. On Wed, Mar 15, 2017 at 03:22:48PM -0700, James Downs wrote: :On Wed, Mar 15, 2017 at 10:10:00PM +, Fox, Kevin M wrote: :> I think the really short answer is something like: It greatly simplifies scheduling and billing. : :The real answer is that once you buy hardware, it's in a fixed ratio of CPU/RAM/Disk/IOPS, etc. This, while apparently reasonable, is BS (at least in the private cloud space). What users request and what they actually use are wildly divergent. *IF* usage of claimed resources were at or near optimal then this might be true. But if people are claiming 32G of RAM because that's how much you assigned to a 16 vCPU instance type but really just need 16 threads with 2G or 4G, then your packing still sucks. I'm mostly bound on memory, so I mostly have my users select on that basis and over-provide and over-provision CPU, since that can be effectively shared between VMs where memory needs to be dedicated (well, mostly). I'm sure I've ranted about this before, but as you see from other responses we seem to be in the minority position, so mostly I rant at the walls while my office mates look on perplexed (actually they're pretty used to it by now and ignore me :) ) -Jon
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
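[For what it's worth, the per-request flavor approach Adam describes amounts to something like the following, run from the front-end's admin account; names and sizes are placeholders, and whether this is a good idea for packing efficiency is a separate question:

  openstack flavor create --vcpus 2 --ram 4096 --disk 0 --private custom-2c-4g
  openstack flavor set --project <project-id> custom-2c-4g
  # boot the server with the new flavor, then once the build is complete:
  openstack flavor delete custom-2c-4g
]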
Re: [Openstack-operators] [nova] Do you use os-instance-usage-audit-log?
Hi Matt, I've looked at my Nova config and yes, I have it on. We do billing using Ceilometer data and I think compute.instance.exists is consumed as well. The Ceilometer event retention is set to 6 months and the database size is in single gigabytes. Nova database table task_log only contains the fact that the audit job ran successfully and has 6 MB. It was not pruned for more than a year. Tomas -Original Message- From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com] Sent: Thursday, January 12, 2017 12:09 AM To: openstack-operators@lists.openstack.org Subject: [Openstack-operators] [nova] Do you use os-instance-usage-audit-log? Nova's got this REST API [1] which pulls task_log data from the nova database if the 'instance_usage_audit' config option value is True on any compute host. That table is populated in a periodic task from all computes that have it enabled and by default it 'audits' instances created in the last month (the time window is adjustable via the 'instance_get_active_by_window_joined' config option). The periodic task also emits a 'compute.instance.exists' notification for each instance on that compute host which falls into the audit period. I'm fairly certain that notification is meant to be consumed by Ceilometer which is going to store it in it's own time-series database. It just so happens that Nova is also storing this audit data in it's own database, and never cleaning it up - the only way in-tree to move that data out of the nova.task_log table is to archive it into shadow tables, but that doesn't cut down on the bloat in your database. That os-instance-usage-audit-log REST API is relying on the nova database though. So my question is, is anyone using this in any shape or form, either via the Nova REST API or Ceilometer? Or are you using it in one form but not the other (maybe only via Ceilometer)? If you're using it, how are you controlling the table growth, i.e. are you deleting records over a certain age from the nova database using a cron job? Mike Bayer was going to try and find some large production data sets to see how many of these records are in a big and busy production DB that's using this feature, but I'm also simply interested in how people use this, if it's useful at all, and if there is interest in somehow putting a limit on the data, i.e. we could add a config option to nova to only store records in the task_log table under a certain max age. [1] http://developer.openstack.org/api-ref/compute/#server-usage-audit-log-os-in stance-usage-audit-log -- Thanks, Matt Riedemann ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
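[If the task_log table ever does become a problem, one blunt way to keep it bounded is a periodic delete of old audit periods from cron. This sketch assumes the table has a period_ending timestamp column, which is worth verifying against the schema of your Nova release before running anything:

  mysql nova -e "DELETE FROM task_log WHERE period_ending < DATE_SUB(NOW(), INTERVAL 6 MONTH);"
]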
Re: [Openstack-operators] RabbitMQ 3.6.x experience?
The version is 3.6.2, but the issue that I believe is relevant is still not fixed: https://github.com/rabbitmq/rabbitmq-management/issues/41 Tomas -Original Message- From: Mike Dorman [mailto:mdor...@godaddy.com] Sent: Monday, January 09, 2017 6:00 PM To: Ricardo Rocha; Sam Morrison Cc: OpenStack Operators Subject: Re: [Openstack-operators] RabbitMQ 3.6.x experience? Great info, thanks so much for this. We, too, have turned off stats collection some time ago (and haven’t really missed it.) Tomáš, what minor version of 3.6 are you using? We would probably go to 3.6.6 if we upgrade. Thanks again all! Mike On 1/9/17, 2:34 AM, "Ricardo Rocha" wrote: Same here, running 3.6.5 for (some) of the rabbit clusters. It's been stable over the last month (fingers crossed!), though: * gave up on stats collection (set to 6 which makes it not so useful) * can still make it very sick with a couple of misconfigured clients (rabbit_retry_interval=1 and rabbit_retry_backoff=60 currently everywhere). Some data from the neutron rabbit cluster (3 vm nodes, not all infra currently talks to neutron): * connections: ~8k * memory used per node: 2.5GB, 1.7GB, 0.1GB (the last one is less used due to a previous net partition i believe) * rabbit hiera configuration rabbitmq::cluster_partition_handling: 'autoheal' rabbitmq::config_kernel_variables: inet_dist_listen_min: 41055 inet_dist_listen_max: 41055 rabbitmq::config_variables: collect_statistics_interval: 6 reverse_dns_lookups: true vm_memory_high_watermark: 0.8 rabbitmq::environment_variables: SERVER_ERL_ARGS: "'+K true +A 128 +P 1048576'" rabbitmq::tcp_keepalive: true rabbitmq::tcp_backlog: 4096 * package versions erlang-kernel-18.3.4.4-1 rabbitmq-server-3.6.5-1 It's stable enough to keep scaling it up in the next couple months and see how it goes. Cheers, Ricardo On Mon, Jan 9, 2017 at 3:54 AM, Sam Morrison wrote: > We’ve been running 3.6.5 for sometime now and it’s working well. > > 3.6.1 - 3.6.3 are unusable, we had lots of issues with stats DB and other > weirdness. > > Our setup is a 3 physical node cluster with around 9k connections, average > around the 300 messages/sec delivery. We have the stats sample rate set to > default and it is working fine. > > Yes we did have to restart the cluster to upgrade. > > Cheers, > Sam > > > > On 6 Jan 2017, at 5:26 am, Matt Fischer wrote: > > MIke, > > I did a bunch of research and experiments on this last fall. We are running > Rabbit 3.5.6 on our main cluster and 3.6.5 on our Trove cluster which has > significantly less load (and criticality). We were going to upgrade to 3.6.5 > everywhere but in the end decided not to, mainly because there was little > perceived benefit at the time. Our main issue is unchecked memory growth at > random times. I ended up making several config changes to the stats > collector and then we also restart it after every deploy and that solved it > (so far). > > I'd say these were my main reasons for not going to 3.6 for our control > nodes: > > In 3.6.x they re-wrote the stats processor to make it parallel. In every 3.6 > release since then, Pivotal has fixed bugs in this code. Then finally they > threw up their hands and said "we're going to make a complete rewrite in > 3.7/4.x" (you need to look through issues on Github to find this discussion) > Out of the box with the same configs 3.6.5 used more memory than 3.5.6, > since this was our main issue, I consider this a negative. 
> Another issue is the ancient version of erlang we have with Ubuntu Trusty > (which we are working on) which made upgrades more complex/impossible > depending on the version. > > Given those negatives, the main one being that I didn't think there would be > too many more fixes to the parallel statsdb collector in 3.6, we decided to > stick with 3.5.6. In the end the devil we know is better than the devil we > don't and I had no evidence that 3.6.5 would be an improvement. > > I did decide to leave Trove on 3.6.5 because this would give us some bake-in > time if 3.5.x became untenable we'd at least have had it up and running in > production and some data on it. > > If statsdb is not a concern for you, I think this changes the math and maybe > you should use 3.6.x. I would however recommend at least going to 3.5.6, > it's been better than 3.3/3.4 was. > > No matter what you do definitely read all the release notes. There are some > upgrades which require an entire cluster shutdown. The upgrade to 3.5.6 did > not require this IIRC. > > Here's the hiera for our rabbit settings whic
Re: [Openstack-operators] RabbitMQ 3.6.x experience?
We have upgraded to RabbitMQ 3.6, and it resulted in one node crashing about every week on out of memory errors. To avoid this, we had to turn off the message rate collection. So no throughput graphs until it gets fixed. Avoid this version if you can. Tomas From: Sam Morrison [mailto:sorri...@gmail.com] Sent: Monday, January 09, 2017 3:55 AM To: Matt Fischer Cc: OpenStack Operators Subject: Re: [Openstack-operators] RabbitMQ 3.6.x experience? We’ve been running 3.6.5 for sometime now and it’s working well. 3.6.1 - 3.6.3 are unusable, we had lots of issues with stats DB and other weirdness. Our setup is a 3 physical node cluster with around 9k connections, average around the 300 messages/sec delivery. We have the stats sample rate set to default and it is working fine. Yes we did have to restart the cluster to upgrade. Cheers, Sam On 6 Jan 2017, at 5:26 am, Matt Fischer wrote: MIke, I did a bunch of research and experiments on this last fall. We are running Rabbit 3.5.6 on our main cluster and 3.6.5 on our Trove cluster which has significantly less load (and criticality). We were going to upgrade to 3.6.5 everywhere but in the end decided not to, mainly because there was little perceived benefit at the time. Our main issue is unchecked memory growth at random times. I ended up making several config changes to the stats collector and then we also restart it after every deploy and that solved it (so far). I'd say these were my main reasons for not going to 3.6 for our control nodes: * In 3.6.x they re-wrote the stats processor to make it parallel. In every 3.6 release since then, Pivotal has fixed bugs in this code. Then finally they threw up their hands and said "we're going to make a complete rewrite in 3.7/4.x" (you need to look through issues on Github to find this discussion) * Out of the box with the same configs 3.6.5 used more memory than 3.5.6, since this was our main issue, I consider this a negative. * Another issue is the ancient version of erlang we have with Ubuntu Trusty (which we are working on) which made upgrades more complex/impossible depending on the version. Given those negatives, the main one being that I didn't think there would be too many more fixes to the parallel statsdb collector in 3.6, we decided to stick with 3.5.6. In the end the devil we know is better than the devil we don't and I had no evidence that 3.6.5 would be an improvement. I did decide to leave Trove on 3.6.5 because this would give us some bake-in time if 3.5.x became untenable we'd at least have had it up and running in production and some data on it. If statsdb is not a concern for you, I think this changes the math and maybe you should use 3.6.x. I would however recommend at least going to 3.5.6, it's been better than 3.3/3.4 was. No matter what you do definitely read all the release notes. There are some upgrades which require an entire cluster shutdown. The upgrade to 3.5.6 did not require this IIRC. Here's the hiera for our rabbit settings which I assume you can translate: rabbitmq::cluster_partition_handling: 'autoheal' rabbitmq::config_variables: 'vm_memory_high_watermark': '0.6' 'collect_statistics_interval': 3 rabbitmq::config_management_variables: 'rates_mode': 'none' rabbitmq::file_limit: '65535' Finally, if you do upgrade to 3.6.x please report back here with your results at scale! On Thu, Jan 5, 2017 at 8:49 AM, Mike Dorman wrote: We are looking at upgrading to the latest RabbitMQ in an effort to ease some cluster failover issues we’ve been seeing. 
(Currently on 3.4.0) Anyone been running 3.6.x? And what has been your experience? Any gottchas to watch out for? Thanks, Mike ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
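[For anyone who prefers raw rabbitmq.config over the hiera shown earlier in this thread, the two knobs people use here to tame the stats database (the management plugin's rates_mode and the stats collection interval, the latter in milliseconds) look roughly like this; the values are examples and the option names should be double-checked against your RabbitMQ version:

  [
    {rabbit, [
      {collect_statistics_interval, 30000},
      {vm_memory_high_watermark, 0.6}
    ]},
    {rabbitmq_management, [
      {rates_mode, none}
    ]}
  ].
]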