Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-28 Thread Sam Morrison


> On 26 Oct 2018, at 1:42 am, Dan Smith  wrote:
> 
>> I guess our architecture is pretty unique in a way, but I wonder if
>> other people are also a little scared about the whole "all DB servers
>> need to be up to serve API requests" situation?
> 
> When we started down this path, we acknowledged that this would create a
> different access pattern which would require ops to treat the cell
> databases differently. The input we were getting at the time was that
> the benefits outweighed the costs here, and that we'd work on caching to
> deal with performance issues if/when that became necessary.
> 
>> I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still
>> have the top level api cell DB but the API would only ever read from
>> it. Nova-api would only write to the compute cell DBs.
>> Then keep the nova-cells processes just doing instance_update_at_top to keep 
>> the nova-cell-api db up to date.
> 
> I'm definitely not in favor of doing more replication in python to
> address this. What was there in cellsv1 was lossy, even for the subset
> of things it actually supported (which didn't cover all nova features at
> the time and hasn't kept pace with features added since, obviously).
> 
> About a year ago, I proposed that we add another "read only mirror"
> field to the cell mapping, which nova would use if and only if the
> primary cell database wasn't reachable, and only for read
> operations. The ops, if they wanted to use this, would configure plain
> old one-way mysql replication of the cell databases to a
> highly-available server (probably wherever the api_db is) and nova could
> use that as a read-only cache for things like listing instances and
> calculating quotas. The reaction was (very surprisingly to me) negative
> to this option. It seems very low-effort, high-gain, and proper re-use
> of existing technologies to me, without us having to replicate a
> replication engine (hah) in python. So, I'm curious: does that sound
> more palatable to you?

Yeah, I think that could work for us; so far I can’t think of anything better. 

Thanks,
Sam


> 
> --Dan
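
For illustration, a minimal Python sketch of the read-preference fallback Dan describes above (purely hypothetical, not nova code; readonly_mirror_connection stands in for the proposed extra field on the cell mapping):

```
class DBUnreachable(Exception):
    pass


def list_instances(cell_mapping, filters, query):
    """Read from the cell DB, falling back to its read-only MySQL mirror.

    `query` stands in for whatever function performs the actual read;
    writes would only ever go to cell_mapping.database_connection.
    """
    try:
        return query(cell_mapping.database_connection, filters)
    except DBUnreachable:
        mirror = getattr(cell_mapping, 'readonly_mirror_connection', None)
        if mirror:
            # stale but available data from the one-way MySQL replica
            return query(mirror, filters)
        raise
```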




Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-24 Thread Sam Morrison


> On 24 Oct 2018, at 4:01 pm, melanie witt  wrote:
> 
> On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:
>> Hi nova devs,
>> Have been having a good look into cellsv2 and how we migrate to them (we’re 
>> still on cellsv1 and about to upgrade to queens and still run cells v1 for 
>> now).
>> One of the problems I have is that now all our nova cell database servers 
>> need to respond to API requests.
>> With cellsv1 our architecture was to have a big powerful DB cluster (3 
>> physical servers) at the API level to handle the API cell and then a 
>> smallish non HA DB server (usually just a VM) for each of the compute cells.
>> This architecture won’t work with cells V2 and we’ll now need to have a lot 
>> of highly available and responsive DB servers for all the cells.
>> It will also mean that our nova-apis which reside in Melbourne, Australia 
>> will now need to talk to database servers in Auckland, New Zealand.
>> The biggest issue we have is when a cell is down. We sometimes have cells go 
>> down for an hour or so planned or unplanned and with cellsv1 this does not 
>> affect other cells.
>> Looks like some good work going on here 
>> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell
>> But what about quota? If a cell goes down then it would seem that a user all 
>> of a sudden would regain some quota from the instances that are in the down 
>> cell?
>> Just wondering if anyone has thought about this?
> 
> Yes, we've discussed it quite a bit. The current plan is to offer a 
> policy-driven behavior as part of the "down" cell handling which will control 
> whether nova will:
> 
> a) Reject a server create request if the user owns instances in "down" cells
> 
> b) Go ahead and count quota usage "as-is" if the user owns instances in 
> "down" cells and allow quota limit to be potentially exceeded
> 
> We would like to know if you think this plan will work for you.
> 
> Further down the road, if we're able to come to an agreement on a consumer 
> type/owner or partitioning concept in placement (to be certain we are 
> counting usage our instance of nova owns, as placement is a shared service), 
> we could count quota usage from placement instead of querying cells.

OK great, always good to know other people are thinking for you :-). I don’t 
really like (a) or (b), but the idea of using placement sounds like a good one to 
me.
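
For concreteness, a rough sketch of what counting a project's usage from placement could look like (placement exposes GET /usages with a project_id filter from microversion 1.9; the endpoint and token handling here are simplified and illustrative only):

```
import requests


def placement_usages(placement_url, token, project_id):
    # GET /usages?project_id=<uuid> returns totals such as VCPU,
    # MEMORY_MB and DISK_GB for everything placement knows about,
    # which is why the consumer type/owner question above matters.
    resp = requests.get(
        placement_url + '/usages',
        params={'project_id': project_id},
        headers={'X-Auth-Token': token,
                 'OpenStack-API-Version': 'placement 1.9'})
    resp.raise_for_status()
    return resp.json()['usages']
```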

I guess our architecture is pretty unique in a way, but I wonder if other people 
are also a little scared about the whole "all DB servers need to be up to serve 
API requests" situation?

I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the 
top level api cell DB but the API would only ever read from it. Nova-api would 
only write to the compute cell DBs.
Then keep the nova-cells processes just doing instance_update_at_top to keep 
the nova-cell-api db up to date.

We’d still have syncing issues but we have that with placement now and that is 
more frequent than nova-cells-v1 is for us.

Cheers,
Sam



> 
> Cheers,
> -melanie
> 
> 


[openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-23 Thread Sam Morrison
Hi nova devs,

Have been having a good look into cellsv2 and how we migrate to them (we’re 
still on cellsv1 and about to upgrade to queens and still run cells v1 for now).

One of the problems I have is that now all our nova cell database servers need 
to respond to API requests.
With cellsv1 our architecture was to have a big powerful DB cluster (3 physical 
servers) at the API level to handle the API cell and then a smallish non HA DB 
server (usually just a VM) for each of the compute cells. 

This architecture won’t work with cells V2 and we’ll now need to have a lot of 
highly available and responsive DB servers for all the cells. 

It will also mean that our nova-apis which reside in Melbourne, Australia will 
now need to talk to database servers in Auckland, New Zealand.

The biggest issue we have is when a cell is down. We sometimes have cells go 
down for an hour or so planned or unplanned and with cellsv1 this does not 
affect other cells. 
Looks like some good work going on here 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell
 


But what about quota? If a cell goes down then it would seem that a user all of 
a sudden would regain some quota from the instances that are in the down cell?
Just wondering if anyone has thought about this?

Cheers,
Sam





Re: [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration

2018-08-22 Thread Sam Morrison
I think in our case we’d only migrate between cells if we know the network and 
storage are accessible, and would never do it if not. 
We’re thinking of moving from old to new hardware at a cell level.

If storage and network aren’t available, ideally it would fail at the API request.

There are also ceph-backed instances, so this is something to take into 
account which nova would be responsible for.

I’ll be in Denver so we can discuss more there too.

Cheers,
Sam





> On 23 Aug 2018, at 11:23 am, Matt Riedemann  wrote:
> 
> Hi everyone,
> 
> I have started an etherpad for cells topics at the Stein PTG [1]. The main 
> issue in there right now is dealing with cross-cell cold migration in nova.
> 
> At a high level, I am going off these requirements:
> 
> * Cells can shard across flavors (and hardware type) so operators would like 
> to move users off the old flavors/hardware (old cell) to new flavors in a new 
> cell.
> 
> * There is network isolation between compute hosts in different cells, so no 
> ssh'ing the disk around like we do today. But the image service is global to 
> all cells.
> 
> Based on this, for the initial support for cross-cell cold migration, I am 
> proposing that we leverage something like shelve offload/unshelve 
> masquerading as resize. We shelve offload from the source cell and unshelve 
> in the target cell. This should work for both volume-backed and 
> non-volume-backed servers (we use snapshots for shelved offloaded 
> non-volume-backed servers).
> 
> There are, of course, some complications. The main ones that I need help with 
> right now are what happens with volumes and ports attached to the server. 
> Today we detach from the source and attach at the target, but that's assuming 
> the storage backend and network are available to both hosts involved in the 
> move of the server. Will that be the case across cells? I am assuming that 
> depends on the network topology (are routed networks being used?) and storage 
> backend (routed storage?). If the network and/or storage backend are not 
> available across cells, how do we migrate volumes and ports? Cinder has a 
> volume migrate API for admins but I do not know how nova would know the 
> proper affinity per-cell to migrate the volume to the proper host (cinder 
> does not have a routed storage concept like routed provider networks in 
> neutron, correct?). And as far as I know, there is no such thing as port 
> migration in Neutron.
> 
> Could Placement help with the volume/port migration stuff? Neutron routed 
> provider networks rely on placement aggregates to schedule the VM to a 
> compute host in the same network segment as the port used to create the VM, 
> however, if that segment does not span cells we are kind of stuck, correct?
> 
> To summarize the issues as I see them (today):
> 
> * How to deal with the targeted cell during scheduling? This is so we can 
> even get out of the source cell in nova.
> 
> * How does the API deal with the same instance being in two DBs at the same 
> time during the move?
> 
> * How to handle revert resize?
> 
> * How are volumes and ports handled?
> 
> I can get feedback from my company's operators based on what their deployment 
> will look like for this, but that does not mean it will work for others, so I 
> need as much feedback from operators, especially those running with multiple 
> cells today, as possible. Thanks in advance.
> 
> [1] https://etherpad.openstack.org/p/nova-ptg-stein-cells
> 
> -- 
> 
> Thanks,
> 
> Matt
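
For reference, a rough sketch of the shelve-offload/unshelve flow described above, driven from the outside with python-novaclient (illustrative only; the real cross-cell resize would happen inside nova, and the volume and port questions above still apply):

```
from novaclient import client as nova_client


def shelve_based_move(sess, server_id):
    """Sketch only: shelve offload in the source cell, unshelve in the target.

    `sess` is assumed to be an authenticated keystoneauth1 session.
    """
    nova = nova_client.Client('2.1', session=sess)
    server = nova.servers.get(server_id)
    nova.servers.shelve(server)    # snapshot the server and free the source host
    # ... wait for SHELVED_OFFLOADED, retarget scheduling at the destination
    #     cell, then:
    nova.servers.unshelve(server)  # rebuild from the snapshot in the target cell
```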




[openstack-dev] [nova] Listing servers with filters and policy

2017-12-11 Thread Sam Morrison
Hi Nova devs,

I’m after some feedback on how to give “non-admin” users the ability to list 
instances using “admin” filters.

https://review.openstack.org/#/c/526558/ 


Basically just trying to be able to do some finer grained permission handling 
as opposed to just whether you’re admin or not.

Thanks!
Sam



Re: [openstack-dev] [Openstack-operators] [nova][glance] Who needs multiple api_servers?

2017-05-01 Thread Sam Morrison

> On 1 May 2017, at 4:24 pm, Sean McGinnis  wrote:
> 
> On Mon, May 01, 2017 at 10:17:43AM -0400, Matthew Treinish wrote:
>>> 
>> 
>> I thought it was just nova too, but it turns out cinder has the same exact
>> option as nova: (I hit this in my devstack patch trying to get glance 
>> deployed
>> as a wsgi app)
>> 
>> https://github.com/openstack/cinder/blob/d47eda3a3ba9971330b27beeeb471e2bc94575ca/cinder/common/config.py#L51-L55
>> 
>> Although from what I can tell you don't have to set it and it will fallback 
>> to
>> using the catalog, assuming you configured the catalog info for cinder:
>> 
>> https://github.com/openstack/cinder/blob/19d07a1f394c905c23f109c1888c019da830b49e/cinder/image/glance.py#L117-L129
>> 
>> 
>> -Matt Treinish
>> 
> 
> FWIW, that came with the original fork out of Nova. I do not have any real
> world data on whether that is used or not.

Yes this is used in cinder.

For a lot of the projects you can set the endpoints they should use. This is 
extremely useful in a large production OpenStack install where you want to 
control the traffic.

I can understand using the catalog in certain situations and feel it’s OK for 
that to be the default but please don’t prevent operators configuring it 
differently.

Glance is the big one, as you want to control the data flow efficiently, but any 
service-to-service endpoint configuration should ideally be able to be set 
manually.
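
For reference, this is the sort of explicit endpoint configuration being discussed: nova's [glance]/api_servers option and cinder's glance_api_servers option (the values below are made-up examples; if left unset, both fall back to the catalog lookup mentioned above):

```
# nova.conf
[glance]
api_servers = http://glance-dc1-a.example.com:9292,http://glance-dc1-b.example.com:9292

# cinder.conf
[DEFAULT]
glance_api_servers = http://glance-dc1-a.example.com:9292
```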

Cheers,
Sam


> 
> 


Re: [openstack-dev] [nova] nova-api-metadata managing firewall

2017-01-16 Thread Sam Morrison
Thanks Jens,

Is someone able to change the status of the bug from won’t-fix to confirmed so 
it’s visible?

Cheers,
Sam


> On 10 Jan 2017, at 10:52 pm, Jens Rosenboom <j.rosenb...@x-ion.de> wrote:
> 
> 2017-01-10 4:33 GMT+01:00 Sam Morrison <sorri...@gmail.com 
> <mailto:sorri...@gmail.com>>:
>> Hi nova-devs,
>> 
>> I raised a bug about nova-api-metadata messing with iptables on a host
>> 
>> https://bugs.launchpad.net/nova/+bug/1648643
>> 
>> It got closed as won’t fix but I think it could do with a little more
>> discussion.
>> 
>> Currently nova-api-metadata will create an iptables rule and also delete
>> other rules on the host. This was needed back in the nova-network days
>> as there was some trickery going on there.
>> Now with neutron and neutron-metadata-proxy, nova-api-metadata is little more
>> than a web server, much like nova-api.
>> 
>> I may be missing some use case but I don’t think nova-api-metadata needs to
>> care about firewall rules (much like nova-api doesn’t care about firewall
>> rules)
> 
> I agree with Sam on this. Looking a bit into the code, the mangling part of 
> the
> iptables rules is only called in nova/network/l3.py, which seems to happen 
> only
> when nova-network is being used. The installation of the global nova-iptables
> setup however happens unconditionally in nova/api/manager.py as soon as the
> nova-api-metadata service is started, which doesn't make much sense in a
> Neutron environment. So I would propose to either make this setup happen
> only when nova-network is used or at least allow a deployer to turn it off 
> via
> a config option.
> 
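
A sketch of the kind of opt-out Jens suggests above; the option name is hypothetical and does not exist in nova, it only illustrates the shape of the proposal:

```
from oslo_config import cfg

metadata_opts = [
    cfg.BoolOpt('metadata_manage_iptables',
                default=True,
                help='If False, nova-api-metadata will not install or modify '
                     'iptables rules on the host (appropriate for Neutron '
                     'deployments using neutron-metadata-proxy).'),
]

cfg.CONF.register_opts(metadata_opts)
```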


[openstack-dev] [nova] nova-api-metadata managing firewall

2017-01-09 Thread Sam Morrison
Hi nova-devs,

I raised a bug about nova-api-metadata messing with iptables on a host 

https://bugs.launchpad.net/nova/+bug/1648643 


It got closed as won’t fix but I think it could do with a little more 
discussion.

Currently nova-api-metadata will create an iptables rule and also delete other 
rules on the host. This was needed back in the nova-network days as there 
was some trickery going on there.
Now with neutron and neutron-metadata-proxy, nova-api-metadata is little more 
than a web server, much like nova-api.

I may be missing some use case but I don’t think nova-api-metadata needs to 
care about firewall rules (much like nova-api doesn’t care about firewall rules)

Thanks,
Sam



[openstack-dev] [cinder] pending review

2016-12-12 Thread Sam Morrison
Hi Cinder devs,

I’ve had a review [1] waiting for some eyes for over a month now. What’s the 
process here? Usually I get a response to a review in other projects within a day 
or two. 
Is there someone I need to alert or add to the review specifically for cinder 
patches?

Thanks,
Sam

[1] https://review.openstack.org/#/c/393092/ 





Re: [openstack-dev] [gnocchi] influxdb driver gate error

2016-12-01 Thread Sam Morrison
I’ve been working a bit on this, and I can now replicate the errors I’m getting 
in the gate in my environment if I use the same version of influxdb as the 
gate (0.10).

Using influxdb v1.1 works fine; anything less than 1.0 I would deem unusable 
for influxdb. So to get this to work we’d need a newer version of influxdb 
installed. 

Any idea how to do this? I see they push out a custom ceph repo to install a 
newer ceph, so I guess we’d need to do something similar, although influx doesn’t 
provide a repo, just a deb.

Sam




> On 30 Nov. 2016, at 7:35 pm, Sam Morrison <sorri...@gmail.com> wrote:
> 
> 
>> On 30 Nov. 2016, at 6:23 pm, Mehdi Abaakouk <sil...@sileht.net> wrote:
>> 
>> 
>> 
>> Le 2016-11-30 08:06, Sam Morrison a écrit :
>>> 2016-11-30 06:50:14.969302 | + pifpaf -e GNOCCHI_STORAGE run influxdb
>>> -- pifpaf -e GNOCCHI_INDEXER run mysql -- ./tools/pretty_tox.sh
>>> 2016-11-30 06:50:17.399380 | ERROR: pifpaf: 'ascii' codec can't decode
>>> byte 0xc2 in position 165: ordinal not in range(128)
>>> 2016-11-30 06:50:17.415485 | ERROR: InvocationError:
>>> '/home/jenkins/workspace/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/run-tests.sh’
>> 
>> You can temporary pass '--debug' to pifpaf to get the full backtrace.
> 
> Good idea, thanks! I get this error; I don’t get it on the py3 job though.
> 
> I get further with the py3 job but hit some other errors I don’t see in my env, 
> so I’m trying to figure out what is different.
> 
> 
> 2016-11-30 07:40:17.209979 | + pifpaf --debug -e GNOCCHI_STORAGE run influxdb 
> -- pifpaf -e GNOCCHI_INDEXER run mysql -- ./tools/pretty_tox.sh
> 2016-11-30 07:40:17.746304 | DEBUG: pifpaf.drivers: executing: ['influxd', 
> '-config', '/tmp/tmp.7pq0EBpjgt/tmpikRcvn/config']
> 2016-11-30 07:40:17.759236 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> [... wrapped InfluxDB ASCII-art startup banner trimmed from the DEBUG output ...]
> 2016-11-30 07:40:17.760516 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> 2016-11-30 07:40:17.760643 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> 2016/11/30 07:40:17 InfluxDB starting, version 0.10.0, branch unknown, commit 
> unknown, built unknown
> 2016-11-30 07:40:17.760722 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> 2016/11/30 07:40:17 Go version go1.6rc1, GOMAXPROCS set to 8
> 2016-11-30 07:40:17.859524 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> 2016/11/30 07:40:17 Using configuration at: 
> /tmp/tmp.7pq0EBpjgt/tmpikRcvn/config
> 2016-11-30 07:40:17.860852 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> [meta] 2016/11/30 07:40:17 Starting meta service
> 2016-11-30 07:40:17.861033 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> [meta] 2016/11/30 07:40:17 Listening on HTTP: 127.0.0.1:51232
> 2016-11-30 07:40:17.871362 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> [metastore] 2016/11/30 07:40:17 Using data dir: 
> /tmp/tmp.7pq0EBpjgt/tmpikRcvn/meta
> 2016-11-30 07:40:17.878511 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> [metastore] 2016/11/30 07:40:17 Node at localhost:51233 [Follower]
> 2016-11-30 07:40:19.079831 | DEBUG: pifpaf.drivers: influxd[20435] output: 
> [metastore] 2016/11/30 07:40:19 Node at localhost:51233 [Leader]. 
> peers=[localhost:51233]
> 2016-11-30 07:40:19.180811 | Traceback (most recent call last):
> 2016-11-30 07:40:19.180865 |   File "/usr/lib/python2.7/logging/__init__.py", 
> line 884, in emit
> 2016-11-30 07:40:19.182121 | stream.write(fs % msg.encode("UTF-8"))
> 2016-11-30 07:40:19.182194 | UnicodeDecodeError: 'ascii' codec can't decode 
> byte 0xc2 i

Re: [openstack-dev] [gnocchi] influxdb driver gate error

2016-11-30 Thread Sam Morrison

> On 30 Nov. 2016, at 6:23 pm, Mehdi Abaakouk <sil...@sileht.net> wrote:
> 
> 
> 
> Le 2016-11-30 08:06, Sam Morrison a écrit :
>> 2016-11-30 06:50:14.969302 | + pifpaf -e GNOCCHI_STORAGE run influxdb
>> -- pifpaf -e GNOCCHI_INDEXER run mysql -- ./tools/pretty_tox.sh
>> 2016-11-30 06:50:17.399380 | ERROR: pifpaf: 'ascii' codec can't decode
>> byte 0xc2 in position 165: ordinal not in range(128)
>> 2016-11-30 06:50:17.415485 | ERROR: InvocationError:
>> '/home/jenkins/workspace/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/run-tests.sh’
> 
> You can temporary pass '--debug' to pifpaf to get the full backtrace.

Good idea, thanks! I get this error; I don’t get it on the py3 job though.

I get further with the py3 job but hit some other errors I don’t see in my env, so 
I’m trying to figure out what is different.


2016-11-30 07:40:17.209979 | + pifpaf --debug -e GNOCCHI_STORAGE run influxdb 
-- pifpaf -e GNOCCHI_INDEXER run mysql -- ./tools/pretty_tox.sh
2016-11-30 07:40:17.746304 | DEBUG: pifpaf.drivers: executing: ['influxd', 
'-config', '/tmp/tmp.7pq0EBpjgt/tmpikRcvn/config']
2016-11-30 07:40:17.759236 | DEBUG: pifpaf.drivers: influxd[20435] output: 
[... wrapped InfluxDB ASCII-art startup banner trimmed from the DEBUG output ...]
2016-11-30 07:40:17.760516 | DEBUG: pifpaf.drivers: influxd[20435] output: 
2016-11-30 07:40:17.760643 | DEBUG: pifpaf.drivers: influxd[20435] output: 
2016/11/30 07:40:17 InfluxDB starting, version 0.10.0, branch unknown, commit 
unknown, built unknown
2016-11-30 07:40:17.760722 | DEBUG: pifpaf.drivers: influxd[20435] output: 
2016/11/30 07:40:17 Go version go1.6rc1, GOMAXPROCS set to 8
2016-11-30 07:40:17.859524 | DEBUG: pifpaf.drivers: influxd[20435] output: 
2016/11/30 07:40:17 Using configuration at: /tmp/tmp.7pq0EBpjgt/tmpikRcvn/config
2016-11-30 07:40:17.860852 | DEBUG: pifpaf.drivers: influxd[20435] output: 
[meta] 2016/11/30 07:40:17 Starting meta service
2016-11-30 07:40:17.861033 | DEBUG: pifpaf.drivers: influxd[20435] output: 
[meta] 2016/11/30 07:40:17 Listening on HTTP: 127.0.0.1:51232
2016-11-30 07:40:17.871362 | DEBUG: pifpaf.drivers: influxd[20435] output: 
[metastore] 2016/11/30 07:40:17 Using data dir: 
/tmp/tmp.7pq0EBpjgt/tmpikRcvn/meta
2016-11-30 07:40:17.878511 | DEBUG: pifpaf.drivers: influxd[20435] output: 
[metastore] 2016/11/30 07:40:17 Node at localhost:51233 [Follower]
2016-11-30 07:40:19.079831 | DEBUG: pifpaf.drivers: influxd[20435] output: 
[metastore] 2016/11/30 07:40:19 Node at localhost:51233 [Leader]. 
peers=[localhost:51233]
2016-11-30 07:40:19.180811 | Traceback (most recent call last):
2016-11-30 07:40:19.180865 |   File "/usr/lib/python2.7/logging/__init__.py", 
line 884, in emit
2016-11-30 07:40:19.182121 | stream.write(fs % msg.encode("UTF-8"))
2016-11-30 07:40:19.182194 | UnicodeDecodeError: 'ascii' codec can't decode 
byte 0xc2 in position 211: ordinal not in range(128)
2016-11-30 07:40:19.182225 | Logged from file __init__.py, line 80
2016-11-30 07:40:19.183188 | ERROR: pifpaf: Traceback (most recent call last):
2016-11-30 07:40:19.183271 |   File 
"/home/jenkins/workspace/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/.tox/py27-mysql/local/lib/python2.7/site-packages/fixtures/fixture.py",
 line 197, in setUp
2016-11-30 07:40:19.183325 | self._setUp()
2016-11-30 07:40:19.183389 |   File 
"/home/jenkins/workspace/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/.tox/py27-mysql/local/lib/python2.7/site-packages/pifpaf/drivers/influxdb.py",
 line 72, in _setUp
2016-11-30 07:40:19.183410 | path=["/opt/influxdb"])
2016-11-30 07:40:19.183469 |   File 
"/home/jenkins/workspace/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/.tox/py27-mysql/local/lib/python2.7/site-packages/pifpaf/drivers/__init__.py",
 line 140, in _exec
2016
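
For what it's worth, the UnicodeDecodeError in the tracebacks above looks like Python 2's logging module formatting a unicode string with UTF-8-encoded bytes from the influxd output; a minimal Python 2 repro of that failure mode (the non-ASCII character below is just an assumed stand-in):

```
fs = u'%s\n'                       # logging's unicode format string
msg = u'influxd output with \xa0'  # assumed non-ASCII content (0xc2 0xa0 in UTF-8)
fs % msg.encode('UTF-8')           # Python 2: UnicodeDecodeError: 'ascii' codec
                                   # can't decode byte 0xc2 ...
```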

[openstack-dev] [gnocchi] influxdb driver gate error

2016-11-29 Thread Sam Morrison
I have been working on my influxdb driver [1] and have managed to figure out the 
gate enough to get it to install the deps etc. Now I just get this cryptic error:

2016-11-30 06:50:14.969302 | + pifpaf -e GNOCCHI_STORAGE run influxdb -- pifpaf 
-e GNOCCHI_INDEXER run mysql -- ./tools/pretty_tox.sh 
2016-11-30 06:50:17.399380 | ERROR: pifpaf: 'ascii' codec can't decode byte 
0xc2 in position 165: ordinal not in range(128) 
2016-11-30 06:50:17.415485 | ERROR: InvocationError: 
'/home/jenkins/workspace/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/run-tests.sh’

Full logs at 
http://logs.openstack.org/60/390260/8/check/gate-gnocchi-tox-db-py27-mysql-ubuntu-xenial/42da72d/console.html

Anyone have an idea what is going on here? I can’t replicate on my machine.

Cheers,
Sam


[1] https://review.openstack.org/#/c/390260/








Re: [openstack-dev] [Openstack-operators] [puppet][fuel][packstack][tripleo] puppet 3 end of life

2016-11-03 Thread Sam Morrison

> On 4 Nov. 2016, at 1:33 pm, Emilien Macchi <emil...@redhat.com> wrote:
> 
> On Thu, Nov 3, 2016 at 9:10 PM, Sam Morrison <sorri...@gmail.com 
> <mailto:sorri...@gmail.com>> wrote:
>> Wow, I didn’t realise puppet3 was being deprecated; is anyone actually using 
>> puppet4?
>> 
>> I would hope that the openstack puppet modules would support puppet3 for a 
>> while still, at least until the next ubuntu LTS is out, or else we would get to 
>> the stage where the openstack release supports Xenial but the corresponding 
>> puppet module would not? (Xenial has puppet3)
> 
> I'm afraid we made a lot of communications around it but you might
> have missed it, no problem.
> I have 3 questions for you:
> - for what reasons would you not upgrade puppet?

Because I’m a time-poor operator with more important stuff to upgrade :-)
Upgrading puppet *could* be a big task and is something we haven’t had time to 
look into. I don’t follow along with puppetlabs, so I didn’t realise puppet3 was 
being deprecated. Now that this has come to my attention we’ll look into it for 
sure.

> - would it be possible for you to use puppetlabs packaging if you need
> puppet4 on Xenial? (that's what upstream CI is using, and it works
> quite well).

OK, that’s promising; good to know that the CI is using puppet4. It’s all my 
other dodgy puppet code I’m worried about.

> - what version of the modules do you deploy? (and therefore what
> version of OpenStack)

We’re using a mixture of newton/mitaka/liberty/kilo; sometimes the puppet 
module version is newer than the openstack version too, depending on where we’re 
at in the upgrade process of the particular openstack project.

I understand progress must go on; I am interested, though, in how many operators 
use puppet4. We may be in the minority, and then I’ll be quiet :-)

Maybe it should be deprecated in one release and then dropped in the next?


Cheers,
Sam





> 
>> My guess is that this would also be the case for RedHat and other distros 
>> too.
> 
> Fedora is shipping Puppet 4 and we're going to do the same for Red Hat
> and CentOS7.
> 
>> Thoughts?
>> 
>> 
>> 
>>> On 4 Nov. 2016, at 2:58 am, Alex Schultz <aschu...@redhat.com> wrote:
>>> 
>>> Hey everyone,
>>> 
>>> Puppet 3 is reaching its end of life at the end of this year[0].
>>> Because of this we are planning on dropping official puppet 3 support
>>> as part of the Ocata cycle.  While we currently are not planning on
>>> doing any large scale conversion of code over to puppet 4 only syntax,
>>> we may allow some minor things in that could break backwards
>>> compatibility.  Based on feedback we've received, it seems that most
>>> people who may still be using puppet 3 are using older (< Newton)
>>> versions of the modules.  These modules will continue to be puppet 3.x
>>> compatible but we're using Ocata as the version where Puppet 4 should
>>> be the target version.
>>> 
>>> If anyone has any concerns or issues around this, please let us know.
>>> 
>>> Thanks,
>>> -Alex
>>> 
>>> [0] https://puppet.com/misc/puppet-enterprise-lifecycle
>>> 
> 
> 
> 
> -- 
> Emilien Macchi



Re: [openstack-dev] [Openstack-operators] [puppet][fuel][packstack][tripleo] puppet 3 end of life

2016-11-03 Thread Sam Morrison
Wow, I didn’t realise puppet3 was being deprecated; is anyone actually using 
puppet4?

I would hope that the openstack puppet modules would support puppet3 for a 
while still, at least until the next ubuntu LTS is out, or else we would get to the 
stage where the openstack release supports Xenial but the corresponding puppet 
module would not? (Xenial has puppet3)

My guess is that this would also be the case for RedHat and other distros too.

Thoughts?



> On 4 Nov. 2016, at 2:58 am, Alex Schultz  wrote:
> 
> Hey everyone,
> 
> Puppet 3 is reaching its end of life at the end of this year[0].
> Because of this we are planning on dropping official puppet 3 support
> as part of the Ocata cycle.  While we currently are not planning on
> doing any large scale conversion of code over to puppet 4 only syntax,
> we may allow some minor things in that could break backwards
> compatibility.  Based on feedback we've received, it seems that most
> people who may still be using puppet 3 are using older (< Newton)
> versions of the modules.  These modules will continue to be puppet 3.x
> compatible but we're using Ocata as the version where Puppet 4 should
> be the target version.
> 
> If anyone has any concerns or issues around this, please let us know.
> 
> Thanks,
> -Alex
> 
> [0] https://puppet.com/misc/puppet-enterprise-lifecycle
> 


Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-09-19 Thread Sam Morrison
Hi Julien,


> On 16 Sep 2016, at 7:46 PM, Julien Danjou  wrote:
> 
>> I could push it up to gerrit but I think something will need to change
>> for it to run the influxdb tests?
> 
> You can use pifpaf like we do for the indexer, InfluxDB is supported.
> That should make it possible to run the unit tests in the gate right
> away.

OK, I’ve made some changes to support this and will push them to my branch shortly. 
Just fixing up a couple more things.

> As for the functional tests, you can set up support via devstack and
> we would add a job in infra.

Will look into this soon.

>> It should act more like the carbonara drivers now as opposed to the
>> old influx driver. It will do downsampling and retention based on the
>> archive policies.
> 
> That's great, and I imagine it'd be faster than doing it on the fly like
> previously.
> 
>> Currently it is failing one test [1] and that is to do with retention. 
>> This is because influxDB does retention based on the current time, e.g. a 1 
>> day retention policy will be from the current time. 
>> The tests assume that the retention period is based on the data stored and 
>> so it will keep 1 day of data no matter how old that data is.
> 
> lol, yeah the tests assume it's a database that does not block you from
> inserting things as you want. I feel like that is a bad and funny design
> decision (Whisper has the same defect).
> 
>> I also had to disable retention policies in influx while running the tests as
>> when I backfill data influx is too smart and won’t backfill data that 
>> wouldn’t
>> meet the retention policy.
> 
> I imagine that's because some of our tests are using date in year e.g.
> 2014? :)

Yeah exactly, I think it is ok with these disabled. We can just trust that 
influx retention policies work.

> Great. Do you have performance numbers, scalability, or things that are
> different/better/worse than using Carbonara based drivers?

No performance numbers. Do you have a test in mind so I can compare? Is there a 
standard way to test this?

Cheers,
Sam


> 
> Cheers,
> -- 
> Julien Danjou
> // Free Software hacker
> // https://julien.danjou.info




Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-09-15 Thread Sam Morrison
Hi Julien,

Been working a bit on this and have a patch based on master that is working at:

https://github.com/NeCTAR-RC/gnocchi/tree/influxdb-driver

I could push it up to gerrit but I think something will need to change for it 
to run the influxdb tests?

It should act more like the carbonara drivers now as opposed to the old influx 
driver. It will do downsampling and retention based on the archive policies.

Currently it is failing one test [1] and that is to do with retention. 
This is because influxDB does retention based on the current time, e.g. a 1 day 
retention policy will be from the current time. 
The tests assume that the retention period is based on the data stored and so 
it will keep 1 day of data no matter how old that data is.

I also had to disable retention policies in influx while running the tests, as 
when I backfill data influx is too smart and won’t backfill data that wouldn’t 
meet the retention policy.
One way to fix all this would be to change all the test times to be relative 
to now, but then I think there could be other race conditions etc.

I’m still not 100% happy with the code, particularly around how the continuous 
queries are created based on the archive policies.

We are using this code in preprod and so far all is going well. 

I should also note that you will need InfluxDB v1.0 to use this. To test, all 
you should need is an influx server running locally.

Would love some feedback, particularly from people who know how influx works, as 
there are a few ways it could be architected.

Cheers,
Sam

[1] gnocchi.tests.test_rest.MetricTest.test_add_measures_back_window




> On 6 Sep 2016, at 11:24 AM, Sam Morrison <sorri...@gmail.com> wrote:
> 
> Hi Julien,
> 
>> On 5 Sep 2016, at 5:36 PM, Julien Danjou <jul...@danjou.info> wrote:
>> 
>> On Mon, Sep 05 2016, Sam Morrison wrote:
>> 
>> Hi Sam,
>> 
>>> The issue I’m having are with the tests. Because the continuous queries are
>>> asynchronous and there is no current way in influxdb to force them to run I 
>>> get
>>> tests failing due to
>>> them not having run yet.
>>> 
>>> I’m not sure how to get around this issue, apart from the tests failing
>>> everything is working quite well. I’m going to start some load testing soon 
>>> to
>>> see what it’s like when pushing in a lot of metrics.
>> 
>> Does it break the REST API, or only some storage tests?
> 
> REST API is fine, in fact it fixes some tests that influx was failing on.
> 
>> If it's just some storage test, you can change the tests so they are
>> retrying until the operation are done. Either in the test, or via a
>> special flag in the driver – we used to have that in the first version
>> of the driver.
> 
> OK good idea, I’ll work on that.
> 
> 
>>> Wondering if there would be time to talk about this in Barcelona.
>> 
>> Sure.
> 
> Cheers,
> Sam
> 
> 
>> 
>> -- 
>> Julien Danjou
>> -- Free Software hacker
>> -- https://julien.danjou.info




Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-09-06 Thread Sam Morrison

> On 6 Sep 2016, at 11:14 PM, Maxime Belanger <mbelan...@internap.com> wrote:
> 
> Hey Sam,
> 
> Does the driver you are implementing store the index in the Gnocchi index or 
> directly in influx?
> In other words, are you fully using influx under the gnocchi API?

Hi Max,

The index lives in the gnocchi index much like the other drivers; all that is 
stored in InfluxDB is the samples (time, metric_id, value).

Sam


> Max
> From: Sam Morrison <sorri...@gmail.com>
> Sent: September 5, 2016 9:24:21 PM
> To: Julien Danjou
> Cc: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb
>  
> Hi Julien,
> 
> > On 5 Sep 2016, at 5:36 PM, Julien Danjou <jul...@danjou.info> wrote:
> > 
> > On Mon, Sep 05 2016, Sam Morrison wrote:
> > 
> > Hi Sam,
> > 
> >> The issue I’m having are with the tests. Because the continuous queries are
> >> asynchronous and there is no current way in influxdb to force them to run 
> >> I get
> >> tests failing due to
> >> them not having run yet.
> >> 
> >> I’m not sure how to get around this issue, apart from the tests failing
> >> everything is working quite well. I’m going to start some load testing 
> >> soon to
> >> see what it’s like when pushing in a lot of metrics.
> > 
> > Does it break the REST API, or only some storage tests?
> 
> REST API is fine, in fact it fixes some tests that influx was failing on.
> 
> > If it's just some storage test, you can change the tests so they are
> > retrying until the operation are done. Either in the test, or via a
> > special flag in the driver – we used to have that in the first version
> > of the driver.
> 
> OK good idea, I’ll work on that.
> 
> 
> >> Wondering if there would be time to talk about this in Barcelona.
> > 
> > Sure.
> 
> Cheers,
> Sam
> 
> 
> > 
> > -- 
> > Julien Danjou
> > -- Free Software hacker
> > -- https://julien.danjou.info <https://julien.danjou.info/>
> 
> 


Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-09-05 Thread Sam Morrison
Hi Julien,

> On 5 Sep 2016, at 5:36 PM, Julien Danjou <jul...@danjou.info> wrote:
> 
> On Mon, Sep 05 2016, Sam Morrison wrote:
> 
> Hi Sam,
> 
>> The issue I’m having are with the tests. Because the continuous queries are
>> asynchronous and there is no current way in influxdb to force them to run I 
>> get
>> tests failing due to
>> them not having run yet.
>> 
>> I’m not sure how to get around this issue, apart from the tests failing
>> everything is working quite well. I’m going to start some load testing soon 
>> to
>> see what it’s like when pushing in a lot of metrics.
> 
> Does it break the REST API, or only some storage tests?

REST API is fine, in fact it fixes some tests that influx was failing on.

> If it's just some storage test, you can change the tests so they are
> retrying until the operation are done. Either in the test, or via a
> special flag in the driver – we used to have that in the first version
> of the driver.

OK good idea, I’ll work on that.


>> Wondering if there would be time to talk about this in Barcelona.
> 
> Sure.

Cheers,
Sam


> 
> -- 
> Julien Danjou
> -- Free Software hacker
> -- https://julien.danjou.info




Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-09-04 Thread Sam Morrison
Hi Julien,

I’ve been working on the influx driver and have been redesigning it to use 
continuous queries and retention policies so it acts more like the carbonara-based 
drivers.
Basically the continuous queries downsample the data. All metrics are stored 
in the same influx measurement with the metric_id being a tag.
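As a rough illustration of that layout, this is the kind of retention policy and continuous query involved, created here via the influxdb Python client against InfluxDB 1.x (the database, policy and measurement names are made up, not the driver's actual schema):

```
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='gnocchi')

# keep the downsampled series for one day, per the archive policy timespan
client.create_retention_policy('rp_one_day', '1d', '1', database='gnocchi')

# downsample raw samples into 5-minute means, grouped per metric (tag)
client.query(
    'CREATE CONTINUOUS QUERY cq_mean_5m ON gnocchi BEGIN '
    'SELECT mean(value) AS value '
    'INTO gnocchi."rp_one_day"."samples_5m" '
    'FROM gnocchi."autogen"."samples" '
    'GROUP BY time(5m), "metric_id" END')
```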

The issue I’m having is with the tests. Because the continuous queries are 
asynchronous and there is no current way in influxdb to force them to run, I get 
tests failing due to them not having run yet.

I’m not sure how to get around this issue; apart from the tests failing, 
everything is working quite well. I’m going to start some load testing soon to 
see what it’s like when pushing in a lot of metrics.

Wondering if there would be time to talk about this in Barcelona.

Cheers,
Sam



> On 4 Aug 2016, at 2:59 PM, Sam Morrison <sorri...@gmail.com> wrote:
> 
> OK thanks Julien,
> 
> I’m about to go on holiday for a month so I’ll pick this up when I return. 
> One of our devs is playing with this and thinking of ways to support the 
> things currently not implemented/working.
> 
> Cheers,
> Sam
> 
> 
>> On 2 Aug 2016, at 8:35 PM, Julien Danjou <jul...@danjou.info> wrote:
>> 
>> On Tue, Aug 02 2016, Sam Morrison wrote:
>> 
>> Hi Sam!
>> 
>>> We have been using gnocchi for a while now with the influxDB driver
>>> and are keen to get the influxdb driver back into upstream.
>>> 
>>> However looking into the code and how it’s arranged it looks like
>>> there are a lot of assumptions that the backend storage driver is
>>> carbonara based.
>> 
>> More or less. There is a separation layer (index/storage) and a full
>> abstraction layer so it's possible to write a driver for any TSDB.
>> Proof, we had an InfluxDB driver.
>> Now the separation layer is not optimal for some TSDBs like InfluxDB,
>> unfortunately nobody ever stepped up to enhance it.
>> 
>>> Is gnocchi an API for time series DBs or is it a time series DB
>>> itself?
>> 
>> Both. It's an API over TSDBs, and it also has its own TSDB based on
>> Carbonara+{Ceph,File,Swift}.
>> 
>>> The tests that are failing are due to the way carbonara and influx handle 
>>> the
>>> retention and multiple granularities differently. (which we can work around
>>> outside of gnocchi for now)
>>> 
>>> So I guess I’m wondering if there will be support for other drivers apart 
>>> from carbonara?
>> 
>> Sure. We dropped the InfluxDB driver because nobody was maintaining it
>> and it was not passing the tests anymore. But we'd be glad to have it
>> in-tree I'd say.
>> 
>>> We use influx because we already use it for other stuff within our 
>>> organisation
>>> and don’t want to set up ceph or swift (which is quite an endeavour) to 
>>> support
>>> another time series DB.
>> 
>> That makes sense. If you don't need scaling, I can only encourage you
>> taking a look at using Carbonara+file rather than InfluxDB in the
>> future, which I think is still a better choice.
>> 
>> But in the meantime, feel free to send a patch to include back InfluxDB
>> in Gnocchi. As long as you're ready to help us maintain it, we'll all
>> open on that. :)
>> 
>> Cheers,
>> -- 
>> Julien Danjou
>> # Free Software hacker
>> # https://julien.danjou.info
> 




Re: [openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-08-03 Thread Sam Morrison
OK thanks Julien,

I’m about to go on holiday for a month so I’ll pick this up when I return. One 
of our devs is playing with this and thinking of ways to support the things 
currently not implemented/working.

Cheers,
Sam


> On 2 Aug 2016, at 8:35 PM, Julien Danjou <jul...@danjou.info> wrote:
> 
> On Tue, Aug 02 2016, Sam Morrison wrote:
> 
> Hi Sam!
> 
>> We have been using gnocchi for a while now with the influxDB driver
>> and are keen to get the influxdb driver back into upstream.
>> 
>> However looking into the code and how it’s arranged it looks like
>> there are a lot of assumptions that the backend storage driver is
>> carbonara based.
> 
> More or less. There is a separation layer (index/storage) and a full
> abstraction layer so it's possible to write a driver for any TSDB.
> Proof, we had an InfluxDB driver.
> Now the separation layer is not optimal for some TSDBs like InfluxDB,
> unfortunately nobody ever stepped up to enhance it.
> 
>> Is gnocchi an API for time series DBs or is it a time series DB
>> itself?
> 
> Both. It's an API over TSDBs, and it also has its own TSDB based on
> Carbonara+{Ceph,File,Swift}.
> 
>> The tests that are failing are due to the way carbonara and influx handle the
>> retention and multiple granularities differently. (which we can work around
>> outside of gnocchi for now)
>> 
>> So I guess I’m wondering if there will be support for other drivers apart 
>> from carbonara?
> 
> Sure. We dropped the InfluxDB driver because nobody was maintaining it
> and it was not passing the tests anymore. But we'd be glad to have it
> in-tree I'd say.
> 
>> We use influx because we already use it for other stuff within our 
>> organisation
>> and don’t want to set up ceph or swift (which is quite an endeavour) to 
>> support
>> another time series DB.
> 
> That makes sense. If you don't need scaling, I can only encourage you
> taking a look at using Carbonara+file rather than InfluxDB in the
> future, which I think is still a better choice.
> 
> But in the meantime, feel free to send a patch to include back InfluxDB
> in Gnocchi. As long as you're ready to help us maintain it, we're all
> open to that. :)
> 
> Cheers,
> -- 
> Julien Danjou
> # Free Software hacker
> # https://julien.danjou.info




[openstack-dev] [gnocchi] Support for other drivers - influxdb

2016-08-01 Thread Sam Morrison
Hi Gnocchi Devs,

We have been using gnocchi for a while now with the influxDB driver and are 
keen to get the influxdb driver back into upstream.

However, looking into the code and how it’s arranged, it looks like there are a 
lot of assumptions that the backend storage driver is carbonara-based.

Is gnocchi an API for time series DBs or is it a time series DB itself? 

In saying that we have resurrected the driver in the stable/2.1 branch and it 
works great. 

Running tox I get:

==
Totals
==
Ran: 2655 tests in 342. sec.
 - Passed: 2353
 - Skipped: 293
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 9
Sum of execute time for each test: 1135.6970 sec.

The tests that are failing are due to the way carbonara and influx handle the 
retention and multiple granularities differently. (which we can work around 
outside of gnocchi for now)

So I guess I’m wondering if there will be support for other drivers apart from 
carbonara?

We use influx because we already use it for other stuff within our organisation 
and don’t want to set up ceph or swift (which is quite an endeavour) to support 
another time series DB.

Thanks,
Sam




Re: [openstack-dev] [puppet] [desginate] An update on the state of puppet-designate (and designate in RDO)

2016-07-05 Thread Sam Morrison
We (NeCTAR) use puppet-designate on Ubuntu 14.04 with Liberty.

Cheers,
Sam


> On 6 Jul 2016, at 11:47 AM, David Moreau Simard  wrote:
> 
> Hi !
> 
> tl;dr
> puppet-designate is undergoing some significant updates to bring it
> up to par right now.
> While I will try to ensure it is well tested and backwards compatible,
> things *could* break. Would like feedback.
> 
> I cc'd -operators because I'm interested in knowing if there are any
> users of puppet-designate right now: which distro and release of
> OpenStack?
> 
> I'm a RDO maintainer and I took interest in puppet-designate because
> we did not have any proper test coverage for designate in RDO
> packaging until now.
> 
> The RDO community mostly relies on collaboration with installation and
> deployment projects such as Puppet OpenStack to test our packaging.
> We can, in turn, provide some level of guarantee that packages built
> out of trunk branches (and eventually stable releases) should work.
> The idea is to make puppet-designate work with RDO, then integrate it
> in the puppet-openstack-integration CI scenarios and we can leverage
> that in RDO CI afterwards.
> 
> Both puppet-designate and designate RDO packaging were unfortunately
> in quite a sad state after not being maintained very well and a lot of
> work was required to even get basic tests to pass.
> The good news is that it didn't work with RDO before and now it does,
> for newton.
> Testing coverage has been improved and will be improved even further
> for both RDO and Ubuntu Cloud Archive.
> 
> If you'd like to follow the progress of the work, the reviews are
> tagged with the topic "designate-with-rdo" [1].
> 
> Let me know if you have any questions !
> 
> [1]: https://review.openstack.org/#/q/topic:designate-with-rdo
> 
> David Moreau Simard
> Senior Software Engineer | Openstack RDO
> 
> dmsimard = [irc, github, twitter]
> 


Re: [openstack-dev] [nova] Rabbit-mq 3.4 crashing (anyone else seen this?)

2016-07-05 Thread Sam Morrison
We had some issues related to this too; we ended up changing our 
collect_statistics_interval to 30 seconds, as opposed to the default, which I 
think is 5.

We also upgraded to 3.6.2 and that version is very buggy; we wouldn’t recommend 
anyone use it. It has a memory leak and some other nasty bugs we encountered.

3.6.1 on the other hand is very stable for us and we’ve been using it in 
production for several months now. 

Sam


> On 6 Jul 2016, at 3:50 AM, Alexey Lebedev  wrote:
> 
> Hi Joshua,
> 
> Does this happen with `rates_mode` set to `none` and tuned 
> `collect_statistics_interval`? Like in 
> https://bugs.launchpad.net/fuel/+bug/1510835 
> 
> 
> High connection/channel churn during upgrade can cause such issues.
> 
> BTW, soon-to-be-released rabbitmq 3.6.3 contains several improvements related 
> to management plugin statistics handling. And almost every version before 
> that also contained some related fixes. And I think that upstream devs 
> response will have some mention of upgrade =)
> 
> Best,
> Alexey
> 
> On Tue, Jul 5, 2016 at 8:02 PM, Joshua Harlow  > wrote:
> Hi ops and dev-folks,
> 
> We over at godaddy (running rabbitmq with openstack) have been hitting an 
> issue that has been causing the `rabbit_mgmt_db` to consume nearly all of the 
> process's memory (after a given amount of time),
> 
> We've been thinking that this bug (or bugs?) may have existed for a while and 
> our dual-version-path (where we upgrade the control plane and then 
> slowly/eventually upgrade the compute nodes to the same version) has somehow 
> triggered this memory leaking bug/issue since it has happened most 
> prominently on our cloud which was running nova-compute at kilo and the other 
> services at liberty (thus using the versioned objects code path more 
> frequently due to needing translations of objects).
> 
> The rabbit we are running is 3.4.0 on CentOS Linux release 7.2.1511 with 
> kernel 3.10.0-327.4.4.el7.x86_64 (do note that upgrading to 3.6.2 seems to 
> make the issue go away),
> 
> # rpm -qa | grep rabbit
> 
> rabbitmq-server-3.4.0-1.noarch
> 
> The logs that seem relevant:
> 
> ```
> **
> *** Publishers will be blocked until this alarm clears ***
> **
> 
> =INFO REPORT 1-Jul-2016::16:37:46 ===
> accepting AMQP connection <0.23638.342> (127.0.0.1:51932 
>  -> 127.0.0.1:5671 )
> 
> =INFO REPORT 1-Jul-2016::16:37:47 ===
> vm_memory_high_watermark clear. Memory used:29910180640 allowed:47126781542
> ```
> 
> This happens quite often, the crashes have been affecting our cloud over the 
> weekend (which made some dev/ops not so happy especially due to the july 4th 
> mini-vacation),
> 
> Looking to see if anyone else has seen anything similar?
> 
> For those interested this is the upstream bug/mail that I'm also seeing about 
> getting confirmation from the upstream users/devs (which also has erlang 
> crash dumps attached/linked),
> 
> https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg 
> 
> 
> Thanks,
> 
> -Josh
> 
> 
> 
> 
> 
> -- 
> Best,
> Alexey

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Proposal for a mid-cycle virtual sync on operator issues

2016-05-25 Thread Sam Morrison
I’m hoping some people from the Large Deployment Team can come along. It’s not 
a good time for me in Australia, but I’m hoping someone else can join in.

Sam


> On 26 May 2016, at 2:16 AM, Nikhil Komawar  wrote:
> 
> Hello,
> 
> 
> Firstly, I would like to thank Fei Long for bringing up a few operator
> centric issues to the Glance team. After chatting with him on IRC, we
> realized that there may be more operators who would want to contribute
> to the discussions to help us take some informed decisions.
> 
> 
> So, I would like to call for a 2 hour sync for the Glance team along
> with interested operators on Thursday June 9th, 2016 at 2000UTC. 
> 
> 
> If you are interested in participating please RSVP here [1], and
> participate in the poll for the tool you'd prefer. I've also added a
> section for Topics and provided a template to document the issues clearly.
> 
> 
> Please be mindful of everyone's time and if you are proposing issue(s)
> to be discussed, come prepared with well documented & referenced topic(s).
> 
> 
> If you've feedback that you are not sure if appropriate for the
> etherpad, you can reach me on irc (nick: nikhil).
> 
> 
> [1] https://etherpad.openstack.org/p/newton-glance-and-ops-midcycle-sync
> 
> -- 
> 
> Thanks,
> Nikhil Komawar
> Newton PTL for OpenStack Glance
> 
> 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [glance] glance-registry deprecation: Request for feedback

2016-05-15 Thread Sam Morrison

> On 14 May 2016, at 5:36 AM, Flavio Percoco <fla...@redhat.com> wrote:
> 
>> On 5/12/16 9:20 PM, Sam Morrison wrote:
>> 
>>   We find glance registry quite useful. Have a central glance-registry api 
>> is useful when you have multiple datacenters all with glance-apis and 
>> talking back to a central registry service. I guess they could all talk back 
>> to the central DB server but currently that would be over the public 
>> Internet for us. Not really an issue, we can work around it.
>> 
>>   The major thing that the registry has given us has been rolling upgrades. 
>> We have been able to upgrade our registry first then one by one upgrade our 
>> API servers (we have about 15 glance-apis)
> 
> I'm curious to know how you did this upgrade, though. Did you shutdown your
> registry nodes, upgraded the database and then re-started them? Did you 
> upgraded
> one registry node at a time?
> 
> I'm asking because, as far as I can tell, the strategy you used for upgrading
> the registry nodes is the one you would use to upgrade the glance-api nodes
> today. Shutting down all registry nodes would live you with unusable 
> glance-api
> nodes anyway so I'd assume you did a partial upgrade or something similar to
> that.

Yeah, if glance supported versioned objects then yes this would be great. 

We only have 3 glance-registries and so upgrading these first is a lot easier 
than upgrading all ~15 of our glance-apis at once.

Sam




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [glance] glance-registry deprecation: Request for feedback

2016-05-12 Thread Sam Morrison
We find the glance registry quite useful. Having a central glance-registry API 
is useful when you have multiple datacenters, each with glance-apis, all 
talking back to a central registry service. I guess they could all talk back 
to the central DB server, but currently that would be over the public Internet 
for us. Not really an issue; we can work around it.
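
For what it’s worth, the relevant bit of our setup looks something like this
(a sketch with an illustrative hostname; the options are the registry-related
ones from this era of glance):

```
# glance-api.conf in each datacenter
[DEFAULT]
# send v2 API database traffic through the registry instead of straight to
# the DB
data_api = glance.db.registry.api
registry_host = glance-registry.example.org
registry_port = 9191
```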

The major thing that the registry has given us has been rolling upgrades. We 
have been able to upgrade our registry first then one by one upgrade our API 
servers (we have about 15 glance-apis) 

I don’t think we would’ve been able to do that if all the glance-apis were 
talking to the DB (at least not in glance’s current state).

Sam




> On 12 May 2016, at 1:51 PM, Flavio Percoco  wrote:
> 
> Greetings,
> 
> The Glance team is evaluating the needs and usefulness of the Glance Registry
> service and this email is a request for feedback from the overall community
> before the team moves forward with anything.
> 
> Historically, there have been reasons to create this service. Some deployments
> use it to hide database credentials from Glance public endpoints, others use 
> it
> for scaling purposes and others because v1 depends on it. This is a good time
> for the team to re-evaluate the need of these services since v2 doesn't depend
> on it.
> 
> So, here's the big question:
> 
> Why do you think this service should be kept around?
> 
> Summit etherpad: 
> https://etherpad.openstack.org/p/newton-glance-registry-deprecation
> 
> Flavio
> -- 
> @flaper87
> Flavio Percoco


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [cinder] [all] The future of Cinder API v1

2015-09-28 Thread Sam Morrison
Yeah we’re still using v1 as the clients that are packaged with most distros 
don’t support v2 easily.

E.g. with Ubuntu Trusty they ship version 1.1.1; I just updated our “volume” 
endpoint to point to v2 (we have a volumev2 endpoint too) and the client breaks.

$ cinder list
ERROR: OpenStack Block Storage API version is set to 1 but you are accessing a 
2 endpoint. Change its value through --os-volume-api-version or 
env[OS_VOLUME_API_VERSION].
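
Once a new enough client is installed the workaround is simple enough (a
sketch; the option and environment variable are the ones named in the error
above):

```
# per command
cinder --os-volume-api-version 2 list

# or in the environment / openrc
export OS_VOLUME_API_VERSION=2
cinder list
```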

Sam


> On 29 Sep 2015, at 8:34 am, Matt Fischer  wrote:
> 
> Yes, people are probably still using it. Last time I tried to use V2 it 
> didn't work because the clients were broken, and then it went back on the 
> bottom of my to do list. Is this mess fixed?
> 
> http://lists.openstack.org/pipermail/openstack-operators/2015-February/006366.html
>  
> 
> 
> On Mon, Sep 28, 2015 at 4:25 PM, Ivan Kolodyazhny  > wrote:
> Hi all,
> 
> As you may know, we've got 2 APIs in Cinder: v1 and v2. Cinder v2 API was 
> introduced in Grizzly and v1 API is deprecated since Juno.
> 
> After [1] is merged, Cinder API v1 is disabled in gates by default. We've got 
> a filed bug [2] to remove Cinder v1 API at all.
> 
> 
> According to Deprecation Policy [3] looks like we are OK to remote it. But I 
> would like to ask Cinder API users if any still use API v1.
> Should we remove it at all Mitaka release or just disable by default in the 
> cinder.conf?
> 
> AFAIR, only Rally doesn't support API v2 now and I'm going to implement it 
> asap.
> 
> [1] https://review.openstack.org/194726  
> [2] https://bugs.launchpad.net/cinder/+bug/1467589 
> 
> [3] 
> http://lists.openstack.org/pipermail/openstack-dev/2015-September/073576.html 
> 
> Regards,
> Ivan Kolodyazhny
> 
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-24 Thread Sam Morrison

> On 24 Sep 2015, at 6:19 pm, Sylvain Bauza  wrote:
> 
> Ahem, Nova AZs are not failure domains - I mean the current implementation, 
> in the sense of many people understand what is a failure domain, ie. a 
> physical unit of machines (a bay, a room, a floor, a datacenter).
> All the AZs in Nova share the same controlplane with the same message queue 
> and database, which means that one failure can be propagated to the other AZ.
> 
> To be honest, there is one very specific usecase where AZs *are* failure 
> domains : when cells exact match with AZs (ie. one AZ grouping all the hosts 
> behind one cell). That's the very specific usecase that Sam is mentioning in 
> his email, and I certainly understand we need to keep that.
> 
> What are AZs in Nova is pretty well explained in a quite old blogpost : 
> http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/
>  
> 
Yes, an AZ may not be considered a failure domain in terms of control 
infrastructure; I think all operators understand this. If you want 
control-infrastructure failure domains, use regions.

However, from a resource level (e.g. a running instance or a running volume) I 
would consider them some kind of failure domain. It’s a way of saying to a 
user that if you have resources running in 2 AZs you have a more available 
service.

Every cloud will have a different definition of what an AZ is (a rack, a 
collection of racks, a DC, etc.); OpenStack doesn’t need to decide what that is.

Sam

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Sam Morrison
Just got alerted to this on the operator list.

We very much rely on this.

We have multiple availability zones in nova and each zone has a corresponding 
cinder-volume service(s) in the same availability zone.

We don’t want people attaching a volume from one zone to another, as the 
network won’t allow it: the zones are in different network domains and 
different data centres.

I wonder if you guys can reconsider deprecating this option as it is very 
useful to us.
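
For reference, the option we rely on is the one we set in nova.conf on our
compute nodes (a sketch, assuming I have the section and name right; it’s the
cross-AZ attach check being discussed in this thread):

```
# nova.conf on compute nodes
[cinder]
# refuse to attach a volume that lives in a different AZ to the instance
cross_az_attach = False
```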

Cheers,
Sam



> On 24 Sep 2015, at 7:43 am, Mathieu Gagné  wrote:
> 
> On 2015-09-23 4:50 PM, Andrew Laski wrote:
>> On 09/23/15 at 04:30pm, Mathieu Gagné wrote:
>>> On 2015-09-23 4:12 PM, Andrew Laski wrote:
 On 09/23/15 at 02:55pm, Matt Riedemann wrote:
> 
> Heh, so when I just asked in the cinder channel if we can just
> deprecate nova boot from volume with source=(image|snapshot|blank)
> (which automatically creates the volume and polls for it to be
> available) and then add a microversion that doesn't allow it, I was
> half joking, but I see we're on the same page.  This scenario seems to
> introduce a lot of orchestration work that nova shouldn't necessarily
> be in the business of handling.
 
 I am very much in support of this.  This has been a source of
 frustration for our users because it is prone to failures we can't
 properly expose to users and timeouts.  There are much better places to
 handle the orchestration of creating a volume and then booting from it
 than Nova.
 
>>> 
>>> Unfortunately, this is a feature our users *heavily* rely on and we
>>> worked very hard to make it happen. We had a private patch on our side
>>> for years to optimize boot-from-volume before John Griffith came up with
>>> an upstream solution for SolidFire [2] and others with a generic
>>> solution [3] [4].
>>> 
>>> Being able to "nova boot" and have everything done for you is awesome.
>>> Just see what Monty Taylor mentioned in his thread about sane default
>>> networking [1]. Having orchestration on the client side is just
>>> something our users don't want to have to do and often complain about.
>> 
>> At risk of getting too offtopic I think there's an alternate solution to
>> doing this in Nova or on the client side.  I think we're missing some
>> sort of OpenStack API and service that can handle this.  Nova is a low
>> level infrastructure API and service, it is not designed to handle these
>> orchestrations.  I haven't checked in on Heat in a while but perhaps
>> this is a role that it could fill.
>> 
>> I think that too many people consider Nova to be *the* OpenStack API
>> when considering instances/volumes/networking/images and that's not
>> something I would like to see continue.  Or at the very least I would
>> like to see a split between the orchestration/proxy pieces and the
>> "manage my VM/container/baremetal" bits.
>> 
> 
> "too many people" happens to include a lot of 3rd party tools supporting
> OpenStack which our users complain a lot about. Just see all the
> possible way to get an external IP [5]. Introducing yet another service
> would increase the pain on our users which will see their tools and
> products not working even more.
> 
> Just see how EC2 is doing it [6], you won't see them suggest to use yet
> another service to orchestrate what I consider a fundamental feature "I
> wish to boot an instance on a volume".
> 
> The current ease to boot from volume is THE selling feature our users
> want and heavily/actively use. We fought very hard to make it work and
> reading about how it should be removed is frustrating.
> 
> Issues we identified shouldn't be a reason to drop this feature. Other
> providers are making it work and I don't see why we couldn't. I'm
> convinced we can do better.
> 
> [5]
> https://github.com/openstack-infra/shade/blob/03c1556a12aabfc21de60a9fac97aea7871485a3/shade/meta.py#L106-L173
> [6]
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html
> 
> Mathieu
> 
>>> 
>>> [1]
>>> http://lists.openstack.org/pipermail/openstack-dev/2015-September/074527.html
>>> 
>>> [2] https://review.openstack.org/#/c/142859/
>>> [3] https://review.openstack.org/#/c/195795/
>>> [4] https://review.openstack.org/#/c/201754/
>>> 
>>> -- 
>>> Mathieu
> 
> 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

2015-09-23 Thread Sam Morrison

> On 24 Sep 2015, at 9:59 am, Andrew Laski  wrote:
> 
> I was perhaps hasty in approving that patch and didn't realize that Matt had 
> reached out for operator feedback at the same time that he proposed it. Since 
> this is being used in production I wouldn't want it to be removed without at 
> least having an alternative, and hopefully better, method of achieving your 
> goal.  Reverting the deprecation seems reasonable to me for now while we work 
> out the details around Cinder/Nova AZ interactions.

Thanks Andrew,

What we basically want is for our users to have instances and volumes on a 
section of hardware and then for them to be able to have other instances and 
volumes in another section of hardware.

If one section dies then the other section is fine. For us, we use 
availability zones for this. If this is not the intended use for AZs, what is 
a better way for us to do this?
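
Concretely, our users do something like this today (a sketch; the zone name,
flavour and sizes are illustrative):

```
nova boot --flavor m1.small --image <image-uuid> \
    --availability-zone melbourne-qh2 my-instance

cinder create --availability-zone melbourne-qh2 \
    --display-name my-volume 50
```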

Cheers,
Sam



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [keystone][all] Deprecating slash ('/') in project names

2015-07-06 Thread Sam Morrison
Do you mean project names or project IDs?

Sam


 On 3 Jul 2015, at 12:12 am, Henrique Truta henriquecostatr...@gmail.com 
 wrote:
 
 Hi everyone,
 
 In Kilo, keystone introduced the concept of Hierarchical Multitenancy[1], 
 which allows cloud operators to organize projects in hierarchies. This 
 concept is evolving in Liberty, with the addition of the Reseller use 
 case[2], where among other features, it’ll have hierarchies of domains by 
 making the domain concept a feature of projects and not a different entity: 
 from now on, every domain will be treated as a project that has the 
 “is_domain” property set to True.
 
 Currently, getting a project scoped token can be made by only passing the 
 project name and the domain it belongs to, once project names are unique 
 between domains. However with those hierarchies of projects, in M we intend 
 to remove this constraint in order to make a project name unique only in its 
 level in the hierarchy (project parent). In other words, it won’t be possible 
 to have sibling projects with the same name. For example. the following 
 hierarchy will be valid:
 
         A           - project with the domain feature
        / \
       B   C         - “pure” projects, children of A
       |   |
       A   B         - “pure” projects, children of B and C respectively
 
 Therefore, the cloud user faces some problems when getting a project scoped 
 token by name to projects A or B, since keystone won’t be able to distinguish 
 them only by their names. The best way to solve this problem is providing the 
 full hierarchy, like “A/B/A”, “A/B”, “A/C/B” and so on.
 
 To achieve this, we intend to deprecate the “/” character in project names in 
 Liberty and prohibit it in M, removing/replacing this character in a database 
 migration**.
 
 Do you have some strong reason to keep using this character in project names? 
 How bad would it be for existing deploys? We’d like to hear from you.
 
 Best regards,
 Henrique
 
 ** LDAP as assignment backend does not support Hierarchical Multitenancy. 
 This change will be only applied to SQL backends.
 [1] 
 http://specs.openstack.org/openstack/keystone-specs/specs/juno/hierarchical_multitenancy.html
  
 http://specs.openstack.org/openstack/keystone-specs/specs/juno/hierarchical_multitenancy.html
 [2] 
 http://specs.openstack.org/openstack/keystone-specs/specs/kilo/reseller.html 
 http://specs.openstack.org/openstack/keystone-specs/specs/kilo/reseller.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

2015-06-17 Thread Sam Morrison

 On 17 Jun 2015, at 8:35 pm, Neil Jerram neil.jer...@metaswitch.com wrote:
 
 Hi Sam,
 
 On 17/06/15 01:31, Sam Morrison wrote:
 We at NeCTAR are starting the transition to neutron from nova-net and 
 neutron almost does what we want.
 
 We have 10 “public” networks and 10 “service” networks and depending on 
 which compute node you land on you get attached to one of them.
 
 In neutron speak we have multiple shared externally routed provider 
 networks. We don’t have any tenant networks or any other fancy stuff yet.
 How I’ve currently got this set up is by creating 10 networks and subsequent 
 subnets eg. public-1, public-2, public-3 … and service-1, service-2, 
 service-3 and so on.
 
 In nova we have made a slight change in allocate for instance [1] whereby 
 the compute node has a designated hardcoded network_ids for the public and 
 service network it is physically attached to.
 We have also made changes in the nova API so users can’t select a network 
 and the neutron endpoint is not registered in keystone.
 
 That all works fine but ideally I want a user to be able to choose if they 
 want a public and/or service network. We can’t let them as we have 10 public 
 networks; we almost need something in neutron like a “network group” or 
 something that allows a user to select “public” and it allocates them a port 
 in one of the underlying public networks.
 
 This begs the question: why have you defined 10 public-N networks, instead of 
 just one public network?

I think this has all been answered but just in case.
There are multiple reasons: we don’t have a single IPv4 range big enough for 
our cloud, we don’t want the broadcast domain to be massive, the compute nodes 
are in different data centres, etc.
Basically it’s not how our underlying physical network is set up and we can’t 
change that.

Sam


 
 I tried going down the route of having 1 public and 1 service network in 
 neutron then creating 10 subnets under each. That works until you get to 
 things like dhcp-agent and metadata agent although this looks like it could 
 work with a few minor changes. Basically I need a dhcp-agent to be spun up 
 per subnet and ensure they are spun up in the right place.
 
 Why the 10 subnets?  Is it to do with where you actually have real L2 
 segments, in your deployment?
 
 Thanks,
   Neil
 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

2015-06-17 Thread Sam Morrison

 On 18 Jun 2015, at 2:59 am, Neil Jerram neil.jer...@metaswitch.com wrote:
 
 
 
 On 17/06/15 16:17, Kris G. Lindgren wrote:
 See inline.
 
 
 Kris Lindgren
 Senior Linux Systems Engineer
 GoDaddy, LLC.
 
 
 
 On 6/17/15, 5:12 AM, Neil Jerram neil.jer...@metaswitch.com wrote:
 
 Hi Kris,
 
 Apologies in advance for questions that are probably really dumb - but
 there are several points here that I don't understand.
 
 On 17/06/15 03:44, Kris G. Lindgren wrote:
 We are doing pretty much the same thing - but in a slightly different
 way.
   We extended the nova scheduler to help choose networks (IE. don't put
 vm's on a network/host that doesn't have any available IP address).
 
 Why would a particular network/host not have any available IP address?
 
  If a created network has 1024 ip's on it (/22) and we provision 1020 vms,
  anything deployed after that will not have an additional ip address
 because
  the network doesn't have any available ip addresses (loose some ip's to
  the network).
 
 OK, thanks, that certainly explains the particular network possibility.
 
 So I guess this applies where your preference would be for network A, but it 
 would be OK to fall back to network B, and so on.  That sounds like it could 
 be a useful general enhancement.
 
 (But, if a new VM absolutely _has_ to be on, say, the 'production' network, 
 and the 'production' network is already fully used, you're fundamentally 
 stuck, aren't you?)
 
 What about the /host part?  Is it possible in your system for a network to 
 have IP addresses available, but for them not to be usable on a particular 
 host?
 
 Then,
 we add into the host-aggregate that each HV is attached to a network
 metadata item which maps to the names of the neutron networks that host
 supports.  This basically creates the mapping of which host supports
 what
 networks, so we can correctly filter hosts out during scheduling. We do
 allow people to choose a network if they wish and we do have the neutron
 end-point exposed. However, by default if they do not supply a boot
 command with a network, we will filter the networks down and choose one
 for them.  That way they never hit [1].  This also works well for us,
 because the default UI that we provide our end-users is not horizon.
 
 Why do you define multiple networks - as opposed to just one - and why
 would one of your users want to choose a particular one of those?
 
 (Do you mean multiple as in public-1, public-2, ...; or multiple as in
 public, service, ...?)
 
  This is answered in the other email and original email as well.  But
 basically
  we have multiple L2 segments that only exists on certain switches and
 thus are
  only tied to certain hosts.  With the way neutron is currently structured
 we
  need to create a network for each L2. So that¹s why we define multiple
 networks.
 
 Thanks!  Ok, just to check that I really understand this:
 
 - You have real L2 segments connecting some of your compute hosts together - 
 and also I guess to a ToR that does L3 to the rest of the data center.
 
 - You presumably then just bridge all the TAP interfaces, on each host, to 
 the host's outwards-facing interface.
 
   + VM
   |
   +- Host + VM
   |   |
   |   + VM
   |
   |   + VM
   |   |
   +- Host + VM
   |   |
 ToR ---+   + VM
   |
   |   + VM
   |   |
   |- Host + VM
   |
   + VM
 
 - You specify each such setup as a network in the Neutron API - and hence you 
 have multiple similar networks, for your data center as a whole.
 
 Out of interest, do you do this just because it's the Right Thing according 
 to the current Neutron API - i.e. because a Neutron network is L2 - or also 
 because it's needed in order to get the Neutron implementation components 
 that you use to work correctly?  For example, so that you have a DHCP agent 
 for each L2 network (if you use the Neutron DHCP agent).
 
  For our end users - they only care about getting a vm with a single ip
 address
  in a network which is really a zone like prod or dev or test.
 They stop
  caring after that point.  So in the scheduler filter that we created we
 do
  exactly that.  We will filter down from all the hosts and networks down
 to a
  combo that intersects at a host that has space, with a network that has
 space,
  And the network that was chosen is actually available to that host.
 
 Thanks, makes perfect sense now.
 
 So I think there are two possible representations, overall, of what you are 
 looking for.
 
 1. A 'network group' of similar L2 networks.  When a VM is launched, tenant 
 specifies the network group instead of a particular L2 network, and 
 Nova/Neutron select a host and network with available 

Re: [openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

2015-06-16 Thread Sam Morrison

 On 17 Jun 2015, at 10:56 am, Armando M. arma...@gmail.com wrote:
 
 
 
 On 16 June 2015 at 17:31, Sam Morrison sorri...@gmail.com 
 mailto:sorri...@gmail.com wrote:
 We at NeCTAR are starting the transition to neutron from nova-net and neutron 
 almost does what we want.
 
 We have 10 “public” networks and 10 “service” networks and depending on which 
 compute node you land on you get attached to one of them.
 
 In neutron speak we have multiple shared externally routed provider networks. 
 We don’t have any tenant networks or any other fancy stuff yet.
 How I’ve currently got this set up is by creating 10 networks and subsequent 
 subnets eg. public-1, public-2, public-3 … and service-1, service-2, 
 service-3 and so on.
 
 In nova we have made a slight change in allocate for instance [1] whereby the 
 compute node has a designated hardcoded network_ids for the public and 
 service network it is physically attached to.
 We have also made changes in the nova API so users can’t select a network and 
 the neutron endpoint is not registered in keystone.
 
 That all works fine but ideally I want a user to be able to choose if they 
 want a public and/or service network. We can’t let them as we have 10 public 
 networks; we almost need something in neutron like a “network group” or 
 something that allows a user to select “public” and it allocates them a port 
 in one of the underlying public networks.
 
 I tried going down the route of having 1 public and 1 service network in 
 neutron then creating 10 subnets under each. That works until you get to 
 things like dhcp-agent and metadata agent although this looks like it could 
 work with a few minor changes. Basically I need a dhcp-agent to be spun up 
 per subnet and ensure they are spun up in the right place.
 
 I’m not sure what the correct way of doing this. What are other people doing 
 in the interim until this kind of use case can be done in Neutron?
 
 Would something like [1] be adequate to address your use case? If not, I'd 
 suggest you to file an RFE bug (more details in [2]), so that we can keep the 
 discussion focused on this specific case.
 
 HTH
 Armando
 
 [1] https://blueprints.launchpad.net/neutron/+spec/rbac-networks 
 https://blueprints.launchpad.net/neutron/+spec/rbac-networks
That’s not applicable in this case; we don’t care about which tenant it is 
here.

 [2] 
 https://github.com/openstack/neutron/blob/master/doc/source/policies/blueprints.rst#neutron-request-for-feature-enhancements
  
 https://github.com/openstack/neutron/blob/master/doc/source/policies/blueprints.rst#neutron-request-for-feature-enhancements
The bug Kris mentioned outlines all I want, I think.

Sam


 
  
 
 Cheers,
 Sam
 
 [1] 
 https://github.com/NeCTAR-RC/nova/commit/1bc2396edc684f83ce471dd9dc9219c4635afb12
  
 https://github.com/NeCTAR-RC/nova/commit/1bc2396edc684f83ce471dd9dc9219c4635afb12
 
 
 
  On 17 Jun 2015, at 12:20 am, Jay Pipes jaypi...@gmail.com 
  mailto:jaypi...@gmail.com wrote:
 
  Adding -dev because of the reference to the Neutron Get me a network 
  spec. Also adding [nova] and [neutron] subject markers.
 
  Comments inline, Kris.
 
  On 05/22/2015 09:28 PM, Kris G. Lindgren wrote:
  During the Openstack summit this week I got to talk to a number of other
  operators of large Openstack deployments about how they do networking.
   I was happy, surprised even, to find that a number of us are using a
  similar type of networking strategy.  That we have similar challenges
  around networking and are solving it in our own but very similar way.
   It is always nice to see that other people are doing the same things
  as you or see the same issues as you are and that you are not crazy.
  So in that vein, I wanted to reach out to the rest of the Ops Community
  and ask one pretty simple question.
 
  Would it be accurate to say that most of your end users want almost
  nothing to do with the network?
 
  That was my experience at ATT, yes. The vast majority of end users could 
  not care less about networking, as long as the connectivity was reliable, 
  performed well, and they could connect to the Internet (and have others 
  connect from the Internet to their VMs) when needed.
 
  In my experience what the majority of them (both internal and external)
  want is to consume from Openstack a compute resource, a property of
  which is it that resource has an IP address.  They, at most, care about
  which network they are on.  Where a network is usually an arbitrary
  definition around a set of real networks, that are constrained to a
  location, in which the company has attached some sort of policy.  For
  example, I want to be in the production network vs's the xyz lab
  network, vs's the backup network, vs's the corp network.  I would say
  for Godaddy, 99% of our use cases would be defined as: I want a compute
  resource in the production network zone, or I want a compute resource in
  this other network zone.  The end user only cares

Re: [openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

2015-06-16 Thread Sam Morrison
We at NeCTAR are starting the transition to neutron from nova-net and neutron 
almost does what we want.

We have 10 “public” networks and 10 “service” networks and depending on which 
compute node you land on you get attached to one of them.

In neutron speak we have multiple shared externally routed provider networks. 
We don’t have any tenant networks or any other fancy stuff yet.
How I’ve currently got this set up is by creating 10 networks and 
corresponding subnets, e.g. public-1, public-2, public-3 … and service-1, 
service-2, service-3 and so on.
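
Each of those ends up being created along these lines (a sketch; the VLAN ID,
physnet name and CIDR are made up, not our real values):

```
neutron net-create public-1 --shared --router:external \
    --provider:network_type vlan \
    --provider:physical_network physnet1 \
    --provider:segmentation_id 101

neutron subnet-create public-1 203.0.113.0/24 --name public-1-subnet
```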

In nova we have made a slight change in allocate_for_instance [1] whereby the 
compute node has hardcoded network_ids for the public and service networks it 
is physically attached to.
We have also made changes in the nova API so users can’t select a network and 
the neutron endpoint is not registered in keystone.

That all works fine but ideally I want a user to be able to choose if they want 
a public and/or service network. We can’t let them as we have 10 public 
networks; we almost need something in neutron like a “network group” or 
something that allows a user to select “public” and it allocates them a port in 
one of the underlying public networks.

I tried going down the route of having 1 public and 1 service network in 
neutron, then creating 10 subnets under each. That works until you get to 
things like the dhcp-agent and metadata agent, although this looks like it 
could work with a few minor changes. Basically I need a dhcp-agent to be spun 
up per subnet, and to ensure they are spun up in the right place.

I’m not sure what the correct way of doing this is. What are other people doing 
in the interim until this kind of use case can be done in Neutron?

Cheers,
Sam
 
[1] 
https://github.com/NeCTAR-RC/nova/commit/1bc2396edc684f83ce471dd9dc9219c4635afb12



 On 17 Jun 2015, at 12:20 am, Jay Pipes jaypi...@gmail.com wrote:
 
 Adding -dev because of the reference to the Neutron Get me a network spec. 
 Also adding [nova] and [neutron] subject markers.
 
 Comments inline, Kris.
 
 On 05/22/2015 09:28 PM, Kris G. Lindgren wrote:
 During the Openstack summit this week I got to talk to a number of other
 operators of large Openstack deployments about how they do networking.
  I was happy, surprised even, to find that a number of us are using a
 similar type of networking strategy.  That we have similar challenges
 around networking and are solving it in our own but very similar way.
  It is always nice to see that other people are doing the same things
 as you or see the same issues as you are and that you are not crazy.
 So in that vein, I wanted to reach out to the rest of the Ops Community
 and ask one pretty simple question.
 
 Would it be accurate to say that most of your end users want almost
 nothing to do with the network?
 
 That was my experience at ATT, yes. The vast majority of end users could not 
 care less about networking, as long as the connectivity was reliable, 
 performed well, and they could connect to the Internet (and have others 
 connect from the Internet to their VMs) when needed.
 
 In my experience what the majority of them (both internal and external)
 want is to consume from Openstack a compute resource, a property of
 which is it that resource has an IP address.  They, at most, care about
 which network they are on.  Where a network is usually an arbitrary
 definition around a set of real networks, that are constrained to a
 location, in which the company has attached some sort of policy.  For
 example, I want to be in the production network vs's the xyz lab
 network, vs's the backup network, vs's the corp network.  I would say
 for Godaddy, 99% of our use cases would be defined as: I want a compute
 resource in the production network zone, or I want a compute resource in
 this other network zone.  The end user only cares that the IP the vm
 receives works in that zone, outside of that they don't care any other
 property of that IP.  They do not care what subnet it is in, what vlan
 it is on, what switch it is attached to, what router its attached to, or
 how data flows in/out of that network.  It just needs to work. We have
 also found that by giving the users a floating ip address that can be
 moved between vm's (but still constrained within a network zone) we
 can solve almost all of our users asks.  Typically, the internal need
 for a floating ip is when a compute resource needs to talk to another
 protected internal or external resource. Where it is painful (read:
 slow) to have the acl's on that protected resource updated. The external
 need is from our hosting customers who have a domain name (or many) tied
 to an IP address and changing IP's/DNS is particularly painful.
 
 This is precisely my experience as well.
 
 Since the vast majority of our end users don't care about any of the
 technical network stuff, we spend a large amount of time/effort in
 abstracting or hiding the technical stuff from the users view. Which 

[openstack-dev] [nova] Tempest failure help

2015-05-19 Thread Sam Morrison
Hi nova devs,

I have a patch https://review.openstack.org/#/c/181776/ where I have a failing 
tempest job which I can’t figure out. Can anyone help me?

Cheers,
Sam



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] readout from Philly Operators Meetup

2015-03-11 Thread Sam Morrison

 On 11 Mar 2015, at 11:59 pm, Sean Dague s...@dague.net wrote:
 
 Nova Rolling Upgrades
 -
 
 Most people really like the concept, couldn't find anyone that had
 used it yet because Neutron doesn't support it, so they had to big
 bang upgrades anyway.

I couldn’t make it to the ops summit, but we (NeCTAR) have been using the 
rolling upgrades for Havana -> Icehouse and Icehouse -> Juno 
and it has worked great. (We’re still using nova-network.)
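
The main knob for us has been pinning the compute RPC version while the
compute nodes lag behind, something like this (a sketch; the value obviously
depends on which step of the upgrade you are at):

```
# nova.conf on the already-upgraded control services
[upgrade_levels]
compute = havana
```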

Cheers,
Sam


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Taking a break..

2014-10-23 Thread Sam Morrison
Thanks for all the help Chris and all the best.

Now I’m on the lookout for another cells core I can harass and pass obscure 
bugs to. It was always reassuring knowing you’d probably already come across 
the issue and could point me to a review or git branch with a fix.

Cheers,
Sam


 On 23 Oct 2014, at 4:37 am, Chris Behrens cbehr...@codestud.com wrote:
 
 Hey all,
 
 Just wanted to drop a quick note to say that I decided to leave Rackspace to 
 pursue another opportunity. My last day was last Friday. I won’t have much 
 time for OpenStack, but I’m going to continue to hang out in the channels. 
 Having been involved in the project since day 1, I’m going to find it 
 difficult to fully walk away. I really don’t know how much I’ll continue to 
 stay involved. I am completely burned out on nova. However, I’d really like 
 to see versioned objects broken out into oslo and Ironic synced with nova’s 
 object advancements. So, if I work on anything, it’ll probably be related to 
 that.
 
 Cells will be left in a lot of capable hands. I have shared some thoughts 
 with people on how I think we can proceed to make it ‘the way’ in nova. I’m 
 going to work on documenting some of this in an etherpad so the thoughts 
 aren’t lost.
 
 Anyway, it’s been fun… the project has grown like crazy! Keep on trucking... 
 And while I won’t be active much, don’t be afraid to ping me!
 
 - Chris
 
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Sam Morrison

 On 23 Oct 2014, at 5:55 am, Andrew Laski andrew.la...@rackspace.com wrote:
 
 While I agree that N is a bit interesting, I have seen N=3 in production
 
 [central API]--[state/region1]--[state/region DC1]
       |                     \--[state/region DC2]
       |--[state/region2 DC]
       |--[state/region3 DC]
       \--[state/region4 DC]
 
 I would be curious to hear any information about how this is working out.  
 Does everything that works for N=2 work when N=3?  Are there fixes that 
 needed to be added to make this work?  Why do it this way rather than bring 
 [state/region DC1] and [state/region DC2] up a level?

We (NeCTAR) have 3 tiers: our current setup has one parent and 6 children, and 
3 of the children have 2 grandchildren each. All compute nodes are at the 
lowest level.

Everything works fine and we haven’t needed to do any modifications. 

We run in a 3 tier system because it matches how our infrastructure is 
logically laid out, but I don’t see a problem in just having a 2 tier system 
and getting rid of the middle man.

Sam


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack] [nova] Havana - Icehouse upgrades with cells

2014-04-24 Thread Sam Morrison
Hey Chris,

On 24 Apr 2014, at 4:28 pm, Chris Behrens cbehr...@codestud.com wrote:

 On Apr 23, 2014, at 6:36 PM, Sam Morrison sorri...@gmail.com wrote:
 
 Yeah I’m not sure what’s going on, I removed my hacks and tried it using the 
 conductor rpcapi service and got what I think is a recursive call in 
 nova-conductor.
 
 Added more details to https://bugs.launchpad.net/nova/+bug/1308805
 
 I’m thinking there maybe something missing in the stable/havana branch or 
 else cells is doing something different when it comes to objects.
 I don’t think it is a cells issue though as debugging it, it seems like it 
 just can’t back port a 1.13 object to 1.9.
 
 Cheers,
 Sam
 
 Oh.  You know, it turns out that conductor API bug you found…was really not a 
 real bug, I don’t think. The only thing that can backport is the conductor 
 service, if the conductor service has been upgraded. Ie, ‘use_local’ would 
 never ever work, because it was the local service that didn’t understand the 
 new object version to begin with. So trying to use_local would still not 
 understand the new version. Make sense? (This should probably be made to fail 
 gracefully, however :)

Yeah I understand that now, thanks for that.

 And yeah, I think what you have going on now when you’re actually using the 
 conductor… is that conductor is getting a request to backport, but it doesn’t 
 know how to backport…. so it’s kicking it to itself to backport.. and 
 infinite recursion occurs. Do you happen to have use_local=False in your 
 nova-conductor nova.conf? That would cause nova-conductor to RPC to itself to 
 try to backport, hehe. Again, we should probably have some graceful failing 
 here in some way. 1) nova-conductor should probably always force 
 use_local=True. And the LocalAPI should probably just implement 
 object_backport() such that it raises a nice error.

Hmm, I may have, but I’ve just done another test with everything set to 
use_local=False except nova-conductor, where use_local=True.
I also reverted that change I put through as mentioned above and I still get an 
infinite loop. Can’t really figure out what is going on here. 
Conductor is trying to talk to conductor and use_local definitely equals True.
(This is all with the Havana conductor, btw.)
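
To be explicit about the combination I’m testing, the relevant nova.conf bits
are (a sketch only):

```
# nova.conf on nova-compute and nova-cells in the compute cell
[conductor]
use_local = False

# nova.conf on the nova-conductor service itself
[conductor]
use_local = True
```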

 So, does your nova-conductor not have object version 1.13? As I was trying to 
 get at in a previous reply, I think the only way this can possibly work is 
 that you have Icehouse nova-conductor running in ALL cells.

OK so in my compute cell I am now running an Icehouse conductor. Everything 
else is Havana including the DB version.

This actually seems to make all the things that didn’t work now work. However 
it also means that the thing that did work (booting an instance) no longer 
works.
This is an easy fix and just requires nova-conductor to call the run_instance 
scheduler rpcapi method with version 2.9 as opposed to the Icehouse version 3.0.
I don’t think anything has changed here, so this might be an easy fix that 
could be pushed upstream. It just needs the scheduler rpcapi to be aware of 
what version it can use.
I changed upgrade_levels scheduler=havana but that wasn’t handled by the 
scheduler rpcapi and just gave a “version not new enough” exception.
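
For reference, this is the pin I tried on the conductor (a sketch); as noted
it just trips the version cap check rather than downgrading the call:

```
# nova.conf for the Icehouse nova-conductor running in the Havana cell
[upgrade_levels]
scheduler = havana
```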

I think I’m making progress…..

Sam




 
 - Chris
 
 
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack] [nova] Havana - Icehouse upgrades with cells

2014-04-24 Thread Sam Morrison

On 25 Apr 2014, at 12:30 am, Chris Behrens cbehr...@codestud.com wrote:

 So, does your nova-conductor not have object version 1.13? As I was trying 
 to get at in a previous reply, I think the only way this can possibly work 
 is that you have Icehouse nova-conductor running in ALL cells.
 
 OK so in my compute cell I am now running an Icehouse conductor. Everything 
 else is Havana including the DB version.
 
 This actually seems to make all the things that didn’t work now work. 
 However it also means that the thing that did work (booting an instance) no 
 longer works.
 This is an easy fix and just requires nova-conductor to call the 
 run_instance scheduler rpcapi method with version 2.9 as opposed the 
 icehouse version 3.0.
 I don’t think anything has changed here so this might be an easy fix that 
 could be pushed upstream. It just needs to change the scheduler rpcapi to be 
 aware what version it can use.
 I changed the upgrade_levels scheduler=havana but that wasn’t handled by the 
 scheduler rpcapi and just gave a version not new enough exception.
 
 I think I’m making progress…..
 
 Cool. So, what is tested upstream is upgrading everything except 
 nova-compute. You could try upgrading nova-scheduler as well.

Yeah, that fixes it, although I don’t think that’s needed. Ideally what I want 
to do is upgrade our API cell to Icehouse, then upgrade each of our compute 
cells one at a time.
This is possible in 2 ways:

1. Hack the object_backport code in havana to handle icehouse objects (and a 
few other minor things I mentioned earlier)
2. Run an Icehouse conductor in the compute cell 

Still figuring out the best way.

 Although, I didn’t think we had any build path going through conductor yet. 
 Do you happen to have a traceback from that? (Curious what the call path 
 looks like)

2014-04-25 08:28:35.975 1171 ERROR oslo.messaging.rpc.dispatcher [-] Exception 
during message handling: Specified RPC version cap, 2.9, is too low. Needs to 
be higher than 3.0.
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher Traceback 
(most recent call last):
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/.venv/local/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py,
 line 133, in _dispatch_and_reply
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher 
incoming.message))
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/.venv/local/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py,
 line 176, in _dispatch
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher return 
self._do_dispatch(endpoint, method, ctxt, args)
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/.venv/local/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py,
 line 122, in _do_dispatch
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher result = 
getattr(endpoint, method)(ctxt, **new_args)
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/nova/conductor/manager.py, line 797, in build_instances
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher 
legacy_bdm_in_spec=legacy_bdm)
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/nova/scheduler/rpcapi.py, line 116, in run_instance
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher 
cctxt.cast(ctxt, 'run_instance', **msg_kwargs)
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/.venv/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py,
 line 130, in cast
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher 
self._check_version_cap(msg.get('version'))
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher   File 
/opt/nova-icehouse/.venv/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py,
 line 115, in _check_version_cap
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher 
version_cap=self.version_cap)
2014-04-25 08:28:35.975 1171 TRACE oslo.messaging.rpc.dispatcher 
RPCVersionCapError: Specified RPC version cap, 2.9, is too low. Needs to be 
higher than 3.0.


Sam

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack] [nova] Havana - Icehouse upgrades with cells

2014-04-23 Thread Sam Morrison
Yeah, I’m not sure what’s going on. I removed my hacks and tried it using the 
conductor rpcapi service, and got what I think is a recursive call in 
nova-conductor.

Added more details to https://bugs.launchpad.net/nova/+bug/1308805

I’m thinking there may be something missing in the stable/havana branch, or 
else cells is doing something different when it comes to objects.
I don’t think it is a cells issue though; debugging it, it seems like it just 
can’t backport a 1.13 object to 1.9.

Cheers,
Sam


On 23 Apr 2014, at 1:01 am, Chris Behrens cbehr...@codestud.com wrote:

 
 On Apr 19, 2014, at 11:08 PM, Sam Morrison sorri...@gmail.com wrote:
 
 Thanks for the info Chris, I’ve actually managed to get things working. 
 Haven’t tested everything fully but seems to be working pretty good.
 
 On 19 Apr 2014, at 7:26 am, Chris Behrens cbehr...@codestud.com wrote:
 
 The problem here is that Havana is not going to know how to backport the 
 Icehouse object, even if had the conductor methods to do so… unless you’re 
 running the Icehouse conductor. But yes, your nova-computes would also need 
 the code to understand to hit conductor to do the backport, which we must 
 not have in Havana?
 
 OK this conductor api method was actually back ported to Havana, it kept 
 it’s 1.62 version for the method but in Havana conductor manager it is set 
 to 1.58.
 That is easily fixed but then it gets worse. I may be missing something but 
 the object_backport method doesn’t work at all and looking at the signature 
 never worked?
 I’ve raised a bug: https://bugs.launchpad.net/nova/+bug/1308805
 
 (CCing openstack-dev and Dan Smith)
 
 That looked wrong to me as well, and then I talked with Dan Smith and he 
 reminded me the RPC deserializer would turn that primitive into a an object 
 on the conductor side. The primitive there is the full primitive we use to 
 wrap the object with the versioning information, etc.
 
 Does your backport happen to not pass the full object primitive?  Or maybe 
 missing the object RPC deserializer on conductor? (I would think that would 
 have to be set in Havana)  nova/service.py would have:
 
 194 serializer = objects_base.NovaObjectSerializer()
 195
 196 self.rpcserver = rpc.get_server(target, endpoints, serializer)
 197 self.rpcserver.start()
 
 I’m guessing that’s there… so I would think maybe the object_backport call 
 you have is not passing the full primitive.
 
 I don’t have the time to peak at your code on github right this second, but 
 maybe later. :)
 
 - Chris
 
 
 
 This also means that if you don’t want your computes on Icehouse yet, you 
 must actually be using nova-conductor and not use_local=True for it. (I saw 
 the patch go up to fix the objects use of conductor API… so I’m guessing 
 you must be using local right now?)
 
 Yeah we still haven’t moved to use conductor so if you also don’t use 
 conductor you’ll need the simple fix at bug: 
 https://bugs.launchpad.net/nova/+bug/1308811
 
 So, I think an upgrade process could be:
 
 1) Backport the ‘object backport’ code into Havana.
 2) Set up *Icehouse* nova-conductor in your child cells and use_local=False 
 on your nova-computes
 3) Restart your nova-computes.
 4) Update *all* nova-cells processes (in all cells) to Icehouse. You can 
 keep use_local=False on these, but you’ll need that object conductor API 
 patch.
 
 At this point you’d have all nova-cells and all nova-conductors on Icehouse 
 and everything else on Havana. If the Havana computes are able to talk to 
 the Icehouse conductors, they should be able to backport any newer object 
 versions. Same with nova-cells receiving older objects from nova-api. It 
 should be able to backport them.
 
 After this, you should be able to upgrade nova-api… and then probably 
 upgrade your nova-computes on a cell-by-cell basis.
 
 I don’t *think* nova-scheduler is getting objects yet, especially if you’re 
 somehow magically able to get builds to work in what you tested so far. :) 
 But if it is, you may find that you need to insert an upgrade of your 
 nova-schedulers to Icehouse between steps 3 and 4 above…or maybe just after 
 #4… so that it can backport objects, also.
 
 I still doubt this will work 100%… but I dunno. :)  And I could be missing 
 something… but… I wonder if that makes sense?
 
 What I have is an Icehouse API cell and a Havana compute cell and havana 
 compute nodes with the following changes:
 
 Change the method signature of attach_volume to match icehouse, the 
 additional arguments are optional and don’t seem to break things if you 
 ignore them.
 https://bugs.launchpad.net/nova/+bug/1308846
 
 Needed a small fix for unlocking, there is a race condition that I have a 
 fix for but haven’t pushed up.
 
 Then I hacked up a fix for object back porting.
 The code is at 
 https://github.com/NeCTAR-RC/nova/commits/nectar/havana-icehouse-compat
 The last three commits are the fixes needed. 
 I still need to push up the unlocking one

[openstack-dev] Glance Icehouse RC bugs

2014-04-07 Thread Sam Morrison
Hi,

We’ve found a couple of bugs in the glance RC. They both have simple fixes for 
issues that break some major features; I’m wondering if some glance experts 
can cast their eye over them and whether they qualify for Icehouse.

glance registry v2 doesn't work (This has got some attention already, thanks)
https://bugs.launchpad.net/glance/+bug/1302351
https://review.openstack.org/#/c/85313/

v2 API can't create image
https://bugs.launchpad.net/glance/+bug/1302345
https://review.openstack.org/#/c/85918/


Thanks,
Sam


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova][Cells] compute api and objects

2013-12-09 Thread Sam Morrison
Hi,

I’m trying to fix up some cells issues related to objects. Do all compute API 
methods take objects now?
Cells is still sending DB objects for most methods (except start and stop), 
and I know there are more than that.

E.g. I know lock/unlock and shelve/unshelve take objects; I assume there are 
others, if not all methods, now?

Cheers,
Sam



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Shall backward compatibility env. vars be removed from python-clients?

2013-11-08 Thread Sam Morrison
I think you need to do option 2 and print deprecation warnings; then the 
question becomes for how long.

I think there is a policy for this, and that it is deprecate in N, remove in 
N+1. Clients are a bit different, so maybe keep it for ~6 months?
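
Something as simple as this would probably do for option 2 (a rough sketch,
not actual novaclient code; the helper name is illustrative):

```
import os
import warnings

def env(*names, default=''):
    """Return the first set environment variable, warning on legacy names."""
    for name in names:
        value = os.environ.get(name)
        if value:
            if not name.startswith('OS_'):
                warnings.warn('%s is deprecated, please use the OS_* '
                              'equivalent instead' % name, DeprecationWarning)
            return value
    return default

# usage: env('OS_USERNAME', 'NOVA_USERNAME')
```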

Sam



On 8 Nov 2013, at 10:49 am, Leandro Costantino leandro.i.costant...@intel.com 
wrote:

 Hi all,
 
 Clients like python-novaclient/cinderclient/trove still support NOVA_*, 
 CINDER_*, TROVE_* variables for backward compatibility, while clients like 
 python-neutronclient/keystoneclient only support OS_* env vars.
 
 
 The following bug (#1061491, see review) mentions that it may be confusing 
 if novaclient, for instance, silently accepts those variables without them 
 even being mentioned in the current help/documentation and without warning 
 the user at all (e.g. the user has exported NOVA_USERNAME/CINDER_USERNAME/etc. 
 instead of OS_USERNAME etc.).
 
 As Kevin suggested on the review (https://review.openstack.org/#/c/55588/), 
 “we need to make sure that we have consensus that enough time has passed to 
 drop the old variables”.
 
 I would like to hear opinions about this. Some options that I can think of 
 are:
 
 1) Remove NOVA_USERNAME, NOVA_PASSWORD and NOVA_PROJECT_ID support 
   from novaclient. Same for the other clients that allow vars other than 
   OS_USERNAME, OS_PASSWORD, OS_TENANT_ID for these specific options.
 2) Warn about deprecation if they are being used.
 3) Ignore this topic, keep everything as it is today and just close the 
   'bug/suggestion'.
 4) Other?
 
 Best Regards
 -Leandro
 
 PS: should this message belong on another ML, please let me know.
 
 
 
 
 
 
 
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] upgrade_levels and cells

2013-10-27 Thread Sam Morrison
Hey,

I’ve been playing with Grizzly to Havana upgrade paths with cells.
Having Havana at the API cell with Grizzly compute cells won’t work (I don’t 
think it’s supposed to?).

Having Grizzly at the API level with a Havana compute cell almost works; it 
seems the only major issue is the block_device_mappings.

When a Havana child cell gets a build request from a Grizzly parent, the 
block_device_mapping is an empty list; when it comes from a Havana parent it 
has something like:

    block_device_mapping = [{'boot_index': 0,
                             'connection_info': None,
                             'delete_on_termination': True,
                             'destination_type': 'local',
                             'device_name': None,
                             'device_type': 'disk',
                             'disk_bus': None,
                             'guest_format': None,
                             'image_id': 'XXX-XXX-XXX-XXX',
                             'instance_uuid': 'XXX-XXX-XXX-XXX',
                             'no_device': None,
                             'snapshot_id': None,
                             'source_type': 'image',
                             'volume_id': None,
                             'volume_size': None}]


I don’t know much about the new BDM stuff, but it looks like not much code 
would be needed to get this working.
Was wondering if anyone else is looking into this kind of thing or is willing 
to help?
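
A rough sketch of the kind of shim that might paper over it (the helper name 
is made up; the defaults are just copied from the mapping above): synthesise 
the image-backed root entry when the parent sends an empty list.

# Rough sketch only, not tested against real cells traffic.
def ensure_root_bdm(block_device_mapping, instance_uuid, image_id):
    if block_device_mapping:
        return block_device_mapping
    return [{'boot_index': 0,
             'connection_info': None,
             'delete_on_termination': True,
             'destination_type': 'local',
             'device_name': None,
             'device_type': 'disk',
             'disk_bus': None,
             'guest_format': None,
             'image_id': image_id,
             'instance_uuid': instance_uuid,
             'no_device': None,
             'snapshot_id': None,
             'source_type': 'image',
             'volume_id': None,
             'volume_size': None}]

bdm = ensure_root_bdm([], 'XXX-XXX-XXX-XXX', 'XXX-XXX-XXX-XXX')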

Cheers,
Sam


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Swift] container forwarding/cluster federation blueprint

2013-10-13 Thread Sam Morrison
Hi,

I'd be interested in how this differs from using Swift global clusters.

Cheers,
Sam



On 12/10/2013, at 3:49 AM, Coles, Alistair alistair.co...@hp.com wrote:

 We’ve just committed a first set of patches to gerrit that address this 
 blueprint:
  
 https://blueprints.launchpad.net/swift/+spec/cluster-federation
  
 Quoting from that page: “The goal of this work is to enable account contents 
 to be dispersed across multiple clusters, motivated by (a) accounts that 
 might grow beyond the remaining capacity of a single cluster and (b) clusters 
 offering differentiated service levels such as different levels of redundancy 
 or different storage tiers. Following feedback at the Portland summit, the 
 work is initially limited to dispersal at the container level, i.e. each 
 container within an account may be stored on a different cluster, whereas 
 every object within a container will be stored on the same cluster.”
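 
 (Just to make the dispersal model concrete: a toy illustration only, nothing 
 here reflects the actual middleware code. Each container maps to exactly one 
 cluster, and every object follows its container.)
 
# Toy illustration of container-level dispersal; the account, container and
# endpoint values are made up.
CONTAINER_CLUSTER_MAP = {
    ('AUTH_demo', 'backups'): 'https://cluster-b.example.com/v1',
    ('AUTH_demo', 'images'): 'https://cluster-a.example.com/v1',
}
DEFAULT_CLUSTER = 'https://cluster-a.example.com/v1'

def endpoint_for(account, container):
    # Different containers in one account may live on different clusters,
    # but all objects in a container live wherever the container lives.
    return CONTAINER_CLUSTER_MAP.get((account, container), DEFAULT_CLUSTER)

print(endpoint_for('AUTH_demo', 'backups'))  # cluster B
print(endpoint_for('AUTH_demo', 'images'))   # cluster A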
  
 It is work in progress, but we’d welcome feedback on this thread, or in 
 person for anyone who might be at the hackathon in Austin next week.
 The bulk of the new features are in this patch: 
 https://review.openstack.org/51236 (Middleware module for container 
 forwarding.)
 
 There are a couple of patches refactoring/adding support to existing modules:
 https://review.openstack.org/51242 (Refactor proxy/controllers obj  base 
 http code)
 https://review.openstack.org/51228 (Store x-container-attr-* headers in 
 container db.)
 
 And some tests…
 https://review.openstack.org/51245 (Container-forwarding unit and functional 
 tests)
 
  
 Regards,
 Alistair Coles, Eric Deliot, Aled Edwards
  
 HP Labs, Bristol, UK
  
  
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev