Re: [Openstack-operators] Security group rules not working on instances kilo

2016-04-21 Thread raju
Thanks Kris, the issue was resolved after adding the lines below to sysctl.conf:

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
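
For anyone else hitting this, a minimal sketch of applying the values without a
reboot (note: the bridge-nf-call-* entries only exist once bridge netfilter
support is loaded - br_netfilter on newer kernels, built into the bridge module
on older ones):

  modprobe br_netfilter          # only needed if /proc/sys/net/bridge is missing
  sysctl -p                      # re-read /etc/sysctl.conf
  sysctl net.bridge.bridge-nf-call-iptables   # should now report 1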


appreciate your help, thanks a lot again.




On Thu, Apr 21, 2016 at 8:25 PM, Kris G. Lindgren wrote:

> Make sure that the bridges are being created (one bridge per VM); they should
> be named similarly to the VM's tap device name.  Then make sure that you have
> the bridge-nf-call-* files enabled:
>
> http://wiki.libvirt.org/page/Net.bridge.bridge-nf-call_and_sysctl.conf
>
> Under hybrid mode, a Linux bridge (not an OVS bridge; one managed with brctl)
> is created per VM.  The VM's tap device is plugged into this bridge.  A veth
> pair is created that spans from the VM's Linux bridge to br-int and is plugged
> in at both ends.  This is done because older versions of OVS did not have
> support (or efficient support) for firewalling.  The problem is that, in the
> kernel, packets traversing the Open vSwitch code paths cannot be hooked by
> netfilter.  So the Linux bridge is created solely to allow the VM traffic to
> pass through a netfilter-hookable location, so that security groups work.
>
> At a minimum, you need to make sure
> /proc/sys/net/bridge/bridge-nf-call-iptables is set to 1.  If it's not, then
> when you look at the iptables rules that are created you will see that none of
> the security group chains are seeing traffic.
> ___
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy
>
> From: raju 
> Date: Thursday, April 21, 2016 at 5:26 PM
> To: "openstack-operators@lists.openstack.org" <
> openstack-operators@lists.openstack.org>
> Subject: [Openstack-operators] Security group rules not working on
> instances kilo
>
> Hi,
>
> I am running into an issue where security group rules are not being applied
> to instances. When I create a new security group with the default rules, it
> should reject all incoming traffic, but it is allowing everything through
> without blocking anything.
>
> Here is my config for nova:
>
> security_group_api = neutron
> firewall_driver = nova.virt.firewall.NoopFirewallDriver
>
> and in ml2_conf.ini:
>
> firewall_driver =
> neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
>
> The iptables service is running on all the nodes; please let me know if I
> have missed anything.
>
>
> Thanks.
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Security group rules not working on instances kilo

2016-04-21 Thread Kris G. Lindgren
Make sure that the bridges are being created (one bridge per VM); they should be 
named similarly to the VM's tap device name.  Then make sure that you have the 
bridge-nf-call-* files enabled:

http://wiki.libvirt.org/page/Net.bridge.bridge-nf-call_and_sysctl.conf

Under hybrid mode, a Linux bridge (not an OVS bridge; one managed with brctl) is 
created per VM.  The VM's tap device is plugged into this bridge.  A veth pair is 
created that spans from the VM's Linux bridge to br-int and is plugged in at both 
ends.  This is done because older versions of OVS did not have support (or 
efficient support) for firewalling.  The problem is that, in the kernel, packets 
traversing the Open vSwitch code paths cannot be hooked by netfilter.  So the 
Linux bridge is created solely to allow the VM traffic to pass through a 
netfilter-hookable location, so that security groups work.
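
A quick way to see that wiring on a compute node - the qbr/qvb/qvo names below 
are the usual hybrid-driver naming convention and are shown purely as an 
illustration:

  brctl show | grep qbr                     # one qbrXXXX Linux bridge per VM
  ip link show | grep -E 'qvb|qvo'          # veth pair: qvb on the Linux bridge, qvo on br-int
  ovs-vsctl list-ports br-int | grep qvo    # the br-int end of each veth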

At a minimum, you need to make sure /proc/sys/net/bridge/bridge-nf-call-iptables 
is set to 1.  If it's not, then when you look at the iptables rules that are 
created you will see that none of the security group chains are seeing traffic.
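
A couple of quick checks along these lines (the neutron-openvswi- chain prefix 
assumed below is the hybrid driver's default):

  cat /proc/sys/net/bridge/bridge-nf-call-iptables   # should print 1
  iptables -L -v -n | grep neutron-openvswi          # packet counters should grow with traffic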
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: raju
Date: Thursday, April 21, 2016 at 5:26 PM
To: "openstack-operators@lists.openstack.org"
Subject: [Openstack-operators] Security group rules not working on instances 
kilo

Hi,

I am running into an issue where security group rules are not being applied to 
instances. When I create a new security group with the default rules, it should 
reject all incoming traffic, but it is allowing everything through without 
blocking anything.

Here is my config for nova:

security_group_api = neutron
firewall_driver = nova.virt.firewall.NoopFirewallDriver

and in ml2_conf.ini:

firewall_driver = 
neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

The iptables service is running on all the nodes; please let me know if I have 
missed anything.


Thanks.
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Security group rules not working on instances kilo

2016-04-21 Thread raju
Hi,

I am running into an issue where security group rules are not being applied to
instances. When I create a new security group with the default rules, it should
reject all incoming traffic, but it is allowing everything through without
blocking anything.

Here is my config for nova:

security_group_api = neutron
firewall_driver = nova.virt.firewall.NoopFirewallDriver

and in ml2_conf.ini:

firewall_driver =
neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

The iptables service is running on all the nodes; please let me know if I have
missed anything.


Thanks.
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Ajay Kalambur (akalambu)
Seeing something similar with heartbeat; it seems like the reconnection attempt fails.

2016-04-21 15:27:01.294 6 DEBUG nova.openstack.common.loopingcall 
[req-9c9785ed-2598-4b95-a40c-307f8d7e8416 - - - - -] Dynamic looping call 
sleeping for 60.00 seconds _inner 
/usr/lib/python2.7/site-packages/nova/openstack/common/loopingcall.py:132

2016-04-21 15:28:01.294 6 DEBUG nova.openstack.common.periodic_task 
[req-9c9785ed-2598-4b95-a40c-307f8d7e8416 - - - - -] Running periodic task 
ComputeManager._instance_usage_audit run_periodic_tasks 
/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py:219

2016-04-21 15:28:01.295 6 DEBUG nova.openstack.common.periodic_task 
[req-9c9785ed-2598-4b95-a40c-307f8d7e8416 - - - - -] Running periodic task 
ComputeManager._poll_rebooting_instances run_periodic_tasks 
/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py:219

2016-04-21 15:28:01.295 6 DEBUG nova.openstack.common.periodic_task 
[req-9c9785ed-2598-4b95-a40c-307f8d7e8416 - - - - -] Running periodic task 
ComputeManager._poll_volume_usage run_periodic_tasks 
/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py:219

2016-04-21 15:28:01.295 6 DEBUG nova.openstack.common.periodic_task 
[req-9c9785ed-2598-4b95-a40c-307f8d7e8416 - - - - -] Running periodic task 
ComputeManager._cleanup_running_deleted_instances run_periodic_tasks 
/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py:219

2016-04-21 15:28:48.421 6 ERROR oslo_messaging._drivers.impl_rabbit [-] 
Declaring queue failed with (Socket closed), retrying

2016-04-21 15:28:48.422 6 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed 
to consume message from queue: Socket closed

2016-04-21 15:28:48.422 6 ERROR oslo_messaging._drivers.amqpdriver [-] Failed 
to process incoming message, retrying...

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver Traceback 
(most recent call last):

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
228, in poll

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver 
self.conn.consume(limit=1)

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 
1194, in consume

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver 
six.next(it)

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 
1105, in iterconsume

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver 
error_callback=_error_callback)

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 
885, in ensure

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver 'retry': 
retry}

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver TypeError: 
%d format: a number is required, not NoneType

2016-04-21 15:28:48.422 6 TRACE oslo_messaging._drivers.amqpdriver

2016-04-21 15:28:48.430 6 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP 
server on 10.23.221.110:5672 is unreachable: connection already closed. Trying 
again in 1 seconds.

2016-04-21 15:29:01.302 6 ERROR nova.openstack.common.periodic_task 
[req-9c9785ed-2598-4b95-a40c-307f8d7e8416 - - - - -] Error during 
ComputeManager._cleanup_running_deleted_instances: Timed out waiting for a 
reply to message ID c0c46bd3ebfb4441981617e089c5a18d

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task Traceback 
(most recent call last):

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File 
"/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 
224, in run_periodic_tasks

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task 
task(self, context)

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6410, in 
_cleanup_running_deleted_instances

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task for 
instance in self._running_deleted_instances(context):

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6464, in 
_running_deleted_instances

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task 
instances = self._get_instances_on_driver(context, filters)

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 796, in 
_get_instances_on_driver

2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task 
context, filters, use_slave=True)

2016-04-21 15:29:01.302 6 TRACE 

Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Ajay Kalambur (akalambu)
We are seeing issues only on the client side as of now.
But we do have net.ipv4.tcp_retries2 = 3 set.

Ajay

From: "Edmund Rhudy (BLOOMBERG/ 731 LEX)" 
>
Reply-To: "Edmund Rhudy (BLOOMBERG/ 731 LEX)" 
>
Date: Thursday, April 21, 2016 at 12:11 PM
To: Ajay Kalambur >
Cc: 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Are you seeing issues only on the client side, or anything on the broker side? 
We were having issues with nodes not successfully reconnecting and ended up 
making a number of changes on the broker side to improve resiliency (upgrading 
to RabbitMQ 3.5.5 or higher, reducing net.ipv4.tcp_retries2 to evict failed 
connections faster, configuring heartbeats in RabbitMQ to detect failed clients 
more quickly).
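
For reference, a sketch of those broker-side pieces (the values shown are purely 
illustrative, not recommendations):

  # /etc/sysctl.conf - give up on dead TCP connections sooner
  net.ipv4.tcp_retries2 = 5

  # /etc/rabbitmq/rabbitmq.config - server-side heartbeat interval, in seconds
  [
    {rabbit, [
      {heartbeat, 30}
    ]}
  ].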

From: akala...@cisco.com
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo
Do you recommend both or can I do away with the system timers and just keep the 
heartbeat?
Ajay


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:54 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Yeah, that only fixes part of the issue.  The other part is getting the 
OpenStack messaging code itself to figure out that the connection it's using is 
no longer valid.  Heartbeats by themselves solved 90%+ of our issues with 
RabbitMQ and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled?

If you aren't using heartbeats, it will take a long time for the nova-compute 
agent to figure out that it's actually no longer attached to anything.  
Heartbeats do periodic checks against RabbitMQ and will catch this state and 
reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 11:43 AM
To: 
"openstack-operators@lists.openstack.org"
 
>
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo


Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some 
computes report down forever.
I need to restart the compute service on the compute node to recover. It looks 
like oslo is not reconnecting in nova-compute.
Here is the trace from nova-compute:
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in 
call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in 
_send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
350, in send
2016-04-19 20:25:39.090 6 TRACE 

[Openstack-operators] [Openstack] OpenStack Mitaka for Ubuntu 14.04 LTS and Ubuntu 16.04 LTS

2016-04-21 Thread Corey Bryant
Hi All,

The Ubuntu OpenStack Engineering team is pleased to announce the general
availability of OpenStack Mitaka in Ubuntu 16.04 LTS and for Ubuntu 14.04
LTS via the Ubuntu Cloud Archive.

Ubuntu 14.04 LTS
----------------

You can enable the Ubuntu Cloud Archive for OpenStack Mitaka on Ubuntu
14.04 installations by running the following commands (make sure you have
an up-to-date software-properties-common package first):

  sudo add-apt-repository cloud-archive:mitaka
  sudo apt-get update
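
To double-check that the archive is actually active before installing 
(nova-common is used here only as an example package to query):

  apt-cache policy nova-common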

The Ubuntu Cloud Archive for Mitaka includes updates for Nova, Glance,
Keystone, Neutron, Neutron-FWaaS, Neutron-LBaaS, Neutron-VPNaaS, Cinder,
Swift, Horizon, Ceilometer, Aodh, Heat, Designate, Barbican, Manila,
Mistral, Murano, Ironic, Trove, Sahara, Senlin, and Zaqar; Ceph (10.1.2),
Nova-LXD (13.0.0), RabbitMQ (3.5.7), QEMU (2.5), libvirt (1.3.1),
Open vSwitch (2.5.0) and DPDK (2.2.0) backports from Ubuntu 16.04
development have also been provided.

You can see the full list of packages and versions at [0].

Ubuntu 16.04 LTS
----------------

No extra steps are required; just start installing OpenStack!

Release notes for Ubuntu 16.04 LTS can be found at [1].

Reporting bugs
--------------

If you have any issues, please report bugs using the 'ubuntu-bug' tool:

  sudo ubuntu-bug nova-conductor

This will ensure that bugs get logged in the right place in Launchpad.

Also feel free to poke anyone on our team in #ubuntu-server (coreycb,
ddellav, beisner, jamespage) if you have any issues.

Finally, thank you to the entire OpenStack community and everyone who's
made OpenStack Mitaka a reality!

[0]
http://reqorts.qa.ubuntu.com/reports/ubuntu-server/cloud-archive/mitaka_versions.html
[1] https://wiki.ubuntu.com/XenialXerus/ReleaseNotes

-- 
Regards,
Corey Bryant
Ubuntu Core Developer
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Ajay Kalambur (akalambu)
Thanks Kris, that's good information; I will try out your suggestions.
Ajay


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 12:08 PM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

We just use heartbeats.  But from what I recall, other people have had good luck 
with both set. I would keep them if they are already set; maybe just dial down 
how aggressive they are.  One thing I should mention is that if you have a large 
number of RPC workers, enabling heartbeats will increase CPU consumption by about 
1-2% per worker (in our experience), since it's now doing something with RabbitMQ 
every few seconds.  This can also increase load on the RabbitMQ side as well.  
For us, having a stable Rabbit environment is well worth the tradeoff.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 1:04 PM
To: "Kris G. Lindgren" >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you recommend both or can I do away with the system timers and just keep the 
heartbeat?
Ajay


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:54 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Yeah, that only fixes part of the issue.  The other part is getting the 
OpenStack messaging code itself to figure out that the connection it's using is 
no longer valid.  Heartbeats by themselves solved 90%+ of our issues with 
RabbitMQ and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled?

If you aren't using heartbeats, it will take a long time for the nova-compute 
agent to figure out that it's actually no longer attached to anything.  
Heartbeats do periodic checks against RabbitMQ and will catch this state and 
reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 11:43 AM
To: 
"openstack-operators@lists.openstack.org"
 
>
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo


Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some 
computes report down forever.
I need to restart the compute service on the compute node to recover. It looks 
like oslo is not reconnecting in nova-compute.
Here is the trace from nova-compute:
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in 
call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
retry=self.retry)

Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Ajay Kalambur (akalambu)
Do you recommend both or can I do away with the system timers and just keep the 
heartbeat?
Ajay


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:54 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Yeah, that only fixes part of the issue.  The other part is getting the 
OpenStack messaging code itself to figure out that the connection it's using is 
no longer valid.  Heartbeats by themselves solved 90%+ of our issues with 
RabbitMQ and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled?

If you aren't using heartbeats, it will take a long time for the nova-compute 
agent to figure out that it's actually no longer attached to anything.  
Heartbeats do periodic checks against RabbitMQ and will catch this state and 
reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 11:43 AM
To: 
"openstack-operators@lists.openstack.org"
 
>
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo


Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some 
computes report down forever.
I need to restart the compute service on the compute node to recover. It looks 
like oslo is not reconnecting in nova-compute.
Here is the trace from nova-compute:
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in 
call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in 
_send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db result = 
self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db message = 
self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 'to message ID 
%s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: 
Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1


Any thoughts? I am at stable/kilo for oslo.

Ajay

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org

Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Kris G. Lindgren
Yeah, that only fixes part of the issue.  The other part is getting the 
OpenStack messaging code itself to figure out that the connection it's using is 
no longer valid.  Heartbeats by themselves solved 90%+ of our issues with 
RabbitMQ and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5


From: "Kris G. Lindgren" >
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur >, 
"openstack-operators@lists.openstack.org"
 
>
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled?

If you aren't using heartbeats, it will take a long time for the nova-compute 
agent to figure out that it's actually no longer attached to anything.  
Heartbeats do periodic checks against RabbitMQ and will catch this state and 
reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 11:43 AM
To: 
"openstack-operators@lists.openstack.org"
 
>
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo


Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some 
computes report down forever.
I need to restart the compute service on the compute node to recover. It looks 
like oslo is not reconnecting in nova-compute.
Here is the trace from nova-compute:
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in 
call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in 
_send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db result = 
self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db message = 
self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 'to message ID 
%s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: 
Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1


Any thoughts? I am at stable/kilo for oslo.

Ajay

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Kris G. Lindgren
Do you have rabbitmq/oslo messaging heartbeats enabled?

If you aren't using heartbeats, it will take a long time for the nova-compute 
agent to figure out that it's actually no longer attached to anything.  
Heartbeats do periodic checks against RabbitMQ and will catch this state and 
reconnect.
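
A sketch of turning them on from the compute side, in case it helps - these are 
the Kilo-era oslo.messaging rabbit options, and the values shown are just the 
usual defaults:

  # nova.conf
  [oslo_messaging_rabbit]
  heartbeat_timeout_threshold = 60    # seconds; 0 disables heartbeats
  heartbeat_rate = 2                  # checks per timeout window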

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" >
Date: Thursday, April 21, 2016 at 11:43 AM
To: 
"openstack-operators@lists.openstack.org"
 
>
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo


Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some 
computes report down forever.
I need to restart the compute service on the compute node to recover. It looks 
like oslo is not reconnecting in nova-compute.
Here is the trace from nova-compute:
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in 
call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in 
_send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db result = 
self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db message = 
self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 'to message ID 
%s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: 
Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1


Any thoughts? I am at stable/kilo for oslo.

Ajay

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [oslo]nova compute reconnection Issue Kilo

2016-04-21 Thread Ajay Kalambur (akalambu)

Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some 
computes report down forever.
I need to restart the compute service on the compute node to recover. It looks 
like oslo is not reconnecting in nova-compute.
Here is the trace from nova-compute:
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in 
call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in 
_send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 
timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db result = 
self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db message = 
self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 
149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db 'to message ID 
%s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: 
Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1


Any thoughts? I am at stable/kilo for oslo.

Ajay

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [OpenStack-Ansible] Liberty to Mitaka upgrade?

2016-04-21 Thread Jesse Pretorius
On 20 April 2016 at 04:27, Dale Baley wrote:

> Is there a document or guide for upgrading from Liberty to Mitaka yet?
>

Hi Dale,

The active work to test and implement any plays to assist with upgrades has
not yet been done for Liberty->Mitaka. We hope to do this work by Newton
Milestone-1. If you wish to participate in the process of getting this done,
please join us in #openstack-ansible to compare notes.

The starting point for this work is always to use the same process as is used
for minor upgrades [1] in a test environment, see whether anything breaks, and
then prepare patches to resolve that. As our roles and plays already
automatically detect and deal with things like DB migrations and software
version changes, this may actually work perfectly well - it's just that we've
not yet had the chance to really sit down and test these upgrades with focused
attention.

[1]
http://docs.openstack.org/developer/openstack-ansible/install-guide/app-minorupgrade.html

Best regards,

Jesse
IRC: odyssey4me
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [Performance] Austin Performance Team working group session - please comment

2016-04-21 Thread Dina Belova
Folks,

we're going to have a Performance Working Group session during the upcoming
summit - here is the event description.
Our team was kicked off during the Mitaka Summit, so this was our first cycle
:) Please attend the session to learn what we have done and what plans we have
for Newton.

*Etherpad link:* https://etherpad.openstack.org/p/newton-performance-team

We'll be using this ^^ etherpad during the session; feel free to comment and
add your suggestions on what you think will be important to test during Newton
and which issues seem vital to investigate and fix.

See you in Austin :)

Cheers,
Dina
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators