[Openstack-operators] [neutron][connection tracking] OVS connection tracking for a DNS VNF
Hi,

Has anyone had experience running a DNS VNF on OpenStack? These VNFs typically generate a very large volume of DNS lookups, which translates into UDP entries in the conntrack table. Under load this can lead to the nf_conntrack table becoming FULL; the default maximum on most systems is 65536 entries. Some forums suggest increasing this to a very large value to handle DNS at scale.

The question I have is: is there a way to disable OVS connection tracking on a per-port basis in neutron? Also, for folks running this in production, do you get this working by tweaking ip_conntrack_max and the UDP timeouts?

Ajay
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
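For reference, the two knobs being asked about can be sketched as below. The sysctl values are illustrative starting points, not recommendations, and the per-port approach assumes the VNF port can live without security groups (disabling port security removes the security-group iptables/conntrack processing for that port):

```shell
# Raise conntrack capacity and shorten UDP timeouts (tune for your load;
# the values here are only examples).
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_udp_timeout=10
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=60

# Per-port alternative: strip security groups and disable port security
# so no conntrack entries are created for this port's traffic.
openstack port set --no-security-group --disable-port-security <port-uuid>
```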
Re: [Openstack-operators] [openstack][placement] Placement API service catalog
Hi Curtis,

Thanks for the help. You were spot on in pointing out the issue: I copy-pasted a previous nova api haproxy config and forgot to update the port.

Thanks again,
Ajay

On 6/10/17, 1:52 PM, "Curtis" <serverasc...@gmail.com> wrote:
>On Sat, Jun 10, 2017 at 11:56 AM, Ajay Kalambur (akalambu)
><akala...@cisco.com> wrote:
>> Hi
>> I made all the changes as documented in
>> https://docs.openstack.org/ocata/install-guide-ubuntu/nova-controller-install.html
>> https://docs.openstack.org/ocata/install-guide-ubuntu/nova-compute-install.html
>>
>> The issue I'm facing is that when nova-compute comes up and queries the placement
>> API it gets a status 300 error code:
>>
>> 2017-06-10 10:48:27.236 33 ERROR nova.scheduler.client.report
>> [req-18ea91e0-a210-42af-a560-5c7697a20604 - - - - -] Failed to create
>> resource provider record in placement API for UUID
>> d2067675-062b-4550-8631-d23a3b13343b. Got 300: {"choices": [{"status":
>> "SUPPORTED", "media-types": [{"base": "application/json", "type":
>> "application/vnd.openstack.compute+json;version=2"}], "id": "v2.0", "links":
>> [{"href": "http://15.0.0.42:8778/v2/resource_providers", "rel": "self"}]},
>> {"status": "CURRENT", "media-types": [{"base": "application/json", "type":
>> "application/vnd.openstack.compute+json;version=2.1"}], "id": "v2.1",
>> "links": [{"href": "http://15.0.0.42:8778/v2.1/resource_providers", "rel":
>> "self"}]}]}.
>>
>
>If I do this against an ocata placement api:
>
>$ OS_TOKEN=$(openstack token issue -f value -c id)
>$ curl -s -H "X-Auth-Token: $OS_TOKEN" http://:8778/
>{"versions": [{"min_version": "1.0", "max_version": "1.4", "id": "v1.0"}]}
>
>Is your load balancer listening on 8778 but pointing to your nova
>api port maybe? (Just a random guess.)
>
>Thanks,
>Curtis.
>
>--
>Blog: serverascode.com
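The root cause above (an haproxy frontend on 8778 copy-pasted from the nova api section and left pointing at the wrong backend port) might look like this when fixed; the backend host names and IPs here are assumptions for illustration:

```
listen placement_api
    bind 15.0.0.42:8778
    # Backends must target the placement service port (8778), not the
    # nova api port (8774) left over from a copy-pasted section.
    server ctrl1 15.0.0.43:8778 check
    server ctrl2 15.0.0.44:8778 check
```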
[Openstack-operators] [openstack][placement] Placement API service catalog
Hi,

I made all the changes as documented in
https://docs.openstack.org/ocata/install-guide-ubuntu/nova-controller-install.html
https://docs.openstack.org/ocata/install-guide-ubuntu/nova-compute-install.html

The issue I'm facing is that when nova-compute comes up and queries the placement API, it gets a status 300 error code:

2017-06-10 10:48:27.236 33 ERROR nova.scheduler.client.report [req-18ea91e0-a210-42af-a560-5c7697a20604 - - - - -] Failed to create resource provider record in placement API for UUID d2067675-062b-4550-8631-d23a3b13343b. Got 300: {"choices": [{"status": "SUPPORTED", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.compute+json;version=2"}], "id": "v2.0", "links": [{"href": "http://15.0.0.42:8778/v2/resource_providers", "rel": "self"}]}, {"status": "CURRENT", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.compute+json;version=2.1"}], "id": "v2.1", "links": [{"href": "http://15.0.0.42:8778/v2.1/resource_providers", "rel": "self"}]}]}.

The symptoms look like the service catalog is messed up, since I get this error even if I stop the placement API. But when I look at the keystone service catalog it seems fine:

| placement | placement | RegionOne
|           |           | publicURL: https://172.29.86.12:8778
|           |           | internalURL: http://15.0.0.42:8778
|           |           | adminURL: http://15.0.0.42:8778

| nova      | compute   | RegionOne
|           |           | publicURL: https://172.29.86.12:8774/v2.1
|           |           | internalURL: http://15.0.0.42:8774/v2.1
|           |           | adminURL: http://15.0.0.42:8774/v2.1

Not sure what I am doing wrong here.

Also, nova-status upgrade check returns an error:

nova-status upgrade check
Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
{u'versions': [{u'status': u'SUPPORTED', u'updated': u'2011-01-21T11:33:21Z', u'links': [{u'href': u'http://15.0.0.42:8778/v2/', u'rel': u'self'}], u'min_version': u'', u'version': u'', u'id': u'v2.0'}, {u'status': u'CURRENT', u'updated': u'2013-07-23T11:33:21Z', u'links': [{u'href': u'http://15.0.0.42:8778/v2.1/', u'rel': u'self'}], u'min_version': u'2.1', u'version': u'2.42', u'id': u'v2.1'}]}
Error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/cmd/status.py", line 457, in main
    ret = fn(*fn_args, **fn_kwargs)
  File "/usr/lib/python2.7/site-packages/nova/cmd/status.py", line 387, in check
    result = func(self)
  File "/usr/lib/python2.7/site-packages/nova/cmd/status.py", line 202, in _check_placement
    max_version = float(versions["versions"][0]["max_version"])
KeyError: 'max_version'

This is with Ocata.

Ajay
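The KeyError above is consistent with the endpoint answering with the nova api version document, which has no max_version field, unlike the placement version document. A minimal sketch of the failing check, with both documents abbreviated from the output in this thread:

```python
# Version documents abbreviated from the thread: nova's entry lacks
# "max_version", placement's has it, so the nova-status check only
# succeeds when 8778 really reaches the placement service.
nova_versions = {"versions": [{"id": "v2.0", "min_version": "", "version": ""}]}
placement_versions = {"versions": [{"id": "v1.0", "min_version": "1.0", "max_version": "1.4"}]}

def max_placement_version(doc):
    # Same expression nova-status uses in _check_placement.
    return float(doc["versions"][0]["max_version"])

print(max_placement_version(placement_versions))  # 1.4

try:
    max_placement_version(nova_versions)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'max_version'
```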
Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo
riodic_task context, filters, use_slave=True)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/objects/base.py", line 161, in wrapper
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     args, kwargs)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 335, in object_class_action
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     objver=objver, args=args, kwargs=kwargs)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     retry=self.retry)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     timeout=timeout, retry=retry)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 381, in send
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     retry=retry)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 370, in _send
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     result = self._waiter.wait(msg_id, timeout)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 274, in wait
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     message = self.waiters.get(msg_id, timeout=timeout)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 180, in get
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task     'to message ID %s' % msg_id)
2016-04-21 15:29:01.302 6 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID c0c46bd3ebfb4441981617e089c5a18d

From: Ajay Kalambur <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 12:11 PM
To: "Kris G. Lindgren" <klindg...@godaddy.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Thanks Kris, that's good information; I will try out your suggestions.
Ajay

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 12:08 PM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

We just use heartbeat. But from what I recall, other people have had good luck with both set. I would keep them if they are already set, maybe just dial down how aggressive they are.

One thing I should mention: if you have a large number of RPC workers, enabling heartbeats will increase CPU consumption by about 1-2% per worker (in our experience), since it is now doing something with rabbitmq every few seconds. This can also increase load on the rabbitmq side as well. For us, having a stable rabbit environment is well worth the tradeoff.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 1:04 PM
To: "Kris G. Lindgren" <klindg...@godaddy.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you recommend both, or can I do away with the system timers and just keep the heartbeat?
Ajay

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:54 AM
To:
Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo
We are seeing issues only on the client side as of now. But we do have net.ipv4.tcp_retries2 = 3 set.
Ajay

From: "Edmund Rhudy (BLOOMBERG/ 731 LEX)" <erh...@bloomberg.net>
Reply-To: "Edmund Rhudy (BLOOMBERG/ 731 LEX)" <erh...@bloomberg.net>
Date: Thursday, April 21, 2016 at 12:11 PM
To: Ajay Kalambur <akala...@cisco.com>
Cc: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Are you seeing issues only on the client side, or anything on the broker side? We were having issues with nodes not successfully reconnecting and ended up making a number of changes on the broker side to improve resiliency (upgrading to RabbitMQ 3.5.5 or higher, reducing net.ipv4.tcp_retries2 to evict failed connections faster, configuring heartbeats in RabbitMQ to detect failed clients more quickly).

From: akala...@cisco.com
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you recommend both, or can I do away with the system timers and just keep the heartbeat?
Ajay

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:54 AM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Yea, that only fixes part of the issue. The other part is getting the openstack messaging code itself to figure out that the connection it is using is no longer valid. Heartbeats by themselves solved 90%+ of our issues with rabbitmq and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" <klindg...@godaddy.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before:

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled? If you aren't using heartbeats, it will take a long time for the nova-compute agent to figure out that it is actually no longer attached to anything. Heartbeat does periodic checks against rabbitmq and will catch this state and reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 11:43 AM
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some computes report down forever. I need to restart the compute service on the compute node to recover. Looks like oslo is not reconnecting in nova-compute. Here is the trace from nova-compute:

2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/s
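For reference, the kilo-era oslo.messaging heartbeat options being discussed live in the [oslo_messaging_rabbit] section of nova.conf (and the other services' conf files) on each node; the values below are illustrative starting points, not recommendations:

```
[oslo_messaging_rabbit]
# Seconds after which an unanswered heartbeat marks the connection
# dead; 0 disables heartbeats entirely.
heartbeat_timeout_threshold = 60
# Heartbeats are checked heartbeat_rate times per timeout interval.
heartbeat_rate = 2
```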
Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo
Thanks Kris, that's good information; I will try out your suggestions.
Ajay

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 12:08 PM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

We just use heartbeat. But from what I recall, other people have had good luck with both set. I would keep them if they are already set, maybe just dial down how aggressive they are.

One thing I should mention: if you have a large number of RPC workers, enabling heartbeats will increase CPU consumption by about 1-2% per worker (in our experience), since it is now doing something with rabbitmq every few seconds. This can also increase load on the rabbitmq side as well. For us, having a stable rabbit environment is well worth the tradeoff.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 1:04 PM
To: "Kris G. Lindgren" <klindg...@godaddy.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you recommend both, or can I do away with the system timers and just keep the heartbeat?
Ajay

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:54 AM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Yea, that only fixes part of the issue. The other part is getting the openstack messaging code itself to figure out that the connection it is using is no longer valid. Heartbeats by themselves solved 90%+ of our issues with rabbitmq and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" <klindg...@godaddy.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before:

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled? If you aren't using heartbeats, it will take a long time for the nova-compute agent to figure out that it is actually no longer attached to anything. Heartbeat does periodic checks against rabbitmq and will catch this state and reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 11:43 AM
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some computes report down forever. I need to
Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo
Do you recommend both, or can I do away with the system timers and just keep the heartbeat?
Ajay

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:54 AM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Yea, that only fixes part of the issue. The other part is getting the openstack messaging code itself to figure out that the connection it is using is no longer valid. Heartbeats by themselves solved 90%+ of our issues with rabbitmq and nodes being disconnected and never reconnecting.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 12:51 PM
To: "Kris G. Lindgren" <klindg...@godaddy.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Trying that now. I had aggressive system keepalive timers before:

net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 5

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Thursday, April 21, 2016 at 11:50 AM
To: Ajay Kalambur <akala...@cisco.com>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Do you have rabbitmq/oslo messaging heartbeats enabled? If you aren't using heartbeats, it will take a long time for the nova-compute agent to figure out that it is actually no longer attached to anything. Heartbeat does periodic checks against rabbitmq and will catch this state and reconnect.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Ajay Kalambur (akalambu)" <akala...@cisco.com>
Date: Thursday, April 21, 2016 at 11:43 AM
To: openstack-operators@lists.openstack.org
Subject: [Openstack-operators] [oslo]nova compute reconnection Issue Kilo

Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some computes report down forever. I need to restart the compute service on the compute node to recover. Looks like oslo is not reconnecting in nova-compute. Here is the trace from nova-compute:

2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     message = self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     'to message ID %s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.driv
[Openstack-operators] [oslo]nova compute reconnection Issue Kilo
Hi
I am seeing on Kilo that if I bring down one controller node, sometimes some computes report down forever. I need to restart the compute service on the compute node to recover. Looks like oslo is not reconnecting in nova-compute. Here is the trace from nova-compute:

2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=self.retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     timeout=timeout, retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     retry=retry)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     message = self.waiters.get(msg_id, timeout=timeout)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db     'to message ID %s' % msg_id)
2016-04-19 20:25:39.090 6 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID e064b5f6c8244818afdc5e91fff8ebf1

Any thoughts? I am at stable/kilo for oslo.

Ajay
[Openstack-operators] [openstack-operators] Fernet key rotation
Hi,

In a multi-node HA deployment for production, does key rotation require a keystone process restart, or should we just run the fernet rotate on one node and distribute the keys without restarting any process? I presume keystone can handle the rotation without a restart? I also assume this key rotation can happen without a maintenance window.

What do folks typically do in production, and how often do you rotate keys?

Ajay
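For what it's worth, the usual pattern (no keystone restart, since keys are re-read from disk) is to rotate on one node and sync to the rest; the host names below are assumptions, and the path is keystone's default key repository:

```shell
# Rotate on a single "master" node.
keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone

# Distribute the rotated keys to the other controllers (hosts are hypothetical).
for host in controller2 controller3; do
    rsync -a --delete /etc/keystone/fernet-keys/ $host:/etc/keystone/fernet-keys/
done
```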
[Openstack-operators] Keystone token HA
Hi,

If we deploy keystone using memcached as the token backend, we see that bringing down 1 of 3 memcache servers results in some tokens getting invalidated. Does memcached not support replication of tokens? So if we want HA w.r.t. keystone tokens, should we use the SQL backend for tokens?

Ajay
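As background: memcached itself does no replication; the client hashes each token across the server list, so losing one server loses the tokens stored on it. A keystone.conf sketch of the two alternatives (driver names vary by release, so treat these as assumptions to check against your version's docs):

```
[token]
# Persist tokens in SQL so they survive a memcache node failure ...
driver = sql
# ... or switch to fernet tokens, which are not persisted at all.
# provider = fernet

[memcache]
servers = ctrl1:11211,ctrl2:11211,ctrl3:11211
```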
[Openstack-operators] Keystone audit logs with haproxy
Hi,

We have a deployment where keystone sits behind an haproxy node, so authentication requests are made to a VIP. The problem is that when there is an authentication failure, we cannot track the remote IP that failed the login: all authentication failures show the VIP address, since haproxy forwards the request to a backend keystone server.

How do we use a load balancer like haproxy and still track the remote IP for authentication failures? Right now every failure is logged with the VIP as the remote IP.

Ajay
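One common approach, assuming haproxy fronts keystone in HTTP mode: have haproxy append the client's real address as an X-Forwarded-For header, and log that header on the keystone side. A sketch (the `<vip>`/`<backend>` placeholders are hypothetical):

```
listen keystone_public
    mode http
    # Append the client's real IP as an X-Forwarded-For header so the
    # backend (keystone / its WSGI server) can log it instead of the VIP.
    option forwardfor
    bind <vip>:5000
    server ctrl1 <backend1>:5000 check
```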
[Openstack-operators] Rabbit HA queues
Hi,

How is the rabbit_ha_queues parameter used in configuration files like nova.conf, neutron.conf, cinder.conf, etc.? What happens if on the rabbit node the queues are mirrored by policy, but rabbit_ha_queues is set to False on the client side?

[root@j10-controller-1 /]# rabbitmqctl list_policies
Listing policies ...
/ ha-all all {"ha-mode":"all","ha-sync-mode":"automatic"} 0
...done.
[root@j10-controller-1 /]

I have rabbit_ha_queues=False set in nova.conf and neutron.conf, and from what I can see the queues seem to be mirrored anyway. So why is this needed in nova.conf, neutron.conf, etc.? This is the Juno release.
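For context (my understanding, worth verifying against your oslo.messaging version): rabbit_ha_queues=True makes the client declare queues with the old pre-RabbitMQ-3.0 `x-ha-policy` queue argument, whereas since RabbitMQ 3.0 mirroring is governed entirely by server-side policies like the one in the listing above, which is why the queues are mirrored regardless of the client-side setting. The server-side equivalent of that policy:

```shell
# Mirror all queues to all nodes and sync new mirrors automatically
# (same effect as the ha-all policy shown in the rabbitmqctl listing).
rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
```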
[Openstack-operators] Control exchange configuration
Hi,

When we configure the control_exchange parameter in each of the openstack components, it defaults to "openstack". Is there a recommendation to have separate exchanges per component, or should we just use the shared "openstack" exchange for rabbit? Is there any impact of using one vs. the other?

Ajay
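For illustration, per-component exchanges would be configured like this (a sketch; leaving the option unset keeps every service on the shared default exchange):

```
# nova.conf
[DEFAULT]
control_exchange = nova

# neutron.conf
[DEFAULT]
control_exchange = neutron
```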