What do you see in the Neutron server logs while it's not responding?

On Tue, Feb 28, 2017 at 1:27 AM, Satyanarayana Patibandla <[email protected]> wrote:
> Hi Kevin,
>
> Thanks for your suggestion. I will modify the parameter value and test the changes.
>
> Could you please also suggest how to recover to a normal state after hitting this error? Once we get it, the neutron CLI returns "504 gateway timeout". We tried restarting all of the neutron-server and neutron-agent containers, but we still get the same "504 gateway timeout" error. Every time, we have to reimage the servers and redeploy from scratch to make the neutron CLI work again.
>
> Thanks,
> Satya.P
>
> On Tue, Feb 28, 2017 at 2:18 PM, Kevin Benton <[email protected]> wrote:
>>
>> That particular UPDATE query is issued by the agent state report handler, and it looks like the state reports might be falling behind, judging by the gap between the timestamp it is trying to write to the DB (14:46:35) and the log statement (14:50:29).
>>
>> Can you try increasing the rpc_state_report_workers value? If you haven't modified it, the default is only 1. You can probably cut the number of RPC workers down to make up for the difference.
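>>
>> For example, something along these lines in neutron.conf (the numbers here are only illustrative -- pick them based on how many cores your controllers have):
>>
>>     [DEFAULT]
>>     # dedicated workers for agent state reports, so heartbeats don't
>>     # queue up behind regular RPC traffic (the default is 1)
>>     rpc_state_report_workers = 4
>>     # trim the general RPC workers to keep the total process count
>>     # roughly the same
>>     rpc_workers = 8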
>>
>> On Mon, Feb 27, 2017 at 11:39 AM, Satyanarayana Patibandla <[email protected]> wrote:
>>>
>>> Hi Kevin,
>>>
>>> After increasing the parameter values mentioned in the mail below, we were able to create a few hundred VMs with no neutron-related errors. Our environment contains multiple regions, and one of our team members mistakenly ran the tempest tests for all OpenStack services against this site. After the tempest run we saw the "504 gateway timeout" error again, and this time the neutron CLI stayed unresponsive even after restarting all of the neutron agent containers.
>>>
>>> We ran SHOW PROCESSLIST in MySQL and can see a lock on the query against the agents table.
>>>
>>> In the logs we see the error below:
>>>
>>> 2017-02-27 14:50:29.085 38 ERROR oslo_messaging.rpc.server DBDeadlock: (pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') [SQL: u'UPDATE agents SET heartbeat_timestamp=%(heartbeat_timestamp)s WHERE agents.id = %(agents_id)s'] [parameters: {'heartbeat_timestamp': datetime.datetime(2017, 2, 27, 14, 46, 35, 229400), 'agents_id': u'94535d12-4b04-42c2-8a74-f2358db41634'}]
>>>
>>> We are using stable/ocata code in our environment. We had to reimage and redeploy all the nodes to continue our testing. Could you please let us know your thoughts on the above issue?
>>>
>>> Thanks,
>>> Satya.P
>>>
>>> On Mon, Feb 27, 2017 at 12:32 PM, Satyanarayana Patibandla <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We increased api_workers, rpc_workers and metadata_workers based on the number of cores on the controller node (the worker count is half the number of cores, i.e. with 24 cores we run 12 workers for each). We also increased rpc_connect_timeout to 180 and rpc_response_timeout to 600. So far these settings seem fine.
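>>>>
>>>> Concretely, on a 24-core controller that works out to roughly the following (shown only to illustrate the half-the-cores rule -- double-check option and section placement against the configuration reference for your branch):
>>>>
>>>>     # neutron.conf on the controllers
>>>>     [DEFAULT]
>>>>     api_workers = 12
>>>>     rpc_workers = 12
>>>>     rpc_response_timeout = 600
>>>>     # plus rpc_connect_timeout = 180, set wherever your
>>>>     # deployment exposes it
>>>>
>>>>     # metadata_agent.ini on the nodes running neutron-metadata-agent
>>>>     [DEFAULT]
>>>>     metadata_workers = 12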
>>>>
>>>> Let me know if you have any comments or suggestions about these values.
>>>>
>>>> Thanks,
>>>> Satya.P
>>>>
>>>> On Mon, Feb 27, 2017 at 11:16 AM, Kevin Benton <[email protected]> wrote:
>>>>>
>>>>> Thanks for following up. Would you mind sharing the parameters you had to tune (db pool limits, etc.) just in case someone comes across this same thread in a Google search?
>>>>>
>>>>> Thanks,
>>>>> Kevin Benton
>>>>>
>>>>> On Sun, Feb 26, 2017 at 8:48 PM, Satyanarayana Patibandla <[email protected]> wrote:
>>>>>>
>>>>>> Hi Saverio,
>>>>>>
>>>>>> The issue seems to be related to neutron tuning. We observed the same issue with the stable/ocata branch code, and after tuning a few neutron parameters it is working fine. Thanks for your suggestion.
>>>>>>
>>>>>> Thanks,
>>>>>> Satya.P
>>>>>>
>>>>>> On Wed, Feb 22, 2017 at 10:10 AM, Satyanarayana Patibandla <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Saverio,
>>>>>>>
>>>>>>> Thanks for your inputs. We will test with the stable/ocata branch code and share the result.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Satya.P
>>>>>>>
>>>>>>> On Wed, Feb 22, 2017 at 1:54 AM, Saverio Proto <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I would use at least the stable/ocata branch. Master is not supposed to be stable, and I am also not sure you can file a bug against a specific commit on master.
>>>>>>>>
>>>>>>>> Saverio
>>>>>>>>
>>>>>>>> 2017-02-21 21:12 GMT+01:00 Satyanarayana Patibandla <[email protected]>:
>>>>>>>> > Hi Saverio,
>>>>>>>> >
>>>>>>>> > We tried to create 20 VMs at a time using a heat template, with a 1 second gap between each VM creation request. When we reached 114 VMs we got the error mentioned in the mail below. The heat template boots each instance from a volume and assigns a floating IP to it.
>>>>>>>> >
>>>>>>>> > We restarted all of the neutron agent containers on the network and compute nodes; the only one we did not restart was the neutron-server container. We are using kolla to deploy the OpenStack services.
>>>>>>>> >
>>>>>>>> > We are deploying from OpenStack master branch code that is about one month old.
>>>>>>>> >
>>>>>>>> > Please find the error logs at the link below.
>>>>>>>> > http://paste.openstack.org/show/599892/
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Satya.P
>>>>>>>> >
>>>>>>>> > On Wed, Feb 22, 2017 at 12:21 AM, Saverio Proto <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >> Hello Satya,
>>>>>>>> >>
>>>>>>>> >> I would file a bug on Launchpad for this issue. 114 VMs is not much. Can you identify how to trigger the issue so it can be reproduced, or does it just happen randomly?
>>>>>>>> >>
>>>>>>>> >> When you say rebooting the network node, do you mean the server running the neutron-server process?
>>>>>>>> >>
>>>>>>>> >> What version and distribution of OpenStack are you using?
>>>>>>>> >>
>>>>>>>> >> Thank you
>>>>>>>> >>
>>>>>>>> >> Saverio
>>>>>>>> >>
>>>>>>>> >> 2017-02-21 13:54 GMT+01:00 Satyanarayana Patibandla <[email protected]>:
>>>>>>>> >> > Hi All,
>>>>>>>> >> >
>>>>>>>> >> > We are trying to deploy OpenStack in our production environment. For networking we are using DVR without L3 HA. We are able to create 114 VMs without any issue; after that we get the error below:
>>>>>>>> >> >
>>>>>>>> >> > Error: <html><body><h1>504 Gateway Time-out</h1> The server didn't respond in time. </body></html>
>>>>>>>> >> >
>>>>>>>> >> > The Neutron services are freezing up due to a persistent lock on the agents table; it seems one of the network nodes is holding the lock on that table. After rebooting that network node, the neutron CLI was responsive again.
>>>>>>>> >> >
>>>>>>>> >> > The neutron agents and neutron-server are throwing the errors below.
>>>>>>>> >> >
>>>>>>>> >> > Neutron-server errors:
>>>>>>>> >> > ERROR oslo_db.sqlalchemy.exc_filters "Can't reconnect until invalid "
>>>>>>>> >> > ERROR oslo_db.sqlalchemy.exc_filters InvalidRequestError: Can't reconnect until invalid transaction is rolled back
>>>>>>>> >> > ERROR neutron.api.v2.resource [req-24fa6eaa-a9e0-4f55-97e0-59db203e72c6 3eb776587c9c40569731ebe5c3557bc7 f43e8699cd5a46e89ffe39e3cac75341 - - -] index failed: No details.
>>>>>>>> >> > ERROR neutron.api.v2.resource DBError: Can't reconnect until invalid transaction is rolled back
>>>>>>>> >> >
>>>>>>>> >> > Neutron agent errors:
>>>>>>>> >> > MessagingTimeout: Timed out waiting for a reply to message ID 40638b6bf12c44cd9a404ecaa14a9909
>>>>>>>> >> >
>>>>>>>> >> > Could you please provide your valuable inputs or suggestions for the above errors?
>>>>>>>> >> >
>>>>>>>> >> > Thanks,
>>>>>>>> >> > Satya.P
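
P.S. If it wedges again, it would also be worth capturing what the database itself sees before you reimage. On MySQL/MariaDB, the usual InnoDB views (nothing Neutron-specific, and only a suggestion -- adjust for whatever database your deployment runs) should show which transactions are stuck on the agents table and what they are waiting behind:

    SHOW PROCESSLIST;
    SHOW ENGINE INNODB STATUS\G
    SELECT * FROM information_schema.INNODB_TRX\G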
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
