[bringing a side email conversation onto the main mailing list]
I have been looking into the issue with neutron_fwaas having an error
when running under the neutron-l3-agent on Debian when using wsgi.
Here's what I have tracked it down to at this point. I am going to lay
it all out there, including points that you already know, because I am
going to bring in another party or two at this point.
To make sure we are on solid ground, let me restate what are the
parameters of the problem:
1. The error does not occur when neutron_fwaas is disabled.
2. The error does not occur if wsgi is in use. If standard eventlet is
used, the error is not observed.
3. The error only occurs on debian; centos and ubuntu do not manifest
the problem.
As the neutron-l3-agent loads, it is trying to initialize the fwaas_v2
driver. The driver initializes without incident, and then proceeds to
attempt to fetch firewall groups. Note that you do not need to exercise
tempest to see this behavior; it is visible in the logs without anything
else going on. Running pdb, I was able to trace the attempt to send the
message deep into the RPC transmission code; I saw very little there to
be suspicious of.
2018-06-20 21:06:34.761 915007 DEBUG
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-]
Fetch firewall groups from plugin get_firewall_groups_for_project
/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py:44
...
2018-06-20 21:07:05.551 915007 ERROR neutron.common.rpc [-] Timeout in RPC
method get_firewall_groups_for_project. Waiting for 10 seconds before next
attempt. If the server is not down, consider increasing the
rpc_response_timeout option as Neutron server(s) may be overloaded and unable
to respond quickly enough.:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
to message ID 8616e98dd8d943eea1dcf99c04bd2be6
You can see, the RPC message goes into the ether and does not return. This
results in the stacktraces in neutron-l3-agent.log. This example is from a
later transaction.
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-]
FWaaS router add RPC info call failed for 8c13b5d7-7b93-4b91-ae4c-c387abe96734:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to
message ID 44851518f5ee4d40a2cdbcabc27e3c92
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
Traceback (most recent call last):
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line
324, in get
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
return self._queues[msg_id].get(block=True, timeout=timeout)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/eventlet/queue.py", line 313, in get
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
return waiter.wait()
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
return get_hub().switch()
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 294, in switch
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
return self.greenlet.switch()
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
queue.Empty
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 During
handling of the above exception, another exception occurred:
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
Traceback (most recent call last):
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py",
line 292, in add_router
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
self._process_router_update(context, new_router)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py",
line 256, in _process_router_update
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
fwg_list = self.fwplugin_rpc.get_firewall_groups_for_project(ctx)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py",
line 45, in get_firewall_groups_for_project
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
host=self.host)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 185, in call
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
time.sleep(wait)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
self.force_reraise()
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in
force_reraise
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
six.reraise(self.type_, self.value, self.tb)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/six.py", line 686, in reraise
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
raise value
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 162, in call
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
return self._original_context.call(ctxt, method, **kwargs)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
retry=self.retry)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 131, in _send
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
timeout=timeout, retry=retry)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line
559, in send
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
retry=retry)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line
548, in _send
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
result = self._waiter.wait(msg_id, timeout)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line
440, in wait
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
message = self.waiters.get(msg_id, timeout=timeout)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line
328, in get
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 'to
message ID %s' % msg_id)
2018-06-20 21:23:24.772 919205 ERROR
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to
message ID 44851518f5ee4d40a2cdbcabc27e3c92
So it seems the RPC message for get_firewall_groups_for_project gets
lost. I can tell that there is no shortage of AMQP messaging occurring
between the neutron-l3-agent and RabbitMQ. Looking at the RabbitMQ
error log, I am concerned that the message is not even really being
transmitted, because RabbitMQ is registering premature disconnections
from only the neutron_l3_agent at times that correspond to this testing.
=WARNING REPORT==== 20-Jun-2018::19:42:41 ===
closing AMQP connection <0.16171.21> (127.0.0.1:47942 -> 127.0.0.1:5671
- neutron-l3-agent:804510:ecc4ca7d-361f-49f7-94f3-9b4a07d102fc):
client unexpectedly closed TCP connection
=WARNING REPORT==== 20-Jun-2018::19:42:41 ===
closing AMQP connection <0.16110.21> (127.0.0.1:47934 -> 127.0.0.1:5671
- neutron-l3-agent:804510:8974dfd0-cb22-4925-bf95-ed5dcd905e41):
client unexpectedly closed TCP connection
=WARNING REPORT==== 20-Jun-2018::19:42:41 ===
closing AMQP connection <0.15410.21> (127.0.0.1:47814 -> 127.0.0.1:5671
- neutron-l3-agent:804510:ce915f0c-e46d-442f-aba2-5c204589fb0f):
client unexpectedly closed TCP connection
=WARNING REPORT==== 20-Jun-2018::19:42:41 ===
closing AMQP connection <0.15388.21> (127.0.0.1:47810 -> 127.0.0.1:5671
- neutron-l3-agent:804510:ab9b2193-199b-4312-9eca-2fbfc6cf27cd):
client unexpectedly closed TCP connection
I will continue to debug the issue tomorrow. I see no lonkage at this
point with any of the previously listed constraints on this scenario.
So I am going to copy Brian Haley for his L3 expertise, as well as the 3
FWAaaS cores to see if this directs their thoughts in any particular
direction. I hope to continue the investigation tomorrow.
Thanks,
Nate Johnston
njohnston
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev