----- Original Message -----
> One reason for not sending the heartbeat from a separate greenthread could be
> that the agent is already doing it [1].
> The current proposed patch addresses the issue blindly - that is to say
> before declaring an agent dead let's wait for some more time because it
> could be stuck doing stuff. In that case I would probably make the
> multiplier (currently 2x) configurable.
>
> The reason for which state report does not occur is probably that both it and
> the resync procedure are periodic tasks. If I got it right they're both
> executed as eventlet greenthreads but one at a time. Perhaps then adding an
> initial delay to the full sync task might ensure the first thing an agent
> does when it comes up is sending a heartbeat to the server?
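To make the multiplier idea concrete, here is a minimal sketch of a liveness check with a configurable multiplier. This is not the code from the proposed patch: the names is_agent_down and multiplier are made up for illustration, and 75 seconds is only a placeholder standing in for agent_down_time.

```python
from datetime import datetime, timedelta

# Placeholder value standing in for the agent_down_time setting (assumption).
AGENT_DOWN_TIME = 75  # seconds


def is_agent_down(last_heartbeat, now, agent_down_time=AGENT_DOWN_TIME,
                  multiplier=1):
    """Return True if no heartbeat arrived within the allowed window.

    `multiplier` is a hypothetical knob: values above 1 give a busy agent
    extra time before it is declared dead.
    """
    allowed = timedelta(seconds=agent_down_time * multiplier)
    return now - last_heartbeat > allowed


now = datetime(2015, 6, 4, 16, 0, 0)
# The agent last reported 100 seconds ago, e.g. while stuck in a long resync.
last_seen = now - timedelta(seconds=100)

print(is_agent_down(last_seen, now))                # plain check
print(is_agent_down(last_seen, now, multiplier=2))  # more tolerant check
```

With these numbers the plain check treats the agent as dead, while the 2x check still considers it alive. Making the multiplier an option would let operators with many resources tune how long a busy agent is tolerated.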
There's a patch related to this issue:

https://review.openstack.org/#/c/186584/

I commented there that, at least to me, it makes a lot of sense to insert a
report_state call in the after_start method, right after the agent
initializes but before it performs the first full sync. So, right here,
before line 560:

https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L560

That should help with *some* of the issues discussed in this thread, but not all.

>
> On the other hand, while doing the initial full resync, is the agent able to
> process updates? If not perhaps it makes sense to have it down until it
> finishes synchronisation.
>
> Salvatore
>
> [1]
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/agent/l3/agent.py#n587
>
> On 4 June 2015 at 16:16, Kevin Benton < blak...@gmail.com > wrote:
>
> Why don't we put the agent heartbeat into a separate greenthread on the agent
> so it continues to send updates even when it's busy processing changes?
> On Jun 4, 2015 2:56 AM, "Anna Kamyshnikova" < akamyshnik...@mirantis.com >
> wrote:
>
> Hi, neutrons!
>
> Some time ago I discovered a bug for l3 agent rescheduling [1]. When there
> are a lot of resources and agent_down_time is not big enough neutron-server
> starts marking l3 agents as dead. The same issue has been discovered and
> fixed for DHCP-agents. I proposed a change similar to those that were done
> for DHCP-agents. [2]
>
> There is no unified opinion on this bug and proposed change, so I want to ask
> developers whether it is worth continuing to work on this patch or not.
>
> [1] - https://bugs.launchpad.net/neutron/+bug/1440761
> [2] - https://review.openstack.org/171592
>
> --
> Regards,
> Ann Kamyshnikova
> Mirantis, Inc
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev