There is also another bug that could be linked to or marked as a duplicate of #1192381: https://bugs.launchpad.net/neutron/+bug/1185916. I proposed a fix, but it was not the right approach, so I abandoned it.
Édouard.

On Wed, Dec 4, 2013 at 10:43 PM, Carl Baldwin <c...@ecbaldwin.net> wrote:
> I have offered up https://review.openstack.org/#/c/60082/ as a
> backport to Havana. Interest was expressed in the blueprint for doing
> this even before this thread. If there is consensus for this as the
> stop-gap then it is there for the merging. However, I do not want to
> discourage discussion of other stop-gap solutions like what Maru
> proposed in the original post.
>
> Carl
>
> On Wed, Dec 4, 2013 at 9:12 AM, Ashok Kumaran <ashokkumara...@gmail.com> wrote:
>>
>> On Wed, Dec 4, 2013 at 8:30 PM, Maru Newby <ma...@redhat.com> wrote:
>>>
>>> On Dec 4, 2013, at 8:55 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
>>>
>>> > Stephen, all,
>>> >
>>> > I agree that there may be some opportunity to split things out a bit.
>>> > However, I'm not sure what the best way will be. I recall that Mark
>>> > mentioned breaking out the processes that handle API requests and RPC
>>> > from each other at the summit. Anyway, it is something that has been
>>> > discussed.
>>> >
>>> > I actually wanted to point out that the neutron server now has the
>>> > ability to run a configurable number of sub-processes to handle a
>>> > heavier load. Introduced with this commit:
>>> >
>>> > https://review.openstack.org/#/c/37131/
>>> >
>>> > Set api_workers to something > 1 and restart the server.
>>> >
>>> > The server can also be run on more than one physical host in
>>> > combination with multiple child processes.
>>>
>>> I completely misunderstood the import of the commit in question. Being
>>> able to run the wsgi server(s) out of process is a nice improvement, thank
>>> you for making it happen. Has there been any discussion around making the
>>> default for api_workers > 0 (at least 1) to ensure that the default
>>> configuration separates wsgi and rpc load?
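[For readers following along: the api_workers option Carl refers to is set in neutron.conf on the server. A minimal illustrative fragment, with the value chosen for example only — defaults and tuning differ between releases:]

```ini
# neutron.conf -- illustrative fragment, not a complete configuration
[DEFAULT]
# Number of child processes forked to serve API (WSGI) requests.
# 0 serves API requests in the main process, alongside RPC handling;
# values > 0 move API handling into separate worker processes.
api_workers = 4
```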
>>> This also seems like a great candidate for backporting to havana and
>>> maybe even grizzly, although api_workers should probably be defaulted
>>> to 0 in those cases.
>>
>> +1 for backporting the api_workers feature to havana as well as Grizzly :)
>>>
>>> FYI, I re-ran the test that attempted to boot 75 micro VM's simultaneously
>>> with api_workers = 2, with mixed results. The increased wsgi throughput
>>> resulted in almost half of the boot requests failing with 500 errors due to
>>> QueuePool errors (https://bugs.launchpad.net/neutron/+bug/1160442) in
>>> Neutron. It also appears that maximizing the number of wsgi requests has
>>> the side-effect of increasing the RPC load on the main process, and this
>>> means that the problem of dhcp notifications being dropped is little
>>> improved. I intend to submit a fix that ensures that notifications are
>>> sent regardless of agent status, in any case.
>>>
>>> m.
>>>
>>> > Carl
>>> >
>>> > On Tue, Dec 3, 2013 at 9:47 AM, Stephen Gran
>>> > <stephen.g...@theguardian.com> wrote:
>>> >> On 03/12/13 16:08, Maru Newby wrote:
>>> >>>
>>> >>> I've been investigating a bug that is preventing VM's from receiving
>>> >>> IP addresses when a Neutron service is under high load:
>>> >>>
>>> >>> https://bugs.launchpad.net/neutron/+bug/1192381
>>> >>>
>>> >>> High load causes the DHCP agent's status updates to be delayed,
>>> >>> causing the Neutron service to assume that the agent is down. This
>>> >>> results in the Neutron service not sending notifications of port
>>> >>> addition to the DHCP agent. At present, the notifications are simply
>>> >>> dropped. A simple fix is to send notifications regardless of agent
>>> >>> status. Does anybody have any objections to this stop-gap approach?
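[An aside for readers unfamiliar with the QueuePool errors Maru mentions: SQLAlchemy's QueuePool hands out a fixed number of database connections, and a checkout that arrives once the pool is exhausted times out and raises an error, which surfaces as a 500 to the API caller. The toy stand-in below uses only the standard library — the class and connection names are invented for illustration, not Neutron or SQLAlchemy code — to show the failure mode: more concurrent workers means more simultaneous checkouts against the same bounded pool.]

```python
import queue

class ToyConnectionPool:
    """Toy stand-in for SQLAlchemy's QueuePool: a bounded pool whose
    checkout fails once every connection is already handed out."""

    def __init__(self, size, timeout=0.1):
        self._pool = queue.Queue(maxsize=size)
        self._timeout = timeout
        for i in range(size):
            self._pool.put(f"conn-{i}")  # stand-ins for real DB connections

    def checkout(self):
        try:
            return self._pool.get(timeout=self._timeout)
        except queue.Empty:
            # SQLAlchemy raises a similar "QueuePool limit reached" timeout.
            raise TimeoutError("pool exhausted: all connections checked out")

    def checkin(self, conn):
        self._pool.put(conn)

# More concurrent API workers -> more simultaneous checkouts -> exhaustion.
pool = ToyConnectionPool(size=2)
held = [pool.checkout(), pool.checkout()]  # both connections now in use
try:
    pool.checkout()
except TimeoutError as exc:
    print(exc)  # pool exhausted: all connections checked out
```

[The usual mitigations are raising the pool size/overflow limits or shortening the time each request holds a connection; the stop-gap discussed in this thread does neither, which is why the boot failures persisted.]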
>>> >>> I'm not clear on the implications of sending notifications to agents
>>> >>> that are down, but I'm hoping for a simple fix that can be backported
>>> >>> to both havana and grizzly (yes, this bug has been with us that long).
>>> >>>
>>> >>> Fixing this problem for real, though, will likely be more involved.
>>> >>> The proposal to replace the current wsgi framework with Pecan may
>>> >>> increase the Neutron service's scalability, but should we continue to
>>> >>> use a 'fire and forget' approach to notification? Being able to track
>>> >>> the success or failure of a given action outside of the logs would
>>> >>> seem pretty important, and allow for more effective coordination with
>>> >>> Nova than is currently possible.
>>> >>
>>> >> It strikes me that we ask an awful lot of a single neutron-server
>>> >> instance - it has to take state updates from all the agents, it has
>>> >> to do scheduling, it has to respond to API requests, and it has to
>>> >> communicate about actual changes with the agents.
>>> >>
>>> >> Maybe breaking some of these out the way nova has a scheduler and a
>>> >> conductor and so on might be a good model (I know there are things
>>> >> people are unhappy about with nova-scheduler, but imagine how much
>>> >> worse it would be if it was built into the API).
>>> >>
>>> >> Doing all of those tasks, and doing it largely single threaded, is
>>> >> just asking for overload.
>>> >>
>>> >> Cheers,
>>> >> --
>>> >> Stephen Gran
>>> >> Senior Systems Integrator - theguardian.com
>>> >> Please consider the environment before printing this email.
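[The stop-gap Maru describes amounts to removing the agent-liveness check from the notification path. A hypothetical sketch — every function and field name below is invented for illustration, not Neutron's actual notifier API — of the dropped-notification bug and the proposed behaviour:]

```python
# Hypothetical sketch of the stop-gap; names are illustrative only.
def notify_port_created(agent, cast, port, check_liveness=False):
    """Send a port-created notification to a DHCP agent.

    With check_liveness=True (the buggy behaviour), the notification is
    silently dropped when the agent's delayed heartbeats make it look
    down. The stop-gap sends it regardless, since a busy-but-alive agent
    will still process the message once it catches up on its queue.
    """
    if check_liveness and not agent["alive"]:
        return False  # notification silently dropped -- the bug
    cast("port_create_end", {"port": port}, topic=agent["topic"])
    return True

sent = []
fake_cast = lambda method, payload, topic: sent.append((method, topic))
busy_agent = {"alive": False, "topic": "dhcp_agent.host-1"}  # heartbeats delayed

notify_port_created(busy_agent, fake_cast, {"id": "p1"}, check_liveness=True)
notify_port_created(busy_agent, fake_cast, {"id": "p1"})  # stop-gap: always send
print(len(sent))  # 1 -- only the stop-gap call got through
```

[The open question from the thread is the fire-and-forget part: even with the stop-gap, nothing tracks whether the agent ever acted on the cast.]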
>>> >>
>>> >> _______________________________________________
>>> >> OpenStack-dev mailing list
>>> >> OpenStack-dev@lists.openstack.org
>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev