Unfortunately retry doesn't work in our case, as we run haproxy on 2
layers: frontend servers and backend servers (the latter to distribute
traffic among multiple processes on each server). When an app on a
server goes down, the haproxy on that server is still up and accepting
connections; the layer 7 HTTP checks from the frontend haproxy fail,
but the connections themselves never do. Since the backend haproxy
keeps accepting connections, the retry option never triggers.
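
For illustration, a rough sketch of that two-layer layout (names,
addresses and the check URL are illustrative, not from the actual
config):

    # frontend-layer haproxy: health-checks each backend server's haproxy at L7
    backend app_servers
        mode http
        option httpchk GET /health
        # The TCP connect to web1's haproxy still succeeds when the app behind
        # it is dead, so only the L7 check (eventually) marks the server down.
        server web1 10.0.0.1:80 check inter 2s fall 3
        server web2 10.0.0.2:80 check inter 2s fall 3

    # backend-layer haproxy on web1: distributes traffic among local processes
    listen app_fanout
        bind 10.0.0.1:80
        mode http
        server proc1 127.0.0.1:8001 check
        server proc2 127.0.0.1:8002 check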

-Patrick

------------------------------------------------------------------------
*From: *Baptiste <bed...@gmail.com>
*Sent: * 2014-02-24 07:18:00 E
*To: *Malcolm Turnbull <malc...@loadbalancer.org>
*CC: *Neil <n...@iamafreeman.com>, Patrick Hemmer
<hapr...@stormcloud9.net>, HAProxy <haproxy@formilux.org>
*Subject: *Re: Just a simple thought on health checks after a soft
reload of HAProxy....

> Hi Malcolm,
>
> Hence the retry and redispatch options :)
> I know it's a dirty workaround.
>
> Baptiste
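
A minimal sketch of that workaround (values and names are illustrative):

    backend app_servers
        mode http
        retries 3
        # If the connection attempt to the chosen server fails, haproxy
        # retries, and with redispatch it may pick another server instead.
        option redispatch
        server web1 10.0.0.1:80 check
        server web2 10.0.0.2:80 check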
>
>
> On Sun, Feb 23, 2014 at 8:42 PM, Malcolm Turnbull
> <malc...@loadbalancer.org> wrote:
>> Neil,
>>
>> Yes, peers are great for passing stick tables to the new HAProxy
>> instance, and any current connections bound to the old process will be
>> fine.
>> However, any new connections will hit the new HAProxy process, and if
>> the backend server is down but haproxy hasn't health-checked it yet,
>> then the user will hit a failed server.
>>
>>
>>
>> On 23 February 2014 10:38, Neil <n...@iamafreeman.com> wrote:
>>> Hello
>>>
>>> Regarding restarts, rather than cold starts: if you configure peers, the
>>> state from before the restart should be kept. The new process haproxy
>>> creates is automatically a peer to the existing process and gets the state
>>> as it was.
>>>
>>> Neil
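
A minimal peers sketch of that setup (names and addresses are
illustrative); the reloaded process pulls the stick-table contents from
the old one:

    peers lb_peers
        # local peer; the name must match the hostname or the -L argument
        peer lb1 10.0.0.10:1024

    backend app_servers
        mode http
        # stick-table state survives a soft reload via the peers section
        stick-table type ip size 100k expire 30m peers lb_peers
        stick on src
        server web1 10.0.0.1:80 check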
>>>
>>> On 23 Feb 2014 03:46, "Patrick Hemmer" <hapr...@stormcloud9.net> wrote:
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Sok Ann Yap <sok...@gmail.com>
>>>> Sent: 2014-02-21 05:11:48 E
>>>> To: haproxy@formilux.org
>>>> Subject: Re: Just a simple thought on health checks after a soft reload of
>>>> HAProxy....
>>>>
>>>> Patrick Hemmer <haproxy@...> writes:
>>>>
>>>>       From: Willy Tarreau <w <at> 1wt.eu>
>>>>
>>>>       Sent:  2014-01-25 05:45:11 E
>>>>
>>>> Till now that's exactly what's currently done. The servers are marked
>>>> "almost dead", so the first check gives the verdict. Initially we had
>>>> all checks started immediately. But it caused a lot of issues at several
>>>> places where there were a high number of backends or servers mapped to
>>>> the same hardware, because the rush of connections really caused the
>>>> servers to be flagged as down. So we started to spread the checks over
>>>> the longest check period in a farm.
>>>>
>>>>     Is there a way to enable this behavior? In my
>>>>     environment/configuration, it causes absolutely no issue that all
>>>>     the checks be fired off at the same time.
>>>>     As it is right now, when haproxy starts up, it takes it quite a
>>>>     while to discover which servers are down.
>>>>     -Patrick
>>>>
>>>> I faced the same problem in
>>>> http://thread.gmane.org/gmane.comp.web.haproxy/14644
>>>>
>>>> After much contemplation, I decided to just patch away the initial spread
>>>> check behavior:
>>>> https://github.com/sayap/sayap-overlay/blob/master/net-proxy/haproxy/files/haproxy-immediate-first-check.diff
>>>>
>>>>
>>>>
>>>> I definitely think there should be an option to disable the behavior. We
>>>> have an automated system which adds and removes servers from the config,
>>>> and then bounces haproxy. Every time haproxy is bounced, we have a period
>>>> where it can send traffic to a dead server.
>>>>
>>>>
>>>> There's also a related bug on this.
>>>> The bug is that with a config using "inter 30s fastinter 1s" and no
>>>> httpchk enabled, when haproxy first starts up it spreads the checks over
>>>> the period defined as fastinter, but the stats output says "UP 1/3" for
>>>> the full 30 seconds. It also says "L4OK in 30001ms", when I know it
>>>> doesn't take the server 30 seconds to simply accept a connection.
>>>> Yet you get different behavior when using httpchk. When I add "option
>>>> httpchk", it still spreads the checks over the 1s fastinter value, but the
>>>> stats output goes to a full "UP" immediately after the check occurs, not
>>>> "UP 1/3". It also says "L7OK/200 in 0ms", which is what I expect to see.
>>>>
>>>> -Patrick
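
For reference, a stripped-down config along the lines described above
(addresses are illustrative):

    backend app_servers
        mode http
        default-server inter 30s fastinter 1s
        # Without httpchk the checks are plain L4 connects; per the report,
        # stats shows "UP 1/3" and "L4OK in 30001ms" during the initial spread.
        # With the next line enabled, stats reportedly goes straight to "UP"
        # with "L7OK/200 in 0ms" after the first check.
        # option httpchk GET /health
        server web1 10.0.0.1:80 check
        server web2 10.0.0.2:80 check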
>>>>
>>
>>
>> --
>> Regards,
>>
>> Malcolm Turnbull.
>>
>> Loadbalancer.org Ltd.
>> Phone: +44 (0)870 443 8779
>> http://www.loadbalancer.org/
>>
