Re: RFC: HAProxy shared health-check for nbproc > 1

2017-02-14 Thread Willy Tarreau
On Wed, Feb 15, 2017 at 10:58:02AM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
> 
> Thanks for your comments; I did not realize that this had been discussed
> earlier.
> 
> Let me go through your feedback and get back to you. Sorry that I am taking
> time on this, but it is due to work-related reasons.

No problem, you're welcome; work is preventing all of us from participating
as much as we'd like to :-)

Willy



Re: RFC: HAProxy shared health-check for nbproc > 1

2017-02-14 Thread Krishna Kumar (Engineering)
Hi Willy,

Thanks for your comments; I did not realize that this had been discussed
earlier.

Let me go through your feedback and get back to you. Sorry that I am taking
time on this, but it is due to work-related reasons.

Regards
- Krishna



Re: RFC: HAProxy shared health-check for nbproc > 1

2017-02-14 Thread Willy Tarreau
Hi Krishna,

On Tue, Feb 14, 2017 at 12:45:31PM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
> 
> Some time back, I had worked on having health checks done by only one
> HAProxy process, and sharing this information with the other processes on
> an UP/DOWN event (tested with 64 processes). Before I finish it completely,
> I wanted to check with you whether this feature is useful. At that time, I
> was able to propagate the status to all processes on UP/DOWN, and the state
> of the servers in the other haproxy processes changed accordingly.
> 
> The implementation was as follows:
> 
> - For a backend section that requires shared health checks (and which has
>   nbproc > 1), add a new option specifying that the HC is "shared", with an
>   argument giving the multicast address used to send/receive HC messages.
>   Use a different unique MC address for each backend section.
> - Process #0 becomes the Master process while the others are Slaves for HC.
> - Processes #1 to #n-1 listen on the MC address (all via the existing
>   generic epoll API).
> - When the Master finds that a server has gone UP or DOWN, it sends the
>   information from "struct check", along with the proxy-id and server-id,
>   on the MC address.
> - When the Slaves receive this message, they find the correct server and
>   update their notion of its health (each Slave gets the proxy as an
>   argument via the "struct dgram_conn" whenever this file descriptor is
>   ready for reading).
> 
> There may be other issues with this approach, including what happens during
> reload (not tested yet), support for non-epoll pollers, what happens if
> process #0 gets killed, or if an MC message is "lost", etc. One option is to
> have the HCs also done by the slaves at a much lower frequency to validate
> that things are sane. The CLI shows good HC values, but the GUI dashboard
> was showing a DOWN server in GREEN, and there were other minor things that
> were not fixed at that time.
> 
> Please let me know if this functionality/approach makes sense and adds
> value.

It's interesting that you worked on this; it is among the things we have
in the pipeline as well.

I have some comments, some of which overlap with what you already identified.
The use of multicast can indeed be an issue during reloads, and even when
dealing with multiple parallel instances of haproxy, requiring the ability
to configure the multicast group. Another option which seems reasonable is
to use pipes to communicate between processes (they could be socketpairs as
well, but pipes are even cheaper). And the nice thing is that you can then
even have full-mesh communications for free thanks to inheritance of the
FDs. Pipes do not provide atomicity in a full mesh, however, so you can end
up with some processes writing partial messages, immediately followed by
other partial messages. But with socketpairs and sendmsg() it's not an issue.
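
To illustrate what I mean, here is a rough, untested sketch of the socketpair
variant; all the names and the message layout below are made up for the
example:

    /* Untested sketch: one SOCK_DGRAM socketpair per worker, created in the
     * parent before fork() so every process inherits every fd (full mesh for
     * free). Worker <p> reads from hc_chan[p][0]; everyone else writes to
     * hc_chan[p][1]. All names are made up for the example.
     */
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    #define MAX_PROCS 64

    struct hc_update {
        uint32_t proxy_id;
        uint32_t server_id;
        uint8_t  result;       /* last check result */
        uint8_t  state;        /* resulting server state */
    };

    static int hc_chan[MAX_PROCS][2];

    /* create the channels in the parent, before forking the workers */
    static int hc_init_channels(int nbproc)
    {
        for (int p = 0; p < nbproc; p++)
            if (socketpair(AF_UNIX, SOCK_DGRAM, 0, hc_chan[p]) < 0)
                return -1;
        return 0;
    }

    /* any worker may push an update to worker <dest>; SOCK_DGRAM delivers
     * each message whole, so concurrent writers cannot interleave partial
     * messages the way plain pipes can.
     */
    static ssize_t hc_notify(int dest, const struct hc_update *upd)
    {
        struct iovec  iov = { .iov_base = (void *)upd, .iov_len = sizeof(*upd) };
        struct msghdr mh  = { .msg_iov = &iov, .msg_iovlen = 1 };

        return sendmsg(hc_chan[dest][1], &mh, MSG_DONTWAIT);
    }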

Another point is the fact that only one process runs the checks. As you
mentioned, there are some drawbacks. But there are other ones as well, such
as the impossibility for a "slave" process to decide to turn a server down
or to switch to fastinter after an error on regular traffic when options
like "observe layer7 on-error shutdown-server" are enabled. In my opinion
this is the biggest issue.

However, there is a solution to let every process update the state for all
other processes, and it is not very complicated. The principle is that
before sending a health check, each process just has to verify whether the
last check is still fresh, and to only run the check when it is not fresh
anymore. This way, all processes still have their health check tasks, but
when it is their turn to run, most of them realize they don't need to start
a check and can be rescheduled.
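
To give an idea, the freshness test could look roughly like the sketch
below; the shared per-server slot and the helper names are purely
hypothetical:

    /* Sketch only: assumes one such slot per server in a shared memory
     * segment visible to all processes. All names are hypothetical.
     */
    #include <stdint.h>

    struct shared_check {
        uint32_t last_start_ms;   /* when the last check was started */
        uint8_t  last_result;     /* result of that check */
    };

    /* hypothetical helpers, provided elsewhere */
    extern void    reschedule_in_ms(uint32_t delay_ms);
    extern uint8_t run_health_check(void);

    /* called whenever this process's check task for the server wakes up */
    static void maybe_run_check(struct shared_check *sc, uint32_t now_ms,
                                uint32_t inter_ms)
    {
        uint32_t age = now_ms - sc->last_start_ms;

        if (age < inter_ms) {
            /* another process checked recently enough: skip this round and
             * wake up again once the shared result will have become stale */
            reschedule_in_ms(inter_ms - age);
            return;
        }

        /* stale: claim this round, run the check and publish the result */
        sc->last_start_ms = now_ms;
        sc->last_result   = run_health_check();
        reschedule_in_ms(inter_ms);
    }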

We have already given some thought to this mechanism for use with the peers
protocol, so that multiple LB nodes can share their checks; the principle
with inter-process communications could very well be the same here.

It's worth noting that with a basic synchronization (i.e. "here's my check
result"), there will still be some occasional overlapping checks between a
few processes which decide to start at the exact same time. But that's a
minor issue which can easily be addressed by increasing the spread-checks
setting so that all of them quickly become uniformly spread over the check
period. Another approach, which I don't like much, consists of having two
steps: "I'm starting a check" and "here's the result". The problem is that
we'll have to deal with the case where a process dies between the two.
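
For example, something along these lines in the global section should be
enough to de-synchronize the processes (the percentage itself is just an
example):

    global
        # add up to 20% of random jitter between consecutive checks so that
        # the processes quickly stop starting their checks at the same time
        spread-checks 20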

Anyway, even with your multicast socket you should be able to implement it
this way, so that any process can update the check status for all the
others. That alone will already solve a lot of issues, including the impact
of a lost message. Please note however that it's important to spread each
check's result, not only the server state, so that fastinter etc. can be
applied.

Thanks!
Willy



RFC: HAProxy shared health-check for nbproc > 1

2017-02-13 Thread Krishna Kumar (Engineering)
Hi Willy,

Some time back, I had worked on having health checks done by only one
HAProxy process, and sharing this information with the other processes on
an UP/DOWN event (tested with 64 processes). Before I finish it completely,
I wanted to check with you whether this feature is useful. At that time, I
was able to propagate the status to all processes on UP/DOWN, and the state
of the servers in the other haproxy processes changed accordingly.

The implementation was as follows:

- For a backend section that requires shared health checks (and which has
  nbproc > 1), add a new option specifying that the HC is "shared", with an
  argument giving the multicast address used to send/receive HC messages.
  Use a different unique MC address for each backend section.
- Process #0 becomes the Master process while the others are Slaves for HC.
- Processes #1 to #n-1 listen on the MC address (all via the existing generic
  epoll API).
- When the Master finds that a server has gone UP or DOWN, it sends the
  information from "struct check", along with the proxy-id and server-id,
  on the MC address (a rough sketch of such a message follows below).
- When the Slaves receive this message, they find the correct server and
  update their notion of its health (each Slave gets the proxy as an
  argument via the "struct dgram_conn" whenever this file descriptor is
  ready for reading).
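
For reference, here is a rough sketch of what such an MC message and the
Master's send path could look like; the structure layout and names are
illustrative only, not the actual patch:

    /* Rough sketch only -- structure layout and names are illustrative.
     * One such message is sent per UP/DOWN event.
     */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>

    struct hc_mc_msg {
        uint32_t proxy_id;     /* which backend section */
        uint32_t server_id;    /* which server within that backend */
        uint8_t  state;        /* new state: UP or DOWN */
        uint8_t  result;       /* last check result, copied from struct check */
    };

    /* Master side: announce an UP/DOWN transition on the backend's MC group */
    static int hc_mc_announce(int fd, const char *mc_addr, uint16_t mc_port,
                              const struct hc_mc_msg *msg)
    {
        struct sockaddr_in dst;

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port   = htons(mc_port);
        if (inet_pton(AF_INET, mc_addr, &dst.sin_addr) != 1)
            return -1;

        return sendto(fd, msg, sizeof(*msg), 0,
                      (struct sockaddr *)&dst, sizeof(dst));
    }

    /* Slave side (not shown): join the group with IP_ADD_MEMBERSHIP, register
     * the fd with the poller, and read one struct hc_mc_msg per wakeup.
     */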

There may be other issues with this approach, including what happens during
reload (not tested yet), support for non-epoll pollers, what happens if
process #0 gets killed, or if an MC message is "lost", etc. One option is to
have the HCs also done by the slaves at a much lower frequency to validate
that things are sane. The CLI shows good HC values, but the GUI dashboard
was showing a DOWN server in GREEN, and there were other minor things that
were not fixed at that time.

Please let me know if this functionality/approach makes sense and adds
value.

Thanks,
- Krishna