Re: RFC: HAProxy shared health-check for nbproc > 1
On Wed, Feb 15, 2017 at 10:58:02AM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
>
> Thanks for your comments, I did not realize that this was discussed earlier.
>
> Let me go through your feedback and get back. Sorry that I am taking time
> for this, but this is due to work related reasons.

No problem, you're welcome, work is preventing all of us from participating
as much as we'd like to :-)

Willy
Re: RFC: HAProxy shared health-check for nbproc > 1
Hi Willy,

Thanks for your comments, I did not realize that this was discussed earlier.
Let me go through your feedback and get back. Sorry that I am taking time
for this, but this is due to work related reasons.

Regards,
- Krishna

On Tue, Feb 14, 2017 at 2:44 PM, Willy Tarreau wrote:
> [...]
Re: RFC: HAProxy shared health-check for nbproc > 1
Hi Krishna,

On Tue, Feb 14, 2017 at 12:45:31PM +0530, Krishna Kumar (Engineering) wrote:
> [...]
It's interesting that you worked on this, this is among the things we have
in the pipe as well.

I have some comments, some of which overlap with what you already identified.
The use of multicast can indeed be an issue during reloads, and even when
dealing with multiple parallel instances of haproxy, requiring the ability
to configure the multicast group. Another option which seems reasonable is
to use pipes to communicate between processes (it can be socketpairs as well,
but pipes are even cheaper). And the nice thing is that you can then even
have full-mesh communications for free thanks to inheritance of the FDs.
Pipes do not provide atomicity in full-mesh however, so you can end up with
some processes writing partial messages, immediately followed by other
partial messages. But with socketpairs and sendmsg() it's not an issue.

Another point is the fact that only one process runs the checks. As you
mentioned, there are some drawbacks. But there are even other ones, such
as the impossibility for a "slave" process to decide to turn a server down,
or to switch to fastinter after an error on regular traffic when options
like "observe layer7 on-error shutdown-server" are enabled. In my opinion
this is the biggest issue.

However there is a solution to let every process update the state for all
the other processes, and it's not much more complicated. The principle is
that before sending a health check, each process just has to verify whether
the last check result is still fresh, and to only run the check when it is
not fresh anymore. This way, all processes still have their health-check
tasks, but when it's their turn to run, most of them realize they don't need
to start a check and can be rescheduled.

We already gave some thought to this mechanism for use with the peers
protocol, so that multiple LB nodes can share their checks; the principle
with inter-process communications could very well be the same here.
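The message-boundary property being relied on here (datagram socketpairs
cannot interleave partial messages the way pipes can in a full mesh) can be
demonstrated with a small standalone C sketch. This is not HAProxy code; the
message strings are placeholders for illustration:

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Demonstrates the property described above: on a datagram socketpair,
 * each send() delivers one whole message, so concurrent writers can
 * never interleave partial messages the way they can on a pipe. */
static int socketpair_demo(void)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    /* SOCK_DGRAM preserves message boundaries; SOCK_STREAM would not. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Two back-to-back messages... */
    if (send(sv[0], "check:up", 8, 0) != 8 ||
        send(sv[0], "check:down", 10, 0) != 10)
        return -1;

    /* ...come out as two distinct datagrams, never merged or split. */
    n = recv(sv[1], buf, sizeof(buf), 0);
    if (n != 8 || memcmp(buf, "check:up", 8) != 0)
        return -1;
    n = recv(sv[1], buf, sizeof(buf), 0);
    if (n != 10 || memcmp(buf, "check:down", 10) != 0)
        return -1;

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

The inherited FDs mentioned above would simply be such socketpairs created
before fork(), one per process pair, giving the full mesh for free.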
It's worth noting that with basic synchronization (i.e. "here's my check
result"), there will still be some occasional overlapping checks between
a few processes which decide to start at the exact same time. But that's
a minor issue which can easily be addressed by increasing the spread-checks
setting so that all of them quickly become uniformly spread over the check
period. Another approach, which I don't like much, consists in having two
steps: "I'm starting a check" and "here's the result". The problem is that
we'd have to deal with the case where a process dies between the two.

Anyway, even with your multicast socket you should be able to implement it
this way so that any process can update the check status for all the others.
It will already solve a lot of issues, including the impact of a lost
message. Please note however that it's important to propagate each check's
result, not only the server state, so that fastinter etc. can be applied.

Thanks!
Willy
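The freshness scheme described in this reply can be sketched as a tiny
predicate. The shared-state layout and field names below are assumptions
made for illustration, not HAProxy's actual structures:

```c
#include <stdint.h>

/* Hypothetical per-server state shared between processes (e.g. in a
 * shared-memory segment): last_check_ms is updated by whichever process
 * last ran the check. */
struct shared_check {
    uint64_t last_check_ms;  /* time of the last check, in milliseconds */
};

/* A result is "fresh" if it is younger than the check interval ("inter"). */
static int check_is_fresh(const struct shared_check *sc,
                          uint64_t now_ms, uint64_t inter_ms)
{
    return now_ms - sc->last_check_ms < inter_ms;
}

/* Called when a process's health-check task wakes up: run the check only
 * when the shared result has gone stale, then publish the new timestamp.
 * Note: this claim step is not race-free as written; a real version
 * would need an atomic compare-and-swap, or accept the rare overlap and
 * rely on spread-checks as described above. */
static int should_run_check(struct shared_check *sc,
                            uint64_t now_ms, uint64_t inter_ms)
{
    if (check_is_fresh(sc, now_ms, inter_ms))
        return 0;               /* someone else checked recently: skip */
    sc->last_check_ms = now_ms; /* claim this round */
    return 1;
}
```

Most wake-ups hit the "fresh" path and simply reschedule, which is exactly
the behaviour the reply describes.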
RFC: HAProxy shared health-check for nbproc > 1
Hi Willy,

Some time back, I had worked on making health checks be done by only one
HAProxy process, and on sharing this information on an UP/DOWN event with
the other processes (tested with 64 processes). Before I finish it
completely, I wanted to check with you whether this feature is useful. At
that time, I was able to propagate the status to all processes on UP/DOWN,
and the state of the servers on the other haproxy processes changed
accordingly.

The implementation was as follows:

- For a backend section that requires shared health checks (and which has
  nbproc > 1), add a new option specifying that the health check is
  "shared", with an argument which is a multicast address used to
  send/receive HC messages. Use a different unique MC address for each
  backend section.
- Process #0 becomes the Master process while the others are Slaves for HC.
- Processes #1 to #n-1 listen on the MC address (all via the existing
  generic epoll API).
- When the Master finds that a server has gone UP or DOWN, it sends the
  information from "struct check", along with the proxy-id and server-id,
  on the MC address.
- When the Slaves receive this message, they find the correct server and
  update its notion of health (each Slave gets the proxy as argument via
  the "struct dgram_conn" whenever this file descriptor is ready for
  reading).

There may be other issues with this approach, including what happens during
a reload (not tested yet), support for non-epoll pollers, what happens if
process #0 gets killed, or if the MC message is "lost", etc. One option is
to have HCs also done by the Slaves at a much lower frequency to validate
that things are sane. The CLI shows good HC values, but the GUI dashboard
was showing a DOWN server in GREEN color, and there were other minor things
that were not fixed at that time.

Please let me know if this functionality/approach makes sense, and adds
value.

Thanks,
- Krishna
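The mail does not specify the on-wire format of the multicast HC message;
as a rough sketch, assuming a minimal fixed-size message carrying the
proxy-id, server-id and check status (the struct and function names here
are made up for illustration), the Master's send side could look like this:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical on-wire health-check message: all fields travel in
 * network byte order so every process decodes them identically. */
struct hc_msg {
    uint32_t proxy_id;   /* which backend section */
    uint32_t server_id;  /* which server inside it */
    uint8_t  status;     /* e.g. 0 = DOWN, 1 = UP */
};

#define HC_MSG_WIRE_LEN 9

/* Encode the message into a fixed 9-byte buffer. */
static size_t hc_msg_encode(const struct hc_msg *m,
                            unsigned char buf[HC_MSG_WIRE_LEN])
{
    uint32_t pid = htonl(m->proxy_id);
    uint32_t sid = htonl(m->server_id);

    memcpy(buf, &pid, 4);
    memcpy(buf + 4, &sid, 4);
    buf[8] = m->status;
    return HC_MSG_WIRE_LEN;
}

/* Decode on the Slave side; returns 0 on success. */
static int hc_msg_decode(const unsigned char buf[HC_MSG_WIRE_LEN],
                         struct hc_msg *m)
{
    uint32_t pid, sid;

    memcpy(&pid, buf, 4);
    memcpy(&sid, buf + 4, 4);
    m->proxy_id  = ntohl(pid);
    m->server_id = ntohl(sid);
    m->status    = buf[8];
    return 0;
}

/* Master side: send one status change to the per-backend multicast group.
 * The group address and port are placeholders chosen by the caller. */
static int hc_msg_send(int fd, const char *group, uint16_t port,
                       const struct hc_msg *m)
{
    struct sockaddr_in dst;
    unsigned char buf[HC_MSG_WIRE_LEN];
    size_t len = hc_msg_encode(m, buf);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, group, &dst.sin_addr) != 1)
        return -1;
    return sendto(fd, buf, len, 0, (struct sockaddr *)&dst,
                  sizeof(dst)) == (ssize_t)len ? 0 : -1;
}
```

A Slave would recv() into the same buffer from its dgram_conn file
descriptor, call hc_msg_decode(), then look up the server by proxy-id and
server-id to apply the state change.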