Hi Willy,

Thanks for your comments; I did not realize that this was discussed earlier.
Let me go through your feedback and get back to you. Sorry that I am taking time on this; it is due to work-related reasons.

Regards,
Krishna

On Tue, Feb 14, 2017 at 2:44 PM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Krishna,
>
> On Tue, Feb 14, 2017 at 12:45:31PM +0530, Krishna Kumar (Engineering) wrote:
> > Hi Willy,
> >
> > Some time back, I had worked on making health checks be done by only one
> > HAProxy process, and on sharing this information with the other processes
> > on an UP/DOWN event (tested with 64 processes). Before I finish it
> > completely, I wanted to check with you whether this feature is useful. At
> > that time, I was able to propagate the status to all processes on
> > UP/DOWN, and the state of the servers on the other haproxy processes
> > changed accordingly.
> >
> > The implementation was as follows:
> >
> > - For a backend section that requires shared health checks (and which has
> >   nbproc > 1), add a new option specifying that the HC is "shared", with
> >   an argument which is a multicast address used to send/receive HC
> >   messages. Use a different unique MC address for each backend section.
> > - Process #0 becomes the Master process while the others are Slaves for
> >   HC.
> > - Processes #1 to #n-1 listen on the MC address (all via the existing
> >   generic epoll API).
> > - When the Master finds that a server has gone UP or DOWN, it sends the
> >   information from "struct check", along with the proxy-id and server-id,
> >   on the MC address.
> > - When the Slaves receive this message, they find the correct server and
> >   update their notion of its health (each Slave gets the proxy as an
> >   argument via the "struct dgram_conn" whenever this file descriptor is
> >   ready for reading).
> >
> > There may be other issues with this approach, including what happens
> > during reload (not tested yet), support for non-epoll, what happens if
> > process #0 gets killed, or if the MC message is "lost", etc.
> > One option is to have HCs done by the Slaves at a much lower frequency
> > to validate that things are sane. The CLI shows good HC values, but the
> > GUI dashboard was showing a DOWN server in GREEN, and there were other
> > minor things that were not fixed at that time.
> >
> > Please let me know if this functionality/approach makes sense and adds
> > value.
>
> It's interesting that you worked on this; it is among the things we have
> in the pipe as well.
>
> I have some comments, some of which overlap with what you already
> identified. The use of multicast can indeed be an issue during reloads,
> and even when dealing with multiple parallel instances of haproxy, since
> it requires the ability to configure the multicast group. Another option
> which seems reasonable is to use pipes to communicate between processes
> (it can be socketpairs as well, but pipes are even cheaper). And the nice
> thing is that you can then even have full-mesh communications for free
> thanks to inheritance of the FDs. Pipes do not provide atomicity in
> full-mesh mode, however, so you can end up with some processes writing
> partial messages immediately followed by other partial messages. But with
> socketpairs and sendmsg() it's not an issue.
>
> Another point is the fact that only one process runs the checks. As you
> mentioned, there are some drawbacks, but there are even other ones, such
> as the impossibility for a "slave" process to decide to turn a server
> down, or to switch to fastinter after an error on regular traffic, when
> options like "observe layer7 on-error shutdown-server" are enabled. In my
> opinion this is the biggest issue.
>
> However, there is a solution to let every process update the state for
> all the other processes, and it's not very complicated. The principle is
> that before sending a health check, each process just has to verify
> whether the last check is still fresh, and to only run the check when it
> is not fresh anymore.
> This way, all processes still have their health check tasks, but when
> it's their turn to run, most of them realize they don't need to start a
> check and can be rescheduled.
>
> We already gave some thought to this mechanism for use with the peers
> protocol, so that multiple LB nodes can share their checks; the principle
> with inter-process communications could very well be the same here.
>
> It's worth noting that with basic synchronization (i.e. "here's my check
> result"), there will still be some occasional overlapping checks between
> a few processes which decide to start at the exact same time. But that's
> a minor issue which can easily be addressed by increasing the
> spread-checks setting so that all of them quickly become uniformly spread
> over the check period. Another approach, which I don't like much,
> consists in having two steps: "I'm starting a check" and "here's the
> result". The problem is that we would have to deal with the case where a
> process dies between the two.
>
> Anyway, even with your multicast socket you should be able to implement
> it this way, so that any process can update the check status for all the
> others. It will already solve a lot of issues, including the impact of a
> lost message. Please note, however, that it's important to spread each
> check's result, not only the server state, so that fastinter etc. can be
> applied.
>
> Thanks!
> Willy