Hi Simon!

Last week I found some time and a quiet place at work to dig into the master-worker patch series again.
I went back to what we discussed a few weeks ago concerning the pid management and the cmdline parsing, which must be performed only once, until I realized that I was having trouble again with pid management due to two things that we cannot easily cover:

- a new process starting with -sf which would replace the old processes running in master/worker mode.

- binding issues within the master process when reloading a configuration.

The first point causes a simple issue: if we start a new process which wants to replace the old ones that were working in master-worker mode, it will send them a SIGTTOU+SIGUSR1. The issue there is that the master should release all of its listening sockets, forward the signal to all children, then passively wait for all of them to leave (including old ones). This is not impossible but adds a bit of complexity, which I was planning on implementing anyway.

The second point is more complex. I realized that there normally is the same config in the master process and in all of its active children. What happens if the master fails to reload a configuration? I'm thinking about the worst issues, the ones related to binding IP:ports, which can only be detected by an active process. If the issue is just a conflict with another socket, then surely the new process can restart with -sf instead and get rid of the issue (back to point 1). But still, keeping a master process alive with a config that has nothing to do with what its children are doing is messy at best, and very dangerous at worst.

My long-term solution would be to have everything related to the configuration behind a pointer, and have that pointer referenced in sessions, health checks, etc. That way, each session would use the config it was instantiated with and it would be much easier to allow an old and a new conf to coexist. It would not completely fix the issues with some global settings though (eg: nbproc, pollers, tuning options, ...) but it would be a gain.
That way, if the new config fails, we can switch pointers back to the old config and forget everything that was attempted.

Then my thinking went a bit further: before doing what is described above, we could have the current master fork a new master which would handle the new config and start new processes. If for any reason that new process fails to start, it simply disappears and nothing changes. What I like with this method is that it also implicitly allows changing many global settings; even the master-worker mode may be changed. New sessions would simply use the new config from the new process and old sessions would remain on the old one. The only downside I can think of is that it will never provide any possibility to maintain stats or any state between the old and the new process, but that's secondary as the master-worker model is not meant for that either.

Another advantage I was seeing to forking a new master from the old one was that we could probably keep the soft-restart semantics for the situations where the new process cannot bind: it could send a SIGTTOU to old processes to make them relinquish the ports, and send a SIGTTIN in case of failure.

That thinking gave me another idea that I have not developed yet. The core of your work is the socket cache and is what makes the system reliable. One possibility would be that the master doesn't own any configuration at all, just the sockets. It would be the new processes that would connect to the socket cache to grab some ports, then fork the new workers as it is done today. I must say I'm not completely at ease with such a model because I think that an instantiator is needed, but I like the idea of an autonomous socket cache, which becomes sort of an interface between haproxy and the kernel. It will also help if one day we want to implement FTP support, as we'll have to be able to bind outgoing sockets to local port 20, and that could be performed by the central socket cache.
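
To make the autonomous socket cache idea a bit more concrete, here is a rough sketch of what the exchange between a new process and the cache could look like. It is only an illustration under several assumptions: it is written in Python for brevity (the real thing would be C using sendmsg() with SCM_RIGHTS), and the SocketCache class, the "ip:port" request format, the "OK"/"ERR" replies and the fetch() helper are all invented here and are not taken from the patch series. The cache binds each address once, keeps the listener, and passes a duplicate of the descriptor over a UNIX socket to whoever asks for it.

```python
import socket
import threading


class SocketCache:
    """Owns listening sockets and hands their descriptors to requesters.

    Hypothetical sketch: names and wire format are invented for illustration.
    """

    def __init__(self):
        self.listeners = {}  # (ip, port) -> listening socket, bound only once

    def get_listener(self, ip, port):
        """Return the cached listener for ip:port, binding it on first use."""
        key = (ip, port)
        if key not in self.listeners:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind(key)
                sock.listen(128)
            except OSError:
                sock.close()
                raise
            self.listeners[key] = sock
        return self.listeners[key]

    def serve_one(self, conn):
        """Answer one "ip:port" request received on a connected UNIX socket."""
        # Assumes the short request arrives in a single recv() call.
        ip, port = conn.recv(256).decode().rsplit(":", 1)
        try:
            listener = self.get_listener(ip, int(port))
        except OSError as exc:
            # The bind failure is detected in the cache and only reported;
            # the requester can give up without having disturbed anything.
            conn.sendall(b"ERR " + str(exc).encode())
            return
        # SCM_RIGHTS installs a duplicate of the descriptor in the receiver.
        socket.send_fds(conn, [b"OK"], [listener.fileno()])


def request_listener(conn, ip, port):
    """Requester side: ask the cache for ip:port and wrap the received fd."""
    conn.sendall(f"{ip}:{port}".encode())
    data, fds, _flags, _addr = socket.recv_fds(conn, 256, 1)
    if not fds:
        raise OSError(data.decode())
    return socket.socket(fileno=fds[0])


def fetch(cache, ip, port):
    """Demo helper: run one request/reply exchange inside a single process.

    In a real deployment the two ends would live in different processes.
    """
    cache_end, client_end = socket.socketpair()
    server = threading.Thread(target=cache.serve_one, args=(cache_end,))
    server.start()
    try:
        return request_listener(client_end, ip, port)
    finally:
        server.join()
        cache_end.close()
        client_end.close()
```

With this sketch, two successive calls such as fetch(cache, "127.0.0.1", 0) return two distinct descriptors that refer to the same listening socket, so an old and a new process could both accept on it during a reload, while a bind conflict surfaces as an OSError in the requester only. It needs Python 3.9 or later on a UNIX system for socket.send_fds() and socket.recv_fds().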
With all this in mind, I think we need to discuss a bit more before going back to the keyboard. Given that a number of bugs have been fixed since 1.5-dev6, I'll probably issue -dev7 soon and that should not stop us from trying to elaborate a model that suits all needs.

As usual, I'm very interested in getting your insights, comments, opinions, ideas, etc...

Best regards,
Willy