Re: [PATCH 00/26] Restart of haproxy without dropping connections
Hi Simon,

On Mon, Mar 14, 2011 at 07:05:22AM +0900, Simon Horman wrote:
> > But if it's in the chroot, it can't reread its configuration file.
>
> True, I have a slightly dubious way to deal with that. After chrooting,
> the configuration file is opened relative to the chroot. So you need to
> do something like this.
>
>     edit /etc/haproxy.cfg
>     mkdir -p /a/chroot/etc
>     cp /etc/haproxy.cfg /a/chroot/etc
>     # chmod / chown /a/chroot/etc/haproxy.cfg as needed
>     haproxy -sr $(cat /var/run/haproxy.pid) # signal restart

I must say I really don't like this way of dealing with chroots. For me,
a chroot is in an empty directory on a read-only file system, otherwise
it's useless. The simple fact that the mkdir above is possible makes it
possible for the chrooted processes to use that directory to escape the
chroot.

> But I don't think that changing things to keep it outside would be much
> trouble at all.

I don't think it should be an issue either, maybe a minor change of the
starting sequence.

> > Also, probably that this master will later become the one able to log
> > to real files and to do other things that need to be done outside of
> > the chroot
>
> As it stands that should be easy enough to handle using the same
> technique that is used for the pid file. Of course adding such log
> directives on restart would need to be prohibited. But that probably
> needs to happen for permissions/capabilities reasons anyway. And in any
> case it should also not be difficult.

OK.

> getppid is run once just before entering the loop. And the workers send
> kill(ppid, 0) while in the loop. This only occurs in workers when
> master_worker mode is enabled. I didn't pay particular attention to
> performance considerations. If you are worried about that then perhaps
> the workers could kill less often, perhaps using a task.

Yes I think that would be better. The task might be run once a second
and will not consume more than one syscall per second whatever the load.
I'd be much more worried about the kill(ppid, 0) when dealing with 20
incoming connections per second!

> The feature was a request to allow situations where there is more than
> one process but not detached from the terminal - master_worker either
> without daemon or with -db (patch forthcoming) - to play nicely with a
> monitor such as runit[1].
>
> [1] http://smarden.org/runit/

OK, I did not imagine there was a use case for doing that. Well, if
master_worker mode with -db doesn't daemonize it, that's already OK then.

> Understood. I'll verify that something reasonable occurs on -db.

OK fine.

> > I'll try to find some time ASAP to review your work, it will be
> > easier to discuss it and you won't have to explain to me things that
> > I should have figured out by myself if I had tested it :-)
>
> I don't mind explaining things. But I do look forward to a review.

You'll get it, now that I have released 1.5-dev4, it'll be easier to
stay focused on that. I'll probably try that tomorrow if time permits.

Thanks!
Willy
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Sun, Mar 13, 2011 at 11:16:33PM +0100, Willy Tarreau wrote:
> Hi Simon,
>
> On Mon, Mar 14, 2011 at 07:05:22AM +0900, Simon Horman wrote:
> > > But if it's in the chroot, it can't reread its configuration file.
> >
> > True, I have a slightly dubious way to deal with that. After
> > chrooting, the configuration file is opened relative to the chroot.
> > So you need to do something like this.
> >
> >     edit /etc/haproxy.cfg
> >     mkdir -p /a/chroot/etc
> >     cp /etc/haproxy.cfg /a/chroot/etc
> >     # chmod / chown /a/chroot/etc/haproxy.cfg as needed
> >     haproxy -sr $(cat /var/run/haproxy.pid) # signal restart
>
> I must say I really don't like this way of dealing with chroots. For
> me, a chroot is in an empty directory on a read-only file system,
> otherwise it's useless. The simple fact that the mkdir above is
> possible makes it possible for the chrooted processes to use that
> directory to escape the chroot.

Our image of chroots is slightly different, though I do take your point.

Unfortunately I'm not sure of a good alternative to re-opening the file.
Keeping the fd open would be way too fragile as editors often create a
new file rather than editing in-place.

I guess we could pipe the configuration over a socket or something, and
we could also use the same socket as the error-condition detector that
you were discussing in an earlier email.

> > But I don't think that changing things to keep it outside would be
> > much trouble at all.
>
> I don't think it should be an issue either, maybe a minor change of the
> starting sequence.
>
> > > Also, probably that this master will later become the one able to
> > > log to real files and to do other things that need to be done
> > > outside of the chroot
> >
> > As it stands that should be easy enough to handle using the same
> > technique that is used for the pid file. Of course adding such log
> > directives on restart would need to be prohibited. But that probably
> > needs to happen for permissions/capabilities reasons anyway. And in
> > any case it should also not be difficult.
>
> OK.
>
> > getppid is run once just before entering the loop. And the workers
> > send kill(ppid, 0) while in the loop. This only occurs in workers
> > when master_worker mode is enabled. I didn't pay particular attention
> > to performance considerations. If you are worried about that then
> > perhaps the workers could kill less often, perhaps using a task.
>
> Yes I think that would be better. The task might be run once a second
> and will not consume more than one syscall per second whatever the
> load. I'd be much more worried about the kill(ppid, 0) when dealing
> with 20 incoming connections per second!

Sure, I'll move it into a worker.

> > The feature was a request to allow situations where there is more
> > than one process but not detached from the terminal - master_worker
> > either without daemon or with -db (patch forthcoming) - to play
> > nicely with a monitor such as runit[1].
> >
> > [1] http://smarden.org/runit/
>
> OK, I did not imagine there was a use case for doing that.

It wasn't immediately obvious to me either, but I think it does make
sense.

> Well, if master_worker mode with -db doesn't daemonize it, that's
> already OK then.
>
> > Understood. I'll verify that something reasonable occurs on -db.
>
> OK fine.
>
> > > I'll try to find some time ASAP to review your work, it will be
> > > easier to discuss it and you won't have to explain to me things
> > > that I should have figured out by myself if I had tested it :-)
> >
> > I don't mind explaining things. But I do look forward to a review.
>
> You'll get it, now that I have released 1.5-dev4, it'll be easier to
> stay focused on that. I'll probably try that tomorrow if time permits.

Great.

I live in Tokyo. Thankfully I am ok. But as a result of the recent
earthquakes in Japan I am likely to experience rolling blackouts over
the next few weeks starting tomorrow. So I may be a little unresponsive
from time to time. Your understanding is appreciated.
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Mon, Mar 14, 2011 at 07:26:30AM +0900, Simon Horman wrote:
> Unfortunately I'm not sure of a good alternative to re-opening the
> file. Keeping the fd open would be way too fragile as editors often
> create a new file rather than editing in-place.

Indeed, keeping the fd open would not work. Many scripts do a cp or mv
of the new file over the old one.

> I guess we could pipe the configuration over a socket or something, and
> we could also use the same socket as the error-condition detector that
> you were discussing in an earlier email.

Using a socket would be very nice indeed, because the new process would
be able to pass the config it has on its command line. But that requires
basic socket handling from the new process before we ever start the
polling loop, and for a long time I've thought this required some code
duplication. I may be wrong though.

In the mean time, I really think that keeping the master out of the
chroot is the way to go. In my opinion it makes a lot of sense: the
master is the one which owns rights on ports, config, etc... and which
can fork slave processes deprived of their permissions. Seen that way,
there's really nothing that justifies jailing the master.

> I live in Tokyo. Thankfully I am ok. But as a result of the recent
> earthquakes in Japan I am likely to experience rolling blackouts over
> the next few weeks starting tomorrow. So I may be a little unresponsive
> from time to time. Your understanding is appreciated.

OK, thanks for the info, I did not know you were there. I certainly
understand the difficulties you might face, given the terrible images
I've seen on the net! I wish you and your relatives good luck!

Best regards,
Willy
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Mon, Mar 14, 2011 at 12:37:17AM +0100, Willy Tarreau wrote:
> On Mon, Mar 14, 2011 at 07:26:30AM +0900, Simon Horman wrote:
> > Unfortunately I'm not sure of a good alternative to re-opening the
> > file. Keeping the fd open would be way too fragile as editors often
> > create a new file rather than editing in-place.
>
> Indeed, keeping the fd open would not work. Many scripts do a cp or mv
> of the new file over the old one.
>
> > I guess we could pipe the configuration over a socket or something,
> > and we could also use the same socket as the error-condition detector
> > that you were discussing in an earlier email.
>
> Using a socket would be very nice indeed, because the new process would
> be able to pass the config it has on its command line. But that
> requires basic socket handling from the new process before we ever
> start the polling loop, and for a long time I've thought this required
> some code duplication. I may be wrong though.
>
> In the mean time, I really think that keeping the master out of the
> chroot is the way to go. In my opinion it makes a lot of sense: the
> master is the one which owns rights on ports, config, etc... and which
> can fork slave processes deprived of their permissions. Seen that way,
> there's really nothing that justifies jailing the master.

Yes, I think that is an approach worth investigating. I'll see about
implementing it.
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Thu, Mar 10, 2011 at 08:46:20AM +0100, Willy Tarreau wrote:
> On Thu, Mar 10, 2011 at 03:52:58PM +0900, Simon Horman wrote:
> > > Wow, I'm impressed, you managed to do that really quickly !
> >
> > Thanks. It's taken quite a bit of effort to get this far.
>
> I have no doubt about that :-)
>
> > > In the mean time, I have a few questions which come to mind :
> > >
> > >   - does the socket cache consider all of the bind parameters ?
> > >     (eg: mss, interface, transparent, ...)
> >
> > Yes, it attempts to take them into account, though in a very naive
> > way. If they all match then the socket cache entry can be re-used.
> > Otherwise the entry is invalidated.
>
> OK.
>
> > >   - what happens if the new config file uses some conflicting bind
> > >     entries ? Eg: old config used to listen on 192.168.1.1:80 and
> > >     the new one uses 0.0.0.0:80 ? Or even :80 for the old one
> > >     (IPv4) and :::80 for the new one (IPv6) ?
> >
> > I think there will be a problem. Let's talk about what would be a
> > sensible thing to do.
>
> In my opinion we should define how to fail a restart. Right now with
> -sf/-st, the new process is able to detect the failure to bind and
> exits with an error which can be detected by management scripts and
> reported to the admin, who must immediately take action.

I think that -sr currently behaves that way. I'll check.

> I think that a failure to restart smoothly means that the admin will
> have to take the decision to restart harder (-sf/-st). Sometimes the
> scripts will do that by themselves as an automatic fallback. So that
> makes me believe that all we want is the master process to refuse to
> change anything in case an anomaly is detected, and to indicate its
> refusal.

Right. The main thing that I was worrying about is that by the time the
master process detects an anomaly it has probably already changed
things. So it will either need to know how to roll back, or detect
problems before making changes to itself.

That said, there are probably only a few changes to the master that
actually matter. For example, if it has loaded a configuration file with
a bogus proxy port in it, then it doesn't need to roll that back, it
just needs to tell the old workers to keep going - they still have the
old config.

So I think that the main problem that I have is knowing what the master
would need to roll back. Loggers spring to mind. I'm sure there are a
few other things.

Note, chroot, daemon and master_worker currently can't be changed on
restart, so we don't need to worry about them.

> We cannot make it send a signal to the calling process because it does
> not know its pid, and even if it knew it it would probably not have
> enough permissions to send it anyway. But we should find something the
> master process can act on that the calling one can detect (eg: among
> the horrible ideas, if it's not chrooted, it can change its pid and
> update the pidfile, but that's ugly).

The master process already has a facility to rewrite the pid file even
if it is chrooted (it keeps the fd open), so that could work. The
pidfile currently does change on restart - all workers' pids are
updated - so polling the pid file for something new and valid, or an
error status, could work.

As an aside, having the worker pids in the pid file isn't strictly
necessary. I only added them for consistency with the existing usage of
the pidfile. But the master could actually just write the master pid and
close the fd.

> > >   - does the master send a signal to all children asking them to
> > >     unbind (as we did with -sf) ?
> >
> > Yes. It sends them a SIGUSR1. It does not send them a SIGTTOU,
> > because although that works fine I am not entirely sure how to sanely
> > unwind the master at that point.
>
> I think it's OK that way because the master still owns the sockets, so
> even in case of a late failure, it can restart new processes anyway.
>
> > >   - do the debug modes (-d/-db) disable the master_worker mode ?
> >
> > No. I can make that so if you like.
>
> Yes that's the idea. Many users abuse -d/-db (including me) and it's
> important that they don't have to touch their config for this.

In the case of -db, master_worker mode doesn't imply background (daemon)
mode, so there shouldn't be any changes there by adding my current
patches. Do you want me to disable it anyway?

> > > In fact I'm interested in any corner cases we should be aware of so
> > > that we can clearly document them and indicate how to handle them
> > > (eg: fall back to -st if it's not possible to rebind, etc...)
> >
> > I don't know of any off hand. But one of my reasons for posting the
> > code was to get more eyes on it so we can find such cases.
>
> Fine. Any failure to restart with the expected config typically is what
> I'd call a corner case. That's why I'd like that we find a way to
> safely fail so that proper action can be taken externally.
>
> > Re-initialising the configuration was somewhat non-trivial, and I am
> > sure that I have missed a few things.
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Thu, Mar 10, 2011 at 09:40:44PM +0100, Willy Tarreau wrote:
> On Thu, Mar 10, 2011 at 05:40:19PM +0900, Simon Horman wrote:
> > > I think that a failure to restart smoothly means that the admin
> > > will have to take the decision to restart harder (-sf/-st).
> > > Sometimes the scripts will do that by themselves as an automatic
> > > fallback. So that makes me believe that all we want is the master
> > > process to refuse to change anything in case an anomaly is
> > > detected, and to indicate its refusal.
> >
> > Right. The main thing that I was worrying about is that by the time
> > the master process detects an anomaly it has probably already changed
> > things. So it will either need to know how to roll back, or detect
> > problems before making changes to itself.
>
> Good point.
>
> > That said, there are probably only a few changes to the master that
> > actually matter. For example, if it has loaded a configuration file
> > with a bogus proxy port in it, then it doesn't need to roll that
> > back, it just needs to tell the old workers to keep going - they
> > still have the old config.
>
> I agree. The master should only have to take care of keeping the bound
> sockets cache. It does not need to even keep a copy of the config at
> all provided it knows how the sockets were bound.

As it stands the master needs the entire configuration, as it actually
starts all the proxies and then forks the workers. This is in order to
allow the master to know what sockets to bind and unbind. But other than
that it doesn't care about most of the configuration.

> > So I think that the main problem that I have is knowing what the
> > master would need to roll back. Loggers spring to mind. I'm sure
> > there are a few other things.
>
> Rolling back an unused config should not be too much work once we move
> all the config behind a pointer. Some little work was started on that a
> long time ago, you'll notice an unused parent member in the proxy
> struct, to attach it to a configuration instance.
>
> The idea would be that we could have all the global config and all
> proxies in the same struct behind a pointer, and instantiate a new one
> when reloading the config, then once everything's OK, we can switch the
> configs and fork new processes. The socket cache just has to be global.
> And even if we add new listening sockets to this cache when trying to
> start a new config, it's not dramatic.

Yes, I think that would work. But I wonder if we need to go that far, as
it's probably ok for the master to have an invalid configuration so long
as we are careful about the (few?) configuration parameters that both
affect the master and can be changed on restart.

The socket_cache currently is global.

> > Note, chroot, daemon and master_worker currently can't be changed on
> > restart, so we don't need to worry about them.
>
> That makes sense ;-)
>
> > The master process already has a facility to rewrite the pid file
> > even if it is chrooted (it keeps the fd open), so that could work.
> > The pidfile currently does change on restart - all workers' pids are
> > updated - so polling the pid file for something new and valid, or an
> > error status, could work.
>
> That makes me think that we should most likely keep the master out of
> the chroot anyway, since it does not participate in network
> communications, it's not exposed.

I don't have any strong opinions on that either way. As it is, having it
in the chroot isn't a major burden. But I don't think that changing
things to keep it outside would be much trouble at all.

> > As an aside, having the worker pids in the pid file isn't strictly
> > necessary.
>
> Yes it is, because if the master dies, you have no way to tell which
> children were related to the dead service and must be killed to achieve
> a sane restart.

True. But the children will detect that the master has died and exit of
their own accord.

> > I only added them for consistency with the existing usage of the
> > pidfile. But the master could actually just write the master pid and
> > close the fd.

(...)

> > > > >   - do the debug modes (-d/-db) disable the master_worker mode ?
> > > >
> > > > No. I can make that so if you like.
> > >
> > > Yes that's the idea. Many users abuse -d/-db (including me) and
> > > it's important that they don't have to touch their config for this.
> >
> > In the case of -db, master_worker mode doesn't imply background
> > (daemon) mode, so there shouldn't be any changes there by adding my
> > current patches. Do you want me to disable it anyway?
>
> Well, if master_worker mode with -db doesn't daemonize it, that's
> already OK then.

Understood. I'll verify that something reasonable occurs on -db.
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Thu, Mar 10, 2011 at 01:32:42PM +0900, Simon Horman wrote: Hi,

this patch series attempts to allow haproxy to be restarted - perhaps
reconfigured would be a better term - without refusing or dropping
connections.

The model that I am using is that of a master process, which really does
nothing but hold bound file descriptors, and workers - these should be
very similar to what haproxy processes have looked like up to now.

This feature can be enabled using the -M command line option or the
master_worker configuration file option. It may be used in conjunction
with daemon.

In order to effect a restart a SIGUSR2 should be sent to the master
process. The master process is guaranteed to be the first pid listed in
the pid file. It will also be the parent process, and can thus be found
using ps. It should also be noted in the logs.

SIGUSR2 is ignored if master/worker mode is not enabled.
SIGUSR2 is ignored by workers.

The patches are available in git:
git://github.com/horms/haproxy.git master
Re: [PATCH 00/26] Restart of haproxy without dropping connections
Hi Simon,

On Thu, Mar 10, 2011 at 01:32:42PM +0900, Simon Horman wrote:
> Hi, this patch series attempts to allow haproxy to be restarted -
> perhaps reconfigured would be a better term - without refusing or
> dropping connections.

Wow, I'm impressed, you managed to do that really quickly !

You sent that at the right moment, because I have almost settled down on
the internal changes needed to work on server-side keep-alive, so I
should issue -dev4 once I'm fine with those changes. Then we'll be able
to start by integrating your changes.

In the mean time, I have a few questions which come to mind :

  - does the socket cache consider all of the bind parameters ?
    (eg: mss, interface, transparent, ...)

  - what happens if the new config file uses some conflicting bind
    entries ? Eg: old config used to listen on 192.168.1.1:80 and the
    new one uses 0.0.0.0:80 ? Or even :80 for the old one (IPv4) and
    :::80 for the new one (IPv6) ?

  - does the master send a signal to all children asking them to unbind
    (as we did with -sf) ?

  - do the debug modes (-d/-db) disable the master_worker mode ?

In fact I'm interested in any corner cases we should be aware of so that
we can clearly document them and indicate how to handle them (eg: fall
back to -st if it's not possible to rebind, etc...)

Thanks !
Willy
Re: [PATCH 00/26] Restart of haproxy without dropping connections
Hi Willy,

On Thu, Mar 10, 2011 at 07:29:11AM +0100, Willy Tarreau wrote:
> Hi Simon,
>
> On Thu, Mar 10, 2011 at 01:32:42PM +0900, Simon Horman wrote:
> > Hi, this patch series attempts to allow haproxy to be restarted -
> > perhaps reconfigured would be a better term - without refusing or
> > dropping connections.
>
> Wow, I'm impressed, you managed to do that really quickly !

Thanks. It's taken quite a bit of effort to get this far.

> You sent that at the right moment, because I have almost settled down
> on the internal changes needed to work on server-side keep-alive, so I
> should issue -dev4 once I'm fine with those changes. Then we'll be able
> to start by integrating your changes.

Great.

> In the mean time, I have a few questions which come to mind :
>
>   - does the socket cache consider all of the bind parameters ?
>     (eg: mss, interface, transparent, ...)

Yes, it attempts to take them into account, though in a very naive way.
If they all match then the socket cache entry can be re-used. Otherwise
the entry is invalidated.

Thanks for earlier advice on this, else I would have probably written
code that didn't take these parameters into account at all.

>   - what happens if the new config file uses some conflicting bind
>     entries ? Eg: old config used to listen on 192.168.1.1:80 and the
>     new one uses 0.0.0.0:80 ? Or even :80 for the old one (IPv4) and
>     :::80 for the new one (IPv6) ?

I think there will be a problem. Let's talk about what would be a
sensible thing to do.

>   - does the master send a signal to all children asking them to unbind
>     (as we did with -sf) ?

Yes. It sends them a SIGUSR1. It does not send them a SIGTTOU, because
although that works fine I am not entirely sure how to sanely unwind the
master at that point.

>   - do the debug modes (-d/-db) disable the master_worker mode ?

No. I can make that so if you like.

> In fact I'm interested in any corner cases we should be aware of so
> that we can clearly document them and indicate how to handle them (eg:
> fall back to -st if it's not possible to rebind, etc...)

I don't know of any off hand. But one of my reasons for posting the code
was to get more eyes on it so we can find such cases.

Re-initialising the configuration was somewhat non-trivial, and I am
sure that I have missed a few things.

> Thanks !
> Willy
Re: [PATCH 00/26] Restart of haproxy without dropping connections
On Thu, Mar 10, 2011 at 03:52:58PM +0900, Simon Horman wrote:
> > Wow, I'm impressed, you managed to do that really quickly !
>
> Thanks. It's taken quite a bit of effort to get this far.

I have no doubt about that :-)

> > In the mean time, I have a few questions which come to mind :
> >
> >   - does the socket cache consider all of the bind parameters ?
> >     (eg: mss, interface, transparent, ...)
>
> Yes, it attempts to take them into account, though in a very naive way.
> If they all match then the socket cache entry can be re-used. Otherwise
> the entry is invalidated.

OK.

> >   - what happens if the new config file uses some conflicting bind
> >     entries ? Eg: old config used to listen on 192.168.1.1:80 and the
> >     new one uses 0.0.0.0:80 ? Or even :80 for the old one (IPv4) and
> >     :::80 for the new one (IPv6) ?
>
> I think there will be a problem. Let's talk about what would be a
> sensible thing to do.

In my opinion we should define how to fail a restart. Right now with
-sf/-st, the new process is able to detect the failure to bind and exits
with an error which can be detected by management scripts and reported
to the admin, who must immediately take action.

I think that a failure to restart smoothly means that the admin will
have to take the decision to restart harder (-sf/-st). Sometimes the
scripts will do that by themselves as an automatic fallback. So that
makes me believe that all we want is the master process to refuse to
change anything in case an anomaly is detected, and to indicate its
refusal.

We cannot make it send a signal to the calling process because it does
not know its pid, and even if it knew it it would probably not have
enough permissions to send it anyway. But we should find something the
master process can act on that the calling one can detect (eg: among the
horrible ideas, if it's not chrooted, it can change its pid and update
the pidfile, but that's ugly).

> >   - does the master send a signal to all children asking them to
> >     unbind (as we did with -sf) ?
>
> Yes. It sends them a SIGUSR1. It does not send them a SIGTTOU, because
> although that works fine I am not entirely sure how to sanely unwind
> the master at that point.

I think it's OK that way because the master still owns the sockets, so
even in case of a late failure, it can restart new processes anyway.

> >   - do the debug modes (-d/-db) disable the master_worker mode ?
>
> No. I can make that so if you like.

Yes that's the idea. Many users abuse -d/-db (including me) and it's
important that they don't have to touch their config for this.

> > In fact I'm interested in any corner cases we should be aware of so
> > that we can clearly document them and indicate how to handle them
> > (eg: fall back to -st if it's not possible to rebind, etc...)
>
> I don't know of any off hand. But one of my reasons for posting the
> code was to get more eyes on it so we can find such cases.

Fine. Any failure to restart with the expected config typically is what
I'd call a corner case. That's why I'd like that we find a way to safely
fail so that proper action can be taken externally.

> Re-initialising the configuration was somewhat non-trivial, and I am
> sure that I have missed a few things.

I'm certain that some parts were a real nightmare ! I'll take time to
review all your patches because yes, it's possible that you fell in some
traps or had to take some decisions that might have other impacts.
That's also why I'd like to get your code quickly merged, it's the best
way to find what might have been overlooked.

Best regards,
Willy