Bug#591030: bcfg2-server init scripts are a complete mess
Hi Arto,

What you say seems understandable and reasonable to me. Thanks for the hints about the LSB recommendations ;)

Just one point, certainly a tad off topic (but since we already discussed the client...): as the "push" part of the agent mode is now supposed to happen through triggers, I think this may be a case for adding a bcfg2 system user, this time not for the server (as has been suggested in another bug) but for the client. At least, this is probably what I intend to do, so as to avoid connecting directly to the root account: I may then allow the "bcfg2", or "bcfg2-client", user to sudo passwordlessly with root rights, to run the one bcfg2 command I need it to, and only that one. It may not change much security-wise compared with connecting directly to root with the sole right to issue this command line, but management-wise it'll be clearer to me. If every network daemon that runs as root had to be replaced with ssh-ing to the root account, this could really start getting unmanageable and unsortable; "sort of trashbin" accounts are a pain, and root certainly does not need this.

Just a thought that I think may be worth throwing in here. It could be something to think about, and to include in the README, when you fix the agent disappearance...

Have a good day!

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#591030: bcfg2-server init scripts are a complete mess
Hi Tuomas,

Tuomas Noraef writes:
> Just two things I noted :
>
> - first thing in the "stop" stanza : the 15 seconds timeout to switch
> from SIGTERM to SIGKILL may be a bit long, especially until the server
> is patched to send the SIGTERM to its worker thread, but I don't know
> : maybe it is a Debian guideline or something. I'll leave that up to
> you to decide : just noting a systematic (at least for now) 15 seconds
> is a tad long to wait, especially when testing things with the server
> (did quite a few bcfg2-server restarts when I first tested SSL
> bi-lateral authentication, and, well, this partly was already because
> of that I had removed the 5 seconds sleep there was in the old init
> script... so 15 seconds... : 5 seconds could be sufficient, I guess).

Agreed, 15 seconds is a long time to wait for something that will never happen. There are a few reasons, though. Simply killing the daemon outright can be dangerous (it can, for example, stop it from completely writing some of its database files, and corrupt the repository), and should not be done lightly. Under normal conditions the daemon is unlikely to need 15 seconds to stop properly, but under heavy I/O load or other less normal conditions it could take that long. Since the daemon does not exit after receiving the TERM signal (due to the bugs), we have no way of knowing when it is actually done, and I prefer to be safe rather than sorry.

I'm hoping that version 1.1 comes out with the fix for the actual bug before it's too late to get it into squeeze, or that I manage to backport the bugfix to the current version. I'm not willing to release with an RC version.

> - other thing is in the "status" stanza : you make the init script
> exit with a "3" error code, if the server was not running, which
> outputs something like :
>
> "bcfg2-server is not running
> invoke-rc.d: initscript bcfg2-server, action "status" failed."
>
> the second line of which I am not so sure is useful... basically, you
> make the init script exit with a non-zero error code, even though it
> is not the culprit, the bcfg2-server at most being the one : the
> action "status" did not fail in itself - in that case, it only
> failed our hopes, mostly. Maybe the status stanza should exit with a
> "0" code whatever happens, and use a "log_daemon_msg" to signal
> anything, would you find the need to ?

The Debian Policy does not currently define the init script status action at all, but the LSB standard does. Returning 3 when the daemon is not running is what the LSB expects, and I prefer to stay relatively close to the standard (it also requires other return codes for cases such as when the pid file exists but the daemon is not running, but I haven't bothered to implement those yet). See http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html for the init script part of LSB.

> Anyway, thanks for caring about this bug, and maintaining bcfg2 in Debian :
> after testing a bit more, I really fell in love with it (a few gripes,
> like documentation [already bought the Sage short topics which is
> rather good, albeit a tad outdated and, well... short], the
> disappearance of a native agent-mode, XML [still... though I am
> already trying to craft some XML files through auto-templating]...
> but, still : IMHO far better than anything else I tried, and I tried
> pretty much everything else in this domain). Oh, and thanks for
> ditching the use of the LSB "init-functions", unwrapping the
> start-stop-daemon from these horrors : highly appreciated :p

The documentation situation is hopefully improving; there are people working on it upstream currently. I'm not really a fan of XML either, but Bcfg2 does work very well for what I'm doing with it.

I'll mark this bug as pending for the time being; there are a few other issues I want to address before uploading (including the no longer existing agent mode).
-- Arto Jantunen
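For illustration, the LSB status convention discussed in this message could be sketched roughly as follows. This is a minimal sketch and not the code from the actual patch: the function name lsb_status is hypothetical, and it assumes the pid file contains a single PID and that the caller is allowed to signal that process (as root is in an init script). It returns 0 when the process is running, 1 when the pid file exists but the process is gone, and 3 when there is no pid file at all.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): map daemon state to LSB
# "status" exit codes.
#   0 = running
#   1 = not running, but the pid file exists
#   3 = not running (no pid file)
lsb_status() {
    pidfile=$1
    if [ ! -e "$pidfile" ]; then
        return 3
    fi
    pid=$(cat "$pidfile" 2>/dev/null)
    # kill -0 delivers no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

A status action built on such a helper would simply end with something like `lsb_status "$PIDFILE"; exit $?`, so that tools such as invoke-rc.d see the LSB code. Note that a bare PID check cannot tell whether the PID has been reused by an unrelated process.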
Bug#591030: bcfg2-server init scripts are a complete mess
Hi Arto,

Tested your patch, and everything seems a lot better. No more multi-spawning of the daemon, no more PID file chainsawing... seems good.

Just two things I noted:

- First, in the "stop" stanza: the 15-second timeout before switching from SIGTERM to SIGKILL may be a bit long, especially until the server is patched to send the SIGTERM to its worker thread. But I don't know: maybe it is a Debian guideline or something. I'll leave that up to you to decide; I'm just noting that a systematic (at least for now) 15 seconds is a tad long to wait, especially when testing things with the server. I did quite a few bcfg2-server restarts when I first tested SSL bilateral authentication, and it was partly because of that that I had already removed the 5-second sleep there was in the old init script... so 15 seconds... 5 seconds could be sufficient, I guess.

- Second, in the "status" stanza: you make the init script exit with a "3" error code if the server is not running, which outputs something like:

"bcfg2-server is not running
invoke-rc.d: initscript bcfg2-server, action "status" failed."

I am not so sure the second line is useful. Basically, you make the init script exit with a non-zero error code even though it is not the culprit (bcfg2-server at most being the one): the "status" action did not fail in itself; in that case, it only failed our hopes, mostly. Maybe the status stanza should exit with a "0" code whatever happens, and use "log_daemon_msg" to signal anything you find the need to?

What do you think of these two suggestions?

Oh, by the way: the FAM worker thread seems to end up dying on its own once the server is stopped. I hadn't checked this with my patched init script, but it already did (it takes a few dozen seconds), and it does so with yours as well. So this zombie doesn't crawl very far once the bcfg2-server is killed, which is a good thing to note.

Anyway, thanks for caring about this bug, and for maintaining bcfg2 in Debian: after testing a bit more, I really fell in love with it (a few gripes, like documentation [already bought the Sage short topics, which is rather good, albeit a tad outdated and, well... short], the disappearance of a native agent mode, XML [still... though I am already trying to craft some XML files through auto-templating]... but, still: IMHO far better than anything else I tried, and I tried pretty much everything else in this domain). Oh, and thanks for ditching the use of the LSB "init-functions", unwrapping the start-stop-daemon from these horrors: highly appreciated :p

Farewell.
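The SIGTERM-then-SIGKILL escalation debated above can be sketched in plain shell as follows. This is only an illustration under stated assumptions, not the logic of the patched init script: the function name stop_with_timeout and its return codes are hypothetical. As far as I know, start-stop-daemon offers the same behaviour through its --retry schedule option (for example --retry=TERM/15/KILL/5), which is the more usual choice in a Debian init script.

```shell
#!/bin/sh
# Hypothetical helper: ask a process to terminate, poll for up to
# $2 seconds, then fall back to SIGKILL.
#   returns 0 if the process was already gone or exited after TERM
#   returns 2 if it had to be killed forcibly
stop_with_timeout() {
    pid=$1
    timeout=$2
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to stop
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after TERM
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 2
}
```

With a helper like this, the length of the wait is a single argument, so shortening it from 15 to 5 seconds for testing would be a one-line change. The trade-off described earlier in the thread still applies: a shorter wait raises the chance of killing the daemon in the middle of a write.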
Bug#591030: bcfg2-server init scripts are a complete mess
Hi Tuomas,

Tuomas Noraef writes:
> While being in the process of evaluating bcfg2, I came across its
> Debian init scripts, which are IMHO a complete mess. Sorry to come to
> use these terms, but I am really being honest... I hardly believe such
> bugs (really release critical, I guess, as I can't imagine even
> remotely using bcfg2, or, more simply, can't use it, with init scripts
> in this state) have not been reported yet.

I did not write the bcfg2 init scripts, and hadn't really even looked at them much before this bug report. I would have noticed at least a few of these issues if I had. Bcfg2 does not have many users on Debian, so noticing bugs may take time. Also, I am not running the squeeze version of the server in production (I am running that version of the client, and the lenny version of the server, which does not show most of these issues).

I went through the issues you reported and the script itself, and decided to rewrite it in a more Debian-like way (using start-stop-daemon instead of the LSB functions, etc.) instead of trying to fix the current one. I'm attaching my version as a patch to this mail. I would appreciate it if you could test it and make sure that all of the reported problems are fixed by it.

Also, it appears that the bugs with server shutdown are currently thought to be fixed upstream, but the fixes aren't easy to backport to the current stable version. I'll have a closer look at this, but it may be that we will need to wait for 1.1 to be released until this is properly fixed, and just kill the daemon from the init script until that happens.

I will also sort out the situation with the bcfg2 client init script, probably by removing it and noting the removal in NEWS.Debian before uploading a new version.

PS. Please use the unified diff format (diff -u) when sending patches; it's quite a bit more readable.
-- Arto Jantunen

--- bcfg2-server.orig 2010-03-30 10:20:45.0 +0300
+++ bcfg2-server 2010-08-02 15:45:21.291754838 +0300
@@ -1,112 +1,135 @@
-#!/bin/sh
-#
-# bcfg-server - Bcfg2 configuration daemon
-#
-# chkconfig: 2345 19 81
-# description: bcfg2 server for configuration requests
-#
+#! /bin/sh
 ### BEGIN INIT INFO
 # Provides: bcfg2-server
-# Required-Start:$network $remote_fs $named
-# Required-Stop: $network $remote_fs $named
+# Required-Start:$network $remote_fs $named $syslog
+# Required-Stop: $network $remote_fs $named $syslog
 # Default-Start: 2 3 4 5
 # Default-Stop: 0 1 6
 # Short-Description: Configuration management Server
-# Description: Bcfg2 is a configuration management system that builds
-#installs configuration files served by bcfg2-server
+# Description: The server component of the Bcfg2 configuration management
+#system
 ### END INIT INFO
-# Include lsb functions
-. /lib/lsb/init-functions
+# Author: Arto Jantunen
+
+# PATH should only include /usr/* if it runs after the mountnfs.sh script
+PATH=/sbin:/usr/sbin:/bin:/usr/bin
+DESC="Configuration management server"
+NAME=bcfg2-server
+DAEMON=/usr/sbin/$NAME
+PIDFILE=/var/run/$NAME.pid
+DAEMON_ARGS="-D $PIDFILE"
+SCRIPTNAME=/etc/init.d/$NAME
-# Commonly used stuff
-DAEMON=/usr/sbin/bcfg2-server
-PIDFILE=/var/run/bcfg2-server.pid
-PARAMS="-D $PIDFILE"
+# Exit if the package is not installed
+[ -x "$DAEMON" ] || exit 0
 # Disabled per default
 BCFG2_SERVER_OPTIONS=""
 BCFG2_SERVER_ENABLED=0
-# Include default startup configuration if exists
-test -f "/etc/default/bcfg2-server" && . /etc/default/bcfg2-server
+# Read configuration variable file if it is present
+[ -r /etc/default/$NAME ] && . /etc/default/$NAME
+
+# Load the VERBOSE setting and other rcS variables
+. /lib/init/vars.sh
+
+# Define LSB log_* functions.
+# Depend on lsb-base (>= 3.2-14) to ensure that this file is present
+# and status_of_proc is working.
+. /lib/lsb/init-functions
 if [ "$BCFG2_SERVER_ENABLED" -eq 0 ] ; then
 log_failure_msg "bcfg2-server is disabled - see /etc/default/bcfg2-server"
 exit 0
 fi
-# Exit if $DAEMON doesn't exist and is not executable
-test -x $DAEMON || exit 5
-
-# Internal variables
-BINARY=$(basename $DAEMON)
-
-start () {
-echo -n "Starting Configuration Management Server: "
-start_daemon ${DAEMON} ${PARAMS} ${BCFG2_SERVER_OPTIONS}
-STATUS=$?
-if [ "$STATUS" = 0 ]
-then
-log_success_msg "bcfg2-server"
-test -d /var/lock/subsys && touch /var/lock/subsys/bcfg2-server
-else
-log_failure_msg "bcfg2-server"
-fi
-return $STATUS
-}
-
-stop () {
-echo -n "Stopping Configuration Management Server: "
-killproc -p $PIDFILE ${BINARY}
-STATUS=$?
-if [ "$STATUS" = 0 ]; then
- log_success_msg "bcfg2-server"
- test -d /var/lock/subsys && touch /var/lock/subsys/bcfg2-server
-else
- log_failure_msg "bcfg2-server"
-fi
-return $STATUS
+#
+# Function that starts the daemon/service
+#
+do_start()
+{
+ # Return
+ #
Bug#591030: bcfg2-server init scripts are a complete mess
Subject: bcfg2-server init scripts are a complete mess
Package: bcfg2-server
Version: 1.0.1-2
Justification: renders package unusable
Severity: grave
Tags: patch

*** Please type your report below this line ***

Hi!

While evaluating bcfg2, I came across its Debian init scripts, which are IMHO a complete mess. Sorry to use these terms, but I am really being honest... I can hardly believe such bugs (really release critical, I guess, as I can't imagine even remotely using bcfg2, or, more simply, can't use it, with init scripts in this state) have not been reported yet.

* To start with the "start" (sic) stanza:

- Well, it works quite right on a standard machine, on boot as well as manually. But I have no interest in running it on a standard machine: I need it to be in an OpenVZ container (as long as LXC is still rough, with a lack of commodity executables and other scripts, hardware sharing, resource management, and so on; until all of that is better in LXC, OpenVZ still rocks a lot, a lot more than KVM and such, which I deem stupidly used most of the time, while containers do most jobs better, more securely, and with fewer resources. I like KVM, mind you... at most on top of containerization, or for network simulations. But for segregating services? As ridiculous as an H-bomb to kill a mosquito). And here comes the problem: it will start manually in an OpenVZ container, but not on container start-up (this one is a special case, of the "not so frequent to come to figure out" kind, but still... annoying). Fiddling a bit, I came across the fact that it seems to run too early. Maybe everything starts quicker, or slower, or in a better way, or in a luckily worse way, outside of an OpenVZ container (I don't know, and honestly, I don't really care), but inside, it is not the case. What I know is that in the test container I set up, an "ls /etc/rc2.d|grep 01" only shows "S01bcfg2", "S01bcfg2-server", "S01bootlogs" and "S01rsyslog" as being launched that early, so common sense hints at bcfg2-server requiring rsyslog to be launched before it is spawned. Activating its debug mode, I realized bcfg2-server would try to start, but fail, and be killed immediately, unless it was run once syslog had been started (verified through an "invoke-rc.d rsyslog stop && invoke-rc.d bcfg2-server start": bcfg2-server indeed fails to start). Hence, I added a $syslog to "Required-Start", as (new) Squeeze (containers) use the dependency based init system to determine the running order at boot, and after an "update-rc.d bcfg2-server defaults" and a container restart, voilà! This way, bcfg2-server is run just fine at the boot/init of an OpenVZ container.

- Well... almost "voilà". In the event the "stop" stanza worked (which it doesn't... but I'll explain that a bit further on), a problem could arise. Indeed, there is nothing in your script to protect against launching the bcfg2-server if it is already running. Or, well, if you do this while it already is, it seems to write bogus information in the PID file. Try this, starting with a stopped server: "invoke-rc.d bcfg2-server start && cat /var/run/bcfg2-server.pid && invoke-rc.d bcfg2-server start && cat /var/run/bcfg2-server.pid && pidof -x /usr/sbin/bcfg2-server". You'll see the PID file gets renewed, but the PID of the running "bcfg2-server" actually is... surprise, not what is in the PID file, but instead the one which was initially existing (I guess the idiotic "start_daemon" function from LSB starts a new server instance and writes its new PID to the PID file, even though this server seems to die for whatever reason, only leaving the original one running... or something like that). This is highly problematic if one then wants to stop the server (which will necessarily happen at some point), as stopping the server requires a correct PID file... I guess this is another example, amongst a freaking multitude, of why the LSB "init-functions" is a pile of buggy and, mostly, useless crap... but, well, whatever: you seem to want to use it, so let us try to cope with what you want... The problem is simply resolved by testing whether the server is running before trying to launch it, and silently reporting that everything went OK, without doing anything, if it was already running (this also implies setting the PID variable outside of the "status" function of the init script, as it is now also used in the "start" function).

Two problems solved... but this is far from ending there.

* Now, for the "stop" stanza:

- There seems to be a recurring bug in bcfg2-server (see http://trac.mcs.anl.gov/projects/bcfg2/ticket/709 - bug opened, then closed, then re-opened, then re-closed, ... on and on), where the server doesn't fully respond to a SIGTERM, because the fam/gamin/whatever worker thread doesn't stop, while the server waits indefinitely for it to terminate... which results in the "sto