Bug#591030: bcfg2-server init scripts are a complete mess

2010-08-03 Thread Tuomas Noraef
Hi Arto,

What you say seems understandable and reasonable to me. Thanks for the
hints about the recommendations of the LSB ;)

Just one point, admittedly a tad off topic (but as we already
discussed the client...) : as the "push" part of the agent mode is now
supposed to happen through triggers, I think this may be a case for
adding a bcfg2 system user, this time not for the server (as has been
suggested in another bug), but rather for the client. At least, this
is probably what I intend to do, so as to avoid connecting directly to
the root account : I may then allow the "bcfg2", or "bcfg2-client",
user to sudo without a password, with root rights, to run the bcfg2
command I need, and only that one. It may not change much
security-wise compared with connecting directly to root with the sole
right to issue this command line, but management-wise it will be
clearer to me : if every network daemon that runs as root had to be
replaced with ssh-ing to the root account, this could really start
getting unmanageable and unsortable. "Sort of trashbin" accounts are a
pain, and root certainly does not need this. Just a thought that may
be worth throwing in here. It could be something to think about, and
to include in the README, when you fix the agent disappearance...
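
A minimal sketch of the sudo arrangement described above, assuming a
dedicated account named bcfg2-client and a client binary at
/usr/sbin/bcfg2 (both are assumptions for illustration, not something
the package is known to provide). The helper name sudoers_rule is
hypothetical; it only prints the rule, so that it can be reviewed
before being installed.

```shell
#!/bin/sh
# Hypothetical helper: print a sudoers rule letting one unprivileged
# account run exactly one command as root without a password.
# Usage: sudoers_rule USER COMMAND
sudoers_rule() {
    user=$1
    command=$2
    printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$command"
}

# Example (account name and path are assumptions):
#   sudoers_rule bcfg2-client /usr/sbin/bcfg2 > /etc/sudoers.d/bcfg2-client
#   chmod 0440 /etc/sudoers.d/bcfg2-client
#   visudo -c -f /etc/sudoers.d/bcfg2-client   # syntax check before relying on it
```

Restricting the rule to a single absolute path is what keeps the
account from becoming a general-purpose route to root.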

Have a good day !



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#591030: bcfg2-server init scripts are a complete mess

2010-08-03 Thread Arto Jantunen
Hi Tuomas,

Tuomas Noraef  writes:

> Just two things I noted :
>
> - first thing, in the "stop" stanza : the 15 second timeout to switch
> from SIGTERM to SIGKILL may be a bit long, especially until the server
> is patched to send the SIGTERM to its worker thread, but I don't know
> : maybe it is a Debian guideline or something. I'll leave that up to
> you to decide : just noting that a systematic (at least for now) 15
> seconds is a tad long to wait, especially when testing things with the
> server (I did quite a few bcfg2-server restarts when I first tested
> SSL bilateral authentication, and, well, this was already partly why I
> had removed the 5 second sleep there was in the old init script... so
> 15 seconds... : 5 seconds could be sufficient, I guess).

Agreed, 15 seconds is a long time to wait for something that will never
happen. There are a few reasons, though. Just straight up killing the daemon
can be dangerous (it can, for example, stop it from completely writing some of
its database files, and so corrupt the repository), and should not be done
lightly. Under normal conditions the daemon is unlikely to require 15 seconds
to stop properly, but under heavy I/O load or other less normal conditions it
could take that long. Since the daemon does not exit after receiving the TERM
signal (due to the bugs), we have no way of knowing when it is actually done,
and I would rather be safe than sorry.
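
The TERM-then-KILL schedule discussed above can be sketched in plain
sh as follows. This is only an illustration of the logic, not the code
from the init script; the function name stop_with_timeout and its
return codes are hypothetical. start-stop-daemon can express such a
schedule itself through its --retry option (for example
--retry=TERM/15/KILL/5).

```shell
#!/bin/sh
# Hypothetical helper: ask a process to terminate, wait up to TIMEOUT
# seconds for it to go away, then kill it outright.
# Usage: stop_with_timeout PID TIMEOUT
# Returns 0 if the process was already gone or exited after TERM,
#         2 if it had to be sent KILL.
stop_with_timeout() {
    pid=$1
    timeout=$2
    # If TERM cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 2
}
```

A shorter timeout only changes how long the loop runs before falling
back to KILL; the risk of interrupting a write in progress stays the
same.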

I'm hoping that version 1.1 comes out with the fix for the actual bug before
it's too late to get it into squeeze, or that I manage to backport the bugfix
to the current version. I'm not willing to release with an RC version.

> - other thing is in the "status" stanza : you make the init script
> exit with a "3" error code, if the server was not running, which
> outputs something like :
>
> "bcfg2-server is not running
> invoke-rc.d: initscript bcfg2-server, action "status" failed."
>
> the second line of which I am not so sure is useful... basically, you
> make the init script exit with a non-zero error code, even though it
> is not the culprit, the bcfg2-server at most being the one : the
> action "status" did not fail in itself - in that case, it only
> failed our hopes, mostly. Maybe the status stanza should exit with a
> "0" code whatever happens, and use a "log_daemon_msg" to signal
> anything, should you find the need to ?

The Debian Policy does not currently define the init script status action at
all, but the LSB standard does. Returning 3 when the daemon is not running is
what the LSB expects, and I prefer to stay relatively close to the standard
(it also requires other return codes for cases when the pid file exists but
the daemon is not running and such, but I haven't bothered to implement those
yet). See
http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
for the init script part of LSB.
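
As a rough illustration of the LSB status codes mentioned above (0 for
running, 1 for dead with a pid file left behind, 3 for not running, 4
for unknown), here is a small sketch in plain sh. The function name
lsb_status is hypothetical and this is not the code from the init
script; it only shows how the codes could be derived from a pid file.

```shell
#!/bin/sh
# Hypothetical helper: map the state of a pid-file-managed daemon to
# the LSB "status" exit codes.
#   0  program is running
#   1  program is dead and its pid file exists
#   3  program is not running
#   4  status is unknown
# Usage: lsb_status PIDFILE
lsb_status() {
    pidfile=$1
    [ -e "$pidfile" ] || return 3          # no pid file: not running
    [ -r "$pidfile" ] || return 4          # cannot read it: unknown
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 4 ;;           # not a plain number: unknown
    esac
    # kill -0 only probes for existence; run as root (as init scripts
    # are) so that a process owned by another user is not misreported.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1                               # stale pid file
}
```

In the "status" branch of an init script this could be called as
lsb_status "$PIDFILE" followed by exit $?, so that callers such as
invoke-rc.d see the standard codes.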

> Anyway, thanks for caring about this bug, and maintaining bcfg2 in Debian :
> after testing a bit more, I really fell in love with it (a few gripes,
> like documentation [already bought the Sage short topics which is
> rather good, albeit a tad outdated and, well... short], the
> disappearance of a native agent-mode, XML [still... though I am
> already trying to craft some XML files through auto-templating]...
> but, still : IMHO far better than anything else I tried, and I tried
> pretty much everything else in this domain). Oh, and thanks for
> ditching the use of the LSB "init-functions", unwrapping the
> start-stop-daemon from these horrors : highly appreciated :p

The documentation situation is hopefully improving; there are people working
on it upstream currently. I'm not really a fan of XML either, but Bcfg2 does
work very well for what I'm doing with it.

I'll mark this bug as pending for the time being, as there are a few other
issues I want to address before uploading (including the no longer existing
agent mode).

-- 
Arto Jantunen






Bug#591030: bcfg2-server init scripts are a complete mess

2010-08-02 Thread Tuomas Noraef
Hi Arto,

Tested your patch, and everything seems a lot better. No more
multi-spawning of the daemon, no more PID file chainsawing... seems
good.

Just two things I noted :

- first thing, in the "stop" stanza : the 15 second timeout to switch
from SIGTERM to SIGKILL may be a bit long, especially until the server
is patched to send the SIGTERM to its worker thread, but I don't know
: maybe it is a Debian guideline or something. I'll leave that up to
you to decide : just noting that a systematic (at least for now) 15
seconds is a tad long to wait, especially when testing things with the
server (I did quite a few bcfg2-server restarts when I first tested
SSL bilateral authentication, and, well, this was already partly why I
had removed the 5 second sleep there was in the old init script... so
15 seconds... : 5 seconds could be sufficient, I guess).

- other thing is in the "status" stanza : you make the init script
exit with a "3" error code, if the server was not running, which
outputs something like :

"bcfg2-server is not running
invoke-rc.d: initscript bcfg2-server, action "status" failed."

the second line of which I am not so sure is useful... basically, you
make the init script exit with a non-zero error code, even though it
is not the culprit, the bcfg2-server at most being the one : the
action "status" did not fail in itself - in that case, it only
failed our hopes, mostly. Maybe the status stanza should exit with a
"0" code whatever happens, and use a "log_daemon_msg" to signal
anything, should you find the need to ?

What do you think of these two suggestions ?

Oh, by the way : the FAM worker thread seems to end up dying on its
own once the server is stopped. I hadn't checked this with my patched
init script, but it already did (takes a few dozen seconds), and does
it as well with yours. So this zombie doesn't crawl very far once the
bcfg2-server is killed, which is a good thing to note.

Anyway, thanks for caring about this bug, and maintaining bcfg2 in Debian :
after testing a bit more, I really fell in love with it (a few gripes,
like documentation [already bought the Sage short topics which is
rather good, albeit a tad outdated and, well... short], the
disappearance of a native agent-mode, XML [still... though I am
already trying to craft some XML files through auto-templating]...
but, still : IMHO far better than anything else I tried, and I tried
pretty much everything else in this domain). Oh, and thanks for
ditching the use of the LSB "init-functions", unwrapping the
start-stop-daemon from these horrors : highly appreciated :p

Farewell.






Bug#591030: bcfg2-server init scripts are a complete mess

2010-08-02 Thread Arto Jantunen
Hi Tuomas,

Tuomas Noraef  writes:

> While evaluating bcfg2, I came across its
> Debian init scripts, which are IMHO a complete mess. Sorry to use
> these terms, but I am really being honest... I can hardly believe such
> bugs (really release critical, I guess, as I can't imagine even
> remotely using bcfg2, or, more simply, can't use it, with init scripts
> in this state) have not been reported yet.


I did not write the bcfg2 init scripts, and hadn't really even looked at them
much before this bug report. I would have noticed at least a few of these issues
if I had.

Bcfg2 does not have many users on Debian, so noticing bugs may take time. Also
I am not running the squeeze version of the server in production (I am running
that version of the client, and the lenny version of the server, which does
not show most of these issues).

I went through the issues you reported and the script itself, and decided to
rewrite it in a more Debian-like way (using start-stop-daemon instead of the
LSB functions, etc.) instead of trying to fix the current one. I'm attaching my
version as a patch to this mail; I would appreciate it if you could test it
and make sure that all of the reported problems are fixed by it.

Also it appears that the bugs with server shutdown are currently thought to be
fixed upstream, but the fixes aren't easy to backport to the current stable
version. I'll have a closer look at this, but it may be that we will need to
wait for 1.1 to be released until this is properly fixed and just kill the
daemon from the init script until that happens.

I will also sort out the situation with the bcfg2 client init script, probably
by removing it and noting the removal in NEWS.Debian before uploading a new
version. 

PS. Please use the unified diff format (diff -u) when sending patches, it's
quite a bit more readable.

-- 
Arto Jantunen

--- bcfg2-server.orig	2010-03-30 10:20:45.0 +0300
+++ bcfg2-server	2010-08-02 15:45:21.291754838 +0300
@@ -1,112 +1,135 @@
-#!/bin/sh
-#
-# bcfg-server - Bcfg2 configuration daemon
-#
-# chkconfig: 2345 19 81
-# description: bcfg2 server for configuration requests
-#
+#! /bin/sh
 ### BEGIN INIT INFO
 # Provides:  bcfg2-server
-# Required-Start:$network $remote_fs $named
-# Required-Stop: $network $remote_fs $named
+# Required-Start:$network $remote_fs $named $syslog
+# Required-Stop: $network $remote_fs $named $syslog
 # Default-Start: 2 3 4 5
 # Default-Stop:  0 1 6
 # Short-Description: Configuration management Server
-# Description:   Bcfg2 is a configuration management system that builds
-#installs configuration files served by bcfg2-server
+# Description:   The server component of the Bcfg2 configuration management
+#system
 ### END INIT INFO
 
-# Include lsb functions
-. /lib/lsb/init-functions
+# Author: Arto Jantunen 
+
+# PATH should only include /usr/* if it runs after the mountnfs.sh script
+PATH=/sbin:/usr/sbin:/bin:/usr/bin
+DESC="Configuration management server"
+NAME=bcfg2-server
+DAEMON=/usr/sbin/$NAME
+PIDFILE=/var/run/$NAME.pid
+DAEMON_ARGS="-D $PIDFILE"
+SCRIPTNAME=/etc/init.d/$NAME
 
-# Commonly used stuff
-DAEMON=/usr/sbin/bcfg2-server
-PIDFILE=/var/run/bcfg2-server.pid
-PARAMS="-D $PIDFILE"
+# Exit if the package is not installed
+[ -x "$DAEMON" ] || exit 0
 
 # Disabled per default
 BCFG2_SERVER_OPTIONS=""
 BCFG2_SERVER_ENABLED=0
 
-# Include default startup configuration if exists
-test -f "/etc/default/bcfg2-server" && . /etc/default/bcfg2-server
+# Read configuration variable file if it is present
+[ -r /etc/default/$NAME ] && . /etc/default/$NAME
+
+# Load the VERBOSE setting and other rcS variables
+. /lib/init/vars.sh
+
+# Define LSB log_* functions.
+# Depend on lsb-base (>= 3.2-14) to ensure that this file is present
+# and status_of_proc is working.
+. /lib/lsb/init-functions
 
 if [ "$BCFG2_SERVER_ENABLED" -eq 0 ] ; then
  log_failure_msg "bcfg2-server is disabled - see /etc/default/bcfg2-server"
  exit 0
 fi
 
-# Exit if $DAEMON doesn't exist and is not executable
-test -x $DAEMON || exit 5
-
-# Internal variables
-BINARY=$(basename $DAEMON)
-
-start () {
-echo -n "Starting Configuration Management Server: "
-start_daemon ${DAEMON} ${PARAMS} ${BCFG2_SERVER_OPTIONS}
-STATUS=$?
-if [ "$STATUS" = 0 ]
-then
-log_success_msg "bcfg2-server"
-test -d /var/lock/subsys && touch /var/lock/subsys/bcfg2-server
-else
-log_failure_msg "bcfg2-server"
-fi
-return $STATUS
-}
-
-stop () {
-echo -n "Stopping Configuration Management Server: "
-killproc -p $PIDFILE ${BINARY}
-STATUS=$?
-if [ "$STATUS" = 0 ]; then
-  log_success_msg "bcfg2-server"
-  test -d /var/lock/subsys && touch /var/lock/subsys/bcfg2-server
-else
-  log_failure_msg "bcfg2-server"
-fi
-return $STATUS
+#
+# Function that starts the daemon/service
+#
+do_start()
+{
+	# Return
+	#   

Bug#591030: bcfg2-server init scripts are a complete mess

2010-07-31 Thread Tuomas Noraef
Subject: bcfg2-server init scripts are a complete mess
Package: bcfg2-server
Version: 1.0.1-2
Justification: renders package unusable
Severity: grave
Tags: patch

*** Please type your report below this line ***

Hi !

While evaluating bcfg2, I came across its
Debian init scripts, which are IMHO a complete mess. Sorry to use
these terms, but I am really being honest... I can hardly believe such
bugs (really release critical, I guess, as I can't imagine even
remotely using bcfg2, or, more simply, can't use it, with init scripts
in this state) have not been reported yet.



* To start with the "start" (sic) stanza :

- well, it works quite well on a standard machine, at boot as well as
manually. But I have no interest in running it on a standard machine :
I need it to be in an OpenVZ container (as long as LXC is still rough,
with a lack of commodity executables and other scripts, hardware
sharing, resource management, and so on - until all of that is better
in LXC, OpenVZ still rocks a lot - a lot more than KVM and such, which
I deem stupidly used most of the time, while containers do most
jobs better, more securely, and with fewer resources - I like KVM,
mind you... at most, on top of containerization, or for network
simulations - but for segregating services ? As ridiculous as an
H-bomb to kill a mosquito). And here comes the problem : it will
start manually in an OpenVZ container, but not at the container's
start-up (this one is a special case, of the "not so frequent to come
to figure out" kind, but still... annoying). Fiddling a bit, I came
across the fact that it seems to run too early. Maybe everything
starts quicker, or slower, or in a better way, or in a luckily worse
way, outside of an OpenVZ container (I don't know, and honestly, I
don't really care), but inside, it is not the case (what I know is
that in the test container I set up, a "ls /etc/rc2.d|grep 01" only
shows "S01bcfg2", "S01bcfg2-server", "S01bootlogs" and "S01rsyslog" as
being launched that early : so common sense hints at bcfg2-server
requiring rsyslog to be launched before it is spawned). Activating its
debug mode, I realized bcfg2-server would try to start, but fail, and
be killed immediately - unless it was run once syslog had been started
(verified through an "invoke-rc.d rsyslog stop && invoke-rc.d
bcfg2-server start" : bcfg2-server indeed fails to start) : hence, I
added a $syslog to the "Required-Start" field, as (new) Squeeze
(containers) use the dependency-based init system to determine the
running order at boot, and after an "update-rc.d bcfg2-server
defaults" and a container restart, voilà ! This way, bcfg2-server
runs just fine at the boot/init of an OpenVZ container.

- well... almost "voilà". In the event the "stop" stanza would work
(which it doesn't... but I'll explain that a bit further on), problems
could arise. Indeed, there is nothing in your script to protect
against launching the bcfg2-server if it is already running. Or, well,
if you do this while it already is, it seems to write bogus
information in the PID file : try this, starting with a stopped
server, through an "invoke-rc.d bcfg2-server start && cat
/var/run/bcfg2-server.pid && invoke-rc.d bcfg2-server start && cat
/var/run/bcfg2-server.pid && pidof -x /usr/sbin/bcfg2-server", and
you'll see the PID file gets renewed, but the PID of the running
"bcfg2-server" actually is... : surprise, it is not what is in the PID
file, but instead the one which initially existed (I guess the idiotic
"start_daemon" function from LSB starts a new server instance and
overwrites the PID file with its new PID, even though this server
seems to die for whatever reason, only leaving the original one
running... or something like that) - which is highly problematic if
one then wants to stop the server (which will necessarily happen at
some point, and stopping the server requires a correct PID file...)...
I guess this is another example, amongst a freaking multitude, of why
the LSB "init-functions" is a pile of buggy and, mostly, useless
crap... but, well, whatever : you seem to want to use it, so let us
try to cope with what you want... The problem is simply resolved by
testing whether the server is running before trying to launch it, and
silently reporting that everything went OK without doing anything if
it was already running (this also implies setting the PID variable
outside of the "status" function of the init script, as it is now also
used in the "start" function). Two problems solved... but this is far
from ending there.
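
The guard described above (check whether the server is already running
before launching it, and report success without doing anything if it
is) could look roughly like this in plain sh. The function name
guarded_start is hypothetical and this is only a sketch of the idea,
not the patch attached to this report.

```shell
#!/bin/sh
# Hypothetical helper: run a start command only if the pid file does
# not already point at a live process.
# Usage: guarded_start PIDFILE COMMAND [ARGS...]
guarded_start() {
    pidfile=$1
    shift
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
        case $pid in
            ''|*[!0-9]*)
                # Unusable pid file content: fall through and start.
                ;;
            *)
                # Already running: report success and leave the pid
                # file untouched.
                if kill -0 "$pid" 2>/dev/null; then
                    return 0
                fi
                ;;
        esac
    fi
    "$@"
}
```

Because the check happens before the daemon is spawned, a second
"start" never gets the chance to overwrite the PID file of the
instance that is actually running.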



* Now, for the "stop" stanza :

- there seems to be a recurring bug in bcfg2-server (see
http://trac.mcs.anl.gov/projects/bcfg2/ticket/709 - bug opened, then
closed, then re-opened, then re-closed, ... on and on), where the
server doesn't fully respond to a SIGTERM, because the
fam/gamin/whatever worker thread doesn't stop, while the server waits
indefinitely for it to terminate... which results in the "sto