Subject: bcfg2-server init scripts are a complete mess
Package: bcfg2-server
Version: 1.0.1-2
Justification: renders package unusable
Severity: grave
Tags: patch

*** Please type your report below this line ***

Hi !

While evaluating bcfg2, I came across its Debian init scripts, which
are IMHO a complete mess. Sorry to use these terms, but I am being
honest... I can hardly believe such bugs have not been reported yet :
I consider them release critical, as I simply cannot use bcfg2 with
its init scripts in this state.



***** To start with the "start" (sic) stanza :

- well, it works fine on a standard machine, at boot as well as
manually. But I have no interest in running it on a standard machine :
I need it in an OpenVZ container (LXC is still rough, lacking
convenience tools and scripts, hardware sharing, resource management,
and so on ; until all of that improves, OpenVZ still rocks a lot - a
lot more than KVM and such, which I deem stupidly used most of the
time, while containers do most jobs better, more securely, and with
fewer resources - I like KVM, mind you... at most on top of
containers, or for network simulations - but for segregating
services ? As ridiculous as an H-bomb to kill a mosquito). And here
comes the problem : bcfg2-server starts manually in an OpenVZ
container, but not at container start-up (a special case, not one you
run into often, but still... annoying). Fiddling a bit, I found that
it seems to run too early. Maybe things start quicker, or slower, or
in a luckily different order outside of an OpenVZ container (I don't
know, and honestly, I don't really care), but inside, it is not the
case : in the test container I set up, "ls /etc/rc2.d|grep 01" only
shows "S01bcfg2", "S01bcfg2-server", "S01bootlogs" and "S01rsyslog" as
being launched that early, so common sense hints that bcfg2-server
requires rsyslog to be running before it is spawned. Activating debug
mode, I saw that bcfg2-server tries to start, fails, and is killed
immediately, unless syslog has already been started (verified through
an "invoke-rc.d rsyslog stop && invoke-rc.d bcfg2-server start" :
bcfg2-server indeed fails to start). Hence, I added $syslog to
"Required-Start", as (new) Squeeze (containers) use the dependency
based init system to determine the running order at boot, and after an
"update-rc.d bcfg2-server defaults" and a container restart, voilà !
This way, bcfg2-server runs just fine at the boot/init of an OpenVZ
container.

- well... almost "voilà". Even if the "stop" stanza worked (which it
doesn't... but I'll explain that a bit further down), a problem could
arise. Indeed, nothing in your script protects against launching
bcfg2-server when it is already running, and doing so seems to write
bogus information to the PID file. Try this, starting with a stopped
server : "invoke-rc.d bcfg2-server start && cat
/var/run/bcfg2-server.pid && invoke-rc.d bcfg2-server start && cat
/var/run/bcfg2-server.pid && pidof -x /usr/sbin/bcfg2-server". You'll
see the PID file gets renewed, but the PID of the running
"bcfg2-server" is... surprise, not what is in the PID file, but the
one that existed initially (I guess the idiotic "start_daemon"
function from LSB starts a new server instance and writes its PID to
the PID file, even though this new server dies for whatever reason,
leaving only the original one running... or something like that). This
is highly problematic as soon as one wants to stop the server (which
will necessarily happen at some point), as stopping it requires a
correct PID file... I guess this is another example, amongst a
freaking multitude, of why the LSB "init-functions" is a pile of buggy
and mostly useless crap... but, well, whatever : you seem to want to
use it, so let us try to cope with it. The problem is simply solved by
testing whether the server is running before trying to launch it, and
silently reporting that everything went OK, without doing anything, if
it already was (this also implies setting the PID variable outside of
the "status" function of the init script, as the "start" function now
uses it too). Two problems solved... but this is far from ending
there.
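
For illustration, the guard described above could look roughly like
this. It is only a sketch under assumptions, not the attached patch :
the function names "is_running" and "do_start" are made up for the
example, /lib/lsb/init-functions is assumed to have been sourced
already (for "start_daemon"), and the PID lookup is done at call time
rather than once at the top of the script, so that a stop followed by
a start in the same shell does not reuse a stale value.

```shell
#!/bin/sh
# Sketch only. DAEMON is the full path quoted in the report; PARAMS and
# BCFG2_SERVER_OPTIONS are the variables the existing script already uses.
# start_daemon is expected to come from /lib/lsb/init-functions.
DAEMON=/usr/sbin/bcfg2-server

# Succeed if a process started from the full path $1 is running.
# Using the full path (not the base name) keeps the init script itself,
# which is also called "bcfg2-server", from matching.
is_running() {
    [ -n "$(pidof -x "$1")" ]
}

# Start the daemon only if it is not already running; report success
# either way, so that a second "start" is a harmless no-op.
do_start() {
    if is_running "${DAEMON}"; then
        return 0
    fi
    # Options are left unquoted on purpose so they split into words.
    start_daemon "${DAEMON}" ${PARAMS:-} ${BCFG2_SERVER_OPTIONS:-}
}
```

In the "start" stanza this would be called as "do_start; STATUS=$?".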



***** Now, for the "stop" stanza :

- there seems to be a recurring bug in bcfg2-server (see
http://trac.mcs.anl.gov/projects/bcfg2/ticket/709 - bug opened, then
closed, then re-opened, then re-closed, ... on and on), where the
server doesn't fully respond to a SIGTERM, because the
fam/gamin/whatever worker thread doesn't stop, while the server waits
indefinitely for it to terminate... which results in the "stop" stanza
not working. This can also be observed by running the server
undaemonized and trying to stop it with a ^C : the server will hang
there, and wait forever for the file alteration monitor to stop, which
never happens. I guess this is a case for another bug report, but
well : for now, using SIGKILL instead of SIGTERM in the stop stanza
does the trick. The fam worker thread will remain active once
"bcfg2-server" is killed, but this is a relatively minor problem, as
it will be killed, and another spawned in its stead, if bcfg2-server
is relaunched afterwards. So until you fix the bug of this worker
never obeying the SIGTERM, SIGKILL seems an appropriate workaround to
me - not an ideal solution, but still better than a stop script that
doesn't work. Especially since the "stop" stanza leaves a process
half-dead forever, and the "restart" stanza calls the "stop" one : a
"restart" results in two living "bcfg2-server" processes running
concurrently, and adds one more each time it is reissued, which can
seriously wreak havoc, as the PID file gets overwritten, the logs get
filled with incoherent messages, and so on (this is by the way already
solved by my patch of the "start" function, which prevents several
instances of bcfg2-server from running at once, but the wreckage
happens with your unpatched script : try it, not funny at all)... Plus
it only takes adding a "9" at the end of the "killproc" call (you have
to read the LSB "init-functions" to figure this out, as it is mostly
undocumented... sigh). It is by the way quite interesting to observe
the difference between issuing the "start" function several times
without calling the "stop" one, and with calling it. I think that,
without "stop", the initially running "bcfg2-server" makes the
additional ones you try to spawn kill themselves, as they realize one
of them is already serving. But with "stop", the initially running
server enters a state where it begins to shut down (observed in its
logs, if you activate them), which allows new ones to be spawned, as
the initial one is not really serving anything anymore, but only waits
for its worker to stop (which, as I said, never happens). The
additional half-dead processes are then processes waiting indefinitely
for file alteration monitoring to stop, but not serving anything -
zombies, in other words...

- now, this implies one problem : the "killproc" function only removes
the PID file if a TERM signal was requested ; if a KILL signal is
requested, it does not clean the PID file... Seriously, I can't
believe what a pile of stupid, ridiculous and useless crap this LSB
"init-functions" is ! If a TERM signal is sent, the daemon is not
stopped, as things stand, and the PID file is not cleaned. But if a
KILL signal is sent, a logical test inside this "init-functions"
stupid crap prevents the PID file from being cleaned... yeah, "right,
Indian companion" : very useful wrappers indeed (or, most definitely,
not)... Well, it is up to you, the maintainer, to choose : as the PID
file never gets cleaned either way, do you want the "stop" stanza to
actually stop the daemon, or never do it ? I guess SIGKILL is still
the least obtrusive way to sort things out for now : once you have
patched "bcfg2-server" so that it manages to pass the SIGTERM on to
its FAM/gamin/... worker thread, you can revert to SIGTERM by removing
the "9" I added to the killproc call, and the PID file will be managed
properly. For now, one way or the other, that is impossible, so...

- another problem is that if the server was not running when the
"stop" stanza is called, an error message from the "start-stop-daemon"
spawned by the "init-functions" wrapper is printed to the terminal,
yet no error code results, as the killproc function exits cleanly
(told you : "init-functions" is a pile of crappy stupidity...). This
is a bit incoherent : either it should not display this error message,
or it should log it. So I think it would be a bit better to test
whether the server is running before trying to stop it, and not raise
an error if it was already terminated (which seems to be the usual way
in Debian). I do this in a way similar to the one I used to solve the
second problem of the "start" function.
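
As an illustration of the three points above, here is a rough sketch
of a "stop" that does nothing when no server is running, tries SIGTERM
first, falls back to SIGKILL after a delay, and removes the PID file
by hand. Note that this swaps in plain kill(1) and a TERM-then-KILL
escalation in place of the "killproc ... 9" call used in the attached
patch ; the function names "stop_with_escalation" and "do_stop" and
the 5 second delay are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: plain kill(1) instead of the LSB killproc wrapper.
# PIDFILE is the path quoted in the report.
PIDFILE=/var/run/bcfg2-server.pid

# Send TERM to PID $1, wait up to $2 seconds (default 5), then send KILL.
# Note: POSIX sh has no local variables, so pid/timeout are globals.
stop_with_escalation() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0      # nothing to stop
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0     # exited after TERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null                  # TERM was not enough
    return 0
}

# Stop every PID listed in $1 (which may be empty), then drop the PID
# file ourselves, since nothing else removes it after a KILL.
do_stop() {
    for p in $1; do
        stop_with_escalation "$p" 5
    done
    rm -f "${PIDFILE}"
    return 0
}
```

It would be called as "do_stop "$(pidof -x /usr/sbin/bcfg2-server)"",
and exits silently with status 0 if the list is empty. The "--retry"
option of start-stop-daemon (for instance "--retry TERM/5/KILL/5")
offers a similar escalation without hand-written loops.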



***** As for the "status" stanza :

- well, it just doesn't work, or works too well : whatever happens, it
always says bcfg2-server is running, even when it obviously is not.
Guess why ? Plain and simple : you use /bin/pidof to find a process
whose base name is "bcfg2-server". And guess what this init script is
called ? You're right : "bcfg2-server"... Ask a script whether a
process named "more or less" like the script itself is running : it
will of course always answer that one is, as IT is running...
seriously ? Using the daemon's full path in the pidof call instead of
its base name simply solves the problem...

- otherwise, with the (absolutely necessary, for now !) SIGKILL to
stop the server, the "$BINARY dead but pid file exists..." message is
plain useless, as this file is never cleaned without a SIGTERM
request. So, until you patch the server to work with a SIGTERM, I
commented it out.

- now, that doesn't mean this check is not buggy either : you test
whether the PID file exists (good), and whether the process is running
(bad) - you should test whether it is NOT running, should you need to
test anything. But actually, you already tested that in the lines just
above : had bcfg2-server been running, the function would already have
returned with exit status 0, so there is no use in testing it again,
it is implicitly a known fact. So I removed this "[ -n $PID ]" test,
which should rather have been "[ -z $PID ]", but is useless anyway,
even if the server responded appropriately to a SIGTERM...
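
To make the intended logic concrete, here is a small sketch of the
"status" decision using the exit codes LSB defines for that action (0
running, 1 dead but PID file exists, 3 not running). The function name
"status_code" is made up for the example, and this is not the code of
the attached patch, which simply comments the middle case out.

```shell
#!/bin/sh
# Sketch only: map "PIDs found" and "PID file present" to an LSB-style
# status exit code. $1 is the (possibly empty) PID list as returned by
# pidof, $2 is the path of the PID file.
status_code() {
    if [ -n "$1" ]; then
        return 0        # running
    elif [ -f "$2" ]; then
        return 1        # dead, but the PID file still exists
    else
        return 3        # not running
    fi
}
```

It would be called as "status_code "$(pidof -x /usr/sbin/bcfg2-server)"
/var/run/bcfg2-server.pid". Once the first branch has returned, the
PID list is known to be empty, so the second branch only needs to look
at the PID file ; that branch is only informative if something
actually removes the PID file on a clean stop.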



***** And finally, for the "restart" stanza :

- minor thing : the 5 second sleep is not useful, certainly not with a
SIGKILL, which leaves nothing to wait for, and no more with a SIGTERM
that works correctly. If it works, it works : no need for Black Arts,
then. So I removed this useless sleep.



Hope you'll patch all of this, as I really think (most of) this is
serious (basically, none of the defined init stanzas works
correctly : "start" runs too early and wreaks havoc in the PID file if
issued repeatedly, "stop" never works and, like "restart", only makes
it possible to stack server processes on top of each other, while
"status" always reports the same thing whatever the circumstances). A
diff is of course attached (half the size of the original file !
Ahemmm...).

Haven't had the time yet to look much into the bcfg2 client, but what
I saw made me lose any hope about its init scripts - I may file
another bug for those, but, well, I'll start telling you here : there
is no support anymore for a persistent agent mode in bcfg2, starting
from version 1.0... Maybe you did not notice, but your client init
scripts are now pretty much useless and non-working (apart from the
"one-time" run at init... but I think this could as well be done with
custom thingies in rc.local - you could as well remove them, leaving
you more time to maintain the server ones : [friendly] pun intended
:p ). The preferred ways to run the client now seem to be SSH
triggering or cron (don't know yet what I'll do).

And, finally, I hope you will not resent the tone (a bit grumpy, I
confess) of my message, but the init scripts you maintain (I don't
know if you wrote them, and, well, whatever : I am pointing the finger
at scripts, not at people) are a real mess (I spent a day trying to
understand what I was doing wrong, while, well... I was mostly right,
I guess), and the use of those debilitated and undocumented LSB
"init-functions" can really get on the nerves of anyone who needs to
look at them... Which is a bit of a shame, as bcfg2 seems so good (I
have come to purely hate cfengine and puppet : using those, try
managing the repository ACLs for the various unique private keys to be
pulled by hundreds and hundreds of clients... not to speak of the
proprietary extensions providing the database pull I need so much,
while extending the abilities of the software is a real pain, or of
the ruby behemoth's resource consumption while most of the work has to
be done inside each and every client - strategy and design flaws
incarnate : such a payoff, for _management_ apps... What I've read
about bcfg2 really makes it look like heaven to me [apart from the
need to touch XML, OK, right...], and I really think it deserves
better advertisement, or at least a chance, which its Debian init
scripts certainly do not offer as they are now).

Farewell.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-openvz-amd64 (SMP w/4 CPU cores)
Shell: /bin/sh linked to /bin/dash

Versions of packages bcfg2-server depends on:
ii  bcfg2                       1.0.1-2      Configuration management client
ii  gamin                       0.1.10-2+b1  File and directory monitoring syst
ii  libxml2-utils               2.7.7.dfsg-4 XML utilities
ii  lsb-base                    3.2-23.1     Linux Standard Base 3.2 init scrip
ii  openssl                     0.9.8o-1     Secure Socket Layer (SSL) binary a
ii  python                      2.6.5-5      An interactive high-level object-o
ii  python-fam                  1.1.1-2.2+b1 Python interface to FAM
ii  python-gamin                0.1.10-2+b1  Python binding for the gamin clien
ii  python-lxml                 2.2.6-1      pythonic binding for the libxml2 a
ii  python-support              1.0.9        automated rebuilding support for P
ii  ucf                         3.0025       Update Configuration File: preserv

Versions of packages bcfg2-server recommends:
pn  graphviz                      <none>     (no description available)

Versions of packages bcfg2-server suggests:
pn  mail-transport-agent          <none>     (no description available)
pn  python-cheetah                <none>     (no description available)
pn  python-django                 <none>     (no description available)
pn  python-genshi                 <none>     (no description available)
pn  python-profiler               <none>     (no description available)
pn  sqlalchemy                    <none>     (no description available)
10,11c10,11
< # Required-Start:    $network $remote_fs $named
< # Required-Stop:     $network $remote_fs $named
---
> # Required-Start:    $network $remote_fs $named $syslog
> # Required-Stop:     $network $remote_fs $named $syslog
43a44
> PID=$(pidof -x ${DAEMON})
47,48c48,53
<     start_daemon ${DAEMON} ${PARAMS} ${BCFG2_SERVER_OPTIONS}
<     STATUS=$?
---
>     if [ -z "${PID}" ]; then
>       start_daemon ${DAEMON} ${PARAMS} ${BCFG2_SERVER_OPTIONS}
>       STATUS=$?
>     else
>       STATUS=0
>     fi
61,62c66,71
<     killproc -p $PIDFILE ${BINARY}
<     STATUS=$?
---
>     if [ -n "${PID}" ]; then
>       killproc -p $PIDFILE ${BINARY} 9
>       STATUS=$?
>     else
>       STATUS=0
>     fi
74d82
<     PID=$(pidof -x $BINARY)
80,85c88,95
<     if [ -f $PIDFILE ]; then
<       if [ -n "$PID" ]; then
<         log_failure_msg "$BINARY dead but pid file exists..."
<         return 1
<       fi
<     fi
---
> # This section will only be useful once the bug that makes the server respond
> # fully to a SIGKILL only, and never to a SIGTERM, has been fixed... but for
> # now it only spawns unnecessary errors, and returns already expected
> # information...
> #    if [ -f $PIDFILE ]; then
> #      log_failure_msg "$BINARY dead but pid file exists..."
> #      return 1
> #    fi
103d112
<         sleep 5
