Re: vote for systemd: Nay (now working but still Voting Nay)

Jean-Marc Pigeon Tue, 02 Jul 2013 13:58:36 -0700


Thanks Michal,
your answer was really positive and encouraged me to proceed further.


So I now have an FC18 running within a container under an EL6.4 HOST
with kernel 3.9.4 (big smile).

Problems started to unlock themselves once I decided to bypass
network.service altogether, starting network and sshd manually
(ifup lo; ifup eth0; /usr/sbin/sshd). I was then able to work in a
quiet room with multiple screens available to poke around and catch
fast-scrolling log messages.
(You should never forget about the poor sysadmin freezing in front of
the server room console when your software is reporting a problem and
is not able to run :-}.)

As expected, the problem came down to a very small detail (within /etc/fstab):

Not working
/vzgot          /               ext4    defaults        0 0
proc            /proc           proc    defaults        0 0
sysfs           /sys            sysfs   defaults        0 0
devpts          /dev/pts        devpts  defaults        0 0
tmpfs           /dev/shm        tmpfs   defaults        0 0

Working
#/vzgot         /               ext4    defaults        0 0
proc            /proc           proc    defaults        0 0
sysfs           /sys            sysfs   defaults        0 0
devpts          /dev/pts        devpts  defaults        0 0
tmpfs           /dev/shm        tmpfs   defaults        0 0
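
The difference between the two tables can be spotted mechanically. What
follows is only a minimal sketch, assuming the trouble is that the first
field of the root entry ("/vzgot") is a path that does not exist as a
device inside the container; the thread does not confirm that this is the
actual cause. The function name suspicious_entries and its exists
parameter are hypothetical, introduced here for illustration.

```python
import os


def suspicious_entries(fstab_text, exists=os.path.exists):
    """Return (spec, mount_point) pairs whose spec is a missing path.

    Assumption: an entry is "suspicious" when its first field looks like
    a path (starts with "/") but nothing exists at that path.
    """
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        spec, mount_point = fields[0], fields[1]
        # Pseudo-filesystems (proc, sysfs, ...) use a plain name, not a path.
        if spec.startswith("/") and not exists(spec):
            flagged.append((spec, mount_point))
    return flagged


NOT_WORKING = """\
/vzgot          /               ext4    defaults        0 0
proc            /proc           proc    defaults        0 0
sysfs           /sys            sysfs   defaults        0 0
devpts          /dev/pts        devpts  defaults        0 0
tmpfs           /dev/shm        tmpfs   defaults        0 0
"""

if __name__ == "__main__":
    for spec, mount_point in suspicious_entries(NOT_WORKING):
        print("check entry: %s mounted on %s" % (spec, mount_point))
```

Run inside the container against the real /etc/fstab content, this would
flag the root entry whenever /vzgot is not present there; with the line
commented out, nothing is flagged.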


The fact that systemd was not able to cope with this /etc/fstab is quite
acceptable (even if upstart and init have no problem with it). The fact
that such a small problem drives systemd into an emergency state without
reporting it clearly is another question. When the last prominent line
before the request for the maintenance password is
"Not able to exec /bin/plymouth, <no such file>",
you ask yourself what mess you are in.
The fact that the line just below says "Please see journal" while the
journal is not available (empty) just compounds the effect.

Once I was able to log in via remote SSH in emergency.service mode, I
played with different services, trying "ignore-dependencies", but never
got a clear message about what was missing.
Success was more a lucky guess than the result of a structured approach.

So, no, sorry, systemd does not rate as "production level" (not yet? or never?).

May I propose some ways to improve it:
- The journal should be accessible regardless of systemd status or trouble.
- When the list-dependencies output for a service is displayed, you should
   mark the dependencies already running (or not successfully started?);
   think about the poor sysadmin!
- You should have a way to proceed in a 'step by step' boot mode
   (avoiding the fast-scrolling parallel reports).

- On a more philosophical side:
   * Linking PID1 and systemd seems to me a problem (why it is
     mandatory still escapes me); you are limiting your troubleshooting
     context (double-check your design).
   * The fact that systemd is capturing more and more functionality in
     order to work should trigger a loud alarm signal about your design
     (did I understand today's mail correctly? You can't use logrotate
     to expire/archive the journal... :/ )


Bug:
- After a very quick check, there may be a bug in the way systemd
   handles 'int reboot(int cmd);'.
   I have the strong feeling systemd is not feeding WTERMSIG(status),
   but this is very preliminary; I could be wrong...
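
For context, reboot(2) documents that since Linux 3.4 a reboot() call
made inside a non-initial PID namespace terminates that namespace's init,
and wait() in the parent then reports SIGHUP for a restart request and
SIGINT for a halt or power-off request. The sketch below shows how a
container launcher could decode that status. It is only an illustration,
not vzgot's code; the function name classify_init_exit is hypothetical,
and Python's os.WTERMSIG is used here as a stand-in for the C macro.

```python
import os
import signal


def classify_init_exit(status):
    """Interpret the raw wait status of a container's init process."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        # Per reboot(2): SIGHUP means restart, SIGINT means halt/power-off.
        if sig == signal.SIGHUP:
            return "reboot requested"
        if sig == signal.SIGINT:
            return "halt requested"
        return "killed by signal %d" % sig
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown status"
```

A launcher would typically obtain status from os.waitpid(init_pid, 0)
and restart the container when the answer is "reboot requested". If init
never calls reboot() in the expected way, the launcher only ever sees a
plain exit code, which would match the symptom described above.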


As you requested, I can provide you with "vzgot", my container
application (which flavor/distribution RPM do you want?
The src.rpm is available too). While not a fork of LXC, I think vzgot is
very close to LXC in the way the container is started; the difference is
more about container definition. With vzgot, you just need
a DNS resolution (for the container's IPs) and a config_list linking the
container name to a distribution name, a template name and an
architecture. With that data, vzgot is able to create a running
container by itself. I tried to keep the container setup as lean,
simple and flexible as possible.

I put that project in sleep mode because a problem I reported 3 years ago
(a syslog+printk cross leakage between HOST and containers) seemed to
be very difficult to address within the kernel. But... very good
news yesterday! The problem is fixed in kernel 3.10.0; maybe
it is time to work on vzgot again?






Quoting Michal Schmidt <mschm...@redhat.com>:

On 07/02/2013 04:08 PM, Jean-Marc Pigeon wrote:

I was not expecting to have it fully working at the first attempt in my
own container design,


Would you be willing to provide some details about your container
design? Ideally including the code to allow others to reproduce the
problems you saw.

Have you seen these recommendations?:
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/

but I was expecting systemd (using the very detailed systemctl status) to
give me very good insight into issues which could occur.

The real goal was to learn how to use systemd components to diagnose a real
"in trouble" system, a kind of flight simulator exercise, so that we
would be ready in the future to do a quick diagnosis if one of our servers
in a rack had trouble booting or rebooting with EL7.


Interesting exercise, but I am afraid that by running it in a custom
container design, and under a host that itself is not using
systemd, you uncovered an entirely different class of problems than
what can happen when running it on the host.

This small exercise turned very ugly very quickly. I worked very hard,
trying all the tricks and bypasses I could think of to collect data. To
my dismay, I was unable to get predictable behaviour or reliable data
from systemd, even in the emergency.service mode.
After a while, I was forced to face it: systemd won't help me, not even
to start the system in a minimal mode.
I was not able to go beyond kernel level with systemd in control;
the services started were a total mess and the container was totally
locked up, with no exploitable data provided.


Not sure how much of it relates to container environments, but have
you seen this?:
http://freedesktop.org/wiki/Software/systemd/Debugging/

My first goal when debugging issues like this would be to make sure
I can see the debugging output of systemd itself (i.e. with
log_level set to debug and log_target to something I can read -
probably "console" in the case of a container).
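
As an illustration of the advice above, systemd(1) documents the
--log-level= and --log-target= command-line options (with
systemd.log_level= and systemd.log_target= as the kernel command-line
equivalents). A container launcher could pass them when it executes init.
This is only a sketch: the helper name debug_init_argv and the /sbin/init
path are assumptions, not something taken from vzgot or from this thread.

```python
def debug_init_argv(init_path="/sbin/init", level="debug", target="console"):
    """Build an argv that starts systemd with verbose logging.

    The init path is an assumption; adjust it to wherever the container's
    systemd binary (or its /sbin/init symlink) actually lives.
    """
    return [init_path, "--log-level=" + level, "--log-target=" + target]


if __name__ == "__main__":
    # Prints: /sbin/init --log-level=debug --log-target=console
    print(" ".join(debug_init_argv()))
```

Passing target="kmsg" instead would send the same messages to the dmesg
buffer, the other option mentioned in this thread.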

(Quickly: we had interesting situations within the noisy and cold server
room using the emergency.service console,
such as:
$ systemctl start systemd-journald.service
--> "unable to comply!" a dependency job for systemd-journald.service
failed, see journalctl -xn.)


This is when logging to "kmsg" (the dmesg buffer) or "console" can
really help find out the problem.

I ended up asking myself 'what part of this puzzle am I missing?'.
I dug around in Google about systemd and was stunned by the results: I
found my concerns had already been expressed multiple times, in more
talented words than mine, and as early as 2010. Since that time, it is
my understanding that systemd continuously tries to resolve problems
by increasing its complexity and extending its dependencies and its
centrality.

This is wrong, this is very, very wrong.
A program as complex as systemd can't be a mandatory PID1 in an open
environment such as UNIX.


From the above paragraphs I get the feeling you may be missing the
fact that not all of "systemd" runs in PID1. There are more
components in the "systemd" project, such as journald, logind, ...
- they run as separate processes. There is some ambiguity when
talking about "systemd". Sometimes it refers only to the service
manager (PID1), and sometimes to the whole suite.

BTW, and to go a little bit beyond the systemd case: since 1991,
FC18 is the very first distribution I was NOT successful in
installing on plain hardware.


I heard F19 was released today with an improved Anaconda :-)

Michal

--
See you soon
===========================================================
Jean-Marc Pigeon                        E-Mail: j...@safe.ca
SAFE Inc.                             Phone: (514) 493-4280
  Clement, 'a kiss solution' to get rid of SPAM (at last)
     Clement' Home base <http://www.clement.safe.ca>
===========================================================


-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
