Re: s6 bites noob

2019-02-05 Thread Laurent Bercot

just take this as a data sample for what can happen when a random noob tries to 
use s6.


Although unpleasant (not gonna lie), it was a very useful user
experience report, thank you. Among other things, it strengthens my
belief that a user interface layer on top of s6 + s6-rc + s6-linux-init
is the way to go - a layer that makes things Just Work even when users
don't do everything perfectly, and with friendlier behaviour in case
of an error. People will still be able to look under the hood and
tweak things manually, but they won't have to, and they won't be
exposed to the nuts and bolts unless they want to.

Also, just in case someone tries the latest s6 / s6-rc git head:
I have added "uid/self" and "gid/self" key checking in the accessrules
library, for when the client runs with the same euid / the same egid
as the server; and I have changed s6-rc-compile to use the 
functionality, removing its -u and -g options in the process. So now,
the behaviour
should always be consistent: the user who can operate a s6-rc database
is always the user who owns the supervision tree. No exceptions.
root can also use s6-rc commands, but services will still run as the
user who owns the supervision tree.
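
For illustration: the accessrules library encodes permissions as files
in a rules directory, so a layout granting access to same-euid clients
might look something like this (a hypothetical sketch, not copied from
the s6 docs):

rules/uid/self/allow    # empty file: clients with the server's euid pass
rules/uid/default/deny  # everyone else is rejected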

A numbered release of s6 and s6-rc (and lots of other packages) will
happen some time next month.



BTW, your explanations of why things are designed the way they are were helpful 
for understanding the system. I recommend copying them into the docs.


 I should write a "rationale / policy recommendation" section in the
documentation pages; that is a good idea.

--
 Laurent



Re: s6 bites noob

2019-02-05 Thread Kelly Dean


Laurent Bercot writes:
>>Anyway, recompile with -u 1000, re-update, and try again. Now, I can't even 
>>do s6-rc -a list; I get:
>>s6-rc fatal: unable to take locks: Permission denied
>
> Hmmm, that's weird. If all the previous operations have been done as
> the same user, you should never get EPERM. Have you run something as
> root before?

Indeed, I did. My command history from last night shows that before I 
remembered to try compiling with -u 1000, I tried sudo s6-rc change testpipe, 
after the previous non-sudo invocation failed with a permission error, so that 
must be what screwed it up. I don't remember doing that. Must have been really 
tired and frustrated.

So I killed svscan, removed my compiled databases and scan and live dirs, and 
started from scratch. Now s6-rc succeeds, but when I brought up testpipe (two 
daemons funneling to a logger), I got once per second:
fdclose: fatal: unable to exec ./run.user: Exec format error

Oops, I forgot #!/bin/bash at the top of one of the run files. (Would have been 
helpful if the error message had specified which one.) Fix that, recompile, 
make new link, do an update, try again. Now:
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-r-logger: 
Broken pipe
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-w-logger: 
Broken pipe
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-r-logger: 
Connection reset by peer
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-w-logger: 
Connection reset by peer

It also somehow managed to hose the terminal in which svscan was running. As 
in, when I try to type in it, only a small percentage of the letters actually 
appear. Killed svscan, tried to reset the terminal, no luck. This is the first 
time I remember ever getting an un-resettable terminal. No problem, I can just 
kill the terminal, but... weird.

Oops, after I added the forgotten #!/bin/bash, I forgot -u 1000 again when I 
recompiled. So, the failure should be expected, but hosing the terminal? 
Really? And the error messages give no hint of what's actually wrong, unless 
you're familiar with the internal design of s6, which seems an excessive burden 
for a mere user. I guess I'm spoiled by modern C compilers, which have become 
excellent in the past few years at explaining in exquisite detail exactly in 
which way I'm currently being an idiot.

So, remove the compiled databases and scan directory, recompile with -u 1000, 
restart svscan, re-run s6-rc-init, try testpipe again, and... success! Wow, 
that was unexpected. I'd become conditioned to expect failure.

Ok now, quick, while I remember how to use s6, I'll install it into my project 
and make sure it works perfectly, so I never have to touch it again. There are 
other things I'd be curious to try with it too, but I shouldn't keep pestering 
you and the mailing list for unpaid tech support, so I guess just take this as 
a data sample for what can happen when a random noob tries to use s6.

BTW, your explanations of why things are designed the way they are were helpful 
for understanding the system. I recommend copying them into the docs.


Re: s6 bites noob

2019-02-04 Thread Laurent Bercot




But run not existing when supervise starts is a different case from run 
disappearing after supervise is already running.


No, it's not.
First, s6-supervise starts and initializes the state of the service
to "down, wanted up" (unless there's a ./down file in which case
the service isn't wanted up).
Then, s6-supervise enters its main loop, where it sees that the
service is down + wanted up, so it tries to start it. If there is
no run file, it's a temporary failure. No matter whether or not
it's the first time it tries.

Adding a special case where s6-supervise aborts when it tries to
start ./run for the first time would make the code more complex,
especially when you try to answer questions such as "what do I do
if there is a down file and the service is started later" (which
is the case when s6-supervise is driven by s6-rc), or "what do I do
when ./run exists but is non-executable", or other questions in the
same vein. And the end benefit of having such a special case would
be very dubious.
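
As a minimal sketch of that loop (the real s6-supervise is an
event-driven, nonblocking state machine in C, not a shell script):

#!/bin/sh
test -e ./down && wanted=down || wanted=up  # initial state: down, wanted up unless ./down exists
while : ; do
  if [ "$wanted" = up ] ; then
    ./run || echo "warning: unable to spawn ./run" >&2  # missing ./run: warn and retry,
  fi                                                    # first iteration or not
  sleep 1  # the real program waits on events, not a fixed delay
done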



Another example of orneriness: supervise automatically does its own 
initialization, but the s6-rc program (not the eponymous suite) doesn't. 
Instead, the suite has a separate init program, s6-rc-init, that's normally run 
at boot time. But if it isn't run at boot time (which is a policy decision), 
s6-rc doesn't automatically run it if necessary. If rc shouldn't 
auto-initialize, neither should supervise.


Now you are just acting in bad faith. Different programs, with
different goals, obviously have different requirements and different
behaviours; you can't seriously suggest that apples should behave
like oranges.

If a system uses s6-rc as its service manager, then early on in the
boot process, it *will* run s6-rc-init. That's not so much a policy
decision as a s6-rc mechanism. It's running a command line program
in the system initialization sequence, it's not exactly difficult
or convoluted.
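
As a sketch, an init sequence using s6-rc might contain something like
this (the paths and the "ok-all" bundle name are hypothetical):

s6-svscan /run/service &                              # start the supervision tree
s6-rc-init -c /etc/s6-rc/compiled -l /run/s6-rc /run/service
s6-rc -l /run/s6-rc -u change ok-all                  # bring up the default bundle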



Another one: the -d option to s6-rc is overloaded. When used with change, it 
means to down the selected services. But when used with list, it means to 
invert the selection. I'm going to repeatedly forget this.


You know what's interesting? It initially did not do this. And then
people complained that the behaviour wasn't intuitive, and that
s6-rc -d list *should* invert the selection. I thought about it and
realized that it made sense, so I implemented it.
I guess you can't make everyone happy.
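
For reference, the two meanings side by side (the same invocations
appear elsewhere in this thread):

s6-rc -d change foo  # with change: bring foo down
s6-rc -a list        # list all active services
s6-rc -da list       # with list: -d inverts the selection, i.e. list inactive services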



One more: the doc for the s6-rc program says it's meant to be the one-stop shop of 
service management, after compilation and initialization are done. It has subcommands 
list, listall, diff, and change. But s6-rc-update is a separate program, not a subcommand 
of s6-rc. I suppose there's a reason for this, but it complicates the user interface with 
a seemingly arbitrary distinction of whether to put a dash between "s6-rc" and 
the subcommand depending on what the particular subcommand is.


The s6-rc command operates the service management engine, relying on
a stable compiled service database. It will read the database and
perform operations in that context. It is the command to use when
querying the database, and starting/stopping services, in a normal
production environment.

The s6-rc-update command does not fit in that model. It is an
administration command: it changes the context in which s6-rc
runs. It is not used as commonly; it is heavier and more dangerous.
It switches databases!
Think atomically regenerating the openrc cache after modifying
your services in /etc/init.d. (OpenRC offers no such thing, and
it's a mess reliability-wise.) This is a fundamentally different
thing from running the engine.
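
A sketch of the administration cycle being described, with hypothetical
paths:

s6-rc-compile /etc/s6-rc/compiled-new /etc/s6-rc/source  # compile the edited source
s6-rc-update /etc/s6-rc/compiled-new                     # atomically switch the live system
s6-rc -u change myservice                                # day-to-day engine operation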



The docs advise entirely copying the service repository to a ramdisk, then 
using (a link to) the copy as the scan directory. This makes the running system 
independent of the original repo. But the doc for s6-rc-init says the rc system 
remains dependent on the original compiled database, and there's no explanation 
of why it isn't also copied in order to make the running system
independent.


The point of operating off a copy of some data is that system operation
won't be disturbed when the user modifies the original data, until
they decide to commit/flush a batch of changes - which the system
should then pick up as atomically as possible.

A service directory is data that the user can modify. That is
why it is better to run s6-supervise on a copy of a service directory
(separate "live" data from "stock" data).

The compiled database is already a copy. It's not data that the
user modifies (except potentially for adding or removing bundles,
but those are a layer on top of the core database, which remains
untouched). What the user will modify is the source directory, which
will be compiled into a different database, and changes will not
happen to the live system until the user calls s6-rc-update, which
is the atomic commit/flush operation.




Re: s6 bites noob

2019-02-03 Thread Kelly Dean


Laurent Bercot writes:
> foo/run not existing is a temporary error condition that can happen
> at any time, not only at the start of s6-supervise. This is a very
> different case: the supervisor is already running and the user is
> relying on its monitoring foo.

But run not existing when supervise starts is a different case from run 
disappearing after supervise is already running. Even though supervise should 
continue running if run disappears, that doesn't imply that it shouldn't abort 
on startup if run doesn't exist in the first place.

Another example of orneriness: supervise automatically does its own 
initialization, but the s6-rc program (not the eponymous suite) doesn't. 
Instead, the suite has a separate init program, s6-rc-init, that's normally run 
at boot time. But if it isn't run at boot time (which is a policy decision), 
s6-rc doesn't automatically run it if necessary. If rc shouldn't 
auto-initialize, neither should supervise.

Another one: the -d option to s6-rc is overloaded. When used with change, it 
means to down the selected services. But when used with list, it means to 
invert the selection. I'm going to repeatedly forget this.

One more: the doc for the s6-rc program says it's meant to be the one-stop shop 
of service management, after compilation and initialization are done. It has 
subcommands list, listall, diff, and change. But s6-rc-update is a separate 
program, not a subcommand of s6-rc. I suppose there's a reason for this, but it 
complicates the user interface with a seemingly arbitrary distinction of 
whether to put a dash between "s6-rc" and the subcommand depending on what the 
particular subcommand is.

The docs advise entirely copying the service repository to a ramdisk, then 
using (a link to) the copy as the scan directory. This makes the running system 
independent of the original repo. But the doc for s6-rc-init says the rc system 
remains dependent on the original compiled database, and there's no explanation 
of why it isn't also copied in order to make the running system independent.

I tried to test the logger. Set up a service repo with foo and bar, each with a 
run like
#!/bin/bash
echo foo starting
sleep 2
echo foo dying

foo and bar are funneled to a logger that has this run file:
s6-log -l 1 s100 T /home/user/testlogs

Try to start the bundle. Hangs. Press ^C. Get:
s6-rc: warning: unable to start service s6rc-fdholder: command crashed with 
signal 2.

Ok, Colin Booth mentioned permission issues when running as non-root. It 
shouldn't be a problem, since all of this (including svscan) is running as the 
same user. Permission problems should only come into play when trying to do 
things inter-user. Anyway, I checked the s6-rc-compile doc. Looks like -h won't 
be necessary, since it defaults to the owner of the svscan proc. But -u is 
needed, since it defaults to allowing only root--even though I've never run any 
of this as root, and I've never asked it to try to do anything as root, and 
I've never told it that it should expect to be root, or even mentioned root at 
all.

And I'm not really sure the doc is right, because it says -u controls who's 
allowed to start and stop services, yet I've already used rc to start and stop 
regular (longrun) services as my non-root user before, with no problem (I had a 
problem only with oneshot), even though the doc says that since I didn't 
compile with -u, it should have disallowed that.
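
For concreteness, the recompile in question looks something like this,
with hypothetical paths (-u takes a comma-separated list of allowed
uids):

s6-rc-compile -u 1000 ~/s6rc/compiled-v2 ~/s6rc/source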

Anyway, recompile with -u 1000, re-update, and try again. Now, I can't even do 
s6-rc -a list; I get:
s6-rc fatal: unable to take locks: Permission denied

Maybe I missed an essential step, and screwed something up? I'm bewildered, 
tired, and going to bed. After reading more of the docs than I expected to be 
necessary, I'm still unable to get s6 to do the basic job I need: manage a 
small group of services, and funnel and log their output. It's especially 
frustrating having to fight with software that generates gratuitous intra-user 
permission errors.

I'll try again in the morning, with replenished willpower.


Re: s6 bites noob

2019-02-03 Thread Laurent Bercot

s6-supervise aborts on startup if foo/supervise/control is already open, but 
perpetually retries if foo/run doesn't exist. Both of those problems indicate 
the user is doing something wrong. Wouldn't it make more sense for both 
problems to result in the same behavior (either retry or abort, preferably the 
latter)?


foo/supervise/control being already open indicates there's already a
s6-supervise process monitoring foo - in which case spawning another
one makes no sense, so s6-supervise aborts.

foo/run not existing is a temporary error condition that can happen
at any time, not only at the start of s6-supervise. This is a very
different case: the supervisor is already running and the user is
relying on its monitoring foo. At that point, the supervisor really
should not die, unless explicitly asked to; and "nonexistent foo/run"
is perfectly recoverable, you just have to warn the user and try
again later.

It's simply the difference between a fatal error and a recoverable
error. In most simple programs, all errors can be treated as fatal:
if you're not in the nominal case, just abort and let the user deal
with it. But in a supervisor, the difference is important, because
surviving all kinds of trouble is precisely what a supervisor is
there for.



https://cr.yp.to/daemontools/supervise.html indicates the original version of
supervise aborts in both cases.


That's what it suggests, but it is unclear ("may exit"). I have
forgotten what daemontools' supervise does when foo/run doesn't
exist, but I don't think it dies. I think it loops, just as
s6-supervise does. You should test it.



 I also don't understand the reason for svscan and supervise being different. 
Supervise's job is to watch one daemon. Svscan's job is to watch a collection 
of supervise procs. Why not omit supervise, and have svscan directly watch the 
daemons? Surely this is a common question.


You said it yourself: supervise's job is to watch one daemon, and
svscan's job is to watch a collection of supervise processes. That is
not the same job at all. And if it's not the same job, a Unix guideline
says they should be different programs: one function = one tool. With
experience, I've found this guideline to be 100% justified, and
extremely useful.
Look at s6-svscan's and s6-supervise's source code. You will find
they share very few library functions - there's basically no code
duplication, no functionality duplication, between them.

Supervising several daemons from one unique process is obviously
possible. That's for instance what perpd, sysvinit and systemd do.
But if you look at perpd's source code (which is functionally and
stylistically the closest to svscan+supervise) you'll see that
it's almost as long as the source code of s6-svscan plus s6-supervise
combined, while not being a perfectly nonblocking state machine as
s6-supervise is.

Combining functionality into a single process adds complexity.
Putting separate functionality in separate processes reduces
complexity, because it takes advantage of the natural boundaries
provided by the OS. It allows you to do just as much with much less
code.



I understand svscan must be as simple as possible, for reliability, because it 
must not die. But I don't see how combining it with supervise would really make 
it more complex. It already has supervise's functionality built in (watch a 
target proc, and restart it when it dies).


No, the functionality isn't the same at all, and "restart a process
when it dies" is an excessively simplified view of what s6-supervise
does. If that was all there is to it, a "while true ; do ./run ; done"
shell script would do the job; but if you've had to deal with that
approach once in a production environment, you intimately and
painfully know how terrible it is.
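
For the record, that naive loop, with some of its failure modes
annotated (a sketch of the anti-pattern, not a recommendation):

#!/bin/sh
while true ; do
  ./run        # no readiness notification, no reliable way to signal it,
               # and if ./run backgrounds itself, the loop loses track of it
  sleep 1      # without this, a ./run that fails instantly spins the CPU
done           # and killing this script does not kill a running ./run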

s6-svscan knows how s6-supervise behaves, and can trust it and rely
on an interface between the two programs since they're part of the
same package. Spawning and watching a s6-supervise process is easy,
as easy as calling a function; s6-svscan's complexity comes from the
fact that it needs to manage a *collection* of s6-supervise
processes. (Actually, the brunt of its complexity comes from supporting
pipes between a service and a logger, but that's beside the point.)

On the other hand, s6-supervise does not know how ./run behaves, can
make no assumption about it, cannot trust it, must babysit it no matter
how bad it gets, and must remain stable no matter how much shit it
throws at you. This is a totally different job - and a much harder job
than watching a thousand nice, friendly s6-supervise programs.
Part of the proof is that s6-supervise's source code is bigger than
s6-svscan's.

By all means, if you want a single supervisor for all your services,
try perp. It may suit you. But I don't think having fewer processes
in your "ps" output is a worthwhile goal: it's purely cosmetic, and
you have to balance that against the real benefits that separating
processes provides.

--
Laurent



Re: s6 bites noob

2019-02-03 Thread Kelly Dean


Laurent Bercot writes:
> It is impossible to portably wait for the appearance of a file.
> And testing the existence of the file first, before creating the
> subdirs, wouldn't help, because it would be a TOCTOU.

s6-supervise aborts on startup if foo/supervise/control is already open, but 
perpetually retries if foo/run doesn't exist. Both of those problems indicate 
the user is doing something wrong. Wouldn't it make more sense for both 
problems to result in the same behavior (either retry or abort, preferably the 
latter)?

https://cr.yp.to/daemontools/supervise.html indicates the original version of
supervise aborts in both cases.

I also don't understand the reason for svscan and supervise being different. 
Supervise's job is to watch one daemon. Svscan's job is to watch a collection 
of supervise procs. Why not omit supervise, and have svscan directly watch the 
daemons? Surely this is a common question.

I suppose supervise on its own might be convenient during testing, to have a 
lone supervise proc watching a daemon. But this could be done just as well with 
a combined svscan-supervise, with the daemon being the only entry in the 
collection of watched procs.

I understand svscan must be as simple as possible, for reliability, because it 
must not die. But I don't see how combining it with supervise would really make 
it more complex. It already has supervise's functionality built in (watch a 
target proc, and restart it when it dies).


Re: s6 bites noob

2019-01-31 Thread Colin Booth
On Fri, Feb 01, 2019 at 04:18:50AM +, Kelly Dean wrote:
> Thanks for the fix. Longrun works now, though oneshot still fails, this time 
> with a different message:
> s6-sudoc: fatal: connect to the s6-sudod server - check that you have 
> appropriate permissions.
> 
> I guess that's related to my running all this (including svscan) as non-root. 
> s6rc-oneshot-runner is running now, though.
> 
> Should I run it as root? But then you'll be able to erase a lot more than 
> just the contents of my home dir. ;-)
It's actually that you need to run your s6-rc call as an allowed user.
See the s6-rc-compile -u and -v options. But more about this in a sec.
> 
> I do prefer that my software recognize that I'm an idiot, and refuse to do 
> dubious things unless I specify some --force option. I've been saved 
> countless times by programs designed with users' mental frailty in mind, and 
> bitten countless times by the opposite.
It actually is recognizing that you're an idiot :) At least, it's
recognizing that you've misconfigured something. The s6-sudo program
connects to a s6-sudod socket (really an s6-ipcserverd socket, but
that's an implementation detail) and sends its argv to s6-sudod.
Anyway, s6-ipcserver does ACL checks, and the problem that you're
running into is that you haven't set your rules correctly and
s6-ipcserver-access is giving you the finger.
> 
> The doc for rc says its diff's view diverges from s6's view only when the 
> service fails permanently. I suggest adding there that downing the service 
> using svc instead of rc qualifies as a permanent failure from rc's point of 
> view. I guess this also means that if rc is used, then svc isn't supposed to 
> be part of the normal user interface.
That requires that s6-rc be permanently running and monitoring a lot of
stuff. As an aside, it's actually very, very handy that you can fake out
s6-rc and make changes to stuff temporarily without having to deal with
the state engine. Don't get me wrong, the state engine is really nice,
but it's a pretty heavy hand sometimes.
> 
> In the docs, I see no way to ask svc whether a service is up, or ask 
> svscanctl which services are up. But obviously rc must be able to ask, in 
> order to do the diff. I also see no straightforward way to ask rc whether a 
> particular service is up, other than
> s6-rc -a list | grep "^servicename$"
To get the actual state of the system I use:
`for svc in /path/to/scandir/* ; do s6-svstat "$svc" ; done' 

For the state of the system as far as the state engine is concerned:
`s6-rc -a list' or `s6-rc -da list' depending on what I'm going for.
> 
> If inotify were portable, would you still consider svscanctl -a to be the 
> best design, or would you omit the -a option and auto-rescan when the scan 
> directory changed?
`s6-svscanctl -an' is by far the nicest mechanism, not just because of
the portability, but also because it lets you stage changes and then
kick the modifications all in one go. If you need auto updating, you can
use `s6-svscan -t MSECTIMEOUT' (I suggest five seconds; that's the
daemontools / runit way). If you want something a little fancier, set up
an inotify trigger (say, in a longrun so you know it's always going to
be around) that watches the directory and issues an `s6-svscanctl -an'
when it gets nudged.
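
A sketch of such a trigger as a run script, assuming Linux and the
inotifywait tool from inotify-tools (neither assumption comes from s6
itself; /run/service is a hypothetical scandir):

#!/bin/sh
inotifywait -m -q -e create -e delete -e moved_to -e moved_from /run/service |
while read -r line ; do
  s6-svscanctl -an /run/service  # -a: rescan; -n: reap supervisors whose dirs are gone
done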

-- 
Colin Booth




Re: s6 bites noob

2019-01-31 Thread Kelly Dean


Thanks for the fix. Longrun works now, though oneshot still fails, this time 
with a different message:
s6-sudoc: fatal: connect to the s6-sudod server - check that you have 
appropriate permissions.

I guess that's related to my running all this (including svscan) as non-root. 
s6rc-oneshot-runner is running now, though.

Should I run it as root? But then you'll be able to erase a lot more than just 
the contents of my home dir. ;-)

I do prefer that my software recognize that I'm an idiot, and refuse to do 
dubious things unless I specify some --force option. I've been saved countless 
times by programs designed with users' mental frailty in mind, and bitten 
countless times by the opposite.

The doc for rc says its diff's view diverges from s6's view only when the 
service fails permanently. I suggest adding there that downing the service 
using svc instead of rc qualifies as a permanent failure from rc's point of 
view. I guess this also means that if rc is used, then svc isn't supposed to be 
part of the normal user interface.

In the docs, I see no way to ask svc whether a service is up, or ask svscanctl 
which services are up. But obviously rc must be able to ask, in order to do the 
diff. I also see no straightforward way to ask rc whether a particular service 
is up, other than
s6-rc -a list | grep "^servicename$"

If inotify were portable, would you still consider svscanctl -a to be the best 
design, or would you omit the -a option and auto-rescan when the scan directory 
changed?


Re: s6 bites noob

2019-01-31 Thread Laurent Bercot

s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d 
serv/foo/ will stop it, but never exits.


Now that is probably due to your setup, because yours is the only
report I have of it not working.


 Update: just tonight I received another report of the exact same
symptoms, so I investigated, and indeed it's a bug I had introduced
a few commits ago. Sorry about that! It is now fixed in the current
s6 git head, so if you git pull and rebuild s6, everything should
now work flawlessly. (No need to rebuild s6-rc.)

--
 Laurent



Re: s6 bites noob

2019-01-31 Thread Laurent Bercot

mkdir test
s6-svscan --help
Well, that was surprising and unpleasant. It ignores unknown arguments, 
blithely starts a supervision tree in the current dir (my home dir), and spams 
me with a bunch of supervise errors. Ok, kill it.

Next test:
s6-svscan test


Do you always run programs you don't know in your home directory
with random arguments before reading the documentation? Because if
you do, then yes, you're bound to experience a few unpleasant surprises,
and s6-svscan is pretty mild in that aspect. I think you should be
thankful that it didn't erase all the files in your home directory. :)



What purpose is served by supervise automatically creating the supervise and 
event subdirs if there's no run file? It seems to accomplish nothing but errors 
and confusion. Instead of creating the subdirs, and then barfing on the absence 
of a run file, why not just create nothing until a run file appears?


It is impossible to portably wait for the appearance of a file.
And testing the existence of the file first, before creating the
subdirs, wouldn't help, because it would be a TOCTOU.
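
The race in question is the classic check-then-act pattern; in shell
terms, a sketch of what the rejected design would do:

if [ -f ./run ] ; then       # check: run exists now...
  mkdir -p supervise event   # ...but it may be gone by the time we act
fi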

As you have noticed and very clearly reported, s6 is not user-friendly
- or rather, its friendliness is not expressed in a way you have been
lulled into thinking was good by other programs. Its friendliness
comes from the fact that it does not mistake you for an idiot; it
assumes that you know what you are doing, and does not waste code in
performing redundant checks. That's how it avoids bloat, among other
things.

You may find it unpleasant that s6 does not hold your hand. That is
understandable. But I assure you that as soon as you get a little
experience with it (and that can even be achieved by just reading
the documentation *before* launching a command ;)), all the
hand-holding becomes entirely unnecessary because you know what to do.



The doc for svscan at least says that it creates the .s6-svscan subdir. The doc 
for supervise says nothing about creating the supervise subdir, though the doc 
for servicedir does say it.


I agree, the documentation isn't perfect. I'll make sure to add a
note in the s6-supervise page to mention the creation of subdirs.



Next problem. The doc for s6-svc indicates that
s6-svc -wu serv/foo

will wait until it's up. But that's not what happens. Instead, it exits 
immediately.


Right. I know why this happens, and it's not exactly a bug, but I can
understand why it's confusing - and your expectation is legitimate.
So I will change the behaviour so "s6-svc -wu serv/foo" does what you
thought it would do.



 It also doesn't even try to start the service unless -u is also given, which 
is surprising, but technically not in contradiction of the doc.


Well *that* is perfectly intentional.



And if -u is given, then -wu waits forever, even after the service is up. In 
serv/foo/run I have:
#/bin/bash
echo starting; sleep 2; echo dying

s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d 
serv/foo/ will stop it, but never exits.


Now that is probably due to your setup, because yours is the only
report I have of it not working. Please pastebin the output of
"strace -vf -s 256 s6-svc -uwu serv/foo" somewhere, and post the URL:
I, or other people here, will be able to tell you exactly what's going
wrong. Also, just in case, please also pastebin your sysdeps
(by default: /usr/lib/skalibs/sysdeps/sysdeps).


So, I tried s6-rc. Set up service definition dir, compile database, create 
link, run s6-rc-init, etc, then finally
s6-rc -u change foo

It starts immediately, but rc then waits while foo goes through 12 to 15 
start/sleep/die cycles before rc finally exits with code 0. (And foo continues 
cycling.) But if I press ^C on rc before it exits on its own, then it kills 
foo, writes a warning that it was unable to start the service because foo 
crashed with signal 2, and exits with code 1.


This is directly related to your issue with s6-svc above.
"s6-rc -u change foo" precisely calls "s6-svc -uwu" on foo's service
directory, and waits for it to return. Fixing s6-svc's behaviour
in your installation will also fix s6-rc's behaviour.



So I tried it again, and this time pressed ^C on rc immediately after running 
it, before foo had a chance to die for the first time. It reported the same 
warning! The prophecy is impressive, but still, shouldn't rc just exit 
immediately after foo starts, and let the supervision tree independently handle 
foo's future death?


That is normally what happens, except that in your case s6-svc never
returns, so from s6-rc's point of view, the service is still starting.
It's the exact same issue.



Next test: I moved run to up, changed type to oneshot, recompiled, created new 
link, ran s6-rc-update, and tried foo again. This time, rc hangs forever, and 
up is never executed at all. When I eventually press ^C on rc, though, it 
doesn't say unable to start foo; it says unable to start s6rc-oneshot-runner.


Related to the same issue as well.

Re: s6 bites noob

2019-01-31 Thread Kelly Dean


Kelly Dean writes:
> In serv/foo/run I have:
> #/bin/bash
> echo starting; sleep 2; echo dying

Just a typo in my message. Actual file does have #!/bin/bash


s6 bites noob

2019-01-31 Thread Kelly Dean
mkdir test
s6-svscan --help
Well, that was surprising and unpleasant. It ignores unknown arguments, 
blithely starts a supervision tree in the current dir (my home dir), and spams 
me with a bunch of supervise errors. Ok, kill it.

Next test:
s6-svscan test

It gives errors about supervise being unable to spawn ./run, and the child 
dying. What? On an empty scan dir? Oh, the previous test's accidental 
supervision tree ran supervise on all the current dir's subdirs--and each 
instance of supervise automatically created a supervise subdir of its service 
dir. So, now there's test/supervise, which svscan now interprets as a service 
dir, and starts supervise on it, which barfs.

What purpose is served by supervise automatically creating the supervise and 
event subdirs if there's no run file? It seems to accomplish nothing but errors 
and confusion. Instead of creating the subdirs, and then barfing on the absence 
of a run file, why not just create nothing until a run file appears?

The doc for svscan at least says that it creates the .s6-svscan subdir. The doc 
for supervise says nothing about creating the supervise subdir, though the doc 
for servicedir does say it.

Next problem. The doc for s6-svc indicates that
s6-svc -wu serv/foo

will wait until it's up. But that's not what happens. Instead, it exits 
immediately. It also doesn't even try to start the service unless -u is also 
given, which is surprising, but technically not in contradiction of the doc.

And if -u is given, then -wu waits forever, even after the service is up. In 
serv/foo/run I have:
#/bin/bash
echo starting; sleep 2; echo dying

s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d 
serv/foo/ will stop it, but never exits. supervise itself does do its job 
though, and perpetually restarts run after run dies while the service is set to 
be up.

So, I tried s6-rc. Set up service definition dir, compile database, create 
link, run s6-rc-init, etc, then finally
s6-rc -u change foo

It starts immediately, but rc then waits while foo goes through 12 to 15 
start/sleep/die cycles before rc finally exits with code 0. (And foo continues 
cycling.) But if I press ^C on rc before it exits on its own, then it kills 
foo, writes a warning that it was unable to start the service because foo 
crashed with signal 2, and exits with code 1.

So I tried it again, and this time pressed ^C on rc immediately after running 
it, before foo had a chance to die for the first time. It reported the same 
warning! The prophecy is impressive, but still, shouldn't rc just exit 
immediately after foo starts, and let the supervision tree independently handle 
foo's future death?

Next test: I moved run to up, changed type to oneshot, recompiled, created new 
link, ran s6-rc-update, and tried foo again. This time, rc hangs forever, and 
up is never executed at all. When I eventually press ^C on rc, though, it 
doesn't say unable to start foo; it says unable to start s6rc-oneshot-runner.

This is all with default configuration for skalibs, execline, s6, and s6-rc, 
sourced from GitHub, running on Debian 9, in my home directory as a non-root
user (with -c option for rc-init and -l for rc-init, rc, and rc-update, to 
avoid polluting system dirs while testing).
s6-rc doesn't understand a --version option, but s6-rc/NEWS says 0.4.1.1. And 
s6/NEWS says 2.8.0.0.

And there appears to be an option missing for s6-rc:
s6-rc -d list # List all
s6-rc -a list # List all up
s6-rc -d change foo # Bring foo down
s6-rc -u change foo # Bring foo up
s6-rc -da change # Bring all down
How to bring all up? The examples above suggest it would be
s6-rc -ua change
But that does nothing. (And the doc does indicate that it would do nothing, 
since there's no selection.)

And a question about the advice in the docs. If svscan's rescan is 0, and /tmp 
is RAM, what's the advantage of having the scan directory be /tmp/service with 
symlinks to service directory copies in /tmp/services, instead of simply having 
/tmp/services directly be the scan directory?

I guess an answer might be that there can be a race between svscan's initial 
scan at system startup and the populating of /tmp/services, so it sees 
partially copied service directories. But wouldn't a simpler solution be to 
either delay svscan's start until the populating is complete, or add an option 
to disable its initial scan?

With no initial scan, you then have to run svscanctl -a after the /tmp/services 
populating is complete, but you have to run that anyway even if you're using 
symlinks from a separate /tmp/service directory.
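
For reference, the layout the docs describe is something like this
(hypothetical paths):

/tmp/services/foo/                      # working copy of the service directory
/tmp/service/foo -> /tmp/services/foo   # symlink in the scan directory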