Re: s6 bites noob

2019-02-05 Thread Laurent Bercot

> just take this as a data sample for what can happen when a random noob tries to
> use s6.


Although unpleasant (not gonna lie), it was a very useful user
experience report, thank you. Among other things, it strengthens my
belief that a user interface layer on top of s6 + s6-rc + s6-linux-init
is the way to go - a layer that makes things Just Work even when users
don't do everything perfectly, and with friendlier behaviour in case
of an error. People will still be able to look under the hood and
tweak things manually, but they won't have to, and they won't be
exposed to the nuts and bolts unless they want to.

Also, just in case someone tries the latest s6 / s6-rc git head:
I have added "uid/self" and "gid/self" key checking in the accessrules
library, for when the client runs with the same euid / the same egid
as the server; and I have changed s6-rc-compile to use the
functionality, removing its -u and -g options in the process. So now,
the behaviour
should always be consistent: the user who can operate an s6-rc database
is always the user who owns the supervision tree. No exceptions.
root can also use s6-rc commands, but services will still run as the
user who owns the supervision tree.

A numbered release of s6 and s6-rc (and lots of other packages) will
happen some time next month.



> BTW, your explanations of why things are designed the way they are were helpful
> for understanding the system. I recommend copying them into the docs.


 I should write a "rationale / policy recommendation" section in the
documentation pages, that is a good idea.

--
 Laurent



Re: Generic interrupt command?

2019-02-05 Thread Laurent Bercot
> Not outputting anything causes kill (on my system at least) to exit non 0


 Not outputting anything isn't an option, for the case where -o pid is
used in addition to other fields. The field number and order must be
respected.

 It's probably best to use some OOB indicator. How about NA, which I
already use for non-numeric fields? It makes kill correctly choke.
Would it be better to use NA in all the numeric fields, too?

--
 Laurent



Re: Generic interrupt command?

2019-02-05 Thread John O'Meara
On Tue, Feb 5, 2019, 2:20 AM Laurent Bercot wrote:

> >Be careful, though. If the service is down, kill will use -1 for the PID,
> >and will probably signal everything in your system except PID 1.
>
>   That's a good point. Should s6-svstat use 0 as the "service is down"
> pid value instead, to avoid this?
>

0 behaves better for this use case, but can still produce unexpected
behavior.

The construction "echo 0 | xargs kill -STOP" for example leaves behind a
paused background task that needs to be cleaned by hand.

The construction "kill -STOP $(echo 0)" hangs the terminal until someone
resumes the user's shell.

Most other "kill -whatever $(echo 0)" results in the shell exiting and the
user having to log back in.

So, 0 is a lot better than -1, but still not great.

Not outputting anything causes kill (on my system at least) to exit non 0
and give some diagnostic ("`' not a pid or valid pid spec", "you need to
specify whom to kill", or the usage message). That's nice, but would
probably break other scripting that expects a value, especially for
s6-svstat showing multiple fields.

I can't think of a safe and simple way to do this. For example, we could
suggest people do something like this (based on Roger Pate's post):

   pid=$(s6-svstat -p /my/service) && [ "$pid" -ne -1 ] && kill -SIGNAL "$pid"

but that's a lot of typing and requires that people see and remember the
suggestion, so not quite simple :-/
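
One way to make the check above independent of which "service is down"
value s6-svstat ends up printing (-1, 0, NA or nothing) is to accept
only a positive integer. The following is a minimal sketch, not part of
s6; the function name is_signalable_pid is made up for illustration.

```shell
# Hypothetical helper (not part of s6): succeed only when $1 is a
# positive decimal integer, i.e. something kill will treat as one PID.
is_signalable_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, "-1", "NA", or any non-digit
    esac
    [ "$1" -gt 0 ]                # all digits: still reject 0
}
```

With that in place, the suggestion becomes:

   pid=$(s6-svstat -p /my/service) && is_signalable_pid "$pid" && kill -SIGNAL "$pid"

It is still a lot of typing, but it no longer depends on the exact
placeholder value.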

-- 
John O'Meara



Re: s6 bites noob

2019-02-05 Thread Kelly Dean


Laurent Bercot writes:
>>Anyway, recompile with -u 1000, re-update, and try again. Now, I can't even 
>>do s6-rc -a list; I get:
>>s6-rc fatal: unable to take locks: Permission denied
>
> Hmmm, that's weird. If all the previous operations have been done as
> the same user, you should never get EPERM. Have you run something as
> root before?

Indeed, I did. My command history from last night shows that before I 
remembered to try compiling with -u 1000, I tried sudo s6-rc change testpipe, 
after the previous non-sudo invocation failed with a permission error, so that 
must be what screwed it up. I don't remember doing that. Must have been really 
tired and frustrated.

So I killed svscan, removed my compiled databases and scan and live dirs, and 
started from scratch. Now s6-rc succeeds, but when I brought up testpipe (two 
daemons funneling to a logger), I got once per second:
fdclose: fatal: unable to exec ./run.user: Exec format error

Oops, I forgot #!/bin/bash at the top of one of the run files. (Would have been 
helpful if the error message had specified which one.) Fix that, recompile, 
make new link, do an update, try again. Now:
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-r-logger: Broken pipe
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-w-logger: Broken pipe
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-r-logger: Connection reset by peer
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-w-logger: Connection reset by peer

It also somehow managed to hose the terminal in which svscan was running. As 
in, when I try to type in it, only a small percentage of the letters actually 
appear. Killed svscan, tried to reset the terminal, no luck. This is the first 
time I remember ever getting an un-resettable terminal. No problem, I can just 
kill the terminal, but... weird.

Oops, after I added the forgotten #!/bin/bash, I forgot -u 1000 again when I 
recompiled. So, the failure should be expected, but hosing the terminal? 
Really? And the error messages give no hint of what's actually wrong, unless 
you're familiar with the internal design of s6, which seems an excessive burden 
for a mere user. I guess I'm spoiled by modern C compilers, which have become 
excellent in the past few years at explaining in exquisite detail exactly in 
which way I'm currently being an idiot.

So, remove the compiled databases and scan directory, recompile with -u 1000, 
restart svscan, re-run s6-rc-init, try testpipe again, and... success! Wow, 
that was unexpected. I'd become conditioned to expect failure.

Ok now, quick, while I remember how to use s6, I'll install it into my project 
and make sure it works perfectly, so I never have to touch it again. There are 
other things I'd be curious to try with it too, but I shouldn't keep pestering 
you and the mailing list for unpaid tech support, so I guess just take this as 
a data sample for what can happen when a random noob tries to use s6.

BTW, your explanations of why things are designed the way they are were helpful 
for understanding the system. I recommend copying them into the docs.