Re: s6 bites noob
> just take this as a data sample for what can happen when a random noob tries to use s6.

Although unpleasant (not gonna lie), it was a very useful user experience report, thank you. Among other things, it confirms my belief that a user interface layer on top of s6 + s6-rc + s6-linux-init is the way to go - a layer that makes things Just Work even when users don't do everything perfectly, and that behaves in a friendlier way when something goes wrong. People will still be able to look under the hood and tweak things manually, but they won't have to, and they won't be exposed to the nuts and bolts unless they want to.

Also, just in case someone tries the latest s6 / s6-rc git head: I have added "uid/self" and "gid/self" key checking in the accessrules library, for when the client runs with the same euid / the same egid as the server; and I have changed s6-rc-compile to use that functionality, removing its -u and -g options in the process. So now the behaviour should always be consistent: the user who can operate an s6-rc database is always the user who owns the supervision tree. No exceptions. root can also use s6-rc commands, but services will still run as the user who owns the supervision tree.

A numbered release of s6 and s6-rc (and lots of other packages) will happen some time next month.

> BTW, your explanations of why things are designed the way they are were
> helpful for understanding the system. I recommend copying them into the docs.

I should write a "rationale / policy recommendation" section in the documentation pages; that is a good idea.

-- Laurent
Re: s6 bites noob
Laurent Bercot writes:
>> Anyway, recompile with -u 1000, re-update, and try again. Now, I can't even
>> do s6-rc -a list; I get:
>> s6-rc fatal: unable to take locks: Permission denied
>
> Hmmm, that's weird. If all the previous operations have been done as
> the same user, you should never get EPERM. Have you run something as
> root before?

Indeed, I did. My command history from last night shows that before I remembered to try compiling with -u 1000, I tried sudo s6-rc change testpipe, after the previous non-sudo invocation failed with a permission error, so that must be what screwed it up. I don't remember doing that. Must have been really tired and frustrated.

So I killed svscan, removed my compiled databases and scan and live dirs, and started from scratch. Now s6-rc succeeds, but when I brought up testpipe (two daemons funneling to a logger), I got once per second:

fdclose: fatal: unable to exec ./run.user: Exec format error

Oops, I forgot #!/bin/bash at the top of one of the run files. (Would have been helpful if the error message had specified which one.) Fix that, recompile, make new link, do an update, try again. Now:

s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-r-logger: Broken pipe
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-w-logger: Broken pipe
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-r-logger: Connection reset by peer
s6-fdholder-retrievec: fatal: unable to retrieve fd for id pipe:s6rc-w-logger: Connection reset by peer

It also somehow managed to hose the terminal in which svscan was running. As in, when I try to type in it, only a small percentage of the letters actually appear. Killed svscan, tried to reset the terminal, no luck. This is the first time I remember ever getting an un-resettable terminal. No problem, I can just kill the terminal, but... weird.

Oops, after I added the forgotten #!/bin/bash, I forgot -u 1000 again when I recompiled.
So, the failure should be expected, but hosing the terminal? Really? And the error messages give no hint of what's actually wrong, unless you're familiar with the internal design of s6, which seems an excessive burden for a mere user. I guess I'm spoiled by modern C compilers, which have become excellent in the past few years at explaining in exquisite detail exactly in which way I'm currently being an idiot.

So, remove the compiled databases and scan directory, recompile with -u 1000, restart svscan, re-run s6-rc-init, try testpipe again, and... success! Wow, that was unexpected. I'd become conditioned to expect failure.

Ok now, quick, while I remember how to use s6, I'll install it into my project and make sure it works perfectly, so I never have to touch it again. There are other things I'd be curious to try with it too, but I shouldn't keep pestering you and the mailing list for unpaid tech support, so I guess just take this as a data sample for what can happen when a random noob tries to use s6.

BTW, your explanations of why things are designed the way they are were helpful for understanding the system. I recommend copying them into the docs.
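Editorial aside: the "Exec format error" above is what exec returns when a run file has neither a shebang line nor executable binary format. Here is a guessed reconstruction of what the fixed testpipe run files may have looked like; the directory names, the producer's output, and the s6-log arguments copied from this thread are assumptions, not the poster's actual files.

```shell
# Guessed reconstruction of the run files discussed above. The key point:
# each run file needs a shebang, otherwise exec fails with
# "Exec format error".
dir=$(mktemp -d)
mkdir -p "$dir/producer" "$dir/logger"

cat > "$dir/producer/run" <<'EOF'
#!/bin/bash
echo producer starting
EOF

# The logger's run file in the thread invoked s6-log; it is only written
# here as file content, never executed by this sketch.
cat > "$dir/logger/run" <<'EOF'
#!/bin/bash
exec s6-log -l 1 s100 T /home/user/testlogs
EOF

chmod +x "$dir/producer/run" "$dir/logger/run"

# With the shebang in place, the script execs cleanly:
"$dir/producer/run"
```

With the `#!/bin/bash` line missing, the kernel cannot tell how to execute the file, and the exec fails exactly as in the report above.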
Re: s6 bites noob
> But run not existing when supervise starts is a different case from run
> disappearing after supervise is already running.

No, it's not. First, s6-supervise starts and initializes the state of the service to "down, wanted up" (unless there's a ./down file, in which case the service isn't wanted up). Then, s6-supervise enters its main loop, where it sees that the service is down + wanted up, so it tries to start it. If there is no run file, it's a temporary failure, no matter whether or not it's the first time it tries.

Adding a special case where s6-supervise aborts when it tries to start ./run for the first time would make the code more complex, especially when you try to answer questions such as "what do I do if there is a down file and the service is started later" (which is the case when s6-supervise is driven by s6-rc), or "what do I do when ./run exists but is non-executable", or other questions in the same vein. And the end benefit of having such a special case would be very dubious.

> Another example of orneriness: supervise automatically does its own
> initialization, but the s6-rc program (not the eponymous suite) doesn't.
> Instead, the suite has a separate init program, s6-rc-init, that's normally
> run at boot time. But if it isn't run at boot time (which is a policy
> decision), s6-rc doesn't automatically run it if necessary. If rc shouldn't
> auto-initialize, neither should supervise.

Now you are just acting in bad faith. Different programs, with different goals, obviously have different requirements and different behaviours; you can't seriously suggest that apples should behave like oranges. If a system uses s6-rc as its service manager, then early on in the boot process, it *will* run s6-rc-init. That's not so much a policy decision as an s6-rc mechanism. It's running a command line program in the system initialization sequence; it's not exactly difficult or convoluted.

> Another one: the -d option to s6-rc is overloaded.
> When used with change, it means to down the selected services. But when
> used with list, it means to invert the selection. I'm going to repeatedly
> forget this.

You know what's interesting? It initially did not do this. And then people complained that the behaviour wasn't intuitive, and that s6-rc -d list *should* invert the selection. I thought about it and realized that it made sense, so I implemented it. I guess you can't make everyone happy.

> One more: the doc for the s6-rc program says it's meant to be the one-stop
> shop of service management, after compilation and initialization are done.
> It has subcommands list, listall, diff, and change. But s6-rc-update is a
> separate program, not a subcommand of s6-rc. I suppose there's a reason for
> this, but it complicates the user interface with a seemingly arbitrary
> distinction of whether to put a dash between "s6-rc" and the subcommand
> depending on what the particular subcommand is.

The s6-rc command operates the service management engine, relying on a stable compiled service database. It will read the database and perform operations in that context. It is the command to use when querying the database, and starting/stopping services, in a normal production environment.

The s6-rc-update command does not fit in that model. It is an administration command: it changes the context in which s6-rc runs. It is not used as commonly, and it is heavier and more dangerous - it switches databases! Think of atomically regenerating the OpenRC cache after modifying your services in /etc/init.d. (OpenRC offers no such thing, and it's a mess reliability-wise.) This is a fundamentally different thing from running the engine.

> The docs advise entirely copying the service repository to a ramdisk, then
> using (a link to) the copy as the scan directory. This makes the running
> system independent of the original repo.
> But the doc for s6-rc-init says the rc system remains dependent on the
> original compiled database, and there's no explanation of why it isn't
> also copied in order to make the running system independent.

The point of operating off a copy of some data is that system operation won't be disturbed when the user modifies the original data, until they decide to commit/flush a batch of changes - which the system should then pick up as atomically as possible.

A service directory is data that the user can modify. That is why it is better to run s6-supervise on a copy of a service directory (separating "live" data from "stock" data). The compiled database is already a copy. It's not data that the user modifies (except potentially for adding or removing bundles, but those are a layer on top of the core database, which remains untouched). What the user will modify is the source directory, which will be compiled into a different database; changes will not happen to the live system until the user calls s6-rc-update, which is the atomic commit/flush operation.
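Editorial aside: the "down, wanted up" startup sequence described in this message can be sketched in shell. This is an illustration of the state logic only; none of the names below come from the s6 source, and the real s6-supervise is a nonblocking state machine in C.

```shell
# Illustration only: one iteration of the supervisor startup logic
# described above, showing why a missing ./run is the same *temporary*
# failure on the first try as on any later try.
svcdir=$(mktemp -d)

state=down
# Initial state: down, and wanted up unless a ./down file exists.
if [ -e "$svcdir/down" ]; then wanted=down; else wanted=up; fi

# Main-loop iteration: down + wanted up => try to spawn ./run.
if [ "$state" = down ] && [ "$wanted" = up ]; then
  if [ -x "$svcdir/run" ]; then
    state=up    # the real supervisor forks and execs ./run here
  else
    echo "warning: unable to spawn ./run - trying again later"
  fi
fi
```

Since there is no ./run in the fresh directory, the iteration warns and leaves the service down but still wanted up, ready to retry - exactly the recoverable-error behaviour Laurent describes.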
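Editorial aside: the atomic commit that s6-rc-update performs is more involved than this (it also reconciles running services), but the flavor of "switch databases atomically" can be shown with the classic symlink-rename idiom. Every path below is invented for the demo; this is not what s6-rc-update literally runs.

```shell
# Demo of the atomic-switch idiom behind "commit a new compiled database":
# repoint a symlink with a single rename, so any reader sees either the
# old database or the new one, never a half-updated state.
base=$(mktemp -d)
mkdir "$base/db-v1" "$base/db-v2"
ln -s db-v1 "$base/current"               # the live database pointer

ln -s db-v2 "$base/current.new"           # prepare the new pointer...
# ...and swap it in one rename(2); -T (GNU coreutils) treats the
# destination as a plain file so the symlink itself is replaced.
mv -T "$base/current.new" "$base/current"

readlink "$base/current"
```

Readers of "$base/current" either resolve db-v1 or db-v2; there is no window in which the pointer is missing or half-written.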
Re: s6 bites noob
Laurent Bercot writes:
> foo/run not existing is a temporary error condition that can happen
> at any time, not only at the start of s6-supervise. This is a very
> different case: the supervisor is already running and the user is
> relying on its monitoring foo.

But run not existing when supervise starts is a different case from run disappearing after supervise is already running. Even though supervise should continue running if run disappears, that doesn't imply that it shouldn't abort on startup if run doesn't exist in the first place.

Another example of orneriness: supervise automatically does its own initialization, but the s6-rc program (not the eponymous suite) doesn't. Instead, the suite has a separate init program, s6-rc-init, that's normally run at boot time. But if it isn't run at boot time (which is a policy decision), s6-rc doesn't automatically run it if necessary. If rc shouldn't auto-initialize, neither should supervise.

Another one: the -d option to s6-rc is overloaded. When used with change, it means to down the selected services. But when used with list, it means to invert the selection. I'm going to repeatedly forget this.

One more: the doc for the s6-rc program says it's meant to be the one-stop shop of service management, after compilation and initialization are done. It has subcommands list, listall, diff, and change. But s6-rc-update is a separate program, not a subcommand of s6-rc. I suppose there's a reason for this, but it complicates the user interface with a seemingly arbitrary distinction of whether to put a dash between "s6-rc" and the subcommand depending on what the particular subcommand is.

The docs advise entirely copying the service repository to a ramdisk, then using (a link to) the copy as the scan directory. This makes the running system independent of the original repo.
But the doc for s6-rc-init says the rc system remains dependent on the original compiled database, and there's no explanation of why it isn't also copied in order to make the running system independent.

I tried to test the logger. Set up a service repo with foo and bar, each with a run like:

#!/bin/bash
echo foo starting
sleep 2
echo foo dying

foo and bar are funneled to a logger that has this run file:

s6-log -l 1 s100 T /home/user/testlogs

Try to start the bundle. Hangs. Press ^C. Get:

s6-rc: warning: unable to start service s6rc-fdholder: command crashed with signal 2

Ok, Colin Booth mentioned permission issues when running as non-root. It shouldn't be a problem, since all of this (including svscan) is running as the same user. Permission problems should only come into play when trying to do things inter-user.

Anyway, I checked the s6-rc-compile doc. Looks like -h won't be necessary, since it defaults to the owner of the svscan proc. But -u is needed, since it defaults to allowing only root - even though I've never run any of this as root, and I've never asked it to try to do anything as root, and I've never told it that it should expect to be root, or even mentioned root at all. And I'm not really sure the doc is right, because it says -u controls who's allowed to start and stop services, yet I've already used rc to start and stop regular (longrun) services as my non-root user before, with no problem (I had a problem only with oneshot), even though the doc says that since I didn't compile with -u, it should have disallowed that.

Anyway, recompile with -u 1000, re-update, and try again. Now, I can't even do s6-rc -a list; I get:

s6-rc fatal: unable to take locks: Permission denied

Maybe I missed an essential step, and screwed something up? I'm bewildered, tired, and going to bed. After reading more of the docs than I expected to be necessary, I'm still unable to get s6 to do the basic job I need: manage a small group of services, and funnel and log their output.
It's especially frustrating having to fight with software that generates gratuitous intra-user permission errors. I'll try again in the morning, with replenished willpower.
Re: s6 bites noob
> s6-supervise aborts on startup if foo/supervise/control is already open,
> but perpetually retries if foo/run doesn't exist. Both of those problems
> indicate the user is doing something wrong. Wouldn't it make more sense
> for both problems to result in the same behavior (either retry or abort,
> preferably the latter)?

foo/supervise/control being already open indicates there's already an s6-supervise process monitoring foo - in which case spawning another one makes no sense, so s6-supervise aborts.

foo/run not existing is a temporary error condition that can happen at any time, not only at the start of s6-supervise. This is a very different case: the supervisor is already running and the user is relying on its monitoring foo. At that point, the supervisor really should not die unless explicitly asked to; and "nonexistent foo/run" is perfectly recoverable, you just have to warn the user and try again later.

It's simply the difference between a fatal error and a recoverable error. In most simple programs, all errors can be treated as fatal: if you're not in the nominal case, just abort and let the user deal with it. But in a supervisor, the difference is important, because surviving all kinds of trouble is precisely what a supervisor is there for.

> https://cr.yp.to/daemontools/supervise.html indicates the original version
> of supervise aborts in both cases.

That's what it suggests, but it is unclear ("may exit"). I have forgotten what daemontools' supervise does when foo/run doesn't exist, but I don't think it dies. I think it loops, just as s6-supervise does. You should test it.

> I also don't understand the reason for svscan and supervise being
> different. Supervise's job is to watch one daemon. Svscan's job is to
> watch a collection of supervise procs. Why not omit supervise, and have
> svscan directly watch the daemons? Surely this is a common question.
You said it yourself: supervise's job is to watch one daemon, and svscan's job is to watch a collection of supervise processes. That is not the same job at all. And if it's not the same job, a Unix guideline says they should be different programs: one function = one tool. With experience, I've found this guideline to be 100% justified, and extremely useful.

Look at s6-svscan's and s6-supervise's source code. You will find they share very few library functions - there's basically no code duplication, no functionality duplication, between them.

Supervising several daemons from one unique process is obviously possible. That's for instance what perpd, sysvinit and systemd do. But if you look at perpd's source code (which is functionally and stylistically the closest to svscan+supervise), you'll see that it's almost as long as the source code of s6-svscan plus s6-supervise combined, while not being a perfectly nonblocking state machine as s6-supervise is. Combining functionality into a single process adds complexity. Putting separate functionality in separate processes reduces complexity, because it takes advantage of the natural boundaries provided by the OS. It allows you to do just as much with much less code.

> I understand svscan must be as simple as possible, for reliability,
> because it must not die. But I don't see how combining it with supervise
> would really make it more complex. It already has supervise's
> functionality built in (watch a target proc, and restart it when it dies).

No, the functionality isn't the same at all, and "restart a process when it dies" is an excessively simplified view of what s6-supervise does. If that was all there is to it, a "while true ; do ./run ; done" shell script would do the job; but if you've had to deal with that approach once in a production environment, you intimately and painfully know how terrible it is.
s6-svscan knows how s6-supervise behaves, and can trust it and rely on an interface between the two programs, since they're part of the same package. Spawning and watching an s6-supervise process is easy, as easy as calling a function; s6-svscan's complexity comes from the fact that it needs to manage a *collection* of s6-supervise processes. (Actually, the brunt of its complexity comes from supporting pipes between a service and a logger, but that's beside the point.)

On the other hand, s6-supervise does not know how ./run behaves, can make no assumption about it, cannot trust it, must babysit it no matter how bad it gets, and must remain stable no matter how much shit it throws at you. This is a totally different job - and a much harder job than watching a thousand nice, friendly s6-supervise programs. Part of the proof is that s6-supervise's source code is bigger than s6-svscan's.

By all means, if you want a single supervisor for all your services, try perp. It may suit you. But I don't think having fewer processes in your "ps" output is a worthwhile goal: it's purely cosmetic, and you have to balance that against the real benefits that separating processes provides.

-- Laurent
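Editorial aside: the "control already open means another supervisor is there, so abort" behaviour from earlier in this message boils down to a lock held for the supervisor's lifetime. A rough shell rendition using the flock(1) utility follows; s6-supervise achieves this differently (via its files under supervise/), so the lock-file path here is an assumption for the demo only.

```shell
# Rough illustration of "one supervisor per service directory" using
# flock(1). Not s6 code; the path below is invented.
svcdir=$(mktemp -d)
mkdir -p "$svcdir/supervise"

# First "supervisor": take the lock and keep holding it (fd 8 stays open
# in this shell, standing in for a long-lived supervisor process).
exec 8>"$svcdir/supervise/lock"
flock -n 8

# Second "supervisor": the lock is busy - another instance is already
# monitoring this directory, so the only sane reaction is to abort.
if ! flock -n 9 9>"$svcdir/supervise/lock" 2>/dev/null; then
  echo "fatal: another instance already supervises this directory"
  second=aborted
fi
```

The second acquisition fails immediately because flock locks conflict across separate opens of the same file, which is exactly the property a "refuse to double-supervise" check needs.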
Re: s6 bites noob
Laurent Bercot writes:
> It is impossible to portably wait for the appearance of a file.
> And testing the existence of the file first, before creating the
> subdirs, wouldn't help, because it would be a TOCTOU.

s6-supervise aborts on startup if foo/supervise/control is already open, but perpetually retries if foo/run doesn't exist. Both of those problems indicate the user is doing something wrong. Wouldn't it make more sense for both problems to result in the same behavior (either retry or abort, preferably the latter)? https://cr.yp.to/daemontools/supervise.html indicates the original version of supervise aborts in both cases.

I also don't understand the reason for svscan and supervise being different. Supervise's job is to watch one daemon. Svscan's job is to watch a collection of supervise procs. Why not omit supervise, and have svscan directly watch the daemons? Surely this is a common question. I suppose supervise on its own might be convenient during testing, to have a lone supervise proc watching a daemon. But this could be done just as well with a combined svscan-supervise, with the daemon being the only entry in the collection of watched procs.

I understand svscan must be as simple as possible, for reliability, because it must not die. But I don't see how combining it with supervise would really make it more complex. It already has supervise's functionality built in (watch a target proc, and restart it when it dies).
Re: s6 bites noob
On Fri, Feb 01, 2019 at 04:18:50AM +, Kelly Dean wrote:
> Thanks for the fix. Longrun works now, though oneshot still fails, this time
> with a different message:
> s6-sudoc: fatal: connect to the s6-sudod server - check that you have
> appropriate permissions.
>
> I guess that's related to my running all this (including svscan) as non-root.
> s6rc-oneshot-runner is running now, though.
>
> Should I run it as root? But then you'll be able to erase a lot more than
> just the contents of my home dir. ;-)

It's actually that you need to run your s6-rc call as an allowed user. See the s6-rc-compile -u and -v options. But more about this in a sec.

> I do prefer that my software recognize that I'm an idiot, and refuse to do
> dubious things unless I specify some --force option. I've been saved
> countless times by programs designed with users' mental frailty in mind, and
> bitten countless times by the opposite.

It actually is recognizing that you're an idiot :) At least, it's recognizing that you've misconfigured something. The s6-sudo program connects to a s6-sudod socket (really an s6-ipcserverd socket, but that's an implementation detail) and sends its argv to s6-sudod. Anyway, s6-ipcserver does ACL checks, and the problem that you're running into is that you haven't set your rules correctly and s6-ipcserver-access is giving you the finger.

> The doc for rc says its diff's view diverges from s6's view only when the
> service fails permanently. I suggest adding there that downing the service
> using svc instead of rc qualifies as a permanent failure from rc's point of
> view. I guess this also means that if rc is used, then svc isn't supposed to
> be part of the normal user interface.

That requires that s6-rc be permanently running and monitoring a lot of stuff. As an aside, it's actually very, very handy that you can fake out s6-rc and make changes to stuff temporarily without having to deal with the state engine. Don't get me wrong, the state engine is really nice, but it's a pretty heavy hand sometimes.

> In the docs, I see no way to ask svc whether a service is up, or ask
> svscanctl which services are up. But obviously rc must be able to ask, in
> order to do the diff. I also see no straightforward way to ask rc whether a
> particular service is up, other than
> s6-rc -a list | grep "^servicename$"

To get the actual state of the system I use:

`for svc in /path/to/scandir/* ; do s6-svstat "$svc" ; done'

For the state of the system as far as the state engine is concerned: `s6-rc -a list' or `s6-rc -da list', depending on what I'm going for.

> If inotify were portable, would you still consider svscanctl -a to be the
> best design, or would you omit the -a option and auto-rescan when the scan
> directory changed?

`s6-svscanctl -an' is by far the nicest mechanism, not just because of the portability, but also because it lets you stage changes and then kick the modifications all in one go. If you need auto updating, you can use `s6-svscan -t MSECTIMEOUT' (I suggest five seconds; that's the daemontools / runit way). If you want something a little fancier, set an inotify trigger (say, in a longrun, so you know it's always going to be around) that watches the directory and issues an `s6-svscanctl -an' when it gets nudged.

-- Colin Booth
Re: s6 bites noob
Thanks for the fix. Longrun works now, though oneshot still fails, this time with a different message:

s6-sudoc: fatal: connect to the s6-sudod server - check that you have appropriate permissions.

I guess that's related to my running all this (including svscan) as non-root. s6rc-oneshot-runner is running now, though.

Should I run it as root? But then you'll be able to erase a lot more than just the contents of my home dir. ;-)

I do prefer that my software recognize that I'm an idiot, and refuse to do dubious things unless I specify some --force option. I've been saved countless times by programs designed with users' mental frailty in mind, and bitten countless times by the opposite.

The doc for rc says its diff's view diverges from s6's view only when the service fails permanently. I suggest adding there that downing the service using svc instead of rc qualifies as a permanent failure from rc's point of view. I guess this also means that if rc is used, then svc isn't supposed to be part of the normal user interface.

In the docs, I see no way to ask svc whether a service is up, or ask svscanctl which services are up. But obviously rc must be able to ask, in order to do the diff. I also see no straightforward way to ask rc whether a particular service is up, other than:

s6-rc -a list | grep "^servicename$"

If inotify were portable, would you still consider svscanctl -a to be the best design, or would you omit the -a option and auto-rescan when the scan directory changed?
Re: s6 bites noob
>> s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise,
>> s6-svc -wd -d serv/foo/ will stop it, but never exits.
>
> Now that is probably due to your setup, because yours is the only report
> I have of it not working.

Update: just tonight I received another report of the exact same symptoms, so I investigated, and indeed it's a bug I had introduced a few commits ago. Sorry about that! It is now fixed in the current s6 git head, so if you git pull and rebuild s6, everything should now work flawlessly. (No need to rebuild s6-rc.)

-- Laurent
Re: s6 bites noob
mkdir test s6-svscan --help Well, that was surprising and unpleasant. It ignores unknown arguments, blithely starts a supervision tree in the current dir (my home dir), and spams me with a bunch of supervise errors. Ok, kill it. Next test: s6-svscan test Do you always run programs you don't know in your home directory with random arguments before reading the documentation? Because if you do, then yes, you're bound to experience a few unpleasant surprises, and s6-svscan is pretty mild in that aspect. I think you should be thankful that it didn't erase all the files in your home directory. :) What purpose is served by supervise automatically creating the supervise and event subdirs if there's no run file? It seems to accomplish nothing but errors and confusion. Instead of creating the subdirs, and then barfing on the absence of a run file, why not just create nothing until a run file appears? It is impossible to portably wait for the appearance of a file. And testing the existence of the file first, before creating the subdirs, wouldn't help, because it would be a TOCTOU. As you have noticed and very clearly reported, s6 is not user-friendly - or rather, its friendliness is not expressed in a way you have been lulled into thinking was good by other programs. Its friendliness comes from the fact that it does not mistake you for an idiot; it assumes that you know what you are doing, and does not waste code in performing redundant checks. That's how it avoids bloat, among other things. You may find it unpleasant that s6 does not hold your hand. That is understandable. But I assure you that as soon as you get a little experience with it (and that can even be achieved by just reading the documentation *before* launching a command ;)), all the hand-holding becomes entirely unnecessary because you know what to do. The doc for svscan at least says that it creates the .s6-svscan subdir. 
The doc for supervise says nothing about creating the supervise subdir, though the doc for servicedir does say it. I agree, the documentation isn't perfect. I'll make sure to add a note in the s6-supervise page to mention the creation of subdirs. Next problem. The doc for s6-svc indicates that s6-svc -wu serv/foo will wait until it's up. But that's not what happens. Instead, it exits immediately. Right. I know why this happens, and it's not exactly a bug, but I can understand why it's confusing - and your expectation is legitimate. So I will change the behaviour so "s6-svc -wu serv/foo" does what you thought it would do. It also doesn't even try to start the service unless -u is also given, which is surprising, but technically not in contradiction of the doc. Well *that* is perfectly intentional. And if -u is given, then -wu waits forever, even after the service is up. In serv/foo/run I have: #/bin/bash echo starting; sleep 2; echo dying s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d serv/foo/ will stop it, but never exits. Now that is probably due to your setup, because yours is the only report I have of it not working. Please pastebin the output of "strace -vf -s 256 s6-svc -uwu serv/foo" somewhere, and post the URL: I, or other people here, will be able to tell you exactly what's going wrong. Also, just in case, please also pastebin your sysdeps (by default: /usr/lib/skalibs/sysdeps/sysdeps). So, I tried s6-rc. Set up service definition dir, compile database, create link, run s6-rc-init, etc, then finally s6-rc -u change foo It starts immediately, but rc then waits while foo goes through 12 to 15 start/sleep/die cycles before rc finally exits with code 0. (And foo continues cycling.) But if I press ^C on rc before it exits on its own, then it kills foo, writes a warning that it was unable to start the service because foo crashed with signal 2, and exits with code 1. This is directly related to your issue with s6-svc above. 
"s6-rc -u change foo" precisely calls "s6-svc -uwu" on foo's service directory, and waits for it to return. Fixing s6-svc's behaviour in your installation will also fix s6-rc's behaviour.

> So I tried it again, and this time pressed ^C on rc immediately after
> running it, before foo had a chance to die for the first time. It
> reported the same warning! The prophecy is impressive, but still,
> shouldn't rc just exit immediately after foo starts, and let the
> supervision tree independently handle foo's future death?

That is normally what happens, except that in your case s6-svc never returns, so from s6-rc's point of view, the service is still starting. It's the exact same issue.

> Next test: I moved run to up, changed type to oneshot, recompiled,
> created new link, ran s6-rc-update, and tried foo again. This time, rc
> hangs forever, and up is never executed at all. When I eventually press
> ^C on rc, though, it doesn't say unable to start foo; it says unable to
> start s6rc-oneshot-runner.

Related to the same issue as well.
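For readers following along, the oneshot test described above corresponds to an s6-rc source definition directory roughly like the following sketch. The service name and paths are illustrative; note that, per the s6-rc documentation, a oneshot's up/down files are single command lines interpreted by execlineb, not full shell scripts:

```shell
# Sketch of the s6-rc source definition the oneshot test describes
# (directory and service names are illustrative).
src=$(mktemp -d)
mkdir "$src/foo"
echo oneshot > "$src/foo/type"     # was "longrun" when foo had ./run

# For a oneshot, ./up replaces the longrun's ./run; it is read as a
# single execlineb command line.
echo 'echo starting' > "$src/foo/up"

# Compile and switch to the new database (not run in this sketch):
#   s6-rc-compile /path/to/compiled-new "$src"
#   s6-rc-update -l /path/to/live /path/to/compiled-new
ls "$src/foo"
```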
Re: s6 bites noob
Kelly Dean writes:
> In serv/foo/run I have:
> #/bin/bash
> echo starting; sleep 2; echo dying

Just a typo in my message. Actual file does have #!/bin/bash
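For reference, a run script must start with a valid `#!` shebang (or be a compiled binary) for the kernel to exec it. With the corrected shebang, the quoted file looks like this, written to a temporary directory so the sketch is self-contained:

```shell
# The corrected serv/foo/run from the messages above, placed in a
# temp directory for illustration.
svc=$(mktemp -d)
cat > "$svc/run" <<'EOF'
#!/bin/bash
echo starting; sleep 2; echo dying
EOF
chmod +x "$svc/run"

# With "#" instead of "#!" on the first line, execve() would fail with
# ENOEXEC ("Exec format error") and supervise would retry every second.
"$svc/run"
```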
s6 bites noob
mkdir test
s6-svscan --help

Well, that was surprising and unpleasant. It ignores unknown arguments, blithely starts a supervision tree in the current dir (my home dir), and spams me with a bunch of supervise errors. Ok, kill it. Next test:

s6-svscan test

It gives errors about supervise being unable to spawn ./run, and the child dying. What? On an empty scan dir?

Oh, the previous test's accidental supervision tree ran supervise on all the current dir's subdirs--and each instance of supervise automatically created a supervise subdir of its service dir. So now there's test/supervise, which svscan now interprets as a service dir, and starts supervise on it, which barfs.

What purpose is served by supervise automatically creating the supervise and event subdirs if there's no run file? It seems to accomplish nothing but errors and confusion. Instead of creating the subdirs, and then barfing on the absence of a run file, why not just create nothing until a run file appears?

The doc for svscan at least says that it creates the .s6-svscan subdir. The doc for supervise says nothing about creating the supervise subdir, though the doc for servicedir does say it.

Next problem. The doc for s6-svc indicates that s6-svc -wu serv/foo will wait until it's up. But that's not what happens. Instead, it exits immediately. It also doesn't even try to start the service unless -u is also given, which is surprising, but technically not in contradiction with the doc. And if -u is given, then -wu waits forever, even after the service is up.

In serv/foo/run I have:
#/bin/bash
echo starting; sleep 2; echo dying

s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d serv/foo/ will stop it, but never exits. supervise itself does do its job, though, and perpetually restarts run after run dies while the service is set to be up.

So, I tried s6-rc.
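The "perpetually restarts run" behaviour described above is the core of supervision. As a toy illustration of the concept (this is NOT s6's actual code, and the respawn count is capped here only so the demo terminates; real s6-supervise loops indefinitely, pacing respawns to at most one per second):

```shell
# Toy model of a supervision loop: respawn ./run whenever it exits,
# as long as the service is wanted up. Capped at 3 respawns so the
# demo terminates.
supervise_toy() {
  restarts=0
  while [ "$restarts" -lt 3 ]; do
    ./run                          # s6-supervise forks and execs ./run
    restarts=$((restarts + 1))
  done
  echo "spawned run $restarts times"
}

dir=$(mktemp -d)
cd "$dir"
printf '#!/bin/sh\nexit 0\n' > run   # a run that dies immediately
chmod +x run
supervise_toy
```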
Set up service definition dir, compile database, create link, run s6-rc-init, etc., then finally:

s6-rc -u change foo

It starts immediately, but rc then waits while foo goes through 12 to 15 start/sleep/die cycles before rc finally exits with code 0. (And foo continues cycling.) But if I press ^C on rc before it exits on its own, then it kills foo, writes a warning that it was unable to start the service because foo crashed with signal 2, and exits with code 1.

So I tried it again, and this time pressed ^C on rc immediately after running it, before foo had a chance to die for the first time. It reported the same warning! The prophecy is impressive, but still, shouldn't rc just exit immediately after foo starts, and let the supervision tree independently handle foo's future death?

Next test: I moved run to up, changed type to oneshot, recompiled, created a new link, ran s6-rc-update, and tried foo again. This time, rc hangs forever, and up is never executed at all. When I eventually press ^C on rc, though, it doesn't say unable to start foo; it says unable to start s6rc-oneshot-runner.

This is all with default configuration for skalibs, execline, s6, and s6-rc, sourced from GitHub, running on Debian 9, in my home directory as a non-root user (with the -c option for rc-init, and -l for rc-init, rc, and rc-update, to avoid polluting system dirs while testing). s6-rc doesn't understand a --version option, but s6-rc/NEWS says 0.4.1.1, and s6/NEWS says 2.8.0.0.

And there appears to be an option missing for s6-rc:

s6-rc -d list          # List all
s6-rc -a list          # List all up
s6-rc -d change foo    # Bring foo down
s6-rc -u change foo    # Bring foo up
s6-rc -da change       # Bring all down

How to bring all up? The examples above suggest it would be

s6-rc -ua change

But that does nothing. (And the doc does indicate that it would do nothing, since there's no selection.)

And a question about the advice in the docs.
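On the "bring all up" question: as far as I can tell from the s6-rc documentation, the intended mechanism is a bundle listing every service, after which `s6-rc -u change <bundle>` brings them all up. A sketch of such a source definition; every name here (`default`, `foo`, `bar`) is invented for illustration, and the compile/change commands are left commented out:

```shell
# Sketch: a bundle listing every service; "s6-rc -u change default"
# would then bring them all up. Service and bundle names are invented.
src=$(mktemp -d)
for s in foo bar; do                 # two dummy longruns
  mkdir "$src/$s"
  echo longrun > "$src/$s/type"
  printf '#!/bin/sh\nexec sleep 10\n' > "$src/$s/run"
done
mkdir "$src/default"
echo bundle > "$src/default/type"
printf 'foo\nbar\n' > "$src/default/contents"   # one service name per line

# Then (not run in this sketch):
#   s6-rc-compile /path/to/compiled "$src"
#   s6-rc -l /path/to/live -u change default
cat "$src/default/contents"
```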
If svscan's rescan is 0, and /tmp is RAM, what's the advantage of having the scan directory be /tmp/service, with symlinks to service directory copies in /tmp/services, instead of simply having /tmp/services directly be the scan directory?

I guess an answer might be that there can be a race between svscan's initial scan at system startup and the populating of /tmp/services, so it sees partially copied service directories. But wouldn't a simpler solution be to either delay svscan's start until the populating is complete, or add an option to disable its initial scan? With no initial scan, you then have to run svscanctl -a after the populating of /tmp/services is complete, but you have to run that anyway even if you're using symlinks from a separate /tmp/service directory.
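The layout the question refers to can be sketched as follows, using a temporary directory instead of the real /tmp/service and /tmp/services so the sketch is side-effect free. The point of the symlink indirection, as I understand it, is that symlink creation is atomic: a running s6-svscan can never observe a half-populated service directory.

```shell
# Sketch of the layout from the docs: full service directory copies
# in .../services, and a scan directory .../service of symlinks only.
base=$(mktemp -d)                     # stands in for /tmp
mkdir "$base/services" "$base/service"

mkdir "$base/services/foo"            # fully populate the copy first...
printf '#!/bin/sh\nexec sleep 10\n' > "$base/services/foo/run"
chmod +x "$base/services/foo/run"

ln -s "$base/services/foo" "$base/service/foo"   # ...then link atomically

# After populating, tell the (hypothetically running) scanner to rescan:
#   s6-svscanctl -a "$base/service"
readlink "$base/service/foo"
```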