Re: s6-rc user services on Gentoo

2024-04-06 Thread Guillermo
On Wed, Apr 3, 2024 at 8:37, Laurent Bercot wrote:
>
> >2) The presence of a notification-fd file tells s6 that dbus-daemon
> >can be somehow coerced into producing an s6-style readiness
> >notification using file descriptor 3 without changing its code, are
> >you sure that's the case with this script? My service definition for
> >the system-wide message bus polls for readiness using s6-notifyoncheck
> >and a dbus-send command...
>
>   "dbus-daemon --print-address=3" should produce a suitable notification.
> The address of the bus can only be printed once the bus exists. :)

But then, there is a problem if one actually wants the server address
information that --print-address provides. Alexis' 'run' script for
example wants to save that to a file (apparently in a directory
suitable for s6-envdir). If the output is sent to the notification
pipe instead, s6-supervise will 'eat' it while waiting for the final
newline character, and then the information is lost.

And more generally, there's also the question about how 'ready'
dbus-daemon actually is when the point in its code that prints the
server address is reached. I can't really say without looking at the
code; dbus-daemon has many duties once its UNIX domain socket is
bound.
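If both the readiness notification and the address are wanted, one workaround could be to tee the stream before it reaches the notification pipe. A minimal shell sketch, where a regular file stands in for the pipe and 'addr.txt' is a hypothetical capture path:

```shell
# Sketch only: notify.txt stands in for the s6 notification pipe on
# fd 3, and the printf stands in for dbus-daemon's --print-address
# output. tee keeps a copy before the "pipe" consumes the line.
exec 3> notify.txt
printf 'unix:path=/tmp/bus\n' | tee addr.txt >&3
exec 3>&-
```

After this runs, the address is both in addr.txt (for something like s6-envdir) and on fd 3, where s6-supervise would read it as the readiness notification.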

G.


Re: s6-rc user services on Gentoo

2024-04-02 Thread Guillermo
Hello,

On Tue, Apr 2, 2024 at 1:42, Alexis wrote:
>
>   
> S6RC_SERVICE_REPO=${HOME}/src/srht/gentoo-overlay/sys-apps/s6-rc-services/files/
>
> * The above directory contains two service directories:
> ** dbus-session-bus/, containing:
> *** type: longrun
> *** notification-fd: 3
> *** producer-for: dbus-session-bus-log
> *** run:
> #!/usr/bin/execlineb -P
> importas HOME HOME
> redirfd -w 4 ${HOME}/.env.d/DBUS_SESSION_BUS_ADDRESS
> dbus-daemon --session --fork --nopidfile --print-address=4

You already got an answer about why s6-rc-init doesn't work (the scan
directory and the directory of service definitions given to
s6-rc-compile should not be the same; let s6-rc-init populate the
service directory). I'd like to comment on what you didn't ask :)

1) Why are you telling dbus-daemon to --fork? That defeats the purpose
of service supervision. The service definition for the system-wide
message bus that I have on a Gentoo VM of mine with s6 + s6-rc +
s6-linux-init uses --nofork.

2) The presence of a notification-fd file tells s6 that dbus-daemon
can be somehow coerced into producing an s6-style readiness
notification using file descriptor 3 without changing its code, are
you sure that's the case with this script? My service definition for
the system-wide message bus polls for readiness using s6-notifyoncheck
and a dbus-send command...

> [a] The current s6 and s6-rc pages on the wiki have lot of detail,
> without any "quickstart" tutorials that might make it easy for
> people to get on board.

Most of the content is from around 2017, a time when the s6 suite
was less known, information in places other than the skarnet.org
website was lacking and inaccurate (even though the official
documentation, while short and to the point, has always been quite
good and complete IMO; it seemed people just didn't bother reading it
[1]), and there weren't many usage examples around.

[1] I think I never said it, but what made me look at s6 for the first
time (s6-rc and s6-linux-init didn't exist back then), after seeing a
post in the Gentoo forums, was its documentation ("OK, nice
explanations, let's try this").


Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-12 Thread Guillermo
On Wed, Oct 12, 2022 at 23:22, Amelia Bjornsdottir wrote:
>
> To clarify, I'm referring to the ->target member (in srv) or the
> ->exchange member (in mx).
>
> Are those not the same as the input format for skadns_send?

Oooh. If you mean the "target" member of a s6dns_message_rr_srv_t
object filled by a call to s6dns_message_get_srv(), or the "exchange"
member of a s6dns_message_rr_mx_t object filled by a call to
s6dns_message_get_mx(), then no, looking at the code from libs6dns
from s6-dns-2.3.5.4, it seems that they are in string format, so you'd
have to convert them to packet format using s6dns_domain_encode()
before using them for a subsequent call to skadns_send().
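For illustration, here is what that conversion amounts to. The function below is my own stand-in for s6dns_domain_encode(), not libs6dns code: it turns a dotted name (string form) into DNS wire format (packet form), i.e. length-prefixed labels terminated by a zero byte.

```shell
# encode_domain: hypothetical stand-in for s6dns_domain_encode().
encode_domain() {
  printf '%s\n' "$1" | tr '.' '\n' | while read -r label ; do
    printf '\\%03o%s' "${#label}" "$label"   # <length byte><label>
  done
  printf '\\000'                             # terminating root label
}
# Show the wire bytes; the first byte is 0x0a, the length of "perihelion".
printf "$(encode_domain perihelion.example.com)" | od -An -tx1
```

A query built from the unconverted string form carries the dotted name, not these length bytes.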

G.


Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH? [manually resent to list]

2022-10-12 Thread Guillermo
On Wed, Oct 12, 2022 at 21:10, Amelia Bjornsdottir wrote:
>
> I'm passing skadns_send an s6dns_domain_t straight out of an
> s6dns_message_rr_srv_t (case 1) or a s6dns_message_rr_mx_t (case 2). Is
> that in packet format or in string format?

Um, neither? As far as I can tell, skadns_send() always takes a domain
name encoded in a s6dns_domain_t object, and the type of resource
record that you want as the "qtype" argument, which go straight to the
"question" section of a DNS query. Objects of types
s6dns_message_rr_srv_t and s6dns_message_rr_mx_t are used for parsing
RRs in the DNS response that skadns_packet() gives you after the
client gets it from skadnsd using skadns_update().

After learning a bit about skadnsd's client protocol, looking at
HardenedBSD's truss output, it looks like your program does 3 queries
for SRV RRs, 1 query for an MX RR, 9 queries for A RRs, and 9 queries
for AAAA RRs. I suppose that on OmniOS, the program does the exact
same 22 queries. In both cases you get responses with no error for the
SRV and MX queries. On Vultr's network, the A and AAAA queries all seem
to get a response with a "format error" RCODE, presumably because the
resulting DNS packet is malformed, and on Shaw's network they don't
seem to get a response at all. One possible explanation is that, if
the packets are really malformed, Shaw's caches might just not bother
responding to them. This:

sendto(17,"\^?!\^A\0\0\^A\0\0\0\0\0\0.perih"...,44,0,NULL,0) = 44 (0x2c)

makes me very suspicious. That looks like a dot followed by the label
"perihelion", i.e. like coming from a s6dns_domain_t object in string
form.

G.


Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-12 Thread Guillermo
On Fri, Oct 7, 2022 at 20:29, Amelia Bjornsdottir wrote:
>
> I link truss -f of my application piped through grep skadns' PID to show
> only skadns, on OmniOS
>  and HardenedBSD
> .

After further analysis, I see a pattern and have a hypothesis.

Amelia, how's the program constructing the s6dns_domain_t object that
it passes to skadns_send() for A and AAAA queries? Is it calling
s6dns_domain_encode() or s6dns_domain_encode_list(), i.e., is the
object passed to skadns_send() in packet form instead of string form?

G.


Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-10 Thread Guillermo
On Mon, Oct 10, 2022 at 13:28, Laurent Bercot wrote:
>
>   s6dns_engine filters answers that do not seem relevant to in-flight
> queries. That includes malformed answers or ones that do not follow
> RFC 1035.
>   I was made aware (thanks, Ermine) that some caches fail to set the
> RD bit in their responses to queries containing the RD bit; these
> answers were ignored.

However, the OS would still deliver them to skadnsd in a recv() /
recvfrom() call, right? If my reading of the truss outputs is correct,
the HardenedBSD system isn't getting a response at all, and whatever
error happens with the program running on the OmniOS system, if any,
does not involve the network (I can't tell if skadnsd is delivering
all received answers to the client).

I feel that packet capture tools like tcpdump(1) or OmniOS' snoop(8)
would be better suited for answering the questions that have been
raised so far (malformed packets, ignored responses, lack of
responses, etc.). Also, aren't 18 outstanding queries in a short
amount of time from one single host, like, a lot? Couldn't Shaw's
caches think that they are being DoS'ed :P ?

G.


Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-08 Thread Guillermo
On Fri, Oct 7, 2022 at 20:29, Amelia Bjornsdottir wrote:
>
> On OmniOS, in Vultr's network, my A and AAAA lookups check in (skadns_t
> *)->list after skadns_update(...), with failure, quickly. On
> HardenedBSD, in Shaw's network (so I'm aware that I'm not controlling
> for different DNS recursors here, and I should be), my A and AAAA
> lookups check with failure after the 45 seconds.
>
> I link truss -f of my application piped through grep skadns' PID to show
> only skadns, on OmniOS
>  and HardenedBSD
> .

On OmniOS, all the DNS queries (apparently 58) received a response. On
HardenedBSD, only the first 4 queries received a response, the next 18
timed out. They were retried 4 additional times, as expected, again
timing out without receiving a response.

G.


Re: Core dump of crashing service program

2022-08-09 Thread Guillermo
> * https://wiki.artixlinux.org/Main/LocalUserServicesOns6

More specifically, this:

* 
https://wiki.artixlinux.org/Main/LocalUserServicesOns6#Complete_s6_supervision_.28optional.29


Re: Core dump of crashing service program

2022-08-09 Thread Guillermo
On Tue, Aug 9, 2022 at 9:27, Liviu Nicoara wrote:
>
>> Your earlier posts do not mention s6-rc, though, only s6, so I'm not
>> sure if the system that you are dealing with is using it.
>
> Yep, s6 and s6-rc (artix w/ s6). Will do my own homework.

Oh, Artix. They have official support for s6 + s6-rc + s6-linux-init,
and the documentation you need. Read this:

* https://wiki.artixlinux.org/Main/S6#Updating_bundle_contents
* https://wiki.artixlinux.org/Main/LocalUserServicesOns6

TL;DR: look at Artix' /etc/s6/adminsv directory and 's6-db-reload' script.

G.


Re: Core dump of crashing service program

2022-08-08 Thread Guillermo
On Mon, Aug 8, 2022 at 16:23, Liviu Nicoara wrote:
>
> I know next to zero about runlevels and the
> initialization orchestra. What is the correct way to get s6 to give me
> something that looks like runlevel 5 (a graphical display manager)? From my
> reading, is via bundles and I get the concept, but what is it that I
> should bundle?

You bundle the s6-rc atomic services that you want up when you enter
(an equivalent of) runlevel 5. If when you say "runlevel 5" you are
thinking of Debian or something like it, those would be, more or less,
s6-rc versions of all /etc/rcS.d/S* and /etc/rc5.d/S* services, if I
remember the details of Debian's (non-systemd) init system correctly.

Your earlier posts do not mention s6-rc, though, only s6, so I'm not
sure if the system that you are dealing with is using it.

G.


execline: getpid doesn't actually recognize -P and -p

2022-05-18 Thread Guillermo
Hello. What the subject says:

$ getpid -E pid echo \$pid
1543

$ getpid -PE pid echo \$pid
getpid: usage: getpid [ -E | -e ] [ -P | -p ] variable prog...

I guess that's because they are not mentioned in the call to subgetopt_r():

* 
https://git.skarnet.org/cgi-bin/cgit.cgi/execline/tree/src/execline/getpid.c?h=v2.8.3.0#n25
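The failure mode is generic to this style of parser: option letters absent from the option string are rejected. A small reproduction with the shell's getopts standing in for subgetopt_r(), using "Ee" as the option string (presumably what getpid.c currently passes):

```shell
# getopts stands in for subgetopt_r(): only letters present in the
# option string "Ee" are accepted, so -P is reported as unknown ('?').
OPTIND=1 ; getopts Ee opt -E || true ; e_result=$opt
OPTIND=1 ; getopts Ee opt -P 2>/dev/null || true ; p_result=$opt
echo "-E parsed as: $e_result"   # the letter itself
echo "-P parsed as: $p_result"   # '?', i.e. rejected
```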

G.


Re: [announce] New hypermail archives of skarnet.org mailing lists

2021-05-09 Thread Guillermo
On Sun, May 9, 2021 at 18:44, Guillermo wrote:
>
> The hypermail links don't work for me, browsers (and wget) say that
> the server returns an empty response.

Sorry, I should have clarified that I meant the hypermail links in this page:

*  https://www.skarnet.org/lists.html

and I realized minutes after sending the previous message that they
don't have the "list/" part in the URL, and that's why they don't
work.

G.


Re: [announce] New hypermail archives of skarnet.org mailing lists

2021-05-09 Thread Guillermo
Hello,

On Sun, May 9, 2021 at 17:18, Laurent Bercot wrote:
>
>   A new web interface to these archives is now available, and this
> one appears to work better (there are still a few quirks with
> utf-8 display but we'll sort them out).
>
>   https://skarnet.org/lists/skaware/
>   https://skarnet.org/lists/supervision/

The hypermail links don't work for me, browsers (and wget) say that
the server returns an empty response. The ezmlm links work OK. With
the occasional display errors, that is :)

BTW, I'm not sure if it matters now, and I think I have said it
before, but the pattern I noticed is that the messages that aren't
displayed correctly by ezmlm-cgi seem to be those that contain an URL
at the beginning of a line, so the bug is probably in the code that
turns URLs into hyperlinks.

G.


Re: execlineb ELF executable stack on Linux

2021-04-11 Thread Guillermo
On Sun, Apr 11, 2021 at 15:03, Xavier Stonestreet wrote:
>
> [...] the breaking point
> is making strip and static linking with packages that are build-time
> dependencies. So for example, make strip skalibs and then statically
> link all other skarnet packages with it. Or make strip execline and
> then statically link s6 with it. Then the resulting executables or
> shared libraries end up with an executable stack.
>
> If dynamic linking is used all throughout the dependency chain, I
> don't think it can happen [...]

Ah, I see. There has to be a static library involved. Then yes, I get
the same results. For example, when manually building s6 with a static
libskarnet that has not been stripped and a static libexecline that
has been stripped with 'make strip' and the original makefile, all
executables have GNU_STACK headers with RW flags except s6-svlisten
and s6-ftrig-listen, which have RWE flags and are the only ones that
link to libexecline.

Thanks, this has been instructive.

G.


Re: execlineb ELF executable stack on Linux

2021-04-11 Thread Guillermo
Hello,

On Fri, Apr 9, 2021 at 11:02, Xavier Stonestreet wrote:
>
> I did some more investigation and the problem lies
> in the Makefile's strip instructions which remove the GNU-stack
> section from the object files. Without the GNU-stack section the
> linker reverts to its backwards-compatible default which is to make
> the stack executable. Here is a patch to fix skalibs' Makefile for
> example:
> [...]
>  strip: $(ALL_LIBS)
>  ifneq ($(strip $(STATIC_LIBS)),)
> -   exec $(STRIP) -x -R .note -R .comment -R .note.GNU-stack 
> $(STATIC_LIBS)
> +   exec $(STRIP) -x -R .note -R .comment $(STATIC_LIBS)

I guess this executable stack thing is toolchain- / libc-specific? The
relevant test is using 'readelf -l' to check if there is a GNU_STACK
program header, and that its flags are RW instead of RWE, right?

If yes, I've checked manually building the latest numbered releases of
skalibs, execline and s6 with Gentoo's toolchain and packaged GNU
libc, and all the resulting executables and shared libraries have a
GNU_STACK header with RW flags. No sign of RWE flags... Even after
doing 'make strip' with the original version of the makefile that
removes the .note.GNU-stack section.
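Concretely, the check I mean is something like this (readelf is from GNU binutils; /bin/sh here is just a stand-in for the freshly built binaries):

```shell
# Print the GNU_STACK program header and its flags line; "RW" means a
# non-executable stack, "RWE" means the loader maps it executable.
readelf -l /bin/sh | grep -A1 GNU_STACK
```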

The same happens with the earlier versions packaged by Gentoo that are
built by Portage, although Portage does the stripping itself rather
than relying on the packages' build systems.

G.


s6-linux-init-runleveld

2020-03-14 Thread Guillermo
Hello,

Currently, the s6-linux-init-runleveld service is an s6-sudod process
invoked with options -0, -1 and -2. Therefore, if the scripts/runlevel
script prints messages to its standard output, they are lost, because
it is redirected to /dev/null. So it can't do that. And messages
printed to its standard error go to the catch-all logger, which may or
may not display them on the console, depending on configuration.

However, the only thing that triggers execution of scripts/runlevel is
the s6-linux-init-telinit program, right? I expect that in most cases
it would be run manually by the administrator from an interactive
shell. So wouldn't it be better to drop the -0, -1 and -2 options, so
that the script's messages go to s6-linux-init-telinit's standard file
descriptors, i.e. display on the controlling terminal of the
administrator's shell in most cases? Wouldn't it be better to drop the
-e option from s6-linux-init-telinit's s6-sudo invocation, so that the
script can have access to s6-linux-init-telinit's environment, if
written to do so?

Thanks,
G.


Re: s6-tlsd immediately sending EOF during TLS handshake

2020-02-16 Thread Guillermo
On Fri, Feb 14, 2020 at 22:59, Laurent Bercot wrote:
>
>   Indeed, the client's error message indicates that the handshake did
> not complete. But in that case, that would mean the error is in
> libtls, not s6-tlsd.

If this turns out to be a bug in LibreSSL triggered by the OP's
particular certificate and key, it will be hard to debug. It could be
worth trying to obtain a backtrace with GDB. s6-networking and skalibs
would have to be rebuilt with debugging symbols (CFLAGS=-ggdb
./configure $configure-arguments), and debugging symbols for LibreSSL
would also have to be installed, which apparently is possible on Void:

* https://docs.voidlinux.org/xbps/repositories/official/debug.html

Then I'd try launching s6-tlsserver with:

$(which export) CERTFILE /etc/letsencrypt/live/$REDACTED/fullchain.pem \
$(which export) KEYFILE /etc/letsencrypt/live/$REDACTED/privkey.pem \
s6-tcpserver 0.0.0.0 443 ./script

where 'script' is:

#!/bin/execlineb -P
# Possibly drop privileges with s6-setuidgid
getpid PID
importas -u PID PID
background -d {
  redirfd -w 1 gdb-output.txt
  gdb -batch -ex continue -ex bt s6-tlsd $PID
}
s6-tlsd exit 0

This should hopefully attach the s6-tlsd process to GDB in batch mode,
and when the s6-tlsclient invocation makes it segfault, create a
backtrace in file gdb-output.txt. I don't have s6-networking, but this
works for me when used with s6-ipcserver and a test program that
raises SIGSEGV on purpose:

$ cat test-program.c
#include <unistd.h>
#include <signal.h>

void do_it_for_real () {
  sleep(5);
  raise(SIGSEGV);
}

void do_it () {
  do_it_for_real();
}

int main () {
  do_it();
  return 0;
}

$ s6-ipcserver -v socket ./script &
s6-ipcserverd: info: starting
s6-ipcserverd: info: status: 0/40

$ s6-ipcclient socket exit 0
s6-ipcserverd: info: allow 1000:1000 pid 556 count 1/40
s6-ipcserverd: info: status: 1/40
s6-ipcserverd: info: end pid 556 uid 1000 signal 11
s6-ipcserverd: info: status: 0/40

$ cat gdb-output.txt
0x7fe21b52f3a8 in nanosleep () from /lib64/libc.so.6

Program received signal SIGSEGV, Segmentation fault.
0x7fe21b498ec1 in raise () from /lib64/libc.so.6
#0  0x7fe21b498ec1 in raise () from /lib64/libc.so.6
#1  0x56114282316d in do_it_for_real () at test-program.c:6
#2  0x56114282317e in do_it () at test-program.c:10
#3  0x56114282318f in main () at test-program.c:14
[Inferior 1 (process 556) detached]

Hope that helps,
G.


Re: s6-tlsd immediately sending EOF during TLS handshake

2020-02-14 Thread Guillermo
On Thu, Feb 13, 2020 at 6:50, Laurent Bercot wrote:
>
> >So I guess that means there is either a bug in LibreSSL (oh no), or in
> >s6-networking's LibreSSL code?
>
>   Probably the latter; given your trace, it seems to be the tunnel code
> not handling it correctly when it receives a EOF just after the
> handshake.

Do you think that the handshake completes? I'm not sure that execution
is even reaching the stls_run() call; the segfault could have happened
during the tls_handshake() call in stls_s6tlsd() (i.e. while executing
LibreSSL code), and the tls_handshake() call in stls_s6tlsc() would
report a failed handshake accordingly.

G.


Re: Using s6 and s6-rc tools with an unprivileged user

2020-02-02 Thread Guillermo
On Sun, Feb 2, 2020 at 17:14, Laurent Bercot wrote:
>
> >* s6-rc-db: [...]
> There shouldn't be any drawbacks. Honestly, the lock file in a
> database is only necessary because of s6-rc-bundle, which is the only
> program that can modify a compiled db at the same time it is being read;
> the intent is not for it to be a roadblock for anything. So, change its
> group to your heart's content; I should add an option to s6-rc-compile
> to do that at db creation time.

OK. An option would save an extra chgrp + chmod step. s6-rc-bundle
still fails, as expected:

$ s6-rc-db -c db list all
a-oneshot
a-longrun
s6rc-fdholder
s6rc-oneshot-runner

$ s6-rc-bundle -c db add a-bundle a-longrun
s6-rc-bundle: fatal: unable to open db/resolve.cdb.new: Permission denied

> >* s6-rc: [...]
> This is a bit more borderline, because a user can still run
> "s6-rc change" and it will attempt to change the machine state, and will
> probably yield a big batch of errors about s6-svc being forbidden to
> change a service state or s6-sudoc being unable to connect to the
> oneshot runner. Things should work exactly as they're supposed to, but
> it might be pretty ugly.

Yeah, it all works as expected; YMMV about the clarity of error messages.

$ s6-rc -ul live change a-longrun
s6-rc: fatal: unable to take locks: Permission denied

[After chgrp + chmod, and starting s6rc-oneshot-runner]
$ s6-rc -dl live list
s6rc-oneshot-runner
s6rc-fdholder
a-longrun
a-oneshot

$ s6-rc -ul live change a-oneshot
s6-sudoc: fatal: unable to connect to the s6-sudod server - check that
you have appropriate permissions
s6-rc: warning: unable to start service a-oneshot: command exited 111

$ s6-rc -ul live change a-longrun
s6-svlisten1: fatal: unable to subscribe to events for
live/servicedirs/a-longrun: Permission denied
s6-rc: warning: unable to start service a-longrun: command exited 111
s6-svc: fatal: unable to control live/servicedirs/a-longrun: Permission denied

[OK, make the 'event' fifodir publicly accessible and try again]
$ s6-rc -ul live change a-longrun
s6-svlisten1: fatal: unable to s6_svstatus_read: Permission denied
s6-rc: warning: unable to start service a-longrun: command exited 111
s6-svc: fatal: unable to control live/servicedirs/a-longrun: Permission denied

[OK, make 'supervise' group searchable and try again, see later]
$ s6-rc -ul live -t 1000 change a-longrun
s6-svc: fatal: unable to control live/servicedirs/a-longrun: Permission denied
s6-svlisten1: fatal: timed out
s6-rc: fatal: timed out

> If you don't mind ugliness, fine; but if you
> want me to make it official, I'll want to add a permissions check to
> "s6-rc change" in order to fail early and cleanly.

Personally, I don't mind the messages. I am able to interpret them,
and there is a "permission denied" there anyway, but then again, I am
familiar with some of s6-rc's internals. I don't know about other
people, though.

> >* s6-svdt: [...]
> I haven't studied this in detail, but making the permissions of a
> supervise/ directory distinct from those of the whole supervision tree
> sounds sketchy to me. If you make supervise/ readable by another group,
> it means permissions of the files inside need to be carefully crafted,
> which I'm not sure is the case for everything. At the very least the
> status file should be made unreadable by others - not that it's
> critical,
> but restricting certain stuff to a group is pretty meaningless if other
> stuff such as the state of the service is available to everyone.

After further testing, it turns out that being group *searchable*
(permissions 0710) is enough for unprivileged use. It seems that the
openat() call that s6-svdt makes to read the death tally (and also
s6-svlisten1 for the status file) fails otherwise.

$ s6-svdt scan/a-longrun
s6-svdt: fatal: unable to read death tally for service scan/a-longrun:
Permission denied

strace reveals that, for s6-svlisten1:

openat(AT_FDCWD, "scan/a-longrun/supervise/status",
O_RDONLY|O_NONBLOCK) = -1 EACCES (Permission denied)

And for s6-svdt:

openat(AT_FDCWD, "scan/a-longrun/supervise/death_tally",
O_RDONLY|O_NONBLOCK) = -1 EACCES (Permission denied)
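The distinction can be sketched with scratch paths (not a real service directory); mode 0710 is what turned out to be sufficient:

```shell
# Mode 0710: owner rwx, group --x (search only), others nothing.
# A group member can openat() supervise/status by name, which is what
# s6-svlisten1 and s6-svdt need, but cannot list the directory.
mkdir -p supervise
: > supervise/status
chmod 0710 supervise
ls -ld supervise    # mode shows as drwx--x---
```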

> >*  s6-svstat: [...]
>
> That's an excellent finding!
> I always thought daemontools (from which every other suite takes
> inspiration!) had a separate 'ok' fifo because using the 'control' fifo
> as a usage marker involves some tricky manipulation at supervisor start,
> and djb just couldn't be bothered. I got rid of 'ok' because that saves
> one fd. But you're right: separating 'ok' from 'control' allows checking
> whether the supervisor is running with a different permission set from
> actually sending commands.

To be honest, after seeing that daemontools supervise's code opened
the 'ok' FIFO for reading and did nothing else with it other than
marking the corresponding file descriptor close-on-exec, for the
longest time I wondered why it even existed. But looking at
daemontools svstat's code, yes, that is the FIFO that is checked to
determine whether supervise is running.

Using s6 and s6-rc tools with an unprivileged user

2020-02-02 Thread Guillermo
Hello,

Because of the way s6, s6-rc and s6-linux-init tools set up
permissions, pretty much every operation that involves the init
system, even those that do not change the machine state, must be done
with root privileges. Other init systems allow some operations to be
done without those privileges. For example, with sysvinit + OpenRC,
runlevel, rc-status and 'rc-service describe' can be used by
unprivileged users, but shutdown, rc-update and 'rc-service start'
cannot. I know that changing group and permissions of specific files
and directories allows doing the same with s6 + s6-rc + s6-linux-init.
However, the fact that one can do it doesn't necessarily mean that one
should. So here are my questions, which are pretty much the same in
all cases:

* s6-rc-db: Changing the group of the 'lock' file in a compiled
database and making it group writable allows the group's members to
use the command. s6-rc-db cannot change the database or the service
states, so are there any drawbacks to doing this? Is there a better
way to use the command without being root?

* s6-rc: Changing the group of the 'lock' file in the live state
directory, the group of the 'lock' file in the compiled database that
is currently live, and making both group writable, allows the group's
members to use, for example, the 's6-rc -a list' and 's6-rc -a
listall' commands, but not the 's6-rc change' command, because
permissions in other files and directories still prevent it. So are
there any drawbacks to doing this? Is there a better way to use the
command in forms that do not change service states without being root?

* s6-svdt: Changing the group of the 'supervise' subdirectory of a
service directory, and making it group readable, allows the group's
members to use the command for the corresponding service.
s6-svdt-clear still needs root privileges. So are there any drawbacks
to doing this? Is there a better way to use the command without being
root?

*  s6-svstat: This is a tough one. Because the 'control' FIFO in the
'supervise' subdirectory is only user-writable, this command can only
be run as root. As far as I can tell, opening the FIFO is needed to
check if the supervisor is running, and other daemontools-style
supervision suites use a separate FIFO for this purpose, customarily
named 'ok'. But changing the file's group and making it group writable
also allows using s6-svc without being root. So is there a way to
allow using s6-svstat, but not s6-svc, without being root?

* Logging directories and kernel environment store: if they don't
exist, s6-log creates logging directories with permissions 2700.
s6-linux-init with the -s option creates the environment store with
permissions 0700. Are there any drawbacks to changing their group to
allow more users to read and search those directories?
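In each case the adjustment being asked about is just a chgrp + chmod pair. A sketch with a scratch path standing in for a compiled database's 'lock' file (on a real system one would also chgrp it to a group of one's choosing):

```shell
# compiled-db/lock is a scratch stand-in for the real lock file.
mkdir -p compiled-db
: > compiled-db/lock
chmod 0664 compiled-db/lock   # group-writable so members can take the lock
ls -l compiled-db/lock        # mode shows as -rw-rw-r--
```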

Thanks,
G.


Re: s6-svscan can't find shared libraries

2019-12-05 Thread Guillermo
On Wed, Dec 4, 2019 at 2:30, J. Lewis Muir wrote:
>
> Lastly, pkgsrc seems to use rpath based on
>
>   https://www.netbsd.org/docs/pkgsrc/fixes.html#fixes.libtool
>
> which suggests using libtool with the '-rpath' option for linking.

Note that this talks about using -rpath *when invoking libtool*, where
it has a different meaning. GNU Libtool (which skarnet.org software
does not use) needs -rpath when invoked in link mode to create a
library. Without it, it creates what it calls "a convenience library"
(a static library that is generally not going to be installed).

$ ./libtool --mode=link gcc -shared -o libtest.la libtest.lo \
-version-info 1:0    # "Convenience library" with position-independent code
libtool: warning: '-version-info/-version-number' is ignored for
convenience libraries
libtool: link: rm -fr  .libs/libtest.a .libs/libtest.la
libtool: link: ar cru .libs/libtest.a .libs/libtest.o
libtool: link: ranlib .libs/libtest.a
libtool: link: ( cd ".libs" && rm -f "libtest.la" && ln -s
"../libtest.la" "libtest.la" )

$ ./libtool --mode=link gcc -shared -o libtest.la libtest.lo \
-version-info 1:0 -rpath /lib    # Actual shared library
libtool: link: rm -fr  .libs/libtest.a .libs/libtest.la
libtool: link: gcc -shared  -fPIC -DPIC  .libs/libtest.o
-Wl,-soname -Wl,libtest.so.1 -o .libs/libtest.so.1.0.0
libtool: link: (cd ".libs" && rm -f "libtest.so.1" && ln -s
"libtest.so.1.0.0" "libtest.so.1")
libtool: link: (cd ".libs" && rm -f "libtest.so" && ln -s
"libtest.so.1.0.0" "libtest.so")
libtool: link: ( cd ".libs" && rm -f "libtest.la" && ln -s
"../libtest.la" "libtest.la" )

The resulting commands are different.

G.


s6-linux-init: rc.shutdown's and rc.shutdown.final's messages

2019-11-24 Thread Guillermo
Hello,

Currently, s6-linux-init-shutdownd executes these files with their
standard output and error redirected to the catch-all logger's FIFO.
The daemon itself redirects its own to /dev/console only before
creating the stage 4 script. So, in the absence of errors, only the
"sending all processes the TERM signal..." and "sending all processes
the KILL signal..." messages are seen on the console.

However, in most circumstances I can think of, messages from
rc.shutdown and rc.shutdown.final, or error messages from commands in
the stage 4 script, will never be seen by the user, because the
machine is in the process of shutting down, and /run/uncaught-logs is
on a tmpfs, so it vanishes when the process completes. Therefore, this
is kind of like having stdout and stderr redirected to /dev/null.
Wouldn't it be better if s6-linux-init-shutdownd ran both rc.shutdown
and the entire stage 4 script with stdout and stderr redirected to
/dev/console instead? Except maybe in the case of receiving the 'S'
command on the control FIFO?

Alternatively, one could use the catch-all logger variant that echoes
logged messages to /dev/console (created by s6-linux-init-maker's '-1'
option), or perform redirections directly in rc.shutdown and
rc.shutdown.final. The first alternative, however, affects logging for
all processes. If the latter alternative is the preferred way, then I
think the example rc.* scripts installed in the skeldir
(/etc/s6-linux-init/skel by default) could contain commented stdout
and stderr redirections and an accompanying explanation, to make
readers aware that they might be needed.

Thanks,
G.


Re: False positive in skalibs system feature test

2019-10-25 Thread Guillermo
On Fri, Oct 25, 2019 at 17:01, Jens Rehsack wrote:
>
> > [...]
> > configure:2634: result: no
> >
> > As you can see, only the equivalent of a skarnet.org 'choose cl' is used 
> > here.
>
> Wasn't that clear enough when I told that weeks before?
> For any typical library function, that is enough.

In cases like this, not without precautions. This only worked because
Autoconf happens to know about GNU libc's __stub_* macros, and adds
garbage to the test source file if the relevant one is defined, so
that the compile phase fails. If the compile phase had succeeded, the
link phase would have as well. The configure script would have
declared that lchown() is available, and one would have ended up with
a useless lchown() substitute.

G.


Re: False positive in skalibs system feature test

2019-10-25 Thread Guillermo
On Fri, Oct 25, 2019 at 14:30, Shengjing Zhu wrote:
>
> Not familiar with autoconf, but I found the following snippet in autoconf 
> code.
>
> https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/c.m4?h=v2.69#n179
>
> ```
> #if defined __stub_$1 || defined __stub___$1
> choke me
> #endif

This seems to be indeed what Autoconf currently uses when the
AC_CHECK_FUNC macro is used to check if a function is available. The
__stub_* macros are in <gnu/stubs.h> (which on my Gentoo x86_64 system
includes <gnu/stubs-64.h>), an autogenerated GNU libc header included
by <features.h>. I tried this with getrandom(), which exists, and
lchmod(), which is a libc stub. Here are the results:

$ nm -D /lib64/libc.so.6 | grep -E 'lchmod|getrandom'
0003d980 T getrandom
000f5aa0 T lchmod

$ cat /usr/include/gnu/stubs-64.h
/* This file is automatically generated.
   It defines a symbol `__stub_FUNCTION' for each function
   in the C library which is a stub, meaning it will fail
   every time called, usually setting errno to ENOSYS.  */
#ifdef _LIBC
 #error Applications may not define the macro _LIBC
#endif
[...]
#define __stub_lchmod
[...]

$ cat configure.ac
AC_INIT(example, 1.0)
AC_CHECK_FUNC(getrandom)
AC_CHECK_FUNC(lchmod)
AC_OUTPUT

$ autoconf -V
autoconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by David J. MacKenzie and Akim Demaille.

$ autoconf
$ ./configure
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for getrandom... yes
checking for lchmod... no
configure: creating ./config.status

$ cat config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by example configure 1.0, which was
generated by GNU Autoconf 2.69.
[...]
configure:2634: checking for lchmod
configure:2634: gcc -o conftest -g -O2   conftest.c  >&5
conftest.c:37:1: error: unknown type name 'choke'
 choke me
 ^
conftest.c:37:9: error: expected ';' before 'int'
 choke me
 ^
 ;
conftest.c:40:1:
 int
 ~~~
configure:2634: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "example"
| #define PACKAGE_TARNAME "example"
| #define PACKAGE_VERSION "1.0"
| #define PACKAGE_STRING "example 1.0"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| /* end confdefs.h.  */
| /* Define lchmod to an innocuous variant, in case <limits.h> declares lchmod.
|    For example, HP-UX 11i <limits.h> declares gettimeofday.  */
| #define lchmod innocuous_lchmod
|
| /* System header to define __stub macros and hopefully few prototypes,
|    which can conflict with char lchmod (); below.
|    Prefer <limits.h> to <assert.h> if __STDC__ is defined, since
|    <limits.h> exists even on freestanding compilers.  */
|
| #ifdef __STDC__
| # include <limits.h>
| #else
| # include <assert.h>
| #endif
|
| #undef lchmod
|
| /* Override any GCC internal prototype to avoid an error.
|Use char because int might match the return type of a GCC
|builtin and then its argument prototype would still apply.  */
| #ifdef __cplusplus
| extern "C"
| #endif
| char lchmod ();
| /* The GNU C library defines this for functions which it implements
| to always fail with ENOSYS.  Some functions are actually named
| something starting with __ and the normal name is an alias.  */
| #if defined __stub_lchmod || defined __stub___lchmod
| choke me
| #endif
|
| int
| main ()
| {
| return lchmod ();
|   ;
|   return 0;
| }
configure:2634: result: no

As you can see, only the equivalent of a skarnet.org 'choose cl' is used here.

G.


Re: False positive in skalibs system feature test

2019-10-23 Thread Guillermo
El mié., 23 oct. 2019 a las 6:50, Laurent Bercot escribió:
>
> >/usr/bin/ld: src/librandom/random_string.lo: in function `random_string':
> >./src/librandom/random_string.c:26: warning: getrandom is not
> >implemented and will always fail
> [...]
> >ld returns zero, even it knows "xxx is not implemented and will always fail".
>
> That, on the other hand, is completely insane. Why the heck isn't it
> an error? It's *precisely* the kind of thing I want ld to error out on!
> [...]
>
> >I only observe such case on GNU Hurd/K*BSD actually. I'm thinking if
> >only these non-linux systems have such confused behaviour(why it links
> >successfully at all...)
>
>   That would be interesting to explore, yes.

I also think that this is a GNU libc thing. As far as I can tell, it
contains a 'stub' getrandom() that does nothing, returns -1, and sets
errno to ENOSYS, unless it is built for an OS that can provide a
'real' getrandom():

* 
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/getrandom.c;h=9fe6dda34d3f7fe99aaa66fc3a8ad4e540494c4a;hb=56c86f5dd516284558e106d04b92875d5b623b7a

In the upstream libc package, this seems to happen only for Linux,
where it is just a wrapper around the corresponding system call:

* 
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/getrandom.c;h=9d3d919c748d10c7debb6836575c3c44afa0d707;hb=56c86f5dd516284558e106d04b92875d5b623b7a

I would think that for 'pure' GNU (Hurd) and GNU/k*BSD there is no
'real' getrandom() implementation, so the libc gets the 'stub' one,
although the kernel of FreeBSD 12.0 has a getrandom() system call
that the libc could use.

* 
https://svnweb.freebsd.org/base/stable/12/sys/sys/syscall.h?revision=351565&view=markup

That "stub_warning()" macro turns out to be a GCC trick to display the
"is not implemented and will always fail" message: if an ELF object
file contains a .gnu.warning.<symbol> section, GNU ld knows what it
means, shows a warning, removes the section from the resulting file, and
returns 0, because an executable is still produced, although with an
implementation of getrandom() that is not useful for skalibs :P Here's
an example of the trick:

$ cat libtest.c
#include <stdio.h>
#include <errno.h>

int test_function() {
   fprintf (stderr, "test_function(): I told you! I don't work!\n");
   return errno = ENOSYS, -1;
}

//Magic, create a .gnu.warning.test_function ELF section
static char message[] __attribute__ ((used,
section(".gnu.warning.test_function\n\t#")))
   = "libtest: I provide a \"test_function\" symbol, but I don't
actually implement the function";

$ gcc -shared -Wl,-soname=libtest.so -o libtest.so -fPIC -DPIC libtest.c
$ objdump -h libtest.so | grep .gnu.warning
 23 .gnu.warning.test_function 0059  
  3060  2**5

$ cat test-program.c
#include <stdio.h>
#include <string.h>
#include <errno.h>

int test_function();

int main() {
   if (test_function()) fprintf (stderr, "test-program: ERROR: %s\n",
strerror(errno));
}

$ gcc -o test-program -Wl,-rpath=/home/guillermo -L. test-program.c -ltest
/usr/bin/ld: /tmp/ccu3uP4m.o: in function `main':
test-program.c:(.text+0xa): warning: libtest: I provide a
"test_function" symbol, but I don't actually implement the function

$ objdump -h test-program | grep .gnu.warning
$ ./test-program
test_function(): I told you! I don't work!
test-program: ERROR: Function not implemented

So, it seems that, for the purposes of detecting whether there is a
usable getrandom(), for skalibs:

* If GNU libc is used, a 'choose cl' test is unreliable.
* GNU/Hurd and GNU/k*BSD would be 'getrandom: no' OSes, at least until
the libc is updated to make a system call on GNU/kFreeBSD [1]
* A 'choose clr' test would be reliable.
* Any other alternative (passing an LDFLAGS that produces a build
failure, overriding the result of tests, or whatever) kind of implies
knowledge about what the outcome of the detection is going to be,
anyway, doesn't it? It's not really a detection.

[1] Is Debian the upstream of the libc port to GNU/kFreeBSD?
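For illustration, a compile-link-run check of the kind skarnet.org
configure scripts call 'choose clr' can be sketched in plain shell (my
own sketch, not the actual skalibs test):

```shell
# Sketch of a compile-link-RUN feature test for getrandom().  A GNU libc
# stub passes the compile and link phases; only actually running the
# program exposes the ENOSYS failure.
cat > conftest.c <<'EOF'
#include <sys/types.h>
#include <sys/random.h>
int main (void)
{
  char buf[16] ;
  return getrandom(buf, sizeof buf, 0) == (ssize_t)sizeof buf ? 0 : 1 ;
}
EOF
if cc -o conftest conftest.c 2>/dev/null && ./conftest 2>/dev/null ; then
  available=yes
else
  available=no
fi
echo "getrandom: $available"
rm -f conftest conftest.c
```

On a system where getrandom() works this prints 'getrandom: yes'; with a
stub (or no <sys/random.h> at all), the compile or run step fails and it
prints 'getrandom: no'.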

G.


Re: Porting skalibs to GNU Hurd

2019-10-17 Thread Guillermo
El jue., 17 oct. 2019 a las 16:51, Laurent Bercot escribió:
>
> Hurd should return -1 ENXIO,
> unless there is a real problem with the underlying file system - and the
> test your patch comments exists to catch such problems.

…or an error in the underlying operating system. POSIX open() seems
to be implemented directly by the Hurd:

* 
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/mach/hurd/open.c;h=e093b67c1e3aa99bc3caf485feaf8f226dc0;hb=56c86f5dd516284558e106d04b92875d5b623b7a
.
And the call seems to be failing with that special Hurd errno code,
EIEIO, which... is not a good thing :P

* http://www.gnu.org/software/hurd/faq/eieio.html

"This error code is used for a variety of "hopeless" error conditions.
Most probably you will encounter it when a translator crashes while
you were trying to use a file that it serves."

* 
https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/errno.texi;h=8cb4ce8b489dbbc6b3decef023cb3cafa471a32b;hb=56c86f5dd516284558e106d04b92875d5b623b7a

"@errno{EIEIO, 104, Computer bought the farm}
Go home and have a glass of warm, dairy-fresh milk.
[...]
@c Translators, please do not translate this litteraly, translate it into
@c an idiomatic funny way of saying that the computer died."

So, whoever calls s6_supervise_lock_mode() should correctly die, as
there is an actual error (that should be reported to the Hurd
developers); libs6, libskarnet and the libc are just the messengers.
Commenting out the errno check just sweeps the error under the carpet.

G.


Re: Porting skalibs to GNU Hurd

2019-10-17 Thread Guillermo
El jue., 17 oct. 2019 a las 17:02, Laurent Bercot escribió:
>
> >__gnu_hurd__
>
>   That's perfect. I replaced __GLIBC__ with this.

I think __GNU__ is still preferred over __gnu_hurd__, judging by the
"Preprocessor Define" and "GNU specific #define" sections of the
porting guidelines linked earlier in the thread:

* http://www.gnu.org/software/hurd/hurd/porting/guidelines.html

Although the intention appears to be that __GNU__ be defined if and
only if __gnu_hurd__ is.

* 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/gnu.h;h=1cc744b13be567ec1e7177acec59ccd6142960dd;hb=4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d

G.


Re: Porting skalibs to GNU Hurd

2019-10-17 Thread Guillermo
El jue., 17 oct. 2019 a las 4:25, Laurent Bercot escribió:
>
> Do you mean __GNUC__ instead of __GNU__ ?

No, I mean __GNU__ without the "C". __GNUC__ is a different predefined macro.

G.


Re: Porting skalibs to GNU Hurd

2019-10-16 Thread Guillermo
El mié., 16 oct. 2019 a las 14:02, Laurent Bercot escribió:
>
> >1. In src/include/skalibs/nonposix.h,
> >
> >#if defined(__linux__) || defined(__GLIBC__)
> >The if condition seems not working. Of course it's not __linux__, but I am
> >using glibc.
>
> Ah, yes, __GLIBC__ is only defined in features.h, you're right.

If compiler predefined macros like __linux__ or __NetBSD__ are OK for
<skalibs/nonposix.h>, how about #if defined(__linux__) ||
defined(__GNU__)? GCC predefines both __GNU__ and __gnu_hurd__ (I
think; can be checked with 'cpp -dM - </dev/null').

> >2. No PATH_MAX macro
> >
> What I can promise is to fix every issue that you might
> get with -DPATH_MAX=4096. And remove the dependencies on PATH_MAX that
> can be removed painlessly. How does that sound?

Do you mean doing something like:

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

or removing references to PATH_MAX altogether?

G.


Re: s6-log timestamp bug after resuming; Plan 9 $path and /package

2019-09-01 Thread Guillermo
El dom., 1 sept. 2019 a las 19:18, Laurent Bercot escribió:
>
> >Doesn't s6-log (indirectly) use clock_gettime() with a CLOCK_MONOTONIC
> >argument if skalibs was built with --enable-monotonic?
>
> Yes, it does. But --enable-monotonic is a bad idea for long-lived
> processes that need timestamping. I need better clock interfaces
> in skalibs, the current ones don't allow a run-time selection
> between clocks. Hopefully I can write those interfaces for 2.9.0.0.

OK. Going back to Casper's problem, I remember from old posts that he
is using Void, Void packages a libskarnet built with
--enable-monotonic, and the man page says that CLOCK_MONOTONIC does
not count time that the system is suspended, so could that be the
reason of s6-log's incorrect timestamps after suspension or
hibernation?

* 
https://github.com/void-linux/void-packages/blob/a35ae159d262f867de4b1c12250baae947b6e512/srcpkgs/skalibs/template

On the other hand, as far as I can tell, GNU Coreutils' date(1) just
calls clock_gettime() with a CLOCK_REALTIME argument, so if that works
correctly after suspension or hibernation, in the absence of better
clock interfaces, maybe linking s6-log to a libskarnet build without
--enable-monotonic would work?

G.


Re: s6-log timestamp bug after resuming; Plan 9 $path and /package

2019-09-01 Thread Guillermo
El dom., 1 sept. 2019 a las 5:41, Laurent Bercot escribió:
>
> - So the only real solution is a mechanism that fixes
> CLOCK_REALTIME (the portable wall clock that is appropriate for
> s6-log) after a resume.

Doesn't s6-log (indirectly) use clock_gettime() with a CLOCK_MONOTONIC
argument if skalibs was built with --enable-monotonic?

G.


Ctrl + Alt + Del setup in s6-linux-init?

2019-07-31 Thread Guillermo
Hello,

s6-linux-init-maker creates an .s6-svscan/SIGINT script that reboots
the computer, and Adélie Linux' packaging of s6-linux-init-1.0.2.0
appears to contain an /etc/sysctl.d/ctrlaltdel.conf file that says
"kernel.ctrl-alt-del = 0". So one might think that having the kernel
send a SIGINT signal to process 1 when Ctrl + Alt + Del is pressed is
the expected default setup. Given that this is a Linux-specific
package, and that the stage1 init is now a C program, was there a
reason for not setting this up directly with a 'reboot(RB_DISABLE_CAD)'
call, like sysvinit, runit and others do?

Thanks,
G.


s6-linux-init-shutdownd: copy & paste error?

2019-07-17 Thread Guillermo
Hello,

s6-linux-init-shutdownd's code contains something like this in
function run_stage3():

if (WIFSIGNALED(wstat))
//...
else if (WEXITSTATUS(wstat))
//…
else if (WEXITSTATUS(wstat))
//…

* 
https://git.skarnet.org/cgi-bin/cgit.cgi/s6-linux-init/plain/src/shutdown/s6-linux-init-shutdownd.c?h=v1.0.2.0

The same condition is tested twice, and the second 'if' branch prints
a wrong warning, so I suppose only the first and last branches are the
ones meant to be there, and the second one is a mistake.

G.


Re: s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

2019-07-07 Thread Guillermo
El dom., 2 jun. 2019 a las 5:53, Laurent Bercot escribió:
>
> So I'd advise reporting the bug to the glibc maintainers.

By the way, Gentoo's toolchain maintainers agreed, so this became
upstream bug #24696. A fix, currently in its 8th iteration after
several rounds of reviews, is in preparation, so it will eventually be
committed to the repository. Hopefully :)

https://sourceware.org/ml/libc-alpha/2019-06/msg00957.html

Meanwhile, I also discovered why I didn't have any problems with
Gentoo's packaging of version 2.27, and did with version 2.29. It
turns out Gentoo did supply an nsswitch.conf file, but it was buried
inside the archive that contained Gentoo's patchset, so one had to
extract all files to discover it. This file said:

group: compat files

So there was no 'db' service configured for the group database, and
the endgrent() bug, which Gentoo maintainers traced back to at least
GNU libc 2.26, was not exposed. As of version 2.28, this file was no
longer supplied with the patchset, and upstream's example one was
installed instead.

G.


Re: s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

2019-06-10 Thread Guillermo
El lun., 10 jun. 2019 a las 4:13, Casper Ti. Vector escribió:
>
> > /etc/nsswitch.conf, which I don't recall having ever modified, says:
> >   group: db files
>
> Try using `qfile -o' to find the owner, and subsequently how it should
> originally have been?  (I used Gentoo for several years before migrating
> to Alpine/Void two or three years ago, which is why I still lurk on its
> forums.)

/etc/nsswitch.conf is 'owned' by sys-libs/glibc, and Gentoo's default
comes directly from the libc's source package:

* 
https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-libs/glibc/glibc-2.29-r2.ebuild#n1281
* 
https://sourceware.org/git/?p=glibc.git;a=blob;f=nss/nsswitch.conf;h=39ca88bf5198df2bfa8f4a2e4bf631f3baee16c0;hb=56c86f5dd516284558e106d04b92875d5b623b7a

> > I have no idea what changed, why this used to work before my upgrade
> > of the libc, or why it apparently never failed for anyone else not on
> > Gentoo.
>
> You are correct: the issue can be reproduced on my void/glibc system if
> `db' is added (whether prepended or appended) to the `group:' line in
> /etc/nsswitch.conf.  (The /etc/nsswitch.conf is the distro-default for
> glibc/x86_64 systems, unchanged on my system.)

This is interesting. It hints at the problem really being in the
upstream package. And you said that you added the 'db' service, so I
take it that it wasn't there by default. Is this Void's current
default /etc/nsswitch.conf?

* 
https://github.com/void-linux/void-packages/blob/6c9706db3f2034677057ab1e70ce59fd06134ea3/srcpkgs/base-files/files/nsswitch.conf

If yes, it means that the 'db' service isn't configured for any
database at all, and would explain Void's 'immunity' to this problem.

G.


Re: s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

2019-06-09 Thread Guillermo
Well, this one required GDB to tell what was going on, but I managed
to find a workaround.

> Short version: For recent libc releases, and at least on Gentoo,
> getgrent() and endgrent() seem to magically set errno to EINVAL (I
> think),
> [...]
> s6-envuidgid [...] fails with a strange "invalid argument" error
> whenever it tries to set GIDLIST. s6-setuidgid invokes s6-envuidgid,
> so it also fails.

For those who don't know, in GNU libc, getgrent(3) and endgrent(3) are
implemented by __nss_getent_r() and __nss_endent(), respectively,
which are part of the Name Service Switch (NSS) mechanism. You know,
the one that Laurent's nsss project is a replacement of. My
/etc/nsswitch.conf, which I don't recall having ever modified, says:

group: db files

With this configuration, the first time __nss_getent_r() is called, it
tries to call the implementation of setgrent(3) from each of these
services, "db", implemented by module /lib64/libnss_db.so.2, and
"files", implemented by module /lib64/libnss_files.so.2. The first one
is _nss_db_setgrent(), which tries to open /var/db/group.db, fails
because on my machine that file does not exist, and returns an
'unavailable' status (NSS_STATUS_UNAVAIL). The second one is
_nss_files_setgrent(), which tries to open /etc/group, succeeds, and
returns a 'successful' status (NSS_STATUS_SUCCESS). From then on,
__nss_getent_r() always calls the implementation of getgrent() from
libnss_files.so, named _nss_files_getgrent_r(). Relevant output of an
strace of the test program in my OP:

openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libnss_db.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/var/db/group.db", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(No such file or directory)
openat(AT_FDCWD, "/etc/group", O_RDONLY|O_CLOEXEC) = 3

But it turns out that, with this configuration, __nss_endent() *also*
wants to call the implementation of endgrent(3) from each of these
services. And the one from libnss_db.so, named _nss_db_endgrent(), is
just a wrapper around a munmap(2) system call, via an intermediate
internal_endent() function:

* 
https://sourceware.org/git/?p=glibc.git;a=blob;f=nss/nss_db/db-open.c;h=8a83d6b9302b39a071d0ddca5ab686e6ecfd6178;hb=56c86f5dd516284558e106d04b92875d5b623b7a

In my case, this results in a 'munmap(NULL, 0)' call that… you
guessed, fails with EINVAL (remember that _nss_db_setgrent() said
the service was unavailable?). And strace happens to see it:

write(1, "End of file or error (errno = Su"..., 39) = 39
munmap(NULL, 0) = -1 EINVAL (Invalid argument)
close(3)= 0
write(1, "errno = Invalid argument\n", 25) = 25

The implementation from libnss_files.so, _nss_files_endgrent(), is
also called, and succeeds, but errno is already set. So, the
workaround? I removed the "db" service for the group database in
/etc/nsswitch.conf:

group: files

With this change, the output of the test program looks exactly like
Casper's and Brett's, strace no longer shows openat() calls for
/lib64/libnss_db.so.2 and /var/db/group.db, and both s6-setuidgid and
s6-envuidgid work again.

I have no idea what changed, why this used to work before my upgrade
of the libc, or why it apparently never failed for anyone else not on
Gentoo.

G.


Re: s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

2019-06-02 Thread Guillermo
El dom., 2 jun. 2019 a las 12:21, Brett Neumeier escribió:
>
> FWIW, I compiled and ran your test program; the program output concludes with:
>
> name: ec2user members: (errno = Invalid argument)
> End of file or error (errno = Success)
> errno = Success

Huh. So it looks like I've got a combination of upstream and
Gentoo-specific behaviour. Lucky me :(

> With glibc 2.29 and branch updates through 2019-05-03,

In the chance that there might have been a regression that has been
fixed at least for endgrent(3) in some of those commits, I'm going to
have a look then, and also check what Void does. If I find nothing,
then next stop for me is Gentoo's bug tracker, I guess.

Many thanks to you and Casper for the testing.
G.


Re: s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

2019-06-02 Thread Guillermo
El dom., 2 jun. 2019 a las 3:27, Casper Ti. Vector escribió:
>
> On my machine using Void with glibc 2.29 since 20190305

Yay! I thought chances of hearing from someone who uses a GNU
libc-based distribution that is not Gentoo, with a sufficiently recent
version (which usually means it's rolling release), and who is also a
subscriber of this list, were rather slim :)

>, I never encountered this issue.

Do you happen to build skarnet.org packages statically linked to musl
on those Void machines, or do you let them link to the distribution's
libc?

> I can confirm the behaviour you described of
> getgrent(3),

Good. Maybe it's upstream then…

> Therefore, at least on my system, endgrent(3) is always called
> with `errno' set to zero.

Yeah, what triggers s6-envuidgid's failure here is that endgrent() is
setting errno to the weird EINVAL value, the program checks errno
*after* the call, and thinks it was caused by a failing getgrent()
call. Would it be too much to ask you if you could also check if
endgrent(3) flips errno from 0 to EINVAL?

Thanks!
G.


s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

2019-06-01 Thread Guillermo
Hello,

I hereby present to you a weird error report.

Short version: For recent libc releases, and at least on Gentoo,
getgrent() and endgrent() seem to magically set errno to EINVAL (I
think), except when errno's value is actually meaningful. That is,
when getgrent() returns a null pointer. I don't know if this behaviour
is in the upstream package or in Gentoo's patched one. s6-envuidgid
makes a getgrent() call but checks errno only after it has called
endgrent(), so it fails with a strange "invalid argument" error
whenever it tries to set GIDLIST. s6-setuidgid invokes s6-envuidgid,
so it also fails.

Longer version:

After performing the usual, periodic package upgrades on my Gentoo
machine, which included an upgrade of GNU libc from 2.27 to 2.29, I
was surprised to find that s6-setuidgid suddenly and consistently
failed with a mysterious error:

# s6-setuidgid daemon printf Nope\\n
s6-envuidgid: fatal: unable to get supplementary groups for daemon:
Invalid argument

Nothing like this happened before the upgrade. I checked
s6-envuidgid's code, ran some tests, used strace without getting any
meaningful results, and finally came to the conclusion I mentioned
before after compiling and running the following test program:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <grp.h>

int main() {
   struct group *gr;
   int saved_errno;
   for (;;) {
  errno = 0;
  gr = getgrent();
  saved_errno = errno;
  if (gr) {
 printf("name: %s members: ", gr->gr_name);
  }
  else {
 printf ("End of file or error ");
 break;
  }
  for (char **p = gr->gr_mem; *p; p++) printf ("%s ", *p);
  printf("(errno = %s)\n", strerror(saved_errno));
   }
   printf("(errno = %s)\n", strerror(saved_errno));
   endgrent();
   printf("errno = %s\n", strerror(errno));
}

Results:

name: root members: root (errno = Invalid argument)
name: bin members: root bin daemon (errno = Invalid argument)
name: daemon members: root bin daemon (errno = Invalid argument)
...
name: messagebus members: (errno = Invalid argument)
name: nullmail members: (errno = Invalid argument)
End of file or error (errno = Success)
errno = Invalid argument

strace says the program loads libnss_files.so.2 (provided by the libc)
and reads /etc/group, but doesn't make that many system calls.

Sooo... thoughts? Does anyone else use a sufficiently recent version
of GNU libc and experience the same?

G.


s6-rc-compile does not copy all longrun definition files

2019-02-09 Thread Guillermo
Hello,

Despite s6-rc's upgrade notes claiming that version 0.4.1.0 supports
files 'max-death-tally' and 'down-signal' in longrun definitions
(introduced in s6 versions 2.7.1.0 and 2.7.2.0, respectively),
s6-rc-compile does not actually 'compile' them to the service
database:

$ ls -1 srv/test
down-signal
finish
max-death-tally
nosetsid
run
timeout-finish
timeout-kill
type

$ s6-rc-compile db srv

$ ls -1 db/servicedirs/test
finish
nosetsid
run
timeout-finish
timeout-kill

The commit that introduces the support only modifies library libs6rc,
and some quick greps suggest that the code changes affect
s6-rc-update, but not s6-rc-compile.

* 
https://git.skarnet.org/cgi-bin/cgit.cgi/s6-rc/commit/?id=1626bf78dd47a42ce37e984025a434a666fc5bbf

G.


Re: Question about enable-absolute-paths option

2018-09-09 Thread Guillermo
El mar., 4 sept. 2018 a las 2:07, Shengjing Zhu escribió:
>
> And I'm mostly agree that binaries like cd/umask/wait are not POSIX
> compatible[1]
>
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906250#22

This bug is closed, and there is now an execline package in Debian
Sid, so it seems that some resolution has been settled on. However,
I'd still like to understand what's the interpretation of the
'exec-ability' requirement raised in message #22, and its connection
with the placement of execline binaries. The wording used in the
contained links to POSIX confuses me. If someone feels like
explaining, to be concrete, what should a compliant 'execution
environment' do when presented a C program that contains an
execlp("cd", "cd", "blahblah", (char *)0) or execlp("umask", "umask",
"022", (char *)0) call?

1) Make the function call behave as the cd or umask utilities
specified by the standard, with no regard to the value of PATH. In
other words, never execute an execline binary.
2) Guarantee the existence of some value of PATH that makes the
function call behave as the cd or umask utilities specified by the
standard. In other words, may or may not execute an execline binary,
but it must be possible to set PATH in a way that makes the call
behave as 'standard cd' or 'standard umask' -which the message admits
cannot happen in current Debian anyway-.
3) None of the above.

(Message #37 is mostly about Debian policy, so no comment on that)

Thanks,
G.


Re: Question about enable-absolute-paths option

2018-09-02 Thread Guillermo
El dom., 2 sept. 2018 a las 13:48, Shengjing Zhu escribió:
>
> From your previous comment, s6 not only uses execline library, but
> also its binaries. Can I get a list of binaries that s6 will use? So I
> can only package these.

You don't need to do that. You *can* install the full set of execline
binaries in /usr/lib/execline using --enable-absolute-paths, (and you
should add --shebangdir=\$${prefix}/lib/execline too), just like in
your OP. Both s6 and s6-rc should work fine with it.

I think currently only fdmove, fdclose and execlineb are called by s6
binaries, but you should't rely on a list of execline binaries that s6
uses. This list can change from release to release as the author
pleases.

> As a result, users won't get execline from Debian, but just s6 […]

Your setup sort of did this already. You can't easily write or use
execline scripts with it other than those generated automatically by
skarnet.org packages like s6-rc. It is better than packaging half or
less of execline, or not packaging it at all, I suppose, but
relegating execline to being some kind of s6 backend, and not being
able to directly use it without some PATH management contortions
greatly diminishes the usefulness of such a Debian package, I think.

G.


Re: [announce] skarnet.org Summer 2018 release

2018-08-21 Thread Guillermo
2018-08-21 2:29 GMT-03:00 Laurent Bercot:
>>
>> Say this is the compiler's 'standard' headers directory (i.e. normally
>> /usr/include). Then, it is likely also the location of the files of
>> the same name supplied by the libc, and in that case, 'make install'
>> would overwrite them. Or, if the directory is handled by a package
>> manager and files are 'owned' by packages, this would result in file
>> collisions or similar.
>
>  That's normal and expected.
>  Those files provide libc functionality, so they're meant to replace
> libc files if installed into /usr/include.

Oh. That's kind of drastic. I re-read the documentation of nsss and
utmps, in case I missed it the first time, and still think none of it
suggests that this is the intended behaviour. The system consistency
argument is a good one, though, and getting these packages to install
in parallel with libc headers is doable at packaging time, so OK I
guess. But I think that more explicitly warning that 'make install'
will likely overwrite libc files with the 'configure' script's
defaults would be better.

>  The exact same pattern happens with utmps, which replaces the
> libc's <utmpx.h> header, and you didn't seem to have a problem
> with it at the time. :)

That's because I didn't notice :P I wanted to see what the effect of
--enable-nsss was on packages of the s6 stack, found surprising that
it didn't affect the C code, not even #include directives, looked at
the 'configure' script, found that it didn't add -I options either,
wondered how could that possibly work, and finally realized where the
nsss headers were installed by default.

Thanks,
G.


Re: [announce] skarnet.org Summer 2018 release

2018-08-20 Thread Guillermo
Hello,

2018-08-14 19:20 GMT-03:00 Laurent Bercot:
>
>  * s6-linux-utils-2.5.0.0
>--
>
>  […]
>  - s6-devd, s6-uevent-listener and s6-uevent-spawner have been removed.
> They have been obsoleted by mdevd.

So I guess it's time to get rid of the examples that use them in the
s6-rc package (examples/source/mdevd{,-listener})? And to modify
examples/source/init-coldplug if this means getting rid of BusyBox
mdev?

>  * s6-rc-0.4.1.0
>-

...and now that I looked again at the s6-rc examples, I noticed that
longruns that are members of pipelines haven't been adapted to the
0.4.x.x format: pipeline-name files are still in the definition
directories of the producers.

>  * nsss-0.0.1.0
>
>
>  This is a new package: an alternative implementation of the "name
> service switch" mechanism, i.e. a way for getpwnam() and friends to
> use another backend than the traditional /etc/passwd and similar
> files, without the problems of NSS such as using dynamically loaded
> modules.

The actions of its build system look problematic. The package contains
replacements for pwd.h, grp.h and shadow.h, these are necessary to
define getpwnam(), getgrnam(), getspnam(), etc. as macros that expand
to the name of the nsss_*() function chosen by the NSSS_DISABLE_*
settings, and the makefile installs them directly in the specified
includedir (./configure --includedir=DIR), correct?

Say this is the compiler's 'standard' headers directory (i.e. normally
/usr/include). Then, it is likely also the location of the files of
the same name supplied by the libc, and in that case, 'make install'
would overwrite them. Or, if the directory is handled by a package
manager and files are 'owned' by packages, this would result in file
collisions or similar.

So alternatively, one could specify a different includedir. But then,
to build applications so that they link to libnsss, one would have to
always supply an -I (capital i) option to the compiler, naming this
directory. Both for 'nsss-unaware' applications (i.e. applications
that simply include <pwd.h>, <grp.h> or <shadow.h> with no regard to
implementation) and for applications that deliberately want libnsss
(e.g. that explicitly include, say, <nsss/pwd.h>). In the
latter case, because none of the src/include/nsss/*.h files would be
in the 'nsss' subdirectory of the 'standard' headers directory either.

Can't src/include/pwd.h get merged with src/include/nsss/pwd.h in a
single file, adjusting #include directives where (if?) needed, and
installed in $includedir/nsss? And the same with grp.h and shadow.h?
This way, 'nsss-aware' applications don't need compiler -I options if
$includedir is the 'standard' headers directory, and there would be no
file overwriting or collisions. Only nsss-unaware applications would
need an -I$includedir/nsss option, but they also already need at least an
-lnsss option, i.e. they already need administrator intervention to
link them to libnsss anyway. And to avoid having to add --with-include
when --enable-nsss is used, I was thinking that './configure
--enable-nsss' and './configure --enable-nsss=yes' could add '-I
$includedir/nsss' to addincpath, and './configure --enable-nsss=DIR'
could add '-I DIR' to addincpath, unless combined with
--enable-slashpackage, which recent commits to the packages' Git
repositories seem to have taken care of.

Thanks,
G.


Re: [announce] utmps-0.0.1.0

2018-07-07 Thread Guillermo
Hello,

2018-06-07 9:47 GMT-03:00 Laurent Bercot:
>
>  Hello,
>  It's been there for some time but there's now a release number.
>  utmps-0.0.1.0 is out.
> […]
>  It still needs a lot of testing.

So, I noticed the package includes example OpenRC service scripts
(wow!) for utmps-utmpd and utmps-wtmpd. Probably not the kind of
testing you hoped for, but for the sake of correctness: they won't
work :-P The arguments supplied to the program named in the assignment
to 'command' must go in an assignment to variable 'command_args', and
in these particular examples, the end of options marker ("--") before
s6-ipcserver's arguments interacts badly with OpenRC's mechanism for
invoking start-stop-daemon: arguments stored in variable
'start_stop_daemon_args', including -b, end up getting passed to
s6-ipcserver instead, and the script hangs waiting for this program to
exit.

Also, as an improvement, the script doesn't need to assume directory
/run/utmps exists: it can check that using OpenRC's 'checkpath'
command, and create it with the correct owner and mode if it does not
exist, just like service utmps-prepare does in the s6-rc examples (is
its definition directory wrongly named 'sutmp-prepare'?).

Patch at the end of the message with those changes, plus some
additional ones for better readability. Patched scripts tested as
working on Gentoo:

# rc-service utmpd start
 * /run/utmps: creating directory
 * /run/utmps: correcting owner
 * Starting utmpd ... [ ok ]

# rc-service wtmpd start
 * /run/utmps: creating directory
 * Starting wtmpd ... [ ok ]

$ rc-status
...
Dynamic Runlevel: manual
 utmpd  [  started  ]
 wtmpd  [  started  ]

$ ps -eo pid,ppid,euser,egroup,args
  PID  PPID EUSEREGROUP   COMMAND
…
 3588 1 utmp utmp s6-ipcserverd -- utmps-utmpd
 3620 1 utmp utmp s6-ipcserverd -- utmps-wtmpd

$ ls -ld /run/utmps
drwxr-xr-x 2 utmp utmp 120 Jul  5 23:14 /run/utmps

$ ls -l /run/utmps
total 8
-rw-r--r-- 1 root root 5 Jul  6 23:14 utmpd.pid
srwxrwxrwx 1 utmp utmp 0 Jul  6 23:14 utmpd-socket
-rw-r--r-- 1 root root 5 Jul  6 23:14 wtmpd.pid
srwxrwxrwx 1 utmp utmp 0 Jul  6 23:14 wtmpd-socket

G.

--- original/utmps-0.0.1.2/examples/openrc/utmpd 2017-11-20
11:42:03.0 -0300
+++ patched/utmps-0.0.1.2/examples/openrc/utmpd  2018-07-05
23:26:06.490884765 -0300
@@ -1,8 +1,13 @@
 #!/sbin/openrc-run

-# Assumes the /run/utmps directory already exists and belongs to utmp
-
 name="utmpd"
-command="s6-ipcserver -- /run/utmps/utmpd-socket utmps-utmpd"
+command="s6-ipcserver"
+command_args="/run/utmps/utmpd-socket utmps-utmpd"
+command_background=yes
+command_user=utmp
 pidfile="/run/utmps/utmpd.pid"
-start_stop_daemon_args="-b -m -c utmp -d /run/utmps"
+start_stop_daemon_args="-d /run/utmps"
+
+start_pre() {
+   checkpath -D -d -o utmp:utmp -m 0755 /run/utmps
+}
--- original/utmps-0.0.1.2/examples/openrc/wtmpd 2017-11-20
11:42:03.0 -0300
+++ patched/utmps-0.0.1.2/examples/openrc/wtmpd  2018-07-05
23:26:29.349884765 -0300
@@ -1,8 +1,13 @@
 #!/sbin/openrc-run

-# Assumes the /run/utmps directory already exists and belongs to utmp
-
 name="wtmpd"
-command="s6-ipcserver -- /run/utmps/wtmpd-socket utmps-wtmpd"
+command="s6-ipcserver"
+command_args="/run/utmps/wtmpd-socket utmps-wtmpd"
+command_background=yes
+command_user=utmp
 pidfile="/run/utmps/wtmpd.pid"
-start_stop_daemon_args="-b -m -c utmp -d /run/utmps"
+start_stop_daemon_args="-d /run/utmps"
+
+start_pre() {
+   checkpath -D -d -o utmp:utmp -m 0755 /run/utmps
+}


Small documentation error in s6-rc

2018-04-02 Thread Guillermo
Hello,

The documentation of the s6-rc-compile program has a small mistake in
the description of its -b option. It says:

"make s6-rc-oneshot-runner invocations wait instead of fail on lock contention."

It mentions s6-rc-oneshot-runner, the supporting service autogenerated
by s6-rc-compile, instead of s6-rc-oneshot-run, the internal program,
and as a consequence, the hyperlink returns an HTTP 404 error.

G.


Re: [announce] execline-2.5.0.0

2018-04-02 Thread Guillermo
2018-04-02 7:39 GMT-03:00 Laurent Bercot:
>
>  User reports have come in by the hundreds and they are almost
> unanimous (sorry, Colin): they don't like the 2.4.0.0 change,
> pretending it hurts readability (as if), and writability too,
> of execline scripts. (What? People were actually writing execline
> scripts? Why haven't I heard of them before yesterday?)
>  They want a revert to the old syntax.
>
>  Users. They never know what they want.

My reaction:

1) "Oh, an announcement!" (timezone magic made this happen on Saturday for me)
2) "Wait, what? Whaaat?!"
3) All of this chaotically over a short period of time:
  * "How is something like this execline-2.4.0.0 and not execline-3.0.0.0?"
  * "Wait, is s6-linux-init still going to work? Did I miss a new
s6-linux-init release announcement?" (I don't know why my brains
focused on s6-linux-init instead of the major breakage of s6 and s6-rc
that not retaining the old names somehow would have produced)
  * "Wait, did he rename the C source files too? Like
src/execline/=.c, src/execline/;.c, etc.?"
  * "Wait, execline commands exist as executable files in the
filesystem, are the files going to actually have those names? Like
'test' and '['? That new makefile is going to be quite interesting..."
  * "Wait, are programs still going to be callable by their old names?"
- "How? Compatibility symlinks? Didn't he dislike multiple
personality binaries? Is execlineb going to implement the conversion
as part of its parsing?" (the latter could actually work?)
- "Does every execline script need to be rewritten now? How many
of those are out there already?"
  * "Hmmm, using execline commands from a shell is going to be hell
now with all that character escaping."
  * "Well, on the other hand, maybe no more ImageMagick-like name collisions..."
  * "Let's see how many programs kept their names. Huh? ímport is still here?"
4) "I definitely have to take a closer look now."
5) "Oh."

G.


Re: [announce] mdevd-0.0.1.0 - a mdev-compatible uevent manager

2018-01-15 Thread Guillermo
2018-01-15 8:39 GMT-03:00 Laurent Bercot:
>
>  The change is obviously incompatible with mdevd-0.0.1.0. It appears to
> work for me, but please test it - especially Olivier who was missing
> events with the old coldplug. If no problems are found, I'll cut the
> 0.1.0.0 release.

Worked for me too. All kernel modules loaded, same as when using eudev.

G.


Re: [announce] mdevd-0.0.1.0 - a mdev-compatible uevent manager

2018-01-14 Thread Guillermo
2018-01-13 12:40 GMT-03:00 Guillermo:
>
> 2018-01-12 23:10 GMT-03:00 Laurent Bercot:
>>
>>  Anyway, I will change to what udev does; but this is annoying because
>> it requires a rearchitecture, so it will take some time.
>>
>>  With the mdev -s model, it was possible to just read from sysfs,
>> synthesize events, and send the synthetic events to a dedicated mdevd
>> instance. With the udevadm trigger model, this is not what happens:
>> instead of reading from sysfs and synthesizing events, the coldplugger
>> actually pokes the kernel, which creates real events; and the netlink
>> listener must be up in order to receive and process them.
>
> I'm not sure you'd need to modify this part. The uevent files can
> still be read, systemd does try to read IFINDEX, MAJOR and MINOR from
> them during enumerator_scan_dir_and_add_devices(), and from what I've
> seen, the information you can read from them is pretty much the same
> as what you get from events triggered by writing to them. I'll tell
> you what, I'm going to try doing both things (using eudev's 'udevadm
> monitor' to check the netlink part) and see if I get the same results.

OK, I did this, I compared what I got by reading every
/sys/class/*/*/uevent and /sys/bus/*/devices/*/uevent file (like
mdevd-coldplug does for /sys/dev/*/*/uevent) with what I got by
writing 'add' to them (like udevadm trigger does) and then using
'udevadm monitor --kernel --property' to watch the netlink events, and
there's no difference.

udevadm monitor showed:

* ACTION=add, as expected. mdevd-coldplug adds it for the synthetic event.
* A DEVPATH starting with /device that corresponds to the /sys/devices
subdirectory. The DEVPATH added by mdevd-coldplug for the synthetic
event starts with /dev and corresponds to the /sys/dev symlink.
* A SEQNUM line that I suppose can be ignored.
* A SUBSYSTEM line. mdevd-coldplug adds it for the synthetic event
using readlinkat() with the 'subsystem' symlink found in the same
directory as the uevent file, and retrieving the last pathname
component. This works with /sys/{class,bus} too.

And everything else was the same as the contents of the uevent file.
So reading /sys/class/*/*/uevent and /sys/bus/*/devices/*/uevent (or
/sys/subsystem/*/devices/*/uevent instead, if it exists), synthesizing
events and sending them to mdevd's standard input would work, it
seems.

G.


Re: [announce] mdevd-0.0.1.0 - a mdev-compatible uevent manager

2018-01-13 Thread Guillermo
2018-01-12 23:10 GMT-03:00 Laurent Bercot:
>
>  There are modules for which a /sys/class entry, and an appropriate
> major/minor pair for mdevd to create the device node, only appear when
> the module is loaded, and there's no prior uevent file to trigger
> loading the module: the one I've tested it on is ppp_generic.
> /sys/class/ppp/ppp/uevent only appears once you modprobe ppp_generic, at
> the same time as /sys/dev/char/108:0. So even scanning /sys/class
> does not guarantee you'll get events for all the modules you need.
> It probably has to do with the fact that ppp_generic isn't tied to
> any hardware in particular; whereas in Olivier's instance, I suspect
> the hardware was already visible somewhere under /sys/bus.

I'm leaning towards the same explanation. I don't think the device
manager was ever expected to load *all* needed modules, only the
hardware related ones. After all, systemd also has modules-load.d to
explicitly request loading them:

* https://www.freedesktop.org/software/systemd/man/modules-load.d.html

>  The only explanation I see is that the canonical way of triggering
> uevents changed after landley stopped being involved in sysfs and mdev.
> I think this is likely, since his doc mentions the obsolete (but still
> present) /sys/block hierarchy.

As a side note, /sys/block is duplicated in /sys/class/block, and
caught by the /sys/class/*/* scan. On my computer, disks and disk
partitions, for example, are only found there. I also wondered why
'udevadm trigger' didn't access /sys/block.

>  Anyway, I will change to what udev does; but this is annoying because
> it requires a rearchitecture, so it will take some time.
>
>  With the mdev -s model, it was possible to just read from sysfs,
> synthesize events, and send the synthetic events to a dedicated mdevd
> instance. With the udevadm trigger model, this is not what happens:
> instead of reading from sysfs and synthesizing events, the coldplugger
> actually pokes the kernel, which creates real events; and the netlink
> listener must be up in order to receive and process them.

I'm not sure you'd need to modify this part. The uevent files can
still be read, systemd does try to read IFINDEX, MAJOR and MINOR from
them during enumerator_scan_dir_and_add_devices(), and from what I've
seen, the information you can read from them is pretty much the same
as what you get from events triggered by writing to them. I'll tell
you what, I'm going to try doing both things (using eudev's 'udevadm
monitor' to check the netlink part) and see if I get the same results.

G.


Re: [announce] mdevd-0.0.1.0 - a mdev-compatible uevent manager

2018-01-11 Thread Guillermo
2018-01-09 21:35 GMT-03:00 Laurent Bercot:
>
>  What it looks like is the kernel not assigning a major and a minor
> to the device before you manually trigger the "add". I suspect it's
> just that the relevant module is not loaded, but why would that be
> different from any other hardware managing to make mdevd autoload the
> appropriate module?
> [...]
>  Well I certainly don't want people to need to write their own
> trigger-uevents service, so I'll make sure your hardware is properly
> found by mdevd-coldplug. I'd just like to understand what is happening
> and add the proper fix rather than take a big hammer and scan all of
> /sys/devices just to be sure.

It might be worth to look at what systemd [1] and eudev [2] do here.
When 'udevadm trigger --type=devices --action=add' is performed after
the udev daemon starts executing, it looks like what happens is
basically this:

* If /sys/subsystem exists, an 'add' string is written to every
/sys/subsystem/*/devices/*/uevent file.
* If /sys/subsystem does not exist, an 'add' string is written to
every /sys/class/*/*/uevent and /sys/bus/*/devices/*/uevent file.

I have a 4.9-series kernel and no /sys/subsystem, but I do have
/sys/class/*/* and /sys/bus/*/devices/* files indeed, and they are in
all cases symbolic links to subdirectories of /sys/devices with
'uevent' files. So it kind of looks like a big hammer is used, but
with an easy-to-generate set of pathnames instead of actually
traversing the /sys/devices hierarchy. Maybe this can be taken to be the
currently agreed sysfs interface for device managers (or else systemd
breaks).

On the other hand it looks like mdevd-coldplug and BusyBox' 'mdev -s'
want to access the /sys/dev/block/${major}:${minor} and
/sys/dev/char/${major}:${minor} symlinks. If it is the case that some
of these won't exist until the relevant kernel modules are loaded, and
if the device manager is expected to load them, it would mean it can't
reliably use /sys/dev.

For testing purposes, with mdevd-netlink running, either this:

#!/bin/execlineb -P
elglob syssubsystem /sys/subsystem/*/devices/*/uevent
forx f { $syssubsystem }
importas -u ueventfile f
redirfd -w 1 $ueventfile
echo add

or this:

#!/bin/execlineb -P
elglob sysclass /sys/class/*/*/uevent
elglob sysbus /sys/bus/*/devices/*/uevent
forx f { $sysclass $sysbus }
importas -u ueventfile f
redirfd -w 1 $ueventfile
echo add

could be used to see what happens, depending on the case. I don't have
mdevd installed with a suitable /etc/mdev.conf file at the moment, but
I tried executing the script while eudev's udevd was running (as a
replacement for udevadm trigger), and I did get all kernel modules to
load, and did see new /sys/dev/*/* symlinks after that.

G.

[1]
* https://github.com/systemd/systemd/blob/master/src/udev/udevadm-trigger.c
adm_trigger(), exec_list()
* https://github.com/systemd/systemd/blob/master/src/libudev/libudev-enumerate.c
udev_enumerate_scan_devices()
* 
https://github.com/systemd/systemd/blob/master/src/libsystemd/sd-device/device-enumerator.c
device_enumerator_scan_devices(), enumerator_scan_devices_all(),
enumerator_scan_dir(), enumerator_scan_dir_and_add_devices()
* 
https://github.com/systemd/systemd/blob/master/src/libsystemd/sd-device/sd-device.c
sd_device_new_from_syspath()

[2]
* https://github.com/gentoo/eudev/blob/master/src/udev/udevadm-trigger.c
adm_trigger(), exec_list()
* https://github.com/gentoo/eudev/blob/master/src/libudev/libudev-enumerate.c
udev_enumerate_scan_devices(), scan_devices_all(), scan_dir(),
scan_dir_and_add_devices(), syspath_add()


Re: s6-svscan - controlling terminal semantics and stdin use

2018-01-02 Thread Guillermo
2018-01-02 0:53 GMT-03:00 Earl Chew:
>
> My first observation is that when killing the cron initiated s6-svscan
> process with SIGINT or SIGTERM, I see the behaviour described in the
> documentation. The child s6-supervise processes are correctly
> terminated, and there are no orphans.
>
> If instead, I start s6-svscan at the terminal and terminate it with ^C
> (SIGINT), what I observe is that the child s6-supervise processes
> terminate abruptly and their child service processes are orphaned.
> [...]
>
> Has this scenario (ie starting s6-svscan from an interactive terminal)
> been considered previously?
>
> My expectation was that the two scenarios would exhibit similar behaviours.

Do you really need s6-svscan to run in the foreground, or do you just
want to gracefully tear down the supervision tree in this case? What I
do if I want to use s6-svscan from an interactive shell is:

$ s6-svscan /path/to/scandir &
(shell returns a job control job ID)
$ (Do whatever I need to do with the supervision tree)
$ s6-svscanctl -t /path/to/scandir

With a suitable .s6-svscan/finish file, of course. This mostly works
like when launching s6-svscan in any other way (except perhaps for the
environment, as Casper pointed out). No special arrangement with
nosetsid files needed.

Using kill(1) with the job ID returned by the shell (i.e. kill
%some_number) to send a SIGTERM signal instead of using s6-svscanctl
works too, provided that s6-svscan was invoked without the 'divert
signals' option, -s.

G.


Re: [announce] mdevd-0.0.1.0 - a mdev-compatible uevent manager

2017-11-15 Thread Guillermo
2017-11-15 14:11 GMT-03:00 Didier Kryn:
>
> I don't know what evdev is (I guess some virtual device) and what's its
> place in the big Xorg picture, but I think some of you guys know it. Any
> usefull link?

X11 comes in pieces; there is the server, normally installed as
/usr/bin/X and / or /usr/bin/Xorg, and modules, that e.g. Devuan
installs in /usr/lib/xorg/modules, I believe, and that include the
video drivers (/usr/lib/xorg/modules/drivers) and input drivers
(/usr/lib/xorg/modules/input). Evdev is an input driver (i.e. it handles
the keyboard, mouse, etc.) that IIUC uses Linux' event interface
(CONFIG_INPUT_EVDEV, "Say Y here if you want your input device events
be accessible under char device 13:64+ - /dev/input/eventX in a
generic way").

You can see which modules are loaded by X at startup in its log file
(normally /var/log/Xorg*.log), and if your computer has the package
that provides the evdev driver installed, see 'man evdev'.

2017-11-15 11:53 GMT-03:00 Laurent Bercot:
>
>  Is it really the only place in Xorg that depends on libudev?
> I'd think it would be much, much more entangled with libudev than
> this.

Indeed it is. The X server itself:

* https://cgit.freedesktop.org/xorg/xserver/tree/config/udev.c

The modesetting video driver:

* 
https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/drivers/modesetting/drmmode_display.c
(drmmode_*uevent*())

The Intel video driver:

* 
https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/tree/src/uxa/intel_driver.c
(I830*UEvent*())

The amdgpu video driver:

* 
https://cgit.freedesktop.org/xorg/driver/xf86-video-amdgpu/tree/src/drmmode_display.c
(drmmode_*uevent*())

And probably more. Although it seems that in most cases the dependency
on the libudev API is selectable by an option at compile time.

G.


Re: s6-fdholder-getdumpc fails with 'Protocol error' (EPROTO)

2017-08-06 Thread Guillermo
2017-08-06 8:47 GMT-03:00 Laurent Bercot:
>
>  Fixed in the latest s6 git. It was a stupid typo.
>  I'll carve a new release some time this month.

Thank you, that worked.

This also affected s6-fdholder-transferdumpc, and, for the record, the
issue fixed by commit ebb6b00bf66828c5a2587dd2cd44f8810e01e00b (which
also worked) affected GNU libc-based operating systems as well, and
manifested as s6-mkfifodir not being able to create publicly
accessible FIFO directories, so an s6 bugfix release seems in order
indeed.

* 
http://git.skarnet.org/cgi-bin/cgit.cgi/s6/commit/?id=ebb6b00bf66828c5a2587dd2cd44f8810e01e00b

It turns out GNU libc also has an unsigned gid_t

* 
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=posix/sys/types.h;hb=HEAD

#ifndef __gid_t_defined
typedef __gid_t gid_t;
# define __gid_t_defined
#endif

* 
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=posix/bits/types.h;hb=HEAD

#define __U32_TYPE unsigned int
//...
#if __WORDSIZE == 32
//...
#elif __WORDSIZE == 64
//...
# define __STD_TYPE typedef
#else
# error
#endif
#include  /* Defines __*_T_TYPE macros.  */
//...
__STD_TYPE __GID_T_TYPE __gid_t; /* Type of group identifications.  */

* 
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=bits/typesizes.h;hb=HEAD

#define __GID_T_TYPE __U32_TYPE

G.


s6-fdholder-getdumpc fails with 'Protocol error' (EPROTO)

2017-08-05 Thread Guillermo
Hello,

As the subject says, that is what happens on my computer. According to
the documentation:

"This is generally a programming error. It can also signal a bug in
the s6-fdholder tools, but protocol bugs have usually been wiped out
before a s6 release."

Operating system is Gentoo, architecture x86_64, GNU libc version
2.23, Linux kernel 4.9.16. Unless I made a mistake, all skarnet.org
packages are the latest versions. Steps I've taken to produce the
error:

# Start an fd-holder and store some FDs
$ s6-fdholder-daemon -i rules.d fdholder-socket &
$ mkfifo -m 0660 test-fifo
$ redirfd -rnb 0 test-fifo s6-fdholder-store fdholder-socket
fifo:/home/user/test-fifo
$ s6-ipcserver-socketbinder test-socket s6-fdholder-store
fdholder-socket unix:/home/user/test-socket
$ s6-fdholder-list fdholder-socket
fifo:/home/user/test-fifo
unix:/home/user/test-socket

#Perform a get dump
$ execlineb -Pc '
s6-fdholder-getdump fdholder-socket
pipeline { env } grep ^S6_'

s6-fdholder-getdumpc: fatal: unable to get dump: Protocol error

Partial strace (if requested I'll post it complete):

execve("/home/guillermo/skarnet/bin/s6-ipcclient",
["/home/guillermo/skarnet/bin/s6-i"..., "fdholder-socket",
"/home/guillermo/skarnet/bin/s6-f"..., "pipeline", " env", "", "grep",
"^S6_"], [/* 26 vars */]) = 0
...
socket(AF_UNIX, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="fdholder-socket"}, 110) = 0
getsockname(3, {sa_family=AF_UNIX}, [110->2]) = 0
dup2(3, 6)      = 6
close(3)    = 0
dup2(6, 7)  = 7
...
execve("/home/guillermo/skarnet/bin/s6-fdholder-getdumpc",
["/home/guillermo/skarnet/bin/s6-f"..., "pipeline", " env", "",
"grep", "^S6_"], [/* 28 vars */]) = 0
...
ppoll([{fd=6, events=POLLOUT}], 1, {tv_sec=1152921504606846976,
tv_nsec=0}, NULL, 8) = 1 ([{fd=6, revents=POLLOUT}], left
{tv_sec=1152921504606846975, tv_nsec=96660})
sendmsg(6, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\0\0\0\1\0\0", iov_len=6}, {iov_base="?",
iov_len=1}], msg_iovlen=2, msg_controllen=0, msg_flags=0},
MSG_NOSIGNAL) = 7
recvmsg(6, {msg_namelen=0}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC)
= -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=6, events=POLLIN|POLLHUP}], 1, {tv_sec=1152921504606846975,
tv_nsec=999717297}, NULL, 8) = 1 ([{fd=6, revents=POLLIN}], left
{tv_sec=1152921504606846975, tv_nsec=999489651})
recvmsg(6, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\0\0\0\t\0\0\0\0\0\0\1\0\0\0\2\0\0\0P\0\2P\0\0\0Y\206;.:w\305"...,
iov_len=2047}, {iov_base=NULL, iov_len=0}], msg_iovlen=2,
msg_control=[{cmsg_len=24, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, cmsg_data=[3, 4]}], msg_controllen=24,
msg_flags=MSG_CMSG_CLOEXEC},
MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 101
close(4)= 0
close(3)= 0
writev(2, [{iov_base="s6-fdholder-getdumpc: fatal: una"...,
iov_len=64}, {iov_base=NULL, iov_len=0}], 2) = 64
exit_group(1)   = ?
+++ exited with 1 +++

Looking at this it *seems* that the fd-passing part actually succeeded,
and that the two received FDs are 3 and 4.

Thanks,
G.


Re: Question about s6-svscanboot (s6 examples)

2017-07-23 Thread Guillermo
2017-07-23 14:33 GMT-03:00 Guillermo:
>
> It looks wrong to me...

I mean, not putting the 's6-envdir -I /service/.s6-svscan/env' near
the beginning (perhaps it should use absolute path
/command/s6-envdir), but moving the 'exec -c' just before the
s6-svscan invocation.

G.


Question about s6-svscanboot (s6 examples)

2017-07-23 Thread Guillermo
Hello,

I don't understand the changes made to this script in commit
3bcbbe18f9a60c0743c30d391c7878034e4b1274 (just before the release of
version 2.3.0.0). Previously, the 'exec -c' invocation came first, and
the 's6-envdir -I /service/.s6-svscan/env' invocation, just before
exec...()'ing into s6-svscan, so that the /service/.s6-svscan/env
environment directory would become the supervision tree's complete
environment.

But then the order was inverted. So now /service/.s6-svscan/env seems
useless, and the supervision tree starts with an empty environment,
including an undefined PATH? It looks wrong to me...

* 
http://git.skarnet.org/cgi-bin/cgit.cgi/s6/commit/examples/s6-svscanboot?id=3bcbbe18f9a60c0743c30d391c7878034e4b1274

Thanks,
G.


s6 documentation errors

2017-07-23 Thread Guillermo
Hello,

The documentation for s6-ipcserverd repeats IPCREMOTEEUID twice in
the last sentence of the "Environment variables" section. The second
occurrence should be IPCREMOTEEGID.

* http://www.skarnet.org/software/s6/s6-ipcserverd.html

The documentation for s6-ipcclient has a dead link to the s6-sudo page
in the "Notes" section, because the URL is missing a final '.html'.

* http://www.skarnet.org/software/s6/s6-ipcclient.html

G.


Re: Configuration and scripts location

2017-02-22 Thread Guillermo
Hi,

2017-02-22 7:18 GMT-03:00 Guillaume Perréal:
>
> Tweaking my services, I have wondered where to put their configuration.
>
> For longruns, one could either put them in the data/ and env/ subdirectories,
> or in a /etc subdirectory. The former allows to easily have different
> instances of the same services but the files might be harder to locate, and
> requires to update the s6-rc database to change the configuration. The
> latter allows to use a well-known location and to update the configuration
> without updating the s6-rc database.

For longruns env/ and data/ work; the current working directory for
longruns is their s6 service directory (generated by s6-rc-compile),
so env/ and data/ can be referred to in 'run' and 'finish' scripts
using relative paths, a style that aligns with daemontools-like
supervision suite usage (s6 is the underlying mechanism, after all).
And doing it that way *does* allow dynamic changes without recompiling
the services database. What happens though is that if the 'boot time'
database (i.e. the one used in the s6-rc-init invocation) is not
recompiled, those changes don't survive a machine reboot. The usage of
env/ and data/ was touched upon in a supervision ML thread:



Quoting Laurent: "I think env/ and data/ should be used for
configuration that is mostly static, i.e. that can wait until a
database recompilation to be applied. For configuration that needs to
be applied immediately, it is best if services store their config in a
directory that is fully independent from the service directory
location, and access it via absolute paths". But, to be honest, I
still like that one is able to make such in-place changes, and like
s6-rc-bundle for the same reason.

> For oneshots, one can use neither env/ nor data/. There are not many
> solutions.

Yup, oneshots need independent directories referenced by absolute
path; the current working directory for their 'up' and 'down' scripts
is s6rc-onsehot-runner's service directory in s6-rc's current
implementation (i.e. with s6-sudoc + s6-sudod as the underlying
mechanism).

My personal preference for anything (s6-rc-specific) that needs to be
referenced by absolute path in 'run', 'finish', 'up' or 'down' would
be /etc/s6-rc/misc/$SERVICE_NAME, with $SERVICE_NAME matching the name
in s6-rc's service database (i.e. as shown by 's6-rc-db list all').
And /etc/s6-rc/misc/$SOMETHING_REPRESENTATIVE for things that do not
'belong' to a single service, but might be shared among many (like
general enough auxiliary scripts); this implies that
$SOMETHING_REPRESENTATIVE is then unavailable as a service name, of
course.

> The same question goes for helper scripts (for example, the "event handler"
> of udhcpc, responsible for applying the configuration). I'm torn between
> putting them in s6-rc/scripts/the-service-that-uses-them, a "scripts"
> subdirectory in their configuration, or something like
> {/lib,/usr/share}/the-service-that-uses-them.

My choice of 'misc' in /etc/s6-rc/misc also means that I think placing
scripts there is OK too :), but if I had to choose an alternative, it
would be something under /lib with 's6-rc' in the name. On my machine
it would be something like /lib/skarnet/s6-rc/$SERVICE_NAME, because I
put anything 'libexec-like' under /lib/skarnet for both s6 and s6-rc
(and s6-linux-init-maker-like stage2, stage2_finish and stage3 scripts
would be in /lib/skarnet/s6-init). I wouldn't choose anything under
/usr, because on a machine with s6-rc as part of its init system I
consider anything that might be invoked by a service 'essential'
enough to be on the rootfs.

G.


Re: Skalibs on OpenBSD

2017-02-13 Thread Guillermo
Hi,

2017-02-13 2:49 GMT-03:00 multiplex'd:
>
> [...] However, when I compile s6, one of the programs fails
> to link (I didn't pay attention to which one), with the linker complaining 
> about
> an undefined reference to arc4random_addrandom().
>
> Long story short, after searching through the skalibs source, OpenBSD's 
> manpages
> and OpenBSD's CVS, I found that OpenBSD removed the arc4random_addrandom() and
> arc4random_stir() functions in 2013.
>
> I tried commenting out the body of random_init() in librandom [...]

Don't. This looks like a portability issue. The definition of
random_init() that makes use of arc4random_addrandom() is
#ifdef-guarded. It gets picked up if macro SKALIBS_HASARC4RANDOM is
defined, and whether it is (in skalibs' generated sysdeps header) or
not depends on the result of tests performed by skalibs' 'configure'
script. In this case, the particular test is this one:



What it actually tests for is the presence of function
arc4random_uniform(), probably assuming that if it is available, then
arc4random_addrandom() will be as well, and looking at OpenBSD's man
pages, it seems version 5.5 and later defeat that assumption. If
that's really it, you did the right thing by reporting it here, and
Laurent is going to love this one :D

G.


Re: Some glitches on skarnet.org

2016-08-14 Thread Guillermo
Hello,

May I take this opportunity to bring up again the ML archives issue?



Laurent Bercot:
>
> Yeah, it's a bug in ezmlm-cgi, and I haven't investigated it yet. Not sure if
> it's non-ascii content or something else, but some messages just make
> ezmlm-cgi crash.

Perhaps somebody else noticed it too, but at this point I'm pretty
sure it is messages containing URLs, when they are converted to
clickable hyperlinks. But only sometimes. I still don't know what the
pattern is, but a URL at the beginning of a line (or wrapping at the
end of one) seems to almost certainly trigger the crash, and having
other characters before it (like when the URL is inside '<>' or after
'*') seems to prevent it.

Thanks,
G.


Re: backtick to NOT set variable

2016-06-20 Thread Guillermo
2016-06-18 15:13 GMT-03:00 Laurent Bercot:
>
> Please try the -I option to backtick and withstdinas
> in the latest execline git and tell me if it works for you.

Does for me, both programs. However, backtick's documentation doesn't
currently mention that the -i, -I and -D options also affect the behaviour
and exit status when 'prog1' (the one that runs as a child) crashes or
exits 0. Not sure if there are other failure modes besides that one
and the presence of null characters. Also, -I is mentioned in the
options description, but not in the command line syntax description.

G.


Re: s6-rc-update do not bring up services

2016-06-04 Thread Guillermo
2016-06-04 8:30 GMT-03:00 Eric Vidal:
>
>>   Um, no. What part of the documentation made you think that? If the
>> documentation is unclear enough that you understood that, then I need to
>> fix it.
>
> this is the complet line : The service has a dependency to a service that 
> must restart, or to an old service that must stop, or to a new service that 
> did not previously exist or that was previously down.
> found it here : http://skarnet.org/software/s6-rc/s6-rc-update.html
> Restarts section

In the context of your first post, what that means is that if *some
other* service that:

a) is both in the old database ("old" meaning "the one that is about
to be replaced by s6-rc-update") and the new database, and
b) is up when you run s6-rc-update, and
c) has the new service *as a dependency* (in the new database, of course)

*then* that new service will be started, and the service that now
depends on it will be restarted. This as part of matching the
machine's state as closely as possible. It won't be brought up in
general just because it is new.
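To make condition (c) concrete, here is a sketch with hypothetical service names:

```shell
# Hypothetical source tree: "websvc" exists in both the old and the new
# database and is up; "cachesvc" only exists in the new one.  Because
# websvc lists cachesvc as a dependency, s6-rc-update will start cachesvc
# and restart websvc.  A new service that nothing depends on would simply
# stay down after the update.
mkdir -p src/cachesvc src/websvc
echo longrun  > src/cachesvc/type
echo longrun  > src/websvc/type
echo cachesvc > src/websvc/dependencies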

G.


Re: Booting, s6-rc services and controlling terminal

2016-04-30 Thread Guillermo
On a side note:

2016-04-02 12:56 GMT-03:00 Guillermo:
>
> Consider a machine with an s6-linux-init-maker-style setup, that also
> uses s6-rc in the stage2 and stage2_finish scripts. Longruns managed
> by s6-rc without a 'producer-for' file in their definition directory
> have their stdout and stderr redirected to the catch-all logger, and
> so do oneshots via the s6rc-oneshot-runner service. So if one wants to
> have a service output to the console instead (e.g. an early one in the
> boot process), its definition would have to do explicit redirections
> to /dev/console.

I think I made a mistake in that description. The s6rc-oneshot-runner
service is an s6-ipcserverd process that spawns s6-sudod processes.
And the program executed by s6-sudod gets its stdout and stderr from
the corresponding s6-sudoc process by default. Therefore, the output
of s6-rc oneshots wouldn't go to s6-svscan's stdout / stderr, but to
those of whoever made the 's6-rc change' call. When the machine is
booting or shutting down, that would be the stage2 or stage2_finish
script, respectively, which, depending on the setup, may or may not
have their output redirected to the catch-all logger. So for oneshots,
explicit redirections in their service definition might not be needed
after all...

G.


Re: Booting, s6-rc services and controlling terminal

2016-04-02 Thread Guillermo
Some more background on this.

2016-04-02 14:42 GMT-03:00 Laurent Bercot:
>
>  I have no idea what the TIOCSCTTY is for, though.

Apparently, on Linux and the BSDs, if 'fd' is an open file descriptor
to a terminal device, ioctl(fd, TIOCSCTTY, ...) (two or three
arguments) sets it as the controlling terminal [1][2]. I've seen calls
like that in sysvinit, agetty from util-linux and open-controlling-tty
from nosh.

> *What* device would you open as a controlling terminal in the general case?

From what I understand from the code, sysvinit tries several devices
in order: the contents of the CONSOLE environment variable,
/dev/console, /dev/tty0 and finally /dev/null if everything else
failed. But it looks like it sets the controlling terminal for the
processes it is configured to wait for until they end, and only when
booting the machine (actions 'sysinit' and 'bootwait') or in
single-user runlevel (runlevel 's'). It doesn't for processes with a
'respawn' action in normal runlevels, for example. The relevant part
should be inside the spawn() function in the init.c file [3]. I get
lost easily trying to read that code, though :)

>  I have no idea what systemd does and I don't think it would be a good
> indicator of what *should* be done anyway.

:)

For completeness, the unit file directives involved here are
StandardInput, StandardOutput and StandardError:



> I suspect that the use of open-controlling-tty is there to mimic systemd's
> behaviour exactly;

Maybe. Honoring the expected behaviour of a 'StandardInput=tty'
directive does seem to require setting the device stdin is connected
to as a controlling terminal, according to that documentation. But
surprisingly the unit file-to-bundle directory converter
(system-control convert-systemd-units) doesn't support
'StandardOutput=tty' and 'StandardError=tty' directives and ignores
them. Supposedly, if supported, 'StandardOutput=tty' appearing alone
in the unit file would make the service output to a terminal while
keeping stdin connected to /dev/null, and not set a controlling
terminal. That's why I said that in nosh's case I wasn't sure if the
whole controlling terminal thing is because of this limitation.

> [...] this conversation would have a place on the supervision ML

I was undecided about where to post it, but ultimately since I was
presenting it as a question about s6-rc, I chose the skaware ML.

Thanks,
G.

[1] 
[2] 

[3] 



Booting, s6-rc services and controlling terminal

2016-04-02 Thread Guillermo
Hello,

Excuse me if this is a basic POSIX question.

Consider a machine with an s6-linux-init-maker-style setup, that also
uses s6-rc in the stage2 and stage2_finish scripts. Longruns managed
by s6-rc without a 'producer-for' file in their definition directory
have their stdout and stderr redirected to the catch-all logger, and
so do oneshots via the s6rc-oneshot-runner service. So if one wants to
have a service output to the console instead (e.g. an early one in the
boot process), its definition would have to do explicit redirections
to /dev/console. But should the service also be turned into a session
leader and set a controlling terminal, or only in special cases?

I thought that this was needed only in special cases, like in
processes that end up spawning a shell, but I see other init systems
sometimes do it automatically. For example, with nosh, a service
configured to output to a terminal AND defined using a systemd unit
file, when converted to a bundle directory (nosh's version of a
service definition directory) with the converter tool, has a 'run'
script that calls open-controlling-tty (nosh's chainloading utility to
open a device as a controlling terminal). And sysvinit's code has
setsid() and TIOCSCTTY ioctl() calls for child processes, depending on
the 'runlevel' and 'action' fields of the corresponding line in
/etc/inittab.

What those two cases have in common is that the mechanism involved
also unconditionally redirects stdin to a terminal device; in the nosh
case, I'm not sure if that's because of a limitation of the unit
file-to-bundle directory converter. But I don't know why the
"controlling terminal maneuver" is done there, so I now wonder if
s6-rc service definitions would also require it in more cases than I
thought, instead of simple FD redirections.

Thanks,
G.


Re: target mismatch building on different versions of darwin

2016-02-28 Thread Guillermo
2016-02-28 7:34 GMT-03:00 Laurent Bercot:
>
> On 27/02/2016 22:22, Laurent Bercot wrote:
>>
>> I'm not sure how gcc/clang -dumpmachine gets its information (on my Linux,
>> strace doesn't show anything conclusive) but if it's provided by the OS,
>> then I consider this a bug of Darwin that you can report
>
>  I'm told it's hardcoded into the compiler at compiler build time.

Yes. I believe that unless it is explicitly specified at configuration
time, their build system tries to autodetect the host's architecture
and OS using GNU's config.guess script [1], which both the GCC and
Clang (well, LLVM) sources include. And yes, the triplet emitted by
config.guess does include the OS release for some of them.

However, assuming libraries and executables built on one version are
binary compatible with the other, wouldn't specifying a --target option
to execline's configure script that matches the one in skalibs' sysdeps
directory also solve the issue? I.e.:

./configure --target=x86_64-apple-darwin13.4.0
--with-sysdeps=whatever/skalibs/sysdeps

G.

[1] 


What happens if the catch-all logger restarts?

2015-12-26 Thread Guillermo
Hello,

Consider a machine with an s6-linux-init-maker-style setup (s6-svscan
running as process 1, a catch-all logger reading from a FIFO, etc.). I
was wondering if it was possible to make the catch-all logger change
its logging script when it is already running, and thought that this
would require restarting it, e.g. by sending it SIGHUP or SIGTERM (if
s6-log isn't blocking it because of the -p option), and letting the
corresponding s6-supervise process get it up again. But then I asked
myself what would be the consequences if the catch-all logger
restarts, whatever the reason might be.

I guess that for a short window of time, processes that send their
logs to the catch-all logger could potentially receive a SIGPIPE,
right? With a logging chain arrangement, these would mostly be s6-log
(or similar) processes, s6-supervise processes, and s6-svscan. And the
last two block SIGPIPE, according to the documentation. But would the
failing write operations to the logger's FIFO make this a (transient)
situation where logs could be lost?
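A quick way to see what blocking SIGPIPE buys here (a minimal sh sketch, with a throwaway FIFO standing in for the catch-all logger's):

```shell
# With SIGPIPE ignored (as s6-svscan and s6-supervise do), a write to a
# FIFO whose reader is gone fails with EPIPE instead of killing the
# writer.  The write itself still fails, so that log line is lost.
mkfifo f
cat f > /dev/null & reader=$!      # the reader plays the catch-all logger
exec 3>f                           # hold a writing end open
kill $reader; wait $reader 2>/dev/null || true   # "logger restarts"
trap '' PIPE                       # ignore SIGPIPE, like the supervisors
if echo "log line" >&3 2>/dev/null
then echo "write succeeded"
else echo "write failed: the line is lost, but the writer survives"
fi
exec 3>&-
rm f
```

So yes, during the window before s6-supervise respawns the logger and it reopens the FIFO, writes fail and those lines are gone; the supervision tree itself stays up.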

Thanks,
G.


Re: [announce] New skarnet.org release, with relaxed requirements.

2015-10-22 Thread Guillermo
(I'm so lagging behind...)

2015-10-15 7:47 GMT-03:00 Laurent Bercot:
>
>  Additionally, the new s6 version features a new "-s" switch to
> s6-svscan, that makes it possible to configure s6-svscan's behaviour
> when it receives certain signals. This is useful, for instance, for
> Ctrl-Alt-Del management when s6-svscan is used as process 1.

Interesting. BusyBox poweroff can now work with s6-svscan if there's
an appropriate .s6-svscan/SIGUSR2 script, I tested it. I suppose
runit-init, if called as 'init 0', would work too with an appropriate
.s6-svscan/SIGCONT script, although that I cannot test.

So, I don't know if the handler scripts for diverted signals that the
new version of s6-linux-init-maker generates are intended to be
compatible with BusyBox. But if that's the intention, then the ones
for SIGUSR1 and SIGUSR2 are inverted: I think that the signal sent by
'busybox halt' to process 1 is SIGUSR1, so its handler should be the
one calling s6-svscanctl -0 $tmpfsdir/service, and the signal sent by
'busybox poweroff' is SIGUSR2, so its handler should be the one
calling s6-svscanctl -7 $tmpfsdir/service.
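For illustration, such a diverted-signal handler could look like this sketch (assuming /run/service is the scan directory; adjust to the actual s6-linux-init-maker layout):

```shell
# Hypothetical .s6-svscan/SIGUSR2 handler matching 'busybox poweroff':
# s6-svscan execs this script when it receives the diverted SIGUSR2.
mkdir -p .s6-svscan
cat > .s6-svscan/SIGUSR2 <<'EOF'
#!/bin/execlineb -P
s6-svscanctl -7 /run/service
EOF
chmod +x .s6-svscan/SIGUSR2
```

The SIGUSR1 handler would be identical except for calling s6-svscanctl -0 (halt).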

And speaking of s6-linux-init-maker, the -e VAR=VALUE option generates
a $basedir/env/VAR file that doesn't have a trailing newline after
VALUE, although I don't know if s6-envdir cares.

Thanks,
G.


Re: [announce] skarnet.org Fall 2015 update

2015-09-24 Thread Guillermo
2015-09-24 22:35 GMT-03:00 Laurent Bercot:
>
>  But I guess it doesn't cost much to add support for -wr/-wR, if
> you want it.

No need, I was just curious :)

Thank you,
G.


Re: [announce] skarnet.org Fall 2015 update

2015-09-24 Thread Guillermo
Hi,

2015-09-23 17:10 GMT-03:00 Laurent Bercot:
>
>  * s6-2.2.1.0
>  
>
>  [...]
>  New -wr and -wR options to s6-svc, s6-svlisten1 and s6-svlisten.

Reading the documentation, I couldn't tell the difference between
these and the -wu and -wU options. Looking at the code, -r seems to
mean "wait for a D event, and then for an u event", and -R seems to
mean "wait for a D event, and then for a u event", and -R seems to
mean "wait for a D event, and then for a U event". Is that right?
Also, s6-svwait used to have the same "waiting mode" options as
s6-svlisten1 and s6-svlisten, but didn't get -r and -R this time. Is
this intentional?

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-08-25 Thread Guillermo
2015-08-25 14:01 GMT-03:00 Guillermo:
>
> But the thing is, no matter what I do, strace shows the '-T 3' never
> changes. Even when I explicitly put a 'timeout-up' file in the service
> definition directory, either containing '0' or a long enough timeout,
> or when I give s6-rc change a -t option.

OK, today's commit fc91cc6cd1384a315a1f33bc83e6d6e9926fc791, which I
noticed after sending that last message, had already fixed that. I git
cloned again, rebuilt, and no longer have that problem :)

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-08-25 Thread Guillermo
2015-08-25 11:01 GMT-03:00 Laurent Bercot:
>
>  I can't reproduce that one, can you please send me a "strace -vf -s 256"
> output of the s6-rc command that gives you these errors ?

OK, this is a tricky one. While looking at the strace output you asked
for, I realized myself what was going on. The s6-svc process 's6-rc
change' is spawning has options '-uwu -T 3', and my test system seems
to be slow enough (don't ask :) ) that it turns out a 3 millisecond
timeout is too small. A manual s6-svc invocation with that timeout
produces the same errors I saw when using s6-rc. And trying different
-T options, it takes over 30 ms to actually start the service I was
performing the tests with, and over 70 ms to do so without
s6-svlisten1 error messages. All my previous manual s6-svc invocations
didn't specify a timeout, so I didn't notice until now.

But the thing is, no matter what I do, strace shows the '-T 3' never
changes. Even when I explicitly put a 'timeout-up' file in the service
definition directory, either containing '0' or a long enough timeout,
or when I give s6-rc change a -t option. 's6-rc-db timeout' does show
the timeout specified by me, and correctly shows 0 when there is no
'timeout-up' file. So the real issue here appears to be that s6-svc is
always told to wait no longer than 3 ms, and, in particular, you
wouldn't be able to make s6-rc wait forever in the "no readiness
notification" case.

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-08-24 Thread Guillermo
Hi,

2015-08-24 10:52 GMT-03:00 Laurent Bercot:
>
>  Should be fixed in the latest git, thanks !

The s6-rc-fdholder-filler issue is fixed indeed, thank you. But I
still have the 's6-ftrigrd: fatal: unable to sync with client: Broken
pipe' one (the second one from my previous message). In fact, I made
further tests, and it happened consistently with longruns that did not
have a 'notification-fd' file in their service definition directories.
It didn't matter if they had pipes to other longruns or not.

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-08-23 Thread Guillermo
Hello,

I have new issues with the current s6-rc git head (after yesterday's
bugfixes), discovered with the following scenario: a service database
with only two longruns, "producersvc" and "loggersvc", the latter
being the former's logger. Loggersvc's service definition directory
had only 'consumer-for', 'run' and 'type' files, the run script being:

#!/bin/execlineb -P
redirfd -w 1 /home/test/logfile
s6-log t 1

This means no readiness notification for this service.
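(For contrast, a sketch of the same definition with readiness notification wired up, assuming an s6-log version that supports the -d notification option:)

```shell
# Hypothetical variant of the loggersvc definition directory that does
# support s6-style readiness notification: s6-log writes a newline on
# fd 3 once it is ready, and notification-fd tells s6-supervise so.
mkdir -p loggersvc
echo longrun     > loggersvc/type
echo producersvc > loggersvc/consumer-for
echo 3           > loggersvc/notification-fd
cat > loggersvc/run <<'EOF'
#!/bin/execlineb -P
redirfd -w 1 /home/test/logfile
s6-log -d3 t 1
EOF
chmod +x loggersvc/run
```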

So the issues:

* s6-rc-fdholder-filler appears to have a bug when creating
identifiers for the writing end of the pipe between producersvc and
loggersvc:

$ s6-fdholder-list /s6rc-fdholder/s
pipe:s6rc-r-loggersvc
pipe:s6rc-w-loggersvc5\0xdaU

I also saw this with longer pipelines; identifiers for the reading
ends were OK, identifiers for the writing ends ended with random
characters. I didn't try to start producersvc, since I expected it to
fail trying to retrieve the nonexistent "pipe:s6rc-w-loggersvc" file
descriptor.

* s6-rc was unable to start loggersvc. More specifically, 's6-rc -v3
change loggersvc' produced this output:

s6-rc: info: bringing selected services up
s6-rc: info: processing service s6rc-fdholder: already up
s6-rc: warning: unable to access /scandir/loggersvc/notification-fd: No such file or directory
s6-rc: info: processing service loggersvc: starting
s6-ftrigrd: fatal: unable to sync with client: Broken pipe
s6-svlisten1: fatal: unable to ftrigr_startf: Connection timed out
s6-rc: warning: unable to start service loggersvc: command exited 111

However, a manual 's6-svc -uwu /loggersvc' succesfully
started the service, and the following test showed that it worked:

$ execlineb -c 's6-fdholder-retrieve /s6rc-fdholder/s
"pipe:s6rc-w-loggersvc5\0xdaU" fdmove 1 0 echo Test message'

(3 times)

$ cat logfile | s6-tai64nlocal
2015-08-23 18:09:10.822137309  Test message
2015-08-23 18:09:16.871541383  Test message
2015-08-23 18:09:18.219259082  Test message

So I'd have to conclude the problem is in s6-rc, although I didn't see
anything obvious that could launch an s6-svlisten1 process.

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-08-21 Thread Guillermo
Hello,

I have the following issues with the current s6-rc git head (last
commit 8bdcc09f699a919b500885f00db15cd0764cebe1):

* s6-rc-compile doesn't copy the 'nosetsid' file in the service
definition directory of a longrun to the compiled database directory.

* s6-rc-compile produces an error if it is given the -u option with
more than one user ID. More precisely, if it was called with '-u
uid1,uid2,uid3,...', the error is 's6-rc-compile: fatal: unable to
symlink  to /servicedirs/s6rc-fdholder/data/rules/uid/: File exists'.

* s6-rc-compile produces an error if it is given the -g option without
the -u option, and produces rules directories that look wrong to me
otherwise (or I didn't understand them). More precisely, if it was
called with '-g gid1,gid2,gid3,...' and no -u option, the error is
's6-rc-compile: fatal: unable to mkdir /servicedirs/s6rc-fdholder/data/rules/uid/0/env: No such file or
directory'. And if it was called with '-u user -g gid1,gid2,gid3,...',
then:

  + Both s6rc-fdholder and s6rc-oneshot-runner have gid1, gid2, gid3,
... directories, but showing up in data/rules/uid, and

  + s6rc-fdholder has symlinks gid1, gid2, gid3, ... under
data/rules/gid, pointing to data/rules/uid/, but
s6rc-oneshot-runner has an empty data/rules/gid.

* 's6-rc-db pipeline' displays the expected result, but outputs a
's6-rc-db: fatal: unable to write to stdout: Success' message at the
end.

* Starting s6rc-fdholder produces an 's6-ipcclient: fatal: unable to
exec s6-rc-fdholder-filler: No such file or directory' error. I guess
because it exists in the libexecdir, which isn't normally in
s6-svscan's PATH, so the run script should probably use the full path?

Thanks,
G.


Re: Bug in ucspilogd v2.2.0.0

2015-08-21 Thread Guillermo
2015-08-12 2:54 GMT-03:00 Laurent Bercot:
>
>  ucspilogd doesn't care about the chosen trailer character. It will
> treat \0 and \n equally as line terminators - which is the only
> sensible choice when logging to a text file and prepending every
> line with some data.

Oh. Then logger version 2.26.2 should work fine adding the -T option,
and does for me using ucspilogd from s6-2.1.6.0 (which I believe
didn't change for 2.2.0.0).

$ logger -V
logger from util-linux 2.26.2

$ BANNER="ucspilogd test" s6-ipcserver test-socket fdmove -c 1 2
ucspilogd BANNER 2>out &
$ printf "This\nshould\n" | logger -Tu test-socket -p user.warning
$ printf "work\nfine\n" | logger -Tu test-socket -p user.alert
$ logger -Tu test-socket -p user.info "then."
$ cat out
ucspilogd test: user.warn: Aug 21 21:05:40 test: This
ucspilogd test: user.warn: Aug 21 21:05:40 test: should
ucspilogd test: user.alert: Aug 21 21:05:44 test: work
ucspilogd test: user.alert: Aug 21 21:05:44 test: fine
ucspilogd test: user.info: Aug 21 21:05:51 test: then.

Of course, it worked with /dev/log too.

Thanks,
G.


Re: Bug in ucspilogd v2.2.0.0

2015-08-11 Thread Guillermo
2015-08-10 18:15 GMT-03:00 Laurent Bercot:
>
>  Oh, it's a mess. A huge mess; there doesn't seem to be any authority
> on the details of the syslog protocol. No normative body, the client
> is in the libc, the server is an application: a definite recipe for
> success!

I don't know about syslog on /dev/log, but for syslog over a network
there is this:

RFC 3164 - The BSD Syslog Protocol
http://www.rfc-editor.org/rfc/rfc3164.txt

RFC 5424 - The Syslog Protocol (transport-independent part, obsoletes RFC 3164)
http://www.rfc-editor.org/rfc/rfc5424.txt

And the transport mappings:

RFC 5426 - Transmission of Syslog Messages over UDP
http://www.rfc-editor.org/rfc/rfc5426.txt

RFC 6587 - Transmission of Syslog Messages over TCP
http://www.rfc-editor.org/rfc/rfc6587.txt

RFC 5425 - Transport Layer Security (TLS) Transport Mapping for Syslog
http://www.rfc-editor.org/rfc/rfc5425.txt

RFC 6012 - Datagram Transport Layer Security (DTLS) Transport Mapping for Syslog
http://www.rfc-editor.org/rfc/rfc6012.txt

Logger has --rfc3164 and --rfc5424 options, and has references to the
above documents in the source code and man page, so AFAICS it is aware
of and complies with them for messages sent over a network to a remote
syslog server, and uses the same code for communication over UNIX
domain sockets. The "non-transparent framing" described in the TCP
document (section 3.4.2 starting on page 7) is relevant to the
reported issue.

Glancing over the version 2.26.2 source code:

https://git.kernel.org/cgit/utils/util-linux/util-linux.git/plain/misc-utils/logger.c?id=v2.26.2

I understand that logger uses datagram mode UNIX domain sockets unless
there is a -T option without a -n option (check the logger_open and
unix_socket functions), and uses non-transparent framing in stream
mode, but with LF ('\n') instead of NUL ('\0') as the trailer
character, as suggested by the TCP mappings document (while
acknowledging that NUL is also in widespread use). And version 2.27
will add an --octet-count option to select octet-counting framing.

Moreover, when reading from stdin (check the logger_stdin function)
logger only sends a message when it receives LF characters (discarding
them) or detects an EOF. It copies verbatim every other character to
the output buffer, but later it uses string manipulation functions
that treat NUL as an end-of-string marker, so I guess that would
explain the weird behaviour of ignoring all input between a NUL and a
subsequent LF. This probably translates to "logger doesn't expect NUL
characters in its input stream, don't do that".

As for ucspilogd... who knows. The printf examples without the NULs
would supposedly work as expected using ucspilogd with option "stream
mode sockets using non-transparent framing with LF as trailer
character", if it had one, and piping to 'logger -T'. Or using
ucspilogd with option "datagram mode sockets", which would also make
musl syslog() work. And GNU libc syslog() works fine using ucspilogd
with current "stream mode sockets using non-transparent framing with
NUL as trailer character" behaviour :P

Cheers!
G.


Re: Preliminary version of s6-rc available

2015-07-19 Thread Guillermo
2015-07-12 2:59 GMT-03:00 Laurent Bercot:
>
>  s6-rc is available to play with.
> [...] I decided to publish what's already there, so you can test it and
> give feedback while I'm working on the rest. You can compile service
> definition directories, look into the compiled database, and run the
> service manager. It works on my machines, all that's missing is the
> live update capability.

Hi,

Well, I haven't been very lucky with oneshots. First, the "#!execline"
shebang with no absolute path doesn't work on my system, even if the
execlineb program can be found via the PATH environment variable.
Neither does "#!bash", "#!python", or any similar construct. If I run
a script from the shell with such a shebang line I get a "bad
interpreter: No such file or directory" message. And s6-supervise
fails too:

s6-supervise (child): fatal: unable to exec run: No such file or directory
s6-supervise : warning: unable to spawn ./run - waiting 10 seconds

And because s6rc-oneshot-runner has a run script with an "#!execline"
shebang, it cannot start, and therefore oneshots don't work :)
However, I was able to work around this in two ways: either by just
modifying s6rc-oneshot-runner's run script in the servicedirs/
subdirectory of the live state directory, or by using Linux
binfmt_misc magic[1]. So now I'm really curious about how the
"#!execline" shebang worked on your test systems.

But once I could get s6rc-oneshot-runner to start, I ran into another
problem. "s6-rc change" then failed to run my test oneshot with this
message:

s6-ipcclient: fatal: unable to connect to
/path-to/live/servicedirs/s6rc-oneshot-runne: No such file or
directory
s6-rc: warning: unable to start service : command exited 111

"/path-to/live/" represents here what was the full path of the live
state directory, and the "" was really a string of random
characters. I suppose this was meant to be the path to
s6rc-oneshot-runner's local socket, but somehow ended up being
gibberish instead. So oneshots still don't work for me :(

Longruns without a logger work for me as expected, and I haven't tried
loggers, bundles and dependencies yet.

Now some other general comments:

* It looks like s6-rc-compile ignores symbolic links to service
definition directories in the source directories specified in the
command line; they seem to have to be real subdirectories. I don't
know if this is deliberate or not, but I'd like symlinks to be allowed
too, just like s6-svscan allows symbolic links to service directories
in its scan directory.

* I'm curious about why is it required to also have a "producer" file
pointing back from the logger, instead of just a "logger" file in the
producer's service definition directory. Is it related to the "parsing
sucks" issue?

* It doesn't really bother me that much, but it might be worth making
"down" files optional for oneshots, with an absent file being the same
as one contanining "exit", just like "finish" files are optional for
longruns.

* I second this:

2015-07-14 13:23 GMT-03:00 Colin Booth:
>
> s6-rc-init: remove the uid 0 restriction to allow non-privileged
> accounts to set up supervision trees.

I test new versions of s6 on an entirely non-root supervision tree,
with services that can be run by that user, separate of the
"system-wide" (privileged) supervision tree, if any. And it is also
the way I'm testing s6-rc now. But, independently of any potential
use-cases, I really see it this way: s6-svscan and s6-supervise are
already installed with mode 0755 and can therefore happily run as any
user besides root. So it is possible to build a non-root supervision
tree, and if some services refuse to run because of "permission
denied" errors, they will be gracefully dealt with just like with any
other failure mode; the user will know via the supervision tree logs,
and no harm is done. So if a non-root supervision tree is allowed, why
not a service manager on top of it, too?

2015-07-16 19:16 GMT-03:00 Laurent Bercot:
>
>  I understand. I guess I can make s6-rc-init and s6-rc 0755 while
> keeping them in /sbin, where Joe User isn't supposed to find them.

It would be nice if s6rc-oneshot-runner's data/rules directory (for
s6-ipcserver-access on the local socket) could also be changed, so it
doesn't allow only root. For example, allow the user s6-rc-init ran as
instead (or in addition to root), or allow the specification of an
allowed user, or a complete rulesdir / rulesfile, with an -u, -i or -x
option to s6-rc-compile or sc-rc-init. The user checked against the
data/rules rulesdir would be the one s6-rc was run as, right? So it
defines which user is allowed to run oneshots?

And finally, for the record, it appears that OpenRC doesn't mount /run
as noexec, so at least Gentoo in the non-systemd configuration, and
probably other [GNU/]Linux distributions with OpenRC as part of their
"init systems", won't have any problems with service directories under
/run.

Cheers!
G.

[1] http://www.kernel.org/doc/Documenta

Re: [announce] s6-linux-init-0.0.1.0

2015-06-17 Thread Guillermo
Hi,

2015-06-17 17:27 GMT-03:00 Laurent Bercot:
>
>  s6-linux-init-0.0.1.0 is out.
>  It is a new package that, for now, only contains one program,
> and more documentation than source code. :)
>
>  Its goal is to automate the creation of stage 1 (/sbin/init) binaries
> for people who want to run s6-svscan as process 1.

Nice!

>  Bug-reports and suggestions welcome, especially since it's still brand
> new and probably rough around the edges.

I did a quick run and found out that in generated execline scripts
except the stage 1 init, the shebang line starts with "#!execlineb".
If I'm not mistaken, it's because the EXECLINE_EXTBINPREFIX macro
(from execline/config.h) is being used, which, unless the execline
package has been configured with the --enable-slashpackage option,
expands to nothing. The bindir variable should be used instead, I
suppose.

Thanks,
G.


Re: Native readiness notification support in s6-supervise

2015-06-15 Thread Guillermo
2015-06-15 16:29 GMT-03:00 Laurent Bercot:
>
>  The feature is available in the latest s6 git head, which is also
> a release candidate for 2.1.4.0.

Hello,

In the examples/ROOT/img/services-local/syslogd-linux subdirectory,
there is an implementation of the syslogd service for Linux, using
s6-ipcserver with the -1 option and s6-notifywhenup for readiness
notification. Maybe you could modify it in the s6 git head to use
the new s6-supervise feature, too.

Also, I happen to use that example in my Gentoo with s6 virtual
machine, almost verbatim I think, so it would come in handy :)

Thanks,
G.


Re: execline's pipeline (and forbacktickx) with closed stdin

2015-05-10 Thread Guillermo
Laurent Bercot  wrote:
>
>  Did you really manage to umount /dev (maybe) and mount a tmpfs
> over it (for sure) with fds still open to the old /dev ? Without
> an EBUSY error ? If it's the case, and you're using Linux, then the
> kernel's behaviour changed.

Yeah, it's Linux. A Gentoo virtual machine with kernel version 3.14
(plus Gentoo patches).

It has sysvinit + OpenRC, one of the two officially supported
configurations (the other one being systemd), and I tried to replace
it with s6 and keep OpenRC for oneshot initialization tasks, since,
well, they are already written :) Long-lived processes have an actual
servicedir in process 1 s6-svscan's scandir, OpenRC would otherwise
launch them unsupervised.

I mostly followed the example init scripts, but I did deviate, among
other things to delegate tasks to OpenRC (as a bundle of oneshot
services in s6-rc terminology, hehe). And now that you remind me they
were originally there, I don't have the fdclose 1 and fdclose 2 in
stage 1 either. I wanted the stage 2 init to have open FDs to
/dev/console, to show OpenRC's output.

The /dev handling is taken care of by an OpenRC-provided oneshot
service in its sysinit runlevel, called "devfs". It mounts a devtmpfs
(or tmpfs, depending on kernel configuration) on /dev, and creates
some essential device nodes. The rest is left to an udev daemon. So
the kernel needs either CONFIG_DEVTMPFS=y (Gentoo's default) or
CONFIG_TMPFS=y. The devfs service does the equivalent of a "mount -t
devtmpfs dev /dev", or a "mount -t devtmpfs -o remount dev /dev" if
there is a filesystem already mounted at /dev. No kernel complaints
when sysvinit is process 1 (I don't know what it does with its open
FDs), no kernel complaints with the custom stage 1 init having devfs
start with open FDs to /dev/console and /dev/null.

There is no unmounting /dev (maybe that's specifically what would
trigger an EBUSY?). The kernel has CONFIG_DEVTMPFS_MOUNT=n, so no
devtmpfs is mounted after the rootfs. And when I boot with
init=/bin/bash, "mount" shows nothing but / and a manually mounted
/proc. However, /dev is not empty. I don't know where the device nodes
came from, but they are there. I didn't create them; maybe they were
put there during Gentoo's installation, I don't know. In fact, the
post-boot /dev looks quite different (and I do see a devtmpfs
mounted). A "mount --bind / /tmp" shows me the original boot-time
/dev, so I suppose the root FS has actual static device nodes.

The bottom line is, I'm not entirely sure what magic is involved, but
everything worked :)

s6-svscan-+-VBoxClient---{SHCLIP}
          |-VBoxClient---{X11 monitor}
          |-VBoxClient---{Host events}
          |-VBoxClient-+-{HGCM-NOTIFY}
          |            `-{X11-NOTIFY}
          |-dbus-daemon
          |-dbus-launch
          |-2*[s6-supervise]
          |-8*[s6-supervise---s6-log]
          |-s6-supervise---systemd-udevd
          |-5*[s6-supervise---agetty]
          |-s6-supervise---login---bash---startx---xinit-+-X
          |                                              `-sh---openbox
          |-s6-supervise---VBoxService-+-{automount}
          |                            |-{control}
          |                            |-{cpuhotplug}
          |                            |-{memballoon}
          |                            |-{timesync}
          |                            |-{vminfo}
          |                            `-{vmstats}
          |-s6-supervise---s6-ipcserverd---2*[ucspilogd]
          |-s6-supervise---ucspilogd
          |-s6-supervise---dbus-daemon
          |-s6-supervise---dhcpcd
          `-urxvtd---bash---pstree

Cheers,
G.


Re: execline's pipeline (and forbacktickx) with closed stdin

2015-05-09 Thread Guillermo
Laurent Bercot wrote:
>
> So I've pushed a fix to the current execline
> git, please tell me if it works for you.

Yes, that worked.

>  However, POSIX considers that UB is acceptable when you run a
> program with 0, 1 or 2 closed: look for "If file descriptor 0" in
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html

Ah, I see.

>  I'll try to support the case as much as I can, and squash those bugs
> whenever they're found, but still, don't do that - Big Bad POSIX will
> bite you if you do.

Let's narrow the scope:

>  I consider it a bug, because there are cases where I do need to
> run programs with fds 0, 1 or 2 closed, and I generally try to
> pay attention to this.

I ran into this while experimenting with the example / template stage
1 and 3 init scripts that come with s6's source code. Both of them do
an early fdclose 0 to ignore input. Wouldn't that be tempting the
demons to fly through your nose, then? :)

Anyway, I had either replaced the early fdclose 0 with redirfd -r 0
/dev/null (and also realized that worked by accident, because I
somehow have a nonempty /dev at startup) or delayed it a bit. I
suppose that's good enough...

Thanks,
G.


execline's pipeline (and forbacktickx) with closed stdin

2015-05-08 Thread Guillermo
Hello,

It appears that if the pipeline program is run with a closed stdin,
for example as part of an execline script that did an earlier
"fdclose 0", the program it execs into gets a "bad file descriptor"
error. For example, if I do something like "pipeline { some-program }
grep $regex", I get a "grep: (standard input): Bad file descriptor".
And if I do something like "forbacktickx file { ls $directory }
some-program", I get a "forstdin: fatal: unable to skagetlnsep: Bad
file descriptor". The documentation says forbacktickx is a wrapper
around pipeline and the forstdin program, so I guess this is the same
pipeline issue.

Are we not supposed to use pipeline or forbacktickx with a closed
stdin, or is this something that needs fixing?

Thanks,
G.


Re: s6 / skalibs runtime error - EOVERFLOW?

2015-03-06 Thread Guillermo
>  Can you try linking s6 against the latest skalibs git, and check
> that it works for you ?

It works now, thank you!

G.


s6 / skalibs runtime error - EOVERFLOW?

2015-03-06 Thread Guillermo
Hello,

Well, that's what I get at least on my 32-bit GNU/Linux system (Gentoo
Linux, which means GNU libc). For example:

* s6-ftrig-wait dies with a 'fatal: unable to ftrigr_startf: Value too
large for defined data type' message.
* s6-supervise dies with a 'fatal: unable to iopause: Value too large
for defined data type' message.
* s6-svscan issues a 'warning: unable to iopause: Value too large for
defined data type' message and then execs into its 'crash' script.

This all means libc functions failing with EOVERFLOW, right?

The last versions that didn't behave this way were s6-2.0.0.1 +
skalibs-2.1.0.0. Maybe there is a problem somewhere in the time
manipulation functions that were changed in skalibs-2.2.0.0?
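(To illustrate the class of failure — this is purely a sketch under my
own assumptions, not skalibs' actual code: TAI64 labels count seconds
with a 2^62 offset, and any careful conversion to a 32-bit time_t has
to range-check, so a label whose offset was not stripped is exactly
the kind of value that must be rejected with EOVERFLOW:)

```python
# Sketch only, not skalibs code: a TAI64 label is the second count
# plus a 2^62 offset.  Converting to a 32-bit time_t must range-check,
# and an unstripped offset guarantees the check fires (EOVERFLOW).
TAI64_OFFSET = 1 << 62
INT32_MAX = (1 << 31) - 1

def tai_to_time32(tai_sec):
    """Return a 32-bit Unix time for a TAI64 label, or raise."""
    unix_sec = tai_sec - TAI64_OFFSET
    if not 0 <= unix_sec <= INT32_MAX:
        raise OverflowError("value too large for a 32-bit time_t")
    return unix_sec

# A 2015-era timestamp fits once the offset is removed:
print(tai_to_time32(TAI64_OFFSET + 1425600000))  # → 1425600000
```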

These are the skalibs sysdeps:

target: i686-pc-linux-gnu
clockrt: yes
clockmon: yes
endianness: little
sizeofushort: 2
sizeofuint: 4
sizeofulong: 4
sizeofgid: 4
sizeoftime: 4
accept4: yes
ancilautoclose: no
cmsgcloexec: yes
devurandom: yes
eproto: yes
eventfd: yes
flock: yes
getpeereid: no
sopeercred: yes
getpeerucred: no
ipv6: yes
malloc0: yes
msgdontwait: yes
nbwaitall: yes
openat: yes
linkat: yes
pipe2: yes
posixspawn: yes
ppoll: yes
revoke: no
sendfile: yes
setgroups: yes
settimeofday: yes
signalfd: yes
splice: yes
strcasestr: yes
uint64t: yes
devrandom: yes

Thanks,
G.